# ICT110 Introduction to Data Science: Assessment Answer

## Answer:

### Introduction

Maternal health is a major health concern worldwide. Governments, together with organisations such as the World Health Organisation and private health institutions, are committed to improving the health of citizens. Substantial funds have been directed to the health sector to address these issues and build a healthy nation. Maternal health is among the main concerns of public health because it is one of the major determinants of a country's development. A strong health system reduces pregnancy-related complications and, in turn, child mortality, which is a double achievement for society.

Health data was extracted from the World Development Indicators database, which was last updated in March 2018. The data was pre-processed and only the records meeting our criteria were extracted. Only Australian data is analysed, to evaluate the risk of maternal death based on factors such as the number of maternal deaths and the amount spent on health as a percentage of GDP. The data was extracted in MS Excel by filtering the indicator labels on the keywords ‘health’ and ‘death’, which returned 17 variables related to health and health-related deaths.

In this paper, the data is loaded into R for exploratory and advanced analysis. The exploratory analysis includes a one-variable analysis of three variables, for which descriptive statistics are obtained and appropriate graphs reported. In addition, a two-variable analysis is conducted, with appropriate plots to present the data effectively. Cluster analysis and linear regression are reported in the advanced analysis section. Journal articles are used to reference ideas and facts included in the report.

### Data Setup

The main data file from the World Development Indicators was processed in MS Excel by extracting the required variables and arranging them in a tidy format, with one variable per column. The data was saved in CSV format so that it could be loaded easily into R. A metadata file was created to describe the coded variable names, which allows effective reference and a better understanding of the analysis. The R working directory was changed to the folder containing the dataset, and the data was imported using the code below. The required packages were loaded using the library function; they include cluster and fpc, which support cluster analysis and visualisation. The data included in this analysis is for Australia only.

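    # Set the working directory to the folder that holds the dataset
    setwd("E:/Documents/745360")

    # Import the tidy CSV file exported from MS Excel
    mydata <- read.csv("mydata.csv")

    # Packages for cluster analysis and cluster visualisation
    library(cluster)
    library(fpc)

    # Number of rows and columns in the dataset
    dim(mydata)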

### Exploratory Data Analysis

#### One-Variable Analysis

##### Lifetime Risk of Maternal Death

    # Columns are referenced through mydata so the call works without attach()
    summary(mydata$SH.MMR.RISK.ZS)
    ##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's
    ##  0.01152  0.01280  0.01320  0.01358  0.01490  0.01550        1

    boxplot(mydata$SH.MMR.RISK.ZS, col = 5, outline = TRUE,
            main = "Lifetime Risk of Maternal Death")

Figure 1: A Boxplot of Lifetime risk of maternal death

The lifetime risk of maternal death ranged from a minimum of 0.01152 to a maximum of 0.01550, with a mean of 0.01358 and a median of 0.01320 (the World Development Indicators express this indicator as a percentage). The mean exceeds the median, which suggests that the distribution is skewed to the right rather than normal. The boxplot supports this: in most years from 1995 to 2014 the risk was below the mean value.

##### Number of maternal deaths within 42 days

    # Columns are referenced through mydata so the call works without attach()
    summary(mydata$SH.MMR.DTHS)
    ##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
    ##   19.00   20.00   20.00   20.05   21.00   21.00       1

    hist(mydata$SH.MMR.DTHS, col = 5,
         main = "Number of Maternal Deaths within 42 days",
         xlab = "Maternal Deaths")

Figure 2: A histogram of the number of maternal deaths within 42 days

The mean number of women who died within 42 days of giving birth, or of the pregnancy ending in any other way, was 20.05, and the median was 20. The mean and the median are very close, so the number of women who died from pregnancy-related issues within 42 days of the end of the pregnancy can be treated as approximately normally distributed, which the histogram also suggests.

##### Total Health Expenditure (% of GDP)

    # Columns are referenced through mydata so the call works without attach()
    summary(mydata$SH.XPD.TOTL.ZS)
    ##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
    ##   7.260   8.006   8.473   8.443   9.023   9.422       2

    boxplot(mydata$SH.XPD.TOTL.ZS, outline = TRUE, col = 5,
            main = "Total Health Expenditure (% of GDP)")

Figure 3: Boxplot of total health expenditure (% GDP)

Total expenditure on health is approximately normally distributed, with a mean of 8.443% and a median of 8.473%. Between 1995 and 2014, the highest percentage of GDP spent on health was 9.422% and the lowest was 7.26%, a modest range of about 2.2 percentage points. There are no outliers in the percentage of GDP spent on health (Oldford, 2016).
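The claim that there are no outliers can be checked numerically as well as visually. The following is a minimal sketch, assuming the usual 1.5 × IQR rule that `boxplot()` applies by default; the `expenditure` vector holds illustrative values only, not the actual World Development Indicators series.

```r
# Illustrative values only (NOT the WDI series): health expenditure as % of GDP
expenditure <- c(7.3, 7.6, 7.9, 8.0, 8.1, 8.3, 8.5, 8.6, 8.9, 9.0, 9.2, 9.4, NA)

# boxplot.stats() uses the same 1.5 * IQR whisker rule as boxplot()
# and ignores missing values when computing the five-number summary
stats <- boxplot.stats(expenditure)

stats$stats            # lower whisker, lower hinge, median, upper hinge, upper whisker
n_outliers <- length(stats$out)
n_outliers             # number of points beyond the whiskers
```

With the real data, the same call on `mydata$SH.XPD.TOTL.ZS` would list any points that the boxplot draws beyond its whiskers.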

#### Two-Variable Analysis

##### Number and risk of maternal deaths

Figure 4: Number and risk of maternal deaths

A scatter plot was selected to visualize the relationship between number and risk of maternal deaths because both are quantitative variables.

##### Percentage of GDP spent on health by risk of maternal death

Figure 5: Total health expenditure by the risk of maternal death

The scatter plot was the appropriate graph to visualize the relationship between total health expenditure and the risk of maternal death because both are quantitative and continuous variables.
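The code behind the scatter plots in this section is not shown in the report. The following is a minimal sketch of how such a plot, together with a correlation coefficient, might be produced; the `deaths` and `risk` vectors are simulated for illustration and are not the World Development Indicators observations, and the output file name is hypothetical.

```r
# Simulated values for illustration only; these are NOT the WDI observations
set.seed(1)
deaths <- sample(19:21, 19, replace = TRUE)
risk <- 0.010 + 0.00015 * deaths + rnorm(19, sd = 0.0005)
risk[3] <- NA  # mimic a year with a missing value

# Pearson correlation computed on the complete pairs only
r <- cor(deaths, risk, use = "complete.obs")

# Write the scatter plot to a PDF in the session's temporary directory
out_file <- file.path(tempdir(), "deaths_vs_risk.pdf")
pdf(out_file)
plot(deaths, risk,
     xlab = "Number of maternal deaths",
     ylab = "Lifetime risk of maternal death",
     main = sprintf("Simulated data, r = %.2f", r))
dev.off()
```

Replacing the simulated vectors with `mydata$SH.MMR.DTHS` and `mydata$SH.MMR.RISK.ZS`, or with `mydata$SH.XPD.TOTL.ZS` on the horizontal axis, would give plots in the style of Figures 4 and 5.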

### Advanced Analysis

#### Clustering

Cluster analysis is a statistical technique that groups observations so that those in the same group are more similar to each other than to those in other groups. K-means partitions the observations into k groups by assigning each observation to the nearest centroid and then recomputing each centroid as the mean of its group, repeating until the assignments stabilise. Each centroid therefore represents one distinct group in terms of the variables used. In R, the cluster package provides functions for cluster analysis, and the fpc package supplies the plotcluster function used to visualise the result.

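    ## Cluster analysis ####

    # Keep the three analysis variables (columns 10, 11 and 14) and drop incomplete rows
    mydata_1 <- mydata[, c(10, 11, 14)]
    mydata_1 <- na.omit(mydata_1)
    View(mydata_1)  # inspect the subset in the data viewer

    # Standardise so that each variable contributes equally to the distances
    mydata_1 <- scale(mydata_1)

    # k-means with k = 19, then plot the cluster assignments
    clusters <- kmeans(mydata_1, 19)
    plotcluster(mydata_1, clusters$cluster)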

Figure 6: Cluster plot

The cluster analysis returned 19 clusters because k was set to 19 in the kmeans call, so each year was effectively placed in its own cluster (Everitt, Landau, Leese, & Stahl, 2011). With k this close to the number of observations, the result reflects the choice of k rather than evidence that the years are unrelated; a smaller k would be needed to reveal any grouping among the years. These clusters can be seen in the cluster plot above.
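One common way to choose a smaller k is the elbow heuristic, which compares the total within-cluster sum of squares across candidate values of k. The following is a minimal sketch under stated assumptions: the matrix `x` is simulated as a stand-in for the three scaled indicators, and `max_k` is an arbitrary illustrative upper bound, neither of which comes from the report.

```r
set.seed(42)

# Simulated stand-in for three scaled indicators over 19 yearly observations
x <- scale(matrix(rnorm(19 * 3), ncol = 3))

# Total within-cluster sum of squares for k = 1..max_k
# nstart = 10 reruns k-means from several random starts and keeps the best fit
max_k <- 6
wss <- sapply(1:max_k, function(k) {
  kmeans(x, centers = k, nstart = 10)$tot.withinss
})

round(wss, 2)
# A plot of wss against k (e.g. plot(1:max_k, wss, type = "b")) is then
# inspected for the point where further increases in k yield little reduction
```

The "elbow" is a judgment call rather than a formal test, so it is best read alongside the cluster plot.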

#### Linear Regression

##### Linear regression 1

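    # Regress lifetime risk on the number of maternal deaths
    fit1 <- lm(SH.MMR.RISK.ZS ~ SH.MMR.DTHS, data = mydata)
    summary(fit1)

    # Scatter plot with the fitted regression line in red
    plot(mydata$SH.MMR.DTHS, mydata$SH.MMR.RISK.ZS,
         xlab = "Number of maternal deaths",
         ylab = "The risk of maternal deaths",
         main = "Relationship between Number of maternal deaths and Risk")
    abline(fit1, col = 2)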

Figure 7: Linear plot of fit 1

There is a positive linear relationship between the number and the risk of maternal deaths. The number of maternal deaths is a significant predictor of the risk of maternal death: each additional maternal death is associated with an increase of 0.144% in the risk. The model is a simple linear regression of the risk on the number of deaths (Crawley, 2012).

##### Linear regression 2

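    # Regress lifetime risk on total health expenditure (% of GDP)
    fit2 <- lm(SH.MMR.RISK.ZS ~ SH.XPD.TOTL.ZS, data = mydata)
    summary(fit2)

    # Scatter plot with the fitted regression line in red
    plot(mydata$SH.XPD.TOTL.ZS, mydata$SH.MMR.RISK.ZS,
         xlab = "Total Health Expenditure (% of GDP)",
         ylab = "The risk of maternal deaths",
         main = "The relationship between %GDP spent on health and Risk of Maternal death")
    abline(fit2, col = 2)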

Figure 8: A plot of fit 2

There is a negative relationship between health expenditure (% of GDP) and the risk of maternal death: as the share of GDP spent on health increases, the risk of maternal death decreases. An increase in total health expenditure of 1% of GDP is associated with a reduction of 0.168% in the risk of maternal death (Sainani, 2013).
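The slope interpretations in this section come from the coefficient table of each fitted model. The following is a minimal sketch of how a slope, its confidence interval and its p-value can be read from an `lm` fit; the `expenditure` and `risk` vectors are simulated for illustration, so the numbers it prints are not the report's results.

```r
# Simulated data for illustration only; these are NOT the WDI observations
set.seed(7)
expenditure <- seq(7.3, 9.4, length.out = 19)
risk <- 0.028 - 0.0017 * expenditure + rnorm(19, sd = 0.0004)

fit <- lm(risk ~ expenditure)

# Estimated change in risk for a one-unit increase in expenditure
slope <- coef(fit)[["expenditure"]]
slope

# 95% confidence interval for the slope
confint(fit, "expenditure", level = 0.95)

# p-value of the t-test that the slope equals zero
p_value <- summary(fit)$coefficients["expenditure", "Pr(>|t|)"]
p_value
```

Reporting the confidence interval alongside the point estimate makes clear how precisely a slope is estimated from only about twenty yearly observations.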

## Conclusion

In conclusion, the share of GDP spent on health, the number of maternal deaths and the risk of maternal death recorded for Australia since 1995 jointly differ from year to year. Australia increased its health expenditure between 1995 and 2014, and this increase has been accompanied by an improvement in the quality of health.

## Reflections

In this analysis, it was a challenge to notice that the double dots used in the MS Excel file to denote missing values had an effect on R. They made it difficult for R to detect the correct data types. I had to remove the dots in MS Excel, which solved the problem and allowed the data types to be detected properly for effective data analysis.
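An alternative to editing the spreadsheet by hand is to tell R which strings denote missing values at import time. The following is a minimal sketch using the `na.strings` argument of `read.csv`; the inline CSV text and its values are hypothetical and stand in for the real file.

```r
# Hypothetical extract in which ".." marks a missing value
csv_text <- "year,SH.MMR.DTHS,SH.XPD.TOTL.ZS
1995,21,7.3
1996,..,7.5
1997,20,.."

# Default import: columns containing ".." are not read as numbers
raw <- read.csv(text = csv_text)
sapply(raw, class)

# Declaring ".." as a missing-value marker restores numeric columns
clean <- read.csv(text = csv_text, na.strings = c("..", "NA"))
sapply(clean, class)
colSums(is.na(clean))  # missing values per column
```

For the real file, the same argument can be passed as `read.csv("mydata.csv", na.strings = c("..", "NA"))`, which leaves the source data untouched.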

## References

Crawley, M. J. (2012). Regression. In The R Book (pp. 449–497). https://doi.org/10.1002/9781118448908.ch10

Everitt, B. S., Landau, S., Leese, M., & Stahl, D. (2011). Cluster Analysis. Quality and Quantity (Vol. 14). https://doi.org/10.1007/BF00154794

Oldford, R. W. (2016). Self-Calibrating Quantile–Quantile Plots. American Statistician, 70(1), 74–90. https://doi.org/10.1080/00031305.2015.1090338

Sainani, K. L. (2013). Understanding linear regression. PM&R, 5(12), 1063–1068. https://doi.org/10.1016/j.pmrj.2013.10.002
