Independent and Dependent Variables in stats solution

Healthcare Application Assignment

Part 1:

A very important question in the context of US demographics is: what has caused the number of live births to fall so dramatically over the last decade (2006-2015)? The fall is especially steep for the Mexican and the Central and South American groups, where it is starkly visible; in contrast, there is a sharp increase for the Other and Unknown group. The question is what lies behind this stark difference in the pattern of live births.

The question does not end there. Before 2005-2006, the birth rate and fertility rate for the Mexican and the Central and South American groups stay within an almost fixed range (varying little, and even decreasing), yet the number of live births rises sharply. What caused this mismatch?

We can also see that, over the last few years, Cuban is the only group that has not shown a dramatic drop in fertility rate or birth rate. At this almost constant rate, its number of live births shows a big jump, almost doubling over the last decade. So why do growth rates differ so disproportionately among demographic groups? Is this happening under some policy, is it social manipulation by specific groups, or is there some manipulation in how the data are recorded (for example, complete information may not be disclosed)? This pattern should be addressed. Also, no birth rate or fertility rate data are available for the Other and Unknown Hispanic group.

To analyse the data in more detail, the following steps are followed:

Defining Variables

Independent Variable – Hispanic Origin , Fertility rates

Dependent Variable – Live Births, Birth Rates

Identifying and explaining a null hypothesis using the dataset.

Null Hypothesis H₀: All Hispanic groups have the same birth rate and fertility rate

Alternate Hypothesis H₁: Different Hispanic groups have different birth rates and fertility rates

As we can see, the birth rates and fertility rates differ across Hispanic groups. So the null hypothesis is rejected here in favour of the alternate hypothesis.

Probability Distribution

For the probability distribution, we sorted the fertility rate column. The minimum value is 45.4 and the maximum value is 118.9. Birth rate and fertility rate are not given for the Other and Unknown Hispanic group, so the fertility rate is taken as NULL for that group.

The total number of births is 22344315.

Fertility Rate (FR)	Number of Live Births	Probability
45.4 ≤ FR < 60	701911	0.03
60 ≤ FR < 75	3137231	0.14
75 ≤ FR < 90	2725012	0.12
90 ≤ FR < 105	6711829	0.30
105 ≤ FR ≤ 118.9	6214921	0.28
NULL	2853411	0.13
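The probabilities in the table are each bin's share of total live births. A minimal sketch of that arithmetic follows, using the counts from the table; the dictionary labels and the function name are illustrative choices, not part of the original analysis.

```python
# Live-birth counts per fertility-rate bin, copied from the table above.
# The labels are shorthand for the bin ranges; "NULL" is the group with
# no reported fertility rate.
births_by_bin = {
    "45.4-60": 701911,
    "60-75": 3137231,
    "75-90": 2725012,
    "90-105": 6711829,
    "105-118.9": 6214921,
    "NULL": 2853411,
}


def bin_probabilities(counts):
    """Return each bin's share of the total count."""
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


total_births = sum(births_by_bin.values())
probabilities = bin_probabilities(births_by_bin)

print("Total births:", total_births)
for label, probability in probabilities.items():
    print(f"{label}: {probability:.2f}")
```

The shares sum to 1 by construction, which is a quick check that no bin has been dropped or double counted.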

Regression

The data has been sourced from https://data.cdc.gov/NCHS/NCHS-Natality-Measures-for-Females-by-Hispanic-Ori/s54h-bixi. This dataset includes live births, birth rates, and fertility rates (dependent variables in our analysis) by Hispanic origin of mother (independent variable in our analysis). As we can see, the line graphs of live births, birth rate, and fertility rate each show a trend.

Looking at the shape of the curves, we can expect a simple linear relationship between fertility rate and birth rate. A strong correlation is suggested here.

But live births vs. birth rate, or live births vs. fertility rate, would not show a strong correlation. This is supported by the following data:

Scatter plot

Scatterplots have been drawn for two pairs of quantitative variables. Year has not been treated as a quantitative variable. The first two variables I have chosen are “live births” and “birth rates”.

The pattern of the graph suggests a very weak correlation between the “live births” and “birth rates” variables (already suspected and discussed with the coloured line graph on the last page). However, this inference is contradicted by the correlation coefficient between the two variables, which is moderately high (0.51). The weak visual impression arises mainly from the very different scales of the two variables on the plot. Also, the coloured line graphs show that the rises and falls of the two lines are not fully synchronised across all years. They are synchronised for some periods, which is why there is at least a moderate correlation coefficient (0.51).

The second pair of variables I have chosen is “birth rates” and “fertility rates”.

The pattern of the graph suggests a strong positive correlation between the two variables (also supported by the almost parallel movement in the coloured line graphs). I make this inference because the two series move together in the same direction. It may be argued that such an inference cannot be made without looking at the correlation coefficient, so it was computed: it is 0.97, which indicates a strong correlation.
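For readers who want to check a correlation coefficient by hand, here is a minimal sketch of the Pearson formula. The sample values are hypothetical and are not taken from the CDC dataset; the function name is also an illustrative choice.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical values for illustration only (NOT from the dataset).
birth_rates = [16.0, 18.0, 20.0, 22.0, 24.0]
fertility_rates = [70.0, 75.5, 83.0, 89.0, 97.5]

r = pearson_r(birth_rates, fertility_rates)
# Multiplying one variable by a constant leaves r unchanged.
r_rescaled = pearson_r([x * 1000 for x in birth_rates], fertility_rates)
print(round(r, 3), round(r_rescaled, 3))
```

Because the coefficient is unchanged when a variable is rescaled, a difference in units affects how a scatterplot looks but not the value of r itself.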

The fitted equation is: Fertility rate = 3.391233 × Birth rate + 14.91789
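As a quick consistency check, the fitted line can be evaluated at the sample birth rate of 18.71 quoted in the hypothesis tests later in this document. A least-squares line with an intercept passes through the point of means, so, assuming 18.71 and 78.37 are the sample means, the prediction should land close to 78.37. The function and constant names below are illustrative.

```python
# Coefficients copied from the fitted equation in the text.
SLOPE = 3.391233
INTERCEPT = 14.91789


def predict_fertility_rate(birth_rate):
    """Evaluate the fitted line at a given birth rate."""
    return SLOPE * birth_rate + INTERCEPT


# 18.71 is the sample birth rate quoted in Test 3 of the text.
predicted = predict_fertility_rate(18.71)
print(round(predicted, 2))
```

The prediction agrees with the sample fertility rate of 78.37 quoted in Test 4 to two decimal places, which suggests the reported coefficients and sample values are mutually consistent.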

Histograms of two variables

The first variable I chose was live births. In the histogram below, the distribution is not normal, as its shape shows. Further, the mean (green vertical bar) and median (red vertical bar) are quite far apart. Since the median is lower than the mean, the distribution is positively skewed. This is also evident from the probability distribution in the Probability Distribution section: there, too, the distribution is skewed rather than normal, and the histogram reflects the same.

The second variable I chose was birth rates. In the histogram below, the distribution is not normal, as its shape shows. Further, the mean (green vertical bar) and median (red vertical bar) are quite far apart. Since the median is higher than the mean, the distribution is negatively skewed. The non-normal behaviour of the data might be attributed to the sharp rises and falls in birth rates during some periods between 1985 and 2015.
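The mean-versus-median comparison used for both histograms can be sketched as follows. This is a rule of thumb rather than a formal skewness statistic, and the sample lists are hypothetical, not drawn from the dataset.

```python
from statistics import mean, median


def skew_direction(values):
    """Rule-of-thumb skew direction from comparing mean and median."""
    centre_mean = mean(values)
    centre_median = median(values)
    if centre_mean > centre_median:
        return "positive"  # long right tail pulls the mean above the median
    if centre_mean < centre_median:
        return "negative"  # long left tail pulls the mean below the median
    return "none"


# Hypothetical samples for illustration only.
right_tailed = [1, 2, 2, 3, 3, 4, 20]
left_tailed = [1, 17, 18, 18, 19, 19, 20]

print(skew_direction(right_tailed))
print(skew_direction(left_tailed))
```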

Statistical test/Hypothesis testing

Test 1: I calculated a 95% confidence interval to estimate the birth rate, using StatCrunch.

The lower limit of the birth rate is 17.507304 and the upper limit is 19.915773. We are 95% confident that the mean birth rate lies between these two limits.

Test 2: I calculated a 95% confidence interval to estimate the fertility rate.

The lower limit of the fertility rate is 74.17 and the upper limit is 82.568747. We are 95% confident that the mean fertility rate lies between these two limits.
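The text does not state which method StatCrunch applied, so the following is only a sketch of one common way to build such an interval: the sample mean plus or minus a critical value times the standard error. It uses the normal approximation, which is an assumption on my part; a t-based interval would be slightly wider, though with 104 observations the difference is small. The sample values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(values, confidence=0.95):
    """Normal-approximation confidence interval for a population mean."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    centre = mean(values)
    standard_error = stdev(values) / sqrt(n)  # sample SD over sqrt(n)
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return centre - margin, centre + margin


# Hypothetical birth-rate sample for illustration only.
sample = [16.0, 18.0, 20.0, 22.0]
low, high = mean_confidence_interval(sample)
print(f"95% CI: {low:.3f} to {high:.3f}")
```

The interval is symmetric about the sample mean, so the midpoint of any reported pair of limits recovers the point estimate.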

Test 3: I hypothesize that the birth rate lies below 18.71 (the value found from the sample).

H₀: birth rate = 18.71; H₁: birth rate < 18.71

I will use a significance level (α) of 0.05.

The p-value is 0.501, which is considerably higher than the α-level of 0.05, so we fail to reject the null hypothesis. This means our claim that the birth rate is less than 18.71 is not supported.
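A p-value near 0.5 is what one would expect here, because the hypothesized value is essentially the sample mean itself, so the test statistic is close to zero. The sketch below illustrates this using only the limits reported in Test 1. It assumes those limits are the mean plus or minus a standard normal critical value times the standard error, which may not be exactly what StatCrunch did; the function name is illustrative.

```python
from statistics import NormalDist


def left_tail_p_value(sample_mean, standard_error, hypothesized_mean):
    """P-value for H1: mean < hypothesized_mean (normal approximation)."""
    z = (sample_mean - hypothesized_mean) / standard_error
    return NormalDist().cdf(z)


# 95% limits for the birth rate, as reported in Test 1.
lower, upper = 17.507304, 19.915773

# The midpoint of a symmetric interval is the sample mean.
sample_mean = (lower + upper) / 2

# ASSUMPTION: limits = mean +/- z * SE with z from the standard normal.
z_95 = NormalDist().inv_cdf(0.975)
standard_error = (upper - lower) / 2 / z_95

p_value = left_tail_p_value(sample_mean, standard_error, 18.71)
print(f"sample mean = {sample_mean:.4f}, p-value = {p_value:.3f}")
```

Testing a sample against its own mean will always give a p-value of about 0.5 in a one-sided test, so this result says little about the population beyond the fact that the sample cannot contradict its own average.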

Test 4: I hypothesize that the fertility rate lies below 78.37 (the value found from the sample).

H₀: fertility rate = 78.37; H₁: fertility rate < 78.37

I will use a significance level (α) of 0.05.

The p-value is 0.501, which is considerably higher than the α-level of 0.05, so we fail to reject the null hypothesis. This means our claim that the fertility rate is less than 78.37 is not supported.

Relevance for other nurses

The dataset I chose covers natality measures for females by Hispanic origin subgroup in the United States. Therefore, when a patient of Hispanic origin presents with natality-related issues, it is pertinent to consult and analyse this dataset. Note that the dataset comprises the following origins: Cuban, Puerto Rican, Mexican, and Central and South American.

Further, it should be noted that the samples for the different origins differ. Sample size plays an important role in such analysis: a larger sample increases the time involved in data collection and calculation, while a smaller sample may not truly reflect the population's characteristics. The total sample size in this analysis is 104. Though this seems a good number given the classification by Hispanic origin, a larger sample would produce better results.

The 95% confidence interval for the fertility rate runs from 74.1229 to 82.6171. This means we are 95% confident that the mean fertility rate of females in the Hispanic origin subgroups in the United States lies between 74.1229 and 82.6171.

Proposal to management

Natality measures play a key role in any demographic assessment. Further, it has been argued that natality measures differ among women of various subgroups. Since the Hispanic subgroup is emerging as one of the key subgroups in the United States, it is pertinent to analyse the natality measures for females of Hispanic origin there. The study would benefit not only healthcare units but may also extend to understanding the social fabric. It would help in checking for social manipulation by any group, and in assessing any ongoing policy aimed at controlling demographic dynamics.

As discussed earlier, the growth rates of different Hispanic groups contrast with one another in the periods before and after 2005-06. Two representatives of contrasting groups are shown here as examples:

Mexican:

Cuban:

But birth rates and fertility rates tell a different story:

Mexican:

Cuban:

I expect this pattern study to help with macro-level social, political, and medical demographic planning.

The methodology I propose for such a study is sample based, where the first aim shall be to determine the appropriate sample size. The characteristics of the data must be analysed using descriptive measures such as mean, median, standard deviation, and standard error. After that, hypotheses must be tested at the 95% confidence level. Finally, an empirical relationship shall be developed using correlation and scatterplots.
