
Statistics: Quantitative Research Methods PUBL0055

Part 1: Does rain affect turnout?

1. The interaction term in a linear regression model

In statistics, linear models are used to estimate the effect of explanatory variables on a response (outcome) variable. Hypothesis tests based on standardized test statistics (t-values) are used to assess whether each explanatory variable is statistically significant. The interaction between two variables can also be examined by multiplying their values and adding the product to the model as a new variable. Interactions are easiest to interpret when one variable is continuous and the other is categorical, as here. In this model, rain is measured on a continuous scale and competitive is a dummy variable, with one indicating a county in a competitive state. Multiplying rain by competitive therefore yields a variable equal to the amount of rain for counties in competitive states and zero for counties in non-competitive states. Adding this variable to the model tests whether the effect of rain on turnout depends on whether the county is in a competitive state. The hypotheses are as follows.

Null hypothesis: The effect of rain on turnout does not depend on whether the county is in a competitive state (the interaction coefficient is zero).

Alternative hypothesis: The effect of rain on turnout depends on whether the county is in a competitive state (the interaction coefficient is not zero).

The interaction term is statistically significant if its p-value is less than the chosen significance level (for example, 0.05), in which case the null hypothesis is rejected.
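A minimal sketch of the idea above on simulated data. The data frame `counties` and its columns `turnout`, `rain` and `competitive` are hypothetical stand-ins, not the assignment's dataset. Multiplying rain by the 0/1 competitive dummy by hand gives the same fit as R's `rain * competitive` formula shorthand, and the interaction row of `summary()` carries the p-value used to test the null hypothesis.

```r
# Hypothetical simulated data standing in for the county-level data
set.seed(123)
n <- 400
counties <- data.frame(
  rain = runif(n, 0, 3),              # inches of rain (hypothetical)
  competitive = rbinom(n, 1, 0.5)     # 1 = county in a competitive state
)
# In this simulation rain lowers turnout only in uncompetitive states
counties$turnout <- 55 - 4 * counties$rain + 4 * counties$competitive +
  4 * counties$rain * counties$competitive + rnorm(n, sd = 3)

# Interaction built by hand: rain for competitive counties, zero otherwise
counties$rain_x_comp <- counties$rain * counties$competitive
manual_fit <- lm(turnout ~ rain + competitive + rain_x_comp, data = counties)

# Same model with the formula shorthand rain * competitive
# (expands to rain + competitive + rain:competitive)
formula_fit <- lm(turnout ~ rain * competitive, data = counties)

# Estimate, standard error, t value and p-value of the interaction term
interaction_row <- summary(formula_fit)$coefficients["rain:competitive", ]
interaction_row
```

Both fits use the same design matrix, so their coefficients agree; the formula version simply names the interaction `rain:competitive`.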

2. The effects of rain on turnout based on the two models.

Model 1: The coefficient on rain was -1.602. This means that each additional inch of rain is associated with a decrease in county voter turnout of approximately 1.602 percentage points, holding the other variables in the model constant.

Model 2: In the second model, the coefficient on rain was -4.462. Because this model includes the interaction between rain and competitive, the rain coefficient now measures the effect of rain in counties from uncompetitive states (competitive = 0): in those counties, each additional inch of rain is associated with a decrease in turnout of approximately 4.462 percentage points. Compared with the pooled estimate of -1.602 in Model 1, adding the interaction term shows that the effect of rain in uncompetitive states is considerably larger. The effect of rain in counties from competitive states is the sum of the rain coefficient and the interaction coefficient.
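To see why the rain coefficient is a group-specific slope once the interaction is included, here is a sketch on hypothetical simulated data (all names are illustrative, and the simulated slopes are chosen arbitrarily). The slope for uncompetitive counties is the rain coefficient; the slope for competitive counties adds the interaction coefficient. `predict()` on rows that differ by one inch of rain gives the same numbers.

```r
set.seed(7)
n <- 500
counties <- data.frame(rain = runif(n, 0, 3), competitive = rbinom(n, 1, 0.5))
# Simulated slopes: -4.5 when competitive == 0, -4.5 + 4.8 = 0.3 when competitive == 1
counties$turnout <- 55 - 4.5 * counties$rain + 4.4 * counties$competitive +
  4.8 * counties$rain * counties$competitive + rnorm(n, sd = 2)

fit <- lm(turnout ~ rain * competitive, data = counties)
b <- coef(fit)

# Slope of rain in each group, built from the coefficients
slope_uncompetitive <- unname(b["rain"])
slope_competitive <- unname(b["rain"] + b["rain:competitive"])

# Cross-check with predict(): one extra inch of rain within each group
grid <- data.frame(rain = c(0, 1, 0, 1), competitive = c(0, 0, 1, 1))
p <- unname(predict(fit, newdata = grid))
c(uncompetitive = p[2] - p[1], competitive = p[4] - p[3])
```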

3. Model prediction based on model 2

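The model can be written as:

$$\widehat{\text{turnout}} = \beta_0 + \beta_1\,\text{rain} + \beta_2\,\text{competitive} + \beta_3\,(\text{rain} \times \text{competitive}) + \beta_4\,\text{unemployment} + \beta_5\,\text{hs\_completion}$$

where $\beta_1 = -4.462$ is the coefficient on rain reported above.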
  1. A county with 0 inches of rain in an uncompetitive state, with an unemployment rate of 6% and a high school completion rate of 80%.

    Predicted voter turnout: approximately 54.97%.

  2. A county with 3 inches of rain in an uncompetitive state, with an unemployment rate of 6% and a high school completion rate of 80%.

    Predicted voter turnout: approximately 41.58%.

  3. A county with no rain in a competitive state, with an unemployment rate of 6% and a high school completion rate of 80%.

    Predicted voter turnout: approximately 59.41%.

  4. A county with 3 inches of rain in a competitive state, with an unemployment rate of 6% and a high school completion rate of 80%.

    Predicted voter turnout: approximately 60.34%.
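The four predictions can be checked against the Model 2 rain coefficient. Because the scenarios differ only in rain and competitive, the rain slope within each type of state can be recovered from the reported values. This is a back-of-the-envelope check using only the numbers above, and rounding of the reported predictions makes the results approximate: they work out to roughly -4.46 per inch in uncompetitive states, +0.31 per inch in competitive states, and an implied interaction coefficient of about 4.77.

```r
# Predicted turnout (%) reported above for Model 2
# (unemployment = 6%, high school completion = 80% in every scenario)
pred <- c(unc_0 = 54.97, unc_3 = 41.58, comp_0 = 59.41, comp_3 = 60.34)

# Change in predicted turnout per inch of rain within each type of state
slope_uncompetitive <- (pred[["unc_3"]] - pred[["unc_0"]]) / 3
slope_competitive <- (pred[["comp_3"]] - pred[["comp_0"]]) / 3

# Implied interaction coefficient: the difference between the two slopes
implied_interaction <- slope_competitive - slope_uncompetitive

round(c(slope_uncompetitive, slope_competitive, implied_interaction), 2)
```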

4. Conclusion: is rain a determinant of voter turnout?

From the model output, I would conclude that rain is an important determinant of voter turnout, especially in counties within uncompetitive states, where 3 inches of rain lowers predicted turnout from 54.97% to 41.58%. In counties from competitive states, the predicted effect of rain is close to zero: predicted turnout changes from 59.41% to 60.34% over the same range of rainfall. Rain would therefore pose a threat mainly to campaigners in uncompetitive states.

Part 2: Inequality and child mortality

  1. Theoretically grounded models on child mortality
  2. First model: a model with inequality_gini as the only explanatory variable

    Second model: a model with inequality_gini together with other significant predictors of child mortality

    First model

    Figure 1: Scatterplot of inequality gini against child mortality

    The figure above shows a positive correlation between the inequality gini score and child mortality. A scatterplot reveals possible linear and non-linear relationships, which makes it a useful tool when building models. From this plot, we expect a positive coefficient in the linear model, because the general pattern of the relationship between the two variables is increasing, although not very strong.
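The expectation of a positive coefficient follows from a simple identity: in a regression with one predictor, the OLS slope equals the correlation multiplied by sd(y)/sd(x), so the slope and the correlation always share a sign. A quick check on simulated data (the variables `gini` and `mortality` are hypothetical):

```r
set.seed(11)
gini <- runif(200, 25, 65)
mortality <- -6 + 1.2 * gini + rnorm(200, sd = 25)   # weak positive relationship

r <- cor(gini, mortality)
slope_from_r <- r * sd(mortality) / sd(gini)          # slope implied by the correlation
slope_ols <- unname(coef(lm(mortality ~ gini))["gini"])

c(correlation = r, slope_from_r = slope_from_r, slope_ols = slope_ols)
```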

    Linear regression assumes that the relationship between the response variable and the explanatory variables is linear. Secondly, it assumes that the residuals follow a normal distribution with a mean of zero and a constant variance. Finally, the explanatory variables should not be strongly correlated with one another (no multicollinearity).
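The last assumption is commonly checked with the variance inflation factor, VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing predictor j on all the other predictors. Below is a base-R sketch on hypothetical simulated predictors; packages such as `car` provide a ready-made `vif()`, which is not used here.

```r
set.seed(3)
n <- 300
x1 <- rnorm(n)
x2 <- 0.9 * x1 + rnorm(n, sd = 0.3)   # strongly related to x1
x3 <- rnorm(n)                        # unrelated to the others
X <- data.frame(x1, x2, x3)

# VIF for each predictor: regress it on all the other predictors
vif_manual <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
  1 / (1 - r2)
})
round(vif_manual, 2)
```

Values far above 1 (x1 and x2 here) flag predictors that carry overlapping information.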

    The hypotheses

    Overall model significance hypotheses:

    Null hypothesis: The fitted model explains no more of the variation in child mortality than the null (intercept-only) model; that is, all slope coefficients are zero.

    Alternative hypothesis: The fitted model explains significantly more variation than the null model; that is, at least one slope coefficient is non-zero.

    Testing significance of the parameters:

    Null hypothesis: The coefficient of a given variable is equal to zero.

    Alternative hypothesis: The coefficient of a given variable is different from zero.
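Both sets of hypotheses can be tested directly in R. In this sketch on simulated data (`d`, `y` and `x` are hypothetical), `anova()` compares the intercept-only null model with the fitted model (the overall F-test), and the coefficient table from `summary()` gives a t-test of each coefficient against zero.

```r
set.seed(21)
d <- data.frame(x = runif(150, 20, 60))
d$y <- 5 + 0.8 * d$x + rnorm(150, sd = 10)

null_model <- lm(y ~ 1, data = d)     # intercept only
fitted_model <- lm(y ~ x, data = d)

# Overall significance: F-test of the fitted model against the null model
overall <- anova(null_model, fitted_model)
print(overall)

# Individual coefficients: t-test of H0: coefficient = 0
coefs <- summary(fitted_model)$coefficients
coefs[, c("Estimate", "Pr(>|t|)")]
```

With a single predictor, the two tests coincide: the F statistic equals the square of the slope's t statistic.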

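    Table 1: First model R code

    # Regress child mortality on the Gini inequality score alone
    first_model <- lm(child_mortality ~ inequality_gini, data = vdem)

    # Coefficient estimates, standard errors, p-values and R-squared
    summary(first_model)

    The first model has inequality_gini as its only explanatory variable.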

    Table 2: First model output

    Term               Estimate   Std. Error   p-value
    Constant           -6.62794      2.50486   0.00819
    inequality_gini     1.16374      0.05965   < 0.001

    inequality_gini is a significant predictor of child mortality, with a coefficient of 1.1637. This indicates that a one-unit increase in the inequality gini score is associated with an increase in child mortality of approximately 1.16 deaths per 1,000 live births per year. These results are as expected from the scatterplot of inequality gini scores against child mortality rates. Although this model is statistically significant at the 5% level, the inequality gini score explains little of the variability in child mortality. Further exploration is therefore needed to identify other variables that can improve the model.
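To make the slope concrete, the fitted line from Table 2 can be evaluated at two illustrative Gini scores (30 and 40 are arbitrary values chosen for the example, and `predict_mortality` is a hypothetical helper). The predicted difference is 10 × 1.16374, about 11.6 deaths per 1,000 live births.

```r
# Coefficients reported in Table 2
b0 <- -6.62794   # constant
b1 <- 1.16374    # inequality_gini

# Hypothetical helper: predicted child mortality for a given Gini score
predict_mortality <- function(gini) b0 + b1 * gini

pred_30 <- predict_mortality(30)
pred_40 <- predict_mortality(40)
c(gini_30 = pred_30, gini_40 = pred_40, difference = pred_40 - pred_30)
```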

    Figure 2: First model diagnostics

    The figure above shows the diagnostics of the first model. According to the first plot, the residuals do not have a mean of zero across the range of fitted values, and their variance is not constant. This violates the assumptions that the residuals have a mean of zero and constant variance (homoscedasticity). Furthermore, the quantile-quantile (Q-Q) plot shows that the residuals are not normally distributed, because the points do not lie along the straight line.
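A sketch of how these checks can be made numerically, on hypothetical simulated data with a curved relationship and growing spread (not the vdem data). With an intercept, OLS residuals always average to zero overall, so the zero-mean assumption is judged conditionally: residuals are averaged within bins of the fitted values, and a systematic pattern signals a violation. The Shapiro-Wilk test is one common numerical complement to the Q-Q plot.

```r
# Hypothetical simulated data: curved relationship with spread that grows with x,
# so a straight-line fit violates the assumptions
set.seed(42)
x <- runif(300, 20, 60)
y <- 0.05 * (x - 20)^2 + rnorm(300, sd = 0.1 * x)
fit <- lm(y ~ x)
res <- residuals(fit)

# Overall mean of the residuals is (numerically) zero by construction
overall_mean <- mean(res)

# Mean residual within five bins of the fitted values: positive at both ends
# and negative in the middle indicates a missed curve
bins <- cut(fitted(fit), breaks = 5)
binned_means <- tapply(res, bins, mean)
round(binned_means, 2)

# Shapiro-Wilk test of residual normality (a small p-value suggests non-normality)
sw <- shapiro.test(res)
sw$p.value

# plot(fit, which = 1:2) draws the residuals-vs-fitted and normal Q-Q plots
```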

    Second model

    Variable selection

    Figure 3: Barplot of correlation between the variables and child mortality

    Since we assume a linear relationship between the response and the explanatory variables, we use the strength of these relationships to select the variables to include in the second model. The plot above shows the correlations with child mortality for the numeric variables only. The plot below shows the relationship between region and child mortality.
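The correlation screening described above can be sketched as follows. The data frame `d` and its columns are hypothetical and simulated; the same pattern keeps the numeric columns, correlates each with the outcome, and ranks them by absolute correlation.

```r
set.seed(5)
n <- 250
d <- data.frame(
  healthcare = rnorm(n),
  education = rnorm(n),
  unrelated = rnorm(n),
  region = sample(c("A", "B", "C"), n, replace = TRUE)  # non-numeric, skipped
)
d$mortality <- 50 - 8 * d$healthcare - 12 * d$education + rnorm(n, sd = 5)

# Keep numeric predictors only, then correlate each with the outcome
numeric_cols <- setdiff(names(d)[sapply(d, is.numeric)], "mortality")
cors <- sapply(numeric_cols, function(v) cor(d[[v]], d$mortality))

# Rank by strength (absolute value) of the correlation
ranked <- cors[order(abs(cors), decreasing = TRUE)]
ranked
```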

    Figure 4: Child mortality per region

    The other six variables are selected based on relevance and strength of correlation. For instance, healthcare quality is a plausible predictor of child mortality within a country, and better healthcare is expected to be associated with lower child mortality rates. Education is another factor worth evaluating, because knowledge can lead to better decision making among parents. We therefore include mothers' ability to make meaningful decisions and the average years of education among citizens aged over 15. We also include the percentage of citizens living in urban areas and the number of radio and television stations per capita. The urban population share is selected because urban centres tend to be better informed and have more resources. The number of radio and television stations is selected because it may act as a confounder through information dissemination and public awareness.

    Table 3: Model 2 output

    Term                        Estimate   Std. Error   p-value
    Constant                      234.52         2.81   < 0.001
    inequality_gini                -0.17         0.03   < 0.001
    healthcare                     -1.46         0.23   < 0.001
    education15                    -4.27         0.17   < 0.001
    womens_civ_lib                 -4.41         1.23    0.0003
    radio_television_per_cap        9.79         2.28   < 0.001
    urban_population_pct            0.06         0.02   < 0.001
    life_expectancy                -2.36         0.04   < 0.001

    In the second model, all the explanatory variables are statistically significant at the 5% level. The coefficient on the inequality gini score changes from 1.16 in the first model to -0.17. This coefficient is the estimated association between the inequality gini score and child mortality after controlling for the other variables. The healthcare quality variable has a negative coefficient, as expected: better healthcare is associated with lower child mortality. Likewise, an increase in the average number of years of education for citizens over 15 is associated with lower child mortality. Similarly, greater women's ability to make meaningful decisions predicts lower child mortality rates. However, the expected effect of the number of radio and television stations was not observed: its coefficient is positive (9.79), so more media stations per capita are associated with higher child mortality. Similarly, we expected a larger share of the population living in urban centres to predict lower child mortality, but its coefficient is small and positive (0.06). Finally, higher life expectancy is associated with lower child mortality. Together, the predictor variables in this model explain around 86.17% of the variation in child mortality across countries.
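The sign change of the inequality_gini coefficient (1.16 in the first model, -0.17 in the second) is the kind of shift that can occur when a predictor is correlated with other variables that also affect the outcome. The simulated illustration below is not a re-analysis of the assignment data, and `x`, `z` and `y` are hypothetical: the bivariate slope on `x` is positive, but it turns negative once `z` is controlled for. The last line extracts R², the share of variation explained, which is the quantity behind a figure such as the 86.17% reported above.

```r
set.seed(1)
n <- 500
z <- rnorm(n)                      # hypothetical confounder
x <- 2 * z + rnorm(n)              # predictor correlated with the confounder
y <- 5 * z - 0.5 * x + rnorm(n)    # true effect of x, holding z fixed, is -0.5

bivariate <- lm(y ~ x)
controlled <- lm(y ~ x + z)

c(bivariate = unname(coef(bivariate)["x"]),
  controlled = unname(coef(controlled)["x"]))

# Share of the variation in y explained by the controlled model
r2 <- summary(controlled)$r.squared
r2
```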

    Model diagnostics

    Figure 5: Second model diagnostic plots

    According to the second model's diagnostic plots, the assumptions do not appear to be violated. Although the observations are not evenly spread across the range of fitted values, the residuals appear to have a mean of zero. The model can therefore be used for prediction.

  3. Fixed effects in the model: presence of unit or time fixed effects
  4. Unit and time fixed effects

    Table 4: A model with Internet access and civil war fixed effects

    Term                        Estimate   Std. Error   p-value
    Constant                      233.45         2.78   < 0.001
    inequality_gini                -0.16         0.03   < 0.001
    healthcare                     -1.55         0.23   < 0.001
    education15                    -4.24         0.17   < 0.001
    womens_civ_lib                 -1.5          1.23     0.223
    radio_television_per_cap        9.36         2.28   < 0.001
    urban_population_pct            0.07         0.02   < 0.001
    life_expectancy                -2.34         0.04   < 0.001
    factor(internet_access)1       -5.87         0.56   < 0.001
    factor(civil_war)1              2.51         1.10    0.0228

    According to the table above, the Internet access and civil war fixed effects are statistically significant. Adding these two fixed effects to the model led to one variable, womens_civ_lib (p = 0.223), losing significance. The Internet access coefficient means that countries with Internet access are predicted to have child mortality about 5.87 deaths per 1,000 live births lower than countries without it, holding the other variables constant. Similarly, a country experiencing civil war is predicted to have child mortality about 2.51 deaths per 1,000 live births higher than otherwise similar countries without civil war.
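A coefficient on a 0/1 indicator entered with `factor()` is the difference in predicted outcome between the two groups, holding the other predictors fixed. A simulated sketch (the names `d`, `internet`, `control` and `mortality` are hypothetical stand-ins for the variables in Table 4):

```r
set.seed(9)
n <- 400
d <- data.frame(
  control = rnorm(n),                 # hypothetical continuous control
  internet = rbinom(n, 1, 0.5)        # 1 = has Internet access
)
d$mortality <- 40 - 3 * d$control - 6 * d$internet + rnorm(n, sd = 4)

fit <- lm(mortality ~ control + factor(internet), data = d)
coef(fit)["factor(internet)1"]

# Same quantity via predict(): two rows identical except for internet
newd <- data.frame(control = c(0, 0), internet = c(0, 1))
gap <- diff(predict(fit, newdata = newd))
gap
```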

    We use year as the time fixed effect.

    Table 5: Model with year as the time fixed effect
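Year fixed effects add a dummy for each year (minus a reference year), absorbing shocks common to all countries in that year. A simulated panel sketch follows; the data frame `panel` and its columns `country`, `year`, `x` and `y` are hypothetical. Because the simulated year shocks are correlated with `x`, the pooled slope is biased, while adding `factor(year)` recovers the true slope of 2.

```r
set.seed(2024)
countries <- paste0("C", 1:50)
years <- 2000:2009
panel <- expand.grid(country = countries, year = years, stringsAsFactors = FALSE)

# Common shock for each year, shared by all countries
year_shock <- setNames(seq(-10, 10, length.out = length(years)), years)
panel$shock <- year_shock[as.character(panel$year)]

# The predictor trends with the year shock; the true effect of x on y is 2
panel$x <- 0.5 * panel$shock + rnorm(nrow(panel))
panel$y <- 2 * panel$x + 3 * panel$shock + rnorm(nrow(panel))

pooled <- lm(y ~ x, data = panel)                    # ignores year effects
year_fe <- lm(y ~ x + factor(year), data = panel)    # one dummy per non-reference year

c(pooled = unname(coef(pooled)["x"]), year_fe = unname(coef(year_fe)["x"]))
```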