
Apply understanding of the various statistical tests

Data Analysis Assignment

Name

Course Number

Date

Faculty Name


Question 1 – comparing 3 cohorts of students in terms of their self-reported age, gender and programme of study.

  1. Appropriate statistical methods for comparing age, gender and programme of study

For any statistical test, it is important to understand the data types of the variables.

Age – discrete variable

Gender – categorical variable with 2 levels

Programme of study – categorical variable with more than two levels

Age and Programme of study

We can use stratified descriptive statistics to understand the distribution of age across the different programmes of study. The mean or the median is used as the measure of central tendency, depending on the shape of the age distribution. If age is approximately normally distributed, a one-way ANOVA is appropriate to test whether the mean age differs significantly across the categories of the programme of study. If the distribution of age is skewed (not normally distributed), the non-parametric Kruskal-Wallis test is used instead. If the overall test is statistically significant, post hoc pairwise comparisons with a Bonferroni correction are used to identify which specific groups differ.
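
As an illustration of what the one-way ANOVA computes, the F statistic can be sketched by hand using only the Python standard library. The function name, programme labels, and ages below are hypothetical and are not the assignment data; in practice a statistical package would also report the p-value from the F distribution.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]

    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of observations around their own mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical ages for three programmes (illustrative only).
ages_by_programme = {
    "Programme A": [24, 27, 31, 29, 26],
    "Programme B": [30, 33, 28, 35, 31],
    "Programme C": [26, 25, 29, 32, 28],
}
f_stat, df_b, df_w = one_way_anova_f(list(ages_by_programme.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.3f}")
```

A large F means the group means are far apart relative to the variation within groups, which is the evidence against the null hypothesis of equal means.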

Gender and programme of study

Both of these variables are categorical, so a contingency table is used to describe the distribution of gender across the different programme groups. A chi-square test of independence is then used to test whether gender and the selected programme of study are associated.
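
The chi-square test of independence compares the observed counts in the contingency table with the counts expected if the two variables were unrelated. A minimal sketch of that calculation follows, assuming a made-up table of counts (rows are gender, columns are programmes); the function name and the numbers are hypothetical.

```python
def chi_square_independence(table):
    """Return (chi2, dof) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


# Hypothetical counts: rows = female, male; columns = three programmes.
counts = [
    [60, 25, 15],
    [30, 15, 10],
]
chi2, dof = chi_square_independence(counts)
print(f"chi-square = {chi2:.3f} with {dof} degrees of freedom")
```

The statistic is compared with a chi-square distribution on the reported degrees of freedom; a statistical package would normally supply the corresponding p-value.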

  • Hypothesis and statistical test mentioned above
  • Null hypothesis: There is no statistically significant difference in mean age across the study programme groups.

    Alternative hypothesis: At least one study programme group has a mean age that is statistically significantly different from the others.

    Age and programme of study

    statistics-assignment-solution-10-img1

    The histogram of age in years above shows only a slight skewness in the age distribution, which has a mean of 29.3 years and a standard deviation of 5.8 years. Because the departure from normality is mild, we use the one-way ANOVA to test for a difference in means.

    statistics-assignment-solution-10-img2

    The number of students varies considerably across the different programmes, with most of them studying psychology.

    statistics-assignment-solution-10-img3

    The p-value of the one-way ANOVA test is greater than the 5% significance level, so we fail to reject the null hypothesis. We conclude that there are no statistically significant differences in mean age across the different study programmes.

    Gender and programme of study

    We used the chi-square test of independence to test the association between gender and programme of study.

    Null hypothesis: Selection of the programme of study does not depend on gender

    Alternative hypothesis: Selection of the programme of study depends on gender

    statistics-assignment-solution-10-img4

    In all the study programmes, female students (64.72%) outnumber male students (35.28%).

    statistics-assignment-solution-10-img5 statistics-assignment-solution-10-img6

    The chi-square test is not statistically significant, which indicates that study programme selection does not depend on gender.

    Question 2 – Level of flourishing and level of perceived stress

    1. Hypothesis
    2. Null hypothesis: There is no significant association between the level of flourishing and perceived stress

      Alternative hypothesis: There is a significant association between the level of flourishing and perceived stress

      1. Pearson's correlation
      2. statistics-assignment-solution-10-img7

        There is a statistically significant (p-value < 0.001) and very strong positive correlation (r = 0.98) between the composite flourishing score and the perceived stress score.

      3. Linear regression
      4. statistics-assignment-solution-10-img8

        The linear model predicting perceived stress level from the flourishing score is statistically significant at the 5% significance level. The predictor explains approximately 95% of the variation in the perceived stress score.
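
For a simple linear regression with one predictor, the proportion of variation explained (R squared) equals the square of Pearson's correlation, which is why a correlation of 0.98 goes with roughly 95% of variation explained (0.98 squared is about 0.96). The sketch below fits such a model by least squares with the standard library only; the function names and scores are hypothetical and are not the assignment data.

```python
from statistics import mean


def fit_simple_regression(xs, ys):
    """Return (slope, intercept, r_squared) for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # With a single predictor, R squared is the squared Pearson correlation.
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


def predict(slope, intercept, x):
    """Predicted response for a given predictor value."""
    return intercept + slope * x


# Hypothetical flourishing and perceived stress scores (illustrative only).
flourishing = [30, 35, 40, 45, 50]
stress = [24, 22, 19, 17, 13]

slope, intercept, r_squared = fit_simple_regression(flourishing, stress)
print(f"stress = {intercept:.2f} + {slope:.2f} * flourishing, R^2 = {r_squared:.3f}")
print(f"predicted stress at flourishing = 35: {predict(slope, intercept, 35):.2f}")
```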

    3. Any problem with the analysis above
    4. First, I investigate the distribution of the response variable to check if it meets the normality assumption.

      statistics-assignment-solution-10-img9

      The graph above shows that the response variable is highly skewed, with extreme values of around -100. We do not expect such values, because the variable is measured on an interval scale and -100 is out of bounds. After investigating the data, I found that -99 had been used as the code for missing values. Defining -99 as a missing value solves the problem, as shown in the histogram below.

      statistics-assignment-solution-10-img10

      The histogram above shows that the response variable is now approximately normally distributed, so it can be used in the linear regression model.
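
The effect of an undeclared missing-value code can be sketched as follows. The example assumes, purely for illustration, that -99 appears in both variables for the same respondents; the scores and function names are hypothetical. Leaving the code in place creates a few extreme points that dominate the correlation, while dropping those pairs reveals the relationship among the valid observations.

```python
from math import sqrt
from statistics import mean

MISSING_CODE = -99  # sentinel used for missing values in the data file


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def drop_missing_pairs(xs, ys, missing_code=MISSING_CODE):
    """Keep only the pairs in which neither value is the missing code."""
    pairs = [(x, y) for x, y in zip(xs, ys)
             if x != missing_code and y != missing_code]
    return [x for x, _ in pairs], [y for _, y in pairs]


# Hypothetical scores; two respondents have -99 recorded on both scales.
flourishing = [40, 45, -99, 50, 42, -99]
stress = [20, 15, -99, 18, 22, -99]

raw_r = pearson_r(flourishing, stress)
clean_x, clean_y = drop_missing_pairs(flourishing, stress)
clean_r = pearson_r(clean_x, clean_y)
print(f"r with -99 treated as data: {raw_r:.3f}")
print(f"r with -99 treated as missing: {clean_r:.3f}")
```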

    5. Redoing the analysis in part A
    6. After redoing the correlation test, the results change substantially, as shown below.

      statistics-assignment-solution-10-img11

      The new correlation test, run after defining -99 as a missing value, indicates that there is no significant correlation between the flourishing score and the perceived stress score.

      The linear regression results also change substantially. As shown below, the flourishing score is not a statistically significant predictor of the perceived stress score.

      statistics-assignment-solution-10-img12

      Predicted perceived stress score for a flourishing score of 35:

      statistics-assignment-solution-10-img13

    Question 3

    1. Testing the functionality of random assignment
    2. In this case, we have two variables: a categorical variable (the frame, Gain or Loss) and a discrete variable (fear of statistics). We can use descriptive statistics to understand the distribution of each variable. I propose a one-way ANOVA to test whether the mean fear of statistics differs significantly between the two categories (Gain and Loss). This is a classical parametric test, which assumes that the response variable, in this case fear of statistics, is normally distributed. Therefore, we use a histogram to check the distribution before applying the one-way ANOVA. If the normality assumption is violated, we use the non-parametric alternative, the Kruskal-Wallis test.

      statistics-assignment-solution-10-img14

      The histogram above shows that fear of statistics is approximately normally distributed. Therefore, we use the parametric test, the one-way ANOVA, to test for the equality of means.
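
With only two groups, a one-way ANOVA is equivalent to an independent-samples t-test with pooled variance: the F statistic equals the square of the t statistic. The sketch below checks this on made-up scores; the function names and data are hypothetical and are not the assignment data.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(a, b):
    """Independent-samples t statistic assuming equal variances."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))


def two_group_f_statistic(a, b):
    """One-way ANOVA F statistic for exactly two groups (given as lists)."""
    grand_mean = mean(a + b)
    mean_a, mean_b = mean(a), mean(b)
    ss_between = (len(a) * (mean_a - grand_mean) ** 2
                  + len(b) * (mean_b - grand_mean) ** 2)
    ss_within = (sum((x - mean_a) ** 2 for x in a)
                 + sum((x - mean_b) ** 2 for x in b))
    # Between-group df is 1 for two groups, so MS_between equals ss_between.
    return ss_between / (ss_within / (len(a) + len(b) - 2))


# Hypothetical fear-of-statistics scores per frame (illustrative only).
gain = [12, 15, 14, 16, 13]
loss = [10, 11, 13, 9, 12]

t_stat = pooled_t_statistic(gain, loss)
f_stat = two_group_f_statistic(gain, loss)
print(f"t = {t_stat:.3f}, t^2 = {t_stat ** 2:.3f}, F = {f_stat:.3f}")
```

Because the two tests give the same p-value in this setting, either can be reported; the sign of t additionally shows which group has the higher mean.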

    3. Hypothesis
    4. Null hypothesis: There are no differences in the average fear of statistics between the loss and gain groups

      Alternative hypothesis: There is a statistically significant difference in the average fear of statistics between the loss and gain groups

      statistics-assignment-solution-10-img15
    5. Examining if there is framing effect
    6. Because the p-value of the test is less than the significance level, we conclude that the means of the two groups are statistically different, which indicates a framing effect. Those in the gain frame had a higher fear of statistics score than those in the loss frame.

      statistics-assignment-solution-10-img16