Practical Data Management and Analysis for Public Health Assignment 8


Graded Assignment 8

For this week’s practice assignment you’ll be working with three simulated datasets. Please download the datasets DataAssign8A.sav, DataAssign8B.sav, and DataAssign8C.sav, and the corresponding codebooks CodebookDataAssign8A, CodebookDataAssign8B, and CodebookDataAssign8C.doc from the Practice Assignment 8 page of the Assignments section of this course.

Assignment

Part A. For this Part of the assignment our interest is in the heights of elementary school children, and our focus will be on the mean heights of children in the five grades: kindergarten through fourth.

(A.1) Using the dataset DataAssign8A, write and run SPSS commands to create dummy variables for grade: KinderDum, FirstDum, SecondDum, ThirdDum, and FourthDum. Each of these variables should take the value 1 for children in the corresponding grade, and 0 for children in other grades.
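
The logic of dummy coding can be sketched outside SPSS. The short Python example below is only an illustration, not the SPSS syntax the assignment asks for. The grade codes 0 through 4 and the function name make_dummies are assumptions made for the example; check the codebook for how grade is actually coded in DataAssign8A.

```python
# Hypothetical grade codes (0 = kindergarten ... 4 = fourth grade).
# The real coding is documented in the codebook and is not assumed to match.
DUMMY_FOR_CODE = {
    0: "KinderDum",
    1: "FirstDum",
    2: "SecondDum",
    3: "ThirdDum",
    4: "FourthDum",
}


def make_dummies(grade_code):
    """Return the five 0/1 indicators for one child's grade code."""
    return {name: int(grade_code == code) for code, name in DUMMY_FOR_CODE.items()}


# Toy grade codes for six made-up children.
grades = [0, 3, 4, 1, 2, 0]
dummies = [make_dummies(g) for g in grades]

for g, row in zip(grades, dummies):
    print(g, row)
```

Each child ends up with exactly one dummy equal to 1 and the other four equal to 0, which is the pattern your SPSS variables should show as well.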

(A.2) Set up and run a linear regression model with dummy variables to test the omnibus null hypothesis that the mean height is identical across the five grades, and to contrast the mean heights of children in grades 1, 2, 3, and 4 with the mean height of children in kindergarten.
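
To see why a dummy-variable regression yields these contrasts, here is a minimal Python sketch on invented toy heights (not the assignment data). It assumes kindergarten is made the reference group by leaving its dummy out of the model. The helper ols is a small illustrative solver written for this example, not a library routine.

```python
# Toy heights in cm by hypothetical grade code (0 = kindergarten); invented values.
heights_by_grade = {
    0: [108, 112, 110],
    1: [115, 117, 119],
    2: [122, 124, 120],
    3: [128, 130, 126],
    4: [133, 135, 137],
}


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Design matrix: intercept plus dummies for grades 1-4.
# The kindergarten dummy is omitted, so kindergarten is the reference group.
X, y = [], []
for grade, values in heights_by_grade.items():
    for h in values:
        X.append([1] + [int(grade == g) for g in (1, 2, 3, 4)])
        y.append(h)

coefs = ols(X, y)
group_means = {g: sum(v) / len(v) for g, v in heights_by_grade.items()}

print("intercept:", round(coefs[0], 6), "kindergarten mean:", group_means[0])
for g in (1, 2, 3, 4):
    print("grade", g, "coefficient:", round(coefs[g], 6),
          "mean difference:", group_means[g] - group_means[0])
```

In this setup the intercept equals the mean of the reference group, and each dummy coefficient equals the difference between that grade's mean and the reference mean.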

(A.3) Set up and run an analysis of variance (ANOVA) to test the omnibus null hypothesis that the mean height is identical across the five grades, and to contrast the mean heights across the five grades using Tukey’s Honestly Significant Difference (HSD) approach to control the overall Type I error rate.
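
The omnibus F statistic in a one-way ANOVA is the ratio of the between-group mean square to the within-group mean square, and it is the same statistic as the overall F from a regression on the grade dummies. The Python sketch below computes it by hand on invented toy heights (not the assignment data). It stops at the F statistic: the p-value and Tukey's HSD comparisons need reference distributions that are not computed here, and SPSS will report them for you.

```python
# Toy heights in cm by hypothetical grade code (0 = kindergarten); invented values.
heights_by_grade = {
    0: [108, 112, 110],
    1: [115, 117, 119],
    2: [122, 124, 120],
    3: [128, 130, 126],
    4: [133, 135, 137],
}

all_heights = [h for values in heights_by_grade.values() for h in values]
grand_mean = sum(all_heights) / len(all_heights)

# Between-group sum of squares: group size times squared distance
# of each group mean from the grand mean.
ss_between = sum(
    len(values) * (sum(values) / len(values) - grand_mean) ** 2
    for values in heights_by_grade.values()
)

# Within-group sum of squares: squared distance of each child
# from his or her own group mean.
ss_within = sum(
    (h - sum(values) / len(values)) ** 2
    for values in heights_by_grade.values()
    for h in values
)

df_between = len(heights_by_grade) - 1                 # groups minus 1
df_within = len(all_heights) - len(heights_by_grade)   # children minus groups

f_stat = (ss_between / df_between) / (ss_within / df_within)
print("F(%d, %d) = %.3f" % (df_between, df_within, f_stat))
```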

Part B. For this Part of the assignment our interest is in the effects of voluntary enrollment in a weight loss program on the amount of weight people lose, and we need to deal with a potential confounder: motivation to lose weight. People who enroll in the weight loss program may be more motivated to lose weight, on average, than people who choose not to enroll, so when we simply compare enrolled and unenrolled people in terms of the amount of weight they lose, we may also implicitly be comparing more motivated people to less motivated people. By measuring motivation and including it in our regression model, however, we seek to adjust for its confounding influence and obtain a more credible estimate of the effect of enrollment in the weight loss program.

(B.1) Using the dataset DataAssign8B, run a linear regression model in which enrollment in the weight loss program is the independent variable and amount of weight lost is the dependent variable.

(B.2) Run a second linear regression model in which, as before, amount of weight lost is the dependent variable, but there are two independent variables rather than one: enrollment in the weight loss program (as before) and motivation to lose weight (the new addition).
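
The contrast between the unadjusted and adjusted models can be illustrated with a small Python sketch. The data below are invented for the example and are not DataAssign8B: weight loss is constructed, with no noise, as 2 units per enrollment plus 1 unit per point of motivation, so the true enrollment effect is known to be 2. The helper centered_cross is a hypothetical name used only here.

```python
# Invented toy data: enrolled people also tend to be more motivated.
enrolled = [0, 0, 0, 0, 1, 1, 1, 1]
motivation = [1, 2, 3, 2, 4, 5, 6, 5]
# Built so that the true enrollment effect is 2 and the motivation effect is 1.
weight_loss = [2 * e + m for e, m in zip(enrolled, motivation)]


def centered_cross(a, b):
    """Sum of cross-products of a and b after subtracting their means."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (z - mean_b) for x, z in zip(a, b))


s_ee = centered_cross(enrolled, enrolled)
s_mm = centered_cross(motivation, motivation)
s_em = centered_cross(enrolled, motivation)
s_ey = centered_cross(enrolled, weight_loss)
s_my = centered_cross(motivation, weight_loss)

# Model 1: weight loss on enrollment only (unadjusted slope).
crude_enroll = s_ey / s_ee

# Model 2: weight loss on enrollment and motivation
# (closed-form least-squares solution for two predictors).
det = s_ee * s_mm - s_em ** 2
adj_enroll = (s_ey * s_mm - s_my * s_em) / det
adj_motivation = (s_my * s_ee - s_ey * s_em) / det

print("unadjusted enrollment coefficient:", crude_enroll)
print("adjusted enrollment coefficient:  ", adj_enroll)
print("motivation coefficient:           ", adj_motivation)
```

In this toy example the unadjusted coefficient overstates the enrollment effect because it absorbs the motivation difference between the groups. Adding motivation to the model recovers the value that was built into the data. Your results with the real dataset will of course differ.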

Part C. For this Part of the assignment our interest is in the effects of graduating from a four-year college on earnings at age 50, and we need to deal with three potential confounders: the socioeconomic status (SES) of participants’ families of origin; participants’ intelligence; and participants’ work ethic. Each of these variables may influence the likelihood that an individual attends and graduates from a four-year college, and each may have an independent impact on earnings. As such, when we compare college graduates to people who did not graduate from college in terms of their earnings years later, we may implicitly be comparing people from better-off families to people from worse-off families; people with more in-born intelligence to people with less in-born intelligence; and people with stronger work ethics to people with weaker work ethics. These variables may generate an association between college graduation and earnings that is not actually attributable to an effect of college graduation. Our strategy, then, will be to adjust for these potentially confounding variables in order to obtain a more credible estimate of the effect of college graduation on earnings.

(C.1) Using the dataset DataAssign8C, run a linear regression model in which graduation from a four-year college is the independent variable and earnings at age 50 is the dependent variable.

(C.2) Create dummy variables to represent the categories of high school attendance. Name these DumHSA1, DumHSA2, DumHSA3, and DumHSA4 to represent poor, fair, good, and excellent high school attendance records, respectively.

(C.3) Add to the regression model you ran in C.1 the following variables: family SES (as represented by father’s occupational prestige index), high school SAT score (a proxy for intelligence), and high school attendance pattern (a proxy for work ethic). Family SES and SAT score can be added to the model as they are; for high school attendance pattern, use three of the four dummy variables, with the omitted group serving as the reference.

Part D. Write a report that summarizes the results you obtained in these three parts. For Part A, what do the regression and ANOVA analyses tell you about differences between the grades (kindergarten through fourth) in terms of children’s average heights? For Part B, what do the two regression models suggest about the effect of enrollment in the weight loss program on amount of weight lost? To what extent is the association between enrollment and weight loss attributable to confounding by motivation? For Part C, what do the two regression models tell you about the effect of graduating from a four-year college on earnings at age 50? To what extent is the association between graduation and earnings attributable to confounding by family background, in-born intelligence, and work ethic?

Submission

In the Assignments section, please upload the following items to the Practice Assignment 8 page 24 hours before live session 8:

1. Your syntax file for carrying out the analyses requested in Parts A, B, and C above;

2. Your written report as described in Part D.