The Donor Database of Blood Transfusion Service

In this assignment we consider data collected from the donor database of the Blood Transfusion Service Center in Hsin-Chu City, Taiwan. About every three months, the center sends its blood transfusion service bus to a university in Hsin-Chu City to collect donated blood. The current assignment involves data collected on a random sample of 748 donors. The data was obtained from the UCI Machine Learning Repository and was assembled by Prof. I-Cheng Yeh.

The file "transfusion.csv" contains the data. It can be downloaded from the MATH 1281 Data Files on the course page for MATH 1281. The file contains 5 variables:

  • recency= The number of months since the last donation. (numeric)
  • frequency= The total number of donations. (numeric)
  • monetary= Total blood donated (in c.c.). (numeric)
  • time= The number of months since the first donation. (numeric)
  • march2007= An indicator of whether the donor gave blood in March 2007. (factor)

In this assignment we consider the variables frequency and monetary.

Descriptive Statistics

Save the data set on your computer and read it into R. Compute the mean, the median, the interquartile range and the standard deviation of the variable frequency, and plot its histogram. In Tasks 1-3 you are asked to describe the distribution of this variable on the basis of these computations and the plot.
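A minimal sketch of these computations in R, assuming "transfusion.csv" sits in the working directory and has a column named frequency as listed above. The helper name describe_frequency is hypothetical; the part that reads the file and draws the histogram runs only if the file is found.

```r
# Hypothetical helper: the summary statistics requested for Tasks 1-3.
describe_frequency <- function(x) {
  c(mean   = mean(x),
    median = median(x),
    iqr    = IQR(x),  # Q3 - Q1, using R's default quantile definition (type 7)
    sd     = sd(x))   # sample standard deviation (denominator n - 1)
}

# Assumes the course file "transfusion.csv" is in the working directory.
if (file.exists("transfusion.csv")) {
  transfusion <- read.csv("transfusion.csv")
  print(describe_frequency(transfusion$frequency))
  hist(transfusion$frequency,
       main = "Histogram of frequency",
       xlab = "Total number of donations")
}
```

A mean noticeably larger than the median, together with a long tail on the right of the histogram, is the usual sign of a right-skewed distribution.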

Estimating Parameters

In Tasks 4-6 you are asked to estimate the expectation and the standard deviation of the variable frequency. The expectation is estimated with an estimator, and this estimator has its own standard deviation, called its standard error. You are also required to estimate this standard error and to describe which estimator was used for each estimation task.

Estimating the MSE

Consider the variable monetary. We assume that the distribution of this variable is Exponential(λ) and are interested in estimating the parameter λ. The proposed estimator is 1/X̄, where X̄ is the sample average. In Tasks 7-8 you are required to estimate the value of the parameter and to estimate the mean square error (MSE) of the estimator.

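You may apply a method called the bootstrap in order to estimate the MSE. The bootstrap begins by estimating the parameter λ from the data. It then uses a simulation to compute the MSE, treating the estimated value as if it were the true λ: samples of the same size are drawn from the Exponential distribution with this rate, the estimator is applied to each simulated sample, and the squared deviations of these estimates from the estimated λ are averaged.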
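The bootstrap steps above can be sketched in R as follows. This is a minimal parametric-bootstrap sketch, assuming the observations are independent draws from an Exponential distribution; the function name bootstrap_mse_exp, the default number of replications B = 10^5 and the seed are illustrative choices, not part of the assignment.

```r
# Hypothetical helper: parametric bootstrap estimate of the MSE of 1/mean(x)
# as an estimator of the rate lambda of an Exponential distribution.
bootstrap_mse_exp <- function(x, B = 10^5) {
  n <- length(x)
  lambda_hat <- 1 / mean(x)  # step 1: estimate lambda from the data
  # step 2: simulate B samples of size n with rate lambda_hat and
  # apply the same estimator, 1 / sample average, to each of them
  lambda_sim <- replicate(B, 1 / mean(rexp(n, rate = lambda_hat)))
  # step 3: average squared distance from the value used in the simulation
  mse <- mean((lambda_sim - lambda_hat)^2)
  list(lambda_hat = lambda_hat, mse = mse)
}

set.seed(1)  # arbitrary seed, only to make the simulation reproducible
if (file.exists("transfusion.csv")) {
  transfusion <- read.csv("transfusion.csv")
  print(bootstrap_mse_exp(transfusion$monetary))
}
```

As a check on the simulation: under the Exponential model the MSE of 1/X̄ has the closed form λ²(n + 2)/((n − 1)(n − 2)). With n = 748 and λ replaced by the estimate 0.0007264183, this gives about 7.1e-10, in line with the value 7.097009e-10 given in the grading notes for Task 8.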

Submitting the Assignment

For the assignment you should complete the following 8 tasks. Tasks 1-3 refer to the descriptive statistics problem presented above, Tasks 4-6 refer to the problem of estimating parameters, and Tasks 7-8 refer to estimating the parameter of an Exponential distribution and the MSE of its estimator. Your answers should be short and clear. We recommend that you copy and paste the tasks below into the form titled "Submit your Assignment using this Form". You can then write your answers to the tasks in the designated positions marked in the text:

Tasks

Descriptive Statistics:

1. The distribution of the variable "frequency" is:

__ Skewed to the left, __ Symmetric, __ Skewed to the right.

Mark the most appropriate option and explain your selection

2. The number of outlier observations in the variable "frequency" is: _____.

Explain each step in the computation of the number of outlier observations

3. Which of the following theoretical models is most appropriate to describe the distribution of the variable "frequency"?

__ Binomial, __ Poisson, __ Uniform, __ Exponential, __ Normal. 

Mark the most appropriate option and explain your selection

Estimating Parameters:

4. The estimated value of the expectation of the measurement "frequency" is:_____.

Explain your answer

5. The estimated value of the standard deviation of the measurement "frequency" is:_____.

Explain your answer

6. The estimated value of the standard deviation of the estimator that produced the estimate in 4. is:_____.

Explain your answer

Estimating the MSE:

7. The estimated value of λ for the variable "monetary" is:____.

Attach the R code for conducting the computation

8. The estimated value of the MSE of the estimator of λ is:____.

Attach the R code for conducting the computation

(Element 1, Yes/No scale)

1. The distribution of the variable frequency is: [Skewed to the right]. (Correct answer = "Yes", incorrect answer = "No")

Explanation: Run the code:

Most of the values lie between 0 and 10, but there is a long tail of larger values on the right. The mean is larger than the median, as is typical of a right-skewed distribution.

(Provide feedback if the explanation is not adequate. Do not change the score if the match is correct but you think that the explanation is not adequate.)

(Element 2, Yes/No scale)

2. The number of outlier observations in the variable frequency is: [45]. (Correct answer = "Yes", incorrect answer = "No")

Explanation: Run the code:

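The outliers are in the upper tail, and there are too many to count by hand. With Q1 = 2 and Q3 = 7, the lower threshold Q1 - 1.5(Q3-Q1) = 2 - 7.5 = -5.5 is negative, so no observation of frequency can fall below it. The upper threshold is Q3 + 1.5(Q3-Q1) = 7 + 1.5*(7-2) = 14.5. The expression "transfusion$frequency > 7 + 1.5*(7-2)" flags the observations above this threshold, and the function "sum" counts them: sum(transfusion$frequency > 7 + 1.5*(7-2)). (Alternatively, one may replace the function "sum" by the function "table".)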
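The same rule can be written as a small R function that computes the quartiles rather than hard-coding 2 and 7. A sketch, assuming R's default quartile definition (the one used by summary() and IQR()) is the one that produced Q1 = 2 and Q3 = 7 above; the function name count_outliers is hypothetical.

```r
# Hypothetical helper: count observations below Q1 - 1.5*IQR and above Q3 + 1.5*IQR.
count_outliers <- function(x) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)  # Q1 and Q3 (default type 7)
  iqr <- q[2] - q[1]
  lower <- q[1] - 1.5 * iqr
  upper <- q[2] + 1.5 * iqr
  c(below = sum(x < lower), above = sum(x > upper))
}

if (file.exists("transfusion.csv")) {
  transfusion <- read.csv("transfusion.csv")
  print(count_outliers(transfusion$frequency))
}
```

Reporting the two tails separately makes it explicit that, for frequency, all outliers come from the upper tail.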

(Provide feedback if the explanation is not adequate. Do not change the score if the match is correct but you think that the explanation is not adequate.)

(Element 3, Yes/No scale)

3. The theoretical model most appropriate to describe the distribution of the variable frequency is: [Poisson]. (Correct answer = "Yes", incorrect answer = "No")

Explanation: Frequency is a count: it takes non-negative integer values with no fixed upper limit. This rules out the continuous models (Uniform, Exponential, Normal) as well as the Binomial, which requires a fixed number of trials. The Poisson distribution is therefore the best fit.

(Provide feedback if the explanation is not adequate. Do not change the score if the match is correct but you think that the explanation is not adequate.)


(Element 4, Yes/No scale)
4. The estimated value of the expectation of the measurement frequency is: [5.514706]. (This, or a rounded version of this number = "Yes", any other number = "No")
Explanation: The sample mean is the recommended estimator of the expectation. It can be computed using the code mean(transfusion$frequency).
(Provide feedback if the explanation is not adequate. Do not change the score if the match is correct but you think that the explanation is not adequate.)


(Element 5, Yes/No scale)
5. The estimated value of the standard deviation of the measurement frequency is: [5.839307 or 5.514706]. (These, or rounded versions of these numbers = "Yes", any other number = "No")
Explanation: The sample standard deviation is the recommended estimator of the standard deviation of the measurement. It can be computed using the code sd(transfusion$frequency), which produces the first of the two numbers. The second number, the sample average, is also acceptable, since for the Exponential distribution the standard deviation is equal to the expectation.
(Provide feedback if the explanation is not adequate. Do not change the score if the match is correct but you think that the explanation is not adequate.)


(Element 6, Yes/No scale)
6. The estimated value of the standard deviation of the estimator that produced the estimate in 4. is: [0.2135062 or 0.2016376]. (These, or rounded versions of these numbers = "Yes", any other number = "No")
Explanation: The standard deviation of the sample average is equal to the standard deviation of the measurement divided by the square root of the sample size. The standard deviation of the measurement was estimated in 5, and the sample size is 748. The required number can be computed using the code sd(transfusion$frequency)/sqrt(748) or mean(transfusion$frequency)/sqrt(748), depending on the preferred method for estimating the standard deviation in this setting.
(Provide feedback if the explanation is not adequate. Do not change the score if the match is correct but you think that the explanation is not adequate.)
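The three estimates for Tasks 4-6 can be computed together. A minimal sketch, assuming the same "transfusion.csv" file and frequency column; the helper name estimate_frequency is hypothetical. It returns both versions of the standard-error estimate accepted above.

```r
# Hypothetical helper: point estimates for Tasks 4-6.
estimate_frequency <- function(x) {
  n <- length(x)
  xbar <- mean(x)  # Task 4: the sample mean estimates the expectation
  s <- sd(x)       # Task 5: the sample standard deviation
  c(mean    = xbar,
    sd      = s,
    se_sd   = s / sqrt(n),     # Task 6: SD of the sample mean, plugging in sd(x)
    se_mean = xbar / sqrt(n))  # Task 6 alternative: plugs in xbar (Exponential model)
}

if (file.exists("transfusion.csv")) {
  transfusion <- read.csv("transfusion.csv")
  print(estimate_frequency(transfusion$frequency))
}
```

Using length(x) instead of the literal 748 keeps the computation correct if the data file ever contains a different number of rows.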


(Element 7, Yes/No scale)

7. The estimated value of λ is: [0.0007264183]. (This, or a rounded version of this number = "Yes", any other number = "No")
Explanation: The proposed estimator of the parameter λ in the Exponential distribution is one over the sample average. The estimated value can be computed using the code: 1/mean(transfusion$monetary).
(Provide feedback if the attached R code is not producing the right result or if no code is attached. Do not change the score if the match is correct but the code is not working properly.)

(Element 8, Yes/No scale)
8. The estimated value of the MSE of the estimator of λ is: [7.097009e-10]. (If the code that the student attached works and produces a similar result = "Yes", otherwise = "No")
Explanation: We apply the Bootstrap for the estimation of the MSE. For example, one may use the following code:

 
