Blood Transfusion Service Center

Written Assignment 2

In this assignment we consider data collected from the donor database of the Blood Transfusion Service Center in Hsin-Chu City, Taiwan. About every three months the center sends its blood transfusion service bus to one university in Hsin-Chu City to collect donated blood. The current assignment involves data collected on a random sample of 748 donors. The data was obtained from the UCI Machine Learning Repository and was assembled by Prof. I-Cheng Yeh.

The file "transfusion.csv" contains the data. The file can be downloaded from MATH 1281 Data Files in the course page for MATH 1281. The file contains 5 variables:

  • recency = The number of months since the last donation. (numeric)
  • frequency = The total number of donations. (numeric)
  • monetary = Total blood donated (in c.c.). (numeric)
  • time = The number of months since the first donation. (numeric)
  • march2007 = An indicator. Indicates those that donated blood in March, 2007. (factor)

In this assignment we consider the variables frequency and monetary.

Descriptive Statistics

Save the data set on your computer and read it into R. Compute the mean, the median, the interquartile range, and the standard deviation of the variable frequency and plot its histogram. In Tasks 1-3 you are asked to describe the distribution of this variable on the basis of the computations and the plot.
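The computations requested above could be sketched in R as follows. This is only an illustration under stated assumptions: so that the block runs without the file, it uses a synthetic stand-in for the frequency column (geometric counts chosen arbitrarily), so its numbers are not the real results. With the real data, the commented read.csv line replaces the stand-in.

```r
# Stand-in data (ASSUMPTION): synthetic counts, NOT the real transfusion.csv values.
set.seed(1)
transfusion <- data.frame(frequency = rgeom(748, prob = 0.15) + 1)
# With the real file, use instead:
# transfusion <- read.csv("transfusion.csv")

freq <- transfusion$frequency

freq_mean   <- mean(freq)
freq_median <- median(freq)
freq_iqr    <- IQR(freq)   # third quartile minus first quartile
freq_sd     <- sd(freq)    # sample standard deviation (divides by n - 1)

# Histogram bins and counts, computed without drawing anything.
freq_hist <- hist(freq, plot = FALSE)

# A mean clearly above the median is one numeric sign of right skew.
print(c(mean = freq_mean, median = freq_median, IQR = freq_iqr, sd = freq_sd))
```

In an interactive session, plot(freq_hist) or simply hist(freq) draws the histogram itself.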

From the histogram, my assumption is that the data follow an Exponential distribution. There is a long tail to the right; therefore the dataset is right skewed.

Question: is there a way to prove that the dataset follows an exponential distribution?
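A sample cannot prove that data follow a given distribution, but the fit can be checked informally. The sketch below shows two such checks under stated assumptions: the sample x is a synthetic stand-in drawn from an Exponential distribution (not the real data), and the choice of checks is illustrative rather than taken from the course text. The first check compares sorted observations with Exponential quantiles (the numbers behind a Q-Q plot); the second uses the fact that an Exponential distribution has equal mean and standard deviation. Since frequency is a count, a continuous Exponential model can at best be an approximation for it.

```r
# Stand-in sample (ASSUMPTION): synthetic Exponential draws, NOT the real data.
set.seed(2)
x <- rexp(748, rate = 1 / 5.5)

n        <- length(x)
rate_hat <- 1 / mean(x)                       # fitted rate of the Exponential model

# Q-Q comparison: theoretical quantiles vs. sorted observations.
theo_q <- qexp(ppoints(n), rate = rate_hat)
emp_q  <- sort(x)
qq_cor <- cor(theo_q, emp_q)                  # near 1 when the points lie close to a line
# To draw the plot interactively: plot(theo_q, emp_q); abline(0, 1)

# For an Exponential distribution the standard deviation equals the mean.
sd_to_mean <- sd(x) / mean(x)                 # near 1 supports the model

print(c(qq_cor = qq_cor, sd_to_mean = sd_to_mean))
```

For the real frequency variable, the reported sample mean (5.5147) and sample standard deviation (5.8393) are close to each other, which is at least consistent with the Exponential assumption.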

Estimating Parameters

In Tasks 4-6 you are asked to estimate the expectation and standard deviation of the variable frequency. An estimator is used to estimate the expectation. This estimator has a standard deviation. You are required to estimate this standard error, which is the standard deviation of the estimator. You are required to describe which estimator was used for each estimation task.

The variance of the sample average is: 0.046019.

Here I am getting a much smaller variance than with the previous method. Which one should I choose? I assume the previous one, because I do not see why the Uniform distribution should be used here. The book does it this way, but I could not figure out why.

Estimating the MSE

Consider the variable monetary. We assume that the distribution of this variable is Exponential(λ) and are interested in the estimation of the parameter λ. The proposed estimator is 1/X, where X is the sample average. In Tasks 7-8 you are required to estimate the value of the parameter and estimate the mean square error (MSE) of the estimator.

Descriptive statistics of “monetary” (just for an impression):

You may apply a method called the bootstrap in order to estimate the MSE. The bootstrap method begins by estimating the parameter λ. It then proceeds with a simulation to compute the MSE, with λ set equal to the value estimated from the provided data.
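The bootstrap procedure described above could be sketched in R as follows. This is a minimal illustration, not the course's reference solution: the monetary vector is a synthetic stand-in drawn from an Exponential distribution (an assumption, so the block runs without transfusion.csv), and the number of replications (5000) is an arbitrary choice. The closed-form value is included only as a sanity check under the Exponential model; for large n it is roughly λ^2 / n.

```r
# Stand-in data (ASSUMPTION): synthetic Exponential draws, NOT the real "monetary" values.
set.seed(5)
monetary <- rexp(748, rate = 1 / 1400)

n <- length(monetary)

# Step 1: estimate lambda from the data with the proposed estimator 1 / (sample average).
lambda_hat <- 1 / mean(monetary)

# Step 2: simulate many samples of size n from Exponential(lambda_hat)
# and recompute the estimator on each simulated sample.
est <- replicate(5000, 1 / mean(rexp(n, rate = lambda_hat)))

# Step 3: the MSE is the average squared distance from the value used in the simulation.
mse_hat <- mean((est - lambda_hat)^2)

# Sanity check: exact MSE of 1 / (sample average) under the Exponential model.
mse_exact <- lambda_hat^2 * (n + 2) / ((n - 1) * (n - 2))

print(c(lambda_hat = lambda_hat, mse_hat = mse_hat, mse_exact = mse_exact))
```

Because the MSE comes from a simulation, it changes slightly from run to run unless the seed is fixed with set.seed.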

Submitting the Assignment

For the assignment you should complete the following 8 tasks. Tasks 1-3 refer to the descriptive statistics problem presented above, Tasks 4-6 refer to the problem of estimating parameters, and Tasks 7-8 refer to the task of estimating the parameter of an Exponential distribution and estimating the MSE of the estimator. Your answers should be short and clear. We recommend that you copy and paste the tasks below into the form titled "Submit your Assignment using this Form". You can then write your answers to the tasks in the designated positions that are marked in the text:

Tasks

Descriptive Statistics:

  1. The distribution of the variable "frequency" is:

__ Skewed to the left, __ Symmetric, _x_ Skewed to the right.

Mark the most appropriate option and explain your selection

  2. The number of outlier observations in the variable "frequency" is: 45.

Explain each step in the computation of the number of outlier observations
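The steps of an outlier count could be sketched in R as follows. The rule used here is an assumption, since the text does not state it: an observation is counted as an outlier when it lies more than 1.5 times the interquartile range below the first quartile or above the third quartile. The freq vector is a synthetic stand-in, so the count it produces is not the answer for the real data.

```r
# Stand-in data (ASSUMPTION): synthetic counts, NOT the real "frequency" values.
set.seed(3)
freq <- rgeom(748, prob = 0.15) + 1

# Step 1: first and third quartiles.
q1 <- quantile(freq, 0.25, names = FALSE)
q3 <- quantile(freq, 0.75, names = FALSE)

# Step 2: interquartile range.
iqr <- q3 - q1

# Step 3: fences at 1.5 * IQR beyond the quartiles (ASSUMED outlier rule).
lower <- q1 - 1.5 * iqr
upper <- q3 + 1.5 * iqr

# Step 4: count the observations outside the fences.
is_outlier <- freq < lower | freq > upper
n_outliers <- sum(is_outlier)

print(c(q1 = q1, q3 = q3, lower = lower, upper = upper, outliers = n_outliers))
```

For a right-skewed count variable the lower fence is often below the smallest possible value, so all outliers fall above the upper fence.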

  3. Which of the following theoretical models is most appropriate to describe the distribution of the variable "frequency"?

__ Binomial, __ Poisson, __ Uniform, _x_ Exponential, __ Normal.

Mark the most appropriate option and explain your selection

Estimating Parameters:

  4. The estimated value of the expectation of the measurement "frequency" is: 5.5147.

Explain your answer

5.5147 is the sample average. Because the sample is large (n = 748, well above 30) and the observations are independent, the sample average can be taken as a good estimate of the population expectation.

After running the following test, I find it almost certain that the population average lies between 5.0147 and 6.0147:

  5. The estimated value of the standard deviation of the measurement "frequency" is: 5.8393.

Explain your answer

5.8393 is the standard deviation of the sample. For the same reasons as in the previous question, it can be taken as a good estimate of the standard deviation of the population.

  6. The estimated value of the standard deviation of the estimator that produced the estimate in 4. is: 0.21373.

Explain your answer:

Simulating a sampling distribution of the mean from the given sample and finding the standard deviation, with the assumption that the mean of the sample is the mean of the population:
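One way such a simulation might look in R is sketched below, next to the plug-in formula (sample standard deviation divided by the square root of n). Both the resampling scheme (drawing with replacement from the observed sample) and the number of replications (2000) are assumptions made for illustration, and the freq vector is a synthetic stand-in for the real data.

```r
# Stand-in data (ASSUMPTION): synthetic counts, NOT the real "frequency" values.
set.seed(4)
freq <- rgeom(748, prob = 0.15) + 1

n <- length(freq)

# Plug-in formula for the standard error of the sample average.
se_formula <- sd(freq) / sqrt(n)

# Simulated sampling distribution of the mean:
# resample the observed data with replacement and average each resample.
sim_means <- replicate(2000, mean(sample(freq, size = n, replace = TRUE)))
se_sim    <- sd(sim_means)

print(c(se_formula = se_formula, se_sim = se_sim))
```

With the sample values reported above, the formula gives 5.8393 / sqrt(748), approximately 0.2135, which is close to the simulated 0.21373.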

Estimating the MSE:

  7. The estimated value of λ for the variable "monetary" is: 0.0007253.

Attach the R code for conducting the computation

  8. The estimated value of the MSE of the estimator of λ is: 2.797196e-07 (basically, 0). Note: because the value is found by simulation, every run of the code shows a different number, although all of them are very small. I attach the code and several runs with their outputs.

Attach the R code for conducting the computation

According to Yakir (2011, p. 170), to find the MSE one computes the estimator (of lambda, in our case), which has zero bias, and an alternative estimator that is slightly smaller ([19/20] of the estimator). The bias of the alternative estimator (alt_est) is the expectation of alt_est minus the expectation of the estimator. Then MSE = variance of alt_est + (bias of alt_est)^2.

I CANNOT MAKE ANY SENSE OF THIS
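The decomposition that the paragraph from Yakir relies on can be written out step by step. The symbols are notation introduced here for illustration: θ is the parameter, \hat\theta the original estimator, and \tilde\theta the alternative estimator. The derivation takes the zero-bias claim for the original estimator at face value, as an assumption.

```latex
\begin{align*}
\mathrm{MSE}(\hat\theta)
  &= \mathrm{E}\big[(\hat\theta - \theta)^2\big]
   = \mathrm{Var}(\hat\theta) + \big(\mathrm{E}[\hat\theta] - \theta\big)^2, \\
\tilde\theta
  &= \tfrac{19}{20}\,\hat\theta,
  \qquad \mathrm{E}[\hat\theta] = \theta \quad \text{(assumed: zero bias)}, \\
\mathrm{Bias}(\tilde\theta)
  &= \mathrm{E}[\tilde\theta] - \theta
   = \tfrac{19}{20}\,\theta - \theta
   = -\tfrac{1}{20}\,\theta, \\
\mathrm{Var}(\tilde\theta)
  &= \big(\tfrac{19}{20}\big)^2\,\mathrm{Var}(\hat\theta), \\
\mathrm{MSE}(\tilde\theta)
  &= \big(\tfrac{19}{20}\big)^2\,\mathrm{Var}(\hat\theta) + \tfrac{1}{400}\,\theta^2 .
\end{align*}
```

In words: shrinking the estimator by a factor of 19/20 lowers its variance but introduces a bias of one twentieth of the parameter, and the MSE adds the variance to the square of that bias.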
