UCI Machine Learning Repository

In this assignment we consider data collected from the donor database of the Blood Transfusion Service Center in Hsin-Chu City, Taiwan. About every three months the center sends its blood transfusion service bus to a university in Hsin-Chu City to collect donated blood. The assignment uses data collected on a random sample of 748 donors. The data was obtained from the UCI Machine Learning Repository and was assembled by Prof. I-Cheng Yeh.

The file "transfusion.csv" contains the data. The file contains 5 variables:

  • recency = The number of months since the last donation. (numeric)
  • frequency = The total number of donations. (numeric)
  • monetary = Total blood donated (in c.c.). (numeric)
  • time = The number of months since the first donation. (numeric)
  • march2007 = An indicator. Indicates those that donated blood in March, 2007. (factor)

In this assignment we consider the variables frequency and monetary.

Descriptive Statistics

Save the data set on your computer and read it into R. Compute the mean, the median, the interquartile range, and the standard deviation of the variable frequency, and plot its histogram. In Tasks 1-3 you are asked to describe the distribution of this variable on the basis of the computations and the plot.
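The computations described above can be sketched in R as follows. This is a minimal sketch, assuming the file "transfusion.csv" sits in the working directory and has a column named frequency; if the file is not found, a short made-up vector is used instead so that the code still runs. The names freq and summary_stats are illustrative.

```r
# Assumption: "transfusion.csv" is in the working directory and contains
# a column called `frequency`. Otherwise fall back to made-up values.
if (file.exists("transfusion.csv")) {
  freq <- read.csv("transfusion.csv")$frequency
} else {
  freq <- c(1, 1, 2, 2, 3, 4, 4, 5, 7, 8, 11, 16, 24)  # hypothetical data
}

# Mean, median, interquartile range and standard deviation
summary_stats <- c(mean = mean(freq), median = median(freq),
                   IQR = IQR(freq), sd = sd(freq))
print(summary_stats)

# Histogram of the variable
hist(freq, main = "Histogram of frequency", xlab = "Number of donations")
```

Comparing the mean with the median and looking at the shape of the histogram is enough to address Task 1.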

Estimating Parameters

In Tasks 4-6 you are asked to estimate the expectation and standard deviation of the variable frequency. An estimator is used to estimate the expectation. This estimator has a standard deviation. You are required to estimate this standard error, which is the standard deviation of the estimator. You are required to describe which estimator was used for each estimation task.
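As an illustration of the three quantities involved, the sketch below computes the sample mean, the sample standard deviation, and the estimated standard error of the sample mean (the sample standard deviation divided by the square root of the sample size). The vector freq holds made-up values chosen only to keep the example self-contained; with the real data it would be the frequency column of the file.

```r
# Hypothetical data; with the real file one might use
# freq <- read.csv("transfusion.csv")$frequency
freq <- c(2, 4, 4, 4, 5, 5, 7, 9)

n <- length(freq)

mean_hat <- mean(freq)        # estimate of the expectation
sd_hat <- sd(freq)            # estimate of the standard deviation
se_hat <- sd_hat / sqrt(n)    # estimated standard deviation of the sample mean

c(mean = mean_hat, sd = sd_hat, se = se_hat)
```

The standard error shrinks as n grows, which is why it is much smaller than the standard deviation of the measurement itself in a sample of 748 donors.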

Estimating the MSE

Consider the variable monetary. We assume that the distribution of this variable is Exponential(λ) and are interested in the estimation of the parameter λ. The proposed estimator is 1/X, where X is the sample average. In Tasks 7-8 you are required to estimate the value of the parameter and estimate the mean square error (MSE) of the estimator.

You may apply a method called the bootstrap in order to estimate the MSE. The bootstrap method begins by estimating the parameter λ from the provided data. It then runs a simulation to compute the MSE, with λ set equal to that estimated value.

For the assignment you should complete the following 8 tasks. Tasks 1-3 refer to the descriptive statistics problem presented above, Tasks 4-6 refer to the problem of estimating parameters, and Tasks 7-8 refer to the task of estimating the parameter of an Exponential distribution and estimating the MSE of the estimator. Your answers should be short and clear. We recommend that you copy and paste the tasks below into the form titled "Submit your Assignment using this Form". You can then write your answers to the tasks in the designated positions that are marked in the text:

Tasks

Descriptive Statistics:

1. The distribution of the variable "frequency" is:

__ Skewed to the left, __ Symmetric, __ Skewed to the right.

Mark the most appropriate option and explain your selection

Since the mean is greater than the median, the distribution of the variable "frequency" is skewed to the right. The histogram indicates the same.

2. The number of outlier observations in the variable "frequency" is: 45.

Explain each step in the computation of the number of outlier observations

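With Q3 = 7 and IQR = 5, the first quartile is Q1 = 2. The upper fence is Q3 + 1.5 * IQR = 7 + 1.5*5 = 14.5 and the lower fence is Q1 - 1.5 * IQR = 2 - 1.5*5 = -5.5. No observation can fall below the lower fence, since a number of donations cannot be negative. There are 45 observations above the upper fence, so those 45 observations are treated as outliers.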
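The fence rule used in this answer can be sketched in R as follows. The vector freq contains made-up values for illustration only, and the names lower, upper and n_outliers are not from the assignment. R's default quantile method is assumed, which may differ slightly from hand-computed quartiles.

```r
# Hypothetical data; replace with the frequency column of the real file
freq <- c(1, 2, 2, 3, 4, 5, 6, 7, 8, 9, 40)

# Quartiles and interquartile range (default quantile type in R)
q1 <- unname(quantile(freq, 0.25))
q3 <- unname(quantile(freq, 0.75))
iqr <- q3 - q1

# Fences: observations outside [lower, upper] are flagged as outliers
lower <- q1 - 1.5 * iqr
upper <- q3 + 1.5 * iqr

n_outliers <- sum(freq < lower | freq > upper)
c(lower = lower, upper = upper, outliers = n_outliers)
```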

3. Which of the following theoretical models is most appropriate to describe the distribution of the variable "frequency"?

__ Binomial, __ Poisson, __ Uniform, __ Exponential, __ Normal.

Mark the most appropriate option and explain your selection

The theoretical model most appropriate to describe the distribution of the variable "frequency" is the Exponential distribution. In the histogram, the range 0-5 has by far the highest count, the next class 5-10 has almost half the count of the first class, and so on. This steady decay from a peak at the left edge is characteristic of the Exponential model. The sample mean (5.5147) and the sample standard deviation (5.8393) are also close to each other, as expected under an Exponential distribution.
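One rough numerical check of this choice, offered only as a sketch, compares moment relations using the summary values reported in Tasks 4 and 5. An Exponential distribution has a standard deviation equal to its mean, whereas a Poisson distribution has a variance equal to its mean. The variable names below are illustrative.

```r
# Summary values reported for "frequency" in Tasks 4 and 5
m <- 5.5147   # sample mean
s <- 5.8393   # sample standard deviation

# Exponential: sd / mean should be close to 1
ratio_exponential <- s / m

# Poisson: variance / mean should be close to 1
ratio_poisson <- s^2 / m

c(exponential = ratio_exponential, poisson = ratio_poisson)
```

A ratio near 1 in the first entry and far from 1 in the second is consistent with preferring the Exponential model over the Poisson model, although it is not a formal goodness-of-fit test.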

Estimating Parameters:

4. The estimated value of the expectation of the measurement "frequency" is: 5.5147.

Explain your answer

The sample mean is an estimate of the population mean. Therefore, the estimated value of the expectation of the measurement "frequency" is the sample mean of the variable "frequency", i.e. 5.5147.

5. The estimated value of the standard deviation of the measurement "frequency" is: 5.8393.

Explain your answer

The sample standard deviation is an estimate of the population standard deviation. Therefore, the estimated value of the standard deviation of the measurement "frequency" is the sample standard deviation of the variable "frequency", i.e. 5.8393.

6. The estimated value of the standard deviation of the estimator that produced the estimate in 4. is: 0.2135.

Explain your answer

The standard deviation of the estimator that produced the estimate in 4. (the standard error of the sample mean) is estimated by

Sample standard deviation / Sqrt(n) = 5.8393 / Sqrt(748) = 0.2135

Estimating the MSE:

7. The estimated value of λ for the variable "monetary" is: 0.000725.

Attach the R code for conducting the computation

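The R code for conducting the computation is:

# Read the data (assumes "transfusion.csv" is in the working directory)
d1 = read.csv("transfusion.csv")

# Estimating the parameter Lambda as 1 / (sample mean)
x_bar = mean(d1$monetary)
x_bar
Lambda_hat = 1/x_bar
Lambda_hat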

8. The estimated value of the MSE of the estimator of λ is: 0.000000248.

Attach the R code for conducting the computation

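The R code for conducting the computation is:

# Estimating the MSE by simulation, with the rate set to Lambda_hat
# Draw 5000 Exponential values and arrange them as 1000 simulated
# samples (rows) of 5 observations each (columns)
simdata = rexp(n = 5000, rate = Lambda_hat)
matrixdata = matrix(simdata, nrow = 1000, ncol = 5)

# The estimator 1/(sample mean), computed once per simulated sample
Lambda.exp = 1/apply(matrixdata, 1, mean)

# Bias: average deviation of the simulated estimates from Lambda_hat
bias <- mean(Lambda.exp - Lambda_hat)

# Variance of the simulated estimates, with divisor 1000 instead of 999
variance <- var(Lambda.exp) * (999/1000)

# MSE = bias^2 + variance
mse <- bias^2 + variance
rbind(bias, variance, mse)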
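The simulation above uses 1000 samples of 5 observations each. Because the MSE of the estimator depends on the sample size, a variant that matches the 748 donors in the data may be of interest. The following is a sketch of such a variant, not the code that produced the value reported in Task 8. The seed, the number of replications B, and the variable names are assumptions, and the rate is set to the estimate reported in Task 7 so that the block runs on its own.

```r
set.seed(1)               # arbitrary seed, for reproducibility only
lambda_hat <- 0.000725    # estimate of lambda reported in Task 7
n <- 748                  # number of donors in the data
B <- 1000                 # number of simulated samples (assumed)

# Each column is one simulated sample of n Exponential observations
sims <- matrix(rexp(n * B, rate = lambda_hat), nrow = n, ncol = B)

# The estimator 1/(sample mean), computed once per simulated sample
lambda_sim <- 1 / colMeans(sims)

# MSE: average squared deviation from the value used in the simulation
mse_hat <- mean((lambda_sim - lambda_hat)^2)
mse_hat
```

Averaging the squared deviations directly gives the same quantity as adding the squared bias to the variance computed with divisor B.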
