BA02 Advanced Analytics Using R

WARNING - Clicking on the "SUBMIT ASSIGNMENT" button will submit the assignment. Be sure that you have reviewed your answers before clicking it. Attempt all the questions. All questions are compulsory. Each question carries 4 marks. There is no negative marking for wrong answers.

Please note: There are 25 questions out of which Q.No.21-25 are based on the Case Study.

Subject Code: BA02

Subject Name: ADVANCED ANALYTICS USING R

Component name: TERM END

Question 1:- A Type I error occurs when we:

a)         reject a false null hypothesis                       

b)         reject a true null hypothesis                       

c)         do not reject a false null hypothesis                       

d)         fail to make a decision regarding whether to reject a hypothesis or not                       

Question 2:- Parametric tests make assumptions about:

a)         The population size                       

b)         The sample size                       

c)         The underlying distribution                       

d)         The standard deviation of the distribution                       

Question 3:- In regression analysis, which of the following statements is true?

a)         Residuals are checked, but there is no rule for residuals.

b)         The sum of residuals should always be less than zero.

c)         The plot of residuals need not be normally distributed.

d)         The plot of residuals is normally distributed.

Question 4:- If you have only one independent variable, how many coefficients will you require to estimate in a simple linear regression model?

a)         one                       

b)         two                       

c)         three                       

d)         Not enough information is given to comment.

Question 5:- Homogeneity of Variance (HOV) implies equal variance for all comparison groups.

a)   TRUE

b)   FALSE

Question 6:- If we need to model the Churn of customers, where the outcome variable will have values Yes or No, i.e. the customer churns or not, then what model can we use?

a)         Linear Regression

b)         Logistic Regression

c)         Chi-square

d)         None of the above

Question 7:- The relationship between volume of beer consumed (x) and blood alcohol content (y) was studied in 16 male college students by using least squares regression. The following regression equation was obtained from this study: y = -0.0127 + 0.0180x. The above equation implies that:

a)         each unit volume of beer consumed increases blood alcohol by 1.27%

b)         on average it takes 1.8 volume of beers to increase blood alcohol content by 1%

c)         each unit of beer consumed increases blood alcohol by an average of 1.8%

d)         each unit of beer consumed increases blood alcohol by exactly 0.018
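The fitted line in Question 7 can be evaluated directly. The following is a minimal sketch in base R, assuming a few hypothetical volumes of beer (the `beers` values are illustrative and are not taken from the study data):

```r
# Fitted line from Question 7: y = -0.0127 + 0.0180 * x
intercept <- -0.0127
slope <- 0.0180

# Hypothetical volumes of beer consumed (assumed for illustration only)
beers <- c(1, 2, 5)

# Predicted blood alcohol content for each hypothetical volume
predicted_bac <- intercept + slope * beers
print(predicted_bac)

# Change in the predicted value when x increases by one unit
one_unit_change <- diff(intercept + slope * c(4, 5))
print(one_unit_change)
```

These are predictions from the fitted equation only; individual observations would scatter around the line.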

Question 8:- A good visual tool to identify outliers is:

a)         Pie chart

b)         Boxplot

c)         Line chart

d)         Treemap

Question 9:- You are interested in finding the association between diet supplement and the bloating caused by it among 150 elderly patients. There were three types of diet supplements given to the respondents: high fibre, low fibre and no fibre. The respondents were asked about the bloating they experienced and they reported four levels of bloating: high, low, moderate and none. Which hypothesis test will you use for the above situation to check if fibre diet causes bloating?

a)         ANOVA

b)         Homogeneity of Variance

c)         Chi Square Test of independence

d)         Paired or dependent t test

Question 10:- If there is a very strong correlation between two variables then the correlation coefficient must be:

a)         any value larger than 1

b)         much smaller than 0, if the correlation is negative

c)         much larger than 0, regardless of whether the correlation is negative or positive

d)         None of these alternatives is correct

Question 11:- A bank manager wants to see if the average quarterly balance (in rupees) of accounts, where the primary account holder is male vs where the primary account holder is female, is equal. Which hypothesis test will he use? Is there a pre-condition for the same?

a)         Chi Square test and no pre-condition check is applicable.

b)         Paired t test and no pre-condition needs to be checked.

c)         2 independent sample t test and Normality of distribution needs to be checked for account balance for male and female separately.

d)         1 sample t test and Normality of distribution needs to be checked for account balance.

Question 12:- In R, the functions to check normality of distribution are: I. ad.test() II. t.test() III. shapiro.test()

a)         All three

b)         I and II

c)         I and III

d)         II and III

Question 13:- A company markets 4 groups of pesticide brands. It wants to compare the effectiveness (milligram per ml of blood) of these 4 pesticides. You have performed ANOVA and found that the effectiveness across the 4 groups is not equal. In order to find out which pair is unequal, which test/method can you use?

a)         Perform ANOVA again

b)         1 sample t test

c)         Tukey post hoc analysis

d)         Normality test

Question 14:- In a linear regression model, when there are multiple attributes or x variables, which value do we use to check model performance?

a)         Adjusted R squared value

b)         R squared value

c)         Value of Coefficients of the x variables

d)         Residual standard error

Question 15:- Which of the following methods do we use to find the best fit line for data in Linear Regression?

a)         Ordinary Least Square

b)         Maximum Likelihood Estimator

c)         Mean-squared Error

d)         None of the above

Question 16:- What is the function used for logistic regression in R?

a)         lm()

b)         glm()

c)         gbm()

d)         rm()

Question 17:- Which of the following evaluation metrics cannot be applied to logistic regression output when comparing it with the target?

a)         Accuracy

b)         Sensitivity

c)         Specificity

d)         Mean-squared error

Question 18:- Which of the following statements is true?

a)         Linear regression is not a Machine Learning algorithm.

b)         Binary logistic regression is a tree based algorithm.

c)         Binary logistic regression is an unsupervised classification method.

d)         Linear regression and binary logistic regression both are supervised machine learning methods.

Question 19:- A research team has collected some data on adults: no. of cigarettes smoked per day (X1), exercising daily or not (X2), age (X3) and whether they have lung cancer or not. They would like to use this data to understand which lifestyle factors make lung cancer more likely. Which model can they use?

a)         Linear regression

b)         Cluster Analysis

c)         Binary Logistic regression

d)         PCA

Question 20:- Which method gives you odds ratio?

a)         ANOVA

b)         Chi Squared test of independence

c)         1 sample t test

d)         Binary Logistic regression

Case Study

The Health department of a city corporation wants to assess factors related to the likelihood that a hospital patient acquires an infection while hospitalized. The variables here are y = infection risk, x1 = average length of patient stay (days), x2 = patient age (years), x3 = no. of beds in the hospital.

The coefficients are as follows:

Term        Coeff     P-value
Constant    1.0       0.49
Stay        0.41      0.00
Age         -0.023    0.33
Beds        0.025     0.001

Model Summary

R-sq 70.14%    R-sq(adj) 66.41%
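The output above corresponds to a fitted equation of the form y = 1.0 + 0.41(Stay) - 0.023(Age) + 0.025(Beds). The sketch below, in base R, evaluates that equation for one hypothetical hospital and shows the usual adjusted R-squared formula. The `hospital` values and the sample size `n = 28` are assumptions made for illustration; the case study does not state them.

```r
# Coefficients copied from the case study output
coefs <- c(constant = 1.0, stay = 0.41, age = -0.023, beds = 0.025)

# Hypothetical hospital (assumed values, not from the case study)
hospital <- c(stay = 10, age = 50, beds = 100)

# Predicted infection risk: constant plus the sum of coefficient * value
predicted_risk <- unname(coefs["constant"] + sum(coefs[names(hospital)] * hospital))
print(predicted_risk)

# Adjusted R-squared: 1 - (1 - R2) * (n - 1) / (n - p - 1),
# where n is the number of observations and p the number of predictors
adjusted_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# n = 28 is an assumption; the sample size is not given in the case study
adj <- adjusted_r2(r2 = 0.7014, n = 28, p = 3)
print(adj)
```

With the assumed n = 28 and three predictors, the formula gives roughly 0.6641, which is consistent with the reported R-sq(adj); other sample sizes would give a different adjustment.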

Question 21:- Based on the output above, which are the significant factors/variables which are affecting infection risk while a patient is hospitalised?

a)         Stay, Age and No. of beds in hospital

b)         Stay and No. of beds in hospital

c)         Age

d)         Stay

Question 22:- What does the R-sq adjusted value signify here?

a)         The model explains 66.41% of the observed variation in infection risk

b)         The model explains 33.59% of the observed variation in infection risk

c)         The model explains 70% of the observed variation in stay

d)         The model explains 66.41% of variation in stay

Question 23:- Based on the above output, which statement about age is correct?

a)         Age is not a significant factor here.

b)         Age affects infection risk negatively.

c)         For one year increase in age, the average infection risk goes down by -0.23 unit

d)         None of the above

Question 24:- What are the tests for residuals?

a)         Residuals are not normal

b)         Normality of residual can be tested and also plotted.

c)         Check for Specificity and accuracy of residuals

d)         None of the above

Question 25:- If there were another factor, such as whether an X-ray was done during the hospital stay (values: done, not done), could it be accommodated in the linear regression model?

a)         Perform logistic regression                       

b)         Yes. Create a dummy variable for X-ray done then perform linear regression                       

c)         Yes. Do regression with all factors including X-ray                       

d)         None of the above                       
