BA02 Advanced Analytics Using R
Question 1:- A Type I error occurs when we:
a) reject a false null hypothesis
b) reject a true null hypothesis
c) do not reject a false null hypothesis
d) fail to make a decision regarding whether to reject a hypothesis or not
Question 2:- Parametric tests make assumptions about:
a) The population size
b) The sample size
c) The underlying distribution
d) The standard deviation of the distribution
Question 3:- In regression analysis, which of the following statements is true?
a) Residuals are checked, but there is no rule for residuals.
b) The sum of residuals should always be less than zero.
c) The plot of residuals need not be normally distributed.
d) The plot of residuals is normally distributed.
Question 4:- If you have only one independent variable, how many coefficients will you need to estimate in a simple linear regression model?
a) one
b) two
c) three
d) Not enough information is given to comment.
Question 5:- Homogeneity of Variance (HOV) implies equal variance for all comparison groups.
a) TRUE
b) FALSE
Question 6:- If we need to model the churn of customers, where the outcome variable takes the values Yes or No (i.e. the customer churns or not), which model can we use?
a) Linear Regression
b) Logistic Regression
c) Chi-square
d) None of the above
Question 7:- The relationship between the volume of beer consumed (x) and blood alcohol content (y) was studied in 16 male college students using least squares regression. The following regression equation was obtained from this study: Y = -0.0127 + 0.0180x. The above equation implies that:
each unit volume of beer consumed increases blood alcohol by 1.27%
on average it takes 1.8 volumes of beer to increase blood alcohol content by 1%
each unit of beer consumed increases blood alcohol by an average of 1.8%
each unit of beer consumed increases blood alcohol by exactly 0.018
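As a rough illustration of how a fitted line such as the one in Question 7 is read, the sketch below evaluates the equation at a few volumes and at one unit more. It is written in Python rather than R. The function names and the sample volumes are hypothetical and are not data from the study.

```python
# Fitted line quoted in Question 7: Y = -0.0127 + 0.0180x
INTERCEPT = -0.0127
SLOPE = 0.0180


def predicted_bac(beers: float) -> float:
    """Predicted (average) blood alcohol content for a given volume of beer."""
    return INTERCEPT + SLOPE * beers


def change_per_extra_unit(beers: float) -> float:
    """Change in the predicted value when the volume rises by one unit."""
    return predicted_bac(beers + 1) - predicted_bac(beers)


if __name__ == "__main__":
    # Illustrative volumes only (assumption, not taken from the study).
    for x in (1, 5, 9):
        print(x, round(predicted_bac(x), 4), round(change_per_extra_unit(x), 4))
```

Whatever the starting volume, the change in the predicted value equals the slope. The prediction is a fitted average, so individual students may lie above or below the line.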
Question 8:- A good visual tool to identify outliers is:
Pie chart
Boxplot
Line chart
Treemap
Question 9:- You want to find the association between diet supplements and the bloating they cause among 150 elderly patients. Three types of diet supplement were given to the respondents: high fibre, low fibre and no fibre. The respondents were asked about the bloating they experienced and reported one of four levels: high, low, moderate and none. Which hypothesis test will you use in this situation to check whether a fibre diet causes bloating?
ANOVA
Homogeneity of Variance
Chi Square Test of independence
Paired or dependent t test
Question 10:- If there is a very strong correlation between two variables then the correlation coefficient must be:
any value larger than 1
much smaller than 0, if the correlation is negative
much larger than 0, regardless of whether the correlation is negative or positive
None of these alternatives is correct
Question 11:- A bank manager wants to see if the average quarterly balance (in rupees) of accounts, where the primary account holder is male vs where the primary account holder is female, is equal. Which hypothesis test will he use? Is there a pre-condition for the same?
Chi Square test and no pre-condition check is applicable.
Paired t test and no pre-condition needs to be checked.
2 independent sample t test and Normality of distribution needs to be checked for account balance for male and female separately.
1 sample t test and Normality of distribution needs to be checked for account balance.
Question 12:- In R, the functions used to check normality of a distribution are: I. ad.test() II. t.test() III. shapiro.test()
All three
I and II
I and III
II and III
Question 13:- A company markets 4 groups of pesticide brands. It wants to compare the effectiveness (milligrams per ml of blood) of these 4 pesticides. You have performed ANOVA and found that the effectiveness is not equal across the 4 groups. To find out which pairs differ, which test/method can you use?
Perform ANOVA again
1 sample t test
Tukey post hoc analysis
Normality test
Question 14:- In a linear regression model, when there are multiple attributes or x variables, which value do we use to check model performance?
Adjusted R squared value
R squared value
Value of Coefficients of the x variables
Residual standard error
Question 15:- Which of the following methods do we use to find the best fit line for data in Linear Regression?
Ordinary Least Square
Maximum Likelihood Estimator
Mean-squared Error
None of the above
Question 16:- What is the function used for logistic regression in R?
lm()
glm()
gbm()
rm()
Question 17:- Which of the following evaluation metrics cannot be applied to logistic regression output when comparing it with the target?
Accuracy
Sensitivity
Specificity
Mean-squared error
Question 18:- Which of the statements is true?
Linear regression is not a Machine Learning algorithm.
Binary logistic regression is a tree based algorithm.
Binary logistic regression is an unsupervised classification method.
Linear regression and binary logistic regression both are supervised machine learning methods.
Question 19:- A research team has collected data on adults: number of cigarettes smoked per day (X1), whether they exercise daily or not (X2), age (X3), and whether they have lung cancer or not. They would like to use this data to understand which lifestyle factors make lung cancer more likely. Which model can they use?
Linear regression
Cluster Analysis
Binary Logistic regression
PCA
Question 20:- Which method gives you odds ratio?
ANOVA
Chi Squared test of independence
1 sample t test
Binary Logistic regression
Case Study
The Health department of a city corporation wants to assess factors related to the likelihood that a hospital patient acquires an infection while hospitalized. The variables here are y = infection risk, x1 = average length of patient stay (days), x2 = patient age (years), x3 = no. of beds in the hospital.
The coefficients are as follows:
Term        Coeff     P-value
Constant     1.0      0.49
Stay         0.41     0.00
Age         -0.023    0.33
Beds         0.025    0.001
Model Summary
R-sq 70.14%    R-sq(adj) 66.41%
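To make the coefficient table concrete, the following Python sketch plugs the reported coefficients into the fitted linear equation for one made-up patient. The function name and the example inputs (10 days, 50 years, 100 beds) are assumptions chosen for illustration. They do not come from the case study.

```python
# Coefficients copied from the case-study output above.
COEFFICIENTS = {"constant": 1.0, "stay": 0.41, "age": -0.023, "beds": 0.025}


def predicted_infection_risk(stay: float, age: float, beds: float) -> float:
    """Evaluate y = constant + b1*stay + b2*age + b3*beds."""
    return (
        COEFFICIENTS["constant"]
        + COEFFICIENTS["stay"] * stay
        + COEFFICIENTS["age"] * age
        + COEFFICIENTS["beds"] * beds
    )


if __name__ == "__main__":
    # Hypothetical inputs: 10-day stay, age 50, hospital with 100 beds.
    print(round(predicted_infection_risk(stay=10, age=50, beds=100), 2))
```

Each coefficient is the change in predicted infection risk for a one-unit increase in that variable with the others held fixed. Whether a term is statistically significant is judged from its p-value, not from the size of its coefficient.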
Question 21:- Based on the output above, which are the significant factors/variables which are affecting infection risk while a patient is hospitalised?
Stay, Age and No. of beds in hospital
Stay and No. of beds in hospital
Age
Stay
Question 22:- What does the R-sq adjusted value signify here?
The model explains 66.41% of the observed variation in infection risk
The model explains 33.59% of the observed variation in infection risk
The model explains 70% of the observed variation in stay
The model explains 66.41% of variation in stay
Question 23:- Based on the above output, which statement about age is correct?
Age is not a significant factor here.
Age affects infection risk negatively.
For a one-year increase in age, the average infection risk goes down by -0.23 unit
None of the above
Question 24:- What are the tests for residuals?
Residuals are not normal
Normality of residual can be tested and also plotted.
Check for Specificity and accuracy of residuals
None of the above
Question 25:- If there were another factor, such as whether an X-ray was done during the hospital stay (values: done, not done), could it be accommodated in the linear regression model?
a) Perform logistic regression
b) Yes. Create a dummy variable for X-ray done, then perform linear regression
c) Yes. Do regression with all factors including X-ray
d) None of the above