
Analysing Bike Sharing System Solution

We tested normality using Q-Q plots and the Kolmogorov-Smirnov test and observed that the variables dteday, season_code, yr, month_code, weekday_code, workingday, temp, atemp, hum, and windspeed are not normally distributed. (season_code, month_code and weekday_code were derived by converting the corresponding character variables to numeric codes so that they can be used in regression analysis.)
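The normality check described above can be sketched as follows. This is a minimal illustration on synthetic data, not the analysis actually run for this assignment: the helper name `normality_check` and the sample `skewed` are hypothetical. Note also that estimating the mean and standard deviation from the same sample makes the plain Kolmogorov-Smirnov p-value conservative.

```python
import numpy as np
from scipy import stats


def normality_check(values, alpha=0.05):
    """Kolmogorov-Smirnov test plus the Q-Q plot correlation for one variable."""
    x = np.asarray(values, dtype=float)
    # Standardise so the sample can be compared with the standard normal.
    z = (x - x.mean()) / x.std(ddof=1)
    ks_stat, p_value = stats.kstest(z, "norm")
    # probplot returns the Q-Q points and a straight-line fit; r close to 1
    # means the points lie near the line expected under normality.
    (_, _), (_, _, qq_r) = stats.probplot(x, dist="norm")
    return {
        "ks_stat": float(ks_stat),
        "p_value": float(p_value),
        "qq_r": float(qq_r),
        "reject_normality": bool(p_value < alpha),
    }


# Hypothetical, clearly non-normal sample standing in for a dataset column.
rng = np.random.default_rng(0)
skewed = rng.exponential(scale=1.0, size=500)
result = normality_check(skewed)
print(result)
```

In practice each column of the bike sharing data would be passed to the helper in turn, and a p-value below 0.05 would be read as evidence against normality.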

Ques 1)

From the output below we can see that there is a strong correlation between yr and dteday, between month_code and season_code, and between atemp and temp. Hence, we can say that there is a problem of multi-collinearity.

statistics-assignment-solution-16-img1
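A correlation screen of this kind can be sketched as below. The data frame `demo` is synthetic, the helper `high_correlation_pairs` and the 0.8 threshold are assumptions made for illustration, and Pearson correlation is used because the source does not say which coefficient produced the output.

```python
import numpy as np
import pandas as pd


def high_correlation_pairs(df, threshold=0.8):
    """List variable pairs whose absolute correlation meets the threshold."""
    corr = df.corr()
    cols = list(corr.columns)
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            r = corr.iloc[i, j]
            if abs(r) >= threshold:
                pairs.append((cols[i], cols[j], round(float(r), 3)))
    return pairs


# Synthetic stand-in: atemp is built to track temp closely, the rest are independent.
rng = np.random.default_rng(1)
n = 300
temp = rng.uniform(0.1, 0.9, n)
demo = pd.DataFrame({
    "temp": temp,
    "atemp": 0.9 * temp + rng.normal(0, 0.02, n),
    "hum": rng.uniform(0.2, 1.0, n),
    "windspeed": rng.uniform(0.0, 0.5, n),
})
pairs = high_correlation_pairs(demo)
print(pairs)
```

One variable from each flagged pair is then a candidate for removal before fitting the regression.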

Because of this multi-collinearity, we included only seven explanatory variables: season_code, yr, weekday_code, workingday, atemp, hum, and windspeed. From the table below, we can see that the best model is the one that includes all seven. Because we used stepwise selection, variables are added to or removed from the model at each step until we reach the model with the highest R-square value. This model is the best because its R-square is the highest: R2 = 0.7801, so approximately 78.0% of the variability in the response is explained by the model. Since this value is close to 1, we can say that the model is a good fit. All the explanatory variables except workingday have a significant effect, as their p-values are less than the significance level of 0.05.

statistics-assignment-solution-16-img2
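The selection idea can be illustrated with a simplified forward-selection sketch. It is not the procedure behind the table above: the source does not state the software or the entry and removal criteria, so this version only adds variables (it never removes one) and uses adjusted R-square as the stopping rule. The data are synthetic, and the column labels merely borrow names from the dataset.

```python
import numpy as np


def r_squared(X, y):
    """R-square of an ordinary least squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, k):
    """Penalise R-square for the number of predictors k."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def forward_select(X, y, names):
    """Add the predictor that most improves adjusted R-square until none helps."""
    n = len(y)
    chosen, remaining = [], list(range(X.shape[1]))
    best_adj = -np.inf
    while remaining:
        scores = []
        for j in remaining:
            cols = chosen + [j]
            r2 = r_squared(X[:, cols], y)
            scores.append((adjusted_r_squared(r2, n, len(cols)), j))
        adj, j = max(scores)
        if adj <= best_adj:
            break  # no candidate improves the fit any further
        best_adj = adj
        chosen.append(j)
        remaining.remove(j)
    return [names[j] for j in chosen], best_adj


# Synthetic data: only the first two columns actually drive the response.
rng = np.random.default_rng(2)
n = 400
X = rng.normal(size=(n, 4))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.5, size=n)
names = ["atemp", "yr", "hum", "windspeed"]
selected, adj_r2 = forward_select(X, y, names)
print(selected, round(adj_r2, 4))
```

Plain R-square never decreases when a variable is added, which is why the sketch compares adjusted R-square instead.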

Ques 2)

statistics-assignment-solution-16-img3

From the above table we can see that the model is significant, as the p-value in the ANOVA table is less than the significance level of 0.05. Also, casual (p<0.0001), season (p<0.0001), weekday (p<0.0001) and the season*weekday interaction (p<0.0001) all have a significant effect on the count of registered users, as each p-value is less than 0.05. The same can be seen in the following graph.

statistics-assignment-solution-16-img4

Ques 3)

statistics-assignment-solution-16-img5

From the above table we can see that there is no statistically significant relationship between the day of the week and the type of bike rental (casual vs registered), as the p-value for the Spearman’s correlation between weekday and rental type is not less than the significance level of 0.05.
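A Spearman rank correlation test of this kind can be sketched as below. The helper `spearman_check` is hypothetical, and the two small series are made-up values used only to show the call; the source does not say how the rental type was coded for this test.

```python
from scipy import stats


def spearman_check(x, y, alpha=0.05):
    """Spearman rank correlation with a simple significance decision."""
    rho, p_value = stats.spearmanr(x, y)
    return {
        "rho": float(rho),
        "p_value": float(p_value),
        "significant": bool(p_value < alpha),
    }


# Made-up example: a perfectly monotone relationship across the seven weekdays.
weekday = [0, 1, 2, 3, 4, 5, 6]
monotone = [1, 2, 4, 8, 16, 32, 64]
result = spearman_check(weekday, monotone)
print(result)
```

With the real data, a p-value at or above 0.05 would lead to the conclusion stated above, namely that there is no significant monotone association.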

Ques 4)

Our objective was to assess which factors affect the total count of bike rentals (casual and registered). The candidate factors were instant, dteday, season, yr, month, weekday, workingday, temp, atemp, hum, windspeed, casual, and registered. To this end we used a multiple linear regression model. When we tested its assumptions, however, the assumption of normality failed, based on the Q-Q plot and the Kolmogorov-Smirnov test. There was also a strong correlation between yr and dteday, between month_code and season_code, and between atemp and temp, which caused a problem of multi-collinearity. We therefore dropped dteday, month_code and temp from the model and then performed stepwise regression. We concluded that the best model is the one that includes all seven explanatory variables (season_code, yr, weekday_code, workingday, atemp, hum, and windspeed), because its R-square is the highest: R2 = 0.7801, so approximately 78.0% of the variability in the response is explained by the model. Since this value is close to 1, we can say that the model is a good fit, and all the explanatory variables except workingday had a significant effect.

We then performed an ANCOVA including casual, season, weekday and the season*weekday interaction. The model was significant, and casual (p<0.0001), season (p<0.0001), weekday (p<0.0001) and the season*weekday interaction (p<0.0001) all had a significant effect on the count of registered users. Finally, we found no statistically significant relationship between the day of the week and the type of bike rental (casual vs registered).
