1Z0-931 Autonomous Database

Excel Data Analysis Capstone Project

Overview

This project requires that you use the tools learned throughout this portion of the course to create a model for a real-world situation – creating a model to predict the success of NBA teams. 

Due to the COVID Pandemic, a tragic event occurred – the NBA season was interrupted! This season was shortened from its regular 82 games; however, we really need a full set of statistics. The only solution the world has is to use your advanced analysis skills to predict what those win totals should be. These win predictions will be written into the history books, so we need to be extremely accurate.

Note: You do not need to have an advanced, or even a basic, knowledge of basketball or the NBA. This entire project can be completed without learning the details of what all these statistics mean in relation to an actual game – just like some of the examples contained inputs that we didn’t fully understand (Wine Quality, Kuwait Icebergs). DO NOT spend your time trying to learn about basketball, nor trying to apply basketball knowledge you may have to the assignment. It will not help. Use the techniques we’ve discussed, demonstrated, and put to use in this course. 

Part 1: Collect and Prepare Data

For this model you will need to prepare two sets of data. The easiest way is to have each as its own worksheet in the same Excel workbook:

  • Source Data – 5 seasons of data beginning with the 2018-2019 season and going back to the 2014-2015 season. This will have all statistics from the table (including wins and playoff status), but with the win proxy stats removed.
    • Note: This model must not contain any win-like statistics such as losses, winning percentage, Pythagorean wins (PW)/losses (PL), margin of victory, SRS, etc. When you’ve created the sheet, you must remove the Win-proxy stats, some of those are L, PW, PL, MOV, and SRS. There may be others if you’ve added optional data to your model. Ensure you double check this as inclusion of any win/loss statistics will ruin the accuracy of your model.
  • The source for your data is Basketball Reference (https://www.basketballcom/leagues/NBA_2019.html). The exact data used to build your model is something that you must determine – the starting point is to download the Miscellaneous Stats table into Excel:
  • Subject Data – The most recent season (2019-2020). This will have the same set of statistics as the source data, with the exception that the wins and playoff columns should be blank – this is what we are predicting. The classification will use the data to predict the playoff status and the prediction will predict the number of wins.
    • Your goals are to use the past set of statistics to build 2 models that will predict the number of wins and the playoff status of teams for the 2019 – 2020 season.

Creating a Predictive Model

Once your data is prepared you can begin creating your predictive model. Any and all of the tools we looked at in the course are available to you. You may find that as you proceed in building your model that data needs to be added or removed from your initial worksheet. 

As you are going through this process you must take note of what method you are using, what changes you make to the model data, and why you are making those decisions. You will need to present both your model and the reasoning of why you built it as you did, and why it is superior to the alternatives that proved to be less accurate. The process of developing your model is the most important part of this process, so ensure you are making logical improvements and documenting the reasoning and impact.  

Note: Use the source data to build a predictive model targeted at predicting number of wins, then use that model to predict the number of wins on the subject data.

Creating a Classification Model

In this step you must create a classification model that uses your source data to predict if teams will be in the playoffs or not. Follow the same process as the prediction model, using classification tools to split the teams into the two groups.

Note: Use the source data to build a classification model targeted on the playoff status, then use that model to predict the playoff status on the subject data.

Comparing the Results

Use the template to insert a copy of your final predictions – the predictions on the number of wins and the playoff status. You must insert the data into the template exactly as shown – this will be used to calculate your accuracy. The teams must be sorted alphabetically, then insert the wins and playoff status in their respective columns.  

Grading

Deliverables

Note: If you’re unsure or unclear on what you’re being asked to complete, please ask early. There will be substantial time devoted to this project and it is important that you progress down the correct path to succeed. 

The deliverables for this project are:

  • A paper that contains the results of your modeling exercise, the explanation of what you did, and the reasoning behind why your chosen model is the best.

o Please include screenshots of your Orange workflow(s)*

  • Your Excel Workbook containing your spreadsheets
  • Your results pasted into the accuracy template.

Rubric

Item

Weighting

Notes

Spreadsheet Construction

  5%

 

Predictive Model

10%

 

Classification Model

10%

 

Report

50%

 

Explanation of Process

15%

 

Justification of Model Choice

15%

 

Presentation of Results

15%

 

Possible Improvements

5%

 

Accuracy of Model

10%

Scaled mark based on relative accuracy. 

Quality of Presentation

15%

Subjective judgement on how well your methods are presented.

* Note:

  • 1 screenshot of your Orange Prediction workflow
  • 1 screenshot of your Orange Classification workflow
Want latest solution of this assignment