INFO703 – Big Data and Analytics - Discovering Data

In this lab you will learn how to use IBM Watson Analytics for handling data for prediction.

Discovering Data

Now that we've loaded and refined our data set, it's time to shift into the next mode of analysis, discover, and this is where things get exciting. It's where we start to find patterns, reveal insights, and develop visualizations to help us craft our story. Let's go ahead and click the Watson Analytics logo to return to our home screen, where you will now see the new refined airline satisfaction data set. We will kick it off by simply clicking on the data set to reveal what Watson calls cognitive starting points.

In other words, these are trends and relationships that have been detected during the upload process, and ones that our good friend Watson is prompting us to explore. We have a number of potential paths to choose here, plus options to create our own visualizations from scratch, if we would rather go rogue. Let's start with the first starting point, “What are the values of price sensitivity by origin state?”

Clicking that starting point will automatically create an appropriate visualization, and place it in the first tab of our new discovery set, which is basically a collection of visualizations that we can later use as elements to assemble a dashboard.

We've got some options here, all of which are specific to this particular visualization. On the top left, we can change the visualization type, or formatting options, or we can make adjustments using the data tray along the bottom of the screen. For example, if we wanted to see price sensitivity by destinations, instead of origins, all we need to do is drag our destination state city hierarchy from the data tray, and swap out the origin state city.

Now, because we're looking at a hierarchy field, we can select a particular state, like Iowa for instance, expand the options by clicking the ellipses, and drill down into the city level.

Now, we're seeing price sensitivity by city within the state of Iowa.

To go back, you can either right-click the city labels, and choose go up, or you can simply use the undo button at the top of the screen.

It's important to note that we also still have access to data shaping tools, like calculations, groups, and hierarchies via the data tray. For instance, if we click one of the fields, click the ellipses, you will see we have the same options that we saw in our refine interface. The only difference is that now the modifications that we make will remain local to this particular discovery set. This visualization is almost what you are looking for, but you would also like to get a sense of volume by state, in terms of the number of survey respondents.

To do this, simply scroll to find the rows column and drag it into the size tray in the lower left. You now have a custom geospatial map that shows price sensitivity and response volume by state. Double-click the tab to give it a custom name. Call it Price Sensitivity by State. You have just created your first discovery.
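Behind the map, each state carries two numbers: a price sensitivity value for the colour and a row count for the size. The sketch below shows that aggregation in plain Python. The sample rows are made up, and using the average as the summary for price sensitivity is an assumption, since the lab does not say which summarization the map uses.

```python
from collections import defaultdict

# Hypothetical survey rows: (state, price_sensitivity). Not real lab data.
rows = [
    ("Iowa", 1), ("Iowa", 2), ("Iowa", 3),
    ("Texas", 1), ("Texas", 1),
    ("Ohio", 2),
]

def summarize_by_state(records):
    """Return {state: (average price sensitivity, number of rows)}."""
    totals = defaultdict(lambda: [0.0, 0])  # state -> [running sum, row count]
    for state, sensitivity in records:
        totals[state][0] += sensitivity
        totals[state][1] += 1
    return {state: (total / count, count) for state, (total, count) in totals.items()}

summary = summarize_by_state(rows)
for state, (average, count) in sorted(summary.items()):
    print(f"{state}: average price sensitivity {average:.2f}, {count} respondents")
```

The row count plays the role of the rows column dragged into the size tray.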

Using natural language

One question that we'd like to explore is where IBM Airlines falls in comparison to other airlines in terms of customer satisfaction. A great way to investigate something like this is to take advantage of Watson's natural language processing. So, let's add a new tab and simply type what we're looking for. In this case, type “Compare avg satisfaction by airline name”. When you hit enter, Watson identifies keywords within your query in order to recommend a number of different discoveries sorted by relevance.

So, for example, the word compare allows Watson to focus in on charts that help show comparisons, like bar charts, packed bubbles, word clouds, or tree maps. Had you used words like trend or over time, you would likely see a completely different set of recommendations consisting of more line and area charts. The word average, or in this case its abbreviation avg, is also meaningful, especially in cases where a variable could be aggregated in multiple ways.

Finally, the column names themselves, satisfaction and airline name, as well as the by keyword, help to communicate the fact that you are looking to make a comparison across airlines, where your measure is average satisfaction. Watson's pretty slick when it comes to interpreting these natural language queries, so even if you misspell a column, like nme instead of name, you should still see the same relevant results. Now, there is a bit of a learning curve when it comes to asking clear and meaningful questions, but thankfully, there's this handy how to ask a question interface.

Here, you can select different categories of questions, like compare, aggregate, or predict to get a sense of the syntax that you need to use, or you can just populate these dropdown boxes to create and ask new questions from scratch.
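To make the keyword idea concrete, here is a toy sketch of how a question like the one above could be broken into an intent, an aggregation, a measure, and a dimension. This is not how Watson Analytics works internally; the keyword tables, the column list, and the fuzzy matching with Python's `difflib` are all assumptions chosen for illustration.

```python
from difflib import get_close_matches

# Hypothetical column list and keyword tables, for illustration only.
COLUMNS = ["Satisfaction", "Airline Name", "Airline Status", "Gender"]
AGGREGATES = {"avg": "average", "average": "average", "sum": "sum", "count": "count"}
INTENTS = {"compare": "comparison", "trend": "trend"}

def match_column(text):
    """Return the column whose name is closest to text, or None if nothing is close."""
    lowered = {column.lower(): column for column in COLUMNS}
    hits = get_close_matches(text.lower().strip(), list(lowered), n=1, cutoff=0.6)
    return lowered[hits[0]] if hits else None

def parse_question(question):
    """Split a question into intent, aggregate, measure, and dimension."""
    words = question.lower().split()
    intent = next((INTENTS[w] for w in words if w in INTENTS), None)
    aggregate = next((AGGREGATES[w] for w in words if w in AGGREGATES), None)
    remaining = [w for w in words if w not in INTENTS and w not in AGGREGATES]
    if "by" in remaining:
        # Words before "by" name the measure, words after it name the dimension.
        split = remaining.index("by")
        measure_words, dimension_words = remaining[:split], remaining[split + 1:]
    else:
        measure_words, dimension_words = remaining, []
    return {
        "intent": intent,
        "aggregate": aggregate,
        "measure": match_column(" ".join(measure_words)),
        "dimension": match_column(" ".join(dimension_words)) if dimension_words else None,
    }

# The misspelled "nme" still resolves to the Airline Name column.
parsed = parse_question("Compare avg satisfaction by airline nme")
print(parsed)
```

Fuzzy matching is what lets the misspelled query land on the same columns as the correctly spelled one.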

So, let's go ahead and back out of here and select the first recommendation, the bar chart showing satisfaction by airline name. This automatically creates a second tab, or discovery, within our discovery set.

Now, let's make some modifications to this chart. The first will be to right-click the airline name axis label and sort the airlines descending by average satisfaction value. Now, it becomes clear that IBM Airlines falls roughly in the middle of the pack at number nine out of the 14 total airlines.

We can also right-click the y-axis label, which gives us options to change the summarization mode, for instance from average to sum, minimum, maximum, count, et cetera, or we can edit the airline name field itself to set a particular condition.

In this case, let's add a top bottom condition to show only the top 10 airlines by satisfaction.
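The sort and the top/bottom condition amount to ranking airlines by their average satisfaction and keeping the first N. A minimal sketch, assuming made-up ratings and made-up airline names other than IBM Airlines:

```python
from collections import defaultdict

# Hypothetical (airline, satisfaction) survey rows. Only "IBM Airlines" comes from the lab.
ratings = [
    ("IBM Airlines", 3), ("IBM Airlines", 4),
    ("Acme Air", 5), ("Acme Air", 4),
    ("Budget Jet", 2), ("Budget Jet", 3),
]

def average_by_airline(records):
    """Return {airline: average satisfaction}."""
    grouped = defaultdict(list)
    for airline, satisfaction in records:
        grouped[airline].append(satisfaction)
    return {airline: sum(values) / len(values) for airline, values in grouped.items()}

def top_n(averages, n):
    """Sort airlines descending by average satisfaction and keep the first n."""
    ranked = sorted(averages.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]

averages = average_by_airline(ratings)
ranking = top_n(averages, len(averages))
ibm_rank = [name for name, _ in ranking].index("IBM Airlines") + 1
print(ranking, ibm_rank)
```

Calling `top_n(averages, 10)` on the full data set would correspond to the top 10 condition set in the lab.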

Now, you may notice that every time you make a change or update a visualization, the discovery pane on the right side of the page will update dynamically with new content. This is simply a way for Watson Analytics to help guide you towards additional related discoveries and help trigger some new ideas that you might not have considered. So, let's go ahead and rename this tab Satisfaction by airline.

Custom visualization

Up to this point, we've been mostly sticking with the discoveries and visualizations that Watson Analytics is recommending for us, but we don't have to. For example, let's say we click on one of the dynamic discoveries next to the satisfaction chart. It really could be any of them. In this case, go with the top airline by price sensitivity bubble chart, and let's say we click on that and then realize that this isn't really at all what we're looking for. Well, we always have the option to completely customize our visualizations or even build new ones from scratch.

For instance, we can go up in the top left, drill into our visualization options, and here we have some recommended options plus a number of other visualizations. In this case, go a completely different route, and choose a combination chart. Now hide the visualization pane, and at this point, you can just use the data tray to drag and drop new fields into the trays beneath the chart to customize your visualization. We are curious to see how satisfaction ratings differ by airline status, so find airline status and drag it into the x-axis or column tray, and then find satisfaction and drag it into the line position tray. And, finally, instead of showing the length of the columns based on price sensitivity, swap in rows to get a gauge of how many survey respondents fall into each airline status category.

Now we're starting to reveal some really interesting insights. Blue travellers, who make up the default lowest class, account for the largest volume and the lowest average satisfaction ratings, which we might expect. What's surprising, however, is that platinum members, who have the highest airline status, aren't even as satisfied as gold or silver members.

So let's continue to dig. Maybe this has something to do with IBM Airlines in particular. To test whether or not that's the case, we can drill into the airline name and select only IBM Airlines. Since the visualization remains largely unchanged, it seems as though this is an industry-wide issue.

Okay, so what about something like gender? If we add gender to the multiplier tray, and we can search for it here, it will essentially duplicate our chart for each category, in this case male and female.

So now some differences are really beginning to emerge. While the distribution of volume remains similar across genders, we can now see that, on average, male platinum members are less satisfied than female platinum travelers, relative to silver and gold. So, let's keep this visualization as is, and name our tab “Sat. by status and gender”.

Identifying key drivers

So far, we've been focusing primarily on descriptive analytics, which is all about explaining the what. Now it's time to address the why and to do that, we'll need to understand the key drivers of satisfaction. So, create a new tab and select the starting point labelled “What drives Satisfaction”. If you don't see this particular starting point, keep in mind that you can always just type, What drives Satisfaction, or something similar in the question line above.

So when you click this discovery, it will take some time to process. And what's happening right now, is that Watson Analytics is actually building a linear regression model behind the scenes, to quantify the impact that each field and each potential combination of fields, has on our dependent variable or target, which in this case, is satisfaction. So once the model finishes calculating, which may take several minutes, you'll see a spiral chart laying out each factor, organized by strength, starting from the centre of our spiral.

Strength essentially captures how well each driver can explain the variance in our target. For example, “Type of Travel” can explain roughly 34% of the variance in Satisfaction, while factors listed lower in this list have less predictive power. Now you can click View More to expand the full list and hover over each factor to see where it falls in our spiral chart or vice versa.
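As a rough illustration of what “explaining the variance” means for a single categorical driver, the sketch below computes the share of the total variation in the target that is accounted for by differences between group averages (often called eta squared). The lab does not state the exact formula Watson Analytics uses for strength, so treat this as an assumption, and the sample values as invented.

```python
from collections import defaultdict
from statistics import fmean

def variance_explained(groups, values):
    """Share of the total sum of squares accounted for by the group means."""
    overall_mean = fmean(values)
    total_ss = sum((value - overall_mean) ** 2 for value in values)
    by_group = defaultdict(list)
    for group, value in zip(groups, values):
        by_group[group].append(value)
    # Between-group sum of squares: how far each group mean sits from the overall mean.
    between_ss = sum(
        len(members) * (fmean(members) - overall_mean) ** 2
        for members in by_group.values()
    )
    return between_ss / total_ss if total_ss else 0.0

# Hypothetical sample: type of travel and satisfaction for six respondents.
travel_type = ["Business", "Business", "Business", "Personal", "Personal", "Personal"]
satisfaction = [4, 5, 4, 2, 3, 3]

strength = variance_explained(travel_type, satisfaction)
print(f"Type of Travel explains {strength:.0%} of the variance in this sample")
```

A value near 1 means the driver alone almost determines the target, while a value near 0 means the group averages barely differ.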

One really great feature of Watson Analytics is that once you identify these key drivers, you can drill into any of them in particular, to create additional discoveries. So, name this tab, “Satisfaction Drivers”.

Then, drill into one of the factors to keep exploring. In this case, we'd like to see how “Type of Travel” is related to Satisfaction, since we now know from the model that it is a relatively strong predictor at 34%.

If you click the plus sign to the right of the factor, it will create a new discovery tab, pre-populated with the visualization that Watson deems most fitting to represent this relationship.

Now, click on the Visualization and choose Heatmap. Then, add Age Range to Columns. In this case, you have a heat map with type of travel on the y-axis and age range, grouped into buckets, on the x-axis. The shading in the heat map reflects average satisfaction, where darker shades indicate higher values and lighter shades indicate lower values.

So, you can see that personal travellers, in the bottom row, are generally less satisfied, as are business travellers who are 62 and above. To dig a bit deeper, let's filter only on IBM Airlines. So go down to the data tray, find Airline Name, and select IBM Airlines, and now explore Gender again. So let's go into our Multiplier tray, search for Gender, and split this chart out.

Now it becomes clear that, regardless of gender, personal travellers in the bottom row and older business travellers in the top-right cells tend to drive the lowest satisfaction ratings. We're happy with this discovery, so go ahead and name this tab “Sat. by Travel Type, Age, Gender”, where Sat. stands for Satisfaction.

Predictive models

Now that we know our key drivers of satisfaction, we can take things a step even further with predictive models, which generate decision rules and decision trees. These will allow us to understand the specific customer profiles associated with particularly high or low levels of satisfaction. So, let's create a new tab, and type something like “I want to know more about satisfaction”. You really don't have to write like a robot here.

Watson Analytics does a really good job extracting meaning from your query and recommending discoveries, even if you use colloquial or filler words. So, by choosing what is a predictive model for satisfaction, the second starting point here, we can generate the decision rules and decision trees that will help us learn more about our most meaningful customer profiles. So, starting with decision rules, we see a list of profiles here, sorted in descending order by the predicted value of satisfaction, along with the number of records, or respondents, that fall into each profile.

We can also see the consolidated confidence level of the model as a whole, which in this case is 48%.

To put that in perspective, a confidence of 100% would basically indicate that the profiles below could perfectly predict the values of satisfaction, which isn't very realistic. Now, what this tells us, for example, is that for male, gold status, business travellers who are under 50 years old, we can statistically derive a predicted satisfaction of 4.36, which is second only to 40 to 49-year-old males with two or more loyalty cards.

So, these are important customer profiles, as they represent the most satisfied travellers. Now, by similar logic, we can instead sort by ascending satisfaction to see the customer profiles associated with the lowest satisfaction ratings. Unsurprisingly, we see that age is a condition present in most of these profiles, and that the top two profiles associated with the lowest satisfaction ratings overall include travellers above the age of 62, which validates the key driver analysis that we looked at earlier.

So, now we know that older travellers, especially those who experience a flight delay, tend to be among the least satisfied customer profiles. Now, the last thing to note about decision rules is that the results will look slightly different depending on the type of variable that you're using as a target. In this case, satisfaction is a continuous variable, meaning that it can take any value between one and five. That said, had we chosen to instead predict a binary or categorical target, Watson Analytics would use a logistic regression model, rather than linear, and return the probability of each resulting value.

This is the type of approach you might use to model something like the probability of employee attrition or whether or not the stock price will rise or fall.
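The difference between the two model types comes down to what the output means. A linear model returns a value directly, while a logistic model passes the same kind of weighted sum through the logistic function so that the result is a probability between 0 and 1. The sketch below uses invented coefficients purely to show the shape of each calculation.

```python
import math

def linear_prediction(intercept, coefficients, features):
    """Continuous target: the weighted sum is the predicted value itself."""
    return intercept + sum(c * x for c, x in zip(coefficients, features))

def logistic_probability(intercept, coefficients, features):
    """Binary target: the weighted sum is squashed into a probability in (0, 1)."""
    score = linear_prediction(intercept, coefficients, features)
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical coefficients and feature values, for illustration only.
predicted_satisfaction = linear_prediction(3.0, [0.5], [2.0])
attrition_probability = logistic_probability(-1.0, [0.8], [2.0])
print(predicted_satisfaction, round(attrition_probability, 3))
```

A score of zero maps to a probability of exactly 0.5, and larger positive or negative scores push the probability towards 1 or 0.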

Decision Trees

Next, let's switch our output to a decision tree, which will essentially convey the same information as our decision rules, but in a more visually appealing format. So we can close the discoveries pane, and zoom out a bit to see what's going on here. This looks a little bit confusing at first, but we can interpret this by reading from left to right. So starting on the left, which you can think of as the trunk of our tree, each point at which our tree splits, represents a factor.

And these factors are organized from left-to-right based on predictive strength. So in this case, type of travel is our first and most predictive factor. And from there, among mileage ticket travellers, airline status is the next most predictive factor. Followed by arrival delay greater than five minutes, and so on and so forth. What this creates is a tree, where each branch represents a unique pattern or profile, that leads to a predicted satisfaction rating.
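One common way to decide which factor a regression tree splits on first is to pick the factor whose groups reduce the squared error of the target the most. The lab does not say which criterion Watson Analytics applies, so the sketch below is an assumption that illustrates the general idea with four invented respondents.

```python
from statistics import fmean

def sum_squared_error(values):
    """Squared distance of each value from the mean of the list."""
    if not values:
        return 0.0
    mean = fmean(values)
    return sum((value - mean) ** 2 for value in values)

def split_gain(rows, factor, target):
    """Reduction in squared error when rows are split by each value of factor."""
    parent_error = sum_squared_error([row[target] for row in rows])
    branches = {}
    for row in rows:
        branches.setdefault(row[factor], []).append(row[target])
    child_error = sum(sum_squared_error(values) for values in branches.values())
    return parent_error - child_error

def best_factor(rows, factors, target):
    """Pick the factor with the largest error reduction as the next split."""
    return max(factors, key=lambda factor: split_gain(rows, factor, target))

# Hypothetical respondents, not taken from the lab data set.
respondents = [
    {"Type of Travel": "Business", "Gender": "Male", "Satisfaction": 5},
    {"Type of Travel": "Business", "Gender": "Female", "Satisfaction": 4},
    {"Type of Travel": "Personal", "Gender": "Male", "Satisfaction": 2},
    {"Type of Travel": "Personal", "Gender": "Female", "Satisfaction": 1},
]

first_split = best_factor(respondents, ["Type of Travel", "Gender"], "Satisfaction")
print(first_split)
```

Repeating the same choice inside each branch produces the left-to-right ordering of factors described above.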

And again, note that these darker shades represent higher satisfaction levels, and the lighter shades represent lower levels. And what's nice is that to make this a bit more readable, we can actually collapse branches to only focus on particular paths. In this case, let's focus on branches that yield the lowest satisfaction ratings, so that profiles begin to emerge. You can simply collapse all of the paths that yield high satisfaction ratings, and now we're essentially narrowing down our tree to visualize just the profile that leads to the lowest predicted satisfaction.

And this should look familiar based on our decision rules. We've got personal travelers, who are blue or platinum status, experiencing a delay greater than five minutes, who are over the age of 62. So now that you've structured this tree in a format that you’re comfortable with, name this tab Satisfaction Profiles. At this point we have a solid collection of insights here, so let's go ahead and save this entire discovery set as “Airline Satisfaction Discoveries”.

Displaying results

By now we've laid all the groundwork. We've loaded and refined our data and built out a number of discoveries to better understand satisfaction drivers, trends, and customer profiles. It's time to move on to the third and final phase of the process, displaying our findings. So let's click the Watson Analytics logo to return to the home screen, where you'll see the new Airline Satisfaction Discoveries set that we just created within the Discover tab. Now we can go ahead and switch over to the Display tab and click New display to get started.

At this point, we can give our display a name, let's call it “Airline Satisfaction Dashboard” and choose a type which includes Dashboards, Infographics, and Expert Storybooks. Infographics are essentially one page, vertically-oriented dashboards and since we'll be making use of multiple tabs, let's go ahead and choose the Dashboard option. Last but not least, after clicking Create, we can select from a number of preset templates or use free form mode. For this Dashboard, let's go with four equal sections.

Okay, now that we've got this blank slate in front of us, let's take a minute to explore the interface. On the left side of our screen we have several expandable panes, Discoveries, Widgets, Format, and Filters. Discoveries is where we can easily access all of the visualizations that we've built allowing us to quickly drag and drop them right into our Dashboard. From here, we can also create brand new discoveries without ever leaving the Display interface.

Keep in mind that we aren't limited to a single discovery set here. We can populate this Dashboard with visualizations and discoveries from across multiple sources and data sets. Next up, Widgets are essentially just objects that you can add to your Dashboard, including text boxes, pictures, videos, links to webpages, and shapes. Format is where you can customize the look and feel of your Dashboard by changing elements like borders, backgrounds, or themes. Keep in mind that the options within this pane will update dynamically based on the content of the Dashboard and what you're currently selecting.

For example, if you've selected multiple items or visualizations, the Format pane will only show options common to all of them. Last but not least, the Filters pane is where we can access all of our columns and use them to filter our Dashboard in a number of different ways. To demonstrate how this works, we'll need to populate our Dashboard with something. So let's go ahead and open up the Discoveries pane. Navigate to our Airline Satisfaction Discoveries and drag our Price Sensitivity by State map into the top left corner.

All we need to do is click, drag, and release in the center of the target section, where it will turn blue and snap right into place. Now when it comes to filters, there are three different levels of filtering. If we return to our Filters pane, we can drag a field, let's use Year, into the All tabs drop zone in the top left to create what's called a global filter, which will affect the visualizations throughout the entire Dashboard. The second option is to create a filter specific to the tab we're looking at, which can be done by simply dragging a field, like Month, into the drop zone labeled This tab.

Alternatively, we can drag the field directly into the Dashboard itself. Finally, the third option is to create a visualization-specific filter. To do this, we can select our map, click the upper right corner to expand, and add filters using the field names or data tray, just like we have in the past. So let's close out this Visualization view and remove the filters we've just added. To delete filters, simply click the ellipses next to each filter name and choose Delete, or select any filters within the Dashboard itself and just press the delete key to remove them. So now that we have a handle on all of our options, it's time to start building.
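One way to picture the three filter levels is as stacked conditions: a visualization shows only the rows that pass the global filters, the filters on its tab, and its own filters. The sketch below models that stacking in plain Python. The field values are invented, and this is an illustration of the scoping idea rather than a description of how Watson Analytics implements it.

```python
def visible_rows(rows, *filter_levels):
    """Keep the rows that pass every condition from every filter level given."""
    conditions = [condition for level in filter_levels for condition in level]
    return [row for row in rows if all(condition(row) for condition in conditions)]

# Hypothetical survey rows using field names mentioned in the lab.
rows = [
    {"Year": 2014, "Month": "January", "Gender": "Female"},
    {"Year": 2014, "Month": "January", "Gender": "Male"},
    {"Year": 2014, "Month": "February", "Gender": "Female"},
    {"Year": 2013, "Month": "January", "Gender": "Female"},
]

all_tabs_filters = [lambda row: row["Year"] == 2014]               # global, every tab
this_tab_filters = [lambda row: row["Month"] == "January"]         # one tab only
visualization_filters = [lambda row: row["Gender"] == "Female"]    # one chart only

on_other_tabs = visible_rows(rows, all_tabs_filters)
on_this_tab = visible_rows(rows, all_tabs_filters, this_tab_filters)
in_this_chart = visible_rows(rows, all_tabs_filters, this_tab_filters, visualization_filters)
print(len(on_other_tabs), len(on_this_tab), len(in_this_chart))
```

Each narrower level can only remove rows, which is why a global filter is felt on every tab while a visualization-specific filter changes a single chart.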

Assembling a multi-tabbed dashboard

At this point in the course, you have the option to follow along with the steps below, or you can take things in a completely different direction. It's totally up to you. Feel free to make your own creative choices, or customize the dashboard however you see fit. If you are following along, let's start by diving into our discoveries pane, dragging satisfaction by airline into the center of the top right quadrant, and then adjusting some of the formatting settings. To make this fit a little better, let's uncheck the show title and show filtered details options, and also drill into the show drop down to remove the axis titles.

Now, keep in mind that, even though we're using a template, we are not constrained to these four boxes. In fact, you can simply drag the edges to improve the fit. You can make some similar adjustments to the price sensitivity map to optimize the space that you have to work with, so select the map, and uncheck title and filter details. Next, let's go back into the Discoveries pane, and grab Satisfaction by travel type, age, and gender. Drop it into the lower left corner, and apply the same formatting updates.

We will hide the title, filter details, and legend, to give it a better fit. We're still way too cramped at this point, and we can't even see the split between males and females. Even though we originally chose a template with four squares, we can simply drag the right edge of this visualization, and extend it all the way to the right, where it will snap right into place. Now, let's add a few filters. If we go to the filters pane, we can enable Class as a global filter, and Airline status as a local filter, specific to this particular tab.

As you change these values, take a look at how all of the visuals update accordingly. Now, when you need to clear those filters, simply click the ellipses, and select clear filter.

This view is looking pretty good, so let's go ahead and name the tab Key Insights by double-clicking the tab name. If we click the plus sign, we can create a second tab with the same four box template. When we do this, note that our global filter for Class has followed us, since it will be applied to all tabs in the dashboard, not just the one that we originally added it to.

In this new tab, we'd like to feature the key drivers and predictive model outputs that we've built. We'll dive into the discoveries pane, and drag satisfaction drivers to the top left, and satisfaction profiles to the top right. Now, obviously some formatting adjustments need to be made here, so let's command or control click to select both visualizations, open up our formatting pane, and remove the title and filter options from both at the same time.

Finally, let's drag the bottom edge down to the bottom to make better use of the space, and we will hide any open panes, so that we can see our entire view. Now, what's nice about these visualizations is that they are completely dynamic. We can toggle between the spiral and the drivers list, switch from the decision rules to a decision tree, and change the profile sorting to view top versus bottom satisfaction ratings. This tab looks good to me, so let's go ahead and name it Drivers and Profiles.

Creating new content within a display

Okay, so let's say we just crossed paths with our CMO, and he mentioned that he really wants to see a line chart showing satisfaction by age and status among female travelers only. Even though that's a really specific request, and we don't have a visualization in our discovery set that quite addresses his question, we can easily whip one up right here in the display interface. So let's create a third tab with a single box and name it Age & Status.

Now, let's open our Discoveries pane and click New discovery from the top. From here, we can drill into our refined airline data set, type a question like “Satisfaction across age ranges”, and select the first starting point, how do the values of satisfaction compare by age range. This will import the visualization directly into our dashboard where we can edit and customize it any way we see fit.

So to create what the CMO is looking for, let's open up our visualization options, convert these columns to a line, and drag Airline Status from the data tray into the color drop zone to break out satisfaction levels among blue, silver, gold, and platinum travelers. We can also add a visualization specific filter here by navigating to Gender in the data tray and limiting our view to female travelers only.

Also, it turns out that our CMO hates jagged lines, and really loves triangles, so let's make sure to open up our formatting tools and customize our line and symbol options from within the Variations menu. So we'll select smooth lines and triangle vertex symbols.

Now all we need to do is click on , then the custom visualization has been added to the dashboard. As a final tweak, let's go into our format options, hide the filter details, and switch our title from the default smart title to a custom title.

Now we can simply double-click the title on the chart to make it anything we'd like. So let's call it “Satisfaction by Age & Status - Female Travelers”. So let's close out our format pane, and there you have it. Our dashboard is ready to roll.

Sharing content and setting permissions

Now that our dashboard is complete, let's save it by selecting save as from the top menu. We can keep the default name here, Airline Satisfaction Dashboard, and just make sure it gets added to our personal folder. Now, let's return to the homepage, where we can see all of the assets that we've created over the course of this project. In the data section, we have our original CSV file, as well as the refined data set; in the discovery section, we have our full discovery set, with six individual tabs or visualizations; and in the display section, we have the dashboard that we just built.

When we open up our dashboard, we now have the option to toggle between view mode and edit mode by clicking on the pencil in the top menu. When you see the eyeglass menu, it means that you are in edit mode, and when you click those eyeglasses to reveal the pencil, you are now in view mode. In view mode, we no longer have access to the discoveries, widgets, filters, or formatting tools, and don't have the option to expand or modify individual visualizations. That said, we do still have the ability to adjust filters, and interact with the visualizations in the dashboard as usual.

This is essentially a preview of what other users would be able to see. Now, one of the great features of Watson Analytics is the ability to share content in a number of different ways. We all know how frustrating it is to try to send massive Excel spreadsheets around the office, or waste time building slides, and capturing screenshots. With Watson Analytics, sharing content is intuitive, and absolutely seamless. All we need to do is click the share button, and determine what format we would like to use.

In this case, let's email our CMO directly, and share what we found.

Send this to reza.rafeh@wintec.ac.nz, give it a subject line, “Satisfaction Dashboard”, and write a quick message. “Hi Reza, Check out my dashboard.” You can include a link using the button in the left bottom corner.

Here, you have a few other options. You can click on the images to determine if you want to include all of the tabs in your dashboard, or just specific tabs. Finally, you can choose the format that you want this dashboard to be attached as, so image, PDF, or PowerPoint. The other options are to simply download the dashboard as images, PDFs, or PowerPoint, Tweet the content directly, or generate a shareable link that we can pass along to other Watson Analytics users any way that we see fit.
