Data Mining For Business Intelligence

CHAPTER 4 – DATA MINING FOR BUSINESS INTELLIGENCE

Objectives: After completing this chapter, you should be able to:

  1. Define data mining as an enabling technology for BI
  2. Understand the objectives and benefits of business analytics and data mining
  3. Recognize the wide range of applications of data mining
  4. Learn the standardized data mining processes
  5. Learn different methods and algorithms of data mining
  6. Build awareness of existing data mining software tools
  1. Opening Vignette: Data Mining Goes to Hollywood! Read pages 132-135.

Answer questions 1-7 on page 134.

Summarize the answers

  1. Why should Hollywood decision makers use data mining?

Data mining supports and improves predictability when enough is known about the situation to identify the predictors (independent variables) and to build a model. Data mining can improve the accuracy of predicting box-office receipts, which are critical to their financial success. With data mining, decisions are based on data-driven forecasting models and a classification model rather than on hunches and wild guesses. Importantly, predictive models are effective in early stages of movie production before huge investments have been made. Of course, minimizing investments in flops improves profitability.

  1. What are the top challenges for Hollywood managers? Can you think of other industry segments that face similar problems?

Hollywood managers have to allocate their scarce resources (budget, actors, facilities, directors, etc.) to get the highest returns on their investments. Movies are capital investments for Hollywood. They invest in movies for the same reason that other types of companies (manufacturers, retailers, service sector, entertainment, financial) make investments--to maximize return on investment (ROI). The top challenges facing all of these industry sectors are to identify which investments and which combination of investments will maximize ROI at a particular time; and which variables (predictors) to consider in evaluating alternatives.

  1. Do you think the researchers used all of the relevant data to build prediction models?

Students’ opinions will differ depending on their view of what is meant by “relevant.” Most students should recognize that all or almost all relevant data could be collected and analyzed given the capabilities of data mining tools, but that such a data collection effort could be too time-consuming and expensive given that all relevant data is not necessary to develop a reliable model. For students who are very literal, they might legitimately claim that it is not possible to know whether all relevant data had been used. It is important that students support their opinion by explaining the basis for it.

  1. Why do you think the researchers chose to convert a regression problem into a classification problem?

Students may have trouble with this question until after they have read the chapter. However, they read in the case that rather than forecasting the point estimate of box-office receipts, the researchers classified a movie based on its box-office receipts in one of nine categories. Those categories ranged from “flop” to “blockbuster,” because greater precision might not improve the outcome or might not be possible. The use of categories not only simplified the decision situation, but also might be as precise as feasible. Regression provides a point-estimate which would have less reliability than broader classification categories. Regression also requires much more data to achieve reliability than a classification problem. Blockbuster movies (and flops) have unique factors that may not be captured in a regression model, but their common characteristics may be sufficient to predict their general degree of success.

  1. How do you think these prediction models can be used? Can you think of a good production system for such models?

The prediction models can be used to select the combination of independent variables, e.g., MPAA rating, competition, actors (star value), genre, special effects, sequel, number of screens, and current tastes.

Decision makers can use such models to evaluate tradeoffs and to determine how much to invest in the production. Since each variable impacts the cost of movie production (movies have budgets), these prediction models can be used to determine which tradeoffs to make to maximize success.

  1. Do you think the decision makers would easily adapt to such an information system?

Again, students’ answers will differ since both answers (yes or no) are feasible. Some students may think the decision makers would not easily adapt because movie production is more of an art than a science. Other students may think that given the increased competition in the entertainment industry, decision makers need to adapt to a more scientific method.

  1. What can be done to further improve the prediction models explained in this case?

Prediction models can be improved by updating the models as new information (movies) becomes available. Narrower models can be built depending on the type of movie. For example, each type of movie (drama, sci-fi, animation, etc.) may have its own prediction model. As researchers develop a better understanding of what leads to “movie success,” more specialized and sophisticated models can be built.

  1. Introduction – Data Mining Concepts and Definitions, section 4.1
  1. Tom Davenport (author of Competing on Analytics) argued that the latest strategic weapon for companies is ANALYTICAL DECISION MAKING.

Companies like Amazon.com, Capital One, Marriott, Oakland A’s have used analytics to gain that competitive edge by understanding their customers.

Because of the improvement in technology and decreased cost, data base sizes have grown exponentially and the tools are available to analyze these data.

The term data mining is relatively new, but has historical roots in traditional statistical analyses from the 1980s. It was originally described as the process through which previously unknown patterns in data were discovered.

  1. Factors behind the sudden popularity in Data Mining
  1. reduction in cost of data storage and processing and increased hardware capacity have provided the ability to collect and accumulate data
  1. with increased database capacities and the availability of analysis tools, many companies recognized that they have untapped data and the tools to analyze it.

  1. consolidation in a data warehouse, data both at the customer level and from various sources, gives the ability to analyze from a more complete view
  1. Applications of Data Mining

Data mining is used to:

  1. identify successful therapies for illnesses and to discover new drugs
  2. reduce fraudulent behavior (insurance claims and credit card usage)
  3. identify customer buying patterns
  4. reclaim profitable customers
  5. aid in market-basket analysis
  6. better target customers/clients

Application Case 4.1 Business Analytics and Data Mining Help 1-800-Flowers Excel in Business, pages 136-137.

Problem: Needed to make decisions in real time to increase retention, reduce costs, and retain customers; and to respond to competition in e-commerce.

Solution: SAS data mining tools to discover novel patterns about its customers and turn that knowledge into business transactions.

Results:

1) More efficient marketing campaigns

2) Reduced mailings, increased response rates

3) Better customer experience

4) Increased repeat sales

  1. Definitions, Characteristics, and Benefits, page 137
  1. Data mining is used to describe knowledge discovery in databases.
  1. Data mining uses statistical, mathematical, and other techniques to extract and identify useful information and subsequent knowledge from large data bases.
  1. Data mining is also referred to as knowledge extraction, data archeology, data exploration, data dredging, and information harvesting.
  2. See the major characteristics and objectives of data mining on pages 138-139.
    a) Data are often buried deep in large databases which contain data from several years.
    b) The environment is usually client/server or web based.
    c) Sophisticated new tools are used.
    d) The miner is often an end-user with little or no programming skill, armed with data drills and power tools to ask ad-hoc queries and get answers quickly.
    e) Miners must be creative in interpreting the results when they find unexpected results.
    f) Data mining tools are combined with spreadsheets and other software tools so the mined data can be analyzed and deployed quickly and easily.
    g) Massive data and search efforts may require parallel processing when data mining.

Application Case 4.2 Police Department Fights Crime with Data Mining, page 141

What can we learn from this vignette?

Law-enforcement agencies around the globe use data mining to fight terrorism and crime. Data mining techniques improve crime fighting by quickly and easily finding patterns and trends in unsolved criminal cases.

Note: this has raised many ethical, legal, privacy, and political issues. Better intelligence coordination among police departments while observing respect for civil liberties is an important issue.

  1. How Data Mining Works, page 141
  1. Data mining finds patterns and defines those patterns in terms of mathematical rules. Those rules can then be used for prediction or association in an attempt to aid in decision making.
  2. Because the data mining is driven by experience and experimentation, depending on the problem situation and the analyst's knowledge, the whole process can be time consuming and iterative, i.e., one should expect to go back and forth through the steps quite a few times.
  1. Data mining algorithms fall into FOUR broad categories:

  • Associations find the commonly co-occurring groupings of things, such as beer and diapers going together in market-basket analysis.
    1. Link analysis
    2. Sequence analysis
  • Predictions tell the nature of future occurrences of certain events based on what has happened in the past, such as predicting the winner of the Super Bowl or forecasting the absolute temperature of a given day.
    1. Classification
    2. Regression
  • Clusters identify natural groupings of things based on their known characteristics, such as assigning customers in different segments based on their demographics and past purchase behaviors.
  • Sequential relationships discover time-ordered events, such as predicting that an existing banking customer who already has a checking account will open a savings account followed by an investment account within a year.
  1. Other data mining tools include:
  2. regression analysis
  3. visualization
  4. time-series forecasting
  1. Hypothesis-Driven Data Mining, page 145.
  2. Begins with a proposition by the user, who then seeks to validate the truthfulness of the proposition
  3. Example: A marketing manager may begin with the following proposition: “Are DVD player sales related to sales of television sets?”
  1. Discovery-Driven Data Mining
  2. Finds patterns, associations, and other relationships hidden within datasets.
  3. It can uncover facts that an organization had not previously known or even contemplated.
  4. Examples: Insurance fraud detection and market segmentation. Table 2 is an example of insurance fraud, and Table 3 is an example of market segmentation.

http://businessintelligence.com/article/64

Application Case 4.3 Motor Vehicle Accidents and Driver Distractions, page 144. This case illustrates how cluster analysis was combined with other data mining techniques to identify the causes of accidents. It is an example of hypothesis-driven data mining.

Problem: Needed a way to study the correlation between motor vehicle accidents and driver distractions.

Solution: Three data mining techniques were used on crash information from Fatality Analysis Reporting System (FARS):

  1. Kohonen networks detected clusters and revealed patterns of input variables.
  2. Decision trees explored and classified the effect of each incident on successive events and suggested the relationship between inattentive drivers and physical/mental conditions.
  3. A neural network model was trained and tested to observe the effectiveness of the model.

Data mining software tools from SPSS Inc., an IBM company, were used.

Benefits: Cluster analysis was combined with other data mining techniques to accurately verify the assumptions that the causes of certain accidents were due to driver distractions.

  1. Classification Analysis (page 143):
  1. Classification procedures are the most common of all data mining approaches.
  1. Classification involves identifying patterns of data as belonging to a certain category:
  1. Credit Approval – good or bad credit risk
  2. Store Location – good, moderate, bad
  3. Target Marketing – likely customer or no hope

(you receive a credit card application in the mail because the credit card company has targeted you as likely to accept their application for a credit card)

  1. Fraud Detection – yes/no
  2. Telecommunications – likely or not likely to switch to another phone company
  3. Route or segmentation decisions – prioritize crashes as high, moderate, or no severity
  4. Any ADS examples covered in previous chapters
  1. Basic idea:
  1. Define the data of interest. For each observation in a data set you have values on an outcome/class variable (Y) of interest that represents various groups (Y=0, 1, 2, 3, …), and predictors (X1, X2, …, Xp for p-predictors).
  1. Using that data, develop a model/mathematical equation for predicting the outcome (Y) based upon the predictors. (The best model is selected that results in the highest predictive accuracy.)
  1. Use the model derived in part b to predict outcomes (Ys) for observations whose outcomes (Ys) you don’t know. In other words, if you have an observation where the outcome of interest (Y) is unknown, you can predict the Y value based upon the known predictors.

Example: Suppose I obtain ALL data from past ISDS 2000 students (assuming students randomly assign themselves to the class)

(Y) Grade: A, B, C, D, F (coded as 1,2,3,4,5)

(X1) High School GPA

(X2) Current Overall College GPA

(X3) ACT score

(X4) # of Hours Completed before taking ISDS 2000

(X5) # of Hours Worked per Week

I would build a model on that data. Then when a student registers for ISDS 2000 and asks me, on the first day of class, what grade do you think I will get in this class? I can use their values for X1 through X5 to predict into which group that student belongs (A, B, C, D, or F)

  1. Based upon the characteristics of the data, various mathematical techniques are used to develop models for classification. These techniques fall into the following categories (page 145):
  1. Decision Trees – Outcome (Y) is categorical and predictors (Xs) are categorical or numeric

  1. Statistical analyses such as:

(1) Linear Discriminant Analysis (LDA):

Outcome (Y) is categorical and predictors (Xs) are numerical, each having normal distributions and equal variances.

(Sometimes referred to as a 0-1 Linear Regression)

(2) Logistic Regression Analysis (LOG):

Outcome (Y) is categorical and predictors (Xs) are categorical or numeric

(Same data conditions as Decision Tree, so you can choose to use either Decision Tree or Logistic Regression)

  1. Neural Networks
  2. Bayesian Classifiers
  3. Genetic Algorithms
  4. Rough Set Approach

Application Case 4.4 A Mine on Terrorist Funding, page 148, is an example of a homeland security initiative where data mining is used to track funding of terrorists’ activities.

Problem: Needed a better way to detect crimes such as customs fraud, income tax evasion, money laundering, and terrorist funding.

Solution: Used data mining techniques to analyze financial transactions and detect criminal activity. One example was the analysis of data on import and export transactions because illegal international trade practices have been used to fund terrorists.

Benefits: More efficient evaluation of financial transaction data aided in fighting terrorism and increased the quality of intelligence information.

III. Data Mining Project Processes (how to conduct a data mining analysis?)

  1. Introduction – Most initiatives in business must follow a series of steps that help to standardize and validate the process. In particular, data mining practitioners have proposed several different approaches for managing and standardizing this process.
  1. Some proposed models include:
  1. CRISP-DM – Cross-Industry Standard Process for Data Mining is one of the most popular nonproprietary standard methodologies for data mining. See Figure 4.5, page 149. The steps are:
  1. Business Understanding and Data Understanding: There must be much discussion to first understand the business environment and what business questions must be addressed in order to remain competitive. This goes hand-in-hand with determining what variables must be measured in order to quantify the process.

Example: If you want to predict whether or not alcohol is involved in a crash, managers/policy makers should communicate with emergency personnel in order to recognize what factors seem to be associated with alcohol involvement. (For example, suppose that after communicating, it seems that young adult males driving sports cars who crash into a tree at night on the weekend are associated with alcohol involvement; then you should collect data on age, gender, vehicle type, number of vehicles involved in the crash, day of week, and time of day.) At that point, there also needs to be discussion of what data are available.

  1. Data preparation – collect data, enter information into a data format to make available for use, edit, and save.

Example: A crash is reported in paper report form, then entered into a database, data from different public safety agencies are standardized, edited. (A manual is provided that educates public safety agencies on how data must be recorded in the report)

(Note: Steps a and b can take as much as 80 percent of the time allotted for the data mining process. Because the latter steps of CRISP-DM are built on the outcome of the former ones, the earlier steps need extra attention in order not to put the whole study on an incorrect path from the start.)

  1. Modeling – based upon the types of variables and the purpose of the analyses, a particular data mining procedure (classification, clustering, association, prediction) is selected for detecting patterns and relationships. That procedure is employed in order to find the best mathematical explanation of the patterns that exist.

(Note: Many times, when modeling, the researcher finds that data must be edited or additional variables should be included. This requires revisiting steps a and b.)

  1. Evaluate the model to determine its effectiveness.

In classification, for example, you want to make sure that your model (set of predictors) is the best model for being able to predict group membership. You further want to make sure your model (based upon your sample) is a “fairly good” representation of the relationships that exist in the population.

  1. Deploy the model. Once you determine the best model for describing the business process, you must use that model for making business decisions.

Example: Suppose you are a Macy’s marketing analyst targeting customers that have a Macy’s credit card. You use data mining to come up with a model to help you determine to which customers you will mail a Macy’s catalog. Your model indicates that those card-holding customers most likely to go to your store to make a purchase are Females, ages 25-35, with a college degree, own a home, have visited a Macy’s store in the last 3 months, and have a Macy’s credit card balance between $300 and $600. You would get your marketing department to send catalogs to those customers that fit the profile (and would continue to do that on a continual basis, say monthly basis).

Keep in mind – you don’t want to spend money on the catalog and postage for those customers you don’t have a real chance at getting into your store.
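The deployment step in the Macy's example amounts to filtering the customer file by the profile the model produced. The sketch below is only an illustration of that idea: the `Customer` fields and the three sample records are hypothetical and are not taken from any real customer file.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    # Hypothetical record layout; field names are illustrative only.
    name: str
    gender: str
    age: int
    college_degree: bool
    owns_home: bool
    months_since_visit: int
    card_balance: float


def fits_profile(c: Customer) -> bool:
    """Return True when the customer matches the profile from the example."""
    return (
        c.gender == "F"
        and 25 <= c.age <= 35
        and c.college_degree
        and c.owns_home
        and c.months_since_visit <= 3
        and 300 <= c.card_balance <= 600
    )


# Made-up customers used only to exercise the filter.
customers = [
    Customer("A", "F", 29, True, True, 1, 450.0),
    Customer("B", "M", 31, True, True, 2, 500.0),
    Customer("C", "F", 41, True, False, 6, 120.0),
]

# Only customers who fit the profile receive the catalog.
mailing_list = [c.name for c in customers if fits_profile(c)]
print(mailing_list)
```

Rerunning the same filter each month against the current customer file mirrors the "continual basis" mailing described in the example.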

  1. DMAIC – stands for Define, Measure, Analyze, Improve, and Control and is utilized in Six-Sigma based data mining processes. This process is ordinarily utilized in manufacturing, service delivery, management, and other business activities that rely on eliminating defects, waste, and quality control problems.
  1. SEMMA – stands for Sample, Explore, Modify, Model, and Assess and was developed by the SAS Institute.

Application Case 4.5 Data Mining in Cancer Research, pages 155-156

What can we learn from this vignette?

  1. Regression Analysis Example to Illustrate the Data Mining Process
  1. Example: A sample of 34 chain stores is randomly selected for a test-market study of OmniPower Bars. Suppose you want to build a regression equation with the goal of predicting Number of Bars Sold (Y).

  1. Obtain your data

Store   Sales (Y)   Price in cents (X1)   Promotion in $ (X2)
1       4141        59                    200
2       3842        59                    200
3       3056        59                    200
4       3519        59                    200
5       4226        59                    400
...     ...         ...                   ...
30      1882        99                    400
31      2159        99                    400
32      1602        99                    400
33      3354        99                    600
34      2927        99                    600

  1. Select the appropriate model based upon criteria:

The question is: Does a model with X1, X2, or both X1 and X2 perform best when predicting Y? Could it be that none of the predictors are good?

Question: Is X1 a good predictor of Y?

Or: Is there a relationship between X1 and Y?

Model is appropriate because points are linear and have equal spread around the regression line, so you can test to see if X1 is a good predictor of Y.

H0: β1 = 0 (X1 is not a good predictor of Y)

H1: β1 ≠ 0 (X1 is a good predictor of Y)

At α = 0.05 level of significance, p < α, so reject H0 and conclude there is sufficient evidence that X1 is a ‘good’ predictor of Y. Goodness of Fit = Adjusted R2 = 0.526 (scale ranging from 0 to 1)

Ŷ = 7512.348 – 56.714X1

b0 = 7512.348 means if X1 = 0 then the predicted number of bars sold is approximately 7512.

b1 = -56.714 means for every increase in price (by 1 penny), the predicted number of bars sold will decrease by approximately 57.

Question: Is X2 a good predictor of Y?

Or: Is there a relationship between X2 and Y?

Model is appropriate because points are linear and have equal spread around the regression line, so you can test to see if X2 is a good predictor of Y.

H0: β2 = 0 (X2 is not a good predictor of Y)

H1: β2 ≠ 0 (X2 is a good predictor of Y)

At α = 0.05 level of significance, p < α, so reject H0 and conclude there is sufficient evidence that X2 is a ‘good’ predictor of Y. Goodness of Fit = Adjusted R2 = 0.264 (scale ranging from 0 to 1)

Ŷ = 1496.016 + 4.128X2

b0 = 1496.016 means if X2 = 0 then the predicted number of bars sold is approximately 1496.

b1 = 4.128 means for every increase in amount spent on promotions (by $1), the predicted number of bars sold will increase by approximately 4.

Question: Is the predictor set (X1 and X2) a good predictor of Y?

Or: Is there a relationship between (X1 and X2) and Y?

At α = 0.05 level of significance, p < α, for both X1 and X2, so there is sufficient evidence that BOTH X1 and X2 together are ‘good’ predictors of Y. Goodness of Fit = Adjusted R2 = 0.742 (scale ranging from 0 – 1)

Ŷ = 5837.5208 - 53.2173X1 + 3.6131X2

So which model do you select? ANS: The model with the largest adjusted R2, making sure all predictors are significant.

Conclusion: The best mathematical model that describes the number of Omni bars sold is one that uses price and amount spent on promotions. (Patterns: As price increases, the numbers sold decrease; as amount spent on promotions increases, the numbers sold increase.)
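The selection rule stated above (largest adjusted R2, with every predictor significant) can be written down in a few lines. This is a minimal sketch, not output from any statistics package: the adjusted R2 values are the ones reported in these notes, and the dictionary layout and the `select_model` name are assumptions made for illustration.

```python
# Adjusted R^2 and significance results as reported in the notes above.
candidates = {
    "price only": {"adj_r2": 0.526, "all_significant": True},
    "promotion only": {"adj_r2": 0.264, "all_significant": True},
    "price + promotion": {"adj_r2": 0.742, "all_significant": True},
}


def select_model(models):
    """Pick the model with the largest adjusted R^2 among those whose
    predictors are all significant; return None if no model qualifies."""
    eligible = {name: m for name, m in models.items() if m["all_significant"]}
    if not eligible:
        return None
    return max(eligible, key=lambda name: eligible[name]["adj_r2"])


best = select_model(candidates)
print(best)
```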

  1. Evaluation:
  1. DEPLOYMENT: Use the mathematical model for

prediction:

Suppose you are opening a new store location. What are the predicted sales (number of bars sold) if you set your price at 79 cents and allot $400 for promotional expenditures?

Ŷ = 5837.5208 - 53.2173X1 + 3.6131X2

= 5837.5208 - 53.2173(79) + 3.6131(400)

= 3078.57 OmniPower Bars per month
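The same prediction can be checked with a short script. This sketch uses only the rounded coefficients printed in these notes; the function name `predict_sales` is an assumption for illustration. With these rounded coefficients the arithmetic comes to about 3078.59, so the small difference from the 3078.57 shown above presumably reflects unrounded coefficients in the original software output.

```python
# Rounded coefficients of the two-predictor model as printed in the notes.
B0 = 5837.5208        # intercept
B_PRICE = -53.2173    # change in bars sold per 1-cent increase in price
B_PROMO = 3.6131      # change in bars sold per $1 of promotion


def predict_sales(price_cents, promotion_dollars):
    """Predicted number of bars sold for a given price and promotion budget."""
    return B0 + B_PRICE * price_cents + B_PROMO * promotion_dollars


sales = predict_sales(79, 400)
print(round(sales, 2))
```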

  1. Examples in Classification Analysis
  1. Statistical Techniques – Linear Discriminant Analysis
  1. Data Conditions: Outcome (Y) is categorical and predictors (X) are numeric. This analysis is robust to violating the requirement that predictors must be numeric.

  1. Example: You want to predict injury severity for a vehicle crash on La. Highways. (n=10000 crashes)
  1. Outcome (Y): 1 = Injury exists, 0 = No Injury
  2. Predictors (Xs): Hour of Crash, Alcohol Involvement, Day of Week, Drugs Involved, Number of Vehicles, and Fatality
  3. Data Mining Software gives output for the subset of variables that has the best predictive accuracy

Classification Functions (Coefficients)

Variable                        Group 1 (Injury Exists)   Group 0 (No Injury)
Constant                        -15.67                    -13.17
Crash Hour (1-24) (X1)          0.51                      0.51
Alc Involved (1=Y, 0=N) (X2)    4.89                      3.65
Day of Week (1-7) (X3)          1.17                      1.16
Drugs (1=Y, 0=N) (X4)           1.72                      -0.03
#Vehicles (X5)                  8.02                      7.67
Fatality (1=Y, 0=No) (X6)       4.37                      3.93

Obs   Predicted Class   Actual Class   Prob. for 1 (success)   CR_HOUR   ALCOHOL   DAY_OF_WK   DRUGS   NUM_VEH   FATALITY
1     0                 0              0.1447                  12        0         4           0       2         0
2     1                 1              0.8361                  11        1         7           1       3         0

LCF1 = -15.67 + 0.51X1+4.89X2+1.17X3+1.72X4 +8.02X5+4.37X6

LCF0 = -13.17 + 0.51X1+3.65X2+1.16X3 - 0.03X4+7.67X5+3.93X6

For a crash, plug in the Xs and calculate LCF1 and LCF0.

Classification Rule:

If LCF1 > LCF0, then the crash is predicted to be from group 1 (Injury)

If LCF1 ≤ LCF0, then the crash is predicted to be from group 0 (None)
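The classification rule can be tried on the two observations shown in the line listing. This is a minimal sketch using the rounded coefficients from the classification functions in these notes; the dictionary keys and function names are hypothetical labels chosen for readability.

```python
# Rounded classification-function coefficients from the notes.
COEF_1 = {"const": -15.67, "hour": 0.51, "alcohol": 4.89, "day": 1.17,
          "drugs": 1.72, "vehicles": 8.02, "fatality": 4.37}
COEF_0 = {"const": -13.17, "hour": 0.51, "alcohol": 3.65, "day": 1.16,
          "drugs": -0.03, "vehicles": 7.67, "fatality": 3.93}


def lcf(coef, x):
    """Linear classification function: constant plus weighted predictors."""
    return coef["const"] + sum(coef[name] * value for name, value in x.items())


def classify(x):
    """Return 1 (injury) if LCF1 > LCF0, otherwise 0 (no injury)."""
    return 1 if lcf(COEF_1, x) > lcf(COEF_0, x) else 0


# The two observations from the line listing.
obs1 = {"hour": 12, "alcohol": 0, "day": 4, "drugs": 0, "vehicles": 2, "fatality": 0}
obs2 = {"hour": 11, "alcohol": 1, "day": 7, "drugs": 1, "vehicles": 3, "fatality": 0}

for label, obs in (("Obs 1", obs1), ("Obs 2", obs2)):
    print(label, round(lcf(COEF_1, obs), 2), round(lcf(COEF_0, obs), 2), classify(obs))
```

Both observations come out in the same class as the Predicted Class column of the line listing (0 for the first, 1 for the second).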

Output also gives line listing of data:

And measure of accuracy:

Classification Matrix

                 Predicted Class
Actual Class     1        0
1                1507     23
0                326      8144

Percent correct = (1507+8144)/10000 = 0.9651
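The percent-correct figure, along with the error rate within each actual class, can be computed directly from the four cell counts. This is a small sketch; the `(actual, predicted)` key layout and the `class_error` helper are assumptions made for illustration.

```python
# Cell counts keyed by (actual class, predicted class), from the matrix above.
matrix = {
    (1, 1): 1507, (1, 0): 23,
    (0, 1): 326,  (0, 0): 8144,
}

total = sum(matrix.values())
correct = matrix[(1, 1)] + matrix[(0, 0)]
accuracy = correct / total


def class_error(actual):
    """Share of crashes in the given actual class that were misclassified."""
    row_total = matrix[(actual, 1)] + matrix[(actual, 0)]
    return matrix[(actual, 1 - actual)] / row_total


print(total, accuracy)
print(round(class_error(1), 4), round(class_error(0), 4))
```

Looking at the error rate per class is a useful complement to overall accuracy, because the two groups here are very different in size.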

  1. Deployment: Suppose a crash is reported at 2am on Saturday morning where alcohol and drugs are involved with 2 vehicles but no fatalities.

X1=2, X2=1, X3=7, X4=1, X5=2, X6=0

LCF1 = -15.67 + 0.51(2) + 4.89(1) + 1.17(7) + 1.72(1) + 8.02(2) + 4.37(0) = 16.19

LCF0 = -13.17 + 0.51(2) + 3.65(1) + 1.16(7) - 0.03(1) + 7.67(2) + 3.93(0) = 14.93

Because LCF1 > LCF0, it is predicted that the crash DOES have an injury

  1. Decision Tree – uses geometric representations to arrive at IF-THEN-ELSE rules for classification.
  1. Suppose you wanted to build a decision tree to make a YES-NO decision on whether an applicant would be admitted into an MBA program, using GMAT scores and GPA.

Obs   GMAT   GPA    DECISION
1     650    2.75   NO
2     580    3.5    NO
3     600    3.5    YES
4     450    2.95   NO
5     700    3.25   YES
6     590    3.5    YES
7     400    3.85   NO
8     640    3.5    YES

Software will partition so that each region has points belonging only to one category or another.(Stars=NO)

So classification rule is: Classify as NO if GPA ≤ 2.95 or GMAT≤580
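The rule can be checked against the eight applicants in the table. This is a minimal sketch that hard-codes the rule as nested IF-THEN-ELSE tests; it does not build the tree from the data, and the function name `decide` is an assumption for illustration.

```python
# (GMAT, GPA, actual decision) for the eight applicants in the table.
applicants = [
    (650, 2.75, "NO"), (580, 3.5, "NO"), (600, 3.5, "YES"), (450, 2.95, "NO"),
    (700, 3.25, "YES"), (590, 3.5, "YES"), (400, 3.85, "NO"), (640, 3.5, "YES"),
]


def decide(gmat, gpa):
    """Apply the classification rule: NO if GPA <= 2.95 or GMAT <= 580."""
    if gpa <= 2.95:
        return "NO"
    if gmat <= 580:
        return "NO"
    return "YES"


# Collect any applicants for whom the rule disagrees with the recorded decision.
mismatches = [(gmat, gpa) for gmat, gpa, actual in applicants
              if decide(gmat, gpa) != actual]
print(mismatches)
```

An empty list means the rule reproduces every decision in the table, which is what "each region has points belonging only to one category" implies for this small data set.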

Classification Tree:

  1. Example in Book:
  1. Clustering
  1. Introduction: Clustering partitions a collection of things (i.e.: observations or ROWS, customers, students, etc.) into natural groupings or segments such that the members share similar characteristics but the groups themselves are highly different. (Note: this analysis is different from classification analyses in that the groups are unknown and created.)

(Example: Students may remember taking a career inventory survey and based on your response to many questions, you were told – or put into a cluster indicating - what occupation would suit you best)

Other examples:

  1. Harry Potter - the Sorting Hat determines to which House (Gryffindor, Hufflepuff, Ravenclaw and Slytherin) to assign first-year students at the Hogwarts School.
  1. Seating guests at a wedding or social events
  1. The goal of clustering is to create groups so that the members within a group have maximum similarity (a lot in common) and the members across groups have minimal similarity (differ).
  2. Common Applications include Market Segmentation – an analysis that aids in dividing customers into groups based upon data descriptions (variables) so that you can target those groups with different advertising campaigns. Examples:

  1. Suppose you are targeting your most loyal customers. You could design a market segmentation questionnaire for an airline asking for demographic information and behavior items such as frequency of flying, how tickets were purchased, who the customer traveled with, cities flown to, where the customer sat, airlines flown, money spent on airline tickets, etc.

(Your data may indicate, for example, that your most “frequent flyers” are males who fly alone in your first class section during the week, obtain tickets online, utilize rental cars, and stay at the Marriott Hotel. You may also find that your big money makers are “families with teenage children” traveling in your coach section during holidays to winter vacation destinations and staying at resort areas, or “honeymooners” traveling to big-ticket destinations from Saturday to Saturday with no children.)

  1. Gender, age, income, housing type, and education level are common demographic variables for clustering. For example, some brands are targeted only to women, others only to men. Music downloads tend to be targeted to the young, while hearing aids are targeted to the elderly. Education levels often define market segments. For instance, private elementary schools might define their target market as highly educated households containing women of childbearing age.

  1. Marriott International used data mining to analyze its customers and as a result created different hotel experiences:
  1. Marriott Suites...Permanent vacationers
  2. Fairfield Inn...Economy Lodging
  3. Residence Inn...Extended Stay
  4. Courtyard By Marriott...Business Travellers
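The clustering goal described in this section (members within a group are similar, groups differ from each other) can be illustrated with k-means, one common clustering technique. These notes do not say which algorithm any of the examples used, so this is only a sketch under that assumption; the six customers (flights per year, average ticket spend) and the starting centroids are made up for illustration.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Basic k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of its members, until nothing changes."""
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        new_centroids = []
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical customers: (flights per year, average ticket spend in dollars).
customers = [(2, 200), (3, 250), (4, 220), (30, 900), (28, 950), (35, 1000)]

# Starting centroids chosen by hand so the run is deterministic.
labels, centers = kmeans(customers, [customers[0], customers[3]])
print(labels)
print(centers)
```

In practice the variables would usually be standardized first so that one measure (here, dollars) does not dominate the distance calculation.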

VII. Association

  1. Introduction: Analyses aimed at associations that establish relationships among items (variables or COLUMNS) within a given record.

  1. The goal is to create groups of VARIABLES that are similar.

  1. Business Application:

Market Basket Analysis in the retail business refers to research that provides the retailer with information to help understand the purchase behavior of a buyer.

This information enables the retailer to understand the buyer's needs and modify the store's layout accordingly (e.g., product placement).

Examples:

  1. Popular example in academics: A supermarket discovers through market basket analysis that customers who bought diapers often bought beer. They then placed diapers close to beer coolers and their sales increased dramatically; the explanation being that fathers who are sent out to buy diapers often buy a beer as well, as a reward.

  1. People who buy cold medicine frequently will also buy tissue. A lot of customers will go to the store just for milk, so milk is placed in the back of the store.

  1. During Thanksgiving, you will see Walmart displays with Jiffy cornbread mix, canned sweet potatoes, brown sugar, pecans, pie shells, flour. (Use of cross promotional programs)

  1. Cross selling on the web: Amazon.com's use of "customers who bought book A also bought book B."

(From: Intro to Business Data Mining, Olson & Shi, 2007)

             Flowers   Softball   Glove   Peat   Fertilizer   Spade   Bat
Flowers      32        3          0       12     18           6       1
Softball     3         25         6       0      3            2       12
Glove        0         6          8       0      1            0       5
Peat         12        0          0       15     8            10      0
Fertilizer   18        3          1       8      21           15      2
Spade        6         2          0       10     15           16      1
Bat          1         12         5       0      2            1       14

Can you detect a customer profile by looking at buying behavior?
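One way to approach this question is to look, for each item, at the other item it most often appears with. The sketch below assumes that each diagonal entry counts the purchases containing that item and each off-diagonal entry counts the purchases containing both items; the notes do not state this explicitly, so treat that reading as an assumption. Under it, the ratio of an off-diagonal count to the diagonal count is the usual "confidence" measure from market-basket analysis.

```python
items = ["Flowers", "Softball", "Glove", "Peat", "Fertilizer", "Spade", "Bat"]

# Co-occurrence counts from the table (row item bought together with column item).
counts = [
    [32, 3, 0, 12, 18, 6, 1],
    [3, 25, 6, 0, 3, 2, 12],
    [0, 6, 8, 0, 1, 0, 5],
    [12, 0, 0, 15, 8, 10, 0],
    [18, 3, 1, 8, 21, 15, 2],
    [6, 2, 0, 10, 15, 16, 1],
    [1, 12, 5, 0, 2, 1, 14],
]


def strongest_partner(i):
    """Return (partner item, confidence) for the item in row i, where
    confidence = co-occurrence count / count of the row item itself."""
    row = counts[i]
    j = max((k for k in range(len(items)) if k != i), key=lambda k: row[k])
    return items[j], row[j] / row[i]


partners = {items[i]: strongest_partner(i) for i in range(len(items))}
for item, (partner, confidence) in partners.items():
    print(f"{item} -> {partner} (confidence {confidence:.2f})")
```

The pairs fall into two groups, gardening items (flowers, peat, fertilizer, spade) and softball items (softball, glove, bat), which suggests two distinct customer profiles.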

Application Case 4.6 Highmark, Inc., Employs Data Mining to Manage Insurance Costs, pages 163-164.

This case explains the data in managed care organizations and underscores the need for data mining.

Here is the link to Highmark, Inc. website

https://www.highmark.com/hmk2/index.shtml

Here are the links to current Highmark and SAS success stories.

1) Highmark maximizes Medicare revenues with SAS® Enterprise Miner

http://www.sas.com/success/highmarkmedicare.html

Challenge:

Find undiagnosed or misdiagnosed patients with illnesses that qualify for higher Medicare reimbursements.

Solution: Highmark relies on SAS Enterprise Miner to build decision trees that map likely outcomes a patient faces based on measures such as symptoms, health history and demographics.

Benefits:

The insurer estimates it saves millions by finding patients with one of 27 illnesses that qualify for higher Medicare reimbursements.

“SAS helps us do some very sophisticated work. If we didn’t have SAS we couldn’t come up with some of the answers we’ve gotten.” ~Brian Day, Director of Advanced Analytics

2) Highmark makes healthcare-fraud prevention top priority with SAS®.

http://www.sas.com/success/highmark_fraud.html

Challenge:

Prevent fraudulent healthcare insurance claims from getting paid.

Solution: SAS Enterprise Miner automates modeling to make it easier for investigators to spot questionable activity.

Benefits:

$11.5 million in savings in 2005; work that used to take eight hours now takes minutes; investigators can handle a 30-percent increase in caseloads.

“We use SAS to enable a finite number of people to handle more cases than they were able to handle before.” ~Shawn McNelis, Vice President of Healthcare Informatics

From the Highmark homepage: https://www.highmark.com/hmk2/index.shtml

Click on About Highmark, Click on We Fight Fraud, and Click on The Red Flags of Fraud to illustrate concrete factors that could indicate potential fraud.

Or go directly to Red Flags of Fraud https://www.highmark.com/hmk2/about/mission/redflags.shtml

  1. Why are companies such as Highmark using data mining applications?
  • To analyze patient information and find relationships between medical conditions (such as diabetes) and other parameters, in order to improve treatment outcomes.
  • To maximize revenue from Medicare.
  • To detect and prevent fraud.

  2. Why were managed care organizations initially hesitant to use data mining applications?

Because of the cost and complexity of data mining.

  3. What are the potential threats that could arise due to data mining applications?
  • Competitive espionage: theft of the results of the data mining application.
  • Improper use of its results (such as making decisions on the basis of factors that act as surrogates for race, sex, age, etc.).
  • Damage to the business if the application is used incorrectly by someone who does not understand what it does.

  4. What complexities arise when data mining is used in health care organizations?
  • These organizations already hold so much data that they do not want to add to the complexity by adding data mining applications.
  • They may be unable to decide why and how to analyze their data.

  5. Assume that you are an employer and that your managed care organization raises your rate based on the results of data mining and predictive modeling software. Would you accept the organization’s software predictions?
  • Insist on seeing the justification and analysis.
  • Ask for access to the data in order to perform your own analysis, which could conceivably lead to a request for a reduction.

Application Case 4.7 Coors Improves Beer Flavors with Neural Networks, pages 172-173. This case illustrates how the predictive capabilities of neural networks are used to analyze and improve beer flavor. Because of their ability to model highly complex real-world problems, researchers and practitioners have found many uses for neural networks.

What can we learn from this vignette?

In general, build awareness of how neural networks were used.

Application Case 4.8 Predicting Customer Churn – A Competition of Different Tools, pages 178-179.

What can we learn from this vignette?

In general, build awareness of how the data and existing data mining software tools are used in the vital area of customer retention.

End of Chapter Application Case: Data Mining Helps Develop Custom-Tailored Product Portfolios for Telecommunication Companies, pages 185-186. Answer questions 1-4 on page 186.

Summarize the answers

  1. Why do you think that consulting companies are more likely to use data mining tools and techniques? What specific value proposition do they offer?

Consulting companies use data mining tools and techniques because the results are valuable to their clients. Consulting companies can develop data mining expertise and invest in the hardware and software and then earn a return on those investments by selling those services. Data mining can lead to insights that provide a competitive advantage to their clients.

  1. Why was it important for argonauten360° to employ a comprehensive tool that has all modeling capabilities?

To offer a comprehensive set of intelligence services, the company needed a comprehensive tool. Otherwise, its analysts would have had to learn many different tools.

After 12 months of evaluating a wide range of data mining tools, the company chose Statistica Data Miner because it provided the ideal combination of features to satisfy almost every analyst’s needs and requirements with user-friendly interfaces.

  1. What was the problem that argonauten360° helped solve for a call-by-call provider?

The call-by-call business is very competitive, and a provider's success depends greatly on attractive per-minute calling rates. Rankings of those rates are widely published, so the key is to be ranked among the five lowest-cost providers while maintaining the best possible margins.

  1. Can you think of other problems for telecommunication companies that are likely to be solved with data mining?
  • Predicting customer churn (lost customers)
  • Predicting demand for capacity
  • Predicting the volume of calls for customer service based on time of day.
  • Predicting demographic shifts
