Decision Trees in Big Data Analytics

A decision tree is an algorithm used for supervised learning problems such as classification and regression.
In a classification tree, each internal (non-leaf) node is labeled with an input feature.
The arcs leaving a node labeled with a feature are labeled with each of the possible values of that feature. Each leaf of the tree is labeled with a class name or a probability distribution over the classes.
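
The structure described above can be sketched as a nested dictionary. This is only an illustration: the feature names (contains_link, known_sender), the class labels and the classify helper are made up for the example and do not come from any particular library.

```python
# Internal nodes name a feature; arcs are keyed by that feature's values;
# leaves hold a class label. All names here are hypothetical.
tree = {
    "feature": "contains_link",
    "branches": {
        "yes": {
            "feature": "known_sender",
            "branches": {
                "yes": {"label": "not spam"},
                "no": {"label": "spam"},
            },
        },
        "no": {"label": "not spam"},
    },
}


def classify(node, example):
    """Follow the arc matching the example's feature value until a leaf is reached."""
    while "label" not in node:
        value = example[node["feature"]]
        node = node["branches"][value]
    return node["label"]


print(classify(tree, {"contains_link": "yes", "known_sender": "no"}))  # spam
```

A leaf could equally store a probability distribution over the classes instead of a single label; the traversal would stay the same.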
Top-down induction of decision trees is an example of a greedy algorithm, and it is the most common strategy for learning decision trees from data.
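
A minimal sketch of the greedy, top-down idea follows. It assumes categorical features and uses Gini impurity as the split criterion, which is one common choice rather than the only one; the function names and the toy data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 for a pure set, higher for mixed classes."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_feature(rows, labels, features):
    """Greedy step: pick the feature whose split has the lowest weighted impurity."""
    def weighted_impurity(feature):
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[feature], []).append(label)
        return sum(len(g) / len(labels) * gini(g) for g in groups.values())

    return min(features, key=weighted_impurity)


def build(rows, labels, features):
    """Top-down induction: split on the locally best feature, then recurse."""
    if len(set(labels)) == 1 or not features:
        # Pure node or nothing left to split on: emit a majority-class leaf.
        return {"label": Counter(labels).most_common(1)[0][0]}
    feature = best_feature(rows, labels, features)
    remaining = [f for f in features if f != feature]
    branches = {}
    for value in {row[feature] for row in rows}:
        subset = [(r, l) for r, l in zip(rows, labels) if r[feature] == value]
        sub_rows, sub_labels = zip(*subset)
        branches[value] = build(list(sub_rows), list(sub_labels), remaining)
    return {"feature": feature, "branches": branches}


# Toy data (made up): known_sender separates the classes perfectly.
rows = [
    {"contains_link": "yes", "known_sender": "no"},
    {"contains_link": "yes", "known_sender": "yes"},
    {"contains_link": "no", "known_sender": "no"},
    {"contains_link": "no", "known_sender": "yes"},
]
labels = ["spam", "not spam", "spam", "not spam"]
learned = build(rows, labels, ["contains_link", "known_sender"])
print(learned["feature"])  # known_sender
```

The algorithm is greedy because each split is chosen for its immediate impurity reduction and is never revisited, so the resulting tree is not guaranteed to be globally optimal.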
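The decision trees used in data mining are of two main types:

Classification tree: the predicted outcome is the class to which the data belongs, e.g. separating spam email from non-spam email

Regression tree: the predicted outcome is a real number, e.g. a worker's salary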
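
The two examples above differ in what a leaf predicts: a category (spam or not spam) or a number (a salary). As a rough sketch, assuming the usual conventions of a majority vote for categories and the mean for numbers, the leaf value could be computed as follows; the helper names and sample values are illustrative only.

```python
from collections import Counter
from statistics import mean


def classification_leaf(labels):
    """Predict the most frequent class among the training examples in the leaf."""
    return Counter(labels).most_common(1)[0][0]


def regression_leaf(targets):
    """Predict the average target value of the training examples in the leaf."""
    return mean(targets)


print(classification_leaf(["spam", "spam", "not spam"]))
print(regression_leaf([42000, 45000, 48000]))
```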

Decision trees are simple, but they have a few problems. One of them is that a single decision tree tends to produce a model with high variance. To mitigate this problem, decision tree ensemble methods were developed. Two categories of such methods are widely used:

Bagging decision trees: This method builds multiple decision trees by repeatedly resampling the training data with replacement and combining the trees' votes into a consensus prediction. Random forest is a well-known algorithm of this kind; it additionally considers only a random subset of features at each split.
Boosting decision trees: This method combines weak learners, here decision trees, into a single strong learner in an iterative manner. It fits a weak tree to the data and then keeps fitting further weak learners, each one correcting the errors of the previous model.
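
Both ensemble ideas can be sketched with a one-split regression tree (a "stump") as the weak learner. This is a simplified illustration under several assumptions: a single numeric input, squared error as the loss, and made-up data for years of experience versus salary in thousands. The function names and the learning_rate parameter are hypothetical and do not mirror any library's API.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression tree (a weak learner) on 1-D inputs."""
    best = None
    for threshold in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        if not right:
            continue  # this threshold does not split the data
        l_mean, r_mean = mean(left), mean(right)
        error = sum((y - l_mean) ** 2 for y in left)
        error += sum((y - r_mean) ** 2 for y in right)
        if best is None or error < best[0]:
            best = (error, threshold, l_mean, r_mean)
    if best is None:  # all inputs identical: predict the mean everywhere
        overall = mean(ys)
        return lambda x: overall
    _, threshold, l_mean, r_mean = best
    return lambda x: l_mean if x <= threshold else r_mean


def bagging(xs, ys, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample and average the predictions."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(xs)) for _ in xs]  # sample with replacement
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(s(x) for s in stumps)


def boosting(xs, ys, n_rounds=25, learning_rate=0.5):
    """Fit each new stump to the residual errors of the ensemble so far."""
    stumps = []

    def predict(x):
        return sum(learning_rate * s(x) for s in stumps)

    for _ in range(n_rounds):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict


# Made-up data: years of experience -> salary in thousands.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [30, 32, 35, 40, 48, 55, 60, 62]
bagged = bagging(xs, ys)
boosted = boosting(xs, ys)
print(bagged(4), boosted(4))
```

Bagging trains its trees independently and averages them, which is what reduces variance. Boosting trains its trees in sequence, each one on the errors left by the previous ones. For classification, the averaging step in bagging would be replaced by a majority vote.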