Decision Trees in Big Data Analytics

  • A decision tree is a supervised learning algorithm used for both classification and regression problems.
  • When it is used for classification it is also called a classification tree. Each internal (non-leaf) node of the tree is labeled with an input feature.
  • The arcs leaving a node labeled with a feature are labeled with each of the possible values of that feature. Each leaf of the tree is labeled with a class or with a probability distribution over the classes.
  • Decision trees are usually learned by top-down induction. This is a greedy algorithm: at each node it picks the split that looks best at that point and never revisits the choice. It is the most common strategy for learning decision trees from data.
  • The decision trees used in data mining are of two main types:
    • Classification tree: the predicted outcome is a nominal (discrete) class
    • e.g. deciding whether an email is spam or not spam

    • Regression tree: the predicted outcome is a real number
    • e.g. a worker's salary

  • Decision trees are simple, but they have a few problems. One of them is that a single tree tends to have high variance: small changes in the training data can produce a very different model. To mitigate this problem, decision tree ensemble methods were developed. Two categories of such methods are widely used:
    • Bagging decision trees: builds multiple decision trees by repeatedly resampling the training data with replacement, then lets the trees vote for a consensus prediction. Random forest is the best-known algorithm of this kind; it additionally considers only a random subset of the features at each split.
    • Boosting decision trees: combines weak learners, here shallow decision trees, into a single strong learner in an iterative fashion. It fits a weak tree to the data and then keeps fitting further weak learners, each one correcting the errors of the previous model.
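
The greedy top-down learning and the bagging idea described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a production implementation: the toy spam data set (two numeric features, number of links and number of exclamation marks), the use of Gini impurity as the split criterion, and all function names (`gini`, `best_split`, `build_tree`, `predict`, `bagged_trees`, `predict_bagged`) are made up for this example. It shows plain bagging only; a random forest would also restrict each split to a random subset of features.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    """Most frequent label in the list."""
    return Counter(labels).most_common(1)[0][0]

def best_split(rows, labels):
    """Greedy step: try every (feature, threshold) pair and keep the one with
    the lowest weighted Gini impurity. Returns None if no split improves
    on the impurity of the unsplit node."""
    best, best_score, n = None, gini(labels), len(rows)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """Top-down induction: split greedily, then recurse into each branch."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return majority(labels)  # leaf: labeled with a class
    f, t = split
    left = [i for i, r in enumerate(rows) if r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([rows[i] for i in left],
                           [labels[i] for i in left], depth + 1, max_depth),
        "right": build_tree([rows[i] for i in right],
                            [labels[i] for i in right], depth + 1, max_depth),
    }

def predict(tree, row):
    """Follow the arcs from the root until a leaf (a plain label) is reached."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree

def bagged_trees(rows, labels, n_trees=25, seed=0):
    """Bagging: fit each tree on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(rows)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(build_tree([rows[i] for i in idx],
                                [labels[i] for i in idx]))
    return trees

def predict_bagged(trees, row):
    """Consensus prediction: majority vote over all trees."""
    return majority([predict(t, row) for t in trees])

# Hypothetical toy data: [number of links, number of exclamation marks]
rows = [[0, 0], [1, 0], [0, 1], [1, 1], [5, 3], [6, 4], [7, 2], [8, 5]]
labels = ["ham", "ham", "ham", "ham", "spam", "spam", "spam", "spam"]

tree = build_tree(rows, labels)
forest = bagged_trees(rows, labels)
```

Calling `predict(tree, [0, 0])` or `predict_bagged(forest, [9, 4])` then classifies a new email. Averaging the votes of many trees fitted on different bootstrap samples is what reduces the variance of a single tree. A regression tree would follow the same structure but would typically minimize squared error instead of Gini impurity and predict the mean of the leaf.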