
K-Means Clustering in Big Data Analytics

  • K-means clustering aims to partition n observations into k clusters so that each observation belongs to the cluster with the nearest mean, which serves as the prototype of that cluster. The result is a partition of the data space into Voronoi cells.
  • The set of observations is written as x1, x2, x3, …, xn.
  • Each observation is a d-dimensional real vector.
  • K-means clustering aims to partition the n observations into k groups G = {g1, g2, g3, …, gk}.
  • The grouping is chosen to minimize the within-cluster sum of squares, defined as follows:
  • [Image: k-means clustering objective function]
  • The formula gives the objective function that is minimized to find the best prototypes in k-means clustering.
  • The algorithm finds groups that differ from one another, while the members of each group are as similar as possible to the other members of that group.
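
The within-cluster sum of squares objective is commonly written as below. This is a sketch in LaTeX notation: the symbol \mu_i, the mean of the observations in group g_i, is an assumption introduced here and does not appear in the original text.

```latex
\underset{G}{\arg\min} \; \sum_{i=1}^{k} \sum_{x \in g_i} \lVert x - \mu_i \rVert^{2},
\qquad
\mu_i = \frac{1}{|g_i|} \sum_{x \in g_i} x
```

The inner sum measures how far the members of one group lie from that group's mean, and the outer sum adds this up over all k groups.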

The following code shows how to run the k-means clustering algorithm in R.

[Image: k-means clustering code in R]
  • To find a good value for k, plot the within-group sum of squares for various values of k.
  • This quantity normally decreases as more groups are added, so look for the point where the decrease in the within-group sum of squares begins to level off.
  • That point is taken as the best value of k; in the plot it is k = 6.
  • [Image: plot of within-group sum of squares against k]
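
The search for k described above can be sketched in R roughly as follows. This is a minimal sketch under stated assumptions, not the original listing: the built-in `mtcars` data set stands in for the unspecified data, `max_k` is a hypothetical upper bound on the values of k tried, and `nstart = 20` random restarts is an arbitrary choice.

```r
# Assumption: the built-in mtcars data set stands in for the article's data.
data <- scale(mtcars)   # centre each column to mean 0 and scale to unit variance

set.seed(42)            # k-means starts from random centres; fix the seed for repeatability
max_k <- 10             # hypothetical upper bound on the number of clusters to try

# Total within-group sum of squares for k = 1, 2, ..., max_k
wss <- sapply(1:max_k, function(k) {
  kmeans(data, centers = k, nstart = 20)$tot.withinss
})

# Plot the curve and look for the point where it starts to flatten
plot(1:max_k, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Within-group sum of squares")
```

Scaling the columns first keeps variables measured in large units from dominating the distance calculation.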

Now that the value of k has been chosen, the algorithm must be run with that value.

[Image: R code running k-means with the chosen value of k]
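
Running the algorithm with the chosen k might look like the sketch below. It is not the original listing: the built-in `mtcars` data set is an assumed stand-in for the data, `k <- 6` follows the value quoted in the text, and the names `fit`, `cluster_means` and `clustered` are illustrative.

```r
# Assumption: the built-in mtcars data set stands in for the article's data.
data <- scale(mtcars)   # centre each column to mean 0 and scale to unit variance

set.seed(42)            # fix the random starting centres for repeatability
k <- 6                  # value of k quoted in the text
fit <- kmeans(data, centers = k, nstart = 20)

# Mean of every scaled variable within each cluster
cluster_means <- aggregate(as.data.frame(data),
                           by = list(cluster = fit$cluster),
                           FUN = mean)

# Attach the cluster label to each observation
clustered <- data.frame(data, cluster = fit$cluster)

# Number of observations assigned to each cluster
print(table(clustered$cluster))
```

The `fit$cluster` vector holds one label per observation, so it can be joined back to the data for further analysis.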