
K-means clustering algorithm

INFO703 - Big Data and Analytics - Lab

In this lab you will use Talend Open Studio to apply the K-means clustering algorithm to your dataset. Before starting, we recommend watching the following video:

https://www.youtube.com/watch?v=oEHe8VlXZc8&t=1234s

Talend Open Studio is already installed in the virtual machine you imported into Oracle VM VirtualBox. To start it, run these commands:

cd talend

cd tos_bd-20161216_1026-v6.3.1

./TOS_BD-linux-gtk-x86_64

Select the existing project ASSIGNMENT3. It contains two jobs, Clustering and Clusteredusers. The former calculates the distance from each record to each centroid; the latter selects the centroid with the minimum distance as the cluster head for that record.
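To make the division of work between the two jobs concrete, the following is a minimal Python sketch of the same two steps outside Talend. It is not the code of the Talend jobs themselves. The function names, the sample records, the sample centroids, and the choice of Euclidean distance are all assumptions made for illustration; check the Clustering job to see which distance measure it actually uses.

```python
import math

# Step 1 (the role of the Clustering job): compute the distance from one
# record to every centroid. Euclidean distance is an assumption here.
def distances_to_centroids(record, centroids):
    return [math.dist(record, c) for c in centroids]

# Step 2 (the role of the Clusteredusers job): pick the index of the
# centroid with the minimum distance.
def assign_cluster(distances):
    return min(range(len(distances)), key=distances.__getitem__)

# Hypothetical two-dimensional records and centroids, for illustration only.
records = [(1.0, 1.0), (9.0, 8.0), (2.0, 0.0)]
centroids = [(0.0, 0.0), (10.0, 10.0)]

all_distances = [distances_to_centroids(r, centroids) for r in records]
assignments = [assign_cluster(d) for d in all_distances]

for record, cluster in zip(records, assignments):
    print(record, "-> centroid", cluster)
```

With these sample values the first and third records are assigned to centroid 0 and the second to centroid 1. A full K-means run would then recompute each centroid as the mean of its assigned records and repeat both steps until the assignments stop changing.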

Big Data Sample Assignments 4

Apply the same approach to cluster your own dataset as required for Assignment - Part 3.
