K-means clustering and its usage…

HarvinderSingh
2 min read · Sep 7, 2022


In a world where artificial intelligence and technology are growing rapidly, data plays a crucial role. Most algorithms are built around data. Machine learning is a core part of AI: computers are trained on a set of data, known as a dataset, and after several training iterations this produces an ML model, which is then used to solve the problem it was built for.

What a machine learns depends on the data itself. There are three main types of learning: supervised learning, unsupervised learning, and reinforcement learning. In supervised learning, the data is labeled, so each example comes with the expected answer. In unsupervised learning, the data is unlabeled, and the algorithms are designed to find patterns in the dataset and to group or map the data accordingly. In reinforcement learning, an AI agent learns from the outcomes of its own actions.

K-means clustering is a clustering algorithm that falls under unsupervised learning. It looks for patterns in the data and partitions it into K groups, assigning each point to the cluster whose center (mean) is nearest.

K-means clustering. Source: https://www.altoros.com/blog/using-k-means-clustering-in-tensorflow/
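
To make the grouping idea concrete, here is a minimal sketch of the classic iterative procedure (often called Lloyd's algorithm), written in plain Python. The function name `kmeans`, its parameters, and the sample points are assumptions made for illustration. In practice you would more likely reach for a library implementation.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Illustrative k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random (one simple option).
    centers = [tuple(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                # An empty cluster keeps its previous center.
                new_centers.append(centers[j])
        if new_centers == centers:
            break  # centers stopped moving, so assignments are stable
        centers = new_centers
    return centers, labels

# Hypothetical sample data: two visibly separate groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centers, labels = kmeans(points, k=2)
print(centers)
print(labels)
```

On this toy data the first three points should end up in one cluster and the last three in the other. Which cluster is numbered 0 depends on the random starting centers.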

K-means clustering is used in many domains, from banking and recommendation engines to cyber security, document clustering, and image segmentation. It is typically applied to data that is numeric and continuous and has a relatively small number of dimensions.

K-means is one of the commonly used clustering algorithms in cyber security analytics, where it divides security-related data into groups of similar entities, which in turn can yield important insights into known and unknown attack patterns. This lets a security analyst focus the analysis on the data in specific clusters. To improve performance, k-means can exploit the triangle inequality to skip many point-to-center distance computations without affecting the clustering results. Such analytics can help detect and prevent malware and attacks, and so help protect systems and important information from being compromised.
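
The triangle-inequality shortcut can be sketched as follows. If a point's current best center c is at distance d, and another center c' is at least 2d away from c, then c' cannot be closer to the point, so that distance never needs to be computed. The function name `assign_with_pruning` and the sample data are assumptions for illustration. This is a simplified form of the idea, not a full accelerated k-means.

```python
import math

def assign_with_pruning(points, centers):
    """Nearest-center assignment that skips provably useless distance checks."""
    k = len(centers)
    # Center-to-center distances, computed once per assignment pass.
    cc = [[math.dist(a, b) for b in centers] for a in centers]
    labels, skipped = [], 0
    for p in points:
        best, best_d = 0, math.dist(p, centers[0])
        for j in range(1, k):
            # Triangle inequality: d(c_best, c_j) <= d(p, c_best) + d(p, c_j).
            # So if d(c_best, c_j) >= 2 * d(p, c_best), then
            # d(p, c_j) >= d(p, c_best) and c_j cannot be strictly closer.
            if cc[best][j] >= 2 * best_d:
                skipped += 1
                continue
            d = math.dist(p, centers[j])
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, skipped

# Hypothetical data: three well-separated centers and a few nearby points.
centers = [(0.0, 0.0), (10.0, 10.0), (20.0, 0.0)]
points = [(0.5, 0.5), (10.0, 9.0), (19.0, 1.0), (1.0, 0.0)]
labels, skipped = assign_with_pruning(points, centers)
print(labels, "distance computations skipped:", skipped)
```

The labels are the same as a brute-force nearest-center search would give. Only the number of distance computations changes, and the savings grow when the clusters are well separated.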
