[SOLVED] Using k-means clustering on web log data


I have a data set from a web access log file, and I'm interested in finding clusters of similar items in it. (I'm an absolute beginner at data mining.) So far I have read many research papers in the same problem domain, including:

An Efficient Approach for Clustering Web Access Patterns from Web Logs

Classifying the user intent of web queries using k-means clustering

I want to use k-means clustering to cluster web pages. Although these papers discuss the algorithm, they do not specify how the input data set should be prepared. k-means calculates the similarity between data points using Euclidean distance, so how should I convert (normalize) my data set so that it can be mined with k-means, given that URLs cannot be used directly? Any help or a good reference on this?

Example dataset (p1..pn are different web pages)



Re: Using k-means clustering on web log data

Hi Star,

I'm not an expert, but the way I would approach the problem is to create a table with p1...pn as columns and individual users as rows.
The values filling the table would be the number of times each user has visited each page.

UserID  p1  p2  p3  ..
User1   1   1   1   1
User2   1   1   0   0
User3   1   0   0   0
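
In case it helps, here is a rough Python sketch of how such a table could be built from parsed log entries. It assumes each entry has already been reduced to a (user, page) pair. The function name `build_visit_matrix`, the sample entries, and the page name `p4` (standing in for the column shown as ".." above) are made up for illustration.

```python
from collections import Counter

def build_visit_matrix(log_entries):
    """Turn (user, page) pairs into a user-by-page visit-count matrix."""
    counts = Counter(log_entries)  # counts[(user, page)] = number of visits
    users = sorted({user for user, _ in log_entries})
    pages = sorted({page for _, page in log_entries})
    # A Counter returns 0 for pairs that never occur, so unvisited pages get 0.
    matrix = [[counts[(user, page)] for page in pages] for user in users]
    return users, pages, matrix

# Hypothetical log entries, chosen to reproduce the table above.
log_entries = [
    ("User1", "p1"), ("User1", "p2"), ("User1", "p3"), ("User1", "p4"),
    ("User2", "p1"), ("User2", "p2"),
    ("User3", "p1"),
]

users, pages, matrix = build_visit_matrix(log_entries)
for user, row in zip(users, matrix):
    print(user, row)
```

Each row is then a numeric vector, which is the kind of input a Euclidean-distance method like k-means can work with.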

Just an idea. :)

Re: Using k-means clustering on web log data

Hi ighyboo,

Thanks for the reply, this is exactly what I ended up doing.
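
For anyone who finds this thread later, here is a minimal plain-Python sketch of k-means run on a user-by-page count table like the one suggested above. It is only an illustration under some assumptions: the sample matrix, the choice of k = 2, and the use of the first k rows as starting centroids (to keep the result reproducible) are all made up for the example, and a real analysis would more likely use a library implementation.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(rows, k, iterations=20):
    """Basic k-means (Lloyd's algorithm) on a list of numeric rows."""
    # Assumption: start from the first k rows instead of random points.
    centroids = [[float(v) for v in row] for row in rows[:k]]
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assignment step: each row goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: euclidean(row, centroids[c]))
            for row in rows
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [row for row, label in zip(rows, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical visit counts: rows are users, columns are pages.
matrix = [
    [1, 1, 1, 1],  # User1
    [1, 1, 0, 0],  # User2
    [1, 0, 0, 0],  # User3
]

labels, centroids = kmeans(matrix, k=2)
print(labels)     # cluster index per user
print(centroids)  # mean visit vector per cluster
```

If some users are far more active than others, it may be worth scaling each row (for example to visit proportions) before clustering, so that the distance reflects which pages were visited rather than how many visits there were in total.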