[SOLVED] Clustering(K-Means) data from database

Ruca · September 2012

Hi all,
Sorry if this problem was already solved, but I’m a newbie and I was not able to locate a similar one.
My problem is the following:
I have a table with the following columns: doc_id, term, weight. For each document there are several term occurrences, with a weight associated with each term. This means that each document is characterized by a set of (term, weight) pairs.
Example:
Doc_id  term      weight
Doc1    color     0,45
Doc1    height    0,22
Doc1    weight    0,05
Doc2    altitude  0,04
Doc2    weight    0,35
I intend to perform a clustering analysis using k-means, with a predefined number of clusters k, in order to check which documents are most similar to each other.
When I connect the "Read Database" operator to the clustering operator, an error message appears saying that clustering doesn’t accept polynominal attributes. It’s not my intention to change both the “doc_id” and “term” attributes to nominal ones. The result that I'm expecting should be something similar to:
Cluster_0 (Doc1, Doc32, Docx,...), Cluster_1 (Doc2, Doc45, Docy,...), etc.
Has anyone come across such a problem?
Thank you for your support.

Best regards,

MariusHelf · September 2012

Hi Ruca,

First of all, you have to De-Pivot your data with the operator of the same name to get a dataset that contains exactly one document per row, like this:


Doc_id color height weight altitude
Doc1    0,45   0,22   0,05        0
Doc2       0      0   0,35     0,04
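For readers who want to see the reshaping outside RapidMiner, here is a minimal sketch in plain Python. It is not what the RapidMiner operator does internally; the function name `pivot` and the choice to fill missing terms with 0 are assumptions made for illustration, and the decimal commas from the example are converted to dots before parsing.

```python
# Long-format rows as in the example table: (doc_id, term, weight).
# Weights use a decimal comma, so "," is swapped for "." before parsing.
rows = [
    ("Doc1", "color", "0,45"),
    ("Doc1", "height", "0,22"),
    ("Doc1", "weight", "0,05"),
    ("Doc2", "altitude", "0,04"),
    ("Doc2", "weight", "0,35"),
]

def pivot(rows):
    """Return (terms, table); table maps doc_id -> one weight per term.

    Hypothetical helper: terms a document does not contain are set to 0.0.
    """
    # Collect the distinct terms in order of first appearance.
    terms = []
    for _, term, _ in rows:
        if term not in terms:
            terms.append(term)
    # Build one vector per document, one position per term.
    table = {}
    for doc_id, term, weight in rows:
        vector = table.setdefault(doc_id, [0.0] * len(terms))
        vector[terms.index(term)] = float(weight.replace(",", "."))
    return terms, table

terms, table = pivot(rows)
print(["Doc_id"] + terms)
for doc_id, vector in table.items():
    print([doc_id] + vector)
```

Each document ends up as a single numeric row, which is the shape a k-means implementation expects.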

Then define Doc_id as Id with Set Role, and apply the clustering. That's it.
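As a rough illustration of why the id column must be kept out of the distance computation, here is a small k-means sketch in plain Python. It is not RapidMiner's implementation: the centroids are seeded with the first k documents (a simplification), and Doc3 and Doc4 are made-up rows added only so that two clusters can form. The document id is used purely as a label, which mirrors the role that Set Role assigns to it.

```python
import math

# Wide table: one row per document, keyed by Doc_id (the id, not a feature).
# Columns: color, height, weight, altitude.
# Doc1 and Doc2 come from the thread; Doc3 and Doc4 are hypothetical.
docs = {
    "Doc1": [0.45, 0.22, 0.05, 0.00],
    "Doc2": [0.00, 0.00, 0.35, 0.04],
    "Doc3": [0.40, 0.25, 0.00, 0.00],  # hypothetical
    "Doc4": [0.00, 0.05, 0.30, 0.10],  # hypothetical
}

def kmeans(docs, k, iterations=20):
    """Plain k-means; the first k documents seed the centroids."""
    ids = list(docs)
    centroids = [list(docs[i]) for i in ids[:k]]
    assignment = {}
    for _ in range(iterations):
        # Assignment step: each document goes to its nearest centroid.
        new_assignment = {
            i: min(range(k), key=lambda c: math.dist(docs[i], centroids[c]))
            for i in ids
        }
        if new_assignment == assignment:
            break  # converged: no document changed cluster
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [docs[i] for i in ids if assignment[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment

assignment = kmeans(docs, k=2)
clusters = {}
for doc_id, c in assignment.items():
    clusters.setdefault(f"Cluster_{c}", []).append(doc_id)
print(clusters)
```

The result groups document ids per cluster, in the same form as the output asked for in the original question.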

Best, Marius

Ruca · September 2012

Thank you Marius for your support. It worked like a charm.
I've used the PIVOT operator instead of the DE-PIVOT.
Regards,

MariusHelf · September 2012

Ruca wrote:
I've used the PIVOT operator instead of the DE-PIVOT.

Oh sorry, of course you have to use Pivot oO

Happy Mining!

~Marius


Altair RapidMiner Community
