
[SOLVED] Clustering(K-Means) data from database

Ruca Member Posts: 13 Contributor II
edited November 2018 in Help
Hi all,
Sorry if this problem has already been solved; I'm a newbie and was not able to find a similar one.
My problem is the following:
I have a table with the columns doc_id, term and weight. For each document there are several term occurrences, with a weight associated with each term. In other words, each document is described by a set of (term, weight) pairs.
Example:
Doc_id term weight
Doc1 color 0,45
Doc1 height 0,22
Doc1 weight 0,05
Doc2 altitude 0,04
Doc2 weight 0,35
I intend to run a k-means clustering analysis to see which documents are most similar to each other, given a predefined number k of clusters.
When I connect the "read database" operator to the "clustering" operator, an error message says that clustering doesn’t accept polynominal attributes. It’s not my intention to change both “doc_id” and “term” attributes to nominal ones. The result I'm expecting should be something similar to:
Cluster_0 (Doc1, Doc32, Docx,...), Cluster_1(Doc_2, Doc45, Docy,...), etc.
Has anyone come across such a problem?
Thank you for your support.

Best regards,

Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi Ruca,

    First of all you have to De-Pivot your data with the operator of the same name, to get a dataset which contains exactly one document per row, like this:

    Doc_id color height weight altitude
    Doc1    0,45  0,22  0,05        0
    Doc2      0      0  0,35    0,04
    Then define Doc_id as Id with Set Role, and apply the clustering. That's it :)

    Best, Marius
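
For readers who want to see the idea outside RapidMiner, here is a minimal Python sketch of the same workflow: reshape the (doc_id, term, weight) rows into one row per document, keep doc_id aside as an identifier, and cluster on the term weights only. It is not how RapidMiner implements its operators. The function names `pivot` and `kmeans`, the choice of k=2, and the fixed random seed are assumptions made for illustration, and the decimal commas from the example are written as decimal points.

```python
import random

# Long-format rows from the question: (doc_id, term, weight).
rows = [
    ("Doc1", "color", 0.45),
    ("Doc1", "height", 0.22),
    ("Doc1", "weight", 0.05),
    ("Doc2", "altitude", 0.04),
    ("Doc2", "weight", 0.35),
]


def pivot(rows):
    """Return (terms, table): one weight vector per doc_id, 0.0 for absent terms."""
    terms = sorted({term for _, term, _ in rows})
    position = {term: i for i, term in enumerate(terms)}
    table = {}
    for doc_id, term, weight in rows:
        table.setdefault(doc_id, [0.0] * len(terms))[position[term]] = weight
    return terms, table


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per input vector."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


terms, table = pivot(rows)
doc_ids = list(table)  # doc_id is only an identifier, never a feature
labels = kmeans([table[d] for d in doc_ids], k=2)

clusters = {}
for doc_id, label in zip(doc_ids, labels):
    clusters.setdefault(f"Cluster_{label}", []).append(doc_id)
print(clusters)
```

Because doc_id never enters the distance computation, the clustering sees only numeric columns, which is the reason the error about non-numeric attributes goes away once the table has one document per row.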
  • Ruca Member Posts: 13 Contributor II
    Thank you Marius for your support. It worked like a charm.
    I've used the PIVOT operator instead of the DE-PIVOT.
    Regards,
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Ruca wrote:
    I've used the PIVOT operator instead of the DE-PIVOT.
    Oh sorry, of course you have to use Pivot oO

    Happy Mining!

    ~Marius
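
The Pivot / De-Pivot mix-up is easy to make, so a small Python sketch of the two directions may help. Pivoting goes from long format (one row per document and term) to wide format (one row per document); de-pivoting goes back. The functions `pivot` and `de_pivot` below are illustrative stand-ins, not RapidMiner's operators, and dropping zero weights when de-pivoting is a choice made here so that the round trip reproduces the original rows.

```python
# Long format, as stored in the database table from the question.
long_rows = [
    ("Doc1", "color", 0.45),
    ("Doc1", "height", 0.22),
    ("Doc1", "weight", 0.05),
    ("Doc2", "altitude", 0.04),
    ("Doc2", "weight", 0.35),
]


def pivot(rows):
    """Long -> wide: one entry per doc_id, one column per term, 0.0 if absent."""
    terms = sorted({term for _, term, _ in rows})
    wide = {}
    for doc_id, term, weight in rows:
        wide.setdefault(doc_id, dict.fromkeys(terms, 0.0))[term] = weight
    return wide


def de_pivot(wide):
    """Wide -> long: one row per (doc_id, term) pair, skipping zero weights."""
    return [
        (doc_id, term, weight)
        for doc_id, columns in wide.items()
        for term, weight in columns.items()
        if weight != 0.0
    ]


wide = pivot(long_rows)          # the shape needed for clustering
round_trip = de_pivot(wide)      # back to the shape of the database table
print(wide["Doc2"])
```

The table in the question is already in long format, which is why the wide, one-document-per-row table needed for clustering comes from pivoting rather than de-pivoting.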