ID column in K-means clustering

molsen Member Posts: 6 Contributor II
edited November 2018 in Help

Hello,

I am clustering text with K-means, and the output goes to an Excel file.

All is working fine, but I can't seem to get the original ID column into the new spreadsheet.

Instead, a new column is created with ascending numbers.

 

This ID column (image: id column.JPG, original example set) should go into this column (image: excel.JPG).

 

This is my workflow:

workflow.JPG

Best Answer

  • molsen Member Posts: 6 Contributor II
    Solution Accepted

    I found a way to pass the ID column from the "Process Documents from Data" operator to the "K-means clustering" operator.

    It turned out that the only thing I had missed was a small checkmark called "Add meta information":

    add meta info.JPG

    After that I got the ID data all the way through to the Excel file at the end!
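
    In the process XML, that checkbox would show up as a parameter on the "Process Documents from Data" operator. The following is only a minimal sketch: the parameter key "add_meta_information" is assumed from the checkbox label and is not confirmed anywhere in this thread, and the Tokenize subprocess is copied from the sample process posted further down.

    <operator activated="true" class="text:process_document_from_data" compatibility="7.2.001" expanded="true" name="Process Documents from Data">
    <!-- Assumed key for the "Add meta information" checkbox; check it against your own exported process XML -->
    <parameter key="add_meta_information" value="true"/>
    <list key="specify_weights"/>
    <process expanded="true">
    <operator activated="true" class="text:tokenize" compatibility="7.2.001" expanded="true" name="Tokenize"/>
    <connect from_port="document" to_op="Tokenize" to_port="document"/>
    <connect from_op="Tokenize" from_port="document" to_port="document 1"/>
    </process>
    </operator>

    With the option enabled, special attributes such as the ID should be carried alongside the word vector, so the clustering operator and the Excel export further down can see them.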

Answers

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    Use a Set Role operator to set your ID column to the ID role. Then it should pass through to the clusters.
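
    As a rough illustration of that suggestion, a Set Role operator in the process XML might look like the sketch below. The attribute name "Id" is an assumption (use the name of your own ID column), and the parameter keys are given as they usually appear in RapidMiner 7.x exports rather than taken from this thread.

    <operator activated="true" class="set_role" compatibility="7.3.000" expanded="true" name="Set Role">
    <!-- "Id" is an assumed attribute name; replace it with your own ID column -->
    <parameter key="attribute_name" value="Id"/>
    <!-- Assigns the special role "id" so the attribute is not treated as a regular attribute -->
    <parameter key="target_role" value="id"/>
    <list key="set_additional_roles"/>
    </operator>

    The operator would sit between the data source and "Process Documents from Data", so the role is already assigned before the text is processed.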

  • molsen Member Posts: 6 Contributor II

    Thank you for the reply. Like this?

    set role.JPG

     

    It seems like the Id gets lost in Process Documents. Is there anything I have to do there?

    process docs.JPG

     

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    Double-check your process; I'm able to pass an attribute that's set with an ID role through Process Documents from Data.

    ID.png

     

    Which comes from this sample Search Twitter process.

     

    <?xml version="1.0" encoding="UTF-8"?>
    <process version="7.3.000">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.3.000" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="social_media:search_twitter" compatibility="7.3.000" expanded="true" height="68" name="Search Twitter" width="90" x="112" y="34">
    <parameter key="connection" value="Twitter Connection"/>
    <parameter key="query" value="rapidminer"/>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="7.3.000" expanded="true" height="82" name="Select Attributes" width="90" x="246" y="34">
    <parameter key="attribute_filter_type" value="subset"/>
    <parameter key="attributes" value="Id|Text"/>
    </operator>
    <operator activated="true" class="nominal_to_text" compatibility="7.3.000" expanded="true" height="82" name="Nominal to Text" width="90" x="380" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="Text"/>
    </operator>
    <operator activated="true" class="text:process_document_from_data" compatibility="7.2.001" expanded="true" height="82" name="Process Documents from Data" width="90" x="514" y="34">
    <list key="specify_weights"/>
    <process expanded="true">
    <operator activated="true" class="text:tokenize" compatibility="7.2.001" expanded="true" height="68" name="Tokenize" width="90" x="246" y="34"/>
    <connect from_port="document" to_op="Tokenize" to_port="document"/>
    <connect from_op="Tokenize" from_port="document" to_port="document 1"/>
    <portSpacing port="source_document" spacing="0"/>
    <portSpacing port="sink_document 1" spacing="0"/>
    <portSpacing port="sink_document 2" spacing="0"/>
    </process>
    </operator>
    <connect from_op="Search Twitter" from_port="output" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="example set output" to_op="Nominal to Text" to_port="example set input"/>
    <connect from_op="Nominal to Text" from_port="example set output" to_op="Process Documents from Data" to_port="example set"/>
    <connect from_op="Process Documents from Data" from_port="example set" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

     

  • molsen Member Posts: 6 Contributor II

    Damn, I'm not able to make it work, Mr. T-Bone!

    Can you share your workflow?

    Or do you know of a guide that shows how to pass the ID through?

    I have made mine based on this tutorial:

    k-means clustering tutorial
