Problem with the result of clustering

jabra Member Posts: 20 Contributor I
edited November 2018 in Help

Hello, dear engineers,
I want to cluster a dataset that has five columns.
The clustering should use only the third column, which contains text, so I selected that column with the Select Attributes operator.
In the output, I want all of the original columns plus the cluster column.
What should I do?
Thank you very much for your help.

Answers

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi @jabra,

    Can you share your dataset and your process, please?

    Otherwise, can you give an example of what you want to obtain? I have difficulty understanding what you want to do.

    Regards,

    Lionel

  • jabra Member Posts: 20 Contributor I

    Hello, thanks for your response.
    I have no access to the data or to my RapidMiner file, so I cannot send them.
    But look: I have five columns named id, name, label, address, and description.
    I want to cluster the description, selecting that column by name.
    At the end of the clustering, the output should contain all the columns together with the cluster column, that is:
      id, name, label, address, description, and cluster
    That way I can tell from the output which sentence in a cluster has label x.
    Thank you very much for your help.

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi @jabra,

    I suggest this process (to be adapted and completed with your own data):

    <?xml version="1.0" encoding="UTF-8"?><process version="8.2.000">
    <operator activated="true" class="select_attributes" compatibility="8.2.000" expanded="true" height="82" name="Select Attributes" width="90" x="313" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="label"/>
    <parameter key="attributes" value=""/>
    <parameter key="use_except_expression" value="false"/>
    <parameter key="value_type" value="attribute_value"/>
    <parameter key="use_value_type_exception" value="false"/>
    <parameter key="except_value_type" value="time"/>
    <parameter key="block_type" value="attribute_block"/>
    <parameter key="use_block_type_exception" value="false"/>
    <parameter key="except_block_type" value="value_matrix_row_start"/>
    <parameter key="invert_selection" value="true"/>
    <parameter key="include_special_attributes" value="true"/>
    </operator>
    </process>

    Does this process meet your needs?

    Regards,

    Lionel

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi again @jabra,

    Here is a new version of the previous process (perhaps better suited to your needs):

    <?xml version="1.0" encoding="UTF-8"?><process version="8.2.000">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.2.000" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="retrieve" compatibility="8.2.000" expanded="true" height="68" name="Retrieve Iris" width="90" x="179" y="34">
    <parameter key="repository_entry" value="//Samples/data/Iris"/>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="8.2.000" expanded="true" height="82" name="Select Attributes" width="90" x="313" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="label"/>
    <parameter key="invert_selection" value="true"/>
    <parameter key="include_special_attributes" value="true"/>
    </operator>
    <operator activated="true" class="generate_id" compatibility="8.2.000" expanded="true" height="82" name="Generate ID" width="90" x="514" y="238"/>
    <operator activated="true" class="concurrency:k_means" compatibility="8.2.000" expanded="true" height="82" name="Clustering" width="90" x="447" y="34">
    <parameter key="k" value="3"/>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="8.2.000" expanded="true" height="82" name="Select Attributes (2)" width="90" x="581" y="85">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="cluster"/>
    <parameter key="include_special_attributes" value="true"/>
    </operator>
    <operator activated="true" class="generate_id" compatibility="8.2.000" expanded="true" height="82" name="Generate ID (2)" width="90" x="715" y="85"/>
    <operator activated="true" class="concurrency:join" compatibility="8.2.000" expanded="true" height="82" name="Join" width="90" x="849" y="85">
    <list key="key_attributes"/>
    </operator>
    <operator activated="true" class="generate_attributes" compatibility="8.2.000" expanded="true" height="82" name="Generate Attributes" width="90" x="983" y="85">
    <list key="function_descriptions">
    <parameter key="a1" value="concat(str([a1]),&quot;_&quot;,[cluster])"/>
    <parameter key="a2" value="concat(str([a2]),&quot;_&quot;,[cluster])"/>
    <parameter key="a3" value="concat(str([a3]),&quot;_&quot;,[cluster])"/>
    <parameter key="a4" value="concat(str([a4]),&quot;_&quot;,[cluster])"/>
    </list>
    </operator>
    <connect from_op="Retrieve Iris" from_port="output" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="example set output" to_op="Clustering" to_port="example set"/>
    <connect from_op="Select Attributes" from_port="original" to_op="Generate ID" to_port="example set input"/>
    <connect from_op="Generate ID" from_port="example set output" to_op="Join" to_port="left"/>
    <connect from_op="Clustering" from_port="cluster model" to_port="result 1"/>
    <connect from_op="Clustering" from_port="clustered set" to_op="Select Attributes (2)" to_port="example set input"/>
    <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Generate ID (2)" to_port="example set input"/>
    <connect from_op="Generate ID (2)" from_port="example set output" to_op="Join" to_port="right"/>
    <connect from_op="Join" from_port="join" to_op="Generate Attributes" to_port="example set input"/>
    <connect from_op="Generate Attributes" from_port="example set output" to_port="result 2"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    </process>
    </operator>
    </process>

    Regards,

     

    Lionel

  • marcin_blachnik Member Posts: 61 Guru

    Hi
    The easiest and fastest way is to assign a special role to every column except the one you want to cluster. Then you do not need any Select Attributes, Join, etc. This works because RapidMiner uses only the regular attributes for any analysis (including clustering, classification, regression, etc.).

    Just add a Set Role operator and type your own role name in "target role".
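
    As a rough sketch of that idea, a Set Role operator could be configured as below. The column names (id, name, label, address) are assumptions taken from the question, and the role names meta_name and meta_address are made up for illustration; each special role should be given to only one attribute. The description column is left as the only regular attribute, so it is the only one the clustering would see.

    ```xml
    <operator activated="true" class="set_role" compatibility="8.2.000" expanded="true" name="Set Role">
      <!-- Assumed column names; adapt them to your own data. -->
      <parameter key="attribute_name" value="id"/>
      <parameter key="target_role" value="id"/>
      <list key="set_additional_roles">
        <!-- key = attribute name, value = role; custom role names are illustrative. -->
        <parameter key="name" value="meta_name"/>
        <parameter key="label" value="label"/>
        <parameter key="address" value="meta_address"/>
      </list>
    </operator>
    ```

    The special attributes stay in the example set, so they should appear next to the cluster column in the clustered output.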

  • elena2020chao Member Posts: 13

    Hello, dear friends,
    I use the Process Documents from Data operator. In addition to the main columns, the labels, and the clustering, I want the output to contain columns for the tokenized words.
    How should I change the process?
    Thank you for helping me too.

  • jabra Member Posts: 20 Contributor I

    Hello
    Thank you very much for the process you sent.
    Just one more thing, dear engineer:
    What if I want to see the results of Tokenize in the output, as in our friend's question (@elena2020chao)?

    And how do I evaluate the outcome?
    See the error in the attached screenshot (m1.JPG).

    Thanks again if you can send the process.

  • jabra Member Posts: 20 Contributor I

    Hello
    Has anyone ever done this? Who can help me? I really need this.
    Thank you very much for your help.

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    @jabra You have nominal values in your data set that the Performance operator can't use.

    You need to convert everything to numerical values.
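
    One possible way to do that is a Nominal to Numerical operator, sketched below. The attribute name label is an assumption taken from the earlier posts. Dummy coding suits nominal columns with few distinct values; for a free-text column it would create one column per distinct text, so this is only a sketch for the non-text attributes.

    ```xml
    <operator activated="true" class="nominal_to_numerical" compatibility="8.2.000" expanded="true" name="Nominal to Numerical">
      <!-- Assumed attribute name; adapt it to your own data. -->
      <parameter key="attribute_filter_type" value="single"/>
      <parameter key="attribute" value="label"/>
      <parameter key="coding_type" value="dummy coding"/>
    </operator>
    ```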

  • jabra Member Posts: 20 Contributor I

    Hello
    Thank you, but I am clustering on the text field.
    What should I do?
    Thanks.
