How can I get a melt function in RapidMiner?

smmsamm Member Posts: 7 Contributor I
edited December 2018 in Help

I am a beginner at data mining.

I have a list of 10000 rows and about 200 columns, like this:

 

look,1,2,3,4,5,6,7,8

book,4,5,6,7,8,102,104,107

look,6,7,8,9

hook,100,101,102

cook,7,8,9

build,102,103,104,107

hook,103,104,105

...

 

First, I need to build a list of unique words, merging the values of rows that share the same word:

look,1,2,3,4,5,6,7,8,9

book,4,5,6,7,8,102,104,107

hook,100,101,102,103,104,105

cook,7,8,9

build,102,103,104,107

 

Next, I need to find lines that share at least 3 (or n) values and generate a new list that groups them:

 

look,1,2,3,4,5,6,7,8,9

book,4,5,6,7,8,102,104,107

cook,7,8,9

*************

book,4,5,6,7,8,102,104,107

build,102,103,104,107

*************

hook,100,101,102,103,104,105

build,102,103,104,107

*************

 

Please help me in any way you can.

Thank you

Answers

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn
    What is a melting function?
  • smmsamm Member Posts: 7 Contributor I

    I searched the internet and someone said Python's melt could help me, but I don't know how to do the same thing in RapidMiner!

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist

    Hi,

    from the pandas doc for melt:

    “Unpivots” a DataFrame from wide format to long format, optionally leaving identifier variables set.

    I guess it maps to something along the lines of De-Pivot.  
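
    To make the wide-to-long idea concrete, here is a minimal pandas sketch of what melt does. The column names (word, att2 to att4) and the three sample rows are assumptions chosen to resemble the question's data, not the poster's actual file.

```python
import pandas as pd

# Hypothetical wide table shaped like the question's data: one word per row,
# values spread across columns (column names are assumed, not from the thread).
wide = pd.DataFrame(
    {
        "word": ["look", "hook", "cook"],
        "att2": [1, 100, 7],
        "att3": [2, 101, 8],
        "att4": [3, 102, 9],
    }
)

# melt keeps "word" as the identifier and stacks the other columns into rows,
# producing one (word, source_column, value) row per original cell.
long_df = wide.melt(id_vars="word", var_name="source_column", value_name="value")
print(long_df)
```

    Each word now appears once per value instead of once per line, which is the same reshaping De-Pivot performs on an example set.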

    Best,

    Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn
    I guess I learned something new today!
  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    so that's a fun puzzle.  I would begin like this (you will need @land's Statistics Extension to run this process):

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="retrieve" compatibility="7.6.001" expanded="true" height="68" name="Retrieve smmsamm" width="90" x="45" y="85">
    <parameter key="repository_entry" value="smmsamm"/>
    </operator>
    <operator activated="true" class="de_pivot" compatibility="7.6.001" expanded="true" height="82" name="De-Pivot" width="90" x="179" y="85">
    <list key="attribute_name">
    <parameter key="foo" value="att[2-9]"/>
    </list>
    <parameter key="index_attribute" value="bar"/>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="7.6.001" expanded="true" height="82" name="Select Attributes" width="90" x="313" y="85">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="bar"/>
    <parameter key="invert_selection" value="true"/>
    </operator>
    <operator activated="true" class="numerical_to_polynominal" compatibility="7.6.001" expanded="true" height="82" name="Numerical to Polynominal" width="90" x="447" y="85">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="foo"/>
    </operator>
    <operator activated="true" class="rmx_stat:cross_table" compatibility="1.3.000" expanded="true" height="82" name="Extract Cross Table" width="90" x="581" y="85">
    <parameter key="group_attribute_a" value="att1"/>
    <parameter key="group_attribute_b" value="foo"/>
    </operator>
    <connect from_op="Retrieve smmsamm" from_port="output" to_op="De-Pivot" to_port="example set input"/>
    <connect from_op="De-Pivot" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="example set output" to_op="Numerical to Polynominal" to_port="example set input"/>
    <connect from_op="Numerical to Polynominal" from_port="example set output" to_op="Extract Cross Table" to_port="example set input"/>
    <connect from_op="Extract Cross Table" from_port="cross table output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

    That said I am certain there is a cleverer way to do this!
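
    For readers more familiar with pandas, the steps of the process above correspond roughly to the sketch below. It is only an approximation under stated assumptions: the attribute names att1 to att5 and the sample rows are made up to mirror the question, and dropping the missing cells with dropna is a choice made here, not something read off the RapidMiner operators.

```python
import pandas as pd

# Sample rows from the question, padded with None where a row is shorter.
# The attribute names att1..att5 are assumed to mirror the process above.
wide = pd.DataFrame(
    [
        ["look", 6, 7, 8, 9],
        ["hook", 100, 101, 102, None],
        ["cook", 7, 8, 9, None],
    ],
    columns=["att1", "att2", "att3", "att4", "att5"],
)

# De-Pivot + Select Attributes: stack the value columns into "foo" and drop
# the column that only records which attribute each value came from.
long_df = wide.melt(id_vars="att1", value_name="foo").drop(columns="variable")
long_df = long_df.dropna(subset=["foo"])  # assumption: ignore padded cells

# Numerical to Polynominal: treat the values as categories, not numbers.
long_df["foo"] = long_df["foo"].astype(int).astype(str)

# Extract Cross Table: one row per word, one column per distinct value.
cross = pd.crosstab(long_df["att1"], long_df["foo"])
print(cross)
```

    The resulting table has one column per distinct value, so it grows wide quickly when the data contains many different values.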


    Scott

     

  • smmsamm Member Posts: 7 Contributor I

    I updated RapidMiner and installed the Statistics Extension:

    !error0.jpg

    but I get an error:

    !error1.jpg
    and I cannot find the missing extension:

    !error2.jpg

    Would you please help again?

    Thank you

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    hmm I'm not sure the extension in the marketplace is up-to-date (Sebastian?).  I would go directly to the website: https://oldworldcomputing.com/products/statistics-extension-for-rapidminer

     

    Scott

  • smmsamm Member Posts: 7 Contributor I

    This is my CSV file.
    Would you please test with it?

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    so the process I posted was not intended to be a finished product - just something to get you in the right direction.  :)  If you take that csv file and put it in my process, you get the attached result.

     

    Scott

  • smmsamm Member Posts: 7 Contributor I

    Oh, thank you sir, you are the master!
    But these were only sample data for testing.
    My real data has about 100000 different values. With this method, will I end up with about 100000 columns?
    Is it possible to convert the result into the list I want?

     

    look,1,2,3,4,5,6,7,8,9

    book,4,5,6,7,8,102,104,107

    cook,7,8,9

    *************

    book,4,5,6,7,8,102,104,107

    build,102,103,104,107

    *************

    hook,100,101,102,103,104,105

    build,102,103,104,107

    *************
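
    One way to reach this grouped list without building a table that has one column per value is sketched below in plain Python. It is an illustration outside RapidMiner, not a method proposed in this thread; the function names merge_rows and pairs_sharing are hypothetical, and the rows are the sample lines from the question.

```python
from collections import defaultdict
from itertools import combinations

# Sample lines from the question as (word, values) pairs.
rows = [
    ("look", [1, 2, 3, 4, 5, 6, 7, 8]),
    ("book", [4, 5, 6, 7, 8, 102, 104, 107]),
    ("look", [6, 7, 8, 9]),
    ("hook", [100, 101, 102]),
    ("cook", [7, 8, 9]),
    ("build", [102, 103, 104, 107]),
    ("hook", [103, 104, 105]),
]


def merge_rows(rows):
    """Step 1: collapse repeated words into one set of values per word."""
    merged = defaultdict(set)
    for word, values in rows:
        merged[word].update(values)
    return dict(merged)


def pairs_sharing(merged, n=3):
    """Step 2: return word pairs that share at least n values."""
    # Inverted index: value -> words containing it (the "long" format).
    words_by_value = defaultdict(set)
    for word, values in merged.items():
        for value in values:
            words_by_value[value].add(word)
    # Count, for every pair of words, how many values they have in common.
    shared = defaultdict(int)
    for words in words_by_value.values():
        for a, b in combinations(sorted(words), 2):
            shared[(a, b)] += 1
    return {pair: count for pair, count in shared.items() if count >= n}


merged = merge_rows(rows)
result = pairs_sharing(merged, n=3)
for (a, b), count in sorted(result.items()):
    print(a, b, count)
```

    Memory use here scales with the number of word pairs that actually share a value rather than with the number of distinct values, although a value that occurs in very many words still produces many pairs.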

  • smmsamm Member Posts: 7 Contributor I

     

    !error0.jpg

    I mean, can these columns be converted to rows that contain the header values?

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    Your flattery is noted and not deserved.  There are many here who are far more masterful than I.  That said, I think at this point I would recommend getting more knowledgeable with RapidMiner Studio before moving forward with large data sets like the one you describe - actions such as renaming attributes and so forth are the beginning of a long journey.  I would highly recommend starting with the "Getting Started with RapidMiner" YouTube playlist.  The whole beauty of RapidMiner is that you can learn to create your own processes and be a master yourself!

     

    Scott

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn

    Hi all,

    I just published the most recent version of our extensions on the marketplace. So if that was the problem, it should be gone now. At least I can use it with the most recent version of RM.

     

    Greetings,

     Sebastian
