Any Ideas?

Scotty Member Posts: 6 Contributor II
edited November 2018 in Help
Hi All,

I am trying to convert the following output from

Link cluster able adsl adsl_faceplate alarms
http://test1 cluster_2 .0 .0 .0 .0
http://test2 cluster_2 .0 .0 .0 .0
http://test3 cluster_0 .1 .0 .0 .0
http://test4 cluster_2 .0 .0 .0 .0
http://test5 cluster_1 .0 .1 .0 .0
http://test6 cluster_1 .0 .0 .0 .0
http://test7 cluster_0 .0 .0 .0 .0
http://test8 cluster_2 .0 .0 .0 .0
http://test9 cluster_1 .0 .0 .0 .0
http://test10 cluster_0 .1 .0 .0 .0

to

Link Cluster Word Score
http://test1 cluster_2 able .0
http://test2 cluster_2 able .0
http://test3 cluster_0 able .1
http://test4 cluster_2 able .0
http://test5 cluster_1 able .0
http://test6 cluster_1 able .0
http://test7 cluster_0 able .0
http://test8 cluster_2 able .0
http://test9 cluster_1 able .0
http://test10 cluster_0 able .1
http://test1 cluster_2 adsl .0
http://test2 cluster_2 adsl .0
http://test3 cluster_0 adsl .0
http://test4 cluster_2 adsl .0
http://test5 cluster_1 adsl .1
http://test6 cluster_1 adsl .0
http://test7 cluster_0 adsl .0
http://test8 cluster_2 adsl .0
http://test9 cluster_1 adsl .0

Any ideas how this could be done?
There are thousands of rows and columns.

Thanks
S
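
For readers working outside RapidMiner, the conversion asked about above (one row per link/word pair instead of one column per word) can be sketched in plain Python. This is only a minimal sketch under assumptions: the function name `depivot`, the `n_id_cols` parameter and the three sample rows are made up for illustration, scores are kept as the strings shown in the post, and it is not a RapidMiner process.

```python
def depivot(header, rows, n_id_cols=2):
    """Turn wide rows (id columns + one column per word) into
    long rows of the form (id columns..., word, score)."""
    words = header[n_id_cols:]
    long_rows = []
    # Outer loop over the word columns gives the ordering used in the
    # example above: all links for the first word, then the next word.
    for offset, word in enumerate(words):
        for row in rows:
            ids = row[:n_id_cols]
            score = row[n_id_cols + offset]
            long_rows.append((*ids, word, score))
    return long_rows


# Hypothetical sample: three of the rows from the post.
header = ["Link", "cluster", "able", "adsl", "adsl_faceplate", "alarms"]
rows = [
    ["http://test1", "cluster_2", ".0", ".0", ".0", ".0"],
    ["http://test3", "cluster_0", ".1", ".0", ".0", ".0"],
    ["http://test5", "cluster_1", ".0", ".1", ".0", ".0"],
]

long_rows = depivot(header, rows)
for long_row in long_rows[:6]:
    print(*long_row)
```

Swapping the two loops would give link-by-link order instead. Note that the long table has one row per link and word, so its size is the number of links times the number of word columns.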

Answers

  • StaryVena Member Posts: 126 Contributor II
    Hi,
    maybe if you describe the rules used for the conversion, it will be easier to help you, because I don't see any. Look at the operators for generating attributes (Generate Attributes, Generate Aggregation, ...).

    Cheers,
    Vaclav
  • Scotty Member Posts: 6 Contributor II
    Hi Vaclav,

    Sorry, I will explain a bit more.

    I use the k-means clustering operator to cluster text from a web crawl that has been pre-processed (split into tokens, stop words removed, etc.).

    The resulting cluster set consists of 3500 examples, each giving the URL, the assigned cluster and the 8500 attributes extracted from the text. It looks like this:


    Link            cluster    able  adsl  adsl_faceplate  alarms .......................(8500)...............z
    http://test1 cluster_2  .0  .0  .0  .0 .....................................0
    http://test2 cluster_2  .0  .0  .0  .0 .......................................0
    http://test3 cluster_0  .1  .0  .0  .0 ...................................0
    http://test4 cluster_2  .0  .0  .0  .0 ......................................0
    http://test5 cluster_1  .0  .1  .0  .0 ......................................0
    http://test6 cluster_1  .0  .0  .0  .0 ......................................0
    http://test7 cluster_0  .0  .0  .0  .0 ......................................0
    http://test8 cluster_2  .0  .0  .0  .0 ......................................0
    http://test9 cluster_1  .0  .0  .0  .0 ......................................0
    http://test10 cluster_0  .1  .0  .0  .0 ......................................0
    ....
    ....
    ....
    (3500)
    ...
    ...
    http://test3500 cluster_0  .1  .0  .0  .0 ......................................0

    I would like to get the data into the following format:

    Link            Cluster      Word  TF-IDF Score
    http://test1 cluster_2  able  .0
    http://test1 cluster_2  adsl  .0
    http://test1 cluster_2  adsl_faceplate  .0
    http://test1 cluster_2  alarms  .0
    http://test1 cluster_2  .......  .0
    http://test1 cluster_2  z  .0
    http://test2 cluster_2  able  .0
    http://test2 cluster_2  adsl  .0
    http://test2 cluster_2  adsl_faceplate  .0
    http://test2 cluster_2  alarms  .0
    http://test2 cluster_2  .......  .0
    http://test2 cluster_2  z  .0
    http://test3 cluster_0  able  .1
    http://test3 cluster_0  adsl  .0
    http://test3 cluster_0  adsl_faceplate  .0
    http://test3 cluster_0  alarms  .0
    http://test3 cluster_0  .......  .0
    http://test3 cluster_0  z  .0
    ....
    ....
    http://test3500 cluster_0  able  .1
    http://test3500 cluster_0  adsl  .0
    http://test3500 cluster_0  adsl_faceplate  .0
    http://test3500 cluster_0  alarms  .0
    http://test3500 cluster_0  .......  .0
    http://test3500 cluster_0  z  .0

    Does this make a bit more sense?

    Thanks
    Scott
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,

    you can use the operators "Pivot" and "De-Pivot" for tasks like this. You can find examples on myexperiment.org:

    http://www.myexperiment.org/search?filter=TYPE_ID%28%2262%22%29&query=pivoting

    Simply install the Community Extension for RapidMiner to access and directly download the processes uploaded there (search the forum for more information about the Community Extension).

    Cheers,
    Ingo
  • Scotty Member Posts: 6 Contributor II
    Hi Ingo,

    Thanks for the advice. Could you point me to the example that is closest to what I am trying to do? Although the examples are similar, I think the output I am after is quite different.

    I suspect de-pivot is somehow involved.

    Many Thanks

    Scott