How to join multiple Excel sheets to combine them into one cluster (k-means)?

DDresen Member Posts: 10 Contributor I
edited July 2020 in Help
Hey there,

I'm trying to join some Excel sheets with the Join operator (two in this example, but the goal is to join a large number of Excel files) in order to cluster similar documents from different datasets. My problem is that the Join operator overwrites the datasets, which are identical in structure, so that the example set arriving at the cluster operator is empty. Attached you will find the process I'm using and the datasets.

How do I solve this? Thanks in advance!

Answers

  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,123  RM Data Scientist

    Are you sure you want to join and not append those two sets?

    ~Martin
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
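
The difference between joining and appending can be illustrated outside RapidMiner with a minimal Python sketch. The sheet contents, the `id` column, and the helper names `append` and `inner_join` are hypothetical and only stand in for the general idea; the assumption here is that the join behaves like an inner join on a key column, which may not match the settings used in the attached process.

```python
# Two hypothetical sheets with identical columns but different rows.
sheet_1 = [{"id": 1, "text": "first document"}, {"id": 2, "text": "second document"}]
sheet_2 = [{"id": 3, "text": "third document"}, {"id": 4, "text": "fourth document"}]


def append(left, right):
    """Stack the rows of two tables that share the same columns."""
    return left + right


def inner_join(left, right, key):
    """Combine rows side by side, keeping only keys present in both tables."""
    right_by_key = {row[key]: row for row in right}
    return [
        {**row, **right_by_key[row[key]]}
        for row in left
        if row[key] in right_by_key
    ]


appended = append(sheet_1, sheet_2)          # all four rows, one below the other
joined = inner_join(sheet_1, sheet_2, "id")  # no id occurs in both sheets
```

Under these assumptions the appended table keeps every row, while the join has nothing to match on and comes back empty, which would be consistent with the empty example set described in the question.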
  • DDresen Member Posts: 10 Contributor I
    Hi @mschmitz

    my bad, you're absolutely right! Now that I've changed the process there is another problem. Why are those empty? 
  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,123  RM Data Scientist
    What do you mean by "those"?

    ~Martin
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
  • DDresen Member Posts: 10 Contributor I
    I'm sorry, I'm just trying to make sense of my problem, which is hard when you don't really know what your problem is. But I think I have now worked out what it is. As you can see in my attached process, I'm trying to read multiple datasets which contain only text, separated by ','. After replacing missing values I use the Process Documents operator to tokenize, transform cases, etc., and, most importantly, to create TF-IDF word vectors from those tokens. After doing this for each dataset, I would like to append the resulting vector matrices and cluster them afterwards.
    This is where it breaks. It is apparently not possible to append example sets with different attribute names (the attribute names are now the generated text tokens; I will attach a picture for better understanding).


    So my question is: How do I append those matrices to cluster them afterwards? 
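
One possible way around this, sketched in plain Python rather than as a RapidMiner process: combine the raw texts first and compute the TF-IDF vectors once over the combined corpus, so that every row shares the same columns. In RapidMiner terms this would presumably mean appending the text data before the Process Documents step instead of after it, but that is an assumption and not something confirmed in this thread. The sample texts and the helper names `tokenize` and `tfidf_matrix` are hypothetical, and the weighting used (term frequency times `log(N / df)`) is just one common TF-IDF variant that may differ from what RapidMiner computes.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real tokenization)."""
    return text.lower().split()


def tfidf_matrix(documents):
    """Return (vocabulary, rows) with one TF-IDF row per document.

    All documents share one vocabulary, so every row has the same columns.
    """
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({tok for doc in tokenized for tok in doc})
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term occurs.
    doc_freq = Counter(tok for doc in tokenized for tok in set(doc))
    rows = []
    for doc in tokenized:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty documents
        rows.append(
            [(counts[t] / total) * math.log(n_docs / doc_freq[t]) for t in vocabulary]
        )
    return vocabulary, rows


# Hypothetical sample texts standing in for two Excel sheets.
sheet_a = ["pump pressure too low", "valve leaking oil"]
sheet_b = ["oil pressure warning", "motor overheating"]

# Vectorizing each sheet separately yields different columns per sheet.
vocab_a, _ = tfidf_matrix(sheet_a)
vocab_b, _ = tfidf_matrix(sheet_b)

# Combining the texts first yields one shared set of columns for all rows.
vocab_all, rows_all = tfidf_matrix(sheet_a + sheet_b)
```

Because `rows_all` has one row per document and the same columns throughout, it can be handed to a clustering step such as k-means as a single matrix, whereas the per-sheet matrices have mismatched columns and cannot simply be stacked.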