Rapid Miner _ Many-to-many

dennis_enkelman Member Posts: 4 Newbie
Hi everyone, 
I am new here and have just started using RapidMiner.
I want to use it for my research in chemistry, but I am already having a few problems setting up my data correctly for RapidMiner, since my dataset does not really look like the examples I find out there.

To make things easier, let's just say I am searching for matching substance pairs:
-> I have around 60 substances.
-> Theoretically, every substance can match every other substance.
-> Practically, a few are known to match, others are known not to match, and some are a "maybe".
-> Every substance has many different properties (> 10), all of which I know (for example color, smell, molecular weight, ...).

When I now want to create a dataset, I have an "n:m" problem, which in a classical database would be solved with tables where the individual matches are linked by IDs.

Is there a way in RapidMiner to link my matches from two identical tables? Or should I think about a way to express the many-to-many relationship in one table? If so, any ideas how to do that? :D

Thanks for your help in advance! 

Best Answer


  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,453 RM Data Scientist

    It feels like you just want to use a join operator?

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • dennis_enkelman Member Posts: 4 Newbie
    I'm not sure if I am right, but I understood the join operator as "merging" two tables into one.

    What I want to do is get the information about couples into RapidMiner. The software should then learn to predict whether two different substances fit together or not.

    I would understand how to use "join" in a one-to-many relationship, but in my case I can't :/ Do you still think "join" is the right operator here? I am unsure whether I just don't understand it correctly or whether it does not work. Sorry for my inexperience ;)

    Attached you will find an example dataset of two data tables and one table combining the primary keys of the couples. Let's say there is a missing couple between 4 and 3. If RapidMiner learned the (hypothetical) connections between color and molar weight, it could predict the missing couple. (Don't think about the content here, I am just trying to keep it simple.)
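
For readers without the attachment, here is a minimal sketch of the layout described above: a substance table plus a link table holding only the primary keys of the couples, joined twice (once per side of the couple). It is plain Python rather than a RapidMiner process, and every value, column name, and the helper `join_couples` are invented for illustration.

```python
# Hypothetical substance table keyed by primary key (all values invented).
substances = {
    1: {"color": "red", "molar_weight": 18.0},
    2: {"color": "blue", "molar_weight": 44.0},
    3: {"color": "red", "molar_weight": 20.0},
    4: {"color": "blue", "molar_weight": 46.0},
}

# Link table: one entry per known couple, holding only the two keys.
couples = [(1, 2), (2, 3), (1, 4)]


def join_couples(couples, substances):
    """Join the link table to the substance table twice, once per side."""
    rows = []
    for id_a, id_b in couples:
        row = {"id_a": id_a, "id_b": id_b}
        # Copy each substance's properties, suffixed by the side it is on.
        for suffix, key in (("a", id_a), ("b", id_b)):
            for name, value in substances[key].items():
                row[f"{name}_{suffix}"] = value
        rows.append(row)
    return rows


pair_rows = join_couples(couples, substances)
for row in pair_rows:
    print(row)
```

Each resulting row describes one couple with the properties of both substances side by side, so a many-to-many link table can still be resolved with ordinary joins.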


  • dennis_enkelman Member Posts: 4 Newbie
    I mean, I could join every combination into one single row, but this would create huge amounts of data. With 60 substances I would have 3,600 rows, with 100 already 10,000. Later it might also be necessary to include groups of three substances that "fit", which would give up to 1,000,000 rows.
    In databases it's so easy to link such relationships via a many-to-many relationship. I was hoping there is something similar in RM.
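
The row counts above can be checked with a short calculation. The sketch below also shows the smaller count that results if order does not matter and a substance is never combined with itself; that symmetry is an assumption, not something stated in the thread. The helper name `row_counts` is made up for this example.

```python
from itertools import combinations
from math import comb


def row_counts(n, group_size):
    """Return (ordered, unordered) row counts for groups of `group_size` out of n.

    ordered:   every combination including repeats and both orders (n ** k),
               which is the figure used in the post above.
    unordered: distinct groups without repeats (n choose k), valid only
               under the assumption that "fitting" is symmetric.
    """
    return n ** group_size, comb(n, group_size)


for n, k in [(60, 2), (100, 2), (100, 3)]:
    ordered, unordered = row_counts(n, k)
    print(f"n={n}, group size={k}: ordered={ordered}, unordered={unordered}")

# Enumerating the unordered pairs for 60 substance IDs (1..60).
pairs = list(combinations(range(1, 61), 2))
print(len(pairs))
```

Under the symmetry assumption, 60 substances give 1,770 pairs instead of 3,600 rows, and 100 substances give 161,700 triples instead of 1,000,000.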

  • dennis_enkelman Member Posts: 4 Newbie
    Yeah, that's it. I can picture my problem as a star schema and was hoping to import it into RapidMiner as it is. Transferring it into this "one line" representation increases the amount of data immensely. But you are probably right, it might also be a good thing to keep it simple by increasing the number of rows and not thinking in database dimensions.

    I will try that! Thanks for your advice ;)
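
A rough sketch of what the "one line" representation could look like: one row per pair of substances, a label where the outcome is known, and no label for the pairs a model would have to score (such as the missing couple between 3 and 4). The data, the labels, the pairwise features `same_color` and `weight_diff`, and the helper `build_pair_table` are all assumptions for illustration, written in plain Python rather than as a RapidMiner process.

```python
from itertools import combinations

# Hypothetical substance table (all values invented).
substances = {
    1: {"color": "red", "molar_weight": 18.0},
    2: {"color": "blue", "molar_weight": 44.0},
    3: {"color": "red", "molar_weight": 20.0},
    4: {"color": "blue", "molar_weight": 46.0},
}

# Hypothetical known outcomes, keyed by (smaller id, larger id).
known = {(1, 2): "match", (2, 3): "no match", (1, 4): "match"}


def build_pair_table(substances, known):
    """Build one row per unordered pair, with a label where one is known."""
    table = []
    for id_a, id_b in combinations(sorted(substances), 2):
        a, b = substances[id_a], substances[id_b]
        table.append({
            "id_a": id_a,
            "id_b": id_b,
            # Assumed pairwise features; they do not depend on pair order.
            "same_color": a["color"] == b["color"],
            "weight_diff": abs(a["molar_weight"] - b["molar_weight"]),
            "label": known.get((id_a, id_b)),  # None if the outcome is unknown
        })
    return table


table = build_pair_table(substances, known)
training = [row for row in table if row["label"] is not None]
to_score = [row for row in table if row["label"] is None]
print(len(table), len(training), len(to_score))
```

The labeled rows would serve as training examples, and the unlabeled rows are the candidate pairs a trained model could then be applied to.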
