"Clustering models"

Ozone Member Posts: 17 Maven
edited May 2019 in Help
Okay, so reading out the model file, converting it, and reading the tree structure back in is not a smart solution!

What I want to find out is:

If there are structures (e.g. rules of the form if...then...else...) in my datasets (which RapidMiner will hopefully detect for me),

how can I figure out whether different datasets have the same structures (in the sense of the rules, not only the data)?

What do you think about this option:

1. I learn a model (e.g. a decision tree) for every dataset and apply it to all other datasets.

Then I check the performance of every model on every dataset. If the performance is almost equal to the reference performance, the rules are similar (provided there is an unambiguous solution).
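
The cross-application idea above can be sketched outside RapidMiner in a few lines of Python. This is only a minimal illustration under loud assumptions: a one-level decision stump stands in for the decision tree, plain accuracy stands in for the performance measure, and the three tiny datasets `A`, `B` and `C` are invented for the example. None of the function names come from RapidMiner.

```python
from itertools import product

def fit_stump(rows):
    """Learn the best rule 'if x[f] <= t then left else right' (hypothetical helper)."""
    labels = sorted({y for _, y in rows})
    best = None
    for f in range(len(rows[0][0])):
        for t in sorted({x[f] for x, _ in rows}):
            for left, right in product(labels, repeat=2):
                correct = sum((left if x[f] <= t else right) == y for x, y in rows)
                if best is None or correct > best[0]:
                    best = (correct, f, t, left, right)
    return best[1:]  # (feature index, threshold, left label, right label)

def accuracy(model, rows):
    """Fraction of rows that the stump classifies correctly."""
    f, t, left, right = model
    return sum((left if x[f] <= t else right) == y for x, y in rows) / len(rows)

def cross_performance(datasets):
    """Train one model per dataset and score it on every dataset."""
    models = {name: fit_stump(rows) for name, rows in datasets.items()}
    return {(m, d): accuracy(models[m], datasets[d])
            for m in models for d in datasets}

# Invented data: A and B follow a rule on feature 0, C follows a rule on feature 1.
datasets = {
    "A": [((1, 5), "no"), ((2, 1), "no"), ((6, 4), "yes"), ((7, 2), "yes")],
    "B": [((0, 3), "no"), ((2, 9), "no"), ((5, 0), "yes"), ((9, 8), "yes")],
    "C": [((4, 1), "no"), ((8, 2), "no"), ((1, 7), "yes"), ((5, 9), "yes")],
}

scores = cross_performance(datasets)
for (model, data), acc in sorted(scores.items()):
    print(f"model from {model} applied to {data}: accuracy {acc:.2f}")
```

In this toy setup, the model learned on `A` keeps its reference accuracy on `B` but loses it on `C`, which is the signal the strategy relies on. With real data the scores would need a tolerance for "almost equal", and equal scores alone would not prove that the rules are the same.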


  • Fabian_Wewers Member Posts: 3 Contributor I
    Hi Ozone,

    Why did you start a new thread?
    Why is awchisholm's solution not practicable?
    I don't understand whether you only want to find duplicates of whole rules (even for this, the export to a file works) or want to compare each tree element with every other one.
    Another question: do you only have rules of depth one, or branched/combined rules as well?

    There are a few points you have to pay attention to with your new strategy:
    • Different models can reach the same performance on the same dataset, so equal performance does not prove equal rules, although it might be sufficient as a heuristic
    • The performance of a classification can be measured in different ways, so different measures might give different results for the models on the same datasets
    I hope my comment helps. Greetings

  • Ozone Member Posts: 17 Maven
    Hi Fabian,

    I had to copy and paste the text. I changed the subject, and now there is a new thread, sorry!

    1. I don't want to find duplicates of models; I'm only looking for similar models, in the sense of the same structures/patterns/rules in a classification task.

    2. I'll try awchisholm's solution, but I'm not familiar with XPath.

    3. Maybe another option is to count how often the same features appear in the first, second, third, ... node! For example: 27 datasets have attribute 11 as the first node and have the same attributes in the second layer... For this application it could be advantageous to build every tree of rules with the same operator.

    4. I understand your concerns about the performance strategy and I'll think about it. Thanks!
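
The counting idea from point 3 could look roughly like the following Python sketch. It assumes each learned tree has already been converted into a nested dictionary with the hypothetical keys `attribute` and `children` (leaves are plain class labels). That format and the three example trees are made up for illustration and are not something RapidMiner exports directly.

```python
from collections import Counter, defaultdict

def attributes_by_depth(tree, depth=0, out=None):
    """Collect the split attribute found at each depth of a nested-dict tree."""
    if out is None:
        out = defaultdict(list)
    if isinstance(tree, dict):  # internal node; leaves are plain labels
        out[depth].append(tree["attribute"])
        for child in tree["children"]:
            attributes_by_depth(child, depth + 1, out)
    return out

def depth_profile(trees):
    """For every depth, count in how many trees each attribute is used there."""
    profile = defaultdict(Counter)
    for tree in trees:
        for depth, attrs in attributes_by_depth(tree).items():
            profile[depth].update(set(attrs))  # one vote per tree and depth
    return profile

# Invented trees, one per dataset.
trees = [
    {"attribute": "att11", "children": [
        {"attribute": "att3", "children": ["yes", "no"]}, "no"]},
    {"attribute": "att11", "children": [
        {"attribute": "att3", "children": ["no", "yes"]},
        {"attribute": "att7", "children": ["yes", "no"]}]},
    {"attribute": "att2", "children": ["yes", "no"]},
]

profile = depth_profile(trees)
for depth in sorted(profile):
    print(f"depth {depth}: {dict(profile[depth])}")
```

Datasets whose trees agree on the attributes in the first few layers would then be candidates for "same structure". Note that the sketch ignores thresholds and branch order, so it is a coarse comparison only.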

  • awchisholm RapidMiner Certified Expert, Member Posts: 458 Unicorn
    It was tricky but I managed to make an example using XPath to parse a file containing PMML so you can see the attributes used in a decision tree.

    See it here - http://rapidminernotes.blogspot.com/2011/05/how-to-read-pmml-file-to-determine.html
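
For readers who have not used XPath, here is a rough Python sketch of the same idea: pull the attribute names out of the predicates of a PMML tree model. It is not the RapidMiner process from the linked post. The PMML fragment is a trimmed, hand-written example (a real export contains more elements and may use a different PMML version), and `predicate_fields` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hand-written, heavily trimmed PMML fragment used only for illustration.
PMML = """<PMML xmlns="http://www.dmg.org/PMML-4_1" version="4.1">
  <TreeModel functionName="classification">
    <Node score="no">
      <True/>
      <Node score="yes">
        <SimplePredicate field="att11" operator="greaterThan" value="0.5"/>
        <Node score="yes">
          <SimplePredicate field="att3" operator="lessOrEqual" value="2.0"/>
        </Node>
      </Node>
      <Node score="no">
        <SimplePredicate field="att11" operator="lessOrEqual" value="0.5"/>
      </Node>
    </Node>
  </TreeModel>
</PMML>"""

def predicate_fields(pmml_text):
    """Return the attributes used in SimplePredicate elements, in document order."""
    root = ET.fromstring(pmml_text)
    # The root tag looks like '{namespace}PMML' when a default namespace is set.
    if root.tag.startswith("{"):
        ns = {"p": root.tag[1:root.tag.index("}")]}
        predicates = root.findall(".//p:SimplePredicate", ns)
    else:
        predicates = root.findall(".//SimplePredicate")
    seen = []
    for pred in predicates:
        field = pred.get("field")
        if field not in seen:  # keep first occurrence only
            seen.append(field)
    return seen

used = predicate_fields(PMML)
print(used)
```

The path `.//p:SimplePredicate` selects every `SimplePredicate` element anywhere below the root; the `p` prefix is needed because the elements live in the PMML default namespace.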


  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi Andrew,

    that's cool stuff  ;D

  • Ozone Member Posts: 17 Maven
    That's really great, I will try it!

  • cristina_h Member Posts: 1 Contributor I

    Could you please tell me how I could calculate indicators such as MIA or CDI in RapidMiner? I want to evaluate the results I get using 4 different techniques in order to select the best one.
