Questions on Automodel (AM)

mzn Member, University Professor Posts: 10  University Professor
edited June 27 in Help
Hello, I have a few questions on Automodel (AM):
1. How do "weights" (given under the "General" tab) differ from "feature sets"? For example, in one simulation, AM shows that a certain input has an importance of 1. However, when I view the "feature sets" for a few of the algorithms that AM selected for this analysis (say, 4 out of 7), those 4 algorithms do not select this particular input.

2. In the "Optimal trade-offs between complexity and error" graph, I can find a model with a complexity of 4 and an error of 15%. However, for this particular algorithm the accuracy was 72%. I am not sure how these two relate to each other.

3. Given the above, what would be the best way to identify the critical inputs in a dataset? Say that I am trying to identify critical inputs in one dataset using AM, and this is my thought process: find the critical inputs for GLM, LR, DL, DT, RF, GBT, etc., so that I can pinpoint the inputs that recur across algorithms. This is my way of identifying such parameters (i.e., if they show up in different algorithms, then they are of high importance to the dataset). Any tips on this are appreciated. Thanks!
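
The cross-algorithm comparison described in question 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the feature sets are typed in by hand from whatever each model reports, the input names (`spacing`, `height`, `load`, `width`) and the helper `recurrence` are hypothetical, and nothing here calls a RapidMiner API. A high share only says that many models selected an input; it does not make the input important for the data set independently of a model.

```python
from collections import Counter

# Hypothetical feature sets, one per model type. Replace these with the
# inputs each model actually reports under "feature sets".
feature_sets = {
    "GLM": {"spacing", "height", "load"},
    "LR": {"spacing", "load"},
    "DL": {"height", "load", "width"},
    "DT": {"spacing", "load"},
    "RF": {"load", "width"},
    "GBT": {"spacing", "load", "height"},
}


def recurrence(feature_sets):
    """Return (input, share of models selecting it), most frequent first."""
    counts = Counter(
        name for selected in feature_sets.values() for name in selected
    )
    n_models = len(feature_sets)
    # Sort by descending count, then by name so that ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(name, count / n_models) for name, count in ranked]


ranking = recurrence(feature_sets)
for name, share in ranking:
    print(f"{name}: selected by {share:.0%} of models")
```

Reading the ranking next to each model's own weights is probably more informative than the count alone, since two models can select the same input for very different reasons.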
Answers

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,538  Community Manager
    tagging @IngoRM
    ----------------------
    Don't forget to submit your great ideas for Wisdom 2020! Deadline is November 15.

    Wisdom 2020 – Call for Speakers Form 

  • mzn Member, University Professor Posts: 10  University Professor
    Perfect!
    I really liked this statement:
    "..but in my opinion there is no "critical inputs for the data set".  There is only "critical inputs for a specific model on a data set"."

    and I think this is what I was missing! Thanks again.

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,538  Community Manager
    Wow, thank you @IngoRM. Need to keep this post!
  • mzn Member, University Professor Posts: 10  University Professor
    @IngoRM one more question. How can I justify an analysis in which the correlation matrix shows a negative correlation for an input, when I know from experimental observations that this factor is more likely to be positively correlated with the observation? Thanks!
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,666  RM Founder
    @mzn Is this input column by any chance nominal?
    RapidMiner Wisdom 2020
    February 11th and 12th 2020 in Boston, MA, USA

  • mzn Member, University Professor Posts: 10  University Professor
    @IngoRM
    No, it is a numerical value (in this particular case, it is the spacing between two different components, say, columns in a building). Thank you.
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,666  RM Founder
    edited April 15
    Hmmm, since the math behind correlations is pretty simple and has been used thousands of times by as many users, I kind of doubt that there is a bug (still possible, of course).  Is there any chance that the result is correct and your prior knowledge is off?  You know, I have to ask :-)
    If you are really 100% sure that the results are off, can you possibly share the data and the process so we can have a look at this together (maybe a screen share)?
    Thanks,
    Ingo
  • mzn Member, University Professor Posts: 10  University Professor
    @IngoRM
    Sorry for the late reply. I am re-running the analysis on a different machine and will get back to you sometime tomorrow or Tuesday morning (I can definitely share the data + screen(s)). Thank you for your time.
  • mzn Member, University Professor Posts: 10  University Professor
    @IngoRM
    So, I have re-run the analysis on two different computers and found the following:
    1. My home PC yields good results, as you can see here, where the factor "s" has a + correlation (as expected):
    2. My office PC shows that the factor "s" has a - correlation (which is not quite true).

    This is what I have found: the database files are identical, except that one (the 2nd case) had two columns with rounded digits, so I guess this was the issue. Thank you!
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,666  RM Founder
    Yeah, that was probably the issue, since all values for column S are different (while everything else is the same).
    Cheers,
    Ingo
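
To illustrate how rounding alone could change the sign of a weak correlation, here is a minimal Python sketch. The numbers are made up for the example and are not the data from this thread; `pearson` is a hand-written helper, not a RapidMiner operator. With these toy values the coefficient is weakly positive at full precision and negative once the spacing column is rounded to whole numbers.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up values, chosen so that the effect is visible in four rows.
spacing = [1.4, 0.6, 2.4, 1.6]
observation = [3.0, 0.0, 1.0, 0.0]

# Rounding to whole numbers collapses the column to [1, 1, 2, 2].
spacing_rounded = [round(s) for s in spacing]

r_full = pearson(spacing, observation)
r_rounded = pearson(spacing_rounded, observation)
print(f"full precision: {r_full:+.2f}")
print(f"rounded:        {r_rounded:+.2f}")
```

A sign flip like this is only plausible when the underlying correlation is weak or the rounding is coarse relative to the spread of the column, so comparing the two files column by column is a reasonable first check.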