Questions on Automodel (AM)

mzn Member, University Professor Posts: 10 University Professor
edited June 2019 in Help
Hello, I have a few questions on Automodel (AM):
1. How do "weights" (shown under the "General" tab) differ from "feature sets"? For example, in one run AM shows that a certain input has an importance of 1. However, when I view the "feature sets" of several of the algorithms AM selected for this analysis (say 4 out of 7), those 4 algorithms do not select this particular input.

2. In the "Optimal trade-offs between complexity and error" graph, I can find a model with a complexity of 4 and an error of 15%. However, the accuracy reported for this particular algorithm was 72%. I am not sure how these two numbers relate to each other.

3. Given the above, what would be the best way to identify the critical inputs in a dataset? Say I am trying to identify critical inputs in one dataset using AM. My thought process is this: find the critical inputs for GLM, LR, DL, DT, RF, GBT, etc., and then pinpoint the inputs that recur across algorithms. That is my way of identifying such parameters (i.e., if they show up in different algorithms, they are of high importance to the dataset). Any tips on this are appreciated. Thanks!
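
The cross-algorithm check described in question 3 can be sketched in a few lines of Python. This is only an illustration: the model names are taken from the question, but the feature names and the contents of each feature set are made up, and in practice they would be copied by hand from the "feature sets" view of each model. It simply counts the share of models that selected each input.

```python
from collections import Counter

# Hypothetical feature sets, one per model. The feature names and the
# membership of each set are invented for illustration only.
feature_sets = {
    "GLM": {"spacing", "height", "load"},
    "DT": {"spacing", "load"},
    "RF": {"height", "load", "age"},
    "GBT": {"load", "spacing"},
}


def recurrence(sets_by_model):
    """Return (feature, share of models that selected it), most frequent first."""
    counts = Counter(f for feats in sets_by_model.values() for f in feats)
    n_models = len(sets_by_model)
    pairs = [(feature, count / n_models) for feature, count in counts.items()]
    # Sort by descending share, then alphabetically so ties are stable.
    return sorted(pairs, key=lambda p: (-p[1], p[0]))


ranking = recurrence(feature_sets)
for feature, share in ranking:
    print(f"{feature}: selected by {share:.0%} of models")
```

A high share only says that many models picked the input. It does not by itself show that the input matters equally in each of them.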
Answers

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    tagging @IngoRM
  • mzn Member, University Professor Posts: 10 University Professor
    Perfect!
    I really liked this statement
    "..but in my opinion there is no "critical inputs for the data set".  There is only "critical inputs for a specific model on a data set"."

    and I think this is what I was missing! Thanks again.

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    wow - thank you @IngoRM. Need to keep this post!
  • mzn Member, University Professor Posts: 10 University Professor
    @IngoRM one more question. How can I justify an analysis in which the correlation matrix shows a negative correlation for an input, when I know from experimental observations that this factor is more likely to be positively correlated with the observation? Thanks!
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    @mzn Is this input column by any chance nominal?
  • mzn Member, University Professor Posts: 10 University Professor
    @IngoRM
    No, it is a numerical value (in this particular case, it is the spacing between two components, say columns in a building). Thank you.
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    edited April 2019
    Hmmm, since the math behind correlations is pretty simple and has been used thousands of times by as many users, I kind of doubt that there is a bug (still possible, of course). Is there any chance that the result is correct and your prior knowledge is off? You know, I have to ask :-)
    If you are really 100% sure that the results are off, could you share the data and the process so we can have a look at this together (maybe via a screen share)?
    Thanks,
    Ingo
  • mzn Member, University Professor Posts: 10 University Professor
    @IngoRM
    Sorry for the late reply. I am re-running the analysis on a different machine and will get back to you sometime tomorrow or Tuesday morning (I can definitely share the data and screens). Thank you for your time.
  • mzn Member, University Professor Posts: 10 University Professor
    @IngoRM
    So, I have re-run the analysis on two different computers and found the following:
    1. My home PC yields good results, as you can see here, where the factor "s" has a positive correlation (as expected):
    2. My office PC shows that the factor "s" has a negative correlation (which is not quite true).

    This is what I have found: the database files are identical, except that one (the second case) had two columns with rounded digits, so I guess this was the issue. Thank you!
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Yeah, that was probably the issue indeed, since all values for column S are different (while everything else is the same).
    Cheers,
    Ingo
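
To illustrate how rounding a column can change a correlation, here is a minimal Python sketch with made-up numbers (not the data from this thread). The names `spacing` and `response` are placeholders, and `pearson` is a plain textbook implementation of the Pearson coefficient, not RapidMiner's code. With this small, deliberately constructed sample, rounding the input to whole numbers is enough to flip the sign.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


# Made-up values chosen so that rounding changes the picture.
spacing = [0.6, 1.4, 1.6, 2.4]
response = [0.0, 2.0, 0.0, 1.0]

# Same column after rounding to whole numbers: [1, 1, 2, 2].
spacing_rounded = [round(s) for s in spacing]

r_original = pearson(spacing, response)
r_rounded = pearson(spacing_rounded, response)

print(f"original: {r_original:+.3f}")
print(f"rounded:  {r_rounded:+.3f}")
```

On real data the effect is usually smaller, but the example shows why two files that differ only in rounding should not be treated as identical when comparing correlation matrices.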