Is it possible to download the inputs selected by automodel and their corresponding parameters?

varunm1 Moderator, Member Posts: 918   Unicorn
edited June 27 in Help
Hello,

I am working with automodel on my data, which has 77 attributes. I am trying to get all the details of the attribute (column) analysis done by automodel (Correlation, ID-ness, Stability and Missing Values). Is it possible to download the data shown by automodel into Excel or any other file format?

One more question: what does the "?" in the ID-ness column in automodel mean?

Thanks,
Varun
Regards,
Varun
Rapidminer Wisdom 2020 (User Track): Call for proposals 

https://www.varunmandalapu.com/
Answers

  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 2,188  RM Data Scientist
    I am afraid this is currently not possible, since this is not done with operators. You would need to build the functionality manually with operators. But maybe @IngoRM knows a trick I am not aware of?
    BR,
    Martin
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
  • varunm1 Moderator, Member Posts: 918   Unicorn
    Thanks @mschmitz. I need to look at some statistics, so I noted them down manually, as I don't see any other option.
  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 2,188  RM Data Scientist
    Have a look at Extract Statistics in Operator Toolbox. It gives you all the statistics of the normal Stats view. You can join this with Weight by Correlation. Then you already have two of the measures.

  • varunm1 Moderator, Member Posts: 918   Unicorn
    @mschmitz Sure, I will try this. Thanks.
  • varunm1 Moderator, Member Posts: 918   Unicorn
    Thanks @IngoRM, this answers my questions.
  • kypexin Moderator, RapidMiner Certified Analyst, Member Posts: 280   Unicorn
    Hi, I would like to add my 5 cents here, as I would also like to be able to access the details about the variable quality measures used in auto model. Here's my current use case:
    • I have a dataset with 450+ attributes.
    • I start with auto model just to get a feeling for how the data is structured and what modelling capabilities are there 'out of the box'.
    • Auto model checks the quality metrics of the inputs, which results in removing 300+ 'bad' attributes, so I am left with a reduced dataset containing only quality attributes.
    • From here, I would like to continue with the reduced dataset and perform further feature selection outside auto model.
    • Ideally, I would like to have an operator that detects all attributes with ID-ness, stability and correlation above certain configurable thresholds, so I can run this within a separate modelling process (not within auto model) and also have access to all the quality metrics of the attributes.

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,682  RM Founder
    Man, you are on a roll here with new ideas :) - @sgenzer, I would recommend turning this into a feature request here as well so that PM can take notice...
    RapidMiner Wisdom 2020
    February 11th and 12th 2020 in Boston, MA, USA

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,682  RM Founder
    I am sure you could ;)   Well, here is your open mic :)
  • SGolbert RapidMiner Certified Analyst, Member Posts: 341   Unicorn
    edited April 30
    Hi,

    I have a similar problem, but with 1000 attributes. I have checked the underlying process of automodel, and it simply filters the bad attributes out by name using a Select Attributes operator (in Process -> Preprocessing -> Remove Columns?). This feels a lot like a black box to me . . .

    I hope we get the operator and the fixed automodel process soon!


    Edit: A workaround is to use the automodel process and only keep the preprocessing part:

    [screenshot not preserved]

    Regards,
    Sebastian
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,682  RM Founder
    Sorry, but...
    "This feels a lot like a black box to me . . ."
    How is this a black box?  We show you the table, you select the set of attributes you want to use with checkmarks, we select them.  It can't be less black-boxy in my opinion ;)

  • varunm1 Moderator, Member Posts: 918   Unicorn
    Hello @IngoRM

    I need one clarification. Is there any use in having both stability and ID-ness in automodel, as these look like similar things?
  • jczogalla Employee, Member Posts: 125   RM Engineering

    I think the main difference is that ID-ness means that (nearly) all values are distinct (like 1, 2, 3,... or mostly different nominal values), while stability means that nearly all values are the same. Similar concepts, but not exactly the same.
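To make the distinction concrete, here is a minimal Python sketch. It assumes that ID-ness is the share of distinct values in a column and that stability is the share of rows taken by the most frequent value. These definitions, the function names and the sample columns are illustrative assumptions, not Auto Model's confirmed formulas.

```python
from collections import Counter


def id_ness(values):
    """Share of distinct values among all rows (assumed definition)."""
    return len(set(values)) / len(values)


def stability(values):
    """Share of rows holding the most frequent value (assumed definition)."""
    most_common_count = Counter(values).most_common(1)[0][1]
    return most_common_count / len(values)


# Made-up example columns, for illustration only.
customer_id = [101, 102, 103, 104, 105]    # every value is different
country = ["DE", "DE", "DE", "DE", "US"]   # almost every value is the same

for name, column in [("customer_id", customer_id), ("country", country)]:
    print(f"{name}: ID-ness={id_ness(column):.0%}, stability={stability(column):.0%}")
```

Under these assumptions the ID-like column scores high on ID-ness and low on stability, while the nearly constant column shows the opposite pattern.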

    Cheers
    Jan
  • varunm1 Moderator, Member Posts: 918   Unicorn
    Thanks, Jan. Yep, they are quite similar but not the same. I am trying to understand how both of them are helpful in Automodel when selecting attributes (we could set a condition based on just one of them), or is there any other use for them?
  • SGolbert RapidMiner Certified Analyst, Member Posts: 341   Unicorn
    edited May 2

    I meant that when I wanted to see what Automodel was doing in the background, there was nothing in the process that showed how these quality measures were being calculated. We can discuss whether it is a black box or not, but I think we can agree that this is not desirable.

    Regards,
    Sebastian

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,682  RM Founder
    Ok, let's indeed not get into semantics here :D
    The issue is that we cannot actually do this via operators alone, at least not in all cases.  The reason is that the ultimate decision about what to include has been made by the user, not RapidMiner.  Auto Model only shows recommendations (the traffic lights); the user decides whether to follow them or not.  Especially in the case where the user wants to add a column which would have been recommended for removal, I do not see an easy way to achieve this via operators (of course it is possible by keeping all original columns in an extra data set, using an operator that selects based on the recommendations, selecting from the original set the ones which should be kept despite the recommendation, and joining them back together, but, you know... that does not really seem to be justified here...).  Hope this better explains why we show the recommendations, the user makes the selection, and we simply apply the selection.
    One improvement I could think of is to add the reasons for the recommendation to the annotation of the Select Attributes operator, at least for all attributes where the user followed the recommendation.  But that again would not explain the cases where the user does not follow it or where the user has a different reason for (de-)selecting...
    To be honest, I personally think it would be best to keep this as is, and if this is important, you can always annotate it yourself.  With the upcoming deployment offering of RM, there will also be more ways of adding annotations to models which could be used for that...
    Hope this helps,
    Ingo

  • varunm1 Moderator, Member Posts: 918   Unicorn
    @IngoRM if possible, can you help me understand this?
    "I am trying to understand how both of them (ID & Stability) are helpful in Automodel when selecting attributes (we can set a condition based on one of these) or is there any other use for this?"
  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,270   Unicorn
    Interesting discussion about the black box topic, but let's not lose track of the original suggestion here, which still has merit in my view.

    Regardless of the selection method used (and I can see the arguments for leaving it the way it is in Automodel), the current Automodel process calculates a value for each attribute for 5 quantities: correlation, id-ness, stability, missing, and text-ness (that's a new one!).  

    It would be nice to have an operator which generated these same values inside any process and provided the results as a dataset.  You could then use that operator to create filtering/weighting/selection rules of your own choosing based on whatever threshold values you wanted.  Currently you can do that for things like missing value percentage or correlation (because there are operators that can be used to calculate those) but not for the others (as far as I know).  So there is still a gap in the capabilities of Automodel vs non-automodel processes.
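As a rough illustration of what such an operator could return, here is a Python sketch that builds a per-column quality table and then applies user-chosen thresholds. It is not RapidMiner code: the metric definitions (missing share, ID-ness as distinct share, stability as share of the most frequent value, both computed on non-missing values), the function names, the default thresholds and the sample data are all assumptions made for the example. Correlation and text-ness are left out.

```python
from collections import Counter


def column_quality(values):
    """Quality metrics for one column; None marks a missing value (assumption)."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    return {
        "missing": (len(values) - len(present)) / len(values),
        "id_ness": len(counts) / len(present) if present else 0.0,
        "stability": max(counts.values()) / len(present) if present else 0.0,
    }


def quality_table(columns):
    """Map each column name to its metrics, i.e. the 'results as a dataset'."""
    return {name: column_quality(values) for name, values in columns.items()}


def select_columns(table, max_missing=0.5, max_id_ness=0.9, max_stability=0.9):
    """Keep the columns whose metrics stay at or below the hypothetical thresholds."""
    return [
        name
        for name, q in table.items()
        if q["missing"] <= max_missing
        and q["id_ness"] <= max_id_ness
        and q["stability"] <= max_stability
    ]


# Made-up data, for illustration only.
data = {
    "order_id": ["a1", "a2", "a3", "a4"],      # ID-like
    "currency": ["EUR", "EUR", "EUR", "EUR"],  # constant
    "segment": ["b2b", "b2c", "b2c", None],    # one missing value
}

table = quality_table(data)
kept = select_columns(table)
print(kept)  # kept == ["segment"]
```

Because the metrics are returned as plain data, the thresholds and the selection rule stay entirely in the hands of whoever builds the process.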

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,682  RM Founder
    @Telcontar120 That I fully agree with!  We have it on our list anyway, I just wanted to manage expectations that this likely won't change the processes generated by AM, that's all ;)

    "I am trying to understand how both of them (ID & Stability) are helpful in Automodel when selecting attributes (we can set a condition based on one of these) or is there any other use for this?"
    Well, in general this kind of thing (constant columns or ID-like columns) is something I pay some attention to, not just when building models but when I work with data in general (e.g. for creating visualizations).  But for modeling, it just makes a ton of sense to exclude them since it will make your models faster and likely better.  Hence the recommendations in the Select Inputs step.  A bit more detail: ID-like attributes are typically not helpful because you cannot really generalize from these columns, i.e. there is nothing to learn from if all (categorical) values are different.  And they can be really problematic for entropy-based learners like decision trees.  Stable columns typically do not hurt much (a little bit for distance-based learners but...), they are just not necessary and slow things down.
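The point about entropy-based learners can be illustrated with a small Python sketch. It computes the textbook information gain of a split on a column: an ID-like column puts every row into its own group, so each group looks perfectly pure and the gain is as high as it can be, even though nothing generalizes. The data and function names are made up for the example, and this is not how any particular RapidMiner learner is implemented.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Entropy reduction from splitting the labels by each distinct feature value."""
    groups = defaultdict(list)
    for value, label in zip(feature, labels):
        groups[value].append(label)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Made-up data, for illustration only.
labels = ["yes", "no", "yes", "no", "yes", "no"]
row_id = [1, 2, 3, 4, 5, 6]                           # ID-like column
weather = ["sun", "sun", "sun", "rain", "rain", "rain"]  # ordinary column

gain_id = information_gain(row_id, labels)
gain_weather = information_gain(weather, labels)
print(f"gain(row_id)={gain_id:.3f}, gain(weather)={gain_weather:.3f}")
```

In this toy data the ID column reaches the maximum possible gain (the full label entropy), so a naive entropy-based split criterion would prefer it over the ordinary column.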
    Hope this helps,
    Ingo
  • varunm1 Moderator, Member Posts: 918   Unicorn
    edited May 2
    Sorry if my question was not clear. Doesn't 0 percent stability mean 100 percent ID-ness? What I am wondering is whether both of these measures are necessary, or whether the stability measure alone is enough and we can decide based on that.
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,682  RM Founder
    Ah, sorry, now I get you.
    "Doesn't 0 percent stability mean 100 percent ID?"
    Yes, but you can also have, let's say, 50% stability and still 100% ID-ness (e.g. with a data set of two rows with two different values).  So since "stability = 100% - ID-ness" is not always true, we simply show both values and base the recommendation on each of those individually...
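The two-row example can be worked through explicitly. The formulas below assume that ID-ness is the share of distinct values and stability is the share of rows holding the most frequent value; these definitions are an assumption for illustration, not a confirmed Auto Model formula.

```latex
\text{ID-ness} = \frac{\#\,\text{distinct values}}{\#\,\text{rows}} = \frac{2}{2} = 100\%,
\qquad
\text{stability} = \frac{\#\,\text{rows with the most frequent value}}{\#\,\text{rows}} = \frac{1}{2} = 50\%
```

Here the two values add up to 150%, so "stability = 100% - ID-ness" does not hold. More generally, under these assumptions a column of n all-different values has a stability of 1/n, which only approaches 0% as n grows.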
    Hope that helps,
    Ingo
  • varunm1 Moderator, Member Posts: 918   Unicorn
    Thanks, @IngoRM, now I get it :)
  • DILLON Member Posts: 1 Learner I
    See Extract Statistics in Operator Toolbox. It gives you all the statistics of the normal Stats view. You can join this with Weight by Correlation. Then you already have two.
  • michaelgloven RapidMiner Certified Analyst, Member Posts: 42  Guru
    Great thread - what's the basis of the new quality measure "text-ness"? Thanks!