FeatureSet converter

christos_karras Member Posts: 50 Guru
edited March 2020 in Product Ideas
I generated some features using the Automatic Feature Engineering Operator. Now I would like to manipulate the FeatureSet as an example set, but I can't find any converter between FeatureSetIOObject and ExampleSet (including in the Converters Extension). Would it be possible to create operators for the following (if they don't already exist):
- FeatureSet to ExampleSet
- ExampleSet to FeatureSet

And until this is available, would I be able to implement something myself using the scripting operator? 

I would need this for various reasons, for example:
- Remove some generated features because I know they don't make sense and were probably selected just by coincidence based on the provided data. Example: exp(exp(exp([SourceFeature])))
- Combine multiple feature sets generated using different methods
- Distinguish "raw" features from generated features, to exclude them from some specific operators, without assigning a special role to the generated features (because I still want them to be considered by most operators including models)
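
To make the first two use cases concrete, here is a minimal sketch in plain Python. It does not use any RapidMiner API: the lists of expression strings are a hypothetical stand-in for a FeatureSet, and `max_call_depth` and `combine` are illustrative helper names, not existing operators.

```python
# Hypothetical stand-in for two generated feature sets: plain lists of
# expression strings. This is NOT the FeatureSetIOObject API.
linear_set = ["exp(exp(exp([SourceFeature])))", "[A] / [B]"]
tree_set = ["[A] / [B]", "log([C])"]


def max_call_depth(expression):
    """Return the deepest level of parenthesis nesting in an expression."""
    depth = 0
    deepest = 0
    for ch in expression:
        if ch == "(":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == ")":
            depth -= 1
    return deepest


def combine(*feature_sets):
    """Merge feature sets in order, keeping each distinct expression once."""
    merged = []
    for feature_set in feature_sets:
        for expression in feature_set:
            if expression not in merged:
                merged.append(expression)
    return merged


combined = combine(linear_set, tree_set)

# Drop suspicious features such as exp(exp(exp(...))) by nesting depth.
kept = [expr for expr in combined if max_call_depth(expr) <= 1]
print(kept)
```

The nesting-depth threshold is only one possible heuristic; in practice the decision to keep or drop a feature would come from inspecting it.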
2 votes

Open for Voting · Last Updated

IC-1808

Comments

  • yyhuang Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 364 RM Data Scientist
    Hi @christos_karras,

    Thanks for sharing the feedback! We will create a feature request for the internal product/dev team.

    Have you tried the "Apply Feature Set" operator on the "raw" data with the FeatureSetIOObject to do feature selection/generation? You can then pick and remove the generated features with "Select Attributes" (invert selection).

    If you convert the feature set object to a data set, how would you use the data set afterwards?



    Thanks again for your inputs!

    YY

  • christos_karras Member Posts: 50 Guru
    yyhuang, thanks for creating the feature request.

    Yes I'm using Apply Feature Set on the raw data to generate the same features on new data. However, I generated features based on different models (for example Linear Regression, Decision Tree) and want to test them all in the whole data preparation pipeline as inputs to the different machine learning models I'm testing (Boosted Trees, Random Forest, Generalized Linear Model). So I'm applying multiple feature sets in a loop by using Apply Feature Set at each iteration. 

    But in fact I would like to build a single feature set that contains the best features found from the different methods. I would like to have the ability to analyze each generated feature to see if it makes sense, then build a feature set that contains only the ones that make sense according to the analysis, which may come from different generated feature sets. I may also want to modify some of the generated features. For example, if I have a generated feature = A/B, but from domain knowledge I know that B should be replaced by an average of B and C (because both B and C have the same impact on the results), then I would want to replace A/B by A/(0.5*(B+C)).
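
As an illustration of the kind of edit I mean, here is a small plain-Python sketch. The `features` dictionary of row functions is a hypothetical representation, not how RapidMiner stores a FeatureSet, and the sample rows are made-up values used only to show the arithmetic.

```python
# Hypothetical feature definitions: name -> function of one row (a dict).
features = {
    "ratio": lambda row: row["A"] / row["B"],  # generated feature A/B
}

# Made-up sample rows, only to show the effect of the edit.
rows = [
    {"A": 6.0, "B": 2.0, "C": 4.0},
    {"A": 9.0, "B": 1.0, "C": 5.0},
]

original_values = [features["ratio"](row) for row in rows]

# Domain-knowledge edit: replace B by the average of B and C,
# i.e. A/B becomes A/(0.5*(B+C)).
features["ratio"] = lambda row: row["A"] / (0.5 * (row["B"] + row["C"]))

modified_values = [features["ratio"](row) for row in rows]
print(original_values, modified_values)
```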

    I would do these kinds of manipulations by storing the ExampleSet in the RapidMiner repository, editing it in the data editor, saving the ExampleSet back, and then converting it back to a FeatureSet.

    The easiest workaround, which we'll probably end up doing in the short term, is to use Generate Attributes to reimplement the same expressions as those found in the feature sets. However, this is not as convenient as having a FeatureSet object that can be reused at different stages in the process. For example, another thing I might want to do is to use the Feature Set with something similar to the "Work on Subset" operator, which would allow working either only on the features that are in the feature set, or only on the features that are *not* in the feature set. With the "manual" approach using Generate Attributes, I would have to re-enter the list of generated features at each "Work on Subset" operator.
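
Roughly, the subset selection I have in mind would behave like the sketch below. It is plain Python with hypothetical attribute names; `split_by_feature_set` is an illustrative helper, not an existing operator or API.

```python
def split_by_feature_set(attribute_names, feature_set_names):
    """Split attributes into those in the feature set and those not in it."""
    in_set = [n for n in attribute_names if n in feature_set_names]
    not_in_set = [n for n in attribute_names if n not in feature_set_names]
    return in_set, not_in_set


# Hypothetical attribute names: three raw columns and two generated ones.
attributes = ["A", "B", "C", "A_div_B", "log_C"]
generated = {"A_div_B", "log_C"}

in_set, not_in_set = split_by_feature_set(attributes, generated)
print(in_set, not_in_set)
```

With a reusable FeatureSet object, the `generated` list would be defined once instead of being re-entered at every subset step.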

    Thanks