Application Guidelines

Geoffb, Member, Posts: 2, Contributor I
edited June 2019 in Help
I think it would be useful to have more guidelines to help users decide which models, clustering methods, etc. might be applicable to their particular data, and which preprocessing methods are required. This could take the form of a wizard-style questionnaire and/or be depicted as a decision tree. For instance, the data might have lots of nulls, but I am not sure how to easily filter out which approaches can still be used.

For instance, some key questions might be:

Is your data all numerical, all categorical, or a combination?
If the data is categorical, would you like suggestions on how to convert it to numerical?
Does your data have null values?
If it has nulls, do you want to remove them, preprocess them, or only use algorithms that can handle nulls?
Do you want to understand your data or predict outcomes?
Do you have training data for supervised methods and modelling?
Do you have a lot of dimensions that you need to reduce?
If so, do you want recommendations on how to test your results?
Would you like a suggested approach based on these questions?
A typical application, such as churn prediction, could also be depicted.
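
As a rough illustration of how answers to the questions above could be turned into suggestions, here is a minimal Python sketch. Everything in it is hypothetical: the `Answers` fields, the `suggest` function, and the wording of the tips are made up for illustration and are not an existing RapidMiner feature. The rules are simple first guesses only, not a complete decision tree.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Answers:
    """Hypothetical questionnaire answers; field names are illustrative only."""
    data_types: str        # "numerical", "categorical", or "mixed"
    has_nulls: bool
    null_strategy: str     # "remove", "impute", or "tolerant_algorithms"
    goal: str              # "understand" or "predict"
    has_labels: bool       # is labelled training data available?
    many_dimensions: bool


def suggest(a: Answers) -> List[str]:
    """Return first-guess suggestions as plain strings, in questionnaire order."""
    tips = []
    if a.data_types in ("categorical", "mixed"):
        tips.append("Encode categorical columns (e.g. one-hot) if the algorithm needs numbers.")
    if a.has_nulls:
        if a.null_strategy == "remove":
            tips.append("Drop rows or columns with missing values.")
        elif a.null_strategy == "impute":
            tips.append("Impute missing values (e.g. median or most frequent value).")
        else:
            tips.append("Restrict the choice to algorithms that tolerate missing values.")
    if a.many_dimensions:
        tips.append("Reduce dimensionality first (e.g. PCA or feature selection).")
    if a.goal == "predict" and a.has_labels:
        tips.append("Use a supervised learner and validate it (e.g. cross-validation).")
    elif a.goal == "predict":
        tips.append("Prediction needs labelled training data; collect labels first.")
    else:
        tips.append("Explore the data with unsupervised methods such as clustering.")
    return tips


# Example: a churn-prediction style dataset with mixed columns and some nulls.
churn = Answers("mixed", True, "impute", "predict", True, False)
for tip in suggest(churn):
    print("-", tip)
```

A flat list of rules like this is easy to extend question by question, but it only narrows down the candidates; it does not replace checking the data itself.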


    MartinLiebig, Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor, Posts: 3,517, RM Data Scientist

    There are some plugins for RM that try to do such a thing, and in the end the wisdom of the crowds is going in this direction.

    Personally, I am not sure this solves the problem completely. The questions you raised are good and might help to get a solid but not perfect result. All of those questions lead to good first guesses for algorithms. The key to good results is correct preprocessing and the use of domain knowledge. This cannot be done with such a questionnaire, only by an analyst.

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany