Prediction/ Overview in Dataset
I want to predict the missing dates for "End Date Real" at the bottom.
Also, how do I deal with the missing data in the middle?
I also want to find the relevance of the attributes (cavity and weight) to the lead time of an assignment during correction (C1, C2, ...), as well as the overall reliability of the planned dates (Start Date, End Date Contract, End Date Real).
I hope this isn't too specialized. What do I have to do? Do you have any hints on how I could start?
Greets Newbie 01
Best Answer
BalazsBarany
Hi @Newbie_01,
First, you have to decide whether you have a classification or a regression problem.
Classification predicts a category, for example a yes/no decision.
Regression is numeric prediction. Predicting the missing date looks like a regression problem, but you might want to recode your dates as numbers, e.g. the number of days after 2019-01-01 or some other start date. Use Generate Attributes and its date functions for this.
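Outside RapidMiner, the same recoding idea can be sketched in a few lines of Python. This is only an illustration: the reference date is the 2019-01-01 mentioned above, and the function name and sample dates are made up for the example, not taken from the dataset in the question.

```python
from datetime import date

# Reference date from the example above; any fixed start date works.
REFERENCE = date(2019, 1, 1)

def days_since_reference(iso_date, reference=REFERENCE):
    """Return whole days between `reference` and an ISO date string.

    Missing dates (None or empty string) stay None so they can be
    predicted or filled in later.
    """
    if not iso_date:
        return None
    return (date.fromisoformat(iso_date) - reference).days

# Made-up sample values standing in for an "End Date Real" column.
end_dates = ["2019-01-15", "2019-03-01", None]
numeric = [days_since_reference(d) for d in end_dates]
print(numeric)
```

A predicted number can be turned back into a date by adding it to the reference date with `datetime.timedelta(days=...)`.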
There are modeling algorithms that can cope with missing data, but it can be better to make an informed decision on how to handle and fill in missing values. For example, you might know that C2 is always at least one week after C1, and so on. You would then fill in the missing data with the appropriate value. You could also try the Impute Missing Values operator to do this automatically.
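As a rough sketch of such a rule-based fill, assume the dates are already recoded as day numbers and that C2 is at least 7 days after C1. The rows, the helper names, and the choice of the median gap as the "appropriate value" are all assumptions made for this illustration; they are not what the Impute Missing Values operator does internally.

```python
from statistics import median

MIN_GAP_DAYS = 7  # assumed domain rule: C2 is at least one week after C1

# Made-up rows; C1 and C2 are day numbers, None marks a missing value.
rows = [
    {"C1": 10, "C2": 18},
    {"C1": 20, "C2": 30},
    {"C1": 35, "C2": None},
]

# Gaps observed in rows where both values are present.
known_gaps = [
    r["C2"] - r["C1"]
    for r in rows
    if r["C1"] is not None and r["C2"] is not None
]

# Use the typical observed gap, but never less than the domain minimum.
typical_gap = max(MIN_GAP_DAYS, median(known_gaps))

def fill_c2(row):
    """Return a copy of the row with a missing C2 filled from C1."""
    filled_row = dict(row)
    if filled_row["C2"] is None and filled_row["C1"] is not None:
        filled_row["C2"] = filled_row["C1"] + typical_gap
    return filled_row

filled = [fill_c2(r) for r in rows]
print(filled)
```

Rows where C1 is missing as well are left untouched here; they would need a different rule or an automatic imputation method.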
Some learning algorithms include the attribute importance in their output. There are also operators called Weight by ... that rank the importance of the attributes, each according to its own algorithm. But you will often get different or even contradictory answers from different algorithms. If you have access to AutoModel, there is also a variable importance ranking there.
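To make the idea of an attribute ranking concrete, here is a minimal sketch that ranks attributes by the absolute Pearson correlation with the lead time. It is only one of many possible weighting schemes, the numbers are invented for the illustration, and other schemes may well order the attributes differently.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long number lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Invented example values, not data from the question.
attributes = {
    "cavity": [1, 2, 2, 4, 4],
    "weight": [5.0, 3.0, 6.0, 4.0, 5.5],
}
lead_time = [10, 14, 15, 22, 21]

# Weight of each attribute = absolute correlation with the target.
weights = {name: abs(pearson(values, lead_time)) for name, values in attributes.items()}

# Highest weight first.
ranking = sorted(weights.items(), key=lambda item: item[1], reverse=True)
for name, weight in ranking:
    print(f"{name}: {weight:.3f}")
```

Correlation only captures linear, one-attribute-at-a-time relationships, which is one reason why a tree-based or model-based importance can give a different answer.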
I hope this helps you to start with your analysis.
Regards,
Balázs