Impute Missing Values Error
I am trying to fill in the missing values in the dataset using the Impute Missing Values operator.
When I add the operator to the process and connect it to the dataset, I get the information about the dataset shown in the image below.
When I open the operator, before adding k-NN, the information I get is that there are no missing values, as you can see in the image below.
When I add k-NN, I get the following error.
And if I connect the exa port directly to the mod port, I get the logical error "Wrong Connection".
Any idea?
Thank you
Best Answer
CKönig (Employee, Member, RM Team Member, Posts: 70)
Hi @dasoxori,
the discrepancy between the whole dataset and the dataset on the input port of the inner subprocess can be partly explained by the default setting of "learn on complete cases".
The operator "Impute Missing Values" essentially builds a prediction model to predict the missing values. By default, the option "learn on complete cases" makes sure that no examples with any missing value are fed into the training subprocess, since some machine learning algorithms cannot handle missing values. So seeing no missing values on the inside is entirely correct as long as that option is activated. If you deactivate it, the missing values should be shown again. Still, the metadata is probably not completely correct, since the total number of examples it reports is not accurate when the examples with missing attributes are excluded.
The follow-up error "Example set is empty" is most likely a result of the same parameter setting: your dataset seems to include a large number of missing values. Is it possible that there is no "complete" example (row) in your dataset? In that case, all of the examples get discarded and there is no data left for training the impute model.
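To make the idea concrete outside of RapidMiner, here is a minimal Python sketch of model-based imputation with a "learn on complete cases" switch. It is not the operator's actual implementation: the function names `complete_cases` and `knn_impute`, the use of `None` for a missing value, and the plain mean of the k nearest rows are all assumptions made for illustration.

```python
from math import sqrt

def complete_cases(rows):
    """Keep only rows that have no missing value (None) in any column."""
    return [r for r in rows if all(v is not None for v in r)]

def knn_impute(rows, target, k=2, learn_on_complete_cases=True):
    """Fill missing values in column `target` with the mean of that column
    over the k nearest training rows (hypothetical helper, not RapidMiner's code)."""
    if learn_on_complete_cases:
        # Training data: rows without any missing value at all.
        train = complete_cases(rows)
    else:
        # Assumption: only the target column must be present for training.
        train = [r for r in rows if r[target] is not None]
    if not train:
        raise ValueError("Example set is empty: no rows left for training")

    def distance(a, b):
        # Euclidean distance over non-target columns present in both rows.
        cols = [i for i in range(len(a))
                if i != target and a[i] is not None and b[i] is not None]
        return sqrt(sum((a[i] - b[i]) ** 2 for i in cols))

    result = []
    for row in rows:
        filled = list(row)
        if row[target] is None:
            nearest = sorted(train, key=lambda t: distance(row, t))[:k]
            filled[target] = sum(t[target] for t in nearest) / len(nearest)
        result.append(filled)
    return result

rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [2.1, None]]
imputed = knn_impute(rows, target=1, k=2)
```

In this sketch the training set is built before the learner ever sees the data, which is why nothing inside the training step contains missing values while the switch is on.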
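A quick way to test this hypothesis on an exported copy of the data is to count the rows that have no missing value at all. The sketch below is only an illustration in plain Python: `missing_report` is a hypothetical helper, and it assumes the data is a list of rows with `None` marking a missing value.

```python
def missing_report(rows):
    """Return (missing values per column, number of complete rows)."""
    n_cols = len(rows[0]) if rows else 0
    missing_per_column = [
        sum(1 for r in rows if r[i] is None) for i in range(n_cols)
    ]
    n_complete = sum(1 for r in rows if all(v is not None for v in r))
    return missing_per_column, n_complete

# Every row has exactly one gap, so no row is complete.
rows = [
    [1.0, None, 3.0],
    [None, 2.0, 3.0],
    [1.0, 2.0, None],
]
missing_per_column, n_complete = missing_report(rows)
if n_complete == 0:
    print("No complete rows: training on complete cases would get an empty set")
```

If the count of complete rows is zero, each column may still look mostly filled on its own, yet nothing survives the complete-case filter.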
Kind regards,
Christian
Answers
What you're viewing here is the metadata rather than the data itself - there are instances where it might not keep up with the data. It seems like potentially there's a problem with your dataset - could you right-click on your k-NN and add a Breakpoint Before to view the data and report back?
Best,
Roland