Select Attributes Not Populating Attribute List
Hi Guys,
I am running into a strange problem with the Select Attributes operator. In many cases, in a complex process, Select Attributes simply does not see the meta data passed from other operators (Read CSV in a loop, etc.). I have tried validating the process, enabling syncing of meta data for each step in the process, and so on. Nothing seems to work.
To get around this, I am forced to save the example set in the repository right before my Select Attributes operator is executed, use a second process to select attributes from the data saved in the repository, and then copy that Select Attributes operator back into the original process to select the subset of attributes I want.
How can we solve this issue?
Thanks,
-Gagi
Answers
Right now there are two reasons why the meta data propagation might not work as expected:
- there is a bug in the meta data (MD) transformation, or we simply forgot to transfer meta data at places where this would indeed have been possible
- it is simply not possible to create / transform / propagate the meta data at design time
"Read CSV" in a loop is probably a candidate for the second reason: the loop probably defines the file name as a macro, and this macro is then used in the "Read CSV" operator, right? In that case, we simply cannot create the meta data since we do not know the file at design time - only the macro name. If I got it right, there is probably no really good option to overcome this issue here, sorry.
Cheers,
Ingo
In fact, the problem is a bit stranger. If I simply use a Read CSV operator on a reasonably large file (6K examples, 2K attributes) and then connect it directly to Select Attributes, the problem still persists. I understand that it will not work with a macro, but shouldn't it work with Read CSV and a direct path to the file? I even ran the process to ensure the file was loaded and meta data was present.
In any case, the problems seem to be related to files containing many attributes rather than many examples. :-\
-Gagi
Ah, ok. There were a couple of versions (somewhere between 5.0.000 and 5.1.000) where we completely disabled the reading of meta data from files, since it simply took too long and blocked process design until the reading was finished. Even worse: the first versions re-read the meta data again for each check. Maybe you are still using one of those versions?
We optimized this, and now the meta data should be cached and the reading should no longer block the design. But still, until reading the meta data has finished, it cannot be used by the following operators. This is indicated by the bar in the lower right corner of RapidMiner. Reading the meta data can easily take its time, especially in the case of many attributes (many examples are not as bad, since we only read a sample for determining the meta data, as far as I remember). So maybe the meta data will be available if you wait a bit longer before you connect the next operator (look at the bar at the bottom).
So what else can you do? Well, I recommend importing (a sample of) the file into your repository and completely designing the process before you replace the "Retrieve" operator for the repository entry with Read CSV again (if you need that because of regular updates or for any other reason; otherwise I recommend using the repository alone). The main advantage is that reading the meta data from the repository is much faster and more reliable than guessing it from files. As an alternative, you could store the data in a database - but whether any of those suggestions is actually applicable of course depends on your requirements. Maybe they help anyway.
Cheers,
Ingo
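To illustrate the point about guessing meta data from a sample: the following is only a rough Python sketch of the general idea, not RapidMiner's actual implementation. The function name `guess_meta_data`, the three type labels, and the default sample size of 100 rows are all assumptions made for this example.

```python
import csv
from itertools import islice

# Order of generality: a column is widened, never narrowed.
_RANK = {"integer": 0, "real": 1, "nominal": 2}


def _value_type(value):
    """Classify a single cell as integer, real, or nominal."""
    try:
        int(value)
        return "integer"
    except ValueError:
        pass
    try:
        float(value)
        return "real"
    except ValueError:
        return "nominal"


def guess_meta_data(lines, sample_rows=100):
    """Guess attribute names and coarse types from the first rows only.

    `lines` is any iterable of CSV lines (for example an open file).
    Only `sample_rows` data rows are inspected, so the cost grows with
    the number of attributes rather than the number of examples.
    """
    reader = csv.reader(lines)
    header = next(reader)
    guessed = ["integer"] * len(header)
    for row in islice(reader, sample_rows):
        for i, value in enumerate(row[:len(header)]):
            if value == "":
                continue  # treat empty cells as missing values
            kind = _value_type(value)
            if _RANK[kind] > _RANK[guessed[i]]:
                guessed[i] = kind
    return dict(zip(header, guessed))
```

Because only a sample is read, the guess can be wrong for values that first appear later in the file, which is one reason meta data stored in the repository is more reliable.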
I only run into these problems when importing lots of data (one time, into the repository). I generally have many CSV files that I have to load, parse, filter, etc., and finally append. After all that is done, working from the repository is fine.
Which gives me an idea! Maybe there should be an "import multiple files" operator that can take a set of CSVs, find their common attributes, append them, etc. Just a thought. ;D
-Gagi
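Outside of RapidMiner, the "find common attributes and append" idea could look roughly like the Python sketch below. This is not an existing RapidMiner operator; the function name `append_common` and its return shape are hypothetical and chosen just for illustration.

```python
import csv


def append_common(sources):
    """Append several CSV tables, keeping only attributes present in all.

    `sources` is an iterable of line iterables (for example open files).
    Returns (common_attribute_names, list_of_row_dicts).
    """
    tables = []
    for lines in sources:
        reader = csv.DictReader(lines)
        rows = list(reader)
        tables.append((reader.fieldnames or [], rows))
    if not tables:
        return [], []
    # Keep the column order of the first table, restricted to the
    # attributes that occur in every other table as well.
    first_header = tables[0][0]
    common = [name for name in first_header
              if all(name in header for header, _ in tables[1:])]
    combined = [{name: row[name] for name in common}
                for _, rows in tables
                for row in rows]
    return common, combined
```

With real files, the sources would be file objects opened with `newline=""`, as the `csv` module documentation recommends. All values stay strings here; type guessing and filtering would be separate steps.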