Select Attributes Not Populating Attribute List

dragoljub Member Posts: 241 Contributor II
edited November 2018 in Help
Hi Guys,

I am running into a strange problem with the Select Attributes operator. In many cases, in a complex process, Select Attributes simply does not see the meta data passed from other operators (Read CSV in a loop, etc.). I have tried validating the process, enabling meta data syncing for each step in the process, and so on. Nothing seems to work.

To get around this, I am forced to save the example set in the repository right before my Select Attributes operator is executed, use a second process to select attributes from the data saved in the repository, and then copy the configured Select Attributes operator back into the original process to select the subset of attributes I want.

How can we solve this issue?

Thanks,
-Gagi

Answers

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,

    right now there are two reasons why the meta data propagation does not work as expected:
    • there is a bug in the meta data (MD) transformation, or we simply forgot to transfer the meta data in places where this would indeed have been possible
    • it is simply not possible to create / transform / propagate the meta data at design time
    "Read CSV" in a loop is probably a candidate for the second reason: The loop probably defines the file name as macro and this is again used in the "Read CSV" operator, right? In that case, we simply cannot create the meta data since we do not know the file - only the macro name.

    If I got it right, there is probably no really good option to overcome this issue here, sorry.

    Cheers,
    Ingo
  • dragoljub Member Posts: 241 Contributor II
    Thanks for the response Ingo,

    In fact, the problem is a bit stranger. If I simply use a Read CSV operator on a reasonably large file (6K examples, 2K attributes) and then connect it directly to Select Attributes, the problem still persists. I understand that it will not work with a macro, but shouldn't it work with Read CSV and a direct path to the file? I even ran the process to ensure the file was loaded and the meta data was present.

    In any case, the problems seem to be related more to files containing many attributes than to files containing many examples.  :-\

    -Gagi
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi again,

    ah, ok. There were a couple of versions (somewhere between 5.0.000 and 5.1.000) where we completely disabled the reading of meta data from files, since it simply took too long and blocked process design until the reading was finished. Even worse: the first versions re-read the meta data for each check. Maybe you are still using one of those versions?

    We optimized this, and now the meta data should be cached and the reading should no longer block the design. Still, until reading the meta data has finished, it cannot be used by the following operators. This is indicated by the bar in the lower right corner of RapidMiner. Reading the meta data can easily take a while, especially with many attributes (many examples are not as bad, since we only read a sample to determine the meta data, as far as I remember). So maybe the meta data will be available if you wait a bit longer before you connect the next operator (look at the bar at the bottom).
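
To illustrate why many attributes cost more than many examples when meta data is guessed from a sample, here is a minimal Python sketch. It is not RapidMiner's implementation; the function names `guess_type` and `guess_meta_data`, the type labels, and the default sample size are all assumptions made up for this example.

```python
import csv
import io
from itertools import islice


def guess_type(values):
    """Guess a coarse attribute type from sample values (hypothetical rules)."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "unknown"
    for caster, name in ((int, "integer"), (float, "real")):
        try:
            for v in non_empty:
                caster(v)
            return name
        except ValueError:
            continue
    return "nominal"


def guess_meta_data(handle, sample_size=100):
    """Return {attribute name: guessed type} using only the first rows."""
    reader = csv.reader(handle)
    header = next(reader, None)
    if header is None:
        return {}
    # Only sample_size rows are read, however many examples the file holds.
    sample = list(islice(reader, sample_size))
    meta = {}
    # One pass per attribute: the work grows with the number of columns.
    for i, name in enumerate(header):
        column = [row[i] for row in sample if i < len(row)]
        meta[name] = guess_type(column)
    return meta


demo = io.StringIO("id,score,label\n1,0.5,yes\n2,1.5,no\n3,2,yes\n")
print(guess_meta_data(demo, sample_size=2))
# prints {'id': 'integer', 'score': 'real', 'label': 'nominal'}
```

With a fixed sample size, the work is roughly proportional to the number of attributes, which fits the observation that files with 2K attributes are slow even when they have few examples.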

    So what else can you do? Well, I recommend importing (a sample of) the file into your repository and designing the process completely before you swap the "Retrieve" operator for the repository entry back to Read CSV (do this only if the file is updated regularly or you have some other reason; otherwise I recommend using the repository alone). The main advantage is that reading the meta data from the repository is much faster and more reliable than guessing it from files. As an alternative, you could store the data in a database. Whether any of these suggestions is actually applicable depends, of course, on your requirements. Maybe they help anyway.
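
One way to produce such a sample outside of RapidMiner is a small script that copies the header and the first rows of the CSV into a smaller file, which can then be imported into the repository. This is only a sketch: the function name `write_sample`, the file names, and the row limit are hypothetical, and it assumes a plain comma-separated file with a header row.

```python
import csv
import os
import tempfile
from itertools import islice


def write_sample(source_path, sample_path, max_rows=200):
    """Copy the header and the first max_rows data rows to a smaller CSV."""
    with open(source_path, newline="") as src, \
            open(sample_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)
        if header is None:
            return 0  # empty source file: nothing to copy
        writer.writerow(header)
        rows_written = 0
        for row in islice(reader, max_rows):
            writer.writerow(row)
            rows_written += 1
    return rows_written


# Small demonstration with temporary files (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    full = os.path.join(tmp, "full.csv")
    sample = os.path.join(tmp, "sample.csv")
    with open(full, "w", newline="") as f:
        csv.writer(f).writerows([["a", "b"], [1, 2], [3, 4], [5, 6]])
    copied = write_sample(full, sample, max_rows=2)
    with open(sample, newline="") as f:
        sample_rows = list(csv.reader(f))

print(copied, sample_rows)
```

The sample keeps every attribute, so the meta data seen at design time has the same columns as the full file.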

    Cheers,
    Ingo
  • dragoljub Member Posts: 241 Contributor II
    Hey Ingo,

    I only run into these problems when importing lots of data (one time into the repository). I generally have many CSV files that I have to load, parse, filter, etc., and finally append. After all that is done, working from the repository is fine.

    Which gives me an idea! Maybe there should be an "import multiple files" operator that can take a set of CSVs, find their common attributes, append them, etc. Just a thought.  ;D

    -Gagi
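
The idea of such an operator can be sketched in a few lines of Python. This is not an existing RapidMiner operator; the function name `append_on_common_attributes` is hypothetical, and the sketch assumes every CSV has a header row and fits in memory.

```python
import csv
import io


def append_on_common_attributes(handles):
    """Append rows from several CSV sources, keeping only shared columns."""
    tables = []
    for handle in handles:
        reader = csv.DictReader(handle)
        tables.append((reader.fieldnames or [], list(reader)))
    if not tables:
        return [], []
    # Keep the column order of the first file, restricted to the
    # attribute names that occur in every file.
    common = [name for name in tables[0][0]
              if all(name in header for header, _ in tables)]
    rows = [[row[name] for name in common]
            for _, table in tables
            for row in table]
    return common, rows


first = io.StringIO("id,x,y\n1,a,b\n")
second = io.StringIO("y,id,z\nc,2,d\n")
common, rows = append_on_common_attributes([first, second])
print(common)  # prints ['id', 'y']
print(rows)    # prints [['1', 'b'], ['2', 'c']]
```

Attributes that are missing from any one file (here `x` and `z`) are dropped, and column order follows the first file, so files with shuffled columns still line up.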