Import Wizard (Value Type Determination) for every way of importing data
spitfire_ch
Hi,
When importing data with the wizard, Step 3, which lets you set the type and role of each attribute, is extremely useful. Unfortunately, when you use an import operator instead of the File/Import Data menu, this wizard is not available, or at least I haven't found a way to open it. For example, with the Read Excel operator the only thing I can influence is the "first row as names" option. I know I can change the role of individual attributes with the Exchange Roles operator, but it would be more straightforward if you could open the import wizard by, let's say, double-clicking the import operator.
In addition, you can already view the attributes of an existing repository entry at design time, which is very handy. However, I couldn't find a way to change the roles or types of the attributes in an existing repository entry. Again, it would be very nice if you could not only display the meta data but also edit it, without having to reimport the data from scratch.
Thank you for considering this, and of course for all the effort you invest in this absolutely great tool! Data mining has never been this convenient before!
Hanspeter
Answers
I completely agree: it would indeed be really nice if you could define the meta data directly at the operator and also after the import into the repository. As far as I know, the first point (at the operator) will be addressed in the next version. At least I believe I recently saw a button for starting the wizard among the parameters of the "Read Excel" operator in the developer version.
The second idea (changing the meta data in the repository) might be trickier, and I am not sure it can be solved in general. There are two problems. First, there might already be processes that depend on the meta data as it was. If you change it, those processes might no longer run. Of course, this is something that could be checked. Moreover, the user already has to ensure that the data is not changed, removed, or moved, so one could argue that this falls within the user's responsibility anyway. The second problem might be bigger: many changes will not be possible without re-importing the data. For example, if your original data contained both a date and a time but you specified only the value type "date" during import, the time information may have been lost. Role changes, however, should be possible in any case, so we will certainly discuss which parts of the meta data could be made editable in a future version.
Thanks for those nice ideas. All the best,
Ingo
That's great news!
I hadn't fully considered the problems you mention for the second idea. You are totally right; that would be tricky. Maybe one could create a second repository entry based on the original one to avoid the first problem. The second problem might be addressed if the repository contained not only the data as processed during import but, optionally, also the original data it came from (at least all the attributes with their original data types). That would provide somewhat more flexibility. But the first idea seems more important anyway, so the second might not be worth the effort.
It's excellent that the first idea is already planned!
Thanks a lot and best regards to Germany from Switzerland
Yes, creating copies, or letting the user do so while changing the meta data, would be an option. Keeping the original data might be more of a problem, and I am also not sure this would be desired in all cases. Anyway, thanks for your thoughts; we will consider them during our next developer meeting.
Cheers and all the best to the (surely sunny?) south,
Ingo
Sun? Haven't seen that for a while now.
All the best to the north,
Hanspeter