Import Wizard (Value Type Determination) for every way of importing data

spitfire_ch Member Posts: 38 Guru
edited June 2019 in Help

When importing data using the wizard, Step 3, which allows you to determine the type and role of each attribute, is extremely useful. Unfortunately, when using an import operator rather than the File/Import Data menu, this wizard is not available - or at least I haven't found a way to open it. For example, when using the Read Excel operator, the only thing I can influence is "first row as names". I know there is the option to change the role of individual attributes using the Exchange Roles operator, but it would be more straightforward if you could open the import wizard by - let's say - double-clicking the import operator.

In addition, you can display the attributes of an existing data repository at design time, which is very handy. However, I couldn't find a way to change the roles or types of attributes in an existing repository. Again, it would be very nice if you could not only display the meta data, but also edit it (without having to re-import the data from scratch).
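
To make the request concrete, here is a minimal Python sketch of per-attribute meta data (a value type and a role) that is declared at import time and can be edited afterwards. It does not use RapidMiner; `AttributeMeta`, `read_with_schema`, and the sample data are hypothetical names invented for illustration.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime
from typing import Callable


@dataclass
class AttributeMeta:
    """Hypothetical per-attribute meta data: a value parser and a role."""
    parse: Callable[[str], object]
    role: str = "regular"  # e.g. "regular", "label", "id"


# Illustrative schema, analogous to the choices made in Step 3 of the wizard.
schema = {
    "id": AttributeMeta(int, role="id"),
    "measured": AttributeMeta(lambda s: datetime.strptime(s, "%Y-%m-%d").date()),
    "outcome": AttributeMeta(str, role="label"),
}

# Made-up sample data standing in for an imported file.
raw = "id,measured,outcome\n1,2010-03-01,yes\n2,2010-03-02,no\n"


def read_with_schema(text, schema):
    """Parse every column with the value type declared for it in the schema."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {name: schema[name].parse(value) for name, value in row.items()}
        for row in reader
    ]


rows = read_with_schema(raw, schema)

# Editing a role only touches the meta data; the stored values stay as they are.
schema["outcome"].role = "regular"
```

In this sketch the role is just a label next to the data, so editing it afterwards is cheap, while the value type determines how each value was parsed during import.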

Thank you for considering this - and of course for all the effort you invest in this absolutely great tool! Data mining has never been this convenient before!



  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hello Hanspeter,

    I completely agree, it would indeed be really nice if you could also define the meta data directly at the operator and also after the import into the repository. The first point (at the operator) will be addressed in the next version, as far as I know. At least I seem to remember recently seeing a button for starting the wizard as one of the parameters of the "Read Excel" operator in the developer version.

    The second idea (changing the meta data in the repository) might be trickier, and I am not sure it can be solved in general. There are two problems. First, there might already be processes which depend on the meta data as it was; if you change it, those processes might no longer run. Of course, this is something which could be checked for. And since the user already has to ensure that the data is not changed, removed, or moved, one could argue that this is within the user's responsibility anyway. The second problem might be bigger: many changes will not be possible without re-importing the data. For example, if your original data had a date and a time but you only specified the value type "date" during import, the time information might be lost. Role changes, however, should be possible in any case, and hence we will certainly discuss which parts of the meta data could be changed in a future version.

    Thanks for those nice ideas. All the best,
  • spitfire_ch Member Posts: 38 Guru
    Hello Ingo,

    that's great news!

    I didn't fully consider the problems you mention for the second idea. You are totally right, that would be tricky. Maybe one could create a second repository based on the original one to avoid the first problem. The second problem might be addressed if the repository contained not only the data as processed during import, but (optionally) also the original data it came from (at least all the attributes with their original data type). That would provide somewhat more flexibility. But the first idea seems to be more important anyway, so it might not be worth the effort.

    It is excellent that the first idea is already planned!

    Thanks a lot and best regards to Germany from Switzerland

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder

    Yes, creating copies, or letting the user do this during the meta data change, would be an option. Keeping the original data might be more of a problem, and I am also not sure if this would be desired in all cases. Anyway, thanks for your thoughts - we will consider them during our next developer meeting.

    Cheers and all the best into the (certainly sunny?) south,
  • spitfire_ch Member Posts: 38 Guru
    No, I certainly wouldn't keep the original data in all cases - it could just be an option for users who absolutely want to be able to change the types / roles after creating the repository (e.g. to investigate how different data types influence an algorithm). But I would not set this option to true by default.

    Sun? Not seen that for a while now ;)

    All the best into the north
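
The information-loss problem raised in the thread (a value with date and time imported with value type "date") can be illustrated with a minimal Python sketch. It uses only the standard library's `datetime` module, not RapidMiner itself; the sample value and variable names are made up for illustration.

```python
from datetime import datetime

# Made-up source value that carries both a date and a time.
original = "2010-03-01 14:30:00"
fmt = "%Y-%m-%d %H:%M:%S"

# Import with value type "date": only the date part is stored.
stored_as_date = datetime.strptime(original, fmt).date()

# Changing the type to date-time later cannot bring the time back;
# the best one can do is pad the stored date with midnight.
retyped = datetime.combine(stored_as_date, datetime.min.time())
print(retyped)  # 2010-03-01 00:00:00

# If the original string had been kept alongside the imported value
# (the optional idea discussed above), re-parsing it would be lossless.
reparsed = datetime.strptime(original, fmt)
print(reparsed)  # 2010-03-01 14:30:00
```

The sketch shows why a type change generally requires access to the source data: once the time component has been dropped during import, no meta data edit can recover it.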