🦉 🎤   RapidMiner Wisdom 2020 - CALL FOR SPEAKERS DEADLINE IS NOVEMBER 15   🦉 🎤

JSON file rotation

Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,267   Unicorn
edited December 2018 in Product Feedback - Resolved

Raw JSON files often contain data for multiple examples in a repeated array format.  However, the current "JSON to Data" operator imports all fields into a single row, in effect ignoring the array structure and treating each JSON file as if it contained only a single record.

 

It is possible, with a lot of extra post-processing effort, to turn that into a typical dataset, with separate rows for each example and the same attribute set for all examples, using a combination of Pivot, Transpose, Generate Attributes, Split, etc.  However, this transformation should really be an automatic part of the initial import process, or at least an option.
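
To make the two shapes concrete, here is a minimal Python sketch that runs outside RapidMiner. The sample payload, the `results` key, and the helper names `flatten` and `array_to_rows` are all hypothetical, and the dotted attribute names only approximate what a flattening import produces; they are not the operator's actual naming.

```python
import json

# Hypothetical API response: one array holding three records.
raw = '''
{"results": [
  {"id": 1, "name": "alpha", "score": 0.5},
  {"id": 2, "name": "beta"},
  {"id": 3, "name": "gamma", "score": 0.9, "tag": "x"}
]}
'''


def flatten(value, prefix=""):
    """Flatten nested JSON into one dict of path -> scalar (single-row view)."""
    out = {}
    if isinstance(value, dict):
        for key, child in value.items():
            out.update(flatten(child, f"{prefix}{key}."))
    elif isinstance(value, list):
        for i, child in enumerate(value):
            out.update(flatten(child, f"{prefix}{i}."))
    else:
        out[prefix.rstrip(".")] = value
    return out


def array_to_rows(records):
    """Build one row per array element; all rows share the union of keys."""
    columns = []
    for rec in records:
        for key in rec:
            if key not in columns:
                columns.append(key)
    # Keys missing from a record become None (a missing value).
    rows = [[rec.get(col) for col in columns] for rec in records]
    return columns, rows


doc = json.loads(raw)
single_row = flatten(doc)                       # everything in one wide row
columns, rows = array_to_rows(doc["results"])   # one row per example

print(len(single_row), "attributes in the single-row view")
print(columns)
for row in rows:
    print(row)
```

The single-row view grows with every record in the array, while the row-per-example view keeps a fixed attribute set and marks absent fields as missing, which is the shape the post-processing chain above is trying to reach.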

 

JSON files are becoming more and more popular as the returned format for API calls and web services, and it is a shame that RapidMiner handles them so poorly in its current implementation.  Enhancing the Read JSON operator would go a long way toward making it more functional for working with that type of semi-structured data.

Brian T.
Lindon Ventures 
Data Science Consulting from Certified RapidMiner Experts
9 votes

Declined · Last Updated

A very good solution is available with the OWC Web Automation extension.

Comments

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,581  Community Manager

    YES YES YES YES. Thank you @Telcontar120 You're speaking my language!

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761   Unicorn

    Yes, I vote for this. I've been forced to manipulate JSON outside of RapidMiner before loading it in and it's not an 'elegant' solution. 

  • tftemme Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, RMResearcher, Member Posts: 118  RM Research

     

    For the time being, you may want to take a look at the 'Split Document into Collection' operator from the Operator Toolbox Extension. If a specific character (e.g. the end-of-line character \n) separates the JSON strings of your different examples, you can use the operator to split your single input document into a collection of documents. You can then feed this collection into the JSON to Data operator to convert it into an ExampleSet with more than one example.

    I have to say, this will not solve the issue with an array of JSON objects in one input document out of the box, but maybe the operator is useful in other cases.
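
    As a rough illustration of the idea, and of its limit, here is a small Python sketch that runs outside RapidMiner. It is not the operator's implementation; the helper name `split_ndjson` and the sample strings are hypothetical.

```python
import json


def split_ndjson(text):
    """Parse one JSON object per non-empty line (newline-delimited JSON)."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Case 1: one JSON object per line, so splitting on "\n" works.
ndjson = '{"id": 1, "name": "alpha"}\n{"id": 2, "name": "beta"}\n'
records = split_ndjson(ndjson)

# Case 2: a pretty-printed array spans several lines, so the individual
# lines are not valid JSON on their own and line splitting fails.
array_doc = '[\n  {"id": 1},\n  {"id": 2}\n]'
try:
    split_ndjson(array_doc)
    array_split_ok = True
except json.JSONDecodeError:
    array_split_ok = False

# The array has to be parsed as a whole document instead.
array_records = json.loads(array_doc)

print(records)
print("line split worked on array:", array_split_ok)
print(array_records)
```

    Splitting on a separator only helps when each example is already a self-contained JSON string; an array in a single document still needs to be parsed as a whole.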

  • SGolbert RapidMiner Certified Analyst, Member Posts: 341   Unicorn

    I agree. I think that having a wizard like in Read XML would be the best.

  • rfuentealba Moderator, RapidMiner Certified Analyst, Member, University Professor Posts: 426   Unicorn

    Hey people,

     

    I am designing an extension (though I have not started coding it yet) to create this kind of complex structure, taking some ideas from @mschmitz, the suggestions by @SGolbert in this same thread, and some experience from a Ruby gem I built years ago. I think I'm going to need some help, because I don't know how to expose this properly through RapidMiner Server, or whether it's feasible.

     

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,581  Community Manager

    hi @rfuentealba that's awesome! I hope you know about our developer resources. :) 

  • rfuentealba Moderator, RapidMiner Certified Analyst, Member, University Professor Posts: 426   Unicorn

    Hi @sgenzer, yes I do. I have some questions, but they are outside the scope of this idea. I'll send you a PM once I finish what I'm doing ;)

  • SGolbert RapidMiner Certified Analyst, Member Posts: 341   Unicorn

    I have noticed that the developer tutorial on GitHub doesn't correspond to the one in the documentation (the one on GitHub is about game data processing).

     

    Additionally, my text editor (Atom) has trouble finding the import statements with the java-importer plugin, but that's what I get for not using the right tools (IDEA or Eclipse).
