JSON to data and de-pivot with missing attributes and values

eduard_wagnereduard_wagner Member Posts: 3 Contributor I
edited December 2018 in Help

Hi everyone,

 

after all i could not find out how to get this problem fixed: i have a JSON file that is inconsistent with its attribute and values, some are missing. Like this:

[{"date":1465632900,"high":0.00199281,"low":0.00199281,"open":0.00199281,"close":0.00199281,"volume":0.00078269,"quoteVolume":0.39276167,"weightedAverage":0.00199281},{"date":1465633200, "low":0.00199281,"open":0.00199281,"close":0.00199281,"volume":0.00034535,"quoteVolume":0.17329899,"weightedAverage":0.00199281}]

As you see for the second entry there is the seconed attribute "high" and its value missing. Its not possible to De-Pivot the JSON Data without creating the missing attributes+values before, this error shows up: "series must have the same lenght: no conversion is performed. I tried the "Declare missing value", but of cause this needs the attribute before correcting any value.

 

Unfortunataly the dataset is pretty large. Is there an way/operator to get the process fixed?

Thanks a lot!

 

Best Answer

  • sgenzersgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    Solution Accepted

    Hello @eduard_wagner - yes this gets rather icky with JSON arrays.  :)  This is a workaround for you:

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="text:create_document" compatibility="7.5.000" expanded="true" height="68" name="Create Document" width="90" x="45" y="34">
    <parameter key="text" value="[{&quot;date&quot;:1465632900,&quot;high&quot;:0.00199281,&quot;low&quot;:0.00199281,&quot;open&quot;:0.00199281,&quot;close&quot;:0.00199281,&quot;volume&quot;:0.00078269,&quot;quoteVolume&quot;:0.39276167,&quot;weightedAverage&quot;:0.00199281},{&quot;date&quot;:1465633200, &quot;low&quot;:0.00199281,&quot;open&quot;:0.00199281,&quot;close&quot;:0.00199281,&quot;volume&quot;:0.00034535,&quot;quoteVolume&quot;:0.17329899,&quot;weightedAverage&quot;:0.00199281}]"/>
    </operator>
    <operator activated="true" class="text:json_to_data" compatibility="7.5.000" expanded="true" height="82" name="JSON To Data" width="90" x="179" y="34"/>
    <operator activated="true" class="transpose" compatibility="7.6.001" expanded="true" height="82" name="Transpose" width="90" x="313" y="34"/>
    <operator activated="true" class="split" compatibility="7.6.001" expanded="true" height="82" name="Split" width="90" x="447" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="id"/>
    <parameter key="include_special_attributes" value="true"/>
    <parameter key="split_pattern" value="\][.]"/>
    </operator>
    <operator activated="true" class="replace" compatibility="7.6.001" expanded="true" height="82" name="Replace" width="90" x="581" y="34">
    <parameter key="replace_what" value="\["/>
    </operator>
    <operator activated="true" class="pivot" compatibility="7.6.001" expanded="true" height="82" name="Pivot" width="90" x="715" y="34">
    <parameter key="group_attribute" value="id_1"/>
    <parameter key="index_attribute" value="id_2"/>
    <parameter key="consider_weights" value="false"/>
    </operator>
    <operator activated="true" class="rename_by_replacing" compatibility="7.6.001" expanded="true" height="82" name="Rename by Replacing" width="90" x="835" y="34">
    <parameter key="replace_what" value=".*1[_]"/>
    </operator>
    <connect from_op="Create Document" from_port="output" to_op="JSON To Data" to_port="documents 1"/>
    <connect from_op="JSON To Data" from_port="example set" to_op="Transpose" to_port="example set input"/>
    <connect from_op="Transpose" from_port="example set output" to_op="Split" to_port="example set input"/>
    <connect from_op="Split" from_port="example set output" to_op="Replace" to_port="example set input"/>
    <connect from_op="Replace" from_port="example set output" to_op="Pivot" to_port="example set input"/>
    <connect from_op="Pivot" from_port="example set output" to_op="Rename by Replacing" to_port="example set input"/>
    <connect from_op="Rename by Replacing" from_port="example set output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

    Scott

Answers

  • sgenzersgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    Hello @eduard_wagner - welcome to the community.  So there are a few ways to parse out JSON files: 1) use the "JSON to Data" operator; 2) use the "JSON to XML" operator to convert to XML, and then use "Read XML", 3) if you're getting the JSON from a webservice, you can use the built-in JSON parser in the Enrich Data via Webservice operator.  These operators are in the Text Processing / Web Mining extensions (accessible via the Marketplace).

     

    If you need more help, perhaps post your RapidMiner process in this thread as XML using the </> tool.  It makes it much easier for us to help.

     

    Scott

  • eduard_wagnereduard_wagner Member Posts: 3 Contributor I

    Thanks for your help @sgenzer. I got the process as you discribed, here is the xml:

     

    Create Document  --> JSON to Data --> De-Pivot --> res

    (for this example i use a reduced JSON dataset)

     

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.5.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.5.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="text:create_document" compatibility="7.4.001" expanded="true" height="68" name="Create Document" width="90" x="112" y="34">
    <parameter key="text" value="[{&quot;date&quot;:1465632900,&quot;high&quot;:0.00199281,&quot;low&quot;:0.00199281,&quot;open&quot;:0.00199281,&quot;close&quot;:0.00199281,&quot;volume&quot;:0.00078269,&quot;quoteVolume&quot;:0.39276167,&quot;weightedAverage&quot;:0.00199281},{&quot;date&quot;:1465633200, &quot;low&quot;:0.00199281,&quot;open&quot;:0.00199281,&quot;close&quot;:0.00199281,&quot;volume&quot;:0.00034535,&quot;quoteVolume&quot;:0.17329899,&quot;weightedAverage&quot;:0.00199281}]"/>
    </operator>
    <operator activated="true" class="text:json_to_data" compatibility="7.4.001" expanded="true" height="82" name="JSON To Data" width="90" x="246" y="34"/>
    <operator activated="true" class="de_pivot" compatibility="7.5.001" expanded="true" height="82" name="De-Pivot" width="90" x="648" y="34">
    <list key="attribute_name">
    <parameter key="close" value="\[\d+\]\.close"/>
    <parameter key="high" value="\[\d+\]\.high"/>
    <parameter key="low" value="\[\d+\]\.low"/>
    <parameter key="open" value="\[\d+\]\.open"/>
    <parameter key="date" value="\[\d+\]\.date"/>
    <parameter key="volume" value="\[\d+\]\.volume"/>
    <parameter key="quoteVolume" value="\[\d+\]\.quoteVolume"/>
    <parameter key="weightedAverage" value="\[\d+\]\.weightedAverage"/>
    </list>
    <parameter key="index_attribute" value="id"/>
    </operator>
    <connect from_op="Create Document" from_port="output" to_op="JSON To Data" to_port="documents 1"/>
    <connect from_op="JSON To Data" from_port="example set" to_op="De-Pivot" to_port="example set input"/>
    <connect from_op="De-Pivot" from_port="example set output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

     

     

    The process error occurs in the De-pivot operator, when it tries to create the attributes and fill in the values. Problem is, that the attribute/JSON key "high" and its value is missing for the second object of the JSON dataset. 

    Is there a way that the depivot operator skips missing attributes/values or inserts predefined values? I tried to catch the error with the "Replace / Declare missung value" operators, but they cant be applied, since the attributes are not specified before depivoting. Also the "Generate Attribute" operator didnt work.

     

    Hope you see the problem in the xml :) 

     

    Thanks a lot!

    Eduard

  • eduard_wagnereduard_wagner Member Posts: 3 Contributor I

    It works, awesome, thanks @sgenzer! I see, the pivot operator groups according to the id attribute and when pivoting the missing values are shown as question marks. 

  • danhleedanhlee Member Posts: 2 Contributor I

    Where do you place this XML code?

  • sgenzersgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
  • danhleedanhlee Member Posts: 2 Contributor I

    Hey @sgenzer

     

    Thanks so much for the info! I'll check it out now.

Sign In or Register to comment.