[SOLVED] Data To JSON operator issue with array greater than 10 items

mrmikev Member Posts: 13 Contributor II
edited November 2018 in Help
The result of our process gets saved as an array of arrays in MongoDB.  When the data is run through the Data to JSON operator, here's what we get:
{ "inputs" : [ { "matrix" : [ { "amount" : 0 },
           { "amount" : 10 },
           { "amount" : 20 },
           { "amount" : 30 },
           { "amount" : 40 },
           { "amount" : 50 },
           { "amount" : 60 },
           { "amount" : 70 },
           { "amount" : 80 },
           { "amount" : 90 }
         ],
       "matrix[10]" : { "amount" : 100 },
       "matrix[11]" : { "amount" : 110 },
       "matrix[12]" : { "amount" : 120 }
     },
     { "matrix" : [ { "amount" : 0 },
           { "amount" : 10 },
           { "amount" : 20 },
           { "amount" : 30 },
           { "amount" : 40 },
           { "amount" : 50 },
           { "amount" : 60 },
           { "amount" : 70 },
           { "amount" : 80 },
           { "amount" : 90 }
         ],
       "matrix[10]" : { "amount" : 100 },
       "matrix[11]" : { "amount" : 110 },
       "matrix[12]" : { "amount" : 120 }
     }
   ] }
Here's what we expect:
{ "inputs" : [ { "matrix" : [ { "amount" : 0 },
           { "amount" : 10 },
           { "amount" : 20 },
           { "amount" : 30 },
           { "amount" : 40 },
           { "amount" : 50 },
           { "amount" : 60 },
           { "amount" : 70 },
           { "amount" : 80 },
           { "amount" : 90 },
           { "amount" : 100 },
           { "amount" : 110 },
           { "amount" : 120 }
         ] },
     { "matrix" : [ { "amount" : 0 },
           { "amount" : 10 },
           { "amount" : 20 },
           { "amount" : 30 },
           { "amount" : 40 },
           { "amount" : 50 },
           { "amount" : 60 },
           { "amount" : 70 },
           { "amount" : 80 },
           { "amount" : 90 },
           { "amount" : 100 },
           { "amount" : 110 },
           { "amount" : 120 }
         ] }
   ] }
In fact, if we start with the expected outcome data in a Create Document operator, run it through JSON to Data (all looks good so far!), then directly through Data to JSON, we still get the malformed results.  I've attached a sample process that demonstrates as much:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="6.3.000">
 <context>
   <input/>
   <output/>
   <macros/>
 </context>
 <operator activated="true" class="process" compatibility="6.3.000" expanded="true" name="Process">
   <process expanded="true">
     <operator activated="true" class="text:create_document" compatibility="6.1.000" expanded="true" height="60" name="Create Document" width="90" x="112" y="120">
       <parameter key="text" value="{&quot;inputs&quot;:[{&quot;matrix&quot;:[{&quot;amount&quot;:0},{&quot;amount&quot;:10},{&quot;amount&quot;:20},{&quot;amount&quot;:30},{&quot;amount&quot;:40},{&quot;amount&quot;:50},{&quot;amount&quot;:60},{&quot;amount&quot;:70},{&quot;amount&quot;:80},{&quot;amount&quot;:90},{&quot;amount&quot;:100},{&quot;amount&quot;:110},{&quot;amount&quot;:120}]},{&quot;matrix&quot;:[{&quot;amount&quot;:0},{&quot;amount&quot;:10},{&quot;amount&quot;:20},{&quot;amount&quot;:30},{&quot;amount&quot;:40},{&quot;amount&quot;:50},{&quot;amount&quot;:60},{&quot;amount&quot;:70},{&quot;amount&quot;:80},{&quot;amount&quot;:90},{&quot;amount&quot;:100},{&quot;amount&quot;:110},{&quot;amount&quot;:120}]}]}"/>
     </operator>
     <operator activated="true" class="text:json_to_data" compatibility="6.1.000" expanded="true" height="76" name="JSON To Data" width="90" x="246" y="120"/>
     <operator activated="true" class="text:data_to_json" compatibility="6.1.000" expanded="true" height="76" name="Data To JSON" width="90" x="380" y="120"/>
     <connect from_op="Create Document" from_port="output" to_op="JSON To Data" to_port="documents 1"/>
     <connect from_op="JSON To Data" from_port="example set" to_op="Data To JSON" to_port="example set 1"/>
     <connect from_op="Data To JSON" from_port="documents" to_port="result 1"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="0"/>
     <portSpacing port="sink_result 2" spacing="0"/>
   </process>
 </operator>
</process>
It appears second-level array items with array indices of more than one digit do not get interpreted appropriately.
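For illustration, the symptom is consistent with an index-matching regular expression that only accepts a single digit. This is a hypothetical Python sketch of that failure mode, not the extension's actual code:

```python
import re

# Hypothetical patterns: a buggy one that matches only a single-digit
# index, and a corrected one that matches one or more digits.
buggy_pattern = re.compile(r"^(\w+)\[([0-9])\]$")    # one digit only
fixed_pattern = re.compile(r"^(\w+)\[([0-9]+)\]$")   # one or more digits

names = ["matrix[9]", "matrix[10]", "matrix[12]"]

# The buggy pattern fails to recognize "matrix[10]" and "matrix[12]"
# as array elements, so they would be emitted as standalone keys.
print([bool(buggy_pattern.match(n)) for n in names])  # [True, False, False]
print([bool(fixed_pattern.match(n)) for n in names])  # [True, True, True]
```

With the single-digit pattern, attributes like `matrix[10]` are no longer recognized as members of the `matrix` array, which would produce exactly the standalone `"matrix[10]"` keys shown above.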

Thank you in advance for your help!

Thank you for getting this resolved promptly with the 6.4.1 Text Mining Extension release.

Answers

  • MichaelKnopf Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 31 RM Data Scientist
    Thank you for reporting this issue. Your description and example process have been very helpful to reproduce this problem. It is definitely a bug in our implementation.

    Unfortunately, I see no way to work around it with the current release of the Text Processing extension.

    We will try to fix this as soon as possible. I'll keep you updated.
  • MichaelKnopf Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 31 RM Data Scientist
    It appears second-level array items with array indices of more than one digit do not get interpreted appropriately.
    Spot on. Caused by two left-over brackets in a regular expression.

    We hope to release an update of the extension by the end of this week or the beginning of next week.
  • MichaelKnopf Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 31 RM Data Scientist
    And it's out!
  • mrmikev Member Posts: 13 Contributor II
    Great!  I'll download it, put it through its paces, then mark this as solved.

    Thank you for the prompt turn-around on this! :)