[solved] Issue with Extract Information operator JsonPath query type

mrmikev Member Posts: 13 Contributor II
edited November 2018 in Help
I'm trying a basic example of the Extract Information operator with the JsonPath query type. No matter how I structure the JsonPath query expression, I get either the entire document or an error:
  • $.store.book yields the entire document, not just the books.
  • $.store.book[0] yields: Process Failed. net.minidev.json.JSONObject cannot be cast to net.minidev.json.JSONArray.
Both of the above JsonPath queries work as expected against the same sample JSON document in a JsonPath expression tester.
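
For reference, the results these two queries are expected to produce can be sketched in plain Python. This is only an illustration that uses the standard `json` module on a shortened copy of the sample document. It does not use RapidMiner or a JsonPath library, and the variable names are made up for the example.

```python
import json

# Shortened copy of the sample document (two books instead of four).
doc = json.loads("""
{ "store": {
    "book": [
      { "category": "reference", "author": "Nigel Rees",
        "title": "Sayings of the Century", "price": 8.95 },
      { "category": "fiction", "author": "Evelyn Waugh",
        "title": "Sword of Honour", "price": 12.99 }
    ],
    "bicycle": { "color": "red", "price": 19.95 }
  }
}
""")

# Equivalent of $.store.book: the list of books only, without "bicycle".
books = doc["store"]["book"]

# Equivalent of $.store.book[0]: a single object, not a list.
first_book = doc["store"]["book"][0]

print(type(books).__name__, len(books))
print(type(first_book).__name__, first_book["title"])
```

Note that the second result is a single object rather than an array, which are the same two types named in the cast error above.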

Here's the process:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="6.3.000">
 <context>
   <input/>
   <output/>
   <macros/>
 </context>
 <operator activated="true" class="process" compatibility="6.3.000" expanded="true" name="Process">
   <process expanded="true">
     <operator activated="true" class="text:create_document" compatibility="6.1.000" expanded="true" height="60" name="Create Document" width="90" x="45" y="30">
       <parameter key="text" value="{ &quot;store&quot;: {&#10;    &quot;book&quot;: [ &#10;      { &quot;category&quot;: &quot;reference&quot;,&#10;        &quot;author&quot;: &quot;Nigel Rees&quot;,&#10;        &quot;title&quot;: &quot;Sayings of the Century&quot;,&#10;        &quot;price&quot;: 8.95&#10;      },&#10;      { &quot;category&quot;: &quot;fiction&quot;,&#10;        &quot;author&quot;: &quot;Evelyn Waugh&quot;,&#10;        &quot;title&quot;: &quot;Sword of Honour&quot;,&#10;        &quot;price&quot;: 12.99&#10;      },&#10;      { &quot;category&quot;: &quot;fiction&quot;,&#10;        &quot;author&quot;: &quot;Herman Melville&quot;,&#10;        &quot;title&quot;: &quot;Moby Dick&quot;,&#10;        &quot;isbn&quot;: &quot;0-553-21311-3&quot;,&#10;        &quot;price&quot;: 8.99&#10;      },&#10;      { &quot;category&quot;: &quot;fiction&quot;,&#10;        &quot;author&quot;: &quot;J. R. R. Tolkien&quot;,&#10;        &quot;title&quot;: &quot;The Lord of the Rings&quot;,&#10;        &quot;isbn&quot;: &quot;0-395-19395-8&quot;,&#10;        &quot;price&quot;: 22.99&#10;      }&#10;    ],&#10;    &quot;bicycle&quot;: {&#10;      &quot;color&quot;: &quot;red&quot;,&#10;      &quot;price&quot;: 19.95&#10;    }&#10;  }&#10;}"/>
     </operator>
     <operator activated="true" class="text:extract_information" compatibility="6.1.000" expanded="true" height="60" name="Extract Information" width="90" x="447" y="30">
       <parameter key="query_type" value="JsonPath"/>
       <list key="string_machting_queries"/>
       <list key="regular_expression_queries"/>
       <list key="regular_region_queries"/>
       <list key="xpath_queries"/>
       <list key="namespaces"/>
       <list key="index_queries"/>
       <list key="jsonpath_queries">
         <parameter key="booksOnly" value="$.store.book"/>
       </list>
     </operator>
     <connect from_op="Create Document" from_port="output" to_op="Extract Information" to_port="document"/>
     <connect from_op="Extract Information" from_port="document" to_port="result 1"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="0"/>
     <portSpacing port="sink_result 2" spacing="0"/>
   </process>
 </operator>
</process>
Any guidance on how JsonPath query expressions should be written for RapidMiner is appreciated.

RapidMiner Studio 6.3.0000 (rev: 251598) - Professional Plus
Windows 8.1

Answers

  • homburg Moderator, Employee, Member Posts: 114 RM Data Scientist
    Hi mrmikev,

    The problem here is that you need a "Documents to Data" operator in order to make use of the metadata that "Extract Information" generates. Even then, only the first item of a list is shown. You can use "Cut Document" to get a collection of those items and "Combine Documents" to merge them into one line. Here is a process that shows how to do that:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="6.3.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="6.3.001" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="text:create_document" compatibility="6.1.000" expanded="true" height="60" name="Create Document" width="90" x="45" y="30">
            <parameter key="text" value="{ &quot;store&quot;: {&#10;    &quot;book&quot;: [ &#10;      { &quot;category&quot;: &quot;reference&quot;,&#10;        &quot;author&quot;: &quot;Nigel Rees&quot;,&#10;        &quot;title&quot;: &quot;Sayings of the Century&quot;,&#10;        &quot;price&quot;: 8.95&#10;      },&#10;      { &quot;category&quot;: &quot;fiction&quot;,&#10;        &quot;author&quot;: &quot;Evelyn Waugh&quot;,&#10;        &quot;title&quot;: &quot;Sword of Honour&quot;,&#10;        &quot;price&quot;: 12.99&#10;      },&#10;      { &quot;category&quot;: &quot;fiction&quot;,&#10;        &quot;author&quot;: &quot;Herman Melville&quot;,&#10;        &quot;title&quot;: &quot;Moby Dick&quot;,&#10;        &quot;isbn&quot;: &quot;0-553-21311-3&quot;,&#10;        &quot;price&quot;: 8.99&#10;      },&#10;      { &quot;category&quot;: &quot;fiction&quot;,&#10;        &quot;author&quot;: &quot;J. R. R. Tolkien&quot;,&#10;        &quot;title&quot;: &quot;The Lord of the Rings&quot;,&#10;        &quot;isbn&quot;: &quot;0-395-19395-8&quot;,&#10;        &quot;price&quot;: 22.99&#10;      }&#10;    ],&#10;    &quot;bicycle&quot;: {&#10;      &quot;color&quot;: &quot;red&quot;,&#10;      &quot;price&quot;: 19.95&#10;    }&#10;  }&#10;}"/>
          </operator>
          <operator activated="true" class="multiply" compatibility="6.3.001" expanded="true" height="94" name="Multiply" width="90" x="179" y="30"/>
          <operator activated="true" class="text:extract_information" compatibility="6.1.000" expanded="true" height="60" name="Extract Information" width="90" x="313" y="30">
            <parameter key="query_type" value="JsonPath"/>
            <list key="string_machting_queries"/>
            <list key="regular_expression_queries"/>
            <list key="regular_region_queries"/>
            <list key="xpath_queries"/>
            <list key="namespaces"/>
            <list key="index_queries"/>
            <list key="jsonpath_queries">
              <parameter key="booksOnly" value="$.store.book.title"/>
            </list>
          </operator>
          <operator activated="true" breakpoints="after" class="text:documents_to_data" compatibility="6.1.000" expanded="true" height="76" name="Documents to Data" width="90" x="447" y="30">
            <parameter key="text_attribute" value="text"/>
          </operator>
          <operator activated="true" breakpoints="after" class="text:cut_document" compatibility="6.1.000" expanded="true" height="60" name="Cut Document" width="90" x="313" y="120">
            <parameter key="query_type" value="JsonPath"/>
            <list key="string_machting_queries"/>
            <list key="regular_expression_queries"/>
            <list key="regular_region_queries"/>
            <list key="xpath_queries"/>
            <list key="namespaces"/>
            <list key="index_queries"/>
            <list key="jsonpath_queries">
              <parameter key="booksOnly" value="$.store.book.title"/>
            </list>
            <process expanded="true">
              <connect from_port="segment" to_port="document 1"/>
              <portSpacing port="source_segment" spacing="0"/>
              <portSpacing port="sink_document 1" spacing="0"/>
              <portSpacing port="sink_document 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="text:combine_documents" compatibility="6.1.000" expanded="true" height="76" name="Combine Documents" width="90" x="447" y="120"/>
          <connect from_op="Create Document" from_port="output" to_op="Multiply" to_port="input"/>
          <connect from_op="Multiply" from_port="output 1" to_op="Extract Information" to_port="document"/>
          <connect from_op="Multiply" from_port="output 2" to_op="Cut Document" to_port="document"/>
          <connect from_op="Extract Information" from_port="document" to_op="Documents to Data" to_port="documents 1"/>
          <connect from_op="Documents to Data" from_port="example set" to_port="result 1"/>
          <connect from_op="Cut Document" from_port="documents" to_op="Combine Documents" to_port="documents 1"/>
          <connect from_op="Combine Documents" from_port="document" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
    Cheers,
    Helge
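
The idea behind the two branches of the process above can be sketched in plain Python: a single extracted value shows only the first match, while cutting the document yields one segment per match, which can then be merged into one line. This is a rough illustration on a shortened copy of the sample document, not a description of how the RapidMiner operators are implemented. The names and the separator are assumptions made for the example.

```python
import json

# Shortened copy of the sample document; only the fields needed here.
doc = json.loads("""
{ "store": { "book": [
    { "author": "Nigel Rees", "title": "Sayings of the Century" },
    { "author": "Evelyn Waugh", "title": "Sword of Honour" },
    { "author": "J. R. R. Tolkien", "title": "The Lord of the Rings" }
] } }
""")

# One entry per matching title, comparable to the collection of
# segments that "Cut Document" returns.
segments = [book["title"] for book in doc["store"]["book"]]

# A single-valued field can hold only one match, so a view like this
# would show just the first title.
first_only = segments[0]

# Merge all segments into one line, comparable to "Combine Documents".
# The separator is an assumption made for this sketch.
SEPARATOR = ", "
combined = SEPARATOR.join(segments)

print(first_only)
print(combined)
```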