[solved] Issue with Extract Information operator JsonPath query type

mrmikev · April 2015

I'm trying a basic example of the Extract Information operator with the JsonPath query type. No matter how I structure the JsonPath query expression(s), I get either the entire document or an error:

$.store.book yields the entire document, not just the books.
$.store.book[0] yields: Process Failed. net.minidev.json.JSONObject cannot be cast to net.minidev.json.JSONArray.

Both of the above JsonPath queries work as expected on this sample JSON document in a JsonPath expression tester.
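For comparison, here is a minimal plain-Python sketch of what a standard JsonPath tester selects for these queries on the sample document. It is not a JsonPath engine and not RapidMiner's implementation. The paths are hand-translated into dictionary and list lookups using only the standard json module, and the $.store.book[*].title case is an added example that is not one of the original queries.

```python
import json

# The sample document from the process below ("Moby ****" in the forum copy is the
# filtered title "Moby Dick" from the well-known JsonPath example).
SAMPLE = """
{ "store": {
    "book": [
      { "category": "reference", "author": "Nigel Rees",
        "title": "Sayings of the Century", "price": 8.95 },
      { "category": "fiction", "author": "Evelyn Waugh",
        "title": "Sword of Honour", "price": 12.99 },
      { "category": "fiction", "author": "Herman Melville",
        "title": "Moby Dick", "isbn": "0-553-21311-3", "price": 8.99 },
      { "category": "fiction", "author": "J. R. R. Tolkien",
        "title": "The Lord of the Rings", "isbn": "0-395-19395-8", "price": 22.99 }
    ],
    "bicycle": { "color": "red", "price": 19.95 }
  }
}
"""

doc = json.loads(SAMPLE)

# $.store.book: hand-translated lookup, selects only the array of four books
books = doc["store"]["book"]

# $.store.book[0]: indexes into that array, giving a single book object
first_book = books[0]

# $.store.book[*].title (added example): one title per array element
titles = [book["title"] for book in books]

print(len(books))            # 4
print(first_book["title"])   # Sayings of the Century
print(titles)
```

In a tester, neither query returns the bicycle entry or the enclosing store object, which is the behaviour the operator was expected to reproduce.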

Here's the process:


<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="6.3.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="6.3.000" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="text:create_document" compatibility="6.1.000" expanded="true" height="60" name="Create Document" width="90" x="45" y="30">
        <parameter key="text" value="{ &quot;store&quot;: {&#10;    &quot;book&quot;: [ &#10;      { &quot;category&quot;: &quot;reference&quot;,&#10;        &quot;author&quot;: &quot;Nigel Rees&quot;,&#10;        &quot;title&quot;: &quot;Sayings of the Century&quot;,&#10;        &quot;price&quot;: 8.95&#10;      },&#10;      { &quot;category&quot;: &quot;fiction&quot;,&#10;        &quot;author&quot;: &quot;Evelyn Waugh&quot;,&#10;        &quot;title&quot;: &quot;Sword of Honour&quot;,&#10;        &quot;price&quot;: 12.99&#10;      },&#10;      { &quot;category&quot;: &quot;fiction&quot;,&#10;        &quot;author&quot;: &quot;Herman Melville&quot;,&#10;        &quot;title&quot;: &quot;Moby Dick&quot;,&#10;        &quot;isbn&quot;: &quot;0-553-21311-3&quot;,&#10;        &quot;price&quot;: 8.99&#10;      },&#10;      { &quot;category&quot;: &quot;fiction&quot;,&#10;        &quot;author&quot;: &quot;J. R. R. Tolkien&quot;,&#10;        &quot;title&quot;: &quot;The Lord of the Rings&quot;,&#10;        &quot;isbn&quot;: &quot;0-395-19395-8&quot;,&#10;        &quot;price&quot;: 22.99&#10;      }&#10;    ],&#10;    &quot;bicycle&quot;: {&#10;      &quot;color&quot;: &quot;red&quot;,&#10;      &quot;price&quot;: 19.95&#10;    }&#10;  }&#10;}"/>
      </operator>
      <operator activated="true" class="text:extract_information" compatibility="6.1.000" expanded="true" height="60" name="Extract Information" width="90" x="447" y="30">
        <parameter key="query_type" value="JsonPath"/>
        <list key="string_machting_queries"/>
        <list key="regular_expression_queries"/>
        <list key="regular_region_queries"/>
        <list key="xpath_queries"/>
        <list key="namespaces"/>
        <list key="index_queries"/>
        <list key="jsonpath_queries">
          <parameter key="booksOnly" value="$.store.book"/>
        </list>
      </operator>
      <connect from_op="Create Document" from_port="output" to_op="Extract Information" to_port="document"/>
      <connect from_op="Extract Information" from_port="document" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

Any direction on how JsonPath query expressions should be written for RapidMiner is appreciated.

RapidMiner Studio 6.3.0000 (rev: 251598) - Professional Plus
Windows 8.1

homburg · April 2015

Hi mrmikev,

The problem here is that you need a "Documents to Data" operator in order to use the metadata that "Extract Information" generates. The extracted values are stored as document metadata, so the document text at the operator's output is still the full JSON. Even then, only the first item of a list is shown. You can use "Cut Document" to get a collection of those items and "Combine Documents" to merge them into one line. Here is a process that shows how to do that:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="6.3.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="6.3.001" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="text:create_document" compatibility="6.1.000" expanded="true" height="60" name="Create Document" width="90" x="45" y="30">
        <parameter key="text" value="{ &quot;store&quot;: {&#10;    &quot;book&quot;: [ &#10;      { &quot;category&quot;: &quot;reference&quot;,&#10;        &quot;author&quot;: &quot;Nigel Rees&quot;,&#10;        &quot;title&quot;: &quot;Sayings of the Century&quot;,&#10;        &quot;price&quot;: 8.95&#10;      },&#10;      { &quot;category&quot;: &quot;fiction&quot;,&#10;        &quot;author&quot;: &quot;Evelyn Waugh&quot;,&#10;        &quot;title&quot;: &quot;Sword of Honour&quot;,&#10;        &quot;price&quot;: 12.99&#10;      },&#10;      { &quot;category&quot;: &quot;fiction&quot;,&#10;        &quot;author&quot;: &quot;Herman Melville&quot;,&#10;        &quot;title&quot;: &quot;Moby Dick&quot;,&#10;        &quot;isbn&quot;: &quot;0-553-21311-3&quot;,&#10;        &quot;price&quot;: 8.99&#10;      },&#10;      { &quot;category&quot;: &quot;fiction&quot;,&#10;        &quot;author&quot;: &quot;J. R. R. Tolkien&quot;,&#10;        &quot;title&quot;: &quot;The Lord of the Rings&quot;,&#10;        &quot;isbn&quot;: &quot;0-395-19395-8&quot;,&#10;        &quot;price&quot;: 22.99&#10;      }&#10;    ],&#10;    &quot;bicycle&quot;: {&#10;      &quot;color&quot;: &quot;red&quot;,&#10;      &quot;price&quot;: 19.95&#10;    }&#10;  }&#10;}"/>
      </operator>
      <operator activated="true" class="multiply" compatibility="6.3.001" expanded="true" height="94" name="Multiply" width="90" x="179" y="30"/>
      <operator activated="true" class="text:extract_information" compatibility="6.1.000" expanded="true" height="60" name="Extract Information" width="90" x="313" y="30">
        <parameter key="query_type" value="JsonPath"/>
        <list key="string_machting_queries"/>
        <list key="regular_expression_queries"/>
        <list key="regular_region_queries"/>
        <list key="xpath_queries"/>
        <list key="namespaces"/>
        <list key="index_queries"/>
        <list key="jsonpath_queries">
          <parameter key="booksOnly" value="$.store.book.title"/>
        </list>
      </operator>
      <operator activated="true" breakpoints="after" class="text:documents_to_data" compatibility="6.1.000" expanded="true" height="76" name="Documents to Data" width="90" x="447" y="30">
        <parameter key="text_attribute" value="text"/>
      </operator>
      <operator activated="true" breakpoints="after" class="text:cut_document" compatibility="6.1.000" expanded="true" height="60" name="Cut Document" width="90" x="313" y="120">
        <parameter key="query_type" value="JsonPath"/>
        <list key="string_machting_queries"/>
        <list key="regular_expression_queries"/>
        <list key="regular_region_queries"/>
        <list key="xpath_queries"/>
        <list key="namespaces"/>
        <list key="index_queries"/>
        <list key="jsonpath_queries">
          <parameter key="booksOnly" value="$.store.book.title"/>
        </list>
        <process expanded="true">
          <connect from_port="segment" to_port="document 1"/>
          <portSpacing port="source_segment" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="text:combine_documents" compatibility="6.1.000" expanded="true" height="76" name="Combine Documents" width="90" x="447" y="120"/>
      <connect from_op="Create Document" from_port="output" to_op="Multiply" to_port="input"/>
      <connect from_op="Multiply" from_port="output 1" to_op="Extract Information" to_port="document"/>
      <connect from_op="Multiply" from_port="output 2" to_op="Cut Document" to_port="document"/>
      <connect from_op="Extract Information" from_port="document" to_op="Documents to Data" to_port="documents 1"/>
      <connect from_op="Documents to Data" from_port="example set" to_port="result 1"/>
      <connect from_op="Cut Document" from_port="documents" to_op="Combine Documents" to_port="documents 1"/>
      <connect from_op="Combine Documents" from_port="document" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
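The Cut Document / Combine Documents route can be pictured with a small plain-Python analogy. Everything here is hypothetical: cut_titles, extract_first and combine are illustrative names standing in for the operators, not RapidMiner APIs. The space separator used when combining is an assumption of the sketch, not necessarily what Combine Documents does. extract_first mirrors the behaviour described above, where only the first list item is shown.

```python
import json

# Hypothetical stand-in for the "Create Document" text, reduced to the titles the query uses.
TEXT = json.dumps({"store": {"book": [
    {"title": "Sayings of the Century"},
    {"title": "Sword of Honour"},
    {"title": "Moby Dick"},
    {"title": "The Lord of the Rings"},
]}})


def cut_titles(text: str) -> list[str]:
    """Hypothetical analogue of "Cut Document": one segment per matched title."""
    data = json.loads(text)
    return [book["title"] for book in data["store"]["book"]]


def extract_first(text: str) -> str:
    """Hypothetical analogue of the described "Extract Information" result: first match only."""
    return cut_titles(text)[0]


def combine(segments: list[str], sep: str = " ") -> str:
    """Hypothetical analogue of "Combine Documents": merge segments into one text.

    The separator is an assumption of this sketch.
    """
    return sep.join(segments)


segments = cut_titles(TEXT)
print(extract_first(TEXT))   # Sayings of the Century
print(combine(segments))     # all four titles on one line
```

The design point is simply that extraction keeps one value, while cutting keeps every match as its own segment so nothing is lost before the segments are merged.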

Cheers,
Helge

Altair RapidMiner Community
