RapidMiner


Confused about the Query Expression blocks on the Extract Information Operator

Newbie sgtcrom7

I'm trying to extract word counts from a block of text. I have the Create Document operator (where I pasted my text) connected to the Extract Information operator. I entered the words I want to extract (terrorist and civilian) in the attribute name blocks. What should I put in the query expression blocks? Thanks.

5 REPLIES

Re: Confused about the Query Expression blocks on the Extract Information Operator

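Hi @sgtcrom7,

Does this process meet your needs? It tokenizes the document, counts term occurrences, and keeps only the civilian and terrorist attributes.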

<?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="8.1.001" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="text:create_document" compatibility="8.1.000" expanded="true" height="68" name="Create Document" width="90" x="112" y="85">
        <parameter key="text" value="My tailor is rich. He's a terrorist, but before he was a civilian."/>
      </operator>
      <operator activated="true" class="text:process_documents" compatibility="8.1.000" expanded="true" height="103" name="Process Documents" width="90" x="313" y="85">
        <parameter key="vector_creation" value="Term Occurrences"/>
        <process expanded="true">
          <operator activated="true" class="text:tokenize" compatibility="8.1.000" expanded="true" height="68" name="Tokenize" width="90" x="313" y="34"/>
          <connect from_port="document" to_op="Tokenize" to_port="document"/>
          <connect from_op="Tokenize" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="8.1.001" expanded="true" height="82" name="Select Attributes" width="90" x="514" y="85">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="civilian|terrorist"/>
        <parameter key="regular_expression" value="civilian"/>
      </operator>
      <connect from_op="Create Document" from_port="output" to_op="Process Documents" to_port="documents 1"/>
      <connect from_op="Process Documents" from_port="example set" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Process Documents" from_port="word list" to_port="result 2"/>
      <connect from_op="Select Attributes" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
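
For readers who want to see roughly what this process computes without opening RapidMiner, here is a minimal Python sketch. It is only an approximation and not part of the process itself: the function name `count_terms` is hypothetical, and splitting on non-letter characters is an assumption about how Tokenize behaves with its default settings. Like the process, it does not lowercase the text, so "He" and "he" count as different words.

```python
import re
from collections import Counter


def count_terms(text, terms):
    """Count how often each term in `terms` occurs in `text`.

    Assumption: tokens are runs of ASCII letters, which roughly mimics
    splitting on non-letter characters. Matching is case-sensitive.
    """
    # Split on anything that is not a letter and drop empty strings
    tokens = [t for t in re.split(r"[^A-Za-z]+", text) if t]
    counts = Counter(tokens)
    # Keep only the requested terms, like a "subset" attribute filter
    return {term: counts[term] for term in terms}


if __name__ == "__main__":
    sample = "He's a terrorist, but before he was a civilian."
    print(count_terms(sample, ["civilian", "terrorist"]))
    # prints {'civilian': 1, 'terrorist': 1}
```

A word that never occurs in the text simply gets a count of 0.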

Regards,

Lionel

Newbie sgtcrom7

Re: Confused about the Query Expression blocks on the Extract Information Operator

Lionel, 

Thanks for the reply. I'm not sure whether this solves my problem. I just downloaded the free version of RapidMiner today and I am only using the drag-and-drop functions, so I'm not sure where I would enter code like that. Sorry if I wasted your time. I really do appreciate you trying to help.

I did eventually figure out how to get word counts out of my documents, so maybe as I keep playing with the software I can learn enough to ask better questions. What I need to do is actually very simple; I'm just trying to figure out how to make it less labor-intensive. Again, thanks.

Re: Confused about the Query Expression blocks on the Extract Information Operator

@sgtcrom7,

A/ To use the code I provided, follow these steps (this is the usual way to share a process between RapidMiner users):

1. Activate the XML panel (View > Show Panel > XML):

[Screenshot: Date_A_B.png]

2. Copy the XML code I provided and paste it into the XML panel.

[Screenshot: Date_A_B_2.png]

3. Click the check mark button ("check button") in the XML panel.

[Screenshot: Date_A_B_3.png]

4. The process should then appear in the Process window.

B/ To learn the basics of RapidMiner, I encourage you to start with:

 - the tutorials (Help menu)

 - the training videos (Help menu)

[Screenshot: Tutorials.png]

 

I hope this helps.

Regards,

Lionel

Newbie sgtcrom7

Re: Confused about the Query Expression blocks on the Extract Information Operator

The code didn't run, but thanks for explaining that to me. I'm making progress! 

Re: Confused about the Query Expression blocks on the Extract Information Operator

Hi @sgtcrom7,

Glad you're making progress, but...

"The code didn't run...."

Here are two possible causes:

 - Did you copy all of the code I provided?

 - Did you clear the existing code in the XML panel before pasting mine? Here are the instructions:

[Screenshot: Tutorial_1.png]
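
To check whether the XML was copied completely, one option is to parse the text before pasting it into the XML panel. The following is a minimal Python sketch under stated assumptions: the function name `check_process_xml` is hypothetical, and the only RapidMiner-specific check is that the root element is `<process>`, as in the XML posted above. It only tests that the text is well-formed XML; it cannot tell whether RapidMiner will accept the operators inside.

```python
import xml.etree.ElementTree as ET


def check_process_xml(xml_text):
    """Return (ok, message) for a pasted process XML string.

    Only checks that the text parses as XML and that the root element
    is <process>; it does not validate operators or parameters.
    """
    try:
        # Strip stray whitespace around the copy and parse as UTF-8 bytes
        root = ET.fromstring(xml_text.strip().encode("utf-8"))
    except ET.ParseError as err:
        # A partial copy usually ends up here (for example, a missing closing tag)
        return False, f"not well-formed: {err}"
    if root.tag != "process":
        return False, f"unexpected root element <{root.tag}>"
    return True, "looks complete"


if __name__ == "__main__":
    complete = (
        '<?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">'
        '<operator class="process" name="Process"/></process>'
    )
    print(check_process_xml(complete))
    # Simulate a partial copy by cutting off the closing </process> tag
    print(check_process_xml(complete[:-10])[0])
```

If the check fails, copying the whole block again, from `<?xml` through the final `</process>`, is the first thing to try.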

 

Never give up! Don't hesitate to reply if it still doesn't work.

 

Regards,

Lionel
