
IE Plugin - How to use CRF and Tagging

B_B_ Member Posts: 70 Maven
edited November 2018 in Help
I've got the IE plugin installed and have read Felix Jungermann's 2009 and 2010 papers, which provide some ideas about the plugin.

Unfortunately, there aren't any detailed examples of how to use the operators. Web searches only turn up Felix's papers.

I've set up this basic process based on the diagrams in his papers, but I get errors from the CRF operator. Do you have some examples of NER and POS tagging processes? Thanks

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.1.001" expanded="true" name="Process">
    <process expanded="true" height="505" width="748">
      <operator activated="true" class="text:create_document" compatibility="5.1.000" expanded="true" height="60" name="Create Document" width="90" x="45" y="30">
        <parameter key="text" value="Sentence one.  This is the second sentence.  Finaly, the last sentence!"/>
        <parameter key="add label" value="true"/>
        <parameter key="label_type" value="text"/>
        <parameter key="label_value" value="textvar"/>
      </operator>
      <operator activated="true" class="text:documents_to_data" compatibility="5.1.000" expanded="true" height="76" name="Documents to Data" width="90" x="45" y="120">
        <parameter key="text_attribute" value="textvar"/>
        <parameter key="label_attribute" value="textlabel"/>
        <parameter key="add_meta_information" value="false"/>
      </operator>
      <operator activated="true" class="informationExtraction:sentence_tokenizer" compatibility="1.0.000" expanded="true" height="76" name="SentenceTokenizer" width="90" x="45" y="210">
        <parameter key="attributeName" value="false"/>
        <parameter key="optionalAttribute" value="textvar"/>
        <parameter key="new token-name" value="sentvar"/>
      </operator>
      <operator activated="true" class="informationExtraction:word_tokenizer" compatibility="1.0.000" expanded="true" height="76" name="WordTokenizer" width="90" x="179" y="165">
        <parameter key="attributeName" value="false"/>
        <parameter key="optionalAttribute" value="sentvar"/>
        <parameter key="new token-name" value="blah"/>
      </operator>
      <operator activated="true" class="multiply" compatibility="5.1.001" expanded="true" height="112" name="Multiply" width="90" x="313" y="165"/>
      <operator activated="true" class="informationExtraction:crf_operator" compatibility="1.0.000" expanded="true" height="76" name="ConditionalRandomField" width="90" x="447" y="75">
        <parameter key="text-Attribute name" value="sentvar"/>
        <process expanded="true" height="505" width="774">
          <connect from_port="example set source" to_port="example set sink"/>
          <portSpacing port="source_example set source" spacing="0"/>
          <portSpacing port="sink_example set sink" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="apply_model" compatibility="5.1.001" expanded="true" height="76" name="Apply Model" width="90" x="648" y="255">
        <list key="application_parameters"/>
      </operator>
      <connect from_op="Create Document" from_port="output" to_op="Documents to Data" to_port="documents 1"/>
      <connect from_op="Documents to Data" from_port="example set" to_op="SentenceTokenizer" to_port="example set input"/>
      <connect from_op="SentenceTokenizer" from_port="example set output" to_op="WordTokenizer" to_port="example set input"/>
      <connect from_op="WordTokenizer" from_port="example set output" to_op="Multiply" to_port="input"/>
      <connect from_op="Multiply" from_port="output 1" to_op="ConditionalRandomField" to_port="example set input"/>
      <connect from_op="Multiply" from_port="output 2" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="Multiply" from_port="output 3" to_port="result 3"/>
      <connect from_op="ConditionalRandomField" from_port="example set output" to_port="result 1"/>
      <connect from_op="ConditionalRandomField" from_port="model output" to_op="Apply Model" to_port="model"/>
      <connect from_op="Apply Model" from_port="labelled data" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
      <portSpacing port="sink_result 4" spacing="0"/>
    </process>
  </operator>
</process>
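
For anyone reading the XML above, here is a minimal sketch in plain Python (not RapidMiner, and not the IE plugin's own code) of the data shape the two tokenizer steps are meant to produce: the text is split into sentences, then each sentence into word tokens, giving one row per token. The function names and regular expressions are assumptions made for illustration only. Note that a CRF learner generally also needs a label for each token in order to train; this sketch does not add one.

```python
import re

# Same sample text as in the "Create Document" operator above (typo included).
TEXT = "Sentence one.  This is the second sentence.  Finaly, the last sentence!"


def split_sentences(text):
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def split_words(sentence):
    """Return word tokens and standalone punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def token_rows(text):
    """Build one row per token: (sentence index, token index, token)."""
    rows = []
    for s_idx, sentence in enumerate(split_sentences(text)):
        for t_idx, token in enumerate(split_words(sentence)):
            rows.append((s_idx, t_idx, token))
    return rows


if __name__ == "__main__":
    for row in token_rows(TEXT):
        print(row)
```

Each row keeps its sentence index, so tokens can still be grouped back into sequences, which is the unit a sequence model such as a CRF works on.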



Answers

    land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    The best option would be to contact him directly. He might be able to help you with this.

    Greetings,
      Sebastian