POS-Tagger (STTS)

D_Zed Member Posts: 2 Contributor I
edited November 2018 in Help
Hello all,

I am a new RapidMiner user and want to do text analysis on chat dialogues in German.

For this problem I want to use a POS tagger with the Stuttgart-Tübingen-Tagset (STTS).

Can somebody explain to me how I can use this in RapidMiner?

Thank you

D_Zed

Answers

  • Rene Member Posts: 24 Contributor II
    Hi and welcome to RM,

    here's a short example which tries to extract nouns and proper names from a given document:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.003">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.2.003" expanded="true" name="Process">
        <process expanded="true" height="224" width="279">
          <operator activated="true" class="text:create_document" compatibility="5.2.001" expanded="true" height="60" name="Create Document" width="90" x="45" y="30">
            <parameter key="text" value="Nach dem Foulspiel von Roman Weidenfeller im Strafraum der Dortmunder schnappte sich Arjen Robben sofort den Ball.&#10;&#10;Ohne zu überlegen marschierte er schnellen Schrittes auf den Elfmeterpunkt zu und legte sich den Ball zurecht.&#10;&#10;Was dann folgte, ist bekannt. (DIASHOW: Der 30. Spieltag).&#10;&#10;Robben wurde zur tragischen Figur des Spitzenspiels zwischen Dortmund und Bayern, das die Borussen mit 1:0 für sich entscheiden konnten (Bericht).&#10;&#10;Sein verschossener Elfmeter war aber nur der Höhepunkt der 14 albtraumhaften Minuten des Niederländers, der in der Kabine &quot;total niedergeschlagen&quot; war, wie Bayern-Manager Christian Nerlinger bestätigte. "/>
          </operator>
          <operator activated="true" class="text:process_documents" compatibility="5.2.001" expanded="true" height="94" name="Process Documents" width="90" x="45" y="120">
            <process expanded="true" height="355" width="334">
              <operator activated="true" class="text:tokenize" compatibility="5.2.001" expanded="true" height="60" name="Tokenize" width="90" x="45" y="30"/>
              <operator activated="true" class="text:filter_tokens_by_pos" compatibility="5.2.001" expanded="true" height="60" name="Filter Tokens (by POS Tags)" width="90" x="188" y="30">
                <parameter key="language" value="German"/>
                <parameter key="expression" value="NN.*|NE.*"/>
              </operator>
              <connect from_port="document" to_op="Tokenize" to_port="document"/>
              <connect from_op="Tokenize" from_port="document" to_op="Filter Tokens (by POS Tags)" to_port="document"/>
              <connect from_op="Filter Tokens (by POS Tags)" from_port="document" to_port="document 1"/>
              <portSpacing port="source_document" spacing="0"/>
              <portSpacing port="sink_document 1" spacing="0"/>
              <portSpacing port="sink_document 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Create Document" from_port="output" to_op="Process Documents" to_port="documents 1"/>
          <connect from_op="Process Documents" from_port="word list" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
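
The "expression" parameter in the process above is a regular expression over STTS tag names. As a rough illustration of which tags `NN.*|NE.*` selects, here is a small Python sketch. It assumes the expression has to match the whole tag; the thread does not confirm how RapidMiner applies it internally. The tag list is only a hand-picked sample of STTS tags, and `matching_tags` is a hypothetical helper, not part of RapidMiner.

```python
import re

# Pattern copied from the "expression" parameter of the process above.
NOUN_PATTERN = re.compile(r"NN.*|NE.*")

# A few STTS tags for illustration only (not the full tagset):
# NN = common noun, NE = proper name, ADJA = attributive adjective,
# ART = article, VVFIN = finite full verb, APPR = preposition,
# PTKNEG = negation particle.
STTS_SAMPLE = ["NN", "NE", "ADJA", "ART", "VVFIN", "APPR", "PTKNEG"]


def matching_tags(tags, pattern=NOUN_PATTERN):
    """Return the tags that the pattern matches in full (assumed semantics)."""
    return [tag for tag in tags if pattern.fullmatch(tag)]


print(matching_tags(STTS_SAMPLE))
```

Under this whole-tag assumption only the two noun tags pass the filter. To keep other word classes, the same kind of pattern could be written for other STTS tags, for example `V.*` for the verb tags.
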
    To read in your chat logs, you would probably use the "Process Documents from Files" operator (instead of "Create Document" plus "Process Documents") and nest the tokenizer and the POS tag filter inside it.

    Greetings from Berlin,
    René
  • RWingerter Member Posts: 38 Contributor II
    Thanks for the example!

    Roland