
Export wordlist into Database

guitarslinger Member Posts: 12 Contributor II
edited November 2018 in Help
Hi,

I am trying to export a wordlist into a database table or a CSV file.
How can I do this?

The standard operators only accept example sets as input.
Can I convert a wordlist into an example set?

Thx in advance,
Martin

Answers

  • colo Member Posts: 236 Maven
    Hi Martin,

    the operator "WordList to Data" should help you with that. ;)

    Greetings,
    Matthias
  • guitarslinger Member Posts: 12 Contributor II
    Oh, thanks...  :D

    Thank god there are no stupid questions... :)
  • alejandro_mauro Member Posts: 1 Contributor I
    Hi!!

    I am new to RapidMiner and am trying to do the same thing, but I have run into a problem when exporting to CSV after first using "WordList to Data".

    My wordlist contains words with a "Total Occurrences" value of 100 or more, but the exported CSV only contains the words below 100.

    For example, my wordlist contains:
    Word     Total Occurrences
    Rx       327
    Dg       100
    Viene    96

    In the exported CSV I don't get "Dg", for example, which has 100 occurrences; I only get the entries from "Viene" downwards.

    I don't understand why the CSV seems to treat the total occurrences column as a percentage and leaves out values of 100 or more.

    Does anyone have an idea how to solve this?
  • colo Member Posts: 236 Maven
    Hi,

    I don't experience this problem. The following process is working fine for me:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
     <context>
       <input/>
       <output/>
       <macros/>
     </context>
     <operator activated="true" class="process" compatibility="5.0.8" expanded="true" name="Process">
       <process expanded="true" height="607" width="787">
         <operator activated="true" class="web:get_webpage" compatibility="5.0.3" expanded="true" height="60" name="Get Page" width="90" x="45" y="30">
           <parameter key="url" value="http://www.microsoft.com/en/us/default.aspx"/>
           <parameter key="random_user_agent" value="true"/>
           <list key="query_parameters"/>
         </operator>
         <operator activated="true" class="text:process_documents" compatibility="5.0.6" expanded="true" height="94" name="Process Documents" width="90" x="179" y="30">
           <process expanded="true" height="607" width="787">
             <operator activated="true" class="text:tokenize" compatibility="5.0.6" expanded="true" height="60" name="Tokenize" width="90" x="45" y="30"/>
             <connect from_port="document" to_op="Tokenize" to_port="document"/>
             <connect from_op="Tokenize" from_port="document" to_port="document 1"/>
             <portSpacing port="source_document" spacing="0"/>
             <portSpacing port="sink_document 1" spacing="0"/>
             <portSpacing port="sink_document 2" spacing="0"/>
           </process>
         </operator>
         <operator activated="true" class="text:wordlist_to_data" compatibility="5.0.6" expanded="true" height="76" name="WordList to Data" width="90" x="313" y="75"/>
         <operator activated="true" class="write_csv" compatibility="5.0.8" expanded="true" height="60" name="Write CSV" width="90" x="447" y="75">
           <parameter key="csv_file" value="C:\test.csv"/>
         </operator>
         <connect from_op="Get Page" from_port="output" to_op="Process Documents" to_port="documents 1"/>
         <connect from_op="Process Documents" from_port="example set" to_port="result 1"/>
         <connect from_op="Process Documents" from_port="word list" to_op="WordList to Data" to_port="word list"/>
         <connect from_op="WordList to Data" from_port="example set" to_op="Write CSV" to_port="input"/>
         <connect from_op="Write CSV" from_port="through" to_port="result 2"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="0"/>
         <portSpacing port="sink_result 2" spacing="0"/>
         <portSpacing port="sink_result 3" spacing="0"/>
       </process>
     </operator>
    </process>
    I am using the latest version available through Subversion; it may include relevant fixes that are not yet in the official release. You could also try converting the attributes that contain the word counts to nominal values ("Numerical to Polynominal" operator) and see whether that prevents the conversion to a percentage value.

    Regards,
    Matthias
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    does this occur only in the written CSV file, or already in the example set? Set a breakpoint to see what the example set contains.

    Greetings,
    Sebastian
  • up201708850 Member Posts: 2 Contributor I
    Add a "Process Documents from Data" operator between "WordList to Data" and "Write Database".
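
Two of the suggestions in the answers above (setting a breakpoint to inspect the example set, and converting the count attributes to nominal values before writing) could be combined roughly as in the fragment below. It is a sketch, not a tested process: it is meant to replace the "WordList to Data" and "Write CSV" operators and their connections in the process posted earlier in this thread. The `breakpoints="after"` attribute, the `numerical_to_polynominal` class key, and the "example set input" / "example set output" port names are assumptions based on RapidMiner 5 conventions, and the x/y coordinates are arbitrary. Exporting the XML from your own installation is the reliable way to confirm them.

```xml
<!-- Sketch only: replaces the "WordList to Data" / "Write CSV" part of the process posted above. -->
<!-- breakpoints="after" (assumed attribute) pauses execution so the example set can be inspected before it is written. -->
<operator activated="true" breakpoints="after" class="text:wordlist_to_data" compatibility="5.0.6" expanded="true" height="76" name="WordList to Data" width="90" x="313" y="75"/>
<!-- With default parameters this converts every numerical attribute to a nominal one. -->
<operator activated="true" class="numerical_to_polynominal" compatibility="5.0.8" expanded="true" height="76" name="Numerical to Polynominal" width="90" x="447" y="75"/>
<operator activated="true" class="write_csv" compatibility="5.0.8" expanded="true" height="60" name="Write CSV" width="90" x="581" y="75">
  <parameter key="csv_file" value="C:\test.csv"/>
</operator>
<connect from_op="Process Documents" from_port="word list" to_op="WordList to Data" to_port="word list"/>
<!-- Port names on "Numerical to Polynominal" are assumed; check them in the operator's port tooltips. -->
<connect from_op="WordList to Data" from_port="example set" to_op="Numerical to Polynominal" to_port="example set input"/>
<connect from_op="Numerical to Polynominal" from_port="example set output" to_op="Write CSV" to_port="input"/>
<connect from_op="Write CSV" from_port="through" to_port="result 2"/>
```

If the rows with counts of 100 or more are already missing when the breakpoint is reached, the problem lies before the CSV export and the conversion step will not help. If they are present there but missing from the file, the export step is the place to look.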