RapidMiner

[SOLVED] Write TSV

Contributor II

Hi,
I need to write an ExampleSet as a TSV file.
I'm trying to use the "Write CSV" operator.
What value can I enter in the "column separator" field?
The value "\t" seems to work only in "Read CSV"...
Thanks in advance for your support
Regular Contributor

Re: Write TSV

Are you trying to use a tab as the separator? I'm not sure about that, but if you don't want to use a standard separator such as a comma or semicolon, the pipe (|) is a common choice.
Contributor II

Re: Write TSV

Thanks for the suggestion, but unfortunately it doesn't work.
Here's my process:
Bye


<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.005">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.3.005" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="5.3.005" expanded="true" height="60" name="Retrieve Iris" width="90" x="112" y="120">
        <parameter key="repository_entry" value="//Samples/data/Iris"/>
      </operator>
      <operator activated="true" class="write_csv" compatibility="5.3.005" expanded="true" height="76" name="Write CSV" width="90" x="514" y="120">
        <parameter key="csv_file" value="C:\a.txt"/>
        <parameter key="column_separator" value="|"/>
        <parameter key="quote_nominal_values" value="false"/>
      </operator>
      <connect from_op="Retrieve Iris" from_port="output" to_op="Write CSV" to_port="input"/>
      <connect from_op="Write CSV" from_port="through" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

Regular Contributor

Re: Write TSV

Well, it works for me. Below are the first four lines your process generates:

a1|a2|a3|a4|id|label
5.1|3.5|1.4|0.2|id_1|Iris-setosa
4.9|3.0|1.4|0.2|id_2|Iris-setosa
4.7|3.2|1.3|0.2|id_3|Iris-setosa

Which error message are you getting?
Contributor II

Re: Write TSV

No error message.
The point is that I need the columns to be separated by a tab character to complete my data process. This is the first part of an ETL process, and the output file will be processed by another program that requires tab separators.
Bye
Regular Contributor

Re: Write TSV

I see, you definitely need a tab as the separator. I found the following works for me, though I'm not sure whether it's a real solution or just a workaround.

I manually created a tab-separated file, then read it with the "Read CSV" operator and chose "tab" as the separator (again, while READING). Then I went to the settings of this operator and copied whatever was in the "column separator" field. It looked empty, but I double-clicked in it and copied the content. You can also double-click between the two brackets below and copy (without the brackets):

( )

Then paste this as the column separator into your "Write CSV" operator.

I hope that works; it did the job for me. I opened the generated file in LibreOffice, selected "tab" as the delimiter, and it opened as expected.
Regular Contributor

Re: Write TSV

I just realized that whatever I pasted between the brackets got lost when I posted the message, sorry about that.

But just follow the procedure I described and copy the column separator value from the "Read CSV" operator to the "Write CSV" operator; that should do the job.

I know it doesn't look very smooth, but I hope it moves you a step forward...

Or, here's the XML for the operator (the column separator value &#9; is the XML character reference for a tab):

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.008">
 <context>
   <input/>
   <output/>
   <macros/>
 </context>
 <operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="Process">
   <process expanded="true">
     <operator activated="true" class="write_csv" compatibility="5.3.008" expanded="true" height="76" name="Write CSV" width="90" x="380" y="75">
       <parameter key="csv_file" value="/home/macphotobiker/Desktop/tsv.tsv"/>
       <parameter key="column_separator" value="&#9;"/>
     </operator>
     <connect from_op="Write CSV" from_port="through" to_port="result 1"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="0"/>
     <portSpacing port="sink_result 2" spacing="0"/>
   </process>
 </operator>
</process>
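Putting the two processes from this thread together, the original Iris process would only need its column_separator value changed from | to &#9;. Writing the character reference rather than a literal tab matters in XML: a parser normalizes a literal tab inside an attribute value to a space, while &#9; is preserved as a tab. This is a sketch assembled from the two posted processes, not a tested export; the output path C:\a.tsv is a placeholder.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- Sketch: the Iris process from earlier in the thread, with the column separator
     changed to a tab (&#9;). The output path C:\a.tsv is a placeholder. -->
<process version="5.3.005">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.3.005" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="5.3.005" expanded="true" height="60" name="Retrieve Iris" width="90" x="112" y="120">
        <parameter key="repository_entry" value="//Samples/data/Iris"/>
      </operator>
      <operator activated="true" class="write_csv" compatibility="5.3.005" expanded="true" height="76" name="Write CSV" width="90" x="514" y="120">
        <parameter key="csv_file" value="C:\a.tsv"/>
        <parameter key="column_separator" value="&#9;"/>
        <parameter key="quote_nominal_values" value="false"/>
      </operator>
      <connect from_op="Retrieve Iris" from_port="output" to_op="Write CSV" to_port="input"/>
      <connect from_op="Write CSV" from_port="through" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
```

Importing this XML should leave a tab in the "column separator" field without any copy-and-paste step.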
Super Contributor

Re: Write TSV

The problem is that the Java framework does not let you enter a tab character into the input field: pressing the Tab key moves the cursor to the next field instead.

To get a tab character into the parameter, you have to copy it from somewhere. For example, you can press Tab in a plain text editor and copy the resulting (seemingly empty) character into RapidMiner.

Best regards,
Marius
Contributor II

Re: Write TSV

Thank you very much, MacPhotoBiker!
Your solution works perfectly for my purpose!
Thanks also to Marius
Bye
Regular Contributor

Re: [SOLVED] Write TSV

Perfect 4of4, glad I could help.

Good luck with your project.