"[SOLVED]the existence of \ in the csv file causes error in read-csv operator"

huaiyanggongzi Member Posts: 39 Contributor II
edited June 2019 in Help
In my CSV file, the content of the cell in the first row and first column ends with \. When I import this file with the Read CSV operator, RapidMiner appends the value of the cell to its right to the original content of this cell.

Here is the original CSV file:
column1             column2
T \2 02/ FEB \      F
ABC                 F
Here is the output CSV file:
column1             column2
T 2 02/ FEB ,F
ABC                 F
The entry in the first row of column2 is now empty. I think this is caused by the trailing \ character, but I am not sure why it happens or how to fix it, given that I want to keep the \ in the content.

Here is my process script:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.008">
 <context>
   <input/>
   <output/>
   <macros/>
 </context>
 <operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="Process">
   <description>This getting started process shows the first step of learning and storing a model.
After a model is learned, you can load (Retrieve operator) the model and apply it to a test data set (see 2. Getting Started: Retrieve and Apply Model). The process is NOT concerned with evaluation of the model.

This process will not immediately run in RapidMiner because you have to adjust the repository path in the Retrieve operator.

Tags: Rapidminer, model, learn, learning, store, first step</description>
   <process expanded="true">
     <operator activated="true" class="read_csv" compatibility="5.3.008" expanded="true" height="60" name="Read CSV" width="90" x="45" y="75">
       <parameter key="csv_file" value="C:\Users\Desktop\test\test9.csv"/>
       <parameter key="column_separators" value=","/>
       <parameter key="first_row_as_names" value="false"/>
       <list key="annotations">
         <parameter key="0" value="Name"/>
       </list>
       <parameter key="encoding" value="GBK"/>
       <list key="data_set_meta_data_information">
         <parameter key="0" value="TXT.true.binominal.attribute"/>
         <parameter key="1" value="LA.true.binominal.attribute"/>
       </list>
     </operator>
     <operator activated="true" class="write_csv" compatibility="5.3.008" expanded="true" height="76" name="Write CSV" width="90" x="514" y="165">
       <parameter key="csv_file" value="C:\Users\Desktop\test\test9-copy.csv"/>
       <parameter key="column_separator" value=","/>
     </operator>
     <connect from_op="Read CSV" from_port="output" to_op="Write CSV" to_port="input"/>
     <connect from_op="Write CSV" from_port="through" to_port="result 1"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="0"/>
     <portSpacing port="sink_result 2" spacing="0"/>
   </process>
 </operator>
</process>

Answers

  • Marco_Boeck Administrator, Moderator, Employee, Member, University Professor Posts: 1,993 RM Engineering
    Hi,

    The Read CSV operator has the parameter "escape character" set to '\'. This means that the character following each escape character is taken literally: a column separator after it, for example, is not treated as a column separator but as part of the cell's value. In your file, the \ at the end of the first cell escapes the comma that follows it, so F is appended to that cell. Try setting the escape character to something that does not occur in your CSV file.

    Regards,
    Marco
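
As a rough sketch of the suggestion above, the Read CSV operator from the posted process could carry an explicit escape character. Two assumptions here: that the "escape character" setting is stored in the process XML under the key `escape_character`, and that `^` never occurs in the file. The `^` is only a placeholder, so pick any character your data does not contain.

```xml
<operator activated="true" class="read_csv" compatibility="5.3.008" expanded="true" height="60" name="Read CSV" width="90" x="45" y="75">
  <parameter key="csv_file" value="C:\Users\Desktop\test\test9.csv"/>
  <parameter key="column_separators" value=","/>
  <!-- Assumed key name for the "escape character" setting; "^" is a placeholder
       for any character that does not appear in the CSV file. -->
  <parameter key="escape_character" value="^"/>
  <parameter key="first_row_as_names" value="false"/>
  <list key="annotations">
    <parameter key="0" value="Name"/>
  </list>
  <parameter key="encoding" value="GBK"/>
  <list key="data_set_meta_data_information">
    <parameter key="0" value="TXT.true.binominal.attribute"/>
    <parameter key="1" value="LA.true.binominal.attribute"/>
  </list>
</operator>
```

With a setting like this, \ should no longer be treated as an escape character, so the comma after the trailing \ would again act as a column separator and the \ would stay in the cell content. The same parameter can also be changed in the operator's parameter panel instead of editing the XML.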