CSV with uncommon header can't be processed correctly

mugicagonzalez_ Member Posts: 14 Contributor II
edited December 2018 in Help
Hi all,

I am using the "Read CSV" operator to read a CSV file with multiple lines. The problem is that the first few lines contain technical information that is not in valid CSV format, so I annotate them as Comment. But then only the first column of the last row, which holds the values, is read.

Is this a common error? I think it might be caused by the extra lines having a different number of columns, but since I annotate them as Comment I don't understand why it doesn't work.
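One way to check the guess about differing column counts, outside RapidMiner, is to count the fields on each line of the file. This is only a minimal Python sketch: the function name `fields_per_line` and the sample text are hypothetical, and the `;` delimiter is an assumption based on the separator used elsewhere in this thread.

```python
def fields_per_line(text, delimiter=";"):
    """Return (line_number, field_count) pairs for every non-empty line (1-based)."""
    return [
        (number, line.count(delimiter) + 1)
        for number, line in enumerate(text.splitlines(), start=1)
        if line.strip()
    ]


# Hypothetical sample, NOT the real TEST_Jette.csv: two preamble lines,
# then a header row and one data row.
sample = "Device: X\nSerial;123\ntimestamp;value\n2018-01-01;4,2\n"

for number, count in fields_per_line(sample):
    print(number, count)
```

Lines whose field count differs from the header row are the ones a column-based reader is likely to stumble over. This simple count ignores quoting, so a `;` inside a quoted value would be counted as a separator.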

This is my process for "TEST_Jette.csv":
<?xml version="1.0" encoding="UTF-8"?><process version="8.1.003">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="8.1.003" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="read_csv" compatibility="8.1.003" expanded="true" height="68" name="Read CSV" width="90" x="179" y="34">
        <parameter key="csv_file" value="/Users/pello/Downloads/TEST_Jette.csv"/>
        <parameter key="skip_comments" value="true"/>
        <parameter key="parse_numbers" value="false"/>
        <parameter key="decimal_character" value=","/>
        <parameter key="first_row_as_names" value="false"/>
        <list key="annotations">
          <parameter key="0" value="Comment"/>
          <parameter key="1" value="Comment"/>
          <parameter key="2" value="Comment"/>
          <parameter key="3" value="Comment"/>
          <parameter key="4" value="Comment"/>
          <parameter key="5" value="Comment"/>
          <parameter key="6" value="Comment"/>
          <parameter key="7" value="Comment"/>
          <parameter key="8" value="Comment"/>
          <parameter key="9" value="Comment"/>
          <parameter key="10" value="Comment"/>
          <parameter key="11" value="Comment"/>
          <parameter key="12" value="Comment"/>
          <parameter key="13" value="Comment"/>
          <parameter key="14" value="Comment"/>
          <parameter key="15" value="Comment"/>
          <parameter key="16" value="Comment"/>
          <parameter key="17" value="Name"/>
        </list>
        <parameter key="encoding" value="UTF-8"/>
        <parameter key="read_all_values_as_polynominal" value="true"/>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="timestamp.true.polynominal.attribute"/>
        </list>
      </operator>
      <connect from_op="Read CSV" from_port="output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>


Thanks in advance
Pello

Best Answer

  • mugicagonzalez_ Member Posts: 14 Contributor II
    Solution Accepted
    SOLVED! Thanks to jczgalla (can't post link to thread)!

    <?xml version="1.0" encoding="UTF-8"?><process version="8.1.003">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="8.1.003" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="open_file" compatibility="8.1.003" expanded="true" height="68" name="Open File" width="90" x="45" y="34">
            <parameter key="filename" value="/Users/pello/Downloads/TEST_Jette.csv"/>
          </operator>
          <operator activated="true" class="text:read_document" compatibility="8.1.000" expanded="true" height="68" name="Read Document" width="90" x="179" y="34">
            <parameter key="extract_text_only" value="false"/>
          </operator>
          <operator activated="true" class="text:cut_document" compatibility="8.1.000" expanded="true" height="68" name="Cut Document" width="90" x="313" y="34">
            <parameter key="query_type" value="Regular Expression"/>
            <list key="string_machting_queries"/>
            <list key="regular_expression_queries">
              <parameter key="text" value="((?:[^&quot;]+?|&quot;(.|\n)*?&quot;|)*?)\n"/>
            </list>
            <list key="regular_region_queries"/>
            <list key="xpath_queries"/>
            <list key="namespaces"/>
            <list key="index_queries"/>
            <list key="jsonpath_queries"/>
            <process expanded="true">
              <operator activated="true" class="text:remove_document_parts" compatibility="8.1.000" expanded="true" height="68" name="Remove Document Parts" width="90" x="45" y="34">
                <parameter key="deletion_regex" value="&quot;"/>
              </operator>
              <connect from_port="segment" to_op="Remove Document Parts" to_port="document"/>
              <connect from_op="Remove Document Parts" from_port="document" to_port="document 1"/>
              <portSpacing port="source_segment" spacing="0"/>
              <portSpacing port="sink_document 1" spacing="0"/>
              <portSpacing port="sink_document 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="text:documents_to_data" compatibility="8.1.000" expanded="true" height="82" name="Documents to Data" width="90" x="447" y="34">
            <parameter key="text_attribute" value="text"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="8.1.003" expanded="true" height="82" name="Select Attributes" width="90" x="581" y="34">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="text"/>
          </operator>
          <operator activated="true" class="filter_example_range" compatibility="8.1.003" expanded="true" height="82" name="Filter Example Range" width="90" x="715" y="34">
            <parameter key="first_example" value="18"/>
            <parameter key="last_example" value="19"/>
          </operator>
          <operator activated="true" class="split" compatibility="8.1.003" expanded="true" height="82" name="Split" width="90" x="849" y="34">
            <parameter key="split_pattern" value=";"/>
          </operator>
          <operator activated="true" class="rename_by_example_values" compatibility="8.1.003" expanded="true" height="82" name="Rename by Example Values" width="90" x="983" y="34"/>
          <connect from_op="Open File" from_port="file" to_op="Read Document" to_port="file"/>
          <connect from_op="Read Document" from_port="output" to_op="Cut Document" to_port="document"/>
          <connect from_op="Cut Document" from_port="documents" to_op="Documents to Data" to_port="documents 1"/>
          <connect from_op="Documents to Data" from_port="example set" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Filter Example Range" to_port="example set input"/>
          <connect from_op="Filter Example Range" from_port="example set output" to_op="Split" to_port="example set input"/>
          <connect from_op="Split" from_port="example set output" to_op="Rename by Example Values" to_port="example set input"/>
          <connect from_op="Rename by Example Values" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
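
For anyone who wants to see the idea behind the process above without opening RapidMiner, here is a rough Python sketch of the same steps: read the file as plain text, cut it into lines, drop the quote characters, keep only the lines from the header onwards, split on `;`, and use the first kept line as column names. The function name `read_table_after_preamble` and its parameters are hypothetical. The defaults (header on line 18, `;` as separator) mirror the values in the process. Unlike the regular expression in Cut Document, this simplification does not handle line breaks inside quoted values.

```python
def read_table_after_preamble(text, header_line=18, last_line=None, delimiter=";"):
    """Parse a delimited table that starts at `header_line` (1-based).

    Everything before `header_line` is treated as non-CSV preamble and ignored.
    `last_line` (1-based, inclusive) optionally limits how far to read.
    Returns a list of dicts keyed by the header names.
    """
    lines = text.splitlines()
    selected = lines[header_line - 1:last_line]
    rows = [
        line.replace('"', "").split(delimiter)  # strip quotes, then split
        for line in selected
        if line.strip()
    ]
    if not rows:
        return []
    header, data = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in data]


# Hypothetical stand-in for the real file: 17 preamble lines,
# a header on line 18 and one data row on line 19.
sample = "\n".join(
    [f"info {i}" for i in range(17)] + ['timestamp;"value"', "2018-01-01;4,2"]
)
print(read_table_after_preamble(sample, header_line=18, last_line=19))
```

Passing `last_line=19` corresponds to the Filter Example Range setting in the process; leaving it as `None` reads every line after the header instead.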
    


Answers

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    Yes, sorry @mugicagonzalez_, we don't allow "Newbies" to use hyperlinks any more due to the high number of clickbait spammers.

    [Helpful hint from community manager - if you just "like" a few posts or mark something as solution or practically anything else, you will gain points and move way beyond Newbie quickly!!]

    Scott

  • hughesfleming68 Member Posts: 323 Unicorn
    edited November 2018
    Already answered.