[SEMI-SOLVED] Reading CSV file of unknown structure into purely nominal/text
tennenrishin
Member Posts: 177 Contributor II
What is the easiest way to read a CSV file that has an unknown set (and number) of attributes (named in the first row), into an exampleset where each value is read simply as a nominal (or text) attribute?
My attempt, shown below, parses numeric-appearing data as numeric attributes:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.008">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="read_csv" compatibility="5.3.008" expanded="true" height="60" name="Read CSV" width="90" x="112" y="30">
<parameter key="csv_file" value="/blahblahblah/VTX.csv"/>
<parameter key="column_separators" value=","/>
<parameter key="parse_numbers" value="false"/>
<list key="annotations"/>
<list key="data_set_meta_data_information"/>
</operator>
<connect from_op="Read CSV" from_port="output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Failing that, what is the easiest way to do it if the number of attributes is known (but not the names)?
My attempt, shown below, only reads the last attribute and discards the rest:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.008">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="read_csv" compatibility="5.3.008" expanded="true" height="60" name="Read CSV" width="90" x="112" y="30">
<parameter key="csv_file" value="/blahblahblah/VTX.csv"/>
<parameter key="column_separators" value=","/>
<parameter key="parse_numbers" value="false"/>
<list key="annotations"/>
<list key="data_set_meta_data_information">
<parameter key="0" value=".true.nominal.regular"/>
<parameter key="1" value=".true.nominal.regular"/>
<parameter key="2" value=".true.nominal.regular"/>
<parameter key="3" value=".true.nominal.regular"/>
<parameter key="4" value=".true.nominal.regular"/>
<parameter key="5" value=".true.nominal.regular"/>
</list>
</operator>
<connect from_op="Read CSV" from_port="output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Answers
If you just use the Read CSV operator as in your first example, you can simply follow it with a "Numerical to Polynominal" operator set to include all attributes. If you like, you can even follow that one with a "Nominal to Text" operator. After that, all your attributes are of type Text.
Regards,
Marco
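But then "00005" ends up as "5", for example, because the value has already been parsed as a number before it is converted back. I need the attributes as plain text with their original values, and I don't know their names at design time. This seems like a very basic requirement, or am I missing something obvious?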
Regards,
Isak
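To make the requirement concrete, here is a minimal sketch outside RapidMiner, in plain Python with only the standard library. It is not a RapidMiner solution; it only illustrates the behaviour being asked for: column names taken from the first row, any number of columns, and every value kept as the original text. The function name `read_csv_as_text` and the sample data are assumptions made for illustration, since the contents of VTX.csv are not shown in this thread.

```python
import csv
import io


def read_csv_as_text(handle):
    """Read a CSV whose columns are unknown in advance, keeping every value as a string."""
    reader = csv.reader(handle)
    header = next(reader)  # attribute names come from the first row
    # csv.reader never converts values by default, so "00005" stays "00005".
    return [dict(zip(header, row)) for row in reader]


# Hypothetical sample data standing in for the real file.
sample = io.StringIO("id,code,amount\n1,00005,3.50\n2,00123,7\n")
rows = read_csv_as_text(sample)
print(rows[0]["code"])
```

The point of the sketch is that no type guessing happens at read time, which is what a later numeric-to-nominal conversion cannot undo.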
Unfortunately, I think there is no out-of-the-box way at the moment. I've modified your second process to at least do what you want:
Regards,
Marco