Splitting attributes & Getting specific records

pjm · September 2016

Hi

I'm trying to split an attribute into two pieces: the main part and the remainder.

For example: Fruit | 2KG | £2.00

I want to break off "Fruit" and put the rest in another attribute.

I'm also working with a 50k-example dataset and want to pull out 1k specific ID numbers I have in mind.

Thanks for the help. First-time user here.

sgenzer · September 2016

Another approach is to use Generate Attributes and take the prefix up to the first space. Each line below is the new attribute's name followed by its expression:

att2: prefix(att1, index(att1, " "))

If you want the remainder in another attribute:

att3: suffix(att1, length(att1) - length(att2))

Depending on your data you can end up off by one character (with the example above, the remainder keeps the leading space), so just add or subtract 1 as needed. I use this more than Split because it gives me a lot more control.
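To see where the off-by-one comes from, here is a plain-Python mirror of those two expressions. It is a sketch, not RapidMiner itself: it assumes index() is 0-based and returns -1 when the text is not found, that prefix(s, n) keeps the first n characters and that suffix(s, n) keeps the last n characters. The helper names rm_index, rm_prefix and rm_suffix are hypothetical.

```python
# Plain-Python mirror of the Generate Attributes expressions (not RapidMiner).
# Assumptions: index() is 0-based, prefix(s, n) keeps the first n characters,
# suffix(s, n) keeps the last n characters.

def rm_index(text: str, search: str) -> int:
    """Hypothetical stand-in for index(): first position of search, or -1."""
    return text.find(search)

def rm_prefix(text: str, n: int) -> str:
    """Hypothetical stand-in for prefix(): the first n characters."""
    return text[:n]

def rm_suffix(text: str, n: int) -> str:
    """Hypothetical stand-in for suffix(): the last n characters (0 <= n <= len)."""
    return text[len(text) - n:]

att1 = "Fruit | 2KG | £2.00"
# Assumes the separator is present; a -1 from rm_index would cut the wrong text.
att2 = rm_prefix(att1, rm_index(att1, " "))
att3 = rm_suffix(att1, len(att1) - len(att2))

print(repr(att2))  # 'Fruit'
print(repr(att3))  # ' | 2KG | £2.00' (note the leading space)
```

The remainder starts with the space that index() pointed at, which is why you may need to adjust the length by one (or trim the result) depending on what you want to keep.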

Scott

sgenzer · September 2016

Oh, that's much easier. Just use "Filter Examples", select "single" and your ID attribute, tick the "include special attributes" checkbox, and under custom filter make two entries: ID >= 94000 and ID <= the top of your range. Since your range ends just over 149000, a cutoff of ID < 149000 would drop those last IDs. Make sure the "and" button at the bottom is selected.

You can also use the Filter Example Range operator, which is slightly easier, but it filters only by example (row) number, which may or may not match your IDs.
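For comparison outside RapidMiner, here is a minimal Python sketch of the same inclusive range filter on a CSV. The sample rows, the column name "id" and the upper bound 149999 are hypothetical placeholders; in practice you would read your real file and set the bound to the actual top of your range.

```python
import csv
import io

# Hypothetical sample data standing in for the real 50k-row CSV file.
SAMPLE_CSV = """id,item
12,apple
94000,pear
120500,plum
149010,fig
200000,kiwi
"""

LOWER = 94000
UPPER = 149999  # hypothetical: replace with the real top of the ID range

def filter_by_id_range(csv_text, lower, upper, id_column="id"):
    """Keep rows whose ID lies in [lower, upper], both ends inclusive."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if lower <= int(row[id_column]) <= upper]

subset = filter_by_id_range(SAMPLE_CSV, LOWER, UPPER)
print([row["id"] for row in subset])

# Strict bounds (> 94000 and < 149000) lose both edge rows in this sample.
too_strict = [
    row for row in csv.DictReader(io.StringIO(SAMPLE_CSV))
    if 94000 < int(row["id"]) < 149000
]
print([row["id"] for row in too_strict])
```

The inclusive comparison is the main design point: with strict bounds, an ID sitting exactly on the lower edge or just above 149000 silently disappears from the subset.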

Scott

MartinLiebig · September 2016

Hi pjm,

welcome to the community!

The Split operator should do the job. I have attached a demo process for it.

~Martin

pjm · September 2016

Thanks. It's asking for something in the XPath variables for Read XML.
What I'm looking to do is take something like: Adam| Benji | Colin
Then set Adam as the main value after the split and put the other two in a separate variable (sub or something). For the Split operator I tried the pattern .*| but it results in: vara: A, varb: d, varc: a, vard: m
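A likely culprit is that an unescaped | is regex alternation, and an empty alternative matches between every pair of characters, which is consistent with the one-letter-per-attribute result. The sketch below uses Python's re module as a stand-in (I believe RapidMiner's Split operator uses Java regex, where \| is likewise a literal pipe, but treat that as an assumption). It splits on a literal pipe with optional surrounding spaces, first into main and remainder, then into all three names.

```python
import re

value = "Adam| Benji | Colin"

# "\|" matches a literal pipe; "\s*" on each side absorbs the optional spaces.
# maxsplit=1 stops after the first pipe, giving "main" and "remainder".
main, remainder = re.split(r"\s*\|\s*", value, maxsplit=1)

print(main)       # Adam
print(remainder)  # Benji | Colin

# Without maxsplit, every pipe is a split point.
print(re.split(r"\s*\|\s*", value))  # ['Adam', 'Benji', 'Colin']
```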

sgenzer · September 2016

Not sure what you mean by the ID numbers: are the 1k IDs scattered randomly among the 50k examples? I sometimes use the Generate ID operator, but I'm not sure that's what you're looking for.

Scott

pjm · September 2016

Thanks for the help with Generate Attributes. I think it could help me a lot with that problem.

As for the IDs: the 1,000 I want run from 94,000 to just over 149,000.

None of the other IDs fall in that range.

So I'm looking for a subset of the CSV file that takes only the records in that range.

Thanks

jason_xie · November 2017

Scott,

Your answer was really helpful. But what would you do if you wanted to split at the 3rd space?

For example, I have a column with content like Nov 14 2016 12:50 AM, and I want to split the date and time into two columns.

Thanks!

sgenzer · November 2017

Hi @jason_xie - for that I would use a regex in the Split operator that splits only at the whitespace directly after the year:

<?xml version="1.0" encoding="UTF-8"?><process version="8.0.000-BETA">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="8.0.000-BETA" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="generate_data_user_specification" compatibility="8.0.000-BETA" expanded="true" height="68" name="Generate Data by User Specification" width="90" x="313" y="85">
        <list key="attribute_values">
          <parameter key="text" value="&quot;Nov 14 2016 12:50 AM&quot;"/>
        </list>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="split" compatibility="8.0.000-BETA" expanded="true" height="82" name="Split" width="90" x="581" y="85">
        <parameter key="split_pattern" value="(?&lt;=20[0-9][0-9])\s"/>
      </operator>
      <connect from_op="Generate Data by User Specification" from_port="output" to_op="Split" to_port="example set input"/>
      <connect from_op="Split" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
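The split_pattern in that process, once the XML escaping (&lt;) is undone, is a lookbehind: it matches a whitespace character only when the four characters before it are a 20xx year. Here is a quick check of the same pattern using Python's re module, which I'm assuming behaves like the Java regex engine for this simple fixed-width lookbehind.

```python
import re

# Same pattern as the Split operator's split_pattern, XML-unescaped:
# a whitespace character that directly follows a year from 2000 to 2099.
YEAR_THEN_SPACE = re.compile(r"(?<=20[0-9][0-9])\s")

date_part, time_part = YEAR_THEN_SPACE.split("Nov 14 2016 12:50 AM")
print(date_part)  # Nov 14 2016
print(time_part)  # 12:50 AM
```

Only the space after "2016" satisfies the lookbehind, so the text is cut exactly once. Years outside 2000 to 2099 would not match, so widen the pattern if your data goes back further.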

Scott

jason_xie · November 2017

Thanks! I ended up adding offsets to the index() result inside the prefix() expression to move the cutoff to the right space.
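Hard-coded offsets work when every value has the same layout. A more general idea is to find the n-th separator by searching repeatedly, each time starting just after the previous match. Below is a minimal sketch of that in plain Python, not RapidMiner's expression language; split_at_nth is a hypothetical helper name.

```python
from typing import Tuple

def split_at_nth(text: str, sep: str, n: int) -> Tuple[str, str]:
    """Split text at the n-th occurrence of sep (1-based); sep itself is dropped."""
    pos = -1
    for _ in range(n):
        # Search again starting one character after the previous match.
        pos = text.find(sep, pos + 1)
        if pos == -1:
            raise ValueError(f"fewer than {n} occurrences of {sep!r}")
    return text[:pos], text[pos + len(sep):]

print(split_at_nth("Nov 14 2016 12:50 AM", " ", 3))  # ('Nov 14 2016', '12:50 AM')
```

Unlike a fixed offset, this still finds the 3rd space when the day has one digit ("Nov 4 2016 ...") or the month name is longer.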

Altair RapidMiner Community
