"Split string with n characters into n columns with each cell with only one character"

komal_chenthama Member Posts: 3 Contributor I
edited June 2019 in Help

I have around 700 rows with strings of varying length from 50 to 2000. They look like this:

 

MRILTWAITLLSLACFSLTEKYCYYPNGQIAVSDSPCNPNADDSACCDGDKGMMCMSNNLCRGPGGTTVRSSCTDKSWDSTACAALCMTENTVPADLTSCANVTGSDTTYCCDNHRVPCCDASIARFDVLPSKPQIFAIWDDSASAYLSINLPGTATTTATTTTSSSPAYPTDPPPSNTQPSSTPSNPPSPDAASAAALSLAVQAGIGVGAAVLALALAVVVYLVVKLRRNKNAVLAAGQRGQAGAVHGQYQGGVGVGGYDGWENKHMDKNGGGVGNGGGGAAAWYHPPAYGEPYHGGSGFGVVPRQELDAWPSVGYGQPRRQRHRQSHGQGYVQRFELPATPLGAPRRAF
MKTPLIFLLHLGLLQTCLGKKCYYPGGEEAPGDLPCDTEAEHSPCCAGGKIAGACLANKLCLAKGNPDWYARGSCTDPTFEAPECPKFCLSHEGRGWNLDYCFSQTGSETAFCCEGDANCCAAGRLEIQPAPTHVWALWNGAVSRYDVVTPLGTAKETSAPTSSATSSGTTSDAVEHSSTETTSASTTGTAAGGDRSDATGSANSNSNANSNESTGLSTGAQAGIGVGAAAGALLLAAVAFLWWRMNRMQKAMLVAQQQAAAAYPPPETPAYYSRTPAEKHELMAERPTHELAGQHYYVQGDTRSAELSSQPAYTPVESPAAGRNYGP
MRSVYIALAAALCWTGTLSASPAGAKDDVEVAMMAGRRRLTRTSGRYRSEFAALGARQGDQQCGAQFGRCPGDLCCSSYGFCGDSVDHCHPLFDCQTQYGTCGWPRAVPTTSARPTTSSTPAPPTTTTPSSTSVRPPTTSTSVTIPVPSGGLEVTQNGMCGNNTMCIGNPNYGPCCSQFFWCGSSIEFCGAGCQSDFGACLGIPGQPGNPITNGTTTSGGGSGPTSSPPTTRPTSTRVSTTTTTTTSSRTTSSSPSVTLPAGQTSSTDGRCGNNVNCLGSRFGRCCSQFGYCGDGDQYCPYIVGCQPQFGYCDPQ

 

I would like to split each string into n columns, where n is the length of the string in that cell, so that each cell contains exactly one character. This should be done for all rows. Each letter is then to be replaced by a specific score (a decimal number). How can this be achieved? Please help.

 


Answers

  • kypexin Moderator, RapidMiner Certified Analyst, Member Posts: 291 Unicorn

    Hi @komal_chenthama

     

     

    This should not be too complicated using the Loop operator with Generate Macro and Generate Attributes inside it: the macro is just a counter of loop iterations, and each new attribute takes a substring of length 1 at the position equal to the current iteration number.

     

    But the question is: would it be possible to pad all strings to equal length beforehand, using a dummy or special character? Otherwise each example would generate a different number of attributes (equal to its string length), and you may end up with an error. Honestly, I am afraid I cannot come up with a very quick solution to accomplish this in RapidMiner, at least at the moment.
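For readers who want to see the idea outside RapidMiner, here is a minimal Python sketch of the approach described above: pad every string to the length of the longest one, then take one character per position. The function name `split_by_position` and the padding character `*` are assumptions made for illustration; they do not come from the thread, and the padding character must be one that never occurs in the real data.

```python
from typing import List

PAD = "*"  # hypothetical dummy character; must not appear in the real strings


def split_by_position(strings: List[str], pad: str = PAD) -> List[List[str]]:
    """Pad each string to the longest length, then emit one character per column."""
    width = max((len(s) for s in strings), default=0)
    rows = []
    for s in strings:
        padded = s.ljust(width, pad)  # right-pad so every row has `width` columns
        # One step per position, mirroring the loop-counter idea from the post
        rows.append([padded[i] for i in range(width)])
    return rows


# Shortened sample strings, the same ones used in the process further down
rows = split_by_position(["MRILTWAITLL", "MKTPLIF", "MRSVYIALAAALCWTGT"])
for row in rows:
    print(" ".join(row))
```

Every row ends up with the same number of columns, so shorter strings carry the dummy character in their trailing cells.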

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi @komal_chenthama,

     

    I like this kind of problem!

    The trick here is to insert "-" between the letters (the regular expression \B in the Replace operator matches the empty positions between adjacent letters) and then use the Split operator with the "-" pattern:

    <?xml version="1.0" encoding="UTF-8"?><process version="8.2.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.2.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="operator_toolbox:create_exampleset" compatibility="1.2.000" expanded="true" height="68" name="Create ExampleSet" width="90" x="112" y="34">
    <parameter key="generator_type" value="comma_separated_text"/>
    <list key="function_descriptions"/>
    <list key="numeric_series_configuration"/>
    <list key="date_series_configuration"/>
    <list key="date_series_configuration (interval)"/>
    <parameter key="input_csv_text" value="Id,text&#10;1,MRILTWAITLL&#10;2,MKTPLIF&#10;3,MRSVYIALAAALCWTGT"/>
    </operator>
    <operator activated="true" class="replace" compatibility="8.2.001" expanded="true" height="82" name="Replace" width="90" x="246" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="text"/>
    <parameter key="replace_what" value="\B"/>
    <parameter key="replace_by" value="-"/>
    </operator>
    <operator activated="true" class="split" compatibility="8.2.001" expanded="true" height="82" name="Split" width="90" x="380" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="text"/>
    <parameter key="split_pattern" value="-"/>
    </operator>
    <connect from_op="Create ExampleSet" from_port="output" to_op="Replace" to_port="example set input"/>
    <connect from_op="Replace" from_port="example set output" to_op="Split" to_port="example set input"/>
    <connect from_op="Split" from_port="example set output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>
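
The same Replace-then-Split idea can be sketched in Python, together with the score lookup the question asks about. This is only an illustration under stated assumptions: the names `split_chars`, `to_scores` and `SCORES` are hypothetical, and the score values are arbitrary placeholders, not real data. It also assumes the strings contain only letters and never the separator itself.

```python
import re
from typing import Dict, List, Optional


def split_chars(text: str, sep: str = "-") -> List[str]:
    """Insert `sep` between adjacent letters via \\B, then split on it."""
    # \B matches the empty position between two word characters,
    # so "MKT" becomes "M-K-T"
    marked = re.sub(r"\B", sep, text)
    return marked.split(sep)


# Placeholder scores for illustration only (arbitrary values)
SCORES: Dict[str, float] = {"M": 0.1, "K": 0.2, "T": 0.3, "P": 0.4}


def to_scores(chars: List[str], scores: Dict[str, float]) -> List[Optional[float]]:
    """Replace each letter by its score; letters without a score become None."""
    return [scores.get(c) for c in chars]


chars = split_chars("MKTPLIF")
print(chars)
print(to_scores(chars, SCORES))
```

Because the rows have different lengths, a tabular result will have missing values in the trailing columns of the shorter rows.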

    Does this process meet your needs?

     

    Regards,

     

    Lionel
