Remove characters not desired in a field

bea11005 Member Posts: 20 Maven
edited December 2018 in Help

Hello.

I need to extract a number from a text field in a table.

The field contains this:

{"op":"&","c":[{"type":"date","d":">=","t":1480582800},{"type":"group"}],"showc":[true,true]}

 

I only need the number. I've been using the Replace operator to remove the other characters, but I can't remove the {} and [] characters.

How can I extract the number and remove the {} and [] characters? Is there another way to extract the number without the Replace operator?

 

Best Answers

  • kayman Member Posts: 662 Unicorn
    Solution Accepted

    Not sure what you mean, but if you use this regex in a replace operation on your original string, the result will be

     

    1480582800

    Which is what you needed in the end, isn't it?

     

    The example I added shows this: the document contains your string, the replacement applies the regex, and the outcome is your number.

  • kayman Member Posts: 662 Unicorn
    Solution Accepted

    Then use the regex in a Replace operator, like this:

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="text:create_document" compatibility="7.5.000" expanded="true" height="68" name="Create Document" width="90" x="112" y="85">
    <parameter key="text" value="{&quot;op&quot;:&quot;&amp;&quot;,&quot;c&quot;:[{&quot;type&quot;:&quot;date&quot;,&quot;d&quot;:&quot;&gt;=&quot;,&quot;t&quot;:1480582800},{&quot;type&quot;:&quot;group&quot;}],&quot;showc&quot;:[true,true]}"/>
    </operator>
    <operator activated="true" class="text:documents_to_data" compatibility="7.5.000" expanded="true" height="82" name="Documents to Data" width="90" x="246" y="85">
    <parameter key="text_attribute" value="txt"/>
    </operator>
    <operator activated="true" class="replace" compatibility="7.6.001" expanded="true" height="82" name="Replace" width="90" x="380" y="85">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="txt"/>
    <parameter key="replace_what" value="^.*?&quot;t&quot;:(\d+)\}.*$"/>
    <parameter key="replace_by" value="$1"/>
    </operator>
    <connect from_op="Create Document" from_port="output" to_op="Documents to Data" to_port="documents 1"/>
    <connect from_op="Documents to Data" from_port="example set" to_op="Replace" to_port="example set input"/>
    <connect from_op="Replace" from_port="example set output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

    I only used a document because your string contains a lot of quotes, but the logic works the same way when the input is an example set.

  • kayman Member Posts: 662 Unicorn
    Solution Accepted

    Use the Parse Numbers operator; it converts your nominal values to integers. Just make sure all the values are numeric.
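The "make sure all the values are numeric" check can be illustrated outside RapidMiner. This is only a rough Python sketch of the idea, not what the Parse Numbers operator does internally; the helper name `parse_number` is made up for this example.

```python
def parse_number(value: str) -> int:
    """Convert a string made only of ASCII digits to an int.

    Hypothetical helper for illustration; raises ValueError for anything
    that is not purely numeric instead of guessing.
    """
    cleaned = value.strip()
    if not (cleaned.isascii() and cleaned.isdigit()):
        raise ValueError(f"not a plain integer: {value!r}")
    return int(cleaned)


number = parse_number("1480582800")
print(number + 1)  # arithmetic works now that the value is an int
```

A value that still contains leftover characters such as `}` would raise an error here, which is the situation the advice above is meant to avoid.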

Answers

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi @bea11005,

     

    Can you share your dataset, please?

     

    Otherwise, have you tried the Extract Information operator of the Text Processing extension (available to download and install from the Marketplace)?

     

    Regards, 

     

    Lionel

  • kayman Member Posts: 662 Unicorn

    You need to escape special characters such as { by writing \{ ; this tells the system to treat it as a 'normal' character.

    To get your number, the following regex should work fine:

     

    ^.*?"t":(\d+)\}.*$ 

     

    Basically this says: start at the beginning and skip everything until the first "t":, then capture the digits that follow, and from the closing curly bracket onward ignore everything up to the end.

     

    Below is a working sample using the text operators, but the same logic applies to any replacement operator.

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="text:create_document" compatibility="7.5.000" expanded="true" height="68" name="Create Document" width="90" x="246" y="340">
    <parameter key="text" value="{&quot;op&quot;:&quot;&amp;&quot;,&quot;c&quot;:[{&quot;type&quot;:&quot;date&quot;,&quot;d&quot;:&quot;&gt;=&quot;,&quot;t&quot;:1480582800},{&quot;type&quot;:&quot;group&quot;}],&quot;showc&quot;:[true,true]}"/>
    </operator>
    <operator activated="true" class="text:replace_tokens" compatibility="7.5.000" expanded="true" height="68" name="Replace Tokens" width="90" x="380" y="340">
    <list key="replace_dictionary">
    <parameter key="^.*?&quot;t&quot;:(\d+)\}.*$" value="$1"/>
    </list>
    </operator>
    <connect from_op="Create Document" from_port="output" to_op="Replace Tokens" to_port="document"/>
    <connect from_op="Replace Tokens" from_port="document" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>
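For anyone who wants to check the pattern outside RapidMiner, here is a small Python sketch of the same idea. It assumes Python's `re` module treats this pattern the same way as RapidMiner's regex engine, which should hold for the simple constructs used here. The variable names are only for illustration.

```python
import re

# The field value from the original question.
text = '{"op":"&","c":[{"type":"date","d":">=","t":1480582800},{"type":"group"}],"showc":[true,true]}'

# Same pattern as in the post: lazily skip to the first "t":,
# capture the digits, require the escaped closing brace, skip the rest.
pattern = r'^.*?"t":(\d+)\}.*$'

match = re.match(pattern, text)

# Group 0 is everything the pattern matched, here the whole string.
print(match.group(0) == text)  # True
# Group 1 is the captured number, the part that $1 refers to in RapidMiner.
print(match.group(1))  # 1480582800
```

Because the pattern matches the entire string and only the digits sit inside the parentheses, replacing the match with the first group leaves just the number.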

     

  • bea11005 Member Posts: 20 Maven

    @kayman the answer to this regular expression is ?....

  • bea11005 Member Posts: 20 Maven

    I have the text string in a field of a table, not in a document.

  • bea11005 Member Posts: 20 Maven

    @kayman thank you for your help!!!

    It was the "replace by" parameter: I was leaving it empty instead of writing $1.
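The difference between an empty replacement and a group reference can be reproduced in a few lines of Python. This is only an illustrative sketch; note that Python writes the group reference as `\1`, whereas RapidMiner's Replace operator uses `$1`.

```python
import re

text = '{"op":"&","c":[{"type":"date","d":">=","t":1480582800},{"type":"group"}],"showc":[true,true]}'
pattern = r'^.*?"t":(\d+)\}.*$'

# Empty replacement: the pattern matches the whole string, so everything is removed.
emptied = re.sub(pattern, '', text)

# Group reference: the whole match is replaced by the captured digits only.
kept = re.sub(pattern, r'\1', text)

print(repr(emptied))  # ''
print(kept)           # 1480582800
```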

  • bea11005 Member Posts: 20 Maven

    Now I need to convert the extracted number to a numerical type. Any ideas?

  • bea11005 Member Posts: 20 Maven

    Thank you very much :)
