Strange behavior: Nominal to Binominal Operator

Ina_K Member Posts: 9 Contributor II
edited August 2019 in Help

Hi all,

after preparing the data for FP-Growth (binary representation), my example set looks like this:

20-01-2017_nom_to_bin_prob.png

 

The next step is to convert the data from nominal to binominal with the Nominal to Binominal operator.

<operator activated="true" breakpoints="after" class="nominal_to_binominal" compatibility="7.2.001" expanded="true" height="103" name="Nominal to Binominal" width="90" x="715" y="34">
<parameter key="create_view" value="true"/>
<parameter key="regular_expression" value="[0-9]+"/>
</operator>

This is the input to Nominal to Binominal:

20-01-2017_input_for_nom_to_bin_prob.png

The Nominal to Binominal operator runs without any error message, but unfortunately it also produces no result set.


When I tick the parameter transform binominal*, RapidMiner splits one column into two, which is not what I need for the subsequent FP-Growth operator. (*A desperate attempt at trial and error.)

20-01-2017_output_from_nom_to_bin_prob.png

Any suggestions? ...please..

 

 

The whole process (if necessary):

 

<?xml version="1.0" encoding="UTF-8"?><process version="7.2.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.2.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" breakpoints="after" class="subprocess" compatibility="7.2.001" expanded="true" height="82" name="Pivot data" width="90" x="45" y="34">
<process expanded="true">
<operator activated="true" class="jdbc_connectors:read_database" compatibility="7.2.001" expanded="true" height="68" name="Read Database" width="90" x="179" y="136">
<parameter key="connection" value="DB_Name"/>
<parameter key="query" value="SELECT *&#10;FROM &quot;DB_SCHEMA&quot;.&quot;TBLBIT&quot;&#10;WHERE ROWNUM &lt; 100000"/>
<enumeration key="parameters"/>
</operator>
<operator activated="false" class="jdbc_connectors:stream_database" compatibility="7.2.001" expanded="true" height="68" name="Stream Database" width="90" x="179" y="34">
<parameter key="connection" value="DB_NAME"/>
<parameter key="table_name" value="ZZ_RM_TEST"/>
<parameter key="recreate_index" value="true"/>
</operator>
<operator activated="true" breakpoints="after" class="select_attributes" compatibility="7.2.001" expanded="true" height="82" name="Select Attributes" width="90" x="313" y="34">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="|TBLUNIQUELRU_ID|BITID|EVENT"/>
</operator>
<operator activated="true" class="filter_examples" compatibility="7.2.001" expanded="true" height="103" name="Filter Examples" width="90" x="447" y="34">
<list key="filters_list">
<parameter key="filters_entry_key" value="BITID.is_not_missing."/>
</list>
</operator>
<operator activated="true" breakpoints="after" class="pivot" compatibility="7.2.001" expanded="true" height="82" name="Pivot" width="90" x="581" y="34">
<parameter key="group_attribute" value="TBLUNIQUELRU_ID"/>
<parameter key="index_attribute" value="BITID"/>
<parameter key="consider_weights" value="false"/>
<parameter key="skip_constant_attributes" value="false"/>
</operator>
<connect from_op="Read Database" from_port="output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
<connect from_op="Filter Examples" from_port="example set output" to_op="Pivot" to_port="example set input"/>
<connect from_op="Pivot" from_port="example set output" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="subprocess" compatibility="7.2.001" expanded="true" height="82" name="Rename" width="90" x="179" y="34">
<process expanded="true">
<operator activated="true" class="rename_by_replacing" compatibility="7.2.001" expanded="true" height="82" name="Rename Att. EVENT_" width="90" x="45" y="34">
<parameter key="regular_expression" value="EVENT_"/>
<parameter key="replace_what" value="EVENT_"/>
</operator>
<operator activated="true" class="rename_by_replacing" compatibility="7.2.001" expanded="true" height="82" name="Rename Attr. .0" width="90" x="179" y="34">
<parameter key="attribute_filter_type" value="regular_expression"/>
<parameter key="regular_expression" value="[0-9]*[.][0]"/>
<parameter key="replace_what" value="[.]0"/>
</operator>
<operator activated="false" class="replace" compatibility="7.2.001" expanded="true" height="82" name="Replace TINF" width="90" x="313" y="136">
<parameter key="regular_expression" value="' '[0-9][.][0-9]"/>
<parameter key="replace_what" value="TINF"/>
<parameter key="replace_by" value="Y"/>
</operator>
<operator activated="true" class="replace" compatibility="7.2.001" expanded="true" height="82" name="Replace [A-Z]" width="90" x="447" y="34">
<parameter key="regular_expression" value="' '[0-9][.][0-9]"/>
<parameter key="replace_what" value="[A-Z]+"/>
<parameter key="replace_by" value="Y"/>
</operator>
<operator activated="false" class="replace" compatibility="7.2.001" expanded="true" height="82" name="Replace TPSD" width="90" x="447" y="187">
<parameter key="regular_expression" value="' '[0-9][.][0-9]"/>
<parameter key="replace_what" value="TPSD"/>
<parameter key="replace_by" value="Y"/>
</operator>
<operator activated="false" class="replace" compatibility="7.2.001" expanded="true" height="82" name="Replace PWRO" width="90" x="581" y="187">
<parameter key="regular_expression" value="' '[0-9][.][0-9]"/>
<parameter key="replace_what" value="PWRO"/>
<parameter key="replace_by" value="Y"/>
</operator>
<operator activated="false" breakpoints="after" class="replace" compatibility="7.2.001" expanded="true" height="82" name="Replace TFLD" width="90" x="715" y="187">
<parameter key="regular_expression" value="' '[0-9][.][0-9]"/>
<parameter key="replace_what" value="TFLD"/>
<parameter key="replace_by" value="Y"/>
</operator>
<operator activated="true" breakpoints="after" class="replace_missing_values" compatibility="7.2.001" expanded="true" height="103" name="Replace Missing Values" width="90" x="983" y="34">
<parameter key="create_view" value="true"/>
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="TBLUNIQUELRU_ID"/>
<parameter key="regular_expression" value="[0-9]+"/>
<parameter key="value_type" value="nominal"/>
<parameter key="invert_selection" value="true"/>
<parameter key="default" value="none"/>
<list key="columns">
<parameter key="Infinity" value="zero"/>
</list>
</operator>
<operator activated="false" class="replace_infinite_values" compatibility="7.2.001" expanded="true" height="103" name="Replace Infinite Values" width="90" x="916" y="187">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="TBLUNIQUELRU_ID"/>
<parameter key="invert_selection" value="true"/>
<parameter key="default" value="none"/>
<list key="columns"/>
</operator>
<connect from_port="in 1" to_op="Rename Att. EVENT_" to_port="example set input"/>
<connect from_op="Rename Att. EVENT_" from_port="example set output" to_op="Rename Attr. .0" to_port="example set input"/>
<connect from_op="Rename Attr. .0" from_port="example set output" to_op="Replace [A-Z]" to_port="example set input"/>
<connect from_op="Replace [A-Z]" from_port="example set output" to_op="Replace Missing Values" to_port="example set input"/>
<connect from_op="Replace TPSD" from_port="example set output" to_op="Replace PWRO" to_port="example set input"/>
<connect from_op="Replace PWRO" from_port="example set output" to_op="Replace TFLD" to_port="example set input"/>
<connect from_op="Replace Missing Values" from_port="example set output" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
<operator activated="false" class="set_role" compatibility="7.2.001" expanded="true" height="82" name="Set Role" width="90" x="179" y="187">
<parameter key="attribute_name" value="TBLUNIQUELRU_ID"/>
<parameter key="target_role" value="id"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="false" class="select_attributes" compatibility="7.2.001" expanded="true" height="82" name="Select Attributes (3)" width="90" x="313" y="187">
<parameter key="attribute_filter_type" value="regular_expression"/>
<parameter key="regular_expression" value="['?' ]"/>
<parameter key="invert_selection" value="true"/>
</operator>
<operator activated="true" breakpoints="after" class="replace_missing_values" compatibility="7.2.001" expanded="true" height="103" name="Replace Missing Values (2)" width="90" x="313" y="34">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="TBLUNIQUELRU_ID"/>
<parameter key="invert_selection" value="true"/>
<parameter key="default" value="value"/>
<list key="columns"/>
<parameter key="replenishment_value" value="N"/>
</operator>
<operator activated="true" breakpoints="after" class="select_attributes" compatibility="7.2.001" expanded="true" height="82" name="Select Attributes (2)" width="90" x="447" y="34">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="TBLUNIQUELRU_ID"/>
<parameter key="attributes" value="TBLUNIQUELRU_ID||Infinity"/>
<parameter key="invert_selection" value="true"/>
</operator>
<operator activated="false" breakpoints="after" class="replace" compatibility="7.2.001" expanded="true" height="82" name="Replace '?'" width="90" x="581" y="238">
<parameter key="regular_expression" value="((.*[.])(0))"/>
<parameter key="replace_what" value="\?"/>
<parameter key="replace_by" value="'?'"/>
</operator>
<operator activated="true" breakpoints="after" class="nominal_to_binominal" compatibility="7.2.001" expanded="true" height="103" name="Nominal to Binominal" width="90" x="715" y="34">
<parameter key="create_view" value="true"/>
<parameter key="regular_expression" value="[0-9]+"/>
</operator>
<operator activated="true" breakpoints="before,after" class="fp_growth" compatibility="7.2.001" expanded="true" height="82" name="FP-Growth" width="90" x="849" y="34"/>
<connect from_op="Pivot data" from_port="out 1" to_op="Rename" to_port="in 1"/>
<connect from_op="Rename" from_port="out 1" to_op="Replace Missing Values (2)" to_port="example set input"/>
<connect from_op="Replace Missing Values (2)" from_port="example set output" to_op="Select Attributes (2)" to_port="example set input"/>
<connect from_op="Select Attributes (2)" from_port="example set output" to_op="Nominal to Binominal" to_port="example set input"/>
<connect from_op="Nominal to Binominal" from_port="example set output" to_op="FP-Growth" to_port="example set"/>
<connect from_op="FP-Growth" from_port="example set" to_port="result 1"/>
<connect from_op="FP-Growth" from_port="frequent sets" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>

 

 

 

Answers

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder

    Hi,

     

    For 0s and 1s it is actually simpler to use the operator "Numerical to Binominal" to achieve what you want, i.e. getting the same number of columns with "true" and "false" values. If your data is not numerical already (I could not tell from the screenshots), you can first transform it into numerical data with "Parse Numbers".
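
    As a rough sketch of this suggestion, the two operators could be chained as follows in the process XML. The operator keys `parse_numbers` and `numerical_to_binominal`, the `min`/`max` parameters, and the port names are written from memory for version 7.2 and should be checked against your installation. It is also an assumption that every attribute reaching these operators contains only the symbols 0 and 1.

    ```xml
    <!-- Sketch only: keys, parameters and port names are assumptions to verify in your RapidMiner version -->
    <operator activated="true" class="parse_numbers" compatibility="7.2.001" expanded="true" name="Parse Numbers">
    <!-- Assumption: all selected nominal attributes hold only "0" or "1" -->
    <parameter key="attribute_filter_type" value="all"/>
    </operator>
    <operator activated="true" class="numerical_to_binominal" compatibility="7.2.001" expanded="true" name="Numerical to Binominal">
    <!-- Values inside the range [min, max] are mapped to "false", all other values to "true" -->
    <parameter key="min" value="0.0"/>
    <parameter key="max" value="0.0"/>
    </operator>
    <connect from_op="Parse Numbers" from_port="example set output" to_op="Numerical to Binominal" to_port="example set input"/>
    ```

    With min and max both set to 0, a 0 should become "false" and any other number "true", so the number of columns stays the same.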

     

    Hope this helps,

    Ingo

  • Ina_K Member Posts: 9 Contributor II

    Hello Ingo,

     

    since the output example set of Select Attributes is definitely nominal (second screenshot), I guess Nominal to Binominal was the right operator. The 1s and 0s were merely nominal symbols, not numeric values.

    After I changed the symbols to the nominal values 'Y' and 'N', it works. I guess digits stored as nominal symbols can't be processed as nominal values by this operator.
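
    A minimal sketch of such a setup, assuming the columns now contain only Y and N: Nominal to Binominal with transform binominal left unticked, followed by FP-Growth. The `transform_binominal` and `positive_value` keys are written from memory, and setting `positive_value` to Y is an assumption added for illustration, not a setting taken from the process posted above.

    ```xml
    <!-- Sketch only: parameter keys are assumptions to verify in your RapidMiner version -->
    <operator activated="true" class="nominal_to_binominal" compatibility="7.2.001" expanded="true" name="Nominal to Binominal">
    <!-- Left at false so that a two-valued column stays one column instead of being split into two -->
    <parameter key="transform_binominal" value="false"/>
    </operator>
    <operator activated="true" class="fp_growth" compatibility="7.2.001" expanded="true" name="FP-Growth">
    <!-- Assumption: "Y" marks an item as present in the transaction -->
    <parameter key="positive_value" value="Y"/>
    </operator>
    <connect from_op="Nominal to Binominal" from_port="example set output" to_op="FP-Growth" to_port="example set"/>
    ```

    Naming the positive value explicitly avoids relying on the internal order of the two nominal values.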

     

    Thanks for your advice!
