Due to recent updates, all users are required to create an Altair One account to login to the RapidMiner community. Click the Register button to create your account using the same email that you have previously used to login to the RapidMiner community. This will ensure that any previously created content will be synced to your Altair One account. Once you login, you will be asked to provide a username that identifies you to other Community users. Email us at Community with questions.
Simple Crosstab with totals
I have seen other questions posted here and via Google Searches, but perhaps the solution was for prior RM versions where operators are either no longer available, or are now required paid licenses via 3rd party tools.
My thought was to simply Pivot the data, which works as expected, but the ability to add a total is lacking. In my case, it's just a simple 2x2, which I can do with pivot, but the table is lacking totals.
Could I add them up, sure? But that feels clunky given how powerful, and intuitive, the tool is to use.
Is is not possible without paying for a license via an extension found in the Marketplace? I am really going to enjoy using this as a teaching tool for my class, but it feels like I must be missing something obvious when generating simple 2x2 tables, with totals, is basically EDA 101.
Thanks,
Brock
My thought was to simply Pivot the data, which works as expected, but the ability to add a total is lacking. In my case, it's just a simple 2x2, which I can do with pivot, but the table is lacking totals.
Could I add them up, sure? But that feels clunky given how powerful, and intuitive, the tool is to use.
Is is not possible without paying for a license via an extension found in the Marketplace? I am really going to enjoy using this as a teaching tool for my class, but it feels like I must be missing something obvious when generating simple 2x2 tables, with totals, is basically EDA 101.
Thanks,
Brock
Tagged:
0
Best Answer
-
BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 UnicornHi again,
here's the Building Block.
It selects the nominal attribute, Transposes the column, and extract the column name from it.
It should work as long as you have this structure in your data (one nominal attribute, 2x2 or maybe more numerical attributes).
Regards,
Balázs6
Answers
Dortmund, Germany
And to your other question, those examples are definitely more complicated than I am looking for out of the box.
Cheers,
Brock
Scott
Dortmund, Germany
you can do it in RapidMiner.
It needs a few steps:
Unfortunately, it's not a good building block (yet) because the first (grouping) attribute name is hard coded.
Generate Aggregation creates the "total" attribute with the sum of all numeric columns (attribute filter type=value_type, value type=numeric).
Aggregate creates the sums with "use default aggregation", attribute filter type ... numeric, and default aggregation function = sum.
Then we add the text "Total" with the attribute name from the un-aggregated example set and rename the aggregated attributes from "sum(whatever)" to just "whatever" using Rename by Replacing (replace what: sum.(.+).$ ; replace by: $1).
Maybe we could extract the attribute name from the incoming example set, save it as a macro and reuse that in the Generate Attributes step. Then it would be completely generic and a possible building block. But that's an exercise for the reader ;-)
Regards,
Balázs
Maybe my solution is not the easiest but it works:
data.loc['Total']= data.sum()</code><?xml version="1.0" encoding="UTF-8"?><process version="9.3.001"> <context> <input/> <output/> <macros/> </context> <operator activated="true" class="process" compatibility="9.3.001" expanded="true" name="Process"> <parameter key="logverbosity" value="init"/> <parameter key="random_seed" value="2001"/> <parameter key="send_mail" value="never"/> <parameter key="notification_email" value=""/> <parameter key="process_duration_for_mail" value="30"/> <parameter key="encoding" value="SYSTEM"/> <process expanded="true"> <operator activated="true" class="generate_nominal_data" compatibility="9.3.001" expanded="true" height="68" name="Generate Nominal Data" width="90" x="45" y="34"> <parameter key="number_examples" value="100"/> <parameter key="number_of_attributes" value="1"/> <parameter key="number_of_values" value="5"/> <parameter key="use_local_random_seed" value="false"/> <parameter key="local_random_seed" value="1992"/> </operator> <operator activated="true" class="blending:pivot" compatibility="9.3.001" expanded="true" height="82" name="Pivot" width="90" x="179" y="34"> <parameter key="group_by_attributes" value="att1"/> <parameter key="column_grouping_attribute" value="label"/> <list key="aggregation_attributes"> <parameter key="label" value="count"/> </list> <parameter key="use_default_aggregation" value="false"/> <parameter key="default_aggregation_function" value="first"/> </operator> <operator activated="true" class="rename_by_replacing" compatibility="9.3.001" expanded="true" height="82" name="Rename by Replacing" width="90" x="313" y="34"> <parameter key="attribute_filter_type" value="single"/> <parameter key="attribute" value="att1"/> <parameter key="attributes" value=""/> <parameter key="use_except_expression" value="false"/> <parameter key="value_type" value="attribute_value"/> <parameter key="use_value_type_exception" value="false"/> <parameter key="except_value_type" value="time"/> <parameter key="block_type" value="attribute_block"/> <parameter key="use_block_type_exception" value="false"/> <parameter key="except_block_type" value="value_matrix_row_start"/> <parameter key="invert_selection" value="true"/> <parameter key="include_special_attributes" value="false"/> <parameter key="replace_what" value=".*_"/> </operator> <operator activated="true" class="replace_missing_values" compatibility="9.3.001" expanded="true" height="103" name="Replace Missing Values" width="90" x="514" y="34"> <parameter key="return_preprocessing_model" value="false"/> <parameter key="create_view" value="false"/> <parameter key="attribute_filter_type" value="all"/> <parameter key="attribute" value=""/> <parameter key="attributes" value=""/> <parameter key="use_except_expression" value="false"/> <parameter key="value_type" value="attribute_value"/> <parameter key="use_value_type_exception" value="false"/> <parameter key="except_value_type" value="time"/> <parameter key="block_type" value="attribute_block"/> <parameter key="use_block_type_exception" value="false"/> <parameter key="except_block_type" value="value_matrix_row_start"/> <parameter key="invert_selection" value="false"/> <parameter key="include_special_attributes" value="false"/> <parameter key="default" value="zero"/> <list key="columns"/> </operator> <operator activated="true" class="generate_aggregation" compatibility="9.3.001" expanded="true" height="82" name="Generate Aggregation" width="90" x="648" y="34"> <parameter key="attribute_name" value="Total_Column"/> <parameter key="attribute_filter_type" value="single"/> <parameter key="attribute" value="att1"/> <parameter key="attributes" value=""/> <parameter key="use_except_expression" value="false"/> <parameter key="value_type" value="attribute_value"/> <parameter key="use_value_type_exception" value="false"/> <parameter key="except_value_type" value="time"/> <parameter key="block_type" value="attribute_block"/> <parameter key="use_block_type_exception" value="false"/> <parameter key="except_block_type" value="value_matrix_row_start"/> <parameter key="invert_selection" value="true"/> <parameter key="include_special_attributes" value="false"/> <parameter key="aggregation_function" value="sum"/> <parameter key="concatenation_separator" value="|"/> <parameter key="keep_all" value="true"/> <parameter key="ignore_missings" value="true"/> <parameter key="ignore_missing_attributes" value="false"/> </operator> <operator activated="true" class="transpose" compatibility="9.3.001" expanded="true" height="82" name="Transpose" width="90" x="782" y="34"/> <operator activated="true" class="rename_by_example_values" compatibility="9.3.001" expanded="true" height="82" name="Rename by Example Values" width="90" x="916" y="34"> <parameter key="row_number" value="1"/> </operator> <operator activated="true" class="parse_numbers" compatibility="9.3.001" expanded="true" height="82" name="Parse Numbers" width="90" x="1050" y="34"> <parameter key="attribute_filter_type" value="all"/> <parameter key="attribute" value=""/> <parameter key="attributes" value=""/> <parameter key="use_except_expression" value="false"/> <parameter key="value_type" value="nominal"/> <parameter key="use_value_type_exception" value="false"/> <parameter key="except_value_type" value="file_path"/> <parameter key="block_type" value="single_value"/> <parameter key="use_block_type_exception" value="false"/> <parameter key="except_block_type" value="single_value"/> <parameter key="invert_selection" value="false"/> <parameter key="include_special_attributes" value="false"/> <parameter key="decimal_character" value="."/> <parameter key="grouped_digits" value="false"/> <parameter key="grouping_character" value=","/> <parameter key="infinity_representation" value=""/> <parameter key="unparsable_value_handling" value="fail"/> </operator> <operator activated="true" class="generate_aggregation" compatibility="9.3.001" expanded="true" height="82" name="Generate Aggregation (2)" width="90" x="1184" y="34"> <parameter key="attribute_name" value="Total_Row"/> <parameter key="attribute_filter_type" value="single"/> <parameter key="attribute" value="att1"/> <parameter key="attributes" value=""/> <parameter key="use_except_expression" value="false"/> <parameter key="value_type" value="attribute_value"/> <parameter key="use_value_type_exception" value="false"/> <parameter key="except_value_type" value="time"/> <parameter key="block_type" value="attribute_block"/> <parameter key="use_block_type_exception" value="false"/> <parameter key="except_block_type" value="value_matrix_row_start"/> <parameter key="invert_selection" value="true"/> <parameter key="include_special_attributes" value="true"/> <parameter key="aggregation_function" value="sum"/> <parameter key="concatenation_separator" value="|"/> <parameter key="keep_all" value="true"/> <parameter key="ignore_missings" value="false"/> <parameter key="ignore_missing_attributes" value="false"/> </operator> <operator activated="true" class="transpose" compatibility="9.3.001" expanded="true" height="82" name="Transpose (2)" width="90" x="1050" y="238"/> <connect from_op="Generate Nominal Data" from_port="output" to_op="Pivot" to_port="input"/> <connect from_op="Pivot" from_port="output" to_op="Rename by Replacing" to_port="example set input"/> <connect from_op="Rename by Replacing" from_port="example set output" to_op="Replace Missing Values" to_port="example set input"/> <connect from_op="Replace Missing Values" from_port="example set output" to_op="Generate Aggregation" to_port="example set input"/> <connect from_op="Generate Aggregation" from_port="example set output" to_op="Transpose" to_port="example set input"/> <connect from_op="Transpose" from_port="example set output" to_op="Rename by Example Values" to_port="example set input"/> <connect from_op="Rename by Example Values" from_port="example set output" to_op="Parse Numbers" to_port="example set input"/> <connect from_op="Parse Numbers" from_port="example set output" to_op="Generate Aggregation (2)" to_port="example set input"/> <connect from_op="Generate Aggregation (2)" from_port="example set output" to_op="Transpose (2)" to_port="example set input"/> <connect from_op="Transpose (2)" from_port="example set output" to_port="result 1"/> <portSpacing port="source_input 1" spacing="0"/> <portSpacing port="sink_result 1" spacing="0"/> <portSpacing port="sink_result 2" spacing="0"/> </process> </operator> </process> </pre><div>Another way is using the Python process<br>and use this line of code to add a row with the totals of each column<br><pre class="CodeBlock"><code>#
Dortmund, Germany
Thanks, I am mostly focused in R and python, but I will give it a review.
To circle back, this behaved exactly as expected for my example above. Thanks!