iterate and extract according to value

vabm Member Posts: 4 Contributor I
edited November 2018 in Help
Hi,
I am new to RapidMiner and I am struggling with some simple iterations. I have a dataset that contains user ratings for a number of products, and I need to extract the best-rated product for each user. The file looks something like this:

ID-user,ID-product,Rating
003,040,3
004,330,4
034,330,5
003,032,3
(...)
I can extract the best product for each user using something like: read_csv -> select attribute (set user id) -> sort (best to worst) -> filter examples (index=1), but this is really inconvenient if I have a lot of users to process.
I know this can be done with 'loop attributes' and macros, but I can't find an example to use as a guide.

Any help/guidance would be more than welcome, thanks !!

Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,525 RM Data Scientist
    Hi,

    I think the easiest way to do this is to combine an Aggregate with a Join. See the attached process. Please be aware that this process produces two lines for a user if two products share that user's best rating. You can use either Remove Duplicates or another Aggregate to handle this.

    ~Martin

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="7.0.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="7.0.000" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" breakpoints="after" class="subprocess" compatibility="7.0.000" expanded="true" height="82" name="Subprocess" width="90" x="45" y="136">
            <process expanded="true">
              <operator activated="true" class="retrieve" compatibility="7.0.000" expanded="true" height="68" name="Retrieve Iris" width="90" x="45" y="34">
                <parameter key="repository_entry" value="//Samples/data/Iris"/>
              </operator>
              <operator activated="true" class="select_attributes" compatibility="7.0.000" expanded="true" height="82" name="Select Attributes" width="90" x="179" y="34">
                <parameter key="invert_selection" value="true"/>
                <parameter key="include_special_attributes" value="true"/>
              </operator>
              <operator activated="true" class="generate_attributes" compatibility="7.0.000" expanded="true" height="82" name="Generate Attributes" width="90" x="313" y="34">
                <list key="function_descriptions">
                  <parameter key="ID-user" value="round(rand()*50)"/>
                  <parameter key="ID-product" value="round(rand()*10)"/>
                  <parameter key="Rating" value="round(rand()*15)"/>
                </list>
              </operator>
              <connect from_op="Retrieve Iris" from_port="output" to_op="Select Attributes" to_port="example set input"/>
              <connect from_op="Select Attributes" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
              <connect from_op="Generate Attributes" from_port="example set output" to_port="out 1"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
            <description align="center" color="transparent" colored="false" width="126">Generate a fitting example set</description>
          </operator>
          <operator activated="true" class="multiply" compatibility="7.0.000" expanded="true" height="103" name="Multiply" width="90" x="179" y="136"/>
          <operator activated="true" class="aggregate" compatibility="7.0.000" expanded="true" height="82" name="Aggregate" width="90" x="313" y="85">
            <list key="aggregation_attributes">
              <parameter key="Rating" value="maximum"/>
            </list>
            <parameter key="group_by_attributes" value="ID-user"/>
          </operator>
          <operator activated="true" class="join" compatibility="7.0.000" expanded="true" height="82" name="Join" width="90" x="447" y="136">
            <parameter key="use_id_attribute_as_key" value="false"/>
            <list key="key_attributes">
              <parameter key="ID-user" value="ID-user"/>
              <parameter key="maximum(Rating)" value="Rating"/>
            </list>
          </operator>
          <connect from_op="Subprocess" from_port="out 1" to_op="Multiply" to_port="input"/>
          <connect from_op="Multiply" from_port="output 1" to_op="Aggregate" to_port="example set input"/>
          <connect from_op="Multiply" from_port="output 2" to_op="Join" to_port="right"/>
          <connect from_op="Aggregate" from_port="example set output" to_op="Join" to_port="left"/>
          <connect from_op="Join" from_port="join" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="105"/>
          <portSpacing port="sink_result 2" spacing="42"/>
        </process>
      </operator>
    </process>
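For readers who want to follow the logic outside RapidMiner, here is a minimal sketch of the same aggregate-then-join idea in plain Python. It is not part of the attached process: the function names are hypothetical, the first four sample rows come from the question, and the fifth row is made up for illustration.

```python
# Sample rows in the shape described in the question:
# (ID-user, ID-product, Rating). The last row is an invented example.
ROWS = [
    ("003", "040", 3),
    ("004", "330", 4),
    ("034", "330", 5),
    ("003", "032", 3),
    ("004", "040", 2),
]

def max_rating_per_user(rows):
    """Aggregate step: maximum Rating grouped by ID-user."""
    best = {}
    for user, _product, rating in rows:
        if user not in best or rating > best[user]:
            best[user] = rating
    return best

def join_on_user_and_rating(best, rows):
    """Join step: keep the original rows whose (user, rating) matches the aggregate."""
    return [row for row in rows if best[row[0]] == row[2]]

def remove_duplicates(rows):
    """Tie handling: keep only the first matching row per user."""
    seen = set()
    unique = []
    for row in rows:
        if row[0] not in seen:
            seen.add(row[0])
            unique.append(row)
    return unique

best = max_rating_per_user(ROWS)
joined = join_on_user_and_rating(best, ROWS)  # user "003" appears twice (tie)
top_products = remove_duplicates(joined)      # one row per user
```

Keeping the first row per user is only one possible tie-breaking rule; any other rule (for example, the lowest product ID) would fit in the same place.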
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • vabm Member Posts: 4 Contributor I
    That's amazing, thank you so much. Is there any good book or tutorial list you could recommend? Sometimes it is difficult to understand the documentation.
    And how do I mark this thread as solved?
    Cheers
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,525 RM Data Scientist
    Well, in this special case it is very hard to recommend a book. Joins and Aggregates are standard operations you do in a SQL database. If you are familiar with this way of thinking, you can creatively combine them.
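To make the SQL analogy concrete, here is a small sketch using Python's built-in sqlite3 module. The table name `ratings`, the column names, and the function name are assumptions chosen for illustration; the sample rows are the ones from the question. The subquery plays the role of the Aggregate, and the outer JOIN matches it back to the original rows on user and rating.

```python
import sqlite3

def best_products(rows):
    """Return each user's best-rated rows via GROUP BY plus JOIN (ties are kept)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ratings (id_user TEXT, id_product TEXT, rating INTEGER)"
    )
    conn.executemany("INSERT INTO ratings VALUES (?, ?, ?)", rows)
    query = """
        SELECT r.id_user, r.id_product, r.rating
        FROM ratings AS r
        JOIN (
            SELECT id_user, MAX(rating) AS max_rating
            FROM ratings
            GROUP BY id_user
        ) AS m
          ON r.id_user = m.id_user AND r.rating = m.max_rating
        ORDER BY r.id_user, r.id_product
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

sample = [
    ("003", "040", 3),
    ("004", "330", 4),
    ("034", "330", 5),
    ("003", "032", 3),
]
result = best_products(sample)
```

As with the RapidMiner process, a user with two equally rated best products comes back with two rows, so a further deduplication step is still needed if exactly one row per user is required.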

    In general there are some references like:
    https://rapidminer.com/resource/data-mining-masses/ - Very basic

    http://www.amazon.com/Exploring-Data-RapidMiner-Andrew-Chisholm/dp/1782169334/ref=sr_1_3?s=books&ie=UTF8&qid=1454406443&sr=1-3&keywords=rapidminer - A bit more advanced, I think; Andrew also helps out here in the forums

    http://www.amazon.com/Predictive-Analytics-Data-Mining-RapidMiner/dp/0128014601/ref=sr_1_1?s=books&ie=UTF8&qid=1454406443&sr=1-1&keywords=rapidminer - My favorite when it comes to learning predictive analytics without pure math

    I was thinking about putting together some kind of guide with tips and tricks. I started to do so on my blog. Let's see, maybe I will create a document sometime soon.

    ~Martin

    P.S: My blog can be found at: http://data-analytics.ghost.io/
  • vabm Member Posts: 4 Contributor I
    Thanks Martin!