🦉 🎤   RapidMiner Wisdom 2020 - CALL FOR SPEAKERS DEADLINE IS NOVEMBER 15   🦉 🎤

CLICK HERE TO GO TO ENTRY FORM

"How to join TID"

wasylniwasylni Member Posts: 6 Contributor II
edited June 11 in Help
Hello,

I am trying to find some association rules using RapidMiner 4.4.000
I am connecting to MySQL database and dig data from it by entering below SQL statements:

SELECT
  `orders_products`.`orders_id` AS `TID`, `orders_products`.`products_name` AS
  `ITEM`
FROM
  `orders_products`;


after executing above I am getting data :

TID ITEM
1 Speed 2: Cruise Control
2 Red Corner
3 Speed 2: Cruise Control
4 Microsoft IntelliMouse Explorer
5 The Replacement Killers
5 Courage Under Fire
6 The Matrix
7 There's Something About Mary
7 Speed 2: Cruise Control
8 Under Siege 2 - Dark Territory
8 SWAT 3: Close Quarters Battle
.
.
.
up to 25000 records (15000 transactions)

Can someone advise how to distinct TID to have all products for given TID in one row (like below)

TID ITEM
1 Speed 2: Cruise Control
2 Red Corner
3 Speed 2: Cruise Control
4 Microsoft IntelliMouse Explorer
5 The Replacement Killers, Courage Under Fire
6 The Matrix
7 There's Something About Mary, Speed 2: Cruise Control
8 Under Siege 2 - Dark Territory, SWAT 3: Close Quarters Battle

???

Thanks,
wasyl





Tagged:

Answers

  • wasylniwasylni Member Posts: 6 Contributor II
    when Nominal2Binominal is applied I am getting
    1.0 "true" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false"
    2.0 "false" "true" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false"
    3.0 "true" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false"
    4.0 "false" "false" "true" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false"
    5.0 "false" "false" "false" "true" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false"
    5.0 "false" "false" "false" "false" "true" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false"
    6.0 "false" "false" "false" "false" "false" "true" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false"
    7.0 "false" "false" "false" "false" "false" "false" "true" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false"
    7.0 "true" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false"
    8.0 "false" "false" "false" "false" "false" "false" "false" "true" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false"
    8.0 "false" "false" "false" "false" "false" "false" "false" "false" "true" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false"
    9.0 "false" "false" "false" "false" "false" "false" "false" "false" "false" "true" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false"
    9.0 "false" "true" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false"


    but still can not Distinct unique TID

  • TobiasMalbrechtTobiasMalbrecht Moderator, Employee, Member Posts: 289  Guru
    Hi,

    you have to use the [tt]Examples2AttributesPivoting[/tt] operator. Presumably, you also have to generate a column consisting of ones to acutally fill the newly created attributes with some values.

    Kind regards,
    Tobias
  • wasylniwasylni Member Posts: 6 Contributor II
    Thanks,

    can you give me some tip on how to get results like these below (pivoting does not work or I am doing something wrong)

    example transaction 5
    5.0 "false" "false" "false" "true" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false"
    5.0 "false" "false" "false" "false" "true" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false"

    i would like to get distinct data (in one row) like

    5.0  "false" "false" "false" "true" "true" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false"
  • steffensteffen Member Posts: 347  Guru
    Hello

    I studied your problem and tried various approaches, but I wasnt successful, too. The best result I could achieve with the Example2AttributePivoting was this:

    <operator name="Root" class="Process" expanded="yes">
        <operator name="CSVExampleSource" class="CSVExampleSource">
            <parameter key="filename" value=""/>
        </operator>
        <operator name="Numerical2Polynominal" class="Numerical2Polynominal">
        </operator>
        <operator name="ValueIterator" class="ValueIterator" expanded="yes">
            <parameter key="attribute" value="TID"/>
            <operator name="MaterializeDataInMemory" class="MaterializeDataInMemory">
            </operator>
            <operator name="ExampleFilter" class="ExampleFilter">
                <parameter key="condition_class" value="attribute_value_filter"/>
                <parameter key="parameter_string" value="TID=%{a}"/>
            </operator>
            <operator name="IdTagging" class="IdTagging">
            </operator>
        </operator>
        <operator name="ExampleSetMerge" class="ExampleSetMerge">
        </operator>
        <operator name="Example2AttributePivoting" class="Example2AttributePivoting">
            <parameter key="group_attribute" value="TID"/>
            <parameter key="index_attribute" value="id"/>
        </operator>
    </operator>
    ... giving data in csv format as in your first post.

    The result of this approach breaks the one column = one item - policy, which is not preferred in general.

    So what now: The only thing I can suggest is to apply Nominal2Binominal, Sorting using TID and then your own selfwritten - operator which aggregates the rows using the boolean "OR"-operator for every TID for every column. Maybe there is an approach using only available operators, but I cannot think of any.

    regards,

    Steffen

    PS @RapidMiner-Team:  This is another usecase for scripting, isnt it ? ;)
  • wasylniwasylni Member Posts: 6 Contributor II
    How can I build my own operator,

    Can I apply somecomplex  ;)  SQL/PLSQL statements or something similar to distinct mentioned above rows ?

    regards,
    wasylni
  • keithkeith Member Posts: 157  Guru
    This can be done entirely with MySQL's GROUP_CONCAT operator http://dev.mysql.com/doc/refman/5.1/en/group-by-functions.html#function_group-concat, and doesn't need RM at all (assuming I've understood the question correctly):

    SELECT
      `orders_products`.`orders_id` AS `TID`,
      GROUP_CONCAT(`orders_products`.`products_name` ORDER BY 1 SEPARATOR ',') as `ITEM`
    FROM
      `orders_products`;
    GROUP BY `orders_products`.`orders_id`

    I use Oracle and MS SQL Server, which don't have GROUP_CONCAT, and so haven't tested this in MySQL, but the link above is from the MySQL documentation, so if it doesn't work, it should be easy to figure out why.
  • TobiasMalbrechtTobiasMalbrecht Moderator, Employee, Member Posts: 289  Guru
    Hi,

    finally I had another look into your issue. I have to correct my former advise to create a column of ones before applying the pivoting operator. Nevertheless you have to create kind of dummy column that has to have different values for each dataset row. Here is the code:

    <operator name="Root" class="Process" expanded="yes">
        <operator name="CSVExampleSource" class="CSVExampleSource">
            <parameter key="filename" value="tdata.csv"/>
        </operator>
        <operator name="IdTagging" class="IdTagging">
        </operator>
        <operator name="ChangeAttributeRole" class="ChangeAttributeRole">
            <parameter key="name" value="id"/>
        </operator>
        <operator name="Example2AttributePivoting" class="Example2AttributePivoting">
            <parameter key="group_attribute" value="TID"/>
            <parameter key="index_attribute" value="ITEM"/>
        </operator>
        <operator name="Numerical2Polynominal" class="Numerical2Polynominal">
        </operator>
        <operator name="AttributeSubsetPreprocessing" class="AttributeSubsetPreprocessing" expanded="yes">
            <parameter key="condition_class" value="attribute_name_filter"/>
            <parameter key="attribute_name_regex" value="TID"/>
            <parameter key="invert_selection" value="true"/>
            <operator name="Mapping" class="Mapping">
                <parameter key="attributes" value=".*"/>
                <list key="value_mappings">
                </list>
                <parameter key="replace_what" value="?"/>
                <parameter key="replace_by" value="false"/>
                <parameter key="add_default_mapping" value="true"/>
                <parameter key="default_value" value="true"/>
            </operator>
        </operator>
    </operator>
    Hope that solves your problem,
    Tobias
  • wasylniwasylni Member Posts: 6 Contributor II
    Hello,
    Thanks for all your replies guys!

    I have tested all proposed solutions to my problem and it looks like the best for further progres will be simplest solution proposed by keith,
    as I am running out of memory on the one proposed by steffen, and there seems to be some problem with Tobias solution(or maybe its me ???).


    What I have done is:
    connected to MySQL database
    I`ve applied sql
       <parameter key="query"	value="SELECT&#10;  `orders_products`.`orders_id` AS `TID`,&#10;  GROUP_CONCAT(`orders_products`.`products_id` ORDER BY 1 SEPARATOR &#39;,&#39;) as `ITEM`&#10;FROM&#10;  `orders_products` GROUP BY `orders_products`.`orders_id`;"/>
            <parameter key="label_attribute" value="ITEM"/>
            <parameter key="id_attribute" value="TID"/>
    and I have result as shown below:

    Row No TID      ITEM
    1 1.0 18
    2 2.0 14
    3 3.0 18
    4 4.0 26
    5 5.0 4,16
    6 6.0 6
    7 7.0 18,19
    8 8.0 10,21
    9 9.0 7,14
    10 10.0 26
    11 11.0 12,17
    12 12.0 12
    13 13.0 1
    14 14.0 2,14,23,25

    .
    .
    .
    up to transactions15000


    can you please advise what do I have to apply next in order to get some rules of above data set?

    Thanks,
    wasylni


  • wasylniwasylni Member Posts: 6 Contributor II
    I think I might have found solution:

    I will apply

    1)GROUP_CONCAT `orders_products`.`products_id` as ITEM

    2)split

    3)apply FPGrowth

    4)appl AM

    what do you think ?
    <operator name="Root" class="Process" expanded="yes">
        <operator name="DatabaseExampleSource" class="DatabaseExampleSource">


    SESION DETAILS HERE



            <parameter key="query" value="SELECT&#10;  GROUP_CONCAT(`orders_products`.`products_id` ORDER BY 1 SEPARATOR &#39;,&#39;) as `ITEM`&#10;FROM&#10;  `orders_products` GROUP BY `orders_products`.`orders_id`;"/>
        </operator>
        <operator name="Split" class="Split">
            <parameter key="attributes" value="ITEM"/>
            <parameter key="apply_to_special_features" value="true"/>
            <parameter key="split_mode" value="unordered_split"/>
        </operator>
        <operator name="FPGrowth" class="FPGrowth">
            <parameter key="find_min_number_of_itemsets" value="false"/>
            <parameter key="min_support" value="0.0010"/>
        </operator>
        <operator name="AssociationRuleGenerator" class="AssociationRuleGenerator">
            <parameter key="keep_frequent_item_sets" value="true"/>
            <parameter key="min_confidence" value="0.5"/>
        </operator>
    </operator>
  • TobiasMalbrechtTobiasMalbrecht Moderator, Employee, Member Posts: 289  Guru
    Hi,

    if it works (?) this seems to be a solution to your problem! ;) The steps after having loaded the data seems to be ok, if the data is loaded as you have posted.

    Kind regards,
    Tobias
Sign In or Register to comment.