cross selling

svpriyan | Member | Posts: 29 | Maven
edited November 2018 in Help
Hi,
This is Priyan.

My .xls data looks like this (c = customer ID, t = item ID):

c1    t1
c1    t2
c1    t3
c1    t4
c1    t5
c2    t1
c2    t2
c3    t1
c4    t4
c5    t5
c6    t6

I need to transform it into the following using RapidMiner. Is this possible?

transaction_id   t1   t2
c1               1    1
c2               1    1
c3               1    0
c4               0    1

Does anyone know how to do it?
Thanks

Answers

  • land | RapidMiner Certified Analyst, RapidMiner Certified Expert, Member | Posts: 2,531 | Unicorn
    Hi Priyan,
    you could binarize the item column, convert the nominal "true" values into numerical 1s, and then aggregate (sum) over the customer IDs. Since each row contains a single item, at most one of the binarized columns is true per row, so the values are disjoint and can simply be added.

    Greetings,
      Sebastian
  • earmijo | Member | Posts: 271 | Unicorn
    I was after the answer to this question too. Thanks, Sebastian. Here's my not-so-elegant translation of your answer. My variables are called "tid" and "item". Of course, with a large number of items, listing each attribute in the Aggregation operator is very tedious.


    <operator name="Root" class="Process" expanded="yes">
        <!-- Read the transaction list with the attributes "tid" and "item" -->
        <operator name="CSVExampleSource" class="CSVExampleSource">
            <parameter key="filename" value="c:\list.csv"/>
        </operator>
        <!-- Apply the two inner operators only to the attribute whose name matches "item" -->
        <operator name="AttributeSubsetPreprocessing" class="AttributeSubsetPreprocessing" expanded="yes">
            <parameter key="attribute_name_regex" value="item"/>
            <parameter key="condition_class" value="attribute_name_filter"/>
            <!-- One true/false attribute per item value, e.g. "item = i1" -->
            <operator name="Nominal2Binominal" class="Nominal2Binominal">
            </operator>
            <!-- Turn the true/false values into numbers so they can be summed -->
            <operator name="Nominal2Numerical" class="Nominal2Numerical">
            </operator>
        </operator>
        <!-- Sum each item attribute per tid; every item value has to be listed by hand -->
        <operator name="Aggregation" class="Aggregation">
            <list key="aggregation_attributes">
              <parameter key="item = i1" value="sum"/>
              <parameter key="item = i2" value="sum"/>
              <parameter key="item = i3" value="sum"/>
              <parameter key="item = i4" value="sum"/>
              <parameter key="item = i5" value="sum"/>
              <parameter key="item = i6" value="sum"/>
            </list>
            <parameter key="group_by_attributes" value="tid"/>
        </operator>
    </operator>
  • land | RapidMiner Certified Analyst, RapidMiner Certified Expert, Member | Posts: 2,531 | Unicorn
    Hi,
    That's pretty much correct, and I hadn't thought of that. OK, here is another way to do it:
    <operator name="Root" class="Process" expanded="yes">
        <!-- Generate sample data: one nominal attribute (att1) plus a label -->
        <operator name="NominalExampleSetGenerator" class="NominalExampleSetGenerator">
            <parameter key="number_of_attributes" value="1"/>
        </operator>
        <!-- Change the role of the generated "label" attribute so it can be grouped on like a regular attribute -->
        <operator name="ChangeAttributeRole" class="ChangeAttributeRole">
            <parameter key="name" value="label"/>
        </operator>
        <!-- Count the examples for every (label, att1) combination -->
        <operator name="Aggregation" class="Aggregation">
            <list key="aggregation_attributes">
              <parameter key="att1" value="count"/>
            </list>
            <parameter key="group_by_attributes" value="label|att1"/>
        </operator>
        <!-- One row per label value, one column per att1 value, holding the counts -->
        <operator name="Example2AttributePivoting" class="Example2AttributePivoting">
            <parameter key="group_attribute" value="label"/>
            <parameter key="index_attribute" value="att1"/>
        </operator>
    </operator>
    Greetings,
      Sebastian
  • svpriyan | Member | Posts: 29 | Maven
    Hi,
    Thank you all.
    I have a few questions.
    I made a mistake initially and did not state the specification clearly:
    cid and item are both numeric. In this case, is it worth using Nominal2Binominal?
    The first answer seems difficult to use because I have 1000 customers and 500 items. Is there any way to adapt it?
    Thanks again,
    Priyan
  • earmijo | Member | Posts: 271 | Unicorn
    Thanks again, Sebastian. Your code is indeed elegant.

    svpriyan: If the ID variable and the item variable are numeric, just add an operator such as Numerical2Polynominal to convert them to nominal.

    Here's the data (stored in the file list.csv):

    tid item
    1 1
    1 2
    1 3
    2 1
    2 4
    3 5
    3 4
    3 6
    4 1


    Here's the final process as I'm using it to answer my question:
    <operator name="Root" class="Process" expanded="yes">
        <!-- Read the transaction list (attributes "tid" and "item") -->
        <operator name="CSVExampleSource" class="CSVExampleSource">
            <parameter key="filename" value="c:\list.csv"/>
        </operator>
        <!-- Convert the numeric tid and item values into nominal values -->
        <operator name="Numerical2Polynominal" class="Numerical2Polynominal">
        </operator>
        <!-- Count the examples for every (tid, item) combination -->
        <operator name="Aggregation" class="Aggregation">
            <list key="aggregation_attributes">
              <parameter key="item" value="count"/>
            </list>
            <parameter key="group_by_attributes" value="tid|item"/>
        </operator>
        <!-- One row per tid, one column per item, holding the counts -->
        <operator name="Example2AttributePivoting" class="Example2AttributePivoting">
            <parameter key="group_attribute" value="tid"/>
            <parameter key="index_attribute" value="item"/>
        </operator>
    </operator>
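To check the logic of the two approaches in this thread outside RapidMiner, they can be sketched in plain Python. This is an analogy, not RapidMiner code or its API: the function names `read_pairs`, `binarize_and_sum`, and `count_and_pivot` are hypothetical, and the parser assumes the whitespace-separated `tid item` layout of list.csv shown above. Filling absent (tid, item) combinations with 0 is a choice of this sketch; the thread does not say how RapidMiner represents them.

```python
from collections import Counter

# Sample data copied from list.csv above ("tid item", whitespace-separated).
DATA = """tid item
1 1
1 2
1 3
2 1
2 4
3 5
3 4
3 6
4 1
"""


def read_pairs(text):
    """Parse the listing into (tid, item) string pairs.

    Both columns stay strings, i.e. they are treated as nominal labels,
    mirroring the role of Numerical2Polynominal in the process above.
    """
    lines = text.strip().splitlines()
    if lines[0].split() != ["tid", "item"]:
        raise ValueError("expected a 'tid item' header")
    return [tuple(line.split()) for line in lines[1:]]


def binarize_and_sum(pairs):
    """First suggestion: one 0/1 column per item value, then sum per tid."""
    # key=int sorts numeric labels as numbers, so "10" comes after "9".
    items = sorted({item for _, item in pairs}, key=int)
    table = {}
    for tid, item in pairs:
        # Each row is 1 in exactly one item column, so rows can be added.
        indicators = [int(item == i) for i in items]
        totals = table.setdefault(tid, [0] * len(items))
        table[tid] = [a + b for a, b in zip(totals, indicators)]
    return items, table


def count_and_pivot(pairs):
    """Second suggestion: count per (tid, item), then pivot items into columns."""
    counts = Counter(pairs)  # analogous to Aggregation grouped by "tid|item"
    items = sorted({item for _, item in pairs}, key=int)
    tids = sorted({tid for tid, _ in pairs}, key=int)
    # Counter returns 0 for combinations that never occur.
    return items, {tid: [counts[(tid, i)] for i in items] for tid in tids}


pairs = read_pairs(DATA)
items, summed = binarize_and_sum(pairs)
_, pivoted = count_and_pivot(pairs)
print("tid", *items)
for tid, row in pivoted.items():
    print(tid, *row)
```

Both functions produce the same table for this data because no (tid, item) pair is listed twice; with repeated purchases, the count-based version would return values greater than 1 rather than a 0/1 flag. In neither sketch are the item columns listed by hand, which is the tedium noted for the first process.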