
"Market Basket Data Format"

svpriyan Member Posts: 29 Maven
edited May 2019 in Help
Hello colleagues,
I have a relation with TIDs and ITEM IDs:

TID    ITEM
1      1
1      2
1      3
2      1
3      4
3      5
3      6

Now I want to convert that into market basket data format, which might look like this:

TID    ITEM
1      1 2 3
2      1
3      4 5 6
4      1 8

Is it possible to do this with RapidMiner?
Could anyone help me with this?

Thanks
Priyan
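To make the requested conversion concrete outside RapidMiner: it is a group-by on TID that collects each transaction's items into one row. The following is a minimal plain-Python sketch, not a RapidMiner process; it assumes the data is available as a list of (TID, ITEM) pairs, and the function name `to_baskets` is hypothetical.

```python
from collections import defaultdict


def to_baskets(rows):
    """Group (tid, item) pairs into {tid: [items...]}, keeping first-seen order."""
    baskets = defaultdict(list)
    for tid, item in rows:
        baskets[tid].append(item)
    return dict(baskets)


# The transactional data from the first table above.
rows = [(1, 1), (1, 2), (1, 3), (2, 1), (3, 4), (3, 5), (3, 6)]

# Print one line per transaction: the TID followed by its items.
for tid, items in to_baskets(rows).items():
    print(tid, " ".join(str(i) for i in items))
```

The sketch only emits rows for TIDs that occur in the input, so the row for TID 4 in the desired-output table (which has no matching input rows) does not appear.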

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    this is indeed possible. First, binarize your ITEM attribute using the Nominal2Binominal operator. You will then get one column for every possible value of ITEM, with each row containing exactly one 1, in the column of its item.
    Then aggregate over TID using the Aggregation operator, summing over all examples that share the same TID. In the end there is only one row per transaction, with the sold items marked in their corresponding attributes.

    Greetings,
      Sebastian
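Sebastian's two steps (binarize ITEM, then sum per TID) can be sketched in plain Python as follows. This illustrates the idea only and is not RapidMiner's implementation: the helper names `binarize` and `aggregate_sum` and the `ITEM = <value>` column naming are assumptions made for this example.

```python
from collections import defaultdict


def binarize(rows):
    """One-hot encode ITEM: each (tid, item) row becomes (tid, {column: 0 or 1})."""
    items = sorted({item for _, item in rows})
    columns = [f"ITEM = {item}" for item in items]
    encoded = []
    for tid, item in rows:
        # Exactly one indicator per row is 1: the column of this row's item.
        encoded.append((tid, {f"ITEM = {i}": int(i == item) for i in items}))
    return columns, encoded


def aggregate_sum(columns, encoded):
    """Sum the indicator columns over all rows that share a TID."""
    # Each new TID starts with a fresh all-zero row.
    totals = defaultdict(lambda: dict.fromkeys(columns, 0))
    for tid, indicators in encoded:
        for col, value in indicators.items():
            totals[tid][col] += value
    return dict(totals)


rows = [(1, 1), (1, 2), (1, 3), (2, 1), (3, 4), (3, 5), (3, 6)]
columns, encoded = binarize(rows)
table = aggregate_sum(columns, encoded)

print("TID", *columns)
for tid, sums in table.items():
    print(tid, *(sums[c] for c in columns))
```

Each TID ends up with a 1 in the column of every item it contains, which is the one-row-per-transaction layout described above.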
  • svpriyan Member Posts: 29 Maven
    Hi,
    Thanks for the information. I tried what you explained, but I still get an error. Could you suggest how to fix it?
    ERROR:- TID does not exists.
    Thanks
    Priyan


    <operator name="Root" class="Process" expanded="yes">
        <operator name="CSVExampleSource" class="CSVExampleSource">
            <parameter key="filename" value="C:\rapid.csv"/>
        </operator>
        <operator name="Nominal2Binominal" class="Nominal2Binominal">
        </operator>
        <operator name="ChangeAttributeRole" class="ChangeAttributeRole">
            <parameter key="name" value="ITEM"/>
        </operator>
        <operator name="Aggregation" class="Aggregation">
            <list key="aggregation_attributes">
              <parameter key="TID" value="sum"/>
            </list>
        </operator>
        <operator name="Example2AttributePivoting" class="Example2AttributePivoting">
            <parameter key="group_attribute" value="TID"/>
            <parameter key="index_attribute" value="ITEM"/>
        </operator>
        <operator name="ResultWriter" class="ResultWriter">
            <parameter key="result_file" value="C:\result16.res"/>
        </operator>
    </operator>

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    it's probably exactly what the message says: TID doesn't exist. Please set a breakpoint before the failing operator and check whether this attribute still exists. I assume that it has been binominalized along with ITEM and hence doesn't exist anymore. You have to ensure that only ITEM is binominalized. Therefore, change the role of TID to id and make sure that ITEM is binominal.

    Greetings,
      Sebastian
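A plain-Python sketch of the failure mode described above: if every column is binarized, TID itself is replaced by indicator columns, so a later step that looks up "TID" no longer finds it. Excluding TID keeps it intact. The `skip` parameter is a hypothetical stand-in for RapidMiner's special roles (such as id), and the `TID = <value>` naming is an assumption; this models the idea, not the actual Nominal2Binominal operator.

```python
def binarize_columns(table, skip=()):
    """Replace every column not listed in `skip` by one 0/1 column per distinct value."""
    result = {}
    for name, values in table.items():
        if name in skip:
            # Stand-in for an attribute with a special role: left untouched.
            result[name] = list(values)
            continue
        for v in sorted(set(values)):
            result[f"{name} = {v}"] = [int(x == v) for x in values]
    return result


data = {"TID": [1, 1, 1, 2, 3, 3, 3], "ITEM": [1, 2, 3, 1, 4, 5, 6]}

everything = binarize_columns(data)
print("TID" in everything)  # False: TID was split into "TID = 1", "TID = 2", "TID = 3"

items_only = binarize_columns(data, skip={"TID"})
print("TID" in items_only)  # True: TID survives for the later group-by
```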
  • svpriyan Member Posts: 29 Maven
    Hi,
    Thanks for the info. I changed the process according to your feedback, but I still have a problem.
    For Example2AttributePivoting, there are two parameters: Group Attribute and Index Attribute.
    What should I set these two parameters to?

    <operator name="Root" class="Process" expanded="yes">
        <operator name="CSVExampleSource" class="CSVExampleSource">
            <parameter key="filename" value="C:\Documents and Settings\Administrator\Desktop\excel\rapid.csv"/>
        </operator>
        <operator name="Nominal2Binominal" class="Nominal2Binominal">
        </operator>
        <operator name="ChangeAttributeRole" class="ChangeAttributeRole">
            <parameter key="name" value="TID"/>
            <parameter key="target_role" value="id"/>
        </operator>
        <operator name="Aggregation" class="Aggregation">
            <list key="aggregation_attributes">
              <parameter key="TID" value="sum"/>
            </list>
        </operator>
        <operator name="Example2AttributePivoting" class="Example2AttributePivoting">
            <parameter key="group_attribute" value="ITEM"/>
            <parameter key="index_attribute" value="TID"/>
        </operator>
        <operator name="ResultWriter" class="ResultWriter">
            <parameter key="result_file" value="C:\Documents and Settings\Administrator\Desktop\answer16.res"/>
        </operator>
    </operator>
    Thanks

    Priyan
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    I have no idea what this operator does; I have never used it. But I think you will not need it in this context: the data should already be in the correct format before this operator.

    Greetings,
      Sebastian
  • svpriyan Member Posts: 29 Maven
    Hi,
    I am really sorry to ask again, but I am stuck on this. Sorry for troubling you.
    This is the process I used:

    <operator name="Root" class="Process" expanded="yes">
        <operator name="CSVExampleSource" class="CSVExampleSource">
            <parameter key="filename" value="C:\Documents and Settings\Administrator\Desktop\excel\rapid.csv"/>
        </operator>
        <operator name="Nominal2Binominal" class="Nominal2Binominal">
        </operator>
        <operator name="ChangeAttributeRole" class="ChangeAttributeRole">
            <parameter key="name" value="TID"/>
            <parameter key="target_role" value="id"/>
        </operator>
        <operator name="Aggregation" class="Aggregation">
            <list key="aggregation_attributes">
              <parameter key="TID" value="sum"/>
            </list>
        </operator>
        <operator name="ResultWriter" class="ResultWriter">
            <parameter key="result_file" value="C:\Documents and Settings\Administrator\Desktop\answer16.res"/>
        </operator>
    </operator>
    What can I use as the group-by attribute in the Aggregation operator (see image 1)?
    Do I need to add anything more than this to format the data?
    In the end I only get this result (image 2).

    It would be a great help if you could give me some feedback, sir.

    Thanks
    Priyan


    [attachment deleted by admin]
  • IngoRM Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,

    Starting from your original post (and using the data you provided there), this is the process that performs the desired transformation from transactional data to the basket data format:

    <operator name="Root" class="Process" expanded="yes">
        <operator name="SimpleExampleSource" class="SimpleExampleSource">
            <parameter key="filename" value="C:\Dokumente und Einstellungen\Mierswa\Desktop\market_data.txt"/>
            <parameter key="read_attribute_names" value="true"/>
        </operator>
        <operator name="IdTagging" class="IdTagging">
        </operator>
        <operator name="ChangeAttributeRole" class="ChangeAttributeRole">
            <parameter key="name" value="id"/>
        </operator>
        <operator name="Example2AttributePivoting" class="Example2AttributePivoting">
            <parameter key="group_attribute" value="TID"/>
            <parameter key="index_attribute" value="ITEM"/>
        </operator>
        <operator name="Numerical2Polynominal" class="Numerical2Polynominal">
        </operator>
        <operator name="AttributeSubsetPreprocessing" class="AttributeSubsetPreprocessing" expanded="yes">
            <parameter key="condition_class" value="attribute_name_filter"/>
            <parameter key="attribute_name_regex" value="TID"/>
            <parameter key="invert_selection" value="true"/>
            <operator name="Mapping" class="Mapping">
                <parameter key="attributes" value=".*"/>
                <list key="value_mappings">
                </list>
                <parameter key="replace_what" value="?"/>
                <parameter key="replace_by" value="false"/>
                <parameter key="add_default_mapping" value="true"/>
                <parameter key="default_value" value="true"/>
            </operator>
        </operator>
        <operator name="FPGrowth" class="FPGrowth">
        </operator>
    </operator>
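The core of the process above is a long-to-wide pivot (Example2AttributePivoting with group_attribute TID and index_attribute ITEM) followed by a Mapping that turns missing cells (`?`) into `false` and every other value into `true`. A rough plain-Python analogue of those two steps is sketched below. It is not RapidMiner code, the helper names are hypothetical, and the exact column names RapidMiner generates during pivoting are not reproduced (columns here are simply keyed by item value). As an assumption mirroring IdTagging, a running id serves as the pivoted value, so any present cell is non-missing.

```python
def pivot(rows, group_key, index_key, value_key):
    """Long-to-wide pivot: one output row per group value, one column per index value.
    Cells with no matching input row stay None (RapidMiner shows missing values as '?')."""
    groups = list(dict.fromkeys(r[group_key] for r in rows))  # unique, first-seen order
    indices = sorted({r[index_key] for r in rows})
    wide = {g: {i: None for i in indices} for g in groups}
    for r in rows:
        wide[r[group_key]][r[index_key]] = r[value_key]
    return indices, wide


def to_true_false(wide):
    """Mirror the Mapping step: missing -> 'false', any present value -> 'true'."""
    return {g: {i: ("false" if v is None else "true") for i, v in cells.items()}
            for g, cells in wide.items()}


pairs = [(1, 1), (1, 2), (1, 3), (2, 1), (3, 4), (3, 5), (3, 6)]
# IdTagging analogue: give each example a running id, which becomes the pivoted value.
rows = [{"id": n, "TID": tid, "ITEM": item} for n, (tid, item) in enumerate(pairs, start=1)]

indices, wide = pivot(rows, group_key="TID", index_key="ITEM", value_key="id")
basket = to_true_false(wide)

print("TID", *indices)
for tid, cells in basket.items():
    print(tid, *(cells[i] for i in indices))
```

The resulting true/false table, one row per TID and one column per item, is the basket format that the final FPGrowth step consumes in this process.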

    Please note that you have to adapt the input operator.

    Cheers,
    Ingo
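For completeness, the final FPGrowth step computes frequent itemsets. The sketch below does not implement FP-growth; it is a brute-force substitute that counts every subset of every basket, which yields the same frequent itemsets but only scales to tiny examples. The baskets are the items marked true in each row for the original data, and `min_support` as a fraction of transactions is an assumed parameter for this example.

```python
from itertools import combinations


def frequent_itemsets(baskets, min_support):
    """Brute force: count every itemset that occurs in some basket and keep those whose
    support (fraction of baskets containing it) is at least min_support."""
    n = len(baskets)
    counts = {}
    for basket in baskets:
        items = sorted(basket)
        # Every non-empty subset of this basket is contained in it exactly once.
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                counts[combo] = counts.get(combo, 0) + 1
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


# Each basket is the set of items whose column is 'true' in the pivoted table.
baskets = [{1, 2, 3}, {1}, {4, 5, 6}]
result = frequent_itemsets(baskets, min_support=0.5)
for itemset, support in sorted(result.items()):
    print(itemset, round(support, 3))
```

With three transactions and a minimum support of 0.5, only `{1}` (in 2 of 3 transactions) is frequent.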
  • svpriyan Member Posts: 29 Maven
    Dear Ingo,
    Thank you very much.
    I was able to find a way.
    Thanks
    Priyan