"Market Basket Data Format"

svpriyan Member Posts: 29 Maven
edited May 2019 in Help
Hello colleagues,
I have a relation with TIDs and ITEM IDs.

TID    ITEM
1      1
1      2
1      3
2      1
3      4
3      5
3      6

Now I intend to convert this into market basket data format, which might look like this:

TID    ITEM
1      1 2 3
2      1
3      4 5 6
4      1 8

Is it possible to do this with RapidMiner?
Could anyone help me with this?

Thanks
Priyan

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    this is indeed possible. You should first binarize your ITEM attribute using the Nominal2Binominal operator. You will then get one column for every possible value of ITEM, with each row containing exactly one 1, in the column of its item.
    You could then aggregate over the TID using the Aggregation operator, building the sum over all examples that have the same TID. In the end there is only one row for every transaction, containing the values of the sold items in the appropriate attributes.
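The two steps described above (binarize ITEM, then sum per TID) can be illustrated outside RapidMiner with a minimal Python sketch. This is not a RapidMiner process; the function name `binarize_and_aggregate` and the `ITEM=...` column labels are hypothetical and chosen only for illustration. The data is the sample from the original question.

```python
from collections import defaultdict

# Transaction rows from the original question: (TID, ITEM)
rows = [(1, 1), (1, 2), (1, 3), (2, 1), (3, 4), (3, 5), (3, 6)]

def binarize_and_aggregate(rows):
    """Build one indicator column per item, then sum the indicators per TID."""
    items = sorted({item for _, item in rows})
    # Step 1 (binarize): each row gets exactly one 1 among the item columns.
    binarized = [
        (tid, [1 if item == col else 0 for col in items])
        for tid, item in rows
    ]
    # Step 2 (aggregate): sum the indicator columns over rows sharing a TID.
    sums = defaultdict(lambda: [0] * len(items))
    for tid, indicators in binarized:
        sums[tid] = [a + b for a, b in zip(sums[tid], indicators)]
    return items, dict(sums)

items, baskets = binarize_and_aggregate(rows)
print("TID", *[f"ITEM={i}" for i in items])
for tid in sorted(baskets):
    print(tid, *baskets[tid])
```

Note that TID is kept out of the binarization step here; only the item values become columns, so each transaction ends up as a single row of 0/1 values.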

    Greetings,
      Sebastian
  • svpriyan Member Posts: 29 Maven
    Hi,
    Thanks for the information. I tried what you explained here, but I still get an error. Could you suggest how to fix it?
    ERROR:- TID does not exists.
    Thanks
    Priyan


    <operator name="Root" class="Process" expanded="yes">
        <operator name="CSVExampleSource" class="CSVExampleSource">
            <parameter key="filename" value="C:\rapid.csv"/>
        </operator>
        <operator name="Nominal2Binominal" class="Nominal2Binominal">
        </operator>
        <operator name="ChangeAttributeRole" class="ChangeAttributeRole">
            <parameter key="name" value="ITEM"/>
        </operator>
        <operator name="Aggregation" class="Aggregation">
            <list key="aggregation_attributes">
              <parameter key="TID" value="sum"/>
            </list>
        </operator>
        <operator name="Example2AttributePivoting" class="Example2AttributePivoting">
            <parameter key="group_attribute" value="TID"/>
            <parameter key="index_attribute" value="ITEM"/>
        </operator>
        <operator name="ResultWriter" class="ResultWriter">
            <parameter key="result_file" value="C:\result16.res"/>
        </operator>
    </operator>

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    probably it is exactly what it states: TID doesn't exist. Please set a breakpoint before the failing operator and check whether this attribute still exists. I assume that it has been binominalized and hence doesn't exist anymore. You have to ensure that only ITEM is binominalized. Therefore you have to change the role of TID to id and ensure that ITEM is binominal.

    Greetings,
      Sebastian
  • svpriyan Member Posts: 29 Maven
    Hi,
    Thanks for the info. I changed the process according to your feedback, but I still have a problem.
    Example2AttributePivoting has two parameters: group attribute and index attribute.
    What should I set for these two parameters?

    <operator name="Root" class="Process" expanded="yes">
        <operator name="CSVExampleSource" class="CSVExampleSource">
            <parameter key="filename" value="C:\Documents and Settings\Administrator\Desktop\excel\rapid.csv"/>
        </operator>
        <operator name="Nominal2Binominal" class="Nominal2Binominal">
        </operator>
        <operator name="ChangeAttributeRole" class="ChangeAttributeRole">
            <parameter key="name" value="TID"/>
            <parameter key="target_role" value="id"/>
        </operator>
        <operator name="Aggregation" class="Aggregation">
            <list key="aggregation_attributes">
              <parameter key="TID" value="sum"/>
            </list>
        </operator>
        <operator name="Example2AttributePivoting" class="Example2AttributePivoting">
            <parameter key="group_attribute" value="ITEM"/>
            <parameter key="index_attribute" value="TID"/>
        </operator>
        <operator name="ResultWriter" class="ResultWriter">
            <parameter key="result_file" value="C:\Documents and Settings\Administrator\Desktop\answer16.res"/>
        </operator>
    </operator>
    Thanks

    Priyan
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    I have no idea what this operator does; I have never used it. But I think you will not need it in this context. The data should already be in the correct format before this operator.

    Greetings,
      Sebastian
  • svpriyan Member Posts: 29 Maven
    Hi,
    I am sorry to ask again, but I am stuck on this.
    This is the process I used:

    <operator name="Root" class="Process" expanded="yes">
        <operator name="CSVExampleSource" class="CSVExampleSource">
            <parameter key="filename" value="C:\Documents and Settings\Administrator\Desktop\excel\rapid.csv"/>
        </operator>
        <operator name="Nominal2Binominal" class="Nominal2Binominal">
        </operator>
        <operator name="ChangeAttributeRole" class="ChangeAttributeRole">
            <parameter key="name" value="TID"/>
            <parameter key="target_role" value="id"/>
        </operator>
        <operator name="Aggregation" class="Aggregation">
            <list key="aggregation_attributes">
              <parameter key="TID" value="sum"/>
            </list>
        </operator>
        <operator name="ResultWriter" class="ResultWriter">
            <parameter key="result_file" value="C:\Documents and Settings\Administrator\Desktop\answer16.res"/>
        </operator>
    </operator>
    What can I use for the group-by attribute of the Aggregation operator shown in the attached image (image 1)?
    Do I need to add anything more to format the data?
    In the end I only get this result (image 2).

    It would be a great help if you could give some feedback.

    thanks
    priyan


    [attachment deleted by admin]
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,

    starting from your original post (and using the data you provided there), this is the process that performs the desired transformation from transactional data to the basket data format:

    <operator name="Root" class="Process" expanded="yes">
        <operator name="SimpleExampleSource" class="SimpleExampleSource">
            <parameter key="filename" value="C:\Dokumente und Einstellungen\Mierswa\Desktop\market_data.txt"/>
            <parameter key="read_attribute_names" value="true"/>
        </operator>
        <operator name="IdTagging" class="IdTagging">
        </operator>
        <operator name="ChangeAttributeRole" class="ChangeAttributeRole">
            <parameter key="name" value="id"/>
        </operator>
        <operator name="Example2AttributePivoting" class="Example2AttributePivoting">
            <parameter key="group_attribute" value="TID"/>
            <parameter key="index_attribute" value="ITEM"/>
        </operator>
        <operator name="Numerical2Polynominal" class="Numerical2Polynominal">
        </operator>
        <operator name="AttributeSubsetPreprocessing" class="AttributeSubsetPreprocessing" expanded="yes">
            <parameter key="condition_class" value="attribute_name_filter"/>
            <parameter key="attribute_name_regex" value="TID"/>
            <parameter key="invert_selection" value="true"/>
            <operator name="Mapping" class="Mapping">
                <parameter key="attributes" value=".*"/>
                <list key="value_mappings">
                </list>
                <parameter key="replace_what" value="?"/>
                <parameter key="replace_by" value="false"/>
                <parameter key="add_default_mapping" value="true"/>
                <parameter key="default_value" value="true"/>
            </operator>
        </operator>
        <operator name="FPGrowth" class="FPGrowth">
        </operator>
    </operator>

    Please note that you have to adapt the input operator.
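As a rough illustration of what the process above appears to do, here is a minimal Python sketch of the same idea: tag each row with a running id, pivot the rows into one row per TID with one column per ITEM, then map missing cells to false and everything else to true. This is only my reading of the operator chain, not RapidMiner code, and the function name `pivot_to_baskets` is hypothetical.

```python
# Transaction rows from the original question: (TID, ITEM)
rows = [(1, 1), (1, 2), (1, 3), (2, 1), (3, 4), (3, 5), (3, 6)]

def pivot_to_baskets(rows):
    """Pivot (TID, ITEM) rows into one true/false row per TID."""
    items = sorted({item for _, item in rows})
    # Tag every row with a running id, then pivot: each cell holds the
    # row id if that TID bought the item, otherwise None (missing).
    pivot = {}
    for row_id, (tid, item) in enumerate(rows, start=1):
        pivot.setdefault(tid, {col: None for col in items})[item] = row_id
    # Mapping step: missing -> False, any other value -> True.
    return {
        tid: {col: cell is not None for col, cell in cells.items()}
        for tid, cells in pivot.items()
    }

baskets = pivot_to_baskets(rows)
for tid, flags in sorted(baskets.items()):
    # Print each basket as the list of items it contains.
    print(tid, " ".join(str(item) for item, present in flags.items() if present))
# prints:
# 1 1 2 3
# 2 1
# 3 4 5 6
```

The true/false table held in `baskets` corresponds to the kind of input a frequent itemset miner expects: one row per transaction and one boolean column per item.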

    Cheers,
    Ingo
  • svpriyan Member Posts: 29 Maven
    Dear Ingo,
    Thank you very much.
    I was able to find the way.
    Thanks
    Priyan