
Spend Data classification to UNSPSC

Solihull Member Posts: 6 Contributor II
edited November 2018 in Help
Hi all - I'm new and this is my first post.

Do you think that spend data classification to UNSPSC (or any other given taxonomy) can be achieved using Rapid-i?

Before, the data would look like this:
Description             Supplier
cartouche pour 5SIMX    INMAC

After, it would look like this:
Description             Supplier    UNSPSC (could be eClass or any other taxonomy, including an in-house one)
cartouche pour 5SIMX    INMAC       44103105    Ink cartridges
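As a point of reference, the simplest way to fill in the "after" column is a keyword lookup, which is roughly what Sebastian suggests further down the thread. The Python sketch below is a hypothetical illustration, not a RapidMiner feature: it assumes a hand-built keyword table (`KEYWORDS`) mapping words to (code, description) pairs, reusing 44103105 Ink cartridges from the example above and 43211503 Notebook computers as reported later in the thread.

```python
import re

# Hypothetical keyword table. A real one would be built from the taxonomy's
# own descriptions plus in-house synonyms such as the French "cartouche".
KEYWORDS = {
    "cartouche": ("44103105", "Ink cartridges"),
    "cartridge": ("44103105", "Ink cartridges"),
    "laptop": ("43211503", "Notebook computers"),
    "notebook": ("43211503", "Notebook computers"),
}


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def lookup(description):
    """Return (code, description) for the first known keyword, or None."""
    for token in tokenize(description):
        if token in KEYWORDS:
            return KEYWORDS[token]
    return None
```

With this table, `lookup("cartouche pour 5SIMX")` returns the ink-cartridge code, while `lookup("VAIO FW48E/H")` returns `None` because no token is in the table. That is exactly the gap a pure lookup leaves for descriptions like Examples 2 and 3 later in the thread.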

Answers

  • haddock Member Posts: 849 Maven
    Of course, anything can be achieved with RapidMiner  ;D
  • Solihull Member Posts: 6 Contributor II
    Many thanks for your reply, Haddock. If anyone is willing to comment on the outline approach needed, I'd very much appreciate it.
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    RapidMiner was originally designed to learn classification tasks from examples of data that humans have previously classified. It seems to me that you don't have such data? Your problem looks more like some sort of lookup in a large directory. If I'm wrong, feel free to explain how the new attributes were assigned to the "cartouche pour 5SIMX" item.
    But anyway, Haddock is right: you can achieve almost anything with RapidMiner; the question is more how complex things may become.

    Something else you might want to consider, even before thinking about RapidMiner: it's open source, so if you offer its results to your customers inside a program or website, you might have to give them access to your code...

    Greetings,
      Sebastian
  • Solihull Member Posts: 6 Contributor II
    Hi Sebastian and thank you for your comments.

    Let me give you some more examples along with the process:

    Example 1
    Description = VAIO FW48E/H Laptop
    Supplier = Sony

    To classify this example manually one would recognize the word Laptop and so classify it against UNSPSC code = 43211509  UNSPSC Description = Laptop / Notebook PCs.

    Example 2
    Description = VAIO FW48E/H
    Supplier = Sony

    To classify this example manually one would recognize the word VAIO and so classify it against UNSPSC code = 43211509  UNSPSC Description = Laptop / Notebook PCs.

    Example 3
    Description = FW48E/H
    Supplier = Sony

    To classify this example manually, one would need to look on the Sony website to find out what FW48E/H refers to before being able to classify it against UNSPSC code = 43211509, UNSPSC Description = Laptop / Notebook PCs.

    Or, looking at it from another angle:
    If we see Sony as a supplier, then based on previous experience of them as a supplier one would expect them to be supplying games, computers, TVs, etc.
    If one then sees VAIO in the description, we would know that it refers to a laptop because of our previous knowledge of what a VAIO is when supplied by Sony.

    What I need to achieve is to look at a line of data (Description + Supplier) and allocate a UNSPSC code/description to it. Having done this once, I then need the software to learn that the words Laptop and VAIO are related to the UNSPSC code for laptops, and that the presence of Sony as the supplier strengthens the case.

    This would be achieved by reading the text in Description and Supplier, identifying the words VAIO, Laptop and Sony in those strings, and then using them to classify to the UNSPSC code and description.

    I hope this has explained things a little better.

    Thanks in advance for any additional advice.
  • haddock Member Posts: 849 Maven
    G'Day,

    Although it may be possible to do brain surgery with a power drill, it may not always be optimal so to do.

    It is no different with using RM in this scenario, because RM's main purpose is to winkle out patterns in data; but if any string could have UNSPSC code 43211509, what patterns would there be in your data?

    Indeed in the darker parts of eastern France the Sony Vaio might be the appellation of a strong cheese, why not?

    Just ponderin'

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    I think you could do this using the TextPlugin and text classification. But I share Haddock's doubt that this will perform very well on new data. Again, this very much depends on the data you have and on the data you are going to classify, so I cannot predict anything without looking at the complete data. There might be some useful information one could extract from it.
    If you want, you could contact us to set up a small start-up project, where we could test together whether it works, or you could try it yourself.

    Greetings,
      Sebastian

  • Solihull Member Posts: 6 Contributor II
    Thanks Sebastian. What do I need to do to set up a small start-up project with you? It sounds like a good idea.
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    Simply write an email to contact@rapid-i.com. It will reach the people responsible, and I will explain to them everything I know about your problem so they can decide how to proceed.

    Greetings,
      Sebastian
  • Solihull Member Posts: 6 Contributor II
    Sebastian - thank you again for your help.
  • haddock Member Posts: 849 Maven
    Actually, this may not be so easy; I've just been to http://www.unspsc.org/search.asp and searched by name for "laptop", with the result "No Record found".
  • Solihull Member Posts: 6 Contributor II
    True, it shows up under 43211503 Notebook computers.
  • haddock Member Posts: 849 Maven
    "To classify this example manually one would recognize the word Laptop and so classify it against UNSPSC code = 43211509  UNSPSC Description = Laptop / Notebook PCs."
    ?
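Solihull's description of the task (learn from manually classified lines that words such as Laptop and VAIO, together with the supplier Sony, point to one UNSPSC code) is a supervised text-classification problem, which is what Sebastian's TextPlugin suggestion would tackle inside RapidMiner. As a hedged illustration outside RapidMiner, here is a minimal multinomial naive Bayes sketch in plain Python. The names `features` and `TokenClassifier` are hypothetical, naive Bayes is just one assumed choice of learner, and the training rows reuse the thread's examples with 43211503 as the notebook code.

```python
import math
import re
from collections import Counter, defaultdict


def features(description, supplier):
    """Lower-cased description words plus one feature for the supplier name."""
    words = re.findall(r"[a-z0-9]+", description.lower())
    return words + ["supplier=" + supplier.strip().lower()]


class TokenClassifier:
    """Multinomial naive Bayes over description words and supplier (hypothetical sketch)."""

    def __init__(self):
        self.code_counts = Counter()              # training rows per code
        self.token_counts = defaultdict(Counter)  # code -> token frequencies
        self.vocab = set()                        # every token seen in training

    def learn(self, description, supplier, code):
        """Record one manually classified line."""
        self.code_counts[code] += 1
        for tok in features(description, supplier):
            self.token_counts[code][tok] += 1
            self.vocab.add(tok)

    def classify(self, description, supplier):
        """Return the most probable code, or None if nothing has been learned yet."""
        total = sum(self.code_counts.values())
        best_code, best_score = None, float("-inf")
        for code, n_rows in self.code_counts.items():
            counts = self.token_counts[code]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_rows / total)  # log prior of the code
            for tok in features(description, supplier):
                # Laplace smoothing: unseen tokens count as 1 instead of 0
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_code, best_score = code, score
        return best_code


if __name__ == "__main__":
    clf = TokenClassifier()
    clf.learn("VAIO FW48E/H Laptop", "Sony", "43211503")
    clf.learn("cartouche pour 5SIMX", "INMAC", "44103105")
    # "XYZ123" is a made-up model number; "vaio" and the supplier carry the evidence
    print(clf.classify("VAIO XYZ123", "Sony"))  # 43211503
```

Treating the supplier as one extra token is what lets "Sony" strengthen the case, as Solihull puts it. Tokens never seen in training (a new model number, say) contribute only smoothing, which is why Haddock's and Sebastian's doubts about new data are well founded: the sketch can only be as good as the set of manually classified lines behind it.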