Can I do this with RapidMiner?

rst Member Posts: 3 Contributor I
edited November 2018 in Help
Hello,

I'm just beginning with data mining and was wondering whether I can use RapidMiner for my needs.

I have the following data:

ProjectId  Contributors               Subjects                                DOP
1          {"joe","Karen"}            {"Data mining","BI","html"}             11/16/2007
2          {""}                       {"modern literature"}                   06/05/2000
3          {"michael","roger","jen"}  {"medicine"}                            09/09/1998
4          {"ken","karen"}            {"web design", "html", "flash", "css"}  01/12/2004
5          {"steve", "andrew",ken}    {"BI"}                                  02/06/2003


I want to calculate or predict the probability of each project given the following user selections:

Contributors:
joe
ken
andrew
michael
jen

Subjects
html
BI
modern literature


Can I do this with RapidMiner? I've been trying out the examples, but I can't really see how to do this. If anyone can provide any info,
it will be greatly appreciated.

Thanks.

Answers

  • rst Member Posts: 3 Contributor I
    Anyone?  :-\
  • steffen Member Posts: 347 Maven
    Hello,

    I cannot quite figure out your prediction task. I understand it this way: you want to predict the ProjectId given a user (or a set of users) and a subject (or a set of subjects).

    In this case I am afraid you have to change the way your data is stored. E.g., instead of
    1              {"joe","Karen"}                  {"Data mining","BI","html"}     
    you need something like

    ProjectId  Joe  Karen  Michael  etc..  DataMining  BI  html  medicine  etc...
    1          1    1      0               1           1   1     0

    Do you see what I mean?
    The resulting matrix will allow you to learn models (or to calculate approximate probabilities, e.g. with NaiveBayes). Converting the selection of users and subjects to the same format will then allow you to create predictions.
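
The matrix described above can be sketched outside RapidMiner in a few lines of Python. This is only an illustration under stated assumptions: the sets are taken as already split into lists, names are lowercased so that "Karen" and "karen" share a column, and the function names (features, build_indicator_matrix, encode_selection) are made up for this example.

```python
# Toy rows copied from the first three lines of the question's table.
projects = {
    1: {"contributors": ["joe", "Karen"], "subjects": ["Data mining", "BI", "html"]},
    2: {"contributors": [], "subjects": ["modern literature"]},
    3: {"contributors": ["michael", "roger", "jen"], "subjects": ["medicine"]},
}

def features(contributors, subjects):
    """Turn two lists into a set of (kind, lowercased value) pairs (hypothetical helper)."""
    return ({("contributor", c.lower()) for c in contributors}
            | {("subject", s.lower()) for s in subjects})

def build_indicator_matrix(rows):
    """Return (columns, matrix); matrix maps each project id to a list of 0/1 flags."""
    per_project = {pid: features(r["contributors"], r["subjects"]) for pid, r in rows.items()}
    columns = sorted(set().union(*per_project.values()))
    matrix = {pid: [int(col in feats) for col in columns] for pid, feats in per_project.items()}
    return columns, matrix

def encode_selection(contributors, subjects, columns):
    """Encode a user selection against the same columns; values not seen in the data are ignored."""
    feats = features(contributors, subjects)
    return [int(col in feats) for col in columns]

columns, matrix = build_indicator_matrix(projects)
query = encode_selection(["joe", "michael"], ["html", "BI"], columns)
```

Each row of matrix and the encoded query share the same column order, which is what lets a learner compare a selection against the projects.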

    But:
    1. The mentioned conversion task cannot be done in RapidMiner (as far as I can see).
    2. If you do not have much more data with repetitions of users and subjects, the resulting probabilities will be very small. It is also possible that some learners will crash or produce strange results.
    3. I cannot get rid of the feeling that this is a task for discrete mathematics, not for data mining (i.e. RapidMiner). If you want to calculate the exact probabilities (!) instead of approximations, you have to look for another way. It seems to be more like a job for a sheet of paper than for a tool...
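
To make point 2 concrete, here is a hand-rolled Bernoulli naive Bayes estimate in Python. It is a sketch, not RapidMiner's NaiveBayes learner: it assumes every project is its own class with exactly one training row, uses add-one (Laplace) smoothing with a hypothetical parameter alpha, and runs on a made-up two-project, three-column matrix.

```python
from math import prod

def naive_bayes_posteriors(matrix, query, alpha=1.0):
    """Estimate P(project | query) when each project has exactly one 0/1 training row."""
    prior = 1.0 / len(matrix)  # uniform prior over projects (an assumption)
    scores = {}
    for pid, row in matrix.items():
        likelihood = 1.0
        for r, q in zip(row, query):
            # Smoothed P(feature = 1 | project), estimated from a single row.
            p_one = (r + alpha) / (1 + 2 * alpha)
            likelihood *= p_one if q == 1 else (1 - p_one)
        scores[pid] = prior * likelihood
    total = sum(scores.values())
    return {pid: s / total for pid, s in scores.items()}

# Made-up example: two projects, three indicator columns.
toy_matrix = {1: [1, 1, 0], 2: [0, 0, 1]}
posteriors = naive_bayes_posteriors(toy_matrix, [1, 0, 0])
raw_likelihood = prod([2 / 3, 1 / 3, 2 / 3])  # unnormalized likelihood of the query under project 1
```

With a single row per project, every smoothed feature probability is either 1/3 or 2/3, so the unnormalized likelihoods shrink quickly as columns are added and the ranking rests almost entirely on the smoothing constant. More rows with repeated users and subjects are what would make such estimates meaningful.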

    Hope this was helpful.

    Steffen
  • rst Member Posts: 3 Contributor I
    Thank you Steffen,

    I understand how I am supposed to change my data; I have no problems with that. I am looking for probability estimates rather than exact probabilities. My issue is that I am completely new to the field of data mining and to RapidMiner, and I am not sure which algorithms, learners, or classifiers I am supposed to use for this task, which is basically estimating a probability for each of the project IDs given the search parameters (subjects, contributors, etc.).

    Thank you again Steffen.
  • steffen Member Posts: 347 Maven
    You are welcome !

    Feel free to come back and ask more questions  :)

    greetings

    Steffen
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,

    Actually, the transformation task can be done with RapidMiner. It should be possible with the Nominal2Binominal operator. The result will be the matrix from which the "prediction" models can be learned.

    Cheers,
    Ingo
  • steffen Member Posts: 347 Maven
    Hm, partly.

    The main problem is to split up the sets {...} automatically. This is what cannot be done by RapidMiner (as far as I know ;) )

    1              {"joe","Karen"}                  {"Data mining","BI","html"}                      11/16/2007

    Once the sets are split up, Nominal2Binominal can be applied.
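
The splitting step could be done as preprocessing before the data is loaded into RapidMiner. A minimal Python sketch, assuming each cell looks like the ones in the question (braces around comma-separated values, quotes optional) and that no value itself contains a comma; split_set is a name invented for this example.

```python
def split_set(cell):
    """Split a cell such as '{"joe","Karen"}' into a list of plain strings."""
    inner = cell.strip().strip("{}")          # drop surrounding whitespace and braces
    items = [part.strip().strip('"') for part in inner.split(",")]
    return [item for item in items if item]   # '{""}' yields an empty list

contributors_1 = split_set('{"joe","Karen"}')
contributors_2 = split_set('{""}')
contributors_5 = split_set('{"steve", "andrew",ken}')
subjects_4 = split_set('{"web design", "html", "flash", "css"}')
```

Stripping the quotes after splitting also copes with the unquoted ken in project 5. A value containing a comma would need a real CSV-style parser instead.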

    greetings

    Steffen
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,

    Of course, you are absolutely right. For this step one would need to create a new operator.

    Cheers,
    Ingo