Can I do this with RapidMiner?

rst · September 2008

Hello,

I'm just beginning with data mining and I was wondering if I can use RapidMiner for my needs.

I have the following data:

ProjectId  Contributors                Subjects                                DOP
1          {"joe","Karen"}             {"Data mining","BI","html"}             11/16/2007
2          {""}                        {"modern literature"}                   06/05/2000
3          {"michael","roger","jen"}   {"medicine"}                            09/09/1998
4          {"ken","karen"}             {"web design", "html", "flash", "css"}  01/12/2004
5          {"steve", "andrew",ken}     {"BI"}                                  02/06/2003

I want to calculate or predict the probability of each project given the following user selections:

Contributors:
joe
ken
andrew
michael
jen

Subjects:
html
BI
modern literature

Can I do this with RapidMiner? I've been trying out the examples, but I can't really see how to do this. If anyone can provide any info, it will be greatly appreciated.

Thanks.

rst · September 2008

Anyone? :-\

steffen · September 2008

Hello,

I cannot quite figure out your prediction task. I understand it this way: you want to predict the ProjectId given a user (or a set of users) and a subject (or a set of subjects).

In this case I am afraid you have to change the way your data is stored. E.g., instead of
1 {"joe","Karen"} {"Data mining","BI","html"}
you need something like

ProjectId  Joe  Karen  Michael  ...  DataMining  BI  html  medicine  ...
1          1    1      0        ...  1           1   1     0         ...

Do you see what I mean?
The resulting matrix will allow you to learn models (or to calculate approximate probabilities, e.g. with NaiveBayes). Converting the selection of users and subjects to the same format will then allow you to create predictions.
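To make the matrix layout concrete, here is a minimal sketch in plain Python (outside RapidMiner). It assumes the sets have already been split into lists and that names are lower-cased so "Karen" and "karen" match; the dictionary layout and the helper names build_columns and encode are made up for illustration.

```python
# Toy data from the first post, already split into lists.
# Lower-casing and this dictionary layout are assumptions for illustration.
projects = {
    1: (["joe", "karen"], ["data mining", "bi", "html"]),
    2: ([], ["modern literature"]),
    3: (["michael", "roger", "jen"], ["medicine"]),
    4: (["ken", "karen"], ["web design", "html", "flash", "css"]),
    5: (["steve", "andrew", "ken"], ["bi"]),
}


def build_columns(projects):
    """One column per distinct contributor, then one per distinct subject."""
    contributors = sorted({c for cs, _ in projects.values() for c in cs})
    subjects = sorted({s for _, ss in projects.values() for s in ss})
    return ([("contributor", c) for c in contributors]
            + [("subject", s) for s in subjects])


def encode(contributors, subjects, columns):
    """Return a 0/1 row: 1 where the column's name occurs in the input."""
    cset, sset = set(contributors), set(subjects)
    return [int(name in (cset if kind == "contributor" else sset))
            for kind, name in columns]


columns = build_columns(projects)
matrix = {pid: encode(cs, ss, columns) for pid, (cs, ss) in projects.items()}

# The user's selection from the first post, encoded in the same format.
query = encode(["joe", "ken", "andrew", "michael", "jen"],
               ["html", "bi", "modern literature"], columns)

for pid, row in matrix.items():
    print(pid, row)
print("query", row if False else query)
```

Keeping contributor and subject columns separate avoids a clash if the same word ever appears in both fields.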

But:
1. The conversion mentioned above cannot be done in RapidMiner (as far as I can see).
2. If you do not have much more data with repetitions of users and subjects, the resulting probabilities will be very small. It is also possible that some learners will crash or produce strange results.
3. I cannot get rid of the feeling that this is a task for discrete mathematics, not for data mining (and hence not for RapidMiner). If you want to calculate the exact probabilities (!) instead of approximations, you have to look for another way. It seems more like a job for a sheet of paper than for a tool...
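As a rough illustration of the naive Bayes idea and of the caveat in point 2, here is a sketch in plain Python, not RapidMiner's NaiveBayes operator. It assumes a Bernoulli naive Bayes with Laplace smoothing, a uniform prior, and exactly one training row per project; the "c:"/"s:" token prefixes and the function name posterior are invented for this example.

```python
from math import exp, log

# Each project as a set of tokens: "c:" marks a contributor, "s:" a subject.
# Toy data from the first post, lower-cased (an assumption for illustration).
projects = {
    1: {"c:joe", "c:karen", "s:data mining", "s:bi", "s:html"},
    2: {"s:modern literature"},
    3: {"c:michael", "c:roger", "c:jen", "s:medicine"},
    4: {"c:ken", "c:karen", "s:web design", "s:html", "s:flash", "s:css"},
    5: {"c:steve", "c:andrew", "c:ken", "s:bi"},
}
vocabulary = sorted(set().union(*projects.values()))


def posterior(selection, projects, vocabulary, alpha=1.0):
    """Estimate P(project | selection) with one training row per project."""
    log_scores = {}
    for pid, tokens in projects.items():
        score = log(1.0 / len(projects))  # uniform prior over projects
        for token in vocabulary:
            # Smoothed P(token present | project), from a single row.
            p_on = ((token in tokens) + alpha) / (1 + 2 * alpha)
            score += log(p_on if token in selection else 1.0 - p_on)
        log_scores[pid] = score
    # Normalise in log space so the estimates sum to 1.
    top = max(log_scores.values())
    weights = {pid: exp(s - top) for pid, s in log_scores.items()}
    total = sum(weights.values())
    return {pid: w / total for pid, w in weights.items()}


selection = {"c:joe", "c:ken", "c:andrew", "c:michael", "c:jen",
             "s:html", "s:bi", "s:modern literature"}
probs = posterior(selection, projects, vocabulary)
for pid, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(pid, round(p, 3))
```

With only one row per project, the estimate depends on nothing but how many columns agree between the selection and each project, so it is closer to an overlap score than to a learned probability. That is exactly the limitation described in point 2.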

Hope this was helpful.

Steffen

rst · September 2008

Thank you Steffen,

I understand how I am supposed to change my data; I have no problems with that. I am looking for probability estimates rather than exact probabilities. My issue is that I am completely new to the field of data mining and to RapidMiner, and I am not sure which algorithms, learners, or classifiers I am supposed to use for this task, which is basically estimating a probability for each of the project IDs given the search parameters (subjects, contributors, etc.).

Thank you again Steffen.

steffen · September 2008

You are welcome!

Feel free to come back and ask more questions.

Greetings

Steffen

IngoRM · September 2008

Hi,

Actually, the transformation task can be done with RapidMiner. It should be possible with the Nominal2Binominal operator. The result will be the matrix from which the "prediction" models can be learned.

Cheers,
Ingo

steffen · September 2008

Hm, partly.

The main problem is to split up the sets {...} automatically. This is what cannot be done by RapidMiner (as far as I know):

1 {"joe","Karen"} {"Data mining","BI","html"} 11/16/2007

Once the sets are split up, Nominal2Binominal can be applied.
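The splitting step could be done as preprocessing outside RapidMiner. The following is a minimal sketch in plain Python; it assumes one project per line in the layout from the first post, no commas or braces inside a name, and lower-cases the values so "Karen" and "karen" match. The function names split_set and parse_row are hypothetical.

```python
import re


def split_set(cell):
    """Turn a cell like {"web design", "html"} into ['web design', 'html']."""
    inner = cell.strip().strip("{}")
    # Tolerate missing quotes (e.g. ken) and empty entries (e.g. {""}).
    items = [part.strip().strip('"').strip() for part in inner.split(",")]
    return [item.lower() for item in items if item]


def parse_row(line):
    """Return (project_id, contributors, subjects) for one data line."""
    project_id = int(line.split()[0])
    # The first {...} group is Contributors, the second is Subjects.
    contributors_cell, subjects_cell = re.findall(r"\{[^}]*\}", line)
    return project_id, split_set(contributors_cell), split_set(subjects_cell)


print(parse_row('1 {"joe","Karen"} {"Data mining","BI","html"} 11/16/2007'))
print(parse_row('5 {"steve", "andrew",ken} {"BI"} 02/06/2003'))
```

The resulting lists can be written out one value per row (or one column per value) before importing into RapidMiner.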

greetings

Steffen

IngoRM · September 2008

Hi,

Of course, you are absolutely right. For this step, one would need to create a new operator.

Cheers,
Ingo
