
# Can I do this with RapidMiner?

Hello,

I'm just beginning with data mining, and I was wondering if I can use RapidMiner for my needs.

I have the following data:

| ProjectId | Contributors | Subjects | DOP |
|---|---|---|---|
| 1 | {"joe","Karen"} | {"Data mining","BI","html"} | 11/16/2007 |
| 2 | {""} | {"modern literature"} | 06/05/2000 |
| 3 | {"michael","roger","jen"} | {"medicine"} | 09/09/1998 |
| 4 | {"ken","karen"} | {"web design", "html", "flash", "css"} | 01/12/2004 |
| 5 | {"steve", "andrew",ken} | {"BI"} | 02/06/2003 |

I want to calculate or predict the probability of each project given the following user selections:

Contributors: joe, ken, andrew, michael, jen

Subjects: html, BI, modern literature

Can I do this with RapidMiner? I've been trying out the examples, but I can't really see how to do this. If anyone can provide any info, it would be greatly appreciated.

Thanks.


## Answers

I am not sure I understand your prediction task. I read it this way: you want to predict the project ID given a user (or a set of users) and a subject (or a set of subjects).

In this case I am afraid you have to change the way your data is stored. For example, instead of

1 {"joe","Karen"} {"Data mining","BI","html"}

you need something like

| ProjectId | Joe | Karen | Michael | ... | DataMining | BI | html | medicine | ... |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 0 | ... | 1 | 1 | 1 | 0 | ... |

Does that make sense?

The resulting matrix will allow you to learn models (or calculate approximate probabilities, e.g. with Naive Bayes). Converting the selection of users and subjects to the same format will then allow you to create predictions.
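A minimal sketch of this conversion in plain Python (outside RapidMiner), assuming the sets have already been split into lists. The helper names `build_columns` and `encode` are made up for illustration, and lower-casing names so that "Karen" and "karen" count as the same person is an assumption about the data.

```python
# Example projects from the question: (contributors, subjects) per project ID.
# Names are lower-cased by hand here (assumption: "Karen" == "karen").
projects = {
    1: (["joe", "karen"], ["data mining", "bi", "html"]),
    2: ([], ["modern literature"]),
    3: (["michael", "roger", "jen"], ["medicine"]),
    4: (["ken", "karen"], ["web design", "html", "flash", "css"]),
    5: (["steve", "andrew", "ken"], ["bi"]),
}


def build_columns(projects):
    """Return one column per distinct contributor and per distinct subject."""
    contributors = sorted({c for cs, _ in projects.values() for c in cs})
    subjects = sorted({s for _, ss in projects.values() for s in ss})
    return ([("contributor", c) for c in contributors]
            + [("subject", s) for s in subjects])


def encode(contributors, subjects, columns):
    """Encode one row as a 0/1 vector over the given columns."""
    present = ({("contributor", c) for c in contributors}
               | {("subject", s) for s in subjects})
    return [1 if col in present else 0 for col in columns]


columns = build_columns(projects)

# One binary row per project: this is the matrix a learner can work with.
matrix = {pid: encode(cs, ss, columns) for pid, (cs, ss) in projects.items()}

# The user selection from the question, encoded in the same format.
selection = encode(
    ["joe", "ken", "andrew", "michael", "jen"],
    ["html", "bi", "modern literature"],
    columns,
)

for pid, row in matrix.items():
    print(pid, row)
print("selection", selection)
```

Tagging each column as contributor or subject keeps a person and a subject with the same name apart.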

But:

1. The conversion task mentioned above cannot be done in RapidMiner (as far as I can see).

2. If you do not have much more data with repetitions of users and subjects, the resulting probabilities will be very small. It is also possible that some learners will crash or produce strange results.

3. I cannot get rid of the feeling that this is a task for discrete mathematics, not for data mining (and thus RapidMiner). If you want to calculate the exact probabilities (!) instead of approximations, you have to look for another way. It seems more like a job for a sheet of paper than for a tool...
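To make point 2 concrete, here is a minimal sketch in plain Python (not a RapidMiner process) of a Bernoulli-style Naive Bayes estimate with add-one smoothing on the five example projects. The function name `project_probabilities`, the lower-casing of names, and treating each project as its own class with a single training row are assumptions made for illustration.

```python
# Each project is reduced to the set of its contributors and subjects.
# Assumption: names are lower-cased, and no contributor shares a name
# with a subject, so one flat set per project is enough.
projects = {
    1: {"joe", "karen", "data mining", "bi", "html"},
    2: {"modern literature"},
    3: {"michael", "roger", "jen", "medicine"},
    4: {"ken", "karen", "web design", "html", "flash", "css"},
    5: {"steve", "andrew", "ken", "bi"},
}

# The user selection from the question.
selection = {"joe", "ken", "andrew", "michael", "jen",
             "html", "bi", "modern literature"}


def project_probabilities(projects, selection, alpha=1.0):
    """Estimate P(project | selection) with Bernoulli Naive Bayes.

    Every project is a class with exactly one training row, so the
    smoothed estimate of P(feature = 1 | project) is
    (count + alpha) / (1 + 2 * alpha) with count equal to 0 or 1.
    """
    features = set().union(*projects.values())
    prior = 1.0 / len(projects)          # uniform prior over projects
    p_present = (1 + alpha) / (1 + 2 * alpha)
    p_absent = alpha / (1 + 2 * alpha)

    scores = {}
    for pid, items in projects.items():
        score = prior
        for feature in features:
            p = p_present if feature in items else p_absent
            # Selected features contribute p, unselected ones 1 - p.
            score *= p if feature in selection else (1 - p)
        scores[pid] = score

    total = sum(scores.values())
    return {pid: score / total for pid, score in scores.items()}


probabilities = project_probabilities(projects, selection)
for pid, prob in sorted(probabilities.items()):
    print(pid, round(prob, 3))
```

On this toy data project 5 ranks first with roughly 0.43, but with only one row per project the values mostly reflect the smoothing constant `alpha`, so they should be read as a ranking rather than as trustworthy probabilities.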

hope this was helpful

Steffen

I understand how I am supposed to change my data; I have no problems with that. I am looking for probability estimates rather than exact probabilities. My issue is that I am completely new to the field of data mining and to RapidMiner, and I am not sure which algorithms, learners, or classifiers I am supposed to use for this task, which is basically producing probability estimates for each of the project IDs given the search parameters (subjects, contributors, etc.).

Thank you again Steffen.

Feel free to come back and ask more questions.

greetings

Steffen

Actually, the transformation task can be done with RapidMiner. It should be possible with the Nominal2Binominal operator. The result will be the matrix from which the "prediction" models can be learned.

Cheers,

Ingo

The main problem is to split up the sets {...} automatically. This is what cannot be done by RapidMiner (as far as I know).

1 {"joe","Karen"} {"Data mining","BI","html"} 11/16/2007

Once the sets are split up, Nominal2Binominal can be applied.
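Outside RapidMiner, the splitting step itself is small. A minimal Python sketch, assuming each row is a single line of text with exactly two `{...}` groups and that no value contains a comma; the function names `parse_set` and `parse_row` are hypothetical:

```python
import re


def parse_set(cell):
    """Split a cell such as '{"joe","Karen"}' into a list of clean values.

    Empty entries (as in '{""}') are dropped, and unquoted values are
    accepted as they are.
    """
    inner = cell.strip().strip("{}")
    items = [part.strip().strip('"') for part in inner.split(",")]
    return [item for item in items if item]


def parse_row(line):
    """Parse one data row into (project_id, contributors, subjects, date)."""
    groups = re.findall(r"\{[^{}]*\}", line)
    if len(groups) != 2:
        raise ValueError("expected exactly two {...} groups: " + line)
    fields = line.split()
    project_id = int(fields[0])   # first token is the project ID
    date = fields[-1]             # last token is the DOP column
    return project_id, parse_set(groups[0]), parse_set(groups[1]), date


row = parse_row('1 {"joe","Karen"} {"Data mining","BI","html"} 11/16/2007')
print(row)
```

The resulting lists can then be written out one value per column (or one row per project and value), which is the shape a binarization step needs.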

greetings

Steffen

Of course, you are absolutely right. For this step one would need to create a new operator.

Cheers,

Ingo