Find correlation between names and score

dbzyko Member Posts: 1 Learner I
edited December 2018 in Help

Hello guys,

I'm new to data mining as well as RapidMiner and hope you can help me with my task :)

First, some details about my input data.

I've got an Excel table with the following structure:

document_id,name_0,name_1,name_n,score
1234,0,0,1,50.1
1235,1,1,1,70.9
1236,0,0,0,20.5

The id is a unique number. Each name column indicates whether name_i occurs in the document (1) or not (0); the label of the column is the name of the person. The last column is the corresponding score of the document. As you can see, each row of the Excel file looks like a vector.

My goal is to find a correlation between the names (nominal attributes) and the score (numeric), i.e. whether the score of a document is potentially higher if name_0 or name_1 (or name_i) occurs in the document.

When I search RapidMiner for "correlation", the Correlation Matrix operator appears, but I'm not sure whether it is the right tool for this task.

Do you know of any established practices for handling this task correctly?

Thank you very much :)

Answers

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    Hello @dbzyko, I think Correlation Matrix is a good place to start. It will give you r (or r^2) values for each pair of attributes, which should give you a sense of how each name column relates to the score.


    Scott
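
For readers who want to sanity-check the idea outside RapidMiner, here is a minimal Python sketch of the pairwise r values discussed above. It is only an illustration under stated assumptions: the three rows are the sample rows from the question, the helper names (pearson, name_score_correlations) are made up for this example, and plain Pearson r is used, which for a 0/1 column against a numeric score is the point-biserial correlation. It is not a description of how RapidMiner computes its matrix internally.

```python
import csv
import io
import math

# Sample rows copied from the question; a real export would be read from a file.
RAW = """document_id,name_0,name_1,name_n,score
1234,0,0,1,50.1
1235,1,1,1,70.9
1236,0,0,0,20.5
"""


def pearson(xs, ys):
    """Return Pearson r, or None when either column is constant (r undefined)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)


def name_score_correlations(text):
    """Correlate every 0/1 name column with the numeric score column."""
    rows = list(csv.DictReader(io.StringIO(text)))
    scores = [float(row["score"]) for row in rows]
    # Hypothetical convention: every column except the id and the score is a name flag.
    name_cols = [c for c in rows[0] if c not in ("document_id", "score")]
    return {
        col: pearson([float(row[col]) for row in rows], scores)
        for col in name_cols
    }


if __name__ == "__main__":
    for name, r in name_score_correlations(RAW).items():
        print(name, "undefined" if r is None else round(r, 3))
```

With only three rows the numbers mean very little; on the full table, a clearly positive r for a name column would suggest that documents mentioning that person tend to score higher, and a name that appears in every document (or in none) yields an undefined r because the column has no variance.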

     
