Find correlation between names and score
Hello guys,
I'm new to data mining as well as RapidMiner and hope you can help me with my task.
First, some details about my input data.
I've got an Excel table with the following structure:
document_id,name_0,name_1,name_n,score
1234,0,0,1,50.1
1235,1,1,1,70.9
1236,0,0,0,20.5
The id is a unique number. Each name column indicates whether name_i occurs in the document (1) or not (0); the label of the column is the name of the person. The last column is the corresponding score of the document. As you can see, each row of the Excel file looks like a vector.
My goal is to find a correlation between the names (nominal attributes) and the score (numeric), that is, whether the score of a document is potentially higher if name_0 or name_1 (or any name_i) occurs in the document.
When I search RapidMiner for "correlation", the Correlation Matrix appears, but I'm not sure if it is the right tool for this task.
Do you know of any established practices for handling this task correctly?
Thank you very much
Answers
Hello @dbzyko, I think Correlation Matrix is a good place to start. It will give you r (or r^2) values for each pair of attributes, which should give you a sense of how each name relates to the score.
Scott
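To make the idea concrete outside RapidMiner, here is a minimal Python sketch of what a correlation between each 0/1 name column and the score amounts to. It is an illustration only, not what the Correlation Matrix operator does internally. The helper `pearson_r` and the variable names are made up for this example, and the rows are just the three sample rows from the question, which are far too few to draw any conclusion from. For a 0/1 column against a numeric column, the Pearson coefficient is the same quantity as the point-biserial correlation.

```python
import math

# Sample rows copied from the question:
# (document_id, name_0, name_1, name_n, score)
rows = [
    (1234, 0, 0, 1, 50.1),
    (1235, 1, 1, 1, 70.9),
    (1236, 0, 0, 0, 20.5),
]
name_columns = ["name_0", "name_1", "name_n"]


def pearson_r(xs, ys):
    """Hypothetical helper: Pearson r, or NaN if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        # A name that occurs in every document (or in none) has no variance,
        # so its correlation with the score is undefined.
        return float("nan")
    return cov / math.sqrt(ss_x * ss_y)


scores = [row[-1] for row in rows]

# Correlate each 0/1 name column with the numeric score.
correlations = {}
for index, name in enumerate(name_columns, start=1):
    flags = [row[index] for row in rows]
    correlations[name] = pearson_r(flags, scores)

for name, r in correlations.items():
    print(f"{name}: r = {r:.3f}")
```

A positive r for a name means documents containing that name tend to have higher scores; a value near 0 means no linear relationship. With real data it is worth checking how many documents each name actually occurs in, since a name that appears only a handful of times can produce a large r by chance.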