"data mining on school database"

Elmo · February 2009

Hi all, I am new to data mining.

I am trying to analyse a set of information on the students of a small school.
It includes the grades in different subjects (math, sciences, English, French, ...), the distance or time to come to school, family background, and the disciplinary record.
I have the data in an Excel file.

Can any experienced user of RapidMiner give me some advice?

steffen · February 2009

Hello Elmo and welcome to RapidMiner

Since you are a complete newbie to data mining, I suggest that you go and get a good book. Here is a thread with suggestions for some literature: click

It is much easier to give you more advice and help when you are able to specify your questions ... starting from such a general position one could start a data mining lecture ... hope you understand

kind regards,

Steffen

Elmo · February 2009

Thank you Steffen for your answer

I think I misexpressed myself :-(

I wanted to say I am new to RapidMiner; I downloaded it last week.

I have been reading about data mining for four months, but you know theories are not identical to applications

may I ask some technical questions from time to time?

regards

steffen · February 2009

Hello Elmo

Of course !

I just wanted to clarify that "I have data, what now?" questions are hard to answer.

kind regards,

Steffen

Elmo · February 2009

hello Steffen

thank you very much

Would you please tell me what mistake I am making to get this message:

"Many operators like classification and regression methods or the PerformanceEvaluator require the input example sets to have a label or class attribute. If this is not the case, applying these operators is pointless. If you read the data using an ExampleSource, you can specify the label attribute by using a 'label' tag in the attribute description file."

I am trying to load my data from an Excel 2003 sheet. I have deleted the other sheets and saved it as a CSV file.

best regards

land · February 2009

Hi Elmo,
you have to specify which attribute (that is what we call the columns) is the target of your analysis. This attribute is then called the label.
If you perform a regression to predict a numerical value, this label has to be numerical; otherwise you have to choose a nominal attribute for classification.
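
For readers who want to see the idea outside RapidMiner, here is a minimal Python sketch of how the type of the chosen label decides between regression and classification. The sample rows and the column names (distance_km, math_grade, passed) are made up for illustration and are not taken from the school data.

```python
import csv
import io

# Made-up sample standing in for the exported school sheet (hypothetical columns).
SAMPLE_CSV = """student,distance_km,math_grade,passed
A,1.5,14.0,yes
B,7.2,9.5,no
C,3.0,12.0,yes
"""

def is_numeric(values):
    """Return True if every value in the column parses as a float."""
    try:
        for v in values:
            float(v)
        return True
    except ValueError:
        return False

def task_for_label(rows, label):
    """Pick the learning task from the type of the chosen label column."""
    column = [row[label] for row in rows]
    return "regression" if is_numeric(column) else "classification"

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(task_for_label(rows, "math_grade"))  # numeric label
print(task_for_label(rows, "passed"))      # nominal label
```

Choosing math_grade as the label leads to a regression task, while choosing passed leads to a classification task.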

Greetings,
Sebastian

Elmo · March 2009

thank you Sebastian

may I ask other questions?

Which is better for RM, Excel 2003 or Excel 2007? Or is it the same?

Which is easier: data from MS Access or MS Excel?

I am having trouble with the Excel files. How can I make sure I am doing it the right way?

many thanks

land · March 2009

Hi Elmo,
I think you can use both Excel versions, but you have to save the files in the "old" .xls format instead of the new XML-style document format.
Excel is probably easier to use than Access, but if your data exceeds some number of lines (64k if I remember correctly) you will have to change to Access. But with a school database this should not be a problem.
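
The "64k" figure corresponds to the 65,536-row worksheet limit of the Excel 97-2003 .xls format. As a rough sketch, assuming the data has been exported to CSV text, the row count could be checked like this in Python (the function name fits_in_xls and the sample data are hypothetical):

```python
import csv
import io

XLS_MAX_ROWS = 65_536  # worksheet row limit of the Excel 97-2003 .xls format

def fits_in_xls(csv_text):
    """Return True if the CSV rows (header included) fit on one .xls sheet."""
    n_rows = sum(1 for _ in csv.reader(io.StringIO(csv_text)))
    return n_rows <= XLS_MAX_ROWS

# Made-up data: one header line plus 500 student rows.
small = "name,grade\n" + "\n".join(f"s{i},{i % 20}" for i in range(500))
print(fits_in_xls(small))
```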

Greetings,
Sebastian

Elmo · March 2009

thank you Sebastian you are really kind

Can you tell me please what mistake I am making?

When I try to work on my data, on an Excel sheet, I get the following error message:

"Parameter 'excel_file' is not set and has no default value."

best regards

IngoRM · March 2009

Parameter 'excel_file' is not set and has no default value.

It seems that you did not specify the file --> just set the parameter "excel_file" to the Excel file you want to read the data from.

Cheers,
Ingo

Elmo · March 2009

Thank you Ingo, it worked.

My intention is to find a correlation, if there is one, between the distance from home to school and the behavior of a student (measured by warnings), or the achievement in a certain subject, let's say Math or English.
I also want to find out the influence of home background (parents together or divorced) on the student's achievement in school, and whether this influence varies according to gender.
Would you tell me please what operators to use in order to measure the correlation?

thanks & best regards

TobiasMalbrecht · March 2009

Hi,

for the computation of correlations you can use the CorrelationMatrix operator. However, be aware that the correlation coefficients in the matrix might not be the right concept to get an insight into your data, as the correlation coefficient only measures a linear relationship among numerical values. When faced with nominal attributes (e.g. family status), the correlation coefficient has almost no useful meaning.
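
To illustrate the "linear relationship only" caveat, here is a small Python sketch of the Pearson correlation coefficient. It is not the CorrelationMatrix operator itself, and the numbers are made up rather than real student data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up values, not real student data.
distance_km = [1, 2, 3, 4, 5]
warning_counts = [0, 1, 2, 3, 4]  # rises linearly with distance
u_shaped = [4, 1, 0, 1, 4]        # depends on distance, but not linearly

print(pearson(distance_km, warning_counts))
print(pearson(distance_km, u_shaped))
```

The first pair is perfectly linear, so the coefficient is 1. The second pair clearly depends on distance, yet its coefficient is 0, which is why a low correlation does not rule out a relationship.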

Since your problem seems to be a standard classification task with the achievement at school (i.e. the grade) as the label, I would use a classification learner (Decision Tree, Naive Bayes, etc.) to model the data and find relationships.
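
As a rough idea of what such a classification learner does, here is a minimal Naive Bayes sketch in plain Python for nominal attributes, with Laplace smoothing. It is not RapidMiner's implementation; the attribute names, values, and training rows are invented for illustration.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, label):
    """Count label frequencies and per-label attribute value frequencies."""
    label_counts = Counter(row[label] for row in rows)
    value_counts = defaultdict(Counter)  # (attribute, label) -> value counts
    values_seen = defaultdict(set)       # attribute -> distinct values
    for row in rows:
        for attr, value in row.items():
            if attr != label:
                value_counts[(attr, row[label])][value] += 1
                values_seen[attr].add(value)
    return label_counts, value_counts, values_seen

def predict(model, example):
    """Return the label with the highest Laplace-smoothed posterior score."""
    label_counts, value_counts, values_seen = model
    total = sum(label_counts.values())
    best_label, best_score = None, -1.0
    for lab, count in label_counts.items():
        score = count / total  # prior probability of the label
        for attr, value in example.items():
            seen = value_counts[(attr, lab)][value]
            score *= (seen + 1) / (count + len(values_seen[attr]))
        if score > best_score:
            best_label, best_score = lab, score
    return best_label

# Invented training rows; "grade" is the label attribute.
rows = [
    {"distance": "near", "warnings": "none", "grade": "good"},
    {"distance": "near", "warnings": "none", "grade": "good"},
    {"distance": "far", "warnings": "none", "grade": "good"},
    {"distance": "far", "warnings": "some", "grade": "poor"},
    {"distance": "near", "warnings": "some", "grade": "poor"},
    {"distance": "far", "warnings": "some", "grade": "poor"},
]
model = train_naive_bayes(rows, "grade")
print(predict(model, {"distance": "near", "warnings": "none"}))
print(predict(model, {"distance": "far", "warnings": "some"}))
```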

Kind regards,
Tobias

Elmo · March 2009

Thank you Tobias

I am thinking of replacing the column on family status with two columns with numerical entries.

The first is the number of parents at home
(2 if the parents live together, 1 if one parent is divorced, dead, or working abroad, and 0 if the student is living with grandparents or by him/herself).

The second column turns the status into a number
(a positive value if the student lives with two parents, since this has a positive effect on the student's well-being;
a zero value if one of the parents is abroad; a negative value if one of the parents is dead or divorced; and a lower negative value if both parents are absent. I think I have to ask the school counsellor which has the worst effect on the student).

Do you think this is reasonable? Can I do it in RapidMiner? If yes, which operators are the best?

best regards

IngoRM · March 2009

Hi,

yes, you can do something like this with RapidMiner with the AttributeConstruction operator. This operator is able to work on conditions like

if (family_status == "together", 2, if (family_status == "divorced", 1, 0))
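
For comparison, the same nested condition can be sketched in Python. The function name parents_at_home and the status value "grandparents" are illustrative assumptions, not part of RapidMiner.

```python
def parents_at_home(family_status):
    """Mirror the nested if: 2 for "together", 1 for "divorced", else 0."""
    if family_status == "together":
        return 2
    if family_status == "divorced":
        return 1
    return 0

# Hypothetical status values for three students.
statuses = ["together", "divorced", "grandparents"]
print([parents_at_home(s) for s in statuses])  # → [2, 1, 0]
```

Any status that is neither "together" nor "divorced" falls through to 0, just as in the nested expression.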

Cheers,
Ingo
