
# "data mining on school database"

Hi all, I am new to data mining.

I am trying to analyse a set of information on the students of a small school.

The data are the grades in different subjects (math, sciences, English, French, ...), the distance or travel time to school, family background, and the disciplinary record.

I have the data in an Excel file.

Can any experienced user of RapidMiner give me some advice?


## Answers

347 Maven

Since you are a complete newbie to data mining, I suggest that you go and get a good book. Here is a thread with suggestions for some literature: click

It is much easier to give you more advice and help when you are able to specify your questions ... starting from such a general position one could start a data mining lecture ... hope you understand

kind regards,

Steffen

7 Contributor II

I think I misexpressed myself :-(

I wanted to say I am new to RapidMiner; I downloaded it last week.

I have been reading about data mining for four months, but you know, theory is not the same as application.

May I ask some technical questions from time to time?

regards

347 Maven

Of course!

I just wanted to clarify that "I have data, what now?" questions are hard to answer.

kind regards,

Steffen

7 Contributor II

Thank you very much.

Would you please tell me what mistake I am making to get this message:

"Many operators like classification and regression methods or the PerformancEvaluator require the input example sets to have a label or class attribute. If this not the case, applying these operators is pointless. If you read the data using an ExampleSource, you can specify the label attribute by using a 'label' tag in the attribute description file."

I am trying to load my data from an Excel 2003 sheet; I deleted the other sheets and saved it as a CSV file.

best regards

2,531 Unicorn

You have to specify which attribute (that is what we call the columns) is the target of your analysis. This attribute is then called the label.

If you perform a regression to predict a numerical value, this label has to be numerical; otherwise you have to choose a nominal label for classification.
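To make the label idea concrete outside of RapidMiner, here is a minimal Python sketch. It assumes a CSV export with invented column names (`distance_km`, `family_status`, `warnings`, `math_grade`) and invented values; it only illustrates how the type of the chosen label column decides between regression and classification, and it is not how RapidMiner does this internally.

```python
import csv
import io

# Hypothetical sample standing in for the exported CSV file.
# Column names and values are invented for illustration.
SAMPLE_CSV = """distance_km,family_status,warnings,math_grade
1.5,together,0,15.0
7.2,divorced,3,9.5
3.0,together,1,12.0
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one per example (row)."""
    return list(csv.DictReader(io.StringIO(text)))


def is_numeric(values):
    """Return True if every value parses as a float."""
    try:
        for value in values:
            float(value)
    except ValueError:
        return False
    return True


def task_for_label(rows, label):
    """Name the kind of learner that fits the chosen label column."""
    if label not in rows[0]:
        raise KeyError(f"no attribute named {label!r}")
    column = [row[label] for row in rows]
    return "regression" if is_numeric(column) else "classification"


rows = load_rows(SAMPLE_CSV)
print(task_for_label(rows, "math_grade"))     # numerical label
print(task_for_label(rows, "family_status"))  # nominal label
```

Choosing `math_grade` as the label leads to a regression task, while a nominal column such as `family_status` leads to classification.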

Greetings,

Sebastian

7 Contributor II

May I ask other questions?

Which is better for RM, Excel 2003 or Excel 2007? Or is it the same?

Which is easier: data from MS Access or MS Excel?

I am having trouble with the Excel files; how can I make sure I am doing it the right way?

many thanks

2,531 Unicorn

I think you can use both Excel versions, but you have to save the files in the "old" .xls format instead of the new XML-style document format.

Excel is probably easier to use than Access, but if your data exceeds a certain number of rows (64k, if I remember correctly), you will have to switch to Access. But with a school database this should not be a problem.

Greetings,

Sebastian

7 Contributor II

Can you tell me please what mistake I am making?

When I try to work on my data, in an Excel sheet, I get the following error message:

"Parameter 'excel_file' is not set and has no default value."

best regards

1,751 RM Founder

Cheers,

Ingo

7 Contributor II

My intention is to find a correlation, if there is one, between the distance from home to school and the behavior of a student (measured by warnings), or the achievement in a certain subject, let's say Math or English.

I also want to find out the influence of home background (parents together or divorced) on the student's achievement in school, and whether this influence varies according to gender.

Would you tell me please what operators to use in order to measure the correlation?

thanks & best regards

295 RM Product Management

For the computation of correlations you can use the `CorrelationMatrix` operator. However, be aware that the correlation coefficients in the matrix might not be the right concept for gaining insight into your data, as the correlation coefficient only measures a linear relationship between numerical values. For nominal attributes (e.g. family status) the correlation coefficient has almost no useful meaning.
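The point about linear relationships can be checked with a small Python sketch. It computes the Pearson correlation coefficient by hand on invented numbers: one series depends linearly on `xs`, the other depends on it perfectly but not linearly. The function name `pearson` and the data are assumptions for illustration and say nothing about how the RapidMiner operator is implemented.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [-2, -1, 0, 1, 2]
linear = [2 * x + 1 for x in xs]   # straight-line dependence on xs
curved = [x * x for x in xs]       # fully determined by xs, but not linear

r_linear = pearson(xs, linear)
r_curved = pearson(xs, curved)
print(r_linear, r_curved)
```

The linear series gives a coefficient of 1, while the squared series gives a coefficient of 0 even though it is completely determined by `xs`. A low coefficient therefore does not prove that two attributes are unrelated.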

Since your problem seems to be a standard classification task with the achievement at school (i.e. the grade) as the label, I would use a classification learner (Decision Tree, Naive Bayes, etc.) to model the data and find relationships.
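As a rough illustration of what a classification learner does with nominal attributes, here is a Python sketch of a one-level decision tree (a "decision stump"). This is a deliberately simplified stand-in and not RapidMiner's Decision Tree operator; the attribute names, values, and the pass/fail label are all invented.

```python
from collections import Counter, defaultdict

# Hypothetical examples: (attributes, label). All values are invented.
EXAMPLES = [
    ({"family_status": "together", "distance": "near"}, "pass"),
    ({"family_status": "together", "distance": "far"}, "pass"),
    ({"family_status": "together", "distance": "near"}, "pass"),
    ({"family_status": "divorced", "distance": "near"}, "fail"),
    ({"family_status": "divorced", "distance": "far"}, "fail"),
    ({"family_status": "divorced", "distance": "far"}, "pass"),
]


def stump_for(examples, attribute):
    """Map each attribute value to its majority label and count the errors."""
    by_value = defaultdict(Counter)
    for attrs, label in examples:
        by_value[attrs[attribute]][label] += 1
    rule = {v: counts.most_common(1)[0][0] for v, counts in by_value.items()}
    # Every example that is not in its group's majority is misclassified.
    errors = sum(sum(c.values()) - max(c.values()) for c in by_value.values())
    return rule, errors


def best_stump(examples):
    """Pick the attribute whose one-level split misclassifies the fewest examples."""
    attributes = examples[0][0].keys()
    scored = {a: stump_for(examples, a) for a in attributes}
    best = min(scored, key=lambda a: scored[a][1])
    return best, scored[best][0], scored[best][1]


attribute, rule, errors = best_stump(EXAMPLES)
print(attribute, rule, errors)
```

On this toy data the stump splits on `family_status`, because that split misclassifies only one example while splitting on `distance` misclassifies two. A full decision tree repeats this kind of choice recursively, usually with a criterion such as information gain rather than raw error counts.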

Kind regards,

Tobias

7 Contributor II

I am thinking of replacing the column on family status with two columns with numerical entries.

The first column is the number of parents at home

(2 if the parents live together, 1 if one parent is divorced, dead, or working abroad, and 0 if the student is living with grandparents or living by him/herself).

The second column turns the status into a score

(a positive value if the student lives with two parents, as this has a positive effect on the student's well-being;

a zero value if one of the parents is abroad, a negative value if one of the parents is dead or divorced, and a lower negative value in the case of the absence of both parents. I think I have to ask the school counsellor which has the worst effect on the student).

Do you think this is reasonable? Can I do it in RapidMiner? If yes, which operators are the best?

best regards

1,751 RM Founder

Yes, you can do something like this in RapidMiner with the AttributeConstruction operator. This operator is able to work on conditions like

if (family_status == "together", 2, if (family_status == "divorced", 1, 0))
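For readers who find the nested condition hard to parse, the same mapping can be sketched in Python. The function name `parents_at_home` and the sample status values are assumptions for illustration; the logic mirrors only the expression above, so any status other than "together" or "divorced" falls through to 0.

```python
def parents_at_home(family_status):
    """Mirror the nested if(...) expression: returns 2, 1, or 0."""
    if family_status == "together":
        return 2
    if family_status == "divorced":
        return 1
    # Everything else falls into the final branch of the expression.
    return 0


# Hypothetical status values; "grandparents" stands for any other case.
statuses = ["together", "divorced", "grandparents"]
counts = [parents_at_home(s) for s in statuses]
print(counts)
```

Additional statuses such as a deceased parent or a parent working abroad would need their own branches, in the same way that the expression would need further nested conditions.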

Cheers,

Ingo