Preparing data for pattern recognition

Pete84
edited November 2018 in Help
Hello again, I'm still new to RapidMiner, so please be patient :)
Currently I am writing my master's thesis in electrical engineering, and I think RapidMiner is a good fit for producing the simulation results. I could also describe the simulation itself in the thesis. But first, here is some information about my data: I have a database, i.e. a set of training data, that looks like this:

There are multiple containers (1..n). Each container has multiple measurements (1..m). Each measurement consists of x-y-z data, and the number of values differs between measurements: the first measurement might have 100 values for each of the x, y and z components, the second might have 130 values per component, and so on.

container-1 [ measurement-1[x,y,z], measurement-2[x,y,z], measurement-3[x,y,z], ..., measurement-m[x,y,z]]
container-2 [ measurement-1[x,y,z], measurement-2[x,y,z], measurement-3[x,y,z], ..., measurement-m[x,y,z]]
...
container-n [ measurement-1[x,y,z], measurement-2[x,y,z], measurement-3[x,y,z], ..., measurement-m[x,y,z]]
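Because the measurements have different lengths (100 values, 130 values, ...), most classifiers cannot use the raw samples directly; each measurement first has to become a fixed-length row. A minimal Python sketch, assuming summary statistics per axis are an acceptable representation (the function names `axis_features` and `measurement_features` are hypothetical, not RapidMiner operators):

```python
from statistics import mean, pstdev

def axis_features(values):
    """Mean, population standard deviation, minimum and maximum of one axis."""
    return [float(mean(values)), float(pstdev(values)),
            float(min(values)), float(max(values))]

def measurement_features(x, y, z):
    """Reduce one measurement of any length to a fixed-length feature vector.

    Assumes x, y and z hold the same number of samples within one measurement.
    Layout: [n_samples, x_mean, x_std, x_min, x_max, y_mean, ..., z_max] (13 values).
    """
    if not (len(x) == len(y) == len(z)) or len(x) == 0:
        raise ValueError("x, y and z must be non-empty and of equal length")
    features = [float(len(x))]  # number of samples per axis
    for axis in (x, y, z):
        features.extend(axis_features(axis))
    return features

# Two hypothetical measurements with 3 and 5 samples per axis.
short = measurement_features([1, 2, 3], [0, 0, 0], [1, 1, 1])
longer = measurement_features([1, 2, 3, 4, 5], [0, 1, 0, 1, 0], [2, 2, 2, 2, 2])
print(len(short), len(longer))  # 13 13
```

Summary statistics discard the shape of the signal over time; resampling every axis to a common length is another option if the shape matters for telling containers apart.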


In addition, I have a new measurement that will be tested against the training database to classify which container (1, 2, ..., n) this measurement-x belongs to.

My question is: how should I set up my CSV or Excel file for the database? And how can I test a measurement against my database? I think I have to use X-Validation, right? If you need more information about my project, don't hesitate to ask :)
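One common layout for classification data (an assumption here, not something the thread specifies) is one row per measurement: a label column naming the container, followed by the same fixed set of feature columns for every row. A minimal Python sketch that writes such a file, assuming the features were already computed per measurement; `write_training_csv` and the column names are hypothetical:

```python
import csv
import io

def write_training_csv(rows, feature_names, out):
    """Write one row per measurement: the container label, then its features.

    rows: iterable of (container_label, feature_vector) pairs.
    """
    writer = csv.writer(out)
    writer.writerow(["container"] + list(feature_names))
    for label, features in rows:
        if len(features) != len(feature_names):
            raise ValueError("every measurement needs the same number of features")
        writer.writerow([label] + list(features))

buf = io.StringIO()
write_training_csv(
    [("container-1", [100, 0.5, 0.1]), ("container-2", [130, 0.7, 0.2])],
    ["n_samples", "x_mean", "x_std"],
    buf,
)
print(buf.getvalue())
```

The new measurement to be classified gets the same feature columns, with the container column left unknown.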

Answers

  • Ghostrider
    I am not really sure what you are trying to do, but it sounds like a classification problem. Basically, you have a set of measurements from some component and you want to assign the component to a class (bin) based on the measurement? Look at decision trees.
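To show what a decision tree does with feature rows like these, here is a minimal pure-Python sketch of a small tree that picks splits by Gini impurity (a CART-style choice). It is an illustration only, not RapidMiner's Decision Tree operator; the function names and the two-feature data are hypothetical, and labels are assumed to be strings so they cannot be confused with the tuple-shaped inner nodes:

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(X, y):
    """Return the (feature, threshold) pair that most lowers weighted Gini, or None."""
    n = len(y)
    best, best_score = None, gini(y)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(X, y, max_depth=3):
    """Grow a binary tree; inner nodes are (feature, threshold, left, right), leaves are labels."""
    split = best_split(X, y) if max_depth > 0 else None
    if split is None:
        return Counter(y).most_common(1)[0][0]  # majority label
    f, t = split
    left = [i for i in range(len(y)) if X[i][f] <= t]
    right = [i for i in range(len(y)) if X[i][f] > t]
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left], max_depth - 1),
            build_tree([X[i] for i in right], [y[i] for i in right], max_depth - 1))

def predict(tree, row):
    """Walk from the root to a leaf and return its label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

# Hypothetical feature vectors [x_mean, x_std], one per training measurement.
X = [[0.1, 1.0], [0.2, 1.1], [0.15, 0.9], [2.0, 1.0], [2.1, 1.2], [1.9, 0.8]]
y = ["container-1"] * 3 + ["container-2"] * 3
tree = build_tree(X, y)
print(predict(tree, [0.12, 1.05]))  # container-1
```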

    Regarding the database, look at the CSVReader and ExcelReader operators. They read CSV files and Excel sheets, respectively.
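Outside RapidMiner, the same file can be loaded with Python's standard `csv` module. A minimal sketch, assuming the layout of one row per measurement with a `container` label column and numeric feature columns (the function name `read_training_csv` is hypothetical):

```python
import csv
import io

def read_training_csv(f):
    """Parse a CSV with a 'container' label column and numeric feature columns."""
    reader = csv.DictReader(f)
    feature_names = [name for name in reader.fieldnames if name != "container"]
    X, y = [], []
    for row in reader:
        y.append(row["container"])
        X.append([float(row[name]) for name in feature_names])
    return feature_names, X, y

sample = io.StringIO("container,n_samples,x_mean\ncontainer-1,100,0.5\ncontainer-2,130,0.7\n")
names, X, y = read_training_csv(sample)
print(names, X, y)
```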

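    X-Validation (cross-validation) is a method for estimating how well a model will perform on data it was not trained on. It is not used to classify new data; for that, you train the model on your training data and apply it to the new measurement.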
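The difference between scoring a model and classifying new data can be sketched in a few lines of Python. This is a minimal k-fold cross-validation, assuming folds assigned by index (no shuffling); to keep it short it uses a nearest-centroid classifier as a stand-in, not the decision tree mentioned earlier, and all function names and data are hypothetical:

```python
import math

def k_fold_accuracy(X, y, k, fit, predict):
    """Estimate accuracy with k-fold cross-validation.

    Sample i goes to fold i % k (no shuffling). Each fold is predicted by a
    model fitted on the remaining k - 1 folds, so no sample is scored by a
    model that saw it during training.
    """
    correct = 0
    for j in range(k):
        train = [i for i in range(len(y)) if i % k != j]
        test = [i for i in range(len(y)) if i % k == j]
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(model, X[i]) == y[i] for i in test)
    return correct / len(y)

def fit_centroids(X, y):
    """Stand-in classifier: the mean feature vector of each container."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for d, value in enumerate(row):
            acc[d] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict_centroid(centroids, row):
    """Assign the label whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], row))

X = [[0.1], [2.0], [0.2], [2.1], [0.15], [1.9]]
y = ["container-1", "container-2"] * 3
accuracy = k_fold_accuracy(X, y, 3, fit_centroids, predict_centroid)

# Cross-validation only scores the procedure; the model that classifies the
# new measurement is fitted on all training data.
model = fit_centroids(X, y)
print(accuracy, predict_centroid(model, [0.12]))
```

In practice the folds are usually shuffled or stratified by container so each training fold sees every class.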