There are big differences in how the Split Validation and Cross Validation operators work, but the intent is the same: train, test, and measure the performance of a model. The Cross Validation operator gives a more honest estimate of how the model would perform on unseen data. This is why the accuracy measure for a cross-validated model might read 70.00% (+/- 5%). The +/- 5% is essentially one standard deviation of the per-fold accuracies around the 70% average.
Go check out Ingo's paper on model validation to learn more: https://rapidminer.com/resource/correct-model-vali
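To make the "70.00% (+/- 5%)" notation concrete, here is a minimal Python sketch of how such a figure can be derived from per-fold accuracies. The fold accuracies are made-up illustrative numbers, and the helper name summarize_folds is hypothetical. The sketch assumes the sample standard deviation; whether RapidMiner uses the sample or the population version is not confirmed here.

```python
from statistics import mean, stdev

def summarize_folds(fold_accuracies):
    """Return (mean, sample standard deviation) of per-fold accuracies."""
    return mean(fold_accuracies), stdev(fold_accuracies)

# Hypothetical accuracies from a 5-fold cross validation (illustrative only).
fold_accuracies = [0.65, 0.75, 0.70, 0.64, 0.76]

avg, spread = summarize_folds(fold_accuracies)

# Report in the same "average (+/- spread)" style as the performance vector.
print(f"accuracy: {avg:.2%} (+/- {spread:.2%})")
```

A single split validation yields only one accuracy number, so there is no spread to report; that is one reason the cross-validated figure is the more informative one.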
Two operators are of major importance for you.
Aggregate: Aggregate works like a SQL aggregate (GROUP BY). You can group by columns and generate statistics such as average, std_dev, count, etc. It reads like "count(level) per projectId per Area".
Pivot: Pivot rotates your table. You can get a table like:
ProjectID, Area_A, Area_B, Area_C
with levels in the cells.
I think the Excel Pivot is for some mysterious reason a combination of both.
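As a rough illustration of the two steps outside RapidMiner, here is a minimal Python sketch that first aggregates and then pivots. The rows, the column values, and the function names aggregate_count and pivot are all hypothetical; the cells here hold count(level), but another aggregate such as the average would follow the same pattern.

```python
from collections import defaultdict

# Hypothetical rows: (projectId, area, level); illustrative data only.
rows = [
    ("P1", "A", 3),
    ("P1", "A", 5),
    ("P1", "B", 2),
    ("P2", "A", 4),
    ("P2", "C", 1),
]

def aggregate_count(rows):
    """Aggregate step: count(level) per projectId per area."""
    counts = defaultdict(int)
    for project_id, area, _level in rows:
        counts[(project_id, area)] += 1
    return dict(counts)

def pivot(aggregated):
    """Pivot step: one row per projectId, one column per area."""
    areas = sorted({area for _pid, area in aggregated})
    table = {}
    for (project_id, area), value in aggregated.items():
        # Missing project/area combinations stay None.
        row = table.setdefault(project_id, {a: None for a in areas})
        row[area] = value
    return areas, table

counts = aggregate_count(rows)
areas, table = pivot(counts)

# Print in the "ProjectID, Area_A, Area_B, Area_C" layout.
print("ProjectID, " + ", ".join(f"Area_{a}" for a in areas))
for project_id in sorted(table):
    cells = ["" if table[project_id][a] is None else str(table[project_id][a])
             for a in areas]
    print(project_id + ", " + ", ".join(cells))
```

Keeping the two steps separate mirrors the two operators: Aggregate reduces the rows, Pivot only rearranges them.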
That sounds odd. Have you tried using the Read Database operator instead of "Add Data"? That's the more manual way, and it allows more fine-tuning.
Also, if your DB table is huge, RM will want to fetch the metadata for it and it could take a LONG time. If that's the case, just go to Preferences and toggle off Fetch SQL Metadata.
The issue is with the Teradata driver. The way we treat Teradata causes problems with the database repository view. On the other hand, all of the database operators still work and the connection is fine. That is, you can use the Read Database operator to pull in data over the Teradata connection even though the repository view gives you an error.
The confusion matrix in the model shows _training_ errors. So you should usually work with the Performance Vector, not the Gradient Boosted Model values. The training values are sometimes interesting for checking for overfitting (e.g. to decide whether to add more complexity or not).
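Why training errors are too optimistic can be shown with a deliberately extreme toy example in Python. This is not a gradient boosted model: MemorizingModel is a hypothetical stand-in that simply memorizes its training examples, and the data are made up for illustration.

```python
from collections import Counter

class MemorizingModel:
    """Deliberately overfitting toy model: memorizes every training example."""

    def fit(self, examples):
        self.lookup = dict(examples)
        # Fall back to the most frequent training label for unseen inputs.
        labels = Counter(label for _x, label in examples)
        self.default = labels.most_common(1)[0][0]
        return self

    def predict(self, x):
        return self.lookup.get(x, self.default)

def accuracy(model, examples):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(model.predict(x) == label for x, label in examples)
    return correct / len(examples)

# Hypothetical data: (input, label) pairs, illustrative only.
train = [(1, "yes"), (2, "no"), (3, "yes"), (4, "no"), (5, "yes"), (6, "yes")]
test = [(7, "no"), (8, "yes"), (9, "no"), (10, "yes")]

model = MemorizingModel().fit(train)
train_acc = accuracy(model, train)  # measured on data the model has seen
test_acc = accuracy(model, test)    # measured on held-out data

print(f"training accuracy: {train_acc:.0%}, test accuracy: {test_acc:.0%}")
```

A large gap between the two numbers is the overfitting signal mentioned above; the held-out figure corresponds to what the Performance Vector reports.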
These steps are correct:
Segmentation of the object
Extract shape features with the operator
To select only one segment, you have to classify these segments. There is an operator for it: Filter Segments by Example Set. For this, you need an ExampleSet with the IDs of the segments you want to preserve.
You also need labels for these segments. You can set them manually by putting a breakpoint after the segmentation and editing the labels in the result view with the left and right mouse buttons. Or you can set an image mask and use the Calculate Segments Label By Mask operator to calculate the segments' labels automatically.
It's a macro of the process, defined in the context. Macros are process variables. Have a look at this article for more details: http://community.rapidminer.com/t5/RapidMiner-Stud