Hi, I tried to implement a test case in rapid miner.

nn_here · February 21

Hi,

I tried to implement a test case in rapid miner.

1. Loaded the training data. Since it's a regression problem, I tried linear regression.

2. After preprocessing the data and removing unnecessary columns, I applied the model and the Performance operator, which produced decent accuracy.

3. Now I want to apply this model to the testing data and check the performance and related attributes.

Kindly refer to the attached doc, which contains the flow of operators used on both the training and testing datasets to reach the target values.

I retrieved the train and test data again, then used Cross Validation and applied the model.

Can you please tell me whether there is a way to save the trained model somewhere and later invoke it with only the test data as input, without involving the training data? I have applied all the operators used on the training data to the testing data as well, which I feel is redundant.

Kindly help me in clarifying the same.
Thanks in advance.

CKönig · February 27

As a general rule, you should be applying the same preprocessing steps on both the training dataset and the testing dataset. This can make a huge difference, e.g. if you normalize the training dataset and the model expects values around 0, and then you feed it huge unnormalized numbers. It usually makes sense to put the preprocessing steps in a separate process that you can drag and drop into the training and scoring process. This also makes maintaining them much easier, since you only have to make changes in one place.
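To illustrate the point about normalization outside RapidMiner, here is a minimal Python sketch. It assumes a simple z-score normalization; the function names `fit_normalizer` and `apply_normalizer` and the toy data are made up for this example. The parameters are learned on the training data only and then reused unchanged on the test data.

```python
from statistics import mean, pstdev

def fit_normalizer(rows):
    """Learn per-column mean and standard deviation from the training rows only."""
    columns = list(zip(*rows))
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    return [(mean(col), pstdev(col) or 1.0) for col in columns]

def apply_normalizer(params, rows):
    """Apply previously learned parameters to any dataset (training or test)."""
    return [[(x - m) / s for x, (m, s) in zip(row, params)] for row in rows]

# Hypothetical toy data: two numeric attributes per example.
train = [[100.0, 1.0], [200.0, 2.0], [300.0, 3.0]]
test = [[250.0, 2.5]]

params = fit_normalizer(train)                # learned once, on training data
train_norm = apply_normalizer(params, train)
test_norm = apply_normalizer(params, test)    # same transformation, no refitting
```

If the test data were normalized with its own mean and deviation, or not normalized at all, the model would see values on a different scale than it was trained on.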

ceaperez · February 21

Hi @nn_here,

After validating your model with the Cross Validation operator, you can use the Apply Model operator.
The Apply Model operator has two input ports, mod and unl. Connect the mod output of the Cross Validation operator to the mod input of the Apply Model operator, and the validation dataset to the unl input port of the Apply Model operator.

best,

Cesar

Image: https://us.v-cdn.net/6030995/uploads/editor/w3/u1irn7gaiwck.png

nn_here · February 21

Thank you for the quick help. In this case too we have loaded deals and deals2 (which I assume are the training and testing data respectively; kindly correct me if I am wrong). So does RapidMiner expect us to load both datasets every time to get the predictions for the second dataset?
Thanks and regards,

ceaperez · February 22

Hi @nn_here,
You're welcome. Yes, you are right: the Deals dataset is for training and testing, and the Deals(2) dataset is for validation. Another option is to split your dataset, using 90% for training and testing and then the other 10% for validation.

Best,

Cesar
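For readers who want to see the 90/10 idea spelled out, a rough Python sketch follows. The helper `holdout_split`, the fixed seed, and the toy list of 100 examples are assumptions for illustration only, not how RapidMiner implements its splitting.

```python
import random

def holdout_split(rows, holdout_ratio=0.1, seed=42):
    """Shuffle a copy of the rows and set aside a fraction as a holdout set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)      # fixed seed for repeatable splits
    n_holdout = max(1, round(len(shuffled) * holdout_ratio))
    return shuffled[n_holdout:], shuffled[:n_holdout]

# Hypothetical dataset of 100 examples.
examples = list(range(100))
modelling_set, holdout_set = holdout_split(examples)

print(len(modelling_set), len(holdout_set))
```

The larger part is used for training and cross-validation; the holdout part is only touched once, when the finished model is applied to it.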

nn_here · February 22

Hi,
Thank you once again for the update. In a nutshell, in RapidMiner we have to load both datasets, training and testing, in the same process to validate performance on the testing data. There is no option to save a model trained on the training data and later pull out the model alone to get results for the testing data (without placing the training data in the same process). Kindly confirm whether my understanding is correct, or whether I am missing an operator that would do what I need.
Thanks and regards.

ceaperez · February 22

Hi again @nn_here,

After the testing process you can save your model and then import it into other processes, for example as part of a validation process.

This is very easy: after you have saved the model in your repository, just drag and drop the process into a new process and connect the output ports to other operators as needed.

Image: https://us.v-cdn.net/6030995/uploads/editor/yj/2o9bigelat7d.png
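The same "train once, store the model, score later" pattern can be sketched in plain Python. This is only an analogy under stated assumptions: the tiny least-squares model, the helper names `train_linear_model` and `apply_model`, and the file name `linear_model.pkl` are all invented for the example. In RapidMiner the corresponding step would be storing the model in the repository and retrieving it in the scoring process.

```python
import pickle
import tempfile
from pathlib import Path

def train_linear_model(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return {"slope": slope, "intercept": my - slope * mx}

def apply_model(model, xs):
    """Score new inputs with a stored model; no training data is needed."""
    return [model["slope"] * x + model["intercept"] for x in xs]

# Hypothetical storage location standing in for a repository entry.
model_path = Path(tempfile.mkdtemp()) / "linear_model.pkl"

# "Training process": fit the model and save it.
model = train_linear_model([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
model_path.write_bytes(pickle.dumps(model))

# "Scoring process": load the stored model and feed it test data only.
loaded = pickle.loads(model_path.read_bytes())
predictions = apply_model(loaded, [5.0, 6.0])
```

Any preprocessing that was applied before training still has to be applied to the test data before scoring, otherwise the stored model receives inputs it was not trained on.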

nn_here · February 22

Hi,
Thank you for clarifying the doubt. Will try this out!
Thanks and regards.

nn_here · February 23

Hi,

As mentioned, I saved the model for the training set as one process and the one for the testing set as another process. Then, in a new process, I dragged in these processes and combined them with Apply Model. But the result is very different from the one we got when everything was built as a single process. Is that possible, or am I missing something here too? Kindly find the latest doc attached to this post; the original doc is already uploaded. Kindly help me in clarifying the same. For your reference, I am uploading both files again: the result doc contains the latest changes, and the rapidminer crossvalidation doc contains the original process.
Thanks and regards.

ceaperez · February 24

Hi @nn_here,

If you are using the same datasets in both cases, the results must be similar. Can you share your process and dataset?

Best,

Cesar

nn_here · February 26

Hi, thank you for the update. I will cross-verify the process I have created.

nn_here · February 27

Hi,

I have a requirement.

1. I need to build a random forest model on a training dataset and check its performance.

2. Apply the model to an unseen testing dataset and evaluate the performance.

Please find the process I created in the attached doc, and let me know if I am using the correct approach.

I have tried the Cross Validation and Apply Model operators. With the training set alone, the squared correlation was 0.969%, and the actual and predicted values for the RUL column were very close to each other. But after supplying the testing set, there is a large difference between the predicted and actual values.

Another doubt: if I use the Split Data operator (when I trained on the training dataset only), there is a large difference in performance. Do we always have to use this operator with linear regression and random forest models?

Thanks and regards,

nn_here · March 11

Hi,
I want to use the Optimize Parameters (Grid) operator for the models I have built. Can you please let me know whether we should optimize all the parameters of the model in one go or one parameter at a time? I ask because optimizing just one parameter is already taking a very long time.

Thanks and regards,
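As a rough way to reason about the run time, a grid search evaluates every combination of the listed values, so the number of model evaluations is the product of the grid sizes, while tuning one parameter at a time only needs their sum. The Python sketch below just counts the runs; the parameter names and value lists are made up for illustration.

```python
from itertools import product

# Hypothetical grid: parameter names and candidate values are assumptions.
grid = {
    "number_of_trees": [10, 50, 100, 200],
    "maximal_depth": [5, 10, 20],
    "min_leaf_size": [1, 5, 10],
}

# Joint grid search: every combination of values is evaluated.
joint_combinations = list(product(*grid.values()))
joint_runs = len(joint_combinations)                      # 4 * 3 * 3 = 36

# One parameter at a time: each list is scanned once, others held fixed.
one_at_a_time_runs = sum(len(values) for values in grid.values())  # 4 + 3 + 3 = 10

print(joint_runs, one_at_a_time_runs)
```

If each evaluation itself runs a k-fold cross-validation, every count above is multiplied by roughly k model trainings. Tuning parameters one at a time is cheaper but can miss combinations that only work well together.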

nn_here · March 12

Hi,
I have a scenario with 4 datasets, each with a different number of columns. I need to pick 2 columns from each of these datasets and create a new dataset. Can you please let me know if there is an option to achieve this?
Thanks and regards.

ceaperez · March 13

Hi @nn_here,
you can use the Select Attributes operator to select the columns (attributes) from each dataset and after that use the Superset operator to join them into a new dataset.

Best,
Cesar
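The "select columns, then combine" idea can be sketched in Python as follows. This assumes every dataset carries a shared key column (here a hypothetical `id`) that identifies matching rows; the helper names and the toy data are invented, and only two of the four datasets are shown to keep it short.

```python
def select_attributes(rows, keep):
    """Keep only the named columns of each row."""
    return [{name: row[name] for name in keep} for row in rows]

def join_on(key, *datasets):
    """Merge rows that share the same key value into one wide row."""
    merged = {}
    for rows in datasets:
        for row in rows:
            merged.setdefault(row[key], {}).update(row)
    return [merged[k] for k in sorted(merged)]

# Hypothetical datasets with different numbers of columns.
sensors = [
    {"id": 1, "temp": 20.5, "pressure": 1.0, "note": "a"},
    {"id": 2, "temp": 21.0, "pressure": 1.1, "note": "b"},
]
usage = [
    {"id": 1, "hours": 100, "load": 0.7, "site": "x", "shift": 1},
    {"id": 2, "hours": 150, "load": 0.9, "site": "y", "shift": 2},
]

# Two columns from each dataset, plus the key needed to line the rows up.
combined = join_on(
    "id",
    select_attributes(sensors, ["id", "temp", "pressure"]),
    select_attributes(usage, ["id", "hours", "load"]),
)
```

The result has one row per `id` and the selected columns from every dataset side by side, so the row count stays the same while the column count grows.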

nn_here · March 13

Thank you for the help. Will try this out.
Thanks, and regards,
nn_here

nn_here · March 18

Hi,
I tried the outlier check option in the Auto Model tab of RapidMiner. As the CSV has more than 2.5 lakh (250,000) rows, I decided to go with Auto Model. But it has been running for more than 1.5 hours and counting. Can you please let me know whether we need to use this option, or whether there is another operator that serves the same purpose?
Thanks in advance.

nn_here · April 8

ceaperez,
I tried the operators you suggested: 'you can use the Select Attributes operator to select the columns (attributes) from each dataset and after that use the Superset operator to join them into a new dataset.' Can you tell me whether my doubt is a valid one? I have 264960 rows in each of the datasets, and some of the values are missing. When I apply Superset to 2 datasets, it still shows 264960 rows. Shouldn't it display 264960*2 rows? Kindly correct me if my understanding is wrong. Also, please find the attached process.

Thanks and regards,
nn_here
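One possible reading of the row count, sketched in Python under the assumption that a superset step only aligns the attribute sets of its two inputs (padding attributes a dataset lacks with missing values) and does not stack the rows. The `superset` helper and the toy data are hypothetical and are not RapidMiner's implementation.

```python
def superset(a, b):
    """Give both datasets the same columns, padding absent ones with None.

    The number of rows in each dataset is left unchanged.
    """
    columns = sorted({name for row in a + b for name in row})

    def pad(rows):
        return [{name: row.get(name) for name in columns} for row in rows]

    return pad(a), pad(b)

# Hypothetical datasets with partly different columns.
a = [{"rul": 10, "temp": 1.0}, {"rul": 20, "temp": 2.0}]
b = [{"rul": 30, "load": 0.5}, {"rul": 40, "load": 0.6}]

a_sup, b_sup = superset(a, b)      # each still has 2 rows, now 3 columns
appended = a_sup + b_sup           # stacking the rows is a separate step: 4 rows

print(len(a_sup), len(b_sup), len(appended))
```

Under that assumption, seeing 264960 rows after the superset step is expected, and the doubled count would only appear once the two aligned datasets are appended.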


Altair RapidMiner Community
