Avoid the execution of all the processes and nodes every time

f_laperna Member Posts: 13 Contributor II
edited November 2018 in Help

Hi, I'm new to RapidMiner and there is one thing I can't understand. I built a process for data prep and now I'm working on another process for classification. But every time I want to run some nodes of the classification process, the initial data prep process also needs to run again from the beginning. Is it possible to somehow store the result of the previous run and execute only the classification process? Moreover, is it possible to do the same with nodes (for example, the one reading the dataset) and execute only the last nodes I added?


Thank you!


Best Answer

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    Solution Accepted

    Hello @f_laperna, and welcome to the RapidMiner User Community. We are very happy you are here.


    So yes, I would recommend using the "Store" operator to save your data prep example set to the repository, so the data prep does not need to run every time. Once you do this, you can use the "Retrieve" operator in your classification process to load that example set and keep using it for your classification.
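
    As a rough illustration of the data prep side, a minimal process could look like the sketch below. The operator names and the repository paths (`../data/Raw_Data`, `../data/Filtered_Data`) are placeholders I made up, not taken from your process; put your own reader and prep operators in front of the Store operator.

    ```xml
    <?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
      <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
        <process expanded="true">
          <!-- Placeholder source: replace with your own reader and data prep operators -->
          <operator activated="true" class="retrieve" compatibility="7.6.001" expanded="true" name="Retrieve Raw_Data">
            <parameter key="repository_entry" value="../data/Raw_Data"/>
          </operator>
          <!-- Store writes the prepared example set to the repository (placeholder path) -->
          <operator activated="true" class="store" compatibility="7.6.001" expanded="true" name="Store Filtered_Data">
            <parameter key="repository_entry" value="../data/Filtered_Data"/>
          </operator>
          <connect from_op="Retrieve Raw_Data" from_port="output" to_op="Store Filtered_Data" to_port="input"/>
          <!-- The "through" port passes the stored example set on, so it still shows up as a result -->
          <connect from_op="Store Filtered_Data" from_port="through" to_port="result 1"/>
        </process>
      </operator>
    </process>
    ```

    You run this once. The classification process then starts with a "Retrieve" operator whose repository_entry parameter points to the same path, so the prep steps are not executed again.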


    If you need more help, please copy and paste your process (in XML) in this thread using the </> tool.  It is often easier for us to help this way.


    Good luck!




  • f_laperna Member Posts: 13 Contributor II

    Thank you for your answer! I tried your solution, but now when I run it I get the error "Input is missing". Below you can find the XML and a screenshot of the error.


    <?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
      <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="split_data" compatibility="7.6.001" expanded="true" height="103" name="Split Data" width="90" x="179" y="136">
            <enumeration key="partitions">
              <parameter key="ratio" value="0.7"/>
              <parameter key="ratio" value="0.3"/>
            </enumeration>
            <parameter key="sampling_type" value="linear sampling"/>
          </operator>
          <operator activated="true" class="concurrency:parallel_random_forest" compatibility="7.6.001" expanded="true" height="82" name="TRAIN MODEL Random Forest" width="90" x="313" y="34"/>
          <operator activated="true" class="apply_model" compatibility="7.6.001" expanded="true" height="82" name="Apply Model" width="90" x="447" y="136">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance_classification" compatibility="7.6.001" expanded="true" height="82" name="Performance" width="90" x="648" y="34">
            <parameter key="main_criterion" value="classification_error"/>
            <parameter key="classification_error" value="true"/>
            <parameter key="root_mean_squared_error" value="true"/>
            <list key="class_weights"/>
          </operator>
          <operator activated="true" class="retrieve" compatibility="7.6.001" expanded="true" height="68" name="Retrieve Filtered_Data" width="90" x="45" y="85">
            <parameter key="repository_entry" value="../data/Filtered_Data"/>
          </operator>
          <connect from_op="Split Data" from_port="partition 1" to_op="TRAIN MODEL Random Forest" to_port="training set"/>
          <connect from_op="Split Data" from_port="partition 2" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="TRAIN MODEL Random Forest" from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Performance" from_port="performance" to_op="Performance" to_port="performance"/>
          <connect from_op="Performance" from_port="example set" to_port="result 1"/>
          <connect from_op="Retrieve Filtered_Data" from_port="output" to_op="Split Data" to_port="example set"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>

  • f_laperna Member Posts: 13 Contributor II

    I solved it by creating new nodes (simply copy-pasting the old ones) and connecting everything to the new nodes. Now it works fine.

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    You can also use breakpoints to run only part of a process and view the output up to that point, which can be helpful when building long processes. Right-click on any operator and you will see the options to add breakpoints.
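
    For reference, here is a sketch of how a breakpoint appears if you look at the process XML. As far as I know it is saved as a breakpoints attribute on the operator; treat the exact attribute value as an assumption and set the breakpoint through the right-click menu rather than by hand. The Split Data operator is only used as an example.

    ```xml
    <!-- breakpoints="after" pauses execution once this operator has finished (assumed serialization) -->
    <operator activated="true" breakpoints="after" class="split_data" compatibility="7.6.001" expanded="true" name="Split Data">
      <enumeration key="partitions">
        <parameter key="ratio" value="0.7"/>
        <parameter key="ratio" value="0.3"/>
      </enumeration>
      <parameter key="sampling_type" value="linear sampling"/>
    </operator>
    ```

    When the run pauses at the breakpoint you can inspect the two partitions, then either resume or stop the process.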

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts