RFM: nth-selection process to create a test sample in RapidMiner. Can someone assist?

cwoo Member Posts: 10 Contributor II
edited November 2018 in Help

Given a scored RFM master file, I would like to extract an nth-selection test sample. E.g., if the nth selection is 10, then the sample will consist of every 10th record and should form a statistically similar test sample.

A file of 400,000 records will result in a test file of 40,000 examples.





Best Answers

  • Options
    earmijo Member Posts: 270 Unicorn
    Solution Accepted

    I don't claim efficiency or beauty but the code below ought to work. 


    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="6.5.002">
      <operator activated="true" class="process" compatibility="6.5.002" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="6.5.002" expanded="true" height="60" name="Retrieve Deals" width="90" x="179" y="120">
            <parameter key="repository_entry" value="//Samples/data/Deals"/>
          </operator>
          <operator activated="true" class="generate_id" compatibility="6.5.002" expanded="true" height="76" name="Generate ID" width="90" x="380" y="120"/>
          <operator activated="true" breakpoints="after" class="generate_attributes" compatibility="6.5.002" expanded="true" height="76" name="Generate Attributes" width="90" x="581" y="120">
            <list key="function_descriptions">
              <parameter key="sampled" value="mod(id,10)"/>
            </list>
          </operator>
          <operator activated="true" class="filter_examples" compatibility="6.5.002" expanded="true" height="94" name="Filter Examples" width="90" x="849" y="120">
            <list key="filters_list">
              <parameter key="filters_entry_key" value="sampled.eq.0"/>
            </list>
          </operator>
          <connect from_op="Retrieve Deals" from_port="output" to_op="Generate ID" to_port="example set input"/>
          <connect from_op="Generate ID" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
          <connect from_op="Filter Examples" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
  • Options
    Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    Solution Accepted

    You are probably aware of this, but there is also a "Sample" operator. It doesn't take exactly every nth record, but it does have parameters for randomly taking either an absolute number of records or a percentage, and if you set the random seed the results will be reproducible. For most purposes a random sample is sufficient (and may even be preferable) compared to a sample based on a heuristic such as "every nth record."
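
    For illustration, a Sample operator set up for a reproducible 10% random sample might look roughly like the fragment below. This is a sketch and not part of the original reply: the parameter keys ("sample", "sample_ratio", "use_local_random_seed", "local_random_seed") are assumed from the operator's settings panel, and the seed value is arbitrary. The safest check is to add the operator in the GUI and compare against the process XML your version generates.

    ```xml
    <!-- Sketch only: parameter keys are assumptions, verify against your RapidMiner version -->
    <operator activated="true" class="sample" compatibility="6.5.002" expanded="true" name="Sample">
      <!-- "relative" draws a fraction of the examples instead of a fixed count -->
      <parameter key="sample" value="relative"/>
      <!-- 0.1 keeps about one record in ten, matching an nth selection of 10 -->
      <parameter key="sample_ratio" value="0.1"/>
      <!-- a fixed local seed makes the draw repeatable; 1992 is an arbitrary choice -->
      <parameter key="use_local_random_seed" value="true"/>
      <parameter key="local_random_seed" value="1992"/>
    </operator>
    ```

    In the process above, this operator would sit directly after Retrieve, with no Generate ID step needed.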


    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts


  • Options
    cwoo Member Posts: 10 Contributor II

    Thank you very much.

    Quite simple: use Generate ID, then generate a "sampled" attribute with the modulus function, then filter to keep all records where the mod is 0.





  • Options
    land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn



    You can make it a bit more efficient by using the Filter Examples option to evaluate an expression directly. That saves the overhead of Generate Attributes and of adding a new column. You simply enter an expression that evaluates to true or false, and in it you can use the mod function on the id as in the example above.
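
    As a rough sketch of that idea (not taken from the thread), the Generate Attributes and Filter Examples pair could be replaced by a single Filter Examples operator like the one below. The "condition_class" value "expression" and the "parameter_expression" key are assumptions about how the expression option is stored in the process XML, so check them against what your RapidMiner version writes out.

    ```xml
    <!-- Sketch only: keeps every 10th record without adding a helper column -->
    <operator activated="true" class="filter_examples" compatibility="6.5.002" expanded="true" name="Filter Examples">
      <!-- switch from the custom filter list to a free-form expression -->
      <parameter key="condition_class" value="expression"/>
      <!-- true for ids 10, 20, 30, ... as produced by Generate ID -->
      <parameter key="parameter_expression" value="mod(id,10)==0"/>
    </operator>
    ```

    With this variant, Generate ID connects straight to Filter Examples, and the "sampled" attribute no longer appears in the output.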




  • Options
    bhupendra_patil Administrator, Employee, Member Posts: 168 RM Data Scientist
  • Options
    cwoo Member Posts: 10 Contributor II

    Thanks for refining it.