
RFM: nth-selection process to create a test sample in RapidMiner. Can someone assist?

cwoocwoo Member Posts: 10 Contributor II
edited November 2018 in Help

Given a scored RFM master file, I would like to extract an nth-selection test sample. E.g., if the nth selection is 10, then the sample will consist of every 10th record, which should give a statistically similar test sample.

A file of 400,000 examples will result in a test file of 40,000 examples.

 

Colin 

 


Best Answers

  • earmijoearmijo Member Posts: 271 Unicorn
    Solution Accepted

    I don't claim efficiency or beauty but the code below ought to work. 

     

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="6.5.002">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="6.5.002" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="retrieve" compatibility="6.5.002" expanded="true" height="60" name="Retrieve Deals" width="90" x="179" y="120">
    <parameter key="repository_entry" value="//Samples/data/Deals"/>
    </operator>
    <operator activated="true" class="generate_id" compatibility="6.5.002" expanded="true" height="76" name="Generate ID" width="90" x="380" y="120"/>
    <operator activated="true" breakpoints="after" class="generate_attributes" compatibility="6.5.002" expanded="true" height="76" name="Generate Attributes" width="90" x="581" y="120">
    <list key="function_descriptions">
    <parameter key="sampled" value="mod(id,10)"/>
    </list>
    </operator>
    <operator activated="true" class="filter_examples" compatibility="6.5.002" expanded="true" height="94" name="Filter Examples" width="90" x="849" y="120">
    <list key="filters_list">
    <parameter key="filters_entry_key" value="sampled.eq.0"/>
    </list>
    </operator>
    <connect from_op="Retrieve Deals" from_port="output" to_op="Generate ID" to_port="example set input"/>
    <connect from_op="Generate ID" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
    <connect from_op="Generate Attributes" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
    <connect from_op="Filter Examples" from_port="example set output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>
  • Telcontar120Telcontar120 RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    Solution Accepted

    You are probably aware of this, but there is also a "Sample" operator. It doesn't take exactly every nth record, but it does have parameters for randomly taking either an absolute number of records or a percentage, and if you set the random seed then the results will be reproducible. For most purposes a random sample is sufficient (and may even be preferable) compared to a sample based on a heuristic such as "every nth record."

     

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
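
To illustrate the point about reproducibility in the answer above, here is a minimal Python sketch of seeded random sampling. It is not how RapidMiner's Sample operator is implemented. The function name `seeded_sample`, the seed value, and the stand-in data are all hypothetical, chosen only to show that a fixed seed yields the same sample on every run.

```python
import random

def seeded_sample(records, fraction, seed):
    """Return a reproducible random sample of about `fraction` of `records`."""
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)            # private generator; the seed fixes the draw
    k = round(len(records) * fraction)   # convert the fraction to an absolute size
    return rng.sample(records, k)        # sampling without replacement

records = list(range(1, 401))            # stand-in for 400 example IDs (assumption)
first = seeded_sample(records, 0.1, seed=1992)
second = seeded_sample(records, 0.1, seed=1992)
print(len(first), first == second)
```

Changing the seed gives a different 10% sample; keeping it fixed gives the same one each time, which is what makes a random test sample repeatable.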

Answers

  • cwoocwoo Member Posts: 10 Contributor II

    Thank you very much.

    Quite simple: use Generate ID, then generate the "sampled" attribute using the modulus function, then filter for all examples where the mod is 0.

     

    Excellent 

     

    Colin

  • landland RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn

    Hi,

     

    You can make it a bit more efficient by using the Filter Examples operator's option to filter on an expression directly. That saves the overhead of Generate Attributes and of adding a new column. You simply enter an expression that evaluates to true or false, and in it you can apply the mod function to the id as in the example above.

     

    Greetings,

      Sebastian

  • bhupendra_patilbhupendra_patil Employee-RapidMiner, Member Posts: 168 RM Data Scientist
  • cwoocwoo Member Posts: 10 Contributor II

    thanks for refining it
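
The refinement discussed above (testing a mod expression on the id directly, without a helper column) can be sketched outside RapidMiner as follows. This is only an illustration of the logic in Python, not RapidMiner code. The function name `every_nth` is hypothetical, and the 1-based positions assume that the generated IDs start at 1.

```python
def every_nth(records, n):
    """Yield each record whose 1-based position is a multiple of n."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # enumerate(..., start=1) plays the role of the generated id (assumed to start at 1)
    for position, record in enumerate(records, start=1):
        if position % n == 0:            # same test as mod(id, n) == 0
            yield record

# 400,000 stand-in records with n = 10, matching the sizes in the question
sample = list(every_nth(range(1, 400_001), 10))
print(len(sample), sample[:3])
```

Because the test is applied while iterating, no extra column is stored, which mirrors the saving described for filtering on the expression directly.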
