Updated Target Shuffling
Best Answers
MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
Hi @npapan69,
it sounds like you are heading toward something similar to Shap(ley).
You can use Shuffle and Merge Attributes to do this. Attached is an example process. It needs the Operator Toolbox extension to work.
Best,
Martin
<?xml version="1.0" encoding="UTF-8"?><process version="9.7.002">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="9.7.002" expanded="true" name="Process">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="9.7.002" expanded="true" height="68" name="Retrieve Titanic Training" width="90" x="112" y="85">
<parameter key="repository_entry" value="//Samples/data/Titanic Training"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="9.7.002" expanded="true" height="82" name="Select Attributes" width="90" x="246" y="34">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="Survived"/>
<parameter key="attributes" value=""/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="attribute_value"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="time"/>
<parameter key="block_type" value="attribute_block"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_matrix_row_start"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="true"/>
</operator>
<operator activated="true" class="shuffle" compatibility="9.7.002" expanded="true" height="82" name="Shuffle" width="90" x="380" y="34">
<parameter key="use_local_random_seed" value="false"/>
<parameter key="local_random_seed" value="1992"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="9.7.002" expanded="true" height="82" name="Select Attributes (3)" width="90" x="380" y="187">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="Survived"/>
<parameter key="attributes" value=""/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="attribute_value"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="time"/>
<parameter key="block_type" value="attribute_block"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_matrix_row_start"/>
<parameter key="invert_selection" value="true"/>
<parameter key="include_special_attributes" value="true"/>
</operator>
<operator activated="true" class="operator_toolbox:merge" compatibility="2.7.000-SNAPSHOT" expanded="true" height="103" name="Merge Attributes" width="90" x="715" y="34">
<parameter key="handling_of_duplicate_attributes" value="rename"/>
<parameter key="handling_of_special_attributes" value="keep_first_special_other_regular"/>
<parameter key="handling_of_duplicate_annotations" value="rename"/>
</operator>
<connect from_op="Retrieve Titanic Training" from_port="output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Shuffle" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="original" to_op="Select Attributes (3)" to_port="example set input"/>
<connect from_op="Shuffle" from_port="example set output" to_op="Merge Attributes" to_port="example set 1"/>
<connect from_op="Select Attributes (3)" from_port="example set output" to_op="Merge Attributes" to_port="example set 2"/>
<connect from_op="Merge Attributes" from_port="merged set" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
- Sr. Director Data Solutions, Altair RapidMiner -
Dortmund, Germany
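For readers who want to see the logic of the process outside RapidMiner, here is a minimal Python sketch of the same three steps: split off the label column, shuffle only that column, and merge it back with the untouched attributes. The function name `shuffle_target` and the toy rows are hypothetical, and the sketch only mirrors the idea of the process, not the behavior of the actual operators.

```python
import random


def shuffle_target(rows, target, seed=None):
    """Return copies of `rows` with the `target` column permuted across rows."""
    rng = random.Random(seed)
    # Step 1: split off the label column (the role of the first Select Attributes).
    labels = [row[target] for row in rows]
    # Step 2: shuffle only that column (the role of Shuffle).
    rng.shuffle(labels)
    # Step 3: put the shuffled labels back next to the unchanged
    # remaining columns (the role of Merge Attributes).
    return [{**row, target: label} for row, label in zip(rows, labels)]


# Hypothetical toy rows, not the real Titanic Training sample.
rows = [
    {"Sex": "female", "Age": 29, "Survived": "Yes"},
    {"Sex": "male", "Age": 40, "Survived": "No"},
    {"Sex": "female", "Age": 35, "Survived": "Yes"},
    {"Sex": "male", "Age": 22, "Survived": "No"},
    {"Sex": "male", "Age": 54, "Survived": "No"},
]

shuffled = shuffle_target(rows, "Survived", seed=2001)
for row in shuffled:
    print(row)
```

The label values are the same as before, only assigned to different rows, which is what breaks any real relationship between attributes and label.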
MartinLiebig
Hi,
you can use a Loop and loop over the destinations you want to write to. Do you have a list of file names as an example set?
Best,
Martin
Answers
I'm referring to the technique called "Target Shuffling", where you generate multiple datasets by randomly shuffling the labels and then compare performance on the real data against performance on the "bogus" data. There is related XML code posted back in 2011, but I can't make it work in the current RM Studio (9.7), so I was wondering whether there is a process or, even better, an operator that could achieve this. Here is the old post I mentioned above:
I implemented Target Shuffling in RM.
I saved it as a Building Block for easy inclusion in projects.
The enclosed code is for a building block. Save it in a file called Target Shuffling.buildingblock in your repository directory.
I hope you find it useful.
I'll be happy to get any comments.
Sincerely,
Amnon Khen
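To make the comparison of real against "bogus" data concrete, here is a rough Python sketch of target shuffling as a permutation test. Everything in it is an assumption for illustration: the scoring rule (`rule_accuracy`, a majority vote per attribute value, scored on the same data it was built from), the function names, and the toy data. It is not the building block from the 2011 post.

```python
import random
from collections import Counter, defaultdict


def rule_accuracy(features, labels):
    """Accuracy of predicting the majority label within each feature value."""
    by_value = defaultdict(Counter)
    for value, label in zip(features, labels):
        by_value[value][label] += 1
    correct = sum(max(counts.values()) for counts in by_value.values())
    return correct / len(labels)


def target_shuffling(features, labels, n_shuffles=1000, seed=0):
    """Score the real labels, then score many label-shuffled copies."""
    rng = random.Random(seed)
    real_score = rule_accuracy(features, labels)
    shuffled_scores = []
    for _ in range(n_shuffles):
        permuted = list(labels)
        rng.shuffle(permuted)  # break the attribute/label relationship
        shuffled_scores.append(rule_accuracy(features, permuted))
    # Share of shuffled runs that did at least as well as the real data.
    p_value = sum(s >= real_score for s in shuffled_scores) / n_shuffles
    return real_score, shuffled_scores, p_value


# Hypothetical toy data: one nominal attribute and a binary label.
features = ["female"] * 10 + ["male"] * 10
labels = ["Yes"] * 9 + ["No"] + ["No"] * 9 + ["Yes"]

real_score, shuffled_scores, p_value = target_shuffling(features, labels)
print("real score:", real_score)
print("best shuffled score:", max(shuffled_scores))
print("share of shuffles >= real:", p_value)
```

If the real score sits well above nearly all of the shuffled scores, the model is probably picking up more than chance. In practice the scoring step would be a properly validated model rather than this toy rule.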
Many thanks for providing the solution. Another question: how can I write out the multiple datasets produced with your code as separate xls files? The Write Excel operator saves multiple sheets within the same file, whereas I need to save multiple separate xls files. Is there a solution to that?
Best
Nikos
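The loop idea for writing one file per shuffled dataset can be sketched outside RapidMiner as follows. This is only an illustration under stated assumptions: it writes CSV rather than xls, because the Python standard library has no Excel writer, and the function name `write_shuffled_copies`, the file name pattern, and the toy rows are all hypothetical. The key point is that each iteration builds its own file name.

```python
import csv
import random
import tempfile
from pathlib import Path


def write_shuffled_copies(rows, target, out_dir, n_copies=3, seed=0):
    """Write `n_copies` target-shuffled versions of `rows`, one CSV file each."""
    rng = random.Random(seed)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(1, n_copies + 1):
        labels = [row[target] for row in rows]
        rng.shuffle(labels)
        # A distinct name per iteration is what yields separate files
        # instead of several sheets in a single file.
        path = out_dir / f"shuffled_{i:03d}.csv"
        with path.open("w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=list(rows[0]))
            writer.writeheader()
            for row, label in zip(rows, labels):
                writer.writerow({**row, target: label})
        paths.append(path)
    return paths


# Hypothetical toy rows, not the real Titanic Training sample.
rows = [
    {"Sex": "female", "Age": 29, "Survived": "Yes"},
    {"Sex": "male", "Age": 40, "Survived": "No"},
    {"Sex": "female", "Age": 35, "Survived": "Yes"},
    {"Sex": "male", "Age": 22, "Survived": "No"},
]

with tempfile.TemporaryDirectory() as tmp:
    for path in write_shuffled_copies(rows, "Survived", tmp, n_copies=3):
        print(path.name)
```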