Batch ID Generation
Hello,
Is there a simple way to generate a batch ID (that is, divide the samples into 5 groups) based on the ID column in a dataset? For example, I have a dataset with 400 samples from 30 subjects (multiple samples per subject). I would like to divide the dataset into 5 batches (this can be any value) based on the subject, not the sample, so that each batch contains the data for 6 subjects.
I attached a sample dataset where "Subject ID" is the ID column.
Regards,
Varun
https://www.varunmandalapu.com/
Be Safe. Follow precautions and Maintain Social Distancing
Best Answer
sgenzer (Community Manager)
Hi @varunm1, ah OK, that took me a minute. Why not use Generate Attributes with an expression like mod([Subject_ID],5)?
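To illustrate the modulo idea outside RapidMiner, here is a minimal Python sketch. It assumes the subject IDs are the consecutive integers 1 to 30; the number of samples per subject is made up for the example, and `batch_by_mod` is a hypothetical helper name, not a RapidMiner function. With consecutive IDs, the remainder spreads the subjects evenly, so every sample of a subject lands in the same batch and each batch gets 6 subjects.

```python
from collections import defaultdict


def batch_by_mod(subject_ids, n_batches=5):
    """Return one batch ID per sample: subject_id mod n_batches."""
    return [sid % n_batches for sid in subject_ids]


# Hypothetical data: 30 subjects with 1 to 3 samples each (counts are invented).
subject_ids = [sid for sid in range(1, 31) for _ in range(1 + sid % 3)]
batch_ids = batch_by_mod(subject_ids, n_batches=5)

# Collect the distinct subjects that ended up in each batch.
subjects_per_batch = defaultdict(set)
for sid, batch in zip(subject_ids, batch_ids):
    subjects_per_batch[batch].add(sid)

for batch in sorted(subjects_per_batch):
    print(batch, sorted(subjects_per_batch[batch]))
```

If the IDs are not consecutive (gaps, or IDs that share a common factor with the batch count), the batches can come out uneven, so it is worth checking the subject count per batch afterwards.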
Scott
Answers
Thanks for your response. Does this scale to IDs made of characters (A, B, C, D, etc.)?
I just replaced the Subject ID column values with characters (for example, 1 is now A).
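One possible way to handle character IDs, sketched in Python rather than as a RapidMiner process: map each distinct ID to its position in sorted order and take the modulo of that position instead of the ID itself. This is an assumption about how the modulo idea could be extended, not something confirmed in this thread, and `batch_by_rank` is a hypothetical helper name.

```python
def batch_by_rank(subject_ids, n_batches=5):
    """Map each distinct ID to its sorted position, then take position mod n_batches."""
    rank = {sid: i for i, sid in enumerate(sorted(set(subject_ids)))}
    return [rank[sid] % n_batches for sid in subject_ids]


# Hypothetical samples: six subjects labelled A to F, some with several samples.
samples = ["A", "A", "B", "C", "C", "C", "D", "E", "F", "F"]
print(batch_by_rank(samples, n_batches=3))
# prints [0, 0, 1, 2, 2, 2, 0, 1, 2, 2]
```

Because the position is always a consecutive integer starting at 0, this also evens out numeric IDs that have gaps.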
Varun
I'm not sure I understand, but is this what you are looking for?
<?xml version="1.0" encoding="UTF-8"?>
<process version="9.5.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.5.001" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="read_excel" compatibility="9.5.001" expanded="true" height="68" name="Read Excel" width="90" x="112" y="34">
        <parameter key="excel_file" value="C:\Users\Lionel\Downloads\Batch_Sample_Data.xlsx"/>
        <parameter key="sheet_selection" value="sheet number"/>
        <parameter key="sheet_number" value="1"/>
        <parameter key="imported_cell_range" value="A1"/>
        <parameter key="encoding" value="SYSTEM"/>
        <parameter key="first_row_as_names" value="true"/>
        <list key="annotations"/>
        <parameter key="date_format" value=""/>
        <parameter key="time_zone" value="SYSTEM"/>
        <parameter key="locale" value="English (United States)"/>
        <parameter key="read_all_values_as_polynominal" value="false"/>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="Subject ID.true.integer.attribute"/>
          <parameter key="1" value="Data.true.real.attribute"/>
          <parameter key="2" value="Data_1.true.real.attribute"/>
          <parameter key="3" value="Data_2.true.integer.attribute"/>
          <parameter key="4" value="Data_3.true.real.attribute"/>
        </list>
        <parameter key="read_not_matching_values_as_missings" value="false"/>
        <parameter key="datamanagement" value="double_array"/>
        <parameter key="data_management" value="auto"/>
      </operator>
      <operator activated="true" class="operator_toolbox:group_into_collection" compatibility="2.2.000" expanded="true" height="82" name="Group Into Collection" width="90" x="313" y="34">
        <parameter key="group_by_attribute" value="Subject ID"/>
        <parameter key="group_by_attribute (numerical)" value=""/>
        <parameter key="sorting_order" value="none"/>
      </operator>
      <connect from_op="Read Excel" from_port="output" to_op="Group Into Collection" to_port="exa"/>
      <connect from_op="Group Into Collection" from_port="col" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
Lionel
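For readers who want to see the grouping step in isolation, here is a rough Python analogue of splitting a table into one group per "Subject ID" value. It is only a sketch under the assumption that the process above produces one example set per subject; the rows and values are invented, and `group_by_subject` is a hypothetical helper name.

```python
from collections import defaultdict


def group_by_subject(rows, key="Subject ID"):
    """Collect rows into one list per distinct value of the key column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


# Hypothetical rows using the column names from the sample file's metadata.
rows = [
    {"Subject ID": 1, "Data": 0.5},
    {"Subject ID": 1, "Data": 0.7},
    {"Subject ID": 2, "Data": 1.2},
    {"Subject ID": 3, "Data": 0.1},
    {"Subject ID": 3, "Data": 0.4},
    {"Subject ID": 3, "Data": 0.9},
]

groups = group_by_subject(rows)
for subject, subject_rows in groups.items():
    print(subject, len(subject_rows))
```

Grouping keeps all samples of a subject together, but it yields one group per subject (30 here), so a further step is still needed to merge those groups into 5 batches.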