Numeric to Binomial without knowing ranges
Hello,
I've got the following question:
Is it possible to convert numeric values to binominal or polynominal values without knowing the ranges in advance? Is there an operator that can suggest ranges automatically?
Thanks for the help!
Best Answers
IngoRM:
Hi,
Yes, you are looking for the various "Discretize by..." operators. Two commonly used ones are Discretize by Frequency, which creates the desired number of ranges so that each bucket contains the same number of data points, and Discretize by Size, which will ensure that all ranges have equal size.
Hope this helps,
Ingo
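To make the difference between the two strategies concrete, here is a minimal Python sketch (not RapidMiner code) of equal-frequency and equal-width binning applied to one skewed column. The function names are hypothetical, and RapidMiner's own handling of ties and range boundaries may differ. One hedged note: in the RapidMiner operator reference, equal-width ranges appear to come from Discretize by Binning, while Discretize by Size fixes the number of examples per bin, so it is worth checking the operator help before choosing.

```python
from bisect import bisect_right


def equal_width_cuts(values, bins):
    """Interior cut points that split [min, max] into `bins` ranges of equal width."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins
    return [lo + step * i for i in range(1, bins)]


def equal_frequency_cuts(values, bins):
    """Interior cut points chosen so each range holds roughly the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * i) // bins] for i in range(1, bins)]


def discretize(values, cuts):
    """Map each value to a range index 0..len(cuts); a value equal to a cut goes to the upper range."""
    return [bisect_right(cuts, v) for v in values]


def bin_counts(indices, bins):
    """Number of values that landed in each range."""
    return [indices.count(k) for k in range(bins)]


# A skewed column: seven small values and one outlier at 100.
data = [1, 2, 3, 4, 5, 6, 7, 100]
print(bin_counts(discretize(data, equal_width_cuts(data, 4)), 4))      # [7, 0, 0, 1]
print(bin_counts(discretize(data, equal_frequency_cuts(data, 4)), 4))  # [2, 2, 2, 2]
```

With a single outlier, equal-width ranges leave two buckets empty, while equal-frequency ranges keep the buckets balanced; which behaviour is preferable depends on how the ranges will be interpreted later.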
IngoRM:
Hi,
Yes, that is possible as well, but it requires a bit more advanced work. We first create a new data set containing all the values in one single column by looping over all columns and appending the results. We then apply the discretization to this column, which also creates a so-called preprocessing model. Finally, we loop over the columns of the original data again and apply this preprocessing model to each original column. Attached is a process that does this for the Iris data in the Samples repository.
Hope this helps,
Ingo

```xml
<?xml version="1.0" encoding="UTF-8"?><process version="9.4.000-SNAPSHOT">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.4.000-SNAPSHOT" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="UTF-8"/>
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="9.4.000-SNAPSHOT" expanded="true" height="68" name="Retrieve Iris" width="90" x="45" y="187">
        <parameter key="repository_entry" value="//Samples/data/Iris"/>
      </operator>
      <operator activated="true" class="multiply" compatibility="9.4.000-SNAPSHOT" expanded="true" height="103" name="Multiply" width="90" x="179" y="187"/>
      <operator activated="true" class="concurrency:loop_attributes" compatibility="9.4.000-SNAPSHOT" expanded="true" height="82" name="Loop Attributes" width="90" x="313" y="34">
        <parameter key="attribute_filter_type" value="all"/>
        <parameter key="attribute" value=""/>
        <parameter key="attributes" value=""/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="attribute_value"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="time"/>
        <parameter key="block_type" value="attribute_block"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_matrix_row_start"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="false"/>
        <parameter key="attribute_name_macro" value="loop_attribute"/>
        <parameter key="reuse_results" value="false"/>
        <parameter key="enable_parallel_execution" value="true"/>
        <process expanded="true">
          <operator activated="true" class="select_attributes" compatibility="9.4.000-SNAPSHOT" expanded="true" height="82" name="Select Attributes" width="90" x="45" y="34">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="%{loop_attribute}"/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="true"/>
          </operator>
          <operator activated="true" class="rename" compatibility="9.4.000-SNAPSHOT" expanded="true" height="82" name="Rename" width="90" x="179" y="34">
            <parameter key="old_name" value="%{loop_attribute}"/>
            <parameter key="new_name" value="att"/>
            <list key="rename_additional_attributes"/>
          </operator>
          <connect from_port="input 1" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Rename" to_port="example set input"/>
          <connect from_op="Rename" from_port="example set output" to_port="output 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="source_input 2" spacing="0"/>
          <portSpacing port="sink_output 1" spacing="0"/>
          <portSpacing port="sink_output 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="append" compatibility="9.4.000-SNAPSHOT" expanded="true" height="82" name="Append" width="90" x="447" y="34">
        <parameter key="datamanagement" value="double_array"/>
        <parameter key="data_management" value="auto"/>
        <parameter key="merge_type" value="all"/>
      </operator>
      <operator activated="true" class="discretize_by_frequency" compatibility="9.4.000-SNAPSHOT" expanded="true" height="103" name="Discretize" width="90" x="581" y="34">
        <parameter key="return_preprocessing_model" value="false"/>
        <parameter key="create_view" value="false"/>
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="att"/>
        <parameter key="attributes" value=""/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="numeric"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="real"/>
        <parameter key="block_type" value="value_series"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_series_end"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="false"/>
        <parameter key="use_sqrt_of_examples" value="false"/>
        <parameter key="number_of_bins" value="4"/>
        <parameter key="range_name_type" value="long"/>
        <parameter key="automatic_number_of_digits" value="true"/>
        <parameter key="number_of_digits" value="-1"/>
      </operator>
      <operator activated="true" class="concurrency:loop_attributes" compatibility="9.4.000-SNAPSHOT" expanded="true" height="103" name="Loop Attributes (2)" width="90" x="715" y="187">
        <parameter key="attribute_filter_type" value="all"/>
        <parameter key="attribute" value=""/>
        <parameter key="attributes" value=""/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="attribute_value"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="time"/>
        <parameter key="block_type" value="attribute_block"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_matrix_row_start"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="false"/>
        <parameter key="attribute_name_macro" value="loop_attribute"/>
        <parameter key="reuse_results" value="true"/>
        <parameter key="enable_parallel_execution" value="true"/>
        <process expanded="true">
          <operator activated="true" class="rename" compatibility="9.4.000-SNAPSHOT" expanded="true" height="82" name="Rename (2)" width="90" x="45" y="85">
            <parameter key="old_name" value="%{loop_attribute}"/>
            <parameter key="new_name" value="att"/>
            <list key="rename_additional_attributes"/>
          </operator>
          <operator activated="true" class="apply_model" compatibility="9.4.000-SNAPSHOT" expanded="true" height="82" name="Apply Model" width="90" x="179" y="34">
            <list key="application_parameters"/>
            <parameter key="create_view" value="false"/>
          </operator>
          <operator activated="true" class="rename" compatibility="9.4.000-SNAPSHOT" expanded="true" height="82" name="Rename (3)" width="90" x="313" y="34">
            <parameter key="old_name" value="att"/>
            <parameter key="new_name" value="%{loop_attribute}"/>
            <list key="rename_additional_attributes"/>
          </operator>
          <connect from_port="input 1" to_op="Rename (2)" to_port="example set input"/>
          <connect from_port="input 2" to_op="Apply Model" to_port="model"/>
          <connect from_op="Rename (2)" from_port="example set output" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Rename (3)" to_port="example set input"/>
          <connect from_op="Apply Model" from_port="model" to_port="output 2"/>
          <connect from_op="Rename (3)" from_port="example set output" to_port="output 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="source_input 2" spacing="0"/>
          <portSpacing port="source_input 3" spacing="0"/>
          <portSpacing port="sink_output 1" spacing="0"/>
          <portSpacing port="sink_output 2" spacing="0"/>
          <portSpacing port="sink_output 3" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Retrieve Iris" from_port="output" to_op="Multiply" to_port="input"/>
      <connect from_op="Multiply" from_port="output 1" to_op="Loop Attributes" to_port="input 1"/>
      <connect from_op="Multiply" from_port="output 2" to_op="Loop Attributes (2)" to_port="input 1"/>
      <connect from_op="Loop Attributes" from_port="output 1" to_op="Append" to_port="example set 1"/>
      <connect from_op="Append" from_port="merged set" to_op="Discretize" to_port="example set input"/>
      <connect from_op="Discretize" from_port="preprocessing model" to_op="Loop Attributes (2)" to_port="input 2"/>
      <connect from_op="Loop Attributes (2)" from_port="output 1" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
```
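The same three steps (stack all columns, fit the discretization once, apply it to every column) can be sketched outside RapidMiner. The Python below is an illustration under assumptions: columns are plain lists of numbers keyed by attribute name, simple equal-frequency cut points stand in for Discretize by Frequency, and `discretize_shared` is a hypothetical helper, not part of any library.

```python
from bisect import bisect_right


def equal_frequency_cuts(values, bins):
    """Interior cut points so each range holds roughly the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * i) // bins] for i in range(1, bins)]


def discretize_shared(columns, bins):
    """Fit one set of ranges on all columns pooled together, then apply it to every column."""
    # Step 1: stack every column into one long list (the Loop Attributes + Append part).
    pooled = [v for values in columns.values() for v in values]
    # Step 2: learn the cut points once on the pooled values (the Discretize part).
    cuts = equal_frequency_cuts(pooled, bins)
    # Step 3: apply the same cut points to each original column (the Loop Attributes (2) part).
    binned = {name: [bisect_right(cuts, v) for v in values]
              for name, values in columns.items()}
    return cuts, binned


columns = {
    "a": [0, 10, 20, 30],
    "b": [40, 50, 60, 70],
    "c": [80, 90, 95, 100],
}
cuts, binned = discretize_shared(columns, 4)
print(cuts)    # [30, 60, 90]
print(binned)  # {'a': [0, 0, 0, 1], 'b': [1, 1, 2, 2], 'c': [2, 3, 3, 3]}
```

Because the cut points are learned on the pooled values, a given range index means the same numeric interval in every attribute, even though the individual attributes end up unevenly spread across the ranges.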
Answers
So now I know it's possible. Let me explain my problem:
I have 30 attributes, each with values from 0 to 100. My aim is to do an association analysis, and for that I need to transform the numeric values into binominal ones. I tried your suggestion, but the problem is that these operators create different ranges for each attribute. Is it possible to create one set of ranges that covers all attributes together and fits all of them?
Thanks!
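Once every attribute uses the same ranges, each discretized attribute is nominal with several values, while the association analysis described here needs binominal (true/false) inputs. A minimal Python sketch of that last step, assuming each attribute has already been mapped to range indices 0 to bins-1 with one common set of ranges; the helper `to_binominal` and the "attribute = rangeK" column names are hypothetical. In RapidMiner, the Nominal to Binominal operator performs a comparable conversion (one binominal attribute per nominal value).

```python
def to_binominal(binned, bins):
    """Turn range indices into one true/false column per (attribute, range) pair.

    Assumes every attribute list in `binned` has the same length (one entry per example).
    """
    n_rows = len(next(iter(binned.values())))
    rows = []
    for r in range(n_rows):
        row = {}
        for name, indices in binned.items():
            for k in range(bins):
                # Illustrative naming only: "q1 = range1" is true when q1 falls in the first range.
                row[f"{name} = range{k + 1}"] = indices[r] == k
        rows.append(row)
    return rows


# Range indices for two attributes and three examples, e.g. from a shared discretization.
binned = {"q1": [0, 3, 1], "q2": [2, 2, 0]}
rows = to_binominal(binned, 4)
print([col for col, flag in rows[0].items() if flag])  # ['q1 = range1', 'q2 = range3']
```

Each example ends up with exactly one true column per original attribute, which is the item-per-row shape that frequent-itemset mining works on.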