GSP operator - min gap
I can't tell whether the GSP operator's "min gap" parameter has any effect.
With the data below, changing min gap should influence the generated sequential patterns, but it doesn't!
Min gap should exclude transactions from patterns if they are too close to each other in time.
The value of min gap doesn't matter: 0, 4, 10.
The result is always the same!
Please help.
Maybe someone has a working example?
This is the process:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="6.5.002">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="6.5.002" expanded="true" name="Process">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true">
<operator activated="true" class="read_csv" compatibility="6.5.002" expanded="true" height="60" name="Read CSV" width="90" x="45" y="75">
<parameter key="csv_file" value="\\customer_sample_GSP.csv"/>
<parameter key="column_separators" value=";"/>
<parameter key="trim_lines" value="false"/>
<parameter key="use_quotes" value="true"/>
<parameter key="quotes_character" value="&quot;"/>
<parameter key="escape_character" value="\"/>
<parameter key="skip_comments" value="false"/>
<parameter key="comment_characters" value="#"/>
<parameter key="parse_numbers" value="true"/>
<parameter key="decimal_character" value="."/>
<parameter key="grouped_digits" value="false"/>
<parameter key="grouping_character" value=","/>
<parameter key="date_format" value=""/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<parameter key="time_zone" value="SYSTEM"/>
<parameter key="locale" value="English (United States)"/>
<parameter key="encoding" value="windows-1251"/>
<list key="data_set_meta_data_information">
<parameter key="0" value="Customer.true.polynominal.attribute"/>
<parameter key="1" value="Time.true.integer.attribute"/>
<parameter key="2" value="Product.true.polynominal.attribute"/>
</list>
<parameter key="read_not_matching_values_as_missings" value="true"/>
<parameter key="datamanagement" value="double_array"/>
</operator>
<operator activated="true" class="nominal_to_binominal" compatibility="6.5.002" expanded="true" height="94" name="Nominal2Binominal" width="90" x="246" y="30">
<parameter key="return_preprocessing_model" value="false"/>
<parameter key="create_view" value="false"/>
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="Product"/>
<parameter key="attributes" value="|a1|a2|a3|a4"/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="nominal"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="file_path"/>
<parameter key="block_type" value="single_value"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="single_value"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="true"/>
<parameter key="transform_binominal" value="true"/>
<parameter key="use_underscore_in_name" value="false"/>
</operator>
<operator activated="true" class="generalized_sequential_patterns" compatibility="6.5.002" expanded="true" height="76" name="GSP" width="90" x="380" y="165">
<parameter key="customer_id" value="Customer"/>
<parameter key="time_attribute" value="Time"/>
<parameter key="min_support" value="0.15"/>
<parameter key="window_size" value="0.0"/>
<parameter key="max_gap" value="15.0"/>
<parameter key="min_gap" value="7.0"/>
<parameter key="positive_value" value="true"/>
</operator>
<connect from_op="Read CSV" from_port="output" to_op="Nominal2Binominal" to_port="example set input"/>
<connect from_op="Nominal2Binominal" from_port="example set output" to_op="GSP" to_port="example set"/>
<connect from_op="GSP" from_port="example set" to_port="result 1"/>
<connect from_op="GSP" from_port="patterns" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
This is the CSV data:
Customer Time Product
Alex 10 bread
Alex 15 butter
Alex 20 caviar
Peter 10 bread
Peter 15 butter
Peter 17 caviar
Peter 20 water
Igor 10 butter
Igor 20 bread
Igor 30 water
Hasan 10 bread
Hasan 20 butter
Hasan 22 caviar
Hasan 50 lemon
Pan 19 butter
Pan 20 bread
Pan 22 caviar
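To make the expectation concrete, here is a minimal Python sketch of how min gap would be expected to act on this data. It is not the RapidMiner implementation. It assumes the textbook GSP reading in which two consecutive pattern elements must satisfy min_gap < time difference <= max_gap (the strict lower bound is an assumption), and it only looks at two-element patterns with window size 0. The names `ROWS`, `supports` and `support` are made up for this illustration.

```python
from collections import defaultdict

# Sample data from the post: (customer, time, product)
ROWS = [
    ("Alex", 10, "bread"), ("Alex", 15, "butter"), ("Alex", 20, "caviar"),
    ("Peter", 10, "bread"), ("Peter", 15, "butter"), ("Peter", 17, "caviar"),
    ("Peter", 20, "water"),
    ("Igor", 10, "butter"), ("Igor", 20, "bread"), ("Igor", 30, "water"),
    ("Hasan", 10, "bread"), ("Hasan", 20, "butter"), ("Hasan", 22, "caviar"),
    ("Hasan", 50, "lemon"),
    ("Pan", 19, "butter"), ("Pan", 20, "bread"), ("Pan", 22, "caviar"),
]


def supports(events, first, second, min_gap, max_gap):
    """True if `first` is followed by `second` with min_gap < dt <= max_gap.

    `events` is one customer's list of (time, product) tuples.
    The strict lower bound is an assumption taken from the textbook
    definition, not from the RapidMiner source.
    """
    return any(
        a == first and b == second and min_gap < tb - ta <= max_gap
        for ta, a in events
        for tb, b in events
    )


def support(first, second, min_gap, max_gap):
    """Fraction of customers whose history contains <first><second>."""
    by_customer = defaultdict(list)
    for customer, time, product in ROWS:
        by_customer[customer].append((time, product))
    hits = sum(
        supports(events, first, second, min_gap, max_gap)
        for events in by_customer.values()
    )
    return hits / len(by_customer)


for gap in (0, 4, 10):
    print(gap, support("butter", "caviar", gap, 15))
```

Under these assumptions the support of butter followed by caviar is 0.8 for min gap 0, 0.2 for min gap 4 and 0.0 for min gap 10, so with min_support 0.15 the pattern should disappear at min gap 10. That is the kind of change the question expects to see.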
Answers
Where can I get the source code of GSP?
https://github.com/rapidminer/rapidminer-studio/tree/master/src/main/java/com/rapidminer/operator/learner/associations/gsp
That's probably where you need to start looking.
Thanks for your response. It is exactly because I looked at the code that I raised my question.
The original question was about the min gap. The help says... As I read it, transactions have to be apart in time by at least this amount, so I agree with Shamil that it should ...
That would mean that setting the min gap to a huge number would give no sequences, but that does not appear to be the case. If I copy Shamil's data into a file and replicate his process as follows, I get four transactions in the GSPset even if I have a minimum gap larger than the largest time value.
Here's the xml again
The long and short of it is that the operator does not do what the help document suggests, so Shamil has a fair question.
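The "huge min gap" argument can be checked against the sample data with a small Python sketch. It is only an illustration of the expected behaviour, not of what the operator does: it assumes that gap constraints apply between consecutive elements of a pattern as min_gap < time difference <= max_gap, and the names `TIMES` and `admissible_pairs` are made up here.

```python
from itertools import combinations

# Transaction times per customer, taken from the sample data in the question
TIMES = {
    "Alex": [10, 15, 20],
    "Peter": [10, 15, 17, 20],
    "Igor": [10, 20, 30],
    "Hasan": [10, 20, 22, 50],
    "Pan": [19, 20, 22],
}


def admissible_pairs(min_gap, max_gap=float("inf")):
    """Count within-customer transaction pairs that satisfy the gap limits.

    A pair (earlier, later) counts if min_gap < later - earlier <= max_gap.
    The strict lower bound is an assumption, not taken from the operator.
    """
    return sum(
        1
        for times in TIMES.values()
        for earlier, later in combinations(sorted(times), 2)
        if min_gap < later - earlier <= max_gap
    )


# Largest time difference between two transactions of the same customer
largest_span = max(max(times) - min(times) for times in TIMES.values())

print(largest_span)                # largest within-customer time difference
print(admissible_pairs(0))         # all pairs, no effective lower bound
print(admissible_pairs(7, 15))     # settings used in the posted process
print(admissible_pairs(largest_span))  # min gap at the largest difference
```

Once min gap reaches the largest within-customer time difference, no pair of transactions is left, so under this reading no pattern with two or more elements could be supported. Single-element patterns have no gap to check, so they would not be affected.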
As a point of interest, if you search this forum you'll see that this issue has shown up before, several times. Probably time for other people to look at the code as well.
This does indeed look like a bug to me. I have opened a ticket for it.
Regards,
Marco