association analysis with RM and Weka's Apriori
Hi,
I've just started exploring association analysis in RM. I've tested a few processes, including this one, in which Weka's Apriori implementation is applied to the discretised iris dataset. I got a strange result: no association rules under a particular parameter setting (confidence=0.9, support=0.1), although rules were expected for this setting.
So I exported the discretised iris dataset to CSV, loaded it in Weka and ran Apriori with the same parameter setting. The result included a set of more than 20 rules. I checked one: it was correct and consistent with the specified confidence and support parameters.
I then checked what differed between the two tests (despite the same dataset and parameter setting). I displayed and checked the frequent itemsets in both cases, and that is where the difference came from. RM with W-Apriori did not produce the correct frequent itemsets; some were skipped. For instance, a2=range2 has a count of 50 according to the metadata or the dataset, and it should have appeared as a frequent/large itemset in L1 since the required minimum count is 15, but it did not. Obviously, if frequent itemsets are wrongly skipped, fewer rules are produced (in this case none).
Any comments would be appreciated. The process is attached below.
Best,
Dan
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.0.8" expanded="true" name="Process">
<parameter key="logverbosity" value="3"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="1"/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<parameter key="parallelize_main_process" value="false"/>
<process expanded="true" height="404" width="435">
<operator activated="true" class="retrieve" compatibility="5.0.8" expanded="true" height="60" name="Retrieve (2)" width="90" x="45" y="30">
<parameter key="repository_entry" value="//Samples/data/Iris"/>
</operator>
<operator activated="true" class="discretize_by_frequency" compatibility="5.0.8" expanded="true" height="94" name="FrequencyDiscretization (2)" width="90" x="45" y="120">
<parameter key="return_preprocessing_model" value="false"/>
<parameter key="create_view" value="false"/>
<parameter key="attribute_filter_type" value="all"/>
<parameter key="attribute" value=""/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="0"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="2"/>
<parameter key="block_type" value="0"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="2"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
<parameter key="use_sqrt_of_examples" value="false"/>
<parameter key="number_of_bins" value="5"/>
<parameter key="range_name_type" value="long"/>
<parameter key="automatic_number_of_digits" value="true"/>
<parameter key="number_of_digits" value="-1"/>
</operator>
<operator activated="true" class="multiply" compatibility="5.0.8" expanded="true" height="94" name="Multiply (2)" width="90" x="179" y="120"/>
<operator activated="true" class="weka:W-Apriori" compatibility="5.0.1" expanded="true" height="60" name="W-Apriori (2)" width="90" x="246" y="30">
<parameter key="N" value="100.0"/>
<parameter key="T" value="0.0"/>
<parameter key="C" value="0.9"/>
<parameter key="D" value="0.05"/>
<parameter key="U" value="1.0"/>
<parameter key="M" value="0.1"/>
<parameter key="S" value="-1.0"/>
<parameter key="I" value="true"/>
<parameter key="R" value="false"/>
<parameter key="V" value="false"/>
<parameter key="A" value="false"/>
<parameter key="c" value="-1.0"/>
</operator>
<connect from_op="Retrieve (2)" from_port="output" to_op="FrequencyDiscretization (2)" to_port="example set input"/>
<connect from_op="FrequencyDiscretization (2)" from_port="example set output" to_op="Multiply (2)" to_port="input"/>
<connect from_op="Multiply (2)" from_port="output 1" to_op="W-Apriori (2)" to_port="example set"/>
<connect from_op="Multiply (2)" from_port="output 2" to_port="result 2"/>
<connect from_op="W-Apriori (2)" from_port="associator" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
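The support check described in the post (minimum count 15 for support=0.1, so an item with count 50 should be frequent) can be sketched in a few lines of Python. This is only an illustration on made-up data: the rows below are hypothetical and merely mimic the counts mentioned in the post (150 examples, a2=range2 occurring 50 times); the function name `frequent_1_itemsets` is also an assumption and is not part of RM or Weka.

```python
import math
from collections import Counter


def frequent_1_itemsets(rows, min_support):
    """Return ({(attribute, value): count}, min_count) for single items
    whose count reaches min_support * number_of_rows."""
    # round() guards against floating-point noise before taking the ceiling
    min_count = math.ceil(round(min_support * len(rows), 9))
    counts = Counter((attr, val) for row in rows for attr, val in row.items())
    frequent = {item: c for item, c in counts.items() if c >= min_count}
    return frequent, min_count


# Hypothetical toy data, NOT the real discretised iris set:
# 150 rows, with a2=range2 occurring 50 times as in the post.
rows = (
    [{"a2": "range2"}] * 50
    + [{"a2": "range1"}] * 90
    + [{"a2": "range3"}] * 10
)

frequent, min_count = frequent_1_itemsets(rows, min_support=0.1)
print("minimum count:", min_count)
for (attr, val), count in sorted(frequent.items()):
    print(f"{attr}={val}: {count}")
```

With these toy counts, a2=range2 clears the minimum count, which is the behaviour expected in the post; a2=range3 (10 occurrences) does not.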
Answers
I guess the problem is that the mappings might be incorrect for the Weka Apriori: it probably assumes that the first nominal value indicates false and the second true. But the discretization might mix this order up, since it isn't important for discretization. (In fact it's somehow misusing it for this task.)
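To make this guess concrete, here is a toy Python sketch of how an order-dependent interpretation of a nominal mapping could change which value gets counted. It is purely an illustration of the hypothesis above, not Weka's or RM's actual code; the function `count_as_present` and the two-value mapping are assumptions made up for the example.

```python
def count_as_present(values, mapping):
    """Count only the value stored at index 1 of the mapping,
    i.e. treat index 0 as 'false' and index 1 as 'true' (assumed behaviour)."""
    positive = mapping[1]
    return sum(1 for v in values if v == positive)


# Hypothetical column: 50 occurrences of range2, 100 of range1.
values = ["range2"] * 50 + ["range1"] * 100

# Same data, two different internal orderings of the nominal values.
print(count_as_present(values, ["range1", "range2"]))  # range2 is 'true'
print(count_as_present(values, ["range2", "range1"]))  # range2 is 'false'
```

If something like this were happening, an item such as a2=range2 could drop out of the frequent itemsets simply because of where its value sits in the mapping, regardless of its real count.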
I would suggest using RapidMiner's own FP-Growth operator, where you can define the positive and negative value. And it is much faster than Apriori anyway.
Greetings,
Sebastian
I had already applied the suggested solution and was at the stage of comparing the result with that of alternative available algorithms such as Weka's Apriori (also in terms of execution speed). Hopefully the RM team will look into this issue for one of the future releases.
Best wishes,
Dan
You can use the Remap Binominals operator to correct the mapping for binominal attributes. This way you should be able to apply the Weka learner without any problems.
Greetings,
Sebastian