RapidMiner gives different results on 32-bit and 64-bit machines
subhasisdasgupt
I was using RapidMiner V5.2.002 on a 32-bit machine and am now using V5.2.008 on a 64-bit machine. Running the same process in these two versions gives very different results, and I am not sure which one to trust. I was analyzing reviews of the Samsung Galaxy S3 through text mining and used the X-Means operator to cluster the documents. With everything else the same, my 32-bit machine gave two clusters with 141 and 60 documents in cluster 0 and cluster 1 respectively, while my 64-bit machine gave two clusters with 197 and 4 documents in cluster 0 and cluster 1 respectively. I don't know whether this is a bug, but I am very confused. Kindly help. The XML of the process (V5.2.002, 32-bit machine) is given below.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.002">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.2.002" expanded="true" name="Process">
    <process expanded="true" height="1016" width="413">
      <operator activated="true" class="text:process_document_from_file" compatibility="5.2.004" expanded="true" height="76" name="Process Documents from Files" width="90" x="45" y="30">
        <list key="text_directories">
          <parameter key="Samsung" value="D:\Subhasis\text mining\Samsung G3 Review"/>
        </list>
        <parameter key="keep_text" value="true"/>
        <process expanded="true" height="418" width="480">
          <operator activated="true" class="text:tokenize" compatibility="5.2.004" expanded="true" height="60" name="Tokenize" width="90" x="45" y="30"/>
          <operator activated="true" class="text:transform_cases" compatibility="5.2.004" expanded="true" height="60" name="Transform Cases" width="90" x="179" y="30"/>
          <operator activated="true" class="text:filter_stopwords_english" compatibility="5.2.004" expanded="true" height="60" name="Filter Stopwords (English)" width="90" x="45" y="165"/>
          <operator activated="false" class="text:stem_snowball" compatibility="5.2.004" expanded="true" height="60" name="Stem (Snowball)" width="90" x="179" y="255"/>
          <operator activated="true" class="text:filter_tokens_by_content" compatibility="5.2.004" expanded="true" height="60" name="Filter Tokens (by Content)" width="90" x="246" y="165">
            <parameter key="condition" value="equals"/>
            <parameter key="string" value="i"/>
            <parameter key="regular_expression" value="( i )"/>
            <parameter key="invert condition" value="true"/>
          </operator>
          <connect from_port="document" to_op="Tokenize" to_port="document"/>
          <connect from_op="Tokenize" from_port="document" to_op="Transform Cases" to_port="document"/>
          <connect from_op="Transform Cases" from_port="document" to_op="Filter Stopwords (English)" to_port="document"/>
          <connect from_op="Filter Stopwords (English)" from_port="document" to_op="Filter Tokens (by Content)" to_port="document"/>
          <connect from_op="Filter Tokens (by Content)" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" breakpoints="after" class="x_means" compatibility="5.2.002" expanded="true" height="76" name="X-Means" width="90" x="45" y="120">
        <parameter key="determine_good_start_values" value="true"/>
      </operator>
      <connect from_op="Process Documents from Files" from_port="example set" to_op="X-Means" to_port="example set"/>
      <connect from_op="Process Documents from Files" from_port="word list" to_port="result 1"/>
      <connect from_op="X-Means" from_port="cluster model" to_port="result 2"/>
      <connect from_op="X-Means" from_port="clustered set" to_port="result 3"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
      <portSpacing port="sink_result 4" spacing="0"/>
    </process>
  </operator>
</process>
Answers
Regards,
Marius
And do you have the same results if you run the process two times in a row on any of the systems?
Our first guess is that the issue may be related to the random number generator. We will definitely investigate, as this is quite important. It would be nice if you could also test the 64-bit installer and tell us about the results.
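To illustrate why the random number generator is a plausible suspect, here is a minimal sketch in Python. It assumes plain k-means on synthetic 2-D points, not the X-Means operator, RapidMiner's own generator, or the real document vectors. The names `kmeans`, `cluster_sizes` and `make_data` are made up for this example. The only thing that changes between runs is the seed used to pick the starting centroids.

```python
import random


def kmeans(points, k, seed, max_iter=100):
    """Plain k-means on 2-D points; the seed only affects the starting centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # Assign every point to its nearest centroid (squared Euclidean distance).
        new_assignment = [
            min(
                range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2
                + (p[1] - centroids[c][1]) ** 2,
            )
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Move each centroid to the mean of its members; leave it if the cluster is empty.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return assignment


def cluster_sizes(assignment, k):
    """Number of points in each cluster, indexed by cluster id."""
    return [assignment.count(c) for c in range(k)]


def make_data(n=201, seed=0):
    """Synthetic stand-in for 201 documents: one unstructured Gaussian blob."""
    rng = random.Random(seed)
    return [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]


if __name__ == "__main__":
    data = make_data()
    for seed in (1, 2, 3):
        print("seed", seed, "cluster sizes", cluster_sizes(kmeans(data, 2, seed), 2))
```

Running the same seed twice always gives the same assignment, while different seeds may split the same data differently, especially when the data has no clear cluster structure. If two installations draw different random numbers, they could end up with different clusters from an identical process for the same reason.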
Best regards,
Marius
I uninstalled the installed version, reinstalled the 64-bit version (5.2.008) of RM, and tested the setup again. The result was the same as before (197 documents in cluster 0 and 4 documents in cluster 1), so the 32-bit machine and the 64-bit machine still gave different answers. Then I uninstalled the 64-bit version, installed the 32-bit version of RM on the same machine, and re-tested. The 32-bit and 64-bit versions of RM gave the same results on the 64-bit machine. In fact, the 32-bit version of RM took longer to produce the output. So the final outcome is as follows:
On the 32-bit desktop machine, 32-bit RM V5.2.002 and 32-bit RM V5.2.008 produced identical results (141 documents in cluster 0 and 60 documents in cluster 1). On the 64-bit machine, both 32-bit and 64-bit RM V5.2.008 produced identical results (197 documents in cluster 0 and 4 documents in cluster 1). So I am thinking of reinstalling the 64-bit RM on my 64-bit machine.
Is there any way to check which one is correct or more appropriate?
Regards
Subhasis