Getting different results when loading a process vs coding it
I want to create the example text clustering process using the Java APIs. Here's a copy of the original process that comes with the Examples bundle:
<operator name="Root" class="Process" expanded="yes">
  <description text="#ylt#h3#ygt#Clustering text documents#ylt#/h3#ygt##ylt#p#ygt#In this experiment, texts from two newsgroups are read and clustered. To make the clusters better comprehensible, three keywords are extracted for each cluster and added to the cluster description.#ylt#/p#ygt#"/>
  <parameter key="logverbosity" value="status"/>
  <operator name="TextInput" class="TextInput" expanded="yes">
    <parameter key="default_content_language" value="english"/>
    <list key="namespaces">
    </list>
    <parameter key="prune_above" value="10"/>
    <parameter key="prune_below" value="5"/>
    <list key="texts">
      <parameter key="graphics" value="../data/newsgroup/graphics"/>
      <parameter key="hardware" value="../data/newsgroup/hardware"/>
    </list>
    <operator name="StringTokenizer" class="StringTokenizer">
    </operator>
    <operator name="EnglishStopwordFilter" class="EnglishStopwordFilter">
    </operator>
    <operator name="TokenLengthFilter" class="TokenLengthFilter">
      <parameter key="min_chars" value="5"/>
    </operator>
    <operator name="PorterStemmer" class="PorterStemmer">
    </operator>
  </operator>
  <operator name="KMeans" class="KMeans">
  </operator>
  <operator name="AttributeSumClusterCharacterizer" class="AttributeSumClusterCharacterizer">
  </operator>
</operator>
When I run this using this code:
System.setProperty("rapidminer.home", "C:\\Java\\RapidMiner-4.2");
RapidMiner.init();
Process p = new Process(theProcessFile);
p.run();
The result is:
IOContainer (2 objects):
A cluster model with the following properties:
Cluster 0 [characterization: graphic buffer model]: 11 items
Cluster 1 [characterization: appl memori crabappl]: 9 items
Total number of items: 20
If I run the process multiple times, I get the same result. So I assume that the initial centroids are not selected randomly and the outcome is always the same.
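One possible reading of the repeatable results mentioned above, offered as an assumption rather than a statement about RapidMiner's internals: the initial centroids may still be drawn from a random number generator, but one that starts from the same fixed seed on every run. The sketch below is plain Java, not RapidMiner code; `pickInitialCentroids` and the seed value are hypothetical and only illustrate why a fixed seed gives the same starting points each time.

```java
import java.util.Arrays;
import java.util.Random;

public class SeededInitDemo {

    /** Picks k distinct example indices out of n, driven only by the given seed. */
    static int[] pickInitialCentroids(int n, int k, long seed) {
        Random rng = new Random(seed);
        int[] idx = new int[n];
        for (int i = 0; i < n; i++) {
            idx[i] = i;
        }
        // Partial Fisher-Yates shuffle: after k swaps the first k entries are the picks.
        for (int i = 0; i < k; i++) {
            int j = i + rng.nextInt(n - i);
            int tmp = idx[i];
            idx[i] = idx[j];
            idx[j] = tmp;
        }
        return Arrays.copyOf(idx, k);
    }

    public static void main(String[] args) {
        long seed = 2001L; // arbitrary value chosen for this illustration
        int[] firstRun = pickInitialCentroids(20, 2, seed);
        int[] secondRun = pickInitialCentroids(20, 2, seed);
        // Same seed, same picks: prints true
        System.out.println(Arrays.equals(firstRun, secondRun));
    }
}
```

If this is what happens here, changing the seed (where the tool exposes one) would be the way to check whether the clustering is sensitive to its starting points.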
Now I want to create this process using the Java API. Here's my code:
System.setProperty("rapidminer.home", "C:\\Java\\RapidMiner-4.2");
RapidMiner.init();
Process p = new Process();
OperatorChain textInput = (OperatorChain) OperatorService.createOperator("TextInput");
textInput.setParameter(PARAMETER_DEFAULT_CONTENT_LANGUAGE, "english");
textInput.setParameter(PARAMETER_PRUNE_ABOVE, "15");
textInput.setParameter(PARAMETER_PRUNE_BELOW, "5");
List<Object[]> textList = new LinkedList<Object[]>();
textList.add(new Object[] {"graphics","newsgroup/graphics"});
textList.add(new Object[] {"hardware","newsgroup/hardware"});
textInput.setListParameter("texts", textList);
textInput.addOperator(OperatorService.createOperator("StringTokenizer"));
textInput.addOperator(OperatorService.createOperator("EnglishStopwordFilter"));
Operator tlfOperator = OperatorService.createOperator("TokenLengthFilter");
tlfOperator.setParameter("min_chars", "5");
textInput.addOperator(tlfOperator);
textInput.addOperator(OperatorService.createOperator("PorterStemmer"));
p.getRootOperator().addOperator(textInput);
p.getRootOperator().addOperator(OperatorService.createOperator("KMeans"));
p.getRootOperator().addOperator(OperatorService.createOperator("AttributeSumClusterCharacterizer"));
System.out.println(p.getRootOperator().createProcessTree(1));
p.save(new File("Process.xml"));
p.run();
When I save the process to a file, it looks identical to the original process that comes with the examples bundle with the only difference being that it is wrapped inside a <process> element:
<?xml version="1.0" encoding="windows-1252"?>
<process version="4.2">
  <operator name="Root" class="Process" expanded="yes">
    <operator name="TextInput" class="TextInput" expanded="yes">
      <parameter key="default_content_language" value="english"/>
      <parameter key="prune_above" value="15"/>
      <parameter key="prune_below" value="5"/>
      <list key="texts">
        <parameter key="graphics" value="newsgroup/graphics"/>
        <parameter key="hardware" value="newsgroup/hardware"/>
      </list>
      <operator name="StringTokenizer" class="StringTokenizer">
      </operator>
      <operator name="EnglishStopwordFilter" class="EnglishStopwordFilter">
      </operator>
      <operator name="TokenLengthFilter" class="TokenLengthFilter">
        <parameter key="min_chars" value="5"/>
      </operator>
      <operator name="PorterStemmer" class="PorterStemmer">
      </operator>
    </operator>
    <operator name="KMeans" class="KMeans">
    </operator>
    <operator name="AttributeSumClusterCharacterizer" class="AttributeSumClusterCharacterizer">
    </operator>
  </operator>
</process>
However the result of running the process is different compared to the original process:
IOContainer (2 objects):
A cluster model with the following properties:
Cluster 0 [characterization: graphic buffer memori]: 12 items
Cluster 1 [characterization: appl state problem]: 8 items
Total number of items: 20
Any ideas what is causing this?
Thanks in advance,
Behi
Answers
The reason for the difference could be the value of "prune_above". It's 10 in the original process and 15 in yours.
Cheers,
Ingo
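A parameter mismatch like the one pointed out in the answer is easy to overlook when comparing two process files by eye. As a rough sketch (not part of RapidMiner; the class `ProcessParameterDiff` and its methods are hypothetical names), the following uses only the JDK's XML parser to list every `<parameter>` whose value differs between two process definitions. It assumes the `<operator>`, `<list>`, and `<parameter>` layout shown in the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ProcessParameterDiff {

    /** Collects "OperatorName/[listKey/]key" -> value for every <parameter> element. */
    static Map<String, String> collectParameters(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Map<String, String> params = new LinkedHashMap<String, String>();
        NodeList nodes = doc.getElementsByTagName("parameter");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element param = (Element) nodes.item(i);
            StringBuilder path = new StringBuilder(param.getAttribute("key"));
            // Walk up through an optional <list> to the nearest enclosing <operator>.
            Node parent = param.getParentNode();
            while (parent instanceof Element) {
                Element e = (Element) parent;
                if (e.getTagName().equals("list")) {
                    path.insert(0, e.getAttribute("key") + "/");
                } else if (e.getTagName().equals("operator")) {
                    path.insert(0, e.getAttribute("name") + "/");
                    break;
                }
                parent = parent.getParentNode();
            }
            params.put(path.toString(), param.getAttribute("value"));
        }
        return params;
    }

    /** Returns one line per parameter that is missing or different in either file. */
    static List<String> diff(String xmlA, String xmlB) throws Exception {
        Map<String, String> a = collectParameters(xmlA);
        Map<String, String> b = collectParameters(xmlB);
        Set<String> keys = new TreeSet<String>(a.keySet());
        keys.addAll(b.keySet());
        List<String> differences = new ArrayList<String>();
        for (String key : keys) {
            String va = a.containsKey(key) ? a.get(key) : "<unset>";
            String vb = b.containsKey(key) ? b.get(key) : "<unset>";
            if (!va.equals(vb)) {
                differences.add(key + ": " + va + " vs " + vb);
            }
        }
        return differences;
    }

    public static void main(String[] args) throws Exception {
        // Trimmed-down stand-ins for the two process files from the question.
        String original = "<operator name=\"Root\" class=\"Process\">"
                + "<operator name=\"TextInput\" class=\"TextInput\">"
                + "<parameter key=\"prune_above\" value=\"10\"/>"
                + "<parameter key=\"prune_below\" value=\"5\"/>"
                + "</operator></operator>";
        String generated = original.replace("value=\"10\"", "value=\"15\"");
        for (String line : diff(original, generated)) {
            System.out.println(line); // prints: TextInput/prune_above: 10 vs 15
        }
    }
}
```

Run against the two full listings in the question, a comparison of this kind would report `prune_above` and would also flag `logverbosity` and the two `texts` paths (`../data/newsgroup/...` versus `newsgroup/...`). Whether those extra differences affect the clustering is not established in this thread.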