Joining two sub-processes together, first classification and then clustering
Hi there,
I have two sub-processes and I want to join them together in order to have a complete automated process.
The first process is a CLASSIFICATION task. It takes some text documents and, by applying a previously trained model, puts the documents into 5 classes: politics, sport, science, etc. The output is a table with the document IDs and the classes as the label.
The second process is a CLUSTERING task. It takes some text documents and, using the k-means algorithm, puts the documents into different clusters.
I want to join these two processes together, meaning first applying the classification and then applying the clustering 5 times, once for each group.
I don't know how to achieve this, but I feel I should use the loop operator and some kind of split-table operator to break up the result table of the first sub-process and loop over the sub-tables.
Any help is appreciated. Thanks.
Best Answer
sgenzer Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959
Hmmm... perhaps this is something you can work from? My "amir1" is your first process and my "amir2" is your second process. I was not sure whether your label is actually called "classes", but you can change it in the parameters for "Loop Values".
<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="productivity:execute_process" compatibility="7.6.001" expanded="true" height="82" name="Execute amir1" width="90" x="45" y="34">
<parameter key="process_location" value="//RapidMiner OneDrive/random community stuff/amir1"/>
<list key="macros"/>
</operator>
<operator activated="true" class="concurrency:loop_values" compatibility="7.6.001" expanded="true" height="82" name="Loop Values" width="90" x="179" y="34">
<parameter key="attribute" value="classes"/>
<process expanded="true">
<operator activated="true" class="filter_examples" compatibility="7.6.001" expanded="true" height="103" name="Filter Examples" width="90" x="45" y="34">
<list key="filters_list">
<parameter key="filters_entry_key" value="classes.equals.%{loop_value}"/>
</list>
</operator>
<operator activated="true" class="productivity:execute_process" compatibility="7.6.001" expanded="true" height="68" name="Execute amir2" width="90" x="179" y="34">
<parameter key="process_location" value="//RapidMiner OneDrive/random community stuff/amir2"/>
<list key="macros"/>
</operator>
<connect from_port="input 1" to_op="Filter Examples" to_port="example set input"/>
<connect from_op="Filter Examples" from_port="example set output" to_op="Execute amir2" to_port="input 1"/>
<connect from_op="Execute amir2" from_port="result 1" to_port="output 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
</operator>
<connect from_op="Execute amir1" from_port="result 1" to_op="Loop Values" to_port="input 1"/>
<connect from_op="Loop Values" from_port="output 1" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Scott
Answers
Hello @amir_askary_sha, I'd be happy to help, but it would be very helpful to see what you have so far. Please paste your XML in this thread using the </> tool.
Thanks.
Scott
Hi @sgenzer
Here is the XML of the first part of the process, the classification:
And here is the second part, the clustering:
I want to concatenate them together.
P.S: There are 5 groups generated by the classification process.
Wow, that is almost it. Thank you very much, you saved me so much time.
There is just one thing left that I need to figure out: the second process doesn't care about its input. It always reads all the documents from an XML URL source. I now need to make it fetch the document IDs from its input (that is, the result of the classification part) and then query just for those documents, not all documents.
Oh, that's fine. Just use Extract Macro on your input to grab the document IDs and use that in your URLs.
Scott
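For readers following along, the Extract Macro suggestion might look roughly like the fragment below, placed inside the Loop Values subprocess between "Filter Examples" and "Execute amir2". This is only a sketch under assumptions that are not in the thread: the ID column is assumed to be named "id", "doc_id" is a made-up macro name, and the connections and layout attributes are left out. Note that Extract Macro with the data_value type reads a single cell, so this grabs the ID of one row only; collecting every ID in a group would need an extra step before it.

```xml
<!-- Sketch only: operators for the Loop Values subprocess; connections omitted. -->
<!-- Assumptions: the ID attribute is called "id"; "doc_id" is a hypothetical macro name. -->
<operator activated="true" class="extract_macro" compatibility="7.6.001" expanded="true" name="Extract Macro">
  <parameter key="macro" value="doc_id"/>
  <parameter key="macro_type" value="data_value"/>
  <parameter key="attribute_name" value="id"/>
  <!-- Reads one cell only: the "id" value of the first example in the filtered group. -->
  <parameter key="example_index" value="1"/>
  <list key="additional_macros"/>
</operator>
<operator activated="true" class="productivity:execute_process" compatibility="7.6.001" expanded="true" name="Execute amir2">
  <parameter key="process_location" value="//RapidMiner OneDrive/random community stuff/amir2"/>
  <list key="macros">
    <!-- Passes the value on to amir2, where a URL parameter could refer to it as %{doc_id}. -->
    <parameter key="doc_id" value="%{doc_id}"/>
  </list>
</operator>
```

Inside amir2, the URL parameter of whichever operator reads the XML source would then include %{doc_id} where the document ID belongs; the exact URL pattern depends on the source and is not shown in the thread.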