Automatic model selection and application of only the optimal model
Building on RapidMiner's explanation videos on how to automatically select optimal models (3/30/16 RapidMiner Office Hours: Automatic Model Selection, starting at 11:26; 17 Automated model selection and optimization), I would like to apply the optimal model to a dataset different from the one that was used to select the model. I already know how to store the optimal parameters of the models (see attached file) and apply them to all models, but I don't know how to select and store only the best model.
So, the question is: How to isolate the optimal model, store the model and the parameters in two files, and apply it with its optimal parameters to a second unlabeled dataset? Thank you for your help!
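As a rough illustration of the goal (outside RapidMiner), the selection-and-persistence logic can be sketched in plain Python. This is a toy stand-in, not the RapidMiner process: the "models" are simple threshold classifiers represented as plain dicts, the evaluation is a direct accuracy check rather than cross-validation, and the file names are made up. It only shows the shape of the workflow: pick the best candidate, store the model and its parameters in two files, reload, and apply to a second unlabeled dataset.

```python
import os
import pickle
import tempfile

# Toy 1-D training data: label is 1 when the value exceeds some threshold.
train = [(0.2, 0), (0.4, 0), (0.6, 1), (0.9, 1)]
unlabeled = [0.1, 0.7]  # second dataset, no labels

def predict(model, x):
    """Apply a stored model spec (a plain dict) to one value."""
    return int(x > model["threshold"])

def accuracy(model, data):
    return sum(predict(model, x) == y for x, y in data) / len(data)

# Stand-in for "Optimize Parameters": evaluate candidates, keep the best one.
candidates = [{"threshold": t} for t in (0.3, 0.5, 0.8)]
best = max(candidates, key=lambda m: accuracy(m, train))

# Store the winning model and its parameters in two separate files,
# then reload the model (mirroring Store/Retrieve in a repository).
with tempfile.TemporaryDirectory() as d:
    model_file = os.path.join(d, "model.pkl")
    param_file = os.path.join(d, "params.pkl")
    with open(model_file, "wb") as f:
        pickle.dump(best, f)
    with open(param_file, "wb") as f:
        pickle.dump({"threshold": best["threshold"]}, f)
    with open(model_file, "rb") as f:
        stored = pickle.load(f)

# Apply the reloaded best model to the second, unlabeled dataset.
predictions = [predict(stored, x) for x in unlabeled]
```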
Best Answer
sgenzer Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, Community Manager Posts: 2,959
hello @jan_spoerer - so I would use the Model Management Extension in this use case rather than go through all that legwork. I'm showing you the basic structure here; you can tune each of the models separately, and then the parameters of the best model can be passed along, etc. Hope this makes sense.
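The nested structure in the process below (an outer Optimize Parameters iterating a Select Subprocess, with one inner Optimize Parameters per model family) can be sketched abstractly as two loops. This is a hypothetical Python stand-in, not the actual operators: the model families, parameter grids, and performance scores are all invented placeholders, and a real process would cross-validate instead of looking scores up.

```python
def evaluate(family, params):
    """Toy performance lookup; a real process cross-validates here."""
    scores = {
        ("decision_tree", 0.1): 0.81, ("decision_tree", 0.5): 0.78,
        ("random_forest", 20): 0.84, ("random_forest", 60): 0.88,
        ("knn", 3): 0.75, ("knn", 9): 0.79,
    }
    return scores[(family, params)]

# One parameter grid per model family (placeholder values).
grids = {
    "decision_tree": [0.1, 0.5],   # e.g. minimal_gain
    "random_forest": [20, 60],     # e.g. number_of_trees
    "knn": [3, 9],                 # e.g. k
}

# Inner loop: tune each family separately (one Optimize Parameters each).
tuned = {fam: max(grid, key=lambda p: evaluate(fam, p))
         for fam, grid in grids.items()}

# Outer loop: keep only the family, and its parameters, that wins overall
# (the Select Subprocess index in the process plays this role).
best_family = max(tuned, key=lambda fam: evaluate(fam, tuned[fam]))
best_params = tuned[best_family]
```

Only `best_family` and `best_params` need to be stored and passed on to the application step; the losing families' tuned parameters are discarded.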
<?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="subprocess" compatibility="8.0.001" expanded="true" height="68" name="Subprocess" width="90" x="45" y="340">
<process expanded="true">
<operator activated="false" class="text:process_document_from_file" compatibility="7.5.000" expanded="true" height="82" name="Process Documents from Files" width="90" x="112" y="34">
<list key="text_directories">
<parameter key="No sanctions" value="C:\Users\camil\Google Drive\Project Cheese Burger\02_Interal\04_Software Development\02_Data_Collection_APIs\Rapidminer\Top 20 banks data (labeled)\Sanctioned countries\Sanctioned no"/>
<parameter key="Yes sanctions" value="C:\Users\camil\Google Drive\Project Cheese Burger\02_Interal\04_Software Development\02_Data_Collection_APIs\Rapidminer\Top 20 banks data (labeled)\Sanctioned countries\Sanctioned yes"/>
</list>
<parameter key="extract_text_only" value="false"/>
<parameter key="prune_method" value="percentual"/>
<parameter key="prune_below_percent" value="5.0"/>
<parameter key="prune_above_percent" value="100.0"/>
<process expanded="true">
<operator activated="true" class="text:tokenize" compatibility="7.5.000" expanded="true" height="68" name="Tokenize" width="90" x="45" y="34"/>
<operator activated="true" class="text:filter_stopwords_english" compatibility="7.5.000" expanded="true" height="68" name="Filter Stopwords (English)" width="90" x="179" y="34"/>
<operator activated="true" class="text:stem_porter" compatibility="7.5.000" expanded="true" height="68" name="Stem (Porter)" width="90" x="380" y="34"/>
<operator activated="false" class="text:generate_n_grams_terms" compatibility="7.5.000" expanded="true" height="68" name="Generate n-Grams (Terms)" width="90" x="380" y="85"/>
<operator activated="false" class="text:filter_by_length" compatibility="7.5.000" expanded="true" height="68" name="Filter Tokens (by Length)" width="90" x="849" y="85"/>
<operator activated="false" class="dummy" compatibility="8.0.001" expanded="true" height="82" name="Lemmatizer" width="90" x="581" y="85"/>
<operator activated="true" class="text:transform_cases" compatibility="7.5.000" expanded="true" height="68" name="Transform Cases" width="90" x="648" y="34"/>
<connect from_port="document" to_op="Tokenize" to_port="document"/>
<connect from_op="Tokenize" from_port="document" to_op="Filter Stopwords (English)" to_port="document"/>
<connect from_op="Filter Stopwords (English)" from_port="document" to_op="Stem (Porter)" to_port="document"/>
<connect from_op="Stem (Porter)" from_port="document" to_op="Transform Cases" to_port="document"/>
<connect from_op="Generate n-Grams (Terms)" from_port="document" to_op="Filter Tokens (by Length)" to_port="document"/>
<connect from_op="Transform Cases" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
<description align="center" color="yellow" colored="false" height="504" resized="false" width="209" x="540" y="162">Lemmatization:<br/><br/>Lemmatization, the reduction of words to their lemma (the dictionary form of the word), helps lessen both the computational task and the duration of the analyses. Lemmatization reduces the number of words by mapping all inflections and variations of a word to the same lemma. It also disambiguates the semantic meaning of the words in a text by assigning words with the same meaning to their lemma. In sentiment analysis, for example, &quot;improve,&quot; &quot;improved,&quot; &quot;improvement,&quot; and &quot;improves&quot; all point equally to an optimistic sentiment and share the same root; differentiating them would serve no purpose for a sentiment analysis task. The lemmatizer does distinguish between different parts of speech and notes whether a word is used as a verb or a noun, as these uses resolve to two different lemmas (as one would find in any standard dictionary). For instance, &quot;binding contract,&quot; &quot;being in a bind,&quot; and &quot;bind together&quot; would resolve to distinct lemmas, although they all use forms of &quot;bind.&quot; A typical lemmatizer is the WordNet lemmatizer; several other stemmers and lemmatizers are described in Manning et al. (2008).</description>
<description align="center" color="yellow" colored="false" height="469" resized="false" width="152" x="11" y="160">Tokenization:<br/><br/>Tokenization means segmenting a text, which is essentially a string of symbols including letters, spaces, punctuation marks, and numbers, into words and phrases. For example, a good tokenizer transforms &quot;he's&quot; into &quot;he is&quot; and &quot;Dr.&quot; into &quot;doctor,&quot; treats expressions such as &quot;business model&quot; as a single token, and processes hyphenation (Manning, Raghavan, Sch&#252;tze, 2008). Depending on the purpose of the analysis, the tokenization may remove punctuation.</description>
<description align="center" color="yellow" colored="false" height="642" resized="false" width="180" x="165" y="160">Stop word removal:<br/><br/>The other two preprocessing steps in text analysis are used depending on the purpose of the analysis. For example, when one wishes to differentiate between the specific languages used by two authors, one may wish to determine how frequently they use common words such as &quot;the,&quot; &quot;and,&quot; and &quot;that.&quot; These words are called &quot;stop words&quot; and serve grammatical purposes only, but their frequency and distribution may help &quot;fingerprint&quot; an author. In contrast, when one is interested in sentiment analysis, words that carry semantic meaning matter; stop words are generally held not to carry semantic meaning, so for such analyses they should be removed in preprocessing. This is easily achieved with any of the standard text processing packages, which maintain dictionaries of stop words.<br></description>
<description align="center" color="yellow" colored="false" height="1092" resized="false" width="180" x="352" y="160">Stemming:<br><br>In other cases, information on the part of speech is not relevant for the analysis, and a simple removal of the prefixes and suffixes to reach the stem of the word is sufficient. The stem is the root of the word, the smallest unit of text that conveys the shared semantic meaning for the word family. For example, the stem of &#8220;teaching&#8221; is &#8220;teach.&#8221; Because stemmers do not look up meaning in the context of parts of speech, verbs and nouns resolve to the same root, which reduces complexity but at the cost of a loss of information. A stemmer reduces the size of the text analysis problem by reducing the text&#8217;s distinct words to a dictionary of roots, a minimally&#8208;sized collection, which enables the fastest analysis. Stemmers are standard in any programming language or toolkit that enables text analysis. The standard stemmer (Manning and Sch&#252;tze, 1999) for English language texts is the Porter Stemmer (Porter, 1980). Stemming may, however, mask valuable information. For example, on a corpus of patent titles the Porter Stemmer produces the token &#8220;autom&#8221;; looking this stem up in the standard American English corpus used in the literature, the Brown corpus (Ku&#269;era and Francis, 1967), resolves it to &#8220;automobile,&#8221; whereas the expected word is &#8220;automate.&#8221; It also introduces ambiguity when both &#8220;automaton&#8221; and &#8220;automate&#8221; stem to &#8220;autom.&#8221;<br></description>
<description align="center" color="yellow" colored="false" height="516" resized="false" width="180" x="754" y="161">Stemming vs. lemmatization<br/><br/>While there is no generalized rule in the literature about when to use a stemmer versus a lemmatizer, all text preprocessing workflows should include at least one of the two. In the case of well&#8208;structured texts with limited lexicons, stemming is sufficient. However, for complex technical texts, such as patents, lemmatization is recommended. Further background on grammars, lemmatizers, stemmers, and text processing in general can be found in the comprehensive and widely cited textbook by Manning and Sch&#252;tze (1999) and in Pustejovsky and Stubbs (2012).</description>
</process>
</operator>
<operator activated="false" class="retrieve" compatibility="8.0.001" expanded="true" height="68" name="Retrieve Wordlist" width="90" x="112" y="187">
<parameter key="repository_entry" value="Parameter_camillo/Wordlist"/>
</operator>
<operator activated="false" class="text:process_document_from_file" compatibility="7.5.000" expanded="true" height="82" name="Process Documents Unkown" width="90" x="246" y="289">
<list key="text_directories">
<parameter key="No sanctions" value="C:\Users\camil\Google Drive\Project Cheese Burger\02_Interal\04_Software Development\02_Data_Collection_APIs\Rapidminer\Top 20 banks data (labeled)\Sanctioned countries\Sanctioned no"/>
<parameter key="Yes sanctions" value="C:\Users\camil\Google Drive\Project Cheese Burger\02_Interal\04_Software Development\02_Data_Collection_APIs\Rapidminer\Top 20 banks data (labeled)\Sanctioned countries\Sanctioned yes"/>
</list>
<process expanded="true">
<operator activated="true" class="text:tokenize" compatibility="7.5.000" expanded="true" height="68" name="Tokenize (2)" width="90" x="45" y="34"/>
<operator activated="true" class="text:filter_stopwords_english" compatibility="7.5.000" expanded="true" height="68" name="Filter Stopwords (2)" width="90" x="179" y="34"/>
<operator activated="true" class="text:stem_porter" compatibility="7.5.000" expanded="true" height="68" name="Stem (2)" width="90" x="380" y="34"/>
<operator activated="false" class="text:generate_n_grams_terms" compatibility="7.5.000" expanded="true" height="68" name="Generate n-Grams (2)" width="90" x="380" y="85"/>
<operator activated="false" class="text:filter_by_length" compatibility="7.5.000" expanded="true" height="68" name="Filter Tokens (2)" width="90" x="849" y="85"/>
<operator activated="false" class="dummy" compatibility="8.0.001" expanded="true" height="82" name="Lemmatizer (2)" width="90" x="581" y="85"/>
<operator activated="true" class="text:transform_cases" compatibility="7.5.000" expanded="true" height="68" name="Transform Cases (2)" width="90" x="648" y="34"/>
<connect from_port="document" to_op="Tokenize (2)" to_port="document"/>
<connect from_op="Tokenize (2)" from_port="document" to_op="Filter Stopwords (2)" to_port="document"/>
<connect from_op="Filter Stopwords (2)" from_port="document" to_op="Stem (2)" to_port="document"/>
<connect from_op="Stem (2)" from_port="document" to_op="Transform Cases (2)" to_port="document"/>
<connect from_op="Generate n-Grams (2)" from_port="document" to_op="Filter Tokens (2)" to_port="document"/>
<connect from_op="Transform Cases (2)" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
<description align="center" color="yellow" colored="false" height="504" resized="false" width="209" x="540" y="162">Lemmatization:<br/><br/>Lemmatization, the reduction of words to their lemma (the dictionary form of the word), helps lessen both the computational task and the duration of the analyses. Lemmatization reduces the number of words by mapping all inflections and variations of a word to the same lemma. It also disambiguates the semantic meaning of the words in a text by assigning words with the same meaning to their lemma. In sentiment analysis, for example, &quot;improve,&quot; &quot;improved,&quot; &quot;improvement,&quot; and &quot;improves&quot; all point equally to an optimistic sentiment and share the same root; differentiating them would serve no purpose for a sentiment analysis task. The lemmatizer does distinguish between different parts of speech and notes whether a word is used as a verb or a noun, as these uses resolve to two different lemmas (as one would find in any standard dictionary). For instance, &quot;binding contract,&quot; &quot;being in a bind,&quot; and &quot;bind together&quot; would resolve to distinct lemmas, although they all use forms of &quot;bind.&quot; A typical lemmatizer is the WordNet lemmatizer; several other stemmers and lemmatizers are described in Manning et al. (2008).</description>
<description align="center" color="yellow" colored="false" height="469" resized="false" width="152" x="11" y="160">Tokenization:<br/><br/>Tokenization means segmenting a text, which is essentially a string of symbols including letters, spaces, punctuation marks, and numbers, into words and phrases. For example, a good tokenizer transforms &quot;he's&quot; into &quot;he is&quot; and &quot;Dr.&quot; into &quot;doctor,&quot; treats expressions such as &quot;business model&quot; as a single token, and processes hyphenation (Manning, Raghavan, Sch&#252;tze, 2008). Depending on the purpose of the analysis, the tokenization may remove punctuation.</description>
<description align="center" color="yellow" colored="false" height="642" resized="false" width="180" x="165" y="160">Stop word removal:<br/><br/>The other two preprocessing steps in text analysis are used depending on the purpose of the analysis. For example, when one wishes to differentiate between the specific languages used by two authors, one may wish to determine how frequently they use common words such as &quot;the,&quot; &quot;and,&quot; and &quot;that.&quot; These words are called &quot;stop words&quot; and serve grammatical purposes only, but their frequency and distribution may help &quot;fingerprint&quot; an author. In contrast, when one is interested in sentiment analysis, words that carry semantic meaning matter; stop words are generally held not to carry semantic meaning, so for such analyses they should be removed in preprocessing. This is easily achieved with any of the standard text processing packages, which maintain dictionaries of stop words.<br></description>
<description align="center" color="yellow" colored="false" height="1092" resized="false" width="180" x="352" y="160">Stemming:<br><br>In other cases, information on the part of speech is not relevant for the analysis, and a simple removal of the prefixes and suffixes to reach the stem of the word is sufficient. The stem is the root of the word, the smallest unit of text that conveys the shared semantic meaning for the word family. For example, the stem of &#8220;teaching&#8221; is &#8220;teach.&#8221; Because stemmers do not look up meaning in the context of parts of speech, verbs and nouns resolve to the same root, which reduces complexity but at the cost of a loss of information. A stemmer reduces the size of the text analysis problem by reducing the text&#8217;s distinct words to a dictionary of roots, a minimally&#8208;sized collection, which enables the fastest analysis. Stemmers are standard in any programming language or toolkit that enables text analysis. The standard stemmer (Manning and Sch&#252;tze, 1999) for English language texts is the Porter Stemmer (Porter, 1980). Stemming may, however, mask valuable information. For example, on a corpus of patent titles the Porter Stemmer produces the token &#8220;autom&#8221;; looking this stem up in the standard American English corpus used in the literature, the Brown corpus (Ku&#269;era and Francis, 1967), resolves it to &#8220;automobile,&#8221; whereas the expected word is &#8220;automate.&#8221; It also introduces ambiguity when both &#8220;automaton&#8221; and &#8220;automate&#8221; stem to &#8220;autom.&#8221;<br></description>
<description align="center" color="yellow" colored="false" height="516" resized="false" width="180" x="754" y="161">Stemming vs. lemmatization<br/><br/>While there is no generalized rule in the literature about when to use a stemmer versus a lemmatizer, all text preprocessing workflows should include at least one of the two. In the case of well&#8208;structured texts with limited lexicons, stemming is sufficient. However, for complex technical texts, such as patents, lemmatization is recommended. Further background on grammars, lemmatizers, stemmers, and text processing in general can be found in the comprehensive and widely cited textbook by Manning and Sch&#252;tze (1999) and in Pustejovsky and Stubbs (2012).</description>
</process>
</operator>
<operator activated="false" class="retrieve" compatibility="8.0.001" expanded="true" height="68" name="Retrieve Model" width="90" x="380" y="187">
<parameter key="repository_entry" value="Parameter_camillo/model"/>
</operator>
<operator activated="false" class="apply_model" compatibility="8.0.001" expanded="true" height="82" name="Apply Model (7)" width="90" x="581" y="238">
<list key="application_parameters"/>
</operator>
<operator activated="false" class="read_excel" compatibility="8.0.001" expanded="true" height="68" name="Read Excel" width="90" x="45" y="493">
<list key="annotations"/>
<parameter key="time_zone" value="SYSTEM"/>
<list key="data_set_meta_data_information"/>
<description align="center" color="blue" colored="true" width="126">Load data from website<br/></description>
</operator>
<operator activated="false" class="web:retrieve_webpages" compatibility="7.3.000" expanded="true" height="68" name="Get Pages" width="90" x="179" y="493"/>
<operator activated="false" class="text:process_document_from_data" compatibility="7.5.000" expanded="true" height="82" name="Process Documents from Data" width="90" x="313" y="493">
<parameter key="keep_text" value="true"/>
<list key="specify_weights"/>
<process expanded="true">
<operator activated="true" class="text:tokenize" compatibility="7.5.000" expanded="true" height="68" name="Tokenize (3)" width="90" x="45" y="34"/>
<operator activated="true" class="text:filter_stopwords_english" compatibility="7.5.000" expanded="true" height="68" name="Filter Stopwords (3)" width="90" x="246" y="34"/>
<operator activated="true" class="text:stem_porter" compatibility="7.5.000" expanded="true" height="68" name="Stem (3)" width="90" x="380" y="34"/>
<operator activated="false" class="text:generate_n_grams_terms" compatibility="7.5.000" expanded="true" height="68" name="Generate n-Grams (3)" width="90" x="380" y="85"/>
<operator activated="false" class="text:filter_by_length" compatibility="7.5.000" expanded="true" height="68" name="Filter Tokens (3)" width="90" x="849" y="85"/>
<operator activated="false" class="dummy" compatibility="8.0.001" expanded="true" height="82" name="Lemmatizer (3)" width="90" x="581" y="85"/>
<operator activated="true" class="text:transform_cases" compatibility="7.5.000" expanded="true" height="68" name="Transform Cases (3)" width="90" x="648" y="34"/>
<connect from_port="document" to_op="Tokenize (3)" to_port="document"/>
<connect from_op="Tokenize (3)" from_port="document" to_op="Filter Stopwords (3)" to_port="document"/>
<connect from_op="Filter Stopwords (3)" from_port="document" to_op="Stem (3)" to_port="document"/>
<connect from_op="Stem (3)" from_port="document" to_op="Transform Cases (3)" to_port="document"/>
<connect from_op="Generate n-Grams (3)" from_port="document" to_op="Filter Tokens (3)" to_port="document"/>
<connect from_op="Transform Cases (3)" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="false" class="set_role" compatibility="8.0.001" expanded="true" height="82" name="Set Role" width="90" x="380" y="493">
<list key="set_additional_roles"/>
</operator>
<operator activated="false" class="optimize_parameters_grid" compatibility="8.0.001" expanded="true" height="103" name="Optimize Parameters (4)" width="90" x="581" y="391">
<list key="parameters">
<parameter key="Select Subprocess.select_which" value="[1.0;5;5;linear]"/>
</list>
<process expanded="true">
<operator activated="true" class="select_subprocess" compatibility="8.0.001" expanded="true" height="82" name="Select Subprocess" width="90" x="380" y="34">
<parameter key="select_which" value="5"/>
<process expanded="true">
<operator activated="true" class="optimize_parameters_grid" compatibility="8.0.001" expanded="true" height="103" name="Optimize Parameters (3)" width="90" x="112" y="34">
<list key="parameters">
<parameter key="Decision Tree (4).criterion" value="gain_ratio,information_gain,accuracy"/>
<parameter key="Decision Tree (4).minimal_gain" value="[0.1;10;2;linear]"/>
</list>
<process expanded="true">
<operator activated="true" class="concurrency:cross_validation" compatibility="8.0.001" expanded="true" height="145" name="Cross Validation (3)" width="90" x="380" y="34">
<process expanded="true">
<operator activated="true" class="concurrency:parallel_decision_tree" compatibility="8.0.001" expanded="true" height="103" name="Decision Tree (4)" width="90" x="112" y="34">
<parameter key="criterion" value="accuracy"/>
<parameter key="confidence" value="0.5"/>
<parameter key="minimal_gain" value="10.0"/>
</operator>
<connect from_port="training set" to_op="Decision Tree (4)" to_port="training set"/>
<connect from_op="Decision Tree (4)" from_port="model" to_port="model"/>
<portSpacing port="source_training set" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true">
<operator activated="true" class="apply_model" compatibility="8.0.001" expanded="true" height="82" name="Apply Model (3)" width="90" x="112" y="34">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance" compatibility="8.0.001" expanded="true" height="82" name="Performance (3)" width="90" x="246" y="34"/>
<connect from_port="model" to_op="Apply Model (3)" to_port="model"/>
<connect from_port="test set" to_op="Apply Model (3)" to_port="unlabelled data"/>
<connect from_op="Apply Model (3)" from_port="labelled data" to_op="Performance (3)" to_port="labelled data"/>
<connect from_op="Performance (3)" from_port="performance" to_port="performance 1"/>
<connect from_op="Performance (3)" from_port="example set" to_port="test set results"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_test set results" spacing="0"/>
<portSpacing port="sink_performance 1" spacing="0"/>
<portSpacing port="sink_performance 2" spacing="0"/>
</process>
</operator>
<connect from_port="input 1" to_op="Cross Validation (3)" to_port="example set"/>
<connect from_op="Cross Validation (3)" from_port="performance 1" to_port="performance"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_performance" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
</process>
</operator>
<operator activated="true" class="remember" compatibility="8.0.001" expanded="true" height="68" name="Remember" width="90" x="246" y="136">
<parameter key="name" value="DT"/>
<parameter key="io_object" value="ParameterSet"/>
</operator>
<connect from_port="input 1" to_op="Optimize Parameters (3)" to_port="input 1"/>
<connect from_op="Optimize Parameters (3)" from_port="performance" to_port="output 1"/>
<connect from_op="Optimize Parameters (3)" from_port="parameter" to_op="Remember" to_port="store"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
<description align="center" color="yellow" colored="false" height="105" resized="false" width="180" x="444" y="90">For testing, storing the model is included as well<br/></description>
</process>
<process expanded="true">
<operator activated="true" class="optimize_parameters_grid" compatibility="8.0.001" expanded="true" height="103" name="Optimize Parameters (2)" width="90" x="112" y="34">
<list key="parameters">
<parameter key="Random Forest (2).number_of_trees" value="[1.0;60;2;quadratic]"/>
</list>
<process expanded="true">
<operator activated="true" class="concurrency:cross_validation" compatibility="8.0.001" expanded="true" height="145" name="Cross Validation (2)" width="90" x="380" y="136">
<process expanded="true">
<operator activated="true" class="concurrency:parallel_random_forest" compatibility="8.0.001" expanded="true" height="103" name="Random Forest (2)" width="90" x="179" y="34">
<parameter key="number_of_trees" value="60"/>
</operator>
<connect from_port="training set" to_op="Random Forest (2)" to_port="training set"/>
<connect from_op="Random Forest (2)" from_port="model" to_port="model"/>
<connect from_op="Random Forest (2)" from_port="exampleSet" to_port="through 1"/>
<portSpacing port="source_training set" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
<portSpacing port="sink_through 2" spacing="0"/>
</process>
<process expanded="true">
<operator activated="true" class="apply_model" compatibility="8.0.001" expanded="true" height="82" name="Apply Model (2)" width="90" x="112" y="34">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance" compatibility="8.0.001" expanded="true" height="82" name="Performance (2)" width="90" x="246" y="34"/>
<connect from_port="model" to_op="Apply Model (2)" to_port="model"/>
<connect from_port="test set" to_op="Apply Model (2)" to_port="unlabelled data"/>
<connect from_op="Apply Model (2)" from_port="labelled data" to_op="Performance (2)" to_port="labelled data"/>
<connect from_op="Performance (2)" from_port="performance" to_port="performance 1"/>
<connect from_op="Performance (2)" from_port="example set" to_port="test set results"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="source_through 2" spacing="0"/>
<portSpacing port="sink_test set results" spacing="0"/>
<portSpacing port="sink_performance 1" spacing="0"/>
<portSpacing port="sink_performance 2" spacing="0"/>
</process>
</operator>
<connect from_port="input 1" to_op="Cross Validation (2)" to_port="example set"/>
<connect from_op="Cross Validation (2)" from_port="performance 1" to_port="performance"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_performance" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
</process>
</operator>
<operator activated="true" class="remember" compatibility="8.0.001" expanded="true" height="68" name="Remember (3)" width="90" x="112" y="238">
<parameter key="name" value="RF"/>
<parameter key="io_object" value="ParameterSet"/>
</operator>
<connect from_port="input 1" to_op="Optimize Parameters (2)" to_port="input 1"/>
<connect from_op="Optimize Parameters (2)" from_port="performance" to_port="output 1"/>
<connect from_op="Optimize Parameters (2)" from_port="parameter" to_op="Remember (3)" to_port="store"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
<process expanded="true">
<operator activated="true" class="optimize_parameters_grid" compatibility="8.0.001" expanded="true" height="103" name="Optimize Parameters (Grid)" width="90" x="45" y="34">
<list key="parameters">
<parameter key="Rule Induction (2).criterion" value="information_gain,accuracy"/>
</list>
<process expanded="true">
<operator activated="true" class="concurrency:cross_validation" compatibility="8.0.001" expanded="true" height="145" name="Cross Validation" width="90" x="380" y="34">
<process expanded="true">
<operator activated="true" class="rule_induction" compatibility="8.0.001" expanded="true" height="82" name="Rule Induction (2)" width="90" x="179" y="34">
<parameter key="criterion" value="accuracy"/>
</operator>
<connect from_port="training set" to_op="Rule Induction (2)" to_port="training set"/>
<connect from_op="Rule Induction (2)" from_port="model" to_port="model"/>
<portSpacing port="source_training set" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true">
<operator activated="true" class="apply_model" compatibility="8.0.001" expanded="true" height="82" name="Apply Model" width="90" x="112" y="34">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance" compatibility="8.0.001" expanded="true" height="82" name="Performance" width="90" x="246" y="34"/>
<connect from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Performance" from_port="performance" to_port="performance 1"/>
<connect from_op="Performance" from_port="example set" to_port="test set results"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_test set results" spacing="0"/>
<portSpacing port="sink_performance 1" spacing="0"/>
<portSpacing port="sink_performance 2" spacing="0"/>
</process>
</operator>
<connect from_port="input 1" to_op="Cross Validation" to_port="example set"/>
<connect from_op="Cross Validation" from_port="performance 1" to_port="performance"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_performance" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
</process>
</operator>
<operator activated="true" class="remember" compatibility="8.0.001" expanded="true" height="68" name="Remember (2)" width="90" x="112" y="187">
<parameter key="name" value="RI"/>
<parameter key="io_object" value="ParameterSet"/>
</operator>
<connect from_port="input 1" to_op="Optimize Parameters (Grid)" to_port="input 1"/>
<connect from_op="Optimize Parameters (Grid)" from_port="performance" to_port="output 1"/>
<connect from_op="Optimize Parameters (Grid)" from_port="parameter" to_op="Remember (2)" to_port="store"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
<process expanded="true">
<operator activated="true" class="optimize_parameters_grid" compatibility="8.0.001" expanded="true" height="103" name="Optimize Parameters (5)" width="90" x="45" y="34">
<list key="parameters">
<parameter key="k-NN.k" value="[1.0;9;1;quadratic]"/>
</list>
<process expanded="true">
<operator activated="true" class="concurrency:cross_validation" compatibility="8.0.001" expanded="true" height="145" name="Cross Validation (4)" width="90" x="380" y="34">
<process expanded="true">
<operator activated="true" class="k_nn" compatibility="8.0.001" expanded="true" height="82" name="k-NN" width="90" x="112" y="34">
<parameter key="k" value="9"/>
</operator>
<connect from_port="training set" to_op="k-NN" to_port="training set"/>
<connect from_op="k-NN" from_port="model" to_port="model"/>
<portSpacing port="source_training set" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true">
<operator activated="true" class="apply_model" compatibility="8.0.001" expanded="true" height="82" name="Apply Model (4)" width="90" x="112" y="34">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance" compatibility="8.0.001" expanded="true" height="82" name="Performance (4)" width="90" x="246" y="34"/>
<connect from_port="model" to_op="Apply Model (4)" to_port="model"/>
<connect from_port="test set" to_op="Apply Model (4)" to_port="unlabelled data"/>
<connect from_op="Apply Model (4)" from_port="labelled data" to_op="Performance (4)" to_port="labelled data"/>
<connect from_op="Performance (4)" from_port="performance" to_port="performance 1"/>
<connect from_op="Performance (4)" from_port="example set" to_port="test set results"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_test set results" spacing="0"/>
<portSpacing port="sink_performance 1" spacing="0"/>
<portSpacing port="sink_performance 2" spacing="0"/>
</process>
</operator>
<connect from_port="input 1" to_op="Cross Validation (4)" to_port="example set"/>
<connect from_op="Cross Validation (4)" from_port="performance 1" to_port="performance"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_performance" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
</process>
</operator>
<operator activated="true" class="remember" compatibility="8.0.001" expanded="true" height="68" name="Remember (4)" width="90" x="112" y="187">
<parameter key="name" value="NB"/>
<parameter key="io_object" value="ParameterSet"/>
</operator>
<operator activated="false" class="optimize_parameters_grid" compatibility="8.0.001" expanded="true" height="103" name="Optimize Parameters (7)" width="90" x="246" y="85">
<list key="parameters">
<parameter key="Naive Bayes.laplace_correction" value="true,false"/>
</list>
<process expanded="true">
<operator activated="true" class="concurrency:cross_validation" compatibility="8.0.001" expanded="true" height="145" name="Cross Validation (6)" width="90" x="380" y="34">
<process expanded="true">
<operator activated="true" class="naive_bayes" compatibility="8.0.001" expanded="true" height="82" name="Naive Bayes (2)" width="90" x="179" y="34"/>
<connect from_port="training set" to_op="Naive Bayes (2)" to_port="training set"/>
<connect from_op="Naive Bayes (2)" from_port="model" to_port="model"/>
<portSpacing port="source_training set" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true">
<operator activated="true" class="apply_model" compatibility="8.0.001" expanded="true" height="82" name="Apply Model (6)" width="90" x="112" y="34">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance" compatibility="8.0.001" expanded="true" height="82" name="Performance (6)" width="90" x="246" y="34"/>
<connect from_port="model" to_op="Apply Model (6)" to_port="model"/>
<connect from_port="test set" to_op="Apply Model (6)" to_port="unlabelled data"/>
<connect from_op="Apply Model (6)" from_port="labelled data" to_op="Performance (6)" to_port="labelled data"/>
<connect from_op="Performance (6)" from_port="performance" to_port="performance 1"/>
<connect from_op="Performance (6)" from_port="example set" to_port="test set results"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_test set results" spacing="0"/>
<portSpacing port="sink_performance 1" spacing="0"/>
<portSpacing port="sink_performance 2" spacing="0"/>
</process>
</operator>
<connect from_port="input 1" to_op="Cross Validation (6)" to_port="example set"/>
<connect from_op="Cross Validation (6)" from_port="performance 1" to_port="performance"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_performance" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
</process>
</operator>
<operator activated="false" class="remember" compatibility="8.0.001" expanded="true" height="68" name="Remember (6)" width="90" x="313" y="238">
<parameter key="name" value="NB"/>
<parameter key="io_object" value="ParameterSet"/>
</operator>
<connect from_port="input 1" to_op="Optimize Parameters (5)" to_port="input 1"/>
<connect from_op="Optimize Parameters (5)" from_port="performance" to_port="output 1"/>
<connect from_op="Optimize Parameters (5)" from_port="parameter" to_op="Remember (4)" to_port="store"/>
<connect from_op="Optimize Parameters (7)" from_port="parameter" to_op="Remember (6)" to_port="store"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
<description align="center" color="yellow" colored="false" height="105" resized="false" width="180" x="133" y="356">NB --&gt; KNN</description>
</process>
<process expanded="true">
<operator activated="true" class="optimize_parameters_grid" compatibility="8.0.001" expanded="true" height="103" name="Optimize Parameters (6)" width="90" x="45" y="34">
<list key="parameters">
<parameter key="SVM.C" value="[-1.0;1;2;quadratic]"/>
</list>
<process expanded="true">
<operator activated="true" class="concurrency:cross_validation" compatibility="8.0.001" expanded="true" height="145" name="Cross Validation (5)" width="90" x="380" y="34">
<process expanded="true">
<operator activated="true" class="support_vector_machine" compatibility="8.0.001" expanded="true" height="124" name="SVM" width="90" x="179" y="85">
<parameter key="kernel_type" value="radial"/>
<parameter key="kernel_gamma" value="0.0"/>
<parameter key="C" value="-1.0"/>
<parameter key="convergence_epsilon" value="0.0"/>
</operator>
<connect from_port="training set" to_op="SVM" to_port="training set"/>
<connect from_op="SVM" from_port="model" to_port="model"/>
<portSpacing port="source_training set" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true">
<operator activated="true" class="apply_model" compatibility="8.0.001" expanded="true" height="82" name="Apply Model (5)" width="90" x="112" y="34">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance" compatibility="8.0.001" expanded="true" height="82" name="Performance (5)" width="90" x="246" y="34"/>
<connect from_port="model" to_op="Apply Model (5)" to_port="model"/>
<connect from_port="test set" to_op="Apply Model (5)" to_port="unlabelled data"/>
<connect from_op="Apply Model (5)" from_port="labelled data" to_op="Performance (5)" to_port="labelled data"/>
<connect from_op="Performance (5)" from_port="performance" to_port="performance 1"/>
<connect from_op="Performance (5)" from_port="example set" to_port="test set results"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_test set results" spacing="0"/>
<portSpacing port="sink_performance 1" spacing="0"/>
<portSpacing port="sink_performance 2" spacing="0"/>
</process>
</operator>
<connect from_port="input 1" to_op="Cross Validation (5)" to_port="example set"/>
<connect from_op="Cross Validation (5)" from_port="performance 1" to_port="performance"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_performance" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
</process>
</operator>
<operator activated="true" class="remember" compatibility="8.0.001" expanded="true" height="68" name="Remember (5)" width="90" x="112" y="187">
<parameter key="name" value="SVM"/>
<parameter key="io_object" value="ParameterSet"/>
</operator>
<connect from_port="input 1" to_op="Optimize Parameters (6)" to_port="input 1"/>
<connect from_op="Optimize Parameters (6)" from_port="performance" to_port="output 1"/>
<connect from_op="Optimize Parameters (6)" from_port="parameter" to_op="Remember (5)" to_port="store"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
</operator>
<connect from_port="input 1" to_op="Select Subprocess" to_port="input 1"/>
<connect from_op="Select Subprocess" from_port="output 1" to_port="performance"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_performance" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
</process>
</operator>
<operator activated="false" class="compare_rocs" compatibility="8.0.001" expanded="true" height="82" name="Compare ROCs" width="90" x="447" y="391">
<parameter key="number_of_folds" value="20"/>
<process expanded="true">
<operator activated="true" class="recall" compatibility="8.0.001" expanded="true" height="68" name="Recall" width="90" x="313" y="136">
<parameter key="name" value="DT"/>
<parameter key="io_object" value="ParameterSet"/>
<parameter key="remove_from_store" value="false"/>
</operator>
<operator activated="true" class="set_parameters" compatibility="8.0.001" expanded="true" height="82" name="Set Parameters" width="90" x="447" y="136">
<list key="name_map">
<parameter key="DT" value="Decision Tree ROC"/>
</list>
</operator>
<operator activated="true" class="concurrency:parallel_decision_tree" compatibility="8.0.001" expanded="true" height="82" name="Decision Tree ROC" width="90" x="313" y="34"/>
<operator activated="true" class="store" compatibility="8.0.001" expanded="true" height="68" name="Store Model" width="90" x="514" y="34">
<parameter key="repository_entry" value="Parameter_camillo/ModelDC"/>
</operator>
<operator activated="true" class="recall" compatibility="8.0.001" expanded="true" height="68" name="Recall (2)" width="90" x="313" y="238">
<parameter key="name" value="RI"/>
<parameter key="io_object" value="ParameterSet"/>
<parameter key="remove_from_store" value="false"/>
</operator>
<operator activated="true" class="set_parameters" compatibility="8.0.001" expanded="true" height="82" name="Set Parameters (2)" width="90" x="447" y="238">
<list key="name_map">
<parameter key="RI" value="RI ROC"/>
</list>
</operator>
<operator activated="true" class="rule_induction" compatibility="8.0.001" expanded="true" height="82" name="RI ROC" width="90" x="313" y="340"/>
<operator activated="true" class="store" compatibility="8.0.001" expanded="true" height="68" name="Store Model (2)" width="90" x="514" y="340">
<parameter key="repository_entry" value="Parameter_camillo/ModelRI"/>
</operator>
<operator activated="true" class="recall" compatibility="8.0.001" expanded="true" height="68" name="Recall (3)" width="90" x="313" y="442">
<parameter key="name" value="RF"/>
<parameter key="io_object" value="ParameterSet"/>
<parameter key="remove_from_store" value="false"/>
</operator>
<operator activated="true" class="set_parameters" compatibility="8.0.001" expanded="true" height="82" name="Set Parameters (3)" width="90" x="447" y="442">
<list key="name_map">
<parameter key="RF" value="RF ROC"/>
</list>
</operator>
<operator activated="true" class="concurrency:parallel_random_forest" compatibility="8.0.001" expanded="true" height="82" name="RF ROC" width="90" x="313" y="544"/>
<operator activated="true" class="store" compatibility="8.0.001" expanded="true" height="68" name="Store Model (3)" width="90" x="514" y="544">
<parameter key="repository_entry" value="Parameter_camillo/ModelRF"/>
</operator>
<operator activated="true" class="recall" compatibility="8.0.001" expanded="true" height="68" name="Recall (4)" width="90" x="313" y="646">
<parameter key="name" value="NB"/>
<parameter key="io_object" value="ParameterSet"/>
<parameter key="remove_from_store" value="false"/>
</operator>
<operator activated="true" class="set_parameters" compatibility="8.0.001" expanded="true" height="82" name="Set Parameters (4)" width="90" x="447" y="646">
<list key="name_map">
<parameter key="NB" value="NB ROC"/>
</list>
</operator>
<operator activated="true" class="concurrency:parallel_random_forest" compatibility="8.0.001" expanded="true" height="82" name="NB ROC" width="90" x="313" y="748">
<description align="center" color="transparent" colored="false" width="126">NB - ROC is currently NBB</description>
</operator>
<operator activated="true" class="store" compatibility="8.0.001" expanded="true" height="68" name="Store Model (4)" width="90" x="648" y="748">
<parameter key="repository_entry" value="Parameter_camillo/ModelKNN"/>
</operator>
<operator activated="true" class="recall" compatibility="8.0.001" expanded="true" height="68" name="Recall (5)" width="90" x="313" y="850">
<parameter key="name" value="SVM"/>
<parameter key="io_object" value="ParameterSet"/>
<parameter key="remove_from_store" value="false"/>
</operator>
<operator activated="true" class="set_parameters" compatibility="8.0.001" expanded="true" height="82" name="Set Parameters (5)" width="90" x="447" y="850">
<list key="name_map">
<parameter key="SVM" value="SMV ROC"/>
</list>
</operator>
<operator activated="true" class="concurrency:parallel_random_forest" compatibility="8.0.001" expanded="true" height="82" name="SMV ROC" width="90" x="313" y="952"/>
<operator activated="true" class="store" compatibility="8.0.001" expanded="true" height="68" name="Store Model (5)" width="90" x="514" y="952">
<parameter key="repository_entry" value="Parameter_camillo/ModelSMV"/>
</operator>
<connect from_port="train 1" to_op="Decision Tree ROC" to_port="training set"/>
<connect from_port="train 2" to_op="RI ROC" to_port="training set"/>
<connect from_port="train 3" to_op="RF ROC" to_port="training set"/>
<connect from_port="train 4" to_op="NB ROC" to_port="training set"/>
<connect from_port="train 5" to_op="SMV ROC" to_port="training set"/>
<connect from_op="Recall" from_port="result" to_op="Set Parameters" to_port="parameter set"/>
<connect from_op="Decision Tree ROC" from_port="model" to_op="Store Model" to_port="input"/>
<connect from_op="Store Model" from_port="through" to_port="model 1"/>
<connect from_op="Recall (2)" from_port="result" to_op="Set Parameters (2)" to_port="parameter set"/>
<connect from_op="RI ROC" from_port="model" to_op="Store Model (2)" to_port="input"/>
<connect from_op="Store Model (2)" from_port="through" to_port="model 2"/>
<connect from_op="Recall (3)" from_port="result" to_op="Set Parameters (3)" to_port="parameter set"/>
<connect from_op="RF ROC" from_port="model" to_op="Store Model (3)" to_port="input"/>
<connect from_op="Store Model (3)" from_port="through" to_port="model 3"/>
<connect from_op="Recall (4)" from_port="result" to_op="Set Parameters (4)" to_port="parameter set"/>
<connect from_op="NB ROC" from_port="model" to_op="Store Model (4)" to_port="input"/>
<connect from_op="Store Model (4)" from_port="through" to_port="model 4"/>
<connect from_op="Recall (5)" from_port="result" to_op="Set Parameters (5)" to_port="parameter set"/>
<connect from_op="SMV ROC" from_port="model" to_op="Store Model (5)" to_port="input"/>
<connect from_op="Store Model (5)" from_port="through" to_port="model 5"/>
<portSpacing port="source_train 1" spacing="0"/>
<portSpacing port="source_train 2" spacing="0"/>
<portSpacing port="source_train 3" spacing="0"/>
<portSpacing port="source_train 4" spacing="546"/>
<portSpacing port="source_train 5" spacing="0"/>
<portSpacing port="source_train 6" spacing="0"/>
<portSpacing port="sink_model 1" spacing="0"/>
<portSpacing port="sink_model 2" spacing="0"/>
<portSpacing port="sink_model 3" spacing="0"/>
<portSpacing port="sink_model 4" spacing="609"/>
<portSpacing port="sink_model 5" spacing="0"/>
<portSpacing port="sink_model 6" spacing="0"/>
<description align="center" color="yellow" colored="false" height="105" resized="false" width="180" x="423" y="745">NB ROC = KNN ROC</description>
</process>
</operator>
<connect from_op="Retrieve Wordlist" from_port="output" to_op="Process Documents Unkown" to_port="word list"/>
<connect from_op="Process Documents Unkown" from_port="example set" to_op="Apply Model (7)" to_port="unlabelled data"/>
<connect from_op="Retrieve Model" from_port="output" to_op="Apply Model (7)" to_port="model"/>
<connect from_op="Read Excel" from_port="output" to_op="Get Pages" to_port="Example Set"/>
<connect from_op="Get Pages" from_port="Example Set" to_op="Process Documents from Data" to_port="example set"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
</process>
Answers
hello @jan_spoerer - can you please post your entire XML process so we can take a look?
Thanks.
Scott
Hi Scott,
Thanks for your reply! I attached the process.
Regards,
Jan
Hi Scott,
Thank you for your work! The process looks good and I think it will achieve what I am looking for, but I am getting a couple of errors:
Do you know how to handle these errors? Sorry for that follow-up question!
Jan
Hello @jan_spoerer - so I screwed up by not looking inside your Process Documents from Data operator and noticing that you were using the MeaningCloud extension. My bad. Hence my RapidMiner Studio did not recognize the operator and called it a "dummy". If you just go in there and put a new "Lemmatizer" operator in its place (or simply delete it, since you have disabled it anyway), that should get rid of the "dummy" issue.
As for the Random Forest, that's interesting (at least to me). You see, Random Forest gives you a collection of models rather than a single model. I'll have to ponder this...
Scott
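To illustrate Scott's point about why a Random Forest is different from a single model: a forest is a collection of trees, and its prediction is a vote over all members. The sketch below is plain Python, not RapidMiner code, and the names (`Stump`, `SimpleForest`) are made up for illustration only.

```python
from collections import Counter

class Stump:
    """A trivial one-rule 'tree': predicts 1 if the chosen feature exceeds a threshold."""
    def __init__(self, feature, threshold):
        self.feature = feature
        self.threshold = threshold

    def predict(self, row):
        return 1 if row[self.feature] > self.threshold else 0

class SimpleForest:
    """A 'forest' is just a list of member models; prediction is a majority vote."""
    def __init__(self, stumps):
        self.members = stumps  # the collection of models, not one model

    def predict(self, row):
        votes = Counter(s.predict(row) for s in self.members)
        return votes.most_common(1)[0][0]

# Two of the three members vote 1 for this row, so the forest predicts 1.
forest = SimpleForest([Stump(0, 0.5), Stump(1, 0.05), Stump(0, 0.9)])
print(forest.predict([0.7, 0.1]))  # -> 1
```

This is why storing or applying "the" Random Forest model really means handling the whole ensemble at once, unlike a single decision tree.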