Errors with Twitter data: label attribute suddenly missing inside Cross Validation, SVM and Apply Model

alinebora · August 2018

Hi everyone, I'm building a process in RapidMiner for my thesis. I have little time and I desperately need help. Please, I have no one to help me with RapidMiner; everyone else I know uses R or Python.

I am getting many errors in my process:

First, inside the 'Cross Validation' operator my attribute is not identified as a label. For testing, I added a 'Set Role' operator before the 'SVM' operator and it still doesn't work. For training, the 'Performance' operator doesn't recognize my label anymore either.
When I go back to the main process and add a 'Set Role' operator again before 'Cross Validation', it doesn't work either. My attribute is no longer in the attribute list for some unknown reason, although it does show up in the earlier steps of the process.
Lastly, to make it even worse, 'Apply Model' also does not recognize my attribute (I tried adding 'Set Role' there too). The label attribute "text" is simply gone from the list.

I'm analyzing text from Twitter. Since I had several queries to analyze for my research, I didn't retrieve them through RapidMiner; instead I downloaded the data from the Twitter developer app and imported it into RapidMiner. I'm posting my process XML here.

Please help.

<?xml version="1.0" encoding="UTF-8"?><process version="9.0.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="9.0.000" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="read_csv" compatibility="9.0.000" expanded="true" height="68" name="Read CSV" width="90" x="45" y="34">
<parameter key="csv_file" value="C:\Users\aline\OneDrive\Documentos\AlineXX.csv"/>
<parameter key="column_separators" value=","/>
<parameter key="skip_comments" value="true"/>
<parameter key="date_format" value="dd/MM/yyyy HH:mm"/>
<list key="annotations"/>
<parameter key="encoding" value="windows-1252"/>
<list key="data_set_meta_data_information">
<parameter key="0" value="ï»¿user_id.true.polynominal.attribute"/>
<parameter key="1" value="status_id.true.polynominal.attribute"/>
<parameter key="2" value="created_at.true.polynominal.attribute"/>
<parameter key="3" value="screen_name.true.polynominal.attribute"/>
<parameter key="4" value="text.true.polynominal.attribute"/>
<parameter key="5" value="source.true.polynominal.attribute"/>
<parameter key="6" value="display_text_width.true.polynominal.attribute"/>
<parameter key="7" value="reply_to_status_id.true.polynominal.attribute"/>
<parameter key="8" value="reply_to_user_id.true.polynominal.attribute"/>
<parameter key="9" value="reply_to_screen_name.true.polynominal.attribute"/>
<parameter key="10" value="is_quote.true.polynominal.attribute"/>
<parameter key="11" value="is_retweet.true.polynominal.attribute"/>
<parameter key="12" value="favorite_count.true.polynominal.attribute"/>
<parameter key="13" value="retweet_count.true.polynominal.attribute"/>
<parameter key="14" value="hashtags.true.polynominal.attribute"/>
<parameter key="15" value="symbols.true.polynominal.attribute"/>
<parameter key="16" value="urls_url.true.polynominal.attribute"/>
<parameter key="17" value="urls_t\.co.true.polynominal.attribute"/>
<parameter key="18" value="urls_expanded_url.true.polynominal.attribute"/>
<parameter key="19" value="media_url.true.polynominal.attribute"/>
<parameter key="20" value="media_t\.co.true.polynominal.attribute"/>
<parameter key="21" value="media_expanded_url.true.polynominal.attribute"/>
<parameter key="22" value="media_type.true.polynominal.attribute"/>
<parameter key="23" value="ext_media_url.true.polynominal.attribute"/>
<parameter key="24" value="ext_media_t\.co.true.polynominal.attribute"/>
<parameter key="25" value="ext_media_expanded_url.true.polynominal.attribute"/>
<parameter key="26" value="ext_media_type.true.polynominal.attribute"/>
<parameter key="27" value="mentions_user_id.true.polynominal.attribute"/>
<parameter key="28" value="mentions_screen_name.true.polynominal.attribute"/>
<parameter key="29" value="lang.true.polynominal.attribute"/>
<parameter key="30" value="quoted_status_id.true.polynominal.attribute"/>
<parameter key="31" value="quoted_text.true.polynominal.attribute"/>
<parameter key="32" value="quoted_created_at.true.polynominal.attribute"/>
<parameter key="33" value="quoted_source.true.polynominal.attribute"/>
<parameter key="34" value="quoted_favorite_count.true.polynominal.attribute"/>
<parameter key="35" value="quoted_retweet_count.true.polynominal.attribute"/>
<parameter key="36" value="quoted_user_id.true.polynominal.attribute"/>
<parameter key="37" value="quoted_screen_name.true.polynominal.attribute"/>
<parameter key="38" value="quoted_name.true.polynominal.attribute"/>
<parameter key="39" value="quoted_followers_count.true.polynominal.attribute"/>
<parameter key="40" value="quoted_friends_count.true.polynominal.attribute"/>
<parameter key="41" value="quoted_statuses_count.true.polynominal.attribute"/>
<parameter key="42" value="quoted_location.true.polynominal.attribute"/>
<parameter key="43" value="quoted_description.true.polynominal.attribute"/>
<parameter key="44" value="quoted_verified.true.polynominal.attribute"/>
<parameter key="45" value="retweet_status_id.true.polynominal.attribute"/>
<parameter key="46" value="retweet_text.true.polynominal.attribute"/>
<parameter key="47" value="retweet_created_at.true.polynominal.attribute"/>
<parameter key="48" value="retweet_source.true.polynominal.attribute"/>
<parameter key="49" value="retweet_favorite_count.true.polynominal.attribute"/>
<parameter key="50" value="retweet_retweet_count.true.polynominal.attribute"/>
<parameter key="51" value="retweet_user_id.true.polynominal.attribute"/>
<parameter key="52" value="retweet_screen_name.true.polynominal.attribute"/>
<parameter key="53" value="retweet_name.true.polynominal.attribute"/>
<parameter key="54" value="retweet_followers_count.true.polynominal.attribute"/>
<parameter key="55" value="retweet_friends_count.true.polynominal.attribute"/>
<parameter key="56" value="retweet_statuses_count.true.polynominal.attribute"/>
<parameter key="57" value="retweet_location.true.polynominal.attribute"/>
<parameter key="58" value="retweet_description.true.polynominal.attribute"/>
<parameter key="59" value="retweet_verified.true.polynominal.attribute"/>
<parameter key="60" value="place_url.true.polynominal.attribute"/>
<parameter key="61" value="place_name.true.polynominal.attribute"/>
<parameter key="62" value="place_full_name.true.polynominal.attribute"/>
<parameter key="63" value="place_type.true.polynominal.attribute"/>
<parameter key="64" value="country.true.polynominal.attribute"/>
<parameter key="65" value="country_code.true.polynominal.attribute"/>
<parameter key="66" value="geo_coords.true.polynominal.attribute"/>
<parameter key="67" value="coords_coords.true.polynominal.attribute"/>
<parameter key="68" value="bbox_coords.true.polynominal.attribute"/>
<parameter key="69" value="status_url.true.polynominal.attribute"/>
<parameter key="70" value="name.true.polynominal.attribute"/>
<parameter key="71" value="location.true.polynominal.attribute"/>
<parameter key="72" value="description.true.polynominal.attribute"/>
<parameter key="73" value="url.true.polynominal.attribute"/>
<parameter key="74" value="protected.true.polynominal.attribute"/>
<parameter key="75" value="followers_count.true.integer.attribute"/>
<parameter key="76" value="friends_count.true.polynominal.attribute"/>
<parameter key="77" value="listed_count.true.polynominal.attribute"/>
<parameter key="78" value="statuses_count.true.polynominal.attribute"/>
<parameter key="79" value="favourites_count.true.polynominal.attribute"/>
<parameter key="80" value="account_created_at.true.polynominal.attribute"/>
<parameter key="81" value="verified.true.polynominal.attribute"/>
<parameter key="82" value="profile_url.true.polynominal.attribute"/>
<parameter key="83" value="profile_expanded_url.true.polynominal.attribute"/>
<parameter key="84" value="account_lang.true.polynominal.attribute"/>
<parameter key="85" value="profile_banner_url.true.polynominal.attribute"/>
<parameter key="86" value="profile_background_url.true.polynominal.attribute"/>
<parameter key="87" value="profile_image_url.true.polynominal.attribute"/>
<parameter key="88" value="att89.true.polynominal.attribute"/>
<parameter key="89" value="att90.true.polynominal.attribute"/>
</list>
</operator>
<operator activated="true" class="set_role" compatibility="9.0.000" expanded="true" height="82" name="Set Role" width="90" x="45" y="136">
<parameter key="attribute_name" value="text"/>
<parameter key="target_role" value="label"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="filter_examples" compatibility="9.0.000" expanded="true" height="103" name="Filter Examples" width="90" x="179" y="34">
<list key="filters_list">
<parameter key="filters_entry_key" value="text.is_not_missing."/>
<parameter key="filters_entry_key" value="text.contains.strike"/>
</list>
</operator>
<operator activated="true" class="nominal_to_text" compatibility="9.0.000" expanded="true" height="82" name="Nominal to Text" width="90" x="179" y="187">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="text"/>
<parameter key="include_special_attributes" value="true"/>
</operator>
<operator activated="true" class="text:process_document_from_data" compatibility="8.1.000" expanded="true" height="82" name="Process Documents from Data" width="90" x="313" y="85">
<parameter key="select_attributes_and_weights" value="true"/>
<list key="specify_weights">
<parameter key="text" value="1.0"/>
</list>
<process expanded="true">
<operator activated="true" class="text:tokenize" compatibility="8.1.000" expanded="true" height="68" name="Tokenize" width="90" x="112" y="34"/>
<operator activated="true" class="text:transform_cases" compatibility="8.1.000" expanded="true" height="68" name="Transform Cases" width="90" x="246" y="34"/>
<operator activated="true" class="text:filter_stopwords_english" compatibility="8.1.000" expanded="true" height="68" name="Filter Stopwords (English)" width="90" x="380" y="34"/>
<connect from_port="document" to_op="Tokenize" to_port="document"/>
<connect from_op="Tokenize" from_port="document" to_op="Transform Cases" to_port="document"/>
<connect from_op="Transform Cases" from_port="document" to_op="Filter Stopwords (English)" to_port="document"/>
<connect from_op="Filter Stopwords (English)" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="text:create_document" compatibility="8.1.000" expanded="true" height="68" name="Create Document" width="90" x="45" y="289">
<parameter key="text" value="My flight has been cancelled
I'm very tired because nobody is giving information
We have been waiting at the airport for hours
I don't want to fly in this company ever again
I have a big problem with this cancellation
The service is horrible, nobody gives an explanation
I'm going on a business trip
I'm going on vacation
Travelling with family
Flight is delayed"/>
</operator>
<operator activated="true" class="text:process_documents" compatibility="8.1.000" expanded="true" height="103" name="Process Documents" width="90" x="313" y="289">
<process expanded="true">
<operator activated="true" class="text:tokenize" compatibility="8.1.000" expanded="true" height="68" name="Tokenize (2)" width="90" x="112" y="34"/>
<operator activated="true" class="text:transform_cases" compatibility="8.1.000" expanded="true" height="68" name="Transform Cases (2)" width="90" x="246" y="34"/>
<operator activated="true" class="text:filter_stopwords_english" compatibility="8.1.000" expanded="true" height="68" name="Filter Stopwords (2)" width="90" x="380" y="34"/>
<connect from_port="document" to_op="Tokenize (2)" to_port="document"/>
<connect from_op="Tokenize (2)" from_port="document" to_op="Transform Cases (2)" to_port="document"/>
<connect from_op="Transform Cases (2)" from_port="document" to_op="Filter Stopwords (2)" to_port="document"/>
<connect from_op="Filter Stopwords (2)" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="set_role" compatibility="9.0.000" expanded="true" height="82" name="Set Role (4)" width="90" x="447" y="34">
<parameter key="attribute_name" value="ï»¿user_id"/>
<parameter key="target_role" value="label"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="concurrency:cross_validation" compatibility="9.0.000" expanded="true" height="145" name="Cross Validation" width="90" x="581" y="34">
<process expanded="true">
<operator activated="true" class="set_role" compatibility="9.0.000" expanded="true" height="82" name="Set Role (3)" width="90" x="44" y="34">
<parameter key="attribute_name" value="ï»¿user_id"/>
<parameter key="target_role" value="label"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="naive_bayes" compatibility="9.0.000" expanded="true" height="82" name="Naive Bayes" width="90" x="246" y="34"/>
<connect from_port="training set" to_op="Set Role (3)" to_port="example set input"/>
<connect from_op="Set Role (3)" from_port="example set output" to_op="Naive Bayes" to_port="training set"/>
<connect from_op="Naive Bayes" from_port="model" to_port="model"/>
<portSpacing port="source_training set" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true">
<operator activated="true" class="apply_model" compatibility="9.0.000" expanded="true" height="82" name="Apply Model (2)" width="90" x="45" y="34">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="set_role" compatibility="9.0.000" expanded="true" height="82" name="Set Role (2)" width="90" x="112" y="136">
<parameter key="attribute_name" value="ï»¿user_id"/>
<parameter key="target_role" value="label"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="performance" compatibility="9.0.000" expanded="true" height="82" name="Performance" width="90" x="246" y="85"/>
<connect from_port="model" to_op="Apply Model (2)" to_port="model"/>
<connect from_port="test set" to_op="Apply Model (2)" to_port="unlabelled data"/>
<connect from_op="Apply Model (2)" from_port="labelled data" to_op="Set Role (2)" to_port="example set input"/>
<connect from_op="Set Role (2)" from_port="example set output" to_op="Performance" to_port="labelled data"/>
<connect from_op="Performance" from_port="performance" to_port="performance 1"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_test set results" spacing="0"/>
<portSpacing port="sink_performance 1" spacing="0"/>
<portSpacing port="sink_performance 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="apply_model" compatibility="9.0.000" expanded="true" height="82" name="Apply Model" width="90" x="514" y="289">
<list key="application_parameters"/>
</operator>
<connect from_op="Read CSV" from_port="output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
<connect from_op="Filter Examples" from_port="example set output" to_op="Nominal to Text" to_port="example set input"/>
<connect from_op="Nominal to Text" from_port="example set output" to_op="Process Documents from Data" to_port="example set"/>
<connect from_op="Process Documents from Data" from_port="example set" to_op="Set Role (4)" to_port="example set input"/>
<connect from_op="Process Documents from Data" from_port="word list" to_op="Process Documents" to_port="word list"/>
<connect from_op="Create Document" from_port="output" to_op="Process Documents" to_port="documents 1"/>
<connect from_op="Process Documents" from_port="example set" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Set Role (4)" from_port="example set output" to_op="Cross Validation" to_port="example set"/>
<connect from_op="Cross Validation" from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_op="Cross Validation" from_port="example set" to_port="result 1"/>
<connect from_op="Apply Model" from_port="labelled data" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>

Thomas_Ott · August 2018

@alinebora I made some tweaks. It appears that the Twitter operators you added aren't playing nicely for some reason. My guess is that it has to do with an encoding issue. I also disabled the Store operators; they all point to paths on my laptop, so they'd break for you. I just tested this and it works on my end.
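
On the encoding guess: the first attribute in the posted process is named `ï»¿user_id`, and the Read CSV operator is set to `windows-1252`. That prefix is what a UTF-8 byte-order mark looks like when the bytes are decoded as windows-1252. The snippet below is a minimal Python sketch of that effect, outside RapidMiner. It assumes the CSV was saved as UTF-8 with a BOM, which the thread does not confirm, and the helper name `decode_header` is made up for illustration.

```python
# Hypothetical illustration only; this is not part of the RapidMiner process.
BOM = "\ufeff"  # byte-order mark that some tools write at the start of UTF-8 files


def decode_header(raw: bytes, encoding: str) -> str:
    """Decode the first CSV header cell with the given encoding."""
    return raw.decode(encoding)


# First header cell as it would sit on disk in a UTF-8 file with a BOM (assumed).
raw_header = (BOM + "user_id").encode("utf-8")

# Decoded with the encoding configured in the posted Read CSV operator.
as_cp1252 = decode_header(raw_header, "windows-1252")

# Decoded as UTF-8 with the BOM stripped.
as_utf8_sig = decode_header(raw_header, "utf-8-sig")

print(repr(as_cp1252))
print(repr(as_utf8_sig))
```

If the file really is UTF-8, importing it with a matching encoding should give a clean `user_id` column name. Whether that alone brings the missing label back is not something this sketch can show.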

<?xml version="1.0" encoding="UTF-8"?><process version="8.2.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="8.2.001" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="subprocess" compatibility="8.2.001" expanded="true" height="82" name="Retrieve Twitter Data" width="90" x="45" y="34">
        <process expanded="true">
          <operator activated="true" class="set_macros" compatibility="8.2.001" expanded="true" height="68" name="Set Macros" width="90" x="45" y="34">
            <list key="macros">
              <parameter key="keyword1" value="#flight"/>
              <parameter key="keyword2" value="#airlines"/>
              <parameter key="keyword3" value="#airport"/>
              <parameter key="retweetcount" value="5"/>
            </list>
            <description align="center" color="transparent" colored="false" width="126">Set global variables here. Such as keyword search.</description>
          </operator>
          <operator activated="false" class="retrieve" compatibility="8.2.001" expanded="true" height="68" name="Retrieve Twitter Content Ideas" width="90" x="45" y="340">
            <parameter key="repository_entry" value="../data/%{keyword1} Twitter Content Ideas"/>
          </operator>
          <operator activated="true" class="social_media:search_twitter" compatibility="8.0.010" expanded="true" height="68" name="Search Twitter for Keyword3" width="90" x="179" y="238">
            <parameter key="connection" value="NewConnection"/>
            <parameter key="query" value="airfrance"/>
            <parameter key="language" value="en"/>
          </operator>
          <operator activated="true" class="social_media:search_twitter" compatibility="8.0.010" expanded="true" height="68" name="Search Twitter for Keyword2" width="90" x="179" y="136">
            <parameter key="connection" value="NewConnection"/>
            <parameter key="query" value="easyjet"/>
            <parameter key="language" value="en"/>
          </operator>
          <operator activated="true" class="social_media:search_twitter" compatibility="8.0.010" expanded="true" height="68" name="Search Twitter for Keyword 1" width="90" x="179" y="34">
            <parameter key="connection" value="NewConnection"/>
            <parameter key="query" value="ryanair"/>
            <parameter key="language" value="en"/>
          </operator>
          <operator activated="false" class="social_media:search_twitter" compatibility="8.1.000" expanded="true" height="68" name="Search Twitter for Keyword 4" width="90" x="45" y="442">
            <parameter key="connection" value="NewConnection"/>
            <parameter key="query" value="alitalia"/>
            <parameter key="limit" value="1000"/>
            <parameter key="language" value="en"/>
          </operator>
          <operator activated="false" class="social_media:search_twitter" compatibility="8.1.000" expanded="true" height="68" name="Search Twitter" width="90" x="45" y="493">
            <parameter key="connection" value="NewConnection"/>
            <parameter key="query" value="klm"/>
            <parameter key="limit" value="1000"/>
            <parameter key="language" value="en"/>
          </operator>
          <operator activated="false" class="store" compatibility="8.2.001" expanded="true" height="68" name="Store Data for later reuse" width="90" x="715" y="34">
            <parameter key="repository_entry" value="//Local Repository/processes/Thom1"/>
          </operator>
          <operator activated="true" class="social_media:search_twitter" compatibility="8.0.010" expanded="true" height="68" name="Search Twitter for Keyword3 (2)" width="90" x="179" y="340">
            <parameter key="connection" value="NewConnection"/>
            <parameter key="query" value="alitalia"/>
            <parameter key="language" value="en"/>
          </operator>
          <operator activated="true" class="social_media:search_twitter" compatibility="8.0.010" expanded="true" height="68" name="Search Twitter for Keyword3 (3)" width="90" x="179" y="442">
            <parameter key="connection" value="NewConnection"/>
            <parameter key="query" value="klm"/>
            <parameter key="language" value="en"/>
          </operator>
          <operator activated="true" class="append" compatibility="8.2.001" expanded="true" height="166" name="Append Data Set together" width="90" x="447" y="34"/>
          <operator activated="true" class="remove_duplicates" compatibility="8.2.001" expanded="true" height="103" name="Remove Duplicate IDs" width="90" x="581" y="34">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Id"/>
            <parameter key="include_special_attributes" value="true"/>
          </operator>
          <connect from_op="Search Twitter for Keyword3" from_port="output" to_op="Append Data Set together" to_port="example set 3"/>
          <connect from_op="Search Twitter for Keyword2" from_port="output" to_op="Append Data Set together" to_port="example set 2"/>
          <connect from_op="Search Twitter for Keyword 1" from_port="output" to_op="Append Data Set together" to_port="example set 1"/>
          <connect from_op="Search Twitter for Keyword3 (2)" from_port="output" to_op="Append Data Set together" to_port="example set 4"/>
          <connect from_op="Search Twitter for Keyword3 (3)" from_port="output" to_op="Append Data Set together" to_port="example set 5"/>
          <connect from_op="Append Data Set together" from_port="merged set" to_op="Remove Duplicate IDs" to_port="example set input"/>
          <connect from_op="Remove Duplicate IDs" from_port="example set output" to_port="out 1"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
        </process>
        <description align="center" color="transparent" colored="false" width="126">Retrieves Twitter Data, Appends, and Stores</description>
      </operator>
      <operator activated="true" class="subprocess" compatibility="8.2.001" expanded="true" height="82" name="ETL Subprocess" width="90" x="179" y="34">
        <process expanded="true">
          <operator activated="true" class="remove_duplicates" compatibility="8.2.001" expanded="true" height="103" name="Remove Duplicates" width="90" x="45" y="34">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="From-User"/>
            <description align="center" color="transparent" colored="false" width="126">Remove Duplicate Tweets from same user</description>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="8.2.001" expanded="true" height="82" name="Generate Arbitrary Label" width="90" x="179" y="34">
            <list key="function_descriptions">
              <parameter key="label" value="if([Retweet-Count]&lt;eval(%{retweetcount}),&quot;Not Important&quot;,&quot;Important&quot;)"/>
            </list>
          </operator>
          <operator activated="false" class="filter_examples" compatibility="8.2.001" expanded="true" height="103" name="Filter Examples" width="90" x="313" y="34">
            <parameter key="invert_filter" value="true"/>
            <list key="filters_list">
              <parameter key="filters_entry_key" value="Text.contains.RT"/>
            </list>
          </operator>
          <operator activated="true" class="set_role" compatibility="8.2.001" expanded="true" height="82" name="Set Role" width="90" x="447" y="34">
            <parameter key="attribute_name" value="label"/>
            <parameter key="target_role" value="label"/>
            <list key="set_additional_roles"/>
            <description align="center" color="transparent" colored="false" width="126">Set Role for Label</description>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="8.2.001" expanded="true" height="82" name="Select Attributes" width="90" x="581" y="34">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attributes" value="Text|label"/>
            <parameter key="include_special_attributes" value="true"/>
          </operator>
          <operator activated="true" class="nominal_to_text" compatibility="8.2.001" expanded="true" height="82" name="Nominal to Text" width="90" x="715" y="34">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Text"/>
          </operator>
          <operator activated="true" class="extract_macro" compatibility="8.2.001" expanded="true" height="68" name="Extract Macro (3)" width="90" x="849" y="34">
            <parameter key="macro" value="label_count"/>
            <parameter key="macro_type" value="statistics"/>
            <parameter key="statistics" value="count"/>
            <parameter key="attribute_name" value="label"/>
            <parameter key="attribute_value" value="Important"/>
            <list key="additional_macros"/>
          </operator>
          <connect from_port="in 1" to_op="Remove Duplicates" to_port="example set input"/>
          <connect from_op="Remove Duplicates" from_port="example set output" to_op="Generate Arbitrary Label" to_port="example set input"/>
          <connect from_op="Generate Arbitrary Label" from_port="example set output" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Nominal to Text" to_port="example set input"/>
          <connect from_op="Nominal to Text" from_port="example set output" to_op="Extract Macro (3)" to_port="example set"/>
          <connect from_op="Extract Macro (3)" from_port="example set" to_port="out 1"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="source_in 2" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
        </process>
        <description align="center" color="transparent" colored="false" width="126">Binning for Label subprocess - suspect</description>
      </operator>
      <operator activated="true" class="text:process_document_from_data" compatibility="8.1.000" expanded="true" height="82" name="Process Documents from Data" width="90" x="313" y="34">
        <parameter key="prune_method" value="percentual"/>
        <parameter key="prune_below_percent" value="5.0"/>
        <parameter key="prune_above_percent" value="50.0"/>
        <parameter key="prune_below_absolute" value="100"/>
        <parameter key="prune_above_absolute" value="500"/>
        <list key="specify_weights"/>
        <process expanded="true">
          <operator activated="true" class="text:extract_information" compatibility="8.1.000" expanded="true" height="68" name="Extract Links for later use" width="90" x="45" y="34">
            <parameter key="query_type" value="Regular Expression"/>
            <list key="string_machting_queries"/>
            <list key="regular_expression_queries">
              <parameter key="Tweet Links" value="http.*"/>
            </list>
            <list key="regular_region_queries"/>
            <list key="xpath_queries"/>
            <list key="namespaces"/>
            <list key="index_queries"/>
            <list key="jsonpath_queries"/>
          </operator>
          <operator activated="true" class="text:replace_tokens" compatibility="8.1.000" expanded="true" height="68" name="Replace http links" width="90" x="179" y="34">
            <list key="replace_dictionary">
              <parameter key="http.*" value="link"/>
            </list>
          </operator>
          <operator activated="true" class="text:tokenize" compatibility="8.1.000" expanded="true" height="68" name="Tokenize" width="90" x="313" y="34">
            <parameter key="mode" value="specify characters"/>
            <parameter key="characters" value=" .!;:[,' ?]"/>
          </operator>
          <operator activated="true" class="text:transform_cases" compatibility="8.1.000" expanded="true" height="68" name="Transform Cases" width="90" x="447" y="34"/>
          <operator activated="true" class="text:filter_by_length" compatibility="8.1.000" expanded="true" height="68" name="Filter Tokens (by Length)" width="90" x="581" y="34"/>
          <operator activated="true" class="text:generate_n_grams_terms" compatibility="8.1.000" expanded="true" height="68" name="Generate n-Grams (Terms)" width="90" x="715" y="34"/>
          <operator activated="true" class="text:filter_tokens_by_content" compatibility="8.1.000" expanded="true" height="68" name="Filter Tokens (by Content)" width="90" x="849" y="34">
            <parameter key="string" value="link"/>
            <parameter key="invert condition" value="true"/>
          </operator>
          <operator activated="true" class="text:filter_stopwords_english" compatibility="8.1.000" expanded="true" height="68" name="Filter Stopwords (English)" width="90" x="983" y="34"/>
          <connect from_port="document" to_op="Extract Links for later use" to_port="document"/>
          <connect from_op="Extract Links for later use" from_port="document" to_op="Replace http links" to_port="document"/>
          <connect from_op="Replace http links" from_port="document" to_op="Tokenize" to_port="document"/>
          <connect from_op="Tokenize" from_port="document" to_op="Transform Cases" to_port="document"/>
          <connect from_op="Transform Cases" from_port="document" to_op="Filter Tokens (by Length)" to_port="document"/>
          <connect from_op="Filter Tokens (by Length)" from_port="document" to_op="Generate n-Grams (Terms)" to_port="document"/>
          <connect from_op="Generate n-Grams (Terms)" from_port="document" to_op="Filter Tokens (by Content)" to_port="document"/>
          <connect from_op="Filter Tokens (by Content)" from_port="document" to_op="Filter Stopwords (English)" to_port="document"/>
          <connect from_op="Filter Stopwords (English)" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="multiply" compatibility="8.2.001" expanded="true" height="103" name="Multiply" width="90" x="447" y="34"/>
      <operator activated="true" class="subprocess" compatibility="8.2.001" expanded="true" height="103" name="Clustering Stuff" width="90" x="581" y="34">
        <process expanded="true">
          <operator activated="true" class="select_attributes" compatibility="8.2.001" expanded="true" height="82" name="Remove Tweet Links" width="90" x="45" y="34">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Tweet Links"/>
            <parameter key="attributes" value="Tweet Links"/>
            <parameter key="invert_selection" value="true"/>
          </operator>
          <operator activated="true" class="x_means" compatibility="7.5.003" expanded="true" height="82" name="X-Means" width="90" x="179" y="34">
            <parameter key="measure_types" value="BregmanDivergences"/>
            <parameter key="divergence" value="SquaredEuclideanDistance"/>
          </operator>
          <operator activated="true" class="extract_prototypes" compatibility="8.2.001" expanded="true" height="82" name="Extract Cluster Prototypes" width="90" x="313" y="136"/>
          <operator activated="false" class="store" compatibility="8.2.001" expanded="true" height="68" name="Store Cluster Model" width="90" x="447" y="34">
            <parameter key="repository_entry" value="../results/%{keyword1} Twitter Content Cluster Model"/>
          </operator>
          <connect from_port="in 1" to_op="Remove Tweet Links" to_port="example set input"/>
          <connect from_op="Remove Tweet Links" from_port="example set output" to_op="X-Means" to_port="example set"/>
          <connect from_op="X-Means" from_port="cluster model" to_op="Extract Cluster Prototypes" to_port="model"/>
          <connect from_op="Extract Cluster Prototypes" from_port="example set" to_port="out 1"/>
          <connect from_op="Extract Cluster Prototypes" from_port="model" to_port="out 2"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="source_in 2" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
          <portSpacing port="sink_out 3" spacing="0"/>
        </process>
      </operator>
      <operator activated="false" class="store" compatibility="8.2.001" expanded="true" height="68" name="Store WordList" width="90" x="447" y="289">
        <parameter key="repository_entry" value="../results/%{keyword1} Twitter Content Ideas Wordlist"/>
      </operator>
      <operator activated="true" class="text:wordlist_to_data" compatibility="8.1.000" expanded="true" height="82" name="WordList to Data" width="90" x="581" y="289"/>
      <operator activated="true" class="sort" compatibility="8.2.001" expanded="true" height="82" name="Sort" width="90" x="715" y="289">
        <parameter key="attribute_name" value="total"/>
        <parameter key="sorting_direction" value="decreasing"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="8.2.001" expanded="true" height="82" name="Remove Tweet Links (2)" width="90" x="581" y="187">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="Tweet Links"/>
        <parameter key="attributes" value="Tweet Links"/>
        <parameter key="invert_selection" value="true"/>
      </operator>
      <operator activated="true" class="subprocess" compatibility="8.2.001" expanded="true" height="82" name="Determine Influence Factors" width="90" x="715" y="187">
        <process expanded="true">
          <operator activated="true" class="weight_by_correlation" compatibility="8.2.001" expanded="true" height="82" name="Weight by Correlation" width="90" x="45" y="34"/>
          <operator activated="true" class="weights_to_data" compatibility="8.2.001" expanded="true" height="68" name="Weights to Data" width="90" x="179" y="34"/>
          <operator activated="true" class="generate_attributes" compatibility="6.4.000" expanded="true" height="82" name="Generate Attributes (2)" width="90" x="313" y="34">
            <list key="function_descriptions">
              <parameter key="Method" value="&quot;Correlation&quot;"/>
            </list>
          </operator>
          <operator activated="true" class="weight_by_gini_index" compatibility="8.2.001" expanded="true" height="82" name="Weight by Gini Index" width="90" x="45" y="120"/>
          <operator activated="true" class="weight_by_information_gain" compatibility="8.2.001" expanded="true" height="82" name="Weight by Information Gain" width="90" x="45" y="210"/>
          <operator activated="true" class="weight_by_information_gain_ratio" compatibility="8.2.001" expanded="true" height="82" name="Weight by Information Gain Ratio" width="90" x="45" y="300"/>
          <operator activated="true" class="weights_to_data" compatibility="8.2.001" expanded="true" height="68" name="Weights to Data (2)" width="90" x="179" y="120"/>
          <operator activated="true" class="generate_attributes" compatibility="6.4.000" expanded="true" height="82" name="Generate Attributes (3)" width="90" x="313" y="120">
            <list key="function_descriptions">
              <parameter key="Method" value="&quot;Gini&quot;"/>
            </list>
          </operator>
          <operator activated="true" class="weights_to_data" compatibility="8.2.001" expanded="true" height="68" name="Weights to Data (3)" width="90" x="179" y="210"/>
          <operator activated="true" class="generate_attributes" compatibility="6.4.000" expanded="true" height="82" name="Generate Attributes (4)" width="90" x="313" y="210">
            <list key="function_descriptions">
              <parameter key="Method" value="&quot;InfoGain&quot;"/>
            </list>
          </operator>
          <operator activated="true" class="weights_to_data" compatibility="8.2.001" expanded="true" height="68" name="Weights to Data (4)" width="90" x="179" y="300"/>
          <operator activated="true" class="generate_attributes" compatibility="6.4.000" expanded="true" height="82" name="Generate Attributes (5)" width="90" x="313" y="300">
            <list key="function_descriptions">
              <parameter key="Method" value="&quot;InfoGainRatio&quot;"/>
            </list>
          </operator>
          <operator activated="true" class="append" compatibility="8.2.001" expanded="true" height="145" name="Append" width="90" x="447" y="30"/>
          <operator activated="true" class="pivot" compatibility="8.2.001" expanded="true" height="82" name="Pivot" width="90" x="581" y="30">
            <parameter key="group_attribute" value="Attribute"/>
            <parameter key="index_attribute" value="Method"/>
          </operator>
          <operator activated="true" class="generate_aggregation" compatibility="6.5.002" expanded="true" height="82" name="Generate Aggregation" width="90" x="715" y="30">
            <parameter key="attribute_name" value="Importance"/>
            <parameter key="attribute_filter_type" value="value_type"/>
            <parameter key="value_type" value="numeric"/>
            <parameter key="aggregation_function" value="average"/>
          </operator>
          <operator activated="true" class="normalize" compatibility="7.5.003" expanded="true" height="103" name="Normalize" width="90" x="849" y="30">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Importance"/>
            <parameter key="method" value="range transformation"/>
          </operator>
          <operator activated="true" class="sort" compatibility="8.2.001" expanded="true" height="82" name="Sort again" width="90" x="983" y="34">
            <parameter key="attribute_name" value="Importance"/>
            <parameter key="sorting_direction" value="decreasing"/>
          </operator>
          <operator activated="true" class="order_attributes" compatibility="8.2.001" expanded="true" height="82" name="Reorder Attributes" width="90" x="1117" y="34">
            <parameter key="attribute_ordering" value="Attribute|Importance"/>
            <parameter key="handle_unmatched" value="remove"/>
          </operator>
          <operator activated="true" class="filter_example_range" compatibility="8.2.001" expanded="true" height="82" name="Select Top 20" width="90" x="1251" y="34">
            <parameter key="first_example" value="1"/>
            <parameter key="last_example" value="20"/>
          </operator>
          <connect from_port="in 1" to_op="Weight by Correlation" to_port="example set"/>
          <connect from_op="Weight by Correlation" from_port="weights" to_op="Weights to Data" to_port="attribute weights"/>
          <connect from_op="Weight by Correlation" from_port="example set" to_op="Weight by Gini Index" to_port="example set"/>
          <connect from_op="Weights to Data" from_port="example set" to_op="Generate Attributes (2)" to_port="example set input"/>
          <connect from_op="Generate Attributes (2)" from_port="example set output" to_op="Append" to_port="example set 1"/>
          <connect from_op="Weight by Gini Index" from_port="weights" to_op="Weights to Data (2)" to_port="attribute weights"/>
          <connect from_op="Weight by Gini Index" from_port="example set" to_op="Weight by Information Gain" to_port="example set"/>
          <connect from_op="Weight by Information Gain" from_port="weights" to_op="Weights to Data (3)" to_port="attribute weights"/>
          <connect from_op="Weight by Information Gain" from_port="example set" to_op="Weight by Information Gain Ratio" to_port="example set"/>
          <connect from_op="Weight by Information Gain Ratio" from_port="weights" to_op="Weights to Data (4)" to_port="attribute weights"/>
          <connect from_op="Weights to Data (2)" from_port="example set" to_op="Generate Attributes (3)" to_port="example set input"/>
          <connect from_op="Generate Attributes (3)" from_port="example set output" to_op="Append" to_port="example set 2"/>
          <connect from_op="Weights to Data (3)" from_port="example set" to_op="Generate Attributes (4)" to_port="example set input"/>
          <connect from_op="Generate Attributes (4)" from_port="example set output" to_op="Append" to_port="example set 3"/>
          <connect from_op="Weights to Data (4)" from_port="example set" to_op="Generate Attributes (5)" to_port="example set input"/>
          <connect from_op="Generate Attributes (5)" from_port="example set output" to_op="Append" to_port="example set 4"/>
          <connect from_op="Append" from_port="merged set" to_op="Pivot" to_port="example set input"/>
          <connect from_op="Pivot" from_port="example set output" to_op="Generate Aggregation" to_port="example set input"/>
          <connect from_op="Generate Aggregation" from_port="example set output" to_op="Normalize" to_port="example set input"/>
          <connect from_op="Normalize" from_port="example set output" to_op="Sort again" to_port="example set input"/>
          <connect from_op="Sort again" from_port="example set output" to_op="Reorder Attributes" to_port="example set input"/>
          <connect from_op="Reorder Attributes" from_port="example set output" to_op="Select Top 20" to_port="example set input"/>
          <connect from_op="Select Top 20" from_port="example set output" to_port="out 1"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="source_in 2" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="false" class="store" compatibility="8.2.001" expanded="true" height="68" name="Store Influence Wrds" width="90" x="849" y="187">
        <parameter key="repository_entry" value="../results/%{keyword1} Twitter Content Influence Words"/>
      </operator>
      <operator activated="false" class="write_excel" compatibility="8.2.001" expanded="true" height="82" name="Write Important Words" width="90" x="983" y="187">
        <parameter key="excel_file" value="C:\Users\Thomas Ott\Dropbox\Twitter Influencers\%{keyword1} Todays Powerful Words to use in your Tweets.xlsx"/>
      </operator>
      <connect from_op="Retrieve Twitter Data" from_port="out 1" to_op="ETL Subprocess" to_port="in 1"/>
      <connect from_op="ETL Subprocess" from_port="out 1" to_op="Process Documents from Data" to_port="example set"/>
      <connect from_op="Process Documents from Data" from_port="example set" to_op="Multiply" to_port="input"/>
      <connect from_op="Process Documents from Data" from_port="word list" to_op="WordList to Data" to_port="word list"/>
      <connect from_op="Multiply" from_port="output 1" to_op="Clustering Stuff" to_port="in 1"/>
      <connect from_op="Multiply" from_port="output 2" to_op="Remove Tweet Links (2)" to_port="example set input"/>
      <connect from_op="Clustering Stuff" from_port="out 1" to_port="result 1"/>
      <connect from_op="Clustering Stuff" from_port="out 2" to_port="result 2"/>
      <connect from_op="WordList to Data" from_port="example set" to_op="Sort" to_port="example set input"/>
      <connect from_op="Sort" from_port="example set output" to_port="result 4"/>
      <connect from_op="Remove Tweet Links (2)" from_port="example set output" to_op="Determine Influence Factors" to_port="in 1"/>
      <connect from_op="Determine Influence Factors" from_port="out 1" to_port="result 3"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="63"/>
      <portSpacing port="sink_result 3" spacing="126"/>
      <portSpacing port="sink_result 4" spacing="84"/>
      <portSpacing port="sink_result 5" spacing="0"/>
    </process>
  </operator>
</process>

sgenzer · August 2018

Hello @alinebora, the first thing I see is that your "user_id" attribute has some funky characters in it. Is this intended? It would not shock me if this is messing things up.
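If it helps to check the header outside RapidMiner, here is a minimal Python sketch. It rests on an assumption the thread does not confirm: that the stray characters come from a UTF-8 byte-order mark being decoded as windows-1252 (the encoding set in Read CSV). The sample bytes and the helper names `read_headers` and `clean_header` are hypothetical, for illustration only.

```python
import csv
import io
import re

# A UTF-8 BOM (bytes EF BB BF) decoded as windows-1252 shows up as these three characters.
MISDECODED_BOM = "\u00ef\u00bb\u00bf"

def read_headers(raw, encoding="windows-1252"):
    """Decode raw CSV bytes with the given encoding and return the header row."""
    text = raw.decode(encoding)
    return next(csv.reader(io.StringIO(text)))

def clean_header(name):
    """Drop a real or mis-decoded BOM, then anything outside letters, digits, '_', '.', and space."""
    name = name.replace("\ufeff", "").replace(MISDECODED_BOM, "")
    return re.sub(r"[^A-Za-z0-9_. ]", "", name).strip()

# Hypothetical sample: a UTF-8 file with a BOM, as some export tools write it.
sample = "user_id,status_id,text\n1,2,hello\n".encode("utf-8-sig")

raw_headers = read_headers(sample)
cleaned_headers = [clean_header(h) for h in raw_headers]

print(raw_headers)
print(cleaned_headers)
```

If the first raw header differs from the cleaned one, the file most likely needs a different encoding on import, or the attribute needs to be renamed before Set Role.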


Second, I just want to make sure that what you're doing makes sense: predicting the user_id based on the word vector of the tweets.
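One rough way to sanity-check a label like this is to count how many distinct values it has and how many of them occur only once. The sketch below is plain Python, not a RapidMiner feature; the `label_summary` helper and the sample values are hypothetical. If almost every row has its own label value, a classifier has nothing to learn from and cross validation cannot score it meaningfully.

```python
from collections import Counter

def label_summary(labels):
    """Summarise a label column: row count, distinct classes, classes seen only once."""
    counts = Counter(labels)
    singletons = sum(1 for c in counts.values() if c == 1)
    return {
        "rows": len(labels),
        "classes": len(counts),
        "singleton_classes": singletons,
    }

# Hypothetical label column: one user_id per tweet.
labels = ["u1", "u2", "u3", "u1", "u4"]
summary = label_summary(labels)
print(summary)
```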

I can't test your process without your data set (can you post it? It would be MUCH easier), but I made some tweaks to your process here. Note that I used the </> button to embed the XML in this post.

<?xml version="1.0" encoding="UTF-8"?><process version="9.0.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.0.001" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="text:create_document" compatibility="7.5.000" expanded="true" height="68" name="Create Document" width="90" x="112" y="340">
        <parameter key="text" value="My flight has been cancelled&#10;I'm very tired because nobody is giving information&#10;We have been waiting at the airport for hours&#10;I don't want to fly in this company ever again&#10;I have a big problem with this cancellation&#10;The service is horrible, nobody gives an explanation&#10;I'm going on a business trip&#10;I'm going on vacation&#10;Travelling with family&#10;Flight is delayed"/>
      </operator>
      <operator activated="true" class="read_csv" compatibility="9.0.001" expanded="true" height="68" name="Read CSV" width="90" x="45" y="34">
        <parameter key="csv_file" value="C:\Users\aline\OneDrive\Documentos\AlineXX.csv"/>
        <parameter key="column_separators" value=","/>
        <parameter key="skip_comments" value="true"/>
        <parameter key="date_format" value="dd/MM/yyyy HH:mm"/>
        <list key="annotations"/>
        <parameter key="encoding" value="windows-1252"/>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="user_id.true.polynominal.attribute"/>
          <parameter key="1" value="status_id.true.polynominal.attribute"/>
          <parameter key="2" value="created_at.true.polynominal.attribute"/>
          <parameter key="3" value="screen_name.true.polynominal.attribute"/>
          <parameter key="4" value="text.true.polynominal.attribute"/>
          <parameter key="5" value="source.true.polynominal.attribute"/>
          <parameter key="6" value="display_text_width.true.polynominal.attribute"/>
          <parameter key="7" value="reply_to_status_id.true.polynominal.attribute"/>
          <parameter key="8" value="reply_to_user_id.true.polynominal.attribute"/>
          <parameter key="9" value="reply_to_screen_name.true.polynominal.attribute"/>
          <parameter key="10" value="is_quote.true.polynominal.attribute"/>
          <parameter key="11" value="is_retweet.true.polynominal.attribute"/>
          <parameter key="12" value="favorite_count.true.polynominal.attribute"/>
          <parameter key="13" value="retweet_count.true.polynominal.attribute"/>
          <parameter key="14" value="hashtags.true.polynominal.attribute"/>
          <parameter key="15" value="symbols.true.polynominal.attribute"/>
          <parameter key="16" value="urls_url.true.polynominal.attribute"/>
          <parameter key="17" value="urls_t\.co.true.polynominal.attribute"/>
          <parameter key="18" value="urls_expanded_url.true.polynominal.attribute"/>
          <parameter key="19" value="media_url.true.polynominal.attribute"/>
          <parameter key="20" value="media_t\.co.true.polynominal.attribute"/>
          <parameter key="21" value="media_expanded_url.true.polynominal.attribute"/>
          <parameter key="22" value="media_type.true.polynominal.attribute"/>
          <parameter key="23" value="ext_media_url.true.polynominal.attribute"/>
          <parameter key="24" value="ext_media_t\.co.true.polynominal.attribute"/>
          <parameter key="25" value="ext_media_expanded_url.true.polynominal.attribute"/>
          <parameter key="26" value="ext_media_type.true.polynominal.attribute"/>
          <parameter key="27" value="mentions_user_id.true.polynominal.attribute"/>
          <parameter key="28" value="mentions_screen_name.true.polynominal.attribute"/>
          <parameter key="29" value="lang.true.polynominal.attribute"/>
          <parameter key="30" value="quoted_status_id.true.polynominal.attribute"/>
          <parameter key="31" value="quoted_text.true.polynominal.attribute"/>
          <parameter key="32" value="quoted_created_at.true.polynominal.attribute"/>
          <parameter key="33" value="quoted_source.true.polynominal.attribute"/>
          <parameter key="34" value="quoted_favorite_count.true.polynominal.attribute"/>
          <parameter key="35" value="quoted_retweet_count.true.polynominal.attribute"/>
          <parameter key="36" value="quoted_user_id.true.polynominal.attribute"/>
          <parameter key="37" value="quoted_screen_name.true.polynominal.attribute"/>
          <parameter key="38" value="quoted_name.true.polynominal.attribute"/>
          <parameter key="39" value="quoted_followers_count.true.polynominal.attribute"/>
          <parameter key="40" value="quoted_friends_count.true.polynominal.attribute"/>
          <parameter key="41" value="quoted_statuses_count.true.polynominal.attribute"/>
          <parameter key="42" value="quoted_location.true.polynominal.attribute"/>
          <parameter key="43" value="quoted_description.true.polynominal.attribute"/>
          <parameter key="44" value="quoted_verified.true.polynominal.attribute"/>
          <parameter key="45" value="retweet_status_id.true.polynominal.attribute"/>
          <parameter key="46" value="retweet_text.true.polynominal.attribute"/>
          <parameter key="47" value="retweet_created_at.true.polynominal.attribute"/>
          <parameter key="48" value="retweet_source.true.polynominal.attribute"/>
          <parameter key="49" value="retweet_favorite_count.true.polynominal.attribute"/>
          <parameter key="50" value="retweet_retweet_count.true.polynominal.attribute"/>
          <parameter key="51" value="retweet_user_id.true.polynominal.attribute"/>
          <parameter key="52" value="retweet_screen_name.true.polynominal.attribute"/>
          <parameter key="53" value="retweet_name.true.polynominal.attribute"/>
          <parameter key="54" value="retweet_followers_count.true.polynominal.attribute"/>
          <parameter key="55" value="retweet_friends_count.true.polynominal.attribute"/>
          <parameter key="56" value="retweet_statuses_count.true.polynominal.attribute"/>
          <parameter key="57" value="retweet_location.true.polynominal.attribute"/>
          <parameter key="58" value="retweet_description.true.polynominal.attribute"/>
          <parameter key="59" value="retweet_verified.true.polynominal.attribute"/>
          <parameter key="60" value="place_url.true.polynominal.attribute"/>
          <parameter key="61" value="place_name.true.polynominal.attribute"/>
          <parameter key="62" value="place_full_name.true.polynominal.attribute"/>
          <parameter key="63" value="place_type.true.polynominal.attribute"/>
          <parameter key="64" value="country.true.polynominal.attribute"/>
          <parameter key="65" value="country_code.true.polynominal.attribute"/>
          <parameter key="66" value="geo_coords.true.polynominal.attribute"/>
          <parameter key="67" value="coords_coords.true.polynominal.attribute"/>
          <parameter key="68" value="bbox_coords.true.polynominal.attribute"/>
          <parameter key="69" value="status_url.true.polynominal.attribute"/>
          <parameter key="70" value="name.true.polynominal.attribute"/>
          <parameter key="71" value="location.true.polynominal.attribute"/>
          <parameter key="72" value="description.true.polynominal.attribute"/>
          <parameter key="73" value="url.true.polynominal.attribute"/>
          <parameter key="74" value="protected.true.polynominal.attribute"/>
          <parameter key="75" value="followers_count.true.integer.attribute"/>
          <parameter key="76" value="friends_count.true.polynominal.attribute"/>
          <parameter key="77" value="listed_count.true.polynominal.attribute"/>
          <parameter key="78" value="statuses_count.true.polynominal.attribute"/>
          <parameter key="79" value="favourites_count.true.polynominal.attribute"/>
          <parameter key="80" value="account_created_at.true.polynominal.attribute"/>
          <parameter key="81" value="verified.true.polynominal.attribute"/>
          <parameter key="82" value="profile_url.true.polynominal.attribute"/>
          <parameter key="83" value="profile_expanded_url.true.polynominal.attribute"/>
          <parameter key="84" value="account_lang.true.polynominal.attribute"/>
          <parameter key="85" value="profile_banner_url.true.polynominal.attribute"/>
          <parameter key="86" value="profile_background_url.true.polynominal.attribute"/>
          <parameter key="87" value="profile_image_url.true.polynominal.attribute"/>
          <parameter key="88" value="att89.true.polynominal.attribute"/>
          <parameter key="89" value="att90.true.polynominal.attribute"/>
        </list>
      </operator>
      <operator activated="true" class="set_role" compatibility="9.0.001" expanded="true" height="82" name="Set Role" width="90" x="45" y="136">
        <parameter key="attribute_name" value="text"/>
        <parameter key="target_role" value="label"/>
        <list key="set_additional_roles">
          <parameter key="user_id" value="id"/>
        </list>
      </operator>
      <operator activated="true" class="filter_examples" compatibility="9.0.001" expanded="true" height="103" name="Filter Examples" width="90" x="179" y="34">
        <list key="filters_list">
          <parameter key="filters_entry_key" value="text.is_not_missing."/>
          <parameter key="filters_entry_key" value="text.contains.strike"/>
        </list>
      </operator>
      <operator activated="true" class="nominal_to_text" compatibility="9.0.001" expanded="true" height="82" name="Nominal to Text" width="90" x="179" y="187">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="text"/>
        <parameter key="include_special_attributes" value="true"/>
      </operator>
      <operator activated="true" class="text:process_document_from_data" compatibility="7.5.000" expanded="true" height="82" name="Process Documents from Data" width="90" x="313" y="85">
        <parameter key="select_attributes_and_weights" value="true"/>
        <list key="specify_weights">
          <parameter key="text" value="1.0"/>
        </list>
        <process expanded="true">
          <operator activated="true" class="text:tokenize" compatibility="7.5.000" expanded="true" height="68" name="Tokenize" width="90" x="112" y="34"/>
          <operator activated="true" class="text:transform_cases" compatibility="7.5.000" expanded="true" height="68" name="Transform Cases" width="90" x="246" y="34"/>
          <operator activated="true" class="text:filter_stopwords_english" compatibility="7.5.000" expanded="true" height="68" name="Filter Stopwords (English)" width="90" x="380" y="34"/>
          <connect from_port="document" to_op="Tokenize" to_port="document"/>
          <connect from_op="Tokenize" from_port="document" to_op="Transform Cases" to_port="document"/>
          <connect from_op="Transform Cases" from_port="document" to_op="Filter Stopwords (English)" to_port="document"/>
          <connect from_op="Filter Stopwords (English)" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="text:process_documents" compatibility="7.5.000" expanded="true" height="103" name="Process Documents" width="90" x="380" y="340">
        <process expanded="true">
          <operator activated="true" class="text:tokenize" compatibility="7.5.000" expanded="true" height="68" name="Tokenize (2)" width="90" x="112" y="34"/>
          <operator activated="true" class="text:transform_cases" compatibility="7.5.000" expanded="true" height="68" name="Transform Cases (2)" width="90" x="246" y="34"/>
          <operator activated="true" class="text:filter_stopwords_english" compatibility="7.5.000" expanded="true" height="68" name="Filter Stopwords (2)" width="90" x="380" y="34"/>
          <connect from_port="document" to_op="Tokenize (2)" to_port="document"/>
          <connect from_op="Tokenize (2)" from_port="document" to_op="Transform Cases (2)" to_port="document"/>
          <connect from_op="Transform Cases (2)" from_port="document" to_op="Filter Stopwords (2)" to_port="document"/>
          <connect from_op="Filter Stopwords (2)" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="generate_id" compatibility="9.0.001" expanded="true" height="82" name="Generate ID" width="90" x="514" y="340"/>
      <operator activated="true" class="rename" compatibility="9.0.001" expanded="true" height="82" name="Rename" width="90" x="648" y="340">
        <parameter key="old_name" value="id"/>
        <parameter key="new_name" value="user_id"/>
        <list key="rename_additional_attributes"/>
      </operator>
      <operator activated="true" class="set_role" compatibility="9.0.001" expanded="true" height="82" name="Set Role (4)" width="90" x="447" y="34">
        <parameter key="attribute_name" value="user_id"/>
        <parameter key="target_role" value="label"/>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="concurrency:cross_validation" compatibility="9.0.001" expanded="true" height="145" name="Cross Validation" width="90" x="581" y="34">
        <process expanded="true">
          <operator activated="true" class="naive_bayes" compatibility="9.0.001" expanded="true" height="82" name="Naive Bayes" width="90" x="112" y="34"/>
          <connect from_port="training set" to_op="Naive Bayes" to_port="training set"/>
          <connect from_op="Naive Bayes" from_port="model" to_port="model"/>
          <portSpacing port="source_training set" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
        </process>
        <process expanded="true">
          <operator activated="true" class="apply_model" compatibility="9.0.001" expanded="true" height="82" name="Apply Model (2)" width="90" x="45" y="34">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance" compatibility="9.0.001" expanded="true" height="82" name="Performance" width="90" x="179" y="34"/>
          <connect from_port="model" to_op="Apply Model (2)" to_port="model"/>
          <connect from_port="test set" to_op="Apply Model (2)" to_port="unlabelled data"/>
          <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Performance" from_port="performance" to_port="performance 1"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_test set results" spacing="0"/>
          <portSpacing port="sink_performance 1" spacing="0"/>
          <portSpacing port="sink_performance 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="apply_model" compatibility="9.0.001" expanded="true" height="82" name="Apply Model" width="90" x="782" y="289">
        <list key="application_parameters"/>
      </operator>
      <connect from_op="Create Document" from_port="output" to_op="Process Documents" to_port="documents 1"/>
      <connect from_op="Read CSV" from_port="output" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
      <connect from_op="Filter Examples" from_port="example set output" to_op="Nominal to Text" to_port="example set input"/>
      <connect from_op="Nominal to Text" from_port="example set output" to_op="Process Documents from Data" to_port="example set"/>
      <connect from_op="Process Documents from Data" from_port="example set" to_op="Set Role (4)" to_port="example set input"/>
      <connect from_op="Process Documents from Data" from_port="word list" to_op="Process Documents" to_port="word list"/>
      <connect from_op="Process Documents" from_port="example set" to_op="Generate ID" to_port="example set input"/>
      <connect from_op="Generate ID" from_port="example set output" to_op="Rename" to_port="example set input"/>
      <connect from_op="Rename" from_port="example set output" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="Set Role (4)" from_port="example set output" to_op="Cross Validation" to_port="example set"/>
      <connect from_op="Cross Validation" from_port="model" to_op="Apply Model" to_port="model"/>
      <connect from_op="Cross Validation" from_port="example set" to_port="result 1"/>
      <connect from_op="Apply Model" from_port="labelled data" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>

Hope that helps?

Scott

alinebora · August 2018

Dear Scott,

Thanks for your answer. Indeed, I don't know where this funky attribute came from, so I changed a few things in my process to try to solve it: I removed all the weird attributes manually and loaded the data into RapidMiner again in a new process.

So far it was working OK, but now the SVM operator again does not recognize my single attribute and says that the example set has no examples. I don't get why this happens, since the data is connected. I attached the data with all the spreadsheets (in RapidMiner I used the 'Append' operator to bring them all together). I hope to solve this ASAP. Thank you.

<?xml version="1.0" encoding="UTF-8"?><process version="9.0.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.0.000" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="9.0.000" expanded="true" height="68" name="Trained Data" width="90" x="45" y="34">
        <parameter key="repository_entry" value="//AirlineData/RetrieveJa"/>
      </operator>
      <operator activated="true" class="nominal_to_text" compatibility="9.0.000" expanded="true" height="82" name="Nominal to Text" width="90" x="45" y="187"/>
      <operator activated="true" class="generate_attributes" compatibility="9.0.000" expanded="true" height="82" name="Generate Attributes" width="90" x="179" y="187">
        <list key="function_descriptions">
          <parameter key="Text2" value="text"/>
        </list>
      </operator>
      <operator activated="true" class="set_role" compatibility="9.0.000" expanded="true" height="82" name="Set Role" width="90" x="179" y="34">
        <parameter key="attribute_name" value="text"/>
        <parameter key="target_role" value="label"/>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="text:process_document_from_data" compatibility="8.1.000" expanded="true" height="82" name="Process Documents from Data" width="90" x="313" y="34">
        <list key="specify_weights"/>
        <process expanded="true">
          <operator activated="true" class="text:tokenize" compatibility="8.1.000" expanded="true" height="68" name="Tokenize" width="90" x="112" y="34"/>
          <operator activated="true" class="text:transform_cases" compatibility="8.1.000" expanded="true" height="68" name="Transform Cases" width="90" x="246" y="34"/>
          <operator activated="true" class="text:filter_stopwords_english" compatibility="8.1.000" expanded="true" height="68" name="Filter Stopwords (English)" width="90" x="380" y="34"/>
          <connect from_port="document" to_op="Tokenize" to_port="document"/>
          <connect from_op="Tokenize" from_port="document" to_op="Transform Cases" to_port="document"/>
          <connect from_op="Transform Cases" from_port="document" to_op="Filter Stopwords (English)" to_port="document"/>
          <connect from_op="Filter Stopwords (English)" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="set_role" compatibility="9.0.000" expanded="true" height="82" name="Set Role (5)" width="90" x="313" y="187">
        <parameter key="attribute_name" value="Text2"/>
        <parameter key="target_role" value="label"/>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="sample" compatibility="9.0.000" expanded="true" height="82" name="Sample" width="90" x="447" y="136">
        <parameter key="sample_size" value="0"/>
        <list key="sample_size_per_class"/>
        <list key="sample_ratio_per_class"/>
        <list key="sample_probability_per_class"/>
      </operator>
      <operator activated="true" class="concurrency:cross_validation" compatibility="9.0.000" expanded="true" height="145" name="Cross Validation" width="90" x="581" y="34">
        <process expanded="true">
          <operator activated="true" class="set_role" compatibility="9.0.000" expanded="true" height="82" name="Set Role (2)" width="90" x="44" y="34">
            <parameter key="attribute_name" value="Text2"/>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="set_role" compatibility="9.0.000" expanded="true" height="82" name="Set Role (4)" width="90" x="112" y="136">
            <parameter key="attribute_name" value="Text2"/>
            <parameter key="target_role" value="label"/>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="classification_by_regression" compatibility="9.0.000" expanded="true" height="82" name="Classification by Regression" width="90" x="254" y="85">
            <process expanded="true">
              <operator activated="true" class="support_vector_machine" compatibility="9.0.000" expanded="true" height="124" name="SVM" origin="GENERATED_TEMPLATE" width="90" x="246" y="34"/>
              <connect from_port="training set" to_op="SVM" to_port="training set"/>
              <connect from_op="SVM" from_port="model" to_port="model"/>
              <portSpacing port="source_training set" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
            </process>
          </operator>
          <connect from_port="training set" to_op="Set Role (2)" to_port="example set input"/>
          <connect from_op="Set Role (2)" from_port="example set output" to_op="Set Role (4)" to_port="example set input"/>
          <connect from_op="Set Role (4)" from_port="example set output" to_op="Classification by Regression" to_port="training set"/>
          <connect from_op="Classification by Regression" from_port="model" to_port="model"/>
          <portSpacing port="source_training set" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
        </process>
        <process expanded="true">
          <operator activated="true" class="apply_model" compatibility="9.0.000" expanded="true" height="82" name="Apply Model" width="90" x="45" y="34">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="set_role" compatibility="9.0.000" expanded="true" height="82" name="Set Role (3)" width="90" x="179" y="136">
            <parameter key="attribute_name" value="Text2"/>
            <parameter key="target_role" value="label"/>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="performance" compatibility="9.0.000" expanded="true" height="82" name="Performance" width="90" x="246" y="34"/>
          <connect from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Set Role (3)" to_port="example set input"/>
          <connect from_op="Set Role (3)" from_port="example set output" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Performance" from_port="performance" to_port="performance 1"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_test set results" spacing="0"/>
          <portSpacing port="sink_performance 1" spacing="0"/>
          <portSpacing port="sink_performance 2" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Trained Data" from_port="output" to_op="Nominal to Text" to_port="example set input"/>
      <connect from_op="Nominal to Text" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
      <connect from_op="Generate Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_op="Process Documents from Data" to_port="example set"/>
      <connect from_op="Process Documents from Data" from_port="example set" to_op="Set Role (5)" to_port="example set input"/>
      <connect from_op="Set Role (5)" from_port="example set output" to_op="Sample" to_port="example set input"/>
      <connect from_op="Sample" from_port="example set output" to_op="Cross Validation" to_port="example set"/>
      <connect from_op="Cross Validation" from_port="model" to_port="result 2"/>
      <connect from_op="Cross Validation" from_port="example set" to_port="result 1"/>
      <connect from_op="Cross Validation" from_port="performance 1" to_port="result 3"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
      <portSpacing port="sink_result 4" spacing="0"/>
    </process>
  </operator>
</process>

Thomas_Ott · August 2018

@alinebora You do not provide a label. What are you trying to predict from all these tweets? Negative/positive sentiment?

alinebora · August 2018

I'm trying to analyze passenger sentiment in the airline industry and find out what kind of insights text mining of Twitter data can bring;
I cannot focus on only one company, so I selected 6 popular airlines;
RapidMiner's Search Twitter operator only allows one query at a time, so I extracted the data from the Twitter API directly, saved it in CSV files and loaded them into RapidMiner;
Then I used the 'Append' operator to combine them all;
I have provided a label. The 'Set Role' operator is there. You see... the process was working fine until the 'Process Documents' operator;
Inside 'Cross Validation', SVM and 'Performance' do not recognize my label anymore. RapidMiner suggested a quick fix, which added another 'Set Role' operator, but that does not help.
I've been trying for days to solve this, but I'm running out of time. What I did then was open the extracted CSV for each airline, which, as you may know, is full of useless attributes, and delete all of them to see if that worked (this may have been a mistake, but I'm learning, sorry). The same problem remains.
Should I have kept the spreadsheets as they were, full of the other attributes? (There were almost 80 funky attributes.)

Thank you so much for your time. I truly appreciate it; you guys are the only help I can count on.

I hope to solve this. It's for my thesis. I should have started writing the analysis last week, but I've been having these problems... and I have to deliver on August 31st.

Thomas_Ott · August 2018

@alinebora you set user_id as the label, but there is no user_id in the CSV files you attached. Conceptually that will not work either: hundreds of different users are tweeting about an airline, so the model can't find any relationship between the tweets and the user_id.

From the create document operator I see you entered some negative related tweets. I'm assuming that this is all about sentiment. If that's the case you need to create a training set with a label column that has negative or positive in it. From there you can text process the tweet and the label remains intact. That gets passed to the Cross Validation and from there you can measure your performance. Your process setup would appear to work but you're not feeding the information correctly. That's why nothing happens. That's the flaw.
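
To make the idea concrete outside RapidMiner, here is a minimal plain-Python sketch (not RapidMiner code) of what "a training set with a label column" means. The four tweets, their labels, and the helper names `to_word_vector` and `k_folds` are made up for illustration; the point is only that each row keeps its label while the text is turned into word counts, and that the labeled rows are what get split into folds.

```python
import re
from collections import Counter

# Hypothetical hand-labeled tweets: each row has the text AND a sentiment label.
training_set = [
    {"text": "Lost my luggage again, terrible service", "label": "negative"},
    {"text": "Great crew and a smooth flight", "label": "positive"},
    {"text": "Flight delayed three hours, no updates", "label": "negative"},
    {"text": "Friendly staff, boarding was quick", "label": "positive"},
]

def to_word_vector(row):
    """Turn the text into token counts; the label is carried along unchanged."""
    tokens = re.findall(r"[a-z]+", row["text"].lower())
    return {"features": Counter(tokens), "label": row["label"]}

def k_folds(rows, k):
    """Yield (train, test) splits; every row lands in exactly one test fold."""
    for i in range(k):
        test = rows[i::k]
        train = [r for j, r in enumerate(rows) if j % k != i]
        yield train, test

vectors = [to_word_vector(r) for r in training_set]
folds = list(k_folds(vectors, k=2))
```

If the label column is missing, or points at something like a user id, there is nothing meaningful for a learner inside the folds to predict, which matches the symptom described above.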

If sentiment is not what you're after, maybe understanding topics that people are tweeting about, I suggest checking out the LDA operator. I believe it's in the Operator Toolbox extension. Also visit KB here for a tutorial on using the LDA operator.

Thomas_Ott · August 2018

@alinebora

See below a sample process. It uses the Retweet column as the label and trains a GLM on it. For future reference, you can Loop over the Search Twitter operator and load several search terms.

<?xml version="1.0" encoding="UTF-8"?><process version="8.2.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="8.2.001" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="social_media:search_twitter" compatibility="8.1.000" expanded="true" height="68" name="Search Twitter" width="90" x="45" y="34">
        <parameter key="connection" value="NewConnection"/>
        <parameter key="query" value="@airfrance"/>
      </operator>
      <operator activated="true" class="set_role" compatibility="8.2.001" expanded="true" height="82" name="Set Role" width="90" x="112" y="136">
        <parameter key="attribute_name" value="Retweet-Count"/>
        <parameter key="target_role" value="label"/>
        <list key="set_additional_roles">
          <parameter key="Id" value="id"/>
        </list>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="8.2.001" expanded="true" height="82" name="Select Attributes" width="90" x="246" y="136">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="Text"/>
      </operator>
      <operator activated="true" class="filter_examples" compatibility="8.2.001" expanded="true" height="103" name="Filter Examples" width="90" x="313" y="34">
        <list key="filters_list">
          <parameter key="filters_entry_key" value="Text.is_not_missing."/>
        </list>
      </operator>
      <operator activated="true" class="nominal_to_text" compatibility="8.2.001" expanded="true" height="82" name="Nominal to Text" width="90" x="447" y="34">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="Text"/>
        <parameter key="include_special_attributes" value="true"/>
      </operator>
      <operator activated="true" class="multiply" compatibility="8.2.001" expanded="true" height="103" name="Multiply" width="90" x="581" y="34"/>
      <operator activated="true" class="text:process_document_from_data" compatibility="8.1.000" expanded="true" height="82" name="Process Documents from Data" width="90" x="782" y="34">
        <parameter key="select_attributes_and_weights" value="true"/>
        <list key="specify_weights"/>
        <process expanded="true">
          <operator activated="true" class="text:tokenize" compatibility="8.1.000" expanded="true" height="68" name="Tokenize" width="90" x="112" y="34"/>
          <operator activated="true" class="text:transform_cases" compatibility="8.1.000" expanded="true" height="68" name="Transform Cases" width="90" x="246" y="34"/>
          <operator activated="true" class="text:filter_stopwords_english" compatibility="8.1.000" expanded="true" height="68" name="Filter Stopwords (English)" width="90" x="380" y="34"/>
          <connect from_port="document" to_op="Tokenize" to_port="document"/>
          <connect from_op="Tokenize" from_port="document" to_op="Transform Cases" to_port="document"/>
          <connect from_op="Transform Cases" from_port="document" to_op="Filter Stopwords (English)" to_port="document"/>
          <connect from_op="Filter Stopwords (English)" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="text:process_document_from_data" compatibility="8.1.000" expanded="true" height="82" name="Process Documents from Data (2)" width="90" x="715" y="187">
        <parameter key="select_attributes_and_weights" value="true"/>
        <list key="specify_weights"/>
        <process expanded="true">
          <operator activated="true" class="text:tokenize" compatibility="8.1.000" expanded="true" height="68" name="Tokenize (2)" width="90" x="112" y="34"/>
          <operator activated="true" class="text:transform_cases" compatibility="8.1.000" expanded="true" height="68" name="Transform Cases (2)" width="90" x="246" y="34"/>
          <operator activated="true" class="text:filter_stopwords_english" compatibility="8.1.000" expanded="true" height="68" name="Filter Stopwords (2)" width="90" x="380" y="34"/>
          <connect from_port="document" to_op="Tokenize (2)" to_port="document"/>
          <connect from_op="Tokenize (2)" from_port="document" to_op="Transform Cases (2)" to_port="document"/>
          <connect from_op="Transform Cases (2)" from_port="document" to_op="Filter Stopwords (2)" to_port="document"/>
          <connect from_op="Filter Stopwords (2)" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="concurrency:cross_validation" compatibility="8.2.001" expanded="true" height="145" name="Cross Validation" width="90" x="916" y="34">
        <process expanded="true">
          <operator activated="true" class="h2o:generalized_linear_model" compatibility="8.2.000" expanded="true" height="124" name="Generalized Linear Model" width="90" x="246" y="34">
            <list key="beta_constraints"/>
            <list key="expert_parameters"/>
          </operator>
          <connect from_port="training set" to_op="Generalized Linear Model" to_port="training set"/>
          <connect from_op="Generalized Linear Model" from_port="model" to_port="model"/>
          <portSpacing port="source_training set" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
        </process>
        <process expanded="true">
          <operator activated="true" class="apply_model" compatibility="8.2.001" expanded="true" height="82" name="Apply Model (2)" width="90" x="45" y="34">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance" compatibility="8.2.001" expanded="true" height="82" name="Performance" width="90" x="179" y="34"/>
          <connect from_port="model" to_op="Apply Model (2)" to_port="model"/>
          <connect from_port="test set" to_op="Apply Model (2)" to_port="unlabelled data"/>
          <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Performance" from_port="performance" to_port="performance 1"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_test set results" spacing="0"/>
          <portSpacing port="sink_performance 1" spacing="0"/>
          <portSpacing port="sink_performance 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="generate_id" compatibility="8.2.001" expanded="true" height="82" name="Generate ID" width="90" x="715" y="340"/>
      <operator activated="true" class="rename" compatibility="8.2.001" expanded="true" height="82" name="Rename" width="90" x="849" y="187">
        <parameter key="old_name" value="id"/>
        <parameter key="new_name" value="user_id"/>
        <list key="rename_additional_attributes"/>
      </operator>
      <operator activated="true" class="apply_model" compatibility="8.2.001" expanded="true" height="82" name="Apply Model" width="90" x="1050" y="187">
        <list key="application_parameters"/>
      </operator>
      <connect from_op="Search Twitter" from_port="output" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
      <connect from_op="Filter Examples" from_port="example set output" to_op="Nominal to Text" to_port="example set input"/>
      <connect from_op="Nominal to Text" from_port="example set output" to_op="Multiply" to_port="input"/>
      <connect from_op="Multiply" from_port="output 1" to_op="Process Documents from Data" to_port="example set"/>
      <connect from_op="Multiply" from_port="output 2" to_op="Process Documents from Data (2)" to_port="example set"/>
      <connect from_op="Process Documents from Data" from_port="example set" to_op="Cross Validation" to_port="example set"/>
      <connect from_op="Process Documents from Data" from_port="word list" to_op="Process Documents from Data (2)" to_port="word list"/>
      <connect from_op="Process Documents from Data (2)" from_port="example set" to_op="Generate ID" to_port="example set input"/>
      <connect from_op="Cross Validation" from_port="model" to_op="Apply Model" to_port="model"/>
      <connect from_op="Cross Validation" from_port="example set" to_port="result 1"/>
      <connect from_op="Cross Validation" from_port="performance 1" to_port="result 3"/>
      <connect from_op="Generate ID" from_port="example set output" to_op="Rename" to_port="example set input"/>
      <connect from_op="Rename" from_port="example set output" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="Apply Model" from_port="labelled data" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
      <portSpacing port="sink_result 4" spacing="0"/>
    </process>
  </operator>
</process>

Thomas_Ott · August 2018

@alinebora additionally, your tokenization parameter set to 'non-letters' will wipe out things like '#airfranceisgreat' or '@' symbols, stuff that might be valuable to your model. My suggestion is to look at the process I share here: http://www.neuralmarkettrends.com/use-rapidminer-discover-twitter-content/ and see how I did the tokenization and extraction of Twitter data.
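
As a rough illustration of the effect (plain Python with the standard `re` module, not the Tokenize operator itself), splitting on every non-letter character drops the '#' and '@' markers. The sample tweet and both regular expressions are assumptions chosen for the demo; the second pattern is just one possible way to keep hashtags and mentions intact.

```python
import re

# Made-up sample tweet for the demonstration.
tweet = "Thanks @airfrance #airfranceisgreat flight AF123 was on time"

# Splitting on every non-letter character: '#', '@' and digits act as
# separators, so they disappear from the resulting tokens.
non_letter_tokens = [t for t in re.split(r"[^A-Za-z]+", tweet) if t]

# Alternative: keep an optional leading '#' or '@' attached to the word after it.
twitter_tokens = re.findall(r"[#@]?\w+", tweet)

print(non_letter_tokens)
print(twitter_tokens)
```

With the first approach a hashtag and the same word written as ordinary text become indistinguishable, which may or may not matter for a given model.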

alinebora · August 2018

Dear Thomas,

Thanks for your reply. Is it possible to search for more than one query in RapidMiner?

Perhaps I could add the 'Search Twitter' operator, run it, and store each result individually, as follows:

1) Run the search 5 times (5 different queries) to get the tweets I need;

2) Use the 'Store' operator to save each result in my local repository;

3) Start a new process, retrieve the 5 datasets created previously, and then use the 'Append' operator to merge them all together.

Will this make it work?

Will this make my analysis possible avoiding the previous errors mentioned?
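
The three steps above can be sketched in plain Python (this is not RapidMiner code). The function name `append_and_dedupe`, the sample rows, and the extra de-duplication on a tweet `Id` column are assumptions added for illustration: a tweet that matches more than one query would otherwise appear more than once after appending.

```python
def append_and_dedupe(result_sets, key="Id"):
    """Concatenate several query results and keep the first row seen per key."""
    seen = set()
    merged = []
    for rows in result_sets:
        for row in rows:
            if row[key] not in seen:
                seen.add(row[key])
                merged.append(row)
    return merged

# Hypothetical results of two separate searches; tweet 2 matches both queries.
ryanair = [
    {"Id": 1, "Text": "ryanair delayed"},
    {"Id": 2, "Text": "ryanair and easyjet both late"},
]
easyjet = [
    {"Id": 2, "Text": "ryanair and easyjet both late"},
    {"Id": 3, "Text": "easyjet was fine"},
]

merged = append_and_dedupe([ryanair, easyjet])
```

Note that merging the data this way only solves the "one query at a time" limitation; it does not by itself supply a sentiment label.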

alinebora · August 2018

@Thomas_Ott I opened your link, but since I'm new to RapidMiner, I don't know how to read and understand an XML process like the one you posted.

Thomas_Ott · August 2018

@alinebora please read this: https://community.rapidminer.com/t5/RapidMiner-Studio-Knowledge-Base/How-can-I-share-processes-without-RapidMiner-Server/ta-p/37047

alinebora · August 2018

@Thomas_Ott I managed to paste your XML process into my process panel, but now I'm getting this error (please note that I updated each connection individually to validate my credentials by clicking the Twitter button and testing the connection):

Thomas_Ott · August 2018

@alinebora double check to make sure your connection is correct. You might have to go into each Twitter operator and reselect the connection.

rfuentealba · August 2018

Hi @alinebora,

That error you get might be because @Thomas_Ott is using his own Twitter credentials (consumer and api key) to create this process. To solve that issue, you should set up yours on each Search Twitter operator. To do that, you should go to:

Connections > Manage Connections

It's a pretty straightforward process.

You must then replace the connection set on the Search Twitter operators with the one you have set up, et voilà.

All the best,

Rodrigo.

alinebora · August 2018

@rfuentealba

I had already done that as you described. Nevertheless, I decided to create a new connection, requested a new PIN, and updated each Search Twitter operator one by one, and now I get this error TT

alinebora · August 2018

@rfuentealba @Thomas_Ott

An update of my last XML process...

<?xml version="1.0" encoding="UTF-8"?><process version="9.0.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.0.000" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="subprocess" compatibility="9.0.000" expanded="true" height="82" name="Retrieve Twitter Data" width="90" x="45" y="34">
        <process expanded="true">
          <operator activated="true" class="set_macros" compatibility="9.0.000" expanded="true" height="68" name="Set Macros" width="90" x="45" y="34">
            <list key="macros">
              <parameter key="keyword1" value="#flight"/>
              <parameter key="keyword2" value="#airlines"/>
              <parameter key="keyword3" value="#airport"/>
              <parameter key="retweetcount" value="0"/>
            </list>
            <description align="center" color="transparent" colored="false" width="126">Set global variables here. Such as keyword search.</description>
          </operator>
          <operator activated="false" class="retrieve" compatibility="9.0.000" expanded="true" height="68" name="Retrieve Twitter Content Ideas" width="90" x="45" y="340">
            <parameter key="repository_entry" value="../data/%{keyword1} Twitter Content Ideas"/>
          </operator>
          <operator activated="true" class="social_media:search_twitter" compatibility="8.0.010" expanded="true" height="68" name="Search Twitter for Keyword3" width="90" x="179" y="238">
            <parameter key="connection" value="AlineConnection"/>
            <parameter key="query" value="airfrance"/>
            <parameter key="limit" value="2000"/>
            <parameter key="language" value="en"/>
          </operator>
          <operator activated="true" class="social_media:search_twitter" compatibility="8.0.010" expanded="true" height="68" name="Search Twitter for Keyword2" width="90" x="179" y="136">
            <parameter key="connection" value="AlineConnection"/>
            <parameter key="query" value="easyjet"/>
            <parameter key="limit" value="2000"/>
            <parameter key="language" value="en"/>
          </operator>
          <operator activated="true" class="social_media:search_twitter" compatibility="8.0.010" expanded="true" height="68" name="Search Twitter for Keyword 1" width="90" x="179" y="34">
            <parameter key="connection" value="AlineConnection"/>
            <parameter key="query" value="ryanair"/>
            <parameter key="limit" value="2000"/>
            <parameter key="language" value="en"/>
          </operator>
          <operator activated="true" class="social_media:search_twitter" compatibility="9.0.000" expanded="true" height="68" name="Search Twitter for Keyword 4" width="90" x="179" y="340">
            <parameter key="connection" value="AlineConnection"/>
            <parameter key="query" value="alitalia"/>
            <parameter key="limit" value="1000"/>
            <parameter key="language" value="en"/>
          </operator>
          <operator activated="true" class="social_media:search_twitter" compatibility="9.0.000" expanded="true" height="68" name="Search Twitter" width="90" x="313" y="340">
            <parameter key="connection" value="AlineConnection"/>
            <parameter key="query" value="klm"/>
            <parameter key="limit" value="1000"/>
          </operator>
          <operator activated="true" class="append" compatibility="9.0.000" expanded="true" height="166" name="Append Data Set together" width="90" x="447" y="34"/>
          <operator activated="true" class="remove_duplicates" compatibility="9.0.000" expanded="true" height="103" name="Remove Duplicate IDs" width="90" x="581" y="34">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Id"/>
            <parameter key="include_special_attributes" value="true"/>
          </operator>
          <operator activated="true" class="store" compatibility="9.0.000" expanded="true" height="68" name="Store Data for later reuse" width="90" x="715" y="34">
            <parameter key="repository_entry" value="//Local Repository/processes/Thom1"/>
          </operator>
          <connect from_op="Search Twitter for Keyword3" from_port="output" to_op="Append Data Set together" to_port="example set 3"/>
          <connect from_op="Search Twitter for Keyword2" from_port="output" to_op="Append Data Set together" to_port="example set 2"/>
          <connect from_op="Search Twitter for Keyword 1" from_port="output" to_op="Append Data Set together" to_port="example set 1"/>
          <connect from_op="Search Twitter for Keyword 4" from_port="output" to_op="Append Data Set together" to_port="example set 4"/>
          <connect from_op="Search Twitter" from_port="output" to_op="Append Data Set together" to_port="example set 5"/>
          <connect from_op="Append Data Set together" from_port="merged set" to_op="Remove Duplicate IDs" to_port="example set input"/>
          <connect from_op="Remove Duplicate IDs" from_port="example set output" to_op="Store Data for later reuse" to_port="input"/>
          <connect from_op="Store Data for later reuse" from_port="through" to_port="out 1"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
        </process>
        <description align="center" color="transparent" colored="false" width="126">Retrieves Twitter Data, Appends, and Stores</description>
      </operator>
      <operator activated="true" class="subprocess" compatibility="9.0.000" expanded="true" height="82" name="ETL Subprocess" width="90" x="179" y="34">
        <process expanded="true">
          <operator activated="true" class="remove_duplicates" compatibility="9.0.000" expanded="true" height="103" name="Remove Duplicates" width="90" x="45" y="34">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="From-User"/>
            <description align="center" color="transparent" colored="false" width="126">Remove Duplicate Tweets from same user</description>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="9.0.000" expanded="true" height="82" name="Generate Arbitrary Label" width="90" x="179" y="34">
            <list key="function_descriptions">
              <parameter key="label" value="if([Retweet-Count]&lt;eval(%{retweetcount}),&quot;Not Important&quot;,&quot;Important&quot;)"/>
            </list>
          </operator>
          <operator activated="false" class="filter_examples" compatibility="9.0.000" expanded="true" height="103" name="Filter Examples" width="90" x="313" y="34">
            <parameter key="invert_filter" value="true"/>
            <list key="filters_list">
              <parameter key="filters_entry_key" value="Text.contains.RT"/>
            </list>
          </operator>
          <operator activated="true" class="set_role" compatibility="9.0.000" expanded="true" height="82" name="Set Role" width="90" x="447" y="34">
            <parameter key="attribute_name" value="label"/>
            <parameter key="target_role" value="label"/>
            <list key="set_additional_roles"/>
            <description align="center" color="transparent" colored="false" width="126">Set Role for Label</description>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="9.0.000" expanded="true" height="82" name="Select Attributes" width="90" x="581" y="34">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attributes" value="Text|label"/>
            <parameter key="include_special_attributes" value="true"/>
          </operator>
          <operator activated="true" class="nominal_to_text" compatibility="9.0.000" expanded="true" height="82" name="Nominal to Text" width="90" x="715" y="34">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Text"/>
          </operator>
          <operator activated="true" class="extract_macro" compatibility="9.0.000" expanded="true" height="68" name="Extract Macro (3)" width="90" x="849" y="34">
            <parameter key="macro" value="label_count"/>
            <parameter key="macro_type" value="statistics"/>
            <parameter key="statistics" value="count"/>
            <parameter key="attribute_name" value="label"/>
            <parameter key="attribute_value" value="Important"/>
            <list key="additional_macros"/>
          </operator>
          <connect from_port="in 1" to_op="Remove Duplicates" to_port="example set input"/>
          <connect from_op="Remove Duplicates" from_port="example set output" to_op="Generate Arbitrary Label" to_port="example set input"/>
          <connect from_op="Generate Arbitrary Label" from_port="example set output" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Nominal to Text" to_port="example set input"/>
          <connect from_op="Nominal to Text" from_port="example set output" to_op="Extract Macro (3)" to_port="example set"/>
          <connect from_op="Extract Macro (3)" from_port="example set" to_port="out 1"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="source_in 2" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
        </process>
        <description align="center" color="transparent" colored="false" width="126">Binning for Label subprocess - suspect</description>
      </operator>
      <operator activated="true" class="text:process_document_from_data" compatibility="8.1.000" expanded="true" height="82" name="Process Documents from Data" width="90" x="313" y="34">
        <parameter key="prune_method" value="percentual"/>
        <parameter key="prune_below_percent" value="5.0"/>
        <parameter key="prune_above_percent" value="50.0"/>
        <parameter key="prune_below_absolute" value="100"/>
        <parameter key="prune_above_absolute" value="500"/>
        <list key="specify_weights"/>
        <process expanded="true">
          <operator activated="true" class="text:extract_information" compatibility="8.1.000" expanded="true" height="68" name="Extract Links for later use" width="90" x="45" y="34">
            <parameter key="query_type" value="Regular Expression"/>
            <list key="string_machting_queries"/>
            <list key="regular_expression_queries">
              <parameter key="Tweet Links" value="http.*"/>
            </list>
            <list key="regular_region_queries"/>
            <list key="xpath_queries"/>
            <list key="namespaces"/>
            <list key="index_queries"/>
            <list key="jsonpath_queries"/>
          </operator>
          <operator activated="true" class="text:replace_tokens" compatibility="8.1.000" expanded="true" height="68" name="Replace http links" width="90" x="179" y="34">
            <list key="replace_dictionary">
              <parameter key="http.*" value="link"/>
            </list>
          </operator>
          <operator activated="true" class="text:tokenize" compatibility="8.1.000" expanded="true" height="68" name="Tokenize" width="90" x="313" y="34">
            <parameter key="mode" value="specify characters"/>
            <parameter key="characters" value=" .!;:[,' ?]"/>
          </operator>
          <operator activated="true" class="text:transform_cases" compatibility="8.1.000" expanded="true" height="68" name="Transform Cases" width="90" x="447" y="34"/>
          <operator activated="true" class="text:filter_by_length" compatibility="8.1.000" expanded="true" height="68" name="Filter Tokens (by Length)" width="90" x="581" y="34"/>
          <operator activated="true" class="text:generate_n_grams_terms" compatibility="8.1.000" expanded="true" height="68" name="Generate n-Grams (Terms)" width="90" x="715" y="34"/>
          <operator activated="true" class="text:filter_tokens_by_content" compatibility="8.1.000" expanded="true" height="68" name="Filter Tokens (by Content)" width="90" x="849" y="34">
            <parameter key="string" value="link"/>
            <parameter key="invert condition" value="true"/>
          </operator>
          <operator activated="true" class="text:filter_stopwords_english" compatibility="8.1.000" expanded="true" height="68" name="Filter Stopwords (English)" width="90" x="983" y="34"/>
          <connect from_port="document" to_op="Extract Links for later use" to_port="document"/>
          <connect from_op="Extract Links for later use" from_port="document" to_op="Replace http links" to_port="document"/>
          <connect from_op="Replace http links" from_port="document" to_op="Tokenize" to_port="document"/>
          <connect from_op="Tokenize" from_port="document" to_op="Transform Cases" to_port="document"/>
          <connect from_op="Transform Cases" from_port="document" to_op="Filter Tokens (by Length)" to_port="document"/>
          <connect from_op="Filter Tokens (by Length)" from_port="document" to_op="Generate n-Grams (Terms)" to_port="document"/>
          <connect from_op="Generate n-Grams (Terms)" from_port="document" to_op="Filter Tokens (by Content)" to_port="document"/>
          <connect from_op="Filter Tokens (by Content)" from_port="document" to_op="Filter Stopwords (English)" to_port="document"/>
          <connect from_op="Filter Stopwords (English)" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="multiply" compatibility="9.0.000" expanded="true" height="103" name="Multiply" width="90" x="447" y="34"/>
      <operator activated="true" class="subprocess" compatibility="9.0.000" expanded="true" height="103" name="Clustering Stuff" width="90" x="581" y="34">
        <process expanded="true">
          <operator activated="true" class="select_attributes" compatibility="9.0.000" expanded="true" height="82" name="Remove Tweet Links" width="90" x="45" y="34">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Tweet Links"/>
            <parameter key="attributes" value="Tweet Links"/>
            <parameter key="invert_selection" value="true"/>
          </operator>
          <operator activated="true" class="x_means" compatibility="7.5.003" expanded="true" height="82" name="X-Means" width="90" x="179" y="34">
            <parameter key="measure_types" value="BregmanDivergences"/>
            <parameter key="divergence" value="SquaredEuclideanDistance"/>
          </operator>
          <operator activated="true" class="extract_prototypes" compatibility="9.0.000" expanded="true" height="82" name="Extract Cluster Prototypes" width="90" x="313" y="136"/>
          <operator activated="true" class="store" compatibility="9.0.000" expanded="true" height="68" name="Store Cluster Model" width="90" x="447" y="34">
            <parameter key="repository_entry" value="../results/%{keyword1} Twitter Content Cluster Model"/>
          </operator>
          <connect from_port="in 1" to_op="Remove Tweet Links" to_port="example set input"/>
          <connect from_op="Remove Tweet Links" from_port="example set output" to_op="X-Means" to_port="example set"/>
          <connect from_op="X-Means" from_port="cluster model" to_op="Extract Cluster Prototypes" to_port="model"/>
          <connect from_op="Extract Cluster Prototypes" from_port="example set" to_op="Store Cluster Model" to_port="input"/>
          <connect from_op="Extract Cluster Prototypes" from_port="model" to_port="out 2"/>
          <connect from_op="Store Cluster Model" from_port="through" to_port="out 1"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="source_in 2" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
          <portSpacing port="sink_out 3" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="store" compatibility="9.0.000" expanded="true" height="68" name="Store WordList" width="90" x="447" y="289">
        <parameter key="repository_entry" value="../results/%{keyword1} Twitter Content Ideas Wordlist"/>
      </operator>
      <operator activated="true" class="text:wordlist_to_data" compatibility="8.1.000" expanded="true" height="82" name="WordList to Data" width="90" x="581" y="289"/>
      <operator activated="true" class="sort" compatibility="9.0.000" expanded="true" height="82" name="Sort" width="90" x="715" y="289">
        <parameter key="attribute_name" value="total"/>
        <parameter key="sorting_direction" value="decreasing"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="9.0.000" expanded="true" height="82" name="Remove Tweet Links (2)" width="90" x="581" y="187">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="Tweet Links"/>
        <parameter key="attributes" value="Tweet Links"/>
        <parameter key="invert_selection" value="true"/>
      </operator>
      <operator activated="true" class="subprocess" compatibility="9.0.000" expanded="true" height="82" name="Determine Influence Factors" width="90" x="715" y="187">
        <process expanded="true">
          <operator activated="true" class="weight_by_correlation" compatibility="9.0.000" expanded="true" height="82" name="Weight by Correlation" width="90" x="45" y="34"/>
          <operator activated="true" class="weights_to_data" compatibility="9.0.000" expanded="true" height="68" name="Weights to Data" width="90" x="179" y="34"/>
          <operator activated="true" class="generate_attributes" compatibility="6.4.000" expanded="true" height="82" name="Generate Attributes (2)" width="90" x="313" y="34">
            <list key="function_descriptions">
              <parameter key="Method" value="&quot;Correlation&quot;"/>
            </list>
          </operator>
          <operator activated="true" class="weight_by_gini_index" compatibility="9.0.000" expanded="true" height="82" name="Weight by Gini Index" width="90" x="45" y="120"/>
          <operator activated="true" class="weight_by_information_gain" compatibility="9.0.000" expanded="true" height="82" name="Weight by Information Gain" width="90" x="45" y="210"/>
          <operator activated="true" class="weight_by_information_gain_ratio" compatibility="9.0.000" expanded="true" height="82" name="Weight by Information Gain Ratio" width="90" x="45" y="300"/>
          <operator activated="true" class="weights_to_data" compatibility="9.0.000" expanded="true" height="68" name="Weights to Data (2)" width="90" x="179" y="120"/>
          <operator activated="true" class="generate_attributes" compatibility="6.4.000" expanded="true" height="82" name="Generate Attributes (3)" width="90" x="313" y="120">
            <list key="function_descriptions">
              <parameter key="Method" value="&quot;Gini&quot;"/>
            </list>
          </operator>
          <operator activated="true" class="weights_to_data" compatibility="9.0.000" expanded="true" height="68" name="Weights to Data (3)" width="90" x="179" y="210"/>
          <operator activated="true" class="generate_attributes" compatibility="6.4.000" expanded="true" height="82" name="Generate Attributes (4)" width="90" x="313" y="210">
            <list key="function_descriptions">
              <parameter key="Method" value="&quot;InfoGain&quot;"/>
            </list>
          </operator>
          <operator activated="true" class="weights_to_data" compatibility="9.0.000" expanded="true" height="68" name="Weights to Data (4)" width="90" x="179" y="300"/>
          <operator activated="true" class="generate_attributes" compatibility="6.4.000" expanded="true" height="82" name="Generate Attributes (5)" width="90" x="313" y="300">
            <list key="function_descriptions">
              <parameter key="Method" value="&quot;InfoGainRatio&quot;"/>
            </list>
          </operator>
          <operator activated="true" class="append" compatibility="9.0.000" expanded="true" height="145" name="Append" width="90" x="447" y="30"/>
          <operator activated="true" class="pivot" compatibility="9.0.000" expanded="true" height="82" name="Pivot" width="90" x="581" y="30">
            <parameter key="group_attribute" value="Attribute"/>
            <parameter key="index_attribute" value="Method"/>
          </operator>
          <operator activated="true" class="generate_aggregation" compatibility="6.5.002" expanded="true" height="82" name="Generate Aggregation" width="90" x="715" y="30">
            <parameter key="attribute_name" value="Importance"/>
            <parameter key="attribute_filter_type" value="value_type"/>
            <parameter key="value_type" value="numeric"/>
            <parameter key="aggregation_function" value="average"/>
          </operator>
          <operator activated="true" class="normalize" compatibility="7.5.003" expanded="true" height="103" name="Normalize" width="90" x="849" y="30">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Importance"/>
            <parameter key="method" value="range transformation"/>
          </operator>
          <operator activated="true" class="sort" compatibility="9.0.000" expanded="true" height="82" name="Sort again" width="90" x="983" y="34">
            <parameter key="attribute_name" value="Importance"/>
            <parameter key="sorting_direction" value="decreasing"/>
          </operator>
          <operator activated="true" class="order_attributes" compatibility="9.0.000" expanded="true" height="82" name="Reorder Attributes" width="90" x="1117" y="34">
            <parameter key="attribute_ordering" value="Attribute|Importance"/>
            <parameter key="handle_unmatched" value="remove"/>
          </operator>
          <operator activated="true" class="filter_example_range" compatibility="9.0.000" expanded="true" height="82" name="Select Top 20" width="90" x="1251" y="34">
            <parameter key="first_example" value="1"/>
            <parameter key="last_example" value="20"/>
          </operator>
          <connect from_port="in 1" to_op="Weight by Correlation" to_port="example set"/>
          <connect from_op="Weight by Correlation" from_port="weights" to_op="Weights to Data" to_port="attribute weights"/>
          <connect from_op="Weight by Correlation" from_port="example set" to_op="Weight by Gini Index" to_port="example set"/>
          <connect from_op="Weights to Data" from_port="example set" to_op="Generate Attributes (2)" to_port="example set input"/>
          <connect from_op="Generate Attributes (2)" from_port="example set output" to_op="Append" to_port="example set 1"/>
          <connect from_op="Weight by Gini Index" from_port="weights" to_op="Weights to Data (2)" to_port="attribute weights"/>
          <connect from_op="Weight by Gini Index" from_port="example set" to_op="Weight by Information Gain" to_port="example set"/>
          <connect from_op="Weight by Information Gain" from_port="weights" to_op="Weights to Data (3)" to_port="attribute weights"/>
          <connect from_op="Weight by Information Gain" from_port="example set" to_op="Weight by Information Gain Ratio" to_port="example set"/>
          <connect from_op="Weight by Information Gain Ratio" from_port="weights" to_op="Weights to Data (4)" to_port="attribute weights"/>
          <connect from_op="Weights to Data (2)" from_port="example set" to_op="Generate Attributes (3)" to_port="example set input"/>
          <connect from_op="Generate Attributes (3)" from_port="example set output" to_op="Append" to_port="example set 2"/>
          <connect from_op="Weights to Data (3)" from_port="example set" to_op="Generate Attributes (4)" to_port="example set input"/>
          <connect from_op="Generate Attributes (4)" from_port="example set output" to_op="Append" to_port="example set 3"/>
          <connect from_op="Weights to Data (4)" from_port="example set" to_op="Generate Attributes (5)" to_port="example set input"/>
          <connect from_op="Generate Attributes (5)" from_port="example set output" to_op="Append" to_port="example set 4"/>
          <connect from_op="Append" from_port="merged set" to_op="Pivot" to_port="example set input"/>
          <connect from_op="Pivot" from_port="example set output" to_op="Generate Aggregation" to_port="example set input"/>
          <connect from_op="Generate Aggregation" from_port="example set output" to_op="Normalize" to_port="example set input"/>
          <connect from_op="Normalize" from_port="example set output" to_op="Sort again" to_port="example set input"/>
          <connect from_op="Sort again" from_port="example set output" to_op="Reorder Attributes" to_port="example set input"/>
          <connect from_op="Reorder Attributes" from_port="example set output" to_op="Select Top 20" to_port="example set input"/>
          <connect from_op="Select Top 20" from_port="example set output" to_port="out 1"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="source_in 2" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="store" compatibility="9.0.000" expanded="true" height="68" name="Store Influence Wrds" width="90" x="849" y="187">
        <parameter key="repository_entry" value="../results/%{keyword1} Twitter Content Influence Words"/>
      </operator>
      <operator activated="true" class="write_excel" compatibility="9.0.000" expanded="true" height="82" name="Write Important Words" width="90" x="983" y="187">
        <parameter key="excel_file" value="C:\Users\Thomas Ott\Dropbox\Twitter Influencers\%{keyword1} Todays Powerful Words to use in your Tweets.xlsx"/>
      </operator>
      <connect from_op="Retrieve Twitter Data" from_port="out 1" to_op="ETL Subprocess" to_port="in 1"/>
      <connect from_op="ETL Subprocess" from_port="out 1" to_op="Process Documents from Data" to_port="example set"/>
      <connect from_op="Process Documents from Data" from_port="example set" to_op="Multiply" to_port="input"/>
      <connect from_op="Process Documents from Data" from_port="word list" to_op="Store WordList" to_port="input"/>
      <connect from_op="Multiply" from_port="output 1" to_op="Clustering Stuff" to_port="in 1"/>
      <connect from_op="Multiply" from_port="output 2" to_op="Remove Tweet Links (2)" to_port="example set input"/>
      <connect from_op="Clustering Stuff" from_port="out 1" to_port="result 1"/>
      <connect from_op="Clustering Stuff" from_port="out 2" to_port="result 2"/>
      <connect from_op="Store WordList" from_port="through" to_op="WordList to Data" to_port="word list"/>
      <connect from_op="WordList to Data" from_port="example set" to_op="Sort" to_port="example set input"/>
      <connect from_op="Sort" from_port="example set output" to_port="result 4"/>
      <connect from_op="Remove Tweet Links (2)" from_port="example set output" to_op="Determine Influence Factors" to_port="in 1"/>
      <connect from_op="Determine Influence Factors" from_port="out 1" to_op="Store Influence Wrds" to_port="input"/>
      <connect from_op="Store Influence Wrds" from_port="through" to_op="Write Important Words" to_port="input"/>
      <connect from_op="Write Important Words" from_port="through" to_port="result 3"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="63"/>
      <portSpacing port="sink_result 3" spacing="126"/>
      <portSpacing port="sink_result 4" spacing="84"/>
      <portSpacing port="sink_result 5" spacing="0"/>
    </process>
  </operator>
</process>

rfuentealba · August 2018

Great job, sensei @Thomas_Ott!

On a side note, I have found that having everything in a single process confuses some people, including me. What I am sharing here as a zip file is a RapidMiner Local Repository that can help @alinebora organize her work step by step and make it more modular. It is not as advanced as what Thomas did, though. I might, for the sake of science, complete the process and post a new version of it.

@alinebora, I would like to point out that, for your research, your professors or someone else might want to look at your process, and for that you should share the data you collected in the first place. Writing modular, sequential RapidMiner processes this way helps you build a process that reads data from CSV instead of directly from Twitter, and reuses everything else.

It is a good idea to categorize your data into staging (raw data, without any processing), parameters (categorical data that you use for filtering at some point), facts (processed data that tells the story you want to tell), models (yes, you can store the result of your trained algorithms) and calculations (your data after you apply certain models; you can use this data for further reinforcement learning in other projects). This kind of OCD organization has saved me thousands of times in my research projects.
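
As a rough sketch of that layout (this is not taken from the shared zip file), a small staging process could read the CSV export once and store an untouched copy in a staging folder, so that every later process retrieves the data from there. The operator classes and parameter keys are ones already used in the processes in this thread; the file path, the operator name "Store Staging Data" and the repository entry "../staging/tweets_raw" are placeholders chosen for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?><process version="9.0.000">
  <operator activated="true" class="process" compatibility="9.0.000" expanded="true" name="Process">
    <process expanded="true">
      <!-- Step 1: load the raw export. The path is a placeholder. -->
      <operator activated="true" class="read_csv" compatibility="9.0.000" expanded="true" height="68" name="Read CSV" width="90" x="45" y="34">
        <parameter key="csv_file" value="C:\path\to\tweets.csv"/>
        <parameter key="column_separators" value=","/>
        <parameter key="encoding" value="windows-1252"/>
        <list key="annotations"/>
      </operator>
      <!-- Step 2: keep an unprocessed copy in a staging folder (hypothetical entry name). -->
      <operator activated="true" class="store" compatibility="9.0.000" expanded="true" height="68" name="Store Staging Data" width="90" x="179" y="34">
        <parameter key="repository_entry" value="../staging/tweets_raw"/>
      </operator>
      <connect from_op="Read CSV" from_port="output" to_op="Store Staging Data" to_port="input"/>
      <connect from_op="Store Staging Data" from_port="through" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
```

A follow-up process would then start with a Retrieve operator pointing at the same entry, and store its own outputs under facts, models or calculations in the same way.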

To use this repository, unzip it somewhere (the default location is the .RapidMiner/repositories directory in your home folder, but you can put it elsewhere if you prefer) and create a new Local Repository pointing to that folder.

Hope it helps,

sgenzer · August 2018

Thank you all for such an amazing RM Community response!

I have posted @Thomas_Ott's solution in the new Community Repo. You can access it directly by clicking on this link.

Scott
