Does the Extract Sentiment operator work with French words?

EL75 Member Posts: 43 Contributor II
Hi,
Could someone tell me whether VADER or WordNet handle French when one of them is selected in the "Extract Sentiment" operator?
- A WordNet exists for French (WOLF): http://pauillac.inria.fr/~sagot/index.html#wolf
- VADER has also been ported to French: https://github.com/thomas7lieues/vader_FR

But what about the standard RapidMiner operator? I have found no way to configure it, and nothing in the help window either.
If the standard RapidMiner operator doesn't work for French, is there a way to connect RapidMiner to the French projects mentioned above?
Thanks.

Best Answers

  • MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
    Solution Accepted
    Hi,

    there is something odd with the escaping of "/" and similar characters. Please try this process, and adapt the csv_file path of the Read CSV operator so that it points to your downloaded copy of: https://raw.githubusercontent.com/thomas7lieues/vader_FR/master/vaderSentiment_fr/fr_lexicon.txt
    Best,
    Martin
    <?xml version="1.0" encoding="UTF-8"?><process version="9.8.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.8.000" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="read_csv" compatibility="9.8.000" expanded="true" height="68" name="Read CSV" width="90" x="246" y="85">
            <parameter key="csv_file" value="C:/Users/MartinSchmitz/Downloads/fr_lexicon.txt"/>
            <parameter key="column_separators" value="\t"/>
            <parameter key="trim_lines" value="false"/>
            <parameter key="use_quotes" value="true"/>
            <parameter key="quotes_character" value="&quot;"/>
            <parameter key="escape_character" value="\"/>
            <parameter key="skip_comments" value="false"/>
            <parameter key="comment_characters" value="#"/>
            <parameter key="starting_row" value="1"/>
            <parameter key="parse_numbers" value="true"/>
            <parameter key="decimal_character" value="."/>
            <parameter key="grouped_digits" value="false"/>
            <parameter key="grouping_character" value=","/>
            <parameter key="infinity_representation" value=""/>
            <parameter key="date_format" value=""/>
            <parameter key="first_row_as_names" value="false"/>
            <list key="annotations"/>
            <parameter key="time_zone" value="SYSTEM"/>
            <parameter key="locale" value="English (United States)"/>
            <parameter key="encoding" value="SYSTEM"/>
            <parameter key="read_all_values_as_polynominal" value="false"/>
            <list key="data_set_meta_data_information"/>
            <parameter key="read_not_matching_values_as_missings" value="true"/>
            <parameter key="datamanagement" value="double_array"/>
            <parameter key="data_management" value="auto"/>
          </operator>
          <operator activated="true" class="rename" compatibility="9.8.000" expanded="true" height="82" name="Rename" width="90" x="447" y="85">
            <parameter key="old_name" value="att1"/>
            <parameter key="new_name" value="word"/>
            <list key="rename_additional_attributes">
              <parameter key="att2" value="score"/>
            </list>
          </operator>
          <operator activated="true" class="operator_toolbox:dictionary_sentiment_learner" compatibility="2.8.000-SNAPSHOT" expanded="true" height="103" name="Dictionary-Based Sentiment (Documents)" width="90" x="581" y="85">
            <parameter key="value_attribute" value="score"/>
            <parameter key="key_attribute" value="word"/>
            <parameter key="negation_attribute" value=""/>
            <parameter key="negation_window_size" value="1"/>
            <parameter key="negation_strength" value=""/>
            <parameter key="use_symmetric_negation_window" value="false"/>
            <parameter key="use_intensifier" value="false"/>
            <parameter key="intensifier_word" value=""/>
            <parameter key="intensifier_value" value=""/>
            <parameter key="use_symmetric_intensifier_window" value="false"/>
          </operator>
          <operator activated="true" class="text:create_document" compatibility="9.3.001" expanded="true" height="68" name="Create Document" width="90" x="246" y="289">
            <parameter key="text" value="Rapidminer est un excellent logiciel"/>
            <parameter key="add label" value="false"/>
            <parameter key="label_type" value="nominal"/>
          </operator>
          <operator activated="true" class="collect" compatibility="9.8.000" expanded="true" height="82" name="Collect" width="90" x="380" y="289">
            <parameter key="unfold" value="false"/>
          </operator>
          <operator activated="true" class="loop_collection" compatibility="9.8.000" expanded="true" height="82" name="Loop Collection" width="90" x="514" y="289">
            <parameter key="set_iteration_macro" value="false"/>
            <parameter key="macro_name" value="iteration"/>
            <parameter key="macro_start_value" value="1"/>
            <parameter key="unfold" value="false"/>
            <process expanded="true">
              <operator activated="true" class="text:tokenize" compatibility="9.3.001" expanded="true" height="68" name="Tokenize" width="90" x="246" y="34">
                <parameter key="mode" value="non letters"/>
                <parameter key="characters" value=".:"/>
                <parameter key="language" value="English"/>
                <parameter key="max_token_length" value="3"/>
              </operator>
              <connect from_port="single" to_op="Tokenize" to_port="document"/>
              <connect from_op="Tokenize" from_port="document" to_port="output 1"/>
              <portSpacing port="source_single" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
              <portSpacing port="sink_output 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="operator_toolbox:apply_model_documents" compatibility="2.8.000-SNAPSHOT" expanded="true" height="103" name="Apply Model (Documents)" width="90" x="648" y="289">
            <list key="application_parameters"/>
          </operator>
          <connect from_op="Read CSV" from_port="output" to_op="Rename" to_port="example set input"/>
          <connect from_op="Rename" from_port="example set output" to_op="Dictionary-Based Sentiment (Documents)" to_port="exa"/>
          <connect from_op="Dictionary-Based Sentiment (Documents)" from_port="mod" to_op="Apply Model (Documents)" to_port="mod"/>
          <connect from_op="Create Document" from_port="output" to_op="Collect" to_port="input 1"/>
          <connect from_op="Collect" from_port="collection" to_op="Loop Collection" to_port="collection"/>
          <connect from_op="Loop Collection" from_port="output 1" to_op="Apply Model (Documents)" to_port="doc"/>
          <connect from_op="Apply Model (Documents)" from_port="exa" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>





    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
    Solution Accepted
    Hi @EL75,
    I will connect with you via email.
    Best,
    Martin

Answers

  • MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
    Hi,

    this operator is actually just a wrapper around models created with the Dictionary-Based Sentiment operator. You can use the Dictionary-Based Sentiment operator directly to do this.

    Best,
    Martin
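
    To make the idea of a dictionary-based sentiment model concrete, here is a minimal Python sketch. It is not RapidMiner's implementation: the function names and the toy scores are made up for illustration, and it assumes the simplest possible rule (exact, case-sensitive lookup, summing the scores of all tokens found in the dictionary, no negation or intensifier handling).

```python
import re


def tokenize(text):
    """Return the runs of letters in `text` (splits on every non-letter)."""
    # [^\W\d_] matches any Unicode letter, so accented French words stay whole.
    return re.findall(r"[^\W\d_]+", text)


def score_document(text, dictionary):
    """Sum the scores of all tokens found in `dictionary`; unknown tokens count as 0."""
    return sum(dictionary.get(token, 0.0) for token in tokenize(text))


# Toy dictionary with made-up scores, NOT values from a real lexicon.
toy_dictionary = {"excellent": 2.7, "mauvais": -2.5}

if __name__ == "__main__":
    print(score_document("Rapidminer est un excellent logiciel", toy_dictionary))
```

    With a French word list as the dictionary, the same lookup works for French text, which is why swapping the lexicon is enough.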
  • EL75 Member Posts: 43 Contributor II
    Hello mschmitz,
    thanks for your answer. How can I set up the Dictionary-Based Sentiment operator so that it uses the French versions of VADER or WordNet mentioned above?
    Best regards
  • MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
    Hi @EL75,
    did you check the tutorial process?

    BR,
    Martin
  • EL75 Member Posts: 43 Contributor II
    If you mean this one, yes. Tell me if I'm wrong.
    If that is the right one, how does this process let me access one of these resources?
    - A WordNet exists for French (WOLF): http://pauillac.inria.fr/~sagot/index.html#wolf
    - VADER has also been ported to French: https://github.com/thomas7lieues/vader_FR
    Best regards

  • MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
    Hi,
    a full training process looks like this:

    <?xml version="1.0" encoding="UTF-8"?><process version="9.8.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.8.000" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="open_file" compatibility="9.8.000" expanded="true" height="68" name="Open File" width="90" x="45" y="85">
            <parameter key="resource_type" value="URL"/>
            <parameter key="url" value="https://raw.githubusercontent.com/thomas7lieues/vader_FR/master/vaderSentiment_fr/fr_lexicon.txt"/>
            <description align="center" color="transparent" colored="false" width="126">https://github.com/cjhutto/vaderSentiment</description>
          </operator>
          <operator activated="true" class="read_csv" compatibility="9.8.000" expanded="true" height="68" name="Read CSV" width="90" x="179" y="85">
            <parameter key="column_separators" value="\t"/>
            <parameter key="trim_lines" value="false"/>
            <parameter key="use_quotes" value="true"/>
            <parameter key="quotes_character" value="&quot;"/>
            <parameter key="escape_character" value="\"/>
            <parameter key="skip_comments" value="false"/>
            <parameter key="comment_characters" value="#"/>
            <parameter key="starting_row" value="1"/>
            <parameter key="parse_numbers" value="true"/>
            <parameter key="decimal_character" value="."/>
            <parameter key="grouped_digits" value="false"/>
            <parameter key="grouping_character" value=","/>
            <parameter key="infinity_representation" value=""/>
            <parameter key="date_format" value=""/>
            <parameter key="first_row_as_names" value="false"/>
            <list key="annotations"/>
            <parameter key="time_zone" value="SYSTEM"/>
            <parameter key="locale" value="English (United States)"/>
            <parameter key="encoding" value="SYSTEM"/>
            <parameter key="read_all_values_as_polynominal" value="false"/>
            <list key="data_set_meta_data_information"/>
            <parameter key="read_not_matching_values_as_missings" value="true"/>
            <parameter key="datamanagement" value="double_array"/>
            <parameter key="data_management" value="auto"/>
          </operator>
          <operator activated="true" class="rename" compatibility="9.8.000" expanded="true" height="82" name="Rename" width="90" x="313" y="85">
            <parameter key="old_name" value="att1"/>
            <parameter key="new_name" value="word"/>
            <list key="rename_additional_attributes">
              <parameter key="att2" value="score"/>
            </list>
          </operator>
          <operator activated="true" class="operator_toolbox:dictionary_sentiment_learner" compatibility="2.8.000-SNAPSHOT" expanded="true" height="103" name="Dictionary-Based Sentiment (Documents)" width="90" x="514" y="85">
            <parameter key="value_attribute" value="score"/>
            <parameter key="key_attribute" value="word"/>
            <parameter key="negation_attribute" value=""/>
            <parameter key="negation_window_size" value="1"/>
            <parameter key="negation_strength" value=""/>
            <parameter key="use_symmetric_negation_window" value="false"/>
            <parameter key="use_intensifier" value="false"/>
            <parameter key="intensifier_word" value=""/>
            <parameter key="intensifier_value" value=""/>
            <parameter key="use_symmetric_intensifier_window" value="false"/>
          </operator>
          <operator activated="true" class="text:create_document" compatibility="9.3.001" expanded="true" height="68" name="Create Document" width="90" x="246" y="289">
            <parameter key="text" value="Rapidminer est un excellent logiciel"/>
            <parameter key="add label" value="false"/>
            <parameter key="label_type" value="nominal"/>
          </operator>
          <operator activated="true" class="collect" compatibility="9.8.000" expanded="true" height="82" name="Collect" width="90" x="380" y="289">
            <parameter key="unfold" value="false"/>
          </operator>
          <operator activated="true" class="loop_collection" compatibility="9.8.000" expanded="true" height="82" name="Loop Collection" width="90" x="514" y="289">
            <parameter key="set_iteration_macro" value="false"/>
            <parameter key="macro_name" value="iteration"/>
            <parameter key="macro_start_value" value="1"/>
            <parameter key="unfold" value="false"/>
            <process expanded="true">
              <operator activated="true" class="text:tokenize" compatibility="9.3.001" expanded="true" height="68" name="Tokenize" width="90" x="246" y="34">
                <parameter key="mode" value="non letters"/>
                <parameter key="characters" value=".:"/>
                <parameter key="language" value="English"/>
                <parameter key="max_token_length" value="3"/>
              </operator>
              <connect from_port="single" to_op="Tokenize" to_port="document"/>
              <connect from_op="Tokenize" from_port="document" to_port="output 1"/>
              <portSpacing port="source_single" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
              <portSpacing port="sink_output 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="operator_toolbox:apply_model_documents" compatibility="2.8.000-SNAPSHOT" expanded="true" height="103" name="Apply Model (Documents)" width="90" x="648" y="289">
            <list key="application_parameters"/>
          </operator>
          <connect from_op="Open File" from_port="file" to_op="Read CSV" to_port="file"/>
          <connect from_op="Read CSV" from_port="output" to_op="Rename" to_port="example set input"/>
          <connect from_op="Rename" from_port="example set output" to_op="Dictionary-Based Sentiment (Documents)" to_port="exa"/>
          <connect from_op="Dictionary-Based Sentiment (Documents)" from_port="mod" to_op="Apply Model (Documents)" to_port="mod"/>
          <connect from_op="Create Document" from_port="output" to_op="Collect" to_port="input 1"/>
          <connect from_op="Collect" from_port="collection" to_op="Loop Collection" to_port="collection"/>
          <connect from_op="Loop Collection" from_port="output 1" to_op="Apply Model (Documents)" to_port="doc"/>
          <connect from_op="Apply Model (Documents)" from_port="exa" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>


    This is even more powerful than Extract Sentiment, but obviously also harder to use. I will create a ticket to add French VADER to the Extract Sentiment operator. Do you have any other dictionaries to add?


    Best,

    Martin


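
    For readers who want to inspect the lexicon outside RapidMiner, the Read CSV and Rename steps of the process above can be approximated in Python as below. This is only a sketch under assumptions: `parse_lexicon` is a hypothetical helper, the sample lines use made-up values, and it assumes the French file keeps the layout documented for the original English VADER lexicon (token, mean rating, standard deviation, raw human ratings, separated by tabs). Only the first two columns are kept, mirroring the process, which renames att1 to "word" and att2 to "score" and uses only those two.

```python
def parse_lexicon(lines):
    """Build a {word: score} dict from tab-separated lexicon lines.

    Only the first two columns are used; any further columns are ignored.
    Lines with fewer than two columns or a non-numeric score are skipped.
    """
    dictionary = {}
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 2:
            continue
        try:
            dictionary[fields[0]] = float(fields[1])
        except ValueError:
            continue
    return dictionary


# Made-up sample lines in the assumed layout, NOT copied from fr_lexicon.txt.
sample_lines = [
    "excellent\t2.7\t0.64\t[2, 3, 3, 3, 4, 2, 2, 3, 3, 2]\n",
    "mauvais\t-2.5\t0.67\t[-2, -3, -3, -2, -4, -3, -2, -2, -2, -2]\n",
    "ligne sans score\n",
]

if __name__ == "__main__":
    print(parse_lexicon(sample_lines))
```

    To read the real file, pass an open file object instead of the sample list, for example `parse_lexicon(open(path, encoding="utf-8"))`, where `path` points to your local copy.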
  • EL75 Member Posts: 43 Contributor II
    Thanks for your answer!
    The WOLF project is the French translation of WordNet, so it is probably a good idea to add it too.
    RapidMiner's popularity will increase within the French community :)

    - A WordNet exists for French (WOLF): http://pauillac.inria.fr/~sagot/index.html#wolf
    - VADER has also been ported to French: https://github.com/thomas7lieues/vader_FR
  • EL75 Member Posts: 43 Contributor II
    Martin,
    I am trying to copy/paste the XML code ("a full training process looks like this") into RapidMiner, but nothing happens.
    Could you help?


  • EL75 Member Posts: 43 Contributor II
    edited December 2020
    Thanks a lot! It works fine.
    May I ask a few additional questions in order to fine-tune the process?

    1- Working with an example set
    As I have an example set containing reviews, I added a "Data to Documents" operator before the "Loop Collection" operator (I haven't seen an operator like "Apply Model (Documents)" dedicated to example sets). Then I put all my text processing operators inside the loop, and it looks fine. Is this the right way?

    2- Using emojis
    I've seen in the VADER repository that there are two other files that could be helpful (I have a lot of emoticons in my reviews):
    Is there a way to integrate them into this process?


    3- Understanding the columns in the dictionary

    - att1 is the word from the dictionary
    - att2 seems to be the polarity value
    - att3: is it the weight?
    - att4: how are these values used?

    4- Using polarity_scores_max
    https://github.com/thomas7lieues/vader_FR
    This page says that polarity_scores_max can be used: how is that possible here?
    # Note : You can use polarity_scores_max instead of polarity_scores. polarity_scores_max uses fuzzywuzzy to get the most similar words with your inputs. For example "connar" won't be detected with polarity_scores but with polarity_scores_max

    5- Building my own dictionary
    If I want to add sentiment words and weights related to the specific domain I'm working on, what would be the best process?
    Just adding new lines to the dictionary file?

    I really enjoy using this dictionary on my data set :)
    All the best,
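
    Regarding question 4, the fuzzy lookup described in the vader_FR note can be illustrated with a small Python sketch. This is not the vader_FR code: it swaps fuzzywuzzy for `difflib.get_close_matches` from the Python standard library, and `fuzzy_score`, the 0.8 cutoff, and the toy scores are assumptions made for the example.

```python
from difflib import get_close_matches


def fuzzy_score(token, dictionary, cutoff=0.8):
    """Return the score of `token`, falling back to its closest dictionary entry.

    An exact match wins. Otherwise the most similar dictionary word is used,
    provided its similarity ratio is at least `cutoff`. Unknown tokens score 0.0.
    """
    if token in dictionary:
        return dictionary[token]
    matches = get_close_matches(token, dictionary.keys(), n=1, cutoff=cutoff)
    return dictionary[matches[0]] if matches else 0.0


# Toy dictionary with made-up scores, NOT values from a real lexicon.
toy_dictionary = {"connard": -2.5, "excellent": 2.7}

if __name__ == "__main__":
    # "connar" is not in the dictionary but is close enough to "connard".
    print(fuzzy_score("connar", toy_dictionary))
```

    A lower cutoff catches more misspellings but also raises the risk of matching an unrelated word, so the threshold is worth checking on real reviews.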
  • MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
    Hi @EL75 ,
    yes, you can just append the dictionaries and create one large one to do this.

    Best,
    Martin
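
    Appending dictionaries amounts to building one word/score table from several sources. A minimal Python sketch of that step, outside RapidMiner, could look like this. The helper names and all scores are hypothetical, and letting later dictionaries override earlier ones is a design choice of this example, not something prescribed in the thread.

```python
def merge_dictionaries(*dictionaries):
    """Combine several {word: score} dicts; later ones override earlier ones."""
    merged = {}
    for dictionary in dictionaries:
        merged.update(dictionary)
    return merged


def to_tsv(dictionary):
    """Serialize a {word: score} dict as tab-separated lines, sorted by word."""
    return "".join(f"{word}\t{score}\n" for word, score in sorted(dictionary.items()))


# Made-up entries standing in for the base lexicon and a domain-specific list.
base_lexicon = {"excellent": 2.7, "lent": -0.5}
domain_lexicon = {"addictif": -1.5, "lent": -1.2}

if __name__ == "__main__":
    print(to_tsv(merge_dictionaries(base_lexicon, domain_lexicon)), end="")
```

    The resulting file has the same two-column, tab-separated shape that the Read CSV and Rename steps of the process expect, so it can replace the original lexicon file.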
  • EL75 Member Posts: 43 Contributor II
    Hi Martin,
    something strange: the process works fine on its own. But when I copy/paste it into a bigger process with other operators (I did this to compare results), I get an error message about a tokenization problem, although the "Loop Collection" subprocess contains the tokenization step. I'm 100% sure that all connections are correct. I even tried something absurd that seems to reveal a bug: in the process that works fine, I added other operators (the ones that trigger the error), then deleted them (so that I was back to the process that worked fine), and then the process crashed...
    Below: the process containing the "Vader FR" part at the bottom (deactivated)


    The "Vader FR" process (works fine alone):


    Thanks for your help
    Best
  • MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
    Hi @EL75,
    I would love to help, but I am very busy and this is somewhat complex. I cannot deep dive into it.

    Is this something commercial or is this an academic project? If this is a commercial request we may move this over and we can assign resources on it. Otherwise maybe @lionelderkrikor or so can help?

    Cheers,
    Martin
  • EL75 Member Posts: 43 Contributor II
    edited December 2020
    Hi Martin,
    Of course not, this is not commercial; it is for research :) (I am working on the health aspects and impacts of digital practices, using reviews from parents and children collected from app stores, Twitter, blogs, etc.)
    As I'm working on a French dataset, that would be very useful.

    May I also ask:
    1 - WORD2VEC
    - I've read your article "wordSynonym Detection with Word2Vec" and tried to implement the process, but I obtained strange results: does this operator work with every language (e.g. French, of course)?

    2- TOPIC EXTRACTION
    As I'm trying to extract topics from the data set, I've read and adapted your excellent article on Amazon reviews, thinking that this process could fit part of my needs. It is really inspiring! I wonder whether there are other ways to visualize the results, such as a dendrogram?

    Best,
  • MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
    Hi @EL75 ,
    maybe you could explain a bit more what you are trying to accomplish from a "business" perspective, so we can map this to a DS method?

    ~Martin
  • MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
    Hi @EL75 ,
    I added French to the operator a minute ago. It will not be publicly available for a while (we usually wait until there are more new things to release). Please let me know if you need a preview build.

    Best,
    Martin
  • EL75 Member Posts: 43 Contributor II
    Hi Martin,
    thanks for doing this. I would indeed appreciate receiving a preview build.
    I wish you a happy new year!
    Best,