Does the Extract Sentiment operator work with French words?
Hi,
Could someone tell me whether VADER or WordNet handle French when you select one of them in the "Extract Sentiment" operator?
- WordNet exists for French (WOLF): http://pauillac.inria.fr/~sagot/index.html#wolf
- VADER has also been ported: https://github.com/thomas7lieues/vader_FR
But what about the built-in RapidMiner operator? I have found no way to configure it for this, and nothing in the help window either...
If the standard RapidMiner operator doesn't work for French, is there a way to connect RapidMiner to the French projects mentioned above?
Thanks.
Best Answers
MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
Hi,
there is something odd with escaping of / and so on, please try this process and adapt the path of Read CSV so that it points to the downloaded version of: https://raw.githubusercontent.com/thomas7lieues/vader_FR/master/vaderSentiment_fr/fr_lexicon.txt
Best,
Martin
<?xml version="1.0" encoding="UTF-8"?><process version="9.8.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="9.8.000" expanded="true" name="Process">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true">
<operator activated="true" class="read_csv" compatibility="9.8.000" expanded="true" height="68" name="Read CSV" width="90" x="246" y="85">
<parameter key="csv_file" value="C:/Users/MartinSchmitz/Downloads/fr_lexicon.txt"/>
<parameter key="column_separators" value="\t"/>
<parameter key="trim_lines" value="false"/>
<parameter key="use_quotes" value="true"/>
<parameter key="quotes_character" value="&quot;"/>
<parameter key="escape_character" value="\"/>
<parameter key="skip_comments" value="false"/>
<parameter key="comment_characters" value="#"/>
<parameter key="starting_row" value="1"/>
<parameter key="parse_numbers" value="true"/>
<parameter key="decimal_character" value="."/>
<parameter key="grouped_digits" value="false"/>
<parameter key="grouping_character" value=","/>
<parameter key="infinity_representation" value=""/>
<parameter key="date_format" value=""/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations"/>
<parameter key="time_zone" value="SYSTEM"/>
<parameter key="locale" value="English (United States)"/>
<parameter key="encoding" value="SYSTEM"/>
<parameter key="read_all_values_as_polynominal" value="false"/>
<list key="data_set_meta_data_information"/>
<parameter key="read_not_matching_values_as_missings" value="true"/>
<parameter key="datamanagement" value="double_array"/>
<parameter key="data_management" value="auto"/>
</operator>
<operator activated="true" class="rename" compatibility="9.8.000" expanded="true" height="82" name="Rename" width="90" x="447" y="85">
<parameter key="old_name" value="att1"/>
<parameter key="new_name" value="word"/>
<list key="rename_additional_attributes">
<parameter key="att2" value="score"/>
</list>
</operator>
<operator activated="true" class="operator_toolbox:dictionary_sentiment_learner" compatibility="2.8.000-SNAPSHOT" expanded="true" height="103" name="Dictionary-Based Sentiment (Documents)" width="90" x="581" y="85">
<parameter key="value_attribute" value="score"/>
<parameter key="key_attribute" value="word"/>
<parameter key="negation_attribute" value=""/>
<parameter key="negation_window_size" value="1"/>
<parameter key="negation_strength" value=""/>
<parameter key="use_symmetric_negation_window" value="false"/>
<parameter key="use_intensifier" value="false"/>
<parameter key="intensifier_word" value=""/>
<parameter key="intensifier_value" value=""/>
<parameter key="use_symmetric_intensifier_window" value="false"/>
</operator>
<operator activated="true" class="text:create_document" compatibility="9.3.001" expanded="true" height="68" name="Create Document" width="90" x="246" y="289">
<parameter key="text" value="Rapidminer est un excellent logiciel"/>
<parameter key="add label" value="false"/>
<parameter key="label_type" value="nominal"/>
</operator>
<operator activated="true" class="collect" compatibility="9.8.000" expanded="true" height="82" name="Collect" width="90" x="380" y="289">
<parameter key="unfold" value="false"/>
</operator>
<operator activated="true" class="loop_collection" compatibility="9.8.000" expanded="true" height="82" name="Loop Collection" width="90" x="514" y="289">
<parameter key="set_iteration_macro" value="false"/>
<parameter key="macro_name" value="iteration"/>
<parameter key="macro_start_value" value="1"/>
<parameter key="unfold" value="false"/>
<process expanded="true">
<operator activated="true" class="text:tokenize" compatibility="9.3.001" expanded="true" height="68" name="Tokenize" width="90" x="246" y="34">
<parameter key="mode" value="non letters"/>
<parameter key="characters" value=".:"/>
<parameter key="language" value="English"/>
<parameter key="max_token_length" value="3"/>
</operator>
<connect from_port="single" to_op="Tokenize" to_port="document"/>
<connect from_op="Tokenize" from_port="document" to_port="output 1"/>
<portSpacing port="source_single" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="operator_toolbox:apply_model_documents" compatibility="2.8.000-SNAPSHOT" expanded="true" height="103" name="Apply Model (Documents)" width="90" x="648" y="289">
<list key="application_parameters"/>
</operator>
<connect from_op="Read CSV" from_port="output" to_op="Rename" to_port="example set input"/>
<connect from_op="Rename" from_port="example set output" to_op="Dictionary-Based Sentiment (Documents)" to_port="exa"/>
<connect from_op="Dictionary-Based Sentiment (Documents)" from_port="mod" to_op="Apply Model (Documents)" to_port="mod"/>
<connect from_op="Create Document" from_port="output" to_op="Collect" to_port="input 1"/>
<connect from_op="Collect" from_port="collection" to_op="Loop Collection" to_port="collection"/>
<connect from_op="Loop Collection" from_port="output 1" to_op="Apply Model (Documents)" to_port="doc"/>
<connect from_op="Apply Model (Documents)" from_port="exa" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
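For readers who want to see the idea outside RapidMiner, here is a minimal Python sketch of what the process above does conceptually: read a tab-separated lexicon, keep the first two columns as word and score, tokenize the text on non-letters, and add up the scores of the tokens found in the lexicon. The three-entry sample lexicon, its scores, and the function names are invented for illustration, and the plain sum is an assumption; the Dictionary-Based Sentiment operator may well compute its score differently.

```python
import re

# Hypothetical sample in the tab-separated layout of fr_lexicon.txt.
# The scores are invented for this example, not taken from the real file.
SAMPLE_LEXICON = "excellent\t2.7\t0.6\nmauvais\t-2.1\t0.8\nlogiciel\t0.0\t0.0\n"


def load_lexicon(text):
    """Map each word (column 1) to its score (column 2)."""
    lexicon = {}
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        try:
            lexicon[parts[0].strip().lower()] = float(parts[1])
        except ValueError:
            continue  # skip rows whose score is not numeric
    return lexicon


def tokenize(text):
    """Return lowercase runs of letters, i.e. split on everything else."""
    return re.findall(r"[^\W\d_]+", text.lower())


def sentiment_score(text, lexicon):
    """Sum the scores of all tokens found in the lexicon (assumed scoring rule)."""
    return sum(lexicon.get(token, 0.0) for token in tokenize(text))


lexicon = load_lexicon(SAMPLE_LEXICON)
print(sentiment_score("Rapidminer est un excellent logiciel", lexicon))
```

With the real lexicon, the sample text would be replaced by the downloaded file's contents, read as UTF-8 so that accented words survive.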
- Sr. Director Data Solutions, Altair RapidMiner -
Dortmund, Germany
Answers
Thanks for your answer. How can I configure the "Dictionary-Based Sentiment" operator so that it uses the French versions of VADER or WordNet mentioned above?
Best regards
If that is not possible, how does this process let me access one of those resources?
<?xml version="1.0" encoding="UTF-8"?><process version="9.8.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="9.8.000" expanded="true" name="Process">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true">
<operator activated="true" class="open_file" compatibility="9.8.000" expanded="true" height="68" name="Open File" width="90" x="45" y="85">
<parameter key="resource_type" value="URL"/>
<parameter key="url" value="https://raw.githubusercontent.com/thomas7lieues/vader_FR/master/vaderSentiment_fr/fr_lexicon.txt"/>
<description align="center" color="transparent" colored="false" width="126">https://github.com/cjhutto/vaderSentiment</description>
</operator>
<operator activated="true" class="read_csv" compatibility="9.8.000" expanded="true" height="68" name="Read CSV" width="90" x="179" y="85">
<parameter key="column_separators" value="\t"/>
<parameter key="trim_lines" value="false"/>
<parameter key="use_quotes" value="true"/>
<parameter key="quotes_character" value="&quot;"/>
<parameter key="escape_character" value="\"/>
<parameter key="skip_comments" value="false"/>
<parameter key="comment_characters" value="#"/>
<parameter key="starting_row" value="1"/>
<parameter key="parse_numbers" value="true"/>
<parameter key="decimal_character" value="."/>
<parameter key="grouped_digits" value="false"/>
<parameter key="grouping_character" value=","/>
<parameter key="infinity_representation" value=""/>
<parameter key="date_format" value=""/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations"/>
<parameter key="time_zone" value="SYSTEM"/>
<parameter key="locale" value="English (United States)"/>
<parameter key="encoding" value="SYSTEM"/>
<parameter key="read_all_values_as_polynominal" value="false"/>
<list key="data_set_meta_data_information"/>
<parameter key="read_not_matching_values_as_missings" value="true"/>
<parameter key="datamanagement" value="double_array"/>
<parameter key="data_management" value="auto"/>
</operator>
<operator activated="true" class="rename" compatibility="9.8.000" expanded="true" height="82" name="Rename" width="90" x="313" y="85">
<parameter key="old_name" value="att1"/>
<parameter key="new_name" value="word"/>
<list key="rename_additional_attributes">
<parameter key="att2" value="score"/>
</list>
</operator>
<operator activated="true" class="operator_toolbox:dictionary_sentiment_learner" compatibility="2.8.000-SNAPSHOT" expanded="true" height="103" name="Dictionary-Based Sentiment (Documents)" width="90" x="514" y="85">
<parameter key="value_attribute" value="score"/>
<parameter key="key_attribute" value="word"/>
<parameter key="negation_attribute" value=""/>
<parameter key="negation_window_size" value="1"/>
<parameter key="negation_strength" value=""/>
<parameter key="use_symmetric_negation_window" value="false"/>
<parameter key="use_intensifier" value="false"/>
<parameter key="intensifier_word" value=""/>
<parameter key="intensifier_value" value=""/>
<parameter key="use_symmetric_intensifier_window" value="false"/>
</operator>
<operator activated="true" class="text:create_document" compatibility="9.3.001" expanded="true" height="68" name="Create Document" width="90" x="246" y="289">
<parameter key="text" value="Rapidminer est un excellent logiciel"/>
<parameter key="add label" value="false"/>
<parameter key="label_type" value="nominal"/>
</operator>
<operator activated="true" class="collect" compatibility="9.8.000" expanded="true" height="82" name="Collect" width="90" x="380" y="289">
<parameter key="unfold" value="false"/>
</operator>
<operator activated="true" class="loop_collection" compatibility="9.8.000" expanded="true" height="82" name="Loop Collection" width="90" x="514" y="289">
<parameter key="set_iteration_macro" value="false"/>
<parameter key="macro_name" value="iteration"/>
<parameter key="macro_start_value" value="1"/>
<parameter key="unfold" value="false"/>
<process expanded="true">
<operator activated="true" class="text:tokenize" compatibility="9.3.001" expanded="true" height="68" name="Tokenize" width="90" x="246" y="34">
<parameter key="mode" value="non letters"/>
<parameter key="characters" value=".:"/>
<parameter key="language" value="English"/>
<parameter key="max_token_length" value="3"/>
</operator>
<connect from_port="single" to_op="Tokenize" to_port="document"/>
<connect from_op="Tokenize" from_port="document" to_port="output 1"/>
<portSpacing port="source_single" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="operator_toolbox:apply_model_documents" compatibility="2.8.000-SNAPSHOT" expanded="true" height="103" name="Apply Model (Documents)" width="90" x="648" y="289">
<list key="application_parameters"/>
</operator>
<connect from_op="Open File" from_port="file" to_op="Read CSV" to_port="file"/>
<connect from_op="Read CSV" from_port="output" to_op="Rename" to_port="example set input"/>
<connect from_op="Rename" from_port="example set output" to_op="Dictionary-Based Sentiment (Documents)" to_port="exa"/>
<connect from_op="Dictionary-Based Sentiment (Documents)" from_port="mod" to_op="Apply Model (Documents)" to_port="mod"/>
<connect from_op="Create Document" from_port="output" to_op="Collect" to_port="input 1"/>
<connect from_op="Collect" from_port="collection" to_op="Loop Collection" to_port="collection"/>
<connect from_op="Loop Collection" from_port="output 1" to_op="Apply Model (Documents)" to_port="doc"/>
<connect from_op="Apply Model (Documents)" from_port="exa" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
This is even more powerful than Extract Sentiment, but obviously also harder to use. I will create a ticket to add French VADER to the Extract Sentiment operator. Do you have any other dictionaries to add?
Best,
Martin
The WOLF project is the French translation of WordNet; it would probably be a good idea to add it too.
RapidMiner's popularity will increase within the French community.
I tried to copy/paste the XML code ("a full training process looks like this") into RapidMiner, but nothing happens.
Could you help?
May I ask you a few additional questions, in order to fine-tune the process?
1- Working with an example set
As I have an example set containing reviews, I've added a "Data to Documents" operator before the "Loop Collection" operator (I haven't seen an operator like "Apply Model (Documents)" dedicated to example sets). Then I've put all my text processing operators inside the loop, and it looks fine. Is this the right way?
2- Using emojis
I've seen in the VADER repository that there are two other files that could be helpful (I have a lot of emoticons in my reviews):
Is there a way to integrate them into this process?
3- Understanding the columns in the dictionary
- att1 is the word from the dictionary
- att2 seems to be the polarity value
- att3: is it the weight?
- att4: how are those values used?
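On the column question: in the original English VADER lexicon, each line holds four tab-separated fields: the token, the mean sentiment rating, the standard deviation of the ratings, and the list of raw human ratings. Assuming the French port keeps this layout (worth checking against the file itself), att3 would be a standard deviation rather than a weight, and att4 the individual ratings behind the mean. As far as I know, VADER itself only reads the first two fields when scoring. The sketch below parses one such line; the sample line and the `LexiconEntry` name are invented for illustration.

```python
import ast
from collections import namedtuple

# Hypothetical record type for one lexicon line.
LexiconEntry = namedtuple("LexiconEntry", "word mean std ratings")

# Invented example line; real entries in fr_lexicon.txt will differ.
SAMPLE_LINE = "excellent\t2.7\t0.64\t[2, 3, 3, 2, 3, 4, 3, 2, 2, 3]"


def parse_entry(line):
    """Split one four-column lexicon line into typed fields.

    Raises ValueError if the line has fewer than four columns.
    """
    word, mean, std, ratings = line.rstrip("\n").split("\t")[:4]
    return LexiconEntry(word, float(mean), float(std), ast.literal_eval(ratings))


entry = parse_entry(SAMPLE_LINE)
# In this invented example the mean of the raw ratings reproduces column 2.
recomputed_mean = sum(entry.ratings) / len(entry.ratings)
print(entry.word, entry.mean, recomputed_mean)
```

Under that assumption, renaming only att1 and att2 to word and score, as the process does, uses exactly the two fields that matter for scoring.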
4- Using polarity_scores_max
https://github.com/thomas7lieues/vader_FR
This page says that polarity_scores_max can be used. How is that possible in this process?
# Note : You can use polarity_scores_max instead of polarity_scores. polarity_scores_max uses fuzzywuzzy to get the most similar words with your inputs. For example "connar" won't be detected with polarity_scores but with polarity_scores_max
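To illustrate the idea behind that note, here is a small Python sketch of a fuzzy lexicon lookup: when a token is not in the dictionary, fall back to the most similar dictionary word. It uses `difflib` from the standard library rather than fuzzywuzzy, so the similarity values will not match what vader_FR computes, and the lexicon, scores, cutoff, and function names are all hypothetical.

```python
from difflib import get_close_matches

# Hypothetical mini lexicon; the scores are invented.
LEXICON = {"connard": -2.5, "excellent": 2.7, "mauvais": -2.1}


def closest_word(token, lexicon, cutoff=0.8):
    """Return the lexicon word most similar to token, or None if nothing is close enough."""
    matches = get_close_matches(token.lower(), list(lexicon), n=1, cutoff=cutoff)
    return matches[0] if matches else None


def fuzzy_score(token, lexicon, cutoff=0.8):
    """Score a token through its closest lexicon word; unmatched tokens count as 0.0."""
    word = closest_word(token, lexicon, cutoff)
    return lexicon[word] if word is not None else 0.0


print(closest_word("connar", LEXICON))
print(fuzzy_score("connar", LEXICON))
```

A lower cutoff catches more misspellings but also produces more false matches, so it is worth tuning on a sample of real reviews.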
5- Building my own dictionary
If I want to add sentiment words and weights related to the specific domain I'm working on, what would be the best approach?
Just adding new lines to the dictionary file?
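One possible way to handle domain-specific words, sketched in Python under the assumption that only the word and score columns are needed: keep the domain entries in a separate mapping, let them override the base lexicon, and write the result back as a tab-separated file that Read CSV can load. The words, scores, and function names below are invented for illustration.

```python
# Hypothetical base lexicon (word -> score), e.g. loaded from fr_lexicon.txt.
BASE = {"excellent": 2.7, "jeu": 0.0}

# Hypothetical domain-specific additions and overrides.
DOMAIN = {"addictif": -1.8, "jeu": 0.5}


def merge_lexicons(base, domain):
    """Return a new lexicon in which domain entries override base entries."""
    merged = dict(base)
    merged.update(domain)
    return merged


def to_tsv(lexicon):
    """Serialize the lexicon as sorted 'word<TAB>score' lines."""
    return "".join(f"{word}\t{score}\n" for word, score in sorted(lexicon.items()))


merged = merge_lexicons(BASE, DOMAIN)
print(to_tsv(merged), end="")
```

Keeping the domain entries apart from the downloaded file makes it easier to update the base lexicon later without losing the custom words.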
I really enjoy using this dictionary on my data set.
All the best,
The "vader fr" process (works fine on its own):
Thanks for your help.
Best
Of course not, this is not for commercial use but for research (I'm studying the health aspects and impacts of digital practices, working on reviews from parents and children collected from app stores, Twitter, blogs, etc.).
But as I'm working on a French dataset, that would be very useful.
May I also ask you:
1 - WORD2VEC
- I've read your article "wordSynonym Detection with Word2Vec" and tried to implement the process, but I obtained strange results. Does this operator work with every language (e.g. French, of course)?
As I'm trying to extract topics from the data set, I've read and adapted your excellent article dealing with Amazon reviews, thinking that this process could fit part of my needs. It is really inspiring! I wonder if there are any other possibilities to visualize the results, such as a dendrogram, etc.?
Best,
Thanks for having done it. I'd indeed appreciate receiving a preview build.
I wish you a happy new year!
Best,