how to count "exampleword" in tweet

Mustafa_AVDAN · December 2017

Hello again; I'm sorry to ask the same question again, but I need some help with my project and I can't continue. Can anybody help me? How can I count "HASHTAGWORD" in each tweet in RapidMiner? Which operator can help me? I didn't use Excel; the attached picture (Ekran Görüntüsü (1).png) is just an example. I will get the tweets with the Search Twitter operator, and after that I will count some words in the resulting dataset. Finally, I will generate a new column and add the counter value to it for each row (each tweet). Please help me; I must do it this week! :[

Edin_Klapic · December 2017

Hi @Mustafa_AVDAN,

One starting point might be to use the Split Operator and split the examples by HASHTAGWORD...

On the other hand, you may also take a look at the Text Processing extension. The operators in there could also help.

Best regards,

Edin

lionelderkrikor · December 2017

Hi Mustafa,

If you have Python installed on your computer, you can use the "Execute Python" operator (which you can download and install from the Marketplace).

Only 5 lines of code are needed to perform the task.

After the "Search Twitter" operator, I added a "Select Attributes" operator to keep only the "Text" attribute, which contains the tweets.

To change your hashtagword, you just have to:

- Click on the "Execute Python" operator -> Parameters -> Edit Text

- In the code, set hashtagword = "xxxxx", where xxxxx is the hashtagword you want

Here is the process:

<?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="social_media:search_twitter" compatibility="7.3.000" expanded="true" height="68" name="Search Twitter" width="90" x="112" y="34">
        <parameter key="connection" value="dkk"/>
        <parameter key="query" value="video"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="8.0.001" expanded="true" height="82" name="Select Attributes" width="90" x="246" y="34">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="Text"/>
      </operator>
      <operator activated="true" class="python_scripting:execute_python" compatibility="7.4.000" expanded="true" height="82" name="Execute Python" width="90" x="447" y="34">
        <parameter key="script" value="import pandas as pd&#10;import numpy as np&#10;import re&#10;&#10;# rm_main is a mandatory function, &#10;# the number of arguments has to be the number of input ports (can be none)&#10;def rm_main(data):&#10;&#10;   hashtagword = &quot;of&quot;&#10;   # one integer counter per tweet, initialised to 0&#10;   occurence = np.zeros(len(data), dtype=int)&#10;   &#10;   for i in range(len(data)) : &#10;     # re.escape makes the word match literally, even if it contains regex characters&#10;     occurence[i] = len(re.findall(re.escape(hashtagword), str(data.iloc[i,0])))&#10;   &#10;   data['Occurence'] = occurence&#10;&#10;   # connect 2 output ports to see the results&#10;   return data"/>
      </operator>
      <connect from_op="Search Twitter" from_port="output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Execute Python" to_port="input 1"/>
      <connect from_op="Execute Python" from_port="output 1" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
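
For readers who want to see the counting logic outside the XML, here is a minimal standalone sketch of what the script is meant to do. The helper name `count_word`, its `whole_word` option and the column name `Occurrence` are my own assumptions for illustration, not part of the process above; `rm_main` assumes the first column of the incoming pandas DataFrame holds the tweet text.

```python
import re


def count_word(text, word, whole_word=False):
    """Count non-overlapping, case-sensitive occurrences of `word` in `text`."""
    # re.escape treats the word literally, so characters like '+' or '.' are safe
    pattern = re.escape(word)
    if whole_word:
        # hypothetical option: do not count "of" inside "offers"
        pattern = r"(?<!\w)" + pattern + r"(?!\w)"
    return len(re.findall(pattern, str(text)))


def rm_main(data):
    # `data` is the pandas DataFrame handed over by Execute Python;
    # assumption: column 0 contains the tweet text
    hashtagword = "of"
    data["Occurrence"] = [count_word(t, hashtagword) for t in data.iloc[:, 0]]
    return data
```

Note that without `whole_word=True` the count is a plain substring count, so "of" is also found inside words such as "offers".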

I hope this will be helpful,

Regards,

Lionel

lionelderkrikor · December 2017

Hi again @Mustafa_AVDAN

After further investigation, I found that your task is possible without Python.

1. First, download and install the Text Processing extension from the Marketplace.

2. Here is the process:

<?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="social_media:search_twitter" compatibility="7.3.000" expanded="true" height="68" name="Search Twitter" width="90" x="45" y="34">
        <parameter key="connection" value="dkk"/>
        <parameter key="query" value="video"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="8.0.001" expanded="true" height="82" name="Select Attributes" width="90" x="179" y="34">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="Text"/>
      </operator>
      <operator activated="true" class="nominal_to_text" compatibility="8.0.001" expanded="true" height="82" name="Nominal to Text" width="90" x="313" y="34"/>
      <operator activated="true" class="text:process_document_from_data" compatibility="7.5.000" expanded="true" height="82" name="Process Documents from Data" width="90" x="648" y="34">
        <parameter key="vector_creation" value="Term Occurrences"/>
        <parameter key="keep_text" value="true"/>
        <list key="specify_weights"/>
        <process expanded="true">
          <operator activated="true" class="text:tokenize" compatibility="7.5.000" expanded="true" height="68" name="Tokenize" width="90" x="112" y="34"/>
          <connect from_port="document" to_op="Tokenize" to_port="document"/>
          <connect from_op="Tokenize" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="8.0.001" expanded="true" height="82" name="Select Attributes (2)" width="90" x="782" y="34">
        <parameter key="attribute_filter_type" value="regular_expression"/>
        <parameter key="attribute" value="Text"/>
        <parameter key="regular_expression" value="of\b"/>
      </operator>
      <connect from_op="Search Twitter" from_port="output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Nominal to Text" to_port="example set input"/>
      <connect from_op="Nominal to Text" from_port="example set output" to_op="Process Documents from Data" to_port="example set"/>
      <connect from_op="Process Documents from Data" from_port="example set" to_op="Select Attributes (2)" to_port="example set input"/>
      <connect from_op="Process Documents from Data" from_port="word list" to_port="result 2"/>
      <connect from_op="Select Attributes (2)" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>

3. Set your "hashtagword" in the parameters of the "Select Attributes (2)" operator:

For example, in this process I ran some tests with the word "of", so you have to replace "of" with your own hashtagword in the regular expression parameter.

4. The results view looks like this (screenshot not available).

I think you now have the elements you need for an answer.
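
To make the idea behind this process easier to follow, here is a rough Python sketch of what tokenizing plus "Term Occurrences" amounts to: split each tweet into tokens, count every token, then read off the count for one word. It is only an illustration, not RapidMiner's implementation. I am assuming that Tokenize splits on non-letter characters, and `term_occurrences` and the sample tweets are made up for the example.

```python
import re
from collections import Counter


def term_occurrences(text):
    """Return a token -> count mapping for one tweet."""
    # assumption: tokens are runs of letters, everything else is a separator
    tokens = re.findall(r"[^\W\d_]+", text)
    return Counter(tokens)


# made-up sample tweets, standing in for the "Text" attribute
tweets = ["A lot of videos of cats", "Offer of the day", "No match here"]

# one count per tweet for the chosen word (tokens are case-sensitive here)
counts = [term_occurrences(t)["of"] for t in tweets]
```

Under that assumption the "#" is a separator, so "#video" and "video" end up as the same token and are counted together.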

Regards,

Lionel

lionelderkrikor · December 2017

Hi @Mustafa_AVDAN again (and again),

Here is a second version of the last process, more in the "RM spirit" (easier to use and with a note).

Here is the process:

<?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="social_media:search_twitter" compatibility="7.3.000" expanded="true" height="68" name="Search Twitter" width="90" x="45" y="34">
        <parameter key="connection" value="dkk"/>
        <parameter key="query" value="video"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="8.0.001" expanded="true" height="82" name="Select Attributes" width="90" x="179" y="34">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="Text"/>
      </operator>
      <operator activated="true" class="nominal_to_text" compatibility="8.0.001" expanded="true" height="82" name="Nominal to Text" width="90" x="313" y="34"/>
      <operator activated="true" class="set_macro" compatibility="8.0.001" expanded="true" height="82" name="Set hashtagword" width="90" x="447" y="34">
        <parameter key="macro" value="hashTagword"/>
        <parameter key="value" value="of"/>
        <description align="center" color="red" colored="true" width="126">Set your hashtagword by modifying the parameter &amp;quot;value&amp;quot; (don't modify the &amp;quot;macro&amp;quot; name)</description>
      </operator>
      <operator activated="true" class="text:process_document_from_data" compatibility="7.5.000" expanded="true" height="82" name="Process Documents from Data" width="90" x="648" y="34">
        <parameter key="vector_creation" value="Term Occurrences"/>
        <parameter key="keep_text" value="true"/>
        <list key="specify_weights"/>
        <process expanded="true">
          <operator activated="true" class="text:tokenize" compatibility="7.5.000" expanded="true" height="68" name="Tokenize" width="90" x="112" y="34"/>
          <connect from_port="document" to_op="Tokenize" to_port="document"/>
          <connect from_op="Tokenize" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="8.0.001" expanded="true" height="82" name="Select Attributes (2)" width="90" x="782" y="34">
        <parameter key="attribute_filter_type" value="regular_expression"/>
        <parameter key="attribute" value="Text"/>
        <parameter key="regular_expression" value="%{hashTagword}\b"/>
      </operator>
      <connect from_op="Search Twitter" from_port="output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Nominal to Text" to_port="example set input"/>
      <connect from_op="Nominal to Text" from_port="example set output" to_op="Set hashtagword" to_port="through 1"/>
      <connect from_op="Set hashtagword" from_port="through 1" to_op="Process Documents from Data" to_port="example set"/>
      <connect from_op="Process Documents from Data" from_port="example set" to_op="Select Attributes (2)" to_port="example set input"/>
      <connect from_op="Process Documents from Data" from_port="word list" to_port="result 2"/>
      <connect from_op="Select Attributes (2)" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
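
As a rough illustration of how the macro ends up in the regular expression, the sketch below replaces `%{hashTagword}` in the pattern and then keeps only the column names that match. It is not RapidMiner code: `expand_macros`, `select_columns` and the column list are hypothetical, and I am assuming the expression has to match the whole attribute name.

```python
import re


def expand_macros(template, macros):
    """Replace every %{name} placeholder with the value of that macro."""
    return re.sub(r"%\{(\w+)\}", lambda m: macros[m.group(1)], template)


def select_columns(columns, pattern):
    # assumption: the pattern must match the complete column name
    return [c for c in columns if re.fullmatch(pattern, c)]


macros = {"hashTagword": "of"}
pattern = expand_macros(r"%{hashTagword}\b", macros)

# made-up token columns, as produced by the term occurrence vector
columns = ["a", "of", "off", "video"]
selected = select_columns(columns, pattern)
```

With these sample columns only "of" is kept; "off" is rejected because the expression does not cover the whole name.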
