Calculate tweet time intervals for each user
Hi, I have a Twitter dataset and I want to calculate the time intervals between tweets for each user. Can I do this with RapidMiner?
My dataset has a user_id attribute, which holds the ID of the user who sent the tweet, and a time attribute, which holds the time each tweet was sent.
How can I build this process in RapidMiner?
Best Answer
lionelderkrikor RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
We should sort the dataset by user_id and then, indeed, you're right, by created_at. For this operation, I used
the Sort (Advanced) operator from the Jackhammer extension (which can be installed from the Marketplace).
Here is the new process:
<?xml version="1.0" encoding="UTF-8"?><process version="8.2.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.2.000" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="read_csv" compatibility="8.2.000" expanded="true" height="68" name="Read CSV" width="90" x="45" y="34">
<parameter key="csv_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\Tweets_Interval\data.csv"/>
<parameter key="column_separators" value=","/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<parameter key="encoding" value="windows-1252"/>
<list key="data_set_meta_data_information">
<parameter key="0" value="created_at.true.polynominal.attribute"/>
<parameter key="1" value="user_id.true.real.attribute"/>
</list>
</operator>
<operator activated="true" class="nominal_to_date" compatibility="8.2.000" expanded="true" height="82" name="Nominal to Date" width="90" x="179" y="34">
<parameter key="attribute_name" value="created_at"/>
<parameter key="date_type" value="date_time"/>
<parameter key="date_format" value="EEE MMM dd HH:mm:ss +0000 yyyy"/>
</operator>
<operator activated="true" class="rmx_toolkit:sort_advanced" compatibility="2.1.784" expanded="true" height="82" name="Sort (Advanced)" width="90" x="380" y="34">
<parameter key="primary_sort_attribute" value="user_id"/>
<list key="additional_sort_attributes">
<parameter key="created_at" value="increasing"/>
</list>
</operator>
<operator activated="true" class="series:lag_series" compatibility="7.4.000" expanded="true" height="82" name="Lag Series" width="90" x="514" y="34">
<list key="attributes">
<parameter key="created_at" value="1"/>
</list>
</operator>
<operator activated="true" class="generate_attributes" compatibility="8.2.000" expanded="true" height="82" name="Generate Attributes (2)" width="90" x="648" y="34">
<list key="function_descriptions">
<parameter key="tweet_interval" value="date_diff([created_at-1],created_at)"/>
</list>
</operator>
<connect from_op="Read CSV" from_port="output" to_op="Nominal to Date" to_port="example set input"/>
<connect from_op="Nominal to Date" from_port="example set output" to_op="Sort (Advanced)" to_port="example set input"/>
<connect from_op="Sort (Advanced)" from_port="example set output" to_op="Lag Series" to_port="example set input"/>
<connect from_op="Lag Series" from_port="example set output" to_op="Generate Attributes (2)" to_port="example set input"/>
<connect from_op="Generate Attributes (2)" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
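</process>
Note that the interval between tweets is in milliseconds. You can adjust the formula
in the last Generate Attributes operator to convert the interval to seconds, minutes, hours, days, etc. (for example, divide by 1000 to get seconds).
Keep in mind that Lag Series shifts the whole table by one row, so the first tweet of each user is compared with the last tweet of the previous user; that first interval per user should be ignored.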
Regards,
Lionel
Answers
Hi @ramzanzadeh72,
Does this process meet your needs?
Regards,
Lionel
Hi @lionelderkrikor,
Thank you for your reply and your attention.
It works for a single user, but my dataset contains many users, and each user has sent a set of tweets. What should I do to calculate this interval for each user?
Hi again @ramzanzadeh72,
Could you share your dataset(s) and process so that I can better understand your problem?
Regards,
Lionel
@lionelderkrikor
I have shared part of my dataset. user_id is the ID of the user who sent the tweet, and created_at is the time the tweet was sent. This sample contains 3 users, and each user has sent multiple tweets.
So we should first sort each user's tweets by created_at and then calculate the interval between consecutive tweets of each user.
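The two steps described above (sort each user's tweets by time, then take the difference between consecutive tweets of the same user) can also be sketched outside RapidMiner, which may help when checking the results of a process. This is a minimal Python sketch, not part of the RapidMiner process: the function name `tweet_intervals`, the timestamp format, and the sample rows are assumptions made for illustration. Because tweets are paired only within one user, no interval ever spans two different users.

```python
from collections import defaultdict
from datetime import datetime

# Assumed Twitter-style timestamp, e.g. "Mon Apr 02 10:00:00 +0000 2018".
TWITTER_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def tweet_intervals(rows):
    """Map each user_id to the gaps, in seconds, between consecutive tweets.

    `rows` is an iterable of (user_id, created_at_string) pairs in any order.
    """
    times_by_user = defaultdict(list)
    for user_id, created_at in rows:
        times_by_user[user_id].append(datetime.strptime(created_at, TWITTER_FORMAT))

    intervals = {}
    for user_id, times in times_by_user.items():
        times.sort()  # chronological order within one user
        # Pair each tweet with the next tweet of the same user only.
        intervals[user_id] = [
            (later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])
        ]
    return intervals


# Made-up sample rows, for illustration only.
sample = [
    (2, "Mon Apr 02 09:00:30 +0000 2018"),
    (1, "Mon Apr 02 10:05:00 +0000 2018"),
    (1, "Mon Apr 02 10:00:00 +0000 2018"),
    (2, "Mon Apr 02 09:00:00 +0000 2018"),
    (1, "Mon Apr 02 11:05:00 +0000 2018"),
]
print(tweet_intervals(sample))  # {2: [30.0], 1: [300.0, 3600.0]}
```

A user with a single tweet gets an empty list, since there is no earlier tweet to compare against.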
Thank you, that's right.
But I have another question: how can I calculate the entropy of these intervals for each user?
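One possible reading of this question, offered as a sketch rather than a RapidMiner recipe: group each user's intervals into fixed-width bins and compute the Shannon entropy of the bin frequencies, H = sum over bins of p * log2(1/p). The bin width of 60 seconds, the function name `interval_entropy`, and the sample values are all assumptions; a different bin width or a different entropy definition will give different numbers. The intervals here are in seconds, so milliseconds from the process above would first need dividing by 1000.

```python
import math
from collections import Counter


def interval_entropy(intervals_s, bin_width_s=60.0):
    """Shannon entropy (in bits) of intervals after fixed-width binning."""
    if not intervals_s:
        return 0.0  # no intervals, nothing to measure
    # Assign every interval to a bin: [0, 60) -> 0, [60, 120) -> 1, ...
    bins = Counter(int(value // bin_width_s) for value in intervals_s)
    total = len(intervals_s)
    # Sum p * log2(1/p) over the occupied bins, with p = count / total.
    return sum((n / total) * math.log2(total / n) for n in bins.values())


# Hypothetical per-user intervals in seconds.
intervals_by_user = {
    "user_a": [30.0, 45.0, 50.0, 10.0],    # all fall into one bin
    "user_b": [30.0, 90.0, 150.0, 210.0],  # four different bins
}
entropy_by_user = {
    user: interval_entropy(values) for user, values in intervals_by_user.items()
}
print(entropy_by_user)  # {'user_a': 0.0, 'user_b': 2.0}
```

A user who always tweets at a similar pace gets an entropy near 0, while a user whose intervals are spread evenly over many bins gets a higher value.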