Calculate tweet time intervals for each user

ramzanzadeh72 Member Posts: 14 Contributor I
edited June 2019 in Help

Hi, I have a Twitter dataset and I want to calculate the time intervals between tweets for each user. Can I do this with RapidMiner?

In my dataset, the user_id attribute holds the ID of the user who sent the tweet, and the time attribute holds the time at which each tweet was sent.

How can I build this process in RapidMiner?

 

Best Answer

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    Solution Accepted

    @ramzanzadeh72,

     

    We should sort the dataset by user_id and then, indeed, you're right, by created_at. For this operation, I used the Sort (Advanced) operator from the Jackhammer extension (which can be installed from the Marketplace).

    Here is the new process:

    <?xml version="1.0" encoding="UTF-8"?>
    <process version="8.2.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="8.2.000" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="read_csv" compatibility="8.2.000" expanded="true" height="68" name="Read CSV" width="90" x="45" y="34">
            <parameter key="csv_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\Tweets_Interval\data.csv"/>
            <parameter key="column_separators" value=","/>
            <parameter key="first_row_as_names" value="false"/>
            <list key="annotations">
              <parameter key="0" value="Name"/>
            </list>
            <parameter key="encoding" value="windows-1252"/>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="created_at.true.polynominal.attribute"/>
              <parameter key="1" value="user_id.true.real.attribute"/>
            </list>
          </operator>
          <operator activated="true" class="nominal_to_date" compatibility="8.2.000" expanded="true" height="82" name="Nominal to Date" width="90" x="179" y="34">
            <parameter key="attribute_name" value="created_at"/>
            <parameter key="date_type" value="date_time"/>
            <parameter key="date_format" value="EEE MMM dd HH:mm:ss +0000 yyyy"/>
          </operator>
          <operator activated="true" class="rmx_toolkit:sort_advanced" compatibility="2.1.784" expanded="true" height="82" name="Sort (Advanced)" width="90" x="380" y="34">
            <parameter key="primary_sort_attribute" value="user_id"/>
            <list key="additional_sort_attributes">
              <parameter key="created_at" value="increasing"/>
            </list>
          </operator>
          <operator activated="true" class="series:lag_series" compatibility="7.4.000" expanded="true" height="82" name="Lag Series" width="90" x="514" y="34">
            <list key="attributes">
              <parameter key="created_at" value="1"/>
            </list>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="8.2.000" expanded="true" height="82" name="Generate Attributes (2)" width="90" x="648" y="34">
            <list key="function_descriptions">
              <parameter key="tweet_interval" value="date_diff([created_at-1],created_at)"/>
            </list>
          </operator>
          <connect from_op="Read CSV" from_port="output" to_op="Nominal to Date" to_port="example set input"/>
          <connect from_op="Nominal to Date" from_port="example set output" to_op="Sort (Advanced)" to_port="example set input"/>
          <connect from_op="Sort (Advanced)" from_port="example set output" to_op="Lag Series" to_port="example set input"/>
          <connect from_op="Lag Series" from_port="example set output" to_op="Generate Attributes (2)" to_port="example set input"/>
          <connect from_op="Generate Attributes (2)" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>

    Note that the interval between tweets is in milliseconds. You can adjust the formula in the last Generate Attributes operator to convert the interval to seconds, minutes, hours, days, etc.
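As a rough illustration of that conversion (outside RapidMiner, in plain Python), the sketch below turns a millisecond interval into other units. The helper name `convert_interval` and the sample value are made up for the example. In the process itself, the equivalent would presumably be to divide the `date_diff` result by the matching constant in the Generate Attributes expression.

```python
# Milliseconds per unit. The input interval is assumed to be in
# milliseconds, as stated for the tweet_interval attribute.
MS_PER_UNIT = {
    "seconds": 1_000,
    "minutes": 60_000,
    "hours": 3_600_000,
    "days": 86_400_000,
}


def convert_interval(interval_ms, unit):
    """Convert a millisecond interval into the requested unit (hypothetical helper)."""
    return interval_ms / MS_PER_UNIT[unit]


if __name__ == "__main__":
    sample_ms = 5_400_000  # made-up value, not taken from data.csv
    for unit in MS_PER_UNIT:
        print(unit, convert_interval(sample_ms, unit))
```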

     

    Regards,

     

    Lionel

     

Answers

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi @ramzanzadeh72,

     

    Does this process meet your needs?

     

    <?xml version="1.0" encoding="UTF-8"?>
    <process version="8.2.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="8.2.000" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="social_media:search_twitter" compatibility="8.1.000" expanded="true" height="68" name="Search Twitter" width="90" x="112" y="34">
            <parameter key="connection" value="dkk"/>
            <parameter key="query" value="test"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="8.2.000" expanded="true" height="82" name="Select Attributes" width="90" x="246" y="34">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attributes" value="From-User-Id|Created-At"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="8.2.000" expanded="true" height="82" name="Generate Attributes (3)" width="90" x="380" y="34">
            <list key="function_descriptions">
              <parameter key="sent_at" value="[Created-At]"/>
            </list>
          </operator>
          <operator activated="true" class="series:lag_series" compatibility="7.4.000" expanded="true" height="82" name="Lag Series" width="90" x="514" y="34">
            <list key="attributes">
              <parameter key="sent_at" value="1"/>
            </list>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="8.2.000" expanded="true" height="82" name="Generate Attributes (2)" width="90" x="648" y="34">
            <list key="function_descriptions">
              <parameter key="tweet_interval" value="date_diff(sent_at,[sent_at-1])"/>
            </list>
          </operator>
          <operator activated="true" class="sort" compatibility="8.2.000" expanded="true" height="82" name="Sort" width="90" x="782" y="34">
            <parameter key="attribute_name" value="From-User-Id"/>
            <parameter key="sorting_direction" value="decreasing"/>
          </operator>
          <connect from_op="Search Twitter" from_port="output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Generate Attributes (3)" to_port="example set input"/>
          <connect from_op="Generate Attributes (3)" from_port="example set output" to_op="Lag Series" to_port="example set input"/>
          <connect from_op="Lag Series" from_port="example set output" to_op="Generate Attributes (2)" to_port="example set input"/>
          <connect from_op="Generate Attributes (2)" from_port="example set output" to_op="Sort" to_port="example set input"/>
          <connect from_op="Sort" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>

    Regards,

     

    Lionel

  • ramzanzadeh72 Member Posts: 14 Contributor I

    Hi @lionelderkrikor,

    Thank you for your reply and attention.

    It works for a single user, but my dataset contains a set of users, each of whom sent a set of tweets. What should I do to calculate this interval for each user?

     

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi again @ramzanzadeh72,

     

    Could you share your dataset(s) and process so that I can better understand your problem?

     

    Regards,

     

    Lionel

  • ramzanzadeh72 Member Posts: 14 Contributor I

    @lionelderkrikor

    I am sharing part of my dataset. The user_id attribute holds the ID of the user who sent the tweet, and created_at holds the time at which the tweet was sent. This dataset contains 3 users, each of whom sent multiple tweets.

    So we should first sort each user's tweets by created_at and then calculate the intervals between that user's consecutive tweets.
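The procedure described here (group by user, sort by send time, take differences between consecutive tweets) can be sketched in plain Python, outside RapidMiner. This is only an illustration under assumptions: the function name `intervals_per_user` and the sample rows are made up, and the timestamp format is assumed to be the Twitter-style one used elsewhere in this thread. Because differences are taken inside each user's own list, no interval ever pairs one user's first tweet with another user's last one, which a plain row-wise lag over the whole sorted table would likely do.

```python
from collections import defaultdict
from datetime import datetime

# Twitter-style timestamp, e.g. "Wed Oct 10 20:19:24 +0000 2018"
# (assumed to match the created_at values in the dataset).
TWITTER_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def intervals_per_user(rows):
    """Return {user_id: [seconds between consecutive tweets]}.

    `rows` is an iterable of (user_id, created_at_string) pairs.
    """
    by_user = defaultdict(list)
    for user_id, created_at in rows:
        by_user[user_id].append(datetime.strptime(created_at, TWITTER_FORMAT))

    result = {}
    for user_id, times in by_user.items():
        times.sort()  # order this user's tweets by send time
        # Pair each tweet with the next one from the same user.
        result[user_id] = [
            (later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])
        ]
    return result


if __name__ == "__main__":
    # Made-up sample rows, not taken from data.csv.
    sample = [
        ("u1", "Wed Oct 10 20:19:24 +0000 2018"),
        ("u2", "Wed Oct 10 20:00:00 +0000 2018"),
        ("u1", "Wed Oct 10 20:20:24 +0000 2018"),
        ("u2", "Wed Oct 10 21:00:00 +0000 2018"),
        ("u1", "Wed Oct 10 20:19:54 +0000 2018"),
    ]
    print(intervals_per_user(sample))
```

A user with a single tweet ends up with an empty list, since there is no previous tweet to compare against.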

    data.csv 237.9K
  • ramzanzadeh72 Member Posts: 14 Contributor I
    @lionelderkrikor
    Thank you, that's right.
    But I have another question: how can I calculate the entropy of these intervals for each user?
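One possible reading of this question, sketched in plain Python rather than RapidMiner: discretize each user's intervals into fixed-width bins and compute the Shannon entropy of the bin frequencies. The 60-second bin width, the function name `interval_entropy`, and the sample data are all assumptions made for the example. A different binning would give different values, so this is a starting point rather than a definitive answer.

```python
import math
from collections import Counter


def interval_entropy(intervals_s, bin_width_s=60.0):
    """Shannon entropy (in bits) of intervals after fixed-width binning.

    `intervals_s` is a list of intervals in seconds; `bin_width_s` is an
    assumed bin size and should be tuned to the data.
    """
    if not intervals_s:
        return 0.0
    # Count how many intervals fall into each bin.
    bins = Counter(int(value // bin_width_s) for value in intervals_s)
    total = len(intervals_s)
    # H = sum over bins of p * log2(1 / p), with p = count / total.
    return sum(
        (count / total) * math.log2(total / count) for count in bins.values()
    )


if __name__ == "__main__":
    # Made-up intervals per user, in seconds.
    per_user = {
        "u1": [10, 20, 30],         # all in the same bin
        "u2": [10, 70, 130, 190],   # spread over four bins
    }
    for user_id, intervals in per_user.items():
        print(user_id, interval_entropy(intervals))
```

A user who always tweets at similar intervals gets an entropy near zero, while a user whose intervals are spread evenly over many bins gets a higher value.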