How can I filter keywords after getting Twitter user statuses?

zhao_huangzhao_huang Member Posts: 9 Contributor I
edited December 2018 in Help

Dear All, 

 

First of all, please forgive the interruption. I am completely new to RapidMiner and am doing a Twitter content analysis for an urgent paper. After obtaining data with the "Get Twitter User Statuses" operator, I would like to continue with the data analysis: I want to pull out the tweets on particular topics by setting some keywords on the data I obtained.

I have been searching for a long time and still do not know how to do this.

Because I need to collect content posted by specific Twitter users and then look for specific topics in that data, I tested "Search Twitter". With it I can use a query to search for different posts, but I couldn't restrict the search to a specific Twitter account. Maybe you could give me some advice or a solution.

 

I'm waiting for your suggestions, 

Thank you all, 

 

Best regards, 

 

Z. H

Answers

  • Thomas_OttThomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    When you refer to keywords, do you mean the hashtags each user tends to tweet out, like #soda, #beer, etc.? Or just keywords in general?

     

    I think a lot of this is going to depend on how you tokenize each status. Did you see my video and process here? http://www.neuralmarkettrends.com/use-rapidminer-discover-twitter-content/

  • zhao_huangzhao_huang Member Posts: 9 Contributor I

    Dear Thomas, 

     

    Yes. I want to find particular events, in the form of hashtags, that each user tends to tweet about. I think the approach will be the same as the one you mentioned, so I will check your video as soon as possible!

    Thank you for sharing,

     

    Best, 

    Z.H

  • Thomas_OttThomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    @zhao_huang I use the "specify characters" mode in the Tokenize operator and set the characters parameter to

     .!;:[,' ?]

    That helps me preserve hashtags when tokenizing. 
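
    To see why this helps, here is a minimal sketch in plain Python (not RapidMiner) that mimics splitting on exactly those characters. The function names and the sample tweet are made up for illustration. Because "#" is not one of the split characters, it stays attached to the word, whereas a tokenizer that splits on every non-letter character would strip it.

```python
import re

# The same split characters as in the Tokenize setting: space . ! ; : [ , ' ? ]
SPLIT_CHARS = " .!;:[,'?]"


def tokenize(text, split_chars=SPLIT_CHARS):
    """Split text on any of the given characters and drop empty tokens."""
    pattern = "[" + re.escape(split_chars) + "]+"
    return [tok for tok in re.split(pattern, text) if tok]


def tokenize_non_letters(text):
    """For contrast: split on every character that is not a letter."""
    return [tok for tok in re.split(r"[^A-Za-z]+", text) if tok]


def hashtags(text):
    """Keep only the tokens that still start with '#'."""
    return [tok for tok in tokenize(text) if tok.startswith("#")]


sample = "Great news on #OBOR, see: #SilkRoad!"  # invented example tweet
tokens = tokenize(sample)
plain = tokenize_non_letters(sample)
tags = hashtags(sample)
```

    With the sample above, `tokens` keeps "#OBOR" and "#SilkRoad" intact, while `plain` only contains "OBOR" and "SilkRoad", so hashtags can no longer be told apart from ordinary words.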

  • zhao_huangzhao_huang Member Posts: 9 Contributor I

    I read your post and I'm trying to create the process, but I don't think I really understand how these things work. I'm sorry about my ignorance... I will probably have more questions to ask you in the future...

  • sgenzersgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    Hello @zhao_huang, welcome to the community! I'd recommend posting your XML process here (see https://youtu.be/KkgB5QXWXJ8 and "Read Before Posting" on the right when you reply) and attaching your dataset. This way we can replicate what you're doing and help you better.

     

    Scott

  • zhao_huangzhao_huang Member Posts: 9 Contributor I

    Dear Thomas, 

     

    Thank you for your reply, and sorry for my late response; I was on an international flight to Nairobi for my fieldwork.

    I watched your tutorial video and followed it to set macros in order to find my target tweets in three specific accounts. But I don't know how to run the search.

    If I use your XML, I have a question: I am only interested in three Twitter accounts, and I just need to find the interesting posts in those three accounts. How can I do that? How can I restrict the search to three Twitter accounts with these keywords? Should I change "Search Twitter for Keyword" to "Get Twitter User Statuses", or do I have to use both? And do you have any advice on setting the period? Sorry for my thousand questions...

    I tried to do it, but I'm not so sure about the result.

    I have copied my XML below:

    <?xml version="1.0" encoding="UTF-8"?><process version="8.1.000">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.1.000" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="subprocess" compatibility="8.1.000" expanded="true" height="82" name="Retrieve Twitter Data" width="90" x="45" y="34">
    <process expanded="true">
    <operator activated="true" class="set_macros" compatibility="8.1.000" expanded="true" height="82" name="Set Macros" width="90" x="45" y="34">
    <list key="macros">
    <parameter key="keyword1" value="#OBOR"/>
    <parameter key="keyword2" value="#SilkRoad"/>
    <parameter key="keyword3" value="#OneBeltOneRoad"/>
    <parameter key="keyword4" value="#BeltandRoad"/>
    <parameter key="Keyword5" value="#OneBeltOneRoadInitiative"/>
    <parameter key="Period" value="2016.01.01 - 2018.02.28"/>
    </list>
    <description align="center" color="transparent" colored="false" width="126">Set global variables here. Such as keyword search.</description>
    </operator>
    <operator activated="false" class="retrieve" compatibility="8.1.000" expanded="true" height="68" name="Retrieve Twitter Content Ideas" width="90" x="179" y="544">
    <parameter key="repository_entry" value="../data/%{keyword1} Twitter Content Ideas"/>
    </operator>
    <operator activated="true" class="social_media:search_twitter" compatibility="8.0.010" expanded="true" height="68" name="Search Twitter for Keyword3" width="90" x="179" y="238">
    <parameter key="connection" value="Twitter - Studio Connection"/>
    <parameter key="query" value="OneBeltOneRoad"/>
    <parameter key="limit" value="30000"/>
    <parameter key="language" value="en"/>
    <parameter key="until" value="2018.02.28 19:15:48 +0100"/>
    </operator>
    <operator activated="true" class="social_media:search_twitter" compatibility="8.0.010" expanded="true" height="68" name="Search Twitter for Keyword2" width="90" x="179" y="136">
    <parameter key="connection" value="Twitter ZH"/>
    <parameter key="query" value="SilkRoad"/>
    <parameter key="limit" value="30000"/>
    <parameter key="language" value="en"/>
    <parameter key="until" value="2018.02.28 19:16:09 +0100"/>
    </operator>
    <operator activated="true" class="social_media:search_twitter" compatibility="8.0.010" expanded="true" height="68" name="Search Twitter for Keyword 1" width="90" x="179" y="34">
    <parameter key="connection" value="Twitter ZH"/>
    <parameter key="query" value="OBOR"/>
    <parameter key="limit" value="30000"/>
    <parameter key="language" value="en"/>
    <parameter key="until" value="2018.02.28 19:16:17 +0100"/>
    </operator>
    <operator activated="true" class="social_media:search_twitter" compatibility="8.1.000" expanded="true" height="68" name="Search Twitter for Key word 4" width="90" x="179" y="340">
    <parameter key="connection" value="Twitter ZH"/>
    <parameter key="query" value="BeltandRoad"/>
    <parameter key="limit" value="30000"/>
    <parameter key="language" value="En"/>
    <parameter key="until" value="2018.02.28 19:17:16 +0100"/>
    </operator>
    <operator activated="true" class="social_media:search_twitter" compatibility="8.1.000" expanded="true" height="68" name="Search Twitter for Key word 5" width="90" x="179" y="442">
    <parameter key="connection" value="Twitter ZH"/>
    <parameter key="query" value="OneBeltOneRoadInitiative"/>
    <parameter key="limit" value="30000"/>
    <parameter key="until" value="2018.02.28 19:18:25 +0100"/>
    </operator>
    <operator activated="true" class="append" compatibility="8.1.000" expanded="true" height="166" name="Append Data Set together" width="90" x="447" y="34"/>
    <operator activated="true" class="remove_duplicates" compatibility="8.1.000" expanded="true" height="103" name="Remove Duplicate IDs" width="90" x="581" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="Id"/>
    <parameter key="include_special_attributes" value="true"/>
    </operator>
    <operator activated="true" class="store" compatibility="8.1.000" expanded="true" height="68" name="Store Data for later reuse" width="90" x="715" y="34">
    <parameter key="repository_entry" value="../data/%{keyword1} Twitter Content Ideas"/>
    </operator>
    <connect from_port="in 1" to_op="Set Macros" to_port="through 1"/>
    <connect from_op="Search Twitter for Keyword3" from_port="output" to_op="Append Data Set together" to_port="example set 3"/>
    <connect from_op="Search Twitter for Keyword2" from_port="output" to_op="Append Data Set together" to_port="example set 2"/>
    <connect from_op="Search Twitter for Keyword 1" from_port="output" to_op="Append Data Set together" to_port="example set 1"/>
    <connect from_op="Search Twitter for Key word 4" from_port="output" to_op="Append Data Set together" to_port="example set 4"/>
    <connect from_op="Search Twitter for Key word 5" from_port="output" to_op="Append Data Set together" to_port="example set 5"/>
    <connect from_op="Append Data Set together" from_port="merged set" to_op="Remove Duplicate IDs" to_port="example set input"/>
    <connect from_op="Remove Duplicate IDs" from_port="example set output" to_op="Store Data for later reuse" to_port="input"/>
    <connect from_op="Store Data for later reuse" from_port="through" to_port="out 1"/>
    <portSpacing port="source_in 1" spacing="0"/>
    <portSpacing port="source_in 2" spacing="0"/>
    <portSpacing port="sink_out 1" spacing="0"/>
    <portSpacing port="sink_out 2" spacing="0"/>
    </process>
    <description align="center" color="transparent" colored="false" width="126">Retrieves Twitter Data, Appends, and Stores</description>
    </operator>
    <operator activated="true" class="subprocess" compatibility="8.1.000" expanded="true" height="82" name="ETL Subprocess" width="90" x="179" y="34">
    <process expanded="true">
    <operator activated="true" class="remove_duplicates" compatibility="8.1.000" expanded="true" height="103" name="Remove Duplicates" width="90" x="45" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="From-User"/>
    <description align="center" color="transparent" colored="false" width="126">Remove Duplicate Tweets from same user</description>
    </operator>
    <operator activated="true" class="generate_attributes" compatibility="8.1.000" expanded="true" height="82" name="Generate Arbitrary Label" width="90" x="179" y="34">
    <list key="function_descriptions">
    <parameter key="label" value="if([Retweet-Count]&lt;eval(%{retweetcount}),&quot;Not Important&quot;,&quot;Important&quot;)"/>
    </list>
    </operator>
    <operator activated="false" class="filter_examples" compatibility="8.1.000" expanded="true" height="103" name="Filter Examples" width="90" x="313" y="34">
    <parameter key="invert_filter" value="true"/>
    <list key="filters_list">
    <parameter key="filters_entry_key" value="Text.contains.RT"/>
    </list>
    </operator>
    <operator activated="true" class="set_role" compatibility="8.1.000" expanded="true" height="82" name="Set Role" width="90" x="447" y="34">
    <parameter key="attribute_name" value="label"/>
    <parameter key="target_role" value="label"/>
    <list key="set_additional_roles"/>
    <description align="center" color="transparent" colored="false" width="126">Set Role for Label</description>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="8.1.000" expanded="true" height="82" name="Select Attributes" width="90" x="581" y="34">
    <parameter key="attribute_filter_type" value="subset"/>
    <parameter key="attributes" value="Text|label"/>
    <parameter key="include_special_attributes" value="true"/>
    </operator>
    <operator activated="true" class="nominal_to_text" compatibility="8.1.000" expanded="true" height="82" name="Nominal to Text" width="90" x="715" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="Text"/>
    </operator>
    <operator activated="true" class="extract_macro" compatibility="8.1.000" expanded="true" height="68" name="Extract Macro (3)" width="90" x="849" y="34">
    <parameter key="macro" value="label_count"/>
    <parameter key="macro_type" value="statistics"/>
    <parameter key="statistics" value="count"/>
    <parameter key="attribute_name" value="label"/>
    <parameter key="attribute_value" value="Important"/>
    <list key="additional_macros"/>
    </operator>
    <connect from_port="in 1" to_op="Remove Duplicates" to_port="example set input"/>
    <connect from_op="Remove Duplicates" from_port="example set output" to_op="Generate Arbitrary Label" to_port="example set input"/>
    <connect from_op="Generate Arbitrary Label" from_port="example set output" to_op="Set Role" to_port="example set input"/>
    <connect from_op="Set Role" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="example set output" to_op="Nominal to Text" to_port="example set input"/>
    <connect from_op="Nominal to Text" from_port="example set output" to_op="Extract Macro (3)" to_port="example set"/>
    <connect from_op="Extract Macro (3)" from_port="example set" to_port="out 1"/>
    <portSpacing port="source_in 1" spacing="0"/>
    <portSpacing port="source_in 2" spacing="0"/>
    <portSpacing port="sink_out 1" spacing="0"/>
    <portSpacing port="sink_out 2" spacing="0"/>
    </process>
    <description align="center" color="transparent" colored="false" width="126">Binning for Label subprocess - suspect</description>
    </operator>
    <operator activated="true" class="text:process_document_from_data" compatibility="8.1.000" expanded="true" height="82" name="Process Documents from Data" width="90" x="313" y="34">
    <parameter key="prune_method" value="percentual"/>
    <parameter key="prune_below_percent" value="5.0"/>
    <parameter key="prune_above_percent" value="50.0"/>
    <parameter key="prune_below_absolute" value="100"/>
    <parameter key="prune_above_absolute" value="500"/>
    <list key="specify_weights"/>
    <process expanded="true">
    <operator activated="true" class="text:extract_information" compatibility="8.1.000" expanded="true" height="68" name="Extract Links for later use" width="90" x="45" y="34">
    <parameter key="query_type" value="Regular Expression"/>
    <list key="string_machting_queries"/>
    <list key="regular_expression_queries">
    <parameter key="Tweet Links" value="http.*"/>
    </list>
    <list key="regular_region_queries"/>
    <list key="xpath_queries"/>
    <list key="namespaces"/>
    <list key="index_queries"/>
    <list key="jsonpath_queries"/>
    </operator>
    <operator activated="true" class="text:replace_tokens" compatibility="8.1.000" expanded="true" height="68" name="Replace http links" width="90" x="179" y="34">
    <list key="replace_dictionary">
    <parameter key="http.*" value="link"/>
    </list>
    </operator>
    <operator activated="true" class="text:tokenize" compatibility="8.1.000" expanded="true" height="68" name="Tokenize" width="90" x="313" y="34">
    <parameter key="mode" value="specify characters"/>
    <parameter key="characters" value=" .!;:[,' ?]"/>
    </operator>
    <operator activated="true" class="text:transform_cases" compatibility="8.1.000" expanded="true" height="68" name="Transform Cases" width="90" x="447" y="34"/>
    <operator activated="true" class="text:filter_by_length" compatibility="8.1.000" expanded="true" height="68" name="Filter Tokens (by Length)" width="90" x="581" y="34"/>
    <operator activated="true" class="text:generate_n_grams_terms" compatibility="8.1.000" expanded="true" height="68" name="Generate n-Grams (Terms)" width="90" x="715" y="34"/>
    <operator activated="true" class="text:filter_tokens_by_content" compatibility="8.1.000" expanded="true" height="68" name="Filter Tokens (by Content)" width="90" x="849" y="34">
    <parameter key="string" value="link"/>
    <parameter key="invert condition" value="true"/>
    </operator>
    <operator activated="true" class="text:filter_stopwords_english" compatibility="8.1.000" expanded="true" height="68" name="Filter Stopwords (English)" width="90" x="983" y="34"/>
    <connect from_port="document" to_op="Extract Links for later use" to_port="document"/>
    <connect from_op="Extract Links for later use" from_port="document" to_op="Replace http links" to_port="document"/>
    <connect from_op="Replace http links" from_port="document" to_op="Tokenize" to_port="document"/>
    <connect from_op="Tokenize" from_port="document" to_op="Transform Cases" to_port="document"/>
    <connect from_op="Transform Cases" from_port="document" to_op="Filter Tokens (by Length)" to_port="document"/>
    <connect from_op="Filter Tokens (by Length)" from_port="document" to_op="Generate n-Grams (Terms)" to_port="document"/>
    <connect from_op="Generate n-Grams (Terms)" from_port="document" to_op="Filter Tokens (by Content)" to_port="document"/>
    <connect from_op="Filter Tokens (by Content)" from_port="document" to_op="Filter Stopwords (English)" to_port="document"/>
    <connect from_op="Filter Stopwords (English)" from_port="document" to_port="document 1"/>
    <portSpacing port="source_document" spacing="0"/>
    <portSpacing port="sink_document 1" spacing="0"/>
    <portSpacing port="sink_document 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="multiply" compatibility="8.1.000" expanded="true" height="103" name="Multiply" width="90" x="447" y="34"/>
    <operator activated="true" class="subprocess" compatibility="8.1.000" expanded="true" height="103" name="Clustering Stuff" width="90" x="581" y="34">
    <process expanded="true">
    <operator activated="true" class="select_attributes" compatibility="8.1.000" expanded="true" height="82" name="Remove Tweet Links" width="90" x="45" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="Tweet Links"/>
    <parameter key="attributes" value="Tweet Links"/>
    <parameter key="invert_selection" value="true"/>
    </operator>
    <operator activated="true" class="x_means" compatibility="7.5.003" expanded="true" height="82" name="X-Means" width="90" x="179" y="34">
    <parameter key="measure_types" value="BregmanDivergences"/>
    <parameter key="divergence" value="SquaredEuclideanDistance"/>
    </operator>
    <operator activated="true" class="extract_prototypes" compatibility="8.1.000" expanded="true" height="82" name="Extract Cluster Prototypes" width="90" x="313" y="136"/>
    <operator activated="true" class="store" compatibility="8.1.000" expanded="true" height="68" name="Store Cluster Model" width="90" x="447" y="34">
    <parameter key="repository_entry" value="../results/%{keyword1} Twitter Content Cluster Model"/>
    </operator>
    <connect from_port="in 1" to_op="Remove Tweet Links" to_port="example set input"/>
    <connect from_op="Remove Tweet Links" from_port="example set output" to_op="X-Means" to_port="example set"/>
    <connect from_op="X-Means" from_port="cluster model" to_op="Extract Cluster Prototypes" to_port="model"/>
    <connect from_op="Extract Cluster Prototypes" from_port="example set" to_op="Store Cluster Model" to_port="input"/>
    <connect from_op="Extract Cluster Prototypes" from_port="model" to_port="out 2"/>
    <connect from_op="Store Cluster Model" from_port="through" to_port="out 1"/>
    <portSpacing port="source_in 1" spacing="0"/>
    <portSpacing port="source_in 2" spacing="0"/>
    <portSpacing port="sink_out 1" spacing="0"/>
    <portSpacing port="sink_out 2" spacing="0"/>
    <portSpacing port="sink_out 3" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="store" compatibility="8.1.000" expanded="true" height="68" name="Store WordList" width="90" x="447" y="289">
    <parameter key="repository_entry" value="../results/%{keyword1} Twitter Content Ideas Wordlist"/>
    </operator>
    <operator activated="true" class="text:wordlist_to_data" compatibility="8.1.000" expanded="true" height="82" name="WordList to Data" width="90" x="581" y="289"/>
    <operator activated="true" class="sort" compatibility="8.1.000" expanded="true" height="82" name="Sort" width="90" x="715" y="289">
    <parameter key="attribute_name" value="total"/>
    <parameter key="sorting_direction" value="decreasing"/>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="8.1.000" expanded="true" height="82" name="Remove Tweet Links (2)" width="90" x="581" y="136">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="Tweet Links"/>
    <parameter key="attributes" value="Tweet Links"/>
    <parameter key="invert_selection" value="true"/>
    </operator>
    <operator activated="true" class="subprocess" compatibility="8.1.000" expanded="true" height="82" name="Determine Influence Factors" width="90" x="715" y="136">
    <process expanded="true">
    <operator activated="true" class="weight_by_correlation" compatibility="8.1.000" expanded="true" height="82" name="Weight by Correlation" width="90" x="45" y="34"/>
    <operator activated="true" class="weights_to_data" compatibility="8.1.000" expanded="true" height="68" name="Weights to Data" width="90" x="179" y="34"/>
    <operator activated="true" class="generate_attributes" compatibility="6.4.000" expanded="true" height="82" name="Generate Attributes (2)" width="90" x="313" y="34">
    <list key="function_descriptions">
    <parameter key="Method" value="&quot;Correlation&quot;"/>
    </list>
    </operator>
    <operator activated="true" class="weight_by_gini_index" compatibility="8.1.000" expanded="true" height="82" name="Weight by Gini Index" width="90" x="45" y="120"/>
    <operator activated="true" class="weight_by_information_gain" compatibility="8.1.000" expanded="true" height="82" name="Weight by Information Gain" width="90" x="45" y="210"/>
    <operator activated="true" class="weight_by_information_gain_ratio" compatibility="8.1.000" expanded="true" height="82" name="Weight by Information Gain Ratio" width="90" x="45" y="300"/>
    <operator activated="true" class="weights_to_data" compatibility="8.1.000" expanded="true" height="68" name="Weights to Data (2)" width="90" x="179" y="120"/>
    <operator activated="true" class="generate_attributes" compatibility="6.4.000" expanded="true" height="82" name="Generate Attributes (3)" width="90" x="313" y="120">
    <list key="function_descriptions">
    <parameter key="Method" value="&quot;Gini&quot;"/>
    </list>
    </operator>
    <operator activated="true" class="weights_to_data" compatibility="8.1.000" expanded="true" height="68" name="Weights to Data (3)" width="90" x="179" y="210"/>
    <operator activated="true" class="generate_attributes" compatibility="6.4.000" expanded="true" height="82" name="Generate Attributes (4)" width="90" x="313" y="210">
    <list key="function_descriptions">
    <parameter key="Method" value="&quot;InfoGain&quot;"/>
    </list>
    </operator>
    <operator activated="true" class="weights_to_data" compatibility="8.1.000" expanded="true" height="68" name="Weights to Data (4)" width="90" x="179" y="300"/>
    <operator activated="true" class="generate_attributes" compatibility="6.4.000" expanded="true" height="82" name="Generate Attributes (5)" width="90" x="313" y="300">
    <list key="function_descriptions">
    <parameter key="Method" value="&quot;InfoGainRatio&quot;"/>
    </list>
    </operator>
    <operator activated="true" class="append" compatibility="8.1.000" expanded="true" height="145" name="Append" width="90" x="447" y="30"/>
    <operator activated="true" class="pivot" compatibility="8.1.000" expanded="true" height="82" name="Pivot" width="90" x="581" y="30">
    <parameter key="group_attribute" value="Attribute"/>
    <parameter key="index_attribute" value="Method"/>
    </operator>
    <operator activated="true" class="generate_aggregation" compatibility="6.5.002" expanded="true" height="82" name="Generate Aggregation" width="90" x="715" y="30">
    <parameter key="attribute_name" value="Importance"/>
    <parameter key="attribute_filter_type" value="value_type"/>
    <parameter key="value_type" value="numeric"/>
    <parameter key="aggregation_function" value="average"/>
    </operator>
    <operator activated="true" class="normalize" compatibility="7.5.003" expanded="true" height="103" name="Normalize" width="90" x="849" y="30">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="Importance"/>
    <parameter key="method" value="range transformation"/>
    </operator>
    <operator activated="true" class="sort" compatibility="8.1.000" expanded="true" height="82" name="Sort again" width="90" x="983" y="34">
    <parameter key="attribute_name" value="Importance"/>
    <parameter key="sorting_direction" value="decreasing"/>
    </operator>
    <operator activated="true" class="order_attributes" compatibility="8.1.000" expanded="true" height="82" name="Reorder Attributes" width="90" x="1117" y="34">
    <parameter key="attribute_ordering" value="Attribute|Importance"/>
    <parameter key="handle_unmatched" value="remove"/>
    </operator>
    <operator activated="true" class="filter_example_range" compatibility="8.1.000" expanded="true" height="82" name="Select Top 20" width="90" x="1251" y="34">
    <parameter key="first_example" value="1"/>
    <parameter key="last_example" value="20"/>
    </operator>
    <connect from_port="in 1" to_op="Weight by Correlation" to_port="example set"/>
    <connect from_op="Weight by Correlation" from_port="weights" to_op="Weights to Data" to_port="attribute weights"/>
    <connect from_op="Weight by Correlation" from_port="example set" to_op="Weight by Gini Index" to_port="example set"/>
    <connect from_op="Weights to Data" from_port="example set" to_op="Generate Attributes (2)" to_port="example set input"/>
    <connect from_op="Generate Attributes (2)" from_port="example set output" to_op="Append" to_port="example set 1"/>
    <connect from_op="Weight by Gini Index" from_port="weights" to_op="Weights to Data (2)" to_port="attribute weights"/>
    <connect from_op="Weight by Gini Index" from_port="example set" to_op="Weight by Information Gain" to_port="example set"/>
    <connect from_op="Weight by Information Gain" from_port="weights" to_op="Weights to Data (3)" to_port="attribute weights"/>
    <connect from_op="Weight by Information Gain" from_port="example set" to_op="Weight by Information Gain Ratio" to_port="example set"/>
    <connect from_op="Weight by Information Gain Ratio" from_port="weights" to_op="Weights to Data (4)" to_port="attribute weights"/>
    <connect from_op="Weights to Data (2)" from_port="example set" to_op="Generate Attributes (3)" to_port="example set input"/>
    <connect from_op="Generate Attributes (3)" from_port="example set output" to_op="Append" to_port="example set 2"/>
    <connect from_op="Weights to Data (3)" from_port="example set" to_op="Generate Attributes (4)" to_port="example set input"/>
    <connect from_op="Generate Attributes (4)" from_port="example set output" to_op="Append" to_port="example set 3"/>
    <connect from_op="Weights to Data (4)" from_port="example set" to_op="Generate Attributes (5)" to_port="example set input"/>
    <connect from_op="Generate Attributes (5)" from_port="example set output" to_op="Append" to_port="example set 4"/>
    <connect from_op="Append" from_port="merged set" to_op="Pivot" to_port="example set input"/>
    <connect from_op="Pivot" from_port="example set output" to_op="Generate Aggregation" to_port="example set input"/>
    <connect from_op="Generate Aggregation" from_port="example set output" to_op="Normalize" to_port="example set input"/>
    <connect from_op="Normalize" from_port="example set output" to_op="Sort again" to_port="example set input"/>
    <connect from_op="Sort again" from_port="example set output" to_op="Reorder Attributes" to_port="example set input"/>
    <connect from_op="Reorder Attributes" from_port="example set output" to_op="Select Top 20" to_port="example set input"/>
    <connect from_op="Select Top 20" from_port="example set output" to_port="out 1"/>
    <portSpacing port="source_in 1" spacing="0"/>
    <portSpacing port="source_in 2" spacing="0"/>
    <portSpacing port="sink_out 1" spacing="0"/>
    <portSpacing port="sink_out 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="store" compatibility="8.1.000" expanded="true" height="68" name="Store Influence Wrds" width="90" x="849" y="136">
    <parameter key="repository_entry" value="../results/%{keyword1} Twitter Content Influence Words"/>
    </operator>
    <operator activated="true" class="write_excel" compatibility="8.1.000" expanded="true" height="82" name="Write Important Words" width="90" x="983" y="136">
    <parameter key="excel_file" value="C:\Users\Thomas Ott\Dropbox\Twitter Influencers\%{keyword1} Todays Powerful Words to use in your Tweets.xlsx"/>
    </operator>
    <connect from_op="Retrieve Twitter Data" from_port="out 1" to_op="ETL Subprocess" to_port="in 1"/>
    <connect from_op="ETL Subprocess" from_port="out 1" to_op="Process Documents from Data" to_port="example set"/>
    <connect from_op="Process Documents from Data" from_port="example set" to_op="Multiply" to_port="input"/>
    <connect from_op="Process Documents from Data" from_port="word list" to_op="Store WordList" to_port="input"/>
    <connect from_op="Multiply" from_port="output 1" to_op="Clustering Stuff" to_port="in 1"/>
    <connect from_op="Multiply" from_port="output 2" to_op="Remove Tweet Links (2)" to_port="example set input"/>
    <connect from_op="Clustering Stuff" from_port="out 1" to_port="result 1"/>
    <connect from_op="Clustering Stuff" from_port="out 2" to_port="result 2"/>
    <connect from_op="Store WordList" from_port="through" to_op="WordList to Data" to_port="word list"/>
    <connect from_op="WordList to Data" from_port="example set" to_op="Sort" to_port="example set input"/>
    <connect from_op="Sort" from_port="example set output" to_port="result 4"/>
    <connect from_op="Remove Tweet Links (2)" from_port="example set output" to_op="Determine Influence Factors" to_port="in 1"/>
    <connect from_op="Determine Influence Factors" from_port="out 1" to_op="Store Influence Wrds" to_port="input"/>
    <connect from_op="Store Influence Wrds" from_port="through" to_op="Write Important Words" to_port="input"/>
    <connect from_op="Write Important Words" from_port="through" to_port="result 3"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="63"/>
    <portSpacing port="sink_result 3" spacing="126"/>
    <portSpacing port="sink_result 4" spacing="84"/>
    <portSpacing port="sink_result 5" spacing="0"/>
    </process>
    </operator>
    </process>

    I look forward to hearing from you, 

     

    Best, 

     

    ZH

  • zhao_huangzhao_huang Member Posts: 9 Contributor I

    Dear All, 

    I'm a bit desperate...

    I spent all night on my Twitter search with RapidMiner, but it failed...

    Here's what I need: I want to retrieve the tweets from a particular Twitter account that contain particular phrases (such as sport OR Football OR Swimming OR Pingpang) and were posted during a particular period (between 01/01/2016 and 02/28/2018).
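
    Put as logic, this requirement is a filter over the statuses of one account: keep a tweet if its date falls inside the period and its text contains at least one of the keywords. A minimal sketch in plain Python follows, only to make the condition explicit. It is not RapidMiner code; the field names "text" and "created" and the three sample statuses are invented for illustration.

```python
from datetime import date

# Keywords and period taken from the post; matching is case-insensitive.
KEYWORDS = ["#OBOR", "#SilkRoad", "#BeltandRoad"]
START, END = date(2016, 1, 1), date(2018, 2, 28)


def matches(status, keywords=KEYWORDS, start=START, end=END):
    """True if the status is inside the period and mentions any keyword."""
    text = status["text"].lower()
    in_period = start <= status["created"] <= end
    has_keyword = any(k.lower() in text for k in keywords)
    return in_period and has_keyword


# Invented statuses standing in for the tweets of one account.
statuses = [
    {"text": "Talks on #OBOR continue", "created": date(2017, 5, 14)},
    {"text": "Happy new year", "created": date(2017, 1, 1)},
    {"text": "New #SilkRoad forum", "created": date(2018, 3, 5)},
]

selected = [s for s in statuses if matches(s)]
```

    Only the first sample status is kept: the second has no keyword and the third falls outside the period.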

    I tried the process below to get the tweets, but it failed... Do you have a solution? THANK YOU VERY MUCH!!! You will save me...

    <?xml version="1.0" encoding="UTF-8"?><process version="8.1.000">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.1.000" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="social_media:get_twitter_user_statuses" compatibility="8.1.000" expanded="true" height="68" name="Get Twitter User Statuses" width="90" x="45" y="136">
    <parameter key="connection" value="Twitter zhao"/>
    <parameter key="user" value="ChinaEUMission"/>
    <parameter key="limit" value="5664"/>
    </operator>
    <operator activated="true" class="set_macros" compatibility="8.1.000" expanded="true" height="82" name="Set Macros" width="90" x="179" y="85">
    <list key="macros">
    <parameter key="Key Word1" value="#OBOR"/>
    <parameter key="Key Word2" value="#Onebeltoneroad"/>
    <parameter key="Key Word3" value="#beltandroad"/>
    <parameter key="Key Word4" value="#SilkRoad"/>
    <parameter key="Key Word5" value="#Onebeltandoneroad"/>
    <parameter key="Key Word6" value="#Onebeltandoneroadinitative"/>
    </list>
    </operator>
    <operator activated="true" class="social_media:search_twitter" compatibility="8.1.000" expanded="true" height="68" name="Search Twitter Key word 1" width="90" x="313" y="85">
    <parameter key="connection" value="Twitter zhao"/>
    <parameter key="query" value="OBOR"/>
    <parameter key="limit" value="10000"/>
    <parameter key="language" value="en"/>
    <parameter key="until" value="2018.03.01 22:04:36 +0300"/>
    </operator>
    <operator activated="true" class="social_media:search_twitter" compatibility="8.1.000" expanded="true" height="68" name="Search Twitter Key word 2" width="90" x="313" y="187">
    <parameter key="connection" value="Twitter zhao"/>
    <parameter key="query" value="Onebeltoneroad"/>
    <parameter key="limit" value="10000"/>
    <parameter key="language" value="en"/>
    <parameter key="until" value="2018.03.01 22:04:36 +0300"/>
    </operator>
    <operator activated="true" class="social_media:search_twitter" compatibility="8.1.000" expanded="true" height="68" name="Search Twitter Key word 3" width="90" x="313" y="289">
    <parameter key="connection" value="Twitter zhao"/>
    <parameter key="query" value="Beltandroad"/>
    <parameter key="limit" value="10000"/>
    <parameter key="language" value="en"/>
    <parameter key="until" value="2018.03.01 22:04:36 +0300"/>
    </operator>
    <operator activated="true" class="social_media:search_twitter" compatibility="8.1.000" expanded="true" height="68" name="Search Twitter Key word 4" width="90" x="313" y="391">
    <parameter key="connection" value="Twitter zhao"/>
    <parameter key="query" value="silkroad"/>
    <parameter key="limit" value="10000"/>
    <parameter key="language" value="en"/>
    <parameter key="until" value="2018.03.01 22:04:36 +0300"/>
    </operator>
    <operator activated="true" class="social_media:search_twitter" compatibility="8.1.000" expanded="true" height="68" name="Search Twitter Key word 5" width="90" x="313" y="493">
    <parameter key="connection" value="Twitter zhao"/>
    <parameter key="query" value="Onebeltandoneroad"/>
    <parameter key="limit" value="10000"/>
    <parameter key="language" value="en"/>
    <parameter key="until" value="2018.03.01 22:04:36 +0300"/>
    </operator>
    <operator activated="true" class="social_media:search_twitter" compatibility="8.1.000" expanded="true" height="68" name="Search Twitter Key word 6" width="90" x="313" y="595">
    <parameter key="connection" value="Twitter zhao"/>
    <parameter key="query" value="onebeltandoneroadinitiative"/>
    <parameter key="limit" value="10000"/>
    <parameter key="language" value="en"/>
    <parameter key="until" value="2018.03.01 22:04:36 +0300"/>
    </operator>
    <operator activated="true" class="append" compatibility="8.1.000" expanded="true" height="187" name="Append" width="90" x="514" y="85"/>
    <operator activated="true" class="write_excel" compatibility="8.1.000" expanded="true" height="82" name="Write Excel" width="90" x="648" y="136">
    <parameter key="excel_file" value="/Users/Alexandre/Desktop/Test Rapid miner tweeter.xlsx"/>
    </operator>
    <connect from_op="Get Twitter User Statuses" from_port="output" to_op="Set Macros" to_port="through 1"/>
    <connect from_op="Search Twitter Key word 1" from_port="output" to_op="Append" to_port="example set 1"/>
    <connect from_op="Search Twitter Key word 2" from_port="output" to_op="Append" to_port="example set 2"/>
    <connect from_op="Search Twitter Key word 3" from_port="output" to_op="Append" to_port="example set 3"/>
    <connect from_op="Search Twitter Key word 4" from_port="output" to_op="Append" to_port="example set 4"/>
    <connect from_op="Search Twitter Key word 5" from_port="output" to_op="Append" to_port="example set 5"/>
    <connect from_op="Search Twitter Key word 6" from_port="output" to_op="Append" to_port="example set 6"/>
    <connect from_op="Append" from_port="merged set" to_op="Write Excel" to_port="input"/>
    <connect from_op="Write Excel" from_port="through" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>
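
The requirement described in this post (one account, any of several keywords, a fixed date window) can be sketched outside RapidMiner as a plain filter. This is only an illustration, not the process above: the field names `from_user`, `created_at` and `text` and the sample rows are made up, and the hashtags are copied from the Set Macros operator.

```python
from datetime import datetime

# Hashtags taken from the Set Macros operator in the process above.
KEYWORDS = ["#OBOR", "#Onebeltoneroad", "#beltandroad", "#SilkRoad"]
START = datetime(2016, 1, 1)
END = datetime(2018, 3, 1)  # exclusive upper bound, so 02/28/2018 is included


def matches(tweet, user, keywords=KEYWORDS, start=START, end=END):
    """Return True if the tweet is from `user`, falls inside [start, end),
    and contains at least one keyword (case-insensitive)."""
    if tweet["from_user"].lower() != user.lower():
        return False
    if not (start <= tweet["created_at"] < end):
        return False
    text = tweet["text"].lower()
    return any(keyword.lower() in text for keyword in keywords)


# Made-up sample rows (hypothetical field names), only to exercise the filter.
sample = [
    {"from_user": "ChinaEUMission", "created_at": datetime(2017, 5, 14),
     "text": "Forum opens today #OBOR"},
    {"from_user": "ChinaEUMission", "created_at": datetime(2015, 6, 1),
     "text": "Old post about #SilkRoad"},
    {"from_user": "ChinaEUMission", "created_at": datetime(2017, 8, 2),
     "text": "Unrelated announcement"},
    {"from_user": "SomeoneElse", "created_at": datetime(2017, 5, 14),
     "text": "#beltandroad news"},
]

selected = [t for t in sample if matches(t, "ChinaEUMission")]
print(len(selected))
```

The three conditions are combined with AND, while the keywords are combined with OR, which is the logic the post asks for. Only the first sample row passes all three checks.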

     

  • sgenzersgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    Hi @zhao_huang - I just looked at your process and honestly, at a quick glance, it looks fine EXCEPT I am almost certain you're going to hit the API quota limit for Twitter. No question. See this page for Twitter REST API rate limits for free tier users...
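
To put the quota concern in rough numbers, here is a small back-of-the-envelope sketch. The six limits of 10000 come from the process above; `PAGE_SIZE` and `REQUESTS_PER_WINDOW` are assumed placeholder values, not figures from this thread, so check the Twitter rate-limit page for the ones that apply to your account.

```python
import math


def requests_needed(tweet_limit, page_size):
    """Number of API calls needed to page through `tweet_limit` tweets."""
    return math.ceil(tweet_limit / page_size)


def windows_needed(total_requests, requests_per_window):
    """Number of rate-limit windows those calls would span."""
    return math.ceil(total_requests / requests_per_window)


# Six Search Twitter operators, each with limit=10000, as in the process above.
SEARCH_LIMITS = [10000] * 6

# ASSUMPTIONS for illustration only; look up the real values.
PAGE_SIZE = 100            # tweets returned per request
REQUESTS_PER_WINDOW = 180  # requests allowed per rate-limit window

total_requests = sum(requests_needed(limit, PAGE_SIZE) for limit in SEARCH_LIMITS)
windows = windows_needed(total_requests, REQUESTS_PER_WINDOW)
print(total_requests, windows)
```

Under these assumed values the six searches would need more requests than fit in a single window, which is consistent with the process failing partway through.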

     

    Scott

     

  • Thomas_OttThomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    Well, I'm glad I saw this, @zhao_huang. If you want to get someone's attention in the forums, you should use the '@' symbol.

     

    That said, @sgenzer is probably right: you're probably rate blocked. If that's the case, you have to create a whole new Twitter connection.

  • zhao_huangzhao_huang Member Posts: 9 Contributor I

    Dear @Thomas_Ott and @sgenzer , thank you all for the help. 

    Yes, I found that Twitter limits how much data can be captured. I could collect all the data from the three diplomatic accounts that I'm focusing on, but I'm not able to collect the data that I need from the three media accounts: every time, RapidMiner tells me "error on connecting to API", or I receive only part of the tweets.

    Do you have any solution for that?

    I found another way to collect the data that I need; here is my code:

    <?xml version="1.0" encoding="UTF-8"?><process version="8.1.000">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.1.000" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="social_media:get_twitter_user_statuses" compatibility="8.1.000" expanded="true" height="68" name="Get Twitter User Statuses CGTN" width="90" x="45" y="85">
    <parameter key="connection" value="New"/>
    <parameter key="user" value="CGTNOfficial"/>
    <parameter key="limit" value="543000"/>
    </operator>
    <operator activated="true" class="social_media:get_twitter_user_statuses" compatibility="8.1.000" expanded="true" height="68" name="Get Twitter User Statuses Xinhua" width="90" x="45" y="187">
    <parameter key="connection" value="Twitter zh"/>
    <parameter key="user" value="xhnews"/>
    <parameter key="limit" value="660000"/>
    </operator>
    <operator activated="true" class="social_media:get_twitter_user_statuses" compatibility="8.1.000" expanded="true" height="68" name="Get Twitter User Statuses People's Daily" width="90" x="45" y="289">
    <parameter key="connection" value="New"/>
    <parameter key="user" value="PDChina"/>
    <parameter key="limit" value="556000"/>
    </operator>
    <operator activated="true" class="append" compatibility="8.1.000" expanded="true" height="124" name="Append" width="90" x="179" y="136"/>
    <operator activated="true" class="write_excel" compatibility="8.1.000" expanded="true" height="82" name="Write Excel" width="90" x="581" y="85">
    <parameter key="excel_file" value="/Users/Alexandre/OBOR_Mediacontents1 Timeline 2.xlsx"/>
    </operator>
    <connect from_op="Get Twitter User Statuses CGTN" from_port="output" to_op="Append" to_port="example set 1"/>
    <connect from_op="Get Twitter User Statuses Xinhua" from_port="output" to_op="Append" to_port="example set 2"/>
    <connect from_op="Get Twitter User Statuses People's Daily" from_port="output" to_op="Append" to_port="example set 3"/>
    <connect from_op="Append" from_port="merged set" to_op="Write Excel" to_port="input"/>
    <connect from_op="Write Excel" from_port="through" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>
  • Thomas_OttThomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    @zhao_huang you're trying to collect too many tweets OR the account doesn't have that many tweets available. It's throwing errors because of that.

     

    You will be limited as to how many tweets you can extract for free; that's just how Twitter does things. You have to pay for the entire tweet history.

     

    The workaround is to start collecting tweets on a daily basis and append them into one big data file over time.
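
A minimal sketch of that daily-append idea, written as a standalone script rather than a RapidMiner process. It assumes each day's tweets are available as a list of dictionaries and that every tweet carries a unique identifier in a column called `Id`; the function name, the column name and the CSV layout are all assumptions made for illustration.

```python
import csv
from pathlib import Path


def merge_daily_batch(archive_path, batch, id_field="Id"):
    """Append rows from `batch` whose id is not yet in the archive CSV.

    Returns the number of rows that were actually added.
    """
    archive_path = Path(archive_path)
    seen = set()
    fieldnames = None

    # Read the ids already stored so that re-collected tweets are skipped.
    if archive_path.exists():
        with archive_path.open(newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)
            fieldnames = reader.fieldnames
            seen = {row[id_field] for row in reader}

    new_rows = []
    for row in batch:
        key = str(row[id_field])
        if key not in seen:
            seen.add(key)
            new_rows.append(row)

    if not new_rows:
        return 0

    # Write a header only when the archive is new or empty.
    write_header = fieldnames is None
    if fieldnames is None:
        fieldnames = list(new_rows[0].keys())

    with archive_path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        if write_header:
            writer.writeheader()
        writer.writerows(new_rows)
    return len(new_rows)
```

Calling `merge_daily_batch("archive.csv", todays_rows)` once per day grows the file without duplicating tweets that show up in more than one day's collection, because rows whose `Id` is already in the archive are skipped.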

  • zhao_huangzhao_huang Member Posts: 9 Contributor I

    @Thomas_Ott Yes, Thomas, I have already realized that. These three accounts have lots of tweets, which is why I couldn't collect the data... Twitter wants to do some business...

     

     

  • zhao_huangzhao_huang Member Posts: 9 Contributor I

    Dear @Thomas_Ott

    I tried to use your XML to analyze my case, but it doesn't work. When I ran it as a test, potential problems were reported for "Determine Influence Factors" and "Sort", and in the ETL process I get "The retweet account is unknown". So I just want to know if you have another video with more details about these questions?

     

    <?xml version="1.0" encoding="UTF-8"?><process version="8.1.000">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.1.000" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="subprocess" compatibility="8.1.000" expanded="true" height="82" name="Collect data" width="90" x="45" y="34">
    <process expanded="true">
    <operator activated="true" class="social_media:get_twitter_user_statuses" compatibility="8.1.000" expanded="true" height="68" name="Get Twitter User Statuses China EU Mission" width="90" x="112" y="136">
    <parameter key="connection" value="Twitter zh"/>
    <parameter key="user" value="ChinaMissionGva"/>
    <parameter key="limit" value="10000"/>
    </operator>
    <operator activated="true" class="social_media:get_twitter_user_statuses" compatibility="8.1.000" expanded="true" height="68" name="Get Twitter User Statuses China Mission GVA" width="90" x="112" y="238">
    <parameter key="connection" value="Twitter zh"/>
    <parameter key="user" value="ChinaEUMission"/>
    <parameter key="limit" value="10000"/>
    </operator>
    <operator activated="true" class="social_media:get_twitter_user_statuses" compatibility="8.1.000" expanded="true" height="68" name="Get Twitter User Statuses China 2 UN" width="90" x="112" y="34">
    <parameter key="connection" value="Twitter zh"/>
    <parameter key="user" value="Chinamission2un"/>
    <parameter key="limit" value="10000"/>
    </operator>
    <operator activated="true" class="append" compatibility="8.1.000" expanded="true" height="124" name="Append" width="90" x="313" y="85"/>
    <operator activated="true" class="filter_examples" compatibility="8.1.000" expanded="true" height="103" name="Filter Examples" width="90" x="447" y="85">
    <list key="filters_list">
    <parameter key="filters_entry_key" value="Created-At.gt.01/01/2016 0:00:00 AM"/>
    <parameter key="filters_entry_key" value="Created-At.lt.03/01/2018 0:00:00 AM"/>
    </list>
    </operator>
    <operator activated="true" class="filter_examples" compatibility="8.1.000" expanded="true" height="103" name="Filter Examples KW" width="90" x="581" y="85">
    <list key="filters_list">
    <parameter key="filters_entry_key" value="Text.contains.OBOR"/>
    <parameter key="filters_entry_key" value="Text.contains.Onebeltoneroad"/>
    <parameter key="filters_entry_key" value="Text.contains.Beltandroad"/>
    <parameter key="filters_entry_key" value="Text.contains.Belt"/>
    <parameter key="filters_entry_key" value="Text.contains.Silkroad"/>
    <parameter key="filters_entry_key" value="Text.contains.Silk road"/>
    </list>
    <parameter key="filters_logic_and" value="false"/>
    </operator>
    <connect from_op="Get Twitter User Statuses China EU Mission" from_port="output" to_op="Append" to_port="example set 2"/>
    <connect from_op="Get Twitter User Statuses China Mission GVA" from_port="output" to_op="Append" to_port="example set 3"/>
    <connect from_op="Get Twitter User Statuses China 2 UN" from_port="output" to_op="Append" to_port="example set 1"/>
    <connect from_op="Append" from_port="merged set" to_op="Filter Examples" to_port="example set input"/>
    <connect from_op="Filter Examples" from_port="example set output" to_op="Filter Examples KW" to_port="example set input"/>
    <connect from_op="Filter Examples KW" from_port="example set output" to_port="out 1"/>
    <portSpacing port="source_in 1" spacing="0"/>
    <portSpacing port="sink_out 1" spacing="0"/>
    <portSpacing port="sink_out 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="subprocess" compatibility="8.1.000" expanded="true" height="82" name="ETL Subprocess" width="90" x="179" y="34">
    <process expanded="true">
    <operator activated="true" class="remove_duplicates" compatibility="8.1.000" expanded="true" height="103" name="Remove Duplicates" width="90" x="45" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="From-User"/>
    <description align="center" color="transparent" colored="false" width="126">Remove Duplicate Tweets from same user</description>
    </operator>
    <operator activated="true" class="generate_attributes" compatibility="8.1.000" expanded="true" height="82" name="Generate Arbitrary Label" width="90" x="179" y="34">
    <list key="function_descriptions">
    <parameter key="label" value="if([Retweet-Count]&lt;eval(%{retweetcount}),&quot;Not Important&quot;,&quot;Important&quot;)"/>
    </list>
    </operator>
    <operator activated="false" class="filter_examples" compatibility="8.1.000" expanded="true" height="103" name="Filter Examples (2)" width="90" x="313" y="34">
    <parameter key="invert_filter" value="true"/>
    <list key="filters_list">
    <parameter key="filters_entry_key" value="Text.contains.RT"/>
    </list>
    </operator>
    <operator activated="true" class="set_role" compatibility="8.1.000" expanded="true" height="82" name="Set Role" width="90" x="447" y="34">
    <parameter key="attribute_name" value="label"/>
    <parameter key="target_role" value="label"/>
    <list key="set_additional_roles"/>
    <description align="center" color="transparent" colored="false" width="126">Set Role for Label</description>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="8.1.000" expanded="true" height="82" name="Select Attributes" width="90" x="581" y="34">
    <parameter key="attribute_filter_type" value="subset"/>
    <parameter key="attributes" value="Text|label"/>
    <parameter key="include_special_attributes" value="true"/>
    </operator>
    <operator activated="true" class="nominal_to_text" compatibility="8.1.000" expanded="true" height="82" name="Nominal to Text" width="90" x="715" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="Text"/>
    </operator>
    <operator activated="true" class="extract_macro" compatibility="8.1.000" expanded="true" height="68" name="Extract Macro (3)" width="90" x="849" y="34">
    <parameter key="macro" value="label_count"/>
    <parameter key="macro_type" value="statistics"/>
    <parameter key="statistics" value="count"/>
    <parameter key="attribute_name" value="label"/>
    <parameter key="attribute_value" value="Important"/>
    <list key="additional_macros"/>
    </operator>
    <connect from_port="in 1" to_op="Remove Duplicates" to_port="example set input"/>
    <connect from_op="Remove Duplicates" from_port="example set output" to_op="Generate Arbitrary Label" to_port="example set input"/>
    <connect from_op="Generate Arbitrary Label" from_port="example set output" to_op="Set Role" to_port="example set input"/>
    <connect from_op="Set Role" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="example set output" to_op="Nominal to Text" to_port="example set input"/>
    <connect from_op="Nominal to Text" from_port="example set output" to_op="Extract Macro (3)" to_port="example set"/>
    <connect from_op="Extract Macro (3)" from_port="example set" to_port="out 1"/>
    <portSpacing port="source_in 1" spacing="0"/>
    <portSpacing port="source_in 2" spacing="0"/>
    <portSpacing port="sink_out 1" spacing="0"/>
    <portSpacing port="sink_out 2" spacing="0"/>
    </process>
    <description align="center" color="transparent" colored="false" width="126">Binning for Label subprocess - suspect</description>
    </operator>
    <operator activated="true" class="text:process_document_from_data" compatibility="8.1.000" expanded="true" height="82" name="Process Documents from Data" width="90" x="313" y="34">
    <parameter key="prune_method" value="percentual"/>
    <parameter key="prune_below_percent" value="5.0"/>
    <parameter key="prune_above_percent" value="50.0"/>
    <parameter key="prune_below_absolute" value="100"/>
    <parameter key="prune_above_absolute" value="500"/>
    <list key="specify_weights"/>
    <process expanded="true">
    <operator activated="true" class="text:extract_information" compatibility="8.1.000" expanded="true" height="68" name="Extract Links for later use" width="90" x="45" y="34">
    <parameter key="query_type" value="Regular Expression"/>
    <list key="string_machting_queries"/>
    <list key="regular_expression_queries">
    <parameter key="Tweet Links" value="http.*"/>
    </list>
    <list key="regular_region_queries"/>
    <list key="xpath_queries"/>
    <list key="namespaces"/>
    <list key="index_queries"/>
    <list key="jsonpath_queries"/>
    </operator>
    <operator activated="true" class="text:replace_tokens" compatibility="8.1.000" expanded="true" height="68" name="Replace http links" width="90" x="179" y="34">
    <list key="replace_dictionary">
    <parameter key="http.*" value="link"/>
    </list>
    </operator>
    <operator activated="true" class="text:tokenize" compatibility="8.1.000" expanded="true" height="68" name="Tokenize" width="90" x="313" y="34">
    <parameter key="mode" value="specify characters"/>
    <parameter key="characters" value=" .!;:[,' ?]"/>
    </operator>
    <operator activated="true" class="text:transform_cases" compatibility="8.1.000" expanded="true" height="68" name="Transform Cases" width="90" x="447" y="34"/>
    <operator activated="true" class="text:filter_by_length" compatibility="8.1.000" expanded="true" height="68" name="Filter Tokens (by Length)" width="90" x="581" y="34"/>
    <operator activated="true" class="text:generate_n_grams_terms" compatibility="8.1.000" expanded="true" height="68" name="Generate n-Grams (Terms)" width="90" x="715" y="34"/>
    <operator activated="true" class="text:filter_tokens_by_content" compatibility="8.1.000" expanded="true" height="68" name="Filter Tokens (by Content)" width="90" x="849" y="34">
    <parameter key="string" value="link"/>
    <parameter key="invert condition" value="true"/>
    </operator>
    <operator activated="true" class="text:filter_stopwords_english" compatibility="8.1.000" expanded="true" height="68" name="Filter Stopwords (English)" width="90" x="983" y="34"/>
    <connect from_port="document" to_op="Extract Links for later use" to_port="document"/>
    <connect from_op="Extract Links for later use" from_port="document" to_op="Replace http links" to_port="document"/>
    <connect from_op="Replace http links" from_port="document" to_op="Tokenize" to_port="document"/>
    <connect from_op="Tokenize" from_port="document" to_op="Transform Cases" to_port="document"/>
    <connect from_op="Transform Cases" from_port="document" to_op="Filter Tokens (by Length)" to_port="document"/>
    <connect from_op="Filter Tokens (by Length)" from_port="document" to_op="Generate n-Grams (Terms)" to_port="document"/>
    <connect from_op="Generate n-Grams (Terms)" from_port="document" to_op="Filter Tokens (by Content)" to_port="document"/>
    <connect from_op="Filter Tokens (by Content)" from_port="document" to_op="Filter Stopwords (English)" to_port="document"/>
    <connect from_op="Filter Stopwords (English)" from_port="document" to_port="document 1"/>
    <portSpacing port="source_document" spacing="0"/>
    <portSpacing port="sink_document 1" spacing="0"/>
    <portSpacing port="sink_document 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="multiply" compatibility="8.1.000" expanded="true" height="103" name="Multiply" width="90" x="447" y="34"/>
    <operator activated="true" class="subprocess" compatibility="8.1.000" expanded="true" height="103" name="Clustering Stuff" width="90" x="581" y="34">
    <process expanded="true">
    <operator activated="true" class="select_attributes" compatibility="8.1.000" expanded="true" height="82" name="Remove Tweet Links" width="90" x="45" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="Tweet Links"/>
    <parameter key="attributes" value="Tweet Links"/>
    <parameter key="invert_selection" value="true"/>
    </operator>
    <operator activated="true" class="x_means" compatibility="7.5.003" expanded="true" height="82" name="X-Means" width="90" x="179" y="34">
    <parameter key="measure_types" value="BregmanDivergences"/>
    <parameter key="divergence" value="SquaredEuclideanDistance"/>
    </operator>
    <operator activated="true" class="extract_prototypes" compatibility="8.1.000" expanded="true" height="82" name="Extract Cluster Prototypes" width="90" x="313" y="136"/>
    <operator activated="true" class="store" compatibility="8.1.000" expanded="true" height="68" name="Store Cluster Model" width="90" x="447" y="34">
    <parameter key="repository_entry" value="../results/%{keyword1} Twitter Content Cluster Model"/>
    </operator>
    <connect from_port="in 1" to_op="Remove Tweet Links" to_port="example set input"/>
    <connect from_op="Remove Tweet Links" from_port="example set output" to_op="X-Means" to_port="example set"/>
    <connect from_op="X-Means" from_port="cluster model" to_op="Extract Cluster Prototypes" to_port="model"/>
    <connect from_op="Extract Cluster Prototypes" from_port="example set" to_op="Store Cluster Model" to_port="input"/>
    <connect from_op="Extract Cluster Prototypes" from_port="model" to_port="out 2"/>
    <connect from_op="Store Cluster Model" from_port="through" to_port="out 1"/>
    <portSpacing port="source_in 1" spacing="0"/>
    <portSpacing port="source_in 2" spacing="0"/>
    <portSpacing port="sink_out 1" spacing="0"/>
    <portSpacing port="sink_out 2" spacing="0"/>
    <portSpacing port="sink_out 3" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="store" compatibility="8.1.000" expanded="true" height="68" name="Store WordList" width="90" x="447" y="289">
    <parameter key="repository_entry" value="../results/%{keyword1} Twitter Content Ideas Wordlist"/>
    </operator>
    <operator activated="true" class="text:wordlist_to_data" compatibility="8.1.000" expanded="true" height="82" name="WordList to Data" width="90" x="581" y="289"/>
    <operator activated="true" class="sort" compatibility="8.1.000" expanded="true" height="82" name="Sort" width="90" x="715" y="289">
    <parameter key="attribute_name" value="Text"/>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="8.1.000" expanded="true" height="82" name="Remove Tweet Links (2)" width="90" x="581" y="136">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="Tweet Links"/>
    <parameter key="attributes" value="Tweet Links"/>
    <parameter key="invert_selection" value="true"/>
    </operator>
    <operator activated="true" class="subprocess" compatibility="8.1.000" expanded="true" height="82" name="Determine Influence Factors" width="90" x="715" y="136">
    <process expanded="true">
    <operator activated="true" class="weight_by_correlation" compatibility="8.1.000" expanded="true" height="82" name="Weight by Correlation" width="90" x="45" y="34"/>
    <operator activated="true" class="weights_to_data" compatibility="8.1.000" expanded="true" height="68" name="Weights to Data" width="90" x="179" y="34"/>
    <operator activated="true" class="generate_attributes" compatibility="6.4.000" expanded="true" height="82" name="Generate Attributes (2)" width="90" x="313" y="34">
    <list key="function_descriptions">
    <parameter key="Correlation" value="&quot;RT&quot;"/>
    </list>
    </operator>
    <operator activated="true" class="weight_by_gini_index" compatibility="8.1.000" expanded="true" height="82" name="Weight by Gini Index" width="90" x="45" y="120"/>
    <operator activated="true" class="weight_by_information_gain" compatibility="8.1.000" expanded="true" height="82" name="Weight by Information Gain" width="90" x="45" y="210"/>
    <operator activated="true" class="weight_by_information_gain_ratio" compatibility="8.1.000" expanded="true" height="82" name="Weight by Information Gain Ratio" width="90" x="45" y="300"/>
    <operator activated="true" class="weights_to_data" compatibility="8.1.000" expanded="true" height="68" name="Weights to Data (2)" width="90" x="179" y="120"/>
    <operator activated="true" class="generate_attributes" compatibility="6.4.000" expanded="true" height="82" name="Generate Attributes (3)" width="90" x="313" y="120">
    <list key="function_descriptions">
    <parameter key="Gini Index" value="&quot;Xhnews&quot;"/>
    </list>
    </operator>
    <operator activated="true" class="weights_to_data" compatibility="8.1.000" expanded="true" height="68" name="Weights to Data (3)" width="90" x="179" y="210"/>
    <operator activated="true" class="generate_attributes" compatibility="6.4.000" expanded="true" height="82" name="Generate Attributes (4)" width="90" x="313" y="210">
    <list key="function_descriptions">
    <parameter key="Gain" value="&quot;CGTNOfficial&quot;"/>
    </list>
    </operator>
    <operator activated="true" class="weights_to_data" compatibility="8.1.000" expanded="true" height="68" name="Weights to Data (4)" width="90" x="179" y="300"/>
    <operator activated="true" class="generate_attributes" compatibility="6.4.000" expanded="true" height="82" name="Generate Attributes (5)" width="90" x="313" y="300">
    <list key="function_descriptions">
    <parameter key="Gani ratio" value="&quot;PDChina&quot;"/>
    </list>
    </operator>
    <operator activated="true" class="append" compatibility="8.1.000" expanded="true" height="68" name="Append (2)" width="90" x="447" y="30"/>
    <operator activated="true" class="pivot" compatibility="8.1.000" expanded="true" height="82" name="Pivot" width="90" x="581" y="30">
    <parameter key="group_attribute" value="Attribute"/>
    <parameter key="index_attribute" value="Text"/>
    </operator>
    <operator activated="true" class="generate_aggregation" compatibility="6.5.002" expanded="true" height="82" name="Generate Aggregation" width="90" x="715" y="30">
    <parameter key="attribute_name" value="Importance"/>
    <parameter key="attribute_filter_type" value="value_type"/>
    <parameter key="value_type" value="numeric"/>
    <parameter key="aggregation_function" value="average"/>
    </operator>
    <operator activated="true" class="normalize" compatibility="7.5.003" expanded="true" height="103" name="Normalize" width="90" x="849" y="30">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="Importance"/>
    <parameter key="method" value="range transformation"/>
    </operator>
    <operator activated="true" class="sort" compatibility="8.1.000" expanded="true" height="82" name="Sort again" width="90" x="983" y="34">
    <parameter key="attribute_name" value="Importance"/>
    <parameter key="sorting_direction" value="decreasing"/>
    </operator>
    <operator activated="true" class="order_attributes" compatibility="8.1.000" expanded="true" height="82" name="Reorder Attributes" width="90" x="1117" y="34">
    <parameter key="attribute_ordering" value="Attribute|Importance"/>
    <parameter key="handle_unmatched" value="remove"/>
    </operator>
    <connect from_port="in 1" to_op="Weight by Correlation" to_port="example set"/>
    <connect from_op="Weight by Correlation" from_port="weights" to_op="Weights to Data" to_port="attribute weights"/>
    <connect from_op="Weight by Correlation" from_port="example set" to_op="Weight by Gini Index" to_port="example set"/>
    <connect from_op="Weights to Data" from_port="example set" to_op="Generate Attributes (2)" to_port="example set input"/>
    <connect from_op="Generate Attributes (2)" from_port="example set output" to_op="Pivot" to_port="example set input"/>
    <connect from_op="Weight by Gini Index" from_port="weights" to_op="Weights to Data (2)" to_port="attribute weights"/>
    <connect from_op="Weight by Gini Index" from_port="example set" to_op="Weight by Information Gain" to_port="example set"/>
    <connect from_op="Weight by Information Gain" from_port="weights" to_op="Weights to Data (3)" to_port="attribute weights"/>
    <connect from_op="Weight by Information Gain" from_port="example set" to_op="Weight by Information Gain Ratio" to_port="example set"/>
    <connect from_op="Weight by Information Gain Ratio" from_port="weights" to_op="Weights to Data (4)" to_port="attribute weights"/>
    <connect from_op="Weights to Data (2)" from_port="example set" to_op="Generate Attributes (3)" to_port="example set input"/>
    <connect from_op="Weights to Data (3)" from_port="example set" to_op="Generate Attributes (4)" to_port="example set input"/>
    <connect from_op="Weights to Data (4)" from_port="example set" to_op="Generate Attributes (5)" to_port="example set input"/>
    <connect from_op="Pivot" from_port="example set output" to_op="Generate Aggregation" to_port="example set input"/>
    <connect from_op="Generate Aggregation" from_port="example set output" to_op="Normalize" to_port="example set input"/>
    <connect from_op="Normalize" from_port="example set output" to_op="Sort again" to_port="example set input"/>
    <connect from_op="Sort again" from_port="example set output" to_op="Reorder Attributes" to_port="example set input"/>
    <connect from_op="Reorder Attributes" from_port="example set output" to_port="out 1"/>
    <portSpacing port="source_in 1" spacing="0"/>
    <portSpacing port="source_in 2" spacing="0"/>
    <portSpacing port="sink_out 1" spacing="0"/>
    <portSpacing port="sink_out 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="store" compatibility="8.1.000" expanded="true" height="68" name="Store Influence Wrds" width="90" x="849" y="136">
    <parameter key="repository_entry" value="../results/%{keyword1} Twitter Content Influence Words"/>
    </operator>
    <operator activated="true" class="write_excel" compatibility="8.1.000" expanded="true" height="82" name="Write Important Words" width="90" x="983" y="136">
    <parameter key="excel_file" value="C:\Users\Thomas Ott\Dropbox\Twitter Influencers\%{keyword1} Todays Powerful Words to use in your Tweets.xlsx"/>
    </operator>
    <connect from_op="Collect data" from_port="out 1" to_op="ETL Subprocess" to_port="in 1"/>
    <connect from_op="ETL Subprocess" from_port="out 1" to_op="Process Documents from Data" to_port="example set"/>
    <connect from_op="Process Documents from Data" from_port="example set" to_op="Multiply" to_port="input"/>
    <connect from_op="Process Documents from Data" from_port="word list" to_op="Store WordList" to_port="input"/>
    <connect from_op="Multiply" from_port="output 1" to_op="Clustering Stuff" to_port="in 1"/>
    <connect from_op="Multiply" from_port="output 2" to_op="Remove Tweet Links (2)" to_port="example set input"/>
    <connect from_op="Clustering Stuff" from_port="out 1" to_port="result 1"/>
    <connect from_op="Clustering Stuff" from_port="out 2" to_port="result 2"/>
    <connect from_op="Store WordList" from_port="through" to_op="WordList to Data" to_port="word list"/>
    <connect from_op="WordList to Data" from_port="example set" to_op="Sort" to_port="example set input"/>
    <connect from_op="Sort" from_port="example set output" to_port="result 4"/>
    <connect from_op="Remove Tweet Links (2)" from_port="example set output" to_op="Determine Influence Factors" to_port="in 1"/>
    <connect from_op="Determine Influence Factors" from_port="out 1" to_op="Store Influence Wrds" to_port="input"/>
    <connect from_op="Store Influence Wrds" from_port="through" to_op="Write Important Words" to_port="input"/>
    <connect from_op="Write Important Words" from_port="through" to_port="result 3"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="63"/>
    <portSpacing port="sink_result 3" spacing="126"/>
    <portSpacing port="sink_result 4" spacing="84"/>
    <portSpacing port="sink_result 5" spacing="0"/>
    </process>
    </operator>
    </process>

    I'd also like to know how I can analyze which Twitter account has been most frequently retweeted by the account I observed, by filtering on keywords such as "RT" or "@account name".

     

    Thank you  ! 

  • Thomas_OttThomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    @zhao_huang That's because you can't just attach your subprocess to the process I created and expect it to work. You're going to have to 1) understand how the data flows through my process and 2) modify it to do what you want.
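
    On the retweet question above, here is a minimal sketch outside RapidMiner, assuming the statuses have been exported as plain text and that retweets start with the conventional "RT @account" prefix. The function name `count_retweeted_accounts` and the sample statuses are hypothetical, for illustration only; this is not part of the process posted in this thread.

```python
import re
from collections import Counter

# Assumption: a retweet begins with "RT @account", e.g. "RT @someuser: text".
RT_PATTERN = re.compile(r"^RT @(\w+)")


def count_retweeted_accounts(statuses):
    """Count how often each account appears as the source of a retweet."""
    counts = Counter()
    for text in statuses:
        match = RT_PATTERN.match(text.strip())
        if match:
            # Lowercase so "@Alice" and "@alice" are counted together.
            counts[match.group(1).lower()] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample data standing in for the exported statuses.
    sample = [
        "RT @Alice: new paper out #nlp",
        "Just a normal tweet mentioning @bob",
        "RT @alice: follow-up thread",
        "RT @carol: conference photos",
    ]
    for account, n in count_retweeted_accounts(sample).most_common():
        print(account, n)
```

    Plain mentions such as "@bob" in the middle of a status are deliberately not counted, because the pattern is anchored to the start of the text. Counting mentions as well would need a second, unanchored pattern.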

  • zhao_huangzhao_huang Member Posts: 9 Contributor I

    @Thomas_Ott I'm sorry for asking so many questions...

    But when I re-run your XML, it seems to have the same problems that I ran into in "Determine Influence Factors" and "Sort"...

     

  • Thomas_OttThomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    @zhao_huang That's not a problem; the metadata just didn't propagate all the way through. It should run fine.
