Due to recent updates, all users are required to create an Altair One account to login to the RapidMiner community. Click the Register button to create your account using the same email that you have previously used to login to the RapidMiner community. This will ensure that any previously created content will be synced to your Altair One account. Once you login, you will be asked to provide a username that identifies you to other Community users. Email us at Community with questions.

"Concatenate all text from Twitter feed"

robinrobin Member Posts: 100 Guru
edited June 2019 in Help

I am collecting all of the text from a Twitter feed using the Twitter operator. I am storing this data into a MySQL database. I extract this information from the MySQL table to process through a sentiment analysis engine, as part of the process I need to include the all of the examples into an API format before submitting. I use the Generate Attribute operator to create the API secret and key required, however when I join and then concatinate the date into a single file (Select Attribute used to drop the attributes I do not need) I only have the first tweet that was pulled from the DB included in the API submission file, the rest of the tweets are missing.

 

What am I doing incorrectly? I have tried turnig the data into documents and combing, I have tried creating collections and flattening them. I have really tried everything but I am just unable to insert the roughly 1,5k tweets that I have pulled down into the file format. 

 


curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/hal+json' --header 'X-API-SECRET-KEY:XXXXXXXxXXXxXXXXXX' --header 'X-API-XXXXXXXXXXXXX' -d '{
"name": "account_name",
"gender": 0,
"content": {
"content_handle": "account_handle",
"content_source": 1,2
"content_date": "10/04/2018 22:00:46 PM SAST",
"language_content": "need to insert the free text examples from twitter in here."
},
"person_handle": "key_account_manager"

 

In the code above I need to insert the Twitter Text into the "language_content" field, but can only ever insert a single line of Twitter data. 

 

@RitualGym Hey guys, the gym here in Illovo was supposed to be opening in April, is it still happening? I’m keen to get stared. 💪🏼 @RitualGym Hey guys, the gym here in Illovo was supposed to be opening in April, is it still happening? I’m keen to get stared. 💪🏼
@OneDayOnlycoza House of Chards 🥦 https://t.co/ljZQD93t3i @OneDayOnlycoza House of Chards 🥦 https://t.co/ljZQD93t3i
@ThatDarnKitteh @Nick_Frost If this isn’t your handle by the end of the day I’m unfollowing 😂 @ThatDarnKitteh @Nick_Frost If this isn’t your handle by the end of the day I’m unfollowing 😂
@Nick_Frost Have you seen what happens when people try combine their names for their kids? 😣😖🤢🤮 @Nick_Frost Have you seen what happens when people try combine their names for their kids? 😣😖🤢🤮
This is the best thing on the internet right now. 😂 https://t.co/TchhxFUGqT This is the best thing on the internet right now. 😂 https://t.co/TchhxFUGqT
@OneDayOnlycoza UK size 7. 😉 https://t.co/DAIaK0V5oF @OneDayOnlycoza UK size 7. 😉 https://t.co/DAIaK0V5oF

Above is the text that I need to insert into the file. This text comes from a MySQL db, so has /r/n characters seperating the fileds (which I believe is causing the issue)

 

At my wits end, please help me see the wood for the trees. 

Best Answer

  • Thomas_OttThomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn
    Solution Accepted

    @robin I'm running out the door ATM, but this almost gets you there. You'll need to loop over the combined document to add in the header stuff.

     

    <?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.1.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="retrieve" compatibility="8.1.001" expanded="true" height="68" name="Retrieve RobinJSON" width="90" x="112" y="34">
    <parameter key="repository_entry" value="//Community Answers/data/RobinJSON"/>
    </operator>
    <operator activated="true" class="extract_macro" compatibility="8.1.001" expanded="true" height="68" name="Extract Macro" width="90" x="246" y="34">
    <parameter key="macro" value="Num"/>
    <list key="additional_macros"/>
    </operator>
    <operator activated="true" class="concurrency:loop" compatibility="8.1.001" expanded="true" height="82" name="Loop" width="90" x="380" y="34">
    <parameter key="number_of_iterations" value="%{Num}"/>
    <parameter key="enable_parallel_execution" value="false"/>
    <process expanded="true">
    <operator activated="true" class="extract_macro" compatibility="8.1.001" expanded="true" height="68" name="Extract Macro (2)" width="90" x="112" y="34">
    <parameter key="macro" value="insert"/>
    <parameter key="macro_type" value="data_value"/>
    <parameter key="attribute_name" value="Test"/>
    <parameter key="example_index" value="%{iteration}"/>
    <list key="additional_macros"/>
    </operator>
    <operator activated="true" class="text:create_document" compatibility="8.1.000" expanded="true" height="68" name="Create Document" width="90" x="246" y="34">
    <parameter key="text" value="&quot;language_content&quot;: &quot;%{insert}&quot; &#10; }, &#10;"/>
    </operator>
    <operator activated="false" class="text:documents_to_data" compatibility="8.1.000" expanded="true" height="68" name="Documents to Data" width="90" x="313" y="187">
    <parameter key="text_attribute" value="yumyum"/>
    </operator>
    <connect from_port="input 1" to_op="Extract Macro (2)" to_port="example set"/>
    <connect from_op="Create Document" from_port="output" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="text:combine_documents" compatibility="8.1.000" expanded="true" height="82" name="Combine Documents" width="90" x="514" y="34"/>
    <connect from_op="Retrieve RobinJSON" from_port="output" to_op="Extract Macro" to_port="example set"/>
    <connect from_op="Extract Macro" from_port="example set" to_op="Loop" to_port="input 1"/>
    <connect from_op="Loop" from_port="output 1" to_op="Combine Documents" to_port="documents 1"/>
    <connect from_op="Combine Documents" from_port="document" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

Answers

  • Thomas_OttThomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    @robin do you have a Loop operator iterate over this?

  • robinrobin Member Posts: 100 Guru

    Yes I did. I copletely ignored the macro I set up at the begining of the process to perform the loop and started doing some crazy things in the end. 

     

    Note to self a macro can be used inside Generate Document.

     

    Thank you Thomas

Sign In or Register to comment.