
Get flickr Data

MBM Member Posts: 23 Contributor I
edited November 2018 in Help

Hey all, 

 

I am very new here and would appreciate some advice. What approach would you recommend for getting the metadata of flickr photos? I need geo data and other data provided by flickr so that I can analyse it with the common data-mining methods in RapidMiner. My problem is: I know how to use the data-mining methods, but I don't know exactly how to get the data. Of course there is a flickr API, but I really don't know where to start. Should I start by learning how to use a web crawler, or by learning how to use the flickr API? I have an example RapidMiner process that uses an API, but I don't understand exactly what it does, and that drives me crazy. I want to understand the process.

 

Any help out there?

 

Best wishes 

Marcel

Best Answers

  • Vaclav RapidMiner Certified Expert, Member Posts: 23 Maven
    Solution Accepted

    Hello,

    You can use the Get Page or Get Pages operator from the Web Mining extension. You then need the IDs of the images and an API key. With those, you can download the EXIF data in XML format using:

    https://api.flickr.com/services/rest/?method=flickr.photos.getExif&api_key=4471ecc1512fb9ab0f48aa1e1d0eb9ee&photo_id=28367629061&format=rest

     

    This example was taken from:

    https://www.flickr.com/services/api/explore/flickr.photos.getExif

     

    Best wishes,

    Vaclav
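For readers who want to see how such a request URL is put together, here is a minimal sketch in Python. The endpoint, method name and parameter names are taken from the URL above; `build_get_exif_url` and the `YOUR_API_KEY` placeholder are hypothetical names used only for illustration, and you would substitute your own key.

```python
from urllib.parse import urlencode

# REST endpoint as it appears in the example URL above.
FLICKR_REST = "https://api.flickr.com/services/rest/"

def build_get_exif_url(api_key: str, photo_id: str) -> str:
    """Assemble a flickr.photos.getExif request URL (hypothetical helper)."""
    params = {
        "method": "flickr.photos.getExif",
        "api_key": api_key,    # your own key goes here
        "photo_id": photo_id,  # ID of the photo to inspect
        "format": "rest",      # ask for an XML response
    }
    return FLICKR_REST + "?" + urlencode(params)

url = build_get_exif_url("YOUR_API_KEY", "28367629061")
print(url)
```

The resulting string is what would go into the URL parameter of Get Page, or into a column of URLs for Get Pages.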

  • Vaclav RapidMiner Certified Expert, Member Posts: 23 Maven
    Solution Accepted

    Hello Marcel,

    try this process:

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.2.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.2.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="text:create_document" compatibility="7.2.000" expanded="true" height="68" name="Create Document" width="90" x="45" y="34">
    <parameter key="text" value="&lt;?xml version=&quot;1.0&quot; encoding=&quot;utf-8&quot; ?&gt;&#10;&lt;rsp stat=&quot;ok&quot;&gt;&#10;&lt;comments photo_id=&quot;19043683190&quot;&gt;&#10;&lt;comment id=&quot;7309457-19043683190-72157655160157292&quot; author=&quot;8866365@N08&quot; realname=&quot;Sabien&quot;&gt;Mooi beeld!&lt;/comment&gt;&#10;&lt;comment id=&quot;7309457-19043683190-72157655239336425&quot; author=&quot;128586472@N07&quot; realname=&quot;&quot;&gt;Jolies lignes, belle réalisation !&lt;/comment&gt;&#10;&lt;comment id=&quot;7309457-19043683190-72157656031039508&quot; author=&quot;34303829@N08&quot; realname=&quot;&quot;&gt;mooi gedaan Wouter&lt;/comment&gt;&#10;&lt;/comments&gt;&#10;&lt;/rsp&gt;"/>
    </operator>
    <operator activated="true" class="text:cut_document" compatibility="7.2.000" expanded="true" height="68" name="Cut Document" width="90" x="179" y="34">
    <parameter key="query_type" value="Regular Region"/>
    <list key="string_machting_queries"/>
    <list key="regular_expression_queries">
    <parameter key="line" value="^(.*comment.*)"/>
    </list>
    <list key="regular_region_queries">
    <parameter key="text" value="&lt;comment .comment&gt;"/>
    </list>
    <list key="xpath_queries"/>
    <list key="namespaces"/>
    <list key="index_queries"/>
    <list key="jsonpath_queries"/>
    <process expanded="true">
    <operator activated="false" class="text:filter_tokens_by_content" compatibility="7.2.000" expanded="true" height="68" name="Filter Tokens (by Content)" width="90" x="179" y="34">
    <parameter key="string" value="comment id"/>
    <parameter key="regular_expression" value="comment id.*"/>
    </operator>
    <connect from_port="segment" to_port="document 1"/>
    <portSpacing port="source_segment" spacing="0"/>
    <portSpacing port="sink_document 1" spacing="0"/>
    <portSpacing port="sink_document 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="loop_collection" compatibility="7.2.001" expanded="true" height="82" name="Loop Collection" width="90" x="313" y="34">
    <process expanded="true">
    <operator activated="true" class="text:extract_information" compatibility="7.2.000" expanded="true" height="68" name="Extract Information" width="90" x="45" y="34">
    <parameter key="query_type" value="Regular Expression"/>
    <list key="string_machting_queries"/>
    <list key="regular_expression_queries">
    <parameter key="id" value="id=[&quot;']([^'&quot;]+)"/>
    <parameter key="author" value="author=[&quot;']([^'&quot;]+)"/>
    <parameter key="realname" value="realname=[&quot;']([^'&quot;]+)"/>
    <parameter key="comment" value="&gt;(.+?)&lt;/comment&gt;"/>
    </list>
    <list key="regular_region_queries"/>
    <list key="xpath_queries">
    <parameter key="id" value="/*/comment/comments/@id"/>
    </list>
    <list key="namespaces"/>
    <list key="index_queries"/>
    <list key="jsonpath_queries"/>
    </operator>
    <operator activated="true" class="text:documents_to_data" compatibility="7.2.000" expanded="true" height="82" name="Documents to Data (2)" width="90" x="179" y="34">
    <parameter key="text_attribute" value="text"/>
    </operator>
    <connect from_port="single" to_op="Extract Information" to_port="document"/>
    <connect from_op="Extract Information" from_port="document" to_op="Documents to Data (2)" to_port="documents 1"/>
    <connect from_op="Documents to Data (2)" from_port="example set" to_port="output 1"/>
    <portSpacing port="source_single" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="append" compatibility="7.2.001" expanded="true" height="82" name="Append" width="90" x="447" y="34"/>
    <connect from_op="Create Document" from_port="output" to_op="Cut Document" to_port="document"/>
    <connect from_op="Cut Document" from_port="documents" to_op="Loop Collection" to_port="collection"/>
    <connect from_op="Loop Collection" from_port="output 1" to_op="Append" to_port="example set 1"/>
    <connect from_op="Append" from_port="merged set" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

     

    Best wishes,

    Vaclav 
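To make the logic of the process easier to follow, here is a rough Python sketch of the same idea: cut the response into one segment per comment, then pull out the fields with the four regular expressions from Extract Information. This is only an illustration under the assumption that the response looks like the sample text in Create Document (shortened here to two comments); `cut_comments` and `extract_row` are hypothetical names, not RapidMiner functions.

```python
import re

SAMPLE = """<?xml version="1.0" encoding="utf-8" ?>
<rsp stat="ok">
<comments photo_id="19043683190">
<comment id="7309457-19043683190-72157655160157292" author="8866365@N08" realname="Sabien">Mooi beeld!</comment>
<comment id="7309457-19043683190-72157655239336425" author="128586472@N07" realname="">Jolies lignes, belle réalisation !</comment>
</comments>
</rsp>"""

# Same patterns as the regular_expression_queries in the process.
PATTERNS = {
    "id": r"id=[\"']([^'\"]+)",
    "author": r"author=[\"']([^'\"]+)",
    "realname": r"realname=[\"']([^'\"]+)",
    "comment": r">(.+?)</comment>",
}

def cut_comments(text):
    """Rough stand-in for Cut Document: one segment per <comment ...> element."""
    return re.findall(r"<comment .*?</comment>", text, flags=re.S)

def extract_row(segment):
    """Rough stand-in for Extract Information: first capture group, or None."""
    row = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, segment)
        row[name] = match.group(1) if match else None
    return row

rows = [extract_row(seg) for seg in cut_comments(SAMPLE)]
for row in rows:
    print(row)
```

Note that an empty `realname=""` does not match its pattern, so that field comes back as missing rather than as an empty string.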

Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,517 RM Data Scientist

    This page should help:

    https://www.flickr.com/services/api/

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • MBM Member Posts: 23 Contributor I

    cool, thanks a lot! 

     

    So I can use the methods listed there, cool! But how do I build a process in RapidMiner that uses them? It must be possible to call the flickr methods from RapidMiner?

     

    flickr.photos.getExif seems useful, I guess...

    Retrieves a list of EXIF/TIFF/GPS tags for a given photo. The calling user must have permission to view the photo.
  • MBM Member Posts: 23 Contributor I

    ok, this makes sense, thank you! I will try my very best!

  • MBM Member Posts: 23 Contributor I

    So, me again. I am one step further and now have access to some of the data on flickr. Next I want the XML data in a table. I know I will need XPath, but in which operator do I apply it? In Read XML, in Cut Document, or somewhere else? I am a little confused.

    My process and thinking for getting a table so far:

    Get Page → JSON to XML → Write Document → Read XML → Data to Documents → Process Documents (Cut Document (Remove Document Parts → Extract Information))

     

    Am I on the right track?

     

     

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,517 RM Data Scientist

    Most likely Read XML

     

    Best,

    Martin

  • MBM Member Posts: 23 Contributor I

    Since the xml is like the following

    <?xml version="1.0" encoding="utf-8" ?>
    <rsp stat="ok">
    <comments photo_id="19043683190">
    <comment id="7309457-19043683190-72157655160157292" author="8866365@N08" realname="Sabien">Mooi beeld!</comment>
    <comment id="7309457-19043683190-72157655239336425" author="128586472@N07" realname="">Jolies lignes, belle réalisation !</comment>
    <comment id="7309457-19043683190-72157656031039508" author="34303829@N08"  realname="">mooi gedaan Wouter</comment>
    </comments>
    </rsp>

    in "Read XML" I am now able to get e.g. "realname" and the text, e.g. "Mooi beeld!". But so far it is not very handy, because I have to select every single tag in the "Import Configuration Wizard". Is there a function in RM that automatically selects the relevant tags one after another? Or is it even possible with XPath alone?

     

    Regards

    Marcel
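As an illustration of what a single path expression can do with this response, here is a minimal Python sketch using the standard library's ElementTree, which supports a limited XPath subset. It is not a RapidMiner operator, only a way to see the idea: one path selects every `comment` element, and all attributes are collected without naming them one by one. `comments_to_rows` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

XML = """<?xml version="1.0" encoding="utf-8" ?>
<rsp stat="ok">
<comments photo_id="19043683190">
<comment id="7309457-19043683190-72157655160157292" author="8866365@N08" realname="Sabien">Mooi beeld!</comment>
<comment id="7309457-19043683190-72157655239336425" author="128586472@N07" realname="">Jolies lignes, belle réalisation !</comment>
<comment id="7309457-19043683190-72157656031039508" author="34303829@N08"  realname="">mooi gedaan Wouter</comment>
</comments>
</rsp>"""

def comments_to_rows(xml_text):
    """Turn every <comment> element into one table row (a dict)."""
    # Parse from bytes so the encoding declaration is honoured.
    root = ET.fromstring(xml_text.encode("utf-8"))
    rows = []
    # One path selects all comment elements below <rsp>/<comments>.
    for comment in root.findall("./comments/comment"):
        row = dict(comment.attrib)  # id, author, realname, whatever is present
        row["text"] = comment.text  # the comment body
        rows.append(row)
    return rows

table = comments_to_rows(XML)
for row in table:
    print(row)
```

Each dict corresponds to one row of the table, with the attribute names as column names.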

  • MBM Member Posts: 23 Contributor I

    wow, thank you. That makes the data collection so much easier. I can use this for any similar cases!

    You made my day!

     

    best wishes

    marcel

  • MBM Member Posts: 23 Contributor I

    @Vaclav thank you for everything. 

     

    It almost works. I manually obtained the flickr-specific frob, auth_token and api_sig to build the right URL for "Get Pages", but RapidMiner only returns this:

     

    <?xml version="1.0" encoding="utf-8" ?>
    <rsp stat="fail">
    <err code="95" msg="SSL is required" />
    </rsp>

    How can I make SSL requests in RapidMiner?

     

    Best wishes

     

    marcel
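One possible cause, assuming the URL passed to "Get Pages" starts with `http://`, is simply the scheme: the example URL earlier in this thread uses `https://`. The Python sketch below shows the two pieces involved, rewriting a URL to https and reading the error out of a failed response. `force_https` and `flickr_error` are hypothetical helper names, and the `flickr.test.echo` URL is only a placeholder.

```python
from urllib.parse import urlsplit, urlunsplit
import xml.etree.ElementTree as ET

def force_https(url):
    """Return the same URL with its scheme replaced by https."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(scheme="https"))

def flickr_error(xml_text):
    """Return (code, msg) for a failed response, or None if stat is "ok"."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    if root.get("stat") == "ok":
        return None
    err = root.find("err")
    if err is None:
        return (None, None)
    return (int(err.get("code")), err.get("msg"))

FAILED = """<?xml version="1.0" encoding="utf-8" ?>
<rsp stat="fail">
<err code="95" msg="SSL is required" />
</rsp>"""

fixed = force_https("http://api.flickr.com/services/rest/?method=flickr.test.echo")
print(fixed)
print(flickr_error(FAILED))
```

If the URLs in the process already start with `https://`, the cause lies elsewhere and this sketch does not apply.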

     

     
