Get flickr Data

MBM · July 2016

Hey all,

I am very new here and would appreciate some advice. What approach would you recommend for getting the metadata of flickr photos? I need geo data and other data provided by flickr so that I can analyse it with the common data-mining methods in RapidMiner. My problem is: I know how to use the data-mining methods, but I don't know exactly how to get the data. Of course there is a flickr API, but I really don't know where to start. Should I start by studying how to use a web crawler, or how to use the flickr API? I have an example RapidMiner process that uses an API, but I don't even understand exactly what it does, and that drives me crazy. I want to understand the process.

Any help out there?

Best wishes

Marcel

Vaclav · July 2016

Hello,

You can use the Get Page or Get Pages operator from the Web Mining extension. You then need the IDs of the images and an API key. With those, you can download the EXIF data in XML format using:

https://api.flickr.com/services/rest/?method=flickr.photos.getExif&api_key=4471ecc1512fb9ab0f48aa1e1d0eb9ee&photo_id=28367629061&format=rest

This example was taken from:

https://www.flickr.com/services/api/explore/flickr.photos.getExif
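For anyone who wants to see how such a request URL is put together, here is a minimal Python sketch. It only builds the URL string and does not contact flickr. `build_flickr_url` is a hypothetical helper name, and `YOUR_API_KEY` is a placeholder for your own key.

```python
from urllib.parse import urlencode

# Endpoint taken from the example URL in this post.
FLICKR_REST_ENDPOINT = "https://api.flickr.com/services/rest/"


def build_flickr_url(method, api_key, **params):
    """Build a flickr REST URL (hypothetical helper, not part of any library)."""
    query = {"method": method, "api_key": api_key, "format": "rest"}
    query.update(params)  # e.g. photo_id for flickr.photos.getExif
    return FLICKR_REST_ENDPOINT + "?" + urlencode(query)


# YOUR_API_KEY is a placeholder; the photo ID is the one from the example URL.
url = build_flickr_url("flickr.photos.getExif", "YOUR_API_KEY", photo_id="28367629061")
print(url)
```

The resulting string is what you would paste into the URL parameter of Get Page.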

Best wishes,

Vaclav

Vaclav · August 2016

Hello Marcel,

Try this process:

<?xml version="1.0" encoding="UTF-8"?><process version="7.2.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.2.001" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="text:create_document" compatibility="7.2.000" expanded="true" height="68" name="Create Document" width="90" x="45" y="34">
        <parameter key="text" value="&lt;?xml version=&quot;1.0&quot; encoding=&quot;utf-8&quot; ?&gt;&#10;&lt;rsp stat=&quot;ok&quot;&gt;&#10;&lt;comments photo_id=&quot;19043683190&quot;&gt;&#10;&lt;comment id=&quot;7309457-19043683190-72157655160157292&quot; author=&quot;8866365@N08&quot; realname=&quot;Sabien&quot;&gt;Mooi beeld!&lt;/comment&gt;&#10;&lt;comment id=&quot;7309457-19043683190-72157655239336425&quot; author=&quot;128586472@N07&quot; realname=&quot;&quot;&gt;Jolies lignes, belle réalisation !&lt;/comment&gt;&#10;&lt;comment id=&quot;7309457-19043683190-72157656031039508&quot; author=&quot;34303829@N08&quot;  realname=&quot;&quot;&gt;mooi gedaan Wouter&lt;/comment&gt;&#10;&lt;/comments&gt;&#10;&lt;/rsp&gt;"/>
      </operator>
      <operator activated="true" class="text:cut_document" compatibility="7.2.000" expanded="true" height="68" name="Cut Document" width="90" x="179" y="34">
        <parameter key="query_type" value="Regular Region"/>
        <list key="string_machting_queries"/>
        <list key="regular_expression_queries">
          <parameter key="line" value="^(.*comment.*)"/>
        </list>
        <list key="regular_region_queries">
          <parameter key="text" value="&lt;comment .comment&gt;"/>
        </list>
        <list key="xpath_queries"/>
        <list key="namespaces"/>
        <list key="index_queries"/>
        <list key="jsonpath_queries"/>
        <process expanded="true">
          <operator activated="false" class="text:filter_tokens_by_content" compatibility="7.2.000" expanded="true" height="68" name="Filter Tokens (by Content)" width="90" x="179" y="34">
            <parameter key="string" value="comment id"/>
            <parameter key="regular_expression" value="comment id.*"/>
          </operator>
          <connect from_port="segment" to_port="document 1"/>
          <portSpacing port="source_segment" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="loop_collection" compatibility="7.2.001" expanded="true" height="82" name="Loop Collection" width="90" x="313" y="34">
        <process expanded="true">
          <operator activated="true" class="text:extract_information" compatibility="7.2.000" expanded="true" height="68" name="Extract Information" width="90" x="45" y="34">
            <parameter key="query_type" value="Regular Expression"/>
            <list key="string_machting_queries"/>
            <list key="regular_expression_queries">
              <parameter key="id" value="id=[&quot;']([^'&quot;]+)"/>
              <parameter key="author" value="author=[&quot;']([^'&quot;]+)"/>
              <parameter key="realname" value="realname=[&quot;']([^'&quot;]+)"/>
              <parameter key="comment" value="&gt;(.+?)&lt;/comment&gt;"/>
            </list>
            <list key="regular_region_queries"/>
            <list key="xpath_queries">
              <parameter key="id" value="/*/comment/comments/@id"/>
            </list>
            <list key="namespaces"/>
            <list key="index_queries"/>
            <list key="jsonpath_queries"/>
          </operator>
          <operator activated="true" class="text:documents_to_data" compatibility="7.2.000" expanded="true" height="82" name="Documents to Data (2)" width="90" x="179" y="34">
            <parameter key="text_attribute" value="text"/>
          </operator>
          <connect from_port="single" to_op="Extract Information" to_port="document"/>
          <connect from_op="Extract Information" from_port="document" to_op="Documents to Data (2)" to_port="documents 1"/>
          <connect from_op="Documents to Data (2)" from_port="example set" to_port="output 1"/>
          <portSpacing port="source_single" spacing="0"/>
          <portSpacing port="sink_output 1" spacing="0"/>
          <portSpacing port="sink_output 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="append" compatibility="7.2.001" expanded="true" height="82" name="Append" width="90" x="447" y="34"/>
      <connect from_op="Create Document" from_port="output" to_op="Cut Document" to_port="document"/>
      <connect from_op="Cut Document" from_port="documents" to_op="Loop Collection" to_port="collection"/>
      <connect from_op="Loop Collection" from_port="output 1" to_op="Append" to_port="example set 1"/>
      <connect from_op="Append" from_port="merged set" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
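If it helps to understand what the process does, the same idea can be sketched in Python: cut the response into one segment per comment, then apply the four regular expressions from Extract Information to each segment. This is only an illustration of the logic, not how RapidMiner implements it, and the function names are made up for this sketch.

```python
import re

SAMPLE = '''<?xml version="1.0" encoding="utf-8" ?>
<rsp stat="ok">
<comments photo_id="19043683190">
<comment id="7309457-19043683190-72157655160157292" author="8866365@N08" realname="Sabien">Mooi beeld!</comment>
<comment id="7309457-19043683190-72157655239336425" author="128586472@N07" realname="">Jolies lignes, belle réalisation !</comment>
<comment id="7309457-19043683190-72157656031039508" author="34303829@N08"  realname="">mooi gedaan Wouter</comment>
</comments>
</rsp>'''

# The same patterns as in the Extract Information operator.
PATTERNS = {
    "id": r'''id=["']([^'"]+)''',
    "author": r'''author=["']([^'"]+)''',
    "realname": r'''realname=["']([^'"]+)''',
    "comment": r">(.+?)</comment>",
}


def cut_comments(xml_text):
    """Like Cut Document: one segment from '<comment ' to the next 'comment>'."""
    return re.findall(r"<comment .*?comment>", xml_text, flags=re.S)


def extract_row(segment):
    """Like Extract Information: first capture group per pattern, None if no match."""
    row = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, segment)
        row[name] = match.group(1) if match else None
    return row


# Like Loop Collection + Append: one row per comment, collected into a table.
rows = [extract_row(segment) for segment in cut_comments(SAMPLE)]
for row in rows:
    print(row)
```

Note that an empty realname="" does not match the pattern, so that field comes out empty (a missing value) for two of the three comments.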

Best wishes,

Vaclav

MartinLiebig · July 2016

That page should help:

https://www.flickr.com/services/api/

MBM · July 2016

cool, thanks a lot!

So I can use the methods there, cool! But how do I create an application in RapidMiner that uses these methods? Surely it must be possible to call the flickr methods from there?

flickr.photos.getExif seems useful, I guess...

Retrieves a list of EXIF/TIFF/GPS tags for a given photo. The calling user must have permission to view the photo.

MBM · July 2016

ok, this makes sense, thank you! I will try my very best!

MBM · August 2016

So, me again. I am one step further and now have access to some of the data on flickr. Now I want the XML data in a table. I know I will need XPath, but in which operator do I apply it? In Read XML, in Cut Document, or somewhere else? I am a little confused.

My process for getting a table so far is:

Get Page -> JSON to XML -> Write Document -> Read XML -> Data to Documents -> Process Documents (Cut Document (Remove Document Parts -> Extract Information))

Am I on the right track?

MartinLiebig · August 2016

Most likely Read XML

Best,

Martin

MBM · August 2016

The XML looks like this:

<?xml version="1.0" encoding="utf-8" ?>
<rsp stat="ok">
<comments photo_id="19043683190">
<comment id="7309457-19043683190-72157655160157292" author="8866365@N08" realname="Sabien">Mooi beeld!</comment>
<comment id="7309457-19043683190-72157655239336425" author="128586472@N07" realname="">Jolies lignes, belle réalisation !</comment>
<comment id="7309457-19043683190-72157656031039508" author="34303829@N08"  realname="">mooi gedaan Wouter</comment>
</comments>
</rsp>

In "Read XML" I am now able to get e.g. "realname" and the text, e.g. "Mooi beeld". But so far it is not very handy, because I have to select every single tag in the "Import Configuration Wizard". Is there a function in RM that automatically selects the relevant tags one after another? Or is it even possible with XPath alone?

Regards

Marcel
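As a rough illustration of the XPath idea outside RapidMiner, here is a minimal Python sketch using the standard library's ElementTree. A single path expression selects every comment element, and all attributes are picked up automatically instead of one by one. This is only a sketch of the principle, not a description of how Read XML works internally.

```python
import xml.etree.ElementTree as ET

SAMPLE = '''<?xml version="1.0" encoding="utf-8" ?>
<rsp stat="ok">
<comments photo_id="19043683190">
<comment id="7309457-19043683190-72157655160157292" author="8866365@N08" realname="Sabien">Mooi beeld!</comment>
<comment id="7309457-19043683190-72157655239336425" author="128586472@N07" realname="">Jolies lignes, belle réalisation !</comment>
<comment id="7309457-19043683190-72157656031039508" author="34303829@N08"  realname="">mooi gedaan Wouter</comment>
</comments>
</rsp>'''

# Parse from bytes so the encoding declaration in the XML is honoured.
root = ET.fromstring(SAMPLE.encode("utf-8"))

# One path expression selects every <comment>; each element becomes one row
# containing all of its attributes plus its text.
rows = [dict(comment.attrib, text=comment.text)
        for comment in root.findall("./comments/comment")]

for row in rows:
    print(row)
```

Each row has the keys id, author, realname and text, so the list maps directly onto a table with one line per comment.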

MBM · August 2016

wow, thank you. That makes the data collection so much easier. I can use this for any similar cases!

You made my day!

best wishes

marcel

MBM · August 2016

@Vaclav thank you for everything.

It almost works. I manually obtained the flickr-specific frob, auth_token and api_sig to build the right URL for "Get Pages", but in RapidMiner all I get back is this:

<?xml version="1.0" encoding="utf-8" ?>
<rsp stat="fail">
	<err code="95" msg="SSL is required" />
</rsp>

How can I make SSL queries in RapidMiner?

Best wishes

marcel
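The URL that was used is not shown, so this is only a guess: error 95 would be consistent with the request going to an http:// address rather than https://. Under that assumption, the following Python sketch shows how the failure response can be detected and how the scheme of a URL can be switched to https. Both function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit, urlunsplit

FAIL_RESPONSE = '''<?xml version="1.0" encoding="utf-8" ?>
<rsp stat="fail">
	<err code="95" msg="SSL is required" />
</rsp>'''


def flickr_error(xml_text):
    """Return (code, msg) if the response reports a failure, otherwise None."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    if root.get("stat") == "fail":
        err = root.find("err")
        return int(err.get("code")), err.get("msg")
    return None


def force_https(url):
    """Rewrite an http:// URL to https:// and leave everything else unchanged."""
    parts = urlsplit(url)
    if parts.scheme == "http":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


print(flickr_error(FAIL_RESPONSE))
print(force_https("http://api.flickr.com/services/rest/?method=flickr.photos.getExif"))
```

In RapidMiner terms, the equivalent would be to make sure the URL given to Get Pages starts with https://.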
