Get flickr Data
Hey all,
I am very new here and would appreciate some advice. What approach would you recommend for getting the metadata of flickr photos? I need things like geo data and the other data flickr provides, so that I can analyse it with the common data mining methods in RapidMiner. My problem is: I know how to use the data mining methods, but I don't know exactly how to get the data. Of course there is a flickr API, but I really don't know where to start. Should I start by learning how to use a web crawler, or by learning how to use the flickr API? I have an example RapidMiner process that uses an API, but I don't even understand exactly what it does, and that drives me crazy. I want to understand the process.
Any help out there?
Best wishes
Marcel
Best Answers
Vaclav RapidMiner Certified Expert, Member Posts: 23 Maven
Hello,
you can use the Get Page or Get Pages operator from the Web Mining extension. You then need the IDs of the images and an API key. Once you have those, you can download the EXIF data in XML format using:
https://api.flickr.com/services/rest/?method=flickr.photos.getExif&api_key=4471ecc1512fb9ab0f48aa1e1d0eb9ee&photo_id=28367629061&format=rest
This example was taken from:
https://www.flickr.com/services/api/explore/flickr.photos.getExif
Best wishes,
Vaclav
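For readers who want to see how such a request URL is put together, here is a minimal sketch in Python rather than RapidMiner. It only builds the URL from the parameters visible in the example above and sends no request. The function name `build_get_exif_url` and the placeholder key `YOUR_API_KEY` are assumptions made for illustration.

```python
from urllib.parse import urlencode

# Endpoint taken from the example URL in the post above.
FLICKR_REST_ENDPOINT = "https://api.flickr.com/services/rest/"


def build_get_exif_url(api_key: str, photo_id: str) -> str:
    """Build a flickr.photos.getExif request URL (hypothetical helper)."""
    params = {
        "method": "flickr.photos.getExif",
        "api_key": api_key,    # your own key, not the one from the example
        "photo_id": photo_id,  # ID of the photo whose EXIF data you want
        "format": "rest",      # ask for the XML (REST) response format
    }
    return FLICKR_REST_ENDPOINT + "?" + urlencode(params)


url = build_get_exif_url("YOUR_API_KEY", "28367629061")
print(url)
```

The resulting string is the kind of URL that would go into the Get Page operator.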
Vaclav RapidMiner Certified Expert, Member Posts: 23 Maven
Hello Marcel,
try this process:
<?xml version="1.0" encoding="UTF-8"?><process version="7.2.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.2.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="text:create_document" compatibility="7.2.000" expanded="true" height="68" name="Create Document" width="90" x="45" y="34">
<parameter key="text" value="&lt;?xml version=&quot;1.0&quot; encoding=&quot;utf-8&quot; ?&gt; &lt;rsp stat=&quot;ok&quot;&gt; &lt;comments photo_id=&quot;19043683190&quot;&gt; &lt;comment id=&quot;7309457-19043683190-72157655160157292&quot; author=&quot;8866365@N08&quot; realname=&quot;Sabien&quot;&gt;Mooi beeld!&lt;/comment&gt; &lt;comment id=&quot;7309457-19043683190-72157655239336425&quot; author=&quot;128586472@N07&quot; realname=&quot;&quot;&gt;Jolies lignes, belle réalisation !&lt;/comment&gt; &lt;comment id=&quot;7309457-19043683190-72157656031039508&quot; author=&quot;34303829@N08&quot; realname=&quot;&quot;&gt;mooi gedaan Wouter&lt;/comment&gt; &lt;/comments&gt; &lt;/rsp&gt;"/>
</operator>
<operator activated="true" class="text:cut_document" compatibility="7.2.000" expanded="true" height="68" name="Cut Document" width="90" x="179" y="34">
<parameter key="query_type" value="Regular Region"/>
<list key="string_machting_queries"/>
<list key="regular_expression_queries">
<parameter key="line" value="^(.*comment.*)"/>
</list>
<list key="regular_region_queries">
<parameter key="text" value="&lt;comment .comment&gt;"/>
</list>
<list key="xpath_queries"/>
<list key="namespaces"/>
<list key="index_queries"/>
<list key="jsonpath_queries"/>
<process expanded="true">
<operator activated="false" class="text:filter_tokens_by_content" compatibility="7.2.000" expanded="true" height="68" name="Filter Tokens (by Content)" width="90" x="179" y="34">
<parameter key="string" value="comment id"/>
<parameter key="regular_expression" value="commnet id.*"/>
</operator>
<connect from_port="segment" to_port="document 1"/>
<portSpacing port="source_segment" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="loop_collection" compatibility="7.2.001" expanded="true" height="82" name="Loop Collection" width="90" x="313" y="34">
<process expanded="true">
<operator activated="true" class="text:extract_information" compatibility="7.2.000" expanded="true" height="68" name="Extract Information" width="90" x="45" y="34">
<parameter key="query_type" value="Regular Expression"/>
<list key="string_machting_queries"/>
<list key="regular_expression_queries">
<parameter key="id" value="id=[&quot;']([^'&quot;]+)"/>
<parameter key="author" value="author=[&quot;']([^'&quot;]+)"/>
<parameter key="realname" value="realname=[&quot;']([^'&quot;]+)"/>
<parameter key="comment" value="&gt;(.+?)&lt;/comment&gt;"/>
</list>
<list key="regular_region_queries"/>
<list key="xpath_queries">
<parameter key="id" value="/*/comment/comments/@id"/>
</list>
<list key="namespaces"/>
<list key="index_queries"/>
<list key="jsonpath_queries"/>
</operator>
<operator activated="true" class="text:documents_to_data" compatibility="7.2.000" expanded="true" height="82" name="Documents to Data (2)" width="90" x="179" y="34">
<parameter key="text_attribute" value="text"/>
</operator>
<connect from_port="single" to_op="Extract Information" to_port="document"/>
<connect from_op="Extract Information" from_port="document" to_op="Documents to Data (2)" to_port="documents 1"/>
<connect from_op="Documents to Data (2)" from_port="example set" to_port="output 1"/>
<portSpacing port="source_single" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="append" compatibility="7.2.001" expanded="true" height="82" name="Append" width="90" x="447" y="34"/>
<connect from_op="Create Document" from_port="output" to_op="Cut Document" to_port="document"/>
<connect from_op="Cut Document" from_port="documents" to_op="Loop Collection" to_port="collection"/>
<connect from_op="Loop Collection" from_port="output 1" to_op="Append" to_port="example set 1"/>
<connect from_op="Append" from_port="merged set" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Best wishes,
Vaclav
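To make it easier to follow what the process above does, here is a rough Python sketch of the same idea; it is not how RapidMiner implements it. It cuts the response into one segment per comment, as Cut Document does with its regular region, and then applies the four regular expressions from Extract Information to each segment. The names `SAMPLE_RESPONSE`, `FIELD_PATTERNS` and `extract_comments` are made up for this illustration.

```python
import re

# Same sample response as in the Create Document operator.
SAMPLE_RESPONSE = (
    '<?xml version="1.0" encoding="utf-8" ?> <rsp stat="ok"> '
    '<comments photo_id="19043683190"> '
    '<comment id="7309457-19043683190-72157655160157292" '
    'author="8866365@N08" realname="Sabien">Mooi beeld!</comment> '
    '<comment id="7309457-19043683190-72157655239336425" '
    'author="128586472@N07" realname="">Jolies lignes, belle réalisation !</comment> '
    '<comment id="7309457-19043683190-72157656031039508" '
    'author="34303829@N08" realname="">mooi gedaan Wouter</comment> '
    '</comments> </rsp>'
)

# The four regular expressions from the Extract Information operator.
FIELD_PATTERNS = {
    "id": r"""id=["']([^'"]+)""",
    "author": r"""author=["']([^'"]+)""",
    "realname": r"""realname=["']([^'"]+)""",
    "comment": r""">(.+?)</comment>""",
}


def extract_comments(xml_text):
    """Return one dict per <comment> element (hypothetical helper)."""
    # Step 1: cut the text into regions from "<comment " to "comment>".
    segments = re.findall(r"<comment .*?comment>", xml_text, flags=re.DOTALL)
    rows = []
    for segment in segments:
        row = {}
        # Step 2: apply each pattern; a field without a match stays None,
        # which corresponds to a missing value in the resulting table.
        for name, pattern in FIELD_PATTERNS.items():
            match = re.search(pattern, segment, flags=re.DOTALL)
            row[name] = match.group(1) if match else None
        rows.append(row)
    return rows


rows = extract_comments(SAMPLE_RESPONSE)
for row in rows:
    print(row)
```

Note that an empty `realname=""` does not match its pattern, because the expression requires at least one character between the quotes, so that field comes out as missing.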
Answers
That page should help:
https://www.flickr.com/services/api/
Dortmund, Germany
cool, thanks a lot!
So I can use the methods listed there, cool! But how do I create an application in RapidMiner that uses these methods? It must be possible to call the flickr methods from there?
flickr.photos.getExif seems useful, I guess...
ok, this makes sense, thank you! I will try my very best!
So, me again. I am one step further and have access to some of the data on flickr. Now I want the XML data in a table. I know I will need XPath, but in which operator do I apply it? In Read XML, in Cut Document, or somewhere else? I am a little confused.
My process and thinking for getting a table so far is:
Get Page -> JSON to XML -> Write Document -> Read XML -> Data to Documents -> Process Documents (Cut Document (Remove Document Parts -> Extract Information))
Am I on the right track?
Most likely Read XML
Best,
Martin
Dortmund, Germany
Since the xml is like the following
in "Read XML" I am now able to get e.g. "realname" and the text, e.g. "Mooi beeld". But so far it is not very handy, because I have to select every single tag in the "Import Configuration Wizard". Is there a function in RM that automatically selects the relevant tags one after another? Or is it even possible with XPath alone?
Regards
Marcel
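The idea behind the XPath question, selecting every comment with one path instead of picking tags one by one, can be illustrated outside RapidMiner with a small Python sketch. It assumes the response has the rsp/comments/comment layout of the flickr comment sample quoted elsewhere in this thread. The names `SAMPLE_RESPONSE` and `comments_to_rows` are hypothetical. Whether RapidMiner's operators accept the same path expression unchanged is something to check in your version.

```python
import xml.etree.ElementTree as ET

# Shortened sample with the assumed rsp/comments/comment layout.
SAMPLE_RESPONSE = (
    '<?xml version="1.0" encoding="utf-8" ?>'
    '<rsp stat="ok"><comments photo_id="19043683190">'
    '<comment id="7309457-19043683190-72157655160157292" '
    'author="8866365@N08" realname="Sabien">Mooi beeld!</comment>'
    '<comment id="7309457-19043683190-72157656031039508" '
    'author="34303829@N08" realname="">mooi gedaan Wouter</comment>'
    '</comments></rsp>'
)


def comments_to_rows(xml_text):
    """Turn every <comment> element into one table row (hypothetical helper)."""
    # Parse from bytes so the encoding declaration in the XML is honoured.
    root = ET.fromstring(xml_text.encode("utf-8"))
    rows = []
    # One path expression selects all comment elements at once;
    # ElementTree supports only a limited subset of XPath.
    for comment in root.findall("./comments/comment"):
        rows.append({
            "id": comment.get("id"),
            "author": comment.get("author"),
            "realname": comment.get("realname"),
            "text": comment.text,
        })
    return rows


table = comments_to_rows(SAMPLE_RESPONSE)
for row in table:
    print(row)
```

Each attribute becomes a column and each comment a row, so no tag has to be selected by hand.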
wow, thank you. That makes the data collection so much easier. I can use this for any similar cases!
You made my day!
best wishes
marcel
@Vaclav thank you for everything.
It almost works. I manually obtained the flickr-specific frob, auth_token and api_sig to build the right URL for "Get Pages", but in RapidMiner there is just this:
How can I make SSL queries in RapidMiner?
Best wishes
marcel