RapidMiner

Newbie alsaqer002

How to convert image data to structured data

Hello all,

I am working on an image and text mining project, and I want to know how to convert image data to structured data.
I have already downloaded the Image Processing extension, and I found some useful information on this website:


The power of machine learning for image mining and analytics

http://www.simafore.com/blog/the-power-of-machine-learning-for-image-mining-and-analytics?success=tr...


and


New case study: Image mining and unstructured data science

http://www.simafore.com/blog/new-case-study-image-mining-and-unstructured-data-science?success=true


Can anyone please help me figure out what is inside the Loop Files operator? I need to know how they converted the image data to structured data. I have spent more than 4 months working on my final project, but I can't finish it because I'm stuck at that point.

Thanks for any help,
4 REPLIES
Community Manager

Re: How to convert image data to structured data

Hi, I have not used that image processing extension in a while, and I don't think it's compatible with RM 7+ (it no longer appears in the Marketplace). However, I would strongly recommend trying the IBM Watson Bluemix APIs from within RapidMiner, using the "Enrich Data by Webservice" operator to make your GET/POST requests. Watson has a "Visual Recognition" API that is probably very good. I will warn you, however, that the Watson documentation is not! I have "Tone Analyzer" and "Language Translation" working in my RM, and it is really quite amazing.

Good luck.

Scott
Scott Genzer
Senior Community Manager
RapidMiner, Inc.
RM Certified Expert

Re: How to convert image data to structured data

@sgenzer it is compatible with RM 7. 
If you want to play with it you can get it here: http://www.burgsys.com/

I'll try out the Watson API. For much of my work, sending data to cloud services isn't an option, but perhaps it solves the original poster's problem.
-- Training, Consulting, Sales in China, Hong Kong & Taiwan --
www.RapidMinerChina.com
Newbie alsaqer002

Re: How to convert image data to structured data


Thank you @sgenzer and @JEdward.

Yes, the Image Processing extension no longer appears in the Marketplace, but it is compatible with RM 7.

Thanks a lot, @sgenzer, for your suggestions. I am interested in trying them, but I think they work best with data on the web, while I need to work with images from my computer.

Thanks, @JEdward, for this useful website.
Actually, I found the B-Designer extension, which includes all the features I need. However, I can't get it until they send it to me, so I contacted them and am still waiting for their response.

I am still looking for a way to do OCR on images to extract the text.

Thank you again,
Community Manager

Re: How to convert image data to structured data

I have not had much need for OCR, but again I would suggest using the RapidMiner "Enrich Data by Webservice" operator (part of the Web Mining extension) to call an external API. There are very good services out there; a quick search found that Google has a free OCR API: https://cloud.google.com/vision/
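For images stored on a local computer, an OCR web service typically expects the file contents inside the request body rather than a public URL. The following is only a rough Python sketch of how such a body could be built outside RapidMiner. The endpoint and JSON layout follow the Cloud Vision REST `images:annotate` method as I understand it, so please check the current Google documentation before relying on it. `YOUR_API_KEY` and the function name `build_ocr_request` are placeholders made up for this illustration, and nothing is actually sent over the network.

```python
import base64
import json

# Assumed Cloud Vision REST endpoint; YOUR_API_KEY is a placeholder, not a real key.
VISION_URL = "https://vision.googleapis.com/v1/images:annotate?key=YOUR_API_KEY"


def build_ocr_request(image_bytes: bytes) -> str:
    """Return a JSON body asking for text detection on a single image.

    The image is embedded as base64 text, which is how a local file
    (as opposed to an image already on the web) can be submitted.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    payload = {
        "requests": [
            {
                "image": {"content": encoded},
                "features": [{"type": "TEXT_DETECTION"}],
            }
        ]
    }
    return json.dumps(payload)


if __name__ == "__main__":
    # Stand-in bytes for the demo; for a real image you would read the file
    # in binary mode, e.g. open("scan.png", "rb").read() (hypothetical file name).
    body = build_ocr_request(b"not a real image")
    print("POST", VISION_URL)
    print(body)
```

The resulting string is what would go into the body of a POST request; the text found by the service comes back in the JSON response and can then be parsed into attributes.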

Here is an example of an Enrich Data by Webservice operator that connects to the Google Maps API. I have deleted my API key, which you would need to replace with your own to see this working, but you should get the idea.


<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="7.1.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.1.000" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="web:enrich_data_by_webservice" compatibility="7.0.000" expanded="true" height="68" name="Google Maps Distance Lookup" width="90" x="313" y="34">
        <parameter key="query_type" value="XPath"/>
        <list key="string_machting_queries"/>
        <list key="regular_expression_queries"/>
        <list key="regular_region_queries"/>
        <list key="xpath_queries">
          <parameter key="Distance" value="//distance/text/text()"/>
        </list>
        <list key="namespaces"/>
        <parameter key="assume_html" value="false"/>
        <list key="index_queries"/>
        <list key="jsonpath_queries"/>
        <parameter key="service_method" value="fgfgfgf"/>
        <parameter key="body" value="text=&lt;%title%&gt;"/>
        <parameter key="url" value="https://maps.googleapis.com/maps/api/distancematrix/xml?units=imperial&amp;origins=&lt;%address_for_google1%&gt;&amp;destinations=&lt;%address_for_google2%&gt;"/>
        <parameter key="delay" value="150"/>
        <list key="request_properties">
          <parameter key="key" value="mykey"/>
        </list>
      </operator>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
    </process>
  </operator>
</process>
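To make the idea behind the process above a little more concrete, here is a rough Python sketch of the two things the operator does for each example: it fills the `<%attribute%>` placeholders in the URL with that row's values, and it applies the XPath query to the XML that comes back. This is not how RapidMiner implements it. The helper names `fill_template` and `extract_distance` are made up, the URL-encoding step is my own assumption, and `SAMPLE_RESPONSE` is a hand-written stand-in for a real reply so that no API key or network call is needed.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote_plus

# URL from the process above, with the same <%attribute%> placeholders.
URL_TEMPLATE = (
    "https://maps.googleapis.com/maps/api/distancematrix/xml"
    "?units=imperial&origins=<%address_for_google1%>"
    "&destinations=<%address_for_google2%>"
)

# Hand-written stand-in for a service reply (illustrative values only).
SAMPLE_RESPONSE = """<DistanceMatrixResponse>
  <status>OK</status>
  <row>
    <element>
      <status>OK</status>
      <distance>
        <value>12345</value>
        <text>7.7 mi</text>
      </distance>
    </element>
  </row>
</DistanceMatrixResponse>"""


def fill_template(template: str, row: dict) -> str:
    """Replace each <%name%> placeholder with the URL-encoded value from row."""
    url = template
    for name, value in row.items():
        url = url.replace("<%" + name + "%>", quote_plus(str(value)))
    return url


def extract_distance(xml_text: str):
    """Mimic the XPath query //distance/text/text() on a reply document."""
    root = ET.fromstring(xml_text)
    node = root.find(".//distance/text")
    return node.text if node is not None else None


if __name__ == "__main__":
    row = {
        "address_for_google1": "New York, NY",
        "address_for_google2": "Boston, MA",
    }
    print(fill_template(URL_TEMPLATE, row))
    print(extract_distance(SAMPLE_RESPONSE))
```

Each input row produces one request, and the extracted value becomes a new attribute (here "Distance") on that row. The same pattern would apply to an image or OCR service: one call per image, with the returned fields turned into columns.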


Scott
Scott Genzer
Senior Community Manager
RapidMiner, Inc.