How to convert image data to structured data

Hello all,

I am working on an image and text mining project, and I want to know how to convert image data to structured data.
I have already downloaded the Image Processing extension, and I found some useful information on this website:

The power of machine learning for image mining and analytics

New case study: Image mining and unstructured data science

Can anyone help me figure out what is inside the Loop Files operator? I need to know how they converted the image data to structured data. I have spent more than 4 months on my final project, but I can't finish it because I'm stuck on this point.
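The thread never shows what is inside the case study's Loop Files process, so the following is only a minimal sketch of one common approach, not necessarily what that process does. Each image becomes one row of numeric features (size, mean brightness, spread, and a coarse intensity histogram), and a loop over a folder writes those rows to a CSV table. The function names (`image_features`, `load_grayscale`, `folder_to_csv`) are hypothetical, and reading image files assumes the Pillow library is installed.

```python
import csv
import math
from pathlib import Path
from typing import Dict, List, Sequence

Pixels = Sequence[Sequence[int]]  # 2-D grid of grayscale values in 0..255


def image_features(pixels: Pixels, bins: int = 4) -> Dict[str, float]:
    """Turn one grayscale image into a flat row of numeric features."""
    values = [v for row in pixels for v in row]
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    features = {
        "height": float(len(pixels)),
        "width": float(len(pixels[0])),
        "mean": mean,
        "std": std,
    }
    # Normalized intensity histogram: fraction of pixels falling in each bin.
    counts = [0] * bins
    for v in values:
        counts[min(v * bins // 256, bins - 1)] += 1
    for i, c in enumerate(counts):
        features[f"hist_{i}"] = c / n
    return features


def load_grayscale(path: Path) -> List[List[int]]:
    """Read an image file as rows of grayscale values (assumes Pillow is installed)."""
    from PIL import Image  # imported lazily so image_features works without Pillow

    with Image.open(path) as img:
        gray = img.convert("L")
        w, h = gray.size
        data = list(gray.getdata())
    return [data[r * w:(r + 1) * w] for r in range(h)]


def folder_to_csv(folder: Path, out_csv: Path) -> None:
    """Loop over the image files in a folder and write one feature row per image."""
    rows = []
    for path in sorted(folder.glob("*.png")) + sorted(folder.glob("*.jpg")):
        row = {"file": path.name}
        row.update(image_features(load_grayscale(path)))
        rows.append(row)
    if not rows:
        return  # nothing to write
    with open(out_csv, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
```

For example, `image_features([[0, 255], [255, 0]])` gives a mean and standard deviation of 127.5 and puts half the pixels in `hist_0` and half in `hist_3`. Richer features (edges, textures, or outputs of a pretrained network) follow the same pattern: one image in, one fixed-length row out, which is what lets ordinary learners work on the result.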

Thanks for any help,

Re: How to convert image data to structured data

Hi... I have not used that Image Processing extension in a while, and I don't think it is compatible with RM 7+ (it no longer appears in the Marketplace). However, I would strongly recommend trying the IBM Watson Bluemix APIs from within RapidMiner, using the "Enrich Data by Webservice" operator to make your GET/POST requests. Watson has a "Visual Recognition" API that is probably very good. I will warn you, however, that the Watson documentation is not! I have "Tone Analyzer" and "Language Translation" working in my RM, and they are really quite amazing.

Good luck.

Scott Genzer
Certified RapidMiner Analyst
Genzer Consulting

Re: How to convert image data to structured data

@sgenzer it is compatible with RM 7. If you want to play with it, you can get it here: http://www.burgsys.com/

I'll try out the Watson API. For much of my work, sending data to cloud services isn't an option, but perhaps it solves the original poster's problem.
-- Training, Consulting, Sales in China, Hong Kong & Taiwan --

Re: How to convert image data to structured data

Thank you @sgenzer and @JEdward.

Yes, the Image Processing extension no longer appears in the Marketplace, but it is compatible with RM 7.

Thanks a lot @sgenzer for your suggestions. I am interested in trying them, but I think they work best with data on the web, while I need to work with images stored on my computer.

Thanks @JEdward for this useful website.
Actually, I found the B-Designer extension, which includes all the features I need, but I can't get it until they send it to me. I have contacted them and am still waiting for their response.

I am still looking for a way to run OCR on images to extract the text.

Thank you again,

Re: How to convert image data to structured data

I have not had much need for OCR, but again I would suggest using the RapidMiner "Enrich Data by Webservice" operator (part of the Web Mining extension) to call an external API. There are very good services out there; a quick search shows that Google offers OCR through its Cloud Vision API, which has a free usage tier: https://cloud.google.com/vision/
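Because the images are on a local computer, they can be sent to Cloud Vision inline as base64 content rather than by URL, so the service is not limited to data already on the web. A minimal Python sketch of that request, using only the standard library and assuming you have an API key with the Vision API enabled, might look like this. It calls the v1 `images:annotate` REST endpoint with the `TEXT_DETECTION` feature; the helper names (`build_ocr_request`, `extract_text`, `ocr_file`) are hypothetical.

```python
import base64
import json
import urllib.request
from typing import Any, Dict

VISION_URL = "https://vision.googleapis.com/v1/images:annotate?key={key}"


def build_ocr_request(image_bytes: bytes) -> Dict[str, Any]:
    """Build a TEXT_DETECTION request body for a local image, sent inline as base64."""
    return {
        "requests": [
            {
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "TEXT_DETECTION"}],
            }
        ]
    }


def extract_text(response: Dict[str, Any]) -> str:
    """Return the full detected text of the first image, or '' if nothing was found."""
    first = (response.get("responses") or [{}])[0]
    annotations = first.get("textAnnotations", [])
    # The first annotation holds the whole text block; later ones are single words.
    return annotations[0].get("description", "") if annotations else ""


def ocr_file(path: str, api_key: str) -> str:
    """POST one local image to the Vision API and return the recognized text."""
    with open(path, "rb") as f:
        body = json.dumps(build_ocr_request(f.read())).encode("utf-8")
    req = urllib.request.Request(
        VISION_URL.format(key=api_key),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return extract_text(json.load(resp))
```

`ocr_file("scan.png", "YOUR_API_KEY")` would return the recognized text as a single string; the file name and key here are placeholders.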

Here is an example of an Enrich Data by Webservice operator that connects to the Google Maps API. I have removed my API key; you would need to replace it with your own to see this working, but you should get the idea.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="7.1.000">
  <operator activated="true" class="process" compatibility="7.1.000" expanded="true" name="Process">
    <process expanded="true">
      <!-- Calls the Google Maps Distance Matrix API once per example. Each <%attribute%> in the URL
           is replaced by that example's value; the XPath query stores the result as "Distance". -->
      <operator activated="true" class="web:enrich_data_by_webservice" compatibility="7.0.000" expanded="true" height="68" name="Google Maps Distance Lookup" width="90" x="313" y="34">
        <parameter key="query_type" value="XPath"/>
        <list key="string_machting_queries"/>
        <list key="regular_expression_queries"/>
        <list key="regular_region_queries"/>
        <list key="xpath_queries">
          <parameter key="Distance" value="//distance/text/text()"/>
        </list>
        <list key="namespaces"/>
        <parameter key="assume_html" value="false"/>
        <list key="index_queries"/>
        <list key="jsonpath_queries"/>
        <!-- HTTP method (GET or POST); the Distance Matrix XML endpoint is queried with GET -->
        <parameter key="service_method" value="GET"/>
        <parameter key="body" value="text=&lt;%title%&gt;"/>
        <parameter key="url" value="https://maps.googleapis.com/maps/api/distancematrix/xml?units=imperial&amp;origins=&lt;%address_for_google1%&gt;&amp;destinations=&lt;%address_for_google2%&gt;"/>
        <parameter key="delay" value="150"/>
        <list key="request_properties">
          <!-- replace "mykey" with your own API key -->
          <parameter key="key" value="mykey"/>
        </list>
      </operator>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
    </process>
  </operator>
</process>
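To make the operator's behaviour easier to follow, here is a rough Python equivalent of what the process asks it to do for each example: substitute the `<%address_for_google1%>` and `<%address_for_google2%>` placeholders with that row's URL-encoded values, fetch the XML response, and keep the text selected by `//distance/text/text()` in a new `Distance` attribute. This is a sketch of the idea, not RapidMiner's implementation. The helper names are hypothetical, it passes the API key as a `key` URL parameter (how Google's Maps web services normally expect it), and the 150 ms pause assumes the operator's `delay` value is in milliseconds.

```python
import re
import time
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

# Same URL as in the process, with &amp; written as a plain &.
DISTANCE_URL = (
    "https://maps.googleapis.com/maps/api/distancematrix/xml?units=imperial"
    "&origins=<%address_for_google1%>&destinations=<%address_for_google2%>"
)
PLACEHOLDER = re.compile(r"<%([^%]+)%>")


def fill_template(template: str, example: Dict[str, str]) -> str:
    """Replace each <%attribute%> with the URL-encoded value from one example row."""
    return PLACEHOLDER.sub(lambda m: urllib.parse.quote_plus(example[m.group(1)]), template)


def extract_distance(xml_text: str) -> Optional[str]:
    """Rough equivalent of //distance/text/text(): the first match, or None."""
    node = ET.fromstring(xml_text).find(".//distance/text")
    return node.text if node is not None else None


def enrich(rows: List[Dict[str, str]], api_key: str, delay_ms: int = 150) -> None:
    """Add a 'Distance' value to every row, calling the web service once per row."""
    for row in rows:
        # Assumption: the key goes in the query string, as Google Maps web services expect.
        url = fill_template(DISTANCE_URL, row) + "&key=" + urllib.parse.quote_plus(api_key)
        with urllib.request.urlopen(url) as resp:
            row["Distance"] = extract_distance(resp.read().decode("utf-8")) or ""
        time.sleep(delay_ms / 1000)  # pause between requests, like the operator's delay
```

For instance, a row whose `address_for_google1` is `New York, NY` is sent as `origins=New+York%2C+NY`.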

Scott Genzer
Certified RapidMiner Analyst
Genzer Consulting