How to convert image data to structured data

alsaqer002 Member Posts: 5 Contributor II
edited November 2018 in Help
Hello all,

I am working on an image and text mining project, and I want to know how to convert image data to structured data.
I have already downloaded the Image Processing extension, and I found some useful information on this website:


The power of machine learning for image mining and analytics

http://www.simafore.com/blog/the-power-of-machine-learning-for-image-mining-and-analytics?success=true


and


New case study: Image mining and unstructured data science

http://www.simafore.com/blog/new-case-study-image-mining-and-unstructured-data-science?success=true


Can anyone please help me figure out what is inside the Loop Files operator? I need to know how they converted the image data to structured data. I have spent more than 4 months working on my final project, but I couldn't finish it because I'm stuck on that point.

Thanks for any help,

Answers

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    Hi, I have not used that image processing extension in a while, and I don't think it's compatible with RM 7+ (it no longer appears in the marketplace). However, I would strongly recommend trying the IBM Watson Bluemix APIs from within RapidMiner, using the "Enrich Data by Webservice" operator to make your GET/POST requests. Watson has a "Visual Recognition" API that is probably very good. I will warn you, however, that the Watson documentation is not! I have "Tone Analyzer" and "Language Translation" working in my RM, and it is really quite amazing.

    Good luck.

    Scott
  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn
    @sgenzer it is compatible with RM 7. 
    If you want to play with it you can get it here: http://www.burgsys.com/

    I'll try out the Watson API. For much of my work, sending data to cloud services isn't an option, but perhaps it solves the original poster's problem.
  • alsaqer002 Member Posts: 5 Contributor II

    Thank you @sgenzer and @JEdward.

    Yes, the Image Processing extension no longer appears in the marketplace, but it is compatible with RM 7.

    Thanks a lot @sgenzer for your suggestions. I am interested in trying them, but I think they work best with data on the web, whereas I need to work with images from my computer.

    Thanks @JEdward for this useful website.
    Actually, I found the B-Designer extension, which includes all the features that I need. However, I can't get it until they send it to me, so I contacted them and I am still waiting for their response.

    I am still looking for a way to do OCR on images to extract the text.

    Thank you again,
  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    I have not had much need for OCR, but again I would suggest using the RapidMiner "Enrich Data by Webservice" operator (under the Web Mining extension) to call an external API. There are very good services out there; a quick search found that Google has a free OCR API: https://cloud.google.com/vision/
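
    Images stored on your own computer can still go through a cloud OCR service, because the image bytes can be sent inside the request body. Here is a minimal Python sketch of that idea, assuming the Cloud Vision REST endpoint `images:annotate` with a `TEXT_DETECTION` feature. The helper names, the API key, and the file path are hypothetical, and nothing here is RapidMiner-specific:

```python
import base64
import json
import urllib.request

# Assumed endpoint of the Cloud Vision REST API; check the current docs before use.
VISION_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_ocr_request(image_bytes: bytes) -> dict:
    """Wrap raw image bytes in a TEXT_DETECTION request body."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "requests": [
            {
                "image": {"content": encoded},
                "features": [{"type": "TEXT_DETECTION"}],
            }
        ]
    }


def extract_text(response: dict) -> str:
    """Return the detected text, or an empty string if none was found."""
    results = response.get("responses", [])
    if not results:
        return ""
    annotations = results[0].get("textAnnotations", [])
    # The first annotation is assumed to hold the whole text block.
    return annotations[0].get("description", "") if annotations else ""


def ocr_local_image(path: str, api_key: str) -> str:
    """Read a local image, POST it to the service, and return the text."""
    with open(path, "rb") as handle:
        body = json.dumps(build_ocr_request(handle.read())).encode("utf-8")
    request = urllib.request.Request(
        f"{VISION_URL}?key={api_key}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as reply:
        return extract_text(json.load(reply))
```

    Calling `ocr_local_image` once per file in a folder would give one text value per image, which could then be loaded into RapidMiner as an ordinary table.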

    Here is an example of an Enrich Data by Webservice operator that connects to the Google Maps API. I have deleted my API key, which you would need to replace with your own to see this working, but you should get the idea.

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="7.1.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="7.1.000" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="web:enrich_data_by_webservice" compatibility="7.0.000" expanded="true" height="68" name="Google Maps Distance Lookup" width="90" x="313" y="34">
            <parameter key="query_type" value="XPath"/>
            <list key="string_machting_queries"/>
            <list key="regular_expression_queries"/>
            <list key="regular_region_queries"/>
            <list key="xpath_queries">
              <parameter key="Distance" value="//distance/text/text()"/>
            </list>
            <list key="namespaces"/>
            <parameter key="assume_html" value="false"/>
            <list key="index_queries"/>
            <list key="jsonpath_queries"/>
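            <!-- service_method and url below appear to be placeholders (the API key was removed before posting). -->
            <!-- Set service_method to the HTTP method the API expects and url to the service endpoint. -->
            <parameter key="service_method" value="fgfgfgf"/>
            <parameter key="body" value="text=&lt;%title%&gt;"/>
            <parameter key="url" value=";"/>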
            <parameter key="delay" value="150"/>
            <list key="request_properties">
              <parameter key="key" value="mykey"/>
            </list>
          </operator>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
        </process>
      </operator>
    </process>
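
    To see what the `xpath_queries` entry in the process above pulls out, here is a small Python sketch that applies an equivalent query to a hand-written sample shaped like a Distance Matrix XML reply. The sample values are invented, and the exact response layout is an assumption:

```python
import xml.etree.ElementTree as ET

# Hand-written sample; element names are assumed, values are invented.
SAMPLE_RESPONSE = """
<DistanceMatrixResponse>
  <status>OK</status>
  <row>
    <element>
      <status>OK</status>
      <duration><value>600</value><text>10 mins</text></duration>
      <distance><value>5000</value><text>5.0 km</text></distance>
    </element>
  </row>
</DistanceMatrixResponse>
"""


def extract_distances(xml_text: str) -> list:
    """Collect the text of every <text> element nested under <distance>."""
    root = ET.fromstring(xml_text.strip())
    # ElementTree's ".//distance/text" plays the role of "//distance/text/text()".
    return [node.text for node in root.findall(".//distance/text")]


print(extract_distances(SAMPLE_RESPONSE))
```

    In the operator, each match would presumably end up as the value of the new `Distance` attribute for the corresponding example.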
    Scott