MNIST for Hebrew Characters

sgenzer · July 2019

Starting a new thread as I did not want to hijack @mansour_ebrahim's thread.

@jacobcybulski oh, I'm sorry, I forgot that there was already an MNIST example here on the forum. Maybe I should search before asking LOL

Actually, the reason for this is both personal and professional.

Professionally, of course, an example of this classic problem on the repo is going to be helpful for others. But on the personal side, I am looking for a RapidMiner (RM) solution for a strange project I'm working on: basically OCR on old Jewish tombstones from the 19th century. These stones are from a cemetery in Poland that was destroyed by the Nazis in 1939. They took the stones and used them to pave a road down to a river for tanks and trucks. By some amazing luck, the stones survived by slipping into the river itself long after the war, and they were recently pulled out of the river by some very kind and hardworking Poles in the town. One or more of these stones likely belong to ancestors of mine, and of course to many other people's ancestors as well. So it is both a fascinating data science problem and a very worthy cause.

So, back to data science: my thinking is that if I can train a model to identify the letters of the Hebrew alphabet using the same methodology as MNIST, I can build a basic OCR engine to help people read the stones. Here are a few examples:

Image: https://us.v-cdn.net/6030995/uploads/editor/ls/5acworogbluw.jpg

Image: https://us.v-cdn.net/6030995/uploads/editor/gb/q228dak57sea.jpg

There are hundreds of these. If I can train an OCR engine to transcribe the letters, it is fairly trivial to push the text to Google Translate and get them translated into English.
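A minimal sketch of the glue around such a classifier, assuming an MNIST-style model already exists and outputs one class index per letter image. Where MNIST has 10 digit classes, Hebrew has 22 letters plus 5 word-final forms, which are exactly the 27 Unicode code points U+05D0 (alef) to U+05EA (tav). The detection step that crops each letter and reports its horizontal center `x` is assumed, and `transcribe_line`, `HEBREW_CLASSES` and the `space_gap` threshold are hypothetical names for illustration. Because Hebrew is read right to left, the rightmost glyph must come first in the Unicode string before it is handed to a translator.

```python
from typing import List, Optional, Tuple

# Hypothetical label set: class index -> character, in Unicode order.
# U+05D0..U+05EA covers the 22 letters plus the 5 final forms (27 classes).
HEBREW_CLASSES: List[str] = [chr(cp) for cp in range(0x05D0, 0x05EB)]
NUM_CLASSES = len(HEBREW_CLASSES)  # 27, versus MNIST's 10 digits


def transcribe_line(glyphs: List[Tuple[float, int]],
                    space_gap: Optional[float] = None) -> str:
    """Turn per-glyph classifier output for one line into a Unicode string.

    glyphs: (x_center, predicted_class) pairs, one per detected letter,
    in any order (assumed output of a hypothetical detection + classifier step).
    space_gap: if given, insert a space when the horizontal distance between
    consecutive glyphs exceeds this value (a crude word-break heuristic).
    """
    # Hebrew reads right to left, so the largest x comes first in logical order.
    ordered = sorted(glyphs, key=lambda g: g[0], reverse=True)
    out: List[str] = []
    prev_x: Optional[float] = None
    for x, cls in ordered:
        if space_gap is not None and prev_x is not None and prev_x - x > space_gap:
            out.append(" ")
        out.append(HEBREW_CLASSES[cls])
        prev_x = x
    return "".join(out)


if __name__ == "__main__":
    # Glyphs for the word shalom, listed out of order:
    # shin (25) at x=90, lamed (12) at x=60, vav (5) at x=35, final mem (13) at x=10.
    print(transcribe_line([(35, 5), (90, 25), (10, 13), (60, 12)]))
```

Keeping the output in logical (reading) order rather than visual order is the design choice that matters here: display software applies the Unicode bidirectional algorithm itself, and translation services expect text in reading order.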

Thoughts?

Scott

Altair RapidMiner Community