"text-processing: extract dates from documents"
I have a question about extracting dates from documents and would be very grateful for any help...
My problem is as follows: I want to crawl and process web content for subsequent classification. Among other things, I would like to organize the documents by date in order to look for trends or link them to external events. To do this, I need to extract dates from them (that is, from the HTML document or from the document's content itself).
Can anybody give me a hint on how to achieve this? I've seen that there is an "Extract Information" operator, but I don't know how to use it to achieve my goal... (I can't let it match a list of possible dates, which was my first idea...)
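To make the goal concrete, here is a rough sketch of what I am after, written in plain Python rather than with the operator. It matches date patterns instead of a fixed list of dates. The function name `extract_dates` and the two formats (YYYY-MM-DD and DD.MM.YYYY) are only assumptions for illustration; real pages will certainly need more variants.

```python
import re
from datetime import date, datetime

# Hypothetical pattern/format pairs, chosen only for illustration.
# Each regex finds candidate strings; strptime then validates them.
DATE_PATTERNS = [
    (re.compile(r"\b\d{4}-\d{2}-\d{2}\b"), "%Y-%m-%d"),        # e.g. 2012-03-15
    (re.compile(r"\b\d{1,2}\.\d{1,2}\.\d{4}\b"), "%d.%m.%Y"),  # e.g. 02.04.2012
]


def extract_dates(text: str) -> list[date]:
    """Return all valid dates found in text, in order of appearance."""
    found = []
    for pattern, fmt in DATE_PATTERNS:
        for match in pattern.finditer(text):
            try:
                parsed = datetime.strptime(match.group(), fmt).date()
            except ValueError:
                # Looks like a date but is not one (e.g. 2012-02-30): skip it.
                continue
            found.append((match.start(), parsed))
    # Sort by position in the text so the result follows document order.
    found.sort(key=lambda item: item[0])
    return [parsed for _, parsed in found]


html = "<p>Posted on 2012-03-15, updated 02.04.2012. Bogus: 2012-02-30.</p>"
dates = extract_dates(html)
```

Something equivalent to this, applied to each crawled document, is what I would like to end up with, so that every document gets one or more date attributes.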
Any help is greatly appreciated!