Extracting dates from text files
My name is Timo, and I would be glad if you could help me with my problem:
I have a lot of text files, mostly press releases from different firms, and I would like to extract the date from each of these press releases.
The problem is that there is no standard format for the date, e.g. sometimes it's "14.08.2008" and sometimes "04 November 05" or "14 November 2005".
I know how to tokenize, generate n-grams, and so on, but I don't know how to extract the date information from these files.
My idea was to work with the "generate n-grams" operator, but I don't know which regex I would have to enter.
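To illustrate the kind of pattern I am looking for, here is a rough sketch in plain Python (not an operator configuration) that covers only the three example formats above. The names `MONTHS`, `DATE_PATTERN` and `find_dates` are made up for illustration, and it assumes English month names written out in full:

```python
import re

# Assumption: month names are English and written out in full.
MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)

# Two alternatives: a numeric date such as 14.08.2008, or
# day + month name + 2- or 4-digit year such as 04 November 05.
DATE_PATTERN = re.compile(
    r"\b(?:"
    r"\d{1,2}\.\d{1,2}\.\d{4}"
    r"|\d{1,2}\s+(?:" + MONTHS + r")\s+(?:\d{4}|\d{2})"
    r")\b",
    re.IGNORECASE,
)


def find_dates(text):
    """Return every substring of text that matches one of the date formats."""
    return DATE_PATTERN.findall(text)


if __name__ == "__main__":
    sample = "Dated 14.08.2008, then 04 November 05 and 14 November 2005."
    print(find_dates(sample))
```

Other formats (abbreviated months, slashes, German month names) would need additional alternatives, so this is only a starting point.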
Maybe you could help me.
Thank you very much!