06-11-2008 06:38 AM
06-13-2008 05:17 AM
06-24-2008 01:59 PM
1) My text is filtered against a set of English stop words, and some words are pruned (e.g. and, or).
- I have to work with biology texts, so I'm wondering what happens to unusual words such as IL-6. Are these words filtered out or kept?
2) The stemmer keeps only the "basic chunks" (stems) of my words. I think this is based on a dictionary.
- Could you tell me which dictionary that is? I need to know this precisely in order to answer the question "does it contain medical terms such as glicolase?", which is crucial for me right now.
- What happens to my unusual words (e.g. IL-6)? Are they pruned, chunked in some way, or kept as they are?
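
To make the question concrete, here is a minimal Python sketch of the two steps as I understand them. It is not the tool's actual code: the stop list, both tokenizers and the toy suffix stemmer are my own assumptions, written only to show the behaviours I am asking about (a term like IL-6 either kept whole or split, depending on how the text is tokenized, and a stemmer that works from suffix rules without any dictionary).

```python
import re

# Hypothetical stop list; I do not know which list the tool really uses.
STOP_WORDS = {"and", "or", "the", "of", "is", "in"}


def tokenize_keep_hyphens(text):
    # Letters/digits joined by internal hyphens form one token, so "IL-6" stays whole.
    return re.findall(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*", text)


def tokenize_letters_only(text):
    # Only runs of letters count as tokens, so "IL-6" is reduced to "IL".
    return re.findall(r"[A-Za-z]+", text)


def remove_stop_words(tokens):
    # Case-insensitive lookup against the stop list.
    return [t for t in tokens if t.lower() not in STOP_WORDS]


def naive_suffix_stem(token):
    # Toy rule-based stemmer (an assumption, not a real algorithm such as Porter):
    # strips one of a few suffixes if at least three characters remain.
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


text = "IL-6 and glicolase levels"

kept_whole = remove_stop_words(tokenize_keep_hyphens(text))
split_up = remove_stop_words(tokenize_letters_only(text))
stems = [naive_suffix_stem(t) for t in kept_whole]

print(kept_whole)
print(split_up)
print(stems)
```

In this sketch the stop-word step never touches IL-6, because it is not in the list. Whether the term survives intact depends only on the tokenizer. The toy stemmer leaves both IL-6 and glicolase unchanged because no suffix rule matches them, which is the kind of behaviour I would expect from a rule-based stemmer as opposed to a dictionary-based one.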
07-06-2008 08:23 AM