Removing StopWords using Dictionary

Hyram Member Posts: 39 Contributor II
edited June 2020 in Help
Hi
I am using my own dictionary to remove stopwords. On close analysis, words like "is" are not being removed, although they are in the dictionary. Any clue as to why this is happening?
Thanks,
Hyram

Answers

  • kayman Member Posts: 662 Unicorn
    Can you share your process? No need to add data, just the process itself.
  • Hyram Member Posts: 39 Contributor II
    edited June 2020
    Yes sure, thanks @kayman
    Attached

    For the dictionary, I am using the NLTK stopwords list. Not sure if my encoding setting is right?
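
One way to check the encoding question outside RapidMiner is to look at the raw bytes of the dictionary file. The sketch below is only an illustration: `inspect_dictionary_bytes` is a hypothetical helper, and it assumes the dictionary is meant to be plain UTF-8 text. It reports whether the data starts with a byte order mark, whether it decodes as UTF-8, and whether it contains any non-ASCII bytes.

```python
import codecs


def inspect_dictionary_bytes(raw):
    """Report basic encoding facts about the raw bytes of a dictionary file.

    Hypothetical helper for illustration; not part of RapidMiner or NLTK.
    """
    # A UTF-8 byte order mark glued to the first word can stop it matching.
    has_bom = raw.startswith(codecs.BOM_UTF8)

    # Check whether the whole payload is valid UTF-8.
    try:
        raw.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False

    # A plain English stopword list is normally pure ASCII.
    has_non_ascii = any(byte > 127 for byte in raw)

    return {"has_bom": has_bom, "valid_utf8": valid_utf8, "has_non_ascii": has_non_ascii}


# Example with in-memory bytes; for a real file, read it with open(path, "rb").
report = inspect_dictionary_bytes(b"\xef\xbb\xbfis\nthe\n")
print(report)
```

If the report shows a byte order mark or bytes that are not valid UTF-8, the file was probably not saved as plain text in the encoding the operator expects.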
  • Hyram Member Posts: 39 Contributor II
    @kayman thanks for looking. Some answers to your questions:
    1. I tokenise on 'non-letters' and it seems to work; no full sentences come through as a result.
    2. Correct, I transform to lower case.
    3. Correct, I filter by a length of 2, i.e. any tokens with fewer than 2 characters are out.
    4. You have a good point, as I have not checked this. I basically cut and pasted the list into a Word doc.

    I initially used 'Filter Stopwords (English)', but it was excluding words like 'like', which I wanted to keep.
    Thanks!
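
For reference, the steps described in this post can be approximated in a few lines of Python. This is a minimal sketch under stated assumptions, not the RapidMiner process itself: `preprocess` is a hypothetical function, the order of the steps is assumed, and the tiny stopword set stands in for the real dictionary.

```python
import re


def preprocess(text, stopwords, min_length=2):
    """Tokenise on non-letters, lower-case, filter by length, drop stopwords."""
    # Split on runs of non-letter characters and discard empty pieces.
    tokens = [t for t in re.split(r"[^A-Za-z]+", text) if t]
    # Lower-case before matching, so "Is" and "is" are treated alike.
    tokens = [t.lower() for t in tokens]
    # Keep only tokens with at least min_length characters.
    tokens = [t for t in tokens if len(t) >= min_length]
    # Remove anything found in the custom dictionary.
    return [t for t in tokens if t not in stopwords]


# Stand-in dictionary; 'like' is deliberately absent so it is kept.
stopwords = {"is", "a", "the"}
print(preprocess("This is a test, I like it!", stopwords))
```

Note that a minimum length of 2 still lets "is" through, so in this setup it is the dictionary lookup that has to remove it. If the dictionary entries do not match the tokens exactly, the word survives.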
  • Hyram Member Posts: 39 Contributor II
    Thanks @kayman
    Really appreciate your help! I will try what the operator notes suggest, which is in line with what you are saying about the txt format.
  • Hyram Member Posts: 39 Contributor II
    @kayman
    Your suggestion about the file format worked. Thank you!
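
For anyone hitting the same problem, a plain-text dictionary with one word per line can be produced with a short script instead of a word processor. The sketch below is an assumption about what a suitable file looks like, not something taken from the operator documentation: `write_stopword_file` and `read_stopword_file` are hypothetical helpers, and the short word list stands in for the NLTK stopwords.

```python
import os
import tempfile


def write_stopword_file(words, path):
    """Write one trimmed, lower-cased word per line as plain UTF-8 text."""
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        for word in words:
            handle.write(word.strip().lower() + "\n")


def read_stopword_file(path):
    """Read the file back into a set, ignoring blank lines and a leading BOM."""
    with open(path, encoding="utf-8-sig") as handle:
        return {line.strip() for line in handle if line.strip()}


# Stand-in list; the stray space and capital letter mimic copy-and-paste noise.
words = ["is", "The ", "a"]
path = os.path.join(tempfile.mkdtemp(), "stopwords.txt")
write_stopword_file(words, path)
print(sorted(read_stopword_file(path)))
```

Reading the file back is a quick way to confirm that entries such as "is" are stored exactly as the tokens appear after lower-casing.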