Which stop words are used by EnglishStopwordFilter?

Legacy User Member Posts: 0 Newbie
edited November 2018 in Help
Hi, I can't find the word list for the EnglishStopwordFilter. I even looked briefly at the source code but I'm not a coder... :(

Where do I find information about the stop words that are used?

Thank you very much!

Answers

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,

    here are the stopwords:

        private static String[] stopWords = new String[] {
        "abaft", "aboard", "about", "above", "across", "afore", "aforesaid", "after", "again", "against",
        "agin", "ago", "aint", "albeit", "all", "almost", "alone", "along", "alongside", "already",
        "also", "although", "always", "am", "american", "amid", "amidst", "among", "amongst", "an",
        "and", "anent", "another", "any", "anybody", "anyone", "anything", "are", "aren", "around",
        "as", "aslant", "astride", "at", "athwart", "away", "back", "bar", "barring", "be",
        "because", "been", "before", "behind", "being", "below", "beneath", "beside", "besides", "best",
        "better", "between", "betwixt", "beyond", "both", "but", "by", "can", "cannot", "certain",
        "circa", "close", "concerning", "considering", "cos", "could", "couldn", "couldst", "dare", "dared",
        "daren", "dares", "daring", "despite", "did", "didn", "different", "directly", "do", "does",
        "doesn", "doing", "done", "don", "dost", "doth", "down", "during", "durst", "each",
        "early", "either", "em", "english", "enough", "ere", "even", "ever", "every", "everybody",
        "everyone", "everything", "except", "excepting", "failing", "far", "few", "first", "five", "following",
        "for", "four", "from", "gonna", "gotta", "had", "hadn", "hard", "has", "hasn",
        "hast", "hath", "have", "haven", "having", "he", "her", "here", "hers", "herself",
        "high", "him", "himself", "his", "home", "how", "howbeit", "however", "id", "if",
        "ill", "immediately", "important", "in", "inside", "instantly", "into", "is", "isn", "it",
        "its", "itself", "ve", "just", "large", "last", "later", "least", "left", "less",
        "lest", "let", "like", "likewise", "little", "living", "long", "many", "may", "mayn",
        "me", "mid", "midst", "might", "mightn", "mine", "minus", "more", "most", "much",
        "must", "mustn", "my", "myself", "near", "neath", "need", "needed", "needing", "needn",
        "needs", "neither", "never", "nevertheless", "new", "next", "nigh", "nigher", "nighest", "nisi",
        "no", "one", "nobody", "none", "nor", "not", "nothing", "notwithstanding", "now", "er",
        "of", "off", "often", "on", "once", "oneself", "only", "onto", "open", "or",
        "other", "otherwise", "ought", "oughtn", "our", "ours", "ourselves", "out", "outside", "over",
        "own", "past", "pending", "per", "perhaps", "plus", "possible", "present", "probably", "provided",
        "providing", "public", "qua", "quite", "rather", "re", "real", "really", "respecting", "right",
        "round", "same", "sans", "save", "saving", "second", "several", "shall", "shalt", "shan",
        "she", "shed", "shell", "short", "should", "shouldn", "since", "six", "small", "so",
        "some", "somebody", "someone", "something", "sometimes", "soon", "special", "still", "such", "summat",
        "supposing", "sure", "than", "that", "the", "thee", "their", "theirs", "them", "themselves",
        "then", "there", "these", "they", "thine", "this", "tho", "those", "thou", "though",
        "three", "thro", "through", "throughout", "thru", "thyself", "till", "to", "today", "together",
        "too", "touching", "toward", "towards", "true", "twas", "tween", "twere", "twill", "twixt",
        "two", "twould", "under", "underneath", "unless", "unlike", "until", "unto", "up", "upon",
        "us", "used", "usually", "versus", "very", "via", "vice", "vis-a-vis", "wanna", "wanting",
        "was", "wasn", "way", "we", "well", "were", "weren", "wert", "what", "whatever",
        "when", "whencesoever", "whenever", "whereas", "where", "whether", "which", "whichever", "whichsoever", "while",
        "whilst", "who", "whoever", "whole", "whom", "whore", "whose", "whoso", "whosoever", "will",
        "with", "within", "without", "wont", "would", "wouldn", "wouldst", "ye", "yet", "you",
        "your", "yours", "yourself", "yourselves"};
    You can obtain this list (and the lists for the other languages) from the wvtool project at SourceForge; the English one is in the file "StopwordsEnglish".

    Cheers,
    Ingo
  • Legacy User Member Posts: 0 Newbie
    Ingo, thank you very much!
  • Legacy User Member Posts: 0 Newbie
    If I want to supply my own stop word file using StopwordFilterFile,
    could you please post an example of what the text file should look like?

    Thank you very much
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,

    as far as I remember, it's simply one stopword per line in this file, for example:

    this
    is
    my
    list
    of
    stopwords
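
    To make the format concrete, here is a minimal Java sketch of how a one-word-per-line file like the one above could be read and applied to a list of tokens. It only illustrates the file format; it is not how StopwordFilterFile is implemented. The class and method names are hypothetical, and both UTF-8 encoding and exact, case-sensitive matching are assumptions.

    ```java
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    // Hypothetical helper for illustration; not part of RapidMiner or wvtool.
    class StopwordFileSketch {

        // Reads one stopword per line, trimming whitespace and skipping blank lines.
        // UTF-8 is an assumption; adjust if your file uses another encoding.
        static Set<String> load(Path file) throws IOException {
            Set<String> words = new HashSet<>();
            for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
            return words;
        }

        // Keeps only the tokens that are not in the stopword set
        // (exact, case-sensitive comparison; this is an assumption).
        static List<String> filter(List<String> tokens, Set<String> stopwords) {
            List<String> kept = new ArrayList<>();
            for (String token : tokens) {
                if (!stopwords.contains(token)) {
                    kept.add(token);
                }
            }
            return kept;
        }

        public static void main(String[] args) throws IOException {
            // Write the example list from this post to a temporary file.
            Path file = Files.createTempFile("stopwords", ".txt");
            Files.write(file, Arrays.asList("this", "is", "my", "list", "of", "stopwords"),
                    StandardCharsets.UTF_8);

            Set<String> stopwords = load(file);
            List<String> tokens = Arrays.asList("this", "is", "a", "short", "list", "of", "tokens");
            System.out.println(filter(tokens, stopwords)); // prints [a, short, tokens]
        }
    }
    ```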


    Cheers,
    Ingo
  • lexusboy Member Posts: 22 Maven
    Hello Ingo,

    Is it possible to modify the standard stop word file that RapidMiner uses?

    Best Regards,
    Bhavya
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Bhavya,
    this is not possible, but you could simply copy the words posted by Ingo into a file with the format Ingo mentioned and then use the StopwordFilterFile operator instead.

    Greetings,
      Sebastian
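
    If copying the words by hand is tedious, the conversion can be scripted. The following is a rough Java sketch that takes the array from Ingo's post and writes it out one word per line. The class name, the output file name, and the UTF-8 encoding are assumptions made for illustration, and the array is shortened here; paste the full list in its place.

    ```java
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.Arrays;

    // Hypothetical helper for illustration; not part of RapidMiner or wvtool.
    class StopwordListExport {

        // Shortened for brevity; replace with the full array posted earlier in this thread.
        static final String[] STOP_WORDS = new String[] {
            "abaft", "aboard", "about", "above", "across"
        };

        // Writes each word on its own line. UTF-8 is an assumption.
        static void export(String[] words, Path target) throws IOException {
            Files.write(target, Arrays.asList(words), StandardCharsets.UTF_8);
        }

        public static void main(String[] args) throws IOException {
            // The file name is arbitrary; any path you can select in the operator will do.
            Path target = Paths.get("my_stopwords.txt");
            export(STOP_WORDS, target);
            System.out.println("Wrote " + STOP_WORDS.length + " words to " + target.toAbsolutePath());
        }
    }
    ```

    The resulting file can then be edited freely (add or delete lines) before it is used with the StopwordFilterFile operator.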
  • lexusboy Member Posts: 22 Maven
    Hi Sebastian,

    Thanks for your reply. Would it be possible for you to give me the list of stop words for the German filter used by RapidMiner? Have a nice day :)
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,

    here it comes:
    "ab", "aber", "Aber", "alle", "allein", "allem", "allen", "aller", "als", "Als", "also", "alt", "am", "Am", "an", "andere", "anderen", "arbeiten", "auch", "Auch", "auf", "Auf", "Aufgabe", "aus", "außer", "bald", "beginnen", "bei", "Bei", "beide", "beiden", "beim", "bekannt", "bekennen", "bereits",
    "berichten", "bestehen", "betonen", "betonte", "bin", "bis", "bißchen", "bisschen", "Bisschen", "bist", "bleiben", "bringen", "da", "dabei", "dadurch", "dafür", "dagegen", "dahinter", "damit", "danach", "daneben", "dann", "daran", "darauf", "daraus", "darin", "darüber", "darum", "darunter", "das", "Das", "daß", "dass", "Dass", "dasselbe",
    "davon", "davor", "dazu", "dazwischen", "dein", "deine", "deinem", "deinen", "deiner", "deines", "dem", "demselben", "den", "denen", "denn", "der", "Der", "deren", "derselben", "des", "desselben", "dessen", "deutlich", "dich", "die", "Die", "dies", "Dies", "diese", "Diese", "dieselbe", "dieselben", "diesem", "diesen", "dieser", "dieses", "dir",
    "doch", "Doch", "dort", "drei", "du", "durch", "dürfen", "ebenso", "eigen", "eigenen", "ein", "Ein", "eine", "Eine", "einem", "einen", "einer", "eines", "einig", "einige", "einigen", "einmal", "entlang", "entscheiden", "entsprechen", "EPD", "er", "Er", "erhalten", "erklären", "erklärte", "erst", "ersten", "es", "Es", "etwa", "etwas", "euch",
    "euer", "eure", "eurem", "euren", "eurer", "eures", "fest", "finden", "fordern", "fragen", "frei", "früh", "führen", "fünf", "für", "Für", "fürs", "ganz", "gar", "gebe", "geben", "gegen", "gegenüber", "gehen", "gehören", "geht", "gemeinsam", "genau", "gewesen", "gibt", "glauben", "gleich", "groß", "großen", "gründen", "gut", "habe", "haben",
    "handeln", "hat", "hatte", "hätte", "hatten", "hätten", "heilig", "heißt", "her", "herein", "herum", "heute", "hin", "hinter", "hintern", "hoch", "hören", "ich", "ihm", "ihn", "Ihnen", "ihnen", "ihr", "ihre", "Ihre", "ihrem", "Ihrem", "ihren", "Ihren", "Ihrer", "ihrer", "ihres", "Ihres", "im", "Im", "immer", "in", "In", "ins", "international",
    "ist", "ja", "je", "jedesmal", "jedoch", "jene", "jenem", "jenen", "jener", "jenes", "jetzt", "jung", "kann", "KAP", "kaum", "kein", "keine", "keinem", "keinen", "keiner", "keines", "kirchlich", "klein", "kommen", "könne", "können", "könnten", "kritisieren", "lang", "laß", "lass", "lassen", "leben", "letzen", "letzte", "letzten", "machen",
    "man", "mehr", "mein", "meine", "meinem", "meinen", "meiner", "meines", "meist", "mich", "mir", "mit", "Mit", "mitteilen", "möglich", "muß", "muss", "müsse", "müssen", "müßten", "müssten", "nach", "Nach", "nachdem", "nah", "nämlich", "national", "neben", "nehmen", "nein", "nennen", "neu", "neue", "neuen", "nicht", "nichts", "noch", "nun", "nur",
    "ob", "ober", "obgleich", "oder", "ohne", "paar", "Recht", "recht", "reich", "religiös", "rund", "sagte", "schaffen", "schon", "schreiben", "schwer", "sehen", "sehr", "sei", "seien", "sein", "seine", "seinem", "seinen", "seiner", "seines", "seit", "seitdem", "selbst", "Selbst", "setzen", "sich", "Sie", "sie", "sind", "so", "So", "sogar",
    "solch", "solche", "solchem", "solchen", "solcher", "solches", "soll", "sollen", "sollte", "sollten", "sondern", "sonst", "soviel", "soweit", "sowie", "spät", "sprechen", "stark", "stehen", "steht", "stellen", "teilen", "teilte", "über", "um", "und", "Und", "uns", "unser", "unsere", "unserem", "unseren", "unserer", "unseres", "unter",
    "vergangen", "vergangenen", "vergehen", "veröffentlichen", "viel", "viele", "vier", "voll", "vom", "von", "Von", "vor", "Vor", "vorsitzen", "währen", "während", "war", "wäre", "waren", "wären", "warum", "was", "wegen", "weil", "weit", "weiter", "welche", "welchem", "welchen", "welcher", "welches", "wem", "wen", "wenig", "wenige", "wenn", "Wenn",
    "wer", "werde", "werden", "weshalb", "wessen", "wichtig", "wie", "Wie", "wieder", "will", "wir", "Wir", "wird", "wo", "wollen", "womit", "worden", "wurde", "wurden", "würden", "zehn", "zeigen", "zentral", "zu", "Zu", "zum", "zur", "zwar", "zwei", "zweit", "zwischen", "zwischens"
    Greetings,
      Sebastian