Text mining in Spanish

MarlaBot Administrator, Moderator, Employee, Member Posts: 57 Community Manager
edited March 2020 in Help
A RapidMiner user wants to know the answer to this question: "I'm trying to see if there is anything in the app for text mining in Spanish? I know about Rosette extension but I wonder if there's an operator for that from RapidMiner. Thank you."

Answers

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    Hi @MarlaBot, that's a good question. You can use the normal text processing tools for Spanish, just as you can for any other language. There is no built-in Spanish stopword dictionary, but you can find many online.
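As a rough illustration of the same idea outside RapidMiner, here is a minimal Python sketch that tokenizes Spanish text and filters it against a stopword list you supply yourself. The function names and the tiny inline word list are assumptions made for this example; in practice you would load a fuller list from a file you found online or built yourself.

```python
import re

# Hypothetical, deliberately tiny stopword list for illustration only.
SPANISH_STOPWORDS = {"el", "la", "los", "las", "de", "en", "y", "que", "un", "una", "es"}

# \w is Unicode-aware in Python 3, so accented letters and "ñ" stay inside tokens.
TOKEN_PATTERN = re.compile(r"\w+")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return TOKEN_PATTERN.findall(text.lower())


def remove_stopwords(tokens, stopwords):
    """Keep only the tokens that are not in the stopword set."""
    return [token for token in tokens if token not in stopwords]


def load_stopwords(path):
    """Read a stopword file (one word per line, UTF-8) into a set."""
    with open(path, encoding="utf-8") as handle:
        return {line.strip().lower() for line in handle if line.strip()}


tokens = tokenize("El niño come una manzana en la cocina")
filtered = remove_stopwords(tokens, SPANISH_STOPWORDS)
print(filtered)  # ['niño', 'come', 'manzana', 'cocina']
```

To use a downloaded list instead of the inline one, pass the result of `load_stopwords("your_list.txt")` (a placeholder file name) as the second argument to `remove_stopwords`.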

    I am tagging our resident Spanish-speaking unicorn @rfuentealba and tagging this to his "RapidMiner en Castellano" group in case he has more suggestions.

    Scott

  • rfuentealba Moderator, RapidMiner Certified Analyst, Member, University Professor Posts: 568 Unicorn
    Hello, @MarlaBot and @sgenzer.

    I do have suggestions, but not many more than the ones you already gave. The only two things I could not achieve with RapidMiner alone (and this was before I knew about the Rosette extension) were POS tagging and word vectors, but for these you can use Python and a package named "pattern".
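For readers who want to try the Python route, the following is a minimal sketch of POS tagging Spanish text with pattern, not the poster's actual code. It assumes the third-party `pattern` package is installed and that `pattern.es.parse` returns a slash-delimited tagged string such as `word/TAG/chunk/pnp`; the helper names `split_tagged_tokens` and `pos_tag_spanish` are made up for this example, and the sample string is only an illustration of that format.

```python
def split_tagged_tokens(tagged_text):
    """Turn a slash-delimited tagged string into (word, tag) pairs.

    Assumes each whitespace-separated token looks like "word/TAG/...".
    """
    pairs = []
    # pattern returns a str subclass that may override split(), so call
    # str.split directly to get plain whitespace splitting.
    for token in str.split(tagged_text):
        fields = token.split("/")
        if len(fields) >= 2:
            pairs.append((fields[0], fields[1]))
    return pairs


def pos_tag_spanish(text, parser=None):
    """Tag Spanish text; by default uses pattern.es.parse (third-party)."""
    if parser is None:
        # Imported lazily so the helper above works without pattern installed.
        from pattern.es import parse as parser
    return split_tagged_tokens(parser(text))


# Illustrative string in the assumed format, so this runs without pattern.
sample = "El/DT/B-NP/O gato/NN/I-NP/O negro/JJ/I-NP/O"
print(split_tagged_tokens(sample))  # [('El', 'DT'), ('gato', 'NN'), ('negro', 'JJ')]
```

With pattern installed, `pos_tag_spanish("El gato negro se sentó en la alfombra.")` would run the real tagger. The exact tags depend on the installed version, so none are shown here.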

    By the way, having applied NLP in several languages (Spanish, Portuguese, German and Southern Chilean Spanish), I recommend that you build your own list of stop words.
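One possible way to start such a list, sketched here under assumptions and not taken from the post, is to collect the words that appear in most documents of your own corpus and then review them by hand. The function name `candidate_stopwords`, the `min_doc_fraction` parameter, and the three-sentence corpus are all hypothetical.

```python
import re
from collections import Counter

# \w is Unicode-aware in Python 3, so accented letters stay inside tokens.
TOKEN_PATTERN = re.compile(r"\w+")


def candidate_stopwords(documents, min_doc_fraction=0.8):
    """Return words that occur in at least min_doc_fraction of the documents."""
    doc_counts = Counter()
    for text in documents:
        # Count each word once per document (document frequency).
        doc_counts.update(set(TOKEN_PATTERN.findall(text.lower())))
    threshold = min_doc_fraction * len(documents)
    return sorted(word for word, count in doc_counts.items() if count >= threshold)


# Hypothetical toy corpus for illustration only.
corpus = [
    "El servicio de la tienda fue excelente",
    "La entrega de los productos fue rápida",
    "El precio de la entrega fue alto",
]
print(candidate_stopwords(corpus, min_doc_fraction=1.0))  # ['de', 'fue', 'la']
```

Lowering the threshold pulls in more candidates, including domain words such as "entrega" in this toy corpus, which is why a manual review of the result is still needed before using it as a stop word list.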

    If there is anything I can do to help, just drop us a line over here and we'll see how to solve it :)

    -- Spanish version (translated).

    I have some suggestions, but not many more than the ones Scott already mentioned. The only two things I could not do with RapidMiner (and this was before I knew about the Rosette extension) were part-of-speech tagging and word vectors, but for both it is possible to use Python and a package named "pattern".

    Beyond that, and having experience with NLP in several other languages (Spanish, Portuguese, German and Southern Chilean Spanish), I recommend creating your own stop word list instead of using the predefined ones.

    If there is anything I can help you with, just write to us here and we will see how to solve it :)

    Warm regards,

    Rodrigo.