How to correct the wrong words?
jozeftomas_2020
Member Posts: 40 Learner III
Hello
How can I correct misspelled words in RapidMiner?
For example:
meeseg -> message
or
veeeery gooood -> very good
Does anyone know?
Answers
I think I answered this same question in another thread. If you generate a wordlist first and you compile a list of substitutions you want to make, then you can use the "Replace Tokens" operator. If you are looking for an automated way to do this (i.e., RapidMiner identifies misspellings and replaces them automatically), there isn't a built-in solution for that. There might be some third party software you could access via an API though.
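Outside RapidMiner, the substitution-list idea can be sketched in a few lines of Python. This is only an illustration of the principle, not how the "Replace Tokens" operator is implemented; the `SUBSTITUTIONS` dictionary and the `replace_tokens` function are hypothetical names, and the entries are taken from the examples in the question.

```python
import re

# Hypothetical substitution list; in practice you would compile it
# from your own wordlist.
SUBSTITUTIONS = {
    "meeseg": "message",
    "veeeery": "very",
    "gooood": "good",
}

def replace_tokens(text, substitutions=SUBSTITUTIONS):
    """Replace every whole token that appears in the substitution list."""
    def swap(match):
        token = match.group(0)
        # Tokens without an entry are returned unchanged.
        return substitutions.get(token.lower(), token)
    # \w+ matches one token at a time, so only whole tokens are looked up,
    # never fragments of a longer word.
    return re.sub(r"\w+", swap, text)

print(replace_tokens("veeeery gooood meeseg"))  # very good message
```

The obvious limit is that every misspelling has to be listed by hand, which is why this only works for a known, finite set of errors.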
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
Hi @jozeftomas_2020,
As @Telcontar120 said, there isn't a built-in solution for performing what you want to do.
So I propose using a Python script with the textblob library. Here are some results:
However, when the words are too badly misspelled, the script is not able to correct them properly (like the examples you gave):
FYI, spelling correction is based on Peter Norvig’s “How to Write a Spelling Corrector” as implemented in the pattern library. It is about 70% accurate.
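To see why heavily misspelled words are out of reach, here is a minimal sketch in the spirit of Norvig's corrector. It is not textblob's actual code: the `WORD_COUNTS` vocabulary and its counts are made up for illustration, whereas a real corrector derives them from a large corpus. Candidates are searched at most two edits away from the input, so anything further away is returned unchanged.

```python
import string

# Made-up vocabulary and counts, for illustration only.
WORD_COUNTS = {"message": 50, "very": 120, "good": 200, "hello": 80, "held": 30}

def edits1(word):
    """All strings one edit (delete, transpose, replace, insert) away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def known(words):
    """Keep only the words that exist in the vocabulary."""
    return {w for w in words if w in WORD_COUNTS}

def correct(word):
    """Pick the most frequent known word within two edits, else keep the word."""
    candidates = (known([word])
                  or known(edits1(word))
                  or known(e2 for e1 in edits1(word) for e2 in edits1(e1))
                  or {word})
    return max(candidates, key=lambda w: WORD_COUNTS.get(w, 0))

print(correct("gooood"))   # good     (two deletions away)
print(correct("veeeery"))  # veeeery  (three deletions away, so left unchanged)
print(correct("meeseg"))   # meeseg   (three edits from "message")
```

With these made-up counts, "helo" would come back as "hello" because "hello" is listed as more frequent than "held"; with other counts the result flips, which is the kind of behaviour to expect from a frequency-based corrector.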
The process:
To execute this process, you have to:
- Install Python on your computer
- Install the textblob library
- Install the Execute Python operator from the marketplace
- Set the name of your text attribute in the Set Macros operator
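For readers who cannot open the shared process, the script inside the Execute Python operator could look roughly like the sketch below. It is a reconstruction under assumptions, not the exact script from the process: `rm_main` is the entry point the operator calls with the input data, while `textblob_correct` and `correct_column` are hypothetical helper names, and the column name is hard-coded as "text" here instead of coming from the Set Macros operator.

```python
def textblob_correct(text):
    """Correct one string with textblob (requires 'pip install textblob')."""
    # Imported lazily so the script still loads where textblob is missing.
    from textblob import TextBlob
    return str(TextBlob(text).correct())

def correct_column(data, column, corrector=textblob_correct):
    """Replace every value of the given column with its corrected version."""
    # A plain list comprehension works on a pandas DataFrame column
    # and on a dict of lists alike.
    data[column] = [corrector(str(value)) for value in data[column]]
    return data

def rm_main(data):
    # Assumption: the text attribute is called "text". In the shared
    # process this name is supplied through the Set Macros operator.
    return correct_column(data, "text")
```

Passing the corrector as a parameter makes it easy to try the column logic with a simple stand-in function before textblob is installed.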
I hope it helps,
Regards,
Lionel
Hello
Thank you
About the last part you mentioned:
I did not understand the macro setup.
What exactly should I do?
I would like to use this R code:
https://cran.r-project.org/web/packages/hunspell/vignettes/intro.html
But I do not know how to run it in RapidMiner on my data.
Could you help me?
I installed Anaconda, but I do not know how to install textblob and use it in RapidMiner.
Can someone help?
Thank you
Hi @jozeftomas_2020,
1. To set up the Set Macros operator:
Have you tried to import the process I shared? In the parameters of this operator (in the "values" column), you have to enter
the name of the attribute that contains the misspelled words.
2. To install textblob:
a. Press Win + R to open the Run window
b. Type "cmd" and then click OK
c. Type "pip install textblob" and press Enter
textblob will be automatically installed on your computer.
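If the bare "pip" command is not recognized, a common workaround is to run pip as a module of the Python interpreter you actually use, which avoids depending on the PATH variable. A minimal sketch, where `pip_install_command` and `install` are hypothetical helper names:

```python
import subprocess
import sys

def pip_install_command(package):
    """Build the command that runs pip through the current interpreter."""
    # sys.executable is the full path of the Python that runs this script,
    # so the package lands in that same installation.
    return [sys.executable, "-m", "pip", "install", package]

def install(package):
    """Run the installation; raises CalledProcessError if pip fails."""
    subprocess.check_call(pip_install_command(package))

# Only show the command here; call install("textblob") to really run it.
print(" ".join(pip_install_command("textblob")))
```

The same thing can be typed directly in the command prompt as "python -m pip install textblob", provided "python" itself is recognized there.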
Regards,
Lionel
Hi dear friend
I did all the steps
I want to correct spelling mistakes in my data, which has a text column
I loaded the data, chose my text column with the 'Select Attributes' operator, and then connected it to the 'Execute Python' operator.
The name of the column I want to correct is 'text'.
But when I run it I get this error
I do not know how to solve it
Can you help me once more?
Thanks a lot
Have you set the name of your text attribute (text) in the Set Macros operator with quotes? (value = 'text')
Regards,
Lionel
Hello
Yes, I did that.
But it still gives an error.
Look:
Could I send you a screenshot or a sample process so that you can help me?
Thanks a lot
With respect
Hi, I followed the same steps to install textblob, but I get this error.
What should I do?
"
2. To install textblob:
a. Type Win + R to open a window
b. Type "cmd" and then click OK
c. Type "pip install textblob" and click enter
textblob will be automatically installed on your computer.
"
Regards.
Lionel
The 'pip' command is installed with Python.
So first install Python (Python 3.x) via Anaconda.
Regards,
Lionel.
Hello
But I installed Python first.
What should I do now?
Thank you my friend
I got the data from Twitter, as shown in the screenshot above.
I have a Search Twitter operator before Nominal to Text.
Can you tell what the problem is?
And how can I run this preprocessing code on my tweets in RapidMiner?
https://www.kdnuggets.com/2018/03/text-data-preprocessing-walkthrough-python.html
Thank you in advance for any help getting started.
With respect and dedication
Hi @jozeftomas_2020,
It will be very hard for us to understand your bug without your process. Can you share it?
And what do you ultimately want to do: correct the misspelled tweets?
Regards,
Lionel
Hi @student_compute,
If you have indeed installed Python, 'pip' must be installed too. So I see only one solution:
You have to update your "environment variables":
1/
- Search for the pip.exe file on your computer. By default it is located in C:\Users\username\Anacondax\Scripts or C:\Users\username\Pythonx\Scripts (where x = 2 or 3 according to the version of Python you installed).
or
- Type 'pip.exe' (with quotes) in the Windows 10 search bar (bottom-left), then right-click on the result and choose to open the file location.
2/ Then (here on Windows 10):
- open an Explorer window
then click on Properties
then add the folder found in step 1 to the Path environment variable.
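Before editing the environment variables, it can help to check what Python itself reports. The sketch below is only a diagnostic aid and `pip_diagnosis` is a hypothetical helper name; it prints where the shell would find pip (if anywhere), the Scripts folder of the running Python, and whether that folder is already listed in PATH.

```python
import os
import shutil
import sysconfig

def pip_diagnosis():
    """Report where pip is expected and whether that folder is on PATH."""
    # Folder where this Python installation puts pip and other scripts.
    scripts_dir = sysconfig.get_path("scripts")
    path_dirs = [os.path.normcase(os.path.normpath(p))
                 for p in os.environ.get("PATH", "").split(os.pathsep) if p]
    return {
        # None means the command prompt cannot find a 'pip' command.
        "pip_on_path": shutil.which("pip"),
        "scripts_dir": scripts_dir,
        "scripts_dir_on_path":
            os.path.normcase(os.path.normpath(scripts_dir)) in path_dirs,
    }

for key, value in pip_diagnosis().items():
    print(f"{key}: {value}")
```

If "pip_on_path" is None and "scripts_dir_on_path" is False, the printed "scripts_dir" is the folder to add to the Path variable.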
I hope it helps,
Regards,
Lionel
Hello
This is my process
I want to correct the spelling mistakes in all the tweets and then run k-means clustering. But I'm new to Python,
and I do not know how to write Python code in RapidMiner to achieve this goal.
Please help me if possible, dear friend.
With respect
I will be grateful. I'm waiting for your help.
Hi @jozeftomas_2020,
Here is the working process to correct misspelled tweets:
Note that, depending on the number of tweets, the correction may take many minutes.
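One possible way to reduce the running time, which is not part of the shared process, is to correct each distinct word only once and reuse the result, since tweets tend to repeat the same words. A minimal sketch, assuming correction is done word by word; `make_cached_corrector` is a hypothetical helper and `stub_corrector` is a stand-in for the real (slow) spelling corrector:

```python
from functools import lru_cache

def make_cached_corrector(correct_word):
    """Wrap a word-level corrector so each distinct word is corrected once."""
    @lru_cache(maxsize=None)
    def cached(word):
        return correct_word(word)

    def correct_text(text):
        # Splitting on whitespace is a simplification; punctuation stays
        # attached to the words.
        return " ".join(cached(word) for word in text.split())

    return correct_text

calls = []

def stub_corrector(word):
    """Stand-in for a slow corrector; records every real call."""
    calls.append(word)
    return {"helo": "hello"}.get(word, word)

correct_text = make_cached_corrector(stub_corrector)
tweets = ["helo world", "helo again", "world again"]
print([correct_text(t) for t in tweets])  # ['hello world', 'hello again', 'world again']
print(len(calls))                         # 3 corrector calls for 6 words
```

The saving grows with the amount of repetition in the data, and the cache trades memory for time.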
Regards,
Lionel
@lionelderkrikor this is quite handy, thank you for this!
Hi,
You're welcome, @Thomas_Ott.
Happy corrections !
Regards,
Lionel
Thank you so much
Your code really amazes me
I do not know how to thank you
But, master,
in one comment I typed a misspelled word and ran the program. The word was not corrected.
Could you check?
For example:
iphon worst phone appl made helo meseg
After running:
iphon worst phone appl made helo meseg
I wanted the two words helo and meseg to be corrected to hello and message
Thank you
Hi @jozeftomas_2020,
I executed the script with your examples and here is what I get (in your case, I don't know why no correction is performed):
That's not what you were expecting, but the spelling corrector tries to find the nearest correct word to the misspelled word.
So:
- "held" is ranked as a closer match to "helo" than "hello".
- "meet" is nearer to "meseg" than "message".
I think it will be very difficult to do better.
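To put numbers on "nearest", here is a small sketch that computes the plain Levenshtein edit distance for these word pairs. Using this particular distance is an assumption for illustration: the corrector is understood to also weigh how frequent each candidate word is, which this sketch ignores.

```python
def edit_distance(a, b):
    """Levenshtein distance: fewest insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete char_a
                               current[j - 1] + 1,       # insert char_b
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]

pairs = [("helo", ["held", "hello"]), ("meseg", ["meet", "message"])]
for typo, candidates in pairs:
    for candidate in candidates:
        print(typo, candidate, edit_distance(typo, candidate))
# helo held 1
# helo hello 1
# meseg meet 2
# meseg message 3
```

By this measure "meet" really is closer to "meseg" than "message" is, while "held" and "hello" are both one edit away from "helo", so the preference for "held" presumably comes from the word frequencies the corrector uses rather than from the distance alone.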
Regards,
Lionel
Hello.
Yes you are right.
Thanks again.
Would it be possible to send the XML file of your last example?
Thank you
Hi @jozeftomas_2020,
Here is the last process:
Regards,
Lionel