Using Word2Vec with LSTM
Hi everyone!
I am new to RapidMiner; all my background is in Python. I will explain my problem, but unfortunately I can't provide any images right now.
I followed some tutorials for creating a word2vec model and saving it (another option is to download a pre-trained model). I have a huge corpus of around 100,000 records, so I am sure it contains a very large number of distinct words. But the model shows me only around 2000 words, even when I set the window size and the minimum word frequency low. This is the first problem.
Now to the second problem. I used the word2vec model that I built with 2000 words. After that, I saw some tutorials on how to use embedding layers and text to embedding ID. They used a format with 4 columns (ID, batch, word, label): they tokenized each sentence and put each token in a new row. I did my best to produce the same format, but even then I ended up with two problems. First, this format takes up a huge amount of space when the data is large. Second, when I use word2vec with text to embedding ID, all the words are replaced with -2. I don't know why, or what -2 means here.
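To make the format concrete, here is a minimal plain-Python sketch of the one-token-per-row layout (ID, batch, word, label) and of a vocabulary lookup. It is only an illustration under stated assumptions, not RapidMiner's implementation: the function names, the whitespace tokenizer, the way `batch` is derived from the document ID, and the `OOV_ID` sentinel are all hypothetical. In particular, it does not establish what -2 means in RapidMiner.

```python
from collections import Counter

OOV_ID = -1  # hypothetical sentinel for "word not in vocabulary"; NOT RapidMiner's -2


def tokenize(text):
    # Assumed tokenizer: lowercase, split on whitespace.
    return text.lower().split()


def build_vocab(docs, min_count=1):
    # Keep only tokens seen at least min_count times, as a word2vec-style
    # minimum-frequency filter would; rarer tokens never enter the vocabulary.
    counts = Counter(tok for doc in docs for tok in tokenize(doc))
    kept = sorted(tok for tok, c in counts.items() if c >= min_count)
    return {tok: i for i, tok in enumerate(kept)}


def to_long_format(docs, labels, batch_size=2):
    # One row per token: (ID, batch, word, label).
    # Assumption: "batch" groups consecutive documents.
    rows = []
    for doc_id, (doc, label) in enumerate(zip(docs, labels)):
        batch = doc_id // batch_size
        for tok in tokenize(doc):
            rows.append((doc_id, batch, tok, label))
    return rows


def to_ids(rows, vocab):
    # Replace each word with its vocabulary index, or OOV_ID if it is missing.
    return [(d, b, vocab.get(tok, OOV_ID), lab) for d, b, tok, lab in rows]


docs = ["good movie", "bad movie", "Good plot"]
labels = ["pos", "neg", "pos"]

vocab = build_vocab(docs, min_count=2)  # only "good" and "movie" survive
rows = to_long_format(docs, labels)
ids = to_ids(rows, vocab)
```

In this sketch the row count equals the total token count, which is why the layout grows quickly on a large corpus. Any token that is missing from the vocabulary, for example because of the frequency filter or because the lookup tokenizes differently from training, ends up as the sentinel.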
If anyone has done text classification with deep learning and word2vec, I would appreciate their support. I really need a solution for these problems, or at least an example of how to do it in RapidMiner. I am using RapidMiner version 9.10.4.
Thanks in advance!