[SOLVED] Transform Document-Term matrix to flat table?
RWingerter
A newbie question.
I have a simple process that uses "Data to Documents", "Process Documents" and "Tokenize" to turn a list of strings into a word list. The second result is my ExampleSet turned into a document-term matrix.
My question is: How can I transform the document-term matrix (Document_ID x Term) into a flat table with three attributes (Document_ID, Term, occurrences)?
Regards,
Roland
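Outside RapidMiner, the reshaping asked for here (one column per term turned into one row per document/term pair, roughly what a de-pivot or "melt" does) can be sketched in a few lines of Python. This is a minimal sketch, not a RapidMiner process: it assumes the matrix has been exported as a header row plus data rows, with the document ID in a column named Document_ID. The function name `dtm_to_long` and the sample terms are hypothetical. Zero counts are dropped by default so the result stays sparse; pass `keep_zeros=True` to keep every cell.

```python
# Minimal sketch: "de-pivot" a wide document-term matrix into a long table.
# Assumption: the matrix is a header row plus data rows, with the document ID
# in the column named by id_column and one column per term (hypothetical layout).


def dtm_to_long(header, rows, id_column="Document_ID", keep_zeros=False):
    """Return a list of (document_id, term, occurrences) tuples."""
    id_index = header.index(id_column)
    long_rows = []
    for row in rows:
        doc_id = row[id_index]
        for col_index, term in enumerate(header):
            if col_index == id_index:
                continue  # the ID column is not a term
            value = row[col_index]
            # Sparse output by default: skip terms that do not occur.
            if value == 0 and not keep_zeros:
                continue
            long_rows.append((doc_id, term, value))
    return long_rows


if __name__ == "__main__":
    header = ["Document_ID", "apple", "banana", "cherry"]
    rows = [
        [1, 2, 0, 1],
        [2, 0, 3, 0],
    ]
    for doc_id, term, occurrences in dtm_to_long(header, rows):
        print(doc_id, term, occurrences)
```

For the two sample documents this prints one line per non-zero cell: `1 apple 2`, `1 cherry 1` and `2 banana 3`.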
Answers
You are more likely to get an answer if you post a self-contained process with a small chunk of your data. At the moment I am not sure what you have and what you want.
Best
Marcin
Thanks for your reply. Here are my example data and my simple process.
The input is a list of user queries (query_id, query, frequency), which is processed with "Process Documents from Data". The result is a word list and a document-term matrix. In addition, I would like to get a term-document table with the attributes Term, Query_ID, and TF*IDF, for example:
Term    Query_ID   TF*IDF
-------------------------
Term1   1          0.34
Term1   2          0.23
Term2   3          1.00
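A long table of TF*IDF weights like the one above can be sketched the same way, starting from the raw query strings. This is a hedged illustration only: the tokenizer (lowercase, split on non-letters) stands in for the Tokenize operator, and the weighting (term count divided by query length, times log(N / document frequency), then L2-normalized per query) is one common TF-IDF variant assumed here. RapidMiner's exact formula and normalization may differ, so the numbers need not match its output. `tokenize` and `tfidf_long` are hypothetical names.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-letter characters (stand-in for Tokenize)."""
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]


def tfidf_long(queries):
    """queries: list of (query_id, text). Returns (term, query_id, weight) rows."""
    counts = {qid: Counter(tokenize(text)) for qid, text in queries}
    n_docs = len(counts)

    # Document frequency: in how many queries each term occurs.
    df = Counter()
    for c in counts.values():
        df.update(c.keys())

    rows = []
    for qid, c in counts.items():
        total = sum(c.values())
        # Assumed variant: relative term frequency times log(N / df).
        weights = {
            term: (n / total) * math.log(n_docs / df[term])
            for term, n in c.items()
        }
        # Assumed L2 normalization per query; all-zero vectors stay zero.
        norm = math.sqrt(sum(w * w for w in weights.values()))
        for term, w in weights.items():
            rows.append((term, qid, w / norm if norm > 0 else 0.0))

    # One row per (term, query) pair, ordered like the table above.
    rows.sort(key=lambda r: (r[0], r[1]))
    return rows


if __name__ == "__main__":
    queries = [(1, "red apple"), (2, "green apple"), (3, "banana")]
    for term, qid, weight in tfidf_long(queries):
        print(f"{term:<8} {qid:>3} {weight:.2f}")
```

With this normalization, a query containing a single distinct term always gets weight 1.00 for that term, provided the term does not occur in every query.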
I tried various things without success. Maybe it's not difficult to do, but I could not get it to work.
Any and all help is welcome.
Thank you
Roland
Thank you very much, it works like a charm. I had looked at the "De-Pivot" operator, but I had no idea how to address the attribute names. I am not saying I understand your code (that will certainly take a while), but for now I am just happy to have a solution. Thanks again.
Kind regards
Roland