byte address / word location for Textual ETL
Wanttoknow
Hi,
I'm getting on fine with the text processing operators currently provided in RM 5.0 (great work, guys! :-*)
However, there is one feature I would like to see when word vectors are created from documents: the byte address of each word occurrence, used as a key to distinguish one occurrence of a word from another.
This would require a completely new representation of the word list, in which every occurrence is listed with its byte address/word location instead of the aggregated number of occurrences per word per document.
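A per-occurrence word list of this kind could look like the following minimal Python sketch. It is not a RapidMiner operator or API; the function name `occurrence_list` and the tokenization (runs of word characters, lowercased) are assumptions for illustration. Each row carries the document, the word, its UTF-8 byte offset and its word position, so two occurrences of the same word stay distinct.

```python
import re

# Assumption: a "word" is a run of Unicode word characters, lowercased.
WORD = re.compile(r"\w+")


def occurrence_list(doc_id, text):
    """Return one (doc_id, word, byte_offset, word_position) row per occurrence.

    Hypothetical helper, not a RapidMiner operator. byte_offset is measured in
    the UTF-8 encoding of `text`; word_position is the 0-based token index.
    """
    rows = []
    byte_offset = 0  # UTF-8 byte offset corresponding to character index `last`
    last = 0
    for position, match in enumerate(WORD.finditer(text)):
        start = match.start()
        # Advance the byte offset by the encoded length of the skipped characters.
        byte_offset += len(text[last:start].encode("utf-8"))
        last = start
        rows.append((doc_id, match.group().lower(), byte_offset, position))
    return rows


for row in occurrence_list("doc1", "Text mining finds text"):
    print(row)
```

Here the two occurrences of "text" come out as separate rows at byte offsets 0 and 18, where an aggregated word vector would only record a count of 2 for that document.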
This would open up a new range of possibilities, such as determining which other words or terms occur in proximity to a given word or term. That would be of great value for determining the context of documents.
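As a rough illustration of the proximity idea, the hypothetical `neighbours` function below counts which words occur within a fixed window of word positions around each occurrence of a target word. It is a standalone Python sketch, not an existing RapidMiner operator; the window size and the tokenization (runs of word characters, lowercased) are assumptions.

```python
import re
from collections import Counter


def neighbours(text, target, window=2):
    """Count the words within `window` positions of each occurrence of `target`.

    Hypothetical helper, not a RapidMiner operator; tokens are lowercased
    runs of word characters.
    """
    words = [w.lower() for w in re.findall(r"\w+", text)]
    target = target.lower()
    counts = Counter()
    for i, word in enumerate(words):
        if word != target:
            continue
        # Look at positions i-window .. i+window, clipped to the document bounds.
        for j in range(max(0, i - window), min(len(words), i + window + 1)):
            if j != i:
                counts[words[j]] += 1
    return counts


text = "the bank raised interest rates while the river bank flooded"
print(neighbours(text, "bank", window=1))
```

With a window of 1, the two occurrences of "bank" pick up different neighbours ("the"/"raised" versus "river"/"flooded"), which is the kind of context signal that an aggregated count per document cannot provide.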
Of course, I would be glad to know if this is already possible with some combination of the current operators ::)
Answers
By coincidence, this is exactly what we are currently working on. Stay tuned :-)
Cheers,
Simon
Great. Looking forward to it.
Thanks for your reply.
Apart from that, we have a lot of other ideas for the text processing extension, so it will probably take a while until the restructuring is finished. Stay tuned ;-)
Kind regards,
Tobias