
[SOLVED] Format input documents

MarcosRL Member Posts: 53 Contributor II
edited December 2019 in Help
Hello friends of the community.
I have a question regarding the format of the input documents.
I tried the "Tokenize" procedure on ".txt" files and it runs smoothly.
The original files I need to work with are in ".docx" and ".doc" format (Microsoft Word). When I repeat the "Tokenize" procedure on them, the documents are read as strange characters.
Is there a way to read documents in ".docx" and ".doc" format?

Answers

  • johan_CG Member Posts: 19 Contributor II
    Hi MarcosRL

    Did you find a solution for .docx and .doc? I've got the same problem.
    Thanks in advance.

    Johan
  • MarcosRL Member Posts: 53 Contributor II
    Hi Johan
    Yes, I solved it.
    What I did was convert the documents from ".pdf" format to ".txt" (plain text format) instead of processing the Microsoft Word formats (.docx and .doc).
    Greetings from Argentina  :)
  • johan_CG Member Posts: 19 Contributor II
    Hi Marcos

    Thank you for the tips.

    Greetings from France ;)
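
For anyone landing on this thread with the same problem: a ".docx" file is a ZIP archive of XML parts, so reading it as if it were plain text produces the strange characters described above. The workaround in this thread (convert to ".txt" first) could be scripted roughly as below. This is only a minimal sketch in Python using the standard library, not something the posters used; the function names `docx_to_text` and `convert_file` are made up for illustration. It ignores tabs, manual line breaks, headers, and footnotes, and it does not handle the older binary ".doc" format, which would first need to be saved as ".docx" or ".txt" from Word.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the main document part of a .docx file
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(source):
    """Return the plain text of a .docx (path or file-like object), one line per paragraph."""
    # A .docx is a ZIP archive; the body text lives in word/document.xml
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Each paragraph is split into runs; the visible text sits in <w:t> elements
        pieces = [node.text or "" for node in para.iter(W_NS + "t")]
        paragraphs.append("".join(pieces))
    return "\n".join(paragraphs)


def convert_file(docx_path, txt_path):
    """Write the extracted text to a UTF-8 .txt file (hypothetical helper)."""
    with open(txt_path, "w", encoding="utf-8") as out:
        out.write(docx_to_text(docx_path))
```

Calling `convert_file("report.docx", "report.txt")` (file names are placeholders) would write a text file that can then be loaded the same way as the ".txt" files that already tokenize correctly.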