
Random sampling of a large corpus

frankc Member Posts: 3 Contributor I
edited November 2019 in Help
How can you pick a random sample of files from a large corpus to run pre-processing and text mining on with the text mining extension? Is there an operator that does that?

Frank

Answers

  • homburg Employee, Member Posts: 114 RM Data Scientist
    Hi frankc,

    Just a quick question: do you want to read only a random set of files, or read all files and then draw a random sample of the documents?

    Cheers,
    Helge
  • bkriever RapidMiner Certified Analyst, Member Posts: 11 Contributor II
    You should be able to use the Sample operator to draw a random sample, just as you would with a non-text data set. Checking "use local random seed" makes the sample reproducible.
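
If the goal is the first option raised above (reading only a random set of files rather than loading the whole corpus), one possible approach is to pick the file paths outside RapidMiner before any text processing starts. The following is a minimal Python sketch of that idea, not a RapidMiner feature: the function name `sample_corpus_files`, the `*.txt` pattern, and the seed handling are all assumptions made for illustration.

```python
import random
from pathlib import Path


def sample_corpus_files(corpus_dir, sample_size, seed=None, pattern="*.txt"):
    """Return a random sample of file paths under corpus_dir.

    Hypothetical helper: only paths are listed, no file contents are read,
    so the cost depends on the number of files, not their size.
    """
    # Sort first so that a given seed yields the same sample
    # regardless of the order the filesystem returns entries in.
    paths = sorted(Path(corpus_dir).rglob(pattern))

    # A dedicated generator keeps the sampling reproducible
    # without touching the global random state.
    rng = random.Random(seed)

    # Cap the sample size so a small corpus does not raise ValueError.
    k = min(sample_size, len(paths))
    return rng.sample(paths, k)
```

The sampled paths could then be copied into a separate folder (or written to a list) and that smaller set handed to the text mining process. Passing a fixed `seed` plays the same role as a local random seed: repeated runs select the same files.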