
"Store Operator and Read License Restrictions"

bhickie Member Posts: 2 Contributor I
edited June 2019 in Help
To Whomever This May Concern,

I have used the Get Pages operator to crawl 200 or so URLs and then processed the HTML from these pages to create a classification dataset.

I would like to store this dataset using the Store operator, but RM is telling me I am hitting the free version's 1024 MB limit. This seems odd to me given that the dataset is only 40 MB when exported to CSV. Can someone explain what is causing the file to be 25 times that size or more in Store format?

I am trying to separate the indexing process from the modeling process so that I can quickly test a number of different text classification models. I looked at doing this by exporting to CSV and then reading the dataset back into RapidMiner. However, I noticed that exporting the dataset to CSV causes foreign-language information to be lost (none of this is important in my case, but I am not sure whether other information is being lost as well).

Is there a good workaround for this that does not require a license upgrade? Down the road I may upgrade my license, but I would like to experiment and prove that the modeling approach works and is the best way to solve my problem before I try to get budget for a purchase.

I have also found that getting the data back into RapidMiner from the CSV is not as seamless as expected. Is there a maximum number of variables in a dataset that can be read in with the free license?

Any information/advice you can provide on this front will be very helpful.

Brandon

Answers

  • Marco_Boeck Administrator, Moderator, Employee, Member, University Professor Posts: 1,996 RM Engineering
    Hi,

    let me shed some light on your questions:

    1) The limit is not imposed on a per-data basis. Rather, it is a general limit: RapidMiner Studio (which needs memory to run itself) and your process together may not consume more than 1024 MB of memory. If Studio needs more memory than that while your process is running, the process is stopped. Studio does not care whether the data you are storing is 1 MB, 1 GB or 1 TB, as long as you do not run into the process memory limit.

    2) Make sure you select the correct encoding setting for your "Write CSV" operator (you need to enable expert mode for the parameter to show up). Otherwise certain characters will be garbled or lost.

    3) No, there is no limit for Read CSV. You may want to make sure that the file encoding matches the one you used to create the .csv file in the first place and that the column separator is identical.
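
The encoding and separator points in 2) and 3) can be illustrated outside of RapidMiner. The following is only a minimal Python sketch, not RapidMiner's own implementation: the sample rows, the helper names `write_csv` and `read_csv`, and the choice of `;` as separator are assumptions made for the illustration. It shows that non-ASCII text survives a CSV round trip when the writer and reader agree on encoding and separator, and is damaged when they do not.

```python
import csv
import os
import tempfile

# Hypothetical sample data containing non-ASCII text.
rows = [["id", "text"], ["1", "café Müller"], ["2", "日本語のテキスト"]]


def write_csv(path, data, encoding):
    # errors="replace" mimics a lossy export: every character the target
    # encoding cannot represent is written as "?".
    with open(path, "w", newline="", encoding=encoding, errors="replace") as f:
        csv.writer(f, delimiter=";").writerows(data)


def read_csv(path, encoding, delimiter=";"):
    with open(path, newline="", encoding=encoding) as f:
        return [row for row in csv.reader(f, delimiter=delimiter)]


with tempfile.TemporaryDirectory() as tmp:
    # Case 1: same encoding and separator on both sides, nothing is lost.
    good = os.path.join(tmp, "utf8.csv")
    write_csv(good, rows, "utf-8")
    roundtrip_ok = read_csv(good, "utf-8")

    # Case 2: the export encoding cannot represent the text, so
    # characters are lost at write time.
    lossy = os.path.join(tmp, "ascii.csv")
    write_csv(lossy, rows, "ascii")
    roundtrip_lossy = read_csv(lossy, "ascii")

    # Case 3: the file is fine, but the reader assumes a different encoding.
    mismatched = read_csv(good, "latin-1")

    # Case 4: the reader assumes a different column separator, so each
    # line collapses into a single column.
    wrong_sep = read_csv(good, "utf-8", delimiter=",")

print(roundtrip_ok[1][1])
print(roundtrip_lossy[1][1])
print(mismatched[1][1])
print(wrong_sep[0])
```

Only the first case returns the original rows. In the second case the damage already happens during the export, so no reader setting can repair it, which is why the encoding choice on the writing side matters most.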

    Regards,
    Marco