Working with large datasets
I am working with a really large dataset (more than 2.6 million examples, about 25 attributes, and 1 polynominal ID).
After renaming some attributes and generating a new attribute from a basic mathematical calculation on another attribute, I wanted to apply a model to predict on this large set. Unfortunately, it always crashed because it exceeded the memory limit. This happens even when I split the data into subsets of 1 million examples.
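For a rough sense of scale, here is a back-of-the-envelope estimate in Python. It assumes every value is stored as an 8-byte double, which is my assumption and not something I have verified for RapidMiner's internal storage, and the function name is just made up for this post. It also ignores the extra cost of the nominal ID strings and of any copies made during processing.

```python
def estimate_table_bytes(n_rows: int, n_columns: int, bytes_per_value: int = 8) -> int:
    """Rough size of a dense numeric table.

    Assumes a fixed number of bytes per cell (8 = one double) and ignores
    string storage, indexes, and temporary copies.
    """
    return n_rows * n_columns * bytes_per_value


# Figures from my dataset: about 2.6 million examples and about 25 attributes.
full = estimate_table_bytes(2_600_000, 25)
subset = estimate_table_bytes(1_000_000, 25)

print(f"full set: {full / 1e6:.0f} MB, 1M subset: {subset / 1e6:.0f} MB")
# prints "full set: 520 MB, 1M subset: 200 MB"
```

If that estimate is in the right range, the raw table alone should not be the problem, so I suspect the copies created along the way or the ID attribute.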
So my questions:
- Is there a smarter way to store this data (a short array or some other option)?
- Would it be better to convert the ID into integer values?
- Interestingly, the workflow crashes when using Materialize Data and/or Free Memory.
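To make the second question concrete, this is roughly what I mean by converting the ID, sketched in Python rather than as a RapidMiner process. The function name and the sample IDs are made up for illustration, and I do not know whether RapidMiner would actually save memory this way.

```python
def encode_ids(ids):
    """Replace each distinct ID string with a small integer code.

    Returns the list of codes plus the mapping needed to translate
    codes back to the original IDs after prediction.
    """
    mapping = {}
    codes = []
    for value in ids:
        # A new ID gets the next unused integer; a known ID reuses its code.
        code = mapping.setdefault(value, len(mapping))
        codes.append(code)
    return codes, mapping


# Hypothetical sample IDs, not taken from my real data.
sample_ids = ["cust-000017", "cust-000042", "cust-000017"]
codes, mapping = encode_ids(sample_ids)
print(codes)  # prints [0, 1, 0]
```

The mapping would still have to be stored somewhere to get the original IDs back, so the saving would only apply to the table that goes through the model.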
Could you give me some tips for working with larger datasets?