Keep rmhdf5 size under control

kayman Member Posts: 655   Unicorn
Hi there, when storing recordsets with text content, the size of the rmhdf5 files seems to behave quite strangely. Files that are around 1 MB in their original XML format blow up to over 2 gigabytes when I convert them to recordsets and store them as HDF5.

Is this a known problem? Or is there a way to convert back to the old format, as this fills my disk a bit too fast? Loading the files is no real issue, that goes pretty fast; it's just the file size that has me puzzled.

It's also not very consistent: other files of similar size take up only a few kilobytes, so it seems to happen only occasionally, and for no reason I can see, as structure and content are quite similar.


  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,123  RM Data Scientist
    Adding @jczogalla as the expert on it.
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
  • jczogalla Employee, Member Posts: 140   RM Engineering
    Hi @kayman!

    That does indeed sound a bit strange... Can you share the XML files and a minimal process?
    If text data repeats a lot, the file can be very small. But if, for example, a nominal column contains one very long string and all other values are very short, this could blow up the file size.
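To make the "one very long string" case concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the column is stored fixed-width, so every cell is padded to the length of the longest value; that is an assumption for illustration only, as nothing in this thread confirms how rmhdf5 actually lays out text. The function names, the 8-byte offset and the sample column are all hypothetical.

```python
def fixed_width_bytes(values, encoding="utf-8"):
    """Bytes needed if every cell is padded to the longest value (assumed layout)."""
    if not values:
        return 0
    widest = max(len(v.encode(encoding)) for v in values)
    return widest * len(values)


def variable_length_bytes(values, encoding="utf-8", offset_bytes=8):
    """Bytes needed if each cell stores only its own text plus one offset."""
    return sum(len(v.encode(encoding)) + offset_bytes for v in values)


# Hypothetical column: many short labels and a single very long string.
column = ["short"] * 100_000 + ["x" * 20_000]

fixed = fixed_width_bytes(column)
variable = variable_length_bytes(column)
print(f"fixed-width:     {fixed / 1e6:.1f} MB")
print(f"variable-length: {variable / 1e6:.1f} MB")
```

Under these assumptions the single 20,000-character value pushes the padded column to about 2 GB, while the same data stored per value needs roughly 1.3 MB. Checking the longest value in each text column of an affected file would be a quick way to see whether this explanation fits.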

