How do I downsample my data without losing information?

Ghostrider · August 2010

I have too much data to run through RapidMiner, and I want to downsample it without throwing out anything useful. Most of my time-series examples are very similar, so my inputs do not change very much: most of the time they change slowly, but sometimes they change faster. Is there a downsampling operator that samples the slowly varying portions less frequently than the fast-varying ones? Basically, if I were doing this by hand, the sampling would be non-uniform. This is tricky: on the one hand, we don't want to filter out similar samples completely, because they are useful for determining confidence; on the other hand, they slow down the learning.

fischer · August 2010

Hi,

just an idea: you could assign a score to each example and filter out all examples whose score does not exceed a threshold. You can also randomize this score. One option would be to use outlier detection algorithms to rank the examples. The major problem is that computing the right sample will probably be as computationally expensive as learning on the complete set.

Best,
Simon
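
To make the score-and-threshold idea concrete, here is a minimal Python sketch that runs outside RapidMiner. It is not a RapidMiner operator, and it makes several assumptions that are not in the thread: the score is simply the absolute change from the previous example, the data is a single numeric series, and the names `change_scores`, `downsample`, `threshold` and `keep_prob` are made up for illustration. Examples whose score reaches the threshold are always kept; the rest are kept at random with a small probability, so slowly varying stretches are thinned out rather than removed.

```python
import random


def change_scores(series):
    """Score each example by its absolute change from the previous example.

    The first example has no predecessor and gets a score of 0.0.
    (Assumed scoring rule; an outlier score could be used instead.)
    """
    if not series:
        return []
    return [0.0] + [abs(b - a) for a, b in zip(series, series[1:])]


def downsample(series, threshold, keep_prob=0.1, seed=0):
    """Return the indices of the examples to keep.

    An example is always kept when its score is at least `threshold`.
    Otherwise it is kept with probability `keep_prob`, so similar
    examples are sampled sparsely instead of being dropped entirely.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    kept = []
    for i, score in enumerate(change_scores(series)):
        if score >= threshold or rng.random() < keep_prob:
            kept.append(i)
    return kept


# Toy series: flat, then a fast ramp, then flat again.
series = [0.0] * 50 + [float(i) for i in range(10)] + [9.0] * 50
kept = downsample(series, threshold=0.5, keep_prob=0.1)
only_fast = downsample(series, threshold=0.5, keep_prob=0.0)
```

With `keep_prob=0.0` only the fast-changing examples survive, which is the pure threshold filter. Raising `keep_prob` trades training speed for more of the near-duplicate examples that help with confidence estimates. This particular score is cheap to compute; a score based on outlier detection would be more expensive, which is the cost concern raised above.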
