How do I downsample my data without losing information?

Ghostrider Member Posts: 60 Contributor II
I have too much data to run through RapidMiner and I want to downsample it without throwing out anything useful (most of my time-series examples are very similar, so my inputs do not change very much). Most of the time my inputs change slowly, but sometimes they change faster. Is there a downsampling operator that samples the slowly varying portions less frequently than the faster-varying portions? Basically, if I were doing this by hand, the sampling would be non-uniform. This is tricky: on one hand, we don't want to completely filter out similar samples, since they are useful for determining confidence; on the other hand, they slow down the learning.

Answers

  • fischer Member Posts: 439 Maven
    Hi,

    just an idea: you could assign a score to each example and filter out all examples whose score does not exceed a threshold. You can also randomize this score. One option would be to use outlier detection algorithms to rank the examples. The major problem is that computing the right sample will probably be as computationally expensive as learning on the complete set.

    Best,
    Simon
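
A minimal sketch of the score-and-threshold idea, written in plain Python rather than as a RapidMiner process. Everything here is an assumption for illustration: the function names `change_scores` and `downsample`, the parameters `threshold` and `keep_prob`, and the choice of score (the largest absolute change from the previous example) are hypothetical, not RapidMiner operators. The randomized part keeps a small fraction of the slowly varying examples so they are not filtered out completely.

```python
import random

def change_scores(rows):
    """Score each row by its largest absolute change from the previous row.

    `rows` is a list of equal-length numeric lists, ordered in time.
    The first row gets an infinite score so it is always kept.
    """
    scores = [float("inf")]
    for prev, cur in zip(rows, rows[1:]):
        scores.append(max(abs(c - p) for p, c in zip(prev, cur)))
    return scores

def downsample(rows, threshold, keep_prob=0.1, seed=0):
    """Return the indices of the rows to keep.

    A row is kept if its score exceeds `threshold` (fast-changing region),
    or otherwise with probability `keep_prob` (slow-changing region).
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    kept = []
    for i, score in enumerate(change_scores(rows)):
        if not rows:
            break
        if score > threshold or rng.random() < keep_prob:
            kept.append(i)
    return kept

# Hypothetical data: a slow stretch, a fast jump, then slow again.
rows = [[0.0], [0.01], [0.02], [1.0], [2.0], [2.01]]
kept_indices = downsample(rows, threshold=0.5, keep_prob=0.0)
```

With `keep_prob=0.0` only the first example and the fast-changing ones survive; raising `keep_prob` retains more of the similar examples. Scoring against the previous example is a single linear pass, so it is cheap, but it can miss slow drift that never exceeds the threshold in one step. Scoring against the last kept example instead would catch that.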