How do I downsample my data without losing information?

Ghostrider Member Posts: 60  Maven
I have too much data to run through RapidMiner and want to downsample it without throwing out anything useful (most of my time-series examples are very similar, so my inputs do not change much). Most of the time my inputs change slowly, but sometimes they change faster. Is there a downsampling operator that samples the slowly varying portions less frequently than the fast-varying ones? Basically, if I were doing this by hand, the sampling would be non-uniform. This is tricky: on one hand, we don't want to completely filter out similar samples, since they are useful for estimating confidence; on the other hand, they slow down the learning.
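The non-uniform sampling described above can be prototyped outside RapidMiner. Below is a minimal NumPy sketch: keep a sample whenever the signal has moved by more than a threshold since the last kept sample, and additionally keep every `max_gap`-th sample in flat stretches so similar examples are thinned rather than removed entirely. The function name, `delta`, and `max_gap` are illustrative assumptions, not RapidMiner operators.

```python
import numpy as np

def downsample_nonuniform(x, delta=0.5, max_gap=20):
    """Return indices of kept samples. A sample is kept when it differs
    from the last kept sample by more than `delta` (fast-varying parts
    are sampled densely); in flat stretches, every `max_gap`-th sample
    is still kept so similar examples are thinned, not dropped."""
    kept = [0]  # always keep the first sample
    for i in range(1, len(x)):
        if abs(x[i] - x[kept[-1]]) > delta or i - kept[-1] >= max_gap:
            kept.append(i)
    return np.asarray(kept)

# Toy signal: slowly varying, with a fast burst between t=4 and t=5
t = np.linspace(0, 10, 1000)
x = np.sin(t) + np.where((t > 4) & (t < 5), np.sin(40 * t), 0.0)
idx = downsample_nonuniform(x, delta=0.3, max_gap=50)
```

On the toy signal, the kept indices cluster inside the fast burst and thin out elsewhere, which is exactly the non-uniform behavior the question asks for.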

Answers

  • fischer Member Posts: 439  Guru
    Hi,

    Just an idea: you can assign a score to each example and filter out all examples whose score does not exceed a threshold. You can also randomize this score. One option is to use an outlier detection algorithm to rank the examples. The main problem is that computing the right sample will probably be as computationally expensive as learning on the complete set.

    Best,
    Simon
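The score-and-threshold idea above can be sketched in plain NumPy. Here a k-nearest-neighbour distance serves as the outlier score, the top-scoring fraction is always kept, and a random fraction of the rest is kept too (the randomized score the answer mentions). The function names, `k`, and the keep fractions are illustrative assumptions, not a RapidMiner API; note the O(n²) distance matrix illustrates the cost concern raised above.

```python
import numpy as np

rng = np.random.default_rng(0)

def knn_outlier_scores(X, k=5):
    """Score each example by the distance to its k-th nearest
    neighbour: a large distance means the example is far from the
    bulk of the data, hence more informative to keep."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    d.sort(axis=1)
    return d[:, k]  # column 0 is each point's zero distance to itself

def score_filter(X, keep_top=0.2, keep_rest=0.1):
    """Keep the `keep_top` fraction with the highest outlier score,
    plus a random `keep_rest` fraction of the remaining examples so
    dense regions are thinned rather than removed entirely."""
    s = knn_outlier_scores(X)
    cutoff = np.quantile(s, 1 - keep_top)
    top = s >= cutoff
    rest = ~top & (rng.random(len(X)) < keep_rest)
    return np.flatnonzero(top | rest)
```

For example, on a dense cluster with a few distant points, the distant points score highly and survive the filter, while the cluster is reduced to a random subsample.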