How do I downsample my data without losing information?
Ghostrider
Member Posts: 60 Contributor II
in Help
I have too much data to run through RapidMiner and I want to downsample it without throwing away anything useful (most of my time-series examples are very similar, so my inputs do not change much). Most of the time my inputs change slowly, but sometimes they change faster. Is there a downsampling operator that samples the slowly varying portions less frequently than the fast-varying ones? Basically, if I were doing this by hand, the sampling would be non-uniform. This is tricky: on the one hand, we don't want to completely filter out similar samples, since they are useful for estimating confidence; on the other hand, they slow down the learning.
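The non-uniform sampling described above can be sketched outside RapidMiner, for example as a preprocessing step. A minimal sketch: always keep a point when the signal has moved more than a threshold since the last kept point, and keep slowly varying points only with a small probability so some similar examples survive for confidence estimation. The function name and the `threshold`/`keep_prob` parameters are illustrative assumptions, not RapidMiner operators.

```python
import random

def adaptive_downsample(series, threshold=0.1, keep_prob=0.1, seed=0):
    """Non-uniformly downsample a 1-D series: keep every point where the
    value changed by more than `threshold` since the last kept point, and
    keep slowly varying points only with probability `keep_prob`."""
    rng = random.Random(seed)
    kept = [0]          # always keep the first sample
    last = series[0]
    for i, x in enumerate(series[1:], start=1):
        # fast change -> always keep; slow change -> keep at random
        if abs(x - last) > threshold or rng.random() < keep_prob:
            kept.append(i)
            last = x
    return kept

# Example: a long flat stretch followed by a fast ramp
data = [0.0] * 50 + [i * 0.5 for i in range(10)]
idx = adaptive_downsample(data)
# The ramp (indices 51+) is kept densely; the flat stretch is thinned out.
```

The same idea generalizes to multivariate examples by replacing the absolute difference with a distance between feature vectors.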
Answers
Just an idea: you could assign each example a score and filter out all examples whose score does not exceed a threshold. You can also randomize this score, so that some ordinary examples still survive. One option would be to use an outlier detection algorithm to rank the examples. The major problem is that computing the right sample is probably as computationally expensive as learning on the complete set.
Best,
Simon
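Simon's score-and-threshold idea can be sketched as follows. Here a toy stand-in for a real outlier score is used (absolute distance from the mean), and the cutoff is softened with a small random keep probability so that some low-scoring examples survive; the function name and the `keep_fraction`/`jitter` parameters are illustrative assumptions.

```python
import random

def score_based_sample(examples, keep_fraction=0.2, jitter=0.1, seed=0):
    """Rank examples by a simple outlier score (absolute distance from
    the mean, as a toy stand-in for a real outlier detector), keep the
    top `keep_fraction`, and keep the rest with probability `jitter`."""
    rng = random.Random(seed)
    mean = sum(examples) / len(examples)
    scores = [abs(x - mean) for x in examples]
    # cutoff score separating the top keep_fraction of examples
    cutoff = sorted(scores, reverse=True)[int(len(scores) * keep_fraction)]
    return [x for x, s in zip(examples, scores)
            if s >= cutoff or rng.random() < jitter]

data = list(range(100))
kept = score_based_sample(data)
# Extreme values (far from the mean) are kept; the middle is thinned out.
```

This illustrates Simon's caveat as well: ranking every example requires a full pass over the data (and a real outlier detector would cost considerably more), so the sampling step can approach the cost of learning on the complete set.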