Smoothing values
I have the following data set and I want to smooth the values so that the outliers (the high peaks) are smoothed toward the rest of the sample. How can this be achieved? I tried exponential smoothing with alpha = 0.1 and it didn't help. The column I need to smooth is VALUE.
DT | ID | TOTAL | VALUE |
201704 | 1 | 1383860 | 10.95 |
201705 | 1 | 1828848 | 7.81 |
201706 | 1 | 1520265 | 11.86 |
201707 | 1 | 1823590 | 12.44 |
201708 | 1 | 1371700 | 9 |
201709 | 1 | 1117848 | 8.33 |
201710 | 1 | 1261479 | 6.73 |
201711 | 1 | 1673933 | 16.62 |
201711 | 2 | 561574 | 1316.41 |
201712 | 2 | 454604 | 15.53 |
201801 | 2 | 671094 | 32.87 |
201802 | 2 | 648078 | 22.48 |
201803 | 2 | 664679 | 9.49 |
201804 | 2 | 785520 | 20.59 |
201805 | 2 | 775158 | 18.3 |
201806 | 2 | 658977 | 346.71 |
201807 | 2 | 1075298 | 11.13 |
201808 | 2 | 709763 | 32.56 |
201809 | 2 | 800129 | 17.01 |
201810 | 2 | 817088 | 9.49 |
201811 | 2 | 882086 | 15.36 |
201812 | 2 | 862648 | 41.4 |
201901 | 2 | 891263 | 36.05 |
201902 | 2 | 893897 | 227.04 |
201903 | 2 | 843360 | 131.17 |
201904 | 2 | 559655 | 5.06 |
201905 | 2 | 919638 | 107.18 |
201804 | 3 | 5375.65 | 32.5 |
201805 | 3 | 5344.03 | 168.65 |
201806 | 3 | 5312.43 | 168.1 |
201807 | 3 | 6747.5 | 14.76 |
201808 | 3 | 6715.76 | 14.69 |
201809 | 3 | 5217.5 | 164.88 |
201810 | 3 | 5185.33 | 9.96 |
201811 | 3 | 5153.71 | 9.9 |
201812 | 3 | 5122.06 | 161 |
201901 | 3 | 5090.43 | 160.92 |
201902 | 3 | 5058.79 | 159.89 |
201903 | 3 | 5027.07 | 158.77 |
201904 | 3 | 4995.42 | 157.73 |
201905 | 3 | 4963.74 | 156.67 |
201806 | 4 | 3505205 | 15.49 |
201807 | 4 | 3548029 | 13.18 |
201808 | 4 | 3473771 | 28.84 |
201809 | 4 | 4727992 | 11.6 |
201810 | 4 | 5900274 | 7.53 |
201811 | 4 | 7799624 | 5.41 |
201812 | 4 | 6454174 | 11.22 |
201901 | 4 | 5810997 | 195.27 |
201902 | 4 | 7789077 | 2215.65 |
201903 | 4 | 9437120 | 9.49 |
201904 | 4 | 9575089 | 7.63 |
201905 | 4 | 9442829 | 17.13 |
201801 | 5 | 2128374 | 9 |
201802 | 5 | 2262821 | 12.69 |
201803 | 5 | 1948217 | 38.53 |
201804 | 5 | 2116825 | 549.87 |
201805 | 5 | 1369574 | 4.6 |
201806 | 5 | 1601645 | 13.86 |
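To illustrate why the exponential smoothing attempt didn't help, here is a minimal sketch in plain Python (not RapidMiner) of standard exponential smoothing applied to a slice of the VALUE column around the 1316.41 spike. With alpha = 0.1 the spike's weight only decays geometrically, so it bleeds into many of the following samples instead of being rejected:

```python
# Standard exponential smoothing: s[0] = x[0], s[t] = alpha*x[t] + (1-alpha)*s[t-1].
# A low alpha smooths strongly, but an extreme spike still dominates
# the smoothed series for a long time because it is averaged in, not removed.

def exponential_smoothing(values, alpha):
    smoothed = [values[0]]
    for x in values[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

# VALUE series for ID 2 starting at the 201711 spike (from the table above)
series = [1316.41, 15.53, 32.87, 22.48, 9.49]
print([round(s, 2) for s in exponential_smoothing(series, 0.1)])
# -> [1316.41, 1186.32, 1070.98, 966.13, 870.46]
```

Every smoothed value stays far above the true level of the series, which matches the observation that exponential smoothing "didn't help" here: it is a smoother, not an outlier remover.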
Answers
You could try the Moving Average Filter with either a simple or a binom filter type. Out of interest, what do you mean by "exponential smoothing didn't help"? Was there no effect at all, or was the smoothing effect too small (and how did you define what good smoothing means for you)?
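As a rough sketch of the two filter types mentioned, here is plain Python implementing a weighted moving average with a uniform ("simple") window and a binomial window. The window size and edge handling are illustrative assumptions, not necessarily what the RapidMiner operator does internally:

```python
# Weighted moving average over a sliding window.
# Edge samples (where the window does not fit) keep their original values;
# the real operator may handle edges differently.

def moving_average(values, weights):
    half = len(weights) // 2
    total = sum(weights)
    out = list(values)
    for i in range(half, len(values) - half):
        window = values[i - half:i + half + 1]
        out[i] = sum(w * x for w, x in zip(weights, window)) / total
    return out

simple = [1, 1, 1]   # uniform weights ("simple" filter)
binom = [1, 2, 1]    # binomial weights (row of Pascal's triangle)

series = [10.95, 7.81, 11.86, 12.44, 9.0]   # ID 1 values from the table
print([round(x, 2) for x in moving_average(series, simple)])
```

Note that a moving average only spreads a large spike across its window rather than removing it; for spikes two orders of magnitude above the rest, the replace-outliers approach below (or a median filter) is usually more effective.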
If you only have a few outliers (which seems to be the case here) and you are sure that they really are outliers that can be removed, you can also try replacing them.
First replace all the outliers with missing values, for example by using a fixed maximum for valid values. In a Generate Attributes operator you could use if(Value < 200.0, Value, MISSING_NUMERICAL), which replaces every value of the Value attribute larger than 200.0 with a missing value. Then use Replace Missing Values (Series) to fill in the new missing values.
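The two-step idea above can be sketched in plain Python: values at or above a fixed cutoff (200.0, the example threshold from the post) become missing, then the gaps are filled by linear interpolation, which mimics one of the strategies Replace Missing Values (Series) offers:

```python
# Step 1: turn values above a fixed cutoff into missing values (None).
def replace_outliers(values, cutoff):
    return [v if v < cutoff else None for v in values]

# Step 2: fill each gap by linear interpolation between the nearest
# known neighbours; a gap at either edge carries the nearest value over.
def interpolate_missing(values):
    out = list(values)
    for i, v in enumerate(out):
        if v is None:
            left = next((j for j in range(i - 1, -1, -1) if out[j] is not None), None)
            right = next((j for j in range(i + 1, len(values)) if values[j] is not None), None)
            if left is not None and right is not None:
                frac = (i - left) / (right - left)
                out[i] = out[left] + frac * (values[right] - out[left])
            else:
                out[i] = out[left] if left is not None else out[right]
    return out

series = [18.3, 346.71, 11.13, 32.56]   # ID 2 values around the 346.71 spike
print(interpolate_missing(replace_outliers(series, 200.0)))
```

Here the 346.71 spike is replaced by the midpoint of its neighbours (about 14.72), which keeps the series on its typical scale.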
Hope this helps
Fabian
PS: If you want a more sophisticated way to identify the outliers, have a look at outlier detection, for example the Detect Outlier operators or the Tukey Test from the Operator Toolbox.
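For reference, the idea behind the Tukey Test is Tukey's fences: anything outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] is flagged as an outlier. A minimal Python sketch follows; the quartile method used here (Python's statistics.quantiles with its default exclusive method) is an assumption, and the Operator Toolbox implementation may differ in detail:

```python
import statistics

# Tukey's fences: flag values outside [Q1 - k*IQR, Q3 + k*IQR].
# k = 1.5 is the conventional choice for "mild" outliers.
def tukey_outliers(values, k=1.5):
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# VALUE column for ID 5 from the table above
series = [9.0, 12.69, 38.53, 549.87, 4.6, 13.86]
print(tukey_outliers(series))
# -> [549.87]
```

Unlike a fixed cutoff of 200.0, this adapts the threshold to each series, which matters here because the typical VALUE scale differs between IDs.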