Smoothing values
I have the following data set and I want to smooth the values so that the outliers (the high peaks) are smoothed toward the rest of the sample. How can this be achieved? I tried exponential smoothing with alpha = 0.1 and it didn't help. The column I need to smooth is VALUE.
DT | ID | TOTAL | VALUE |
201704 | 1 | 1383860 | 10.95 |
201705 | 1 | 1828848 | 7.81 |
201706 | 1 | 1520265 | 11.86 |
201707 | 1 | 1823590 | 12.44 |
201708 | 1 | 1371700 | 9 |
201709 | 1 | 1117848 | 8.33 |
201710 | 1 | 1261479 | 6.73 |
201711 | 1 | 1673933 | 16.62 |
201711 | 2 | 561574 | 1316.41 |
201712 | 2 | 454604 | 15.53 |
201801 | 2 | 671094 | 32.87 |
201802 | 2 | 648078 | 22.48 |
201803 | 2 | 664679 | 9.49 |
201804 | 2 | 785520 | 20.59 |
201805 | 2 | 775158 | 18.3 |
201806 | 2 | 658977 | 346.71 |
201807 | 2 | 1075298 | 11.13 |
201808 | 2 | 709763 | 32.56 |
201809 | 2 | 800129 | 17.01 |
201810 | 2 | 817088 | 9.49 |
201811 | 2 | 882086 | 15.36 |
201812 | 2 | 862648 | 41.4 |
201901 | 2 | 891263 | 36.05 |
201902 | 2 | 893897 | 227.04 |
201903 | 2 | 843360 | 131.17 |
201904 | 2 | 559655 | 5.06 |
201905 | 2 | 919638 | 107.18 |
201804 | 3 | 5375.65 | 32.5 |
201805 | 3 | 5344.03 | 168.65 |
201806 | 3 | 5312.43 | 168.1 |
201807 | 3 | 6747.5 | 14.76 |
201808 | 3 | 6715.76 | 14.69 |
201809 | 3 | 5217.5 | 164.88 |
201810 | 3 | 5185.33 | 9.96 |
201811 | 3 | 5153.71 | 9.9 |
201812 | 3 | 5122.06 | 161 |
201901 | 3 | 5090.43 | 160.92 |
201902 | 3 | 5058.79 | 159.89 |
201903 | 3 | 5027.07 | 158.77 |
201904 | 3 | 4995.42 | 157.73 |
201905 | 3 | 4963.74 | 156.67 |
201806 | 4 | 3505205 | 15.49 |
201807 | 4 | 3548029 | 13.18 |
201808 | 4 | 3473771 | 28.84 |
201809 | 4 | 4727992 | 11.6 |
201810 | 4 | 5900274 | 7.53 |
201811 | 4 | 7799624 | 5.41 |
201812 | 4 | 6454174 | 11.22 |
201901 | 4 | 5810997 | 195.27 |
201902 | 4 | 7789077 | 2215.65 |
201903 | 4 | 9437120 | 9.49 |
201904 | 4 | 9575089 | 7.63 |
201905 | 4 | 9442829 | 17.13 |
201801 | 5 | 2128374 | 9 |
201802 | 5 | 2262821 | 12.69 |
201803 | 5 | 1948217 | 38.53 |
201804 | 5 | 2116825 | 549.87 |
201805 | 5 | 1369574 | 4.6 |
201806 | 5 | 1601645 | 13.86 |
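To illustrate why the exponential smoothing attempt didn't help, here is a minimal sketch in plain Python (not RapidMiner) of standard exponential smoothing applied to a slice of the VALUE column around the 1316.41 spike. With alpha = 0.1 the spike's weight only decays geometrically, so it bleeds into many of the following samples instead of being rejected:

```python
# Standard exponential smoothing: s[0] = x[0], s[t] = alpha*x[t] + (1-alpha)*s[t-1].
# A low alpha smooths strongly, but an extreme spike still dominates
# the smoothed series for a long time because it is averaged in, not removed.

def exponential_smoothing(values, alpha):
    smoothed = [values[0]]
    for x in values[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

# VALUE series for ID 2 starting at the 201711 spike (from the table above)
series = [1316.41, 15.53, 32.87, 22.48, 9.49]
print([round(s, 2) for s in exponential_smoothing(series, 0.1)])
# -> [1316.41, 1186.32, 1070.98, 966.13, 870.46]
```

Every smoothed value stays far above the true level of the series, which matches the observation that exponential smoothing "didn't help" here: it is a smoother, not an outlier remover.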
Answers
You could try the Moving Average Filter with either a simple or a binom filter type. Out of interest, what do you mean by "exponential smoothing didn't help"? Was there no effect at all, or was the smoothing effect too small (and how did you define what good smoothing means for you)?
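As a rough sketch of the two filter types mentioned, here is plain Python implementing a weighted moving average with a uniform ("simple") window and a binomial window. The window size and edge handling are illustrative assumptions, not necessarily what the RapidMiner operator does internally:

```python
# Weighted moving average over a sliding window.
# Edge samples (where the window does not fit) keep their original values;
# the real operator may handle edges differently.

def moving_average(values, weights):
    half = len(weights) // 2
    total = sum(weights)
    out = list(values)
    for i in range(half, len(values) - half):
        window = values[i - half:i + half + 1]
        out[i] = sum(w * x for w, x in zip(weights, window)) / total
    return out

simple = [1, 1, 1]   # uniform weights ("simple" filter)
binom = [1, 2, 1]    # binomial weights (row of Pascal's triangle)

series = [10.95, 7.81, 11.86, 12.44, 9.0]   # ID 1 values from the table
print([round(x, 2) for x in moving_average(series, simple)])
```

Note that a moving average only spreads a large spike across its window rather than removing it; for spikes two orders of magnitude above the rest, the replace-outliers approach below (or a median filter) is usually more effective.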
If you only have a few outliers (which seems to be the case here) and you are sure that they really are outliers that can be removed, you can also try replacing them.
First replace all the outliers with missing values, for example by using a fixed maximum for valid values. In a Generate Attributes operator you could use if(Value < 200.0, Value, MISSING_NUMERICAL), which replaces every value of the Value attribute larger than 200.0 with a missing value. Then use Replace Missing Values (Series) to fill in the new missing values.
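The two-step idea above can be sketched in plain Python: values at or above a fixed cutoff (200.0, the example threshold from the post) become missing, then the gaps are filled by linear interpolation, which mimics one of the strategies Replace Missing Values (Series) offers:

```python
# Step 1: turn values above a fixed cutoff into missing values (None).
def replace_outliers(values, cutoff):
    return [v if v < cutoff else None for v in values]

# Step 2: fill each gap by linear interpolation between the nearest
# known neighbours; a gap at either edge carries the nearest value over.
def interpolate_missing(values):
    out = list(values)
    for i, v in enumerate(out):
        if v is None:
            left = next((j for j in range(i - 1, -1, -1) if out[j] is not None), None)
            right = next((j for j in range(i + 1, len(values)) if values[j] is not None), None)
            if left is not None and right is not None:
                frac = (i - left) / (right - left)
                out[i] = out[left] + frac * (values[right] - out[left])
            else:
                out[i] = out[left] if left is not None else out[right]
    return out

series = [18.3, 346.71, 11.13, 32.56]   # ID 2 values around the 346.71 spike
print(interpolate_missing(replace_outliers(series, 200.0)))
```

Here the 346.71 spike is replaced by the midpoint of its neighbours (about 14.72), which keeps the series on its typical scale.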
Hope this helps
Fabian
PS: If you want a more sophisticated way to identify the outliers, have a look at outlier detection, for example the Detect Outlier operators or the Tukey Test from the Operator Toolbox.
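For reference, the idea behind the Tukey Test is Tukey's fences: anything outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] is flagged as an outlier. A minimal Python sketch follows; the quartile method used here (Python's statistics.quantiles with its default exclusive method) is an assumption, and the Operator Toolbox implementation may differ in detail:

```python
import statistics

# Tukey's fences: flag values outside [Q1 - k*IQR, Q3 + k*IQR].
# k = 1.5 is the conventional choice for "mild" outliers.
def tukey_outliers(values, k=1.5):
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# VALUE column for ID 5 from the table above
series = [9.0, 12.69, 38.53, 549.87, 4.6, 13.86]
print(tukey_outliers(series))
# -> [549.87]
```

Unlike a fixed cutoff of 200.0, this adapts the threshold to each series, which matters here because the typical VALUE scale differs between IDs.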