The Altair Community is migrating to a new platform to provide a better experience for you. In preparation for the migration, the Altair Community is on read-only mode from October 28 - November 6, 2024. Technical support via cases will continue to work as is. For any urgent requests from Students/Faculty members, please submit the form linked here

Smoothing values

msacs09msacs09 Member Posts: 55 Contributor II
I have a following data set and i wanted to smooth the values, so that the outliers (high peaks are smoothed out with the general sample). How can we achieve this tried exponential smoothing with  alpha =0.1 and didn't help. The column i need to smooth it out is VALUE

DT ID TOTAL VALUE
201704 1 1383860 10.95
201705 1 1828848 7.81
201706 1 1520265 11.86
201707 1 1823590 12.44
201708 1 1371700 9
201709 1 1117848 8.33
201710 1 1261479 6.73
201711 1 1673933 16.62
201711 2 561574 1316.41
201712 2 454604 15.53
201801 2 671094 32.87
201802 2 648078 22.48
201803 2 664679 9.49
201804 2 785520 20.59
201805 2 775158 18.3
201806 2 658977 346.71
201807 2 1075298 11.13
201808 2 709763 32.56
201809 2 800129 17.01
201810 2 817088 9.49
201811 2 882086 15.36
201812 2 862648 41.4
201901 2 891263 36.05
201902 2 893897 227.04
201903 2 843360 131.17
201904 2 559655 5.06
201905 2 919638 107.18
201804 3 5375.65 32.5
201805 3 5344.03 168.65
201806 3 5312.43 168.1
201807 3 6747.5 14.76
201808 3 6715.76 14.69
201809 3 5217.5 164.88
201810 3 5185.33 9.96
201811 3 5153.71 9.9
201812 3 5122.06 161
201901 3 5090.43 160.92
201902 3 5058.79 159.89
201903 3 5027.07 158.77
201904 3 4995.42 157.73
201905 3 4963.74 156.67
201806 4 3505205 15.49
201807 4 3548029 13.18
201808 4 3473771 28.84
201809 4 4727992 11.6
201810 4 5900274 7.53
201811 4 7799624 5.41
201812 4 6454174 11.22
201901 4 5810997 195.27
201902 4 7789077 2215.65
201903 4 9437120 9.49
201904 4 9575089 7.63
201905 4 9442829 17.13
201801 5 2128374 9
201802 5 2262821 12.69
201803 5 1948217 38.53
201804 5 2116825 549.87
201805 5 1369574 4.6
201806 5 1601645 13.86

Answers

  • tftemmetftemme Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, RMResearcher, Member Posts: 164 RM Research
    Hi @msacs09 ,

    You could try the Moving Average Filter with either a simple or a binom filter type. Out of interest, what do you mean with the Exponential Smoothing didn't help? Wasn't there an effect at all, or was the smoothing effect too small (and how did you defined what a good smoothing is for you). 

    If you have only outliers (seems to be in your case) and you are sure that these are outliers in your data and can be removed, you can also try to replace them.

    First replace all the outliers by missing values. For example by using a fixed max value for valid values. You could use if (Value < 200.0, Value, MISSING_NUMERICAL) in a Generate Attribute operator, which would replace all values in Value attribute which are larger than 200.0. Then you could use Replace Missing Values (Series) to replace the new missing values.

    Hopes this helpes
    Fabian

    PS: If you want a more sophisticated way to identify the outliers, have a look into outlier detection. For example the Detect Outlier operators or the Tukey Test from the Operator Toolbox.

  • David_ADavid_A Administrator, Moderator, Employee-RapidMiner, RMResearcher, Member Posts: 297 RM Research

    the probably confusing thing of exponential smoothing is the effect of different values for the alpha value.
    The closer the value is to 1, the less you are smoothing you values and the result looks more similar to the original data.
    In contrast, an alpha value of 0.1 is a very strong smoothing, that can eliminate a lot of characteristics from your data.

    Another thing is, that in your data the peaks a very extreme, compared to baseline data points, so finding the right amount of smoothing, without loosing to much information can be tricky.

    Best,
    David




  • Telcontar120Telcontar120 RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    You can also try capping high outlier values---you can do this with a Generate Attributes and a simple IF expression to replace with a fixed value above a certain threshold.

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • msacs09msacs09 Member Posts: 55 Contributor II
    Thank you all GREAT suggestions.
Sign In or Register to comment.