How do I smooth by bin means?
For an assignment, I need to use smoothing by bin means: you sort the values, create bins of the same size, and replace each value with its bin mean. I'm having a tough time finding this feature. Discretization is the only section that discusses binning, and I didn't see anything dealing with means in the transformations section. Does RapidMiner support this?
After searching a bit, I've only seen this technique mentioned in academic papers and presentations. Is this not a common technique for professionals? Which smoothing approach is preferred instead?
Thanks,
Jamison
Answers
- Copy the attribute you want to smooth with the "Generate Attribute" operator
- Apply your preferred discretization to the copied attribute
- Apply an aggregation with the average function, using the copied attribute as the grouping attribute and the original attribute as the aggregation attribute
Now you can delete the copied attribute with the "Select Attributes" operator, and the original attribute is smoothed. It isn't very elegant, especially if you want to smooth more than one attribute, but maybe this is sufficient for your needs. I will ask around for another way to accomplish this.
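Outside RapidMiner, the copy, discretize, and average steps can be sketched in plain Python. This is only an illustration under stated assumptions: the function name `smooth_by_range_bins`, the equal-width binning rule, and the sample values are hypothetical and are not taken from the thread or from any RapidMiner operator.

```python
from collections import defaultdict

def smooth_by_range_bins(values, n_bins):
    """Replace each value with the mean of its equal-width (range-based) bin."""
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when all values are identical.
    width = (hi - lo) / n_bins or 1.0
    # "Copy" the attribute and discretize the copy into bin labels 0..n_bins-1.
    labels = [min(int((v - lo) / width), n_bins - 1) for v in values]
    # Average the original values, grouped by the discretized copy.
    groups = defaultdict(list)
    for label, v in zip(labels, values):
        groups[label].append(v)
    means = {label: sum(vs) / len(vs) for label, vs in groups.items()}
    # Write the bin mean back in place of each original value.
    return [means[label] for label in labels]

# Made-up sample values, used only for illustration.
prices = [4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34]
smoothed = smooth_by_range_bins(prices, 3)
print(smoothed)
```

Because the bins here are value ranges, they do not necessarily hold the same number of items.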
That got me on the right track! I had to do one extra step to join the averages back into the original set.
My bins are still not coming out the same as in Excel, so I'll need to review. I think the difference is that in Excel I created a bin every four rows whereas RapidMiner is creating ranges for the bins. This leads to some bins having 3 and some having 5 items. To resolve this I'm looking into sorting by my value and adding a row count column (can RM do this?). The row count column will become my field to discretize.
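The mismatch described above can be reproduced with a small sketch that counts the items per bin under both schemes. The sample values, the choice of three bins, and the equal-width rule are assumptions for illustration; RapidMiner's discretization operators may place their boundaries differently.

```python
from collections import Counter

# Made-up sample values, already sorted.
values = sorted([4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34])
n_bins = 3

# Range-based bins: equal-width intervals over [min, max].
lo, hi = values[0], values[-1]
width = (hi - lo) / n_bins
range_labels = [min(int((v - lo) / width), n_bins - 1) for v in values]

# Count-based bins: every 4 consecutive sorted rows form one bin.
bin_size = 4
row_labels = [i // bin_size for i in range(len(values))]

range_counts = sorted(Counter(range_labels).items())
row_counts = sorted(Counter(row_labels).items())
print(range_counts)  # [(0, 3), (1, 3), (2, 6)]
print(row_counts)    # [(0, 4), (1, 4), (2, 4)]
```

Binning on the row number of the sorted data, rather than on the value itself, is what keeps every bin the same size.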
Edit:
I found "a" solution.
1. Sort by Value
2. Generate Id (this will be a row number based on the sort)
3. Set Role of new Id to Regular
4. Discretize by Size on Id from #2
5. Multiply
6. Aggregate (average) values from #1 grouped by the binned Id from #4
7. Join original to #6
You now have a data set with your values grouped by bin mean.
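For comparison, the seven steps above can be mirrored in a few lines of plain Python. This is a minimal sketch, not the RapidMiner process itself: the function name `smooth_by_bin_means` and the sample data are hypothetical, and the rank in sorted order stands in for the generated Id.

```python
def smooth_by_bin_means(values, bin_size):
    """Replace each value with the mean of its equal-size bin."""
    # 1. Sort by value, remembering each value's original position.
    order = sorted(range(len(values)), key=lambda i: values[i])
    # 2-4. The rank in sorted order plays the role of the generated Id;
    #      integer division by bin_size discretizes it into equal-size bins.
    bin_of = {orig: rank // bin_size for rank, orig in enumerate(order)}
    # 5-6. Aggregate: compute the mean of the values in each bin.
    sums, counts = {}, {}
    for i, v in enumerate(values):
        b = bin_of[i]
        sums[b] = sums.get(b, 0) + v
        counts[b] = counts.get(b, 0) + 1
    means = {b: sums[b] / counts[b] for b in sums}
    # 7. Join the bin means back onto the rows in their original order.
    return [means[bin_of[i]] for i in range(len(values))]

# Made-up, unsorted sample values, used only for illustration.
data = [21, 4, 28, 9, 34, 15, 8, 25, 21, 29, 24, 26]
result = smooth_by_bin_means(data, 4)
print(result)
```

In this sketch, if the number of rows is not a multiple of `bin_size`, the last bin simply holds fewer items.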
Jamison