The Altair Community is migrating to a new platform to provide a better experience for you. In preparation for the migration, the Altair Community is on read-only mode from October 28 - November 6, 2024. Technical support via cases will continue to work as is. For any urgent requests from Students/Faculty members, please submit the form linked here
'% of total' and 'group by'... help needed
Hey guys,
In the case of different stores and sales:
Firstly, I need to add a column to calculate for each store to see the percentage of total sales of the whole dataset. (Update: I just learned Extract Marco, I got this part sorted, just need to find out how to show 1.001%, not just 0.01001 )
Then, I want to group the first top 10 store. So the top 10 will be labelled as top stores, and the other stores will be labelled as others.
Basically, the final result I want to get is a pie chart to see the top stores' percentage of sales against others.
I kinda know it is possibly aggregation operator, but I still can not find out how to make it work.
The other option is to use Turbo Prep, I am open to this as well.
Thanks for all your help.
In the case of different stores and sales:
Firstly, I need to add a column to calculate for each store to see the percentage of total sales of the whole dataset. (Update: I just learned Extract Marco, I got this part sorted, just need to find out how to show 1.001%, not just 0.01001 )
Then, I want to group the first top 10 store. So the top 10 will be labelled as top stores, and the other stores will be labelled as others.
Basically, the final result I want to get is a pie chart to see the top stores' percentage of sales against others.
I kinda know it is possibly aggregation operator, but I still can not find out how to make it work.
The other option is to use Turbo Prep, I am open to this as well.
Thanks for all your help.
0
Best Answer
-
BalazsBarany Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 UnicornHi,
you can use aggregate with or without a grouping attribute.
If you split up your data into the top 10 and the rest, you just aggregate the rest, generate and rename the attributes to have the same names and types as the top 10, then you can use Append to add the rest row to the top 10.
You can use Filter Example Range twice: first for the rows 1 to 10, then, connecting the ori output to the second one, again 1 to 10, but with "invert filter" checked.
Regards,
Balázs5
Answers
If you sort by the sales descending and use Filter Example Range, you can easily select the top 10. But do the global sum calculation before that.
Then you use Generate Attribute to get the current sales percentage in relation to the total.
There's an operator Format Numbers that takes numerical columns and applies a format string to them. This should help you with the percent format.
If you do it in a process, you can always execute it again with new numbers. With Turbo Prep you'd need to do the same steps again when you get new numbers. (But you can save the Turbo Prep steps as a process, too.)
I wouldn't use a pie chart that just shows the top 10 unless there's also a "all the others" category. It would create a false impression.
Regards,
Balázs