Replicating Excel Pivot Table features
Hello,
Often I use excel to make a deeper inspection of my process results such as inspecting cluster composition based on the attributes, or inspecting GLM predictions on a scoring set. E.g. What is the distribution in each cluster for different cities of City attribute, or how are the predicted leads distributed across the various business Categories and Region attributes, etc. And I need to do this back and forth quite a few number of times until I'm satisfied that the scored data looks very similar to training data, or that my clusters are distinct enough. As a habit, I do not blindly trust an algorithm or a computer program.
I have used the pivot operator in RM but unlike excel, first of all it does not give me totals for rows or columns of the pivot table. Secondly, I'm not sure how to go about changing the numbers to percentage of row/column total or percentage of parent row/column total, etc. on the fly like in excel. I have been able to adjust the significant digits after a decimal point, but that's about it. Also the index attribute defined in the Pivot operator does not get passed to the next set of operators. E.g. If the index attribute was cluster, and the pivot table contained attributes cluster_0, cluster_1, 2, 3, etc... they do not get passed to the next operator - say, generate aggregation, or select attributes, etc. I have to manually type them in.
The reason I am looking for a solution inside rapidminer is to avoid having to do so much back and forth and keep recreating the same pivots over and over again with just different data values. If I can build the whole thing inside RM, I just need to hit play everytime and stretch my arms and relax until it finishes and dumps all pivot tables and other outputs onto the results window. Thank you very much.
Answers
Hi,
"If I can build the whole thing inside RM, I just need to hit play everytime and stretch my arms and relax until it finishes and dumps all..."
That's the spirit :smileyvery-happy:
In order to replicate the Excel Pivot functionality you actually need to combined 2 to 3 operators in RapidMiner. They are "Pivot", "Aggregate", and sometimes even "Generate Attributes".
Why those three? Because Excel is actually lying to you. The Pivot operation itself is actually only the rotation, but they built in the total / grouping / aggregation functionality into their Pivot tables as well. While this is certainly useful, it actually is more than just pivoting...
Anyway, maybe the hint above is already helpful. If not, and if you could share some data with us (in case it is not senstitive), we could try to help you building such a process delivering the desired output. Best would be an Excel file with the original data (like it is in RapidMiner) and then another tab showing the Pivot table like you want it to be. We can then work together on building the process for this and I am sure you can take it from there...
Cheers,
Ingo
Thank you @IngoRM, I guess I am lazy like that sometimes lol. You know what they say though, lazy people tend to find clever solutions :smileytongue: or atleast I hope to! Anyways thank you very much for your help once again. I will go ahead and try the aggregate and generate attributes to see if I can figure it out. But should you wish to take a look also, I am also sharing the data (anonymized) and the rapidminer process xml to give you an idea of what I'm trying to achieve. The excel file contains two sample pivots on one sheet and output of clustering on another. You can use the output data as input to the process by excluding the cluster attribute if you wish.
Best regards and a good afternoon!
@IngoRM attached with this post is the sample pivots. I found that I could not upload an xlsx with multiple sheets on the forum so I broke up the file into two csv files. The sample pivots no longer behave as excel pivots unfortunately. Sorry about the inconvenience.
You may also be interested in the Crosstab operator, which is available in the Statistics extension from the Marketplace. It essentially combines some of the steps into a single operator for convenience. But as Ingo already said, with Aggregate and Generate Aggregation you can already do everything you need inside the basic RapidMiner Studio.
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
Ok, here we go:
I only did this for the Ethinicty and also added a couple of operators to format things a bit nicer (you know, like better names, percentage symbols and stuff like this). I hope you can take it from here...
Have a nice weekend,
Ingo
Thank you @IngoRM, sorry I couldn't reply sooner. Hope you had a nice weekend yourself. Greatly appreciate you helping me here! My sincere thanks :smileyhappy: