Remove Duplicates
Hi guys, I am new to RapidMiner, so please bear with me.
I am trying to use Turbo Prep to do some data cleansing before analysis, because the dataset has many issues.
I select the name/ID column and use the Remove Duplicates function.
For example:
ID      Number of buying
AA01    8
AA01    10
It seems that RapidMiner only keeps the first row for each ID, so only the first 'Number of buying' value survives.
Is there any way to keep the average, sum, max, or min of the 'Number of buying' column instead?
Best Answer
BalazsBarany Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
Hi!
Of course this is possible. However, the operation you need is called grouping or aggregation rather than removing duplicates. You will find it in Turbo Prep under "Pivot".
Regards,
Balázs
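To illustrate what grouping/aggregation does compared with removing duplicates, here is a minimal sketch in plain Python, outside RapidMiner. It is not how Turbo Prep implements "Pivot"; the function name `aggregate_by_id` and the result keys are made up for this example, and the data is just the two rows from the question.

```python
from collections import defaultdict
from statistics import mean

# The two example rows from the question: (ID, number of buying).
rows = [("AA01", 8), ("AA01", 10)]


def aggregate_by_id(rows):
    """Group rows by ID and compute sum, average, min and max per group.

    Hypothetical helper for illustration only; not a RapidMiner API.
    """
    groups = defaultdict(list)
    for row_id, value in rows:
        groups[row_id].append(value)

    # One output entry per ID, instead of keeping only the first row.
    return {
        row_id: {
            "sum": sum(values),
            "average": mean(values),
            "min": min(values),
            "max": max(values),
        }
        for row_id, values in groups.items()
    }


result = aggregate_by_id(rows)
print(result["AA01"])
```

For AA01 this yields a single row with sum 18, average 9, min 8 and max 10, whereas removing duplicates would have kept only the value 8.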