How to remove non-duplicate values?

MarlaBot Employee-RapidMiner, Member Posts: 57 Community Manager
edited March 2019 in Help
A RapidMiner user wants to know the answer to this question: "Hey! I have a data set of over 42000 records that has several duplicate and unique values. However, I would like to clean it up and remove only non-duplicate values and leave duplicate records. I know the “remove duplicates” operator removes duplicates but in my case, I want to do the opposite. Any idea how I could do this? Thank you."

Answers

  • MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
    Hi,
    Can't you just join the duplicates on the original data? Then you have only the duplicates remaining.
    BR,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • sgenzer Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    Hi @MarlaBot, the Remove Duplicates operator has both options:



    Does this help? :smile:

    Scott
  • rfuentealba RapidMiner Certified Analyst, Member, University Professor Posts: 568 Unicorn
    Hey,

    You have 42000 records.

    Some are duplicate.
    Some are unique.

    If you need the non-unique records, the dup output of the Remove Duplicates operator delivers the records that aren't unique.

    Sorry, I got lost in translation and had to reorganize the question because I understood it about three different ways. Yes, @sgenzer's answer is fine. If what is required is an aggregation (for example, a count of the duplicated events), what @mschmitz says helps, too.
  • novice_miner Member Posts: 3 Learner III
    Thanks for all your help. It worked like magic. 

    Best, 
  • Telcontar120 RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    I think this is the same question as in this thread, where I provided a similar answer: https://community.rapidminer.com/discussion/comment/57000#Comment_57000
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
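
For readers who want to check the logic outside RapidMiner, the two ideas in this thread (taking the repeated records, and joining them back onto the original data to keep every record that has a duplicate) can be sketched in plain Python. This is only an illustration under stated assumptions, not how the Remove Duplicates operator is implemented: the function names, the `email` key field, and the sample records are all hypothetical, and records are assumed to be dictionaries compared on a chosen set of fields.

```python
from collections import Counter


def record_key(record, key_fields):
    # Tuple of the values that decide whether two records count as duplicates.
    return tuple(record[field] for field in key_fields)


def later_occurrences(records, key_fields):
    # Records whose key was already seen earlier in the list.
    # The first occurrence of each key is NOT included.
    seen = set()
    repeats = []
    for record in records:
        key = record_key(record, key_fields)
        if key in seen:
            repeats.append(record)
        else:
            seen.add(key)
    return repeats


def keep_only_duplicates(records, key_fields):
    # Every record whose key occurs more than once, first occurrence included,
    # in the original order. Equivalent to joining the repeated keys back
    # onto the original data.
    counts = Counter(record_key(r, key_fields) for r in records)
    return [r for r in records if counts[record_key(r, key_fields)] > 1]


# Hypothetical sample data: ids 1, 3, 6 share an email, as do ids 2 and 5.
records = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": "b@example.com"},
    {"id": 3, "email": "a@example.com"},
    {"id": 4, "email": "c@example.com"},
    {"id": 5, "email": "b@example.com"},
    {"id": 6, "email": "a@example.com"},
]

print([r["id"] for r in later_occurrences(records, ["email"])])
print([r["id"] for r in keep_only_duplicates(records, ["email"])])
```

The difference between the two functions is whether the first occurrence of each repeated record is kept. If the goal is to drop only the truly unique records, as in the original question, the second function is the one that matches.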