how to keep a partly duplicated sample
aileenzhou
Member Posts: 12 Contributor II
in Help
I have a dataset in which some DOIs are duplicated. For each duplicated DOI, I must keep exactly one row, chosen by the 'source' attribute with the preference A > B > C, and delete the rest.
For example, in the data below, I want to keep rows 1261 and 643 and delete the rest.
Row DOI Source
18 10.1002/67 B
1261 10.1002/67 A
1400 10.1002/67 C
...
...
643 10.102/et.67 A
1428 10.102/et.67 C
Thank you in advance.
Best Answer
lionelderkrikor RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
Hi @aileenzhou,
If I understand correctly, one way to do that is to:
- Reorder the attributes (1/ Source, 2/ DOI)
- Generate a concatenation of them (via Generate Aggregation attribute)
- Sort the concatenated attribute alphabetically (via Sort attribute / sorting direction = increasing)
- Remove the duplicates of this concatenated attribute.
- Split the concatenated attribute back to retrieve the original attributes (without the duplicates)
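For readers who want to check the expected result outside RapidMiner, here is a minimal sketch in plain Python. It is not the attached process: instead of concatenating the attributes, it sorts the rows by DOI and then by source preference and keeps the first row of each DOI. The sample rows are copied from the question; the names `SOURCE_RANK` and `keep_preferred` are made up for this illustration.

```python
# Sample rows copied from the question: (row id, DOI, source).
rows = [
    (18, "10.1002/67", "B"),
    (1261, "10.1002/67", "A"),
    (1400, "10.1002/67", "C"),
    (643, "10.102/et.67", "A"),
    (1428, "10.102/et.67", "C"),
]

# Hypothetical ranking table: lower rank means higher preference (A > B > C).
SOURCE_RANK = {"A": 0, "B": 1, "C": 2}


def keep_preferred(rows):
    """Return one row per DOI, choosing the most preferred source."""
    best = {}
    # Sort by (DOI, source rank) so the preferred row of each DOI comes first.
    for row in sorted(rows, key=lambda r: (r[1], SOURCE_RANK[r[2]])):
        # setdefault stores only the first row seen for each DOI.
        best.setdefault(row[1], row)
    return list(best.values())


kept = keep_preferred(rows)
kept_ids = sorted(r[0] for r in kept)
print(kept_ids)  # [643, 1261]
```

Using an explicit rank table rather than alphabetical order means the same sketch still works if the source labels are not conveniently alphabetical.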
Take a look at the attached process and tell me if it answers your need.
Regards,
Lionel
Answers