Keep samples based on preferred attribute value
aileenzhou
Member Posts: 12 Contributor II
in Help
I have a dataset in which some DOIs are duplicated. For each duplicated DOI, I must keep one row based on the 'source' attribute, with the preference B>C>A, and delete the rest.
For example, in the data below, I want to keep rows 1261 and 643 and delete the rest.
Row DOI Source
18 10.1002/67 A
1261 10.1002/67 B
1400 10.1002/67 C
... ...
643 10.102/et.67 C
1428 10.102/et.67 A
Thank you in advance.
Best Answer
lionelderkrikor RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
Hi @aileenzhou,
In this case (B>C>A), use the same method as in the other thread, but generate a new attribute called "Source_2", as described :
- Generate a new attribute (for example called "Source_2") and replace in this new attribute :
*B by 1
*C by 2
*A by 3
- Reorder attributes (1/ Source_2 , 2/ DOI)
- Generate the concatenation of the "Source_2" and "DOI" attributes (via Generate Aggregation attribute)
- Sort the concatenated attribute alphabetically (via Sort attribute / sorting direction = increasing)
- Remove duplicates of this concatenated attribute.
- Split back the concatenated attribute to retrieve the original attributes (without the duplicates) or remove them.
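For readers who want to check the logic outside RapidMiner, here is a minimal sketch of the same idea in plain Python. It is not the attached process: it assumes the ranks B=1, C=2, A=3 (lower is preferred), sorts on the rank followed by the DOI (standing in for sorting the concatenated attribute), and then keeps the first row seen for each DOI. The names `SOURCE_RANK` and `keep_preferred` are hypothetical, and the rows are the ones from the question.

```python
# Hypothetical rank table: lower number = more preferred source (B > C > A).
SOURCE_RANK = {"B": 1, "C": 2, "A": 3}


def keep_preferred(rows):
    """Return one row per DOI, keeping the row with the best-ranked source."""
    # Sort by (rank, DOI); this plays the role of sorting the concatenated
    # "Source_2" + "DOI" attribute in increasing order.
    ordered = sorted(rows, key=lambda r: (SOURCE_RANK[r["Source"]], r["DOI"]))

    seen = set()
    kept = []
    for row in ordered:
        # The first row met for a DOI has the best rank, so later ones are dropped.
        if row["DOI"] not in seen:
            seen.add(row["DOI"])
            kept.append(row)
    return kept


rows = [
    {"Row": 18, "DOI": "10.1002/67", "Source": "A"},
    {"Row": 1261, "DOI": "10.1002/67", "Source": "B"},
    {"Row": 1400, "DOI": "10.1002/67", "Source": "C"},
    {"Row": 643, "DOI": "10.102/et.67", "Source": "C"},
    {"Row": 1428, "DOI": "10.102/et.67", "Source": "A"},
]

kept = keep_preferred(rows)
print([r["Row"] for r in kept])  # prints [1261, 643]
```

Note that the duplicate check here is on the DOI alone, after sorting by rank. If the sources are given distinct ranks, the rank-plus-DOI strings differ between rows of the same DOI, so the check has to look at the DOI for the lower-ranked rows to be dropped.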
Take a look at the attached process and tell me if it meets your needs.
Regards,
Lionel