Merge duplicate IDs
Hey there,
I have not found anything on the internet that solves my problem, so I'm trying it here.
I have a given dataset containing an ID attribute.
The problem is that some examples share the same ID because they represent the same real-world entity.
However, not all of these examples contain the same values. Some values are simply missing, and some are completely different.
It looks something like this:
ID | attr1 | attr2 | attr3 | attr4 |
1 | XX | ? | ? | A |
1 | ? | YY | ? | A |
1 | ? | ? | ZZ | C |
2 | XX | ? | ? | B |
2 | ? | YY | ? | B |
2 | ? | ? | ZZ | D |
I now want to merge the examples with the same ID into one example, so the table looks like this:
ID | attr1 | attr2 | attr3 | attr4 | extr.attr4 |
1 | XX | YY | ZZ | A | C |
2 | XX | YY | ZZ | B | D |
An extra attribute should be generated for every new value that occurs in the same attribute among the examples being merged.
The missing values should just be filled in with the given data.
What I need to know now is the right approach to solve this problem.
Which operators are suited to solve this, and in which order?
I am really thankful for any help, since none of my attempts have brought me any closer to a solution.
What I thought about is a loop over each example that generates something like this, but that process would be huge, and I have to check about 43k examples. Maybe there is an easy way to solve this that I don't know about.
Answers
The step without the extra attribute is easy: that's Aggregate.
Do you only have nominal values? Then maybe Aggregate with concat and a Split afterwards does the job?
~Martin
Dortmund, Germany
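For readers who want to see the aggregate-then-split idea from the answer above spelled out, here is a minimal sketch in plain Python. It is not a RapidMiner process, only an illustration of the logic on the example table from the question. The function name `merge_by_id`, the `MISSING` placeholder, and the `extr2.`, `extr3.` naming for a third or fourth distinct value are assumptions made for this sketch; the question only shows a single `extr.attr4` column.

```python
MISSING = "?"  # assumed placeholder for missing values, as in the example table


def merge_by_id(rows, id_key="ID"):
    """Merge rows that share an ID (hypothetical helper for illustration)."""
    # Step 1 (comparable to aggregating with concatenation): per ID and
    # attribute, collect the distinct non-missing values in order of
    # first appearance.
    collected = {}
    for row in rows:
        per_id = collected.setdefault(row[id_key], {})
        for attr, value in row.items():
            if attr == id_key:
                continue
            values = per_id.setdefault(attr, [])
            if value != MISSING and value not in values:
                values.append(value)

    # Step 2 (comparable to splitting afterwards): the first value fills the
    # original attribute, every further distinct value gets its own column.
    merged = []
    for id_value, per_id in collected.items():
        out = {id_key: id_value}
        for attr, values in per_id.items():
            out[attr] = values[0] if values else MISSING
            for n, extra in enumerate(values[1:], start=1):
                # Column naming beyond the first extra value is an assumption.
                name = f"extr.{attr}" if n == 1 else f"extr{n}.{attr}"
                out[name] = extra
        merged.append(out)
    return merged


rows = [
    {"ID": 1, "attr1": "XX", "attr2": "?", "attr3": "?", "attr4": "A"},
    {"ID": 1, "attr1": "?", "attr2": "YY", "attr3": "?", "attr4": "A"},
    {"ID": 1, "attr1": "?", "attr2": "?", "attr3": "ZZ", "attr4": "C"},
    {"ID": 2, "attr1": "XX", "attr2": "?", "attr3": "?", "attr4": "B"},
    {"ID": 2, "attr1": "?", "attr2": "YY", "attr3": "?", "attr4": "B"},
    {"ID": 2, "attr1": "?", "attr2": "?", "attr3": "ZZ", "attr4": "D"},
]

merged = merge_by_id(rows)
for record in merged:
    print(record)
```

The sketch makes a single pass over the rows, so about 43k examples should not be a concern for this kind of logic. Treating every value as a string mirrors the "only nominal values" condition raised in the answer.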