[SOLVED] Set operations with mismatched ID attributes
Here's my situation:
I am trying to create a time series based on data collected from about 90 different servers. I am attempting to track changes in inventory levels, where inventory is stored as a list of words (e.g. Apple, Apple, Pear, Apple, Kiwi, Mango). The basic idea is as follows:
Process:
1. Grab information using a Web API key -> save as XML.
2. Use the "Process Documents," "Cut Document" and "Extract Information" operators to create a word list (inventory list and count)
3. Use "Word List to Data" operator to create an example set.
Result
1. An example set with "word" as the ID attribute and "total" as a regular attribute.
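In plain-Python terms, the word list from steps 2 and 3 boils down to a word-frequency count. The sketch below is only an analogue of what the RapidMiner operators produce, not RapidMiner code; it assumes the inventory has already been extracted from the XML as a list of strings, and the helper name `word_list_to_rows` is hypothetical.

```python
from collections import Counter


def word_list_to_rows(words):
    """Count each word and return (word, total) rows sorted by word.

    Hypothetical helper: mimics a word list turned into an example set
    with "word" as the ID and "total" as a regular attribute.
    """
    counts = Counter(words)
    return sorted(counts.items())


inventory = ["Apple", "Apple", "Pear", "Apple", "Kiwi", "Mango"]
print(word_list_to_rows(inventory))
# [('Apple', 3), ('Kiwi', 1), ('Mango', 1), ('Pear', 1)]
```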
Here is an example of the kind of data that I expect to get after I use "Process Documents":
Parse #1 (time 1):
Apples, 3
Oranges, 2
Bananas, 2
Parse #2 (time 2):
Apples, 1
Oranges, 5
Bananas, 3
Kiwis, 7
(I have made it to this point successfully)
In order to track changes in inventory, I need to merge the results. As you can see, the problem with merging these two example sets is that the values of the ID ("word") attribute do not match: Parse #2 contains 7 Kiwis, an entry that Parse #1 did not have. Because I don't know in advance how many different items there are, I suspect I will have to keep adding to my base ID attribute whenever I merge newly parsed data into one spreadsheet or database.
One way to do this is to isolate the new ID values and run some set operations. I envision it going something like this:
1. Load the base ID attribute and the new ID attribute.
2. Use a "Set Minus" operator to find the new ID attribute entries.
3. Use a "Join" operator to merge the base ID attribute and the new ID attribute entries.
3a. Replace the base ID attribute.
4. Use a "Join" operator to merge the base ID attribute and the new ID attribute entries.
4a. Replace the new ID attribute.
5. Use the "Replace Missing Values" operator to make sure the new and old data align (e.g. Parse #1 had zero Kiwis).
6. Use an "Append" operator to combine the data, which should now have identical ID attributes.
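The plan above can be sketched with plain Python dictionaries standing in for the two example sets (word -> total). This only illustrates the set logic and is not RapidMiner code; the helper name `align_by_set_minus` is hypothetical, and steps 3 and 4 are collapsed into one function because they do the same thing for each side.

```python
def align_by_set_minus(base, new):
    """Give two word->count mappings the same keys, filling gaps with 0."""
    # Step 2 ("Set Minus"): keys found in one parse but not the other.
    only_in_new = set(new) - set(base)
    only_in_base = set(base) - set(new)
    # Steps 3-5: add each side's missing keys with a count of 0
    # (the role "Replace Missing Values" plays in the plan).
    base_aligned = {**base, **{k: 0 for k in only_in_new}}
    new_aligned = {**new, **{k: 0 for k in only_in_base}}
    return base_aligned, new_aligned


parse1 = {"Apples": 3, "Oranges": 2, "Bananas": 2}
parse2 = {"Apples": 1, "Oranges": 5, "Bananas": 3, "Kiwis": 7}
t1, t2 = align_by_set_minus(parse1, parse2)
print(t1)  # {'Apples': 3, 'Oranges': 2, 'Bananas': 2, 'Kiwis': 0}

# Step 6 ("Append"): stack both aligned sets as (time, word, total) rows.
appended = [("T1", w, t1[w]) for w in sorted(t1)] + [
    ("T2", w, t2[w]) for w in sorted(t2)
]
```

Written out like this, the set-minus and fill steps are really just computing the union of both key sets, which is why a single outer join can replace them.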
I think this should work, but it seems convoluted. Do you understand what I am trying to do? Is there an easier way to do this?
Your help is much appreciated!
Answers
If I understand you correctly, you want to combine both parses into one example set like this:
ID, T1, T2
Apples, 3, 1
Oranges, 2, 5
Bananas, 2, 3
Kiwis, 0, 7
That can be done with the Join operator (mode: outer join), followed by Replace Missing Values. You will probably have to rename the attributes of one of the example sets beforehand.
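For comparison, the outer-join idea can be sketched in plain Python. The result's keys are the union of both key sets, so Kiwis does not need to be added to the first set beforehand, and a count missing on one side becomes 0 (the equivalent of Replace Missing Values). Naming the two count columns T1 and T2 corresponds to the renaming step. The helper `outer_join_counts` is hypothetical, and the rows are sorted by word only for readability.

```python
def outer_join_counts(left, right, fill=0):
    """Full outer join of two word->count mappings on the word key.

    Returns rows (word, left_count, right_count); a word missing on one
    side gets `fill`, like Replace Missing Values after the join.
    """
    keys = sorted(set(left) | set(right))
    return [(k, left.get(k, fill), right.get(k, fill)) for k in keys]


parse1 = {"Apples": 3, "Oranges": 2, "Bananas": 2}
parse2 = {"Apples": 1, "Oranges": 5, "Bananas": 3, "Kiwis": 7}
print("ID, T1, T2")
for word, t1, t2 in outer_join_counts(parse1, parse2):
    print(f"{word}, {t1}, {t2}")
```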
Please let me know if I understood you correctly and if this works. If not, please define "merge the example sets" and give an example of the desired output.
Best regards,
Marius
In my particular case, my second example set added Kiwi as part of the key attribute. Since the first example set doesn't include Kiwi, don't I need to merge the key attributes of the example sets before I can use the Join operator?
But seriously, just give things a try: in this case, the Join operator does the job quite well, as you can also see in the attached process. Please note that I deactivated the option remove_duplicate_attributes.
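The same idea extends to the many parses of a real time series. Repeating the outer join for every new parse gives the same result as taking the union of all keys once and looking each key up in every snapshot, which is what this plain-Python sketch does. `join_snapshots`, the T1, T2, ... column names and the third parse are hypothetical. In RapidMiner the equivalent would presumably be one outer Join per additional parse.

```python
def join_snapshots(snapshots, fill=0):
    """Outer-join several word->count snapshots into one wide table."""
    # Union of all keys across snapshots, i.e. the final ID column.
    keys = sorted(set().union(*snapshots))
    header = ["ID"] + [f"T{i}" for i in range(1, len(snapshots) + 1)]
    # One row per word, one column per snapshot; absent words get `fill`.
    rows = [[k] + [s.get(k, fill) for s in snapshots] for k in keys]
    return header, rows


parse1 = {"Apples": 3, "Oranges": 2, "Bananas": 2}
parse2 = {"Apples": 1, "Oranges": 5, "Bananas": 3, "Kiwis": 7}
parse3 = {"Apples": 4, "Pears": 2}  # hypothetical third parse
header, rows = join_snapshots([parse1, parse2, parse3])
print(", ".join(header))
for row in rows:
    print(", ".join(str(v) for v in row))
```

New items such as Kiwis or Pears simply show up with 0 in the earlier columns, so the base ID attribute never has to be extended by hand.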
Best regards,
Marius