Concatenate Examples
Dear all,
I think there should be an easy solution for this, but I just can't find it:
I have an example set with a few examples, let's say:
atts48
atts67
atts90
I would like to generate a "value" (data or macro) that contains all example values, which would be
atts48, atts67, atts90
for the example mentioned above.
Thanks a lot for your help!
Best,
Markus
Answers
-Gagi
your idea sounds interesting! If the same ID is given multiple times, does it concatenate the values? I never thought about this. But another question is how to handle multiple values: a Join operator always joins two example sets, so how would I concatenate 3, 4 or more examples?
Best,
Markus
I'm not really sure I fully understood the requirements. Gagi's suggestion seems to aim at multiple ExampleSets which should be joined somehow. But you want to concatenate the values of all examples (rows) in a single ExampleSet (table)? I'm not sure if there is an easier way using some special operators, but this should still be a simple solution for the task:
Regards,
Matthias
Set the IDs to the same value for the examples you want to join into the same row
Filter on each unique ID and loop through the examples
Apply Join once, save the result, and then join again in a loop
Finally, save the joined data set.
The thing that seems strange to me is that you have a data table with different examples but somehow want to combine them into one long example?
To make it a bit more obvious why I want to do this:
I transpose my dataset and select the first attribute only (the attribute names). Now I want to concatenate them, to log them and eventually use them as a regular expression.
I already thought about your looping workaround. I'll try it.
Best,
Markus
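As a rough illustration of the goal described here, outside RapidMiner: the sketch below joins a list of attribute names into one loggable string and into a regular-expression alternation. The function names `join_names` and `names_to_regex` are hypothetical, and the `a|b|c` form is an assumption about what the regular expression should look like.

```python
import re

def join_names(names, sep=", "):
    """Concatenate all names into a single string, e.g. for logging."""
    return sep.join(names)

def names_to_regex(names):
    """Build an alternation that matches any of the names literally."""
    # re.escape protects characters such as '.' or '(' in attribute names
    return "|".join(re.escape(n) for n in names)

names = ["atts48", "atts67", "atts90"]
print(join_names(names))      # atts48, atts67, atts90
print(names_to_regex(names))  # atts48|atts67|atts90
```

Escaping matters once attribute names contain regex metacharacters; for plain alphanumeric names like the ones above the result is unchanged.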
I have almost the same problem but I can't find the solution. I'm very new to RapidMiner and I'm not comfortable with macros. I'm trying to concatenate attribute 2 for all examples that share the same value of attribute 1. The result should be assigned to attribute 3. Here is a small example:
att1 att2
val1 c
val2 a
val3 a
val2 b
val3 c
val1 b
val3 a
val2 a
so my result should be
att1 att2 att3
val1 c cb
val2 a aba
val3 a aca
val2 b aba
val3 c aca
val1 b cb
val3 a aca
val2 a aba
I tried to do something on colo's code, but I only get :
att1 att2 att3
val1 c c
val2 a ca
val3 a caa
val2 b caab
val3 c caabc
val1 b caabcb
val3 a caabcba
val2 a caabcbaa
Do you know if there is a simple way to do this kind of thing?
My code example aimed at concatenating all example values into one example.
This here should do the job for you. It looks a bit dirty, but it was the first solution that came to my mind.
Regards,
Matthias
It works very well on this example, but I think the performance isn't very good on large datasets. Indeed, for every example it filters and generates macros, even if that was already done before.
So I tried sorting my dataset by attribute 1, then added a macro "test" (initialized to "") to test whether attribute 1 is the same as in the previous iteration. If yes, I just add the att3_value to attribute 3 on the current example. If no, I change the value of "test" and use your process.
But I'm getting an error when I compare my macro test to the current value of attribute 1, and I don't understand why:
Message: "" == value0: Unrecognized symbol "value0"
Maybe my code will be easier to understand:
Thanks for your help.
To get your process running you should replace the "Set Macro" operator used for generating the macro "test" (right before the Sort operator) with a "Generate Macro" operator. Unfortunately you have to be careful when using quotes to define strings. Basically, "Generate ..." operators use an expression parser, which allows the usual declaration of strings with quotes. Other operators such as "Set Macro" don't allow this - here you have to enter the desired value directly. Setting the value to "" doesn't result in an empty string but generates a string value containing two quotes.
Then you simply need to add quotes to the Branch operator's condition parameter to compare two string values correctly (here an expression parser is used again - as the condition type "expression" says): "%{test}" == "%{att1_value}". That's it - but it won't generate the results you want. You use att3_value in the Then-part of the Branch operator, where it isn't defined yet.
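The quoting issue can be pictured with a small Python sketch. It is only a simplified model that treats macro expansion as plain text substitution before the expression is parsed; `expand_macros` is a hypothetical helper, not part of RapidMiner.

```python
import re

def expand_macros(template, macros):
    """Replace every %{name} with the macro's raw text; no quotes are added."""
    return re.sub(r"%\{(\w+)\}", lambda m: macros[m.group(1)], template)

# Assumption: "Set Macro" with the value "" stores two quote characters.
macros = {"test": '""', "att1_value": "value0"}
print(expand_macros('%{test} == %{att1_value}', macros))
# "" == value0      (value0 is unquoted, so a parser would read it as a symbol)

# Assumption: "Generate Macro" evaluating "" stores a truly empty string.
macros = {"test": "", "att1_value": "value0"}
print(expand_macros('"%{test}" == "%{att1_value}"', macros))
# "" == "value0"    (both sides are now string literals)
```

The first expanded condition is the same text that appears in the error message quoted earlier in the thread.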
In my last example I simply built a process for solving the task you described but didn't really notice that att3 only depends on the value of att1 (att2 is only needed for building the value). So you could build all valid values for att3 in a first step and then simply filter the examples by att1 in a loop and set att3 for all of them. This shouldn't be too hard. Good luck!
Regards,
Matthias
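The two-step idea (first build the att3 value for each distinct att1, then assign it to every matching example) can be sketched in plain Python. This is an illustration of the logic only, not a RapidMiner process; `add_group_concat` is a hypothetical name and the rows are the sample data from the question above.

```python
from collections import defaultdict

def add_group_concat(rows):
    """rows: list of (att1, att2). Returns a list of (att1, att2, att3)."""
    # Step 1: collect the att2 values of each att1 group, in row order
    parts = defaultdict(list)
    for att1, att2 in rows:
        parts[att1].append(att2)
    combined = {key: "".join(values) for key, values in parts.items()}
    # Step 2: look up the precomputed value for every row
    return [(att1, att2, combined[att1]) for att1, att2 in rows]

rows = [("val1", "c"), ("val2", "a"), ("val3", "a"), ("val2", "b"),
        ("val3", "c"), ("val1", "b"), ("val3", "a"), ("val2", "a")]
for row in add_group_concat(rows):
    print(*row)
```

Each group's string is built once, so the work grows linearly with the number of examples instead of re-filtering the whole set for every row.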
Thank you very much for your answer. It helped me a lot to understand how macros work.
I found a way to optimize your first code a little. I'm now just using the "attribute_value_filter" condition in the Branch operator, with this condition: att1 = %{att1_value} [%{example}]
So now I only calculate the new value of att3 when I encounter a new value of att1. This gives a real time saving on a big dataset (6 min down to 3 min).
Here is my code:
If anybody has a better idea, I'll take it.
It would be cool if you could contribute your processes to myExperiment! That's an easy way to let other users reuse them.
Greetings,
Sebastian
This script is uploaded.
You have to make sure that att3 exists.
For this you can use the Generate Empty Attribute operator.
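// Read the ExampleSet from the operator's first input port
ExampleSet es = operator.getInput(ExampleSet.class);
for (Example e : es) {
    // att3 must already exist; write att1 and att2 joined by a space
    e["att3"] = e["att1"] + " " + e["att2"];
}
return es;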
This is ugly (it would be nice if the Aggregate operator had the option to pick the concatenation character), but this takes a list like:
my_string
________
giants
patriots
lions
eagles
And returns: 'giants','patriots','lions','eagles'
Or as a process that takes a macro called "attribute" as the feature and returns an attribute called "list"
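For checking the expected output outside RapidMiner, the string-building step alone can be sketched in Python. The function name `quoted_list` and its `quote` and `sep` parameters are hypothetical; the separator parameter stands in for the configurable concatenation character wished for above.

```python
def quoted_list(values, quote="'", sep=","):
    """Wrap each value in quotes and join them with the separator."""
    return sep.join(f"{quote}{v}{quote}" for v in values)

teams = ["giants", "patriots", "lions", "eagles"]
print(quoted_list(teams))  # 'giants','patriots','lions','eagles'
```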