How to delete single examples
Hello everybody,
After some time of developing, I am facing a pretty simple problem but don't know how to solve it. Maybe I am too preoccupied with other problems to see a simple solution.
While iterating over all examples, some turn up that should be removed. Is there a way to do this, or will I have to create a new example set and add only the good examples?
Thanks and best regards
Matthias
Answers
Have you tried the "Filter Example Range" operator?
There is also the "Filter Examples" operator (e.g. to remove all examples with id=40).
Best regards,
Wessel
Thanks for the hint. I already took a look at the code of both operators. But "Filter Example Range" works with SplittedExampleSet, which does not make much sense in my case.
"Filter Examples" uses ConditionedExampleSet instead, which works by modifying the mapping somehow. But I can't figure out where the private mapping array is actually applied, so I can't reproduce this for my iteration. I would like to remove an example as soon as it is evaluated as invalid during iteration. I could set a value like "remove" and use the filter operator to delete all examples containing this value after my iteration. But this isn't really nice, and I thought there should be a possibility to delete a single example directly, without having to use some sort of filtering to re-detect it (since it is currently being processed).
Any further ideas?
Best regards
Matthias
You could set the weight of this example to 0.
Or you could do some bookkeeping yourself.
In the code where you use the data set, do something like:
for (Example e : exampleSet) {
    // x is the id you noted as "removed"; skip that example
    if (e.getId() == x) {
        continue;
    }
    // ... process e as usual ...
}
Best regards,
Wessel
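The bookkeeping idea can be sketched in plain Java. This is only an illustration under stated assumptions, not RapidMiner code: the rows are a hypothetical double[][] of {id, value} pairs standing in for an example set, and the NaN check stands in for whatever makes an example invalid.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for an example set: each row is {id, value}.
final class RemovedIdBookkeeping {

    // First pass: record the ids of invalid rows without touching the data.
    static Set<Integer> collectInvalidIds(double[][] rows) {
        Set<Integer> removed = new HashSet<>();
        for (double[] row : rows) {
            if (Double.isNaN(row[1])) {   // hypothetical validity check
                removed.add((int) row[0]);
            }
        }
        return removed;
    }

    // Later passes: skip every row whose id was recorded as removed.
    static List<Double> valuesOfKeptRows(double[][] rows, Set<Integer> removed) {
        List<Double> kept = new ArrayList<>();
        for (double[] row : rows) {
            if (removed.contains((int) row[0])) {
                continue;
            }
            kept.add(row[1]);
        }
        return kept;
    }

    public static void main(String[] args) {
        double[][] rows = { {1, 0.5}, {2, Double.NaN}, {3, 1.5} };
        Set<Integer> removed = collectInvalidIds(rows);
        System.out.println(valuesOfKeptRows(rows, removed)); // prints [0.5, 1.5]
    }
}
```

The data itself is never modified; only the set of removed ids grows, so it is safe to fill it while iterating.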
Let's say you use plain Java and you have a double[][].
There is no really good way to delete an entry there either.
Only if you store your data in a linked list can you delete entries efficiently.
So some extra bookkeeping is probably your best option.
Best regards,
Wessel
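To make the array versus linked list point concrete, here is a small plain-Java sketch. The class and method names are made up for illustration, and the "first column is negative" test is just a placeholder for an invalid entry.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

final class DeleteWhileIterating {

    // Arrays cannot shrink: copy the rows to keep into a new array.
    static double[][] withoutNegativeFirstColumn(double[][] rows) {
        List<double[]> kept = new ArrayList<>();
        for (double[] row : rows) {
            if (row[0] >= 0) {            // placeholder validity check
                kept.add(row);
            }
        }
        return kept.toArray(new double[0][]);
    }

    // A linked list can unlink the current element through its iterator.
    static void removeNegativeFirstColumn(LinkedList<double[]> rows) {
        Iterator<double[]> it = rows.iterator();
        while (it.hasNext()) {
            if (it.next()[0] < 0) {
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        double[][] data = { {1, 10}, {-1, 20}, {2, 30} };
        System.out.println(withoutNegativeFirstColumn(data).length); // prints 2

        LinkedList<double[]> list = new LinkedList<>(Arrays.asList(data));
        removeNegativeFirstColumn(list);
        System.out.println(list.size()); // prints 2
    }
}
```

Calling remove on the list itself inside a for-each loop would throw a ConcurrentModificationException; going through the iterator avoids that.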
Thanks again for your answers.
Of course deleting entries somewhere in the middle of large arrays isn't very efficient. But I thought the example set data structure might provide some functionality to achieve this (and solve it more efficiently internally).
Keeping track of the ids of the examples I want to keep is quite easy. The class ConditionedExampleSet does this by filling an array with the valid ids and somehow uses this as a new mapping. I would like to do this in a similar way, but I have no idea how to apply such a new mapping. I couldn't figure it out, despite looking at the Javadoc and the code as carefully as possible (time constraints grow as my thesis deadline approaches).
Or did you have something else in mind when talking about bookkeeping?
For now I am just setting a special string value that is filtered out by "Filter Examples" afterwards. Not really nice, but it has to do the job for now.
Best regards
Matthias
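For readers wondering what "applying a mapping" can look like, here is a generic plain-Java sketch of the idea. It is not the ConditionedExampleSet code and makes no claim about RapidMiner's internals; MappedView and its methods are hypothetical names. The backing table is never modified, and a view reaches rows only through an index array.

```java
import java.util.Arrays;
import java.util.function.Predicate;

// Hypothetical view over a shared table; rows are reached through a mapping.
final class MappedView {
    private final double[][] table;  // backing data, never modified
    private final int[] mapping;     // view index -> row index in the table

    private MappedView(double[][] table, int[] mapping) {
        this.table = table;
        this.mapping = mapping;
    }

    // View that exposes every row of the table in its original order.
    static MappedView of(double[][] table) {
        int[] identity = new int[table.length];
        for (int i = 0; i < identity.length; i++) {
            identity[i] = i;
        }
        return new MappedView(table, identity);
    }

    int size() {
        return mapping.length;
    }

    double[] get(int viewIndex) {
        return table[mapping[viewIndex]];
    }

    // New view containing only the rows accepted by the predicate.
    MappedView filter(Predicate<double[]> keep) {
        int[] buffer = new int[mapping.length];
        int n = 0;
        for (int rowIndex : mapping) {
            if (keep.test(table[rowIndex])) {
                buffer[n++] = rowIndex;
            }
        }
        return new MappedView(table, Arrays.copyOf(buffer, n));
    }

    public static void main(String[] args) {
        double[][] table = { {1}, {-2}, {3} };
        MappedView valid = MappedView.of(table).filter(row -> row[0] > 0);
        System.out.println(valid.size());    // prints 2
        System.out.println(valid.get(1)[0]); // prints 3.0
    }
}
```

"Deleting" a row then means building a new mapping without its index, which is cheap compared to copying the data.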
There is currently no way in RapidMiner to really delete examples from an ExampleSet while you iterate over the set.
ExampleSets merely provide a view on the data held by the ExampleTable.
Your options are to create something suited to your needs yourself, like marcin.blachnik posted above; to create a new ExampleTable from your edited DataRows and then create a new ExampleSet on that table; or to use something like filtering (as you did).
Regards,
Marco