How to log the number of positive and negative examples?
When you stop the GUI on an ExampleSet, you can look at the "label" attribute row to see how many positive and negative examples there are in the dataset. But I want to run from the command line and see the dataset class counts in the log.
The DataStatistics operator will write dataset info to the log, but it doesn't include the counts of the label classes. You can add in a DataMacroDefinition operator, but it only offers the total ExampleSet size, not the class counts.
Is there a way to log the class sizes?
Answers
You could first filter the example set according to the label value and then count the examples using DataStatistics or DataMacroDefinition.
For this purpose I recommend using a ValueIterator, which gives you each value of an attribute as a macro, and then filtering the examples accordingly.
Greetings,
Sebastian
Seems to work pretty well. Is there a way to keep the original ExampleSet and drop the new ones instead of merging the new ones?
I pushed it with large datasets, and it doesn't seem to use as much memory as you might expect from creating new ExampleSets. I assume that's because views into the current ExampleSet are being created and rows are not duplicated.
Still, it seems like a lot of overhead for a simple count...
You could use the IOStorer and IORetriever to store it if it is not possible to pass it along the usual way. IOMultiplier and IOConsumer might help as well.
In general I would recommend switching to RM 5.0 RC, because the flow layout gives you a much more intuitive way of handling such problems.
Greetings,
Sebastian
Hmm, maybe I got it wrong, but why not simply aggregate and count? Use the label as the group-by attribute and a count of the label as the aggregation attribute. Just one operator and you are done.
Here is the process for RM 5 RC (based on the Iris sample data set):
Cheers,
Ingo
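For readers who want to see the group-by-and-count idea outside of RapidMiner, here is a minimal sketch in plain Python. It is not the RapidMiner process referred to above; the function name `log_class_counts`, the logger name, and the sample labels are all made up for illustration.

```python
import logging
from collections import Counter


def log_class_counts(labels, logger=None):
    """Group by label value, count each group, and log one line per class.

    `labels` is any iterable of label values (hypothetical input, standing in
    for the label column of an example set).
    """
    logger = logger or logging.getLogger("class_counts")
    counts = Counter(labels)
    # Sort by the string form of the value so the log order is stable.
    for value, n in sorted(counts.items(), key=lambda kv: str(kv[0])):
        logger.info("label=%s count=%d", value, n)
    return dict(counts)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    log_class_counts(["positive", "negative", "positive", "positive"])
```

The counting is a single pass over the label column, which is the same reason the one-operator aggregation is cheaper than filtering the example set once per label value.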
Can anyone help me set up a process that filters out examples for which an attribute has a rarely occurring value? The Aggregate operator (count) calculates the occurrences as described above, but how can I use the result to filter?
Thanks for any advice.
Greetings,
ui3o
Does anybody have an idea on that?
Thanks for the help.
Greetings,
ui3o
Cheers and happy holidays,
Ingo
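As a rough illustration of the count-then-filter idea asked about above, here is a small sketch in plain Python rather than as a RapidMiner process. It should not be read as the process from this thread; the function name `drop_rare_values`, the `min_count` threshold, and the dict-per-row data layout are assumptions made for the example.

```python
from collections import Counter


def drop_rare_values(rows, attribute, min_count):
    """Keep only rows whose value for `attribute` occurs at least `min_count` times.

    `rows` is a list of dicts (a hypothetical stand-in for an example set).
    """
    # Step 1: count how often each value of the attribute occurs.
    counts = Counter(row[attribute] for row in rows)
    # Step 2: use those counts as a lookup when filtering the original rows.
    return [row for row in rows if counts[row[attribute]] >= min_count]


if __name__ == "__main__":
    data = [{"city": "A"}, {"city": "A"}, {"city": "B"}]
    print(drop_rare_values(data, "city", min_count=2))
```

The point of the sketch is that the aggregation result has to be joined back to the original examples (here via a dictionary lookup) before a filter can use it, which is why this takes more than one step.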
Thanks a lot! I didn't know that my question was not just a matter of setting the right parameter in the right operator... great work, and thanks again for your effort.
Best Regards & Viele Grüße
ui3o