Metadata view - "Statistics" and "Range" attributes
Hello,
When I process documents from files with RapidMiner
and get the TF-IDF value of each term in the documents,
I would like to sort the terms by their TF-IDF weight (in descending order).
To do this I switch to the "metadata view" in the "Example Set" tab, where I see two ways to sort them:
by the "Statistics" attribute (normally the 4th column) or by the "Range" attribute (normally the 5th column).
An example value for a term in the "Statistics" column looks like this: avg = 0.049 +/- 0.084
and one in the "Range" column looks like this: [0.000 ; 0.290]
When I hover over the titles of these columns I get a really short and incomprehensible message about what they are supposed to show.
My question is pretty simple: can anybody explain, in simple terms, how the numbers in these two columns are calculated
and what they express?
I have no background in statistics, and maybe I'm looking in the wrong place for the kind of sorting I need. If someone can make things clearer, it would be really helpful.
Thank you for your time.
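
To make the question concrete, here is a minimal sketch of what I would guess these two columns contain. It assumes (not confirmed by the tooltip) that "Statistics" is the average of a term's TF-IDF values over all documents plus or minus the standard deviation, and that "Range" is [minimum ; maximum]. The data and the function names are made up for illustration, and whether RapidMiner uses the sample or the population standard deviation is also an assumption.

```python
from statistics import mean, stdev

def summarize(values):
    """Summarize one term's TF-IDF values across all documents."""
    # Assumption: sample standard deviation; the tool may use the population form.
    spread = stdev(values) if len(values) > 1 else 0.0
    return {"avg": mean(values), "std": spread,
            "min": min(values), "max": max(values)}

def format_statistics(summary):
    """Render a summary in the style of the "Statistics" column."""
    return f"avg = {summary['avg']:.3f} +/- {summary['std']:.3f}"

def format_range(summary):
    """Render a summary in the style of the "Range" column."""
    return f"[{summary['min']:.3f} ; {summary['max']:.3f}]"

def rank_terms(tfidf_by_term, key="avg"):
    """Return (term, summary) pairs sorted in descending order of `key`."""
    summaries = {term: summarize(vals) for term, vals in tfidf_by_term.items()}
    return sorted(summaries.items(), key=lambda item: item[1][key], reverse=True)

if __name__ == "__main__":
    # Hypothetical TF-IDF values: one list per term, one entry per document.
    tfidf_by_term = {
        "mining": [0.00, 0.29, 0.00, 0.05],
        "data": [0.10, 0.12, 0.08, 0.11],
        "the": [0.01, 0.01, 0.02, 0.01],
    }
    for term, summary in rank_terms(tfidf_by_term, key="avg"):
        print(term, format_statistics(summary), format_range(summary))
```

If that guess is right, sorting by the average ranks terms that are moderately important in many documents first, while sorting by the maximum (`key="max"`) ranks terms that are very important in at least one document first.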