Combine documents + weighting
simon_knoll
Hello dear RM Team,
It would be a cool feature if the Combine Documents operator had the capability to weight incoming documents (i.e. the terms of one document are more important than those of others).
all the best,
simon
Answers
We have thought about this and think it is a good idea in general. However, having something like "label_weight_0.7" in the annotations looks a bit odd. We should at least have a weight metadata field or something similar that does not require this parsing step. How are you constructing this string in your case?
Best,
Simon
Doing the weighting within the label was the easiest way for me to integrate it into my program.
Which string are you talking about?
If you mean the string for the label, it is built like this:
First, a bit of context:
I want to cluster web services, and for that I have documents related to each service. As not every document has the same importance, I have to weight them.
Now, how I build the label name:
The prefix is always the service ID, followed by "_weight_" and then a weight value such as 0.5.
e.g.: SMSService01_weight_0.5
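To make the parsing step discussed above concrete, here is a minimal Python sketch of building such a label and splitting it back into service ID and weight. The function names are hypothetical and the thread does not show the original code; splitting on the last "_weight_" occurrence is an assumption chosen so that service IDs containing that text still parse.

```python
# Hypothetical helpers for the "<serviceId>_weight_<value>" label scheme;
# these are not RapidMiner operators or Text Extension APIs.
SEPARATOR = "_weight_"


def build_label(service_id: str, weight: float) -> str:
    """Concatenate the service ID, the separator and the weight value."""
    return f"{service_id}{SEPARATOR}{weight}"


def parse_label(label: str) -> tuple[str, float]:
    """Split a label back into (service_id, weight).

    rpartition splits on the LAST occurrence of the separator, so a
    service ID that itself contains "_weight_" is still handled.
    """
    service_id, sep, weight_text = label.rpartition(SEPARATOR)
    if not sep:
        raise ValueError(f"label {label!r} has no {SEPARATOR!r} part")
    return service_id, float(weight_text)


print(build_label("SMSService01", 0.5))        # SMSService01_weight_0.5
print(parse_label("SMSService01_weight_0.5"))  # ('SMSService01', 0.5)
```

Every consumer of the label has to repeat this string parsing, which is the drawback a dedicated numeric weight field would remove.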
all the best,
simon
Thanks for clarifying this. Actually, I was wondering which operator you are using to construct these strings. Is it an RM operator or your own implementation?
Do you agree that this string concatenation is not the most elegant solution if we want to incorporate it into the release?
Best,
Simon
The string is not constructed by a RapidMiner operator but by my own code, where I'm setting the label names of Create Document operators.
But I agree with you that a release should have a more elegant and general approach, maybe a metadata field that can be set for every document, as you mentioned in your previous post.
This was just quick 'n' dirty code that fit into my own implementation. Nevertheless, if this makes it into a release, I would also appreciate being able to handle it via metadata, for instance.
all the best,
Simon
If you change that so that there is an additional metadata field "weight" which always contains a number, I would copy it into the next release. What do you think?
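As a rough illustration of what weighting could mean once each document carries a numeric "weight" field, the Python sketch below scales each document's term counts by that weight before summing them into one combined vector. The `Document` class, the "weight" key, the default weight of 1.0 and the weighted-sum strategy are all assumptions for illustration; this is not the Text Extension's API or how Combine Documents is actually implemented.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for a text document with metadata fields."""
    tokens: list[str]
    metadata: dict[str, float] = field(default_factory=dict)


def combine_weighted(docs: list[Document], default_weight: float = 1.0) -> dict[str, float]:
    """Merge documents into one weighted term-frequency vector.

    Each term count is multiplied by the document's numeric "weight"
    metadata (or default_weight if the field is missing), so terms from
    more important documents contribute more. No string parsing needed.
    """
    combined: defaultdict[str, float] = defaultdict(float)
    for doc in docs:
        weight = float(doc.metadata.get("weight", default_weight))
        for term, count in Counter(doc.tokens).items():
            combined[term] += weight * count
    return dict(combined)


docs = [
    Document(["send", "sms", "message"], {"weight": 1.0}),
    Document(["sms", "gateway", "pricing"], {"weight": 0.5}),
]
print(combine_weighted(docs))
# {'send': 1.0, 'sms': 1.5, 'message': 1.0, 'gateway': 0.5, 'pricing': 0.5}
```

Keeping the weight in its own numeric field means a missing or malformed value can be handled in one place (here, the default), instead of being encoded in and decoded from the label.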
Best,
Simon
Sorry for the late answer. I would appreciate it if this feature came with the next release.
When will the next release happen?
all the best
simon
We will include weighting in the next major release of the Text Extension. There are many ongoing changes besides this, so it might take some time.
Greetings,
Sebastian