Finding the most similar document(s) in a collection to a test document
While I was using version 4 of RapidMiner, I built a chain to perform this function. It is discussed here:
http://rapid-i.com/rapidforum/index.php/topic,1201.msg4577.html#msg4577 and here: http://rapid-i.com/rapidforum/index.php/topic,680.msg2587.html#msg2587.
With the advent of RapidMiner 5, I was wondering whether there are new or better operators for this function.
The basic requirement is to compare a single (test) document to a set of documents and find the document in the set that best matches the test document (cosine similarity).
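To make the requirement concrete, here is a minimal sketch in plain Python rather than a RapidMiner process. It is only an illustration under assumptions: the regex tokenizer, the use of raw term counts as the document vectors, and the names `term_counts`, `cosine_similarity`, and `most_similar` are all hypothetical and not taken from any RapidMiner operator.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercase the text and count word tokens (a deliberately simple tokenizer)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(test_doc, collection):
    """Return (index, score) of the collection document closest to test_doc."""
    test_vec = term_counts(test_doc)
    scores = [cosine_similarity(test_vec, term_counts(doc)) for doc in collection]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Illustrative data only.
collection = [
    "the cat sat on the mat",
    "stock markets fell sharply today",
    "a cat and a dog played on the mat",
]
index, score = most_similar("my cat likes the mat", collection)
print(index, round(score, 3))
```

In practice the raw counts would usually be replaced by a weighted representation such as TF-IDF, but the comparison step stays the same.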
Any recommendations?
Thank you.
Charles
Answers
I have not dealt with a similar problem, but my idea is to use an entropy-based representation of the documents (available in the text mining extension) and then, for example, use k-NN to check the similarity of the documents.
Regards
radone
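The suggestion above could look roughly like the following Python sketch. It assumes that "entropy-based representation" means the common log-entropy weighting (local weight log(1 + tf), global weight 1 + Σ p·log p / log n); whether this matches what the RapidMiner text mining extension computes is an assumption, and all function names here are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Very simple tokenizer: lowercase alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


def entropy_weights(doc_counts):
    """Global weight per term: 1 + sum_j p_ij * log(p_ij) / log(n)."""
    n = len(doc_counts)
    totals = Counter()
    for counts in doc_counts:
        totals.update(counts)
    weights = {}
    for term, total in totals.items():
        if n < 2:
            weights[term] = 1.0  # entropy is undefined for a single document
            continue
        h = sum((c[term] / total) * math.log(c[term] / total)
                for c in doc_counts if term in c)
        # Clamp tiny negative values caused by floating-point rounding.
        weights[term] = max(0.0, 1.0 + h / math.log(n))
    return weights


def weighted_vector(counts, weights):
    """Log-entropy vector; terms unseen in the collection are dropped."""
    return {t: math.log(1 + c) * weights[t]
            for t, c in counts.items() if t in weights}


def cosine(a, b):
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def k_nearest(test_doc, collection, k=1):
    """Return the k most similar documents as (index, cosine score) pairs."""
    doc_counts = [Counter(tokenize(d)) for d in collection]
    weights = entropy_weights(doc_counts)
    vectors = [weighted_vector(c, weights) for c in doc_counts]
    query = weighted_vector(Counter(tokenize(test_doc)), weights)
    scores = [cosine(query, v) for v in vectors]
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
    return [(i, scores[i]) for i in ranked[:k]]


# Illustrative data only.
docs = [
    "the cat sat on the mat",
    "stock markets fell sharply today",
    "the dog chased the ball",
]
neighbours = k_nearest("a cat on a mat", docs, k=2)
print(neighbours)
```

With k = 1 this reduces to picking the single best match; terms that are spread evenly over the whole collection get a weight near zero and so contribute little to the score.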