Join / Append / Merge Multiple TF-IDF Example Sets, or Recompute?
I'm trying to compare documents from two datasets with the Data to Similarity operator, but I'm not sure how to join, merge, or append the example sets that contain the TF-IDF results for each word:
I can't join because there isn't a common ID.
I can't append because each dataset has different tokens, although I expect some tokens to be common to both.
Each dataset also has a different number of attributes (more than 20,000 in each example set).
The datasets required different pre-processing to arrive at TF-IDF. If I can figure out how to merge the original datasets into one, can I really recompute TF-IDF on the merged data?
Answers
Have you tried using Cross Distances instead of Data to Similarity?
~martin
Dortmund, Germany
I was thinking of something like this: