[SOLVED] Transform Document-Term matrix to flat table?
RWingerter
A newbie question.
I have a simple process that uses "Data to Documents", "Process Documents", and "Tokenize" to turn a list of strings into a word list. The second result is my ExampleSet turned into a document-term matrix.
My question is: How can I transform the Document-Term matrix (Document_ID x Term) to a flat table with three attributes (Document_ID, Term, occurrences)?
Regards,
Roland
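For readers outside RapidMiner, the requested transformation is a plain wide-to-long "de-pivot": every non-zero cell of the Document_ID x Term matrix becomes one (Document_ID, Term, occurrences) row. The sketch below shows the idea in Python. It is not RapidMiner code, and the function name `depivot` and the sample terms are hypothetical.

```python
def depivot(rows, id_col="Document_ID", keep_zeros=False):
    """Turn a wide Document_ID x Term matrix (one dict per document)
    into a flat list of (Document_ID, Term, occurrences) tuples."""
    flat = []
    for row in rows:
        doc_id = row[id_col]
        for term, count in row.items():
            if term == id_col:
                continue  # the ID column is the index, not a term
            if count == 0 and not keep_zeros:
                continue  # skip terms that do not occur in this document
            flat.append((doc_id, term, count))
    return flat


# Hypothetical document-term matrix: one column per term.
matrix = [
    {"Document_ID": 1, "apple": 2, "banana": 0, "cherry": 1},
    {"Document_ID": 2, "apple": 0, "banana": 3, "cherry": 1},
]

for record in depivot(matrix):
    print(record)
```

Dropping zero counts keeps the flat table close to the sparse matrix in size. Pass `keep_zeros=True` if every document/term pair should appear.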
Answers
You are more likely to get an answer if you post your process together with a small sample of your data. At the moment I am not sure what you have and what you want.
Best
Marcin
Thanks for your reply. Here are my example data and my simple process.
The input is a list of user queries (query_id, query, frequency), which is processed with "Process Documents from Data". The result is a word list and a document-term matrix. In addition, I would like to get a term-document table with Term, Query_ID, and TF*IDF, for example:
Term     Query_ID   TF*IDF
----------------------------
Term1    1          0.34
Term1    2          0.23
Term2    3          1.00
I tried various things without success. It may not be difficult, but I couldn't get it to work.
Sample data:
Code:
Any and all help is welcome.
Thank you
Roland
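The table above combines the de-pivot step with a TF*IDF weight per (Term, Query_ID) pair. Here is a minimal Python sketch that builds such a long table directly from tokenized queries. It assumes tf = term count / query length and idf = ln(N / document frequency). RapidMiner's own TF-IDF weighting (including any vector normalization) may differ, so the numbers will not necessarily match its output. The queries and the function name `tfidf_long` are hypothetical.

```python
import math
from collections import Counter


def tfidf_long(docs):
    """docs: {query_id: [token, ...]} -> sorted list of (term, query_id, tf*idf)."""
    n_docs = len(docs)
    # Document frequency: in how many queries does each term occur?
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    rows = []
    for qid, tokens in docs.items():
        if not tokens:
            continue  # an empty query has no terms to weight
        for term, count in Counter(tokens).items():
            tf = count / len(tokens)           # relative term frequency
            idf = math.log(n_docs / df[term])  # inverse document frequency
            rows.append((term, qid, tf * idf))
    rows.sort(key=lambda r: (r[0], r[1]))  # order by Term, then Query_ID
    return rows


# Hypothetical, already tokenized user queries.
queries = {
    1: ["cheap", "flights", "paris"],
    2: ["flights", "london"],
    3: ["paris", "hotels"],
}

print(f"{'Term':<10}{'Query_ID':>9}{'TF*IDF':>9}")
for term, qid, weight in tfidf_long(queries):
    print(f"{term:<10}{qid:>9}{weight:>9.2f}")
```

A term that occurs in every query gets idf = ln(1) = 0, so it still appears in the table but with weight 0.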
Thank you very much, it works like a charm. I had looked at the "De-Pivot" operator, but I had no idea how to address the attribute names. I won't pretend I understand your code yet (that will certainly take a while), but for now I am just happy to have a solution. Thanks again.
Kind regards
Roland