Text Processing - How to track exactly which documents contain a word?
Tan_Koon_Chin
Hi all,
I have run the Text Mining operators and obtained an ExampleSet (from WordList to Data) and a WordList (from Process Documents from Files). The number of occurrences of each word is shown in the result as well. But how can I determine which documents each word in the result belongs to?
Example: the word "apple" appears 100 times in 80 documents. How can I find out exactly which documents contain the word "apple"? What am I missing here? Is there a solution for this?
Thanks in advance.
Regards.
Answers
Take a look at the following process. The example set output contains labels corresponding to each document, and by using term occurrences when processing the documents, you can see the word counts for each document.
Regards,
Andrew
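To illustrate the idea in the answer above, here is a minimal Python sketch (not RapidMiner code) of a document-term table built with term occurrences: one row per document, one count per word. Finding the documents that contain a word is then just a matter of selecting the rows where that word's count is greater than zero. The corpus, file names, and the `tokenize` and `documents_containing` helpers are hypothetical stand-ins for the real files and the Process Documents tokenization.

```python
import re
from collections import Counter

# Hypothetical sample corpus: document name -> text.
# In a real process the name would come from each document's metadata.
documents = {
    "doc1.txt": "Apple pie and apple juice.",
    "doc2.txt": "Orange juice only.",
    "doc3.txt": "An apple a day.",
}

def tokenize(text):
    """Lowercase the text and split it into runs of letters (a simple stand-in tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())

# Document-term table using term occurrences: one Counter (row) per document.
term_occurrences = {name: Counter(tokenize(text)) for name, text in documents.items()}

def documents_containing(word):
    """Return the names of documents whose occurrence count for `word` is greater than 0."""
    # Counter returns 0 for words a document does not contain.
    return [name for name, counts in term_occurrences.items() if counts[word] > 0]

print(documents_containing("apple"))  # ['doc1.txt', 'doc3.txt']
```

In RapidMiner terms this corresponds roughly to keeping the document-level ExampleSet (one example per document) rather than only the aggregated WordList, and filtering the examples on the word's attribute.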
What if many documents have been processed?
(With only a few documents, I could use the "Create Document" operator and label each one.)
For example, the WordList result is shown below:
Word          Total Occurrences   In Documents
Apple         200                 180
Orange        150                 130
Strawberry    90                  50
The result reveals that "Apple" appears 200 times in 180 documents.
Is there any way to find out from the analysis result which 180 documents those are (e.g. Doc. 10, Doc. 16, Doc. 45)?
Regards,
Tan
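The follow-up question is essentially asking for an inverted index: for each word, keep the IDs of the documents it appears in along with the per-document counts. The minimal Python sketch below (an illustration of the data structure, not a RapidMiner feature) shows that the two WordList columns in the table above can be derived from such an index: "Total Occurrences" is the sum of the per-document counts, and "In Documents" is the number of documents in the word's posting list. The document list is kept alongside them. The corpus, document IDs, and the `word_list_row` helper are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Hypothetical corpus: document ID -> text.
documents = {
    "Doc. 10": "apple orange apple",
    "Doc. 16": "strawberry apple",
    "Doc. 45": "orange orange",
}

def tokenize(text):
    """Lowercase the text and split it into runs of letters."""
    return re.findall(r"[a-z]+", text.lower())

# Inverted index: word -> {document ID: occurrences of the word in that document}.
index = defaultdict(dict)
for doc_id, text in documents.items():
    for word, count in Counter(tokenize(text)).items():
        index[word][doc_id] = count

def word_list_row(word):
    """Return (total occurrences, number of documents, document IDs) for `word`."""
    postings = index.get(word, {})
    # Document IDs are listed in the order the documents were processed.
    return sum(postings.values()), len(postings), list(postings)

for word in sorted(index):
    total, in_docs, doc_ids = word_list_row(word)
    print(f"{word:<12}{total:>6}{in_docs:>6}  {doc_ids}")
```

Here `word_list_row("apple")` gives a total of 3 occurrences in 2 documents, `Doc. 10` and `Doc. 16`. The design point is that the per-document detail has to be kept before it is aggregated; a word list that stores only the two totals cannot be turned back into the document list.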
Andrew
Best Regards.