"Text Mining Missing Values"
Dear All,
Using "Data to Documents" and "Process Documents", I get a nice document array, where rows are documents and columns are word frequencies.
How can I set the word frequencies to missing when the text length (i.e. the document length) is smaller than some value?
For example:
Document 1 has no occurrences of any word, because its text is missing (or very short).
Document 2 has a count of 10 on word1, and a count of 0 on word2.
Right now document 1 has a count of 0 for all words. This is true, but I'd like to be able to treat it as a special case when the text is missing or very short.
Best regards,
Wessel
Answers
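You could count the number of tokens in each document using Extract Information (I think), which adds the count as a special attribute. After that you would need a bit of example filtering or attribute generation to set the word frequencies to missing for documents whose count falls below your threshold.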
Regards
Andrew
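To illustrate the idea outside of RapidMiner, here is a minimal Python sketch of the masking step. It is not the operator chain itself: the function name `mask_short_documents`, the `min_tokens` threshold, and the token counts in the sample data are all assumptions made up for this example. It assumes a per-document token count is already available, and replaces every word frequency with NaN when that count is below the threshold.

```python
import math


def mask_short_documents(rows, token_counts, min_tokens):
    """Return a copy of rows where documents with fewer than
    min_tokens tokens have all word frequencies set to NaN (missing)."""
    masked = []
    for row, n_tokens in zip(rows, token_counts):
        if n_tokens < min_tokens:
            # Too short: mark every word frequency as missing.
            masked.append({word: math.nan for word in row})
        else:
            # Long enough: keep the frequencies unchanged.
            masked.append(dict(row))
    return masked


# Hypothetical data mirroring the question: document 1 has no text,
# document 2 has a count of 10 on word1 and 0 on word2.
rows = [
    {"word1": 0, "word2": 0},
    {"word1": 10, "word2": 0},
]
token_counts = [0, 25]  # assumed token counts, one per document

result = mask_short_documents(rows, token_counts, min_tokens=5)
```

With this, document 1 ends up with missing values for every word, while the zero on word2 in document 2 stays a genuine zero, which is the distinction asked for in the question.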