How to use the Filter Stopwords (Dictionary) operator
Hello All,
I'm struggling with the Filter Stopwords operator. I have a table with 3 columns: column_1 holds a list of words, and another column holds text. I need to remove the words listed in column_1 from the text column.
Also, how can we remove user-defined/specific words from the text?
If I use the Filter Stopwords operator, how can I convert the text column into a document to use as input to the operator?
Thanks in advance!
Best Answer
SabaRG Member Posts: 13 Contributor II
Hi @Anusha
There are two main options for this task:
1- Use the "Replace" operator from "Blending\Values" to remove your words with a regular expression. This is the simple way.
2- Use "Filter Tokens Using ExampleSet" from the "Operator Toolbox\Text Processing" extension, or "Filter Stopwords" from "HanMiner\Processing\Filtering", to define your stopword list and remove those words from a document. In this case you have to convert your data to a document and back again, so put the operators below inside a "Loop Examples":
a) Use "Filter Example Range" with the "%{example}" macro to select the current row.
b) Use the "Extract Document" operator from the "Text Processing" extension to convert the text attribute to a document, with the example index set to 1.
c) Use your "Filter Stopwords" operator.
d) Use "Documents to Data" to convert the document back to an example set.
e) Use "Cartesian Product" to join the new data with the other attributes of the row.
f) Use "Select Attributes" to remove the old text attribute.
Finally, use an "Append" operator outside the "Loop Examples" operator to combine the per-row results.
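To make the logic of the loop easier to follow, here is a rough Python sketch of what steps a) to f) and the final append accomplish. It is not RapidMiner code; the function name, the sample rows and the column names are made up for illustration, and it assumes simple whitespace tokenization.

```python
def filter_stopwords(text, stopwords):
    # Split on whitespace, drop stop words (case-insensitive), rejoin.
    blocked = {w.lower() for w in stopwords}
    kept = [tok for tok in text.split() if tok.lower() not in blocked]
    return " ".join(kept)


# Hypothetical sample data standing in for the example set.
rows = [
    {"id": 1, "text": "this is a short example"},
    {"id": 2, "text": "another example of a text row"},
]
stopwords = ["is", "a", "of"]

cleaned_rows = []
for row in rows:  # one iteration per example, like the loop over rows
    new_row = dict(row)  # keep the other columns of the row
    # Replace the old text with the filtered text.
    new_row["text"] = filter_stopwords(row["text"], stopwords)
    cleaned_rows.append(new_row)  # collect the per-row results

for row in cleaned_rows:
    print(row)
```

Each row is processed on its own and the results are collected at the end, which is the same shape as looping over examples and appending the outputs.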
I suggest the first approach, but if you need other operations such as tokenizing or stemming, the second approach is more appropriate.
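For the regular-expression approach, the main work is turning the word list into one pattern. Below is a minimal Python sketch of that idea, assuming whole-word, case-insensitive matching is what you want; the function names and sample words are hypothetical. The generated pattern is ordinary regex syntax, so something similar could probably be pasted into the "Replace" operator, but I have not verified that in RapidMiner.

```python
import re


def build_stopword_pattern(words):
    # Escape each word so characters like "." or "+" are matched literally,
    # and put longer entries first so they win over shorter prefixes.
    escaped = sorted((re.escape(w) for w in words if w), key=len, reverse=True)
    # \b ... \b restricts matches to whole words.
    return r"\b(?:" + "|".join(escaped) + r")\b"


def remove_words(text, words):
    pattern = build_stopword_pattern(words)
    cleaned = re.sub(pattern, "", text, flags=re.IGNORECASE)
    # Collapse the whitespace left behind by the removed words.
    return re.sub(r"\s+", " ", cleaned).strip()


stop_words = ["the", "is", "of"]  # e.g. the values taken from column_1
print(build_stopword_pattern(stop_words))
print(remove_words("The price of the car is high", stop_words))  # price car high
```

The word boundaries matter: without them, removing "is" would also cut the "is" out of "this".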