"Data to text Analysis"
Hi,
I have data in this format:
Code Text
A This is some text that could be anything.
B This is some other text relating to something else.
A This is more text.
A Yet more text
C Another line with more text.
I import the data with Code=Label and Text=Text, then process it with the DataToDocuments operator followed by ProcessDocuments. You get the idea. In the end, I want to know:
What do the texts within each of A, B and C have in common? In other words, what defines A, B and C in terms of the word frequencies in their texts? I don't know RapidMiner well enough to work out this last part.
Can anyone please point me in the right direction?
Much appreciated.
B
Answers
What you would like to do is an interesting but also tough data mining task.
Maybe this works:
After your document processing (probably with filtering, pruning, TF-IDF, etc.), you can try applying Weight by SVM or Weight by Value to find the descriptive terms for each class (the terms become attributes after document processing). Do not expect perfect results; you might need to filter afterwards and experiment with the document processing.
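For readers who want to see the idea outside RapidMiner, here is a minimal plain-Python sketch. It is not RapidMiner's Weight by SVM: it swaps in a simpler one-vs-rest contrast that scores each term for a class as its mean TF-IDF inside that class minus its mean TF-IDF in all other classes. Assumptions: lowercase letter-only tokens, no stemming or stopword removal, tf = count / document length, idf = ln(N / df) without smoothing (so a word found in every document, like "text" here, gets weight 0). All function names are hypothetical.

```python
import math
import re
from collections import Counter

# Sample rows from the question: (code/label, text)
ROWS = [
    ("A", "This is some text that could be anything."),
    ("B", "This is some other text relating to something else."),
    ("A", "This is more text."),
    ("A", "Yet more text"),
    ("C", "Another line with more text."),
]


def tokenize(text):
    # Assumption: lowercase, letters only, no stemming or stopword removal
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(texts):
    """One {term: tf-idf} dict per text; tf = count / doc length, idf = ln(N / df)."""
    docs = [Counter(tokenize(t)) for t in texts]
    n = len(docs)
    df = Counter(term for doc in docs for term in doc)
    vectors = []
    for doc in docs:
        total = sum(doc.values())
        vectors.append({t: (c / total) * math.log(n / df[t]) for t, c in doc.items()})
    return vectors


def _mean(vectors, term):
    # Average weight of `term` over a group of documents (missing term counts as 0)
    return sum(v.get(term, 0.0) for v in vectors) / len(vectors) if vectors else 0.0


def class_term_scores(rows):
    """Per class: mean TF-IDF inside the class minus mean TF-IDF in all other classes."""
    labels = [label for label, _ in rows]
    vectors = tfidf_vectors([text for _, text in rows])
    vocab = {t for v in vectors for t in v}
    scores = {}
    for cls in sorted(set(labels)):
        inside = [v for v, lab in zip(vectors, labels) if lab == cls]
        outside = [v for v, lab in zip(vectors, labels) if lab != cls]
        scores[cls] = {t: _mean(inside, t) - _mean(outside, t) for t in vocab}
    return scores


def top_terms(scores, k=3):
    # Keep the k highest positive scores per class; ties broken alphabetically
    result = {}
    for cls, term_scores in scores.items():
        ranked = sorted(term_scores.items(), key=lambda item: (-item[1], item[0]))
        result[cls] = [term for term, score in ranked[:k] if score > 0]
    return result


if __name__ == "__main__":
    for cls, terms in top_terms(class_term_scores(ROWS)).items():
        print(cls, terms)
```

With only five short rows, the top terms are mostly words that occur in a single class, which is exactly why pruning and filtering matter on real data.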
The Weights to Data operator transforms the weight list into an ExampleSet, which you can process further with the usual operators.
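Turning weights into a table and filtering it afterwards can be sketched in plain Python as well. This assumes the per-term weights are already available as a dict; the threshold parameter and the "attribute"/"weight" column names are hypothetical and not necessarily what RapidMiner's ExampleSet uses.

```python
import csv
import io


def weights_to_rows(weights, min_weight=0.0):
    """Turn a {term: weight} map into (term, weight) rows, strongest first.

    Rows with weight <= min_weight are dropped, as a simple post-hoc filter.
    """
    rows = [(term, w) for term, w in weights.items() if w > min_weight]
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


def rows_to_csv(rows):
    # Column names "attribute" and "weight" are assumptions for illustration
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["attribute", "weight"])
    for term, w in rows:
        writer.writerow([term, f"{w:.4f}"])
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical weights for one class
    example = {"yet": 0.18, "more": 0.05, "text": 0.0, "some": -0.01}
    print(rows_to_csv(weights_to_rows(example)))
```

Raising `min_weight` is the crude equivalent of "filter afterwards": it keeps only the terms that most strongly separate a class from the rest.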
Ciao Sebastian
P.S. Does anybody have better/other ideas?