"[Text Mining] How to feed an SGML-format file into a dictionary?"
Hello, is there any way to load an SGML-formatted file into the TextInput operator? If not, is there a way to convert SGML files into another format that the TextInput operator can load? The loaded file will then be combined with the dictionary, as shown in the big picture of my experiment below:
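One possible workaround, if the operator cannot read SGML directly, is to convert the Reuters-21578 `.sgm` files into plain text first. The sketch below assumes that the text operator can be pointed at a directory of `.txt` files with one sub-folder per class label (check the operator's parameters; this layout is an assumption). It relies on the Reuters-21578 markup: each article is wrapped in `<REUTERS ... NEWID="...">`, its topic labels are `<D>` entries inside `<TOPICS>`, and its text is in `<TITLE>` and `<BODY>`. The file name `reuters_to_txt.py` and the functions `parse_sgml`/`write_docs` are hypothetical.

```python
"""Convert Reuters-21578 SGML (.sgm) files into one plain-text file per article.

Sketch only: assumes the text-mining operator reads a directory tree of .txt
files where each sub-folder name is used as the class label (hypothetical layout).
"""
import html
import re
import sys
from pathlib import Path

ARTICLE_RE = re.compile(r"<REUTERS\b([^>]*)>(.*?)</REUTERS>", re.DOTALL)
NEWID_RE = re.compile(r'NEWID="(\d+)"')
TOPICS_RE = re.compile(r"<TOPICS>(.*?)</TOPICS>", re.DOTALL)
D_RE = re.compile(r"<D>(.*?)</D>", re.DOTALL)


def _field(tag: str, block: str) -> str:
    """Return the unescaped content of <tag>...</tag>, or '' if the tag is absent."""
    m = re.search(rf"<{tag}\b[^>]*>(.*?)</{tag}>", block, re.DOTALL)
    return html.unescape(m.group(1)).strip() if m else ""


def parse_sgml(text: str) -> list[dict]:
    """Split the contents of one .sgm file into article dicts (id, topics, text)."""
    docs = []
    for attrs, block in ARTICLE_RE.findall(text):
        newid = NEWID_RE.search(attrs)
        topics_block = TOPICS_RE.search(block)
        topics = D_RE.findall(topics_block.group(1)) if topics_block else []
        body = _field("TITLE", block) + "\n\n" + _field("BODY", block)
        docs.append({
            "id": newid.group(1) if newid else str(len(docs)),
            "topics": topics,
            "text": body.strip(),
        })
    return docs


def write_docs(docs: list[dict], out_dir: Path) -> int:
    """Write out_dir/<topic>/<id>.txt and return the number of files written.

    Articles with several topics are copied once per topic; articles with
    no topic are skipped (a design choice for this sketch).
    """
    written = 0
    for doc in docs:
        for topic in doc["topics"]:
            folder = out_dir / topic
            folder.mkdir(parents=True, exist_ok=True)
            (folder / f"{doc['id']}.txt").write_text(doc["text"], encoding="utf-8")
            written += 1
    return written


def main(argv: list[str]) -> None:
    if len(argv) != 2:
        print("usage: python reuters_to_txt.py <dir with reut2-*.sgm> <output dir>")
        return
    src, dst = Path(argv[0]), Path(argv[1])
    total = 0
    for sgm in sorted(src.glob("reut2-*.sgm")):
        # latin-1 decodes every byte, so stray non-UTF-8 bytes do not abort the run
        total += write_docs(parse_sgml(sgm.read_text(encoding="latin-1")), dst)
    print(f"wrote {total} files to {dst}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Writing one folder per topic matches a "class label = folder name" convention; if the experiment only needs a single label per document, filtering to articles with exactly one topic is a simple alternative.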
<< Big Picture >>
(1) Documents ==> (2) Dictionary Creation ==> (3) Text Representation (based either on the counts of the most frequently occurring words in the documents, or on a Boolean representation, i.e. whether specific topic words appear in a document) ==> (4) Model Induction (e.g. rule-based induction) ==> (5) Document Classification Rules
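Steps (2) and (3) of the pipeline can be illustrated outside the tool with a small sketch: build a dictionary of the k most frequent words, then represent each document either by term counts or by Boolean presence of each dictionary word. The tokenizer (lowercase letters only) and the tiny stop-word list are illustrative assumptions, and `build_dictionary`/`represent` are hypothetical names, not part of any RapidMiner operator.

```python
"""Sketch of steps (2)-(3): a top-k word dictionary plus count or Boolean vectors."""
import re
from collections import Counter

# Assumption: tokens are runs of lowercase letters; digits and punctuation are dropped.
TOKEN_RE = re.compile(r"[a-z]+")
# Tiny illustrative stop-word list, not a recommended one.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "said", "for", "on"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text, extract letter runs, and drop stop words."""
    return [t for t in TOKEN_RE.findall(text.lower()) if t not in STOPWORDS]


def build_dictionary(docs: list[str], k: int) -> list[str]:
    """Return the k most frequent tokens across the whole collection."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return [word for word, _ in counts.most_common(k)]


def represent(doc: str, dictionary: list[str], boolean: bool = False) -> list[int]:
    """Vector aligned with `dictionary`: term counts, or 1/0 presence if boolean=True."""
    counts = Counter(tokenize(doc))
    if boolean:
        return [1 if counts[w] > 0 else 0 for w in dictionary]
    return [counts[w] for w in dictionary]


if __name__ == "__main__":
    docs = ["Cocoa prices rose as cocoa exports fell.", "Wheat exports rose."]
    vocab = build_dictionary(docs, k=3)
    for d in docs:
        print(represent(d, vocab), represent(d, vocab, boolean=True))
```

Each resulting vector, paired with the document's topic label, is the kind of example table a rule learner in step (4) would consume.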
The input is the Reuters-21578 Text Categorization Collection Data Set from the UCI Machine Learning Repository, and the data set files are in SGML format (.sgm files).