How to convert rich text (RTF) into readable text? (for text mining)
Hi everyone!
I finally managed to read my database with RapidMiner, but there is another problem. My items look like this:
{\rtf1\ansi\ansicpg1254\deff0{\fonttbl{\f0\fnil\fcharset162 Microsoft Sans Serif;}}
\viewkind4\uc1\pard\lang1055\f0\fs17 6 ayd\'fdr sol kol a\'f0. Boyun a\'f0 az.
G\'fc\'e7s\'fczl\'fck ve a\'f0 dan \'e7ok uyu\'feukluk var. NPBY. Belki C7-8 hipoaljezi.
Torasik \'e7\'fdk\'fd\'fe gibi de\'f0il. Miyofasial a\'f0 gibi. \'d6neriler.+\par \par \par \par \par }
How can I convert this into readable text? I need to do text mining.
Thanks!
Answers
Have you already installed the Text Mining extension? If so, you will find an operator called "Data to Documents", which converts an example set into document objects. To answer your question: currently there is no option to parse RTF code directly in RapidMiner. Maybe you'll find a library or scripting tool you can pipe your data through. Alternatively, you could try to extract the text content from your input by filtering out the RTF code with regular expressions (using the "Replace" or "Replace Token" operator) and a search pattern like this:
\{\*?\\[^{}]+}|[{}]|\\\n?[A-Za-z]+\n?(?:-?\d+)?[ ]?
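If you do end up piping the data through a script, the same pattern can be tried outside RapidMiner. Below is a minimal sketch in Python (the language is my choice, nothing in RapidMiner requires it). The function name rtf_to_text and the second pattern for \'xx escapes are assumptions made for illustration. The pattern above only strips groups, braces, and control words, so hex escapes such as \'fd stay in the text; the sketch decodes them with the code page announced by \ansicpg1254 (Windows-1254, Turkish). This is not a full RTF parser and will likely miss cases in more complex documents.

```python
import re

# Pattern quoted from the answer: removes simple groups (e.g. the font entry),
# stray braces, and control words such as \pard or \fs17.
RTF_CONTROL = re.compile(r"\{\*?\\[^{}]+}|[{}]|\\\n?[A-Za-z]+\n?(?:-?\d+)?[ ]?")

# Assumption: \'xx escapes are single bytes in the code page given by \ansicpg.
HEX_ESCAPE = re.compile(r"\\'([0-9a-fA-F]{2})")


def rtf_to_text(rtf: str, codepage: str = "cp1254") -> str:
    """Strip RTF markup with the regex above, then decode \\'xx escapes."""
    stripped = RTF_CONTROL.sub("", rtf)
    decoded = HEX_ESCAPE.sub(
        lambda m: bytes.fromhex(m.group(1)).decode(codepage), stripped
    )
    return decoded.strip()


# Shortened version of the sample from the question.
sample = (
    r"{\rtf1\ansi\ansicpg1254\deff0{\fonttbl{\f0\fnil\fcharset162 Microsoft Sans Serif;}}"
    "\n"
    r"\viewkind4\uc1\pard\lang1055\f0\fs17 6 ayd\'fdr sol kol a\'f0. Boyun a\'f0 az.\par }"
)
print(rtf_to_text(sample))  # 6 aydır sol kol ağ. Boyun ağ az.
```

The code page is passed as a parameter because other databases may use a different \ansicpg value; for this data cp1254 matches the header shown in the question.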
Since text mining is a rather complex topic, it may be a good idea to take a closer look at some introductory videos. A video that shows how to classify texts on different topics can be found here:
http://rapidminerresources.com/index.php?page=text-mining-3
In addition, Neil McGuigan produced a great series of videos on RapidMiner and text mining, which are available via his blog. The first one of the series is here:
http://vancouverdata.blogspot.de/2010/11/text-analytics-with-rapidminer-loading.html
Cheers,
Helge
Yeah, Neil McGuigan's site is really helpful.
Thank you!