Text mining of mailing list traffic
I've just installed RapidMiner 5.2 and noticed there is no importer for the mailbox format. I'm interested in extracting word frequencies from mailing list traffic.
Do you know any workflow or tutorial to perform this task with RapidMiner?
Right now I've managed to export the traffic in one big file in CSV format (from Thunderbird) but the RapidMiner CSV importer-parser gets very confused recognizing columns. Sample data can be found in the following list:
http://lists.gforge.inria.fr/pipermail/ecm-discuss/
Any help would be appreciated.
Answers
Did you try the Read Documents (Mail) operator from the text mining extension?
Best,
Marius
Thanks
Andrea
Best, Marius
Just download and uncompress any of the files in gzip format, for example: http://lists.gforge.inria.fr/pipermail/ecm-discuss/2012-March.txt.gz
Import it into a new Thunderbird folder.
Install this extension/add-on: ImportExportTools
Right-click the folder, then choose Import/Export, Export all messages in the folder, Spreadsheet (CSV).
Let me know if you cannot reproduce the problem.
Cheers,
Andrea
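For word frequencies specifically, the CSV step might be skipped altogether, since the monthly archives appear to be plain mbox text once decompressed. Below is a minimal sketch in Python (outside RapidMiner), assuming the decompressed archive parses with the standard-library mailbox module and that only text/plain parts matter. The helper names gunzip_to_temp, body_text and word_frequencies are made up for illustration.

```python
import gzip
import mailbox
import re
import tempfile
from collections import Counter


def gunzip_to_temp(gz_path):
    """Decompress a .gz archive into a temporary file and return its path."""
    with gzip.open(gz_path, "rb") as src:
        data = src.read()
    tmp = tempfile.NamedTemporaryFile(suffix=".mbox", delete=False)
    with tmp:
        tmp.write(data)
    return tmp.name


def body_text(message):
    """Return the text/plain parts of a message, joined by newlines."""
    parts = []
    for part in message.walk():
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            parts.append(payload.decode(charset, errors="replace"))
        except LookupError:
            # Unknown charset label: fall back to UTF-8 (an assumption).
            parts.append(payload.decode("utf-8", errors="replace"))
    return "\n".join(parts)


def word_frequencies(mbox_path):
    """Count lowercase ASCII word tokens over all message bodies in an mbox."""
    counts = Counter()
    box = mailbox.mbox(str(mbox_path), create=False)
    try:
        for message in box:
            counts.update(re.findall(r"[a-z']+", body_text(message).lower()))
    finally:
        box.close()
    return counts
```

Calling word_frequencies(gunzip_to_temp("2012-March.txt.gz")).most_common(20) would then list the twenty most frequent tokens. Quoted replies and signatures are counted too, so some cleanup is likely needed before the numbers are meaningful.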
The problem is that RapidMiner reads CSV files line-wise. Line breaks inside a field are not handled, even if the field is quoted. MS Excel seems to have the same problem. What worked for me was:
1. Import the file with OpenOffice
2. Save it as an MS Excel file
3. Import the xls file with RapidMiner
This worked for an exported folder of my own mailbox. I don't know, however, whether that is scriptable for a large number of files.
Happy Mining!
~Marius
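One way the OpenOffice round trip might be scripted for many files is to rewrite each CSV so that no field contains a line break before RapidMiner reads it. Below is a minimal sketch using Python's csv module, which does honor line breaks inside quoted fields. The function name flatten_csv, the comma delimiter and the UTF-8 encoding are assumptions about the export, not something confirmed in this thread.

```python
import csv


def flatten_csv(src_path, dst_path, replacement=" "):
    """Copy a CSV file, replacing line breaks inside fields with `replacement`."""
    # newline="" lets the csv module see the raw line endings, so line breaks
    # inside quoted fields stay part of the field instead of ending the record.
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst, quoting=csv.QUOTE_ALL)
        for row in csv.reader(src):
            # splitlines() breaks each field at any line ending; joining the
            # pieces again yields a field that fits on a single line.
            writer.writerow(
                [replacement.join(field.splitlines()) for field in row]
            )
```

After this step each record occupies exactly one line, so a line-wise reader should see consistent columns. Very long message bodies may exceed the csv module's default field size limit, which csv.field_size_limit can raise.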