Import a CSV, without false Linebreaks
Hi,
I'm currently writing my Bachelor's thesis on machine learning (sarcasm detection specifically).
My professor recommended RapidMiner.
Here's the problem: when I try to import the corpus I intend to work with (which I'm already using in a survey, and which is also used in dozens of works I'm referencing), RapidMiner moves content that belongs in the last column into the first, presumably because the text there contains line breaks.
If any other software I've had to work with so far did that, I'd probably know what to do...
I'm trying as hard as I can to tell RapidMiner to ignore every delimiter other than TAB.
Best Answers
kayman Member Posts: 662 Unicorn
You could try replacing every line break with a dummy string first (something like [lb]) and then turning it back into a line break after you have loaded the file as CSV.
Adding line breaks back is a bit of a dirty trick, since you cannot easily insert them with a regex. What works for me is to first create an attribute with the value %0A, which is the URL-encoded line feed character, then decode it using the decode url operator and store the result as a macro. You can then insert it as the replacement value using the macro.
Or you can replace them up front in Notepad++ or a similar editor, where you can replace directly with \\r\\n (a single backslash instead of a double one; otherwise it doesn't show up here).
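If you would rather script the replacement than do it by hand, here is a minimal Python sketch of the placeholder idea. It assumes the file is tab-separated and that the multi-line text fields are wrapped in double quotes. If they are not quoted, the parser cannot tell a real row end from a break inside the text and this will not help. The function names and the sample data are made up for illustration.

```python
import csv
import io

PLACEHOLDER = "[lb]"  # dummy string; any token that never occurs in the corpus works


def flatten_linebreaks(src, dst, delimiter="\t", placeholder=PLACEHOLDER):
    """Copy delimited rows from src to dst, replacing line breaks inside fields.

    Assumes multi-line fields are enclosed in double quotes in the source.
    """
    reader = csv.reader(src, delimiter=delimiter, quotechar='"')
    writer = csv.writer(dst, delimiter=delimiter, quotechar='"', lineterminator="\n")
    for row in reader:
        cleaned = [
            # Handle Windows (\r\n), old Mac (\r) and Unix (\n) breaks alike.
            field.replace("\r\n", placeholder)
                 .replace("\r", placeholder)
                 .replace("\n", placeholder)
            for field in row
        ]
        writer.writerow(cleaned)


def restore_linebreaks(text, placeholder=PLACEHOLDER):
    """Turn the placeholder back into a real line break."""
    return text.replace(placeholder, "\n")


if __name__ == "__main__":
    # Hypothetical three-column sample with a line break inside the quoted text.
    sample = 'id\tlabel\ttext\n1\tsarc\t"Great.\nJust great."\n2\tnot\tFine\n'
    out = io.StringIO()
    flatten_linebreaks(io.StringIO(sample), out)
    print(out.getvalue())
```

For a real file you would pass file objects opened with newline="" instead of the StringIO objects, so that the csv module sees the original line endings.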
Then again, if your CSV import is set to split on tabs only, it should ignore the 'false' breaks altogether. So could it be that there are Unicode tabs or similar characters in your content that cause this behavior?
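To see where the rows actually break, it can help to count the tab-separated fields on each physical line and list the lines that deviate from the header. The sketch below is only a diagnostic under the assumption that the first line is a header with the correct number of columns; the function name and sample lines are hypothetical.

```python
def field_count_report(lines, delimiter="\t"):
    """Return (expected_count, deviating), where deviating is a list of
    (line_number, field_count) for lines whose count differs from the header."""
    counts = [
        (number, len(line.rstrip("\r\n").split(delimiter)))
        for number, line in enumerate(lines, start=1)
    ]
    if not counts:
        return 0, []
    expected = counts[0][1]  # the header line defines the expected column count
    deviating = [(number, count) for number, count in counts if count != expected]
    return expected, deviating


if __name__ == "__main__":
    # Hypothetical sample: the text of row 1 spills onto a second physical line.
    sample_lines = [
        "id\tlabel\ttext\n",
        '1\tsarc\t"Great.\n',
        'Just great."\n',
        "2\tnot\tFine\n",
    ]
    expected, deviating = field_count_report(sample_lines)
    print("expected fields:", expected)
    print("deviating lines:", deviating)
```

Lines with too few fields point to a line break inside a text field, while lines with too many point to extra tab characters in the content.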
BalazsBarany Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
It is true that CSV doesn't have a good specification, and some programs cope better with line breaks inside quoted strings than RapidMiner does.
For me, converting the file to Excel manually and then using Read Excel in RapidMiner was a workable workaround.
Of course I strive to put everything in relational databases as early as possible, so these kinds of problems go away.
Answers
You can use other software to import the CSV and export it in a more structured format such as xlsx, or load it into a database. RapidMiner will read the line breaks from these without a problem.
If you are working with survey software anyway, it should offer other export options in addition to CSV.
Regards,
Balázs
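As a rough illustration of the database route, the following Python sketch loads a tab-separated file with quoted multi-line fields into a SQLite table. SQLite is used here only because it ships with Python; that choice, the table name corpus and the function name are assumptions for the example, and any database RapidMiner can connect to would serve the same purpose.

```python
import csv
import io
import sqlite3


def load_into_sqlite(src, conn, table="corpus", delimiter="\t"):
    """Load delimited rows from src into a new table and return the row count.

    The first row is taken as the header. All columns are stored as TEXT.
    A row with the wrong number of fields raises an sqlite3 error rather than
    being silently dropped.
    """
    reader = csv.reader(src, delimiter=delimiter, quotechar='"')
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]


if __name__ == "__main__":
    # Hypothetical sample; a real run would open the corpus file with newline="".
    sample = 'id\tlabel\ttext\n1\tsarc\t"Great.\nJust great."\n2\tnot\tFine\n'
    connection = sqlite3.connect(":memory:")
    print(load_into_sqlite(io.StringIO(sample), connection), "rows loaded")
```

Once the text sits in a database column, the line breaks are part of the stored value, so there is no longer any ambiguity about where a row ends.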