Closest encoding to utf8mb4
I am working with social media data and all those emojis are driving me crazy. When I import them, they get converted to the system encoding and show up as a bunch of squiggles. What encoding is closest to utf8mb4, so that I can preserve the emojis when reading from a CSV?
Best Answer
Robi_Me Member Posts: 32 Maven
@jwpfau When I import the data into the DB, it fails with an error saying the character is not UTF-8: Incorrect string value: '\xE2 \x94 \x82....'
This happened for basically all of the emojis, which were being rejected. I was under the impression that I needed to set the encoding inside RapidMiner, but the change was actually needed on the DB. Changing the free-text field to TEXT and setting its encoding to utf8mb4 sorted the issue out.
Answers
utf8mb4 is a workaround for the broken utf8 type in MySQL, which only supports characters of up to 3 bytes.
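To make the 3-byte limit concrete, here is a small Python sketch (not part of the original answer) that counts how many bytes a character needs in UTF-8. The helper names utf8_len and fits_mysql_utf8 are made up for this illustration; the check only mirrors the byte limit described above and does not talk to MySQL.

```python
def utf8_len(ch: str) -> int:
    """Return how many bytes a single character needs when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def fits_mysql_utf8(text: str) -> bool:
    """Hypothetical helper: True if every character needs at most 3 UTF-8 bytes,
    which is the limit of MySQL's legacy utf8 type described above."""
    return all(utf8_len(ch) <= 3 for ch in text)


# "a", "é", "€" and a grinning-face emoji (U+1F600), written as escapes
samples = ["a", "\u00e9", "\u20ac", "\U0001F600"]
for ch in samples:
    # Print the code point instead of the glyph so this works on any console
    print(f"U+{ord(ch):04X}", utf8_len(ch), fits_mysql_utf8(ch))
```

Most emojis sit outside the Basic Multilingual Plane and therefore need 4 bytes, which is why a 3-byte column type rejects them while utf8mb4 accepts them.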
In the CSV export it should be just regular UTF-8.
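As a quick way to check this outside of RapidMiner, the following Python sketch writes a tiny CSV containing an emoji and reads it back with an explicit UTF-8 encoding. The file name posts.csv, the sample rows, and the read_csv_utf8 helper are assumptions for the demo only; with your own data you would point the reader at your real export.

```python
import csv
import os
import tempfile


def read_csv_utf8(path):
    """Read all rows of a CSV file, decoding it explicitly as UTF-8
    instead of relying on the system default encoding."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.reader(f))


# Demo data (made up): one header row and one row ending in an emoji (U+1F600)
rows = [["id", "text"], ["1", "great day \U0001F600"]]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "posts.csv")
    # Write the sample file as plain UTF-8
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
    loaded = read_csv_utf8(path)

# True if the emoji survived the round trip unchanged
print(loaded == rows)
```

If the round trip works here but the text still looks wrong in another tool, the file itself is probably fine and the problem lies in how that tool decodes or displays it.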
Maybe the selected RapidMiner Studio font doesn't contain all the smileys and is displaying squares instead?
Greetings,
Jonas