I have funny characters in my example sets. I suspect an encoding problem.

User13 Member Posts: 155 Maven

Problem:

The encoding settings are incorrect in at least one of the following places: the database itself, a database connection configured in RapidMiner Studio or Server, or the JBoss instance that hosts RapidMiner Server. Many file input operators also let you specify an encoding, which can be wrong as well.
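To check whether the funny characters come from this kind of mismatch, it can help to reproduce the effect outside RapidMiner. The following Python sketch shows how UTF-8 text that is read as Latin-1 turns into typical garbage, and how reversing that step recovers the original. The function names are made up for illustration, and the UTF-8/Latin-1 pairing is only an assumption about the most common case, not a diagnosis of any particular setup.

```python
def misread_utf8_as_latin1(text: str) -> str:
    """Simulate a component that decodes UTF-8 bytes as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def try_repair(garbled: str) -> str:
    """Undo the misreading if possible; return the input unchanged otherwise."""
    try:
        return garbled.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text contains characters outside Latin-1, or its bytes
        # are not valid UTF-8, so it was not garbled in this particular way.
        return garbled
```

For example, `misread_utf8_as_latin1("Müller")` gives `MÃ¼ller`, and `try_repair` applied to that result gives back `Müller`. If values from your example set can be repaired this way, some component on the path is most likely reading UTF-8 data as Latin-1.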

Solution:

You should use utf8 wherever possible. Database settings can be made at the following levels:


  • Database: In MySQL, use “ALTER DATABASE xxx DEFAULT CHARACTER SET utf8”.

  • Table: Newly created tables inherit the default character set of the database. A different character set can be specified in the CREATE statement.

  • RapidMiner Studio/Server JDBC connection: Set the appropriate connection properties (see below for a list). In RapidMiner Studio, this is possible via Tools > Manage Database Connections > Advanced. In RapidMiner Server, go to Administration > Database Connection, edit a connection and open the Advanced Settings tab. In places where you can only edit the JDBC URL, the parameters need to be appended to the URL; see the article “How do I configure properties of database connections defined in RapidMiner Server?”

  • JBoss datasource configuration: Same as above; add the parameter to the database URL specified in the standalone.xml file. See the separate FAQ article about this.

The encodingName you want to use is almost always utf8. The exact name of the JDBC property depends on the database. Known values are:


  • MySQL: characterEncoding
  • MS SQL Server via JTDS driver: CHARSET
  • Oracle: charset
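Where only the JDBC URL can be edited, the property has to be appended to the URL by hand. The following Python sketch illustrates how such a URL could be put together. It assumes the `?name=value&name=value` syntax of MySQL Connector/J and the `;name=value` syntax of jTDS; the helper `append_jdbc_property`, the host `dbhost` and the database `mydb` are hypothetical. Oracle is left out because its URL syntax differs. Check the documentation of your driver before relying on either form.

```python
def append_jdbc_property(url: str, name: str, value: str, style: str = "query") -> str:
    """Append one connection property to a JDBC URL.

    style="query":     ?name=value&name=value  (assumed for MySQL Connector/J)
    style="semicolon": ;name=value;name=value  (assumed for jTDS)
    """
    if style == "query":
        # The first property is introduced by "?", all later ones by "&".
        separator = "&" if "?" in url else "?"
    elif style == "semicolon":
        separator = ";"
    else:
        raise ValueError(f"unknown style: {style}")
    return f"{url}{separator}{name}={value}"


# Hypothetical host and database names.
print(append_jdbc_property("jdbc:mysql://dbhost:3306/mydb", "characterEncoding", "utf8"))
# jdbc:mysql://dbhost:3306/mydb?characterEncoding=utf8

print(append_jdbc_property("jdbc:jtds:sqlserver://dbhost:1433/mydb", "CHARSET", "utf8",
                           style="semicolon"))
# jdbc:jtds:sqlserver://dbhost:1433/mydb;CHARSET=utf8
```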

Processes can configure the encoding via parameters of input operators.

  • Read CSV and similar operators offer parameters to select the encoding of the parsed file.
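The effect of such an encoding parameter can be tried out on a small sample before changing a process. The Python sketch below is only an analogy for what an operator like Read CSV does with its encoding setting, not RapidMiner code; the function `read_csv_bytes` and the sample data are made up for illustration.

```python
import csv
import io


def read_csv_bytes(raw: bytes, encoding: str) -> list:
    """Parse CSV data from raw bytes using an explicitly chosen encoding."""
    text = io.StringIO(raw.decode(encoding), newline="")
    return list(csv.reader(text))


# Hypothetical file content, written by a system that uses Latin-1.
raw = "name,city\nJosé,Köln\n".encode("latin-1")

# With the matching encoding, the special characters survive.
rows = read_csv_bytes(raw, "latin-1")
assert rows == [["name", "city"], ["José", "Köln"]]
```

Reading the same bytes with `"utf-8"` raises a `UnicodeDecodeError`, because the Latin-1 bytes are not valid UTF-8. The opposite mistake, reading UTF-8 bytes as Latin-1, does not fail at all and silently produces funny characters, which is why it tends to go unnoticed until the data is inspected.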

Pitfall in MySQL: In some tables, Quartz uses three varchar(255) columns as a key. In utf8, this amounts to 3 columns * 255 characters * 3 bytes per character = 2295 bytes, which is above the default maximum key length. (This limit is compiled into MySQL and cannot be changed.) Therefore, the Quartz tables must be created in latin1 encoding. Workaround: Set the default encoding of the database only after the Quartz tables have been installed.
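The arithmetic behind this pitfall can be checked with a few lines of Python. The limit of 1000 bytes used below is an assumption (it is the figure commonly cited for MyISAM tables); the actual limit depends on the storage engine and MySQL version, so treat the constant as a placeholder. The function name is hypothetical.

```python
def index_key_bytes(columns: int, chars_per_column: int, max_bytes_per_char: int) -> int:
    """Worst-case size in bytes of an index key over character columns."""
    return columns * chars_per_column * max_bytes_per_char


# Assumed limit, not taken from the article: adjust for your engine and version.
ASSUMED_MAX_KEY_BYTES = 1000

utf8_key = index_key_bytes(3, 255, 3)    # MySQL utf8: up to 3 bytes per character
latin1_key = index_key_bytes(3, 255, 1)  # latin1: 1 byte per character

print(utf8_key, utf8_key > ASSUMED_MAX_KEY_BYTES)      # 2295 True
print(latin1_key, latin1_key > ASSUMED_MAX_KEY_BYTES)  # 765 False
```

Under that assumption, the same key fits comfortably in latin1 but not in utf8, which is why the Quartz tables have to be created before the database default is switched.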


