The new forums will be named Coin Return (based on the most recent vote)! You can check on the status and timeline of the transition to the new forums here.
The Guiding Principles and New Rules document is now in effect.

MySQL character sets need to die. Explain them to me!

Jimmy KingJimmy King Registered User regular
Alright, I need some help with MySQL and its ridiculously over complicated character set configuration. Explain here, give me links, whatever. I've read all of the documentation of it on the mysql website. It is sorely lacking.

From the command line client, my character set settings look like so:
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | latin1                     |
| character_set_connection | latin1                     |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | latin1                     |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+

Everything is happy when using the client like this. I've got some crazy arabic text in the db from some data imported from wikipedia. It appears properly as far as I can tell - I would past here, but it goes all crazy. Unfortunately, when I retrieve the results using java and mysql connector/j or with phpMyAdmin, those same arabic characters that are perfectly fine in the db come out as garbage. Lots of reading and testing leads me to believe this is probably directly related to that character_set_results setting.

If I change character_set_results in mysql client to utf8, then it comes out as garbage, just like I am getting with java and phpMyAdmin. I have tried setting the character_set_results to latin1 when using java, but I still get garbage, which probably means I didn't do it right, but again, the documentation blows so all I can really do is guess and copy what people on forums claim to have done.

So, how do I set the character_set_results with connector/j? I want to set a default for it, not have to issue a specific "SET character_set_results='latin1'" as a query before I do anything.

What the fuck is this character_set_results anyway? It's been made pretty clear that it is the character set that the client is expecting the results back in from mysql. Why does changing it to utf8 cause my shit to be all garbled even when everything on both server and client are set for utf8? I guess mysql is sending the results back encoded in latin1 anyway? Then what is the point of this setting? What other setting do I also need to change? How do I change them for my connector/j settings? php?

Jimmy King on

Posts

  • Jimmy KingJimmy King Registered User regular
    edited June 2008
    Got it sorted out. In case anyone else ever comes across a similar problem, since this stuff seems pretty damned important and complex, given how little documentation there is on it and the fact that I've never once seen this stuff mentioned until I started looking for a resolution to this problem.

    The issue was something related to having utf-8 encoded characters going to a utf-8 column, but several of those settings are set to latin-1 as you can see. I'm guessing character_set_connection was the trouble maker here. It caused mysql to think it was getting latin1 encoded data, when really it was utf8. So then it tried to convert it from latin1 to utf8 when it was inserted. Then when the data was being retrieved, it got converted around all ass backwards again.

    So, the lesson here, is that if you're going to insert data that is encoded in utf-8, make sure character_set_connection is utf8, possibly some of the others, too, but I don't know yet.

    Jimmy King on
  • typhoontyphoon Registered User regular
    edited October 2021
    .

    typhoon on
  • zanetheinsanezanetheinsane Registered User regular
    edited June 2008
    Just try to use Unicode for everything. Java doesn't like anything else.

    zanetheinsane on
  • Jimmy KingJimmy King Registered User regular
    edited June 2008
    typhoon wrote: »
    I always do:

    SET character_set_client=utf8
    SET character_set_results=utf8
    SET character_set_connection=utf8

    But yeah, you'll go crazy trying to make this shit work.
    Yeah, that's what I ended up doing. Had to update my data with those settings before it worked, though, which makes sense now that I've got an idea of what's going on but gave me fits before.

    I'm really surprised there's not more documentation on this. MySQL's website has a quick mention of character_set_results, and a couple pages explaining the heirarchy of the character sets from db to table to column. I've worked with MySQL daily for 1 1/2 years now, plus on and off work with it before that, and have never come across a mention of those variables and their effects depending on how your data is actually encoded until I had to start searching to resolve this issue. Nearly all of my work has been in Perl which appears to be smart enough to just sort out the encodings even with bad settings, so I'd never seen this problem before.

    Jimmy King on
Sign In or Register to comment.