
Remove Charset Declaration Please?

Premier kakos Registered User, ClubPA regular
Hey, alpha.

In the HTML for the forums, there is the following code:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

The charset is defined as iso-8859-1. It's typically bad form to explicitly define the character set of a page for internationalisation reasons. Most browsers will force the page into whatever charset is defined in the meta tags and ignore the browser settings. This makes the forums inconvenient to use with Unicode and other languages. Could you possibly remove the definition of the character set?

Premier kakos on

Posts

  • Senjutsu thot enthusiast Registered User regular
    edited November 2005
    I too have run into this. Every time someone asks for Japanese help in H/A I have to manually change the encoding in my browser to be able to see what I've just typed, and that only persists 'till the end of the session in Safari.

    So, if there's not any reason you need to force a Latin-1 encoding, it'd be handy if that declaration weren't there.

    Senjutsu on
  • Rohaq UK Registered User regular
    edited December 2005
    Do we get anyone asking for Japanese help often?

    Also, this may sound nasty, but the forum is predominantly English, whilst we do love our international neighbours with all our hearts... it is appreciated when everyone speaks the same language, since it makes discussion a lot easier, and breaks down those international barriers.

    Rohaq on
  • Thanatos Registered User regular
    edited December 2005
    Rohaq wrote:
    Do we get anyone asking for Japanese help often?
    Yes, usually a couple a week.
    Also, this may sound nasty, but the forum is predominantly English, whilst we do love our international neighbours with all our hearts... it is appreciated when everyone speaks the same language, since it makes discussion a lot easier, and breaks down those international barriers.
    Yes, and I'm sure the moment Alpha changes this code, the forums are going to go into a total breakdown, with everyone starting their own language-based threads; all the Italians will be in the Italian thread, all the Germans in the German thread, all the French people in the French thread... Oh, wait, those languages all use the same character set as English. And we haven't had a problem with it. At all.

    Thanatos on
  • Rohaq UK Registered User regular
    edited December 2005
    Thanatos wrote:
    Rohaq wrote:
    Also, this may sound nasty, but the forum is predominantly English, whilst we do love our international neighbours with all our hearts... it is appreciated when everyone speaks the same language, since it makes discussion a lot easier, and breaks down those international barriers.
    Yes, and I'm sure the moment Alpha changes this code, the forums are going to go into a total breakdown, with everyone starting their own language-based threads; all the Italians will be in the Italian thread, all the Germans in the German thread, all the French people in the French thread... Oh, wait, those languages all use the same character set as English. And we haven't had a problem with it. At all.
    My point is that we're an English speaking forum, so why enable the use of other character sets? It seems rather pointless.

    Rohaq on
  • autono-wally, erotibot300 love machine Registered User regular
    edited December 2005
    Uhm.. why not, really?

    autono-wally, erotibot300 on
  • Metalbourne Inside a cluster b personality Registered User regular
    edited December 2005
    I'm worried about spread and abuse by japanophiles.

    Metalbourne on
  • Just_Bri_Thanks Seething with rage from a handbasket. Registered User, ClubPA regular
    edited December 2005
    I'm worried about spread and abuse by japanophiles.

    I do believe I have already seen use of Kanji on the forums here. The current setup isn't preventing its use, and changing the setup has other benefits.

    I think your worries are a tad overblown, but... *shrug*

    Just_Bri_Thanks on
    ...and when you are done with that; take a folding
    chair to Creation and then suplex the Void.
  • Senjutsu thot enthusiast Registered User regular
    edited December 2005
    Rohaq wrote:
    Thanatos wrote:
    Rohaq wrote:
    Also, this may sound nasty, but the forum is predominantly English, whilst we do love our international neighbours with all our hearts... it is appreciated when everyone speaks the same language, since it makes discussion a lot easier, and breaks down those international barriers.
    Yes, and I'm sure the moment Alpha changes this code, the forums are going to go into a total breakdown, with everyone starting their own language-based threads; all the Italians will be in the Italian thread, all the Germans in the German thread, all the French people in the French thread... Oh, wait, those languages all use the same character set as English. And we haven't had a problem with it. At all.
    My point is that we're an English speaking forum, so why enable the use of other character sets? It seems rather pointless.
    The posting of other character sets already works, in so far as I can type them in and phpbb accepts them just fine. The only thing having the Latin-1 charset declaration does is cause some browsers to display things wrong.

    There are people on this forum right now with sigs in different charsets, it's really not the end of civilization.

    Edit: And while we may be an English-speaking forum, that doesn't mean that we never discuss anything else. This thread was originally created because we were discussing Greek in D&D, and having to manually tell their browsers that, despite the charset declaration in the html, phpbb wasn't really just outputting Latin-1 encoded content got pretty old for people.

    Senjutsu on
  • RocketScience Registered User regular
    edited December 2005
    Next thing you know they'll be wanting Chinese street signs.

    RocketScience on
  • typhoon Registered User regular
    edited October 2021
    .

    typhoon on
  • nozdormu Registered User regular
    edited December 2005
    typhoon wrote:
    If they're going to do anything, they should change it to UTF-8 (although that'd mess up all the posts, topics, user names, etc. already in the database using the last 128 characters in ISO-8859-1).

    Leaving it undefined is dumb, though.

    Would it, though? I thought ISO-8859-1 corresponded to the first couple sets of UTF? Anyway, if they're using xhtml, aren't you supposed to make a charset declaration?
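
A rough way to check both halves of this question is the Python sketch below. It is only an illustration, not anything the forum software runs, and the variable and function names are made up for the example. It compares Unicode code points with ISO-8859-1 byte values, then compares the actual bytes the two encodings produce.

```python
# ISO-8859-1 maps each byte value 0x00-0xFF to the Unicode code point
# with the same number, so the *code points* line up.
latin1_bytes = bytes(range(256))
decoded = latin1_bytes.decode("iso-8859-1")
same_code_points = all(ord(ch) == b for ch, b in zip(decoded, latin1_bytes))

# The *bytes* only match UTF-8 for the 7-bit ASCII range.
def same_bytes_in_both(code_point: int) -> bool:
    ch = chr(code_point)
    return ch.encode("iso-8859-1") == ch.encode("utf-8")

ascii_matches = all(same_bytes_in_both(cp) for cp in range(0x80))
upper_matches = any(same_bytes_in_both(cp) for cp in range(0x80, 0x100))

# How many bytes UTF-8 needs for the upper half of ISO-8859-1.
upper_utf8_lengths = {len(chr(cp).encode("utf-8")) for cp in range(0x80, 0x100)}

print(same_code_points, ascii_matches, upper_matches, upper_utf8_lengths)
```

So the first 256 Unicode code points do correspond to ISO-8859-1, but once they are serialized as UTF-8 only the ASCII half keeps the same bytes.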

    nozdormu on
  • typhoon Registered User regular
    edited October 2021
    .

    typhoon on
  • Just_Bri_Thanks Seething with rage from a handbasket. Registered User, ClubPA regular
    edited December 2005
    typhoon wrote:
    The only characters ISO-8859-1 and UTF-8 encode with the same bytes are the 7-bit ASCII set. If you want to represent the other half of ISO-8859-1, it takes two bytes per character. UTF-8 can't use a single byte for those characters because it has to ensure that, as the Wikipedia page says, "no byte sequence of one character is contained within a longer byte sequence of another character."

    For example, if you used an extended character like an e with an acute accent (é) in a word like "resumé," the first five characters would be encoded the same (since they're just ASCII), but the last one would be E9 in ISO-8859-1 and C3 A9 in UTF-8. Since this page is encoded in ISO-8859-1, changing the encoding to UTF-8 (or just about anything else, really) will screw up that letter. Try it.

    I don't see why the thread starter thinks leaving it undefined will help weird characters display correctly. Imagine I have my default character encoding set to UTF-8 and you have it set to ISO-8859-1. I type a post using a special character like é, and it gets sent to the database and stored as C3 A9 (unless, of course, my UA converts it to the right format when it submits the form, which it won't do because it doesn't know what the "right format" is if the forum doesn't define it). Then you view it, and it gets viewed as the same bytes in ISO-8859-1 (Ã©), and then you're wondering what the fuck a resumÃ© is.

    As it stands now, the forum overrides my encoding by specifying its own, giving us some common ground. This is hardly "bad form." Taking that away is pretty stupid. Not defining a character set is basically the worst thing you can do for "internationalisation reasons." The only change here that wouldn't be a regression is changing the default encoding to a more universal one, like a Unicode encoding.

    That was well thought out and written, and I confess to not having given the whole thing as much thought.
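
For anyone who wants to reproduce typhoon's "resumé" example rather than take it on faith, here is a minimal Python sketch. It only uses the standard codecs; the variable names are invented for the illustration. It prints the bytes each encoding produces and then shows what happens when bytes written in one encoding are read back as the other.

```python
word = "resumé"

latin1 = word.encode("iso-8859-1")
utf8 = word.encode("utf-8")

print(latin1.hex(" "))  # bytes when the word is stored as ISO-8859-1
print(utf8.hex(" "))    # bytes when the word is stored as UTF-8

# A UTF-8 post read back by a browser that assumes ISO-8859-1:
misread = utf8.decode("iso-8859-1")
print(misread)

# The opposite mismatch: these ISO-8859-1 bytes are not valid UTF-8,
# so a strict decoder refuses them instead of guessing.
try:
    latin1.decode("utf-8")
    strict_decode_failed = False
except UnicodeDecodeError:
    strict_decode_failed = True
print(strict_decode_failed)
```

Either direction of mismatch damages the accented letter, which is the argument for declaring one encoding and sticking to it.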

    Just_Bri_Thanks on
    ...and when you are done with that; take a folding
    chair to Creation and then suplex the Void.
  • ask_lesko Registered User regular
    edited December 2005
    typhoon wrote:
    The only characters ISO-8859-1 and UTF-8 encode with the same bytes are the 7-bit ASCII set. If you want to represent the other half of ISO-8859-1, it takes two bytes per character. UTF-8 can't use a single byte for those characters because it has to ensure that, as the Wikipedia page says, "no byte sequence of one character is contained within a longer byte sequence of another character."

    For example, if you used an extended character like an e with an acute accent (é) in a word like "resumé," the first five characters would be encoded the same (since they're just ASCII), but the last one would be E9 in ISO-8859-1 and C3 A9 in UTF-8. Since this page is encoded in ISO-8859-1, changing the encoding to UTF-8 (or just about anything else, really) will screw up that letter. Try it.

    I don't see why the thread starter thinks leaving it undefined will help weird characters display correctly. Imagine I have my default character encoding set to UTF-8 and you have it set to ISO-8859-1. I type a post using a special character like é, and it gets sent to the database and stored as C3 A9 (unless, of course, my UA converts it to the right format when it submits the form, which it won't do because it doesn't know what the "right format" is if the forum doesn't define it). Then you view it, and it gets viewed as the same bytes in ISO-8859-1 (Ã©), and then you're wondering what the fuck a resumÃ© is.

    As it stands now, the forum overrides my encoding by specifying its own, giving us some common ground. This is hardly "bad form." Taking that away is pretty stupid. Not defining a character set is basically the worst thing you can do for "internationalisation reasons." The only change here that wouldn't be a regression is changing the default encoding to a more universal one, like a Unicode encoding.

    (Edit: I should add that, if you encode special characters as their HTML entities, they'll display correctly regardless of the character set. So &eacute; will display as é no matter what you change the encoding to. So you can get some foreign characters like 럇 in this way.)
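
The entity trick in that edit can be sketched in Python with the standard `html` module. This is just an illustration of the idea, not how phpBB handles posts, and `to_ascii_with_references` is a hypothetical helper name. Non-ASCII characters are replaced with numeric character references, which leaves pure ASCII bytes that read the same under either encoding.

```python
import html

# Named and numeric character references decode to the same character,
# whatever encoding the page bytes were served in.
print(html.unescape("resum&eacute;"))
print(html.unescape("&#233;") == html.unescape("&eacute;"))

def to_ascii_with_references(text: str) -> bytes:
    """Encode text as pure ASCII, replacing anything else with &#N; references."""
    return text.encode("ascii", errors="xmlcharrefreplace")

original = "resumé 럇"
safe = to_ascii_with_references(original)
print(safe)

# Pure-ASCII bytes decode identically under ISO-8859-1 and UTF-8,
# and unescaping the references recovers the original text.
same_either_way = safe.decode("iso-8859-1") == safe.decode("utf-8")
round_trip = html.unescape(safe.decode("ascii"))
print(same_either_way, round_trip == original)
```

The cost is that the stored text is no longer plain text: anything that searches or truncates posts has to understand the references.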

    This is correct. But to really make the change to UTF-8, you also need to set the HTTP Content-Type entity header to "text/html; charset=UTF-8". Otherwise some browsers behave goofy when they get conflicting information (HTTP header says 8859-1 & meta tag says UTF-8). Setting the HTTP header also instructs the UA to encode data (like in a POST) in the same charset that the server sent.
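
One way to catch the header/meta mismatch described here is a small check like the Python sketch below. It is a simplified illustration: the function names are hypothetical, the regular expression is a rough approximation of what a browser's meta prescan does, and real pages can declare a charset in ways it won't catch.

```python
import re
from email.message import Message
from typing import Optional

def header_charset(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type header value, lowercased."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()

# Rough match for <meta ... charset=...>; not a full HTML parser.
META_RE = re.compile(rb'<meta[^>]+charset=["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE)

def meta_charset(html_bytes: bytes) -> Optional[str]:
    """Return the charset declared in the first matching meta tag, lowercased."""
    match = META_RE.search(html_bytes)
    return match.group(1).decode("ascii").lower() if match else None

def charsets_conflict(content_type: str, html_bytes: bytes) -> bool:
    """True only when both places declare a charset and they disagree."""
    from_header = header_charset(content_type)
    from_meta = meta_charset(html_bytes)
    return from_header is not None and from_meta is not None and from_header != from_meta

page = b'<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">'
print(charsets_conflict("text/html; charset=UTF-8", page))
print(charsets_conflict("text/html; charset=ISO-8859-1", page))
```

The comparison is a plain string match, so aliases such as "latin1" and "iso-8859-1" would be reported as a conflict even though browsers treat them as the same encoding.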

    EDIT: I should also mention that there would have to be server-side changes to the posting system to notify PHP that the content is coming in as UTF-8 rather than 8859-1. Not sure how this works in PHP; in Java you have to make a call to request.setCharacterEncoding("UTF-8") before you read any of the parameters from the request object. I've implemented multi-language websites including languages like Chinese & Arabic and it's non-trivial.
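
To show why the server has to be told the encoding before it reads the parameters, here is a rough Python analogue using `urllib.parse` from the standard library. It is not PHP and not the Java servlet call mentioned above, and the form field name `message` is made up for the example. The same POST body is decoded twice, once with the right charset and once with the old one.

```python
from urllib.parse import parse_qs, urlencode

# What a browser would send for a form field when the page was served as UTF-8.
body = urlencode({"message": "resumé"}, encoding="utf-8")
print(body)

# Server told the right encoding: the text round-trips.
as_utf8 = parse_qs(body, encoding="utf-8")["message"][0]

# Server still assuming ISO-8859-1: each UTF-8 byte becomes its own character.
as_latin1 = parse_qs(body, encoding="iso-8859-1")["message"][0]

print(as_utf8)
print(as_latin1)
```

The bytes on the wire are identical in both cases. Only the server's assumption changes, which is why the fix has to be made in the posting code as well as in the page markup.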

    ask_lesko on
    Get free money from the government to open up a coffee shop!