Mistrealm

Programming

Encoding

What is a character encoding?

If you are curious about how we got to where we are, see Brief History of Character Codes.

When you see a character on your screen, it must seem pretty self-explanatory: "Oh, there is the letter J". For your computer to know what character to show you, some oddly complex processes have to go on behind the scenes. Every character is stored as a number, and the mapping between characters and numbers has to be agreed upon. Early on, that mapping was standardized as a 128-character set called ASCII:

http://en.wikipedia.org/wiki/ASCII

This encoding is still fairly common, especially on older computer systems. Unfortunately, ASCII is too simple to represent all the characters in use today.

ASCII is being replaced by UTF-8, an encoding of the Unicode standard:

http://en.wikipedia.org/wiki/UTF-8

This newer encoding is backward compatible with ASCII (any plain ASCII text is already valid UTF-8), and it also supports all the different international character sets. One of the few downsides is that a character no longer always takes up a single byte of data; it can take up to four. This is the price we pay for gaining access to international character sets.
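To make the "up to four bytes" point concrete, here is a minimal Python sketch. The helper name `utf8_length` and the sample characters are my own choices for illustration, not part of any standard.

```python
# Count how many bytes a single character needs once encoded as UTF-8.
# `utf8_length` is a hypothetical helper name used only for this sketch.
def utf8_length(ch: str) -> int:
    return len(ch.encode("utf-8"))

# Sample characters, written as escapes so the file itself stays plain ASCII.
samples = {
    "J": "Latin capital letter J (plain ASCII)",
    "\u00e9": "e with acute accent",
    "\u20ac": "euro sign",
    "\U0001F600": "grinning face emoji",
}

for ch, description in samples.items():
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} {description}: {len(encoded)} byte(s), hex {encoded.hex()}")

# Plain ASCII text produces exactly the same bytes in both encodings,
# which is what backward compatibility means here.
assert "Hello".encode("ascii") == "Hello".encode("utf-8")
```

The ASCII letter still fits in one byte, while the accented letter, the euro sign, and the emoji need two, three, and four bytes respectively.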

If a page does not say otherwise, your browser will probably default to the ISO-8859-1 character set, a single-byte encoding that contains the characters used in Western European languages.

What does that mean for my web page?

The UTF-8 encoding is more flexible than ISO-8859-1 and will probably become the standard at some point, but for the moment we have to declare the character encoding in our web pages.
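Why bother declaring the encoding at all? Because the same bytes mean different things under different encodings. The following is a rough Python sketch of what can go wrong when a page saved as UTF-8 is read as ISO-8859-1. The sample word and the helper name `decode_as` are assumptions made up for this illustration.

```python
# A page author saves the word "café" as UTF-8.
original = "caf\u00e9"
page_bytes = original.encode("utf-8")  # the é becomes two bytes: 0xC3 0xA9

# `decode_as` is a hypothetical helper standing in for a browser
# that interprets the raw bytes using a given encoding.
def decode_as(raw: bytes, encoding: str) -> str:
    return raw.decode(encoding)

right = decode_as(page_bytes, "utf-8")       # matches what the author wrote
wrong = decode_as(page_bytes, "iso-8859-1")  # each byte is read as its own character

print("declared as UTF-8:     ", right)
print("assumed ISO-8859-1:    ", wrong)
```

With the wrong assumption, the single accented letter shows up as two unrelated characters, which is the garbage you sometimes see on pages that leave the encoding undeclared.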

If you are using HTML5, which I recommend, use this one in your HEAD section:

If you are using XML, use this one at the top of your document:

<?xml version="1.0" encoding="ISO-8859-1"?>
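As a small check of what that declaration does, here is a sketch using Python's standard `xml.etree.ElementTree` parser. The `<word>` element and the helper names are made up for this example; the point is only that the parser relies on the declaration to interpret the byte 0xE9.

```python
import xml.etree.ElementTree as ET

# The same one-element document, with and without the encoding declaration.
# The byte 0xE9 is "é" in ISO-8859-1 but is not valid UTF-8 on its own.
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?><word>caf\xe9</word>'
undeclared = b'<word>caf\xe9</word>'

# Hypothetical helper: return the text of the root element.
def root_text(raw: bytes) -> str:
    return ET.fromstring(raw).text

# Hypothetical helper: report whether the parser accepts the bytes at all.
def parses_ok(raw: bytes) -> bool:
    try:
        ET.fromstring(raw)
        return True
    except ET.ParseError:
        return False

print(root_text(declared))    # the declaration tells the parser how to read 0xE9
print(parses_ok(undeclared))  # without it, XML assumes UTF-8 and rejects the byte
```

Without a declaration, an XML parser assumes UTF-8, so the declaration has to match the encoding the file was actually saved in.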

If you are using something else, use this one in your HEAD section:

Some further information:

Character encodings (w3)
Character encoding (wiki)
Character encodings in HTML (wiki)
HTML ISO-8859-1 Reference (w3schools)

Related topic DocType.
