When computers were first invented, no one thought that more than a handful of characters would be needed. It was decided that 256 characters (1 byte per character) would be enough. Unfortunately, this original character set does not include six of the special characters used in Turkish: ı (small dotless I) and İ (large dotted I); ğ (soft g) and Ğ (soft G); ş (s cedilla) and Ş (S cedilla).
What's the solution?
To get round this problem, new character sets have been created. The original character set is now called Latin 1 (ISO-8859-1). For Turkish you can use Latin 5 (ISO-8859-9). Latin 5 replaces six rarely needed Icelandic characters in the Latin 1 character set with Turkish ones.
Computers sold in Turkey are set up to use the Latin 5 character set straight from the keyboard. In our case, we will have an English keyboard and computer and will be using Latin 5 only for displaying web documents. To work with this character set, place the following line in the header of your web page:
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-9" />
Alternatively, if you save the page as UTF-8 (see the Unicode discussion further down), use this line instead:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
With the Latin 5 declaration, the new characters will display properly in your web browser, but they will look different in your text editor (or whatever tool you use to write your documents) if it assumes Latin 1: for example, ı appears there as ý.
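The difference between what the browser shows and what the editor shows can be reproduced in a few lines. This is only a sketch in Python (my choice of language, not something this page depends on); the sample word and the variable names are made up for illustration, and `iso8859_9` and `latin_1` are Python's codec names for Latin 5 and Latin 1.

```python
# Hypothetical page fragment; the Turkish word is only an illustration.
page = '<p>ışık</p>'

# The bytes as they would sit in a file saved as Latin 5 (ISO-8859-9).
latin5_bytes = page.encode('iso8859_9')

# A browser that honours charset=ISO-8859-9 decodes the bytes as Latin 5.
as_browser = latin5_bytes.decode('iso8859_9')

# An editor that assumes Latin 1 reads the very same bytes differently.
as_editor = latin5_bytes.decode('latin_1')

print(as_browser)
print(as_editor)
```

The bytes in the file never change; only the character set used to interpret them does.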
The table shows how the Turkish characters missing in Latin 1 are assigned in the two character sets:
| Code | Latin 1 | Latin 5 |
|---|---|---|
| 0253 | ý | ı |
| 0221 | Ý | İ |
| 0240 | ð | ğ |
| 0208 | Ð | Ğ |
| 0254 | þ | ş |
| 0222 | Þ | Ş |
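A quick way to see this reassignment for yourself is to decode the same byte under both character sets. The sketch below uses Python, which is an assumption on my part; the helper name `decode_both` is made up for illustration.

```python
# The six codes that differ between Latin 1 and Latin 5.
codes = [253, 221, 240, 208, 254, 222]


def decode_both(code):
    """Return the (Latin 1, Latin 5) characters stored at one byte value."""
    raw = bytes([code])
    return raw.decode('latin_1'), raw.decode('iso8859_9')


rows = [(code, *decode_both(code)) for code in codes]

for code, latin1, latin5 in rows:
    # Print the code zero-padded to four digits, as in the table above.
    print(f'{code:04d}  {latin1}  {latin5}')
```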
For reference, here are all the characters that won't be on your keyboard, and how to type them:
| Turkish Character | Name | Latin 5 | Shortcut |
|---|---|---|---|
| ı | small dotless I | 0253 | |
| I | large dotless I | I | |
| i | small dotted i | i | |
| İ | large dotted I | 0221 | |
| ö | o with diaeresis | 0246 | Ctrl+:, o |
| Ö | O with diaeresis | 0214 | Ctrl+:, O |
| ü | u with diaeresis | 0252 | Ctrl+:, u |
| Ü | U with diaeresis | 0220 | Ctrl+:, U |
| ğ | yumuşak g | 0240 | |
| Ğ | yumuşak G | 0208 | |
| ç | c cedilla | 0231 | Ctrl+,, c |
| Ç | C cedilla | 0199 | Ctrl+,, C |
| ş | s cedilla | 0254 | |
| Ş | S cedilla | 0222 | |
Not forgetting the circumflex characters used in some words of Arabic origin:
| Turkish Character | Name | Latin 5 | Shortcut |
|---|---|---|---|
| â | a circumflex | 0226 | Ctrl+^, a |
| û | u circumflex | 0251 | Ctrl+^, u |
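If you want to double-check the Latin 5 codes in the two reference tables, something along these lines should do it. It is a sketch in Python (an assumption, since this page uses no programming language), and the names `reference` and `latin5_code` are invented for the example.

```python
# Codes copied from the two reference tables above.
reference = {
    'ı': 253, 'İ': 221, 'ö': 246, 'Ö': 214, 'ü': 252, 'Ü': 220,
    'ğ': 240, 'Ğ': 208, 'ç': 231, 'Ç': 199, 'ş': 254, 'Ş': 222,
    'â': 226, 'û': 251,
}


def latin5_code(ch):
    """Return the single byte value that Latin 5 assigns to a character."""
    return ch.encode('iso8859_9')[0]


# Collect any character whose table code disagrees with the codec.
mismatches = {
    ch: (expected, latin5_code(ch))
    for ch, expected in reference.items()
    if latin5_code(ch) != expected
}

print(mismatches or 'all codes agree')
```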
If there isn't a keyboard shortcut for the character you want, you can type it by holding down the Alt key and entering its four-digit code on the numeric keypad (for example, Alt+0253). Don't be tempted to use Microsoft Word as your text editor in order to assign shortcut keys to the missing characters (although you might want to do this anyway just for typing Turkish documents).
If you like, you can open this table of special characters in a new window.
That's great - but how did you type the first table?
Good question. The first table displays both the Latin 1 and the Latin 5 versions of the re-used characters. This document can't be in Latin 1, otherwise I could not have displayed the Latin 5 characters; but nor can it be in Latin 5, otherwise I could not have displayed the Latin 1 characters.
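The dilemma is easy to demonstrate. In the Python sketch below (the language and the helper name `can_encode` are my own assumptions), a string containing one character from each set cannot be saved in either single-byte character set, but it can be saved as UTF-8, which is discussed next.

```python
# One character that exists only in Latin 1 and one that exists only in Latin 5.
mixed = 'ý ı'


def can_encode(text, codec):
    """Report whether every character of text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


results = {
    codec: can_encode(mixed, codec)
    for codec in ('latin_1', 'iso8859_9', 'utf_8')
}

print(results)
```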
To get all the characters you have to use Unicode, here in its UTF-8 encoding. Unicode is an ongoing project to have all the characters in the world in a single system. It comes in three flavours: UTF-8, UTF-16 and UTF-32. The UTF-8 flavour uses between one and four bytes per character instead of always one. The first 256 Unicode characters are the same as Latin 1, so the special Latin 5 characters have to use codes beyond this range (four-digit codes such as U+0131).
| Character | Latin 5 | Unicode |
|---|---|---|
| ı | 0253 | U+0131 |
| İ | 0221 | U+0130 |
| ğ | 0240 | U+011F |
| Ğ | 0208 | U+011E |
| ş | 0254 | U+015F |
| Ş | 0222 | U+015E |
As you can see from the table, Unicode is a right pain. You are better off using Latin 5 whenever you can.
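For the curious, the Unicode code point of each character, and the number of bytes it takes up in UTF-8, can be looked up with a short Python sketch. Python is an assumption here, and the variable names are only illustrative.

```python
# The six Turkish characters that are missing from Latin 1.
chars = ['ı', 'İ', 'ğ', 'Ğ', 'ş', 'Ş']

# For each character: its code point in U+XXXX form and its UTF-8 length in bytes.
info = [(ch, f'U+{ord(ch):04X}', len(ch.encode('utf_8'))) for ch in chars]

for ch, code_point, size in info:
    print(ch, code_point, size)

# A plain ASCII letter for comparison: it still needs only one byte in UTF-8.
ascii_size = len('a'.encode('utf_8'))
print('a', ascii_size)
```

Each of the six characters takes two bytes in UTF-8, against one byte in Latin 5.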
Acknowledgements
Information on character sets other than Latin 1 is hard to find. If you are using Windows 2000 / NT / XP, the Unicode character set is available in your Character Map.
Resource: http://lavocah.org/turkce/special.html