
unicode (UTF8) support patch (alpha)

Posted: 26 Jul 2005 23:25
by gpsoft
Hi everybody:
Some information from readme:

This patch enables partial Unicode support (Unicode strings in UTF-8 encoding) in OpenTTD. Unicode characters from 0x000 to 0x7FF are currently supported, which is enough for Russian, Greek and all the Central European languages that use a different charset. Codes from the basic ASCII table (0x00 to 0x7F) and OpenTTD special codes (0x80 to 0xBF) are mapped to the original OpenTTD charset. UTF-8 encoded characters from 0x080 to 0x7FF are mapped to the extended charset (Unicode space).
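
A rough sketch of how such a mapping could be decoded follows. It is not taken from the patch. It assumes that a byte in 0x80-0xBF counts as an OpenTTD special code only when it is not part of a two-byte sequence, and the names decode_char, CHARSET_ORIGINAL and CHARSET_EXTENDED are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Which glyph table a decoded code belongs to (hypothetical names). */
enum { CHARSET_ORIGINAL = 0, CHARSET_EXTENDED = 1 };

/*
 * Decode one character from s[0..len).
 * On success stores the code in *code and the target table in *charset,
 * and returns the number of bytes consumed (1 or 2).
 * Returns 0 for empty, truncated or out-of-range input.
 */
size_t decode_char(const uint8_t *s, size_t len, uint16_t *code, int *charset)
{
    assert(s != NULL && code != NULL && charset != NULL);
    if (len == 0) return 0;

    uint8_t b0 = s[0];

    /* 0x00..0x7F is basic ASCII; a lone 0x80..0xBF is ASSUMED to be an
     * OpenTTD special code. Both stay in the original charset. */
    if (b0 < 0xC0) {
        *code = b0;
        *charset = CHARSET_ORIGINAL;
        return 1;
    }

    /* 0xC2..0xDF starts a two-byte sequence covering 0x080..0x7FF. */
    if (b0 >= 0xC2 && b0 <= 0xDF) {
        if (len < 2 || (s[1] & 0xC0) != 0x80) return 0;
        *code = (uint16_t)(((b0 & 0x1F) << 6) | (s[1] & 0x3F));
        *charset = CHARSET_EXTENDED;
        return 2;
    }

    /* 0xC0/0xC1 would be overlong; 0xE0 and up are either lead bytes for
     * code points at 0x800 and above or not valid UTF-8 at all. */
    return 0;
}
```

With this layout the bytes D0 90 decode to 0x410 (the first Cyrillic capital letter) in the extended table, while a lone 0x9A byte stays in the original table.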

So, I tested it with Russian fonts, because I had no other fonts :)). I can't really read it, because I don't know Russian, but I found some words ("russian", "english") with web dictionaries.
In this test binary these fonts are loaded at the basic Cyrillic alphabet positions in the Unicode table (0x410-0x44F).
This patch reserves 0x1680 (5760) sprites, so there may be too little space left to load your own newgrf sprites (I didn't raise NUM_SPRITES).
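
The readme does not say how the 0x1680 figure is derived. One guess that matches the number is one sprite per code point in 0x080-0x7FF for each of three font sizes. The helper below is a sketch under that assumption only; sprites_needed is a hypothetical name, not something from the patch.

```c
#include <assert.h>

/* Number of glyph sprites needed for code points first..last (inclusive)
 * when every code point gets one sprite per font size.
 * ASSUMPTION: this is only a guess at how the patch sizes its reservation. */
unsigned long sprites_needed(unsigned long first, unsigned long last,
                             unsigned font_sizes)
{
    assert(first <= last);
    return (last - first + 1) * font_sizes;
}
```

Under this guess, sprites_needed(0x080, 0x7FF, 3) gives 5760 (0x1680), and the same formula for the full 16-bit range, sprites_needed(0x080, 0xFFFF, 3), gives 196224.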

This patch is not tested very well; I used only a few strings to test the basic functionality of the patch (I don't know Russian :) ). For example, I didn't test big and tiny characters.

So, I need this tested, then more translated strings in the Unicode charset with UTF-8 encoding, and other .grf fonts (especially for Central European languages - Latin-2 charset).

See the files and screenshot for more information.

I hope this patch will be useful for people who don't use the Latin-1 charset, and that after bugfixing it will be in one of the next OpenTTD releases.

gpsoft.

Posted: 27 Jul 2005 16:32
by }T{Reme [Q_G]
Great stuff :) Hmmm, have you tested the program with Japanese / Chinese as well? Just wondering if that would work.

Posted: 27 Jul 2005 17:26
by Dextro
This is something that has been missing for some time now, and I remember a thread about its development somewhere in these forums :?

Posted: 28 Jul 2005 10:08
by Hadez
FYI the topic is here: http://tt-forums.net/viewtopic.php?t=9988. Hope to see the Czech and Slovak characters in the game soon :-)

Posted: 28 Jul 2005 10:12
by gpsoft
}T{Reme [Q_G] wrote: Great stuff :) Hmmm, have you tested the program with Japanese / Chinese as well? Just wondering if that would work.
No, I tested only Russian fonts; I have no other fonts. But I think Japanese characters have code positions in the Unicode table above 0x800.
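
For reference, the Hiragana block starts at U+3040 and the CJK Unified Ideographs block starts at U+4E00, so these characters need three UTF-8 bytes and fall outside the 0x000-0x7FF range. The small check below illustrates this; utf8_length and in_patch_range are hypothetical helper names, not part of the patch.

```c
#include <stdint.h>

/* Number of bytes UTF-8 needs for a code point, or 0 if it cannot be encoded. */
int utf8_length(uint32_t cp)
{
    if (cp < 0x80) return 1;
    if (cp < 0x800) return 2;
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0; /* surrogates are not characters */
    if (cp < 0x10000) return 3;
    if (cp <= 0x10FFFF) return 4;
    return 0;
}

/* Non-zero when the code point lies inside the patch's 0x000..0x7FF range. */
int in_patch_range(uint32_t cp)
{
    return cp <= 0x7FF;
}
```

The last basic Cyrillic letter (0x44F) needs two bytes and is in range; a Hiragana letter such as U+3042 needs three bytes and is not.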

But this can be solved in other ways. It is no problem to support true Unicode (full 16-bit chars), but I need more space for sprites. Another way is to do some relocation; I can do that without problems, but then it is not true Unicode.

Please tell me something about Chinese and Japanese fonts. How many characters are used in your alphabet? I know nothing about this.

Posted: 28 Jul 2005 10:23
by gpsoft
Hadez wrote:FYI the topic is here: http://tt-forums.net/viewtopic.php?t=9988. Hope to see the Czech and Slovak characters in the game soon :-)
Yes, I hope so too. This is the reason why I am doing this (I am from Slovakia) :) But I can't find ISO 8859-2 fonts (= .grf sprites), and that is the problem. So somebody must make them, but I don't know how to create proper grf icons.

Posted: 28 Jul 2005 10:25
by Hadez
Maybe you could ask devs?

Posted: 28 Jul 2005 19:27
by }T{Reme [Q_G]
Using the charmap program distributed by default on Windows should help you sort out problems with making grf files for any language you have installed on your system. (Start -> Programs -> Accessories -> System Tools)

Yes, I do think adding character sets as large as Japanese and Chinese will be a problem if you are going to use sprites to draw text. Is it possible to rewrite the code so it uses the system's printing functions to draw strings on the screen instead of using sprites? If you are using true UTF-8 encoding and map the .lang files to the correct character space, it should be.

Posted: 29 Jul 2005 09:24
by gpsoft
It is not possible to use system fonts, because they are OS specific, so we would have problems on other platforms (Linux, OS/2, Mac OS). I don't want to make the game incompatible between operating systems. The second reason not to use the system's fonts is the problem with font sizes. We need 3 font sizes with specific heights.

So, we will use grf charsets and our own fonts. I think it is not difficult to make a charset; possibly we can make it with some automatic tools. But first I need to learn something about drawing fonts.
I will ask the developers after I get home; I have no IRC access now.

About encoding: I am now using UTF-8 encoding, but only the range from 0x0000 to 0x07FF. Of course, it is no problem to do full 16-bit encoding (from 0x0000 to 0xFFFF; UTF-8 can encode more than 16 bits, but all charsets are covered in 16 bits). But I need to allocate more space for sprites, and it needs a lot of recoding.

Maybe I will internally use some different encoding with char remapping from Unicode, but I am sure the input lang files will have true UTF-8 encoding.
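
One way such a remapping could look is a small table of Unicode ranges packed into a dense glyph index, so that sprites are only reserved for the ranges that actually have glyphs. This is just a sketch: the chosen ranges (Latin-1 Supplement plus Latin Extended-A, Greek, Cyrillic) and the names GlyphRange, glyph_ranges and remap_to_glyph are assumptions for illustration, not the patch's design.

```c
#include <stddef.h>
#include <stdint.h>

/* One contiguous run of Unicode code points that has glyphs (hypothetical). */
typedef struct {
    uint16_t first;
    uint16_t last;
} GlyphRange;

/* ASSUMED set of supported ranges, in ascending order. */
static const GlyphRange glyph_ranges[] = {
    { 0x00A0, 0x017F }, /* Latin-1 Supplement (printable part) + Latin Extended-A */
    { 0x0370, 0x03FF }, /* Greek */
    { 0x0400, 0x04FF }, /* Cyrillic */
};

/* Map a Unicode code point to a dense glyph index, or -1 if it has no glyph.
 * Codes below 0xA0 are left to the original charset and also return -1. */
int remap_to_glyph(uint16_t code)
{
    int base = 0;
    for (size_t i = 0; i < sizeof glyph_ranges / sizeof glyph_ranges[0]; i++) {
        const GlyphRange *r = &glyph_ranges[i];
        if (code >= r->first && code <= r->last) {
            return base + (code - r->first);
        }
        base += r->last - r->first + 1; /* skip past this range's slots */
    }
    return -1;
}
```

With these three ranges only 624 glyph slots per font size are needed, and 0x410 maps to index 384. The price is that the internal index is no longer the Unicode value, which is the "not true Unicode" trade-off mentioned earlier in the thread.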

Posted: 29 Jul 2005 20:20
by }T{Reme [Q_G]
Hmm.... I dunno.. but I'm pretty sure it's possible to release a custom TrueType (or fixed) font along with OpenTTD.. and load this file. (It should work platform-independently, by distributing multiple formats of the font file.) I've seen many games use custom font sets.

Don't get me wrong, I agree with your comment about differences in font shapes and sizes. I'm just thinking about those "too many sprites" problems people have been having recently.

Just poked around in the SDL docs... and found that the whole system is already there : http://www.libsdl.org/cgi/docwiki.cgi/SDL_5fttf

Posted: 29 Jul 2005 22:14
by gpsoft
It is easier to do it with .grf files.
The "too many sprites" problems are easy to fix in the future.
Please answer my question about the number of characters in the Chinese and Japanese charsets.

Posted: 29 Jul 2005 23:58
by Nanaki13
I have a couple of Japanese fonts on my system, but I'm no expert on the matter. People say that there are over 40K characters, but only around 2K-3K in daily use.
Those fonts do have a LOT of chars in them.

Posted: 30 Jul 2005 12:16
by orudge
Weren't these all things that were worked on and/or solved by Pipian? It seems a shame to let all that work go to waste; perhaps someone should get in touch with Pipian... old topic here.

Posted: 30 Jul 2005 15:21
by gpsoft
orudge wrote: Weren't these all things that were worked on and/or solved by Pipian? It seems a shame to let all that work go to waste; perhaps someone should get in touch with Pipian... old topic here.
I saw that topic, but he didn't release any patch for it.
Is there any patch released in the SourceForge patch system? I can't find any other patch or information about Unicode.
But I'll try to contact Pipian; I hope he visits this site. Thank you for this information.

Posted: 30 Jul 2005 15:22
by Pipian
Found myself a bit busier than expected when coding the Unicode section. I still have the bitmap fonts prepared (don't worry, they'll handle all the languages that most major fonts can handle, like Slovak and Greek and so forth), and the outline for the bitmap rendering. I just never had the time to finish working it into the string rendering. Seems like a good idea, but it would be nice if we could extend it to the entire charset (as I was trying to do, before finding myself overwhelmed). I'd be willing to trade out the existing code if someone else has a bit more time to finish it up...