UTF-8 text edit handling is subtly broken

Csaboka
Tycoon
Posts: 1202
Joined: 25 Nov 2002 16:30
Location: Tiszavasvári, Hungary

Post by Csaboka

(This would have gone to the development mailing list, but that one is broken right now.)

I've recently committed a fix for issues that occur when the savegame window has accented characters in the savegame name. I don't plan to come back as a regular developer; I just had some free time and a bug report I got via private e-mail :)

Anyway, while testing my fix, I found a more subtle bug that can break anything that uses the text edit window. To trigger it, create a sign whose text reaches the length limit and contains at least one character in the 0x9E..0xFF range. When you click on the sign to edit it again, the new text box will show truncated text, and you may be unable to use backspace.

This is what happens while the bug is triggered:
While you edit the text the first time, the code enforces the length limit correctly and keeps the text in UTF-8, without the leading thorn character. So far, so good.

When you press Enter, checklatin1conv is called, and if the text doesn't contain any code points above 0xFF, it converts everything back to Latin1. This is where the first problem comes in: if I enter a tilde (U+007E, assuming I have a font that contains that glyph so it doesn't get rejected), checklatin1conv will happily keep it, but in non-UTF-8 mode, byte 0x7E is the control code that prints an unsigned word. This problem is masked by the fact that, by default, there are no glyphs for the problematic code points.
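To make the first problem concrete, here is a minimal C sketch of the conversion rule as described above: convert back to Latin1 only if no code point exceeds 0xFF. The function names are hypothetical and this is not the actual checklatin1conv routine; the decoder only handles 1- to 3-byte sequences (enough for everything up to U+FFFF) and does not reject every malformed form.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence at s into *cp. Returns the number of bytes
   consumed, or 0 for a malformed or truncated sequence. Handles 1- to
   3-byte forms only; overlong forms are not rejected here. Because of
   short-circuit evaluation, a NUL in place of a continuation byte stops
   the check before reading past it. */
static size_t utf8_decode(const unsigned char *s, uint32_t *cp)
{
    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
        *cp = ((uint32_t)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0 && (s[1] & 0xC0) == 0x80 && (s[2] & 0xC0) == 0x80) {
        *cp = ((uint32_t)(s[0] & 0x0F) << 12)
            | ((uint32_t)(s[1] & 0x3F) << 6)
            | (s[2] & 0x3F);
        return 3;
    }
    return 0;
}

/* Hypothetical sketch of the rule described for checklatin1conv: if every
   code point in src is <= 0xFF, write the Latin1 bytes to dst (which must
   hold strlen(src) + 1 bytes) and return 1; otherwise return 0 and leave
   dst untouched. Nothing here rejects code points such as 0x7E that TTD
   treats as control codes in non-UTF-8 text, which is exactly the problem. */
static int naive_to_latin1(const char *src, char *dst)
{
    assert(src != NULL && dst != NULL);
    const unsigned char *s = (const unsigned char *)src;
    unsigned char *d = (unsigned char *)dst;
    uint32_t cp;
    size_t n;

    /* First pass: make sure everything fits in Latin1. */
    for (const unsigned char *p = s; *p; p += n) {
        n = utf8_decode(p, &cp);
        if (n == 0 || cp == 0 || cp > 0xFF)
            return 0;
    }
    /* Second pass: each code point becomes a single byte. */
    for (const unsigned char *p = s; *p; p += n) {
        n = utf8_decode(p, &cp);
        *d++ = (unsigned char)cp;
    }
    *d = '\0';
    return 1;
}
```

With this rule, a tilde typed in UTF-8 mode comes out as the single byte 0x7E, while a character such as U+0905 keeps the whole string in UTF-8.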

When you reopen the edit box, a second problem appears. Originally, the CreateTextInputWindow procedure copied the original text into the edit buffer, but this is patched to call TextHandler instead, so that the contents of the edit buffer are in UTF-8. When our text handler routine sees a character between 0x9E and 0xFF, it converts it to 0xE0xx. As a result, a character that originally needed only two bytes to encode now takes three. Even though the text box keypress code made sure we didn't overrun our buffer, we can still get a buffer overrun because the text suddenly grows.

The problem is made worse by CreateTextInputWindow, which tries to truncate the string to the maximum length by writing a null terminator at the last allowed position. This can cut a UTF-8 sequence in half, which confuses the edit box keypress handler.
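A minimal C sketch of both effects, assuming "0xE0xx" means the code point U+E000 plus the original byte, and treating every byte below 0x9E as its own code point (TTD control codes are ignored). converted_len shows how much space the re-encoded text needs; utf8_truncate is a hypothetical, boundary-safe alternative to writing the null terminator at a fixed position. It backs up over UTF-8 continuation bytes (bit pattern 10xxxxxx) so the cut always falls between whole characters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bytes needed to encode cp in UTF-8 (BMP only, which covers U+E0xx). */
static size_t utf8_len(unsigned cp)
{
    if (cp < 0x80)
        return 1;
    if (cp < 0x800)
        return 2;
    return 3;
}

/* UTF-8 length (excluding the NUL) of a stored Latin1 string after the
   mapping described above: bytes 0x9E..0xFF become U+E09E..U+E0FF
   (3 bytes each), everything else is treated as its own code point. */
static size_t converted_len(const char *stored)
{
    size_t total = 0;
    for (const unsigned char *p = (const unsigned char *)stored; *p; p++) {
        unsigned cp = (*p >= 0x9E) ? 0xE000u + *p : *p;
        total += utf8_len(cp);
    }
    return total;
}

/* Truncate a NUL-terminated UTF-8 string to at most max_bytes bytes
   (excluding the terminator) without splitting a multi-byte sequence. */
static void utf8_truncate(char *buf, size_t max_bytes)
{
    assert(buf != NULL);
    if (strlen(buf) <= max_bytes)
        return; /* already fits */
    size_t cut = max_bytes;
    /* buf[cut] is the first byte to be dropped. While it is a continuation
       byte, its character started earlier, so move the cut back to that
       character's lead byte and drop the whole character. */
    while (cut > 0 && ((unsigned char)buf[cut] & 0xC0) == 0x80)
        cut--;
    buf[cut] = '\0';
}
```

For example, "café" takes 5 bytes while it is being edited (é is U+00E9, two bytes), but 6 bytes after the round trip through Latin1, because é comes back as U+E0E9 (EE 83 A9). Cutting that 6-byte string at a fixed 5 bytes would leave the dangling bytes EE 83; utf8_truncate drops the whole character instead.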

I'm not sure what the best way to solve this issue would be. It seems we would need to patch both checklatin1conv and our text handler, but we may be able to get away with patching only one of them.

Checklatin1conv should check for code points that have a different meaning in TTD than in Unicode. Alternatively, it could simply refuse to convert anything back to Latin1 if it finds code points above 0x7A. This would still convert plain English texts back to Latin1, but would keep every accented string in UTF-8 and avoid the conversion problem.
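The second suggestion is cheap to check. Here is a hypothetical C sketch (the function name is made up). Since every byte of a multi-byte UTF-8 sequence is 0x80 or above, scanning the bytes for anything above 0x7A is equivalent to scanning the code points, and any string that passes is already byte-identical in UTF-8 and Latin1, so no conversion step is needed.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the NUL-terminated UTF-8 string may be stored as Latin1
   under the stricter rule (every code point <= 0x7A), else 0. This also
   rejects 0x7B..0x7E ('{', '|', '}', '~'), which TTD reads as control
   codes in non-UTF-8 text. */
static int can_store_as_latin1_strict(const char *utf8)
{
    assert(utf8 != NULL);
    for (const unsigned char *p = (const unsigned char *)utf8; *p; p++)
        if (*p > 0x7A)
            return 0;
    return 1;
}
```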

Our text handler could also be more selective about converting characters to the 0xE0xx range. If it did that only for characters that have a different meaning in TTD, it would save some memory and help avoid this bug.
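A hypothetical C sketch of a more selective text handler. Which TTD glyphs actually differ from Unicode is not listed here, so the decision is left to a caller-supplied callback. map_all reproduces the current behavior described above, and map_none is the opposite extreme; they are only there to show the size difference, not as the real table. Bytes below 0x9E are treated as their own code point, which ignores TTD control codes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: nonzero if TTD's glyph for byte b differs from
   the Unicode character U+00xx and therefore needs the U+E0xx form. */
typedef int (*glyph_differs_fn)(unsigned char b);

static int map_all(unsigned char b)  { (void)b; return 1; } /* current behavior */
static int map_none(unsigned char b) { (void)b; return 0; } /* other extreme */

/* Encode one stored byte as UTF-8 at out and return the bytes written.
   Bytes 0x9E..0xFF go to U+E09E..U+E0FF (3 bytes) only if differs(b);
   otherwise they keep their Latin1 code point (2 bytes). */
static size_t encode_stored_byte(unsigned char b, glyph_differs_fn differs,
                                 unsigned char *out)
{
    unsigned cp = (b >= 0x9E && differs(b)) ? 0xE000u + b : b;
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}

/* Convert a NUL-terminated stored string into dst, which must hold
   3 * strlen(src) + 1 bytes. Returns the UTF-8 length without the NUL. */
static size_t convert_stored(const char *src, glyph_differs_fn differs, char *dst)
{
    assert(src != NULL && dst != NULL && differs != NULL);
    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *)src; *p; p++)
        n += encode_stored_byte(*p, differs, (unsigned char *)dst + n);
    dst[n] = '\0';
    return n;
}
```

Every character the callback leaves alone saves one byte, and a string whose characters are all left alone round-trips to exactly the length it had while being edited.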

Does anyone have other ideas?

There is also a problem with advorders and UTF-8. When you want to enter the load percentage, for example, the text input window is called with a limit of 3 bytes, since it only wants up to 3 digits. The problem is that the UTF-8 text input handler reserves 2 bytes for the leading thorn and 1 for the terminating null, so it won't allow any text to be entered at all. The text input window should either have a "digits only" mode where it ignores UTF-8 encoding and thorns, or the advorders code should add three to the text edit limit when UTF-8 mode is enabled.
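The arithmetic behind the advorders problem, as a small C sketch. The names are hypothetical, and it assumes, as the description implies, that outside UTF-8 mode the limit counts only the characters typed. Digits are ASCII, so each one takes exactly one byte.

```c
#include <assert.h>
#include <stddef.h>

/* Overhead of the UTF-8 edit box as described: a leading thorn
   (U+00FE, two bytes in UTF-8: C3 BE) plus the terminating null. */
enum { UTF8_THORN_BYTES = 2, NUL_BYTES = 1 };

/* Bytes left for user text under a given limit. Assumption: outside
   UTF-8 mode there is no overhead. */
static size_t usable_bytes(size_t limit, int utf8_mode)
{
    size_t overhead = utf8_mode ? UTF8_THORN_BYTES + NUL_BYTES : 0;
    return limit > overhead ? limit - overhead : 0;
}

/* Hypothetical helper for the second fix: the limit advorders would pass
   for a field of `digits` characters. */
static size_t advorders_limit(size_t digits, int utf8_mode)
{
    assert(digits > 0);
    return utf8_mode ? digits + UTF8_THORN_BYTES + NUL_BYTES : digits;
}
```

With a limit of 3 in UTF-8 mode, 3 - 2 - 1 leaves no room at all; passing 6 instead leaves exactly the 3 bytes needed for three digits.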
Reality is that which, when you stop believing in it, doesn't go away.—Philip K. Dick