flat assembler
Message board for the users of flat assembler.

One approach to the internationalization.
JohnFound



Joined: 16 Jun 2003
Posts: 3500
Location: Bulgaria
JohnFound
The idea presented here is still a work in progress. I made a small demo, but the whole idea is not implemented yet. (The following code uses UTF-8, so switch your browser encoding.)

The idea is to have a macro that allows string constants to be defined in several languages:
Code:
Istring1 itext <EN:'Hello.'>,    \
               <SP:'Hola.'>,       \
               <DE:'Hallo.'>,      \
               <BG:'Здравейте.'>    


The strings are actually defined in the uninitialized data memory, so initially they are empty. Only space is reserved for every string, enough to contain the longest of its translations.
The strings of the different languages are stored in the initialized data section, in a separate array for every language.
Then, when you want to switch to some language, a small procedure simply copies the whole language array over the real strings, and the program switches to another language. For example, the code below will switch the whole program to English:

Code:
stdcall SetLanguage, 'EN'    
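
A minimal C sketch of the copy-based scheme described above. It is not the itext macro or the SetLanguage procedure from the attachment: the names `languages`, `strings` and `set_language` are made up for illustration, and a single `MAX_LEN` stands in for the per-string "longest translation" size.

```c
#include <string.h>

enum { NUM_STRINGS = 2, MAX_LEN = 32 };  /* MAX_LEN: assumed room for the longest translation */

/* Read-only source arrays, one per language (the initialized data). */
typedef struct {
    const char *code;
    const char *text[NUM_STRINGS];
} Language;

static const Language languages[] = {
    { "EN", { "Hello.", "Goodbye." } },
    { "DE", { "Hallo.", "Auf Wiedersehen." } },
};

/* Writable buffers the program uses directly; zero-filled (empty) at start. */
static char strings[NUM_STRINGS][MAX_LEN];

/* Copy one language's array over the live strings.
   Returns 0 on success, -1 if the language code is unknown. */
int set_language(const char *code)
{
    for (size_t i = 0; i < sizeof languages / sizeof languages[0]; i++) {
        if (strcmp(languages[i].code, code) == 0) {
            for (size_t s = 0; s < NUM_STRINGS; s++) {
                strncpy(strings[s], languages[i].text[s], MAX_LEN - 1);
                strings[s][MAX_LEN - 1] = '\0';
            }
            return 0;
        }
    }
    return -1;
}
```

After `set_language("DE")`, any code that already prints `strings[0]` shows the German text without being changed, which is the transparency this approach is after.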


The attached file contains a small demo application. It simply prints some text in the language given by the command-line argument. Run it without arguments to get the list of supported languages.
The program is compiled for Win32, Linux and KolibriOS (not tested).


Attachment: itext.tar.gz (8 KB)


_________________
Tox ID: 48C0321ADDB2FE5F644BB5E3D58B0D58C35E5BCBC81D7CD333633FEDF1047914A534256478D9
Post 13 Dec 2012, 08:43
f0dder



Joined: 19 Feb 2004
Posts: 3170
Location: Denmark
f0dder
Might work for small hobby projects, but definitely doesn't scale up for something larger - especially if you need other people to do translations. I'm also not a fan of having non-ASCII characters in source code files; for personal projects it's a matter of taste, but when collaborating with a lot of people of different nationalities (and with a wide selection of operating systems, IDEs and text editors) it's simply too much hassle.

Externalizing string data does mean slightly more overhead when dealing with the strings (but this is so little it doesn't matter in practice), and it does require some up-front organizing consideration as well as discipline in key naming - but in practice it's worth it in the long run.

Yeah, it can be annoying that you can't directly see the contents of a string in source code, but if you have decent tooling support for your translations it's not a real issue. (You could also use a scheme where the English/whatever-master-language text is used as the lookup key, but that becomes clumsy for anything but small strings).
Post 13 Dec 2012, 10:09
JohnFound
f0dder, I do not agree with your statement: "but this is so little it doesn't matter in practice". Using external strings and tools requires the application to be designed as easily translatable from the beginning. Also, it limits your programming methods.
The above approach makes translating the program easy, because only the string constants have to be changed. It also does not add overhead to the program. You simply use the strings as always.

The source code is UTF-8 - standard enough IMHO. You only need a UTF-8 compatible editor.
The translation tool that I will make for Fresh IDE will allow all strings defined with the "itext" macro in the project to be edited from one common place and then patched back into the source. So we can have both: easily readable source and easy translation.
Post 13 Dec 2012, 10:42
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 17247
Location: In your JS exploiting you and your system
revolution
Rather than using a copy function, how about using a single master language pointer that points to a set of pointers, one for each string? The advantages here are:
  1. changing languages only requires updating a single pointer
  2. the strings can remain immutable within read only memory
  3. each string length is native and doesn't need to conform to the longest one from each language
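
The three points above can be sketched in C as follows. This is only an illustration of the master-pointer idea, not code from the thread; `strings_en`, `strings_de`, `current`, `set_language` and `tr` are hypothetical names.

```c
#include <string.h>

enum StringId { STR_HELLO, STR_BYE, STR_COUNT };

/* One immutable table of string pointers per language (can live in read-only memory).
   Each string keeps its own natural length. */
static const char *const strings_en[STR_COUNT] = { "Hello.", "Goodbye." };
static const char *const strings_de[STR_COUNT] = { "Hallo.", "Auf Wiedersehen." };

/* The single master pointer; switching language rewrites only this. */
static const char *const *current = strings_en;

/* Returns 0 on success, -1 if the language code is unknown. */
int set_language(const char *code)
{
    if (strcmp(code, "EN") == 0) { current = strings_en; return 0; }
    if (strcmp(code, "DE") == 0) { current = strings_de; return 0; }
    return -1;
}

/* Every use of a string goes through one extra indirection. */
const char *tr(enum StringId id)
{
    return current[id];
}
```

The cost is visible in `tr`: callers have to ask for a string by index instead of referring to it directly.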
Post 13 Dec 2012, 11:34
JohnFound
revolution, I thought about this approach and rejected it, because it would require the user to write the program in a special way (i.e. using pointers and indexes instead of simple pointers to the strings). I wanted to make the internationalization system as transparent as possible for the programmer.
Post 13 Dec 2012, 12:09
revolution
Use macros to build the tables and pointers. The only thing you need to add is a final line to instantiate all the built tables at the end of the file.
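
As a rough analogy in C (not fasm, and not necessarily what is meant here), the "X-macro" idiom builds every table from one list of strings. The names `STRING_LIST`, `AS_ID`, `AS_EN` and `AS_DE` are invented for this sketch.

```c
/* The single list of strings: id, English, German. */
#define STRING_LIST(X) \
    X(STR_HELLO, "Hello.",   "Hallo.") \
    X(STR_BYE,   "Goodbye.", "Auf Wiedersehen.")

/* Three ways to expand one list entry. */
#define AS_ID(id, en, de) id,
#define AS_EN(id, en, de) en,
#define AS_DE(id, en, de) de,

/* "Instantiate" the built tables: one enum of ids and one table per language. */
enum StringId { STRING_LIST(AS_ID) STR_COUNT };

static const char *const strings_en[] = { STRING_LIST(AS_EN) };
static const char *const strings_de[] = { STRING_LIST(AS_DE) };
```

Adding a string means adding one line to `STRING_LIST`; the ids and both tables stay in step automatically.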


Last edited by revolution on 13 Dec 2012, 13:14; edited 1 time in total
Post 13 Dec 2012, 12:20
JohnFound
revolution wrote:
Use macros to build the tables and pointers. The only thing you need to add is a final line to instantiate all the built tables at the end of the file.


I don't get it. Could you explain in more detail?

Post 13 Dec 2012, 13:04
revolution
Post 13 Dec 2012, 13:14
JohnFound
revolution, I use this technique all the time (including in the macro library discussed here). But what exactly do you suggest for the internationalization problem? Imagine a program written in English only (Fresh IDE or FASMW, for example). Now, how do you make it support multiple languages?
My approach makes this task very easy: you only have to edit the strings to contain the proper translations and then insert a call to "SetLanguage" in order to switch to the new language. (Well, in GUI applications you may have to restart the application.)
Post 13 Dec 2012, 14:23
typedef



Joined: 25 Jul 2010
Posts: 2913
Location: 0x77760000
typedef
Why not just load from an .ini file instead? Then, when creating controls for that specific application, you call a function like so:

Code:
lang.ini

[EN]
0000=...
0003=&Hello

[FR]
0000=...
0003=&bonjour
    



To simplify, you can then load the strings for the current locale into an in-memory list at startup, like so:
Code:
; Load all strings for this lang-id into a list. The id can be a hex number
; so as to avoid the compiler defining the string in the data section.
invoke SetLanguage, "EN"
...
invoke GetLocalizedString, 3                  ; returns the string pointer in eax
invoke CreateWindowEx, 0, "button", eax, ...  ; first argument is dwExStyle
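
A minimal C sketch of what such a pair of functions could look like, assuming the whole .ini file has already been read into memory as text. `set_language` and `get_localized_string` are illustrative stand-ins for the SetLanguage and GetLocalizedString calls above, and the fixed table sizes are arbitrary assumptions.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum { MAX_STRINGS = 16, MAX_TEXT = 64 };   /* arbitrary limits for the sketch */

static char loaded[MAX_STRINGS][MAX_TEXT];

/* Load every "NNNN=text" entry of section [lang] from INI text held in memory.
   Returns the number of strings loaded. */
int set_language(const char *ini, const char *lang)
{
    char line[128];
    int in_section = 0, count = 0;

    memset(loaded, 0, sizeof loaded);
    while (*ini != '\0') {
        /* Copy one line (truncated if too long) and advance past it. */
        size_t len = strcspn(ini, "\n");
        size_t n = len < sizeof line - 1 ? len : sizeof line - 1;
        memcpy(line, ini, n);
        line[n] = '\0';
        ini += len + (ini[len] == '\n');

        if (line[0] == '[') {
            size_t l = strlen(lang);
            in_section = strncmp(line + 1, lang, l) == 0 && line[1 + l] == ']';
        } else if (in_section) {
            char *eq = strchr(line, '=');
            char *end;
            long id;
            if (eq == NULL) continue;
            id = strtol(line, &end, 10);
            /* Accept only lines whose key is entirely a decimal number. */
            if (end == eq && end != line && id >= 0 && id < MAX_STRINGS) {
                snprintf(loaded[id], MAX_TEXT, "%s", eq + 1);
                count++;
            }
        }
    }
    return count;
}

/* Returns the loaded text for id, or an empty string if there is none. */
const char *get_localized_string(int id)
{
    return (id >= 0 && id < MAX_STRINGS) ? loaded[id] : "";
}
```

With the lang.ini contents shown earlier, loading section "FR" and asking for string 3 would give "&bonjour".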
    
Post 13 Dec 2012, 14:49
JohnFound
typedef, read above. It is explained why not. :)
Post 13 Dec 2012, 16:40
f0dder
JohnFound wrote:
f0dder, I do not agree with your statement: "but this is so little it doesn't matter in practice". Using external strings and tools requires the application to be designed as easily translatable from the beginning.
It's easier to handle if you design properly from the beginning, but it's not impossible to change at a later stage - especially not if you have decent tooling support for locating and externalizing strings :)

JohnFound wrote:
Also, it limits your programming methods.
Sure, it's a tradeoff - but it's not something that I feel limiting. Rather, I find that it makes everything a helluvalot easier in the long run.

JohnFound wrote:
The above approach makes translating the program easy, because only the string constants have to be changed.
With your method, you need to hunt through all your assembly files for strings, and have no easy way to get an overview of which strings have been translated to which languages etc. OK, you can write a tool that scans for the itext macro etc., but that's more complex and less foolproof than dealing with a purpose-built external format. Especially if you want to support external translators, which would mean import/export functionality.

JohnFound wrote:
It also does not add overhead to the program. You simply use the strings as always.
To nit-pick, it adds memory overhead for the "current strings" buffer and a bit of CPU overhead for the copying ;) - but of course this is pretty irrelevant, and there'd be overhead when dealing with external strings as well (though you can load just the languages you need, rather than your approach that has all languages in-memory). I've yet to see situations where localizable strings are involved in performance-sensitive code, anyway.

JohnFound wrote:
The source code is UTF-8 - standard enough IMHO. You only need a UTF-8 compatible editor.
Not everybody uses that (yes, even in 2012), and you also have to guarantee that encoding isn't messed up in flight, etc. Easy enough if you're a single developer, a small team, or have very organized ways of working... that's not always the case in the real world, though :)

The project I'm working on probably has somewhat larger scope, though - I'm not sure how many languages we're currently translating to, but there's some forty languages on the target list, and we have somewhat more than 1000 strings, I believe (which is only that low because most other stuff is content-managed). But that's a web project (yes, I sold my soul for god-$).

For desktop applications, I still prefer externalized localization though. There's still the argument that it's plain easier to deal with that way, especially when you need external translators, but also because it makes it possible to deploy new translations without distributing new binaries, and it also means you can reload the localization without restarting your application. Oh, and it's easier for end-users to do their own localizations that way, too :)

_________________
Image - carpe noctem
Post 13 Dec 2012, 18:11
AsmGuru62



Joined: 28 Jan 2004
Posts: 1408
Location: Toronto, Canada
AsmGuru62
I agree with external language strings file.
Very convenient.
But I want it to be a UNICODE file.
Post 13 Dec 2012, 18:37
f0dder
AsmGuru62 wrote:
I agree with external language strings file.
Very convenient.
But I want it to be a UNICODE file.
Define "UNICODE"? :)

I personally prefer UTF-8 for storage/exchange. It does mean you'll likely need to convert to an internal format, but it's (often) easier to deal with than whatever native format.
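
To illustrate the "convert to an internal format" step, here is a minimal sketch of a UTF-8 decoder in C that turns one byte sequence into a Unicode code point. The function name `utf8_decode` and its interface are assumptions made for this example; real projects would usually rely on an existing library or OS call.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence starting at s (n bytes available).
   Stores the code point in *cp and returns the number of bytes consumed,
   or 0 if the input is truncated or not valid UTF-8. */
size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    /* Smallest code point that legitimately needs a sequence of each length. */
    static const uint32_t min_value[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t len;
    uint32_t c;

    if (n == 0) return 0;
    if (s[0] < 0x80)                { len = 1; c = s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; c = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; }
    else return 0;                  /* stray continuation or invalid lead byte */
    if (len > n) return 0;

    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;   /* not a continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }
    if (c < min_value[len]) return 0;                           /* overlong encoding */
    if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) return 0; /* out of range */
    *cp = c;
    return len;
}
```

Calling it in a loop over a buffer yields the code points, which can then be stored in whatever native form the program uses.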

Also, JohnFound, I'm not trying to be a grumpy old flaming man :) - just offering some alternate perspectives. I do like the (programmer) ease of use, and it's convenient that you load strings to .bss so you don't need to deal with indirection or manual GetTranslatedString() calls. It's a decent enough thing for smaller projects, especially if you're going to handle all the translations yourself (I'm still not fond of non-ASCII in source code files, but for smaller projects/groups there's not many good reasons against it, so I'll put that down to personal prejudice :)).

Post 13 Dec 2012, 18:46
JohnFound
I am not against external files with language packs. I know they make it easier to deploy new languages, along with all the other advantages. And when I find some way to make FASM write files during compilation, I can make macros that export the string tables directly to external language files instead of the data section of the program. ;)
But still, IMHO, it is better for the string definitions to stay in the source code. (I really hate tradeoffs.)
I think this method can work on big projects as well.
Post 13 Dec 2012, 19:20
AsmGuru62
UNICODE for me is 16 bits per character.
I am not sure, however, how to make these files.
Say, I need a file with some Japanese text -- how do I do this?
Post 13 Dec 2012, 21:36
f0dder
AsmGuru62 wrote:
UNICODE for me is 16 bits per character.
That's just one particular encoding (UCS-2), which can't even handle the entire Unicode set. Windows uses UTF-16.
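
The difference can be sketched in C: UTF-16 stores code points above U+FFFF as a pair of 16-bit units (a surrogate pair), which fixed-width UCS-2 cannot express. The function name `utf16_encode` is made up for this illustration.

```c
#include <stdint.h>

/* Encode one code point as UTF-16. Returns the number of 16-bit units
   written to out (1 or 2), or 0 if cp is not a valid Unicode scalar value. */
int utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;   /* range reserved for surrogates */
    if (cp > 0x10FFFF) return 0;                  /* beyond the Unicode code space */
    if (cp < 0x10000) {                           /* fits in one unit, same as UCS-2 */
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;                                /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));     /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));   /* low surrogate: bottom 10 bits */
    return 2;
}
```

For example, U+65E5 fits in a single unit, while U+20BB7 (a CJK ideograph outside the first 65,536 code points) needs two.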

AsmGuru62 wrote:
I am not sure however, how to make these files.
Say, I need a file with some Japanese text -- how do I do this?
Any reasonable text editor that can handle various encodings will do - among the free ones, Notepad++ can handle it. You need font support as well, of course, and some input method :)

Post 13 Dec 2012, 21:44
AsmGuru62
You want to convince me that all characters known to man will not fit into 16 bits?
Post 14 Dec 2012, 01:37
LocoDelAssembly
Your code has a bug


Joined: 06 May 2005
Posts: 4633
Location: Argentina
LocoDelAssembly
JAPANESE FOR TODAY - GAKKEN wrote:
2. Kanji
i) Although there are said to be some 48,000 kanji characters in existence, only about 5,000 to 10,000 are commonly used.
And this is just Japanese, Chinese, and whatever other languages using these characters (if any).
Post 14 Dec 2012, 02:00
typedef
JohnFound wrote:
typedef, read above. It is explained why not. :)

But your way uses too much time and code, just like Microsoft spending 90% of their time coding security features in MS-Word :D
Post 14 Dec 2012, 02:39
Copyright © 1999-2020, Tomasz Grysztar.
