flat assembler
Message board for the users of flat assembler.

Fasm Multilangage Encoding

hopcode



Joined: 04 Mar 2008
Posts: 563
Location: Germany
hopcode 19 Mar 2010, 10:15
Hello everybody,
I am working on Unicode and encodings for fasmlab. The problems that arise should not be underestimated.
OK, I was wondering about something like the following drawing, that is to say, a multilanguage version of fasm.
The point would be to avoid parsing the different encodings and languages of the source files
(that is, building a complex encoding awareness into fasm).

Image

Might this be possible in the near (or far) future?

Cheers, :D
hopcode
edfed



Joined: 20 Feb 2006
Posts: 4330
Location: Now
edfed 19 Mar 2010, 11:22
Totally useless.
If someone wants to code in asm, he should accept having to learn the alphabetic characters and English.
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20299
Location: In your JS exploiting you and your system
revolution 19 Mar 2010, 11:41
Although the opcodes are based upon English they are not really English. I think they should be taken as just what they are; ASCII sequences to form a mnemonic for the user.

What would be other language equivalents of FYL2X?
hopcode 19 Mar 2010, 12:01
revolution wrote:
...opcodes...they should be taken as just what they are; ASCII sequences to form a mnemonic for the user.

The concept is a transliteration of the source code, done by fasm.

Quote:
What would be other language equivalents of FYL2X?

this:
Quote:
فيلءكس

NOTE: You can read it because PHP allows a 1:1 transliteration into a different (Arabic) encoding, sending headers to an encoding-aware browser.

NOTE 2: The alternative is to send &# plus the HTML-encoded character (but the song remains the same).

_________________
⠓⠕⠏⠉⠕⠙⠑
revolution 19 Mar 2010, 12:11
hopcode wrote:
transliteration
So it is not really a language change, but it is a script change? The user still has to know what 'mov' is, just that the three letters 'm', 'o' and 'v' are encoded in a different script?

If so then I don't see how that helps the user in any way.
hopcode 19 Mar 2010, 12:31
revolution wrote:
...So it is not really a language change, but it is a script change?...
(It is a bit more complex than it seems...)

I would say it is a middle way between a transcription (like the change of writing standards in the 9th-11th centuries, e.g. Carolingian) and a transliteration through a script writing system such as Cyrillic.

Quote:
If so then I don't see how that helps the user in any way.

Yes, because you, Tomasz, edfed and I use almost the same encoding, and our files are "ASCII" (with different code pages for different OS languages). If I write a UTF-16 file, or save the same file as UTF-16, I could reopen it in Arabic with the semantics unchanged:
Code:
  mov eax,ecx
    

_________________
⠓⠕⠏⠉⠕⠙⠑
revolution 19 Mar 2010, 12:43
Are you writing this as a pre-preprocessor?

language1("ABCD") ---> fasm_language("mov")
language1("XYZ") ---> fasm_language("fyl2x")

:?:
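
A minimal sketch of such a pre-preprocessor, assuming a plain lookup table that rewrites the first word of each line before the file reaches fasm. The table is hypothetical: only the فيلءكس spelling appears in this thread, and the names ALIASES and pre_preprocess are made up for illustration.

```python
# Hypothetical alias table: localized spelling -> mnemonic that fasm knows.
# Only the first spelling is taken from the thread; the second is invented.
ALIASES = {
    "فيلءكس": "fyl2x",
    "موف": "mov",
}

def pre_preprocess(source):
    """Rewrite the first word of every line if it is a known alias."""
    out = []
    for line in source.splitlines():
        head, sep, tail = line.partition(" ")
        out.append(ALIASES.get(head, head) + sep + tail)
    return "\n".join(out)

if __name__ == "__main__":
    localized = "\n".join(list(ALIASES) + ["ret"])
    print(pre_preprocess(localized))
```

Operands, labels and indented lines are left untouched here; a real tool would need a proper tokenizer rather than a split on the first space.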
hopcode 19 Mar 2010, 13:28
I mean a built-in feature, because fasm could read the encoding and BOM
from the command line, or from the first line of the file (as with HTML headers).
For example, fasm would treat a UTF-16 file this way:
fasm source.asm -d ARABIC
(for Arabic, -d ARABIC is implicit because of the encoding)
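
A minimal sketch of the BOM part of this idea, assuming the tool is handed the raw bytes of the source file. The function name sniff_encoding and the ASCII fallback are assumptions for illustration; nothing here describes existing fasm behaviour.

```python
import codecs

def sniff_encoding(raw, default="ascii"):
    """Return a codec name based on a leading byte order mark, if any."""
    # The UTF-32 LE mark begins with the UTF-16 LE mark, so test UTF-32 first.
    boms = [
        (codecs.BOM_UTF32_LE, "utf-32"),
        (codecs.BOM_UTF32_BE, "utf-32"),
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF16_LE, "utf-16"),
        (codecs.BOM_UTF16_BE, "utf-16"),
    ]
    for bom, name in boms:
        if raw.startswith(bom):
            return name
    return default

if __name__ == "__main__":
    print(sniff_encoding(b"\xfe\xff\x00m\x00o\x00v"))  # utf-16
    print(sniff_encoding(b"mov eax,-1"))                # ascii
```

A file without a mark still needs the encoding from somewhere else, which is where a command-line switch or a first-line declaration would come in.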

And I could do the same in my ISO encoding with this Arabic command line
(simply by switching the encoding, as I wrote above in bold):

Quote:
فاسم صورك.أسم د أرابيك

the file contains this instruction:
فيلءكس

This corresponds to the HTML encoding (single, separate letters, visible if you quote the post and read its source):

ف (F)
ي (Y)
ل (L)
٢ (2)
اكس (X)

The corresponding characters in our encoding:

U+0046 (F)
U+0059 (Y)
U+004C (L)
U+0032 (2)
U+0058 (X)
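
The code points behind both spellings can be listed with a few lines of Python. This is only an inspection aid; the helper name code_points is an assumption, and the Arabic string is the one quoted in this post.

```python
def code_points(text):
    """List the Unicode code point of every character in text."""
    return [f"U+{ord(ch):04X}" for ch in text]

if __name__ == "__main__":
    print(code_points("FYL2X"))
    print(code_points("فيلءكس"))
```

The two strings share no code points, so any equivalence between them has to come from a table, not from the encoding itself.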

...
.
.

_________________
⠓⠕⠏⠉⠕⠙⠑
shoorick



Joined: 25 Feb 2005
Posts: 1614
Location: Ukraine
shoorick 19 Mar 2010, 13:40
Translation of opcodes, as well as transliteration, will lead to a real mess. Maybe when we have a great new CPU with original Arabic mnemonics, then we will learn them ;)

BTW, there has already been a Russian translation of fasm (on the wasm.ru forum); it looked a bit funny :P

_________________
UNICODE forever!
hopcode 19 Mar 2010, 20:58
Quote:
translation of opcodes...
Please read carefully what I wrote: I meant something between transcription and transliteration.
There is a reason why I say "between".
Quote:
a russian translation of fasm (on wasm.ru forum)

What exactly is a "translation" there?
Did you ever think of expressing symbols like "aeiou" or "äüöß"
with your Ukrainian code page? No, I imagine. You would have to switch code pages or,
worst of all, install a West European OS. But the reason is not that your keyboard
cannot do it at that moment; the reason is that a single byte is not enough to express
Cyrillic and West European characters at the same time (apart from the fact
that there are a lot of Cyrillic encodings).
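
The single-byte limit is easy to reproduce. A small sketch, assuming the Windows code pages cp1252 (West European) and cp1251 (Cyrillic) as representatives; the thread does not say which code pages were actually in use.

```python
def fits(text, codec):
    """True if every character of text exists in the given encoding."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

if __name__ == "__main__":
    mixed = "äüöß жзи"  # West European and Cyrillic letters in one string
    for codec in ("cp1252", "cp1251", "utf-16-be"):
        print(codec, fits(mixed, codec))

    # The same byte is a different letter depending on the code page.
    print(b"\xe4".decode("cp1252"), b"\xe4".decode("cp1251"))
```

Neither code page can hold the mixed string, while UTF-16 can, and the byte E4 reads as ä in one code page and as д in the other.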

Now, I don't know the wasm-fasm you mention.
But one can state about it, a priori and with 100% certainty:

    1) it is not international (a multilanguage version)
    2) it is "ASCII"
    3) it is funny (because of what I explained above)
    4) it is a mess by design, not as a result,
    because of the one-byte encoding.

Also, I thought about:

    1) internal UTF-16, which gives fasm internal stability
    (avoiding the meager variable width of UTF-8, which lets the US user save 100 bytes
    while adding another 10000 slowing ones to the parser for Arabic or Japanese users, etc.)
    2) fixed tables, built in or pre-preprocessed in some way (as suggested above)
    3) This leads to Unicode-only source files.
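
The size trade-off in point 1 can be measured directly. A rough sketch with three made-up sample strings; the figures hold for characters in the Basic Multilingual Plane, where UTF-16 uses exactly two bytes per character.

```python
samples = {
    "ascii": "mov eax,-1",
    "arabic": "\u0645\u0648\u0641",            # three Arabic letters
    "japanese": "\u3088\u3046\u3053\u305d",    # the four kana from the table in this post
}

def sizes(text):
    """Return (characters, UTF-8 bytes, UTF-16 bytes without BOM)."""
    return len(text), len(text.encode("utf-8")), len(text.encode("utf-16-be"))

if __name__ == "__main__":
    for name, text in samples.items():
        chars, utf8, utf16 = sizes(text)
        print(f"{name}: {chars} chars, {utf8} bytes UTF-8, {utf16} bytes UTF-16")
```

UTF-8 is half the size for the ASCII line, equal for the Arabic letters and larger for the kana, so which encoding is "meager" depends on the script.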

At that stage fasm will be completely multilanguage without doubt, and not
as funny as you say the wasm-fasm is or, if you prefer,
supposedly messy.

I think it is clear now, and I won't insist any more in this thread.
Only one thing, please: think about the following table
and draw your own comments/conclusions.
Quote:

H=0048
e=0065
l=006c
l=006c
o=006f
↵=000a
您=60a8
好=597d
↵=000a
よ=3088
う=3046
こ=3053
そ=305d
↵=000a
V=0056
ä=00e4
l=006c
k=006b
o=006f
m=006d
m=006d
e=0065
n=006e
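
The table can be regenerated from the text itself. A short sketch; the helper name as_table is an assumption, and the line feed is shown as ↵ to match the table.

```python
text = "Hello\n\u60a8\u597d\n\u3088\u3046\u3053\u305d\nV\u00e4lkommen"

def as_table(s):
    """One 'char=hex' row per character, with line feeds shown as ↵."""
    return [f"{'↵' if ch == chr(10) else ch}={ord(ch):04x}" for ch in s]

if __name__ == "__main__":
    print("\n".join(as_table(text)))
    # Every value fits in 16 bits, so each character is a single UTF-16 code unit.
    print(all(ord(ch) <= 0xFFFF for ch in text))
```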


Bye, :D
hopcode

_________________
⠓⠕⠏⠉⠕⠙⠑
revolution 20 Mar 2010, 10:39
hopcode: Are you saying the source code file does not change?

e.g. how does the source file look for:
Code:
mov eax,-1
ret    
Currently it looks like this for me:
Code:
6D 6F 76 20 65 61 78 2C 2D 31 0D 0A 72 65 74    
How does it look for, say, Arabic? Is it the same or something else?
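
For comparison, the dump above can be decoded and re-encoded in a few lines. A sketch, assuming UTF-16 big-endian with a byte order mark as the alternative encoding; the thread has not settled on one at this point.

```python
dump = "6D 6F 76 20 65 61 78 2C 2D 31 0D 0A 72 65 74"

raw = bytes.fromhex(dump)        # fromhex skips the spaces
text = raw.decode("ascii")

# The same text as UTF-16 BE, with a byte order mark in front.
utf16 = b"\xfe\xff" + text.encode("utf-16-be")

if __name__ == "__main__":
    print(repr(text))            # 'mov eax,-1\r\nret'
    print(len(raw), len(utf16))  # 15 32
    print(utf16.hex(" ").upper())
```

For pure ASCII text, every second byte of the UTF-16 form is zero; the characters themselves are unchanged.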
hopcode 20 Mar 2010, 13:18
revolution wrote:
hopcode: Are you saying the source code file does not change?

As an exception I will answer you, because you always have a "good mood" effect on my bad soul. I will simply build it (a little test example, in the near future) without any more explanation here, because I need it, especially if fasm remains ASCII (and I think it will remain ASCII). OK.
Please, s-t-o-p thinking ASCII.
Now, to keep it simple, save the following as west.asm,
1) compile it with fasm west.asm west.txt (42 bytes)
Code:
;--- west euro encoding UTF-16 BE
db 0FEh,0FFh,\
 00,6Dh,\
 00,6Fh,\
 00,76h,\
 00,20h,\
 00,65h,\
 00,61h,\
 00,78h,\
 00,20h,\
 00,2Ch,\
 00,20h,\
 00,2Dh,\
 00,20h,\
 00,31h,\
 00,0Dh,\
 00,0Ah,\
 00,72h,\
 00,65h,\
 00,74h,\
 00,0Dh,\
 00,0Ah
    


2) open it with notepad

3) save the following as arab.asm and
compile it with fasm arab.asm arab.txt.
The result has the same number of bytes as the west.asm one (42 bytes)
Code:
;--- arab encoding UTF-16 BE
db 0FEh,0FFh,\
 06,45h,\
 06,48h,\
 06,41h,\
 00,20h,\
 06,25h,\
 06,43h,\
 06,33h,\
 00,20h,\
 06,0Ch,\
 00,20h,\
 00,2Dh,\
 00,20h,\
 06,61h,\
 00,0Dh,\
 00,0Ah,\
 06,31h,\
 06,4Ah,\
 06,2Ah,\
 00,0Dh,\
 00,0Ah
    

4) open it with notepad
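
The two outputs can also be inspected without an assembler or Notepad. A sketch that rebuilds the same bytes in Python; the hex pairs are copied from the db lines above, and Python's built-in utf-16 codec stands in for whatever an editor does on load.

```python
# west.txt: a byte order mark followed by the text as UTF-16 BE.
west = b"\xfe\xff" + "mov eax , - 1\r\nret\r\n".encode("utf-16-be")

# arab.txt: the code units from arab.asm, written as hex pairs.
arab = bytes.fromhex(
    "FEFF 0645 0648 0641 0020 0625 0643 0633 0020 060C 0020"
    " 002D 0020 0661 000D 000A 0631 064A 062A 000D 000A"
)

if __name__ == "__main__":
    print(len(west), len(arab))         # 42 42
    print(repr(west.decode("utf-16")))  # the codec consumes the leading mark
    print([f"{ord(c):04X}" for c in arab.decode("utf-16")])
```

Both files hold 20 code units after the mark, but apart from the spaces, the hyphen and the line breaks they share no values, so the equal length does not make them the same text.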

Now, all together: save the following as multicolor.asm (languages are like
colors to me) and compile it with fasm multicolor.asm multicolor.txt (84 bytes)
Code:
;--- west euro encoding UTF-16 BE
db 0FEh,0FFh,\
 00,6Dh,\
 00,6Fh,\
 00,76h,\
 00,20h,\
 00,65h,\
 00,61h,\
 00,78h,\
 00,20h,\
 00,2Ch,\
 00,20h,\
 00,2Dh,\
 00,20h,\
 00,31h,\
 00,0Dh,\
 00,0Ah,\
 00,72h,\
 00,65h,\
 00,74h,\
 00,0Dh,\
 00,0Ah

;--- arab encoding UTF-16 BE
db 0FEh,0FFh,\
 06,45h,\
 06,48h,\
 06,41h,\
 00,20h,\
 06,25h,\
 06,43h,\
 06,33h,\
 00,20h,\
 06,0Ch,\
 00,20h,\
 00,2Dh,\
 00,20h,\
 06,61h,\
 00,0Dh,\
 00,0Ah,\
 06,31h,\
 06,4Ah,\
 06,2Ah,\
 00,0Dh,\
 00,0Ah
    

Open multicolor.txt with Notepad.
Do you see it?
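
What a decoder makes of multicolor.txt can be checked in a few lines. A sketch, assuming Python's built-in utf-16 codec; an editor may treat the second FE FF differently.

```python
west = b"\xfe\xff" + "mov eax , - 1\r\nret\r\n".encode("utf-16-be")
arab = b"\xfe\xff" + (
    "\u0645\u0648\u0641 \u0625\u0643\u0633 \u060c - \u0661\r\n"
    "\u0631\u064a\u062a\r\n"
).encode("utf-16-be")

multicolor = west + arab
text = multicolor.decode("utf-16")

if __name__ == "__main__":
    print(len(multicolor))          # 84
    # Only the leading mark is consumed; the second one stays in the text as U+FEFF.
    print(text.count("\ufeff"))     # 1
    print(text.index("\ufeff"))     # 20
```

The second mark sits between the two halves as an ordinary (invisible) character, so the file is one UTF-16 text with two lines of each script, not two independently encoded versions.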

I cannot go that far into the explanation at the moment. I will simply do it. I am extending my fasmlab.
IMHO fasmlab should not be a toy, as most of the other IDEs out there currently are (...in small steps it will grow, if God assists me ;))
I hope that Tomasz understands what I mean and will make fasm multilanguage, in this way or in another way that I cannot imagine now.

Sorry,
Bye :D
hopcode

_________________
⠓⠕⠏⠉⠕⠙⠑
revolution 20 Mar 2010, 13:26
This is what I see in my editor:


Attachment: MultiLanguage.PNG (description: Looks un-normal)


revolution 20 Mar 2010, 13:30
It appears you want to make the file multi-encoded with lots of different character sets, using Unicode or UTF-16. Is that correct?

And you want to be able to edit the version in your own language and fasm will automatically alter all the other versions to follow. Is that correct?

BTW: That Arabic is backwards. There should be a reversing character in there.
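
Display order for mixed-direction text is decided by the renderer from each character's bidirectional class, which can be inspected with the standard library. A small sketch; whether the "reversing character" above means a directional override such as U+202E is an assumption, as the thread does not say.

```python
import unicodedata

samples = {
    "m": "Latin letter",
    "\u0645": "Arabic letter meem",
    "\u0661": "Arabic-Indic digit one",
    "\u202e": "RIGHT-TO-LEFT OVERRIDE",
}

def bidi_classes(chars):
    """Map each character to its Unicode bidirectional class."""
    return {ch: unicodedata.bidirectional(ch) for ch in chars}

if __name__ == "__main__":
    for ch, cls in bidi_classes(samples).items():
        print(f"U+{ord(ch):04X} {samples[ch]}: {cls}")
```

Latin letters are class L and Arabic letters are class AL, so the same sequence of code units can be laid out differently by two editors.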
shoorick 22 Mar 2010, 07:11
As a native Cyrillic user I can say that transliteration is a pain in the ass and will not bring us any benefit. I would agree with translation, preferably with the original and translated keywords accepted equally. We have such systems, where both English and Russian keywords are accepted at the same time; they are mostly scripting languages in business software. That makes sense there, but not for assembler, as it has a different area of application.

BTW, have you asked any Arabic programmer whether it is a good idea? ;) What do you think, would a Polish programmer agree to change mov to mow? ;) And so on.

I do know what I'm talking about ;) See the screenshot of 1C (1S), which we use at work, with part of my code ;)

Regards!


Attachment: 1C.png


MHajduk



Joined: 30 Mar 2006
Posts: 6115
Location: Poland
MHajduk 22 Mar 2010, 18:17
shoorick wrote:
how do you think, will polish programmer agree to change mov to mow ? Wink
The Polish analogues of the English verb "move" are "przesunąć" or "przemieścić", so most probably the Polish version of "mov" would be "prz", which is quite unusual (to me) indeed. ;)
hopcode 22 Mar 2010, 20:37
revolution wrote:
This is what I see in my editor
revolution, I said Notepad, not UltraIde or some other IDE.

Image
Look at the code:
there are extra spaces that I added. Note that Notepad produces its own
significant errors/ambiguities when rendering such text with two directions...

I am working on simple, incomplete samples at the moment (incomplete
because I don't implement my translit idea there; I need a framework
to emulate it, time, etc.).
Perhaps I will manage to have them tomorrow or in a couple
of days, if all goes as expected. But I cannot promise it.

Downloadable PDFs of patents related to "transliteration" are here:

http://www.google.com/patents/about?id=vIcLAAAAEBAJ&dq=transliteration&num=4&client=internal-uds

Lots of ideas in there...

and two constant factors to consider:
- a 2-byte encoding gives a lot of manipulation possibilities
- Unicode allows custom code points beyond the BMP (plane 0):
the Private Use Planes, for example, U-000F0000 -> U-0010FFFF
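
One consequence of combining these two factors: code points above the BMP do not fit in a single 16-bit unit, so UTF-16 stores them as a surrogate pair. A quick check, using one BMP letter and two code points from the Private Use Planes as examples:

```python
def utf16_units(cp):
    """Encode one code point as UTF-16 BE and return the raw bytes."""
    return chr(cp).encode("utf-16-be")

if __name__ == "__main__":
    for cp in (0x0645, 0xF0000, 0x10FFFD):
        units = utf16_units(cp)
        print(f"U+{cp:06X}: {len(units)} bytes -> {units.hex(' ').upper()}")
```

A parser that relies on two bytes per character therefore has to handle four-byte sequences as soon as private-use code points from planes 15 and 16 are allowed.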

To Tomasz, imagine this:
something like the "Challenger" that eats (16-bit) code points instead
of ASCII. Also, why not code points grouped by 2, 3 or 4?
- 1) But eating 2, 3, 4 (ad libitum) code points is almost the length of an asm instruction!
- 2) lots of opcode-encoding possibilities in 2, 3, 4 bytes
- 3) also -> automatically multilanguage
- 4) also -> automatically assemblable
- also ... add your own comments or imagination

Cheers, :D
hopcode

_________________
⠓⠕⠏⠉⠕⠙⠑
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8351
Location: Kraków, Poland
Tomasz Grysztar 22 Mar 2010, 21:01
hopcode wrote:
something like the "Challenger" that eats (16bit) code-points, instead of ASCII.
The Challenger operates on 32-bit codepoints, not ASCII.

Other than that, I'm not able to comment on this, because, frankly, I'm not able to grasp the idea you're trying to present here.
hopcode 22 Mar 2010, 21:27
Tomasz Grysztar wrote:
The Challenger operates on 32-bit codepoints, not ASCII.

That is a good start ;)
Quote:
...I'm not able to comment on this...

No problem. After all, it is not so simple to convey my abstract ideas and
their direction. I need some more time to outline the thing. I am elaborating on and trying to realize something that is not strictly related to fasm,
thinking about it primarily in the abstract, as a patent, and then realizing it as software/code, etc.

Cheers, :D
hopcode

_________________
⠓⠕⠏⠉⠕⠙⠑
revolution 22 Mar 2010, 23:45
hopcode: Are you just simply proposing that fasm be a unicode (or UTF-16) based assembler?

What was the purpose of the mixed ASCII/Arabic example you posted? Is it supposed to be two versions of the same code? Or what?

BTW: Not everyone uses plain Notepad. I hope your proposal does not rely on everyone using just Notepad. And don't forget that Notepad is very different on different versions of Windows. You will have to say which version of Notepad your proposal works with.

Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.