flat assembler
Message board for the users of flat assembler.

[SOLVED] fasm: encode quoted string to multibyte UTF-8 string

ProMiNick



Joined: 24 Mar 2012
Posts: 804
Location: Russian Federation, Sochi
ProMiNick 02 Dec 2019, 13:48
Source that generates the source file (compile it, then compile the output):
Code:
format binary as 'ASM'
db $66,$6F,$72,$6D,$61,$74,$20,$62,$69,$6E,$61,$72,$79,$20,$61,$73
db $20,$27,$74,$78,$74,$27,$0D,$0A,$69,$6E,$63,$6C,$75,$64,$65,$20
db $27,$65,$6E,$63,$6F,$64,$69,$6E,$67,$2F,$77,$69,$6E,$31,$32,$35
db $31,$2E,$69,$6E,$63,$27,$0D,$0A,$64,$75,$20,$27,$D4,$F3,$ED,$EA
db $F6,$E8,$E8,$20,$EF,$EE,$EB,$FC,$E7,$EE,$E2,$E0,$F2,$E5,$EB,$FF
db $27,?    


Decoded, the generated file is just a format directive, an include of an encoding file, and du with the Russian text for 'user functions'.
With any encoding other than UTF-8 everything works,
but with encoding/utf8.inc I get:
Code:
Error: value out of range.
dw 0D7C0h+wide shr 10,0DC00h or(wide and 3FFh)    


What I expect to see in the final output (a text file containing these bytes):
Code:
$D0, $A4, $D1, $83, $D0, $BD, $D0, $BA, $D1, $86, $D0, $B8, $D0, $B8, ; функции
$20, ; space
$D0, $BF, $D0, $BE, $D0, $BB, $D1, $8C, ; поль
$D0, $B7, $D0, $BE, $D0, $B2, $D0, $B0, ; зова
$D1, $82, $D0, $B5, $D0, $BB, $D1, $8F ; теля    


Last edited by ProMiNick on 03 Dec 2019, 12:55; edited 1 time in total
ProMiNick 03 Dec 2019, 07:16
Every single Cyrillic character causes the error, except these three:
Code:
format binary as 'txt'
include 'encoding/utf8.inc'
du 'А'    
or
Code:
format binary as 'txt'
include 'encoding/utf8.inc'
du 'Ё'    
or
Code:
format binary as 'txt'
include 'encoding/utf8.inc'
du 'ё'    


Is it impossible to encode a single Cyrillic character in UTF-8?
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8358
Location: Kraków, Poland
Tomasz Grysztar 03 Dec 2019, 08:46
The "encoding/utf8.inc" macro is not to convert to UTF-8, it converts from UTF-8. So your source file should be using UTF-8 (and not Windows 1251 as it does), and what "du" produces is always UTF-16.
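To make the three encodings in play concrete, here is a minimal Python sketch, run outside fasm, that takes the Windows-1251 bytes of the quoted string from the first post and re-encodes them. It uses Python's built-in "cp1251" codec as a stand-in for the Windows-1251 table; the helper name fasm_hex is made up for this illustration.

```python
# Bytes of the quoted string exactly as listed in the first post (Windows-1251).
cp1251_bytes = bytes([
    0xD4, 0xF3, 0xED, 0xEA, 0xF6, 0xE8, 0xE8, 0x20,
    0xEF, 0xEE, 0xEB, 0xFC, 0xE7, 0xEE, 0xE2, 0xE0,
    0xF2, 0xE5, 0xEB, 0xFF,
])

text = cp1251_bytes.decode("cp1251")     # interpret the bytes as Windows-1251
utf8_bytes = text.encode("utf-8")        # what a UTF-8 source file would contain
utf16_bytes = text.encode("utf-16-le")   # 16-bit units, the form "du" emits


def fasm_hex(data):
    """Format bytes as a fasm-style list: $D0, $A4, ... (hypothetical helper)."""
    return ", ".join("$%02X" % b for b in data)


print(fasm_hex(utf8_bytes))
print(len(cp1251_bytes), len(utf8_bytes), len(utf16_bytes))
```

The UTF-8 line it prints can be compared with the expected output in the first post. The same text takes one byte per character in Windows-1251, two bytes per Cyrillic letter in UTF-8, and two bytes per character in UTF-16.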
ProMiNick 03 Dec 2019, 12:54
Thanks, that helps.
Code:
macro utf8 [arg] {
 local char,..data,size
        if arg eqtype ''
                virtual at 0
                        ..data::
                        db arg
                        size = $
                end virtual
                repeat size
                        load char byte from ..data:%-1
                        if char < $80
                                db char
                        else
                                load char word from __encoding:char*2
                                if char > $7FF
                                        db $E0 + char shr (6*2),$80 + (char shr 6) and $3F,$80 + char and $3F

                                else
                                        db $C0 + (char shr 6) and $3F,$80 + char and $3F
                                end if
                        end if
                end repeat
        else if arg eqtype 0
                if arg > $7FF
                        db $E0 + arg shr (6*2),$80 + (arg shr 6) and $3F,$80 + arg and $3F
                else if arg > $7F
                        db $C0 + (arg shr 6) and $3F,$80 + arg and $3F
                else
                        db arg
                end if
        else ; let the standard directive report the error
                db arg
        end if }

struc utf8 [args]
 { common label . word
   utf8 args }      


The utf8 macro (and struc) only works as an add-on on top of one of the standard WIN... encoding includes for the du directive, because it reads their __encoding table.

Otherwise, an appropriate table has to be defined somewhere:
Code:
virtual at 0
  __encoding:: 
...
end virtual     
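As a cross-check of the byte arithmetic in the macro above, here is a small Python sketch of the same three branches (1, 2 and 3 bytes). It assumes the usual fasm precedence, where "and" binds tighter than "+", and it uses Python's "cp1251" codec in place of the __encoding table. The function names are made up for this illustration and nothing here is part of fasm.

```python
def utf8_from_code_point(cp):
    """Encode one code point below $10000 with the same shifts and masks as the macro."""
    if cp < 0x80:
        return bytes([cp])                                  # plain ASCII, one byte
    if cp <= 0x7FF:
        return bytes([0xC0 + ((cp >> 6) & 0x3F),            # lead byte 110xxxxx
                      0x80 + (cp & 0x3F)])                  # continuation 10xxxxxx
    if cp <= 0xFFFF:
        return bytes([0xE0 + (cp >> 12),                    # lead byte 1110xxxx
                      0x80 + ((cp >> 6) & 0x3F),
                      0x80 + (cp & 0x3F)])
    raise ValueError("code points above $FFFF need 4 bytes, which this sketch omits")


def utf8_from_cp1251(raw):
    """Convert Windows-1251 bytes to UTF-8, one byte at a time, as the macro's loop does."""
    out = bytearray()
    for byte in raw:
        if byte < 0x80:
            out.append(byte)
        else:
            # Stand-in for the table lookup; raises for the unassigned byte $98.
            cp = ord(bytes([byte]).decode("cp1251"))
            out += utf8_from_code_point(cp)
    return bytes(out)


print(utf8_from_cp1251(bytes([0xD4, 0xF3])).hex())
```

Like the macro, the sketch stops at three-byte sequences, so anything above $FFFF would need a fourth branch.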


If anybody is interested in why such a conversion was needed, here is a small example from my work. I generated a receipt in fasm to show some developers that the available set of operators is catastrophically small, yet still sufficient to solve the problem. Creating that receipt manually is possible too, but more masochistic.

https://yadi.sk/d/NLkxipN-NJcwpA