flat assembler
Message board for the users of flat assembler.

&specialchars;

edfed
esi => input string, pointing at the '&'
edi => still used for output
al => char passing
(ebp points to the current object, optional)

All other registers are free.

Convert the &...; statement into a char if it is valid, nothing otherwise.

&nbsp; will generate a 'space' and esi will point to the next input
&nbspp; will generate nothing and continue as if it were normal input
&#123; will mov al,123 and leave esi on the current ';'

Code:
db '                '
db '                '
db '  "   &         '
db '                '

db '                '
db '                '
db '                '
db '                '

db '€‚ƒ„…†‡ˆ‰Š<ŒŽ'
db '‘’“”•–—˜™š>œžŸ'
db ' ¡¢£¤¥¦§¨©ª«¬­®¯'
db '°±²³´µ¶·¸¹º»¼½¾¿'

db 'ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏ'
db 'ÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß'
db 'àáâãäåæçèéêëìíîï'
db 'ðñòóôõö÷øùúûüýþÿ'
    


Last edited by edfed on 07 Apr 2008, 00:30; edited 2 times in total
Post 03 Apr 2008, 21:30

vid
Use a table of 256 pointers to the ASCII name of each character; if a pointer is NULL, the character has no name and is okay as it is...
Post 03 Apr 2008, 21:47

edfed
Like the code I posted, OK, I'll work on it. I tried many solutions to find the best one.

There is no need to store the char after the name; the offset in the table is enough.


The problem is the Unicode chars.
There are some posts on the board in Cyrillic; maybe the Slavic people will do this when the browser contest starts.

Edit: yes yes yes. Thanks vid. My brain is really tired.


Last edited by edfed on 05 Apr 2008, 13:52; edited 1 time in total
Post 03 Apr 2008, 22:02

edfed
Now, just a little problem for macro addicts: how do I generate a string containing all the &#xx; I want, with a <br> every 20 declarations, using a macro or something like that?

Does anyone have an idea?

Code:
times 256 db ??????????
    


thanks for replies.
Post 05 Apr 2008, 13:51

vid
edfed: read the description of the "times" directive in the FASM manual
Post 05 Apr 2008, 14:45

revolution
edfed wrote:
how to generate a string containing all the &#xx;
Code:
rept 256 v:0 {db "&",`v,"xx;",13,10}    
Post 05 Apr 2008, 14:51

vid
sorry, not "times", you'd need "rept"
Post 05 Apr 2008, 14:56

edfed
Thanks revolution, very funny, your joke :lol:
Code:
rept 256 v:0 {db "&#",`v,";",13,10}    


Problem: the 13,10 bytes are not generated. Strange, and strange again.
:s
Post 05 Apr 2008, 15:39

revolution
edfed wrote:
thanks revolution, very funny your joke
I often find my jokes tend to pass by most people here without them noticing/commenting. I think I need to work on my sense of humour and make it plainer ... on second thought, nah, I don't wanna change me, too much like hard work.
Post 05 Apr 2008, 16:17

edfed
I like your humour, really; I laughed when I saw the result:
&1xx; The xx was just there to tell me it was a rather silly question while still giving the answer, so I only had to correct it.

But about the absence of the 13,10, I don't understand; it doesn't generate them at all. :(

Hem, a little correction: it works. I was just looking at the HTML view instead of the text view, so it's OK, the 13,10 are generated fine. No problem, thanks again revolution. :)

My stupidity is increasing exponentially; it's fate.
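
The original question also asked for a <br> after every 20 declarations, which the one-liner above does not cover. A minimal sketch, assuming fasm 1.x preprocessor syntax (the counter name v follows revolution's line; the layout is only an illustration):

```asm
; Sketch, assuming fasm 1.x. Emits "&#0;" to "&#255;", one per line,
; and appends "<br>" to every 20th entry.
rept 256 v:0 {
        db "&#",`v,";"          ; `v turns the counter into a quoted string
        if v mod 20 = 19        ; v is replaced by its number before assembly
                db "<br>"
        end if
        db 13,10                ; CR,LF
}
```

The backquote conversion happens in the preprocessor, while the `if` is evaluated later by the assembler on the already substituted number, so the two can be mixed inside the same rept block.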
Post 05 Apr 2008, 17:00

edfed
Hi! I can't find the solution and then leave everyone in the dark,
so I'm posting the function that tests for XHTML-type special chars, in a file (because I'm proud to have made it and want to share it to grow the FASM code base).
The only external data is named .limit; it is a local equate that sets the limit of the file.

It outputs the char in al.
If CF is set, the char is OK; if not, the char can be considered 'blank', i.e. no char.

ESI points to the current char in the string, and will point to the closing ';' when finished, jumping over the &...; statement.
Supporting extended Unicode needs only a few small modifications.
Code:
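specialchar:
; in:  al = current input char, esi -> that char, edi -> next output byte
; out: CF = 1: al holds the char to output; CF = 0: output nothing
.limit equ ds:ebp+xhtml.instop
        cmp al,'&'
        je .special
        cmp al,' '              ; space, tab, LF and CR all count as blanks
        je @f
        cmp al,9
        je @f
        cmp al,0ah
        je @f
        cmp al,0dh
        je @f
        jmp .ok
@@:
        cmp byte[edi-1],' '     ; was the previous output char a space?
        clc                     ; clc leaves ZF from the cmp untouched
        je @f                   ; yes: collapse the run, return with CF = 0
        mov al,' '              ; no: output a single space
.ok:
        stc
@@:
        ret
.special:
        lea edx,[esi+1]         ; edx -> first char after '&'
        cmp byte[edx],'#'
        je .num                 ; numeric reference: &#nnn; or &#xhh;
        mov ecx,34              ; first char code that has a name (quot)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;       replace it by the DAWG algorithm     ;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;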
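.name:
        mov ebx,[htchars+ecx*4] ; ebx -> name for char code ecx, or 0
        or ebx,ebx
        je .none?               ; no name for this code: try the next one
@@:
        mov ah,[ebx]            ; next char of the table name
        mov al,[edx]            ; next char of the input
        inc ebx
        inc edx
        or al,al
        je .none?               ; zero byte in the input
        cmp al,ah
        jne .none?              ; mismatch
        cmp al,';'
        jne @b                  ; keep comparing until both reach ';'
        lea esi,[edx-1]         ; match: leave esi on the ';'
        mov al,cl               ; the table index is the char code
        jmp .end
.none?:
        lea edx,[esi+1]         ; rewind to the first char after '&'
        inc cl
        jne .name               ; stop once cl wraps from 255 to 0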
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
.none:
        mov al,[esi]            ; not a valid reference: output the '&' as is
.end:
        or al,al
        jne .ok
        clc                     ; char code 0 means 'no char'
        ret
.num:
        xor ebx,ebx             ; ebx accumulates the value
        cmp byte[edx+1],'x'
        je .hex
        cmp byte[edx+1],'X'
        je .hex
@@:
        inc edx
        cmp edx,[.limit]
        jae .none               ; addresses compare unsigned
        movzx eax,byte[edx]
        cmp al,';'
        je .num?
        cmp al,'0'
        jb .none
        cmp al,'9'
        ja .none
        sub al,'0'
        lea ebx,[ebx*5]
        lea ebx,[ebx*2+eax]     ; ebx = ebx*10 + digit
        jmp @b
.hex:
        inc edx                 ; skip the 'x'
@@:
        inc edx
        cmp edx,[.limit]
        jae .none
        movzx eax,byte[edx]
        cmp al,';'
        je .num?
        cmp al,'9'
        ja .alpha?
        cmp al,'0'
        jb .none
        sub al,'0'
        shl ebx,4
        add ebx,eax
        jmp @b
.alpha?:
        or al,20h               ; fold 'A'..'F' to 'a'..'f'
        cmp al,'a'
        jb .none
        cmp al,'f'
        ja .none
        sub al,'a'-10
        shl ebx,4
        add ebx,eax
        jmp @b
.num?:
        or ebx,ebx
        jl .none
        cmp ebx,100h
        jnl .none               ; only 0..255 fit in al
        mov al,bl
        mov esi,edx             ; leave esi on the ';'
        jmp .end                ; .end drops char code 0
htchars:
dd 0000,0000,0000,0000,0000,0000,0000,0000,0000,0000,0000,0000,0000,0000,0000,0000
dd 0000,0000,0000,0000,0000,0000,0000,0000,0000,0000,0000,0000,0000,0000,0000,0000
dd 0000,0000, .34,0000,0000,0000, .38,0000,0000,0000,0000,0000,0000,0000,0000,0000
dd 0000,0000,0000,0000,0000,0000,0000,0000,0000,0000,0000,0000,0000,0000,0000,0000

dd 0000,0000,0000,0000,0000,0000,0000,0000,0000,0000,0000,0000,0000,0000,0000,0000
dd 0000,0000,0000,0000,0000,0000,0000,0000,0000,0000,0000,0000,0000,0000,0000,0000
dd 0000,0000,0000,0000,0000,0000,0000,0000,0000,0000,0000,0000,0000,0000,0000,0000
dd 0000,0000,0000,0000,0000,0000,0000,0000,0000,0000,0000,0000,0000,0000,0000,0000

dd .128,0000,0000,0000,0000,0000,0000,0000,0000,0000,0000,.139,0000,0000,0000,0000
dd 0000,0000,0000,0000,0000,0000,0000,0000,0000,0000,0000,.155,.156,0000,0000,.159
dd .160,.161,.162,.163,.164,.165,.166,.167,.168,.169,.170,.171,.172,.173,.174,.175
dd .176,.177,.178,.179,.180,.181,.182,.183,.184,.185,.186,.187,.188,.189,.190,.191

dd .192,.193,.194,.195,.196,.197,.198,.199,.200,.201,.202,.203,.204,.205,.206,.207
dd .208,.209,.210,.211,.212,.213,.214,.215,.216,.217,.218,.219,.220,.221,.222,.223
dd .224,.225,.226,.227,.228,.229,.230,.231,.232,.233,.234,.235,.236,.237,.238,.239
dd .240,.241,.242,.243,.244,.245,.246,.247,.248,.249,.250,.251,.252,.253,.254,.255
.ch:
.34     db 'quot;'
.38     db 'amp;'
.128    db 'euro;'
.139    db 'lt;'
.155    db 'gt;'
.156    db 'oelig;'
.159    db 'Yuml;'
.160    db 'nbsp;'
.161    db 'iexcl;'
.162    db 'cent;'
.163    db 'pound;'
.164    db 'curren;'
.165    db 'yen;'
.166    db 'brvbar;'
.167    db 'sect;'
.168    db 'uml;'
.169    db 'copy;'
.170    db 'ordf;'
.171    db 'laquo;'
.172    db 'not;'
.173    db 'shy;'
.174    db 'reg;'
.175    db 'macr;'
.176    db 'deg;'
.177    db 'plusmn;'
.178    db 'sup2;'
.179    db 'sup3;'
.180    db 'acute;'
.181    db 'micro;'
.182    db 'para;'
.183    db 'middot;'
.184    db 'cedil;'
.185    db 'sup1;'
.186    db 'ordm;'
.187    db 'raquo;'
.188    db 'frac14;'
.189    db 'frac12;'
.190    db 'frac34;'
.191    db 'iquest;'
.192    db 'Agrave;'
.193    db 'Aacute;'
.194    db 'Acirc;'
.195    db 'Atilde;'
.196    db 'Auml;'
.197    db 'Aring;'
.198    db 'AElig;'
.199    db 'Ccedil;'
.200    db 'Egrave;'
.201    db 'Eacute;'
.202    db 'Ecirc;'
.203    db 'Euml;'
.204    db 'Igrave;'
.205    db 'Iacute;'
.206    db 'Icirc;'
.207    db 'Iuml;'
.208    db 'ETH;'
.209    db 'Ntilde;'
.210    db 'Ograve;'
.211    db 'Oacute;'
.212    db 'Ocirc;'
.213    db 'Otilde;'
.214    db 'Ouml;'
.215    db 'times;'
.216    db 'Oslash;'
.217    db 'Ugrave;'
.218    db 'Uacute;'
.219    db 'Ucirc;'
.220    db 'Uuml;'
.221    db 'Yacute;'
.222    db 'THORN;'
.223    db 'szlig;'
.224    db 'agrave;'
.225    db 'aacute;'
.226    db 'acirc;'
.227    db 'atilde;'
.228    db 'auml;'
.229    db 'aring;'
.230    db 'aelig;'
.231    db 'ccedil;'
.232    db 'egrave;'
.233    db 'eacute;'
.234    db 'ecirc;'
.235    db 'euml;'
.236    db 'igrave;'
.237    db 'iacute;'
.238    db 'icirc;'
.239    db 'iuml;'
.240    db 'eth;'
.241    db 'ntilde;'
.242    db 'ograve;'
.243    db 'oacute;'
.244    db 'ocirc;'
.245    db 'otilde;'
.246    db 'ouml;'
.247    db 'divide;'
.248    db 'oslash;'
.249    db 'ugrave;'
.250    db 'uacute;'
.251    db 'ucirc;'
.252    db 'uuml;'
.253    db 'yacute;'
.254    db 'thorn;'
.255    db 'yuml;'
    

So it can be used in any code without modification, which is what I always aim for when I code a function, though it is not always possible.

It compiles to 1891 bytes because of the arrays.


Description: just include it, make sure your code does not already use the specialchar and htchars labels, and call specialchar on each char before displaying it.
Download
Filename: charmodules.inc
Filesize: 4.47 KB
Downloaded: 171 Time(s)
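
A calling loop is not shown in the post. The sketch below is one way it might look, under loud assumptions: the routine is saved as charmodules.inc, ebp points to an object whose dword at offset xhtml.instop holds the address just past the input, and both the offset value 0 and the name convert are made up for illustration.

```asm
; Sketch only, fasm 1.x, 32-bit code.
; Hypothetical: xhtml.instop = 0 and the label "convert" are not from the post.
use32
xhtml.instop = 0                ; assumed offset of the input limit in the object

; in:  esi -> input text, edi -> output buffer, ebp -> object
; out: edi -> byte after the last output char
convert:
.next:
        cmp esi,[ebp+xhtml.instop]
        jae .done               ; reached the input limit
        mov al,[esi]            ; al = current char, as the routine expects
        call specialchar        ; clobbers ebx, ecx, edx
        jnc .skip               ; CF = 0: nothing to output
        stosb                   ; CF = 1: store al at [edi], advance edi
.skip:
        inc esi                 ; step past the char, or past the ';'
        jmp .next
.done:
        ret

include 'charmodules.inc'       ; the posted specialchar and htchars
```

Because the routine looks at byte[edi-1] to collapse runs of blanks, the output buffer should have at least one readable byte in front of it on the first call.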



Last edited by edfed on 12 Jun 2008, 22:31; edited 3 times in total
Post 07 Apr 2008, 00:19

f0dder
Instead of a table of pointers, you could treat the special symbols as one long string, and use 16-bit index into this string... saves a small amount of space :]
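
A minimal sketch of that layout, with hypothetical label names (htindex, htnames, getname) and only two names filled in. Offset 0 is reserved to mean "no name", which is an assumption made here so that it can play the role of the NULL pointer:

```asm
; Sketch only, fasm 1.x, 32-bit code. All labels are hypothetical.
use32

; in:  ecx = char code (0..255)
; out: ebx -> name ending in ';', or ebx = 0 if the char has no name
getname:
        movzx ebx,word[htindex+ecx*2]
        test ebx,ebx
        jz .done                ; offset 0: no name
        add ebx,htnames         ; turn the 16-bit offset into a pointer
.done:
        ret

htindex:                        ; 256 words instead of 256 dwords
        times 34 dw 0
        dw name_quot-htnames    ; 34 '"'
        times 3 dw 0
        dw name_amp-htnames     ; 38 '&'
        times 256-39 dw 0

htnames:                        ; all names packed into one long string
        db 0                    ; pad byte, so no real name sits at offset 0
name_quot db 'quot;'
name_amp  db 'amp;'
```

Each entry costs two bytes instead of four, at the price of one extra add per lookup.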
Post 07 Apr 2008, 11:33

edfed
It can save 512 bytes (256 entries at 2 bytes each instead of 4), so yes, it could be interesting, but it will increase the complexity of the algo. Noted as a future optimisation.

But for now, I'm leaving this and planning another component.

Thanks for the suggestion. :)
Post 07 Apr 2008, 11:39

revolution
You can store the table as a TRIE (google it), or even a DAWG (google it). The DAWG gives good size and very fast searching. I always use DAWGs for any fixed text searches and I recommend them.
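
To illustrate the idea, here is a minimal trie sketch for three names from the table above (sup1, sup2, sup3). It is not a full DAWG, and the node format is an assumption made for this example: each node is a list of 3-byte edges (a char, then a 16-bit offset of the child node from trie_root), ended by a zero byte, and an edge labelled ';' carries the char code instead of an offset. All labels are hypothetical.

```asm
; Sketch only, fasm 1.x, 32-bit code.
use32

; in:  esi -> first char after '&'
; out: CF = 1: al = char code, esi -> the ';'
;      CF = 0: no such name, esi unchanged
trie_lookup:
        mov ecx,esi             ; ecx = read cursor
        mov ebx,trie_root       ; ebx -> current node
.next:
        mov al,[ecx]            ; al = next input char
.scan:
        mov ah,[ebx]            ; ah = label of the current edge
        test ah,ah
        jz .fail                ; end of the edge list: no match
        cmp ah,al
        je .take
        add ebx,3               ; try the next edge
        jmp .scan
.take:
        movzx edx,word[ebx+1]   ; child offset, or char code after ';'
        cmp al,';'
        je .found
        lea ebx,[trie_root+edx] ; descend to the child node
        inc ecx
        jmp .next
.found:
        mov esi,ecx             ; leave esi on the ';'
        mov al,dl
        stc
        ret
.fail:
        clc
        ret

trie_root:
        db 's'
        dw n_s-trie_root
        db 0
n_s:    db 'u'
        dw n_su-trie_root
        db 0
n_su:   db 'p'
        dw n_sup-trie_root
        db 0
n_sup:  db '1'
        dw n_sup1-trie_root
        db '2'
        dw n_sup2-trie_root
        db '3'
        dw n_sup3-trie_root
        db 0
n_sup1: db ';'
        dw 185
        db 0
n_sup2: db ';'
        dw 178
        db 0
n_sup3: db ';'
        dw 179
        db 0
```

The shared prefix "sup" is stored once and each input char is examined once, instead of restarting the comparison for every table entry. A DAWG would go one step further and also merge identical sub-trees.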
Post 07 Apr 2008, 11:47

Rahsennor
Directed Acyclic Word Graphs :D

Off topic: Revolution, thank you. I've been looking for something like these for what seems like forever.
Post 08 Apr 2008, 05:22

revolution
Rahsennor wrote:
I've been looking for something like these for what seems like forever.
You could have asked earlier and I would've told ya. All it needed was a simple four letter word.
Post 08 Apr 2008, 06:08

sakeniwefu
Too good! :O
Post 09 Apr 2008, 00:26

Rahsennor
revolution wrote:
You could have asked earlier and I would've told ya.

:oops:
I had kind of given up...
Never mind. I know now. :D
Post 09 Apr 2008, 04:31
Copyright © 1999-2020, Tomasz Grysztar.
