flat assembler
Message board for the users of flat assembler.

let's optimize the ASCII table



Teehee 31 Jan 2010, 18:53
the current ASCII table is something like:
Code:
[null][special_chars][space]!#$%&()*+,-./0123456789:<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~[del][+chars][andmore]    


The most significant part of a text is the letters [if you want to check syntax you will use them a lot], so to test for one we need to do something like:

Code:
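isLetter:                       ; in: al = char, out: CF=1 if al is a letter
        cmp     al,'A'
        jb      .n              ; below 'A'
        cmp     al,'z'
        ja      .n              ; above 'z'
        cmp     al,'Z'
        jbe     .y              ; 'A'..'Z'
        cmp     al,'a'
        jb      .n              ; between 'Z' and 'a'
.y:
        stc                     ; yes
        ret
.n:
        clc                     ; no
        ret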
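
For comparison, here is a minimal sketch of a trick often used with the standard ASCII layout: fold the case bit, then do a single unsigned range check. The label isLetterFast and the convention of returning the answer in CF are assumptions made for illustration; they are not part of the post.

```asm
; in:  al = character (standard ASCII)
; out: CF=1 if al was 'A'..'Z' or 'a'..'z'; al is clobbered
isLetterFast:
        or      al,20h          ; set bit 5: 'A'..'Z' land on 'a'..'z'
        sub     al,'a'          ; letters become 0..25, everything else wraps to 26..255
        cmp     al,26           ; unsigned compare: CF=1 only when al < 26
        ret
```

This costs no conditional jump at all, at the price of destroying al, so the caller has to keep a copy if the character is needed afterwards.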


Ok, so if we rewrite the table to something like:
Code:
[null][special_chars][space]!#$%&()*+,-./:<=>?@[\]^_`{|}~[del][+chars][andmore]0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz    


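then we just need to check:
Code:
isLetter:                       ; in: al = char code in the reordered table, out: CF=1 if letter
        cmp     al,VALUE_OF_FIRST_LETTER ; code assigned to 'A'
        jb      .no
        stc                     ; yes
        ret
.no:
        clc                     ; no
        ret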

One comparison instead of up to four, so roughly 3x faster.

What about that? :P

_________________
Sorry if bad english.



Borsuc 31 Jan 2010, 19:23
Good luck making it "catch on" since ASCII is pretty much impossible to change now that it is used almost everywhere.

Although I agree that ASCII is not very well designed. Nothing we can do about it though, unless you're talking about the internal workings of your application? In that case you may not even need special symbols or other crap.



Teehee 31 Jan 2010, 19:30
I was wondering why they didn't think of that before they made the table. Anyway, here's a tip for new OSes :P

Quote:
unless you're talking about the internal workings of your application
Is it possible to use my own table in an application? [i mean, for instance, in a Win app]



windwakr 31 Jan 2010, 19:41
You could do something with xlat, couldn't you? But that probably would be slower.
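
A minimal sketch of what the xlat idea could look like, assuming 32-bit code and a flat memory model. The names remapBuffer and RemapTab are hypothetical, and the table shown is only an identity placeholder, not a real custom encoding.

```asm
; in:  esi -> source bytes (ASCII), edi -> destination, ecx = byte count
; out: destination holds the remapped bytes; eax, ebx, ecx, esi, edi are clobbered
remapBuffer:
        jecxz   .done           ; nothing to do for an empty buffer
        cld                     ; make lodsb/stosb walk forward
        mov     ebx,RemapTab    ; xlatb reads byte [ebx+al]
.next:
        lodsb                   ; al = next source byte
        xlatb                   ; al = RemapTab[al]
        stosb                   ; store the translated byte
        dec     ecx
        jnz     .next
.done:
        ret

; Hypothetical 256-byte translation table. This one is the identity mapping;
; a real table would hold the custom code for each ASCII value.
RemapTab:
        repeat 256
                db % - 1        ; % counts from 1, so entry n holds n
        end repeat
```

Every byte pays for one extra table read, so any speed gained in classification has to outweigh the cost of translating the text on the way in and out.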



bitshifter 31 Jan 2010, 20:32
Here are some good tricks to study.
http://dex.7.forumer.com/viewtopic.php?p=5278#5278



Borsuc 31 Jan 2010, 21:47
Teehee wrote:
Is it possible to use my own table in an application? [i mean, for instance, in a Win app]
I don't know, I was talking about a more custom app, say a video game for example. If you have to make a custom font format anyway, you might as well use a custom character encoding too. ;)

_________________
Previously known as The_Grey_Beast



roboman 01 Feb 2010, 00:37
While you are redefining ASCII, you might want to put the digits first.
Code:
0123456789[null][special_chars][space]!#$%&()*+,-./:<=>?@[\]^_`{|}~[del][+chars][andmore]ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz    

so that 00000000 = 0, 00000001 = 1, etc. Or maybe just do numbers, then letters, then the rest.
Code:
0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz[null][special_chars][space]!#$%&()*+,-./:<=>?@[\]^_`{|}~[del][+chars][andmore]
    


Then you do have a problem with null: code 0 would be the digit '0', so it could no longer serve as a string terminator.


revolution 01 Feb 2010, 00:43
I once tried to redefine the character set for my OSes. In my situation putting 'A'=1, 'B'=2, etc. would have been really convenient in the central processing loop. But, while it worked just fine, it was harder to write the source code. The editor programs (Notepad et al.) don't use such a coding scheme and expect ASCII. So while the change may at first appear to be beneficial, it will likely hurt you when you are editing the source.



baldr 01 Feb 2010, 01:51
Teehee,

If the entire point of reordering is to simplify character classification, you should definitely look at the C run-time library (isXxx functions): a look-up table and some simple logic, like test byte [_ctype+eax], _HEX to detect a hexadecimal digit char.
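
A minimal sketch of such a classification table, assuming 32-bit code. The names CTypeTab, CT_DIGIT, CT_UPPER, CT_LOWER and CT_HEX are hypothetical stand-ins chosen for illustration; they do not reproduce the layout of any actual C run-time library.

```asm
CT_DIGIT = 1                    ; '0'..'9'
CT_UPPER = 2                    ; 'A'..'Z'
CT_LOWER = 4                    ; 'a'..'z'
CT_HEX   = 8                    ; '0'..'9', 'A'..'F', 'a'..'f'

; in:  al = character
; out: ZF=0 if al is a hexadecimal digit, ZF=1 otherwise; eax is clobbered
isHexDigit:
        movzx   eax,al
        test    byte [CTypeTab+eax],CT_HEX
        ret

; One flag byte per character code, 256 entries in total.
CTypeTab:
        times 48  db 0                          ; 0x00..0x2F
        times 10  db CT_DIGIT or CT_HEX         ; '0'..'9'
        times 7   db 0                          ; 0x3A..0x40
        times 6   db CT_UPPER or CT_HEX         ; 'A'..'F'
        times 20  db CT_UPPER                   ; 'G'..'Z'
        times 6   db 0                          ; 0x5B..0x60
        times 6   db CT_LOWER or CT_HEX         ; 'a'..'f'
        times 20  db CT_LOWER                   ; 'g'..'z'
        times 133 db 0                          ; 0x7B..0xFF
```

Testing for a letter of either case is then the same single instruction with CT_UPPER or CT_LOWER as the mask, whatever order the characters have in the code table.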

__________
Borsuc,

ASCII is not very well designed? Take a look at EBCDIC then (ahhh, those mainframes!). ;)


revolution 01 Feb 2010, 02:10
Actually there is a huge barrier to optimisation of ASCII (and anything really), and that is: what are you optimising for? You can specifically design a character set to be superb in one particular measurement aspect but I guarantee it would really suck in other aspects. Each application has its own needs and no single encoding scheme could ever hope to be optimised for all situations. And indeed, a compromise encoding scheme would probably not be optimised for any application. It is all about trade-offs.



shoorick 01 Feb 2010, 06:35
What about those who live in the "[+chars][andmore]" area? :)))



f0dder 01 Feb 2010, 10:52
Borsuc wrote:
Although I agree that ASCII is not very well designed.
At least it's not EBCDIC :)

_________________
Image - carpe noctem



edemko 01 Feb 2010, 11:18
Code:

; Here is a trick feryno often uses:
         mov     eax,'AbCd'     ; example only
         or      eax,'    '     ; ' ' is 00100000b = 32, the ASCII case bit
         cmp     eax,'abcd'     ; all four chars forced to lower case in one go
; Or like this:
         mov     al,00001001b   ; TAB char
         and     al,11011111b   ; TAB is untouched (its bit 5 is already clear)
; The catch is that you have to know which chars you are operating on.
; Either way, it saves time:
         mov     eax,'FaSm'
         and     eax,(not 0x20202020)   ; force all four chars to upper case
         cmp     eax,'FASM'

    


Last edited by edemko on 02 Feb 2010, 07:36; edited 1 time in total



edfed 01 Feb 2010, 12:23
why not trying to do our own spec for a fascii?

and our own implementations too.

then it will be original, and hard to reverse if you don't know about fascii



f0dder 01 Feb 2010, 12:27
edfed wrote:
why not trying to do our own spec for a fascii?
For what advantage? The disadvantages are huge.

_________________
Image - carpe noctem


revolution 01 Feb 2010, 12:30
f0dder wrote:
For what advantage?
Maybe the 'f' stands for 'fun'. :P



MHajduk 01 Feb 2010, 12:37
In my opinion, the most convenient way to check whether an ASCII char belongs to some subset of the ASCII char set is to use bit tables and the bt instruction. This is especially useful if the subset is complicated and "dispersed". Here is a simple outline of how this method works:
  • Build the bit table first. This table consists of 256 bits (i.e. 32 bytes = 8 dwords) and represents the so-called characteristic function of our char subset. The n-th bit corresponds to the char with code n (if the bit is 1, the corresponding character belongs to our subset).

  • Check whether the character whose code is in a given register (for example eax) belongs to our subset:
    Code:
        bt      [CharTab], eax  ; eax = char code, zero-extended (0..255)
        jc      .yes
        jnc     .no

Where 'CharTab' is our bit table defined somewhere in the code.

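Here are some ready-made bit tables for common character sets:
Code:
; The unused dwords are emitted as explicit zeros (times N dd 0) instead of
; being reserved with rd, so the tables do not depend on how uninitialized
; data happens to be filled.

; White chars (SPACE, HT = tab, CR, LF).
;
WhiteCharsTab   dd 0x00002600, 0x00000001
                times 6 dd 0

; Decimal digits.
;
DecDigitsTab    dd 0, 0x03FF0000
                times 6 dd 0

; Decimal digits without 0.
;
Digits1_9Tab    dd 0, 0x03FE0000
                times 6 dd 0

; Hexadecimal digits (0-9 and upper-case A-F only; a fourth dword of
; 0x0000007E would accept a-f as well).
;
HexDigitsTab    dd 0, 0x03FF0000, 0x0000007E
                times 5 dd 0

; Alphanumeric chars (a-z, A-Z, 0-9, _).
;
AlphaNumTab     dd 0, 0x03FF0000, 0x87FFFFFE, 0x07FFFFFE
                times 4 dd 0

; Characters a-z, A-Z and underscore symbol.
;
AlphaTab        times 2 dd 0
                dd 0x87FFFFFE, 0x07FFFFFE
                times 4 dd 0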
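
The "build the bit table first" step can also be done at run time instead of by hand. The following is a minimal sketch, assuming 32-bit code; buildCharTab, Vowels and VowelTab are hypothetical names used only for illustration.

```asm
; in:  esi -> zero-terminated string listing the member chars
;      edi -> 32-byte bit table, already zeroed
; out: one bit set per listed char; eax and esi are clobbered
buildCharTab:
        cld                     ; make lodsb walk forward
        xor     eax,eax         ; upper 24 bits stay zero, lodsb only writes al
.next:
        lodsb                   ; al = next member char
        test    al,al
        jz      .done           ; the terminator ends the list
        bts     [edi],eax       ; set bit number eax in the table
        jmp     .next
.done:
        ret

; Usage sketch:
;       mov     esi,Vowels
;       mov     edi,VowelTab
;       call    buildCharTab
Vowels          db 'aeiouAEIOU',0
VowelTab        times 8 dd 0
```

Because the list is zero-terminated, the char with code 0 cannot be made a member this way; its bit would have to be set separately.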
    



baldr 01 Feb 2010, 14:54
serfasm,
Code:
         and     eax,(not 0x20202020)
         cmp     eax,'fasm'    
will never give you ZF==1, because 'fasm'==0x6D736166 (in ASCII) and eax has all 0x20202020 bits cleared. ;)
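
Put differently, after the mask the constant must be the upper-case spelling. A minimal sketch of a working four-byte case-insensitive compare, assuming 32-bit code; the label isFasmTag is hypothetical.

```asm
; in:  esi -> four bytes of text
; out: ZF=1 if they spell "fasm" in any mix of upper and lower case; eax is clobbered
isFasmTag:
        mov     eax,[esi]
        and     eax,not 0x20202020      ; clear bit 5 of every byte:
                                        ; 'fasm' (0x6D736166) becomes 'FASM' (0x4D534146)
        cmp     eax,'FASM'              ; compare against the upper-case constant
        ret
```

This only works when the constant consists of upper-case letters. A byte such as '1' (0x31) has bit 5 set, so a constant containing it could never match the masked value.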



Borsuc 01 Feb 2010, 18:10
I have to admit I never heard of EBCDIC before... :P

@MHajduk: very clever... awesome :D



edfed 01 Feb 2010, 18:45
yep, for fun. and the advantages are enormous.

for example, anything written for ASCII will not be able to read or write valid text in fascii.
so it is more secure for a specific use, like a very personal way of doing things.

just because norms are imposed (the people who make them are frequently far from reality, like politicians) doesn't mean we should follow them all our lives.

then, if i decide to make my own implementation of an ASM for another CPU (like PIC16FXX), i will use fasm and a macro system to convert x86 mnemonics into PIC bytecode, without any mnemonics from Microchip.

then, only x86 coders will understand the code.

and frankly, the ASCII table is very badly designed.
and unfortunately, it is THE only text standard used all over the globe in every piece of hardware.

but ASCII will never die. all the text i write is ASCII.