flat assembler
Message board for the users of flat assembler.

let's optimize the ASCII table

f0dder
edfed wrote:
for example, everything written for ASCII will not be able to read or write valid text in fascii.
then it is more secure for a specific use, like a very personal way of doing things.
Security through obscurity, and a very low amount of obscurity at that: it won't take a lot of time to figure out. Looking at patterns, it will be obvious that you're dealing with text, and even without knowledge of the alphabet you could do statistical analysis. So security is hardly an argument. Not being able to interface with other people is a big real-life disadvantage, though.

edfed wrote:
and frankly, the ASCII table is very badly designed.
It's not perfect, but at least it's not EBCDIC. And really, just how bad is it? It's a bit silly that upper-case letters come before lower-case, and it would be nice if '0'..'9' came right before the alphabetic entities. And by positioning stuff carefully, more clever bit-tricks would be possible.

But since when has any of this been a problem, speedwise or in any other way? :)

edfed wrote:
and unfortunately, it is THE only TXT standard used all over the globe in every piece of hardware.
Not really: big iron has EBCDIC (not so common to bump into), and UTF-8 (and other Unicode encodings) is pretty common these days. ASCII is really mostly suitable for English text (i.e., the original 7-bit ASCII); once you go into international codepages, many of the clever tricks won't work anymore.

Post 01 Feb 2010, 19:16
edemko
baldr wrote:
serfasm,
Code:
         and     eax,(not 0x20202020)
         cmp     eax,'fasm'    
will never give you ZF==1, because 'fasm'==0x6D736166 (in ASCII) and eax has all 0x20202020 bits cleared. ;)


Right you are, I made a mistake:
Code:
         mov     eax,'FasM'
         or        eax,not 0x20202020 ; eax 'FASM', as every the 6th bit cleared   
    
Post 02 Feb 2010, 07:41
baldr
serfasm,

Not so fast, buddy. ;) Your code yields eax==0xDFFFFFDF.
'FasM' | 0x20202020 == 'fasm', is that what you're trying to achieve?
Post 02 Feb 2010, 09:20
edemko
baldr wrote:
serfasm,

Not so fast, buddy. ;) Your code yields eax==0xDFFFFFDF.
'FasM' | 0x20202020 == 'fasm', is that what you're trying to achieve?


I really liked the case-insensitivity trick offered by feryno.
I must admit he does it a bit differently.

Code:
_msg db 'FasM',0                  ; or something similar, somewhere else
or   dword [_msg],0x20202020      ; [_msg] = 'fasm'
and  dword [_msg],not 0x20202020  ; [_msg] = 'FASM'
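
The same bit-5 trick can be sketched in C, assuming four ASCII letters packed little-endian into one 32-bit word, as `mov eax,'FasM'` does. The names `CASE_BITS`, `lower4`, `upper4` and `same4_nocase` are made up for this illustration and come from nobody's actual code. The trick is only meaningful when every byte is a letter.

```c
#include <stdint.h>

/* Bit 5 (0x20) of every byte: the only bit that differs between
   'A'..'Z' and 'a'..'z' in ASCII. */
#define CASE_BITS 0x20202020u

/* Set bit 5 in every byte: 'FasM' -> 'fasm'. */
static uint32_t lower4(uint32_t w) { return w | CASE_BITS; }

/* Clear bit 5 in every byte: 'FasM' -> 'FASM'. */
static uint32_t upper4(uint32_t w) { return w & ~CASE_BITS; }

/* Case-insensitive compare of two packed four-letter words. */
static int same4_nocase(uint32_t a, uint32_t b)
{
    return lower4(a) == lower4(b);
}
```

With 'FasM' == 0x4D736146, `lower4` gives 0x6D736166 ('fasm') and `upper4` gives 0x4D534146 ('FASM'), while OR-ing with the inverted mask gives 0xDFFFFFDF, the value baldr pointed out.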
Post 02 Feb 2010, 10:45
Plue
Code:
mov eax, '9' ; mov character into eax
and eax, 64  ; check if character is in A-Z or a-z
; zf = 0 if eax is letter (bit 6 set)
; zf = 1 if eax is not letter (bit 6 clear)

Now go and have fun with your super efficient ascii set. :P
Post 02 Feb 2010, 14:23
bitRAKE
(I'm not going to bother to digress into the difficulties of usage or adoption of an alternate scheme. Rather try to build upon the idea expressed in the thread.)
Having a single bit to differentiate between all the paired symbols could be beneficial as well.

( )
{ }
[ ]
< >

So, we have four pairs - clearly needing three bits: nnnnnXXX. But maybe other pairs should be considered?

/ \
` ' ...or should it be... ' "
_ |
+ - ...or should it be... + *
. ,
: ;

It should be possible to compile an encoding of maximal utility as well as converter routines - both assemble-time and run-time. Once all the groupings are collected then possible ideal encodings become apparent.
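
For comparison, here is a small C sketch of how far apart the existing ASCII bracket pairs are. In standard ASCII, ( ) differ only in bit 0 and < > only in bit 1, while [ ] and { } each differ in two bits, so no single bit separates all four pairs. The helper names `pair_diff` and `single_bit_pair` are hypothetical, chosen for this example.

```c
/* XOR of the two character codes: the set bits are the ones that differ. */
static unsigned pair_diff(char open, char close)
{
    return (unsigned char)open ^ (unsigned char)close;
}

/* Nonzero when exactly one bit separates the two codes. */
static int single_bit_pair(char open, char close)
{
    unsigned d = pair_diff(open, close);
    return d != 0 && (d & (d - 1)) == 0;
}
```

An encoding of the kind proposed here would make `pair_diff` return the same power of two for every pair.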

Of course, by ideal we mean range testing with minimal instructions and a branch (like the BT [TABLE] approach, but without the cache miss). We would prefer non-destructive (TEST/LAHF) over destructive (AND/OR). We shouldn't forget about all the flags available (parity for example).

ASCII ranges only require two instructions (SUB/CMP) to destructively test without a table.
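
A rough C equivalent of the SUB/CMP range test, as a sketch only: subtract the lower bound, then do one unsigned compare against the width of the range. The function names are invented for the example, and the `is_letter` variant (fold case with OR 0x20 first) is an extra assumption that is not something stated above.

```c
/* Roughly: sub al,lo / cmp al,hi-lo / jbe in_range.
   Values below lo wrap around to large unsigned numbers, so a single
   unsigned comparison checks both bounds. */
static int in_range(unsigned char c, unsigned char lo, unsigned char hi)
{
    return (unsigned char)(c - lo) <= (unsigned char)(hi - lo);
}

static int is_lower(unsigned char c) { return in_range(c, 'a', 'z'); }
static int is_digit(unsigned char c) { return in_range(c, '0', '9'); }

/* Fold case first (set bit 5), then one range test covers A-Z and a-z. */
static int is_letter(unsigned char c)
{
    return in_range((unsigned char)(c | 0x20), 'a', 'z');
}
```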
Post 02 Feb 2010, 18:17
f0dder
Plue wrote:
Code:
mov eax, '9' ; mov character into eax
and eax, 64  ; check if character is in A-Z or a-z
; zf = 0 if eax is letter (bit 6 set)
; zf = 1 if eax is not letter (bit 6 clear)

Now go and have fun with your super efficient ascii set. :P
Unfortunately, this isn't a replacement for isalpha(), which only returns true for ['a';'z']∪['A';'Z'], whereas this method additionally returns true for [0x40;0x60]∪[0x7B;0xFF].

Post 02 Feb 2010, 21:28
baldr
f0dder,

[0x40; 0x7F]∪[0xC0; 0xFF], maybe? Any one-bit test should divide 256 input values into two sets of 128 each.
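
The partition can be checked by brute force. This is a sketch with invented helper names: it counts the byte values that pass the `and eax,64` test, and how many of those in the 7-bit range are not letters.

```c
/* The one-bit test from the thread: and eax,64. */
static int bit6_set(unsigned c) { return (c & 0x40) != 0; }

/* Number of byte values 0..255 that pass the bit test. */
static int count_bit6(void)
{
    int n = 0;
    for (unsigned c = 0; c < 256; c++)
        n += bit6_set(c);
    return n;
}

/* Number of 7-bit values that pass the bit test but are not A-Z or a-z. */
static int false_positives_7bit(void)
{
    int n = 0;
    for (unsigned c = 0; c < 128; c++)
        if (bit6_set(c) && !(((c | 0x20) - 'a') < 26u))
            n++;
    return n;
}
```

The test passes exactly 128 of the 256 values, [0x40; 0x7F] and [0xC0; 0xFF]. Within 7-bit ASCII, 12 of the accepted values are not letters: 0x40, 0x5B..0x60 and 0x7B..0x7F.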
Post 02 Feb 2010, 23:07
Borsuc
Hey, this isn't a math forum, use '|' instead of 'U' :P
Post 02 Feb 2010, 23:20

Copyright © 1999-2020, Tomasz Grysztar.