flat assembler
Message board for the users of flat assembler. Index > Main > I hate to ask, but HELP!
Author
 Thread  Patrick_

Joined: 11 Mar 2006
Posts: 53
Location: 127.0.0.1
Patrick_ 15 Aug 2008, 19:24
I'm rewriting the NeuQuant image quantizing algorithm in assembly. I use it to process GIF's in one of my programs, but I was hoping to get some better performance out of it, therefore I'm rewriting it in assembly.

Anyway, I've looked over every instruction about 5 times but still can't find the issue. To me the code exactly matches up to the C code, so I'm thinking I'm doing something in assembly the wrong way.

I know, some of the code isn't the best; currently it's not highly optimized, as I'm just trying to get it to work. As you can see it uses the C calling convention because these functions are called from a C program. If anyone can find the problem, I'd *really* appreciate it. Here's the code:

neuquant.inc:

Code:
```;; neuquant.inc: Include file for neuquant.asm.

NET_SIZE                = 256   ;number of colors used

;; For 256 colours, fixed arrays need 8kb, plus space for the image

;; four primes near 500 - assume no image has a length so large */
;; that it is divisible by all four primes */

PRIME_1         = 499
PRIME_2         = 491
PRIME_3         = 487
PRIME_4         = 503

MIN_PICTURE_BYTES       = (3*PRIME_4)   ;minimum size for input image

MAX_NET_POS             = (NET_SIZE-1)
NET_BIAS_SHIFT          = 4             ;bias for color values
N_CYCLES                = 100           ;number of learning cycles

;; Definitions for freq and bias
INT_BIAS_SHIFT          = 16    ;bias for fractions
INT_BIAS                = (1 shl INT_BIAS_SHIFT)
GAMMA_SHIFT             = 10    ;gamma = 1024
GAMMA                   = (1 shl GAMMA_SHIFT)
BETA_SHIFT              = 10
BETA                    = (INT_BIAS shr BETA_SHIFT)     ;beta == 1/1024
BETA_GAMMA              = (INT_BIAS shl (GAMMA_SHIFT - BETA_SHIFT))

;; Definitions for decreasing radius factor
RADIUS_BIAS_SHIFT       = 6                     ;at 32.0 biased by 6 bits
RADIUS_DEC              = 30                            ;factor of 1/30 each cycle

;; Definitions for decreasing alpha factor
ALPHA_BIAS_SHIFT        = 10    ;alpha strats at 1.0
INIT_ALPHA              = (1 shl ALPHA_BIAS_SHIFT)

BEST_D_CONTEST          = not (1 shr 31)

macro ccall proc,[arg]                  ; directly call CDECL procedure
{ common
size@ccall = 0
if ~ arg eq
reverse
pushd arg
size@ccall = size@ccall+4
common
end if
call proc
if size@ccall
end if }

push    ecx edx ebx
sub     esp, 4
mov     [esp], esp
push    ebp esi edi
}

pop     edi esi ebp
pop     ebx edx ecx
}

PNOEAX_OFFSET   = 28

neuquant.asm:

Code:
```;; NeuQuant Neural-Net Quantization Algorithm
;; x86 assembly rewrite based on NEUQUANT.C:
;; ------------------------------------------
;;
;; Copyright (c) 1994 Anthony Dekker
;;
;; NEUQUANT Neural-Net quantization algorithm by Anthony Dekker, 1994.
;; See "Kohonen neural networks for optimal colour quantization"
;; in "Network: Computation in Neural Systems" Vol. 5 (1994) pp 351-367.
;; for a discussion of the algorithm.
;;
;; Any party obtaining a copy of these files from the author, directly or
;; indirectly, is granted, free of charge, a full and unrestricted irrevocable,
;; world-wide, paid up, royalty-free, nonexclusive right and license to deal
;; in this software and documentation files (the "Software"), including without
;; limitation the rights to use, copy, modify, merge, publish, distribute, sublicense,
;; and/or sell copies of the Software, and to permit persons who receive
;; copies from any such party to do so, with the only requirement being
;; that this copyright notice remain intact.
;;
;; ------------------------------------------
;; x86 assembly rewrite:
;;
;;
;; Permission to use, copy, modify, and/or distribute this software for any
;; purpose with or without fee is hereby granted, provided that the above
;; copyright notice and this permission notice appear in all copies.
;;
;; THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
;; WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
;; MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
;; ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
;; ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
;; OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
format MS COFF

public neuquant_initnet as '_neuquant_initnet'
public neuquant_unbiasnet as '_neuquant_unbiasnet'
public neuquant_writecolourmap as '_neuquant_writecolourmap'
public neuquant_inxbuild as '_neuquant_inxbuild'
public neuquant_inxsearch as '_neuquant_inxsearch'
public neuquant_learn as '_neuquant_learn'

include 'neuquant.inc'

;; @neuquant_initnet(unsigned char *the_pic, int len, int sample):
;;
;; Initialise network in range (0,0,0) to (255,255,255) and set parameters
;; Returns nothing.

neuquant_initnet:

mov     ebp, esp

mov     eax, dword [ebp+ARG_THE_PIC]    ;thepicture = thepic;
mov     dword [the_picture], eax
mov     eax, dword [ebp+ARG_LEN]        ;lengthcount = len;
mov     dword [length_count], eax
mov     eax, dword [ebp+ARG_SAMPLE]     ;samplefac = sample;
mov     dword [sample_fac], eax

;; for (i=0; i<netsize; i++) {
xor     ecx, ecx
.for_begin:
cmp     ecx, NET_SIZE
jnl     .for_end

imul    edi, ecx, network_WIDTH*4               ;p = network[i];

;; p = p = p = (i << (netbiasshift+ )/netsize;
mov     eax, ecx
shl     eax, NET_BIAS_SHIFT+8
mov     ebx, NET_SIZE
xor     edx, edx
idiv    ebx
mov     dword [edi], eax
mov     dword [edi+1*4], eax
mov     dword [edi+2*4], eax

mov     dword [freq+ecx*4], INT_BIAS/NET_SIZE   ;freq[i] = intbias/netsize;
mov     dword [bias+ecx*4], 0                   ;bias[i] = 0;

jmp     .for_begin
.for_end:
.return:
ret

;; @neuquant_unbiasnet(void):
;;
;; Unbias network to give byte values 0..255 and record position i to prepare for sort
;; Returns nothing.
neuquant_unbiasnet:

mov     ebp, esp

;; for (i=0; i<netsize; i++) {
xor     ecx, ecx
.for_one_begin:
cmp     ecx, NET_SIZE
jnl     .for_one_after

imul    ebx, ecx, network_WIDTH*4

;; for (j=0; j<3; j++) {
xor     edx, edx
.for_two_begin:
cmp     edx, 3
jnl     .for_two_after

;; temp = (network[i][j] + (1 << (netbiasshift - 1))) >> netbiasshift;
mov     eax, dword [ebx+edx*4]
add     eax, (1 shl (NET_BIAS_SHIFT - 1))
sar     eax, NET_BIAS_SHIFT

;; if (temp > 255) temp = 255;
cmp     eax, 255
jng     @f
mov     eax, 255
@@:
;; network[i][j] = temp;
mov     dword [ebx+edx*4], eax

jmp     .for_two_begin
.for_two_after:
;; network[i] = i;
mov     dword [ebx+3*4], ecx

jmp     .for_one_begin

.for_one_after:
.return:
ret

;; @neuquant_writecolourmap(GifColorType *f):
;;
;; Output colormap to GifColorType array.
;; Returns nothing.
neuquant_writecolourmap:

mov     ebp, esp

;; for (j=0; j<netsize; j++) {
xor     ecx, ecx
mov     ebx, dword [ebp+ARG_COLORMAP]
.for_one:
cmp     ecx, NET_SIZE
jnl     .for_one_after

imul    eax, ecx, network_WIDTH*4

mov     edx, dword [eax+2*4]            ;f[j].Red = network[j];
mov     byte [ebx+0], dl
mov     edx, dword [eax+1*4]            ;f[j].Green = network[j];
mov     byte [ebx+1], dl
mov     edx, dword [eax+0*4]            ;f[j].Blue = network[j];
mov     byte [ebx+2], dl

jmp     .for_one

.for_one_after:

.return:
ret

;; neuquant_inxbuild(void):
;;
;; Insertion sort of network and building of netindex[0..255] (to do after unbias)
;; Returns nothing.
ST_PREV_COL     = -4
ST_START_POS    = -8
ST_ALLOC        = 8

neuquant_inxbuild:

mov     ebp, esp
sub     esp, ST_ALLOC

mov     dword [ebp+ST_PREV_COL], 0
mov     dword [ebp+ST_START_POS], 0

;; for (i=0; i<netsize; i++) {
xor     ecx, ecx
.for_one:
cmp     ecx, NET_SIZE
jnl     .for_one_after

imul    esi, ecx, network_WIDTH*4       ;p = network[i];

mov     eax, ecx                        ;smallpos = i
mov     ebx, dword [esi+1*4]            ;smallval = p

;; for (j=i+1; j<netsize; j++)
mov     edx, ecx
.for_two:
cmp     edx, NET_SIZE
jnl     .for_two_after

imul    edi, edx, network_WIDTH*4       ;q = network[j]

cmp     dword [edi+1*4], ebx            ;if (q < smallval)
jnl     @f
mov     eax, edx                        ;smallpos = j
mov     ebx, dword [edi+1*4]            ;smallval = q
@@:
jmp     .for_two
.for_two_after:

imul    edi, eax, network_WIDTH*4       ;q = network[smallpos]

cmp     ecx, eax                        ;if (i != smallpos)
jne     @f

push    ecx

;; j = q;   q = p;   p = j;
mov     edx, dword [edi+0*4]
mov     ecx, dword [esi+0*4]
mov     dword [edi+0*4], ecx
mov     dword [esi+0*4], edx
;; j = q;   q = p;   p = j;
mov     edx, dword [edi+1*4]
mov     ecx, dword [esi+1*4]
mov     dword [edi+1*4], ecx
mov     dword [esi+1*4], edx
;; j = q;   q = p;   p = j;
mov     edx, dword [edi+2*4]
mov     ecx, dword [esi+2*4]
mov     dword [edi+2*4], ecx
mov     dword [esi+2*4], edx
;; j = q;   q = p;   p = j;
mov     edx, dword [edi+3*4]
mov     ecx, dword [esi+3*4]
mov     dword [edi+3*4], ecx
mov     dword [esi+3*4], edx

pop     ecx
@@:
;; We no longer need esi or edi; they're reset at the top.
mov     esi, dword [ebp+ST_PREV_COL]
cmp     ebx, esi        ;if (smallval != previouscol)
jne     .for_one_reloop

;; netindex[previouscol] = (startpos+i)>>1;
mov     edi, dword [ebp+ST_START_POS]
sar     edi, 1
mov     dword [net_index+esi*4], edi

;;for (j=previouscol+1; j<smallval; j++)
mov     edx, esi
.for_three:
cmp     edx, ebx
jnl     .for_three_after
mov     dword [net_index+edx*4], ecx    ;netindex[j] = i
jmp     .for_three
.for_three_after:
mov     dword [ebp+ST_PREV_COL], ebx    ;previouscol = smallval;
mov     dword [ebp+ST_START_POS], ecx   ;startpos = i

.for_one_reloop:
jmp     .for_one

.for_one_after:
;; netindex[previouscol] = (startpos+maxnetpos)>>1;
mov     eax, dword [ebp+ST_START_POS]
sar     eax, 1
mov     ebx, dword [ebp+ST_PREV_COL]
mov     dword [net_index+ebx*4], eax

;; for (j=previouscol+1; j<256; j++)
.for_last:
cmp     ebx, 256
jnl     .for_last_after
mov     dword [net_index+ebx*4], MAX_NET_POS    ;netindex[j] = maxnetpos
jmp     .for_last
.for_last_after:

.return:
ret

;; neuquant_inxsearch(int b, int g, int r):
;;
;; Search for BGR values 0..255 (after net is unbiased) and return colour index
;; We discard esp for non-stack usage, so no pushing/popping after esp is changed.
;; This function returns the location in the colormap.
ARG_B   = PNOEAX_OFFSET+4
ARG_G   = PNOEAX_OFFSET+8
ARG_R   = PNOEAX_OFFSET+12

neuquant_inxsearch:

mov     ebp, esp

mov     eax, 1000               ;bestd = 1000;
mov     esp, -1                 ;best = -1
mov     ebx, dword [ebp+ARG_G]
mov     ecx, dword [net_index+ebx*4]    ;i = netindex[g];
mov     edx, ecx                        ;j = i-1;
sub     edx, 1

;; while ((i<netsize) || (j>=0)) {
.while_one:
cmp     ecx, NET_SIZE
jb      .ecx_below
test    edx, edx
jns     .edx_ae
jmp     .while_one_after

;; if (i<netsize) {. From the above while test, we know already
;; that i (ecx) < NET_SIZE
.ecx_below:
imul    esi, ecx, network_WIDTH*4               ;p = network[i];
mov     edi, dword [esi+1*4]            ;dist = p - g;
sub     edi, dword [ebp+ARG_G]
cmp     edi, eax                                ;if (dist >= bestd)
jnge    .else_one
mov     ecx, NET_SIZE           ;i = netsize
jmp     .else_one_after
.else_one:
test    edi, edi        ;if (dist<0)
jns     @f
neg     edi             ;dist = -dist
@@:
mov     ebx, dword [esi]        ;a = p - b
sub     ebx, dword [ebp+ARG_B]
test    ebx, ebx                        ;if (a<0) a = -a;
jns     @f
neg     ebx
@@:
add     edi, ebx                ;dist += a
cmp     edi, eax                ;if (dist<bestd)
jge     .else_one_after
mov     ebx, dword [esi+2*4]    ;a = p - r
sub     ebx, dword [ebp+ARG_R]
test    ebx, ebx                ;if (a<0) a = -a;
jns     @f
neg     ebx
@@:
add     edi, ebx        ;dist += a;
cmp     edi, eax        ;if (dist<bestd)
jnl     @f
mov     eax, edi        ;bestd=dist
mov     esp, dword [esi+3*4]    ;best=p
@@:
.else_one_after:
.ecx_below_after:

;; if (j>=0)
test    edx, edx
js      .while_one_reloop
.edx_ae:

imul    esi, edx, network_WIDTH*4               ;p = network[j];
mov     edi, dword [ebp+ARG_G]          ;dist = g - p;
sub     edi, dword [esi+1*4]
cmp     edi, eax                ;if (dist >= bestd)
jnge    .else_distnae
mov     edx, -1         ;j = -1
jmp     .else_distnae_after
.else_distnae:
sub     edx, 1          ;j--
test    edi, edi        ;if (dist<0)
jns     @f
neg     edi             ;dist = -dist
@@:
mov     ebx, dword [esi+0]      ;a = p - b;
sub     ebx, dword [ebp+ARG_B]
test    ebx, ebx                ;if (a<0)
jns     @f
neg     ebx                     ;a = -a
@@:
cmp     edi, eax                ;if (dist<bestd)
jnl     .L1
mov     ebx, dword [esi+2*4]    ;a = p - r;
sub     ebx, dword [ebp+ARG_R]
test    ebx, ebx                ;if (a<0)
jns     @f
neg     ebx                     ;a = -a
@@:
add     edi, ebx                ;dist += a
cmp     edi, eax                ;if (dist<bestd)
jnl     .L2
mov     eax, edi                ;bestd=dist;
mov     esp, dword [esi+3*4]    ;best = p
.L2:
.L1:
.else_distnae_after:
.while_one_reloop:
jmp     .while_one
.while_one_after:
.return:
mov     eax, esp
mov     esp, ebp
ret

;; @neuquant_contest(int b, int g, int r):
;;
;; Search for biased BGR values.
;; Returns nothing.
ARG_B   = PNOEAX_OFFSET+4
ARG_G   = PNOEAX_OFFSET+8
ARG_R   = PNOEAX_OFFSET+12

ST_BEST_BD      = -4
ST_BEST_BP      = -8
ST_BESTD        = -12
ST_BEST_POS     = -16
ST_BIAS_DIST    = -20
ST_ALLOC        = 20
neuquant_contest:

mov     ebp, esp
sub     esp, ST_ALLOC

mov     eax, 1
sal     eax, 31
not     eax
mov     dword [ebp+ST_BESTD], eax       ;bestd = ~(((int) 1)<<31);
mov     dword [ebp+ST_BEST_BD], eax     ;bestbiasd = bestd
mov     dword [ebp+ST_BEST_POS], -1             ;bestpos = -1
mov     dword [ebp+ST_BEST_BP], -1              ;bestbiaspos = bestpos
mov     edx, dword bias                         ;p = bias
mov     esi, dword freq                         ;f = freq

;for (i=0; i<netsize; i++)
xor     ecx, ecx
.for_one:
cmp     ecx, NET_SIZE
jnl     .for_one_after

imul    edi, ecx, network_WIDTH*4       ;n = network[i]
mov     eax, dword [edi+0*4]    ;dist = n-b
sub     eax, dword [ebp+ARG_B]
test    eax, eax                ;if (dist<0)
jns     @f
neg     eax                     ;dist = -dist
@@:
mov     ebx, dword [edi+1*4]    ;a = n -g
sub     ebx, dword [ebp+ARG_G]
test    ebx, ebx                ;if (a<0)
jns     @f
neg     ebx                     ;a = -a
@@:
add     eax, ebx                ;dist += a
mov     ebx, dword [edi+2*4]    ;a = n -r
sub     ebx, dword [ebp+ARG_R]
test    ebx, ebx                ;if (a<0)
jns     @f
neg     ebx                     ;a = -a
@@:
add     eax, ebx                ;dist += a
cmp     eax, dword [ebp+ST_BESTD]       ;if (dist <bestd)
jnl     @f
mov     dword [ebp+ST_BESTD], eax       ;bestd = dist
mov     dword [ebp+ST_BEST_POS], ecx    ;bestpos = i
@@:
;; Since dist (eax) is no longer needed after this equation,
;; we use it to calculate the equation.
mov     dword [ebp+ST_BIAS_DIST], eax   ;biasdist = dist
mov     eax, dword [edx]                ;((*p)>>(intbiasshift-netbiasshift))
sar     eax, (INT_BIAS_SHIFT - NET_BIAS_SHIFT)
sub     dword [ebp+ST_BIAS_DIST], eax
mov     eax, dword [ebp+ST_BIAS_DIST]
cmp     eax, dword [ebp+ST_BEST_BD]     ;if (biasdist<bestbiasd)
jnl     @f
mov     dword [ebp+ST_BEST_BD], eax     ;bestbiasd = biasdist
mov     dword [ebp+ST_BEST_BP], ecx     ;bestbiaspos = i
@@:
mov     eax, dword [esi]        ;betafreq = (*f >> betashift);
sar     eax, BETA_SHIFT
sub     dword [esi], eax        ;*f++ -= betafreq;
sal     eax, GAMMA_SHIFT
add     dword [edx], eax        ;*p++ += (betafreq<<gammashift);
.for_one_reloop:
jmp     .for_one
.for_one_after:
mov     eax, dword [ebp+ST_BEST_POS]
add     dword [freq+eax*4], BETA        ;freq[bestpos] += beta;
sub     dword [bias+eax*4], BETA_GAMMA  ;bias[bestpos] -= betagamma;
mov     eax, dword [ebp+ST_BEST_BP]     ;return(bestbiaspos);
.return:
mov     esp, ebp
ret

;; @neuquant_altersingle(int alpha, int i, int b, int g, int r):
;;
;; Move neuron i towards biased (b,g,r) by factor alpha
;; Returns nothing.
neuquant_altersingle:

mov     ebp, esp

imul    ebx, dword [ebp+ARG_I], network_WIDTH*4         ;;n = network[i];

mov     ecx, dword [ebp+ARG_ALPHA]
mov     esi, INIT_ALPHA

mov     eax, dword [ebx]                ;*n -= (alpha*(*n - b)) / initalpha;
sub     eax, dword [ebp+ARG_B]
imul    eax, ecx
xor     edx, edx
idiv    esi
sub     dword [ebx], eax

mov     eax, dword [ebx]                ;*n -= (alpha*(*n - b)) / initalpha;
sub     eax, dword [ebp+ARG_G]
imul    eax, ecx
xor     edx, edx
idiv    esi
sub     dword [ebx], eax

mov     eax, dword [ebx]                ;*n -= (alpha*(*n - b)) / initalpha;
sub     eax, dword [ebp+ARG_R]
imul    eax, ecx
xor     edx, edx
idiv    esi
sub     dword [ebx], eax

.return:
mov     esp, ebp
ret

;; @neuquant_alterneigh(int rad, int i, int b, int g, int r):
;;
;; Returns nothing.

ST_LO           = -4
ST_HI           = -8
ST_Q            = -12
ST_HOLD_EAX     = -16
ST_ALLOC        = 16
neuquant_alterneigh:

mov     ebp, esp
sub     esp, ST_ALLOC

mov     esi, dword [ebp+ARG_I]

;; lo = i-rad;   if (lo<-1) lo=-1;
mov     eax, esi
sub     eax, edi
cmp     eax, -1
jnl     @f
mov     eax, -1
@@:
mov     dword [ebp+ST_LO], eax

;; hi = i+rad;   if (hi>netsize) hi=netsize;
mov     eax, esi
cmp     eax, NET_SIZE
jna     @f
mov     eax, NET_SIZE
@@:
mov     dword [ebp+ST_HI], eax

mov     ecx, esi        ;j = i+1;
mov     ebx, esi        ;k = i-1;
sub     ebx, 1
mov     dword [ebp+ST_Q], eax

;; while ((j<hi) || (k>lo)) {
.while_one:
mov     edx, dword [ebp+ST_Q]
mov     eax, dword [edx]        ;a = (*(++q));

cmp     ecx, dword [ebp+ST_HI]
jl      .if1
cmp     ebx, dword [ebp+ST_LO]
jng     .while_one_after
jmp     .if2_past_check         ;we already know k is within range
.if1:
push    ecx
mov     dword [ebp+ST_HOLD_EAX], eax

imul    esi, ecx, network_WIDTH*4       ;p = network[j];

mov     edi, [esi]                      ;*p -= (a*(*p - b)) / alpharadbias;
sub     edi, dword [ebp+ARG_B]
imul    eax, edi
xor     edx, edx
idiv    ecx
sub     [esi], eax

mov     eax, dword [ebp+ST_HOLD_EAX]
mov     edi, [esi]                      ;*p -= (a*(*p - g)) / alpharadbias;
sub     edi, dword [ebp+ARG_G]
imul    eax, edi
xor     edx, edx
idiv    ecx
sub     [esi], eax

mov     eax, dword [ebp+ST_HOLD_EAX]
mov     edi, [esi]                      ;*p -= (a*(*p - r)) / alpharadbias;
sub     edi, dword [ebp+ARG_R]
imul    eax, edi
xor     edx, edx
idiv    ecx
sub     [esi], eax

pop     ecx
mov     eax, dword [ebp+ST_HOLD_EAX]
.after_if1:
.if2:
cmp     ebx, dword [ebp+ST_LO]
jna     .while_one_reloop
.if2_past_check:

push    ecx
mov     dword [ebp+ST_HOLD_EAX], eax

imul    esi, ebx, network_WIDTH*4       ;p = network[k];

mov     edi, [esi]                      ;*p -= (a*(*p - b)) / alpharadbias;
sub     edi, dword [ebp+ARG_B]
imul    eax, edi
xor     edx, edx
idiv    ecx
sub     [esi], eax

mov     eax, dword [ebp+ST_HOLD_EAX]
mov     edi, [esi]                      ;*p -= (a*(*p - g)) / alpharadbias;
sub     edi, dword [ebp+ARG_G]
imul    eax, edi
xor     edx, edx
idiv    ecx
sub     [esi], eax

mov     eax, dword [ebp+ST_HOLD_EAX]
mov     edi, [esi]                      ;*p -= (a*(*p - r)) / alpharadbias;
sub     edi, dword [ebp+ARG_R]
imul    eax, edi
xor     edx, edx
idiv    ecx
sub     [esi], eax

pop     ecx
mov     eax, dword [ebp+ST_HOLD_EAX]
sub     ebx, 1                          ;k--
.after_if2:
.while_one_reloop:
jmp     .while_one
.while_one_after:
.return:
mov     esp, ebp
ret

;; @neuquant_learn:
;;
;; Main learning loop.
;; Returns nothing.
ST_LIM          = -4
ST_SAMPLE_PIXELS= -8
ST_DELTA        = -12
ST_ALPHA        = -16
ST_STEP         = -24
ST_ALLOC        = 28
neuquant_learn:

mov     ebp, esp
sub     esp, ST_ALLOC

mov     eax, dword [sample_fac]         ;alphadec = 30 + ((samplefac-1)/3);
sub     eax, 1
mov     ebx, 3
xor     edx, edx
idiv    ebx

mov     edi, dword [the_picture]        ;p = thepicture
mov     ebx, dword [length_count]       ;lim = thepicture + lengthcount;
mov     dword [ebp+ST_LIM], ebx

mov     eax, dword [length_count]       ;samplepixels = lengthcount/(3*samplefac);
imul    ebx, dword [sample_fac], 3
xor     edx, edx
idiv    ebx
mov     dword [ebp+ST_SAMPLE_PIXELS], eax

mov     ebx, N_CYCLES                   ;delta = samplepixels/ncycles;
xor     edx, edx
idiv    ebx
mov     dword [ebp+ST_DELTA], eax

mov     dword [ebp+ST_ALPHA], INIT_ALPHA

cmp     ebx, 1          ;if (rad <= 1)
jnle    @f
xor     ebx, ebx
@@:

xor     ecx, ecx
.for1:
cmp     ecx, ebx
jnl     .for1_after

push    edi
imul    eax, ebx
mov     edx, ecx        ;i*i
imul    edx, ecx
xor     edx, edx
mov     esi, dword [ebp+ST_ALPHA]
imul    esi, eax        ;alpha * ...
pop     edi

jmp     .for1
.for1_after:

;; if ((lengthcount%prime1) != 0) step = 3*prime1;
mov     eax, dword [length_count]
mov     ebx, PRIME_1
xor     edx, edx
idiv    ebx
.if1:
test    edx, edx
jz      .if1_else
mov     dword [ebp+ST_STEP], 3*PRIME_1
jmp     .if1_after
.if1_else:
;; if ((lengthcount%prime2) !=0) step = 3*prime2;
mov     eax, dword [length_count]
mov     ebx, PRIME_2
xor     edx, edx
idiv    ebx
.if2:
test    edx, edx
jz      .if2_else
mov     dword [ebp+ST_STEP], 3*PRIME_2
jmp     .if2_after
.if2_else:
;; if ((lengthcount%prime3) !=0) step = 3*prime3;
mov     eax, dword [length_count]
mov     ebx, PRIME_3
xor     edx, edx
idiv    ebx
.if3:
test    edx, edx
jz      .if3_else
mov     dword [ebp+ST_STEP], 3*PRIME_3
jmp     .if3_after
.if3_else:
mov     dword [ebp+ST_STEP], 3*PRIME_4
.if3_after:
.if2_after:
.if1_after:

xor     ecx, ecx        ;i = 0
.while1:
cmp     ecx, dword [ebp+ST_SAMPLE_PIXELS]
jnb     .while1_after

push    ecx

movzx   ecx, byte [edi]         ;b = p << netbiasshift;
movzx   ebx, byte [edi+1]       ;g = p << netbiasshift;
movzx   edx, byte [edi+2]       ;r = p << netbiasshift;
sal     cl, NET_BIAS_SHIFT
sal     bl, NET_BIAS_SHIFT
sal     dl, NET_BIAS_SHIFT

;; j = contest(b,g,r);
ccall   neuquant_contest, ecx, ebx, edx

;; altersingle(alpha,j,b,g,r);
ccall   neuquant_altersingle, dword [ebp+ST_ALPHA], eax, ecx, ebx, edx
jz      @f

ccall   neuquant_alterneigh, dword [ebp+ST_RAD], eax, ecx, ebx, edx

@@:
pop     ecx

;; At this point we're done with eax, ebx, edx, esi
add     edi, dword [ebp+ST_STEP]        ;p += step
cmp     edi, dword [ebp+ST_LIM]         ;if (p >= lim)
jnae    @f
sub     edi, dword [length_count]       ;p -= lengthcount;
@@:

;; if (i%delta == 0) {
mov     eax, ecx
mov     ebx, dword [ebp+ST_DELTA]
xor     edx, edx
idiv    ebx
test    edx, edx
jnz     @f

;; alpha -= alpha / alphadec;
mov     eax, dword [ebp+ST_ALPHA]
mov     ebx, dword [alpha_dec]
xor     edx, edx
idiv    ebx
sub     dword [ebp+ST_ALPHA], eax

xor     edx, edx
idiv    ebx

.if4:
cmp     esi, 1          ;if (rad <= 1)
jnle    .if4_after
xor     esi, esi        ;rad = 0
.if4_after:
xor     ebx, ebx
.for2:
cmp     ebx, esi
jnl     .for2_after

push    edi
mov     edi, esi
mov     edx, ebx        ;j*j
imul    edx, ebx
xor     edx, edx
mov     edi, dword [ebp+ST_ALPHA]
imul    edi, eax        ;alpha * ...
pop     edi

jmp     .for2
.for2_after:
@@:
jmp     .while1
.while1_after:

.return:
mov     esp, ebp
ret

section '.data'

;; Global variables used throughought the code:
alpha_dec       dd ?    ;biased by 10 bits, type INT
the_picture     dd ?    ;the input image itself, type UNSIGNED CHAR *
length_count    dd ?    ;length_count = H*W*3, type INT
sample_fac      dd ?    ;sampling factor 1..30, type INT

network         rd network_WIDTH * network_HEIGHT       ;the network itself, BGRc, type INT array
network_WIDTH   = 4
network_HEIGHT  = NET_SIZE
net_index       rd 256          ;for network lookup - really 256, type INT array
bias            rd NET_SIZE     ;bias and freq arrays for learning
freq            rd NET_SIZE

Last edited by Patrick_ on 02 Nov 2010, 20:03; edited 1 time in total 15 Aug 2008, 19:24  revolution When all else fails, read the source Joined: 24 Aug 2004 Posts: 19101 Location: In your JS exploiting you and your system revolution 15 Aug 2008, 19:45 Whoa, that is a lot of code to go over for one posting. Have you tried following it through with your debugger, then comparing to the the C code as see where it deviates? Just reading the code can sometimes be very difficult to spot subtle bugs. 15 Aug 2008, 19:45
tom tobias

Joined: 09 Sep 2003
Posts: 1320
Location: usa
tom tobias 15 Aug 2008, 22:07
Thank you for this opportunity to preach my message of PROGRAM versus CODE.
revolution wrote:
Whoa, that is a lot of code to go over for one posting.

Question: would it still be so arduous to investigate, if it were properly written as a PROGRAM?
Let's took a look at some of the code, to illustrate what it is about this snippet of code that makes it so difficult:
(A program, to cut the lecture short, is READABLE, like a book. Code, on the other hand, must be deciphered, and therefore cannot be readily and quickly debugged. Ah, boy I like those heights. Umm, very nice being up there so high on my soapbox...)
Patrick McCann, in neuquant.asm, after eleven lines of text, warning thieves about legal consequences of appropriating this buggy software, wrote:
;; @neuquant_altersingle(int alpha, int i, int b, int g, int r):
;;
;; Move neuron i towards biased (b,g,r) by factor alpha
;; Returns nothing.
neuquant_altersingle:

mov ebp, esp

imul ebx, dword [ebp+ARG_I], network_WIDTH*4 ;;n = network[i];

mov ecx, dword [ebp+ARG_ALPHA]
mov esi, INIT_ALPHA

mov eax, dword [ebx] ;*n -= (alpha*(*n - b)) / initalpha;
sub eax, dword [ebp+ARG_B]
imul eax, ecx
xor edx, edx
idiv esi
sub dword [ebx], eax

mov eax, dword [ebx] ;*n -= (alpha*(*n - b)) / initalpha;
sub eax, dword [ebp+ARG_G]
imul eax, ecx
xor edx, edx
idiv esi
sub dword [ebx], eax

mov eax, dword [ebx] ;*n -= (alpha*(*n - b)) / initalpha;

(A program = Algorithm plus data structures:)
Notice three points:
1. I have no idea, looking at this code, what the program is attempting to accomplish. I cannot deduce the algorithm. How can I find a mistake, perhaps very subtle, if I cannot READ what the program is supposed to be accomplishing. Earlier in the code, you have those infernal @@ signs, instead of human readable labels. This is typical of coders, not programmers. You write about, (or perhaps the first author wrote about) Neurons, as if you somehow imagine that these biological structures have something to do with your code. They do not. Use of the word NEURON, conveys absolutely NOTHING to anyone remotely familiar with cpu architecture. "Neural Networks" is a MARKETING PLOY, not a computer programming method. Above all, do NOT imagine that you are somehow explaining to us, or to anyone else, how this obscure code functions, by including the word NEURON, in the "documentation" of this code.
2. Your documentation is not only SPARSE, it is essentially non-existant. Where you do make an effort to "explain" some lines of code, you employ a semicolon on a line with some obscure C code, i.e. code to explain code. Very nice. You have sufficient contempt for the documentation process that you allow your C code comments to spill over onto the next line, so that it is IMPOSSIBLE to comprehend, at a glance. One gets vertigo trying to read the few comments provided. You need to employ WHOLE, entire lines of text, to explain what is going on in each instruction, in lieu of :
";*n -= (alpha*(*n - b)) / initalpha; What, you mean to define a pointer to n by subtracting alpha multiplied by n - b, divided by initalpha??? What's that? Am I supposed to understand that?
Your variable names are counterintuitive. Initalpha??? What does that mean? n? b? alpha?
3. You have two modules, one is .inc, one is .asm. Why? Why two modules, not 20? Why two and not ONE? There is NO OVERVIEW. No one can possibly analyze this code, efficiently, absent a proper explanation of what exactly you are trying to accomplish, and why? That is the purpose of the overview, which will also explain the decision to assemble two separate files. There must be a valid reason to break the code up into two different files? What is it?
Have you tried SIMPLIFIYING this spaghetti code, and adding some print statements, to ensure that you understand the flow of the program, then gradually reintroducing successively the parts deleted?  15 Aug 2008, 22:07  Patrick_

Joined: 11 Mar 2006
Posts: 53
Location: 127.0.0.1
Patrick_ 15 Aug 2008, 22:59
Thanks for being so kind, tom! I'm sure you've also told Tomasz Grysztar this, as well, what with the commentless-code that makes flat assembler. (I'm not complaining, FASM is great )

Also, I usually comment my code; however, since this is completely based off of NEUQUANT.C, I simply comment the code with the respective C code that it's written from. If you follow the assembly code along with the C code, it's easy to tell what's going on, even though I myself don't understand how the NeuQuant algorithm works (it's above my head). But you're not required to know much of how it works, as long as you have the source. I know that sounds funny but in this case it's true.

Now that I think of it, it probably is too much to ask, you folks analyzing all that source code. It's just that from the source I've seen of the MenuetOS kernel and applications, and FASM, and the lack of complaint as to the extremely sparse comments, I figured you guys would be able to handle it. I'll keep trying myself. Thanks again, tom; I think I've made a new friend today.  15 Aug 2008, 22:59  tom tobias

Joined: 09 Sep 2003
Posts: 1320
Location: usa
tom tobias 16 Aug 2008, 02:37 Patrick wrote:
I'm sure you've also told Tomasz Grysztar this, as well,...

As a matter of fact, Yes. Many times, for several years. Generally in the form of private messages, however.
Patrick wrote:
I myself don't understand how the NeuQuant algorithm works (it's above my head).
1. You are skillful, and certainly the so called "neural networks" nonsense is not above your head. It is trash.
If you don't understand it, then, it is a reflection, not of you, or your supposed "limitations", but of the ideology itself, which is phony as a three dollar bill.
2. No one. NO ONE can properly write a program in any language, but especially not assembly language, without having a properly constructed algorithm. Code, yes. People can slop together a bunch of instructions, which appear to "work", i.e. function appropriately, UNTIL that point in time, when someone seeks to CHANGE something, as you are attempting to do, now.
Cheers!  16 Aug 2008, 02:37  bitRAKE Joined: 21 Jul 2003 Posts: 3547 Location: vpcmipstrm bitRAKE 16 Aug 2008, 03:21 I found your conversion to be fairly consistent and your use of syntax not too difficult to grasp. Here is what I believe to be the first error: Code:``` cmp ecx, eax ;if (i != smallpos) jne @f ```...the code following should executed when "i!=smallpos" - you have the reverse - jumping over the code when not equal - "JE" should fix it. Do you use a debugger? Stepping through the code should help. _________________¯\(°_o)/¯ unlicense.org 16 Aug 2008, 03:21
 Madis731 Joined: 25 Sep 2003 Posts: 2139 Location: Estonia Madis731 16 Aug 2008, 20:44 To my personal dislike: Code:```format MS COFF ``` Why can't I just take it to FASM and hit F9 Code:```;what about wrapping it to simple executable: format PE... ``` _________________My updated idol http://www.agner.org/optimize/ 16 Aug 2008, 20:44
 Madis731 Joined: 25 Sep 2003 Posts: 2139 Location: Estonia Madis731 16 Aug 2008, 20:49 EDIT: OOps, double-post _________________My updated idol http://www.agner.org/optimize/Last edited by Madis731 on 17 Aug 2008, 11:52; edited 1 time in total 16 Aug 2008, 20:49
Patrick_

Joined: 11 Mar 2006
Posts: 53
Location: 127.0.0.1
Patrick_ 16 Aug 2008, 21:31
tom: Thanks, that was a bit more bearable. My thing with NeuQuant is that the source was available and I was therefore able to use it in my program without rewriting it. Also, it works fairly well.

bitR: Sheesh, thanks! That certainly did make a difference. It's *almost* fixed, but I should be able to take it from here. And yes, I'm using olly as my debugger.

Madis: I'm compiling it into an object file so I can link it against my C program. 16 Aug 2008, 21:31  Madis731 Joined: 25 Sep 2003 Posts: 2139 Location: Estonia Madis731 17 Aug 2008, 12:00 @Patrick_: So there's no way we can test it as a real application :S 17 Aug 2008, 12:00
 Display posts from previous: All Posts1 Day7 Days2 Weeks1 Month3 Months6 Months1 Year Oldest FirstNewest First  Jump to: Select a forum Official----------------AssemblyPeripheria General----------------MainTutorials and ExamplesDOSWindowsLinuxUnixMenuetOS Specific----------------MacroinstructionsOS ConstructionIDE DevelopmentProjects and IdeasNon-x86 architecturesHigh Level LanguagesProgramming Language DesignCompiler Internals Other----------------FeedbackHeapTest Area

Forum Rules:
 You cannot post new topics in this forumYou cannot reply to topics in this forumYou cannot edit your posts in this forumYou cannot delete your posts in this forumYou cannot vote in polls in this forumYou cannot attach files in this forumYou can download files in this forum