flat assembler
Message board for the users of flat assembler.

Index > Main > I hate to ask, but HELP!

Author
Thread Post new topic Reply to topic
Patrick_



Joined: 11 Mar 2006
Posts: 53
Location: 127.0.0.1
Patrick_ 15 Aug 2008, 19:24
I'm rewriting the NeuQuant image quantizing algorithm in assembly. I use it to process GIF's in one of my programs, but I was hoping to get some better performance out of it, therefore I'm rewriting it in assembly.

Anyway, I've looked over every instruction about 5 times but still can't find the issue. To me the code exactly matches up to the C code, so I'm thinking I'm doing something in assembly the wrong way.

I know, some of the code isn't the best; currently it's not highly optimized, as I'm just trying to get it to work. As you can see it uses the C calling convention because these functions are called from a C program. If anyone can find the problem, I'd *really* appreciate it. Smile Here's the code:

neuquant.inc:

Code:
;; neuquant.inc: Include file for neuquant.asm.

NET_SIZE                = 256   ;number of colors used

;; For 256 colours, fixed arrays need 8kb, plus space for the image

;; four primes near 500 - assume no image has a length so large */
;; that it is divisible by all four primes */

PRIME_1         = 499
PRIME_2         = 491
PRIME_3         = 487
PRIME_4         = 503

MIN_PICTURE_BYTES       = (3*PRIME_4)   ;minimum size for input image

MAX_NET_POS             = (NET_SIZE-1)
NET_BIAS_SHIFT          = 4             ;bias for color values
N_CYCLES                = 100           ;number of learning cycles

;; Definitions for freq and bias
INT_BIAS_SHIFT          = 16    ;bias for fractions
INT_BIAS                = (1 shl INT_BIAS_SHIFT)
GAMMA_SHIFT             = 10    ;gamma = 1024
GAMMA                   = (1 shl GAMMA_SHIFT)
BETA_SHIFT              = 10
BETA                    = (INT_BIAS shr BETA_SHIFT)     ;beta == 1/1024
BETA_GAMMA              = (INT_BIAS shl (GAMMA_SHIFT - BETA_SHIFT))

;; Definitions for decreasing radius factor
INIT_RAD                = (NET_SIZE shr 3)      ;for 256 cols, radius starts
RADIUS_BIAS_SHIFT       = 6                     ;at 32.0 biased by 6 bits
RADIUS_BIAS             = (1 shl RADIUS_BIAS_SHIFT)
INIT_RADIUS             = (INIT_RAD * RADIUS_BIAS)      ;and decreases by a
RADIUS_DEC              = 30                            ;factor of 1/30 each cycle

;; Definitions for decreasing alpha factor
ALPHA_BIAS_SHIFT        = 10    ;alpha strats at 1.0
INIT_ALPHA              = (1 shl ALPHA_BIAS_SHIFT)

;; RAD_BIAS and ALPHA_RAD_BIAS used for radpower calculation
RAD_BIAS_SHIFT          = 8
RAD_BIAS                = (1 shl RAD_BIAS_SHIFT)
ALPHA_RAD_B_SHIFT       = (ALPHA_BIAS_SHIFT + RAD_BIAS_SHIFT)
ALPHA_RAD_BIAS          = (1 shl ALPHA_RAD_B_SHIFT)

BEST_D_CONTEST          = not (1 shr 31)

macro ccall proc,[arg]                  ; directly call CDECL procedure
 { common
    size@ccall = 0
    if ~ arg eq
   reverse
    pushd arg
    size@ccall = size@ccall+4
   common
    end if
    call proc
    if size@ccall
    add esp,size@ccall
    end if }
    
;; pushad, but no eax
macro pushad_noeax {
        push    ecx edx ebx
        sub     esp, 4
        mov     [esp], esp
        push    ebp esi edi
}

;; popad, but no eax
macro popad_noeax {
        pop     edi esi ebp
        add     esp, 4
        pop     ebx edx ecx
}

PNOEAX_OFFSET   = 28
PUSHAD_OFFSET   = 32    


neuquant.asm:

Code:
;; NeuQuant Neural-Net Quantization Algorithm
;; x86 assembly rewrite based on NEUQUANT.C:
;; ------------------------------------------
;;
;; Copyright (c) 1994 Anthony Dekker
;;
;; NEUQUANT Neural-Net quantization algorithm by Anthony Dekker, 1994.
;; See "Kohonen neural networks for optimal colour quantization"
;; in "Network: Computation in Neural Systems" Vol. 5 (1994) pp 351-367.
;; for a discussion of the algorithm.
;; See also  http://members.ozemail.com.au/~dekker/NEUQUANT.HTML
;;
;; Any party obtaining a copy of these files from the author, directly or
;; indirectly, is granted, free of charge, a full and unrestricted irrevocable,
;; world-wide, paid up, royalty-free, nonexclusive right and license to deal
;; in this software and documentation files (the "Software"), including without
;; limitation the rights to use, copy, modify, merge, publish, distribute, sublicense,
;; and/or sell copies of the Software, and to permit persons who receive
;; copies from any such party to do so, with the only requirement being
;; that this copyright notice remain intact.
;;
;; ------------------------------------------
;; x86 assembly rewrite:
;;
;; Copyright (c) 2008
;;
;; Permission to use, copy, modify, and/or distribute this software for any
;; purpose with or without fee is hereby granted, provided that the above
;; copyright notice and this permission notice appear in all copies.
;;
;; THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
;; WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
;; MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
;; ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
;; ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
;; OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
        format MS COFF

        public neuquant_initnet as '_neuquant_initnet'
        public neuquant_unbiasnet as '_neuquant_unbiasnet'
        public neuquant_writecolourmap as '_neuquant_writecolourmap'
        public neuquant_inxbuild as '_neuquant_inxbuild'
        public neuquant_inxsearch as '_neuquant_inxsearch'
        public neuquant_learn as '_neuquant_learn'

include 'neuquant.inc'

;; @neuquant_initnet(unsigned char *the_pic, int len, int sample):
;;
;; Initialise network in range (0,0,0) to (255,255,255) and set parameters
;; Returns nothing.
        ARG_THE_PIC     = PUSHAD_OFFSET+4
        ARG_LEN         = PUSHAD_OFFSET+8
        ARG_SAMPLE      = PUSHAD_OFFSET+12

neuquant_initnet:

        pushad
        mov     ebp, esp

        mov     eax, dword [ebp+ARG_THE_PIC]    ;thepicture = thepic;
        mov     dword [the_picture], eax
        mov     eax, dword [ebp+ARG_LEN]        ;lengthcount = len;
        mov     dword [length_count], eax
        mov     eax, dword [ebp+ARG_SAMPLE]     ;samplefac = sample;
        mov     dword [sample_fac], eax

        ;; for (i=0; i<netsize; i++) {
        xor     ecx, ecx
 .for_begin:
        cmp     ecx, NET_SIZE
        jnl     .for_end

        imul    edi, ecx, network_WIDTH*4               ;p = network[i];
        add     edi, dword network

        ;; p[0] = p[1] = p[2] = (i << (netbiasshift+Cool)/netsize;
        mov     eax, ecx
        shl     eax, NET_BIAS_SHIFT+8
        mov     ebx, NET_SIZE
        xor     edx, edx
        idiv    ebx
        mov     dword [edi], eax
        mov     dword [edi+1*4], eax
        mov     dword [edi+2*4], eax

        mov     dword [freq+ecx*4], INT_BIAS/NET_SIZE   ;freq[i] = intbias/netsize;
        mov     dword [bias+ecx*4], 0                   ;bias[i] = 0;

        add     ecx, 1
        jmp     .for_begin
 .for_end:
 .return:
        popad
        ret

;; @neuquant_unbiasnet(void):
;;
;; Unbias network to give byte values 0..255 and record position i to prepare for sort
;; Returns nothing.
neuquant_unbiasnet:

        pushad
        mov     ebp, esp

        ;; for (i=0; i<netsize; i++) {
        xor     ecx, ecx
 .for_one_begin:
        cmp     ecx, NET_SIZE
        jnl     .for_one_after
 
        imul    ebx, ecx, network_WIDTH*4
        add     ebx, dword network

        ;; for (j=0; j<3; j++) {
        xor     edx, edx
  .for_two_begin:
        cmp     edx, 3
        jnl     .for_two_after

        ;; temp = (network[i][j] + (1 << (netbiasshift - 1))) >> netbiasshift;
        mov     eax, dword [ebx+edx*4]
        add     eax, (1 shl (NET_BIAS_SHIFT - 1))
        sar     eax, NET_BIAS_SHIFT

        ;; if (temp > 255) temp = 255;
        cmp     eax, 255
        jng     @f
        mov     eax, 255
 @@:
        ;; network[i][j] = temp;
        mov     dword [ebx+edx*4], eax

        add     edx, 1
        jmp     .for_two_begin
 .for_two_after:
        ;; network[i][3] = i;
        mov     dword [ebx+3*4], ecx

        add     ecx, 1
        jmp     .for_one_begin

 .for_one_after:
 .return:
        popad
        ret

;; @neuquant_writecolourmap(GifColorType *f):
;;
;; Output colormap to GifColorType array.
;; Returns nothing.
        ARG_COLORMAP    = PUSHAD_OFFSET+4
neuquant_writecolourmap:

        pushad
        mov     ebp, esp

        ;; for (j=0; j<netsize; j++) {
        xor     ecx, ecx
        mov     ebx, dword [ebp+ARG_COLORMAP]
 .for_one:
        cmp     ecx, NET_SIZE
        jnl     .for_one_after
 
        imul    eax, ecx, network_WIDTH*4
        add     eax, dword network

        mov     edx, dword [eax+2*4]            ;f[j].Red = network[j][2];
        mov     byte [ebx+0], dl
        mov     edx, dword [eax+1*4]            ;f[j].Green = network[j][1];
        mov     byte [ebx+1], dl
        mov     edx, dword [eax+0*4]            ;f[j].Blue = network[j][0];
        mov     byte [ebx+2], dl

        add     ebx, 3

        add     ecx, 1
        jmp     .for_one

 .for_one_after:

 .return:
        popad
        ret

;; neuquant_inxbuild(void):
;;
;; Insertion sort of network and building of netindex[0..255] (to do after unbias)
;; Returns nothing.
        ST_PREV_COL     = -4
        ST_START_POS    = -8
        ST_ALLOC        = 8

neuquant_inxbuild:

        pushad
        mov     ebp, esp
        sub     esp, ST_ALLOC

        mov     dword [ebp+ST_PREV_COL], 0
        mov     dword [ebp+ST_START_POS], 0

        ;; for (i=0; i<netsize; i++) {
        xor     ecx, ecx
 .for_one:
        cmp     ecx, NET_SIZE
        jnl     .for_one_after
 
        imul    esi, ecx, network_WIDTH*4       ;p = network[i];
        add     esi, dword network

        mov     eax, ecx                        ;smallpos = i
        mov     ebx, dword [esi+1*4]            ;smallval = p[1]

        ;; for (j=i+1; j<netsize; j++)
        mov     edx, ecx
        add     edx, 1
  .for_two:
        cmp     edx, NET_SIZE
        jnl     .for_two_after
  
        imul    edi, edx, network_WIDTH*4       ;q = network[j]
        add     edi, dword network

        cmp     dword [edi+1*4], ebx            ;if (q[1] < smallval)
        jnl     @f
        mov     eax, edx                        ;smallpos = j
        mov     ebx, dword [edi+1*4]            ;smallval = q[1]
  @@:
        add     edx, 1
        jmp     .for_two
  .for_two_after:
  
        imul    edi, eax, network_WIDTH*4       ;q = network[smallpos]
        add     edi, dword network

        cmp     ecx, eax                        ;if (i != smallpos)
        jne     @f

        push    ecx

        ;; j = q[0];   q[0] = p[0];   p[0] = j;
        mov     edx, dword [edi+0*4]
        mov     ecx, dword [esi+0*4]
        mov     dword [edi+0*4], ecx
        mov     dword [esi+0*4], edx
        ;; j = q[1];   q[1] = p[1];   p[1] = j;
        mov     edx, dword [edi+1*4]
        mov     ecx, dword [esi+1*4]
        mov     dword [edi+1*4], ecx
        mov     dword [esi+1*4], edx
        ;; j = q[2];   q[2] = p[2];   p[2] = j;
        mov     edx, dword [edi+2*4]
        mov     ecx, dword [esi+2*4]
        mov     dword [edi+2*4], ecx
        mov     dword [esi+2*4], edx
        ;; j = q[3];   q[3] = p[3];   p[3] = j;
        mov     edx, dword [edi+3*4]
        mov     ecx, dword [esi+3*4]
        mov     dword [edi+3*4], ecx
        mov     dword [esi+3*4], edx

        pop     ecx
 @@:
        ;; We no longer need esi or edi; they're reset at the top.
        mov     esi, dword [ebp+ST_PREV_COL]
        cmp     ebx, esi        ;if (smallval != previouscol)
        jne     .for_one_reloop

        ;; netindex[previouscol] = (startpos+i)>>1;
        mov     edi, dword [ebp+ST_START_POS]
        add     edi, ecx
        sar     edi, 1
        mov     dword [net_index+esi*4], edi

        ;;for (j=previouscol+1; j<smallval; j++)
        mov     edx, esi
        add     edx, 1
 .for_three:
        cmp     edx, ebx
        jnl     .for_three_after
        mov     dword [net_index+edx*4], ecx    ;netindex[j] = i
        add     edx, 1
        jmp     .for_three
 .for_three_after:
        mov     dword [ebp+ST_PREV_COL], ebx    ;previouscol = smallval;
        mov     dword [ebp+ST_START_POS], ecx   ;startpos = i

 .for_one_reloop:
        add     ecx, 1
        jmp     .for_one

 .for_one_after:
        ;; netindex[previouscol] = (startpos+maxnetpos)>>1;
        mov     eax, dword [ebp+ST_START_POS]
        add     eax, MAX_NET_POS
        sar     eax, 1
        mov     ebx, dword [ebp+ST_PREV_COL]
        mov     dword [net_index+ebx*4], eax
 
        ;; for (j=previouscol+1; j<256; j++)
 .for_last:
        cmp     ebx, 256
        jnl     .for_last_after
        mov     dword [net_index+ebx*4], MAX_NET_POS    ;netindex[j] = maxnetpos
        add     ebx, 1
        jmp     .for_last
 .for_last_after:

 .return:
        add     esp, ST_ALLOC
        popad
        ret
 
;; neuquant_inxsearch(int b, int g, int r):
;;
;; Search for BGR values 0..255 (after net is unbiased) and return colour index
;; We discard esp for non-stack usage, so no pushing/popping after esp is changed.
;; This function returns the location in the colormap.
        ARG_B   = PNOEAX_OFFSET+4
        ARG_G   = PNOEAX_OFFSET+8
        ARG_R   = PNOEAX_OFFSET+12

neuquant_inxsearch:

        pushad_noeax
        mov     ebp, esp

        mov     eax, 1000               ;bestd = 1000;
        mov     esp, -1                 ;best = -1
        mov     ebx, dword [ebp+ARG_G]
        mov     ecx, dword [net_index+ebx*4]    ;i = netindex[g];
        mov     edx, ecx                        ;j = i-1;
        sub     edx, 1

        ;; while ((i<netsize) || (j>=0)) {
 .while_one:
        cmp     ecx, NET_SIZE
        jb      .ecx_below
        test    edx, edx
        jns     .edx_ae
        jmp     .while_one_after

        ;; if (i<netsize) {. From the above while test, we know already
        ;; that i (ecx) < NET_SIZE
 .ecx_below:
        imul    esi, ecx, network_WIDTH*4               ;p = network[i];
        add     esi, dword network
        mov     edi, dword [esi+1*4]            ;dist = p[1] - g;
        sub     edi, dword [ebp+ARG_G]
        cmp     edi, eax                                ;if (dist >= bestd)
        jnge    .else_one
        mov     ecx, NET_SIZE           ;i = netsize
        jmp     .else_one_after
 .else_one:
        add     ecx, 1          ;i++
        test    edi, edi        ;if (dist<0)
        jns     @f
        neg     edi             ;dist = -dist
 @@:
        mov     ebx, dword [esi]        ;a = p[0] - b
        sub     ebx, dword [ebp+ARG_B]
        test    ebx, ebx                        ;if (a<0) a = -a;
        jns     @f
        neg     ebx
 @@:
        add     edi, ebx                ;dist += a
        cmp     edi, eax                ;if (dist<bestd)
        jge     .else_one_after
        mov     ebx, dword [esi+2*4]    ;a = p[2] - r
        sub     ebx, dword [ebp+ARG_R]
        test    ebx, ebx                ;if (a<0) a = -a;
        jns     @f
        neg     ebx
 @@:
        add     edi, ebx        ;dist += a;
        cmp     edi, eax        ;if (dist<bestd)
        jnl     @f
        mov     eax, edi        ;bestd=dist
        mov     esp, dword [esi+3*4]    ;best=p[3]
 @@:
 .else_one_after:
 .ecx_below_after:
 
        ;; if (j>=0)
        test    edx, edx
        js      .while_one_reloop
 .edx_ae:

        imul    esi, edx, network_WIDTH*4               ;p = network[j];
        add     esi, dword network
        mov     edi, dword [ebp+ARG_G]          ;dist = g - p[1];
        sub     edi, dword [esi+1*4]
        cmp     edi, eax                ;if (dist >= bestd)
        jnge    .else_distnae
        mov     edx, -1         ;j = -1
        jmp     .else_distnae_after
 .else_distnae:
        sub     edx, 1          ;j--
        test    edi, edi        ;if (dist<0)
        jns     @f
        neg     edi             ;dist = -dist
 @@:
        mov     ebx, dword [esi+0]      ;a = p[0] - b;
        sub     ebx, dword [ebp+ARG_B]
        test    ebx, ebx                ;if (a<0)
        jns     @f
        neg     ebx                     ;a = -a
 @@:
        add     edi, ebx                ;dist +=a
        cmp     edi, eax                ;if (dist<bestd)
        jnl     .L1
        mov     ebx, dword [esi+2*4]    ;a = p[2] - r;
        sub     ebx, dword [ebp+ARG_R]
        test    ebx, ebx                ;if (a<0)
        jns     @f
        neg     ebx                     ;a = -a
 @@:
        add     edi, ebx                ;dist += a
        cmp     edi, eax                ;if (dist<bestd)
        jnl     .L2
        mov     eax, edi                ;bestd=dist;
        mov     esp, dword [esi+3*4]    ;best = p[3]
 .L2:
 .L1:
 .else_distnae_after:
 .while_one_reloop:
        jmp     .while_one
 .while_one_after:
 .return:
        mov     eax, esp
        mov     esp, ebp
        popad_noeax
        ret
 
;; @neuquant_contest(int b, int g, int r):
;;
;; Search for biased BGR values.
;; Returns nothing.
        ARG_B   = PNOEAX_OFFSET+4
        ARG_G   = PNOEAX_OFFSET+8
        ARG_R   = PNOEAX_OFFSET+12

        ST_BEST_BD      = -4
        ST_BEST_BP      = -8
        ST_BESTD        = -12
        ST_BEST_POS     = -16
        ST_BIAS_DIST    = -20
        ST_ALLOC        = 20
neuquant_contest:

        pushad_noeax
        mov     ebp, esp
        sub     esp, ST_ALLOC

        mov     eax, 1
        sal     eax, 31
        not     eax
        mov     dword [ebp+ST_BESTD], eax       ;bestd = ~(((int) 1)<<31);
        mov     dword [ebp+ST_BEST_BD], eax     ;bestbiasd = bestd
        mov     dword [ebp+ST_BEST_POS], -1             ;bestpos = -1
        mov     dword [ebp+ST_BEST_BP], -1              ;bestbiaspos = bestpos
        mov     edx, dword bias                         ;p = bias
        mov     esi, dword freq                         ;f = freq

        ;for (i=0; i<netsize; i++)
        xor     ecx, ecx
 .for_one:
        cmp     ecx, NET_SIZE
        jnl     .for_one_after

        imul    edi, ecx, network_WIDTH*4       ;n = network[i]
        add     edi, dword network
        mov     eax, dword [edi+0*4]    ;dist = n[0]-b
        sub     eax, dword [ebp+ARG_B]
        test    eax, eax                ;if (dist<0)
        jns     @f
        neg     eax                     ;dist = -dist
 @@:
        mov     ebx, dword [edi+1*4]    ;a = n[1] -g
        sub     ebx, dword [ebp+ARG_G]
        test    ebx, ebx                ;if (a<0)
        jns     @f
        neg     ebx                     ;a = -a
 @@:
        add     eax, ebx                ;dist += a
        mov     ebx, dword [edi+2*4]    ;a = n[2] -r
        sub     ebx, dword [ebp+ARG_R]
        test    ebx, ebx                ;if (a<0)
        jns     @f
        neg     ebx                     ;a = -a
 @@:
        add     eax, ebx                ;dist += a
        cmp     eax, dword [ebp+ST_BESTD]       ;if (dist <bestd)
        jnl     @f
        mov     dword [ebp+ST_BESTD], eax       ;bestd = dist
        mov     dword [ebp+ST_BEST_POS], ecx    ;bestpos = i
 @@:
        ;; Since dist (eax) is no longer needed after this equation,
        ;; we use it to calculate the equation.
        mov     dword [ebp+ST_BIAS_DIST], eax   ;biasdist = dist
        mov     eax, dword [edx]                ;((*p)>>(intbiasshift-netbiasshift))
        sar     eax, (INT_BIAS_SHIFT - NET_BIAS_SHIFT)
        sub     dword [ebp+ST_BIAS_DIST], eax
        mov     eax, dword [ebp+ST_BIAS_DIST]
        cmp     eax, dword [ebp+ST_BEST_BD]     ;if (biasdist<bestbiasd)
        jnl     @f
        mov     dword [ebp+ST_BEST_BD], eax     ;bestbiasd = biasdist
        mov     dword [ebp+ST_BEST_BP], ecx     ;bestbiaspos = i
 @@:
        mov     eax, dword [esi]        ;betafreq = (*f >> betashift);
        sar     eax, BETA_SHIFT
        sub     dword [esi], eax        ;*f++ -= betafreq;
        add     esi, 4
        sal     eax, GAMMA_SHIFT
        add     dword [edx], eax        ;*p++ += (betafreq<<gammashift);
        add     edx, 4
 .for_one_reloop:
        add     ecx, 1
        jmp     .for_one
 .for_one_after:
        mov     eax, dword [ebp+ST_BEST_POS]
        add     dword [freq+eax*4], BETA        ;freq[bestpos] += beta;
        sub     dword [bias+eax*4], BETA_GAMMA  ;bias[bestpos] -= betagamma;
        mov     eax, dword [ebp+ST_BEST_BP]     ;return(bestbiaspos);
 .return:
        mov     esp, ebp
        popad_noeax
        ret

;; @neuquant_altersingle(int alpha, int i, int b, int g, int r):
;;
;; Move neuron i towards biased (b,g,r) by factor alpha
;; Returns nothing.
        ARG_ALPHA       = PUSHAD_OFFSET+4
        ARG_I           = PUSHAD_OFFSET+8
        ARG_B           = PUSHAD_OFFSET+12
        ARG_G           = PUSHAD_OFFSET+16
        ARG_R           = PUSHAD_OFFSET+20
neuquant_altersingle:

        pushad
        mov     ebp, esp

        imul    ebx, dword [ebp+ARG_I], network_WIDTH*4         ;;n = network[i];
        add     ebx, dword network

        mov     ecx, dword [ebp+ARG_ALPHA]
        mov     esi, INIT_ALPHA

        mov     eax, dword [ebx]                ;*n -= (alpha*(*n - b)) / initalpha;
        sub     eax, dword [ebp+ARG_B]
        imul    eax, ecx
        xor     edx, edx
        idiv    esi
        sub     dword [ebx], eax
        add     ebx, 4                          ;n++;

        mov     eax, dword [ebx]                ;*n -= (alpha*(*n - b)) / initalpha;
        sub     eax, dword [ebp+ARG_G]
        imul    eax, ecx
        xor     edx, edx
        idiv    esi
        sub     dword [ebx], eax
        add     ebx, 4                          ;n++;

        mov     eax, dword [ebx]                ;*n -= (alpha*(*n - b)) / initalpha;
        sub     eax, dword [ebp+ARG_R]
        imul    eax, ecx
        xor     edx, edx
        idiv    esi
        sub     dword [ebx], eax

 .return:
        mov     esp, ebp
        popad
        ret

;; @neuquant_alterneigh(int rad, int i, int b, int g, int r):
;;
;; Move adjacent neurons by precomputed alpha*(1-((i-j)^2/[r]^2)) in radpower[|i-j|]
;; Returns nothing.
        ARG_RAD         = PUSHAD_OFFSET+4
        ARG_I           = PUSHAD_OFFSET+8
        ARG_B           = PUSHAD_OFFSET+12
        ARG_G           = PUSHAD_OFFSET+16
        ARG_R           = PUSHAD_OFFSET+20

        ST_LO           = -4
        ST_HI           = -8
        ST_Q            = -12
        ST_HOLD_EAX     = -16
        ST_ALLOC        = 16
neuquant_alterneigh:

        pushad
        mov     ebp, esp
        sub     esp, ST_ALLOC

        mov     esi, dword [ebp+ARG_I]
        mov     edi, dword [ebp+ARG_RAD]

        ;; lo = i-rad;   if (lo<-1) lo=-1;
        mov     eax, esi
        sub     eax, edi
        cmp     eax, -1
        jnl     @f
        mov     eax, -1
 @@:
        mov     dword [ebp+ST_LO], eax

        ;; hi = i+rad;   if (hi>netsize) hi=netsize;
        mov     eax, esi
        add     eax, edi
        cmp     eax, NET_SIZE
        jna     @f
        mov     eax, NET_SIZE
 @@:
        mov     dword [ebp+ST_HI], eax

        mov     ecx, esi        ;j = i+1;
        add     ecx, 1
        mov     ebx, esi        ;k = i-1;
        sub     ebx, 1
        mov     eax, rad_power
        mov     dword [ebp+ST_Q], eax

        ;; while ((j<hi) || (k>lo)) {
 .while_one:
        add     dword [ebp+ST_Q], 4
        mov     edx, dword [ebp+ST_Q]
        mov     eax, dword [edx]        ;a = (*(++q));

        cmp     ecx, dword [ebp+ST_HI]
        jl      .if1
        cmp     ebx, dword [ebp+ST_LO]
        jng     .while_one_after
        jmp     .if2_past_check         ;we already know k is within range
 .if1:
        push    ecx
        mov     dword [ebp+ST_HOLD_EAX], eax

        imul    esi, ecx, network_WIDTH*4       ;p = network[j];
        add     esi, dword network
        mov     ecx, ALPHA_RAD_BIAS

        mov     edi, [esi]                      ;*p -= (a*(*p - b)) / alpharadbias;
        sub     edi, dword [ebp+ARG_B]
        imul    eax, edi
        xor     edx, edx
        idiv    ecx
        sub     [esi], eax
        add     esi, 4                          ;p++

        mov     eax, dword [ebp+ST_HOLD_EAX]
        mov     edi, [esi]                      ;*p -= (a*(*p - g)) / alpharadbias;
        sub     edi, dword [ebp+ARG_G]
        imul    eax, edi
        xor     edx, edx
        idiv    ecx
        sub     [esi], eax
        add     esi, 4                          ;p++

        mov     eax, dword [ebp+ST_HOLD_EAX]
        mov     edi, [esi]                      ;*p -= (a*(*p - r)) / alpharadbias;
        sub     edi, dword [ebp+ARG_R]
        imul    eax, edi
        xor     edx, edx
        idiv    ecx
        sub     [esi], eax

        pop     ecx
        mov     eax, dword [ebp+ST_HOLD_EAX]
        add     ecx, 1                          ;j++
 .after_if1:
 .if2:
        cmp     ebx, dword [ebp+ST_LO]
        jna     .while_one_reloop
 .if2_past_check:

        push    ecx
        mov     dword [ebp+ST_HOLD_EAX], eax

        imul    esi, ebx, network_WIDTH*4       ;p = network[k];
        add     esi, dword network
        mov     ecx, ALPHA_RAD_BIAS

        mov     edi, [esi]                      ;*p -= (a*(*p - b)) / alpharadbias;
        sub     edi, dword [ebp+ARG_B]
        imul    eax, edi
        xor     edx, edx
        idiv    ecx
        sub     [esi], eax
        add     esi, 4                          ;p++

        mov     eax, dword [ebp+ST_HOLD_EAX]
        mov     edi, [esi]                      ;*p -= (a*(*p - g)) / alpharadbias;
        sub     edi, dword [ebp+ARG_G]
        imul    eax, edi
        xor     edx, edx
        idiv    ecx
        sub     [esi], eax
        add     esi, 4                          ;p++

        mov     eax, dword [ebp+ST_HOLD_EAX]
        mov     edi, [esi]                      ;*p -= (a*(*p - r)) / alpharadbias;
        sub     edi, dword [ebp+ARG_R]
        imul    eax, edi
        xor     edx, edx
        idiv    ecx
        sub     [esi], eax

        pop     ecx
        mov     eax, dword [ebp+ST_HOLD_EAX]
        sub     ebx, 1                          ;k--
 .after_if2:
 .while_one_reloop:
        jmp     .while_one
 .while_one_after:
 .return:
        mov     esp, ebp
        popad
        ret
 
;; @neuquant_learn:
;;
;; Main learning loop.
;; Returns nothing.
        ST_LIM          = -4
        ST_SAMPLE_PIXELS= -8
        ST_DELTA        = -12
        ST_ALPHA        = -16
        ST_RADIUS       = -20
        ST_STEP         = -24
        ST_RAD          = -28
        ST_ALLOC        = 28
neuquant_learn:

        pushad
        mov     ebp, esp
        sub     esp, ST_ALLOC

        mov     eax, dword [sample_fac]         ;alphadec = 30 + ((samplefac-1)/3);
        sub     eax, 1
        mov     ebx, 3
        xor     edx, edx
        idiv    ebx
        add     eax, 30
        add     dword [alpha_dec], eax

        mov     edi, dword [the_picture]        ;p = thepicture
        mov     ebx, dword [length_count]       ;lim = thepicture + lengthcount;
        add     ebx, edi
        mov     dword [ebp+ST_LIM], ebx

        mov     eax, dword [length_count]       ;samplepixels = lengthcount/(3*samplefac);
        imul    ebx, dword [sample_fac], 3
        xor     edx, edx
        idiv    ebx
        mov     dword [ebp+ST_SAMPLE_PIXELS], eax

        mov     ebx, N_CYCLES                   ;delta = samplepixels/ncycles;
        xor     edx, edx
        idiv    ebx
        mov     dword [ebp+ST_DELTA], eax

        mov     dword [ebp+ST_ALPHA], INIT_ALPHA
        mov     dword [ebp+ST_RADIUS], INIT_RADIUS

        mov     ebx, INIT_RADIUS                ;rad = radius >> radiusbiasshift;
        sar     ebx, RADIUS_BIAS_SHIFT
        cmp     ebx, 1          ;if (rad <= 1)
        jnle    @f
        xor     ebx, ebx
 @@:
        mov     dword [ebp+ST_RAD], ebx
 
        ;for (i=0; i<rad; i++)
        xor     ecx, ecx
 .for1:
        cmp     ecx, ebx
        jnl     .for1_after

        ;; radpower[i] =
        ;;    alpha*(((rad*rad - i*i)*radbias)/(rad*rad));
        push    edi
        mov     eax, ebx        ;rad*rad
        imul    eax, ebx
        mov     edx, ecx        ;i*i
        imul    edx, ecx
        mov     edi, eax        ;save rad*rad
        sub     eax, edx        ;rad*rad - i*i
        imul    eax, RAD_BIAS
        xor     edx, edx
        idiv    edi             ; / rad*rad
        mov     esi, dword [ebp+ST_ALPHA]
        imul    esi, eax        ;alpha * ...
        mov     dword [rad_power+ecx*4], esi
        pop     edi

        add     ecx, 1
        jmp     .for1
 .for1_after:

        ;; if ((lengthcount%prime1) != 0) step = 3*prime1;
        mov     eax, dword [length_count]
        mov     ebx, PRIME_1
        xor     edx, edx
        idiv    ebx
 .if1:
        test    edx, edx
        jz      .if1_else
        mov     dword [ebp+ST_STEP], 3*PRIME_1
        jmp     .if1_after
 .if1_else:
        ;; if ((lengthcount%prime2) !=0) step = 3*prime2;
        mov     eax, dword [length_count]
        mov     ebx, PRIME_2
        xor     edx, edx
        idiv    ebx
  .if2:
        test    edx, edx
        jz      .if2_else
        mov     dword [ebp+ST_STEP], 3*PRIME_2
        jmp     .if2_after
  .if2_else:
        ;; if ((lengthcount%prime3) !=0) step = 3*prime3;
        mov     eax, dword [length_count]
        mov     ebx, PRIME_3
        xor     edx, edx
        idiv    ebx
    .if3:
        test    edx, edx
        jz      .if3_else
        mov     dword [ebp+ST_STEP], 3*PRIME_3
        jmp     .if3_after
    .if3_else:
        mov     dword [ebp+ST_STEP], 3*PRIME_4
    .if3_after:
  .if2_after:
 .if1_after:
 
        xor     ecx, ecx        ;i = 0
 .while1:
        cmp     ecx, dword [ebp+ST_SAMPLE_PIXELS]
        jnb     .while1_after

        push    ecx

        movzx   ecx, byte [edi]         ;b = p[0] << netbiasshift;
        movzx   ebx, byte [edi+1]       ;g = p[1] << netbiasshift;
        movzx   edx, byte [edi+2]       ;r = p[2] << netbiasshift;
        sal     cl, NET_BIAS_SHIFT
        sal     bl, NET_BIAS_SHIFT
        sal     dl, NET_BIAS_SHIFT

        ;; j = contest(b,g,r);
        ccall   neuquant_contest, ecx, ebx, edx

        ;; altersingle(alpha,j,b,g,r);
        ccall   neuquant_altersingle, dword [ebp+ST_ALPHA], eax, ecx, ebx, edx
        cmp     dword [ebp+ST_RAD], 0   ;if (rad)
        jz      @f

        ;; alterneigh(rad,j,b,g,r);
        ccall   neuquant_alterneigh, dword [ebp+ST_RAD], eax, ecx, ebx, edx

 @@:
        pop     ecx

        ;; At this point we're done with eax, ebx, edx, esi
        add     edi, dword [ebp+ST_STEP]        ;p += step
        cmp     edi, dword [ebp+ST_LIM]         ;if (p >= lim)
        jnae    @f
        sub     edi, dword [length_count]       ;p -= lengthcount;
 @@:
        add     ecx, 1                          ;i++

        ;; if (i%delta == 0) {
        mov     eax, ecx
        mov     ebx, dword [ebp+ST_DELTA]
        xor     edx, edx
        idiv    ebx
        test    edx, edx
        jnz     @f

        ;; alpha -= alpha / alphadec;
        mov     eax, dword [ebp+ST_ALPHA]
        mov     ebx, dword [alpha_dec]
        xor     edx, edx
        idiv    ebx
        sub     dword [ebp+ST_ALPHA], eax
 
        ;; radius -= radius / radiusdec;
        mov     eax, dword [ebp+ST_RADIUS]
        mov     ebx, RADIUS_DEC
        xor     edx, edx
        idiv    ebx
        sub     dword [ebp+ST_RADIUS], eax
 
        ;; rad = radius >> radiusbiasshift;
        mov     esi, dword [ebp+ST_RADIUS]
        sar     esi, RADIUS_BIAS_SHIFT
        mov     dword [ebp+ST_RAD], esi
 .if4:
        cmp     esi, 1          ;if (rad <= 1)
        jnle    .if4_after
        xor     esi, esi        ;rad = 0
        mov     dword [ebp+ST_RAD], 0
 .if4_after:
        ;; for (j=0; j<rad; j++)
        xor     ebx, ebx
 .for2:
        cmp     ebx, esi
        jnl     .for2_after

        ;; radpower[j] =
        ;;    alpha*(((rad*rad - j*j)*radbias)/(rad*rad));
        push    edi
        mov     edi, esi
        imul    edi, esi        ;rad*rad
        mov     eax, edi        ;operate on rad*rad with eax
        mov     edx, ebx        ;j*j
        imul    edx, ebx
        sub     eax, edx        ;rad*rad - j*j
        imul    eax, RAD_BIAS
        xor     edx, edx
        idiv    edi             ; / rad*rad
        mov     edi, dword [ebp+ST_ALPHA]
        imul    edi, eax        ;alpha * ...
        mov     dword [rad_power+ebx*4], edi
        pop     edi

        add     ebx, 1
        jmp     .for2
 .for2_after:
 @@:
        jmp     .while1
 .while1_after:

 .return:
        mov     esp, ebp
        popad
        ret

        section '.data'

;; Global variables used throughought the code:
alpha_dec       dd ?    ;biased by 10 bits, type INT
the_picture     dd ?    ;the input image itself, type UNSIGNED CHAR *
length_count    dd ?    ;length_count = H*W*3, type INT
sample_fac      dd ?    ;sampling factor 1..30, type INT

network         rd network_WIDTH * network_HEIGHT       ;the network itself, BGRc, type INT array
network_WIDTH   = 4
network_HEIGHT  = NET_SIZE
net_index       rd 256          ;for network lookup - really 256, type INT array
bias            rd NET_SIZE     ;bias and freq arrays for learning
freq            rd NET_SIZE
rad_power       rd INIT_RAD     ;radpower for precomputation    


Last edited by Patrick_ on 02 Nov 2010, 20:03; edited 1 time in total
Post 15 Aug 2008, 19:24
View user's profile Send private message Reply with quote
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20451
Location: In your JS exploiting you and your system
revolution 15 Aug 2008, 19:45
Whoa, that is a lot of code to go over for one posting.

Have you tried following it through with your debugger, then comparing to the the C code as see where it deviates?

Just reading the code can sometimes be very difficult to spot subtle bugs.
Post 15 Aug 2008, 19:45
View user's profile Send private message Visit poster's website Reply with quote
tom tobias



Joined: 09 Sep 2003
Posts: 1320
Location: usa
tom tobias 15 Aug 2008, 22:07
Thank you for this opportunity to preach my message of PROGRAM versus CODE.
revolution wrote:
Whoa, that is a lot of code to go over for one posting.

Question: would it still be so arduous to investigate, if it were properly written as a PROGRAM?
Let's took a look at some of the code, to illustrate what it is about this snippet of code that makes it so difficult:
(A program, to cut the lecture short, is READABLE, like a book. Code, on the other hand, must be deciphered, and therefore cannot be readily and quickly debugged. Ah, boy I like those heights. Umm, very nice being up there so high on my soapbox...)
Patrick McCann, in neuquant.asm, after eleven lines of text, warning thieves about legal consequences of appropriating this buggy software, wrote:
;; @neuquant_altersingle(int alpha, int i, int b, int g, int r):
;;
;; Move neuron i towards biased (b,g,r) by factor alpha
;; Returns nothing.
ARG_ALPHA = PUSHAD_OFFSET+4
ARG_I = PUSHAD_OFFSET+8
ARG_B = PUSHAD_OFFSET+12
ARG_G = PUSHAD_OFFSET+16
ARG_R = PUSHAD_OFFSET+20
neuquant_altersingle:

pushad
mov ebp, esp

imul ebx, dword [ebp+ARG_I], network_WIDTH*4 ;;n = network[i];
add ebx, dword network

mov ecx, dword [ebp+ARG_ALPHA]
mov esi, INIT_ALPHA

mov eax, dword [ebx] ;*n -= (alpha*(*n - b)) / initalpha;
sub eax, dword [ebp+ARG_B]
imul eax, ecx
xor edx, edx
idiv esi
sub dword [ebx], eax
add ebx, 4 ;n++;

mov eax, dword [ebx] ;*n -= (alpha*(*n - b)) / initalpha;
sub eax, dword [ebp+ARG_G]
imul eax, ecx
xor edx, edx
idiv esi
sub dword [ebx], eax
add ebx, 4 ;n++;

mov eax, dword [ebx] ;*n -= (alpha*(*n - b)) / initalpha;


(A program = Algorithm plus data structures:)
Notice three points:
1. I have no idea, looking at this code, what the program is attempting to accomplish. I cannot deduce the algorithm. How can I find a mistake, perhaps very subtle, if I cannot READ what the program is supposed to be accomplishing. Earlier in the code, you have those infernal @@ signs, instead of human readable labels. This is typical of coders, not programmers. You write about, (or perhaps the first author wrote about) Neurons, as if you somehow imagine that these biological structures have something to do with your code. They do not. Use of the word NEURON, conveys absolutely NOTHING to anyone remotely familiar with cpu architecture. "Neural Networks" is a MARKETING PLOY, not a computer programming method. Above all, do NOT imagine that you are somehow explaining to us, or to anyone else, how this obscure code functions, by including the word NEURON, in the "documentation" of this code.
2. Your documentation is not only SPARSE, it is essentially non-existant. Where you do make an effort to "explain" some lines of code, you employ a semicolon on a line with some obscure C code, i.e. code to explain code. Very nice. You have sufficient contempt for the documentation process that you allow your C code comments to spill over onto the next line, so that it is IMPOSSIBLE to comprehend, at a glance. One gets vertigo trying to read the few comments provided. You need to employ WHOLE, entire lines of text, to explain what is going on in each instruction, in lieu of :
";*n -= (alpha*(*n - b)) / initalpha; What, you mean to define a pointer to n by subtracting alpha multiplied by n - b, divided by initalpha??? What's that? Am I supposed to understand that?
Your variable names are counterintuitive. Initalpha??? What does that mean? n? b? alpha?
3. You have two modules, one is .inc, one is .asm. Why? Why two modules, not 20? Why two and not ONE? There is NO OVERVIEW. No one can possibly analyze this code, efficiently, absent a proper explanation of what exactly you are trying to accomplish, and why? That is the purpose of the overview, which will also explain the decision to assemble two separate files. There must be a valid reason to break the code up into two different files? What is it?
Have you tried SIMPLIFIYING this spaghetti code, and adding some print statements, to ensure that you understand the flow of the program, then gradually reintroducing successively the parts deleted?

Smile
Post 15 Aug 2008, 22:07
View user's profile Send private message Reply with quote
Patrick_



Joined: 11 Mar 2006
Posts: 53
Location: 127.0.0.1
Patrick_ 15 Aug 2008, 22:59
Thanks for being so kind, tom! I'm sure you've also told Tomasz Grysztar this, as well, what with the commentless-code that makes flat assembler. (I'm not complaining, FASM is great Smile)

Also, I usually comment my code; however, since this is completely based off of NEUQUANT.C, I simply comment the code with the respective C code that it's written from. If you follow the assembly code along with the C code, it's easy to tell what's going on, even though I myself don't understand how the NeuQuant algorithm works (it's above my head). But you're not required to know much of how it works, as long as you have the source. I know that sounds funny but in this case it's true.

Now that I think of it, it probably is too much to ask, you folks analyzing all that source code. It's just that from the source I've seen of the MenuetOS kernel and applications, and FASM, and the lack of complaint as to the extremely sparse comments, I figured you guys would be able to handle it. I'll keep trying myself. Thanks again, tom; I think I've made a new friend today. Rolling Eyes
Post 15 Aug 2008, 22:59
View user's profile Send private message Reply with quote
tom tobias



Joined: 09 Sep 2003
Posts: 1320
Location: usa
tom tobias 16 Aug 2008, 02:37
Smile
Patrick wrote:
I'm sure you've also told Tomasz Grysztar this, as well,...

As a matter of fact, Yes. Many times, for several years. Generally in the form of private messages, however.
Patrick wrote:
I myself don't understand how the NeuQuant algorithm works (it's above my head).
1. You are skillful, and certainly the so called "neural networks" nonsense is not above your head. It is trash.
If you don't understand it, then, it is a reflection, not of you, or your supposed "limitations", but of the ideology itself, which is phony as a three dollar bill.
2. No one. NO ONE can properly write a program in any language, but especially not assembly language, without having a properly constructed algorithm. Code, yes. People can slop together a bunch of instructions, which appear to "work", i.e. function appropriately, UNTIL that point in time, when someone seeks to CHANGE something, as you are attempting to do, now.
Cheers!
Smile
Post 16 Aug 2008, 02:37
View user's profile Send private message Reply with quote
bitRAKE



Joined: 21 Jul 2003
Posts: 4073
Location: vpcmpistri
bitRAKE 16 Aug 2008, 03:21
I found your conversion to be fairly consistent and your use of syntax not too difficult to grasp. Here is what I believe to be the first error:
Code:
        cmp     ecx, eax                        ;if (i != smallpos) 
        jne     @f    
...the code following should executed when "i!=smallpos" - you have the reverse - jumping over the code when not equal - "JE" should fix it.

Do you use a debugger? Stepping through the code should help.

_________________
¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup
Post 16 Aug 2008, 03:21
View user's profile Send private message Visit poster's website Reply with quote
Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 16 Aug 2008, 20:44
To my personal dislike:
Code:
format MS COFF 
    

Why can't I just take it to FASM and hit F9 Crying or Very sad
Code:
;what about wrapping it to simple executable:
format PE...
    

_________________
My updated idol Very Happy http://www.agner.org/optimize/
Post 16 Aug 2008, 20:44
View user's profile Send private message Visit poster's website Yahoo Messenger MSN Messenger Reply with quote
Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 16 Aug 2008, 20:49
EDIT: OOps, double-post

_________________
My updated idol Very Happy http://www.agner.org/optimize/


Last edited by Madis731 on 17 Aug 2008, 11:52; edited 1 time in total
Post 16 Aug 2008, 20:49
View user's profile Send private message Visit poster's website Yahoo Messenger MSN Messenger Reply with quote
Patrick_



Joined: 11 Mar 2006
Posts: 53
Location: 127.0.0.1
Patrick_ 16 Aug 2008, 21:31
tom: Thanks, that was a bit more bearable. My thing with NeuQuant is that the source was available and I was therefore able to use it in my program without rewriting it. Also, it works fairly well.

bitR: Sheesh, thanks! That certainly did make a difference. It's *almost* fixed, but I should be able to take it from here. And yes, I'm using olly as my debugger.

Madis: I'm compiling it into an object file so I can link it against my C program.
Post 16 Aug 2008, 21:31
View user's profile Send private message Reply with quote
Madis731



Joined: 25 Sep 2003
Posts: 2139
Location: Estonia
Madis731 17 Aug 2008, 12:00
@Patrick_: So there's no way we can test it as a real application :S
Post 17 Aug 2008, 12:00
View user's profile Send private message Visit poster's website Yahoo Messenger MSN Messenger Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  


< Last Thread | Next Thread >
Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.