flat assembler
Message board for the users of flat assembler.
Index
> Main > I hate to ask, but HELP! |
Author |
|
Patrick_ 15 Aug 2008, 19:24
I'm rewriting the NeuQuant image quantizing algorithm in assembly. I use it to process GIF's in one of my programs, but I was hoping to get some better performance out of it, therefore I'm rewriting it in assembly.
Anyway, I've looked over every instruction about 5 times but still can't find the issue. To me the code exactly matches up to the C code, so I'm thinking I'm doing something in assembly the wrong way. I know, some of the code isn't the best; currently it's not highly optimized, as I'm just trying to get it to work. As you can see it uses the C calling convention because these functions are called from a C program. If anyone can find the problem, I'd *really* appreciate it. Here's the code: neuquant.inc: Code: ;; neuquant.inc: Include file for neuquant.asm. NET_SIZE = 256 ;number of colors used ;; For 256 colours, fixed arrays need 8kb, plus space for the image ;; four primes near 500 - assume no image has a length so large */ ;; that it is divisible by all four primes */ PRIME_1 = 499 PRIME_2 = 491 PRIME_3 = 487 PRIME_4 = 503 MIN_PICTURE_BYTES = (3*PRIME_4) ;minimum size for input image MAX_NET_POS = (NET_SIZE-1) NET_BIAS_SHIFT = 4 ;bias for color values N_CYCLES = 100 ;number of learning cycles ;; Definitions for freq and bias INT_BIAS_SHIFT = 16 ;bias for fractions INT_BIAS = (1 shl INT_BIAS_SHIFT) GAMMA_SHIFT = 10 ;gamma = 1024 GAMMA = (1 shl GAMMA_SHIFT) BETA_SHIFT = 10 BETA = (INT_BIAS shr BETA_SHIFT) ;beta == 1/1024 BETA_GAMMA = (INT_BIAS shl (GAMMA_SHIFT - BETA_SHIFT)) ;; Definitions for decreasing radius factor INIT_RAD = (NET_SIZE shr 3) ;for 256 cols, radius starts RADIUS_BIAS_SHIFT = 6 ;at 32.0 biased by 6 bits RADIUS_BIAS = (1 shl RADIUS_BIAS_SHIFT) INIT_RADIUS = (INIT_RAD * RADIUS_BIAS) ;and decreases by a RADIUS_DEC = 30 ;factor of 1/30 each cycle ;; Definitions for decreasing alpha factor ALPHA_BIAS_SHIFT = 10 ;alpha strats at 1.0 INIT_ALPHA = (1 shl ALPHA_BIAS_SHIFT) ;; RAD_BIAS and ALPHA_RAD_BIAS used for radpower calculation RAD_BIAS_SHIFT = 8 RAD_BIAS = (1 shl RAD_BIAS_SHIFT) ALPHA_RAD_B_SHIFT = (ALPHA_BIAS_SHIFT + RAD_BIAS_SHIFT) ALPHA_RAD_BIAS = (1 shl ALPHA_RAD_B_SHIFT) BEST_D_CONTEST = not (1 shr 31) macro ccall proc,[arg] ; directly call CDECL procedure { common size@ccall = 0 if ~ arg eq reverse pushd arg size@ccall = size@ccall+4 common end if call proc if size@ccall add esp,size@ccall end if } ;; pushad, but no eax macro pushad_noeax { push ecx edx ebx sub esp, 4 mov [esp], esp push ebp esi edi } ;; popad, but no eax macro popad_noeax { pop edi esi ebp add esp, 4 pop ebx edx ecx } PNOEAX_OFFSET = 28 PUSHAD_OFFSET = 32 neuquant.asm: Code: ;; NeuQuant Neural-Net Quantization Algorithm ;; x86 assembly rewrite based on NEUQUANT.C: ;; ------------------------------------------ ;; ;; Copyright (c) 1994 Anthony Dekker ;; ;; NEUQUANT Neural-Net quantization algorithm by Anthony Dekker, 1994. ;; See "Kohonen neural networks for optimal colour quantization" ;; in "Network: Computation in Neural Systems" Vol. 5 (1994) pp 351-367. ;; for a discussion of the algorithm. ;; See also http://members.ozemail.com.au/~dekker/NEUQUANT.HTML ;; ;; Any party obtaining a copy of these files from the author, directly or ;; indirectly, is granted, free of charge, a full and unrestricted irrevocable, ;; world-wide, paid up, royalty-free, nonexclusive right and license to deal ;; in this software and documentation files (the "Software"), including without ;; limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, ;; and/or sell copies of the Software, and to permit persons who receive ;; copies from any such party to do so, with the only requirement being ;; that this copyright notice remain intact. ;; ;; ------------------------------------------ ;; x86 assembly rewrite: ;; ;; Copyright (c) 2008 ;; ;; Permission to use, copy, modify, and/or distribute this software for any ;; purpose with or without fee is hereby granted, provided that the above ;; copyright notice and this permission notice appear in all copies. ;; ;; THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES ;; WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF ;; MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ;; ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES ;; WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ;; ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF ;; OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. format MS COFF public neuquant_initnet as '_neuquant_initnet' public neuquant_unbiasnet as '_neuquant_unbiasnet' public neuquant_writecolourmap as '_neuquant_writecolourmap' public neuquant_inxbuild as '_neuquant_inxbuild' public neuquant_inxsearch as '_neuquant_inxsearch' public neuquant_learn as '_neuquant_learn' include 'neuquant.inc' ;; @neuquant_initnet(unsigned char *the_pic, int len, int sample): ;; ;; Initialise network in range (0,0,0) to (255,255,255) and set parameters ;; Returns nothing. ARG_THE_PIC = PUSHAD_OFFSET+4 ARG_LEN = PUSHAD_OFFSET+8 ARG_SAMPLE = PUSHAD_OFFSET+12 neuquant_initnet: pushad mov ebp, esp mov eax, dword [ebp+ARG_THE_PIC] ;thepicture = thepic; mov dword [the_picture], eax mov eax, dword [ebp+ARG_LEN] ;lengthcount = len; mov dword [length_count], eax mov eax, dword [ebp+ARG_SAMPLE] ;samplefac = sample; mov dword [sample_fac], eax ;; for (i=0; i<netsize; i++) { xor ecx, ecx .for_begin: cmp ecx, NET_SIZE jnl .for_end imul edi, ecx, network_WIDTH*4 ;p = network[i]; add edi, dword network ;; p[0] = p[1] = p[2] = (i << (netbiasshift+)/netsize; mov eax, ecx shl eax, NET_BIAS_SHIFT+8 mov ebx, NET_SIZE xor edx, edx idiv ebx mov dword [edi], eax mov dword [edi+1*4], eax mov dword [edi+2*4], eax mov dword [freq+ecx*4], INT_BIAS/NET_SIZE ;freq[i] = intbias/netsize; mov dword [bias+ecx*4], 0 ;bias[i] = 0; add ecx, 1 jmp .for_begin .for_end: .return: popad ret ;; @neuquant_unbiasnet(void): ;; ;; Unbias network to give byte values 0..255 and record position i to prepare for sort ;; Returns nothing. neuquant_unbiasnet: pushad mov ebp, esp ;; for (i=0; i<netsize; i++) { xor ecx, ecx .for_one_begin: cmp ecx, NET_SIZE jnl .for_one_after imul ebx, ecx, network_WIDTH*4 add ebx, dword network ;; for (j=0; j<3; j++) { xor edx, edx .for_two_begin: cmp edx, 3 jnl .for_two_after ;; temp = (network[i][j] + (1 << (netbiasshift - 1))) >> netbiasshift; mov eax, dword [ebx+edx*4] add eax, (1 shl (NET_BIAS_SHIFT - 1)) sar eax, NET_BIAS_SHIFT ;; if (temp > 255) temp = 255; cmp eax, 255 jng @f mov eax, 255 @@: ;; network[i][j] = temp; mov dword [ebx+edx*4], eax add edx, 1 jmp .for_two_begin .for_two_after: ;; network[i][3] = i; mov dword [ebx+3*4], ecx add ecx, 1 jmp .for_one_begin .for_one_after: .return: popad ret ;; @neuquant_writecolourmap(GifColorType *f): ;; ;; Output colormap to GifColorType array. ;; Returns nothing. ARG_COLORMAP = PUSHAD_OFFSET+4 neuquant_writecolourmap: pushad mov ebp, esp ;; for (j=0; j<netsize; j++) { xor ecx, ecx mov ebx, dword [ebp+ARG_COLORMAP] .for_one: cmp ecx, NET_SIZE jnl .for_one_after imul eax, ecx, network_WIDTH*4 add eax, dword network mov edx, dword [eax+2*4] ;f[j].Red = network[j][2]; mov byte [ebx+0], dl mov edx, dword [eax+1*4] ;f[j].Green = network[j][1]; mov byte [ebx+1], dl mov edx, dword [eax+0*4] ;f[j].Blue = network[j][0]; mov byte [ebx+2], dl add ebx, 3 add ecx, 1 jmp .for_one .for_one_after: .return: popad ret ;; neuquant_inxbuild(void): ;; ;; Insertion sort of network and building of netindex[0..255] (to do after unbias) ;; Returns nothing. ST_PREV_COL = -4 ST_START_POS = -8 ST_ALLOC = 8 neuquant_inxbuild: pushad mov ebp, esp sub esp, ST_ALLOC mov dword [ebp+ST_PREV_COL], 0 mov dword [ebp+ST_START_POS], 0 ;; for (i=0; i<netsize; i++) { xor ecx, ecx .for_one: cmp ecx, NET_SIZE jnl .for_one_after imul esi, ecx, network_WIDTH*4 ;p = network[i]; add esi, dword network mov eax, ecx ;smallpos = i mov ebx, dword [esi+1*4] ;smallval = p[1] ;; for (j=i+1; j<netsize; j++) mov edx, ecx add edx, 1 .for_two: cmp edx, NET_SIZE jnl .for_two_after imul edi, edx, network_WIDTH*4 ;q = network[j] add edi, dword network cmp dword [edi+1*4], ebx ;if (q[1] < smallval) jnl @f mov eax, edx ;smallpos = j mov ebx, dword [edi+1*4] ;smallval = q[1] @@: add edx, 1 jmp .for_two .for_two_after: imul edi, eax, network_WIDTH*4 ;q = network[smallpos] add edi, dword network cmp ecx, eax ;if (i != smallpos) jne @f push ecx ;; j = q[0]; q[0] = p[0]; p[0] = j; mov edx, dword [edi+0*4] mov ecx, dword [esi+0*4] mov dword [edi+0*4], ecx mov dword [esi+0*4], edx ;; j = q[1]; q[1] = p[1]; p[1] = j; mov edx, dword [edi+1*4] mov ecx, dword [esi+1*4] mov dword [edi+1*4], ecx mov dword [esi+1*4], edx ;; j = q[2]; q[2] = p[2]; p[2] = j; mov edx, dword [edi+2*4] mov ecx, dword [esi+2*4] mov dword [edi+2*4], ecx mov dword [esi+2*4], edx ;; j = q[3]; q[3] = p[3]; p[3] = j; mov edx, dword [edi+3*4] mov ecx, dword [esi+3*4] mov dword [edi+3*4], ecx mov dword [esi+3*4], edx pop ecx @@: ;; We no longer need esi or edi; they're reset at the top. mov esi, dword [ebp+ST_PREV_COL] cmp ebx, esi ;if (smallval != previouscol) jne .for_one_reloop ;; netindex[previouscol] = (startpos+i)>>1; mov edi, dword [ebp+ST_START_POS] add edi, ecx sar edi, 1 mov dword [net_index+esi*4], edi ;;for (j=previouscol+1; j<smallval; j++) mov edx, esi add edx, 1 .for_three: cmp edx, ebx jnl .for_three_after mov dword [net_index+edx*4], ecx ;netindex[j] = i add edx, 1 jmp .for_three .for_three_after: mov dword [ebp+ST_PREV_COL], ebx ;previouscol = smallval; mov dword [ebp+ST_START_POS], ecx ;startpos = i .for_one_reloop: add ecx, 1 jmp .for_one .for_one_after: ;; netindex[previouscol] = (startpos+maxnetpos)>>1; mov eax, dword [ebp+ST_START_POS] add eax, MAX_NET_POS sar eax, 1 mov ebx, dword [ebp+ST_PREV_COL] mov dword [net_index+ebx*4], eax ;; for (j=previouscol+1; j<256; j++) .for_last: cmp ebx, 256 jnl .for_last_after mov dword [net_index+ebx*4], MAX_NET_POS ;netindex[j] = maxnetpos add ebx, 1 jmp .for_last .for_last_after: .return: add esp, ST_ALLOC popad ret ;; neuquant_inxsearch(int b, int g, int r): ;; ;; Search for BGR values 0..255 (after net is unbiased) and return colour index ;; We discard esp for non-stack usage, so no pushing/popping after esp is changed. ;; This function returns the location in the colormap. ARG_B = PNOEAX_OFFSET+4 ARG_G = PNOEAX_OFFSET+8 ARG_R = PNOEAX_OFFSET+12 neuquant_inxsearch: pushad_noeax mov ebp, esp mov eax, 1000 ;bestd = 1000; mov esp, -1 ;best = -1 mov ebx, dword [ebp+ARG_G] mov ecx, dword [net_index+ebx*4] ;i = netindex[g]; mov edx, ecx ;j = i-1; sub edx, 1 ;; while ((i<netsize) || (j>=0)) { .while_one: cmp ecx, NET_SIZE jb .ecx_below test edx, edx jns .edx_ae jmp .while_one_after ;; if (i<netsize) {. From the above while test, we know already ;; that i (ecx) < NET_SIZE .ecx_below: imul esi, ecx, network_WIDTH*4 ;p = network[i]; add esi, dword network mov edi, dword [esi+1*4] ;dist = p[1] - g; sub edi, dword [ebp+ARG_G] cmp edi, eax ;if (dist >= bestd) jnge .else_one mov ecx, NET_SIZE ;i = netsize jmp .else_one_after .else_one: add ecx, 1 ;i++ test edi, edi ;if (dist<0) jns @f neg edi ;dist = -dist @@: mov ebx, dword [esi] ;a = p[0] - b sub ebx, dword [ebp+ARG_B] test ebx, ebx ;if (a<0) a = -a; jns @f neg ebx @@: add edi, ebx ;dist += a cmp edi, eax ;if (dist<bestd) jge .else_one_after mov ebx, dword [esi+2*4] ;a = p[2] - r sub ebx, dword [ebp+ARG_R] test ebx, ebx ;if (a<0) a = -a; jns @f neg ebx @@: add edi, ebx ;dist += a; cmp edi, eax ;if (dist<bestd) jnl @f mov eax, edi ;bestd=dist mov esp, dword [esi+3*4] ;best=p[3] @@: .else_one_after: .ecx_below_after: ;; if (j>=0) test edx, edx js .while_one_reloop .edx_ae: imul esi, edx, network_WIDTH*4 ;p = network[j]; add esi, dword network mov edi, dword [ebp+ARG_G] ;dist = g - p[1]; sub edi, dword [esi+1*4] cmp edi, eax ;if (dist >= bestd) jnge .else_distnae mov edx, -1 ;j = -1 jmp .else_distnae_after .else_distnae: sub edx, 1 ;j-- test edi, edi ;if (dist<0) jns @f neg edi ;dist = -dist @@: mov ebx, dword [esi+0] ;a = p[0] - b; sub ebx, dword [ebp+ARG_B] test ebx, ebx ;if (a<0) jns @f neg ebx ;a = -a @@: add edi, ebx ;dist +=a cmp edi, eax ;if (dist<bestd) jnl .L1 mov ebx, dword [esi+2*4] ;a = p[2] - r; sub ebx, dword [ebp+ARG_R] test ebx, ebx ;if (a<0) jns @f neg ebx ;a = -a @@: add edi, ebx ;dist += a cmp edi, eax ;if (dist<bestd) jnl .L2 mov eax, edi ;bestd=dist; mov esp, dword [esi+3*4] ;best = p[3] .L2: .L1: .else_distnae_after: .while_one_reloop: jmp .while_one .while_one_after: .return: mov eax, esp mov esp, ebp popad_noeax ret ;; @neuquant_contest(int b, int g, int r): ;; ;; Search for biased BGR values. ;; Returns nothing. ARG_B = PNOEAX_OFFSET+4 ARG_G = PNOEAX_OFFSET+8 ARG_R = PNOEAX_OFFSET+12 ST_BEST_BD = -4 ST_BEST_BP = -8 ST_BESTD = -12 ST_BEST_POS = -16 ST_BIAS_DIST = -20 ST_ALLOC = 20 neuquant_contest: pushad_noeax mov ebp, esp sub esp, ST_ALLOC mov eax, 1 sal eax, 31 not eax mov dword [ebp+ST_BESTD], eax ;bestd = ~(((int) 1)<<31); mov dword [ebp+ST_BEST_BD], eax ;bestbiasd = bestd mov dword [ebp+ST_BEST_POS], -1 ;bestpos = -1 mov dword [ebp+ST_BEST_BP], -1 ;bestbiaspos = bestpos mov edx, dword bias ;p = bias mov esi, dword freq ;f = freq ;for (i=0; i<netsize; i++) xor ecx, ecx .for_one: cmp ecx, NET_SIZE jnl .for_one_after imul edi, ecx, network_WIDTH*4 ;n = network[i] add edi, dword network mov eax, dword [edi+0*4] ;dist = n[0]-b sub eax, dword [ebp+ARG_B] test eax, eax ;if (dist<0) jns @f neg eax ;dist = -dist @@: mov ebx, dword [edi+1*4] ;a = n[1] -g sub ebx, dword [ebp+ARG_G] test ebx, ebx ;if (a<0) jns @f neg ebx ;a = -a @@: add eax, ebx ;dist += a mov ebx, dword [edi+2*4] ;a = n[2] -r sub ebx, dword [ebp+ARG_R] test ebx, ebx ;if (a<0) jns @f neg ebx ;a = -a @@: add eax, ebx ;dist += a cmp eax, dword [ebp+ST_BESTD] ;if (dist <bestd) jnl @f mov dword [ebp+ST_BESTD], eax ;bestd = dist mov dword [ebp+ST_BEST_POS], ecx ;bestpos = i @@: ;; Since dist (eax) is no longer needed after this equation, ;; we use it to calculate the equation. mov dword [ebp+ST_BIAS_DIST], eax ;biasdist = dist mov eax, dword [edx] ;((*p)>>(intbiasshift-netbiasshift)) sar eax, (INT_BIAS_SHIFT - NET_BIAS_SHIFT) sub dword [ebp+ST_BIAS_DIST], eax mov eax, dword [ebp+ST_BIAS_DIST] cmp eax, dword [ebp+ST_BEST_BD] ;if (biasdist<bestbiasd) jnl @f mov dword [ebp+ST_BEST_BD], eax ;bestbiasd = biasdist mov dword [ebp+ST_BEST_BP], ecx ;bestbiaspos = i @@: mov eax, dword [esi] ;betafreq = (*f >> betashift); sar eax, BETA_SHIFT sub dword [esi], eax ;*f++ -= betafreq; add esi, 4 sal eax, GAMMA_SHIFT add dword [edx], eax ;*p++ += (betafreq<<gammashift); add edx, 4 .for_one_reloop: add ecx, 1 jmp .for_one .for_one_after: mov eax, dword [ebp+ST_BEST_POS] add dword [freq+eax*4], BETA ;freq[bestpos] += beta; sub dword [bias+eax*4], BETA_GAMMA ;bias[bestpos] -= betagamma; mov eax, dword [ebp+ST_BEST_BP] ;return(bestbiaspos); .return: mov esp, ebp popad_noeax ret ;; @neuquant_altersingle(int alpha, int i, int b, int g, int r): ;; ;; Move neuron i towards biased (b,g,r) by factor alpha ;; Returns nothing. ARG_ALPHA = PUSHAD_OFFSET+4 ARG_I = PUSHAD_OFFSET+8 ARG_B = PUSHAD_OFFSET+12 ARG_G = PUSHAD_OFFSET+16 ARG_R = PUSHAD_OFFSET+20 neuquant_altersingle: pushad mov ebp, esp imul ebx, dword [ebp+ARG_I], network_WIDTH*4 ;;n = network[i]; add ebx, dword network mov ecx, dword [ebp+ARG_ALPHA] mov esi, INIT_ALPHA mov eax, dword [ebx] ;*n -= (alpha*(*n - b)) / initalpha; sub eax, dword [ebp+ARG_B] imul eax, ecx xor edx, edx idiv esi sub dword [ebx], eax add ebx, 4 ;n++; mov eax, dword [ebx] ;*n -= (alpha*(*n - b)) / initalpha; sub eax, dword [ebp+ARG_G] imul eax, ecx xor edx, edx idiv esi sub dword [ebx], eax add ebx, 4 ;n++; mov eax, dword [ebx] ;*n -= (alpha*(*n - b)) / initalpha; sub eax, dword [ebp+ARG_R] imul eax, ecx xor edx, edx idiv esi sub dword [ebx], eax .return: mov esp, ebp popad ret ;; @neuquant_alterneigh(int rad, int i, int b, int g, int r): ;; ;; Move adjacent neurons by precomputed alpha*(1-((i-j)^2/[r]^2)) in radpower[|i-j|] ;; Returns nothing. ARG_RAD = PUSHAD_OFFSET+4 ARG_I = PUSHAD_OFFSET+8 ARG_B = PUSHAD_OFFSET+12 ARG_G = PUSHAD_OFFSET+16 ARG_R = PUSHAD_OFFSET+20 ST_LO = -4 ST_HI = -8 ST_Q = -12 ST_HOLD_EAX = -16 ST_ALLOC = 16 neuquant_alterneigh: pushad mov ebp, esp sub esp, ST_ALLOC mov esi, dword [ebp+ARG_I] mov edi, dword [ebp+ARG_RAD] ;; lo = i-rad; if (lo<-1) lo=-1; mov eax, esi sub eax, edi cmp eax, -1 jnl @f mov eax, -1 @@: mov dword [ebp+ST_LO], eax ;; hi = i+rad; if (hi>netsize) hi=netsize; mov eax, esi add eax, edi cmp eax, NET_SIZE jna @f mov eax, NET_SIZE @@: mov dword [ebp+ST_HI], eax mov ecx, esi ;j = i+1; add ecx, 1 mov ebx, esi ;k = i-1; sub ebx, 1 mov eax, rad_power mov dword [ebp+ST_Q], eax ;; while ((j<hi) || (k>lo)) { .while_one: add dword [ebp+ST_Q], 4 mov edx, dword [ebp+ST_Q] mov eax, dword [edx] ;a = (*(++q)); cmp ecx, dword [ebp+ST_HI] jl .if1 cmp ebx, dword [ebp+ST_LO] jng .while_one_after jmp .if2_past_check ;we already know k is within range .if1: push ecx mov dword [ebp+ST_HOLD_EAX], eax imul esi, ecx, network_WIDTH*4 ;p = network[j]; add esi, dword network mov ecx, ALPHA_RAD_BIAS mov edi, [esi] ;*p -= (a*(*p - b)) / alpharadbias; sub edi, dword [ebp+ARG_B] imul eax, edi xor edx, edx idiv ecx sub [esi], eax add esi, 4 ;p++ mov eax, dword [ebp+ST_HOLD_EAX] mov edi, [esi] ;*p -= (a*(*p - g)) / alpharadbias; sub edi, dword [ebp+ARG_G] imul eax, edi xor edx, edx idiv ecx sub [esi], eax add esi, 4 ;p++ mov eax, dword [ebp+ST_HOLD_EAX] mov edi, [esi] ;*p -= (a*(*p - r)) / alpharadbias; sub edi, dword [ebp+ARG_R] imul eax, edi xor edx, edx idiv ecx sub [esi], eax pop ecx mov eax, dword [ebp+ST_HOLD_EAX] add ecx, 1 ;j++ .after_if1: .if2: cmp ebx, dword [ebp+ST_LO] jna .while_one_reloop .if2_past_check: push ecx mov dword [ebp+ST_HOLD_EAX], eax imul esi, ebx, network_WIDTH*4 ;p = network[k]; add esi, dword network mov ecx, ALPHA_RAD_BIAS mov edi, [esi] ;*p -= (a*(*p - b)) / alpharadbias; sub edi, dword [ebp+ARG_B] imul eax, edi xor edx, edx idiv ecx sub [esi], eax add esi, 4 ;p++ mov eax, dword [ebp+ST_HOLD_EAX] mov edi, [esi] ;*p -= (a*(*p - g)) / alpharadbias; sub edi, dword [ebp+ARG_G] imul eax, edi xor edx, edx idiv ecx sub [esi], eax add esi, 4 ;p++ mov eax, dword [ebp+ST_HOLD_EAX] mov edi, [esi] ;*p -= (a*(*p - r)) / alpharadbias; sub edi, dword [ebp+ARG_R] imul eax, edi xor edx, edx idiv ecx sub [esi], eax pop ecx mov eax, dword [ebp+ST_HOLD_EAX] sub ebx, 1 ;k-- .after_if2: .while_one_reloop: jmp .while_one .while_one_after: .return: mov esp, ebp popad ret ;; @neuquant_learn: ;; ;; Main learning loop. ;; Returns nothing. ST_LIM = -4 ST_SAMPLE_PIXELS= -8 ST_DELTA = -12 ST_ALPHA = -16 ST_RADIUS = -20 ST_STEP = -24 ST_RAD = -28 ST_ALLOC = 28 neuquant_learn: pushad mov ebp, esp sub esp, ST_ALLOC mov eax, dword [sample_fac] ;alphadec = 30 + ((samplefac-1)/3); sub eax, 1 mov ebx, 3 xor edx, edx idiv ebx add eax, 30 add dword [alpha_dec], eax mov edi, dword [the_picture] ;p = thepicture mov ebx, dword [length_count] ;lim = thepicture + lengthcount; add ebx, edi mov dword [ebp+ST_LIM], ebx mov eax, dword [length_count] ;samplepixels = lengthcount/(3*samplefac); imul ebx, dword [sample_fac], 3 xor edx, edx idiv ebx mov dword [ebp+ST_SAMPLE_PIXELS], eax mov ebx, N_CYCLES ;delta = samplepixels/ncycles; xor edx, edx idiv ebx mov dword [ebp+ST_DELTA], eax mov dword [ebp+ST_ALPHA], INIT_ALPHA mov dword [ebp+ST_RADIUS], INIT_RADIUS mov ebx, INIT_RADIUS ;rad = radius >> radiusbiasshift; sar ebx, RADIUS_BIAS_SHIFT cmp ebx, 1 ;if (rad <= 1) jnle @f xor ebx, ebx @@: mov dword [ebp+ST_RAD], ebx ;for (i=0; i<rad; i++) xor ecx, ecx .for1: cmp ecx, ebx jnl .for1_after ;; radpower[i] = ;; alpha*(((rad*rad - i*i)*radbias)/(rad*rad)); push edi mov eax, ebx ;rad*rad imul eax, ebx mov edx, ecx ;i*i imul edx, ecx mov edi, eax ;save rad*rad sub eax, edx ;rad*rad - i*i imul eax, RAD_BIAS xor edx, edx idiv edi ; / rad*rad mov esi, dword [ebp+ST_ALPHA] imul esi, eax ;alpha * ... mov dword [rad_power+ecx*4], esi pop edi add ecx, 1 jmp .for1 .for1_after: ;; if ((lengthcount%prime1) != 0) step = 3*prime1; mov eax, dword [length_count] mov ebx, PRIME_1 xor edx, edx idiv ebx .if1: test edx, edx jz .if1_else mov dword [ebp+ST_STEP], 3*PRIME_1 jmp .if1_after .if1_else: ;; if ((lengthcount%prime2) !=0) step = 3*prime2; mov eax, dword [length_count] mov ebx, PRIME_2 xor edx, edx idiv ebx .if2: test edx, edx jz .if2_else mov dword [ebp+ST_STEP], 3*PRIME_2 jmp .if2_after .if2_else: ;; if ((lengthcount%prime3) !=0) step = 3*prime3; mov eax, dword [length_count] mov ebx, PRIME_3 xor edx, edx idiv ebx .if3: test edx, edx jz .if3_else mov dword [ebp+ST_STEP], 3*PRIME_3 jmp .if3_after .if3_else: mov dword [ebp+ST_STEP], 3*PRIME_4 .if3_after: .if2_after: .if1_after: xor ecx, ecx ;i = 0 .while1: cmp ecx, dword [ebp+ST_SAMPLE_PIXELS] jnb .while1_after push ecx movzx ecx, byte [edi] ;b = p[0] << netbiasshift; movzx ebx, byte [edi+1] ;g = p[1] << netbiasshift; movzx edx, byte [edi+2] ;r = p[2] << netbiasshift; sal cl, NET_BIAS_SHIFT sal bl, NET_BIAS_SHIFT sal dl, NET_BIAS_SHIFT ;; j = contest(b,g,r); ccall neuquant_contest, ecx, ebx, edx ;; altersingle(alpha,j,b,g,r); ccall neuquant_altersingle, dword [ebp+ST_ALPHA], eax, ecx, ebx, edx cmp dword [ebp+ST_RAD], 0 ;if (rad) jz @f ;; alterneigh(rad,j,b,g,r); ccall neuquant_alterneigh, dword [ebp+ST_RAD], eax, ecx, ebx, edx @@: pop ecx ;; At this point we're done with eax, ebx, edx, esi add edi, dword [ebp+ST_STEP] ;p += step cmp edi, dword [ebp+ST_LIM] ;if (p >= lim) jnae @f sub edi, dword [length_count] ;p -= lengthcount; @@: add ecx, 1 ;i++ ;; if (i%delta == 0) { mov eax, ecx mov ebx, dword [ebp+ST_DELTA] xor edx, edx idiv ebx test edx, edx jnz @f ;; alpha -= alpha / alphadec; mov eax, dword [ebp+ST_ALPHA] mov ebx, dword [alpha_dec] xor edx, edx idiv ebx sub dword [ebp+ST_ALPHA], eax ;; radius -= radius / radiusdec; mov eax, dword [ebp+ST_RADIUS] mov ebx, RADIUS_DEC xor edx, edx idiv ebx sub dword [ebp+ST_RADIUS], eax ;; rad = radius >> radiusbiasshift; mov esi, dword [ebp+ST_RADIUS] sar esi, RADIUS_BIAS_SHIFT mov dword [ebp+ST_RAD], esi .if4: cmp esi, 1 ;if (rad <= 1) jnle .if4_after xor esi, esi ;rad = 0 mov dword [ebp+ST_RAD], 0 .if4_after: ;; for (j=0; j<rad; j++) xor ebx, ebx .for2: cmp ebx, esi jnl .for2_after ;; radpower[j] = ;; alpha*(((rad*rad - j*j)*radbias)/(rad*rad)); push edi mov edi, esi imul edi, esi ;rad*rad mov eax, edi ;operate on rad*rad with eax mov edx, ebx ;j*j imul edx, ebx sub eax, edx ;rad*rad - j*j imul eax, RAD_BIAS xor edx, edx idiv edi ; / rad*rad mov edi, dword [ebp+ST_ALPHA] imul edi, eax ;alpha * ... mov dword [rad_power+ebx*4], edi pop edi add ebx, 1 jmp .for2 .for2_after: @@: jmp .while1 .while1_after: .return: mov esp, ebp popad ret section '.data' ;; Global variables used throughought the code: alpha_dec dd ? ;biased by 10 bits, type INT the_picture dd ? ;the input image itself, type UNSIGNED CHAR * length_count dd ? ;length_count = H*W*3, type INT sample_fac dd ? ;sampling factor 1..30, type INT network rd network_WIDTH * network_HEIGHT ;the network itself, BGRc, type INT array network_WIDTH = 4 network_HEIGHT = NET_SIZE net_index rd 256 ;for network lookup - really 256, type INT array bias rd NET_SIZE ;bias and freq arrays for learning freq rd NET_SIZE rad_power rd INIT_RAD ;radpower for precomputation Last edited by Patrick_ on 02 Nov 2010, 20:03; edited 1 time in total |
|||
15 Aug 2008, 19:24 |
|
tom tobias 15 Aug 2008, 22:07
Thank you for this opportunity to preach my message of PROGRAM versus CODE.
revolution wrote: Whoa, that is a lot of code to go over for one posting. Question: would it still be so arduous to investigate, if it were properly written as a PROGRAM? Let's took a look at some of the code, to illustrate what it is about this snippet of code that makes it so difficult: (A program, to cut the lecture short, is READABLE, like a book. Code, on the other hand, must be deciphered, and therefore cannot be readily and quickly debugged. Ah, boy I like those heights. Umm, very nice being up there so high on my soapbox...) Patrick McCann, in neuquant.asm, after eleven lines of text, warning thieves about legal consequences of appropriating this buggy software, wrote: ;; @neuquant_altersingle(int alpha, int i, int b, int g, int r): (A program = Algorithm plus data structures:) Notice three points: 1. I have no idea, looking at this code, what the program is attempting to accomplish. I cannot deduce the algorithm. How can I find a mistake, perhaps very subtle, if I cannot READ what the program is supposed to be accomplishing. Earlier in the code, you have those infernal @@ signs, instead of human readable labels. This is typical of coders, not programmers. You write about, (or perhaps the first author wrote about) Neurons, as if you somehow imagine that these biological structures have something to do with your code. They do not. Use of the word NEURON, conveys absolutely NOTHING to anyone remotely familiar with cpu architecture. "Neural Networks" is a MARKETING PLOY, not a computer programming method. Above all, do NOT imagine that you are somehow explaining to us, or to anyone else, how this obscure code functions, by including the word NEURON, in the "documentation" of this code. 2. Your documentation is not only SPARSE, it is essentially non-existant. Where you do make an effort to "explain" some lines of code, you employ a semicolon on a line with some obscure C code, i.e. code to explain code. Very nice. You have sufficient contempt for the documentation process that you allow your C code comments to spill over onto the next line, so that it is IMPOSSIBLE to comprehend, at a glance. One gets vertigo trying to read the few comments provided. You need to employ WHOLE, entire lines of text, to explain what is going on in each instruction, in lieu of : ";*n -= (alpha*(*n - b)) / initalpha; What, you mean to define a pointer to n by subtracting alpha multiplied by n - b, divided by initalpha??? What's that? Am I supposed to understand that? Your variable names are counterintuitive. Initalpha??? What does that mean? n? b? alpha? 3. You have two modules, one is .inc, one is .asm. Why? Why two modules, not 20? Why two and not ONE? There is NO OVERVIEW. No one can possibly analyze this code, efficiently, absent a proper explanation of what exactly you are trying to accomplish, and why? That is the purpose of the overview, which will also explain the decision to assemble two separate files. There must be a valid reason to break the code up into two different files? What is it? Have you tried SIMPLIFIYING this spaghetti code, and adding some print statements, to ensure that you understand the flow of the program, then gradually reintroducing successively the parts deleted? |
|||
15 Aug 2008, 22:07 |
|
Patrick_ 15 Aug 2008, 22:59
Thanks for being so kind, tom! I'm sure you've also told Tomasz Grysztar this, as well, what with the commentless-code that makes flat assembler. (I'm not complaining, FASM is great )
Also, I usually comment my code; however, since this is completely based off of NEUQUANT.C, I simply comment the code with the respective C code that it's written from. If you follow the assembly code along with the C code, it's easy to tell what's going on, even though I myself don't understand how the NeuQuant algorithm works (it's above my head). But you're not required to know much of how it works, as long as you have the source. I know that sounds funny but in this case it's true. Now that I think of it, it probably is too much to ask, you folks analyzing all that source code. It's just that from the source I've seen of the MenuetOS kernel and applications, and FASM, and the lack of complaint as to the extremely sparse comments, I figured you guys would be able to handle it. I'll keep trying myself. Thanks again, tom; I think I've made a new friend today. |
|||
15 Aug 2008, 22:59 |
|
tom tobias 16 Aug 2008, 02:37
Patrick wrote: I'm sure you've also told Tomasz Grysztar this, as well,... As a matter of fact, Yes. Many times, for several years. Generally in the form of private messages, however. Patrick wrote: I myself don't understand how the NeuQuant algorithm works (it's above my head). If you don't understand it, then, it is a reflection, not of you, or your supposed "limitations", but of the ideology itself, which is phony as a three dollar bill. 2. No one. NO ONE can properly write a program in any language, but especially not assembly language, without having a properly constructed algorithm. Code, yes. People can slop together a bunch of instructions, which appear to "work", i.e. function appropriately, UNTIL that point in time, when someone seeks to CHANGE something, as you are attempting to do, now. Cheers! |
|||
16 Aug 2008, 02:37 |
|
bitRAKE 16 Aug 2008, 03:21
I found your conversion to be fairly consistent and your use of syntax not too difficult to grasp. Here is what I believe to be the first error:
Code: cmp ecx, eax ;if (i != smallpos) jne @f Do you use a debugger? Stepping through the code should help. _________________ ¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup |
|||
16 Aug 2008, 03:21 |
|
Madis731 16 Aug 2008, 20:44
To my personal dislike:
Code: format MS COFF Why can't I just take it to FASM and hit F9 Code: ;what about wrapping it to simple executable: format PE... |
|||
16 Aug 2008, 20:44 |
|
Madis731 16 Aug 2008, 20:49
EDIT: OOps, double-post
Last edited by Madis731 on 17 Aug 2008, 11:52; edited 1 time in total |
|||
16 Aug 2008, 20:49 |
|
Patrick_ 16 Aug 2008, 21:31
tom: Thanks, that was a bit more bearable. My thing with NeuQuant is that the source was available and I was therefore able to use it in my program without rewriting it. Also, it works fairly well.
bitR: Sheesh, thanks! That certainly did make a difference. It's *almost* fixed, but I should be able to take it from here. And yes, I'm using olly as my debugger. Madis: I'm compiling it into an object file so I can link it against my C program. |
|||
16 Aug 2008, 21:31 |
|
Madis731 17 Aug 2008, 12:00
@Patrick_: So there's no way we can test it as a real application :S
|
|||
17 Aug 2008, 12:00 |
|
< Last Thread | Next Thread > |
Forum Rules:
|
Copyright © 1999-2025, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.