flat assembler
Message board for the users of flat assembler.
flat assembler
> High Level Languages > understanding jpg decompression Goto page Previous 1, 2 
Author 

Fourier Transforms are complex by nature (by complex I literally mean operating on complex numbers), even if the input is realonly (i.e. imaginary is zero). The DCT is similar, so probably also is. (never used it personally but hopefully the below helps)
By far the most important thing to understand is Euler's Formula: https://en.wikipedia.org/wiki/Euler%27s_formula This is how I visualize it: the Xcoordinate is based on cos, and the Ycoordinate is based on sin. So if you end up with "r*cos(a) + i*r*sin(a)" after applying Euler's Formula, that means the phase is a (the angle) while the magnitude is r, for each frequency bin. Note that the DFT (and FFT) convert a signal to rectangular coordinates  so you won't have phase and magnitude directly, but instead you'll get a complex number for each frequency bin. This is not very useful. You need to convert it to polar coordinates (https://en.wikipedia.org/wiki/Polar_coordinate_system) and then you'll get the above. That r*cos(a) + i*r*sin(a) is in polar coordinates (length + angle), rectangular is just x+i*y position (y being the imaginary component). BTW you won't actually end up doing all this pointless conversions  the math can be simplified in the end, but you won't understand anything if you look at the "optimized math" from the beginning. You need to understand the steps how it was derived, since you did ask. (sorry though, I don't know anything about the Fast Fourier Transform other than it removes redundant calculations  but I know how to derive the math for the DFT though, the FFT is just an algorithmic optimization then, not different math) Also I've never used a 2D DFT (or DCT), like in this case. But hopefully, you can extend it to 2D. I've used 1D DFT extensively though (audio). 

16 May 2018, 20:07 

Please read https://unix4lyfe.org/dct1d/ and please explain what he did at "This number can be brought down by manually factoring"?
What is factoring? 

16 May 2018, 21:00 

>I have no idea what he did with the + and  in front (the left), since those two terms are NOT equal, but maybe he's using a weird notation.
I got it. "" indicates deleted code, "+" indicates added code. Common in .patch files. discrete cosine transform started looking much simpler after I realized that, despite its name, all cos that are there are actually constants, and the most complex operation you will ever need is multiplication. There aren't that many of those constants even in the most raw algorithm ( look at "We don't need to calculate quite so many coefficients" and dct_c.cc in https://unix4lyfe.org/dct1d/ ), and in AAN algorithm there are only 5 constants. 

19 May 2018, 18:30 

ieice Arai Agui Nakajima
https://search.ieice.org/bin/summary.php?id=e71e_11_1095 >The sequence of 8point DCT and scalar quantization is effective in image data compression. The operation is executed very efficiently, if the DCT coefficients need not to be found explicitly. The present paper proposes a method, requiring only five times of multiplication for the transform. The 8point DCT can be comopsed from the 16point DFT which gives only the real parts of coefficients, and final scaling. The real part DFT can be implemented by the small FFT Winograd algorithm, which requires only five multiplications. The final scaling can be combined with the quantizing matrix without any change in arithmetic complexity of the qunatizer. Since each signal path in the proposed algorithm has one multiplication at most, the five multiplications can be executed in parallel. This will make the hardware implementation of the algorithm effective. >FFT Winograd https://en.wikipedia.org/wiki/Fast_Fourier_transform >Another polynomial viewpoint is exploited by the Winograd FFT algorithm,[22][23] which factorizes zN − 1 into cyclotomic polynomials—these often have coefficients of 1, 0, or −1, and therefore require few (if any) multiplications >so Winograd can be used to obtain minimalmultiplication FFTs and is often used to find efficient algorithms for small factors >unfortunately, this comes at the cost of many more additions, a tradeoff no longer favorable on modern processors with hardware multipliers. Okay, now I just need to understand what this Winograd algorithm is about... This AAN paper costs 4000 yen, maybe I should just buy it... Guys, want to buy it for me? What I should do now, is to try to code the "braindead" version of dct, the one that just does 64 multiplications. 

19 May 2018, 18:39 

"Wrote" and run the naive version of dct, another one. See how numbers are slightly different, but all of them do reverse back to original numbers.
Code: #include <stdio.h> //#include <cmath> #define _USE_MATH_DEFINES #include <math.h> //https://stackoverflow.com/questions/26065359/mpiflaggedasundeclaredidentifier static const float c0 = 1. / sqrt(2.) * sqrt(2. / 8.); static const float c1 = cos(M_PI * 1. / 16.) * sqrt(2. / 8.); static const float c2 = cos(M_PI * 2. / 16.) * sqrt(2. / 8.); static const float c3 = cos(M_PI * 3. / 16.) * sqrt(2. / 8.); static const float c4 = cos(M_PI * 4. / 16.) * sqrt(2. / 8.); static const float c5 = cos(M_PI * 5. / 16.) * sqrt(2. / 8.); static const float c6 = cos(M_PI * 6. / 16.) * sqrt(2. / 8.); static const float c7 = cos(M_PI * 7. / 16.) * sqrt(2. / 8.); #define a arr[0] #define b arr[1] #define c arr[2] #define d arr[3] #define e arr[4] #define f arr[5] #define g arr[6] #define h arr[7] //from, to void dct_ii(int N, const float x[], float X[]) { for (int k = 0; k < N; ++k) { float sum = 0.; float s = (k == 0) ? sqrt(.5) : 1.; for (int n = 0; n < N; ++n) { sum += s * x[n] * cos(M_PI * (n + .5) * k / N); } X[k] = sum * sqrt(2. / N); } } //from, to void idct(int N, const float X[], float x[]) { for (int n = 0; n < N; ++n) { float sum = 0.; for (int k = 0; k < N; ++k) { float s = (k == 0) ? sqrt(.5) : 1.; sum += s * X[k] * cos(M_PI * (n + .5) * k / N); } x[n] = sum * sqrt(2. / N); } } int main() { float arr[8] = {0, 1, 2, 3, 3, 3, 2, 0}; printf("%f %f %f %f %f %f %f %f\n\n\n", arr[0], arr[1], arr[2], arr[3], arr[4], arr[5], arr[6], arr[7]); puts("AAN method\n"); { //forward dct float t0 = arr[0] + arr[7]; float t7 = arr[0]  arr[7]; float t1 = arr[1] + arr[6]; float t6 = arr[1]  arr[6]; float t2 = arr[2] + arr[5]; float t5 = arr[2]  arr[5]; float t3 = arr[3] + arr[4]; float t4 = arr[3]  arr[4]; /* Even part */ float t10 = t0 + t3; /* phase 2 */ float t13 = t0  t3; float t11 = t1 + t2; float t12 = t1  t2; arr[0] = t10 + t11; /* phase 3 */ arr[4] = t10  t11; float z1 = (t12 + t13) * 0.707106781; /* c4 */ arr[2] = t13 + z1; /* phase 5 */ arr[6] = t13  z1; /* Odd part */ t10 = t4 + t5; /* phase 2 */ t11 = t5 + t6; t12 = t6 + t7; /* The rotator is modified from fig 48 to avoid extra negations. */ float z5 = (t10  t12) * 0.382683433; /* c6 */ float z2 = 0.541196100 * t10 + z5; /* c2c6 */ float z4 = 1.306562965 * t12 + z5; /* c2+c6 */ float z3 = t11 * 0.707106781; /* c4 */ float z11 = t7 + z3; /* phase 5 */ float z13 = t7  z3; arr[5] = z13 + z2; /* phase 6 */ arr[3] = z13  z2; arr[1] = z11 + z4; arr[7] = z11  z4; } printf("%f %f %f %f %f %f %f %f\n\n\n", arr[0], arr[1], arr[2], arr[3], arr[4], arr[5], arr[6], arr[7]); { //inverse dct /* Even part */ float t0 = arr[0]; float t1 = arr[2]; float t2 = arr[4]; float t3 = arr[6]; float t10 = t0 + t2; /* phase 3 */ float t11 = t0  t2; float t13 = t1 + t3; /* phases 53 */ float t12 = (t1  t3) * 1.414213562  t13; /* 2*c4 */ t0 = t10 + t13; /* phase 2 */ t3 = t10  t13; t1 = t11 + t12; t2 = t11  t12; /* Odd part */ float t4 = arr[1]; float t5 = arr[3]; float t6 = arr[5]; float t7 = arr[7]; float z13 = t6 + t5; /* phase 6 */ float z10 = t6  t5; float z11 = t4 + t7; float z12 = t4  t7; t7 = z11 + z13; /* phase 5 */ t11 = (z11  z13) * 1.414213562; /* 2*c4 */ float z5 = (z10 + z12) * 1.847759065; /* 2*c2 */ t10 = z5  z12 * 1.082392200; /* 2*(c2c6) */ t12 = z5  z10 * 2.613125930; /* 2*(c2+c6) */ t6 = t12  t7; /* phase 2 */ t5 = t11  t6; t4 = t10  t5; arr[0] = t0 + t7; arr[7] = t0  t7; arr[1] = t1 + t6; arr[6] = t1  t6; arr[2] = t2 + t5; arr[5] = t2  t5; arr[3] = t3 + t4; arr[4] = t3  t4; } printf("%f %f %f %f %f %f %f %f\n\n\n", arr[0], arr[1], arr[2], arr[3], arr[4], arr[5], arr[6], arr[7]); { //shift_left arr[0] = arr[0]/8; arr[1] = arr[1]/8; arr[2] = arr[2]/8; arr[3] = arr[3]/8; arr[4] = arr[4]/8; arr[5] = arr[5]/8; arr[6] = arr[6]/8; arr[7] = arr[7]/8; } printf("%f %f %f %f %f %f %f %f\n\n\n", arr[0], arr[1], arr[2], arr[3], arr[4], arr[5], arr[6], arr[7]); //arr = {0, 1, 2, 3, 3, 3, 2, 0}; puts("naive method\n"); float arr2[8]; dct_ii(8, arr, arr2); printf("%f %f %f %f %f %f %f %f\n\n\n", arr2[0], arr2[1], arr2[2], arr2[3], arr2[4], arr2[5], arr2[6], arr2[7]); idct(8, arr2, arr); printf("%f %f %f %f %f %f %f %f\n\n\n", arr[0], arr[1], arr[2], arr[3], arr[4], arr[5], arr[6], arr[7]); puts("matrix multiplication method\n"); float X[8]; { X[0] = a*c0 + b*c0 + c*c0 + d*c0 + e*c0 + f*c0 + g*c0 + h*c0; X[1] = a*c1 + b*c3 + c*c5 + d*c7  e*c7  f*c5  g*c3  h*c1; X[2] = a*c2 + b*c6  c*c6  d*c2  e*c2  f*c6 + g*c6 + h*c2; X[3] = a*c3  b*c7  c*c1  d*c5 + e*c5 + f*c1 + g*c7  h*c3; X[4] = a*c4  b*c4  c*c4 + d*c4 + e*c4  f*c4  g*c4 + h*c4; X[5] = a*c5  b*c1 + c*c7 + d*c3  e*c3  f*c7 + g*c1  h*c5; X[6] = a*c6  b*c2 + c*c2  d*c6  e*c6 + f*c2  g*c2 + h*c6; X[7] = a*c7  b*c5 + c*c3  d*c1 + e*c1  f*c3 + g*c5  h*c7; } printf("%f %f %f %f %f %f %f %f\n\n\n", X[0], X[1], X[2], X[3], X[4], X[5], X[6], X[7]); { a = X[0]*c0 + X[1]*c1 + X[2]*c2 + X[3]*c3 + X[4]*c4 + X[5]*c5 + X[6]*c6 + X[7]*c7; b = X[0]*c0 + X[1]*c3 + X[2]*c6  X[3]*c7  X[4]*c4  X[5]*c1  X[6]*c2  X[7]*c5; c = X[0]*c0 + X[1]*c5  X[2]*c6  X[3]*c1  X[4]*c4 + X[5]*c7 + X[6]*c2 + X[7]*c3; d = X[0]*c0 + X[1]*c7  X[2]*c2  X[3]*c5 + X[4]*c4 + X[5]*c3  X[6]*c6  X[7]*c1; e = X[0]*c0  X[1]*c7  X[2]*c2 + X[3]*c5 + X[4]*c4  X[5]*c3  X[6]*c6 + X[7]*c1; f = X[0]*c0  X[1]*c5  X[2]*c6 + X[3]*c1  X[4]*c4  X[5]*c7 + X[6]*c2  X[7]*c3; g = X[0]*c0  X[1]*c3 + X[2]*c6 + X[3]*c7  X[4]*c4 + X[5]*c1  X[6]*c2 + X[7]*c5; h = X[0]*c0  X[1]*c1 + X[2]*c2  X[3]*c3 + X[4]*c4  X[5]*c5 + X[6]*c6  X[7]*c7; } printf("%f %f %f %f %f %f %f %f\n\n\n", arr[0], arr[1], arr[2], arr[3], arr[4], arr[5], arr[6], arr[7]); getchar(); } Code: 0.000000 1.000000 2.000000 3.000000 3.000000 3.000000 2.000000 0.000000 AAN method 14.000000 2.720777 11.656855 1.955410 2.000000 0.873017 0.343146 0.107651 0.000000 8.000000 16.000000 24.000000 24.000000 24.000000 16.000000 0.000000 0.000000 1.000000 2.000000 3.000000 3.000000 3.000000 2.000000 0.000000 naive method 4.949747 0.693520 3.154322 0.587938 0.707107 0.392847 0.224171 0.137950 0.000000 1.000000 2.000000 3.000000 3.000000 3.000000 2.000000 0.000000 matrix multiplication method 4.949747 0.693520 3.154322 0.587938 0.707107 0.392848 0.224171 0.137950 0.000000 1.000000 2.000000 3.000000 3.000000 3.000000 2.000000 0.000000 

23 May 2018, 08:56 

I found some info on this difference in numbers: In aan.cc from https://unix4lyfe.org/dct1d/ , I found this:
Code: #if 1 const double s0 = (cos(0)*sqrt(.5)/2)/(1 ); // 0.353553 const double s1 = (cos(1.*M_PI/16)/2)/(a5+a4+1); // 0.254898 const double s2 = (cos(2.*M_PI/16)/2)/(a1+1 ); // 0.270598 const double s3 = (cos(3.*M_PI/16)/2)/(a5+1 ); // 0.300672 const double s4 = s0; // (cos(4.*M_PI/16)/2)/(1 ); const double s5 = (cos(5.*M_PI/16)/2)/(1a5 ); // 0.449988 const double s6 = (cos(6.*M_PI/16)/2)/(1a1 ); // 0.653281 const double s7 = (cos(7.*M_PI/16)/2)/(a5a4+1 ); // 1.281458 #else const double s0 = 1.; const double s1 = 1.; const double s2 = 1.; const double s3 = 1.; const double s4 = 1.; const double s5 = 1.; const double s6 = 1.; const double s7 = 1.; #endif I suppose I can use those numbers to make all 3 methods result in same numbers. I guess, I'm supposed to multiply my quantization/dequantization table by those numbers somehow. If I were to make my custom format, I probably could avoid that extra multiplication... Not like that matters that much, it's done once for the whole image, it's like only 0.000001% of all time wasted on decoding. Still, why not, will make executable a bit smaller. 

23 May 2018, 09:02 

I can see that this works, but I still have no idea why this works. How people find this stuff?
I must take AAN method, and try to dumb it down to the matrix multiplication method somehow. I want to see all conversions on the way. 

23 May 2018, 09:07 

vivik wrote: >I have no idea what he did with the + and  in front (the left), since those two terms are NOT equal, but maybe he's using a weird notation. BTW, you want to understand the math? Or what is your question? I suggest you start with the DFT, it's easier to grasp. People "come up with that stuff" when the math just gets reduced to it, in the end. But is there any particular reason for it that you want to derive it yourself? It's not that easy to derive. Understanding the "bloated" form of math (with the integral or sum) is easier as a concept IMO. But even that is difficult with 2D transforms. 

23 May 2018, 14:32 

I have a feeling that I don't really need complex numbers for this. I only need to split 2d signal into frequencies, to remove frequencies that almost don't matter. I can do anything I want for this, not just dct, dct just seems to be the easiest to use, with many convenient properties (like, it's easy to turn 1d dct into 2d, and it doesn't matter if you do 2d dct column first or row first, maybe something else).
About Euler's formula: how is putting any number into the power of imaginary number even legal? Putting any number into the power of any 2d number? I know what 2^2 is, what 4^2 and 2^4 is, but what is 2^i2? It looks like this formula actually defines a new operation, the one that is artificial and doesn't have any meaning outside math. Apparently it found its applications somewhere. Last edited by vivik on 26 May 2018, 18:51; edited 1 time in total 

26 May 2018, 07:25 

I just noticed. I posted 2 flowgraphs/butterfly diagrams on page 1. Looks like they are slightly different, look at the left part. Looks like the bottom butterfly diagram is wrong.
Last edited by vivik on 26 May 2018, 19:08; edited 1 time in total 

26 May 2018, 12:40 

Euler's equations (which can be derived from a Taylor series expansion):
Quote: e^it = cos(t)+isin(t) Quote: e^it = cos(t)isin(t) can be used to convert between the compact exponential form and rectangular form of a complex number. In the earlier Matlab link I posted, I think I remember reading DCT algorithms do not need complex numbers. Compressing the frequencies is done at the quantization step, I believe. There's more to JPEG than just DCT, a thorough walkthrough appears to be here: 1. https://www.impulseadventure.com/photo/jpeghuffmancoding.html 2. https://en.wikibooks.org/wiki/JPEG__Idea_and_Practice/The_Huffman_coding The author of the first provided a tool which would probably aide in debugging also:  https://www.impulseadventure.com/photo/jpegsnoop.html There are a lot of image formats, each with implementation variations. It's noble to try to understand the components and algorithms of each format, but I would just implement a few from scratch and use the Windows decoder/encoders for the rest. 

26 May 2018, 16:44 

Good link about image compression in general, may make some things clear. https://fgiesen.wordpress.com/2013/11/04/bink22integerdctdesignpart1/


02 Jun 2018, 04:45 

Will have to gradually simplify AAN method down to matrix multiplication method, need to prove that it's correct somehow.
Code: #include <stdio.h> //#include <cmath> #define _USE_MATH_DEFINES #include <math.h> //https://stackoverflow.com/questions/26065359/mpiflaggedasundeclaredidentifier //just matrix coefficients... they were c0 c1 c2 ... c7 static const float z0 = 1. / sqrt(2.) * sqrt(2. / 8.); static const float z1 = cos(M_PI * 1. / 16.) * sqrt(2. / 8.); static const float z2 = cos(M_PI * 2. / 16.) * sqrt(2. / 8.); static const float z3 = cos(M_PI * 3. / 16.) * sqrt(2. / 8.); static const float z4 = cos(M_PI * 4. / 16.) * sqrt(2. / 8.); static const float z5 = cos(M_PI * 5. / 16.) * sqrt(2. / 8.); static const float z6 = cos(M_PI * 6. / 16.) * sqrt(2. / 8.); static const float z7 = cos(M_PI * 7. / 16.) * sqrt(2. / 8.); //those were a1 a2 a3 a4 a5, constants used in AAN //apparently there already is y1 in math.h float u1 = sqrt(.5); //0.707 float u2 = sqrt(2.) * cos(3./16. * 2 * M_PI); //0.541 float u3 = u1; //0.707 float u4 = sqrt(2.) * cos(1./16. * 2 * M_PI); //1.307 float u5 = cos(3./16. * 2 * M_PI); //0.383 //those are used to scale AAN to the final result //were s0 .. s7 float x0 = sqrt(.5)/2; //0.353553; float x1 = (cos(1.*M_PI/16)/2) / (u5+u4+1); //0.254898; float x2 = (cos(2.*M_PI/16)/2) / (u1+1); //0.270598; float x3 = (cos(3.*M_PI/16)/2) / (u5+1); //0.300672; float x4 = sqrt(.5)/2; //0.353553; float x5 = (cos(5.*M_PI/16)/2) / (1u5); //0.449988; float x6 = (cos(6.*M_PI/16)/2) / (1u1); //0.653281; float x7 = (cos(7.*M_PI/16)/2) / (u5u4+1); //1.281458; #define a i[0] #define b i[1] #define c i[2] #define d i[3] #define e i[4] #define f i[5] #define g i[6] #define h i[7] #define i0 i[0] #define i1 i[1] #define i2 i[2] #define i3 i[3] #define i4 i[4] #define i5 i[5] #define i6 i[6] #define i7 i[7] #define o0 o[0] #define o1 o[1] #define o2 o[2] #define o3 o[3] #define o4 o[4] #define o5 o[5] #define o6 o[6] #define o7 o[7] void reset(float x[]) { x[0] = 0; x[1] = 1; x[2] = 2; x[3] = 3; x[4] = 3; x[5] = 3; x[6] = 2; x[7] = 0; } void print(float x[]) { printf("%f %f %f %f %f %f %f %f\n\n\n", x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7]); } int main() { //float arr[8] = {0, 1, 2, 3, 3, 3, 2, 0}; float i[8]; float o[8]; reset(i); puts("AAN method\n"); { //forward dct float t0 = i0 + i7; float t7 = i0  i7; float t1 = i1 + i6; float t6 = i1  i6; float t2 = i2 + i5; float t5 = i2  i5; float t3 = i3 + i4; float t4 = i3  i4; /* Even part */ float t10 = t0 + t3; /* phase 2 */ float t13 = t0  t3; float t11 = t1 + t2; float t12 = t1  t2; o[0] = t10 + t11; /* phase 3 */ o[4] = t10  t11; float z1 = (t12 + t13) * 0.707106781; /* c4 */ o[2] = t13 + z1; /* phase 5 */ o[6] = t13  z1; /* Odd part */ t10 = t4 + t5; /* phase 2 */ t11 = t5 + t6; t12 = t6 + t7; /* The rotator is modified from fig 48 to avoid extra negations. */ float z5 = (t10  t12) * 0.382683433; /* c6 */ float z2 = 0.541196100 * t10 + z5; /* c2c6 */ float z4 = 1.306562965 * t12 + z5; /* c2+c6 */ float z3 = t11 * 0.707106781; /* c4 */ float z11 = t7 + z3; /* phase 5 */ float z13 = t7  z3; o[5] = z13 + z2; /* phase 6 */ o[3] = z13  z2; o[1] = z11 + z4; o[7] = z11  z4; o0 *= x0; o1 *= x1; o2 *= x2; o3 *= x3; o4 *= x4; o5 *= x5; o6 *= x6; o7 *= x7; } print(o); puts("matrix multiplication method\n"); reset(i); { o0 = a*z0 + b*z0 + c*z0 + d*z0 + e*z0 + f*z0 + g*z0 + h*z0; o1 = a*z1 + b*z3 + c*z5 + d*z7  e*z7  f*z5  g*z3  h*z1; o2 = a*z2 + b*z6  c*z6  d*z2  e*z2  f*z6 + g*z6 + h*z2; o3 = a*z3  b*z7  c*z1  d*z5 + e*z5 + f*z1 + g*z7  h*z3; o4 = a*z4  b*z4  c*z4 + d*z4 + e*z4  f*z4  g*z4 + h*z4; o5 = a*z5  b*z1 + c*z7 + d*z3  e*z3  f*z7 + g*z1  h*z5; o6 = a*z6  b*z2 + c*z2  d*z6  e*z6 + f*z2  g*z2 + h*z6; o7 = a*z7  b*z5 + c*z3  d*z1 + e*z1  f*z3 + g*z5  h*z7; } print(o); getchar(); } 

02 Jun 2018, 05:03 

Goto page Previous 1, 2 < Last Thread  Next Thread > 
Forum Rules:

Copyright © 19992018, Tomasz Grysztar.
Powered by rwasa.