flat assembler
Message board for the users of flat assembler.

Index > High Level Languages > understanding jpg decompression

Goto page Previous  1, 2
Author
Thread Post new topic Reply to topic
Furs



Joined: 04 Mar 2016
Posts: 2493
Furs 16 May 2018, 20:07
Fourier Transforms are complex by nature (by complex I literally mean operating on complex numbers), even if the input is real-only (i.e. imaginary is zero). The DCT is similar, so probably also is. (never used it personally but hopefully the below helps)

By far the most important thing to understand is Euler's Formula: https://en.wikipedia.org/wiki/Euler%27s_formula

This is how I visualize it: the X-coordinate is based on cos, and the Y-coordinate is based on sin. So if you end up with "r*cos(a) + i*r*sin(a)" after applying Euler's Formula, that means the phase is a (the angle) while the magnitude is r, for each frequency bin.

Note that the DFT (and FFT) convert a signal to rectangular coordinates -- so you won't have phase and magnitude directly, but instead you'll get a complex number for each frequency bin. This is not very useful. You need to convert it to polar coordinates (https://en.wikipedia.org/wiki/Polar_coordinate_system) and then you'll get the above.

That r*cos(a) + i*r*sin(a) is in polar coordinates (length + angle), rectangular is just x+i*y position (y being the imaginary component).

BTW you won't actually end up doing all this pointless conversions -- the math can be simplified in the end, but you won't understand anything if you look at the "optimized math" from the beginning. You need to understand the steps how it was derived, since you did ask.

(sorry though, I don't know anything about the Fast Fourier Transform other than it removes redundant calculations -- but I know how to derive the math for the DFT though, the FFT is just an algorithmic optimization then, not different math)

Also I've never used a 2D DFT (or DCT), like in this case. But hopefully, you can extend it to 2D. I've used 1D DFT extensively though (audio).
Post 16 May 2018, 20:07
View user's profile Send private message Reply with quote
vivik



Joined: 29 Oct 2016
Posts: 671
vivik 16 May 2018, 21:00
Please read https://unix4lyfe.org/dct-1d/ and please explain what he did at "This number can be brought down by manually factoring"?

What is factoring?
Post 16 May 2018, 21:00
View user's profile Send private message Reply with quote
Furs



Joined: 04 Mar 2016
Posts: 2493
Furs 16 May 2018, 22:02
Factoring is when you reduce common multiplications like this:
Code:
a*X + b*X + c*X    
to
Code:
(a + b + c)*X    
I have no idea what he did with the + and - in front (the left), since those two terms are NOT equal, but maybe he's using a weird notation.
Post 16 May 2018, 22:02
View user's profile Send private message Reply with quote
vivik



Joined: 29 Oct 2016
Posts: 671
vivik 19 May 2018, 18:30
>I have no idea what he did with the + and - in front (the left), since those two terms are NOT equal, but maybe he's using a weird notation.

I got it. "-" indicates deleted code, "+" indicates added code. Common in .patch files.

discrete cosine transform started looking much simpler after I realized that, despite its name, all cos that are there are actually constants, and the most complex operation you will ever need is multiplication. There aren't that many of those constants even in the most raw algorithm ( look at "We don't need to calculate quite so many coefficients" and dct_c.cc in https://unix4lyfe.org/dct-1d/ ), and in AAN algorithm there are only 5 constants.
Post 19 May 2018, 18:30
View user's profile Send private message Reply with quote
vivik



Joined: 29 Oct 2016
Posts: 671
vivik 19 May 2018, 18:39
ieice Arai Agui Nakajima

https://search.ieice.org/bin/summary.php?id=e71-e_11_1095

>The sequence of 8-point DCT and scalar quantization is effective in image data compression. The operation is executed very efficiently, if the DCT coefficients need not to be found explicitly. The present paper proposes a method, requiring only five times of multiplication for the transform. The 8-point DCT can be comopsed from the 16-point DFT which gives only the real parts of coefficients, and final scaling. The real part DFT can be implemented by the small FFT Winograd algorithm, which requires only five multiplications. The final scaling can be combined with the quantizing matrix without any change in arithmetic complexity of the qunatizer. Since each signal path in the proposed algorithm has one multiplication at most, the five multiplications can be executed in parallel. This will make the hardware implementation of the algorithm effective.

>FFT Winograd

https://en.wikipedia.org/wiki/Fast_Fourier_transform

>Another polynomial viewpoint is exploited by the Winograd FFT algorithm,[22][23] which factorizes zN − 1 into cyclotomic polynomials—these often have coefficients of 1, 0, or −1, and therefore require few (if any) multiplications

>so Winograd can be used to obtain minimal-multiplication FFTs and is often used to find efficient algorithms for small factors

>unfortunately, this comes at the cost of many more additions, a tradeoff no longer favorable on modern processors with hardware multipliers.

Okay, now I just need to understand what this Winograd algorithm is about... This AAN paper costs 4000 yen, maybe I should just buy it... Guys, want to buy it for me?

What I should do now, is to try to code the "braindead" version of dct, the one that just does 64 multiplications.
Post 19 May 2018, 18:39
View user's profile Send private message Reply with quote
vivik



Joined: 29 Oct 2016
Posts: 671
vivik 23 May 2018, 08:56
"Wrote" and run the naive version of dct, another one. See how numbers are slightly different, but all of them do reverse back to original numbers.

Code:
#include <stdio.h>


//#include <cmath>

#define _USE_MATH_DEFINES
#include <math.h>
//https://stackoverflow.com/questions/26065359/m-pi-flagged-as-undeclared-identifier


static const float c0 = 1. / sqrt(2.) * sqrt(2. / 8.);
static const float c1 = cos(M_PI * 1. / 16.) * sqrt(2. / 8.);
static const float c2 = cos(M_PI * 2. / 16.) * sqrt(2. / 8.);
static const float c3 = cos(M_PI * 3. / 16.) * sqrt(2. / 8.);
static const float c4 = cos(M_PI * 4. / 16.) * sqrt(2. / 8.);
static const float c5 = cos(M_PI * 5. / 16.) * sqrt(2. / 8.);
static const float c6 = cos(M_PI * 6. / 16.) * sqrt(2. / 8.);
static const float c7 = cos(M_PI * 7. / 16.) * sqrt(2. / 8.);


#define a arr[0]
#define b arr[1]
#define c arr[2]
#define d arr[3]
#define e arr[4]
#define f arr[5]
#define g arr[6]
#define h arr[7]



//from,  to
void dct_ii(int N, const float x[], float X[]) {
  for (int k = 0; k < N; ++k) {
    float sum = 0.;
    float s = (k == 0) ? sqrt(.5) : 1.;
    for (int n = 0; n < N; ++n) {
      sum += s * x[n] * cos(M_PI * (n + .5) * k / N);
    }
    X[k] = sum * sqrt(2. / N);
  }
}

//from, to
void idct(int N, const float X[], float x[]) {
  for (int n = 0; n < N; ++n) {
    float sum = 0.;
    for (int k = 0; k < N; ++k) {
      float s = (k == 0) ? sqrt(.5) : 1.;
      sum += s * X[k] * cos(M_PI * (n + .5) * k / N);
    }
    x[n] = sum * sqrt(2. / N);
  }
}




int main() {

        float arr[8] = {0, 1, 2, 3, 3, 3, 2, 0};

        printf("%f %f %f %f %f %f %f %f\n\n\n",
                arr[0], arr[1], arr[2], arr[3],
                arr[4], arr[5], arr[6], arr[7]);

        puts("AAN method\n");
        {
                //forward dct

                float t0 = arr[0] + arr[7];
                float t7 = arr[0] - arr[7];
                float t1 = arr[1] + arr[6];
                float t6 = arr[1] - arr[6];
                float t2 = arr[2] + arr[5];
                float t5 = arr[2] - arr[5];
                float t3 = arr[3] + arr[4];
                float t4 = arr[3] - arr[4];

                /* Even part */

                float t10 = t0 + t3;        /* phase 2 */
                float t13 = t0 - t3;
                float t11 = t1 + t2;
                float t12 = t1 - t2;

                arr[0] = t10 + t11; /* phase 3 */
                arr[4] = t10 - t11;

                float z1 = (t12 + t13) * 0.707106781; /* c4 */
                arr[2] = t13 + z1;    /* phase 5 */
                arr[6] = t13 - z1;

                /* Odd part */

                t10 = t4 + t5;        /* phase 2 */
                t11 = t5 + t6;
                t12 = t6 + t7;

                /* The rotator is modified from fig 4-8 to avoid extra negations. */
                float z5 = (t10 - t12) * 0.382683433; /* c6 */
                float z2 = 0.541196100 * t10 + z5; /* c2-c6 */
                float z4 = 1.306562965 * t12 + z5; /* c2+c6 */
                float z3 = t11 * 0.707106781; /* c4 */

                float z11 = t7 + z3;            /* phase 5 */
                float z13 = t7 - z3;

                arr[5] = z13 + z2;      /* phase 6 */
                arr[3] = z13 - z2;
                arr[1] = z11 + z4;
                arr[7] = z11 - z4;
        }

        printf("%f %f %f %f %f %f %f %f\n\n\n",
                arr[0], arr[1], arr[2], arr[3],
                arr[4], arr[5], arr[6], arr[7]);

        {
                //inverse dct

                /* Even part */

                float t0 = arr[0];
                float t1 = arr[2];
                float t2 = arr[4];
                float t3 = arr[6];

                float t10 = t0 + t2;        /* phase 3 */
                float t11 = t0 - t2;

                float t13 = t1 + t3;        /* phases 5-3 */
                float t12 = (t1 - t3) * 1.414213562 - t13; /* 2*c4 */

                t0 = t10 + t13;       /* phase 2 */
                t3 = t10 - t13;
                t1 = t11 + t12;
                t2 = t11 - t12;

                /* Odd part */

                float t4 = arr[1];
                float t5 = arr[3];
                float t6 = arr[5];
                float t7 = arr[7];

                float z13 = t6 + t5;          /* phase 6 */
                float z10 = t6 - t5;
                float z11 = t4 + t7;
                float z12 = t4 - t7;

                t7 = z11 + z13;           /* phase 5 */
                t11 = (z11 - z13) * 1.414213562; /* 2*c4 */

                float z5 = (z10 + z12) * 1.847759065; /* 2*c2 */
                t10 = z5 - z12 * 1.082392200; /* 2*(c2-c6) */
                t12 = z5 - z10 * 2.613125930; /* 2*(c2+c6) */

                t6 = t12 - t7;        /* phase 2 */
                t5 = t11 - t6;
                t4 = t10 - t5;

                arr[0] = t0 + t7;
                arr[7] = t0 - t7;
                arr[1] = t1 + t6;
                arr[6] = t1 - t6;
                arr[2] = t2 + t5;
                arr[5] = t2 - t5;
                arr[3] = t3 + t4;
                arr[4] = t3 - t4;
        }

        printf("%f %f %f %f %f %f %f %f\n\n\n",
                arr[0], arr[1], arr[2], arr[3],
                arr[4], arr[5], arr[6], arr[7]);


        {
                //shift_left

                arr[0] = arr[0]/8;
                arr[1] = arr[1]/8;
                arr[2] = arr[2]/8;
                arr[3] = arr[3]/8;
                arr[4] = arr[4]/8;
                arr[5] = arr[5]/8;
                arr[6] = arr[6]/8;
                arr[7] = arr[7]/8;
        }


        printf("%f %f %f %f %f %f %f %f\n\n\n",
                arr[0], arr[1], arr[2], arr[3],
                arr[4], arr[5], arr[6], arr[7]);


        

        //arr = {0, 1, 2, 3, 3, 3, 2, 0};


        puts("naive method\n");
        float arr2[8];

        dct_ii(8, arr, arr2);

        printf("%f %f %f %f %f %f %f %f\n\n\n",
                arr2[0], arr2[1], arr2[2], arr2[3],
                arr2[4], arr2[5], arr2[6], arr2[7]);

        idct(8, arr2, arr);

        printf("%f %f %f %f %f %f %f %f\n\n\n",
                arr[0], arr[1], arr[2], arr[3],
                arr[4], arr[5], arr[6], arr[7]);


        puts("matrix multiplication method\n");
        float X[8];
        {

                X[0] = a*c0 + b*c0 + c*c0 + d*c0 + e*c0 + f*c0 + g*c0 + h*c0;
                X[1] = a*c1 + b*c3 + c*c5 + d*c7 - e*c7 - f*c5 - g*c3 - h*c1;
                X[2] = a*c2 + b*c6 - c*c6 - d*c2 - e*c2 - f*c6 + g*c6 + h*c2;
                X[3] = a*c3 - b*c7 - c*c1 - d*c5 + e*c5 + f*c1 + g*c7 - h*c3;
                X[4] = a*c4 - b*c4 - c*c4 + d*c4 + e*c4 - f*c4 - g*c4 + h*c4;
                X[5] = a*c5 - b*c1 + c*c7 + d*c3 - e*c3 - f*c7 + g*c1 - h*c5;
                X[6] = a*c6 - b*c2 + c*c2 - d*c6 - e*c6 + f*c2 - g*c2 + h*c6;
                X[7] = a*c7 - b*c5 + c*c3 - d*c1 + e*c1 - f*c3 + g*c5 - h*c7;
        }

        printf("%f %f %f %f %f %f %f %f\n\n\n",
                        X[0], X[1], X[2], X[3],
                        X[4], X[5], X[6], X[7]);

        {

                a = X[0]*c0 + X[1]*c1 + X[2]*c2 + X[3]*c3 + X[4]*c4 + X[5]*c5 + X[6]*c6 + X[7]*c7;
                b = X[0]*c0 + X[1]*c3 + X[2]*c6 - X[3]*c7 - X[4]*c4 - X[5]*c1 - X[6]*c2 - X[7]*c5;
                c = X[0]*c0 + X[1]*c5 - X[2]*c6 - X[3]*c1 - X[4]*c4 + X[5]*c7 + X[6]*c2 + X[7]*c3;
                d = X[0]*c0 + X[1]*c7 - X[2]*c2 - X[3]*c5 + X[4]*c4 + X[5]*c3 - X[6]*c6 - X[7]*c1;
                e = X[0]*c0 - X[1]*c7 - X[2]*c2 + X[3]*c5 + X[4]*c4 - X[5]*c3 - X[6]*c6 + X[7]*c1;
                f = X[0]*c0 - X[1]*c5 - X[2]*c6 + X[3]*c1 - X[4]*c4 - X[5]*c7 + X[6]*c2 - X[7]*c3;
                g = X[0]*c0 - X[1]*c3 + X[2]*c6 + X[3]*c7 - X[4]*c4 + X[5]*c1 - X[6]*c2 + X[7]*c5;
                h = X[0]*c0 - X[1]*c1 + X[2]*c2 - X[3]*c3 + X[4]*c4 - X[5]*c5 + X[6]*c6 - X[7]*c7;
        }

        printf("%f %f %f %f %f %f %f %f\n\n\n",
                arr[0], arr[1], arr[2], arr[3],
                arr[4], arr[5], arr[6], arr[7]);



        getchar();
}    


Code:
0.000000 1.000000 2.000000 3.000000 3.000000 3.000000 2.000000 0.000000


AAN method

14.000000 -2.720777 -11.656855 1.955410 -2.000000 0.873017 -0.343146 -0.107651


0.000000 8.000000 16.000000 24.000000 24.000000 24.000000 16.000000 0.000000


0.000000 1.000000 2.000000 3.000000 3.000000 3.000000 2.000000 0.000000


naive method

4.949747 -0.693520 -3.154322 0.587938 -0.707107 0.392847 -0.224171 -0.137950


-0.000000 1.000000 2.000000 3.000000 3.000000 3.000000 2.000000 -0.000000


matrix multiplication method

4.949747 -0.693520 -3.154322 0.587938 -0.707107 0.392848 -0.224171 -0.137950


-0.000000 1.000000 2.000000 3.000000 3.000000 3.000000 2.000000 0.000000    
Post 23 May 2018, 08:56
View user's profile Send private message Reply with quote
vivik



Joined: 29 Oct 2016
Posts: 671
vivik 23 May 2018, 09:02
I found some info on this difference in numbers: In aan.cc from https://unix4lyfe.org/dct-1d/ , I found this:

Code:
#if 1
  const double s0 = (cos(0)*sqrt(.5)/2)/(1       );  // 0.353553
  const double s1 = (cos(1.*M_PI/16)/2)/(-a5+a4+1);  // 0.254898
  const double s2 = (cos(2.*M_PI/16)/2)/(a1+1    );  // 0.270598
  const double s3 = (cos(3.*M_PI/16)/2)/(a5+1    );  // 0.300672
  const double s4 = s0;  // (cos(4.*M_PI/16)/2)/(1       );
  const double s5 = (cos(5.*M_PI/16)/2)/(1-a5    );  // 0.449988
  const double s6 = (cos(6.*M_PI/16)/2)/(1-a1    );  // 0.653281
  const double s7 = (cos(7.*M_PI/16)/2)/(a5-a4+1 );  // 1.281458
#else
  const double s0 = 1.;
  const double s1 = 1.;
  const double s2 = 1.;
  const double s3 = 1.;
  const double s4 = 1.;
  const double s5 = 1.;
  const double s6 = 1.;
  const double s7 = 1.;
#endif    


I suppose I can use those numbers to make all 3 methods result in same numbers.

I guess, I'm supposed to multiply my quantization/dequantization table by those numbers somehow. If I were to make my custom format, I probably could avoid that extra multiplication... Not like that matters that much, it's done once for the whole image, it's like only 0.000001% of all time wasted on decoding. Still, why not, will make executable a bit smaller.
Post 23 May 2018, 09:02
View user's profile Send private message Reply with quote
vivik



Joined: 29 Oct 2016
Posts: 671
vivik 23 May 2018, 09:07
I can see that this works, but I still have no idea why this works. How people find this stuff?

I must take AAN method, and try to dumb it down to the matrix multiplication method somehow. I want to see all conversions on the way.
Post 23 May 2018, 09:07
View user's profile Send private message Reply with quote
Furs



Joined: 04 Mar 2016
Posts: 2493
Furs 23 May 2018, 14:32
vivik wrote:
>I have no idea what he did with the + and - in front (the left), since those two terms are NOT equal, but maybe he's using a weird notation.

I got it. "-" indicates deleted code, "+" indicates added code. Common in .patch files.
Hahaha wow. That's quite a syntax to mix with math... Smile

BTW, you want to understand the math? Or what is your question? I suggest you start with the DFT, it's easier to grasp. People "come up with that stuff" when the math just gets reduced to it, in the end.

But is there any particular reason for it that you want to derive it yourself? It's not that easy to derive. Understanding the "bloated" form of math (with the integral or sum) is easier as a concept IMO. But even that is difficult with 2D transforms.
Post 23 May 2018, 14:32
View user's profile Send private message Reply with quote
vivik



Joined: 29 Oct 2016
Posts: 671
vivik 26 May 2018, 07:25
I have a feeling that I don't really need complex numbers for this. I only need to split 2d signal into frequencies, to remove frequencies that almost don't matter. I can do anything I want for this, not just dct, dct just seems to be the easiest to use, with many convenient properties (like, it's easy to turn 1d dct into 2d, and it doesn't matter if you do 2d dct column first or row first, maybe something else).

About Euler's formula: how is putting any number into the power of imaginary number even legal? Putting any number into the power of any 2d number? I know what 2^2 is, what 4^2 and 2^4 is, but what is 2^i2? It looks like this formula actually defines a new operation, the one that is artificial and doesn't have any meaning outside math. Apparently it found its applications somewhere.


Last edited by vivik on 26 May 2018, 18:51; edited 1 time in total
Post 26 May 2018, 07:25
View user's profile Send private message Reply with quote
vivik



Joined: 29 Oct 2016
Posts: 671
vivik 26 May 2018, 12:40
I just noticed. I posted 2 flowgraphs/butterfly diagrams on page 1. Looks like they are slightly different, look at the left part. Looks like the bottom butterfly diagram is wrong.


Last edited by vivik on 26 May 2018, 19:08; edited 1 time in total
Post 26 May 2018, 12:40
View user's profile Send private message Reply with quote
donn



Joined: 05 Mar 2010
Posts: 321
donn 26 May 2018, 16:44
Euler's equations (which can be derived from a Taylor series expansion):

Quote:
e^it = cos(t)+isin(t)



Quote:
e^-it = cos(t)-isin(t)


can be used to convert between the compact exponential form and rectangular form of a complex number.

In the earlier Matlab link I posted, I think I remember reading DCT algorithms do not need complex numbers. Compressing the frequencies is done at the quantization step, I believe.

There's more to JPEG than just DCT, a thorough walkthrough appears to be here:
1. https://www.impulseadventure.com/photo/jpeg-huffman-coding.html
2. https://en.wikibooks.org/wiki/JPEG_-_Idea_and_Practice/The_Huffman_coding
The author of the first provided a tool which would probably aide in debugging also:
- https://www.impulseadventure.com/photo/jpeg-snoop.html

There are a lot of image formats, each with implementation variations. It's noble to try to understand the components and algorithms of each format, but I would just implement a few from scratch and use the Windows decoder/encoders for the rest.
Post 26 May 2018, 16:44
View user's profile Send private message Reply with quote
vivik



Joined: 29 Oct 2016
Posts: 671
vivik 02 Jun 2018, 04:45
Good link about image compression in general, may make some things clear. https://fgiesen.wordpress.com/2013/11/04/bink-2-2-integer-dct-design-part-1/
Post 02 Jun 2018, 04:45
View user's profile Send private message Reply with quote
vivik



Joined: 29 Oct 2016
Posts: 671
vivik 02 Jun 2018, 05:03
Will have to gradually simplify AAN method down to matrix multiplication method, need to prove that it's correct somehow.

Code:
#include <stdio.h>


//#include <cmath>

#define _USE_MATH_DEFINES
#include <math.h>
//https://stackoverflow.com/questions/26065359/m-pi-flagged-as-undeclared-identifier

//just matrix coefficients... they were c0 c1 c2 ... c7
static const float z0 = 1. / sqrt(2.) * sqrt(2. / 8.);
static const float z1 = cos(M_PI * 1. / 16.) * sqrt(2. / 8.);
static const float z2 = cos(M_PI * 2. / 16.) * sqrt(2. / 8.);
static const float z3 = cos(M_PI * 3. / 16.) * sqrt(2. / 8.);
static const float z4 = cos(M_PI * 4. / 16.) * sqrt(2. / 8.);
static const float z5 = cos(M_PI * 5. / 16.) * sqrt(2. / 8.);
static const float z6 = cos(M_PI * 6. / 16.) * sqrt(2. / 8.);
static const float z7 = cos(M_PI * 7. / 16.) * sqrt(2. / 8.);

//those were a1 a2 a3 a4 a5, constants used in AAN
//apparently there already is y1 in math.h
float u1 = sqrt(.5);
//0.707
float u2 = sqrt(2.) * cos(3./16. * 2 * M_PI);
//0.541
float u3 = u1;
//0.707
float u4 = sqrt(2.) * cos(1./16. * 2 * M_PI);
//1.307
float u5 = cos(3./16. * 2 * M_PI);
//0.383


//those are used to scale AAN to the final result
//were s0 .. s7
float x0 = sqrt(.5)/2;
//0.353553;
float x1 = (cos(1.*M_PI/16)/2) / (-u5+u4+1);
//0.254898;
float x2 = (cos(2.*M_PI/16)/2) / (u1+1);
//0.270598;
float x3 = (cos(3.*M_PI/16)/2) / (u5+1);
//0.300672;
float x4 = sqrt(.5)/2;
//0.353553;
float x5 = (cos(5.*M_PI/16)/2) / (1-u5);
//0.449988;
float x6 = (cos(6.*M_PI/16)/2) / (1-u1);
//0.653281;
float x7 = (cos(7.*M_PI/16)/2) / (u5-u4+1);
//1.281458;



#define a i[0]
#define b i[1]
#define c i[2]
#define d i[3]
#define e i[4]
#define f i[5]
#define g i[6]
#define h i[7]

#define i0 i[0]
#define i1 i[1]
#define i2 i[2]
#define i3 i[3]
#define i4 i[4]
#define i5 i[5]
#define i6 i[6]
#define i7 i[7]

#define o0 o[0]
#define o1 o[1]
#define o2 o[2]
#define o3 o[3]
#define o4 o[4]
#define o5 o[5]
#define o6 o[6]
#define o7 o[7]



void reset(float x[]) {
        x[0] = 0;
        x[1] = 1;
        x[2] = 2;
        x[3] = 3;
        x[4] = 3;
        x[5] = 3;
        x[6] = 2;
        x[7] = 0;
}

void print(float x[]) {
        printf("%f %f %f %f %f %f %f %f\n\n\n",
                x[0], x[1], x[2], x[3],
                x[4], x[5], x[6], x[7]);
}


int main() {

        //float arr[8] = {0, 1, 2, 3, 3, 3, 2, 0};
        float i[8];
        float o[8];
        reset(i);

        puts("AAN method\n");
        {
                //forward dct

                float t0 = i0 + i7;
                float t7 = i0 - i7;
                float t1 = i1 + i6;
                float t6 = i1 - i6;
                float t2 = i2 + i5;
                float t5 = i2 - i5;
                float t3 = i3 + i4;
                float t4 = i3 - i4;

                /* Even part */

                float t10 = t0 + t3;        /* phase 2 */
                float t13 = t0 - t3;
                float t11 = t1 + t2;
                float t12 = t1 - t2;

                o[0] = t10 + t11; /* phase 3 */
                o[4] = t10 - t11;

                float z1 = (t12 + t13) * 0.707106781; /* c4 */
                o[2] = t13 + z1;    /* phase 5 */
                o[6] = t13 - z1;

                /* Odd part */

                t10 = t4 + t5;        /* phase 2 */
                t11 = t5 + t6;
                t12 = t6 + t7;

                /* The rotator is modified from fig 4-8 to avoid extra negations. */
                float z5 = (t10 - t12) * 0.382683433; /* c6 */
                float z2 = 0.541196100 * t10 + z5; /* c2-c6 */
                float z4 = 1.306562965 * t12 + z5; /* c2+c6 */
                float z3 = t11 * 0.707106781; /* c4 */

                float z11 = t7 + z3;            /* phase 5 */
                float z13 = t7 - z3;

                o[5] = z13 + z2;      /* phase 6 */
                o[3] = z13 - z2;
                o[1] = z11 + z4;
                o[7] = z11 - z4;

                o0 *= x0;
                o1 *= x1;
                o2 *= x2;
                o3 *= x3;
                o4 *= x4;
                o5 *= x5;
                o6 *= x6;
                o7 *= x7;
        }

        print(o);


        puts("matrix multiplication method\n");
        reset(i);

        {

                o0 = a*z0 + b*z0 + c*z0 + d*z0 + e*z0 + f*z0 + g*z0 + h*z0;
                o1 = a*z1 + b*z3 + c*z5 + d*z7 - e*z7 - f*z5 - g*z3 - h*z1;
                o2 = a*z2 + b*z6 - c*z6 - d*z2 - e*z2 - f*z6 + g*z6 + h*z2;
                o3 = a*z3 - b*z7 - c*z1 - d*z5 + e*z5 + f*z1 + g*z7 - h*z3;
                o4 = a*z4 - b*z4 - c*z4 + d*z4 + e*z4 - f*z4 - g*z4 + h*z4;
                o5 = a*z5 - b*z1 + c*z7 + d*z3 - e*z3 - f*z7 + g*z1 - h*z5;
                o6 = a*z6 - b*z2 + c*z2 - d*z6 - e*z6 + f*z2 - g*z2 + h*z6;
                o7 = a*z7 - b*z5 + c*z3 - d*z1 + e*z1 - f*z3 + g*z5 - h*z7;
        }

        print(o);

        getchar();
}    
Post 02 Jun 2018, 05:03
View user's profile Send private message Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  
Goto page Previous  1, 2

< Last Thread | Next Thread >
Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.