flat assembler
Message board for the users of flat assembler.

Index > Main > Variable Memory Size

Author
Thread Post new topic Reply to topic
pal



Joined: 26 Aug 2008
Posts: 227
pal 24 Aug 2009, 15:26
This is kind of a question related to big numbers, but also for other applications. I'll give an example.

Say I want to make a program which does matrix multiplication. Now I want the user to be able to input a matrix of specified size. E.g. 4x4. (Something on 2^n x 2^n so I can use Strassen's Algorithm). If I want to use SSE, or even just with GPRs, to multiply these matricies, how do I go about allocating enough memory?

With a 4 x 4 matrix I can use each byte in an XMM register to store one value, and another XMM register to store another value with a third XMM register to store the final matrix. If it is a 2x2 matrix I can do it all in one XMM register. But if the value is greater than 4x4 e.g. 5x5 (and I pad with zeros to get 8x8), that is obviously 64 bytes and so I would need 4 XMM registers just for one matrix. If you get what I am saying? So how would I know how many registers I need and then be able to do it all with my code?

Another example is allowing user input for a number * a big number. E.g. they input:

18916531566518561 * 891565

I know in this case I have to have a memory array, and the best way of finding the size would be taking logarithms. This would mean having to make the big number into smaller numbers (e.g. 88 becomes 2^3 * 11), but this becomes problematic when the big number is a prime number.

If this makes sense? If not then I'll try to explain better.
Post 24 Aug 2009, 15:26
View user's profile Send private message Reply with quote
Borsuc



Joined: 29 Dec 2005
Posts: 2465
Location: Bucharest, Romania
Borsuc 24 Aug 2009, 16:05
You can't make an "optimized" matrix multiplication, without "special cases" (for special matrix types, like the examples), for any general-purpose matrix size.

For example, if you want to make optimized SSE code for 4x4, you have to make it specifically for that, you can't use it for 8x8 or other arbitrary sizes.

Arbitrary things are always less optimized. You can, though, make a list of jumps to different parts of code depending on the size (use the matrix size as the 'index' to this array of jumps), but you will have to code for every case you want the program to accept.
Post 24 Aug 2009, 16:05
View user's profile Send private message Reply with quote
pal



Joined: 26 Aug 2008
Posts: 227
pal 24 Aug 2009, 17:23
I just meant an example by the matrix multiplication (by the way). Hmm fair enough, I'd have thought there would be a way to do it. I think there is as you can do it with a single piece of code in other languages I reckon. I dunno I'll think about it and try it later in C.

Just like if I let a user input a number e.g. 89748741884. This is bigger than 32-bits. How do I, without doing loads of comparisons, find out how many bytes of storage data I need? Aligned to 4 bytes I will need 8 bytes in this case.

Oh. P.S. The matrix multiplication will work as if the matrix is bigger than 2x2 you pad them out so they are a power of two square matrix. You then break the matrix down into submatricies and do it from there. The below link may explain it (I haven't read it properly yet):

Code:
http://www-math.mit.edu/~djk/18.310/18.310F04/Matrix_%20Multiplication.html    


But it is possible I know that.
Post 24 Aug 2009, 17:23
View user's profile Send private message Reply with quote
r22



Joined: 27 Dec 2004
Posts: 805
r22 24 Aug 2009, 17:26
M x N matrix in memory is just an M x N element array.
If you want to create a structure to hold arbitrary sizes then just save the number of columns and the number of rows first.

Even before the multiply of two matrices you know how many rows and columns the resulting matrix will have.

Code:
MATRIX_ELM_SIZE = 8; dq double fp for each data element
STRUCT Matrix
cols rd 1
rows rd 1
data rd 1
ENDS
...
AllocMatrixMem:
push esi
;;esp+8 == addr Matrix struc
xor edx,edx
mov esi,[esp+4]
mov ecx,[esi.cols]
mov eax,[esi.rows]
mul ecx
mov ecx,MATRIX_ELM_SIZE
xor edx,edx
mul ecx
push eax
call malloc ;;allocate the memory for the data
mov [esi.data],eax
pop esi
ret 4
    


Same with big numbers. You know the lengths of the two inputs so it's trivial to get an estimate of the size of the product of both of them.

99 X 999 = 98901
99 = 2 digits
999 = 3 digits
2 + 3 = 5, so the result can't be larger than 5 digits.
So you allocate 5 digits worth of space.
[Replace "DIGITS" with "BYTES" in the above and it holds true]
Post 24 Aug 2009, 17:26
View user's profile Send private message AIM Address Yahoo Messenger Reply with quote
pal



Joined: 26 Aug 2008
Posts: 227
pal 24 Aug 2009, 19:55
Hmm yeah; I guess allocating more memory will be good any how. Cheers r22.

I'll have a go at implementing Strassen's algorithm for 32-bit numbers later. I know for numbers < 32-bits you can easily take logarithms to find the number of digits (e.g. when you are working with powers 1324^3563 you just take logarithms). Thats cool though, I'll try it out in a little bit.
Post 24 Aug 2009, 19:55
View user's profile Send private message Reply with quote
r22



Joined: 27 Dec 2004
Posts: 805
r22 24 Aug 2009, 20:32
fyi
http://software.intel.com/en-us/forums/strassens-algorithm/topic/67860/
The first post has an attachment with c++ implementation of the algorithm.
Post 24 Aug 2009, 20:32
View user's profile Send private message AIM Address Yahoo Messenger Reply with quote
Display posts from previous:
Post new topic Reply to topic

Jump to:  


< Last Thread | Next Thread >
Forum Rules:
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You can download files in this forum


Copyright © 1999-2023, Tomasz Grysztar. Also on GitHub, YouTube.

Website powered by rwasa.