flat assembler
Message board for the users of flat assembler.
Critique please
LocoDelAssembly 30 Mar 2012, 21:30
Sorry, I tend to be extremely stupid on floating point stuff, but don't you get some byte values out of range when converted? (e.g. -128)
tripledot 30 Mar 2012, 21:46
I'm not sure I understand you. My apologies if this is obvious, but this code converts from signed bytes to doubles, not the other way around.
"vpmovsxbd" sign-extends bytes to dwords, which are converted to doubles by the "vcvtdq2pd" instruction. We now have doubles in the range -128.0 to 127.0.
In an ideal world, the person responsible for the code that generates 8-bit audio knows that they should not use the entire range of a byte to represent the amplitude range. Rather, they should use the (unsigned) range 1 to 255, therefore making 128 the middle (i.e. 0.0), giving you 127 values in either direction (+ve or -ve) to represent a floating-point signal in fixed-point format. I multiply my doubles by 1.0/127.0 to get a range of -1.0 to +1.0 (I hope!)
In any case, I was appealing more to people with AVX/Sandy Bridge/BD experience... I've managed to get the inner loop down to 32 bytes. It's a tight loop but it's nice and small for the code cache. There are no loop-carried dependencies, so after a few iterations the latencies of the moves, conversions and multiplies should be hidden by pipelining and OoO execution. Unless somebody knows something I don't (very likely!)
But sheeeit, no need for apologies! Thanks a million for taking a look!
EDIT: I think I see what you mean... So if a value of -128 crops up in a sound file, my code will produce a floating-point value below -1.0... Ack. If I multiply everything by 1.0/128.0 instead, then everything sits nicely in range (no clipping), but then I have a DC offset to worry about. I haven't been able to find much info about the standards used to store 8-bit audio data (nothing really deep, anyway). I should really fire up Audition and run some tests...
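To make the range question from the post above concrete, here is a minimal scalar sketch in C. It is not the AVX loop under discussion, only a reference for what that loop computes per sample, and the function name s8_to_f64 is made up for illustration. The scale argument is assumed to be either 1.0/127.0 or 1.0/128.0.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference: widen each signed byte to a double and
   multiply by a scale factor (1.0/127.0 or 1.0/128.0 in this thread). */
void s8_to_f64(const int8_t *src, double *dst, size_t n, double scale)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = (double)src[i] * scale;   /* sign-extend, convert, multiply */
}
```

With a scale of 1.0/127.0 the sample -128 lands slightly below -1.0. With 1.0/128.0 the results run from exactly -1.0 up to 0.9921875, and because 128 is a power of two the multiplication only changes the exponent, so no rounding is involved.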
Madis731 01 Apr 2012, 19:56
If you're done with the theoretical optimization, there's nothing more to do than test it on real hardware. From there you can optimize by reordering instructions and inserting some strategically placed NOPs.
edfed 01 Apr 2012, 21:18
The -128 value (which becomes -1.0078) may not be a problem in practice, since real audio rarely contains such extreme values.
Saturation could be applied before converting: take the byte stream, do the saturation adjustment first, and I think you'll get a good result. Your loop won't get smaller that way, though; it will get bigger. Maybe a 64-byte loop is acceptable.
I know many people say not to use inc and dec on registers, but what is the point of such instructions if not to be used? By replacing [rdi] with [rdi*8], and "add edi,8" / "sub ecx,1" with "inc edi" / "loop", you'll get smaller code, and maybe faster code too.
Also, I see that ecx cannot be bigger than 4 there, so why use the full rcx register? cl is more than enough to do the job.
All in all, the multi-case loop can be collapsed into a single loop if you pad, or ignore, the extra bytes of your signal.
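A minimal sketch of the saturation idea in scalar C, assuming "saturation" here means clamping -128 up to -127 so the byte range becomes symmetric before the 1.0/127.0 scaling. The function name is hypothetical. In the vector code this would presumably be a packed signed-byte maximum against a constant of -127 (something like vpmaxsb), but that is a guess at the implementation, not something taken from the original code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pre-pass: clamp the single out-of-range value (-128)
   to -127 so that scaling by 1.0/127.0 stays within [-1.0, +1.0]. */
void clamp_s8_symmetric(int8_t *buf, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        if (buf[i] < -127)
            buf[i] = -127;
}
```

Only samples equal to -128 are changed; every other value passes through untouched.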
Madis731 02 Apr 2012, 11:21
@edfed - because using partial registers is sometimes dangerous. I would generally use ecx where a value is guaranteed to fit into a DWORD, but I would not use parts of that register (cx, cl, ch). Even when loading a BYTE/WORD value, it is recommended to use the MOV*X instructions.
@tripledot - you can do one of two things:
1) Force counts to be divisible by 4, so you don't need .process1 at all.
2) Take advantage of the fact that when rsi is 16-aligned (for xmm, or 32-aligned for ymm) you won't get any errors accessing memory at rsi+rcx+00..15 (or ..31). Then you can just run one extra round of .process4 and afterwards discard 1, 2 or 3 results, depending on the original value of rcx. My guess is that one pass of the .process1 loop takes about the same time as one pass of .process4, so you can speed up the processing of the remaining 1..3 bytes by up to 3 times.
You can also change the mov rbx,4 to mov ebx,4, which accomplishes the same thing but brings the total footprint down from 94 to 78 bytes. This lucky win comes from the fact that the original 4 instructions take that extra byte and trip over the 16-byte barrier. mov ebx,4 zero-extends the result into rbx but takes less space in the binary.
Can anyone explain to me why 1.0/127.0 is so important that a -1.0078 result is acceptable, while 1.0/128.0 sounds more natural (2^7=128, which is half of the 256-value space) and needs no further range checks? Is there something wrong with 0.9921875 as the positive maximum? 0.0 is still 0.0, isn't it?
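The "process whole groups of 4 and discard the extras" idea can be sketched in scalar C roughly as follows. The names round_up_4 and convert_groups_of_4 are made up for illustration, and the sketch assumes both buffers are padded to the rounded-up length, which is the precondition described in option 2.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: round a sample count up to the next multiple of 4. */
size_t round_up_4(size_t n)
{
    return (n + 3) & ~(size_t)3;
}

/* Convert in whole groups of 4 only, like a loop made of .process4 rounds.
   Assumption: src is readable and dst is writable for round_up_4(n)
   elements. Returns the number of elements written; the caller simply
   ignores everything from index n onwards. */
size_t convert_groups_of_4(const int8_t *src, double *dst, size_t n,
                           double scale)
{
    size_t padded = round_up_4(n);
    for (size_t i = 0; i < padded; i += 4) {
        dst[i + 0] = (double)src[i + 0] * scale;
        dst[i + 1] = (double)src[i + 1] * scale;
        dst[i + 2] = (double)src[i + 2] * scale;
        dst[i + 3] = (double)src[i + 3] * scale;
    }
    return padded;
}
```

For a count of 5, two groups are processed and the last three results are throwaway values computed from the padding.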
tripledot 13 Apr 2012, 20:09
Really sorry for replying so late.
Thanks to all for the input, it is hugely appreciated.
@Madis: I can't believe I didn't spot "mov ebx, 4". Nice one! I haven't been writing 64-bit code for very long, and sign-extension still trips me up from time to time. Like you, I'm very happy writing to the 32-bit portion of a register, but I avoid partial register use like the plague for words or bytes.
@both: You are both right; it seems like a total waste to even bother with the single-byte case. Since all my buffers are 32-byte aligned, it's not a big deal to pad their lengths to a multiple of 32 bytes. And after a bit more research, it seems nobody cares about the extra negative value in fixed-point audio files. So I'll just downscale by 128 instead of 127 and be done with it. Should lead to a more accurate floating-point representation of the original audio, too.
This raises an interesting question... when converting from floating point to fixed point, which is more evil (in terms of THD): upscaling by 127 (and introducing floating-point inaccuracies along the way), or upscaling by 128 (necessitating saturation clipping of the most positive peaks)? Really a moot point when dealing with 8-bit audio, but for 16/24-bit I suppose this might concern the audiophiles. Not that I'd believe them if they claimed to be able to hear a difference, but still...
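For the reverse direction, here is a rough scalar sketch of the two options being compared. It assumes finite inputs nominally in [-1.0, +1.0] and rounds half away from zero; both the rounding mode and the function names are choices made for illustration, not something specified in the thread.

```c
#include <stdint.h>

/* Hypothetical helper: round half away from zero, then saturate to the
   int8 range. Assumes y is finite. */
int8_t round_sat_s8(double y)
{
    if (y >= 127.0)  return 127;
    if (y <= -128.0) return -128;
    return (int8_t)(y < 0.0 ? y - 0.5 : y + 0.5);
}

/* Convert one sample to 8-bit fixed point. `scale` is 127.0 or 128.0. */
int8_t f64_to_s8(double x, double scale)
{
    return round_sat_s8(x * scale);
}
```

With a scale of 127.0, the inputs +1.0 and -1.0 map to +127 and -127 and nothing is clipped. With 128.0, -1.0 maps to -128 but +1.0 would be +128, so it saturates to +127; that is the clipping of the most positive peaks mentioned above.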
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
Website powered by rwasa.