flat assembler
Message board for the users of flat assembler.

Index > Main > Usefulness of assembly in modern projects

st



Joined: 12 Jul 2019
Posts: 49
Location: Russia
st 31 Aug 2019, 05:02
DimonSoft wrote:
st wrote:
Moreover, why would we assume any non-trivial hand-written code is correct? It is not proven on all inputs.

Isn’t proving that what any good programmer does as part of writing code?
I would expect us to distinguish proving correctness from testing. The former means a mathematical proof, which is hard (NP-hard?) for imperative programming languages like assembly (though it can sometimes be done with a brute-force algorithm). That is why functional languages exist.
DimonSoft



Joined: 03 Mar 2010
Posts: 1228
Location: Belarus
DimonSoft 31 Aug 2019, 06:45
st wrote:
I would expect us to distinguish proving correctness from testing. The former means a mathematical proof, which is hard (NP-hard?) for imperative programming languages like assembly (though it can sometimes be done with a brute-force algorithm). That is why functional languages exist.

I did mean exactly proving the correctness, not testing. Structured programming was invented quite a long time ago, and if you have internalized its three basic structures (sequence, binary decision and generic loop), you can easily evaluate the possible value sets for all the variables used in a particular piece of code and make sure that for these values the documented behaviour of the language constructs or processor instructions leads to the desired result. Something like abstract interpretation that happens in one’s mind.
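For illustration, here is a made-up fragment (not from any real project) with the value sets one would track in one’s mind written out as comments:
Code:
        movzx   eax, byte [input]   ; eax in [0,255], whatever the byte was
        cmp     eax, 128
        jb      .small              ; the binary decision splits the set in two
        sub     eax, 128            ; on this path eax in [128,255], now [0,127]
.small:                             ; the paths join: eax in [0,127] either way
        add     eax, eax            ; eax in [0,254], provably no overflow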

And, of course, if such evaluation becomes hard, it definitely means the code should be refactored to reduce its complexity.

P.S. You definitely can’t use such proofs for arbitrary assembly code. But spaghetti machine code is quite rare these days. Modern processors tend to execute structured code faster, so outside of obfuscation (performed over a provable piece of code with methods that are proven to produce correct results) and extreme size optimization (the same note applies here), non-structured machine code doesn’t seem to be widely used. Granted, you have to be able to recognize the basic structures to prove correctness, but for the output of a particular compiler that’s quite easy.
st



Joined: 12 Jul 2019
Posts: 49
Location: Russia
st 01 Sep 2019, 06:29
DimonSoft wrote:
you can easily evaluate the possible value sets for all the variables used in a particular piece of code
I see no proof that I can do that.
DimonSoft



Joined: 03 Mar 2010
Posts: 1228
Location: Belarus
DimonSoft 01 Sep 2019, 09:32
st wrote:
DimonSoft wrote:
you can easily evaluate the possible value sets for all the variables used in a particular piece of code
I see no proof that I can do that.

“Abstract interpretation” is your search term. Modern compilers have been doing it for ages to ensure that as many bugs as possible are found at compile time. A good programmer does the same, but is not limited to a particular data structure to represent the information in their mind.

Are you trying to tell me that people write code and then just hope it will work?
st



Joined: 12 Jul 2019
Posts: 49
Location: Russia
st 02 Sep 2019, 06:25
DimonSoft wrote:
*** is your search term.
I knew no mathematical proofs would be provided.
DimonSoft wrote:
people write code and then just hope it will work

This is true, except for rare (particular and elementary) cases[*].
Quote:
In computability theory, Rice's theorem states that all non-trivial, semantic properties of programs are undecidable. […] A property is non-trivial if it is neither true for every computable function, nor false for every computable function.
Rice's theorem

[*] Just an example related to assembly:
Quote:
EverCrypt is a formally verified modern cryptographic provider that provides […]
EverCrypt is written and verified using the F* programming language, then compiled to a mixture of C (using a dedicated compiler, KreMLin) and assembly.

Functional correctness: EverCrypt's input/output behavior is fully characterized by simple mathematical functions derived directly from the official cryptographic standards (e.g., from NIST or the IETF).
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8351
Location: Kraków, Poland
Tomasz Grysztar 02 Sep 2019, 09:12
st wrote:
DimonSoft wrote:
"Abstract interpretation" is your search request.
I knew no mathematical proofs would be provided.
While it can be viewed as just a fancy name for something that [assembly] programmers do intuitively, it definitely can be formalized and used to obtain "mathematical proofs". I would argue that abstraction methods are in general the most basic and powerful tool that mathematics at large uses to prove things.

And abstracting out what your registers and relevant memory locations contain, in order to prove that at the end of a given function their states are as defined by design/specification, is done pretty routinely by assembly programmers. Of course, this is often not formalized enough and mistakes happen.

st wrote:
This is true, except of rare (particular and elementary) cases[*].
I think that in fact the wide majority of our code is quite elementary and provable (but I speak just from the point of view of fasm-related programming, etc.). For instance, even though the halting problem is undecidable in general, most of the routines we write here can quite easily be proven to come to an end (or to hang at some point).

Quote:
In computability theory, Rice's theorem states that all non-trivial, semantic properties of programs are undecidable. […] A property is non-trivial if it is neither true for every computable function, nor false for every computable function.
This reminds me of an anecdote I once read. A teacher announced in class that there would be a test next week, but that it would remain a surprise on which day it would happen. One of the pupils then started thinking out loud: "Obviously, it cannot be on Friday, because when lessons end on Thursday and there has been no test, we will know that it has to be on Friday and it will no longer be a surprise. But then it cannot be on Thursday either, because then it would also not be a surprise: after lessons on Wednesday we would already know that it has to be on Thursday (since Friday has already been ruled out). Analogously we can rule out the other days, and since it is not possible for this test to be a surprise, it is not going to happen at all!" Everyone in class was convinced by this reasoning, and then they were really surprised when the test actually happened.
st



Joined: 12 Jul 2019
Posts: 49
Location: Russia
st 03 Sep 2019, 07:36
Tomasz Grysztar wrote:
it definitely can be formalized and used to obtain "mathematical proofs".
So, where *is* it? Existing static analyzers for C do partial analysis with false positives and/or require annotations, introducing new semantics (this way we may end up rewriting everything in Rust).

Tomasz Grysztar wrote:
Of course, this is often not formalized enough and mistakes happen.
Often (as opposed to seldom) means "in more than 50% of cases". That is the point.
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8351
Location: Kraków, Poland
Tomasz Grysztar 03 Sep 2019, 08:20
st wrote:
Tomasz Grysztar wrote:
it definitely can be formalized and used to obtain "mathematical proofs".
So, where *is* it? Existing static analyzers for C do partial analysis with false positives and/or require annotations, introducing new semantics (this way we may end up rewriting everything in Rust).
As a mathematician I am quite used to the idea of a "proof" being something done by a human. And I could easily formalize much of the intuitive reasoning that I (and other assembly programmers) use all the time while writing code to ensure that a sequence of instructions does what it is designed to do.

To take the simplest example, an instruction like AND EAX,1 leaves only two possibilities for the value of EAX afterwards. So when analyzing the code that later depends on it, you have to trace only two cases instead of 2^32.
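For instance (a made-up fragment; table is just a hypothetical array of dwords):
Code:
        and     eax, 1              ; afterwards eax = 0 or eax = 1, nothing else
        mov     ecx, [table+eax*4]  ; so only two entries can ever be read here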

For a simple real-life case, one could analyze a routine like my uint128_to_float80. The only branches it has are forward ones, so there are no loops, and to assess all possible outcomes we just need to determine what the values of the registers might be. At every step we can tell what the possible range of values is for any of the registers used (including the bit-span), so we can abstract this out to check just the borderline cases and one generic case for the values in between (with inequalities to express ranges). An experienced assembly programmer does this while coding, skipping most of the formalism, but this is quite similar to the way mathematicians work, often hand-waving parts of a proof that are familiar and do not seem to hide any traps. Of course, both mathematicians and programmers may sometimes make mistakes. But my point is that much of what assembly programmers routinely do is proving the correctness of the code they write, in a manner quite similar to the work of a mathematician (and I say this from the perspective of someone who has worked in both roles).

[edit] Oh, I actually have an example of an existing assembly routine backed by a formalized mathematical proof - you could take a look at my div10 function. BTW, fasmg uses it now. [/edit]
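(To sketch the idea for readers who do not follow the link: the classical provable routine of this kind is unsigned division by 10 via multiplication by a precomputed reciprocal. Below is the well-known 32-bit variant, not necessarily identical to the div10 routine itself; n stands for any dword variable:)
Code:
; for all n < 2^32: floor(n/10) = floor(n*0CCCCCCCDh/2^35),
; because 0CCCCCCCDh = ceil(2^35/10) and the rounding error
; stays below 1 over the whole input range
        mov     eax, [n]
        mov     edx, 0CCCCCCCDh
        mul     edx                 ; edx:eax = n * 0CCCCCCCDh
        shr     edx, 3              ; edx = floor(n/10)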

st wrote:
Tomasz Grysztar wrote:
Of course, this is often not formalized enough and mistakes happen.
Often (as opposed to seldom) means "in more than 50% of cases". That is the point.
I did not want to imply that mistakes are going to happen as soon as someone does a less-formalized proof. Many times they do not. And a fully formalized proof can also contain mistakes (see above).
DimonSoft



Joined: 03 Mar 2010
Posts: 1228
Location: Belarus
DimonSoft 03 Sep 2019, 09:22
st wrote:
So, where *is* it? Existing static analyzers for C do partial analysis with false positives and/or require annotations, introducing new semantics (this way we may end up rewriting everything in Rust).

It’s funny when people start discussing spherical things in a vacuum, completely losing the whole point of the discussion. Proofs are not valuable by themselves: I can prove that a DFJDKG (some object in an imaginary parallel universe) is always twice as large as a FVJFGK (another object in an imaginary parallel universe), but nobody cares until particular characteristics of the objects are used in some way.

You make “false positives” sound like a bad thing. It isn’t, if you’re looking for a bug and your method tells you it’s there. In fact, if you’re looking for bugs and get a false positive from some method, it often means you have a code smell that doesn’t cause problems right now but may come into play at some moment in the future.

As for newly introduced annotations: you will have to add them if you don’t want false positives. Programming languages don’t even pretend to express the idea behind a solution in its entirety. If some analyzer has to work with a program written using only programming-language constructs, it only has enough information to get down to, say, 5 false positives out of 100 findings. You need more information to distinguish a false-positive case from a real bug. You might have this information in your mind, but you still need to formalize it to make it available to the tool. That’s where annotations come into play.
revolution
When all else fails, read the source


Joined: 24 Aug 2004
Posts: 20304
Location: In your JS exploiting you and your system
revolution 03 Sep 2019, 11:33
It is also possible to verify the whole input space, if it isn't too large, with a brute-force test function.

For example: https://board.flatassembler.net/topic.php?p=208281#208281

While the mathematical approach gives us a proof of the method, the brute-force approach gives us a proof of the implementation. So I think both are useful.
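Something like this minimal sketch (routine_under_test, reference and the .mismatch handler are placeholders; both routines are assumed to take the input in eax, return the result in eax, and preserve esi and ebx):
Code:
        xor     esi, esi            ; esi = current test input
.next:
        mov     eax, esi
        call    routine_under_test  ; the fast version being verified
        mov     ebx, eax            ; keep its result for comparison
        mov     eax, esi
        call    reference           ; the slow but obviously correct version
        cmp     eax, ebx
        jne     .mismatch           ; one difference disproves the implementation
        inc     esi
        cmp     esi, 10000h
        jb      .next               ; loop over all 65536 inputs
        ; reaching this point proves the implementation on the whole space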
st



Joined: 12 Jul 2019
Posts: 49
Location: Russia
st 04 Sep 2019, 11:19
Tomasz Grysztar wrote:
I actually have an example of an existing assembly routine backed with a formalized mathematical proof - you could take a look at my div10 function.
So, we have this one, plus a few more routines that are proven to be correct. However, we may not generalize from these particular cases and induce the rule "all hypothetical routines are provable to be correct", no matter whether they are written in asm or in C.

From this point of view, C gives us no advantage (except that it is cross-platform).

At the same time, there are languages designed with Rice's theorem in mind. They prohibit hard-to-prove cases (non-pure functions). As a side effect, they provide no easy way to write low-level code.

So, we could consider two variants:
1. assembly language, which allows one to code efficient and correct small routines;
2. some very high-level languages that allow one to implement complex and correct software.
Both have strong and weak sides and complement each other.

Actually, any macro assembler is a simple combination of the above. fasmg took a step forward (and even potentially allows writing cross-platform code). Now we have two points and can imagine the vector of evolution.

DimonSoft wrote:
It’s funny when people start discussing spherical things in vacuum absolutely loosing the whole point of discussion.
Agreed. And it is even more impressive when they ask to be reminded of that point with such a well-known pattern. Well, I started by describing this:
bitRAKE wrote:
In fact, research continues on "super optimizers" which produce machine code no programmer would have thought of.
https://board.flatassembler.net/topic.php?p=210882#210882

And here is another point I've repeated a few times:
DimonSoft wrote:
compiler [...] doesn’t (and cannot) perform complete analysis

revolution wrote:
It is also possible to verify the the input space if it isn't too large with a brute force test function.
That is what I mentioned earlier: https://board.flatassembler.net/topic.php?p=210897#210897
Yes, but only in simple cases.
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8351
Location: Kraków, Poland
Tomasz Grysztar 04 Sep 2019, 12:47
st wrote:
However we may not generalise these particular cases and induce the rule "all hypothetical routines are provable to be correct". No matter are they written in asm or in C.
We have no quarrel here. I only jumped into this discussion because of the suggestion that [assembly] programmers do not routinely prove the correctness of their code, while I believe it is something that we do all the time while coding, even if only in our heads and in a semi-formal (and fallible, as humans generally are) way.

I also think that assembly gives a unique perspective on this problem, since the concept of "non-pure functions" is not even well-applicable in the case of machine code. We can have a function that operates only on registers, and then it has just inputs in registers and outputs in registers, with no so-called "side effects". If you take a look at my coding habits (as in the fasmg source), you may notice that I define inputs and outputs (including "trashed" registers) in such a manner for the functions I design, and sometimes a function has additional inputs and/or outputs in memory locations, not only due to the scarcity of registers, but to provide additional options, etc.

But this also demonstrates how using a register in such a role is really no different from using a global variable, as long as it is documented. In fact, every function could be viewed as having the entire memory and CPU state (including registers) as its input and output, and then we just abstract out the parts that never get touched by it or are irrelevant. Of course, properly documenting every location that a function may read or modify is sometimes a hard task by itself (especially if the code also calls other functions, each one introducing its own set of "touched" locations). But having all interfaces properly defined like that makes it possible to see a way of proving the correctness of such a function (even if only manually), and the distinction between "pure" and "non-pure" functions does not even make much sense in this context.
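To make this concrete, here is an invented example (not an actual fasmg function) of what such an interface definition may look like in source:
Code:
; copy_bytes:
;  in:  esi = source address, edi = destination address,
;       ecx = number of bytes to copy (may be zero)
;  out: edi = address just past the last byte written
;  preserves: ebx, ebp
;  trashes: eax, ecx, esi
copy_bytes:
        jecxz   .done               ; nothing to copy for a zero count
.copy:
        mov     al, [esi]
        inc     esi
        mov     [edi], al
        inc     edi
        dec     ecx
        jnz     .copy
.done:
        retn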

Please forgive my digression here; I just saw a good opportunity to point out some interesting aspects of assembly programming as I see it.
Mike Gonta



Joined: 26 Dec 2010
Posts: 243
Mike Gonta 04 Sep 2019, 17:21
Ideom is a concept-oriented language. The basic root concepts (routines) can only be described in Ideom as axioms.
All other Ideom routines are higher-level concepts based on these axioms.
For example:
Code:
To add a number to another number:
Add the number to the other number.    

These axioms are implemented in Ideom as x86 statements:
Code:
To add a number to another number:
x86 $8B7C2404.               \  mov edi, [esp+4] ; the number
x86 $8B4C2408.               \  mov ecx, [esp+8] ; the other number
x86 $8B07.                   \  mov eax, [edi]
x86 $0101.                   \  add [ecx], eax
    
which are generated by fasmg using a slightly modified listing macro.
Ideom also assembles x86 machine code, using Ideom code such as:
Code:
To assemble (mov [ebp+) an offset (], edx) to a hex string:
Append $8955 to the hex string.
If the offset is short,
  append the offset as a byte to the hex string.
Otherwise
  add 64 to the hex string's last's target;
  append the offset to the hex string.
\ x86 $895501.               \  mov [ebp+1], edx
\ x86 $8995FE000000.         \  mov [ebp+0xFE], edx
    
This is one of several simple optimizations to produce the smallest possible executable.

_________________
Mike Gonta
look and see - many look but few see

https://mikegonta.com
st



Joined: 12 Jul 2019
Posts: 49
Location: Russia
st 06 Sep 2019, 09:25
Tomasz Grysztar wrote:
We can have a function that operates only on registers, and then it has just inputs in registers and outputs in registers, with no so-called "side effects". [...]
using a register in such a role is really no different from using a global variable, as long as it is documented.
I think there is a difference: registers are safer. A register is accessible only with the corresponding machine instruction; a global variable could be changed via an arbitrary pointer.

C does not provide an easy way to follow such an approach (maybe with the help of some 'context' structure that is passed as an argument to all functions).
Tomasz Grysztar



Joined: 16 Jun 2003
Posts: 8351
Location: Kraków, Poland
Tomasz Grysztar 06 Sep 2019, 09:42
st wrote:
I think there is a difference: registers are safer. A register is accessible only with the corresponding machine instruction; a global variable could be changed via an arbitrary pointer.
Only if your function actually uses some input as an arbitrary pointer. Additionally, if a function has its input/output interface defined such that it contains some pointers, but it places strict constraints on where those pointers may actually point, then this moves the burden of proof to the place where the function gets used. And then, when you are proving the "second-level" function, you specifically need to prove that the first one is always called correctly according to its specification (because the first function has been proven only under that assumption).
macgub



Joined: 11 Jan 2006
Posts: 346
Location: Poland
macgub 06 Sep 2019, 10:34
Tomasz Grysztar wrote:


This reminds me of an anecdote I once read. A teacher announced in class that there would be a test next week, but that it would remain a surprise on which day it would happen. One of the pupils then started thinking out loud: "Obviously, it cannot be on Friday, because when lessons end on Thursday and there has been no test, we will know that it has to be on Friday and it will no longer be a surprise. But then it cannot be on Thursday either, because then it would also not be a surprise: after lessons on Wednesday we would already know that it has to be on Thursday (since Friday has already been ruled out). Analogously we can rule out the other days, and since it is not possible for this test to be a surprise, it is not going to happen at all!" Everyone in class was convinced by this reasoning, and then they were really surprised when the test actually happened.

Nice joke at the beginning of the new school year!