flat assembler
Message board for the users of flat assembler.

fasmg: output multiple files by writing uncompressed zip




Grom PE 01 Nov 2016, 16:45
fasmg can have multiple input files, but only one output file. Remembering this discussion, I decided to remove this limitation... by writing an uncompressed zip file!

Code:
; writing ZIP format for fasmg by Grom PE
; allows outputting multiple files in a single (uncompressed) zip file

ZIP::
; todo: calculate crc32 only once instead of on every pass (is that possible?)
; todo: sandbox the contents in add2zip..endadd2zip so it's possible to use org inside and other complex formats

virtual at 0
  crc32table::
  repeat 256
    r = %-1
    repeat 8
      r = r shr 1 xor (0xEDB88320 * r and 1)
    end repeat
    dd r
  end repeat
end virtual

namespace ZIP
  VersionToExtract := 20
  VersionMadeBy := 63
  FILE_INDEX = 0
  HAS_FILES = 0
  FILE_OFFSET = $%
end namespace

macro endadd2zip
  namespace ZIP
    FILE_SIZE = $% - FILE_OFFSET

    if HAS_FILES
      match e,entry
        c = 0xffffffff
        repeat FILE_SIZE, a:0
          ; fixme: fragile, breaks if "org" is used inside the region
          load b:byte from e.Data:a
          load t dword from crc32table:(b xor (c and 0xFF))*4
          c = c shr 8 xor t
        end repeat
        e.Crc32 = c xor 0xffffffff
        e.CompressedSize = FILE_SIZE
        e.UncompressedSize = FILE_SIZE
      end match
      FILE_INDEX = FILE_INDEX + 1
    end if
  end namespace
end macro

macro add2zip name*
  namespace ZIP
    local e
    endadd2zip
    e.GeneralPurpose = 0
    e.CompressionMethod = 0
    e.FileTime = 0
    e.FileDate = 0
    e.FileNameLength = lengthof name
    e.FileAttributes = 0
    e.FileAttributesExt = 0
    e.LocalHeaderOffset = $%%
    e.FileName = name
    entry equ e

    db 'PK',3,4
    dw VersionToExtract
    dw e.GeneralPurpose
    dw e.CompressionMethod
    dw e.FileTime
    dw e.FileDate
    dd e.Crc32
    dd e.CompressedSize
    dd e.UncompressedSize
    dw e.FileNameLength
    dw 0 ; ExtraFieldLength
    db e.FileName
    ;db '' ; ExtraField
    org 0
    e.Data::

    FILE_OFFSET = $%
    HAS_FILES = 1
  end namespace
end macro

postpone
  purge add2zip
  endadd2zip
  namespace ZIP
  org $%%
central:
  irpv e,entry
    db 'PK',1,2
    dw VersionMadeBy
    dw VersionToExtract
    dw e.GeneralPurpose
    dw e.CompressionMethod
    dw e.FileTime
    dw e.FileDate
    dd e.Crc32
    dd e.CompressedSize
    dd e.UncompressedSize
    dw e.FileNameLength
    dw 0 ; ExtraFieldLength
    dw 0 ; CommentLength
    dw 0 ; DiskNumber
    dw e.FileAttributes
    dd e.FileAttributesExt
    dd e.LocalHeaderOffset

    db e.FileName
    ;db '' ; ExtraField
    ;db '' ; Comment
  end irpv
tail:
  db 'PK',5,6
  dw 0 ; Number of this disk
  dw 0 ; Number of the disk with the start of the central directory
  dw NUMBER_OF_FILES
  dw NUMBER_OF_FILES
  dd tail - central
  dd central
  dw 0 ; Comment length
  ;db '' ; Comment

  NUMBER_OF_FILES := FILE_INDEX
  end namespace
end postpone
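
For anyone who wants to check the table arithmetic outside fasmg, here is a minimal Python sketch of the same CRC-32 steps: the table built from the reflected polynomial 0xEDB88320, then the byte-wise update used in endadd2zip. The names build_crc32_table and crc32 are made up for this illustration, and zlib.crc32 from the Python standard library serves as the reference value.

```python
import zlib


def build_crc32_table():
    """Build the 256-entry table the same way the `repeat 256` loop does."""
    table = []
    for index in range(256):
        r = index
        for _ in range(8):
            # Shift right, and XOR in the polynomial when the low bit was set.
            r = (r >> 1) ^ (0xEDB88320 * (r & 1))
        table.append(r)
    return table


CRC32_TABLE = build_crc32_table()


def crc32(data: bytes) -> int:
    """Byte-wise table-driven CRC-32, mirroring the loop in endadd2zip."""
    c = 0xFFFFFFFF
    for b in data:
        c = (c >> 8) ^ CRC32_TABLE[b ^ (c & 0xFF)]
    return c ^ 0xFFFFFFFF


if __name__ == "__main__":
    sample = b"Hello world!\n"
    # Both numbers should be identical.
    print(hex(crc32(sample)), hex(zlib.crc32(sample)))
```

If the two printed values ever differ, the table or the update step has been transcribed incorrectly.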
    


This could be used like so:
Code:
include 'zipwrite.inc'

db 'Some non-zip data that goes in the beginning',10

add2zip 'hello.txt'
db 'Hello world!',10

add2zip 'greetings.txt'
db 'Greetings to all the flat assembler fans!',10
    


In the future, this could also prove useful for making .jar files for JVM assembly.
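
The record layout written by the macros (local header, file data, central directory, end record) can also be mirrored in a few lines of Python and handed to the standard zipfile module as a sanity check. This is only a sketch under assumptions: build_stored_zip and its prefix parameter are hypothetical names, and timestamps and attributes are left at 0 as in the macros.

```python
import io
import struct
import zipfile
import zlib


def build_stored_zip(files, prefix=b""):
    """Return an uncompressed ZIP; `files` is a list of (name, data) pairs.

    `prefix` stands for arbitrary non-zip bytes placed before the first entry.
    """
    out = bytearray(prefix)
    central = bytearray()
    for name, data in files:
        raw_name = name.encode("ascii")
        crc = zlib.crc32(data)
        offset = len(out)  # absolute file offset of the local header
        # Local header: signature, version to extract, flags, method,
        # time, date, crc32, compressed size, uncompressed size,
        # name length, extra field length.
        out += struct.pack("<4sHHHHHIIIHH", b"PK\x03\x04", 20, 0, 0, 0, 0,
                           crc, len(data), len(data), len(raw_name), 0)
        out += raw_name + data
        # Central directory header: adds version made by, comment length,
        # disk number, internal/external attributes, local header offset.
        central += struct.pack("<4sHHHHHHIIIHHHHHII", b"PK\x01\x02", 63, 20,
                               0, 0, 0, 0, crc, len(data), len(data),
                               len(raw_name), 0, 0, 0, 0, 0, offset)
        central += raw_name
    central_offset = len(out)
    out += central
    # End of central directory record.
    out += struct.pack("<4sHHHHIIH", b"PK\x05\x06", 0, 0, len(files),
                       len(files), len(central), central_offset, 0)
    return bytes(out)


if __name__ == "__main__":
    blob = build_stored_zip(
        [("hello.txt", b"Hello world!\n"),
         ("greetings.txt", b"Greetings to all the flat assembler fans!\n")],
        prefix=b"Some non-zip data that goes in the beginning\n",
    )
    with zipfile.ZipFile(io.BytesIO(blob)) as archive:
        print(archive.namelist(), archive.testzip())
```

Because every offset is an absolute file offset, the leading non-zip data does not confuse the reader, which matches the usage example above.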


Attachment: fasmg_zipwrite.zip (1.41 KB), zip write macros for fasmg




Trinitek 01 Nov 2016, 21:17
Very clever.



Tomasz Grysztar 02 Nov 2016, 11:11
Grom PE wrote:
todo: calculate crc32 only once instead of on every pass (is that possible?)
While I tried to avoid exposing details of the resolving process to the constructions of the language, the calculation of checksums is an example of a place where it is tempting to introduce a way to compute them only in the final pass. Or, to be more precise: to make it possible to mark some code in such a way that it would get assembled only when everything that came before is already correctly resolved. It is easy to add something like that to fasmg, in the form of a built-in variable that could be checked with "if", but I'm still considering the consequences of such a move on the overall design of the language.


revolution 02 Nov 2016, 11:18
I added CRC natively into fasmarm for this reason. Doing a native CRC is much faster than with a macro, and it makes doing it on every pass almost inconsequential in terms of the extra time needed. If the defining parameters are made flexible enough then it can accommodate all the common bit lengths and polynomials.
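
To illustrate what "flexible defining parameters" could look like, here is a generic bitwise CRC sketch in Python. It is not fasmarm's interface (that is not shown in this thread). The parameter names width, poly, init, xor_out and reflected are assumptions borrowed from the usual way of cataloguing CRC variants, and the routine assumes width is at least 8.

```python
def reflect(value: int, width: int) -> int:
    """Reverse the order of the lowest `width` bits of `value`."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def crc(data: bytes, width: int, poly: int, init: int,
        xor_out: int, reflected: bool) -> int:
    """Bit-by-bit CRC over `data`; assumes width >= 8."""
    mask = (1 << width) - 1
    top_bit = 1 << (width - 1)
    register = init
    for byte in data:
        if reflected:
            # Reflected variants consume each input byte least significant bit first.
            byte = reflect(byte, 8)
        register ^= byte << (width - 8)
        for _ in range(8):
            if register & top_bit:
                register = ((register << 1) ^ poly) & mask
            else:
                register = (register << 1) & mask
    if reflected:
        register = reflect(register, width)
    return register ^ xor_out


if __name__ == "__main__":
    # The CRC-32 used by ZIP, expressed through the generic parameters.
    print(hex(crc(b"123456789", 32, 0x04C11DB7, 0xFFFFFFFF, 0xFFFFFFFF, True)))
```

A native implementation would normally use a precomputed table instead of the inner bit loop; the loop is kept here because it makes the role of each parameter visible.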



Tomasz Grysztar 02 Nov 2016, 13:22
revolution wrote:
I added CRC natively into fasmarm for this reason. Doing a native CRC is much faster than with a macro, and it makes doing it on every pass almost inconsequential in terms of the extra time needed. If the defining parameters are made flexible enough then it can accommodate all the common bit lengths and polynomials.
This is an obvious thing to do when constructing an actual targeted assembler, like fasm for x86 or fasmarm (after all, fasm computes PE checksums natively), or any potential assembler based on the fasmg engine. But when everything, including the entire output format generation, is processed by macros, the engine cannot help much with specific implementations, at least not in general. Even if it had some helper functions to calculate the CRC of specified data blocks, in other places we would need the PE checksum algorithm, or SHA, or something different altogether. For the output formatters I see it as an "all macro or no macro" choice.

On the other hand, if in the future there are some specialized assemblers based on the fasmg engine, they will most probably be able to assemble all these "all macro" solutions currently created for fasmg, while also having some slick native implementations of some instruction sets and output formats.



Tomasz Grysztar 22 Nov 2016, 14:22
I thought we could try the same thing with the TAR format, since this should be even simpler. But even though this format is simple, it manages to be a kind of mess at the same time.
Here come my macros:
Code:
macro tar_number? value*,length:8
        local d
        repeat length-1
                d = value shr ((length-1-%)*3)
                if d > 0
                        db '0' + d and 111b
                else
                        db 20h
                end if
        end repeat
        db 20h
end macro

macro tar_record? name

        local data,size,checksum_field,checksum,byte

        org 0

        db string name,(100 - lengthof string name) dup 0
        tar_number 10077o       ; file mode
        tar_number 0            ; owner id
        tar_number 0            ; group id
        tar_number size,12      ; file size
        tar_number %t,12        ; last modification time
        checksum_field db 8 dup 20h
        db '0',100 dup 0        ; normal file

        checksum = 0
        repeat $
                load byte : 1 from $-%
                checksum = checksum + byte
        end repeat
        repeat 6
                byte = checksum shr ((6-%)*3)
                store '0' + byte and 111b : 1 at checksum_field + % - 1
        end repeat
        store 0:1 at checksum_field + 7

        db 512-$ dup 0

        org 0
        data = $%
        define $% ($%-data)
        define $%% ($%%-data)

        macro end?.tar_record?
                size = $%
                db ((512 - size and 511) and 511) dup 0 ; pad to a 512-byte boundary, nothing if already aligned
                restore $%,$%%
                purge end?.tar_record?
        end macro

end macro    
And use them like this:
Code:
tar_record 'hello.txt'

        db 'Hello world!',10

end tar_record

tar_record 'greetings.txt'

        db 'Greetings to all the flat assembler fans!',10

end tar_record

db 2*512 dup 0  ; two null records to mark the end of tarball    
The macros emulate the "$%" and "$%%" values inside the contained files.

They can also be combined with POSTPONE emulation:
Code:
macro tar_number? value*,length:8
        local d
        repeat length-1
                d = value shr ((length-1-%)*3)
                if d > 0
                        db '0' + d and 111b
                else
                        db 20h
                end if
        end repeat
        db 20h
end macro

macro tar_record? name

        local postponed,data,size,checksum_field,checksum,byte

        org 0

        db string name,(100 - lengthof string name) dup 0
        tar_number 10077o       ; file mode
        tar_number 0            ; owner id
        tar_number 0            ; group id
        tar_number size,12      ; file size
        tar_number %t,12        ; last modification time
        checksum_field db 8 dup 20h
        db '0',100 dup 0        ; normal file

        checksum = 0
        repeat $
                load byte : 1 from $-%
                checksum = checksum + byte
        end repeat
        repeat 6
                byte = checksum shr ((6-%)*3)
                store '0' + byte and 111b : 1 at checksum_field + % - 1
        end repeat
        store 0:1 at checksum_field + 7

        db 512-$ dup 0

        org 0
        data = $%
        define $% ($%-data)
        define $%% ($%%-data)

        macro postponed
        end macro

        macro postpone?!
                esc macro postponed
        end macro

        macro end?.postpone?!
                postponed
                esc end macro
        end macro

        macro end?.tar_record?
                postponed
                size = $%
                db ((512 - size and 511) and 511) dup 0 ; pad to a 512-byte boundary, nothing if already aligned
                restore $%,$%%
                purge postpone?,end?.postpone?,end?.tar_record?
        end macro

end macro    
and then it becomes possible to assemble an entire example program as one of the files:
Code:
tar_record 'hello.txt'

        db 'Hello world!',10

end tar_record

tar_record 'win32.exe'

        include 'win32.asm'     ; example from fasmg package

end tar_record

db 2*512 dup 0  ; two null records to mark the end of tarball    
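
The header layout that tar_record produces can be cross-checked outside fasmg as well. Below is a minimal Python sketch that writes the same old-style (no "ustar" magic) header fields and lets the standard tarfile module read them back. The helper names octal_field, tar_header, tar_record and build_tar are hypothetical, the octal fields are zero-padded and NUL-terminated rather than space-padded as in the macros, and the mode 0o644 is an arbitrary choice for the example.

```python
import io
import tarfile

BLOCK = 512


def octal_field(value: int, length: int) -> bytes:
    """Zero-padded octal digits plus a terminating NUL, `length` bytes in total."""
    return f"{value:0{length - 1}o}".encode("ascii") + b"\0"


def tar_header(name: str, size: int, mtime: int = 0, mode: int = 0o644) -> bytes:
    raw_name = name.encode("ascii")
    if len(raw_name) > 100:
        raise ValueError("name does not fit the 100-byte field")
    header = bytearray(BLOCK)
    header[0:len(raw_name)] = raw_name
    header[100:108] = octal_field(mode, 8)    # file mode
    header[108:116] = octal_field(0, 8)       # owner id
    header[116:124] = octal_field(0, 8)       # group id
    header[124:136] = octal_field(size, 12)   # file size
    header[136:148] = octal_field(mtime, 12)  # last modification time
    header[148:156] = b" " * 8                # checksum field counts as spaces
    header[156:157] = b"0"                    # normal file
    checksum = sum(header)
    header[148:156] = f"{checksum:06o}".encode("ascii") + b"\0 "
    return bytes(header)


def tar_record(name: str, data: bytes, mtime: int = 0) -> bytes:
    padding = -len(data) % BLOCK  # 0 when the data already ends on a block boundary
    return tar_header(name, len(data), mtime) + data + b"\0" * padding


def build_tar(files) -> bytes:
    # Two null records mark the end of the tarball.
    return b"".join(tar_record(n, d) for n, d in files) + b"\0" * (2 * BLOCK)


if __name__ == "__main__":
    blob = build_tar([("hello.txt", b"Hello world!\n")])
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:") as archive:
        print(archive.getnames())
```

The padding expression is worth a second look in any implementation: data whose length is already a multiple of 512 needs no padding at all.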



Tomasz Grysztar 06 Mar 2017, 14:21
Tomasz Grysztar wrote:
Grom PE wrote:
todo: calculate crc32 only once instead of on every pass (is that possible?)
While I tried to avoid exposing details of the resolving process to the constructions of the language, the calculation of checksums is an example of a place where it is tempting to introduce a way to compute them only in the final pass. Or, to be more precise: to make it possible to mark some code in such a way that it would get assembled only when everything that came before is already correctly resolved. It is easy to add something like that to fasmg, in the form of a built-in variable that could be checked with "if", but I'm still considering the consequences of such a move on the overall design of the language.
I have finally decided on a way to implement this. There is a new variant of the POSTPONE directive that looks like this:
Code:
postpone ?
    ; ...
end postpone    
and it postpones the block until the rest of the source has been resolved. If you put the entire computation of a checksum in such a block, it should generally get assembled just once, at the end of assembly.



l4m2 09 Feb 2019, 03:48
Can I create folder?



ProMiNick 09 Feb 2019, 08:05
Ideally yes: you could create a binary representation of a folder (it is just a sequence of bytes like a file, but with a different role in the file system). The problem will be mounting such an abnormal folder into an existing file system, or not crashing the file system if such mounting happens automatically. In Windows you could try to create a file with the extension .folder, but if something crashes, I warned you.



Tomasz Grysztar 09 Feb 2019, 08:35
l4m2 wrote:
Can I create folder?
In my TAR example you simply put relative paths to files where you declare their names, and folders are going to be implicitly created:
Code:
format binary as 'tar'

include 'tar.inc'


tar_record 'texts/hello.txt'

        db 'Hello world!',10

end tar_record

tar_record 'bytes.bin'

        repeat 256, c:0
                db c
        end repeat

end tar_record

db 2*512 dup 0  ; two null records to mark the end of tarball    
I'm pretty sure it should work the same with ZIP.
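
A quick way to convince yourself about the ZIP side is to let an ordinary extractor handle an archive that contains only a file entry with a path in its name and no separate folder entry. The sketch below uses Python's standard zipfile module for both writing and extracting. The helper names make_archive and extracted_paths are made up for this example, and it says nothing about how any other unpacker behaves.

```python
import io
import os
import tempfile
import zipfile


def make_archive() -> bytes:
    """Create a stored (uncompressed) ZIP with one entry and no folder entry."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_STORED) as archive:
        archive.writestr("texts/hello.txt", b"Hello world!\n")
    return buffer.getvalue()


def extracted_paths(blob: bytes) -> list:
    """Extract into a temporary directory and list everything that appeared."""
    with tempfile.TemporaryDirectory() as target:
        with zipfile.ZipFile(io.BytesIO(blob)) as archive:
            archive.extractall(target)
        found = []
        for root, dirs, files in os.walk(target):
            for name in dirs + files:
                relative = os.path.relpath(os.path.join(root, name), target)
                found.append(relative.replace(os.sep, "/"))
        return sorted(found)


if __name__ == "__main__":
    # The "texts" folder shows up even though the archive never declared it.
    print(extracted_paths(make_archive()))
```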



Tomasz Grysztar 09 Feb 2019, 18:00
I have rearranged and improved the ZIP example from the first post, putting it into the official repository: https://github.com/tgrysztar/fasmg/blob/master/packages/tar/zip.inc

It now uses "postpone ?" and a recently introduced syntax for loading/storing using output offsets to make checksum calculation reliable and more efficient.
Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
