flat assembler
Message board for the users of flat assembler.

String split iterator

bitRAKE 11 Sep 2021, 22:27
Code:
#include <string>
#include <vector>

std::vector<std::string> StringSplit(const std::string &text, int separator) {
        std::vector<std::string> vs(text.empty() ? 0 : 1);
        for (const char ch : text) {
                if (ch == separator) {
                        vs.emplace_back();
                } else {
                        vs.back() += ch;
                }
        }
        return vs;
}    
People like to say that assembly is cryptic and hard to understand. I know what the above C++ does, but little in the syntax explains how it does it: the range-based loop and the high-level objects hide what is actually happening. How much object code do you think the compiler generates for that obtuse flexibility? (Plus several support functions!)

What about the common use case? I mean, we aren't doing bioinformatics with this code; there are special parallel SIMD-type algorithms for that. We have a few parameters in a config file or on the command line. That's the work being done in 99% of cases.

The code for those use cases is very tiny. We don't want to be copying data around unless it really needs to persist. A file name, for example, is probably being converted into a full path name string anyway. Flags and options configure internal state, and the string is not needed afterward. In that regard we arrive here:
Code:
SKIP_EMPTIES := 1 ; consume multiple splits
macro ITERATOR__STRING_SPLIT char:' ',FUNCTION&
; RDI:  string to scan
; RCX:  length of string in characters, >0
; AL:   clobbered, loaded with the split character (char)

        local scan,last,skip,done,last_clear
scan:
        mov al,char
        mov rsi,rdi
        repnz scasb
        push rdi
        jnz last
        sub edi,1               ; don't count the split character in length
last:
        push rcx
        sub edi,esi             ; length without terminator
        if definite SKIP_EMPTIES and SKIP_EMPTIES
                jz skip
        end if

        FUNCTION ; (address:RSI, length:RDI)
skip:
        pop rcx                 ; characters to go
        pop rdi                 ; start
        jrcxz done
        jmp scan
done:
        if definite SKIP_EMPTIES and SKIP_EMPTIES
        else ; check for last empty
                cmp byte [rdi-1],char
                jnz last_clear
                mov edi,ecx
                FUNCTION ; (address:RSI, length:RDI)
        last_clear:
        end if
end macro    
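For readers more at home in C++, the behavior of the macro can be modeled roughly as follows. This is only a sketch (C++17), not a translation of the code above: `for_each_field` and `skip_empties` are names made up here, `std::string_view` stands in for the address/length pair, and the callback stands in for FUNCTION. As in the macro, nothing is copied; each field is a view into the original buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical C++ model of ITERATOR__STRING_SPLIT; all names are illustrative.
// Calls fn(field) once per field. Fields are views into text, nothing is copied.
template <class Fn>
void for_each_field(std::string_view text, char sep, bool skip_empties, Fn fn) {
    assert(!text.empty());              // same precondition as the macro: length > 0
    std::size_t start = 0;
    for (;;) {
        // find() plays the role of repnz scasb: stop at sep or at the end.
        const std::size_t hit = text.find(sep, start);
        const bool found = hit != std::string_view::npos;
        const std::size_t end = found ? hit : text.size();
        const std::string_view field = text.substr(start, end - start);
        if (!(skip_empties && field.empty()))
            fn(field);                  // stands in for FUNCTION (address, length)
        if (!found)
            break;                      // no separator left: that was the last field
        start = hit + 1;                // continue just past the separator
    }
}
```

With `skip_empties` false, a trailing separator produces one final empty field, which corresponds to the "check for last empty" branch at the end of the macro.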
With a few bytes we have much greater flexibility. The test cases used to verify the iterator:
Code:
        iterate test_case,\
                "1",\           ; single value
                "|",\           ; two empty values
                "||",\          ; three empty values
                "1|2",\         ; two values
                "|1|2",\        ; empty, middle, last
                "1|2|"          ; first, middle, empty
                ; empty string not allowed

                WriteConsoleA [.hCon],ADDR .LF,1,0,0
                mov rax,test_case
                push rax
                lea rdi,[rsp]
                mov ecx,lengthof test_case
                ITERATOR__STRING_SPLIT '|',call PrintCase
                pop rax
        end iterate    
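The comments on the test cases imply a field count for each string, at least with SKIP_EMPTIES turned off (an assumption on my part; with it set to 1 the empty fields would be dropped). One way to cross-check those counts is against the StringSplit function from the top of the post. The table and `check_split_cases` below are made up for this sketch:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// StringSplit as posted at the top of the thread.
std::vector<std::string> StringSplit(const std::string &text, int separator) {
    std::vector<std::string> vs(text.empty() ? 0 : 1);
    for (const char ch : text) {
        if (ch == separator) {
            vs.emplace_back();
        } else {
            vs.back() += ch;
        }
    }
    return vs;
}

// Field counts read off the comments in the test list (hypothetical table).
struct SplitCase { const char *text; std::size_t fields; };
const SplitCase split_cases[] = {
    {"1", 1}, {"|", 2}, {"||", 3}, {"1|2", 2}, {"|1|2", 3}, {"1|2|", 3},
};

// Hypothetical checker: asserts that StringSplit agrees with every count.
void check_split_cases() {
    for (const SplitCase &c : split_cases)
        assert(StringSplit(c.text, '|').size() == c.fields);
}
```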
There is no guesswork. Change a couple of bytes to process other data types. There is one use case where we don't want to use this iterator blindly: the one where the separator can be any of a set of values (like spaces and tabs). It isn't terribly difficult to add that feature, though. Enjoy.
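As a rough idea of what the separator-set variant involves, here is a sketch in C++17 rather than assembly. `for_each_token` and its `seps` parameter are hypothetical names, and the sketch always skips empty fields, which is the usual choice when the separators are whitespace:

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper, not part of the posted macro.
// Calls fn(token) for each maximal run of characters that are not in seps.
// Runs of separators collapse, so empty tokens are never reported.
template <class Fn>
void for_each_token(std::string_view text, std::string_view seps, Fn fn) {
    assert(!seps.empty());              // an empty separator set is a caller mistake
    std::size_t pos = 0;
    // Skip leading separators; npos means only separators (or nothing) remain.
    while ((pos = text.find_first_not_of(seps, pos)) != std::string_view::npos) {
        std::size_t end = text.find_first_of(seps, pos);
        if (end == std::string_view::npos)
            end = text.size();          // last token runs to the end of the text
        fn(text.substr(pos, end - pos));
        pos = end;
    }
}
```

The only real change from the single-character case is that the scan tests membership in a set instead of comparing against one byte, so a single repnz scasb no longer does the whole job.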

_________________
¯\(°_o)/¯ “languages are not safe - uses can be” Bjarne Stroustrup


Last edited by bitRAKE on 16 Sep 2021, 02:49; edited 1 time in total



Roman 12 Sep 2021, 10:43
Is this macro only for fasmg?



bitRAKE 12 Sep 2021, 20:23
What are you having trouble understanding?
Code:
SKIP_EMPTIES = 1 ; consume (multiple) splits
macro ITERATOR__STRING_SPLIT char:' ',FUNCTION& { common
; RDI:  string to scan
; RCX:  length of string in characters, >0
; AL:   clobbered, loaded with the split character (char)

        local scan,last,skip,done,last_clear
scan:
        mov al,char
        mov rsi,rdi
        repnz scasb
        push rdi
        jnz last
        sub edi,1               ; don't count the split character in length
last:
        push rcx
        sub edi,esi             ; length without terminator
        if definite SKIP_EMPTIES and SKIP_EMPTIES
                jz skip
        end if

        FUNCTION ; (address:RSI, length:RDI)
skip:
        pop rcx                 ; characters to go
        pop rdi                 ; start
        jrcxz done
        jmp scan
done:
        if definite SKIP_EMPTIES and SKIP_EMPTIES
        else ; check for last empty
                cmp byte [rdi-1],char
                jnz last_clear
                mov edi,ecx
                FUNCTION ; (address:RSI, length:RDI)
        last_clear:
        end if
}    
... I think just changing to curly brackets is sufficient.
