flat assembler
Message board for the users of flat assembler.

re4asm - regular expression engine

mrpink



Joined: 03 Jun 2005
Posts: 27
Location: Germany
mrpink 31 Oct 2006, 16:11
re4asm is a small and reasonably powerful regular expression engine written completely
in assembly language.
The regular expression syntax is a proper subset of POSIX ERE with a few minor constraints.
Whole-match addressing is supported but submatch addressing is not.


Description:
Download
Filename: re4asm.tar.gz
Filesize: 96.14 KB
Downloaded: 1701 Time(s)



Last edited by mrpink on 12 Dec 2012, 17:53; edited 4 times in total
vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
vid 31 Oct 2006, 18:14
Later, when the FASMLIB core is built, this could be an optional module.
rugxulo



Joined: 09 Aug 2005
Posts: 2341
Location: Usono (aka, USA)
rugxulo 01 Nov 2006, 05:51
I haven't tested it too much, but congrats on everything so far, very cool! :D
mrpink



Joined: 03 Jun 2005
Posts: 27
Location: Germany
mrpink 13 Feb 2007, 12:06
Hello all, I'm back for a while.

Thanks to vid and rugxulo for their encouraging comments.

I've solved some of the issues listed in the TODO file. By the way, there is an error
in the README file: regexec is not thread safe in general. The thread-safety statement only applies
to regexes that have a character width of less than 9.

Is there still interest in a regular expression engine?

I would like some more feedback from you: criticism, suggestions, comments on
everything.
TmX



Joined: 02 Mar 2006
Posts: 841
Location: Jakarta, Indonesia
TmX 13 Feb 2007, 12:10
How does this regex engine compare to PCRE?
Crukko



Joined: 26 Nov 2005
Posts: 118
Crukko 13 Feb 2007, 15:13
It's certainly a lot of work.
I'm reading about regex, and I think your contribution will be great for FASM users.
Only one thing: can you add more examples?
With those, people who don't know how it works or how to use it would have a quick way to get started ;)
f0dder



Joined: 19 Feb 2004
Posts: 3175
Location: Denmark
f0dder 13 Feb 2007, 15:20
rugxulo



Joined: 09 Aug 2005
Posts: 2341
Location: Usono (aka, USA)
rugxulo 13 Feb 2007, 19:03
http://www.regular-expression.info

EDIT: Yes, there is still interest in this (or else who's been downloading it?? 174 people, at least!)


Last edited by rugxulo on 14 Feb 2007, 02:42; edited 1 time in total
MichaelH



Joined: 03 May 2005
Posts: 402
MichaelH 13 Feb 2007, 21:10
Quote:

Is there still interest in a regular expression engine?


Absolutely! Thank you for your current work and any future work you do.
mrpink



Joined: 03 Jun 2005
Posts: 27
Location: Germany
mrpink 14 Feb 2007, 08:15
Thank you all.

Quote:
How does this regex engine compare to PCRE?


Well, PCRE is (in general) not POSIX compatible. The engine it uses is a so-called Traditional NFA, not a POSIX NFA. My implementation is a DFA.
For example, PCRE does not necessarily find the leftmost-longest match. On the other hand, it provides submatch addressing and backreferences,
which cannot be done by a DFA.

See (copy of Mastering Regular Expressions) http://www.mamiyami.com/document/regex/0596002890_mastregex2-chp-4-sect-1.html for a more detailed
description.
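
To make the leftmost-longest point concrete, here is a small sketch in Python, whose re module is a backtracking engine of the same family as PCRE. The helper leftmost_longest is a hypothetical name introduced only for this illustration; it emulates the POSIX rule by brute force and says nothing about how re4asm is implemented.

```python
import re

def leftmost_longest(pattern, text):
    """Return (start, end) of the leftmost-longest match, or None.

    Brute force: walk the start positions from left to right and, for each
    one, try the longest candidate end first. The first span that the whole
    pattern matches is the leftmost-longest match.
    """
    compiled = re.compile(pattern)
    for start in range(len(text) + 1):
        for end in range(len(text), start - 1, -1):
            if compiled.fullmatch(text, start, end):
                return (start, end)
    return None

text = "xab"

# A backtracking engine takes the first alternative that succeeds: "a".
print(re.search(r"a|ab", text).span())    # (1, 2)

# The POSIX rule prefers the longest match at the leftmost start: "ab".
print(leftmost_longest(r"a|ab", text))    # (1, 3)
```

Both results start at the same position; they differ only in how far the match extends.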

I'm not sure which features I should implement because if I implement lots of them, people will say that it is too bloated. But on the other hand most of the
features are very handy.
Take for example character classes, e.g., [:alpha:]. On the one hand this is just syntactic sugar since it is equivalent to the already implemented [A-Za-z].
But on the other hand [:punct:] is far more convenient than its equivalent.
The more features the more code and thus increased binary size.
Since AsmRegEx is designed to be embedded into existing applications (it was never meant to become a standalone tool since far more powerful tools are
readily available) it should be small and easy to integrate. More features also increase the complexity of the interface.
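
As an illustration of the "syntactic sugar" remark, the sketch below rewrites POSIX class names into explicit bracket contents. It assumes the C locale and ASCII only; POSIX_CLASSES and expand_posix_classes are hypothetical names for this example and are not part of AsmRegEx.

```python
import re
import string

# Explicit bracket contents for a few POSIX class names (C locale, ASCII only).
POSIX_CLASSES = {
    "alpha": "A-Za-z",
    "digit": "0-9",
    # All 32 ASCII punctuation characters, escaped so that ']', '\', '^'
    # and '-' stay literal inside a bracket expression.
    "punct": re.escape(string.punctuation),
}

def expand_posix_classes(pattern):
    """Replace each [:name:] in the pattern with its explicit contents."""
    return re.sub(r"\[:([a-z]+):\]",
                  lambda m: POSIX_CLASSES[m.group(1)],
                  pattern)

print(expand_posix_classes("[[:alpha:]_][[:alpha:][:digit:]_]*"))
# prints [A-Za-z_][A-Za-z0-9_]*

# The expanded [:punct:] is long, which is why the short name is convenient.
print(len(string.punctuation))  # 32
```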

The following features have been implemented (total size: code + data = 3 KB):
    - leftmost longest match (returns start and end position)
    - character classes
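
For readers unfamiliar with how a DFA can report a leftmost-longest match with start and end position, here is a minimal sketch. The transition table is hand-built for the ERE [0-9]+(\.[0-9]+)? and every name in it is an assumption made for illustration; the engine's real tables and code are not shown in this thread.

```python
# Hand-built DFA for the ERE  [0-9]+(\.[0-9]+)?
# States: 0 = start, 1 = integer part, 2 = just read '.', 3 = fraction part.
TRANSITIONS = {
    (0, "digit"): 1,
    (1, "digit"): 1,
    (1, "dot"): 2,
    (2, "digit"): 3,
    (3, "digit"): 3,
}
ACCEPTING = {1, 3}

def classify(ch):
    """Map a character to the input class used by the transition table."""
    if "0" <= ch <= "9":
        return "digit"
    if ch == ".":
        return "dot"
    return None

def longest_from(text, start):
    """Run the DFA from `start`; return the end of the longest match or None."""
    state = 0
    last_accept_end = None
    pos = start
    while pos < len(text):
        state = TRANSITIONS.get((state, classify(text[pos])))
        if state is None:          # no transition: the DFA is stuck
            break
        pos += 1
        if state in ACCEPTING:     # remember the most recent accepting position
            last_accept_end = pos
    return last_accept_end

def search(text):
    """Return (start, end) of the leftmost-longest match, or None."""
    for start in range(len(text)):
        end = longest_from(text, start)
        if end is not None:
            return (start, end)
    return None

print(search("v 3.14 "))  # (2, 6)
print(search("x3.y"))     # (1, 2)
```

The second call shows why the last accepting position is remembered: after "3." the automaton gets stuck on "y", so the match falls back to "3".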

The following features will be implemented soon:
    - case insensitivity
    - backward searching

Please tell me your opinion on these.
Maybe it becomes an optional part of FASMLIB if vid agrees. (When it is done, I will port it to all supported assemblers.)
By the way what about the license? I thought about changing to LGPL. Should I?
I don't have too much time and want to finish this soon.
vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
vid 14 Feb 2007, 08:34
I think I will have to develop some engine for optional "modules" for FASMLIB. If your code is errorproof enough, I will certainly consider it.

About the interface: my opinion is that you should not be afraid to implement all you can; ideally, implement the entire standard. It would be great to have a complete implementation maybe 5 times smaller than one written in C. But I think that would be MUCH MUCH more work.

About the license: what do you want to prohibit people from doing with your library? Why do you find the possibility of relinking important?
OzzY



Joined: 19 Sep 2003
Posts: 1029
Location: Everywhere
OzzY 14 Feb 2007, 13:53
RegEx is a very useful feature to have. I use scripting languages because most of them provide RegEx, which makes it easy to parse large amounts of text.
An optimized, fast, and easy-to-use implementation for FASM is a great idea.

Maybe you could take a look at the Pelles C standard library, which comes with PCRE and also another, easier-to-use implementation. (But looking at the source files of your implementation, it looks very easy too. I'm going to try it.)

Also, there's a simpler thing called "glob". It matches wildcards (*, ?, etc.), so F?SM would match FASM, F2SM, FTSM, and so on. That would be nice too.
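
A quick illustration of glob matching, using Python's standard fnmatch module only to show the semantics (it is unrelated to re4asm or FASM). It also shows that a glob can be translated into an ordinary regular expression, so a regex engine could in principle serve both.

```python
import fnmatch
import re

pattern = "F?SM"
names = ["FASM", "F2SM", "FTSM", "FSM", "FAASM"]

# '?' stands for exactly one character; fnmatchcase is always case-sensitive.
matches = [name for name in names if fnmatch.fnmatchcase(name, pattern)]
print(matches)  # ['FASM', 'F2SM', 'FTSM']

# '*' stands for any run of characters; translate() turns the glob into a regex.
regex = re.compile(fnmatch.translate("F*M"))
print(bool(regex.match("FASM")), bool(regex.match("FASMLIB")))  # True False
```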

Thank you and keep up the good work.
rugxulo



Joined: 09 Aug 2005
Posts: 2341
Location: Usono (aka, USA)
rugxulo 14 Feb 2007, 15:01
I'm partial to sed and its regex support, personally, so anything moving closer to that would be fine with me. ;)

P.S. Unless you have a good reason not to, choose the most liberal license.
mrpink



Joined: 03 Jun 2005
Posts: 27
Location: Germany
mrpink 14 Feb 2007, 20:11
Hello vid.
What do you mean by errorproof? This is zero-defect software.
About the entire standard: I'm sure you are familiar with it. Can you show me an implementation that
supports equivalence classes and collation sequences? What do you mean by the entire standard?
Currently only a subset of ERE is implemented. I cannot and will not implement BRE.
What do you mean by relinking? I looked this word up in a dictionary, but it is not listed.
I just wanted to state that I have no problem changing the license. That's all.

Hello Ozzy.
To be honest, although PCRE is probably the most popular regex library under the sun, I'm not a fan of
it. If you need a small, yet very very powerful and almost POSIX compliant free third party regex
library, I would highly recommend TRE by Ville Laurikari.

Hello rugxulo.
Unfortunately, sed uses BRE. It is (close to) impossible to implement BREs by means of a DFA. I would have
to rewrite everything. You might implement your own tool using AsmRegEx that does a sed-like job.
(And of course, only FASM rocks.)
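
The post does not spell out why BRE is out of reach for a DFA. One commonly cited reason, offered here as an assumption rather than as mrpink's own argument, is that POSIX BRE allows back-references (\1 to \9). A back-reference has to repeat whatever text a group captured, which takes unbounded memory, and a DFA has only a fixed set of states. The sketch uses Python's re module purely for demonstration.

```python
import re

# \1 must repeat exactly the text that group 1 captured, so this pattern
# accepts n 'a's, one 'b', then the same number of 'a's again.
# That language is not regular, so no finite automaton can recognise it.
pattern = re.compile(r"(a*)b\1")

for candidate in ["b", "aba", "aabaa", "aab", "aabaaa"]:
    print(candidate, bool(pattern.fullmatch(candidate)))
# b True
# aba True
# aabaa True
# aab False
# aabaaa False
```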

I wish you all a nice rest of the week and an even nicer weekend. I'm off until Monday.
vid
Verbosity in development


Joined: 05 Sep 2003
Posts: 7105
Location: Slovakia
vid 14 Feb 2007, 21:35
Quote:
This is zero-defect software.

nothing is

Quote:
I'm sure you are familiar with it

No, I'm not ;)

Quote:
Can you show me an implementation that
supports equivalence classes and collation sequences?

You say there is none?
f0dder



Joined: 19 Feb 2004
Posts: 3175
Location: Denmark
f0dder 14 Feb 2007, 23:41
mrpink: what's wrong with PCRE? I haven't yet started the project of mine where I will need RegExes, but I was intending to use PCRE, mainly because it's so well known. It would be nice to hear about possible defects before I get in too deep :)
tantrikwizard



Joined: 13 Dec 2006
Posts: 142
tantrikwizard 15 Feb 2007, 14:14
Depending on your needs, GoldParser (http://www.devincook.com/goldparser) is very nice. There is even an ASM engine. I briefly looked at the ASM engine and I think it used Visual Basic somehow; it will probably need to be ported for use with FASM.
f0dder



Joined: 19 Feb 2004
Posts: 3175
Location: Denmark
f0dder 15 Feb 2007, 15:07
tantrikwizard wrote:
Depending on your needs, GoldParser (http://www.devincook.com/goldparser) is very nice. There is even an ASM engine. I briefly looked at the ASM engine and I think it used Visual Basic somehow; it will probably need to be ported for use with FASM.


ASM engine using VB? That makes no sense O_o

tantrikwizard



Joined: 13 Dec 2006
Posts: 142
tantrikwizard 15 Feb 2007, 23:52
f0dder wrote:
tantrikwizard wrote:
Depending on your needs, GoldParser (http://www.devincook.com/goldparser) is very nice. There is even an ASM engine. I briefly looked at the ASM engine and I think it used Visual Basic somehow; it will probably need to be ported for use with FASM.


ASM engine using VB? That makes no sense O_o


Ah, my mistake:
Quote:
GOLDx86Engine is written in x86 assembly language to ensure it has decent performance for serious compilers / interpreters. Another important aspect of it is that it is language neutral, that is, the functions contained in the DLL can be called from any programming language that is able to call Windows functions, such as C/C++, Visual Basic, Delphi etc. The package contains 3 forms of the software:

1. DLL version. As mentioned before, it can be used by any language that can call a 32-bit Windows DLL. Import library, .def and .exp files are provided.
2. Lib version. C/C++ and Assembler programmers can embed it in their applications to avoid distributing an additional DLL. An include file for assembly programmers is also included. C/C++ programmers should create their own header files.
3. Assembler source code. You can modify or use it directly if you have the MASM32 package to assemble it.

I haven't looked at the ASM engine but have used the VB, C# and C++ engines. Gold is really cool: define a BNF grammar spec and compile it into a proprietary grammar table file (.cgt). The CGT gets loaded into the engine, which parses the text that is submitted. The parser then creates an object model hierarchy of the text, which makes compilers or interpreters easy to write. I've used it for parsing IMAP email server protocol messages as well. The most time-consuming part of using this parser is defining the BNF grammar. Here's a BNF grammar for ASM if anyone needs it in Gold: http://tech.groups.yahoo.com/group/GOLDParser/message/2502
f0dder



Joined: 19 Feb 2004
Posts: 3175
Location: Denmark
f0dder 15 Feb 2007, 23:56
I took a brief look at Gold Parser some years ago, looked interesting, but never got around to using it. Not really a replacement for a RegEx engine either, imho.

Anyway, the assembly engine implementation looked pretty trivial, I wouldn't be surprised if it's actually beaten by a decent compiler; would be interesting to see some speed tests.


Copyright © 1999-2024, Tomasz Grysztar. Also on GitHub, YouTube.
