A buildargv function using Ragel

A fun use of the Ragel State Machine Compiler: building a function that parses a command line into int argc, char *argv[].







It all started when I needed a buildargv function that parses a string so that the result can be passed on to







int main (int argc, char *argv[]) { body }
      
      





Well, I thought, surely there must be one to borrow somewhere; let's find it... And I did not find one...













Not that I found nothing at all: there is, for example, https://github.com/gcc-mirror/gcc/blob/master/libiberty/argv.c (GPLv2 is always good), but I was not ready to take on such obligations right away. Bash certainly has such a function (GPLv3 is even better). zsh? Go and look for it (I did find it... and I do not want it).







In short, I did not find what I wanted, and I did not like what I found. In the end, I have every right to write my own: I am doing this for myself anyway, and I wanted some entertainment along the way.







I did not want to write this in the conventional way at all; the very thought of it upset me.







So, meet the Ragel State Machine Compiler.







Tools





The project can be found here: JOYFUL CMDLINE PARSER WRITTEN IN RAGEL







Problem statement

The input is an arbitrary string; the task is to turn it into an array of arguments separated by spaces or tabs, with:









All in all, there are not many conditions, and Ragel suits this task well.







Implementation explained



Declare a machine with the name "buildargv" and ask Ragel to place its data at the beginning of the file (5.8.1 Write Data).







 %%{
     machine buildargv;
     write data;
 }%%
      
      





Next, we declare the lineElement machine, which in turn is a union (2.5.1 Union) of two machines: arg and whitespace.







 lineElement = arg >start_arg %end_arg | whitespace;
 main := lineElement**;
      
      





On entering and leaving the arg machine, the actions start_arg and end_arg are executed, respectively.







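 action start_arg {
     argv_s = p;  /* remember where the argument starts */
 }
 action end_arg {
     /* grow argv by one slot and store a copy of [argv_s, p) in it */
     nargv = (char**)realloc((*argv), (argc_ + 1)*sizeof(char*));
     (*argv) = nargv;
     (*argv)[argc_] = strndup(argv_s, p - argv_s);
     argc_++;
 }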
      
      





The job of start_arg is to save the position of the character at which the arg machine is entered, and the job of end_arg is to append a new element to the argv array when the arg machine exits successfully.
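A minimal sketch, in plain C, of the bookkeeping that end_arg performs, with the allocation checks that the short action leaves out. The helper name append_arg and its signature are assumptions made for illustration and do not come from the project; malloc and memcpy stand in for strndup here so that the sketch stays within standard C.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the project): append a NUL-terminated copy
 * of the bytes in [start, end) to the array *argv, which holds *argc items.
 * Returns 0 on success and -1 on allocation failure; on failure *argv and
 * *argc are left untouched, so the caller can still free what it has. */
static int append_arg(char ***argv, int *argc, const char *start, const char *end)
{
    assert(start <= end);
    size_t len = (size_t)(end - start);

    /* Copy the argument text first (this replaces strndup in the action). */
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return -1;
    memcpy(copy, start, len);
    copy[len] = '\0';

    /* Grow through a temporary so the old block is not lost if realloc fails. */
    char **grown = realloc(*argv, ((size_t)*argc + 1) * sizeof(char *));
    if (grown == NULL) {
        free(copy);
        return -1;
    }

    grown[*argc] = copy;
    *argv = grown;
    (*argc)++;
    return 0;
}
```

Called with the saved start position and the current position, this plays the role of end_arg; unlike the action, it can report an out-of-memory condition to its caller.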







Now let's take a closer look at arg.







 arg = '\'' >{ fcall squote; }
     | '"' >{ fcall dquote; }
     | ( '\\' >{ fcall skip; } | ^[ \t"'\\] )+;
      
      





It is a union of three machines: ', " and ( \ | ^[ \t"'\] )+, the last of which is in turn a repetition of the union of \ and ^[ \t"'\].







When we see the character ' we call squote; when we see " we call dquote; and if the current character is \ we call skip, which skips whatever character follows it. Any character other than 0x20 (space), 0x09 (tab), ', " or \ is accepted as an ordinary character.
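To make the unquoted alternative concrete, here is a rough plain-C sketch of the same rule: consume ordinary characters, let a backslash swallow the character after it, and stop at a space, a tab or a quote. The function name scan_bare is an assumption for illustration; this is a hand-written approximation, not the code Ragel generates, and it simply consumes a trailing backslash at the end of input.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the project): starting at p, scan an
 * unquoted run and return a pointer one past its last character.
 * Nothing at or beyond pe is ever read. */
static const char *scan_bare(const char *p, const char *pe)
{
    assert(p <= pe);
    while (p < pe) {
        char c = *p;
        if (c == '\\') {
            /* Backslash: consume it together with the character after it,
             * whatever that character is (the role of skip). */
            p++;
            if (p < pe)
                p++;
        } else if (c == ' ' || c == '\t' || c == '"' || c == '\'') {
            break;      /* separator or quote: the run ends here */
        } else {
            p++;        /* ordinary character */
        }
    }
    return p;
}
```

For example, in the input ab\ c d the escaped space stays inside the run, so the scan stops only at the second, unescaped space.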







It remains to consider a very small part:







 skip := any @{ fret; };
 dquote := ( '\\'>{ fcall skip; } | ^[\\] )+ :> ["] @{ fret; } @err(dquote_err);
 squote := ( '\\'>{ fcall skip; } | ^[\\] )+ :> ['] @{ fret; } @err(squote_err);
      
      





We have already dealt with skip, and ^['\\] should not raise any questions either. The :> operator is Entry-Guarded Concatenation (4.2 Guarded Operators that Encapsulate Priorities): the machine ( '\\'>{ fcall skip; } | ^['\\] )+ stops executing as soon as the ["] machine begins to match, that is, at the closing quote.
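The effect of the guarded concatenation can be approximated in plain C as a scan for the first unescaped closing quote. The sketch below is an illustration under assumptions: the name find_closing_quote is hypothetical, the function accepts an empty quoted body (which the + in the Ragel expression does not), and it reports an unterminated quote by returning NULL instead of running an error action.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the project): p points just past an opening
 * quote character `quote`. Returns a pointer to the matching closing quote,
 * or NULL if the input ends before one is found. */
static const char *find_closing_quote(const char *p, const char *pe, char quote)
{
    assert(p <= pe);
    while (p < pe) {
        if (*p == '\\') {
            /* Escaped character: skip the backslash and the character after
             * it, so an escaped quote does not close the string. */
            p++;
            if (p < pe)
                p++;
        } else if (*p == quote) {
            return p;   /* first unescaped quote ends the quoted body */
        } else {
            p++;        /* ordinary character inside the quotes */
        }
    }
    return NULL;        /* ran out of input with the quote still open */
}
```

The NULL result corresponds to the situation in which the state machine reaches the end of the line inside dquote or squote.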







Finally, if the line ends while a quote is still open, dquote_err and squote_err signal the error and set the corresponding error code.







 action dquote_err {
     ret = -1;
     errsv = BUILDARGV_EDQUOTE;
 }
 action squote_err {
     ret = -1;
     errsv = BUILDARGV_ESQUOTE;
 }
      
      





The C code is generated with the command:







 ragel -e -L -F0 -o buildargv.c buildargv.rl
      
      





A list of test lines can be found in test_cmdline.c.







Conclusion



The problem is solved.







Was it faster? I doubt it. Clearer? Only if you are a Ragel expert.







I do not claim to have the final word, and I will be grateful for constructive comments on the Ragel code.







References:

[^1]: Adrian Thurston. Ragel State Machine Compiler.







