Falling into a Rabbit Hole: A Story about One Varnish Reboot Error - Part 1

ghostinushanka , having thrashed the buttons for the previous 20 minutes, as if his life depended on it, turns to me with a half-wild expression in his eyes and a sly grin - "Dude, I seem to understand."







“Look over here,” - he says, pointing to one of the symbols on the screen - “I bet on my red hat that if we add here what I just sent you” - pointing to another piece of code - “the error is no longer will be displayed. "







A little puzzled and tired, I change the sed expression that we have been working on for some time, save the file and run systemctl varnish reload



. The error message has disappeared ...







“The mails I exchanged with the candidate,” continued my colleague, while his smirk turns into a genuine smile full of joy, “It suddenly dawned on me that this is exactly the same problem!”







How did it all begin



This article assumes an understanding of how bash, awk, sed, and systemd work. Knowledge of varnish is welcome but not required.

Snippet timestamps changed.

Written with ghostinushanka .

This text is a translation of the original published in English two weeks ago; boikoden translation.







The sun shines through the panoramic windows on another warm autumn morning, a cup of freshly prepared caffeinated drink rests away from the keyboard, a favorite symphony of sounds sounds in the headphones, overlapping the rustle of mechanical keyboards, and the fateful title “Investigate varnishre” playfully shines the first entry in the backlog ticket list on the kanban board. sh: echo: I / O error in staging ”(Investigate the“ varnishreload sh: echo: I / O error ”in the stage). When it comes to varnish, there are no errors and there can be no place, even if they do not translate into any problems, as in this case.







For those unfamiliar with varnishreload , this is a simple shell script used to reload a varnish configuration - also called VCL.







As the name of the ticket suggests, an error occurred on one of the servers on the stage, and since I was sure that the varnish routing on the stage worked correctly, I assumed that this would be a minor error. So, just a message that got into an already closed output stream. I take the ticket to myself, in full confidence that I will mark it ready in less than 30 minutes, pat myself on the shoulder to clean the board from the next trash and return to more important matters.







Crashing into a wall at a speed of 200 km / h



Having opened the varnishreload



file, on one of the servers running Debian Stretch, I saw a shell script with a length of less than 200 lines.







After running through the script, I did not notice anything that could result in problems when it was run multiple times directly from the terminal.







In the end, this is a stage, even if it breaks, no one will complain, well ... not too much. I run the script and see what will be written to the terminal, but I can’t see any errors.







A couple more starts to make sure that I can not reproduce the error without any additional effort, and I start to figure out how to change this script and make it still give an error.







Can the script override STDOUT (using > &-



)? Or STDERR? Neither one worked as a result.







Obviously, systemd somehow modifies the startup environment, but how, and why?

I chop vim and edit varnishreload



, adding set -x



directly under the shebang, hoping that the debug script output will shed a little light.







The file is fixed, so I restart varnish and see that the change completely broke everything ... The exhaust is a complete mess, in which there are tons of C-like code. Even scrolling in the terminal is not enough to find where it starts. I am completely confused. Can debug mode affect the work of programs launched in a script? No, nonsense. A bug in the shell? Several possible scenarios rush through my head like cockroaches in different directions. A cup of caffeine-full drink instantly emptied, a quick trip to the kitchen to replenish the stock and ... let's go. I open the script and look at the shebang: #!/bin/sh



.







/bin/sh



is simply a symlink in bash, so the script is interpreted in POSIX-compatible mode, right? There it was! The default shell in Debian is dash, and that is exactly what /bin/sh



refers to .







 # ls -l /bin/sh lrwxrwxrwx 1 root root 4 Jan 24 2017 /bin/sh -> dash
      
      





For the sake of trial, I changed the shebang to #!/bin/bash



, deleted set -x



and tried again. Finally, during the subsequent reboot of varnish, a decent error appeared in the output:







 Jan 01 12:00:00 hostname varnishreload[32604]: /usr/sbin/varnishreload: line 124: echo: write error: Broken pipe Jan 01 12:00:00 hostname varnishreload[32604]: VCL 'reload_20190101_120000_32604' compiled
      
      





Line 124, there it is!







 114 find_vcl_file() { 115 VCL_SHOW=$(varnishadm vcl.show -v "$VCL_NAME" 2>&1) || : 116 VCL_FILE=$( 117 echo "$VCL_SHOW" | 118 awk '$1 == "//" && $2 == "VCL.SHOW" {print; exit}' | { 119 # all this ceremony to handle blanks in FILE 120 read -r DELIM VCL_SHOW INDEX SIZE FILE 121 echo "$FILE" 122 } 123 ) || : 124 125 if [ -z "$VCL_FILE" ] 126 then 127 echo "$VCL_SHOW" >&2 128 fail "failed to get the VCL file name" 129 fi 130 131 echo "$VCL_FILE" 132 }
      
      





But as it turned out, line 124 is quite empty and is of no interest. I could only assume that the error arose as part of a multi-line starting on line 116.

What is finally written to the VCL_FILE



variable as a result of the execution of the aforementioned sub-shell?







At the beginning, it sends the contents of the VLC_SHOW



variable created on line 115 to the next command through the pipe. And then what happens then?







First, it uses varnishadm



, which is part of the varnish installation package, to configure varnish without restarting.







The vcl.show -v



used to output the entire VCL configuration specified in ${VCL_NAME}



to STDOUT.







To display the current active VCL configuration, as well as several previous versions of varnish routing configurations that are still in memory, you can use the varnishadm vcl.list



, the output of which will be similar to the one below:







 discarded cold/busy 1 reload_20190101_120000_11903 discarded cold/busy 2 reload_20190101_120000_12068 discarded cold/busy 16 reload_20190101_120000_12259 discarded cold/busy 16 reload_20190101_120000_12299 discarded cold/busy 28 reload_20190101_120000_12357 active auto/warm 32 reload_20190101_120000_12397 available auto/warm 0 reload_20190101_120000_12587
      
      





The value of the variable ${VCL_NAME}



set in another part of the varnishreload



script to the name of the currently active VCL, if any. In this case, it will be “reload_20190101_120000_12397”.







Great, the variable ${VCL_SHOW}



contains the full configuration for varnish, so far it’s clear. Now, I finally understood why the dash output with set -x



turned out to be so broken - it included the contents of the resulting configuration.







It is important to understand that a complete VCL configuration can often be cobbled together from multiple files. C style comments are used to determine where some configuration files were included in others, and this is exactly what the entire line of code snippet below is all about.

The syntax of the comments describing the included files has the following format:







 // VCL.SHOW <NUM> <NUM> <FILENAME>
      
      





The numbers in this context are not important, we are interested in the file name.







So what is going on in the swamp of teams starting at line 116?

Let's figure it out.

The team consists of four parts:







  1. A simple echo



    that displays the value of the variable ${VCL_SHOW}





     echo "$VCL_SHOW"
          
          



  2. awk



    , which searches for a line (record), where the first field, after breaking the text, will be “//”, and the second “VCL.SHOW”.

    Awk will write the first line matching these patterns, and then immediately stop processing.

     awk '$1 == "//" && $2 == "VCL.SHOW" {print; exit}'
          
          



  3. A block of code that stores in five variable field values ​​separated by spaces. The fifth FILE variable gets the rest of the string. Finally, the last echo writes out the contents of the variable ${FILE}



    .

     { read -r DELIM VCL_SHOW INDEX SIZE FILE; echo "$FILE" }
          
          



  4. Since all steps 1 to 3 are enclosed in a sub-shell, the output of the value $FILE



    will be written to the variable VCL_FILE



    .


As follows from the comment on line 119, this serves a single purpose: to reliably handle cases where the VCL will refer to files with space characters in the name.







I commented out the original processing logic for ${VCL_FILE}



and tried to change the sequence of commands, but this did not lead to anything. Everything worked cleanly for me, and in case of starting the service, it gave an error.







It seems that the error is simply not reproducible when you run the script manually, while the expected 30 minutes have ended six times already and, in the appendage, a more priority task has appeared, pushing the rest of the cases aside. The rest of the week was packed with a variety of tasks and was only slightly diluted with a report on sed and an interview with the candidate. The problem with the varnishreload



was irretrievably lost in the sands of time.







Your so-called sed-fu ... really ... rubbish



The next week turned out to be one pretty free day, so again I decided to take up this ticket. I was hoping that in my brain, some background process had been looking for a solution to this problem all this time, and this time I would certainly understand what was happening.







Since last time a simple code change did not help, I just decided to rewrite it starting from the 116th line. In any case, the existing code was lousy. And there is absolutely no need to use read



.







Looking at the error again:

sh: echo: broken pipe



- in this command echo is in two places, but I suspect that the first is the more likely culprit (well, or at least an accomplice). Awk is also not credible. And in case it really is awk | {read; echo}



awk | {read; echo}



awk | {read; echo}



construction leads to all these problems, why not replace it? This one-line command does not use all the awk features, and even this extra read



in the appendage.







Since there was a report on sed



last week, I wanted to try my newly acquired skills and simplify echo | awk | { read; echo}



echo | awk | { read; echo}



echo | awk | { read; echo}



into a more understandable echo | sed



echo | sed



. Although this is definitely not the best approach to detecting an error, I thought that at least I would try my sed-fu and maybe find out something new about the problem. In the process, I asked my colleague, the author of the report on sed, to help me come up with a more efficient sed script.







I dropped the contents of varnishadm vcl.show -v "$VCL_NAME"



into the file, so I could concentrate on writing a sed script without any hassle associated with reloading the service.







A brief description of how sed handles input can be found in its GNU manual . In sed sources, the \n



character is explicitly specified as a line separator.







In several passes and with the recommendations of my colleague, we wrote a sed script that gave the same result as the entire source line 116.







The following is a sample input file:







 > cat vcl-example.vcl Text // VCL.SHOW 0 1578 file with 3 spaces.vcl More text // VCL.SHOW 0 1578 file.vcl Even more text // VCL.SHOW 0 1578 file with TWOspaces.vcl Final text
      
      





This may not be obvious from the above description, but we are only interested in the first comment // VCL.SHOW



, and there may be several of them in the input. That is why the original awk finishes its work after the first match.







 #  ,      #   sed,  -    '\#'    '/',           #    “// VCL.SHOW”,       #  -n   ,  sed     ,       (.  ) # -E      > cat vcl-processor-1.sed \#// VCL.SHOW#p > sed -En -f vcl-processor-1.sed vcl-example.vcl // VCL.SHOW 0 1578 file with 3 spaces.vcl // VCL.SHOW 0 1578 file.vcl // VCL.SHOW 0 1578 file with TWOspaces.vcl #  ,     #   “substitute”,     ,    a #      ,    > cat vcl-processor-2.sed \#// VCL.SHOW# { s#.* [0-9]+ [0-9]+ (.*)$#\1# p } > sed -En -f vcl-processor-2.sed vcl-example.vcl file with 3 spaces.vcl file.vcl file with TWOspaces.vcl #  ,      #      awk,         > cat vcl-processor-3.sed \#// VCL.SHOW# { s#.* [0-9]+ [0-9]+ (.*)$#\1# p q } > sed -En -f vcl-processor-3.sed vcl-example.vcl file with 3 spaces.vcl #  ,    ,      > sed -En -e '\#// VCL.SHOW#{s#.* [0-9]+ [0-9]+ (.*)$#\1#p;q;}' vcl-example.vcl file with 3 spaces.vcl
      
      





So, the contents of the varnishreload script will look something like this:







 VCL_FILE="$(echo "$VCL_SHOW" | sed -En '\#// VCL.SHOW#{s#.*[0-9]+ [0-9]+ (.*)$#\1#p;q;};')"
      
      





The above logic can be summarized as follows:

If the line matches the regular expression // VCL.SHOW



, then greedily eat up the text that includes both numbers in this line and save everything that remains after this operation. Give the saved value and finish the program.







Simple, right?







We were pleased with the sed script and the fact that it replaces all of the original code. All my tests gave the desired results, so I changed the “varnishreload” on the server and ran systemctl reload varnish



again. The filthy mistake echo: write error: Broken pipe



laughed again in our faces. A winking cursor was waiting for a new command to be entered in the dark emptiness of the terminal ...








All Articles