What a zip archive looks like and what we can do with it

Good day, dear Habr!



Over the past six months, the winding path of my pet projects has led me into a jungle that I still have not found my way out of. It all started harmlessly enough, with a site for pictures, but perfectionism, the pursuit of freebies, and some quirks of my mindset turned what was planned as a short walk into a real long journey. Well, as one revolutionary with a noticeable burr used to say, "Learn, learn, and learn again", and, willy-nilly, I have to follow that advice.



But we have drifted away from the main topic. I will not bore you with lengthy speeches any longer and will get down to business.



Create a zip archive



I will not retell the specification here. There is little point in describing the structure in detail either, because all of that has been done before me.



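For those who are too lazy to follow the links, I will just outline briefly what any zip archive should contain: for each file, a Local File Header followed by the file data; then a Central Directory File Header for each file; and, at the very end, a single End of central directory record (EOCD).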





Knowing this, we can try to write a simple archive that will contain only two files:



<?php

// Two files (1.txt and 2.txt) that will go into the archive:
$entries = [
    '1.txt' => 'Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc id ante ultrices, fermentum nibh eleifend, ullamcorper nunc. Sed dignissim ut odio et imperdiet. Nunc id felis et ligula viverra blandit a sit amet magna. Vestibulum facilisis venenatis enim sed bibendum. Duis maximus felis in suscipit bibendum. Mauris suscipit turpis eleifend nibh commodo imperdiet. Donec tincidunt porta interdum. Aenean interdum condimentum ligula, vitae ornare lorem auctor in. Suspendisse metus ipsum, porttitor et sapien id, fringilla aliquam nibh. Curabitur sem lacus, ultrices quis felis sed, blandit commodo metus. Duis tincidunt vel mauris at accumsan. Integer et ipsum fermentum leo viverra blandit.',
    '2.txt' => 'Mauris in purus sit amet ante tempor finibus nec sed justo. Integer ac nibh tempus, mollis sem vel, consequat diam. Pellentesque ut condimentum ex. Praesent finibus volutpat gravida. Vivamus eleifend neque sit amet diam scelerisque lacinia. Nunc imperdiet augue in suscipit lacinia. Curabitur orci diam, iaculis non ligula vitae, porta pellentesque est. Duis dolor erat, placerat a lacus eu, scelerisque egestas massa. Aliquam molestie pulvinar faucibus. Quisque consequat, dolor mattis lacinia pretium, eros eros tempor neque, volutpat consectetur elit elit non diam. In faucibus nulla justo, non dignissim erat maximus consectetur. Sed porttitor turpis nisl, elementum aliquam dui tincidunt nec. Nunc eu enim at nibh molestie porta ut ac erat. Sed tortor sem, mollis eget sodales vel, faucibus in dolor.',
];

// The archive is saved as Lorem.zip in the cwd (current working directory)
$destination = 'Lorem.zip';
$handle = fopen($destination, 'w');

// Count the bytes written so far and collect, for every file,
// the data for its Central Directory File Header
$written = 0;
$dictionary = [];

foreach ($entries as $filename => $content) {
    // Fields of the Local File Header. Most of them are repeated in the
    // Central Directory File Header, so they are kept in one array.
    $fileInfo = [
        // Minimum version needed to extract (10 means 1.0)
        'versionToExtract' => 10,
        // 0: no encryption and no data descriptor
        'generalPurposeBitFlag' => 0,
        // 0: the data is stored without compression
        'compressionMethod' => 0,
        // Hard-coded modification time in MS-DOS format (13:43:42)
        'modificationTime' => 28021,
        // Hard-coded modification date in MS-DOS format (2019-03-08)
        'modificationDate' => 20072,
        // CRC-32 checksum of the uncompressed content
        'crc32' => hexdec(hash('crc32b', $content)),
        // Nothing is compressed, so both sizes are the same
        'compressedSize' => $size = strlen($content),
        'uncompressedSize' => $size,
        // Length of the file name in bytes
        'filenameLength' => strlen($filename),
        // There is no extra field, so its length is 0
        'extraFieldLength' => 0,
    ];

    // Pack the Local File Header. The 'V' and 'v' codes are always
    // little-endian, which is what the zip format expects; the
    // machine-dependent 'L' and 'S' only work on little-endian hosts.
    $LFH = pack('VvvvvvVVVvva*', ...array_values([
        'signature' => 0x04034b50, // Local File Header signature
    ] + $fileInfo + ['filename' => $filename]));

    // Remember everything needed for the Central Directory File Header
    $dictionary[$filename] = [
        'signature' => 0x02014b50, // Central Directory File Header signature
        'versionMadeBy' => 798, // 0x031E: made on UNIX, spec version 3.0
    ] + $fileInfo + [
        'fileCommentLength' => 0, // No comments
        'diskNumber' => 0, // The archive is not split, so the disk is 0
        'internalFileAttributes' => 0,
        'externalFileAttributes' => 2176057344, // 0x81B40000: regular file, mode 0664
        'localFileHeaderOffset' => $written, // Where this file's Local File Header starts
        'filename' => $filename,
    ];

    // Write the header, then the file data
    $written += fwrite($handle, $LFH);
    $written += fwrite($handle, $content);
}

// All files are written; prepare the End of central directory record (EOCD)
$EOCD = [
    'signature' => 0x06054b50, // EOCD signature
    'diskNumber' => 0, // The archive is not split, so 0
    'startDiskNumber' => 0, // Disk where the central directory starts, also 0
    'numberCentralDirectoryRecord' => $records = count($dictionary), // Records on this disk
    'totalCentralDirectoryRecord' => $records, // Records in total
    // Size of the central directory. Not known yet, filled in below
    'sizeOfCentralDirectory' => 0,
    // The central directory starts right after the last file
    'centralDirectoryOffset' => $written,
    'commentLength' => 0, // No archive comment
];

// Write the Central Directory File Headers
foreach ($dictionary as $entryInfo) {
    $CDFH = pack('VvvvvvvVVVvvvvvVVa*', ...array_values($entryInfo));
    $written += fwrite($handle, $CDFH);
}

// Now the size of the central directory is known
$EOCD['sizeOfCentralDirectory'] = $written - $EOCD['centralDirectoryOffset'];

// Write the End of central directory record
$EOCD = pack('VvvvvVVv', ...array_values($EOCD));
$written += fwrite($handle, $EOCD);

// Done
fclose($handle);

echo 'Archive size: ' . $written . ' bytes' . PHP_EOL;
echo 'To check the archive, run `unzip -tq ' . $destination . '`' . PHP_EOL;
echo PHP_EOL;
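The listing hard-codes the modification time and date as 28021 and 20072. As a minimal sketch of where such numbers come from, the helper below packs a Unix timestamp into the two MS-DOS words that the zip format uses. The name dosDateTime is hypothetical and not part of the original code, and the conversion uses PHP's currently configured time zone.

```php
<?php

/**
 * Hypothetical helper: encode a Unix timestamp as MS-DOS time and date words.
 * Time: hours in bits 11-15, minutes in bits 5-10, seconds / 2 in bits 0-4.
 * Date: years since 1980 in bits 9-15, month in bits 5-8, day in bits 0-4.
 */
function dosDateTime(int $timestamp): array
{
    $d = getdate($timestamp); // uses the current default time zone

    // Assumption: dates before 1980 are clamped, since the format cannot hold them
    $year = max(1980, $d['year']);

    return [
        'modificationTime' => ($d['hours'] << 11) | ($d['minutes'] << 5) | intdiv($d['seconds'], 2),
        'modificationDate' => (($year - 1980) << 9) | ($d['mon'] << 5) | $d['mday'],
    ];
}

// 8 March 2019, 13:43:42 gives the two constants used in the listing
$stamp = dosDateTime(mktime(13, 43, 42, 3, 8, 2019));
echo $stamp['modificationTime'] . ' ' . $stamp['modificationDate'] . PHP_EOL;
```

Seconds are stored with a two-second resolution, so an odd number of seconds is rounded down.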
      
      





Run this primitive code and you will get a Lorem.zip file containing 1.txt and 2.txt.
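Besides `unzip -tq`, the result can be checked by reading the structures back. The following is only a sketch under simplifying assumptions: the archive has no comment (so the EOCD is exactly the last 22 bytes), it is not split across disks, it is small enough to load into memory, and PHP is a 64-bit build. The function name listZipEntries is hypothetical.

```php
<?php

/**
 * Hypothetical helper: list the entries recorded in the central directory.
 * Returns [filename => ['offset' => ..., 'size' => ..., 'crc' => ...]].
 */
function listZipEntries(string $path): array
{
    $data = file_get_contents($path);
    if ($data === false || strlen($data) < 22) {
        throw new RuntimeException("Cannot read $path as a zip archive");
    }

    // Assumption: no archive comment, so the EOCD is the last 22 bytes
    $eocd = unpack(
        'Vsignature/vdisk/vstartDisk/vrecordsOnDisk/vrecords/Vsize/Voffset/vcommentLength',
        substr($data, -22)
    );
    if ($eocd['signature'] !== 0x06054b50) {
        throw new RuntimeException('EOCD signature not found');
    }

    $entries = [];
    $pos = $eocd['offset'];
    for ($i = 0; $i < $eocd['records']; $i++) {
        // The fixed part of a Central Directory File Header is 46 bytes long
        $h = unpack(
            'Vsignature/vmadeBy/vneeded/vflags/vmethod/vtime/vdate/Vcrc/'
            . 'VcompressedSize/VuncompressedSize/vnameLength/vextraLength/'
            . 'vcommentLength/vdisk/vinternalAttr/VexternalAttr/VlocalOffset',
            substr($data, $pos, 46)
        );
        if ($h['signature'] !== 0x02014b50) {
            throw new RuntimeException("Bad central directory record at byte $pos");
        }
        $name = substr($data, $pos + 46, $h['nameLength']);
        $entries[$name] = [
            'offset' => $h['localOffset'],
            'size' => $h['compressedSize'],
            'crc' => $h['crc'],
        ];
        // Skip the variable-length tail: file name, extra field, comment
        $pos += 46 + $h['nameLength'] + $h['extraLength'] + $h['commentLength'];
    }

    return $entries;
}

// Usage: inspect the archive produced by the listing, if it exists
if (is_file('Lorem.zip')) {
    print_r(listZipEntries('Lorem.zip'));
}
```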



What for?



Of course, any sensible person will say that writing archivers in PHP is a futile undertaking, especially since for a format like zip there are plenty of ready-made implementations for every taste, including ready-made libraries for PHP itself. I would say the same :)



So why this whole article, why did I spend time writing it, and you reading it?

Because things are not that simple, and knowing how zip works opens up some additional possibilities.



Firstly, I hope it will help, at least a little, those who want to understand the structure of zip.

And secondly, by creating the archive with our own hands we gain control over, and, most importantly, access to its internal data.



We can pre-calculate the Local File Header and the Central Directory File Header for each file, and then generate a zip archive on demand, on the fly, with any set and order of files, simply by substituting this data. There is no overhead apart from I/O.
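A rough sketch of that idea, under the same restrictions as the listing (stored data only, no extra fields, no comments, 64-bit PHP): headers are built once per file, and the archive is then emitted chunk by chunk in whatever order is requested. Only the Local File Header offset inside each Central Directory File Header depends on the order, so it is patched at assembly time. The names precomputeHeaders and streamZip are hypothetical.

```php
<?php

// Hypothetical helper: build both headers for one stored (uncompressed) file.
function precomputeHeaders(string $name, string $content): array
{
    // Fields shared by both headers, in the order the format defines them
    $common = pack(
        'vvvvvVVVvv',
        10, 0, 0, 28021, 20072,
        crc32($content), strlen($content), strlen($content),
        strlen($name), 0
    );

    return [
        'lfh' => pack('V', 0x04034b50) . $common . $name,
        // The last 'V' is the Local File Header offset, left as 0 for now
        'cdfh' => pack('Vv', 0x02014b50, 798) . $common
            . pack('vvvVV', 0, 0, 0, 2176057344, 0) . $name,
    ];
}

// Hypothetical helper: emit the archive piece by piece for the given files.
function streamZip(array $records, array $contents): Generator
{
    $offset = 0;
    $central = '';
    foreach ($contents as $name => $content) {
        // The offset field sits at byte 42 of the Central Directory File Header
        $central .= substr_replace($records[$name]['cdfh'], pack('V', $offset), 42, 4);
        yield $records[$name]['lfh'];
        yield $content;
        $offset += strlen($records[$name]['lfh']) + strlen($content);
    }
    yield $central;
    yield pack(
        'VvvvvVVv',
        0x06054b50, 0, 0, count($contents), count($contents),
        strlen($central), $offset, 0
    );
}

// Usage: headers are computed once, the file order is chosen at request time
$files = ['1.txt' => 'first', '2.txt' => 'second'];
$records = [];
foreach ($files as $name => $content) {
    $records[$name] = precomputeHeaders($name, $content);
}
$archive = implode('', iterator_to_array(streamZip($records, array_reverse($files, true)), false));
echo strlen($archive) . ' bytes' . PHP_EOL;
```

In a real service the chunks would be written to the response as they are produced and the content read from storage rather than held in an array; both are left out here to keep the sketch short.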



Or we can write the archive, upload it to, say, a cloud storage that supports partial downloads and, knowing the offset of each file, fetch any file from the archive as if it were not in an archive at all, by adding just one header to the request. And then all of this can be proxied and ...
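The arithmetic behind this is short. The sketch below assumes that the "one header" is a standard HTTP Range header and that the file is stored without compression, as in the listing; for compressed entries the bytes fetched this way would still need to be inflated. The function name storedFileRange and the numbers in the usage line are made up for illustration.

```php
<?php

/**
 * Hypothetical helper: the Range header value that covers the raw data of one
 * stored file. The data starts after the 30-byte fixed part of the Local File
 * Header, the file name and the extra field.
 */
function storedFileRange(int $localHeaderOffset, int $nameLength, int $extraLength, int $size): string
{
    if ($size < 1) {
        throw new InvalidArgumentException('An empty file has no byte range to request');
    }
    $start = $localHeaderOffset + 30 + $nameLength + $extraLength;
    $end = $start + $size - 1; // HTTP byte ranges are inclusive

    return sprintf('bytes=%d-%d', $start, $end);
}

// A 100-byte file named "1.txt" whose Local File Header starts at byte 0
echo 'Range: ' . storedFileRange(0, 5, 0, 100) . PHP_EOL; // Range: bytes=35-134
```

The name and extra field lengths must be the ones from the Local File Header itself, since the extra field there may differ from the one in the central directory.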



Okay, let's not get ahead of ourselves. If this topic interests you, I will try to cover these possibilities in the following articles and show how to use them.


