As soon as you cross the Borrow-Checker pain threshold and realize that Rust allows you to do things that are unimaginable (and sometimes dangerous) in other languages, you may also have the same irresistible desire to Rewrite Everything to Rust . Although in the best case this is banal unproductive (meaningless squandering efforts for several projects), and in the worst case it leads to a decrease in the quality of the code (after all, why do you consider yourself more experienced in using the library than its original author?)
It would be much more useful to provide a secure interface for the original library by reusing its code.
β’ First steps
β’ We collect chmlib-sys
β’ Writing a secure wrapper in Rust
β’ Search for items by name
β’ Bypass elements by filter
β’ Reading file contents
β’ Add examples
β’ Table of contents of the CHM file
β’ Unpacking the CHM file to disk
β’ What next?
This article discusses a real project. I had to extract information from existing CHM files, but there was no time to understand the format. Laziness is the engine of progress.
The chmlib crate is published on crates.io , and its source code is available on GitHub . If you find it useful or find problems in it, then let me know through the bugtracker .
The first steps
To begin with, itβs worth understanding how the work with the library was originally conceived.
This will not only teach you how to use it, but also make sure that everything is going. If you are lucky, you will even find ready-made tests and examples.
Do not skip this step!
We will be working with CHMLib , a C library for reading Microsoft Compiled HTML Help ( .chm
) files.
Let's start by creating a new project and connecting CHMLib as a git submodule:
$ git init chmlib && cd chmlib Initialized empty Git repository in /home/michael/Documents/chmlib/.git/ $ touch README.md Cargo.toml $ cargo new --lib chmlib Created library `chmlib` package $ cargo new --lib chmlib-sys Created library `chmlib-sys` package $ cat Cargo.toml [workspace] members = ["chmlib", "chmlib-sys"] $ git submodule add git@github.com:jedwing/CHMLib.git vendor/CHMLib Cloning into '/home/michael/Documents/chmlib/vendor/CHMLib'... remote: Enumerating objects: 99, done. remote: Total 99 (delta 0), reused 0 (delta 0), pack-reused 99 Receiving objects: 100% (99/99), 375.51 KiB | 430.00 KiB/s, done. Resolving deltas: 100% (45/45), done.
After that, take a look at what's inside using tree
:
$ tree vendor/CHMLib vendor/CHMLib βββ acinclude.m4 βββ AUTHORS βββ ChangeLog βββ ChmLib-ce.zip βββ ChmLib-ds6.zip βββ configure.in βββ contrib β βββ mozilla_helper.sh βββ COPYING βββ Makefile.am βββ NEWS βββ NOTES βββ README βββ src βββ chm_http.c βββ chm_lib.c βββ chm_lib.h βββ enum_chmLib.c βββ enumdir_chmLib.c βββ extract_chmLib.c βββ lzx.c βββ lzx.h βββ Makefile.am βββ Makefile.simple βββ test_chmLib.c 2 directories, 23 files
It looks like the library uses GNU Autotools to build. This is not good, because all users of the chmlib crate (and their users) will need to install Autotools.
We will try to get rid of this "contagious" dependency by collecting the C code manually, but more on that later.
The files lzx.h and lzx.c contain an implementation of the LZX compression algorithm. In general, it would be better to use some kind of liblzx library to get updates for free and all that, but perhaps it would be easier to stupidly compile these files.
enum_chmLib.c, enumdir_chmLib.c, extract_chmLib.c seem to be examples of using the functions chm_enumerate (), chm_enumerate_dir (), chm_retrieve_object (). It will come in handy ...
The file test_chmLib.c contains another example, this time extracting one page from the CHM file to disk.
chm_http.c implements a simple HTTP server showing a .chm file in a browser. This, probably, will no longer be useful.
So we sorted out everything that is in vendor / CHMLib / src. Will we collect the library?
Honestly, it is small enough to apply the scientific poke method.
$ clang chm_lib.c enum_chmLib.c -o enum_chmLib /usr/bin/ld: /tmp/chm_lib-537dfe.o: in function `chm_close': chm_lib.c:(.text+0x8fa): undefined reference to `LZXteardown' /usr/bin/ld: /tmp/chm_lib-537dfe.o: in function `_chm_decompress_region': chm_lib.c:(.text+0x18ca): undefined reference to `LZXinit' /usr/bin/ld: /tmp/chm_lib-537dfe.o: in function `_chm_decompress_block': chm_lib.c:(.text+0x2900): undefined reference to `LZXreset' /usr/bin/ld: chm_lib.c:(.text+0x2a4b): undefined reference to `LZXdecompress' /usr/bin/ld: chm_lib.c:(.text+0x2abe): undefined reference to `LZXreset' /usr/bin/ld: chm_lib.c:(.text+0x2bf4): undefined reference to `LZXdecompress' clang: error: linker command failed with exit code 1 (use -v to see invocation)
Okay, maybe this LZX is still needed ...
$ clang chm_lib.c enum_chmLib.c lzx.c -o enum_chmLib
Uh ... that's all?
To make sure the code is working, I downloaded an example from the Internet:
$ curl http://www.innovasys.com/static/hs/samples/topics.classic.chm.zip \ -o topics.classic.chm.zip $ unzip topics.classic.chm.zip Archive: topics.classic.chm.zip inflating: output/compiled/topics.classic.chm $ file output/compiled/topics.classic.chm output/compiled/topics.classic.chm: MS Windows HtmlHelp Data
Let's see how enum_chmLib handles it:
$ ./enum_chmLib output/compiled/topics.classic.chm output/compiled/topics.classic.chm: spc start length type name === ===== ====== ==== ==== 0 0 0 normal dir / 1 5125797 4096 special file /#IDXHDR ... 1 4944434 11234 normal file /BrowserView.html ... 0 0 0 normal dir /flash/ 1 532689 727 normal file /flash/expressinstall.swf 0 0 0 normal dir /Images/Commands/RealWorld/ 1 24363 1254 normal file /Images/Commands/RealWorld/BrowserBack.bmp ... 1 35672 1021 normal file /Images/Employees24.gif ... 1 3630715 200143 normal file /template/packages/jquery-mobile/script/ jquery.mobile-1.4.5.min.js ... 0 134 1296 meta file ::DataSpace/Storage/MSCompressed/Transform/ {7FC28940-9D31-11D0-9B27-00A0C91E9C7C}/ InstanceData/ResetTable
Lord, even here jQuery Β― \ _ (γ) _ / Β―
Build chmlib-sys
Now we know enough to use CHMLib in the chmlib -sys crate , which is responsible for building the native library, linking it with the Rast compiler, and an interface to C functions.
To build the library you need to write the build.rs
file. Using the cc crate, he will call the C compiler and do other friendships so that everything works together as it should.
We are fortunate that we can shift most of the work to cc, but sometimes it is much more difficult. Read more in the documentation for assembly scripts .
First add cc as a dependency for chmlib-sys:
$ cd chmlib-sys $ cargo add --build cc Updating 'https://github.com/rust-lang/crates.io-index' index Adding cc v1.0.46 to build-dependencies
Then we write build.rs
:
// chmlib-sys/build.rs use cc::Build; use std::{env, path::PathBuf}; fn main() { let project_dir = PathBuf::from(env::var("CARGO_MANIFEST_DIR").unwrap()) .canonicalize() .unwrap(); let root_dir = project_dir.parent().unwrap(); let src = root_dir.join("vendor").join("CHMLib").join("src"); Build::new() .file(src.join("chm_lib.c")) .file(src.join("lzx.c")) .include(&src) .warnings(false) .compile("chmlib"); }
You also need to tell Cargo that chmlib-sys links to the chmlib library. Then Cargo can guarantee that in the entire dependency graph there is only one crate, depending on the specific native library. This avoids obscure error messages about repeated characters or the accidental use of incompatible libraries.
--- a/chmlib-sys/Cargo.toml +++ b/chmlib-sys/Cargo.toml @@ -3,7 +3,13 @@ name = "chmlib-sys" version = "0.1.0" authors = ["Michael Bryan <michaelfbryan@gmail.com>"] edition = "2018" description = "Raw bindings to the CHMLib C library" license = "LGPL" repository = "https://github.com/Michael-F-Bryan/chmlib" +links = "chmlib" +build = "build.rs" [dependencies] [build-dependencies] cc = { version = "1.0" }
Next, we need to declare all the functions exported by the chmlib library so that they can be used from Rast.
It is for this that there is a wonderful bindgen project. The C header file is given to the input, and the file with the FFI bindings for Rast is output.
$ cargo install bindgen $ bindgen ../vendor/CHMLib/src/chm_lib.h \ -o src/lib.rs \ --raw-line '#![allow(non_snake_case, non_camel_case_types)]' $ head src/lib.rs /* automatically generated by rust-bindgen */ #![allow(non_snake_case, non_camel_case_types)] pub const CHM_UNCOMPRESSED: u32 = 0; pub const CHM_COMPRESSED: u32 = 1; pub const CHM_MAX_PATHLEN: u32 = 512; pub const CHM_PARAM_MAX_BLOCKS_CACHED: u32 = 0; pub const CHM_RESOLVE_SUCCESS: u32 = 0; pub const CHM_RESOLVE_FAILURE: u32 = 1; $ tail src/lib.rs extern "C" { pub fn chm_enumerate_dir( h: *mut chmFile, prefix: *const ::std::os::raw::c_char, what: ::std::os::raw::c_int, e: CHM_ENUMERATOR, context: *mut ::std::os::raw::c_void, ) -> ::std::os::raw::c_int; }
I highly recommend reading the Bindgen user manual if you need to fix something in its exhaust.
At this stage, it will be useful to write a smoke test that will verify that everything works as expected and that we can actually call the functions of the original C library.
// chmlib-sys/tests/smoke_test.rs // Path char* . // , OsStr ( Path) Windows [u16] , // char*. #![cfg(unix)] use std::{ffi::CString, os::unix::ffi::OsStrExt, path::Path}; #[test] fn open_example_file() { let project_dir = Path::new(env!("CARGO_MANIFEST_DIR")); let sample_chm = project_dir.parent().unwrap().join("topics.classic.chm"); let c_str = CString::new(sample_chm.as_os_str().as_bytes()).unwrap(); unsafe { let handle = chmlib_sys::chm_open(c_str.as_ptr()); assert!(!handle.is_null()); chmlib_sys::chm_close(handle); } }
cargo test
says everything seems to be in order:
$ cargo test Finished test [unoptimized + debuginfo] target(s) in 0.03s Running ~/chmlib/target/debug/deps/chmlib_sys-2ffd7b11a9fd8437 running 1 test test bindgen_test_layout_chmUnitInfo ... ok test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out Running ~/chmlib/target/debug/deps/smoke_test-f7be9810412559dc running 1 test test open_example_file ... ok test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out Doc-tests chmlib-sys running 0 tests test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
Writing a secure wrapper in Rust
Technically and technically, we can now call CHMLib from Rasta, but this requires an unsafe heap. It may work for a hackneyed craft, but for publishing on crates.io itβs worth writing a safe wrapper for all unsafe code.
If you look at the chmlib-sys API using cargo doc --open
, you can see many functions that take *mut ChmFile
as the first argument. This is similar to objects and methods.
/* $Id: chm_lib.h,v 1.10 2002/10/09 01:16:33 jedwin Exp $ */ /*************************************************************************** * chm_lib.h - CHM archive manipulation routines * * ------------------- * * * * author: Jed Wing <jedwin@ugcs.caltech.edu> * * version: 0.3 * * notes: These routines are meant for the manipulation of microsoft * * .chm (compiled html help) files, but may likely be used * * for the manipulation of any ITSS archive, if ever ITSS * * archives are used for any other purpose. * * * * Note also that the section names are statically handled. * * To be entirely correct, the section names should be read * * from the section names meta-file, and then the various * * content sections and the "transforms" to apply to the data * * they contain should be inferred from the section name and * * the meta-files referenced using that name; however, all of * * the files I've been able to get my hands on appear to have * * only two sections: Uncompressed and MSCompressed. * * Additionally, the ITSS.DLL file included with Windows does * * not appear to handle any different transforms than the * * simple LZX-transform. Furthermore, the list of transforms * * to apply is broken, in that only half the required space * * is allocated for the list. (It appears as though the * * space is allocated for ASCII strings, but the strings are * * written as unicode. As a result, only the first half of * * the string appears.) So this is probably not too big of * * a deal, at least until CHM v4 (MS .lit files), which also * * incorporate encryption, of some description. * ***************************************************************************/ /*************************************************************************** * * * This program is free software; you can redistribute it and/or modify * * it under the terms of the GNU Lesser General Public License as * * published by the Free Software Foundation; either version 2.1 of the * * License, or (at your option) any later version. * * * ***************************************************************************/ #ifndef INCLUDED_CHMLIB_H #define INCLUDED_CHMLIB_H #ifdef __cplusplus extern "C" { #endif /* RWE 6/12/1002 */ #ifdef PPC_BSTR #include <wtypes.h> #endif #ifdef WIN32 #ifdef __MINGW32__ #define __int64 long long #endif typedef unsigned __int64 LONGUINT64; typedef __int64 LONGINT64; #else typedef unsigned long long LONGUINT64; typedef long long LONGINT64; #endif /* the two available spaces in a CHM file */ /* NB: The format supports arbitrarily many spaces, but only */ /* two appear to be used at present. */ #define CHM_UNCOMPRESSED (0) #define CHM_COMPRESSED (1) /* structure representing an ITS (CHM) file stream */ struct chmFile; /* structure representing an element from an ITS file stream */ #define CHM_MAX_PATHLEN (512) struct chmUnitInfo { LONGUINT64 start; LONGUINT64 length; int space; int flags; char path[CHM_MAX_PATHLEN+1]; }; /* open an ITS archive */ #ifdef PPC_BSTR /* RWE 6/12/2003 */ struct chmFile* chm_open(BSTR filename); #else struct chmFile* chm_open(const char *filename); #endif /* close an ITS archive */ void chm_close(struct chmFile *h); /* methods for ssetting tuning parameters for particular file */ #define CHM_PARAM_MAX_BLOCKS_CACHED 0 void chm_set_param(struct chmFile *h, int paramType, int paramVal); /* resolve a particular object from the archive */ #define CHM_RESOLVE_SUCCESS (0) #define CHM_RESOLVE_FAILURE (1) int chm_resolve_object(struct chmFile *h, const char *objPath, struct chmUnitInfo *ui); /* retrieve part of an object from the archive */ LONGINT64 chm_retrieve_object(struct chmFile *h, struct chmUnitInfo *ui, unsigned char *buf, LONGUINT64 addr, LONGINT64 len); /* enumerate the objects in the .chm archive */ typedef int (*CHM_ENUMERATOR)(struct chmFile *h, struct chmUnitInfo *ui, void *context); #define CHM_ENUMERATE_NORMAL (1) #define CHM_ENUMERATE_META (2) #define CHM_ENUMERATE_SPECIAL (4) #define CHM_ENUMERATE_FILES (8) #define CHM_ENUMERATE_DIRS (16) #define CHM_ENUMERATE_ALL (31) #define CHM_ENUMERATOR_FAILURE (0) #define CHM_ENUMERATOR_CONTINUE (1) #define CHM_ENUMERATOR_SUCCESS (2) int chm_enumerate(struct chmFile *h, int what, CHM_ENUMERATOR e, void *context); int chm_enumerate_dir(struct chmFile *h, const char *prefix, int what, CHM_ENUMERATOR e, void *context); #ifdef __cplusplus } #endif #endif /* INCLUDED_CHMLIB_H */
Let's start with the data type, which calls chm_open () in the constructor, and chm_close () in the destructor.
pub unsafe extern "C" fn chm_open(filename: *const c_char) -> *mut chmFile; pub unsafe extern "C" fn chm_close(h: *mut chmFile);
To simplify error handling, we use the thiserror crate , which std::error::Error
implements automatically.
$ cd chmlib $ cargo add thiserror
Now we need to figure out how to turn std::path::Path
into *const c_char
. Unfortunately, this is not so easy to do due to various jokes with compatibility .
// chmlib/src/lib.rs use thiserror::Error; use std::{ffi::CString, path::Path}; #[cfg(unix)] fn path_to_cstring(path: &Path) -> Result<CString, InvalidPath> { use std::os::unix::ffi::OsStrExt; let bytes = path.as_os_str().as_bytes(); CString::new(bytes).map_err(|_| InvalidPath) } #[cfg(not(unix))] fn path_to_cstring(path: &Path) -> Result<CString, InvalidPath> { // , Windows CHMLib CreateFileA(), // ASCII. ... // , ? let rust_str = path.as_os_str().as_str().ok_or(InvalidPath)?; CString::new(rust_str).map_err(|_| InvalidPath) } #[derive(Error, Debug, Copy, Clone, PartialEq)] #[error("Invalid Path")] pub struct InvalidPath;
Now define the structure of the ChmFile . It stores a non-null pointer to chmlib_sys :: chmFile. If chm_open () returns a null pointer, then it means she couldnβt open the file due to some kind of error.
// chmlib/src/lib.rs use std::{ffi::CString, path::Path, ptr::NonNull}; #[derive(Debug)] pub struct ChmFile { raw: NonNull<chmlib_sys::chmFile>, } impl ChmFile { pub fn open<P: AsRef<Path>>(path: P) -> Result<ChmFile, OpenError> { let c_path = path_to_cstring(path.as_ref())?; // , c_path unsafe { let raw = chmlib_sys::chm_open(c_path.as_ptr()); match NonNull::new(raw) { Some(raw) => Ok(ChmFile { raw }), None => Err(OpenError::Other), } } } } impl Drop for ChmFile { fn drop(&mut self) { unsafe { chmlib_sys::chm_close(self.raw.as_ptr()); } } } /// The error returned when we are unable to open a [`ChmFile`]. #[derive(Error, Debug, Copy, Clone, PartialEq)] pub enum OpenError { #[error("Invalid path")] InvalidPath(#[from] InvalidPath), #[error("Unable to open the ChmFile")] Other, }
To make sure there are no memory leaks, run a simple test under Valgrind . He will create a ChmFile and release it immediately.
// chmlib/src/lib.rs #[test] fn open_valid_chm_file() { let sample = sample_path(); // let chm_file = ChmFile::open(&sample).unwrap(); // drop(chm_file); } fn sample_path() -> PathBuf { let project_dir = Path::new(env!("CARGO_MANIFEST_DIR")); let sample = project_dir.parent().unwrap().join("topics.classic.chm"); assert!(sample.exists()); sample }
Valgrind says there is no unaccounted memory left:
$ valgrind ../target/debug/deps/chmlib-8d8c740d578324 open_valid_chm_file ==8953== Memcheck, a memory error detector ==8953== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al. ==8953== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info ==8953== Command: ~/chmlib/target/debug/deps/chmlib-8d8c740d578324 open_valid_chm_file ==8953== running 1 test test tests::open_valid_chm_file ... ok test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out ==8953== ==8953== HEAP SUMMARY: ==8953== in use at exit: 0 bytes in 0 blocks ==8953== total heap usage: 249 allocs, 249 frees, 43,273 bytes allocated ==8953== ==8953== All heap blocks were freed -- no leaks are possible ==8953== ==8953== For counts of detected and suppressed errors, rerun with: -v ==8953== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Search for items by name
Next in line is the chm_resolve_object () function:
pub const CHM_RESOLVE_SUCCESS: u32 = 0; pub const CHM_RESOLVE_FAILURE: u32 = 1; /* resolve a particular object from the archive */ pub unsafe extern "C" fn chm_resolve_object( h: *mut chmFile, objPath: *const c_char, ui: *mut chmUnitInfo ) -> c_int;
The search may fail, so chm_resolve_object () returns an error code reporting success or failure, and information about the found object will be recorded by the passed pointer to chmUnitInfo .
The type std::mem::MaybeUninit
created just for our case with the out parameter ui .
For now, let's leave the UnitInfo structure empty - this is the Rust equivalent of the chmUnitInfo C structure. We will add the fields when we start reading something from the ChmFile.
// chmlib/src/lib.rs impl ChmFile { ... /// Find a particular object in the archive. pub fn find<P: AsRef<Path>>(&mut self, path: P) -> Option<UnitInfo> { let path = path_to_cstring(path.as_ref()).ok()?; unsafe { // chmUnitInfo let mut resolved = MaybeUninit::<chmlib_sys::chmUnitInfo>::uninit(); // - let ret = chmlib_sys::chm_resolve_object( self.raw.as_ptr(), path.as_ptr(), resolved.as_mut_ptr(), ); if ret == chmlib_sys::CHM_RESOLVE_SUCCESS { // "resolved" Some(UnitInfo::from_raw(resolved.assume_init())) } else { None } } } } #[derive(Debug)] pub struct UnitInfo; impl UnitInfo { fn from_raw(ui: chmlib_sys::chmUnitInfo) -> UnitInfo { UnitInfo } }
Note that ChmFile :: find () accepts&mut self
, although the code on the Rast does not contain an explicit state change. The fact is that the C implementation uses all sorts of fseek () to move around the file, so the internal state still changes during the search.
Let's check ChmFile :: find () on the experimental file that we previously downloaded:
// chmlib/src/lib.rs #[test] fn find_an_item_in_the_sample() { let sample = sample_path(); let chm = ChmFile::open(&sample).unwrap(); assert!(chm.find("/BrowserView.html").is_some()); assert!(chm.find("doesn't exist.txt").is_none()); }
Filter bypass items
CHMLib provides an API for viewing the contents of a CHM file through a bitmask filter.
Take the convenient bitflags crate for working with masks and flags:
$ cargo add bitflags Updating 'https://github.com/rust-lang/crates.io-index' index Adding bitflags v1.2.1 to dependencies
And define the Filter checkboxes based on the constants from chm_lib.h:
// chmlib/src/lib.rs bitflags::bitflags! { pub struct Filter: c_int { /// A normal file. const NORMAL = chmlib_sys::CHM_ENUMERATE_NORMAL as c_int; /// A meta file (typically used by the CHM system). const META = chmlib_sys::CHM_ENUMERATE_META as c_int; /// A special file (starts with `#` or `$`). const SPECIAL = chmlib_sys::CHM_ENUMERATE_SPECIAL as c_int; /// It's a file. const FILES = chmlib_sys::CHM_ENUMERATE_FILES as c_int; /// It's a directory. const DIRS = chmlib_sys::CHM_ENUMERATE_DIRS as c_int; } }
We also need an extern "C"
adapter for Rastovyh closures, which can be passed to C in the form of a pointer to a function:
// chmlib/src/lib.rs unsafe extern "C" fn function_wrapper<F>( file: *mut chmlib_sys::chmFile, unit: *mut chmlib_sys::chmUnitInfo, state: *mut c_void, ) -> c_int where F: FnMut(&mut ChmFile, UnitInfo) -> Continuation, { // FFI- let result = panic::catch_unwind(|| { // ManuallyDrop `&mut ChmFile` // ( double-free). let mut file = ManuallyDrop::new(ChmFile { raw: NonNull::new_unchecked(file), }); let unit = UnitInfo::from_raw(unit.read()); // state let closure = &mut *(state as *mut F); closure(&mut file, unit) }); match result { Ok(Continuation::Continue) => { chmlib_sys::CHM_ENUMERATOR_CONTINUE as c_int }, Ok(Continuation::Stop) => chmlib_sys::CHM_ENUMERATOR_SUCCESS as c_int, Err(_) => chmlib_sys::CHM_ENUMERATOR_FAILURE as c_int, } }
function_wrapper
contains a tricky unsafe code that you need to be able to use:
- The
state
pointer must point to instance of closure F.- Rasta code executed by a closure can cause panic. It should not cross the border between Rast and C, since stack promotion in different languages ββis undefined behavior. A possible panic should be intercepted using
std::panic::catch_unwind()
.- A pointer to chmlib_sys :: chmFile passed to function_wrapper is also stored in the calling ChmFile. At the time of the call, you must ensure that only the closure can manipulate chmlib_sys :: chmFile, otherwise a race condition may occur.
- The closure needs to be passed
&mut ChmFile
, and for this you will need to create a temporary object on the stack using the existing pointer. However, if the ChmFile destructor runs in this case, then chmlib_sys :: chmFile will be freed too soon. To solve this problem, there isstd::mem::ManuallyDrop
.
This is how function_wrapper is used to implement ChmFile::for_each()
:
// chmlib/src/lib.rs impl ChmFile { ... /// Inspect each item within the [`ChmFile`]. pub fn for_each<F>(&mut self, filter: Filter, mut cb: F) where F: FnMut(&mut ChmFile, UnitInfo) -> Continuation, { unsafe { chmlib_sys::chm_enumerate( self.raw.as_ptr(), filter.bits(), Some(function_wrapper::<F>), &mut cb as *mut _ as *mut c_void, ); } } /// Inspect each item within the [`ChmFile`] inside a specified directory. pub fn for_each_item_in_dir<F, P>( &mut self, filter: Filter, prefix: P, mut cb: F, ) where P: AsRef<Path>, F: FnMut(&mut ChmFile, UnitInfo) -> Continuation, { let path = match path_to_cstring(prefix.as_ref()) { Ok(p) => p, Err(_) => return, }; unsafe { chmlib_sys::chm_enumerate_dir( self.raw.as_ptr(), path.as_ptr(), filter.bits(), Some(function_wrapper::<F>), &mut cb as *mut _ as *mut c_void, ); } } }
Notice how the F parameter interacts with the generic function_wrapper function. This technique is often used when you need to pass the Rust closure through FFI to code in another language.
Reading file contents
The last function that we need is responsible for actually reading the file using chm_retrieve_object ().
Its implementation is pretty trivial. This is similar to the typical std :: io :: Read trait, with the exception of an explicit file offset.
// chmlib/src/lib.rs impl ChmFile { ... pub fn read( &mut self, unit: &UnitInfo, offset: u64, buffer: &mut [u8], ) -> Result<usize, ReadError> { let mut unit = unit.0.clone(); let bytes_written = unsafe { chmlib_sys::chm_retrieve_object( self.raw.as_ptr(), &mut unit, buffer.as_mut_ptr(), offset, buffer.len() as _, ) }; if bytes_written >= 0 { Ok(bytes_written as usize) } else { Err(ReadError) } } } #[derive(Error, Debug, Copy, Clone, PartialEq)] #[error("The read failed")] pub struct ReadError;
Of course, it would be nice to have a more detailed error message than βfailed to readβ, but judging by the source code, chm_retrieve_object () does not particularly distinguish between errors:
- returns 0 when the file is read to the end;
- returns 0 for invalid arguments: null pointers or going out of bounds;
- returns -1 if the system reads files incorrectly (and fills errno);
- returns β1 upon decompression errors without distinguishing between data corruption and, say, the inability to allocate memory for a temporary buffer via malloc ().
You can test ChmFile :: read () using files with known contents:
// chmlib/src/lib.rs #[test] fn read_an_item() { let sample = sample_path(); let mut chm = ChmFile::open(&sample).unwrap(); let filename = "/template/packages/core-web/css/index.responsive.css"; // let item = chm.find(filename).unwrap(); // let mut buffer = vec![0; item.length() as usize]; let bytes_written = chm.read(&item, 0, &mut buffer).unwrap(); // assert_eq!(bytes_written, item.length() as usize); // ... , let got = String::from_utf8(buffer).unwrap(); assert!(got.starts_with( "html, body, div#i-index-container, div#i-index-body" )); }
Add examples
We covered most of the APIs of the CHMLib library and many would have finished working on this, considering the porting to be completed successfully. However, it would be nice to make our rack even more user-friendly. β , Rust Go ( , rustdoc godoc ).
, CHMLib , .
, , .
CHM-
CHM- .
/* $Id: enum_chmLib.c,v 1.7 2002/10/09 12:38:12 jedwin Exp $ */ /*************************************************************************** * enum_chmLib.c - CHM archive test driver * * ------------------- * * * * author: Jed Wing <jedwin@ugcs.caltech.edu> * * notes: This is a quick-and-dirty test driver for the chm lib * * routines. The program takes as its input the paths to one * * or more .chm files. It attempts to open each .chm file in * * turn, and display a listing of all of the files in the * * archive. * * * * It is not included as a particularly useful program, but * * rather as a sort of "simplest possible" example of how to * * use the enumerate portion of the API. * ***************************************************************************/ /*************************************************************************** * * * This program is free software; you can redistribute it and/or modify * * it under the terms of the GNU Lesser General Public License as * * published by the Free Software Foundation; either version 2.1 of the * * License, or (at your option) any later version. * * * ***************************************************************************/ #include "chm_lib.h" #include <stdio.h> #include <stdlib.h> #include <string.h> /* * callback function for enumerate API */ int _print_ui(struct chmFile *h, struct chmUnitInfo *ui, void *context) { static char szBuf[128]; memset(szBuf, 0, 128); if(ui->flags & CHM_ENUMERATE_NORMAL) strcpy(szBuf, "normal "); else if(ui->flags & CHM_ENUMERATE_SPECIAL) strcpy(szBuf, "special "); else if(ui->flags & CHM_ENUMERATE_META) strcpy(szBuf, "meta "); if(ui->flags & CHM_ENUMERATE_DIRS) strcat(szBuf, "dir"); else if(ui->flags & CHM_ENUMERATE_FILES) strcat(szBuf, "file"); printf(" %1d %8d %8d %s\t\t%s\n", (int)ui->space, (int)ui->start, (int)ui->length, szBuf, ui->path); return CHM_ENUMERATOR_CONTINUE; } int main(int c, char **v) { struct chmFile *h; int i; for (i=1; i<c; i++) { h = chm_open(v[i]); if (h == NULL) { fprintf(stderr, "failed to open %s\n", v[i]); exit(1); } printf("%s:\n", v[i]); printf(" spc start length type\t\t\tname\n"); printf(" === ===== ====== ====\t\t\t====\n"); if (! chm_enumerate(h, CHM_ENUMERATE_ALL, _print_ui, NULL)) printf(" *** ERROR ***\n"); chm_close(h); } return 0; }
_print_ui() Rust. UnitInfo , , .
// chmlib/examples/enumerate-items.rs fn describe_item(item: UnitInfo) { let mut description = String::new(); if item.is_normal() { description.push_str("normal "); } else if item.is_special() { description.push_str("special "); } else if item.is_meta() { description.push_str("meta "); } if item.is_dir() { description.push_str("dir"); } else if item.is_file() { description.push_str("file"); } println!( " {} {:8} {:8} {}\t\t{}", item.space(), item.start(), item.length(), description, item.path().unwrap_or(Path::new("")).display() ); }
main() , , describe_item() ChmFile::for_each().
// chmlib/examples/enumerate-items.rs fn main() { let filename = env::args() .nth(1) .unwrap_or_else(|| panic!("Usage: enumerate-items <filename>")); let mut file = ChmFile::open(&filename).expect("Unable to open the file"); println!("{}:", filename); println!(" spc start length type\t\t\tname"); println!(" === ===== ====== ====\t\t\t===="); file.for_each(Filter::all(), |_file, item| { describe_item(item); Continuation::Continue }); }
:
$ cargo run --example enumerate-items topics.classic.chm > rust-example.txt $ cd vendor/CHMLib/src $ clang chm_lib.c enum_chmLib.c lzx.c -o enum_chmLib $ cd ../../.. $ ./vendor/CHMLib/src/enum_chmLib topics.classic.chm > c-example.txt $ diff -u rust-example.txt c-example.txt $ echo $? 0
diff , , , , . - , diff.
diff --git a/chmlib/examples/enumerate-items.rs b/chmlib/examples/enumerate-items.rs index e68fa58..ef855ac 100644 --- a/chmlib/examples/enumerate-items.rs +++ b/chmlib/examples/enumerate-items.rs @@ -36,6 +36,10 @@ fn describe_item(item: UnitInfo) { description.push_str("file"); } + if item.length() % 7 == 0 { + description.push_str(" :)"); + } + println!( " {} {:8} {:8} {}\t\t{}", item.space(),
:
$ cargo run --example enumerate-items topics.classic.chm > rust-example.txt $ diff -u rust-example.txt c-example.txt --- rust-example.txt 2019-10-20 16:51:53.933560892 +0800 +++ c-example.txt 2019-10-20 16:40:42.007053966 +0800 @@ -1,9 +1,9 @@ topics.classic.chm: spc start length type name === ===== ====== ==== ==== - 0 0 0 normal dir :) / + 0 0 0 normal dir / 1 5125797 4096 special file /#IDXHDR - 0 0 0 special file :) /#ITBITS + 0 0 0 special file /#ITBITS 1 5104520 148 special file /#IVB 1 5132009 1227 special file /#STRINGS 0 1430 4283 special file /#SYSTEM @@ -13,9 +13,9 @@ ...
!
CHM-
, CHMLib, «» .
/* $Id: extract_chmLib.c,v 1.4 2002/10/10 03:24:51 jedwin Exp $ */ /*************************************************************************** * extract_chmLib.c - CHM archive extractor * * ------------------- * * * * author: Jed Wing <jedwin@ugcs.caltech.edu> * * notes: This is a quick-and-dirty chm archive extractor. * ***************************************************************************/ /*************************************************************************** * * * This program is free software; you can redistribute it and/or modify * * it under the terms of the GNU Lesser General Public License as * * published by the Free Software Foundation; either version 2.1 of the * * License, or (at your option) any later version. * * * ***************************************************************************/ #include "chm_lib.h" #include <stdio.h> #include <stdlib.h> #include <string.h> #ifdef WIN32 #include <windows.h> #include <direct.h> #define mkdir(X, Y) _mkdir(X) #define snprintf _snprintf #else #include <unistd.h> #include <sys/stat.h> #include <sys/types.h> #endif struct extract_context { const char *base_path; }; static int dir_exists(const char *path) { #ifdef WIN32 /* why doesn't this work?!? */ HANDLE hFile; hFile = CreateFileA(path, FILE_LIST_DIRECTORY, 0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL); if (hFile != INVALID_HANDLE_VALUE) { CloseHandle(hFile); return 1; } else return 0; #else struct stat statbuf; if (stat(path, &statbuf) != -1) return 1; else return 0; #endif } static int rmkdir(char *path) { /* * strip off trailing components unless we can stat the directory, or we * have run out of components */ char *i = strrchr(path, '/'); if(path[0] == '\0' || dir_exists(path)) return 0; if (i != NULL) { *i = '\0'; rmkdir(path); *i = '/'; mkdir(path, 0777); } #ifdef WIN32 return 0; #else if (dir_exists(path)) return 0; else return -1; #endif } /* * callback function for enumerate API */ int _extract_callback(struct chmFile *h, struct chmUnitInfo *ui, void *context) { LONGUINT64 ui_path_len; char buffer[32768]; struct extract_context *ctx = (struct extract_context *)context; char *i; if (ui->path[0] != '/') return CHM_ENUMERATOR_CONTINUE; /* quick hack for security hole mentioned by Sven Tantau */ if (strstr(ui->path, "/../") != NULL) { /* fprintf(stderr, "Not extracting %s (dangerous path)\n", ui->path); */ return CHM_ENUMERATOR_CONTINUE; } if (snprintf(buffer, sizeof(buffer), "%s%s", ctx->base_path, ui->path) > 1024) return CHM_ENUMERATOR_FAILURE; /* Get the length of the path */ ui_path_len = strlen(ui->path)-1; /* Distinguish between files and dirs */ if (ui->path[ui_path_len] != '/' ) { FILE *fout; LONGINT64 len, remain=ui->length; LONGUINT64 offset = 0; printf("--> %s\n", ui->path); if ((fout = fopen(buffer, "wb")) == NULL) { /* make sure that it isn't just a missing directory before we abort */ char newbuf[32768]; strcpy(newbuf, buffer); i = strrchr(newbuf, '/'); *i = '\0'; rmkdir(newbuf); if ((fout = fopen(buffer, "wb")) == NULL) return CHM_ENUMERATOR_FAILURE; } while (remain != 0) { len = chm_retrieve_object(h, ui, (unsigned char *)buffer, offset, 32768); if (len > 0) { fwrite(buffer, 1, (size_t)len, fout); offset += len; remain -= len; } else { fprintf(stderr, "incomplete file: %s\n", ui->path); break; } } fclose(fout); } else { if (rmkdir(buffer) == -1) return CHM_ENUMERATOR_FAILURE; } return CHM_ENUMERATOR_CONTINUE; } int main(int c, char **v) { struct chmFile *h; struct extract_context ec; if (c < 3) { fprintf(stderr, "usage: %s <chmfile> <outdir>\n", v[0]); exit(1); } h = chm_open(v[1]); if (h == NULL) { fprintf(stderr, "failed to open %s\n", v[1]); exit(1); } printf("%s:\n", v[1]); ec.base_path = v[2]; if (! chm_enumerate(h, CHM_ENUMERATE_ALL, _extract_callback, (void *)&ec)) printf(" *** ERROR ***\n"); chm_close(h); return 0; }
. , .
extract(). , .
// chmlib/examples/extract.rs fn extract( root_dir: &Path, file: &mut ChmFile, item: &UnitInfo, ) -> Result<(), Box<dyn Error>> { if !item.is_file() || !item.is_normal() { // return Ok(()); } let path = match item.path() { Some(p) => p, // , None => return Ok(()), }; let mut dest = root_dir.to_path_buf(); // : CHM ( "/"), // root_dir "/". dest.extend(path.components().skip(1)); // if let Some(parent) = dest.parent() { fs::create_dir_all(parent)?; } let mut f = File::create(dest)?; let mut start_offset = 0; // CHMLib &[u8] (, // ), // let mut buffer = vec![0; 1 << 16]; loop { let bytes_read = file.read(item, start_offset, &mut buffer)?; if bytes_read == 0 { // break; } else { // start_offset += bytes_read as u64; f.write_all(&buffer)?; } } Ok(()) }
main() , extract(), .
// chmlib/examples/extract.rs fn main() { let args: Vec<_> = env::args().skip(1).collect(); if args.len() != 2 || args.iter().any(|arg| arg.contains("-h")) { println!("Usage: extract <chm-file> <out-dir>"); return; } let mut file = ChmFile::open(&args[0]).expect("Unable to open the file"); let out_dir = PathBuf::from(&args[1]); file.for_each(Filter::all(), |file, item| { match extract(&out_dir, file, &item) { Ok(_) => Continuation::Continue, Err(e) => { eprintln!("Error: {}", e); Continuation::Stop }, } }); }
CHM- HTML-, -.
$ cargo run --example extract -- ./topics.classic.chm ./extracted $ tree ./extracted ./extracted βββ default.html βββ BrowserForward.html ... βββ Images β βββ Commands β β βββ RealWorld β β βββ BrowserBack.bmp ... βββ script β βββ _community β β βββ disqus.js β βββ hs-common.js ... βββ userinterface.html $ firefox topics.classic/default.html ( default.html Firefox)
JavaScript ( - Microsoft Help), , .
What's next?
chmlib , , crates.io.
:
- ChmFile::for_each() ChmFile::for_each_item_in_dir() , , .
- , ChmFile
Continuation::Continue
. ,F: FnMut(&mut ChmFile, UnitInfo) -> C
C: Into<Continuation>
,impl From<()> for Continuation
. - (, extract()) ChmFile::for_each() .
impl<E> From<Result<(), E>> for Continuation where E: Error + 'static
. - -
std::fs::File
. , ChmFile::read() -std::io::Writer
.