Linux Kernel Library: the Linux kernel in .so or .dll form

I once read an article about choosing a file system "that works everywhere, absolutely everywhere." Once again it complained that Ext4 is a wonderful file system, but that on Windows there are only flaky, imprecise proprietary drivers for it. Now let's rewind a couple of years: back then on Habr (in those days, Geektimes) there was news about LibOS, an attempt to turn the Linux kernel into a regular user-mode library. The emphasis there was on moving the network stack into user space. At some point I decided to check whether the project was still alive, and on its blog I found a link to a competitor of sorts, the Linux Kernel Library (LKL) project. In essence, it is a port of the kernel to, so to speak, the "POSIX / Win32 user-mode library" hardware architecture.







Why is LKL interesting? First, it is alive and well, albeit outside the mainline kernel code base. Second, it is a more or less honest "architecture" port, which automatically makes most of the kernel available. Moreover, example utilities come right in the box: cptofs / cpfromfs, fs2tar, lklfuse. In this article we will try LKL on a Linux host, look inside a file containing an Ext4 image (or Btrfs, XFS...) without root and without virtual machines, and briefly discuss how it could be tried on Windows.







DISCLAIMER 1: If you want to try this, make backups. If you do it with a partition that holds important data, you do so at your own risk. Still, at least the drivers here really are the native ones.







DISCLAIMER 2: Respect licenses. Linking against LKL probably makes your program GPL.







First acquaintance



The LKL repository (lkl/linux on GitHub) is a fork of the regular Linux kernel that adds support for one more architecture; we will mostly see this in the arch/lkl and tools/lkl directories. Let's clone the repository and try to build it following the instructions. For experiments I will use a shallow clone, which contains not the entire history of the repository but only the specified number of most recent commits:







    $ git clone https://github.com/lkl/linux.git lkl-linux --depth 10
    $ cd lkl-linux
    $ patch -p 1 <<EOF
    diff --git a/tools/lkl/lib/hijack/xlate.c b/tools/lkl/lib/hijack/xlate.c
    index 03ccc6294..75368dcc2 100644
    --- a/tools/lkl/lib/hijack/xlate.c
    +++ b/tools/lkl/lib/hijack/xlate.c
    @@ -3,6 +3,7 @@
     #include <fcntl.h>
     #include <sys/ioctl.h>
     #include <sys/socket.h>
    +#include <linux/sockios.h>
     #undef st_atime
     #undef st_mtime
     #undef st_ctime
    EOF
    $ make -C tools/lkl -j4
      
      





I had to patch the source a little, but in the end I got the library tools/lkl/lib/liblkl.so (and also the static tools/lkl/liblkl.a):







    $ nm -D tools/lkl/lib/liblkl.so
                     U __assert_fail
                     U bind
                     U calloc
                     U clock_gettime
                     U close
                     w __cxa_finalize
    0000000000063b30 T dbg_entrance
    0000000000063f30 T dbg_handler
                     U __errno_location
                     U fcntl
                     U fdatasync
    0000000000639580 D fd_net_ops
                     U fgets
                     U __fprintf_chk
                     U free
                     U fwrite
                     U getc
                     U getenv
                     w __gmon_start__
                     U if_nametoindex
                     U inet_pton
                     U ioctl
                     U __isoc99_scanf
                     w _ITM_deregisterTMCloneTable
                     w _ITM_registerTMCloneTable
    0000000000061750 T jmp_buf_longjmp
    0000000000061720 T jmp_buf_set
    0000000000065470 T jsmn_init
    0000000000065060 T jsmn_parse
    0000000000065490 T jsmn_strerror
    00000000000614c0 T lkl_add_gateway
    0000000000061290 T lkl_add_neighbor
    00000000000621a0 T lkl_bug
    000000000005f070 T lkl_closedir
    0000000000639520 D lkl_dev_blk_ops
    000000000005fa10 T lkl_dirfd
    0000000000062640 T lkl_disk_add
    0000000000062780 T lkl_disk_remove
    000000000005ec50 T lkl_encode_dev_from_sysfs
    000000000005f9f0 T lkl_errdir
    000000000005ef80 T lkl_fdopendir
    0000000000067f10 T lkl_get_free_irq
    000000000005f2c0 T lkl_get_virtio_blkdev
    00000000006395c0 D lkl_host_ops
    00000000000614b0 T lkl_if_add_gateway
    00000000000613e0 T lkl_if_add_ip
    00000000000614a0 T lkl_if_add_linklocal
    0000000000061520 T lkl_if_add_rule_from_saddr
    0000000000061480 T lkl_if_del_ip
    0000000000060d70 T lkl_if_down
    0000000000060b10 T lkl_ifname_to_ifindex
    0000000000061400 T lkl_if_set_ipv4
    0000000000061530 T lkl_if_set_ipv4_gateway
    0000000000061430 T lkl_if_set_ipv6
    00000000000615b0 T lkl_if_set_ipv6_gateway
    0000000000060ef0 T lkl_if_set_mtu
    0000000000060bf0 T lkl_if_up
    0000000000061160 T lkl_if_wait_ipv6_dad
    000000000005fba0 T lkl_iomem_access
    000000000005fb50 T lkl_ioremap
    0000000000067730 T lkl_is_running
    0000000000066150 T lkl_load_config_env
    0000000000065950 T lkl_load_config_json
    0000000000066880 T lkl_load_config_post
    0000000000066510 T lkl_load_config_pre
    000000000005f470 T lkl_mount_dev
    000000000005eae0 T lkl_mount_fs
    00000000000642a0 T lkl_netdev_add
    00000000000645c0 T lkl_netdev_free
    0000000000061030 T lkl_netdev_get_ifindex
    0000000000064e70 T lkl_netdev_macvtap_create
    0000000000064ed0 T lkl_netdev_pipe_create
    0000000000064ce0 T lkl_netdev_raw_create
    00000000000644c0 T lkl_netdev_remove
    0000000000064c60 T lkl_netdev_tap_create
    0000000000064a10 T lkl_netdev_tap_init
    000000000005eea0 T lkl_opendir
    0000000000062170 T lkl_perror
    00000000000620b0 T lkl_printf
    0000000000067f90 T lkl_put_irq
    0000000000061620 T lkl_qdisc_add
    0000000000061630 T lkl_qdisc_parse_add
    000000000005f0f0 T lkl_readdir
    0000000000063f80 T lkl_register_dbg_handler
    0000000000064930 T lkl_register_netdev_fd
    000000000005efe0 T lkl_rewinddir
    000000000005fa20 T lkl_set_fd_limit
    00000000000614e0 T lkl_set_ipv4_gateway
    0000000000061500 T lkl_set_ipv6_gateway
    0000000000065f60 T lkl_show_config
    00000000004f51ad T lkl_start_kernel
    0000000000062080 T lkl_strerror
    00000000000685f0 T lkl_syscall
    0000000000062270 T lkl_sysctl
    0000000000062410 T lkl_sysctl_parse_write
    0000000000067770 T lkl_sys_halt
    00000000000680e0 T lkl_trigger_irq
    000000000005f870 T lkl_umount_dev
    000000000005edc0 T lkl_umount_timeout
    0000000000066ed0 T lkl_unload_config
    00000000008186a0 B lkl_virtio_devs
                     U __longjmp_chk
                     U lseek64
                     U malloc
                     U memchr
                     U memcpy
                     U memset
                     U open
                     U perror
                     U pipe
                     U poll
    0000000000064070 T poll_thread
                     U pread64
                     U __printf_chk
                     U pthread_create
                     U pthread_detach
                     U pthread_exit
                     U pthread_getspecific
                     U pthread_join
                     U pthread_key_create
                     U pthread_key_delete
                     U pthread_mutexattr_init
                     U pthread_mutexattr_settype
                     U pthread_mutex_destroy
                     U pthread_mutex_init
                     U pthread_mutex_lock
                     U pthread_mutex_unlock
                     U pthread_self
                     U pthread_setspecific
                     U puts
                     U pwrite64
                     U read
                     U readv
    00000000008196a0 B registered_devs
    000000000005fa90 T register_iomem
                     U sem_destroy
                     U sem_init
                     U sem_post
                     U sem_wait
                     U _setjmp
                     U setsockopt
                     U sigaction
                     U sigemptyset
                     U __snprintf_chk
                     U socket
                     U __stack_chk_fail
                     U stderr
                     U stdin
                     U stpcpy
                     U strchr
                     U strcpy
                     U __strcpy_chk
                     U strdup
                     U strerror
                     U strlen
                     U strncat
                     U __strncat_chk
                     U strncmp
                     U strncpy
                     U strrchr
                     U strtok
                     U strtok_r
                     U strtol
                     U strtoul
                     U syscall
                     U timer_create
                     U timer_delete
                     U timer_settime
    000000000005fb00 T unregister_iomem
                     U usleep
    0000000000063110 T virtio_dev_cleanup
    0000000000062ee0 T virtio_dev_setup
    0000000000063100 T virtio_get_num_bootdevs
    0000000000062c10 T virtio_process_queue
    0000000000062af0 T virtio_req_complete
    0000000000062ec0 T virtio_set_queue_max_merge_len
                     U __vsnprintf_chk
                     U write
                     U writev
      
      





And where are the system calls, you may ask. Don't panic: they are hidden behind the common entry point lkl_syscall. This is the LKL analogue of the syscall function. In real life, in most cases you will use the typed wrappers lkl_sys_<name>. We also see all sorts of functions for configuring the "kernel" and adding virtual devices to it, as well as wrappers over the "complex" system calls that libc provides on a regular system. For example, there is the getdents system call, but... "These are not the interfaces you are interested in," the man page tells us right from the start. In ordinary cases you are supposed to use the standard library function readdir(3), not to be confused with readdir(2), an ancient system call that is not even implemented on x86_64. When working with LKL you will need lkl_opendir / lkl_readdir / lkl_closedir.
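To make the getdents versus readdir(3) distinction concrete, here is a small sketch for a plain Linux host, with no LKL involved. It counts directory entries twice: once through the generic syscall() entry point with SYS_getdents64, and once through the libc wrappers. The helper names count_entries_raw and count_entries_libc are mine, and the record struct is declared by hand, following the getdents64(2) man page, because libc does not export it. My assumption is that lkl_syscall and lkl_opendir / lkl_readdir relate to each other in roughly the same way.

```c
#define _GNU_SOURCE
#include <dirent.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Record layout described in getdents64(2); declared here by hand. */
struct linux_dirent64 {
    uint64_t       d_ino;
    int64_t        d_off;
    unsigned short d_reclen;  /* size of this record in bytes */
    unsigned char  d_type;
    char           d_name[];
};

/* Count entries using the raw system call through syscall(). */
int count_entries_raw(const char *path)
{
    _Alignas(8) char buf[4096];
    int count = 0;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    for (;;) {
        long n = syscall(SYS_getdents64, fd, buf, sizeof(buf));
        if (n < 0) {            /* also hit when path is not a directory */
            close(fd);
            return -1;
        }
        if (n == 0)             /* end of directory */
            break;
        /* The kernel packs variable-length records back to back. */
        for (long pos = 0; pos < n; ) {
            struct linux_dirent64 *d = (struct linux_dirent64 *)(buf + pos);
            count++;
            pos += d->d_reclen;
        }
    }
    close(fd);
    return count;
}

/* Count the same entries using the libc wrappers. */
int count_entries_libc(const char *path)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    int count = 0;
    while (readdir(dir) != NULL)
        count++;
    closedir(dir);
    return count;
}
```

Both functions include the "." and ".." entries, so on an unchanging directory they should agree. The raw version has to manage the buffer and walk d_reclen itself, which is exactly the bookkeeping the wrappers hide.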







Let's try to write something



Remember: respect licenses. The Linux kernel itself is distributed under GPLv2; whether a program that only uses the relatively public LKL interfaces would count as a derivative work, I do not know.







Well, let's try linking against the library. I assume that the variable $LKL holds the path to the repository with the compiled LKL.







    #include <stdio.h>
    #include "lkl_host.h"
    #include "lkl.h"

    int main()
    {
        // lkl_host_ops holds the host-dependent operations
        // the kernel relies on: `printk`, `panic`, ...
        // The second argument is the kernel command line
        lkl_start_kernel(&lkl_host_ops, "mem=128M");
        return 0;
    }
      
      





Compile:







 $ gcc test.c -o test -I$LKL/tools/lkl/include -L$LKL/tools/lkl/lib -llkl
      
      





And it works!







    $ ./test
    ./test: error while loading shared libraries: liblkl.so: cannot open shared object file: No such file or directory
    $ LD_LIBRARY_PATH=$LKL/tools/lkl/lib ./test
    [    0.000000] Linux version 5.3.0+ (trosinenko@trosinenko-pc) (gcc version 9.2.1 20191008 (Ubuntu 9.2.1-9ubuntu2)) #1 Tue Dec 3 14:37:02 MSK 2019
    [    0.000000] memblock address range: 0x7fba8c000000 - 0x7fba93fff000
    [    0.000000] Built 1 zonelists, mobility grouping on. Total pages: 32319
    [    0.000000] Kernel command line: mem=128M
    [    0.000000] Dentry cache hash table entries: 16384 (order: 5, 131072 bytes, linear)
    [    0.000000] Inode-cache hash table entries: 8192 (order: 4, 65536 bytes, linear)
    [    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
    [    0.000000] Memory available: 129044k/131068k RAM
    [    0.000000] SLUB: HWalign=32, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
    [    0.000000] NR_IRQS: 4096
    [    0.000000] lkl: irqs initialized
    [    0.000000] clocksource: lkl: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
    [    0.000000] lkl: time and timers initialized (irq1)
    [    0.000003] pid_max: default: 4096 minimum: 301
    [    0.000019] Mount-cache hash table entries: 512 (order: 0, 4096 bytes, linear)
    [    0.000022] Mountpoint-cache hash table entries: 512 (order: 0, 4096 bytes, linear)
    [    0.003622] random: get_random_bytes called from _etext+0xbcdb/0x14b05 with crng_init=0
    [    0.003692] printk: console [lkl_console0] enabled
    [    0.003707] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
    [    0.003714] xor: automatically using best checksumming function 8regs
    [    0.003783] NET: Registered protocol family 16
    [    0.171647] raid6: int64x8 gen() 4489 MB/s
    [    0.343119] raid6: int64x8 xor() 3165 MB/s
    [    0.514836] raid6: int64x4 gen() 4668 MB/s
    [    0.689529] raid6: int64x4 xor() 3256 MB/s
    [    0.861155] raid6: int64x2 gen() 6283 MB/s
    [    1.032668] raid6: int64x2 xor() 3793 MB/s
    [    1.206752] raid6: int64x1 gen() 5185 MB/s
    [    1.378219] raid6: int64x1 xor() 2901 MB/s
    [    1.378225] raid6: using algorithm int64x2 gen() 6283 MB/s
    [    1.378227] raid6: .... xor() 3793 MB/s, rmw enabled
    [    1.378229] raid6: using intx1 recovery algorithm
    [    1.378333] clocksource: Switched to clocksource lkl
    [    1.378427] NET: Registered protocol family 2
    [    1.378516] tcp_listen_portaddr_hash hash table entries: 256 (order: 0, 4096 bytes, linear)
    [    1.378521] TCP established hash table entries: 1024 (order: 1, 8192 bytes, linear)
    [    1.378527] TCP bind hash table entries: 1024 (order: 1, 8192 bytes, linear)
    [    1.378532] TCP: Hash tables configured (established 1024 bind 1024)
    [    1.378596] UDP hash table entries: 128 (order: 0, 4096 bytes, linear)
    [    1.378618] UDP-Lite hash table entries: 128 (order: 0, 4096 bytes, linear)
    [    1.379286] workingset: timestamp_bits=62 max_order=16 bucket_order=0
    [    1.380271] SGI XFS with ACLs, security attributes, no debug enabled
    [    1.380864] io scheduler mq-deadline registered
    [    1.380872] io scheduler kyber registered
    [    1.383396] NET: Registered protocol family 10
    [    1.383763] Segment Routing with IPv6
    [    1.383779] sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
    [    1.384091] Btrfs loaded, crc32c=crc32c-generic
    [    1.384223] Warning: unable to open an initial console.
    [    1.384237] This architecture does not have kernel memory protection.
    [    1.384239] Run /init as init process
      
      





You can even see from the timestamps that the kernel did not simply "spit out" this text to the console: it booted gradually, just like a real one.







Complicating the experiment



Now let's try to put this library to some real use; after all, it is an entire OS kernel! Let's read a file from an Ext4 partition purely in user space, and with the "native" driver at that. We take tools/lkl/cptofs.c and implement only the bare minimum (for clarity):







    #undef NDEBUG

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>
    #include <assert.h>
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <stdlib.h>
    #include <unistd.h>   // close(), write(), STDERR_FILENO

    #include "lkl_host.h"
    #include "lkl.h"

    // Error handling here is deliberately minimal:
    // assert() does most of the work :)
    int main(int argc, const char *argv[])
    {
        assert(argc == 4);  // read-file <fsimage> <fstype> <file>

        const char * const fsimage = argv[1];
        const char * const fstype = argv[2];
        const char * const file_to_dump = argv[3];

        struct lkl_disk disk;
        int disk_id, ret;
        char mpoint[128];

        // Open the image file
        memset(&disk, 0, sizeof(disk));
        disk.fd = open(fsimage, O_RDONLY);
        assert(disk.fd >= 0);

        // Register it with LKL as a disk
        disk_id = lkl_disk_add(&disk);
        assert(disk_id >= 0);

        // Boot the kernel
        lkl_start_kernel(&lkl_host_ops, "mem=128M");

        // Mount the file system read-only
        ret = lkl_mount_dev(disk_id, 0 /* part */, fstype, LKL_MS_RDONLY, NULL, mpoint, sizeof(mpoint));
        if (ret < 0) {
            fprintf(stderr, "lkl_mount_dev failed: %s\n", lkl_strerror(ret));
            close(disk.fd);
            exit(1);
        }

        // List the root directory
        // (the lkl_* wrappers play the role of libc here)
        struct lkl_dir *dir = lkl_opendir(mpoint, &ret);
        assert(dir != NULL);
        struct lkl_linux_dirent64 *dent;
        while ((dent = lkl_readdir(dir)) != NULL) {
            fprintf(stderr, "Directory entry: %s\n", dent->d_name);
        }
        // Strictly speaking, NULL can mean either the end of the directory or an error
        lkl_closedir(dir);

        // Dump the requested file (at most one buffer's worth)
        char tmp[256];
        uint8_t buffer[65536];
        snprintf(tmp, sizeof(tmp), "%s/%s", mpoint, file_to_dump);
        int fd = lkl_sys_open(tmp, LKL_O_RDONLY, 0);
        fprintf(stderr, "fd = %d\n", fd);
        assert(fd >= 0);

        int count = lkl_sys_read(fd, buffer, sizeof(buffer));
        assert(count >= 0);
        write(STDERR_FILENO, buffer, count);
        lkl_sys_close(fd);

        return 0;
    }
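The program above reads at most 64 KiB with a single call, which is enough for a demo but would truncate a larger file. As a sketch of the usual fix, here is a copy loop written against the host's read() and write(), so it can be tried without LKL; copy_fd and write_all are illustrative names of my own. I assume the same loop shape would apply with lkl_sys_read on the input side, though I have not checked how LKL reports short reads or interrupted calls.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Write all `len` bytes, retrying on short writes and EINTR.
 * Returns 0 on success, -1 on error. */
int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Copy everything from in_fd to out_fd until end of file.
 * Returns the number of bytes copied, or -1 on error. */
long long copy_fd(int in_fd, int out_fd)
{
    char buf[65536];
    long long total = 0;

    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof(buf));
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)             /* end of file */
            return total;
        if (write_all(out_fd, buf, (size_t)n) < 0)
            return -1;
        total += n;
    }
}
```

The important part is that a read returning fewer bytes than requested is not treated as the end of the file; only a return value of zero is.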
      
      





Note the renamed definitions with LKL_ prefixes (for example, LKL_O_RDONLY): on a Linux host they most likely coincide with the unprefixed ones, but on other systems that is not guaranteed.
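One way to stay safe on a non-Linux host is to translate flags explicitly instead of passing host constants straight through. The sketch below does that for a handful of open() flags. The GUEST_O_* values are stand-ins that I wrote down from the generic Linux ABI for illustration; in real code they would be the LKL_O_* definitions from lkl.h. host_to_guest_open_flags is a hypothetical helper, not part of LKL.

```c
#include <fcntl.h>

/* Illustrative stand-ins for LKL_O_*: generic Linux ABI values (octal).
 * Assumption: real code should take these from lkl.h instead. */
enum {
    GUEST_O_WRONLY = 01,
    GUEST_O_RDWR   = 02,
    GUEST_O_CREAT  = 0100,
    GUEST_O_TRUNC  = 01000,
    GUEST_O_APPEND = 02000
};

/* Translate a few host open() flags into their guest equivalents.
 * Flags not listed here are silently dropped in this sketch. */
int host_to_guest_open_flags(int host_flags)
{
    int guest = 0;

    /* The access mode is a small enumeration, not a bit mask. */
    switch (host_flags & O_ACCMODE) {
    case O_WRONLY: guest |= GUEST_O_WRONLY; break;
    case O_RDWR:   guest |= GUEST_O_RDWR;   break;
    default:       break;   /* read-only is 0 on the guest side */
    }

    if (host_flags & O_CREAT)
        guest |= GUEST_O_CREAT;
    if (host_flags & O_TRUNC)
        guest |= GUEST_O_TRUNC;
    if (host_flags & O_APPEND)
        guest |= GUEST_O_APPEND;

    return guest;
}
```

Because the mapping goes by name rather than by value, the result is the same on every host, whereas passing the host's O_CREAT directly would only work where the numbers happen to match.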







    $ mke2fs ext4.img -t ext4 32M
    $ sudo mount ext4.img /mnt
    $ echo -e "Hello world\!\nTEST" | sudo tee /mnt/test.txt
    $ sudo umount /mnt
    $ LD_LIBRARY_PATH=$LKL/tools/lkl/lib ./read-file ext4.img ext4 test.txt
    [    0.000000] Linux version 5.3.0+ (trosinenko@trosinenko-pc) (gcc version 9.2.1 20191008 (Ubuntu 9.2.1-9ubuntu2)) #1 Tue Dec 3 14:37:02 MSK 2019
    // ... //
    [    1.378960] Warning: unable to open an initial console.
    [    1.378975] This architecture does not have kernel memory protection.
    [    1.378977] Run /init as init process
    [    1.379852] EXT4-fs (vda): mounted filesystem with ordered data mode. Opts:
    Directory entry: test.txt
    Directory entry: ..
    Directory entry: lost+found
    Directory entry: .
    fd = 0
    Hello world\!
    TEST
      
      





Wow, it works! Is there anything more exotic?







    $ mksquashfs test.c read-file.c squashfs.img
    $ LD_LIBRARY_PATH=$LKL/tools/lkl/lib ./read-file squashfs.img squashfs test.c
    [    0.000000] Linux version 5.3.0+ (trosinenko@trosinenko-pc) (gcc version 9.2.1 20191008 (Ubuntu 9.2.1-9ubuntu2)) #1 Tue Dec 3 14:37:02 MSK 2019
    // ... //
    [    1.378472] This architecture does not have kernel memory protection.
    [    1.378474] Run /init as init process
    lkl_mount_dev failed: No such device
      
      





Oops! But wait, we probably just did not enable SquashFS support in our kernel library!
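The "No such device" text comes from passing a negative return value to lkl_strerror, which suggests that LKL follows the kernel convention of returning -errno rather than setting a global errno. ENODEV is also what mount(2) documents for a filesystem type that is not configured in the kernel, so the diagnosis fits. Below is a minimal host-side sketch of that convention. The helper names are hypothetical, and it uses the host's strerror(), which only yields the right text when the host numbers its errors the same way Linux does; I assume that is why LKL ships its own lkl_strerror.

```c
#include <errno.h>
#include <string.h>

/* Kernel-style result: >= 0 means success, a small negative value is -errno.
 * The 4095 bound mirrors the kernel's own limit for error values. */
int is_kernel_error(long ret)
{
    return ret < 0 && ret >= -4095;
}

/* Human-readable text for a kernel-style result.
 * Assumption: host errno numbers match the guest's (true on a Linux host). */
const char *kernel_result_message(long ret)
{
    if (!is_kernel_error(ret))
        return "Success";
    return strerror((int)-ret);
}
```

With this convention a single return value carries both the result and the error code, which is why the example program only needs to test ret < 0.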







Configure LKL build options



For myself, I worked out a sequence of commands that works for LKL; perhaps it can be reduced to the traditional make defconfig, make menuconfig, make.







    $ make defconfig ARCH=lkl
    $ make menuconfig ARCH=lkl
    # ... enable SquashFS support in the menu ...
    $ cp .config arch/lkl/configs/defconfig
    $ make mrproper
    $ make -C tools/lkl -j4
      
      





And voila!







    $ gcc read-file.c -o read-file -I$LKL/tools/lkl/include -L$LKL/tools/lkl/lib -llkl
    $ LD_LIBRARY_PATH=$LKL/tools/lkl/lib ./read-file squashfs.img squashfs test.c
    [    0.000000] Linux version 5.3.0+ (trosinenko@trosinenko-pc) (gcc version 9.2.1 20191008 (Ubuntu 9.2.1-9ubuntu2)) #1 Wed Dec 4 12:07:50 MSK 2019
    // ... //
    [    1.378346] This architecture does not have kernel memory protection.
    [    1.378348] Run /init as init process
    Directory entry: .
    Directory entry: ..
    Directory entry: read-file.c
    Directory entry: test.c
    fd = 0
    #include <stdio.h>
    #include "lkl_host.h"
    #include "lkl.h"

    int main()
    {
        lkl_start_kernel(&lkl_host_ops, "mem=128M");
        return 0;
    }
      
      





In this case, however, recompiling read-file.c was hardly necessary: the library is dynamic.







Excuse me, where are the promised ready-made programs?



Indeed, tools/lkl contains cptofs.c, fs2tar.c and much more, but none of it gets built! Having rummaged through the Makefiles, I found that there is a certain Makefile.autoconf that looks for the required header files, and a Makefile.conf where the results are recorded.







So, some tools want libarchive, some want libfuse. Well, let's install libarchive-dev and libfuse-dev (in the case of Ubuntu) and rebuild. It still doesn't work... And what if we delete Makefile.conf... Oops, it built!







So what do we have now? The tools/lkl directory now contains cptofs, fs2tar and lklfuse.







First, copy cptofs as cpfromfs:







    $ $LKL/tools/lkl/cptofs --help
    Usage: cptofs [OPTION...] -t fstype -i fsimage path... fs_path
    Copy files to a filesystem image

      -i, --filesystem-image=string   path to the filesystem image - mandatory
      -p, --enable-printk             show Linux printks
      -P, --partition=int             partition number
      -s, --selinux=string            selinux attributes for destination
      -t, --filesystem-type=string    select filesystem type - mandatory
      -?, --help                      Give this help list
          --usage                     Give a short usage message

    Mandatory or optional arguments to long options are also mandatory or optional
    for any corresponding short options.
    $ cp $LKL/tools/lkl/cp{to,from}fs
    $ $LKL/tools/lkl/cpfromfs --help
    Usage: cpfromfs [OPTION...] -t fstype -i fsimage fs_path... path
    Copy files from a filesystem image

      -i, --filesystem-image=string   path to the filesystem image - mandatory
      -p, --enable-printk             show Linux printks
      -P, --partition=int             partition number
      -s, --selinux=string            selinux attributes for destination
      -t, --filesystem-type=string    select filesystem type - mandatory
      -?, --help                      Give this help list
          --usage                     Give a short usage message

    Mandatory or optional arguments to long options are also mandatory or optional
    for any corresponding short options.
      
      





As the saying goes, "as you name the yacht, so it will sail": the same binary behaves differently depending on the name it is run under. Let's run it...







    $ $LKL/tools/lkl/cpfromfs -t ext4 -i ext4.img test.txt .
    error processing entry /mnt/0000fe00/test.txt, aborting
      
      





Hmm... That needs looking into. In any case, it is still inconvenient for interactive use, because every time you have to wait about a second for the kernel to boot. But fs2tar works without problems:







    $ $LKL/tools/lkl/fs2tar -t ext4 ext4.img ext4.tar
    $ tar -tf ext4.tar
    tar: Removing leading `/' from member names
    /test.txt
    /lost+found/
      
      





But the most interesting program here, in my opinion, is lklfuse:







    $ mkdir mountpoint
    $ $LKL/tools/lkl/lklfuse -o type=ext4 ext4.img mountpoint/
    $ ls mountpoint/
    lost+found  test.txt
    $ mount
    sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
    proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
    ...
    /dev/fuse on /run/user/1000/doc type fuse (rw,nosuid,nodev,relatime,user_id=1000,group_id=1000)
    lklfuse on /path/to/mountpoint type fuse.lklfuse (rw,nosuid,nodev,relatime,user_id=1000,group_id=1000)
    $ echo ABC > mountpoint/ABC.XYZ
    $ umount mountpoint
    $ sudo mount ext4.img /mnt
    $ ls /mnt
    $ cat /mnt/ABC.XYZ
    ABC
      
      





In my opinion, this is impressive: you can mount a file system through FUSE without root (though that depends on the system settings), work with it, unmount it, then mount it with the host kernel (this time with root) and carry on as if nothing had happened.







Not only does lklfuse allow an ordinary user to mount a partition using the regular kernel driver; the host kernel does not even have to be built with support for that file system. What's more, I would not be surprised if it all runs the same way on OS X.







A bit about cross-platform support



And what about access to Linux file systems from other operating systems? On OS X I think it will be simpler: after all, it is a full-fledged UNIX, and FUSE support seems to exist there. So there is hope that it will work out of the box. If not, I would start by checking whether constants with LKL_ prefixes are passed to the LKL system calls everywhere, rather than their host counterparts.







Windows is a bit more complicated. First, some libraries that are commonplace in the UNIX world may be missing (for example, for parsing command-line arguments). Second, you need to figure out how to mount into the host's file system tree. The simplest way would again be through FUSE. I have heard that there once was a certain Dokan and that something similar exists now, but you would have to search for it. The main thing is that LKL itself does build on Windows; you only need to keep in mind that it requires a 64-bit long type to work in 64-bit mode, so not every compiler will do (at least, that is what the project's current README says).
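The 64-bit long requirement is really about the C data model. 64-bit Linux and other UNIX-like systems use LP64, where long and pointers are both 64 bits wide, while the native 64-bit Windows toolchains use LLP64, where long stays at 32 bits. Here is a small sketch for checking what a given compiler does; the function names are mine and have nothing to do with LKL itself.

```c
#include <limits.h>
#include <stddef.h>

/* Width of `long` in bits for the current compiler and target. */
int long_bits(void)
{
    return (int)(sizeof(long) * CHAR_BIT);
}

/* Width of a data pointer in bits. */
int pointer_bits(void)
{
    return (int)(sizeof(void *) * CHAR_BIT);
}

/* Name of the data model, judged only by long and pointer widths. */
const char *data_model(void)
{
    if (pointer_bits() == 64 && long_bits() == 64)
        return "LP64";      /* typical 64-bit UNIX-like targets */
    if (pointer_bits() == 64 && long_bits() == 32)
        return "LLP64";     /* typical native 64-bit Windows targets */
    if (pointer_bits() == 32 && long_bits() == 32)
        return "ILP32";     /* typical 32-bit targets */
    return "other";
}

/* 1 when `long` is wide enough to hold a pointer, which is what
 * kernel code commonly takes for granted. */
int long_holds_pointer(void)
{
    return long_bits() >= pointer_bits();
}
```

If data_model() reports LLP64 for your toolchain, that compiler is presumably one of those that "will not do" for a 64-bit LKL build.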







