Cage Remote File Access System

System purpose

Cage provides remote access to files on computers on a network. The system “virtually” supports all basic file operations (creating, deleting, reading, writing, etc.) by exchanging transactions (messages) over TCP.

Areas of use

The system is useful in the following cases:

- in native applications for mobile and embedded devices (smartphones, on-board control systems, etc.) that need fast access to files on remote servers despite likely temporary interruptions in communication (going offline);
- in heavily loaded DBMSs, where requests are processed on some servers and data is stored on others;
- in distributed corporate networks for collecting and processing information that require high-speed data exchange, redundancy and reliability;
- in complex systems with a microservice architecture, where delays in the exchange of information between modules are critical.

Structure

The Cage system (currently implemented as a beta version in Python 3.7 on Windows) consists of two main parts:

- Cageserver, a file server program (a package of functions) that runs on the networked computers whose files need to be accessed remotely;
- the Cage class, with a library of methods for client software that simplifies coding the interaction with servers.

Using the system on the client side

The methods of the Cage class replace the usual, “routine” file system operations: creating, opening, closing and deleting files, as well as reading and writing data in binary format (given the position and size of the data). Conceptually, these methods are close to the file functions of the C language, where files are opened and closed through input/output “channels”.

In other words, the programmer works not with the methods of “file” objects (the classes of the _io module in Python), but with the methods of the Cage class.

When a Cage object is created, it establishes the initial connection with the server (or several servers), is authorized by its client Id, and receives confirmation with the dedicated port number for all file operations. When a Cage object is deleted, it instructs the server to end communication and close the files. The servers themselves can also initiate termination of communication.

The system improves read/write performance by buffering the file fragments that client programs use most often in a RAM cache (buffer).

Client software can use any number of Cage objects with various settings (the amount of buffer memory, the size of blocks when exchanging with the server, etc.).

A single Cage object can exchange data with multiple files on multiple servers. The parameters for communication (IP address or DNS name, main port for authorization, path and file name) are set when creating the object.

Because each Cage object can work with multiple files at the same time, a shared memory space is used for buffering. The cache size (the number of pages and their size) is set when the Cage object is created. For example, a 1 GB cache means 1,000 pages of 1 MB each, or 10 thousand pages of 100 KB each, or 1 million pages of 1 KB each. Choosing the page size and the number of pages is a task specific to each application.
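
The arithmetic in this example can be checked with a short sketch. It assumes decimal units (1 GB = 1,000 MB = 1,000,000 KB), as the figures above do; the helper pages_for_cache is hypothetical and is not part of Cage.

```python
# Hypothetical helper, not part of the Cage API: how many whole pages of a
# given size fit into a cache budget. Decimal units are assumed here.
KB = 1_000
MB = 1_000 * KB
GB = 1_000 * MB


def pages_for_cache(cache_bytes: int, pagesize: int) -> int:
    """Return the number of whole pages that fit into cache_bytes."""
    if pagesize <= 0:
        raise ValueError("pagesize must be positive")
    return cache_bytes // pagesize


# The three layouts of a 1 GB cache mentioned in the text.
for pagesize in (1 * MB, 100 * KB, 1 * KB):
    print(pagesize, pages_for_cache(1 * GB, pagesize))
```

The resulting pair (pagesize, number of pages) corresponds to the pagesize and numpages parameters of the Cage constructor.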

You can use several Cage objects at the same time to define different buffer memory settings, depending on how the information in different files is accessed. The basic buffering algorithm is the simplest one: once the specified amount of memory has been exhausted, new pages displace old ones, and the pages with the fewest hits are evicted first. Buffering is especially effective when access is uneven (in the statistical sense), first, across different files and, second, across fragments of each file.
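
The eviction rule described above (the page with the fewest hits goes first) can be illustrated with a minimal sketch. This is not the Cage implementation: the class LeastHitsCache, its methods and the (channel, page number) key are assumptions made for the example.

```python
class LeastHitsCache:
    """Fixed-capacity page cache that evicts the page with the fewest hits.

    Illustrative only; names and structure are assumptions, not Cage code.
    """

    def __init__(self, numpages: int):
        if numpages <= 0:
            raise ValueError("numpages must be positive")
        self.numpages = numpages
        self.pages = {}  # (channel, page_no) -> page bytes
        self.hits = {}   # (channel, page_no) -> hit counter

    def get(self, key):
        """Return the cached page or None; a successful lookup counts as a hit."""
        page = self.pages.get(key)
        if page is not None:
            self.hits[key] += 1
        return page

    def put(self, key, page: bytes) -> None:
        """Store a page, evicting the least-hit page if the cache is full."""
        if key not in self.pages and len(self.pages) >= self.numpages:
            # Ties are broken in favour of the page inserted earliest.
            victim = min(self.hits, key=self.hits.get)
            del self.pages[victim]
            del self.hits[victim]
        self.pages[key] = page
        self.hits.setdefault(key, 0)
```

A real cache would presumably also have to send a modified page to the server before evicting it; the sketch leaves that out.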

The Cage class supports input/output not only by data address (specifying the position and length of the array, thus “replacing” file system operations), but also at a lower, “physical” level: by page number in the buffer memory.
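
The link between the two levels can be sketched as follows: a request addressed by position and length is split into per-page pieces. The function pages_for_range is hypothetical and only illustrates the arithmetic; how Cage does this internally is not described here.

```python
def pages_for_range(begin: int, length: int, pagesize: int):
    """Yield (page_no, offset_in_page, chunk_len) for bytes [begin, begin + length).

    Hypothetical helper for illustration; not part of the Cage API.
    """
    if begin < 0 or length < 0 or pagesize <= 0:
        raise ValueError("begin and length must be >= 0, pagesize must be > 0")
    pos = begin
    end = begin + length
    while pos < end:
        # Page number and offset inside that page for the current position.
        page_no, offset = divmod(pos, pagesize)
        # Take bytes up to the end of the page or the end of the request.
        chunk = min(pagesize - offset, end - pos)
        yield page_no, offset, chunk
        pos += chunk
```

For example, with 1024-byte pages, a 200-byte request starting at position 1000 touches the last 24 bytes of page 0 and the first 176 bytes of page 1.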

Cage objects support a distinctive “hibernation” (“sleep”) function: an object can be “minimized” (for example, when the connection to the server is lost or the application is stopped) to a local dump file on the client side and then quickly restored from this file (after communication resumes or the application is restarted). This significantly reduces traffic when the client program is reactivated after being temporarily offline, because frequently used file fragments are already in the cache.
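
The idea of hibernation can be sketched as saving the page cache to a local file and loading it back later. The dump format that Cage actually uses is not described here; the sketch assumes pickle, and the functions hibernate, wake_up and roundtrip_demo are hypothetical.

```python
import os
import pickle
import tempfile


def hibernate(pages: dict, cache_file: str) -> None:
    """Write the in-memory page cache to a local dump file (assumed format: pickle)."""
    with open(cache_file, "wb") as f:
        pickle.dump(pages, f, protocol=pickle.HIGHEST_PROTOCOL)


def wake_up(cache_file: str) -> dict:
    """Restore a page cache from a dump file written by hibernate()."""
    with open(cache_file, "rb") as f:
        return pickle.load(f)


def roundtrip_demo() -> dict:
    """Dump a tiny cache to a temporary file and load it back."""
    pages = {("channel0", 3): b"cached page"}
    path = os.path.join(tempfile.mkdtemp(), "cage_demo.dump")
    hibernate(pages, path)
    return wake_up(path)
```

Because pickle can execute code on loading, a dump file like this should only be read if it comes from a trusted source.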

Cage is about 3,600 lines of code.

Principles of building servers

A Cageserver file server can be started with an arbitrary number of ports, one of which (the “main” port) is used only for authorizing all clients, while the rest are used for data exchange. The Cageserver program requires only Python. The computer hosting the file server can do any other work in parallel.

The server initially starts as a combination of two main processes:

- “Connections” - a process that establishes communication with clients and terminates it at the server’s initiative;
- “Operations” - a process that performs clients’ file tasks (operations) and closes communication sessions on client commands.

The two processes are not synchronized with each other. Each is organized as an endless loop of receiving and sending messages, built on multiprocessing queues, proxy objects, locks and sockets.

The “Connections” process provides each client with a port for receiving and transmitting data. The number of ports is set when the server starts. The correspondence between ports and clients is stored in a proxy memory shared between processes.
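
A minimal sketch of such a port table is shown below. It assumes that each client gets its own port and that the pool starts with the port right after the main one; the class PortTable and the port numbers in the usage note are hypothetical. A plain dict stands in for the shared proxy memory (in a multi-process server, multiprocessing.Manager().dict() could play that role).

```python
class PortTable:
    """Maps client ids to data ports taken from a fixed pool (illustrative only)."""

    def __init__(self, main_port: int, num_ports: int):
        if num_ports < 1:
            raise ValueError("at least one data port is required")
        # The pool starts with the port right after the main (authorization) port.
        self.free = list(range(main_port + 1, main_port + 1 + num_ports))
        self.by_client = {}  # client id -> assigned data port

    def assign(self, client_id: str):
        """Return the client's data port, or None if the pool is exhausted."""
        if client_id in self.by_client:
            return self.by_client[client_id]
        if not self.free:
            return None
        port = self.free.pop(0)
        self.by_client[client_id] = port
        return port

    def release(self, client_id: str) -> None:
        """Return the client's port to the pool when the session ends."""
        port = self.by_client.pop(client_id, None)
        if port is not None:
            self.free.append(port)
```

With a main port of 3570 and two data ports, the first two clients would receive 3571 and 3572, and a third client would have to wait until one of them is released.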

The “Operations” process supports sharing of file resources: several different clients can read data from one file together (quasi-parallel, since access is controlled by locks), if this was allowed when the file was first opened by the “first” client.

Commands for creating, deleting, opening and closing files on a server are processed in the “Operations” process strictly sequentially, using the file subsystem of the server OS.

To speed up reading and writing in general, these operations are carried out in threads spawned by the “Operations” process. The number of threads is usually equal to the number of open files. Read/write tasks from clients are submitted to a common queue, and the first thread to become free takes the task at the head of the queue. Special logic eliminates data overwriting operations in the server's RAM.
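
The common queue served by several threads can be sketched with the standard queue and threading modules. This is a generic worker-pool pattern under assumed names (run_tasks, the task tuples), not the Cageserver code.

```python
import queue
import threading


def run_tasks(tasks, num_threads: int) -> dict:
    """Run (task_id, func) pairs on worker threads fed from one shared queue."""
    task_queue = queue.Queue()
    results = {}
    results_lock = threading.Lock()

    def worker():
        while True:
            item = task_queue.get()  # the first free thread takes the head task
            if item is None:         # sentinel: no more work for this thread
                break
            task_id, func = item
            value = func()
            with results_lock:
                results[task_id] = value

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for task in tasks:
        task_queue.put(task)
    for _ in threads:
        task_queue.put(None)  # one sentinel per worker
    for thread in threads:
        thread.join()
    return results
```

queue.Queue is first-in, first-out, so tasks are handed out in the order in which they were submitted, although they may finish in a different order.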

The “Operations” process monitors client activity and stops serving clients both on their commands and when the inactivity timeout is exceeded.

To ensure reliability, Cageserver logs all transactions. One general log contains copies of client messages with tasks to create, open, rename or delete files. A separate log is created for each working file; it records copies of messages with tasks for reading and writing data in that file, as well as the arrays of written (new) data and the arrays of data that were destroyed by overwriting (writing new data “over” old data).

These logs make it possible both to re-apply new changes to backups and to “roll back” the current content to a chosen moment in the past.
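
How a log that keeps both the new and the destroyed data allows replay and rollback can be sketched on an in-memory byte array. The record layout (begin, old, new) and the functions apply_write, roll_back and replay are assumptions for illustration; the sketch also assumes that writes never extend the file.

```python
def apply_write(data: bytearray, log: list, begin: int, new: bytes) -> None:
    """Overwrite data at begin and log both the new and the destroyed bytes."""
    end = begin + len(new)
    if begin < 0 or end > len(data):
        raise ValueError("write outside of file bounds")
    old = bytes(data[begin:end])
    log.append((begin, old, new))
    data[begin:end] = new


def roll_back(data: bytearray, log: list, keep: int) -> None:
    """Undo logged writes, newest first, until only the first `keep` remain."""
    while len(log) > keep:
        begin, old, _new = log.pop()
        data[begin:begin + len(old)] = old


def replay(backup: bytearray, log: list) -> None:
    """Re-apply logged writes, oldest first, to a backup copy."""
    for begin, _old, new in log:
        backup[begin:begin + len(new)] = new
```

Replaying the log over a backup brings the backup up to date, while undoing records in reverse order returns the current content to an earlier state.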

Cageserver is about 3,100 lines of code.

Starting the Cageserver file server program

At startup, the program asks in a dialog for:

- the main port for authorization;

- the number of ports for exchanging transactions with authorized clients (1 or more; the pool of port numbers begins with the number immediately following the main port).

Using the Cage Class

class cage.Cage(cage_name="", pagesize=0, numpages=0, maxstrlen=0, server_ip={}, wait=0, awake=False, cache_file="")

Objects of this class interact with file servers and contain buffer memory.

Parameters

cage_name (str) - the conditional name of the object, used to identify clients on the server side
pagesize (int) - size of one page of buffer memory (in bytes)
numpages (int) - number of pages of buffer memory
maxstrlen (int) - maximum length of a byte string in write and read operations
server_ip (dict) - a dictionary with the addresses of the servers used, where the key is the conditional name of the server (the server id inside the application) and the value is a string with the address: “ip address:port” or “DNS:port” (the mapping between names and real addresses is temporary and can be changed)
wait (int) - time to wait for a response from the server when receiving ports (in seconds)
awake (bool) - flag that selects how the object is created: False if a new object is created, True if the object is restored from one previously “minimized” by the “hibernation” operation (False by default)
cache_file (str) - file name for hibernation
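
A hedged sketch of a constructor call with these parameters follows. The function make_cage is hypothetical and takes the class as an argument, so that the sketch does not depend on how the package is installed; the object name, server name, address, port and sizes are made-up example values.

```python
def make_cage(cage_cls):
    """Create a Cage-like object; cage_cls is expected to be cage.Cage.

    All values below are example values chosen for illustration.
    """
    return cage_cls(
        cage_name="demo_client",                      # name shown to the server
        pagesize=64 * 1024,                           # 64 KiB per page
        numpages=256,                                 # 256 pages, 16 MiB in total
        maxstrlen=1024 * 1024,                        # largest byte string per call
        server_ip={"main_server": "127.0.0.1:3570"},  # assumed address and main port
        wait=5,                                       # seconds to wait for ports
        awake=False,                                  # new object, not a restored one
        cache_file="",                                # no hibernation file
    )
```

Assuming the class is importable as cage.Cage, the call would be make_cage(cage.Cage).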

Methods

Cage.file_create(server, path) - create a new file

Cage.file_rename(server, path, new_name) - rename a file

Cage.file_remove(server, path) - delete a file

Cage.open(server, path, mod) - open a file

Returns the channel number, fchannel. The mod parameter is the file open mode: “wm” is exclusive (read/write); “rs” is read-only, shared with other clients for reading only; “ws” is read/write, shared with other clients for reading only.

Cage.close(fchannel) - close the file

Cage.write(fchannel, begin, data) - write a byte string to a file

Cage.read(fchannel, begin, len_data) - read a byte string from a file

Cage.put_pages(fchannel) - “pushes” all modified pages of the specified channel from the buffer to the server. It is used at those points in the algorithm where you need to be sure that all operations on the channel have been physically saved to the file on the server.

Cage.push_all() - “pushes” all modified pages of all channels of the Cage instance from the buffer to the server. It is used when you need to be sure that all operations on all channels have been saved on the server.
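
The methods above can be combined as in the following sketch. Only the method names and argument order come from the list above; the function write_and_read_back is hypothetical, and it assumes that read returns the bytes read, since return values and error reporting are not described here.

```python
def write_and_read_back(cage_obj, server: str, path: str, payload: bytes) -> bytes:
    """Create a file, write payload at position 0, flush it and read it back.

    cage_obj is expected to be a Cage instance. Assumption: read() returns
    the bytes that were read.
    """
    cage_obj.file_create(server, path)
    fchannel = cage_obj.open(server, path, "wm")  # exclusive read/write mode
    try:
        cage_obj.write(fchannel, 0, payload)
        cage_obj.put_pages(fchannel)  # make sure the data reached the server
        return cage_obj.read(fchannel, 0, len(payload))
    finally:
        cage_obj.close(fchannel)
```

The server argument is the conditional server name used as a key in the server_ip dictionary.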
