
A short note about Coffer, an embedded key-value database written in Go. Very briefly: when the database is stopped, the data lives on disk; when it starts, the data is copied into memory. Reads are served from memory. On writes, the in-memory data is changed and the changes are appended to a log on disk. The maximum size of the stored data is therefore limited by the size of the RAM. The API lets you register handlers for database records and apply them in transactions while maintaining data consistency.
But first, a short lyrical introduction. Once upon a time, when the grass was greener, I needed to embed a key-value database in a Go application. After looking around and bumping into various packages, I somehow did not find what I wanted (subjectively), and simply went with an external relational database. A perfectly workable solution. But, as they say, the spoon was found, yet the bad aftertaste remained. Above all I wanted a truly native database, written in Go through and through. Such databases exist, just look at awesome-go. There are not millions of them, though, which is even surprising when you consider how rare a programmer is who has never in his life written a database, a framework, or a casual game.
Well, you can always try to hack together your own bicycle on your knee, with blackjack and other goodies. At the same time everyone knows, or at least suspects, that writing even a simple key-value database only looks simple at first glance. In practice everything is much more fun (and so it turned out). And I was overcome by curiosity about ACID and transactions. Transactions more in the financial sense, admittedly, because at the time I was busy in fintech.
Data security
Consider the case when, during operation of an application that is actively writing, the computer's power supply suddenly gives up the ghost but the disk survives. If at that moment the application had already received an ok from the database, then the data from that operation will not be lost. If the application received a negative answer, then of course the operation was not completed. And then there is the case where the application sent a request but did not receive a response: most likely the operation did not complete, but there is a small chance that it did make it into the log, and the power went off exactly while the response was being sent.
In that last case, how do you find out what happened to the final operations? This is an interesting question. You can guess indirectly (draw conclusions) by reading the value of the record of interest from the database after the application starts again. However, if operations are frequent enough, I am afraid that will not help. You can also look at the last log file (it will have the highest number), but doing that manually is inconvenient. I think that in the future the API could gain the ability to view logs (naturally, the logs must not be deleted in that case).
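As a minimal sketch of that indirect check (the key name and the value you compare against are of course just placeholders; the calls are the ones from the API section below):

```go
package main

import (
	"fmt"

	"github.com/claygod/coffer"
)

func main() {
	// After a crash, open the same directory again: the logs are replayed
	// and the data is loaded back into memory.
	db, err, wrn := coffer.Db("./data").Create()
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	if wrn != nil {
		fmt.Println("Warning:", wrn)
	}
	if !db.Start() {
		return
	}
	defer db.Stop()

	// Inspect the record the in-flight operation was supposed to change.
	// If it already holds the new value, the write made it into the log in time.
	rep := db.Read("balance")
	if rep.IsCodeError() {
		fmt.Printf("Read error: code `%d` msg `%s`\n", rep.Code, rep.Error)
		return
	}
	fmt.Println("current value:", string(rep.Data))
}
```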
Frankly, I myself did not pull the cord out of the socket, because I do not want to risk iron for the sake of checking the database. In tests, I just corrupt normal log files, and in this case, everything happens as I expected. However, there is no experience in the practical use of the database, it did not work on the prod, and there are risks. However, for pet projects, I think the database can be used quite fearlessly. In general, the usual disclaimer, no guarantees.
The database is currently not protected against being used by two different applications (or two instances of the same one, it does not matter here) configured to work with the same directory. Please keep this in mind! Also, since the database is embedded, if you pass it some reference type in the arguments, you definitely should not be mutating that value in a parallel goroutine.
Configuration
The database has quite a few parameters that can be configured, but almost all of them have default values, so everything can fit into one short line: `cof, err, wrn := Db(dirPath).Create()`
It returns an error (if it is not nil, further work with the database is prohibited) and a warning, which is worth knowing about but does not prevent the database from operating.
I will not clutter the text with bulky descriptions; if you need them, please see the readme in the repository: github.com/claygod/coffer/blob/master/README_RU.md#config. Pay attention to the Handler method, which registers a handler for transactions; I will write a couple of lines about it below. Here I will just list the configuration methods (a small sketch of chaining them follows after the list):
- Db(dirPath)
- BatchSize(batchSize)
- LimitRecordsPerLogfile(limitRecordsPerLogfile)
- FollowPause(100 * time.Second)
- LogsByCheckpoint(1000)
- AllowStartupErrLoadLogs(true)
- MaxKeyLength(maxKeyLength)
- MaxValueLength(maxValueLength)
- MaxRecsPerOperation(1000000)
- RemoveUnlessLogs(true)
- LimitMemory(100 * 1000000)
- LimitDisk(1000 * 1000000)
- Handler("handler1", &handler1)
- Handler("handler2", &handler2)
- Handlers(map[string]*handler)
- Create()
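A minimal sketch of chaining these options (the numeric values here are arbitrary illustrations, not recommendations; everything you leave out keeps its default, and handler registration is shown separately in the Transactions section below):

```go
package main

import (
	"fmt"
	"time"

	"github.com/claygod/coffer"
)

func main() {
	// Override only what you need; the rest keeps its default value.
	cof, err, wrn := coffer.Db("./data").
		BatchSize(1000).
		LimitRecordsPerLogfile(1000).
		FollowPause(100 * time.Second).
		LogsByCheckpoint(1000).
		AllowStartupErrLoadLogs(true).
		MaxKeyLength(100).
		MaxValueLength(10000).
		MaxRecsPerOperation(1000000).
		RemoveUnlessLogs(true).
		LimitMemory(100 * 1000000).
		LimitDisk(1000 * 1000000).
		Create()
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	if wrn != nil {
		fmt.Println("Warning:", wrn)
	}
	if !cof.Start() {
		fmt.Println("Error: not started")
		return
	}
	defer cof.Stop()
}
```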
API
As far as possible I kept the API simple; for a key-value database there is no need to get too clever:
- Start - start the database
- Stop - stop the database
- StopHard - a stop regardless of the operations being performed right now (I will probably remove it)
- Save - save a snapshot of the current state of the database
- Write - add one record to the database
- WriteList - add several records to the database (strict and optional modes)
- WriteListUnsafe - add multiple records to the database without regard to data security
- Read - get one record by key
- ReadList - get a list of records
- ReadListUnsafe - get a list of records without regard to data security
- Delete - delete one record
- DeleteList - delete multiple records in strict / optional mode
- Transaction - execute a transaction
- Count - how many records in the database
- CountUnsafe - how many records in the database (a little faster, but unsafe)
- RecordsList - a list of all database keys
- RecordsListUnsafe - a list of all database keys (a little faster, but unsafe)
- RecordsListWithPrefix - a list of keys with the specified prefix
- RecordsListWithSuffix - a list of keys with the specified suffix
Short explanations of the API:
- Strict mode - do everything, or do nothing.
- Optional mode - do everything that can be done.
- StopHard - perhaps this method should be removed from the API; that is not yet decided.
- None of the RecordsList methods are fast, because the store currently has no indexes; for now it is a full scan.
- All the Unsafe methods are faster, but no consistency is implied when using them. It makes sense to use them on a stopped database for quickly filling it, or something else in the same vein (see the sketch after this list).
- The follower keeps the database snapshot regularly updated, so the Save method is rather for special cases when you definitely want a new snapshot created right now (no such case comes to my mind yet, but maybe one exists).
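For instance, a quick-fill sketch along those lines. Note the assumption here: I take it that WriteListUnsafe accepts a map of records, like WriteList, and can be called on a database that has not been started; check the readme before relying on this.

```go
package main

import (
	"fmt"

	"github.com/claygod/coffer"
)

func main() {
	db, err, _ := coffer.Db("./data").Create()
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	// Bulk-load while the database is stopped: no concurrent operations
	// are possible, so the safety machinery is not needed.
	db.WriteListUnsafe(map[string][]byte{
		"key1": []byte("value1"),
		"key2": []byte("value2"),
		"key3": []byte("value3"),
	})
	if !db.Start() {
		fmt.Println("Error: not started")
		return
	}
	defer db.Stop()
}
```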
A simple use case:
```go
package main

import (
	"fmt"

	"github.com/claygod/coffer"
)

const curDir = "./"

func main() {
	// STEP init
	db, err, wrn := coffer.Db(curDir).Create()
	switch {
	case err != nil:
		fmt.Println("Error:", err)
		return
	case wrn != nil:
		fmt.Println("Warning:", wrn)
		return
	}
	if !db.Start() {
		fmt.Println("Error: not started")
		return
	}
	defer db.Stop()
	// STEP write
	if rep := db.Write("foo", []byte("bar")); rep.IsCodeError() {
		fmt.Printf("Write error: code `%d` msg `%s`\n", rep.Code, rep.Error)
		return
	}
	// STEP read
	rep := db.Read("foo")
	if rep.IsCodeError() {
		fmt.Printf("Read error: code `%v` msg `%v`\n", rep.Code, rep.Error)
		return
	}
	fmt.Println(string(rep.Data))
}
```
Transactions
As mentioned above, my definition of transactions may not coincide with the generally accepted one in database construction; perhaps they share only the idea. In this particular implementation, a transaction is driven by a handler specified at the database configuration stage (the Handler method). When we invoke a transaction with this handler, the database locks the records the handler will work with and passes their current values to it. The handler manipulates this data as it needs and returns the new values to the database, which saves them in the store. After that the records are unlocked and become available to other operations.
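A minimal sketch of such a handler. The handler signature and the Transaction call here are assumptions drawn from the repository's examples, so check the readme before relying on them:

```go
package main

import (
	"errors"
	"fmt"
	"strconv"

	"github.com/claygod/coffer"
)

const curDir = "./"

func main() {
	// The handler receives the current values of the locked records and
	// returns their new values. NOTE: this signature is an assumption.
	transfer := func(arg []byte, recs map[string][]byte) (map[string][]byte, error) {
		amount, err := strconv.Atoi(string(arg))
		if err != nil {
			return nil, err
		}
		from, _ := strconv.Atoi(string(recs["alice"]))
		to, _ := strconv.Atoi(string(recs["bob"]))
		if from < amount {
			return nil, errors.New("insufficient funds") // nothing is saved
		}
		recs["alice"] = []byte(strconv.Itoa(from - amount))
		recs["bob"] = []byte(strconv.Itoa(to + amount))
		return recs, nil
	}

	// Register the handler under a versioned name (see the advice below).
	db, err, _ := coffer.Db(curDir).Handler("transfer_v1", &transfer).Create()
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	if !db.Start() {
		return
	}
	defer db.Stop()

	db.Write("alice", []byte("100"))
	db.Write("bob", []byte("0"))

	// The database locks "alice" and "bob", hands their current values to
	// the handler, and stores whatever the handler returns.
	if rep := db.Transaction("transfer_v1", []string{"alice", "bob"}, []byte("42")); rep.IsCodeError() {
		fmt.Printf("Transaction error: code `%d` msg `%s`\n", rep.Code, rep.Error)
	}
}
```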
There are examples in the repo that show the essence of using transactions quite well. Out of curiosity I made a small financial example with debit and credit operations, transfers, purchases and sales. The example was very easy to write, and at the same time this knocked-together implementation is quite consistent and could suit various financial solutions, or, for example, logistics.
An important point: the handler code is not stored in the database. I had the idea of storing it in the log, but that seemed too wasteful, so I did not complicate things; accordingly, responsibility for keeping handlers consistent between database starts rests with the developer of the code using the database. Handlers must definitely not be changed if the application and the database did not stop cleanly. In that case you must first start the database with the old handlers and then stop it properly, so that a new data snapshot is created. To avoid getting confused, I advise putting a version number in handler names.
Receiving and processing responses
The database returns reports that carry a response status and the data. Since there are quite a few codes, and writing a switch that handles each of them is tedious, you may be tempted to check only for ok. You should not do that. The point is that a code can have the status Ok, Error, or Panic. With Ok everything is clear, but what about the other two? If the status is Error, the specific operation was not completed (or was only partially completed). The application must handle that error appropriately, but further work with the database is possible (and necessary). Panic is another matter: work with the database must be stopped.
The IsCodeError check makes it easy to catch all errors in one place, so if you are not interested in the details, handle them there and carry on. The IsCodePanic check covers all the cases in which work with the database must be stopped. In the simple case, a triple switch is enough to process a response (a sketch in code follows the list):
- IsCodeOk - everything went fine, continue working as usual
- IsCodeError - log the error from the report and keep working
- IsCodePanic - log the error from the report and stop working with the database
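In code, that triple check might look like this minimal sketch (the report methods are the ones listed above; the rest is ordinary glue):

```go
package main

import (
	"fmt"
	"log"

	"github.com/claygod/coffer"
)

func main() {
	db, err, _ := coffer.Db("./").Create()
	if err != nil {
		log.Println("Error:", err)
		return
	}
	if !db.Start() {
		return
	}
	defer db.Stop()

	rep := db.Read("foo")
	switch {
	case rep.IsCodeOk():
		// everything went fine: use the data and keep working
		fmt.Println(string(rep.Data))
	case rep.IsCodeError():
		// this particular operation failed, but the database is still usable
		log.Printf("read error: code `%d` msg `%s`", rep.Code, rep.Error)
	case rep.IsCodePanic():
		// further work with the database must be stopped
		log.Printf("db panic: code `%d` msg `%s`", rep.Code, rep.Error)
	}
}
```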
Offtopic
For the name I picked one of the possible English translations of the Russian word for a strongbox. Of course I would have preferred box, but that word is far too popular; I hope coffer will do as well.
The topic of ACID seems to me rather prone to holy wars, so I would say that Coffer strives toward it, but that is not a given, and I do not claim that it has succeeded.
Performance
I wrote the database with concurrency in mind from the start, and it is in that mode that it shows its efficiency (though that is probably putting it too loudly). In the results below the benchmark shows a throughput of 200k rps. This is of course an artificial bench, and reality will be completely different, because much depends on the size of the recorded data, the amount of data already written, the hardware, and the phase of the moon. But the trend is at least clear. If the database is used in single-threaded mode, with each request executed only after the response to the previous one has been received, it will be slow, and I would advise looking at other databases rather than Coffer.
- BenchmarkCofferTransactionSequence-4 2000 227928 ns/op
- BenchmarkCofferTransactionPar32HalfConcurent-4 100000 4199 ns/op
By the way, if anyone takes the time to clone the Coffer repository, please run the bench that lives in it if you can. I am very curious what performance the database will show on different machines. First of all, of course, it all depends on the disk. That became especially clear to me after I recently bought a new Samsung EVO. But do not worry, it is not a replacement for a dead disk; the old Toshiba continues to serve properly and now stores my video archive.
The built-in in-memory store is still a simple map, not even split into shards. Of course it could be improved a great deal, for example to make key selection by prefix and suffix fast. I have not done this yet because I see the main feature of the database in its transactions, and the performance bottleneck for transactions is working with the disk first, and only then working with memory.
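For illustration only (this is not Coffer's actual code), here is why prefix selection over a plain map without indexes is inevitably a full scan:

```go
package main

import (
	"fmt"
	"strings"
)

// keysWithPrefix must visit every key in the map: with no index,
// prefix selection is O(n) in the number of records.
func keysWithPrefix(store map[string][]byte, prefix string) []string {
	keys := make([]string, 0)
	for k := range store {
		if strings.HasPrefix(k, prefix) {
			keys = append(keys, k)
		}
	}
	return keys
}

func main() {
	store := map[string][]byte{
		"user:1": []byte("alice"),
		"user:2": []byte("bob"),
		"item:7": []byte("hammer"),
	}
	fmt.Println(keysWithPrefix(store, "user:"))
}
```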
License
At the moment the license allows storing up to ten million records in the database; that seemed to me a sufficient number. Further plans for the database's development are still taking shape.
In general, what interests me is the database being used specifically as a package, with the focus primarily on its API.
Conclusions
Lately I often come across the task of writing services with a high-availability requirement. Unfortunately, since that almost always implies several instances, an embedded database is not a good fit for such a case. That leaves the case of a regular application or service that exists in a single instance. It seems the rarer case to me, but it nevertheless exists, and in such a case it is nice to have a database that tries, whenever possible, not to lose the data entrusted to it. The Coffer I created tries to solve exactly that problem. Let's see how it does.
Acknowledgments
- To everyone who read the article to the very end
- To the commentators willing to share their opinions
- To those who sent private messages pointing out typos and errors in the text
- To the neighbor who turns on music at night
References
- DB repository: github.com/claygod/coffer
- Description in Russian: github.com/claygod/coffer/blob/master/README_RU.md