Hello, Habr! My name is Artyom Dobrovinsky and I am an Android developer at FINCH.
One morning, wrapped in cigar smoke, I was reading through the sources of an ORM for Android. Spotting a package called `benchmarks`, I immediately looked inside and was surprised to find that all the measurements were done with `Log.d(System.nanoTime())`. It was not the first time I had seen this; to be honest, I have even seen benchmarks built on `System.currentTimeMillis()`. The dawning realization that something had to change made me put down my glass of whiskey and sit down at the keyboard.
Why this article was written
The state of affairs around measuring code performance on Android is sad.
However much profilers get talked about, in 2019 some people remain convinced that the JVM executes exactly what the developer wrote, in exactly the order it was written. Nothing could be further from the truth.
In reality, the poor virtual machine is fending off a billion careless keyboard-pushers who write their code without ever wondering how the processor will cope with it all. This battle has been going on for years, and the VM has a million tricky optimizations up its sleeve that, if ignored, will turn any measurement of a program's performance into a waste of time.
In other words, developers sometimes see no need to measure the performance of their code, and even more often do not know how. The difficulty is that a meaningful performance evaluation requires conditions that are as identical and as controlled as possible for every case; only then do the numbers tell you anything. Such conditions are created by tools that were not knocked together on someone's knee.
If you need arguments for using third-party frameworks to measure performance, you can always read Alexei Shipilev and marvel at the depth of the problem. It is all in the linked article: why warmup is needed before benchmarking, why `System.currentTimeMillis()` cannot be trusted at all for measuring elapsed time, plus jokes thrown in for free. Excellent reading.
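To make the pitfall concrete, here is the kind of "benchmark" this article argues against. A deliberately naive sketch; `parseAll()` is a made-up stand-in workload, not code from any of the libraries discussed here:

```kotlin
import kotlin.system.measureNanoTime

// A made-up workload standing in for "the code under test".
fun parseAll(): List<Int> = (0 until 1_000).map { it.toString().toInt() }

fun main() {
    // No warmup, a single run, and the result is discarded: the JIT may start
    // compiling this code mid-measurement, and it is free to eliminate the
    // whole computation as dead. The printed number says nothing about
    // steady-state performance.
    val elapsed = measureNanoTime { parseAll() }
    println("parseAll took $elapsed ns")
}
```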
Why am I qualified to talk about this?
The thing is, I am a well-rounded developer: not only do I know the Android SDK like my own pet project, I also spent a month writing backend code.
When I brought my first microservice to review and there were no benchmarks in the README, the reviewer looked at me in bewilderment. I took note and never repeated that mistake. Mostly because he quit a week later.
Let's go.
What are we measuring
For this database-benchmarking exercise on Android, I decided to measure initialization speed and write/read speed for the following ORMs: Paper, Hawk, Realm, and Room.
Yes, I am measuring NoSQL storage and a relational database in the same test. Next question?
What we measure with
It would seem that when the JVM is involved, the choice is obvious: there is the celebrated, polished, and impeccably documented JMH. But no: it cannot run instrumentation tests for Android.
Next comes Google's Caliper, with the same result.
There is a fork of Caliper called Spanner, which has been deprecated for years and points users toward Androidx Benchmark.
So let's settle on the latter, if only because we have no choice.
Like everything that was added to Jetpack without being rethought during the migration from the Support Library, Androidx Benchmark looks and behaves as if it had been written in a week and a half as a test assignment that nobody would ever touch again. On top of that, the library is slightly beside the point here: it is aimed more at measuring UI tests. But for want of anything better, it will do. At the very least it saves us from the obvious mistakes and takes care of warmup.
To make the results less ridiculous, I will run every test 10 times and compute the mean.
The test device is a Xiaomi A1: not the weakest phone on the market, running "clean" (stock) Android.
Connecting a library to a project
There are excellent instructions for connecting Androidx Benchmark to a project. I strongly advise you not to be lazy: set up a separate module for the measurements.
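For orientation, here is roughly what the benchmark module's build script looks like. A minimal sketch in the Gradle Kotlin DSL; the artifact versions are assumptions from the time of writing, so check the official instructions for current ones:

```kotlin
// benchmark/build.gradle.kts
plugins {
    id("com.android.library")
    kotlin("android")
}

android {
    defaultConfig {
        // The benchmark runner tries to stabilize the device during runs
        // (sustained-performance mode, keeping the screen on).
        testInstrumentationRunner = "androidx.benchmark.junit4.AndroidBenchmarkRunner"
    }
}

dependencies {
    androidTestImplementation("androidx.benchmark:benchmark-junit4:1.0.0")
    androidTestImplementation("androidx.test.ext:junit:1.1.1")
}
```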
How the experiment runs
All our benchmarks run in the same sequence (the test-class scaffolding they share is sketched right after this list):
- First, we initialize the database in the body of the test.
- Then, inside a `benchmarkRule.scope.runWithTimingDisabled` block, we generate the data we will feed to the database. Code placed in this closure is excluded from the measurement.
- In the same closure we clear the database, making sure it is empty before every write.
- Next comes the write and read logic itself. Be sure to assign the result of the read to a variable so that the JVM does not throw the read away as unused code.
- We measure the performance of database initialization in a separate function.
- We feel like people of science.
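The shared scaffolding looks roughly like this. A minimal sketch assuming the JUnit4 artifact of Androidx Benchmark; the class name is made up:

```kotlin
import androidx.benchmark.junit4.BenchmarkRule
import androidx.test.ext.junit.runners.AndroidJUnit4
import org.junit.Rule
import org.junit.runner.RunWith

@RunWith(AndroidJUnit4::class)
class PaperBenchmark {
    // The rule drives warmup and repetition; measureRepeated hangs off it.
    @get:Rule
    val benchmarkRule = BenchmarkRule()

    // The measurement functions, like the Paper example below, go here.
}
```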
The code can be found here. If you are too lazy to click through, the measurement function for Paper looks like this:
```kotlin
@Test
fun paperdbInsertReadTest() = benchmarkRule.measureRepeated {
    // Preparation (excluded from the measurement).
    benchmarkRule.scope.runWithTimingDisabled {
        Paper.book().destroy()
        if (Paper.book().allKeys.isNotEmpty()) throw RuntimeException()
    }
    // Write.
    repository.store(persons, { list -> Paper.book().write(KEY_CONTACTS, list) })
    // Read; the result is assigned so the JVM cannot discard the read.
    val persons = repository.read { Paper.book().read<List<Person>>(KEY_CONTACTS, emptyList()) }
}
```
The benchmarks for the remaining ORMs look similar.
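For illustration, the Room counterpart might look like the sketch below; `db`, `dao`, and their `insertAll`/`getAll` methods are assumed to be set up in the test class (the exact definitions live in the repository):

```kotlin
@Test
fun roomInsertReadTest() = benchmarkRule.measureRepeated {
    benchmarkRule.scope.runWithTimingDisabled {
        db.clearAllTables() // start every iteration from an empty database
    }
    // Write.
    dao.insertAll(persons)
    // Read; assigned so the JVM cannot eliminate it as unused.
    val readBack = dao.getAll()
}
```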
Results
Initialization
test name | mean | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|---|
HawkInitTest | 49_512 | 49_282 | 50_021 | 49_119 | 50_145 | 49_970 | 50_047 | 46_649 | 50_230 | 49_863 | 49_794 |
PaperdbInitTest | 224 | 223 | 223 | 223 | 233 | 223 | 223 | 223 | 223 | 223 | 223 |
RealmInitTest | 218 | 217 | 217 | 217 | 217 | 217 | 217 | 217 | 227 | 217 | 217 |
RoomInitTest | 61_695.5 | 63_450 | 59_714 | 58_527 | 59_175 | 63_544 | 62_980 | 63_252 | 59_670 | 63_868 | 62_775 |
The winner is Realm, with Paper in second place. What Room spends its time on you can more or less imagine; why Hawk takes almost as long is a complete mystery.
Writing and reading
test name | mean | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|---|
HawkInsertReadTest | 278_736_469.2 | 278_098_654 | 283_956_846 | 276_748_308 | 282_447_384 | 272_609_500 | 284_699_653 | 271_869_770 | 278_719_693 | 278_836_115 | 279_378_769 |
PaperdbInsertReadTest | 173_519_957.3 | 172_953_347 | 174_702_000 | 169_740_846 | 174_401_192 | 173_930_037 | 174_179_616 | 173_937_460 | 173_739_115 | 176_215_038 | 171_400_922 |
RealmInsertReadTest | 111_644_042.3 | 108_501_578 | 110_616_078 | 102_056_461 | 112_946_577 | 111_701_231 | 114_922_962 | 106_198_000 | 118_742_498 | 120_888_230 | 109_866_808 |
RoomInsertReadTest | 1_863_499_483.3 | 1_872_503_614 | 1_837_078_614 | 1_872_482_538 | 1_827_338_460 | 1_869_147_999 | 1_857_126_229 | 1_842_427_537 | 1_870_630_652 | 1_878_862_538 | 1_907_396_652 |
Realm wins again, but something about these results smells like a failure.
A fourfold difference between the two "slowest" databases and a sixteenfold difference between the "fastest" and the "slowest" is very suspicious, even granted that the difference is stable from run to run.
Conclusion
Measuring the performance of your code is worth it, if only out of curiosity. Even when we are talking about cases the industry has barely road-tested (such as measuring instrumentation tests on Android).
There is every reason to bring in third-party frameworks for the job (rather than writing your own, with timings and cheerleaders).
The state of today's codebases is that everyone tries to follow clean architecture, and for most projects the business-logic module is a plain Java module. Hooking up a JMH module next to it and checking the code for bottlenecks is a day's work. The benefit lasts for years.
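To show how little ceremony that takes, here is a minimal JMH benchmark sketch; `ContactsBenchmark` and its workload are hypothetical, not code from the projects above. Note that JMH generates subclasses, so in Kotlin the benchmark class must be open (or processed with the allopen plugin):

```kotlin
import java.util.concurrent.TimeUnit
import org.openjdk.jmh.annotations.Benchmark
import org.openjdk.jmh.annotations.BenchmarkMode
import org.openjdk.jmh.annotations.Mode
import org.openjdk.jmh.annotations.OutputTimeUnit
import org.openjdk.jmh.annotations.Scope
import org.openjdk.jmh.annotations.State

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
open class ContactsBenchmark {
    // Hypothetical input; in a real project this would be your domain data.
    private val contacts = List(1_000) { "Person #$it" }

    // JMH handles warmup, forking, and dead-code protection on its own;
    // returning the result keeps the computation alive.
    @Benchmark
    fun formatContacts(): List<String> = contacts.map { it.uppercase() }
}
```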
Happy coding!
P.S. If an attentive reader knows of a framework for benchmarking instrumentation tests on Android that is not mentioned in this article, please share it in the comments.
P.P.S. The test repository is open to pull requests.