Benchmarking ORMs used in Android applications

Hello, Habr! My name is Artyom Dobrovinsky and I am an Android developer at FINCH.







Once, wrapped in the smoke of a morning cigar, I was reading the sources of one ORM for Android. Spotting a package called benchmarks, I immediately looked inside, and was surprised to find that all the measurements were done with Log.d() around System.nanoTime(). This is not the first time I have seen this. To be honest, I have even seen benchmarks built on System.currentTimeMillis(). The dawning realization that something had to change forced me to put aside my glass of whiskey and sit down at the keyboard.







Why this article was written



The state of understanding how to measure code performance on Android is sad.

No matter how much people talk about profilers, even in 2019 some remain convinced that the JVM executes exactly what the developer wrote, in exactly the order in which the code is written. Nothing could be further from the truth.







In reality, the poor virtual machine fends off a billion careless keyboard-bashers who write their code without ever thinking about how the processor will actually execute it. This battle has been going on for years, and the VM has a million tricky optimizations up its sleeve that, if ignored, will turn any measurement of program performance into a waste of time.







In other words, developers sometimes do not consider it necessary to measure the performance of their code, and even more often do not know how. The difficulty is that to run a fair evaluation you need to create conditions that are as identical and as ideal as possible for every case; only then do you get useful information. Such conditions are created by tools that were not knocked together on one's knee.







If you need arguments about whether to use third-party frameworks for measuring performance, you can always read Aleksey Shipilёv and marvel at the depth of the problem. It is all in the linked article: why warmup is needed before benchmarking, why System.currentTimeMillis() cannot be trusted at all when measuring elapsed time, plus a generous helping of jokes. Excellent reading.







Why can I talk about this?


The thing is, I am a well-rounded developer: not only do I know the Android SDK as if it were my own pet project, but I also spent a month writing backend code.







When I brought my first microservice to review and there was no benchmarking section in the README, the reviewer looked at me with incomprehension. I remembered this and never repeated that mistake again. Because he left a week later.







Let's go.







What are we measuring



As part of this database-benchmarking exercise for Android, I decided to measure initialization speed and write/read speed for the likes of Paper, Hawk, Realm and Room.

Yes, I am measuring NoSQL and a relational database in one test. Any other questions?







What we measure with



It would seem that when we are talking about the JVM, the choice is obvious: there is the celebrated, polished and impeccably documented JMH. But no, it does not run instrumentation tests on Android.







Google's Caliper comes next, with the same result.







There is a fork of Caliper called Spanner, which has been deprecated for years and encourages the use of AndroidX Benchmark instead.







Let us focus on the latter, if only because we have no choice.







Like everything added to Jetpack without being rethought during the migration from the Support Library, AndroidX Benchmark looks and behaves as if it were written in a week and a half as a test assignment that no one will ever touch again. On top of that, the library is slightly beside the point here, since it is aimed more at evaluating UI tests. But for want of anything better, we can work with it. At the very least it will save us from obvious mistakes, and it will also help with warmup.
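For orientation, a minimal AndroidX Benchmark skeleton looks roughly like this (class and method names here are mine; BenchmarkRule and measureRepeated are the library's real API). The rule handles warmup and repetition for you:

```kotlin
import androidx.benchmark.junit4.BenchmarkRule
import androidx.benchmark.junit4.measureRepeated
import androidx.test.ext.junit.runners.AndroidJUnit4
import org.junit.Rule
import org.junit.Test
import org.junit.runner.RunWith

@RunWith(AndroidJUnit4::class)
class SomeBenchmark {

    @get:Rule
    val benchmarkRule = BenchmarkRule() // handles warmup and repetition

    @Test
    fun measureSomething() = benchmarkRule.measureRepeated {
        // Only the code inside this lambda is timed
        doWork()
    }

    private fun doWork() { /* the operation under test */ }
}
```

This runs as an Android instrumentation test on a device, not on the local JVM.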







To reduce noise in the results, I will run every test 10 times and compute the average.
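To make the averaging concrete, here is a hand-rolled repeat-and-average helper (plain JVM Kotlin, my own sketch; a real harness such as JMH or AndroidX Benchmark also does warmup and outlier control, which this deliberately omits):

```kotlin
// Naive repeat-and-average timing helper, for illustration only.
fun measureAverageNanos(runs: Int = 10, block: () -> Unit): Double {
    val samples = LongArray(runs) {
        val start = System.nanoTime()
        block()
        System.nanoTime() - start
    }
    return samples.average()
}

var sink = 0L // consume results so the work cannot be eliminated as unused

fun main() {
    val avg = measureAverageNanos(10) { sink += (1..1_000).sumOf { it.toLong() } }
    println("average: %.1f ns".format(avg))
}
```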







The test device is a Xiaomi A1. Not the weakest phone on the market, and it runs "clean" Android.







Connecting the library to the project



There are excellent instructions for connecting AndroidX Benchmark to a project. I strongly advise you not to be lazy: set up a separate module for the measurements.
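The official setup boils down to roughly the following in the benchmark module's build script (a sketch in Gradle Kotlin DSL; the library version here is an assumption, check the current one in the docs):

```kotlin
// benchmark/build.gradle.kts (illustrative sketch)
plugins {
    id("com.android.library")
    id("kotlin-android")
    id("androidx.benchmark") // the AndroidX Benchmark Gradle plugin
}

android {
    defaultConfig {
        // Runner that locks clocks / checks device state before measuring
        testInstrumentationRunner =
            "androidx.benchmark.junit4.AndroidBenchmarkRunner"
    }
}

dependencies {
    androidTestImplementation("androidx.benchmark:benchmark-junit4:1.1.0")
}
```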







Experiment procedure



All our benchmarks will be executed in the following order:







  1. First, we initialize the database in the body of the test.
  2. Then, inside a benchmarkRule.scope.runWithTimingDisabled block, we generate the data we feed to the database. Code placed in this closure is not counted in the measurement.
  3. In the same closure we add the logic for clearing the database, and make sure the database is empty before writing.
  4. Next comes the write and read logic. Be sure to assign the result of the read to a variable, so that the JVM does not eliminate this logic as unused code.
  5. We measure database initialization performance in a separate function.
  6. We feel like real scientists.


The code can be found here. If you are too lazy to go there, the measurement function for PaperDb looks like this:







 @Test
 fun paperdbInsertReadTest() = benchmarkRule.measureRepeated {
     // Clear the database (this block is excluded from the measurement)
     benchmarkRule.scope.runWithTimingDisabled {
         Paper.book().destroy()
         if (Paper.book().allKeys.isNotEmpty()) throw RuntimeException()
     }

     // Write, then read; the read result is assigned so it is not eliminated
     repository.store(persons) { list -> Paper.book().write(KEY_CONTACTS, list) }
     val persons = repository.read {
         Paper.book().read<List<Person>>(KEY_CONTACTS, emptyList())
     }
 }





The benchmarks for the other ORMs look similar.







Results



Initialization


test name | mean | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
HawkInitTest | 49_512 | 49_282 | 50_021 | 49_119 | 50_145 | 49_970 | 50_047 | 46_649 | 50_230 | 49_863 | 49_794
PaperdbInitTest | 224 | 223 | 223 | 223 | 233 | 223 | 223 | 223 | 223 | 223 | 223
RealmInitTest | 218 | 217 | 217 | 217 | 217 | 217 | 217 | 217 | 227 | 217 | 217
RoomInitTest | 61_695.5 | 63_450 | 59_714 | 58_527 | 59_175 | 63_544 | 62_980 | 63_252 | 59_670 | 63_868 | 62_775


The winner is Realm, with Paper in second place. What Room is doing for so long you can still imagine; that Hawk takes almost the same amount of time is completely incomprehensible.







Writing and reading


test name | mean | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
HawkInsertReadTest | 278_736_469.2 | 278_098_654 | 283_956_846 | 276_748_308 | 282_447_384 | 272_609_500 | 284_699_653 | 271_869_770 | 278_719_693 | 278_836_115 | 279_378_769
PaperdbInsertReadTest | 173_519_957.3 | 172_953_347 | 174_702_000 | 169_740_846 | 174_401_192 | 173_930_037 | 174_179_616 | 173_937_460 | 173_739_115 | 176_215_038 | 171_400_922
RealmInsertReadTest | 111_644_042.3 | 108_501_578 | 110_616_078 | 102_056_461 | 112_946_577 | 111_701_231 | 114_922_962 | 106_198_000 | 118_742_498 | 120_888_230 | 109_866_808
RoomInsertReadTest | 1_863_499_483.3 | 1_872_503_614 | 1_837_078_614 | 1_872_482_538 | 1_827_338_460 | 1_869_147_999 | 1_857_126_229 | 1_842_427_537 | 1_870_630_652 | 1_878_862_538 | 1_907_396_652


Realm wins here again, but these results smell of something gone wrong.







A four-fold difference between the two "slowest" databases, and a sixteen-fold difference between the "fastest" and the "slowest", is very suspicious. Even considering that the difference is stable across runs.







Conclusion



Measuring the performance of your code is worth doing, if only out of curiosity. Even in cases as far from industrial practice as this one (such as evaluating instrumentation tests on Android).







There is every reason to bring in third-party frameworks for this job (rather than writing your own, with timings and cheerleaders).







The situation in codebases today is that everyone tries to follow clean architecture, and for most projects the module with business logic is a plain Java module. Connecting a JMH module next to it and checking the code for bottlenecks is a day of work. And the benefit lasts for years.
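For a plain JVM module, such a JMH benchmark can look roughly like this (the class, method and data here are mine for illustration; the annotations and Blackhole are JMH's real API, and the JMH dependency plus a runner or Gradle plugin are assumed to be set up). The class is `open` because Kotlin classes are final by default and JMH needs to subclass the state:

```kotlin
import org.openjdk.jmh.annotations.Benchmark
import org.openjdk.jmh.annotations.Measurement
import org.openjdk.jmh.annotations.Scope
import org.openjdk.jmh.annotations.State
import org.openjdk.jmh.annotations.Warmup
import org.openjdk.jmh.infra.Blackhole

@State(Scope.Benchmark)
open class MappingBenchmark {

    private val input = List(1_000) { it }

    @Benchmark
    @Warmup(iterations = 5)       // JIT warmup before real measurement
    @Measurement(iterations = 10) // measured iterations
    fun mapAndFilter(bh: Blackhole) {
        // Blackhole consumes the result so dead-code elimination cannot remove it
        bh.consume(input.map { it * 2 }.filter { it % 3 == 0 })
    }
}
```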







Happy coding!







PS: If an attentive reader knows of a framework for benchmarking instrumentation tests on Android that is not listed in this article, please share it in the comments.







PPS: The test repository is open to pull requests.







