Good afternoon, Habr readers!
Prometheus and its ecosystem of exporters (agents) make a good toolkit for any administrator or developer: easy to deploy, (relatively) simple to configure, and with support for automatic service discovery.
But this post is not so much about Prometheus as about one of its notable agents, postgres_exporter, which collects metrics from PostgreSQL. If only everything were that simple...
Unfortunately, the postgres_exporter documentation is rather ascetic and covers only the general cases. But what if you are handed a DBMS cluster instance with several databases, and/or you want to collect metrics from several cluster instances at once?
A few words about the purpose of this note. I will not describe here how to build or configure Prometheus and postgres_exporter, or how they interact; all of that is covered in the documentation and in many other sources.
Instead, I would like to touch on some particular uses of postgres_exporter, namely collecting metrics with a single agent from:

- several databases of one cluster instance;
- several cluster instances;
- several instances with several databases at once.

Subjectively, the pros and cons.

Pros:

- one agent instead of a whole fleet of them;
- new databases are picked up automatically, without restarting the agent.

Cons:

- the default metric sets have to be disabled;
- a few rough edges, such as a duplicate connection to the postgres database, which we will run into below.
Well, let's move on to practice. Suppose we have a PostgreSQL instance with several databases, and we need to collect metrics for the instance and for all of its databases.
Why do I separate database metrics from cluster-instance metrics? It is very simple: when postgres_exporter works with several databases on the same cluster, it executes the same set of SQL queries in each database. As a result, when it queries views that are common to the whole cluster (pg_stat_replication, pg_stat_activity, pg_stat_statements, and so on), it gets what it considers the same set of metrics from every database; this produces duplicate keys and values, which in turn leads to an error.
Let's see how it looks in practice.
We have a test instance with a set of databases:
                               List of databases
   Name    |  Owner   | Encoding |  Collate   |   Ctype    | Access privileges
-----------+----------+----------+------------+------------+-------------------
 dbtest1   | postgres | UTF8     | en_US.utf8 | en_US.utf8 |
 dbtest2   | postgres | UTF8     | en_US.utf8 | en_US.utf8 |
 dbtest3   | postgres | UTF8     | en_US.utf8 | en_US.utf8 |
 postgres  | postgres | UTF8     | en_US.utf8 | en_US.utf8 |
We start postgres_exporter with the --auto-discover-databases option (if no database name is specified in the connection string, the connection is made to the database matching the user name):
$ DATA_SOURCE_NAME="postgresql://postgres@127.0.0.1:5432/?sslmode=disable" ./postgres_exporter --auto-discover-databases --log.format=logger:stdout
In the agent's output we observe an idyllic picture: it started and managed to connect to all the databases in the cluster (although it does not log which databases exactly; let's hope that gets fixed):
INFO[0000] Established new database connection to "127.0.0.1:5432".    source="postgres_exporter.go:788"
INFO[0000] Established new database connection to "127.0.0.1:5432".    source="postgres_exporter.go:788"
INFO[0000] Semantic Version Changed on "127.0.0.1:5432": 0.0.0 -> 11.5.0    source="postgres_exporter.go:1251"
INFO[0000] Semantic Version Changed on "127.0.0.1:5432": 0.0.0 -> 11.5.0    source="postgres_exporter.go:1251"
INFO[0000] Established new database connection to "127.0.0.1:5432".    source="postgres_exporter.go:788"
INFO[0000] Semantic Version Changed on "127.0.0.1:5432": 0.0.0 -> 11.5.0    source="postgres_exporter.go:1251"
INFO[0000] Established new database connection to "127.0.0.1:5432".    source="postgres_exporter.go:788"
INFO[0000] Semantic Version Changed on "127.0.0.1:5432": 0.0.0 -> 11.5.0    source="postgres_exporter.go:1251"
INFO[0000] Established new database connection to "127.0.0.1:5432".    source="postgres_exporter.go:788"
INFO[0000] Semantic Version Changed on "127.0.0.1:5432": 0.0.0 -> 11.5.0    source="postgres_exporter.go:1251"
INFO[0000] Starting Server: :9187    source="postgres_exporter.go:1490"
I think an attentive reader will notice that there are four databases in the cluster (postgres, dbtest1, dbtest2, and dbtest3; template0 and template1 are ignored), yet there are five connections. In our case, postgres_exporter opens two connections to the postgres database, and you need to be very careful with this feature. Why? We will find out a little further on.
Well, let's continue and try to get the metrics:
$ curl http://localhost:9187/metrics
As a result, the output contains warnings about duplicates ("was collected before with the same name and label values"), although no warnings appear in the postgres_exporter log:
...
* collected metric pg_stat_activity_max_tx_duration label:<name:"datname" value:"dbtest1" > label:<name:"server" value:"127.0.0.1:5432" > label:<name:"state" value:"fastpath function call" > gauge:<value:0 > was collected before with the same name and label values
* collected metric pg_stat_bgwriter_checkpoints_timed label:<name:"server" value:"127.0.0.1:5432" > counter:<value:1 > was collected before with the same name and label values
...
The only way to get rid of the errors is to disable the collection of the default metrics. There are two ways to do this: either set the environment variables PG_EXPORTER_DISABLE_DEFAULT_METRICS and PG_EXPORTER_DISABLE_SETTINGS_METRICS to true, or use the --disable-default-metrics and --disable-settings-metrics options.
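If you prefer the environment-variable route, a minimal sketch (the values mirror the two flags):

```shell
# Disable the default and settings metric sets via environment variables,
# equivalent to the --disable-default-metrics and --disable-settings-metrics flags
export PG_EXPORTER_DISABLE_DEFAULT_METRICS=true
export PG_EXPORTER_DISABLE_SETTINGS_METRICS=true
```

This is convenient when the agent is started from a unit file or a container, where flags are harder to override than environment.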
Restart postgres_exporter with additional options:
$ DATA_SOURCE_NAME="postgresql://postgres@127.0.0.1:5432/?sslmode=disable" ./postgres_exporter --auto-discover-databases --log.format=logger:stdout --disable-default-metrics --disable-settings-metrics
Trying to get the metrics:
$ curl http://localhost:9187/metrics
This time everything went according to plan, but there is not a single PostgreSQL-related metric in the output:
# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0
go_gc_duration_seconds{quantile="0.25"} 0
go_gc_duration_seconds{quantile="0.5"} 0
go_gc_duration_seconds{quantile="0.75"} 0
go_gc_duration_seconds{quantile="1"} 0
go_gc_duration_seconds_sum 0
go_gc_duration_seconds_count 0
...
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 1.3832192e+07
Next, to get a payload, we need to create a file describing which metrics we want to receive (remember, we can only collect database-specific metrics this way).
For the test, we will collect metrics from the pg_statio_user_tables relation. To do this, create a queries.yaml file with the following contents:
pg_statio_user_tables:
  query: "SELECT current_database() as datname, schemaname, relname, heap_blks_read, heap_blks_hit FROM pg_statio_user_tables"
  metrics:
    - datname:
        usage: "LABEL"
        description: "Name of database"
    - schemaname:
        usage: "LABEL"
        description: "Name of the schema that this table is in"
    - relname:
        usage: "LABEL"
        description: "Name of this table"
    - heap_blks_read:
        usage: "COUNTER"
        description: "Number of disk blocks read from this table"
    - heap_blks_hit:
        usage: "COUNTER"
        description: "Number of buffer hits in this table"
One point needs clarification here: the addition of the name of the database in which the query is executed (current_database() as datname). This is a mandatory requirement, and there are at least two reasons for it:

- pg_statio_user_tables itself carries no database name, so identically named tables in different databases would otherwise produce identical label sets, and we would get the familiar duplicate-metric error;
- the datname label is what lets us tell, on the Prometheus side, which database a given metric came from.
Now we launch our agent with the --extend.query-path option (which points to the yaml file describing the metrics):
DATA_SOURCE_NAME="postgresql://postgres@127.0.0.1:5432?sslmode=disable" ./postgres_exporter --log.format=logger:stdout --auto-discover-databases --disable-default-metrics --disable-settings-metrics --extend.query-path=./queries.yaml
We try to get the metrics (for clarity, we take only pg_statio_user_tables_heap_blks_hit):
curl -s http://localhost:9187/metrics | grep pg_statio_user_tables_heap_blks_hit
As a result, we get an unambiguous set of metrics:
# HELP pg_statio_user_tables_heap_blks_hit Number of buffer hits in this table
# TYPE pg_statio_user_tables_heap_blks_hit counter
pg_statio_user_tables_heap_blks_hit{datname="dbtest1",relname="t1",schemaname="public",server="127.0.0.1:5432"} 0
pg_statio_user_tables_heap_blks_hit{datname="dbtest1",relname="t2",schemaname="public",server="127.0.0.1:5432"} 0
pg_statio_user_tables_heap_blks_hit{datname="dbtest2",relname="t1",schemaname="public",server="127.0.0.1:5432"} 0
pg_statio_user_tables_heap_blks_hit{datname="dbtest2",relname="t2",schemaname="public",server="127.0.0.1:5432"} 0
pg_statio_user_tables_heap_blks_hit{datname="dbtest3",relname="t1",schemaname="public",server="127.0.0.1:5432"} 0
pg_statio_user_tables_heap_blks_hit{datname="dbtest3",relname="t2",schemaname="public",server="127.0.0.1:5432"} 0
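On the Prometheus side, scraping these metrics needs nothing special; a minimal job might look like this (the job name is hypothetical, and the target assumes the exporter's default port on the local node):

```yaml
scrape_configs:
  - job_name: 'postgres_exporter'    # hypothetical job name
    static_configs:
      - targets: ['localhost:9187']  # default postgres_exporter listen port
```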
So, using the --auto-discover-databases option, we can collect metrics from all the databases of a single cluster instance. A nice bonus: when a new database is added, the agent does not need to be restarted.
But with all this, we are left without instance metrics. At the moment there is only one solution: use separate agents to collect database metrics and instance metrics.
Of course this does not look great, but the nuisance can be mitigated by grouping agents so that each collects metrics from several instances. We will look at this other, rather interesting possibility below.
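As an illustration, a second agent dedicated to instance-wide metrics could run alongside the auto-discovery one; the port here is an assumption, chosen so the two agents do not collide (PG_EXPORTER_WEB_LISTEN_ADDRESS is the environment counterpart of --web.listen-address):

```shell
# Hypothetical second agent for instance-level metrics only:
# no --auto-discover-databases, default metrics left enabled,
# and its own listen port so it can coexist with the first agent
DATA_SOURCE_NAME="postgresql://postgres@127.0.0.1:5432/postgres?sslmode=disable"
PG_EXPORTER_WEB_LISTEN_ADDRESS=":9188"   # assumed free port; the default is :9187
export DATA_SOURCE_NAME PG_EXPORTER_WEB_LISTEN_ADDRESS
```

With these set, a plain `./postgres_exporter --log.format=logger:stdout` serves the instance-level metrics on :9188, while the database-metrics agent from earlier keeps :9187.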
Remember how, at the beginning, we drew attention to the "extra" connection? This is a feature of postgres_exporter with the --auto-discover-databases option.
But why can it cause a lot of trouble? In fact, the problem is simple and has already been described above: postgres_exporter collects metrics from the postgres database twice and starts duplicating them. In our case, only the arrival of the --exclude-databases option will help, so we look forward to the next release.
And yes, if you have user tables in the postgres database, then because of this the example above will not work.
Moving on. We have figured out how to get metrics from several databases; now let's look at monitoring several instances with one agent. It is very simple: just list them in the DATA_SOURCE_NAME environment variable, separated by commas:
$ DATA_SOURCE_NAME="postgresql://postgres@127.0.0.1:5432/postgres?sslmode=disable,postgresql://postgres@127.0.0.1:5434/postgres?sslmode=disable" ./postgres_exporter --log.format=logger:stdout
Here we connect to two different cluster instances running, in our case, on the local node. Here's what it looks like in the logs:
INFO[0000] Established new database connection to "127.0.0.1:5432".    source="postgres_exporter.go:788"
INFO[0000] Semantic Version Changed on "127.0.0.1:5432": 0.0.0 -> 11.5.0    source="postgres_exporter.go:1251"
INFO[0000] Established new database connection to "127.0.0.1:5434".    source="postgres_exporter.go:788"
INFO[0000] Semantic Version Changed on "127.0.0.1:5434": 0.0.0 -> 11.5.0    source="postgres_exporter.go:1251"
INFO[0000] Starting Server: :9187    source="postgres_exporter.go:1490"
Next, we try to get the metrics (for clarity, we restrict ourselves to the pg_stat_database_blk_read_time metric):
curl -s http://localhost:9187/metrics | grep pg_stat_database_blk_read_time
As a result, from one agent, we get metrics for both instances:
# HELP pg_stat_database_blk_read_time Time spent reading data file blocks by backends in this database, in milliseconds
# TYPE pg_stat_database_blk_read_time counter
pg_stat_database_blk_read_time{datid="1",datname="template1",server="127.0.0.1:5432"} 0
pg_stat_database_blk_read_time{datid="1",datname="template1",server="127.0.0.1:5434"} 0
pg_stat_database_blk_read_time{datid="13116",datname="template0",server="127.0.0.1:5432"} 0
pg_stat_database_blk_read_time{datid="13116",datname="template0",server="127.0.0.1:5434"} 0
pg_stat_database_blk_read_time{datid="13117",datname="postgres",server="127.0.0.1:5432"} 0
pg_stat_database_blk_read_time{datid="13117",datname="postgres",server="127.0.0.1:5434"} 0
pg_stat_database_blk_read_time{datid="16384",datname="dbtest1",server="127.0.0.1:5432"} 0
pg_stat_database_blk_read_time{datid="16385",datname="dbtest2",server="127.0.0.1:5432"} 0
pg_stat_database_blk_read_time{datid="16386",datname="dbtest3",server="127.0.0.1:5432"} 0
In this case, everything turned out somewhat simpler than with several databases on one instance, and we also keep the ability to receive global, instance-level metrics from all the instances.
The third case stated among the goals is simply a combination of the two described above, so there is no reason to walk through it separately.
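Still, for completeness, a sketch of assembling the connection string for the combined case (the ports are those of the two instances above; each DSN omits the database name so that auto-discovery picks the databases itself):

```shell
# Combined case: comma-separated DSNs for two instances,
# to be used together with --auto-discover-databases
DSN1="postgresql://postgres@127.0.0.1:5432/?sslmode=disable"
DSN2="postgresql://postgres@127.0.0.1:5434/?sslmode=disable"
DATA_SOURCE_NAME="${DSN1},${DSN2}"
export DATA_SOURCE_NAME
```

The agent is then launched with the same flags as in the multi-database section: --auto-discover-databases --disable-default-metrics --disable-settings-metrics --extend.query-path=./queries.yaml.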
So, what do we have in the bottom line? In my opinion, postgres_exporter is an interesting and promising tool for administrators monitoring PostgreSQL cluster instances and the databases deployed on them. Due to its young age it is not without flaws, but they are easy to understand and forgive.
At the time of writing, the version is 0.5.1. Supported PostgreSQL versions are 9.4+ (a 9.1+ restriction is noted in the source code).