Overview
AllegroGraph provides server monitoring via a set of HTTP endpoints. The server exposes information about internal worker processes (backends and sessions), active queries (jobs), and storage reports for each triple store. Additionally, the server exposes a log of critical server events (audit log).
Prometheus metrics
AllegroGraph's main monitoring/observability endpoint is /metrics, which exposes various metrics in Prometheus format. This endpoint requires superuser permissions.
To reduce computational load on the system, the /metrics endpoint supports requesting metrics of a particular kind (category). Currently supported categories are:
system: comprehensive system metrics including CPU usage, memory utilization, disk I/O, network activity, and AllegroGraph-specific connection counts (backends, sessions, etc.);
jobs: information about active SPARQL queries (jobs) including total count and age statistics (maximum, minimum, and average age in seconds);
queries: information about queries, such as the total number of queries executed per active triple store, cumulative time, number of running queries, etc.;
indices: reports on repository index health including total indexed triples and optimization scores for each index by class; only returns metrics for the repositories currently in operation to avoid starting the dormant ones;
replication: monitors Multi-Master Replication status including commits behind primary, ingest queue length, controlling status, and replication state for each repository; like indices, only returns metrics for repositories currently in operation.
New metric kinds (categories) will be added in the future. Multiple kinds of metrics can be retrieved at once. For example, the call
curl -u test:xyzzy 'https://<ag-host>:<ag-port>/metrics?kind=queries'
returns only queries metrics, while the call
curl -u test:xyzzy 'https://<ag-host>:<ag-port>/metrics?kind=system&kind=indices&kind=queries'
returns system, indices and queries metrics in a single response. The URLs are quoted so that the shell does not interpret the & characters.
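When the scrape URL is built in a script rather than typed by hand, the repeated kind parameter can be generated from a list. The following is a minimal sketch using only the Python standard library; the host name ag.example.com and the helper name metrics_url are assumptions made for illustration, not part of AllegroGraph.

```python
from urllib.parse import urlencode


def metrics_url(base_url, kinds):
    """Return the /metrics URL with one `kind` parameter per requested category."""
    # A list of pairs lets urlencode repeat the same key: kind=a&kind=b
    query = urlencode([("kind", kind) for kind in kinds])
    url = base_url.rstrip("/") + "/metrics"
    return f"{url}?{query}" if query else url


# Hypothetical server address; substitute your own host and port.
print(metrics_url("https://ag.example.com:10035", ["system", "indices", "queries"]))
```

Passing an empty list yields the plain /metrics URL without a query string.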
Tokens obtained from external authenticators like OIDC or LDAP can be used to avoid putting user credentials into Prometheus config or scripts. The same example as above but with an external token:
curl -H 'Authorization: Basic <OIDC or LDAP token>' https://<ag-host>:<ag-port>/metrics?kind=queries
curl -H 'Authorization: Bearer <OIDC token>' https://<ag-host>:<ag-port>/metrics?kind=queries
All metrics follow Prometheus naming conventions and include catalog and repository labels where applicable. For example, indices metrics are reported for each repository, so the catalog and repository labels can be used to select the metrics of a particular repository.
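To illustrate label-based filtering outside of Prometheus itself, here is a rough sketch that parses a response in the Prometheus text format and selects samples by label. The sample text, its values, and the catalog and repository names are made up; the parser handles only simple sample lines and is not a complete implementation of the exposition format.

```python
import re

# Hypothetical sample in the Prometheus text format; the values and the
# catalog/repository names are invented for illustration.
SAMPLE = '''\
# TYPE allegrograph_indexed_triples gauge
allegrograph_indexed_triples{catalog="root",repository="people"} 1200
allegrograph_indexed_triples{catalog="root",repository="orders"} 560
'''

LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Yield (name, labels, value) for every sample line, skipping comments."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        yield match.group("name"), labels, float(match.group("value"))


def select(text, name, **wanted):
    """Return the values of metric `name` whose labels include all of `wanted`."""
    return [
        value
        for metric, labels, value in parse_samples(text)
        if metric == name and all(labels.get(k) == v for k, v in wanted.items())
    ]


print(select(SAMPLE, "allegrograph_indexed_triples", repository="people"))  # prints [1200.0]
```

In a real deployment the same selection would normally be written as a label matcher in a Prometheus query; the sketch only shows what the labels make possible.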
List of exposed Prometheus metrics
The following is a list of all Prometheus-compatible metrics exposed by AllegroGraph, grouped by kind:
indices:
  allegrograph_index_oscore, gauge - OScore for each triple index in the repository.
  allegrograph_indexed_triples, gauge - Total number of indexed triples in the repository.
jobs:
  allegrograph_active_jobs, gauge - Total number of active jobs.
  allegrograph_active_jobs_age_seconds, gauge - Maximum, minimum and average age of jobs in seconds.
queries:
  allegrograph_queries_cache_hits_total, counter - Number of SPARQL queries read from results cache.
  allegrograph_queries_cache_misses_total, counter - Number of SPARQL queries bypassing the results cache.
  allegrograph_queries_duration_seconds_total, counter - Total duration of executed SPARQL queries.
  allegrograph_queries_failed_total, counter - Number of failed SPARQL queries.
  allegrograph_queries_total, counter - Total number of executed SPARQL queries.
replication:
  allegrograph_replication_commits_behind, gauge - Number of commits the replica is behind primary.
  allegrograph_replication_controlling, gauge - Whether this repository is controlling (1) or not (0).
  allegrograph_replication_ingest_queue_length, gauge - Number of items waiting in the replication ingest queue.
  allegrograph_replication_state_info, gauge - Current state of replication with state as a label.
system:
  allegrograph_backends_count, gauge - Number of backends.
  allegrograph_cpu_count, gauge - Number of CPUs.
  allegrograph_http_connections, gauge - HTTP connections.
  allegrograph_http_workers, gauge - HTTP workers.
  allegrograph_https_workers, gauge - HTTPS workers.
  allegrograph_memory_anon_bytes, gauge - Anonymous memory in bytes.
  allegrograph_proxy_connections, gauge - Proxy connections.
  allegrograph_server_timestamp, gauge - Server timestamp in milliseconds.
  allegrograph_sessions_count, gauge - Number of sessions.
  allegrograph_vmstat_context_switches_rate, gauge - Context switches per second.
  allegrograph_vmstat_cpu_idle_percent, gauge - CPU idle time percentage.
  allegrograph_vmstat_cpu_steal_percent, gauge - CPU steal time percentage.
  allegrograph_vmstat_cpu_system_percent, gauge - CPU system time percentage.
  allegrograph_vmstat_cpu_user_percent, gauge - CPU user time percentage.
  allegrograph_vmstat_cpu_wait_percent, gauge - CPU wait time percentage.
  allegrograph_vmstat_disk_blocks_in, gauge - Block input rate.
  allegrograph_vmstat_disk_blocks_out, gauge - Block output rate.
  allegrograph_vmstat_interrupts_rate, gauge - Interrupts per second.
  allegrograph_vmstat_memory_buffer_bytes, gauge - Buffer memory in bytes.
  allegrograph_vmstat_memory_cache_bytes, gauge - Cache memory in bytes.
  allegrograph_vmstat_memory_free_bytes, gauge - Free memory in bytes.
  allegrograph_vmstat_memory_swap_used_bytes, gauge - Used swap memory in bytes.
  allegrograph_vmstat_processes_blocked, gauge - Blocked processes.
  allegrograph_vmstat_processes_running, gauge - Running processes.
  allegrograph_vmstat_swap_in_rate, gauge - Swap-in rate.
  allegrograph_vmstat_swap_out_rate, gauge - Swap-out rate.
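Because the queries metrics are counters, they are most useful as differences between two scrapes. The sketch below derives a mean query duration and a cache hit ratio for one repository from two hypothetical scrapes. The numbers are invented, and treating hits plus misses as the total number of cache lookups is an assumption about how these counters relate, not something stated by AllegroGraph.

```python
def counter_delta(previous, current):
    """Increase of a counter between two scrapes; a drop means the counter was reset."""
    return current if current < previous else current - previous


def query_stats(prev, curr):
    """Derive per-interval figures from two scrapes of the `queries` counters."""
    def delta(name):
        return counter_delta(prev[name], curr[name])

    executed = delta("allegrograph_queries_total")
    duration = delta("allegrograph_queries_duration_seconds_total")
    hits = delta("allegrograph_queries_cache_hits_total")
    misses = delta("allegrograph_queries_cache_misses_total")
    lookups = hits + misses  # assumption: every cache lookup is a hit or a miss
    return {
        "mean_duration_seconds": duration / executed if executed else None,
        "cache_hit_ratio": hits / lookups if lookups else None,
    }


# Hypothetical counter values for a single repository at two scrape times.
PREVIOUS = {
    "allegrograph_queries_total": 100,
    "allegrograph_queries_duration_seconds_total": 20.0,
    "allegrograph_queries_cache_hits_total": 10,
    "allegrograph_queries_cache_misses_total": 30,
}
CURRENT = {
    "allegrograph_queries_total": 150,
    "allegrograph_queries_duration_seconds_total": 45.0,
    "allegrograph_queries_cache_hits_total": 40,
    "allegrograph_queries_cache_misses_total": 40,
}

print(query_stats(PREVIOUS, CURRENT))
```

With the sample values, 50 queries took 25 seconds in total, giving a mean of 0.5 seconds and a hit ratio of 0.75. In Prometheus itself the same figures would typically be computed with rate expressions over these counters.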
Example Grafana dashboard for AllegroGraph
An importable JSON description of an example Grafana dashboard for AllegroGraph can be found in the agraph-examples repository on GitHub.
The included Prometheus configuration, which collects the metrics of all supported kinds every 5 seconds, is shown below:
scrape_configs:
  - job_name: allegrograph
    scrape_interval: 5s
    metrics_path: /metrics
    basic_auth:
      username: test
      password: xyzzy
    static_configs:
      - targets: ['localhost:10035']
        labels: { kind: 'jobs' }
      - targets: ['localhost:10035']
        labels: { kind: 'replication' }
      - targets: ['localhost:10035']
        labels: { kind: 'indices' }
      - targets: ['localhost:10035']
        labels: { kind: 'system' }
      - targets: ['localhost:10035']
        labels: { kind: 'queries' }
    relabel_configs:
      - source_labels: [kind]
        target_label: __param_kind
Standalone monitoring endpoints
AllegroGraph provides standalone endpoints for monitoring certain parts of the system. The most important of these is the Audit Log endpoint /auditLog, which is not available through the Prometheus-compatible /metrics feature. To read more about the structured system audit log, which tracks important changes to the server and its triple stores, please see Auditing.
Other standalone endpoints are legacy endpoints and are either already fully integrated into the Prometheus /metrics endpoint or will be in the future. These include:
/processes. Useful for listing all processes spawned by AllegroGraph and checking their resource (CPU/memory) usage. Many other tools can report process statistics as well. AllegroGraph also provides a way to get systemstat information about a process ID via /systemstat.json?requestTree=sys.processes.pid<pid id>&startUt=<time-in-milliseconds>&id=<integer>. This endpoint is undocumented because it is used only by WebView to render charts on the "Processes" (/webview/admin/processes) and "Server stats" (/webview/utils/systemstat) pages.
/jobs. A job is a SPARQL query execution. The most important field in the response is the ageInSeconds field, which can be used to detect queries that run abnormally long. Optionally, a DELETE /jobs?jobId=<job-id> call can be used to cancel a query.
/session. To read more about AllegroGraph sessions, please see the Sessions section.
/catalogs/catalog-name/repositories/repository-name/reports. This is a legacy repository reports system which will eventually be fully integrated into the /metrics endpoint and deprecated. If you send this request without the path parameter, you will get the list of supported path values. An example of a path for retrieving the storage layer summary: /repositories/<repo-name>/reports?path=storage.
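As an illustration of using the ageInSeconds field from /jobs to spot long-running queries, the sketch below filters an already-decoded response. It assumes the response body is a JSON array of objects that each carry ageInSeconds; that shape, the sample data, and the 60-second threshold are assumptions made for this example and should be checked against a real response.

```python
import json

# Hypothetical /jobs response body; real responses contain more fields
# and may be structured differently.
SAMPLE_RESPONSE = '[{"ageInSeconds": 3}, {"ageInSeconds": 742}, {"ageInSeconds": 61}]'


def long_running(jobs, max_age_seconds):
    """Return the jobs older than max_age_seconds, oldest first."""
    old = [job for job in jobs if job.get("ageInSeconds", 0) > max_age_seconds]
    return sorted(old, key=lambda job: job["ageInSeconds"], reverse=True)


jobs = json.loads(SAMPLE_RESPONSE)
for job in long_running(jobs, max_age_seconds=60):
    print(job)
```

A job reported this way could then be cancelled with the DELETE /jobs?jobId=<job-id> call described above, using whatever identifier the actual response provides.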
Please contact Franz Support if you would like monitoring for a feature that is not described in the sections above.