author: Matt Bostock <matt@mattbostock.com> 2018-08-18 09:57:51 +0100
committer: Johannes 'fish' Ziemke <github@freigeist.org> 2018-08-18 08:57:51 +0000
commit: 9e0aee8ae7db3e012af30dd43c77535375811d1c
tree: 77f6037361bc853b1f80fc66252baab9310317bd /text_collector_examples
parent: d84873727f7679b2d782ecd27f3cc8ecd365457f
Add metrics exposing extended md RAID info (#958)
Add metrics that expose more information about MD RAID devices and disks:

- the RAID level in use
- the RAID set that a disk belongs to

This allows for things like alerting on unusually high I/O utilisation for a disk compared to other disks in the same RAID set, which usually means the disk is failing, and for comparing write/read latency across RAID sets.

Output looks like:

    node_md_disk_info{disk_device="/dev/dm-0", md_device="md1", md_set="A"} 1
    node_md_disk_info{disk_device="/dev/dm-3", md_device="md1", md_set="B"} 1
    node_md_disk_info{disk_device="/dev/dm-2", md_device="md1", md_set="A"} 1
    node_md_disk_info{disk_device="/dev/dm-1", md_device="md1", md_set="B"} 1
    node_md_disk_info{disk_device="/dev/dm-4", md_device="md1", md_set="A"} 1
    node_md_disk_info{disk_device="/dev/dm-5", md_device="md1", md_set="B"} 1
    node_md_info{md_device="md1", md_name="foo", raid_level="10", md_metadata_version="1.2"} 1

The `node_md_info` metric, which gives additional information about the RAID array, is intentionally separate to avoid adding all of those labels to each disk.

If you need to query using the labels contained in `node_md_info`, you can do that using PromQL:

https://www.robustperception.io/how-to-have-labels-for-machine-roles/

I looked at adding the array UUID, but there's no sysfs entry for it and I'm not sure there's a strong use case for it. This patch to add a sysfs entry for the UUID was apparently not accepted:

https://www.spinics.net/lists/raid/msg40667.html

Add these metrics as a textfile script rather than adding them to the Go 'md' module as they're perhaps less commonly useful. If lots of people find them useful, we can later rewrite this in Go.

Signed-off-by: Matt Bostock <mbostock@cloudflare.com>
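Since the metrics ship as a textfile script, its output has to be written to a `*.prom` file in the directory that node_exporter reads via `--collector.textfile.directory`. The following is a minimal sketch of one way to do that without the collector ever seeing a half-written file. The helper name `write_prom` and the paths in the usage comment are assumptions for illustration and are not part of this commit.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper (not part of the commit): run a command and publish its
# stdout as DEST only if the command succeeds.
# Usage: write_prom DEST COMMAND [ARGS...]
write_prom() {
  local dest=$1
  shift
  local tmp

  # Create the temporary file next to DEST so the final mv is a rename on the
  # same filesystem. Its name does not end in .prom, so the textfile
  # collector ignores it.
  tmp=$(mktemp "${dest}.XXXXXX")

  if "$@" > "${tmp}"; then
    # mktemp creates the file with restrictive permissions; relax them so a
    # collector running as another user can read the result.
    chmod 0644 "${tmp}"
    mv "${tmp}" "${dest}"
  else
    rm -f "${tmp}"
    return 1
  fi
}

# Example invocation (both paths are assumptions, adjust to your setup):
# write_prom /var/lib/node_exporter/textfile_collector/md_info.prom \
#   /usr/local/bin/md_info.sh
```

Run from cron or a timer, this keeps the previous `.prom` file in place whenever the script fails, for example when an expected sysfs file is missing and `set -eu` aborts the run.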
Diffstat (limited to 'text_collector_examples')
-rwxr-xr-x text_collector_examples/md_info.sh 56
1 files changed, 56 insertions, 0 deletions
diff --git a/text_collector_examples/md_info.sh b/text_collector_examples/md_info.sh
new file mode 100755
index 0000000..c89f10f
--- /dev/null
+++ b/text_collector_examples/md_info.sh
@@ -0,0 +1,56 @@
+#!/usr/bin/env bash
+set -eu
+
+for MD_DEVICE in /dev/md/*; do
+  # Subshell to avoid eval'd variables from leaking between iterations
+  (
+    # Resolve symlink to discover device, e.g. /dev/md127
+    MD_DEVICE_NUM=$(readlink -f "${MD_DEVICE}")
+
+    # Remove /dev/ prefix
+    MD_DEVICE_NUM=${MD_DEVICE_NUM#/dev/}
+    MD_DEVICE=${MD_DEVICE#/dev/md/}
+
+    # Query sysfs for info about md device
+    SYSFS_BASE="/sys/devices/virtual/block/${MD_DEVICE_NUM}/md"
+    MD_LAYOUT=$(cat "${SYSFS_BASE}/layout")
+    MD_LEVEL=$(cat "${SYSFS_BASE}/level")
+    MD_METADATA_VERSION=$(cat "${SYSFS_BASE}/metadata_version")
+    MD_NUM_RAID_DISKS=$(cat "${SYSFS_BASE}/raid_disks")
+
+    # Remove 'raid' prefix from RAID level
+    MD_LEVEL=${MD_LEVEL#raid}
+
+    # Output disk metrics
+    for RAID_DISK in ${SYSFS_BASE}/rd[0-9]*; do
+      DISK=$(readlink -f "${RAID_DISK}/block")
+      DISK_DEVICE=$(basename "${DISK}")
+      RAID_DISK_DEVICE=$(basename "${RAID_DISK}")
+      RAID_DISK_INDEX=${RAID_DISK_DEVICE#rd}
+      RAID_DISK_STATE=$(cat "${RAID_DISK}/state")
+
+      DISK_SET=""
+      # Determine disk set using logic from mdadm: https://github.com/neilbrown/mdadm/commit/2c096ebe4b
+      if [[ ${RAID_DISK_STATE} == "in_sync" && ${MD_LEVEL} == 10 && $((MD_LAYOUT & ~0x1ffff)) ]]; then
+        NEAR_COPIES=$((MD_LAYOUT & 0xff))
+        FAR_COPIES=$(((MD_LAYOUT >> 8) & 0xff))
+        COPIES=$((NEAR_COPIES * FAR_COPIES))
+
+        if [[ $((MD_NUM_RAID_DISKS % COPIES == 0)) && $((COPIES <= 26)) ]]; then
+          DISK_SET=$((RAID_DISK_INDEX % COPIES))
+        fi
+      fi
+
+      echo -n "node_md_disk_info{disk_device=\"${DISK_DEVICE}\", md_device=\"${MD_DEVICE_NUM}\""
+      if [[ -n ${DISK_SET} ]]; then
+        SET_LETTERS=({A..Z})
+        echo -n ", md_set=\"${SET_LETTERS[${DISK_SET}]}\""
+      fi
+      echo "} 1"
+    done
+
+    # Output RAID array metrics
+    # NOTE: Metadata version is a label rather than a separate metric because the version can be a string
+    echo "node_md_info{md_device=\"${MD_DEVICE_NUM}\", md_name=\"${MD_DEVICE}\", raid_level=\"${MD_LEVEL}\", md_metadata_version=\"${MD_METADATA_VERSION}\"} 1"
+  )
+done
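
A note on the disk-set conditions in the script: inside `[[ ... ]]`, a bare word such as `$((MD_LAYOUT & ~0x1ffff))` is tested as a non-empty string, and an arithmetic expansion always expands to some number, so those terms are true even when the result is `0`. The sketch below shows the same near/far-copies calculation using arithmetic evaluation with `(( ... ))` instead. The function name `md_disk_set` is hypothetical, and requiring the layout bits above the low 17 to be clear is an assumption about the intent that should be checked against the referenced mdadm commit.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper (not part of the commit): print the set letter for a
# RAID disk, or print nothing if the array is not divided into sets.
# Usage: md_disk_set LEVEL LAYOUT RAID_DISKS DISK_INDEX
md_disk_set() {
  local level=$1 layout=$2 raid_disks=$3 index=$4
  local near far copies
  local -a letters=({A..Z})

  # The script only assigns sets for RAID level 10.
  [[ ${level} == 10 ]] || return 0

  # ASSUMPTION: only layouts with no bits set above the low 17 qualify.
  (( (layout & ~0x1ffff) == 0 )) || return 0

  # Near copies are in the low byte of the layout, far copies in the next.
  near=$(( layout & 0xff ))
  far=$(( (layout >> 8) & 0xff ))
  copies=$(( near * far ))

  # Avoid dividing by zero on an unexpected layout value.
  (( copies > 0 )) || return 0

  # Sets exist only if the disks divide evenly and there are enough letters.
  (( raid_disks % copies == 0 && copies <= 26 )) || return 0

  echo "${letters[index % copies]}"
}

# A four-disk RAID10 array with layout 0x102 (near copies 2, far copies 1):
for index in 0 1 2 3; do
  echo "rd${index}: $(md_disk_set 10 $((0x102)) 4 "${index}")"
done
```

With two copies, the disks alternate between sets A and B by index, which matches the alternating `md_set` labels in the sample output from the commit message.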