path: root/text_collector_examples
Commit message | Author | Age
* Remove text_collector_examples/ (#1441) | Johannes 'fish' Ziemke | 2019-08-03
  * Remove text_collector_examples/
    These have been moved to https://github.com/prometheus-community/node-exporter-textfile-collector-scripts
    This closes #1077
    Signed-off-by: Johannes 'fish' Ziemke <github@freigeist.org>
* Handle JBOD setup for storcli exporter (#1419) | Solvik | 2019-08-02
  * handle jbod setup
    Signed-off-by: Solvik Blum <solvik.blum@dailymotion.com>
* changed fields for disk write and read data of S.M.A.R.T, Signed-off-by: Bernd Mueller <mueller@b1-systems.de> (#1235) | Bernd Müller | 2019-07-24
  Signed-off-by: Bernd Müller <mueller@b1-systems.de>
* fix for 'Celsius' spelling problem in storcli.py (#1408) | yosefy | 2019-07-01
  Signed-off-by: yosefy <yosef.yudilevich@gmail.com>
* FIX ipmitool sensor discrete values are expressed in hex (#1402) | Nuno Tavares | 2019-06-29
  Signed-off-by: Nuno Tavares <n.tavares@portavita.eu>
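For illustration only: a minimal Python sketch of the conversion this fix implies, assuming ipmitool prints discrete sensor readings as hex strings such as `0x0080` and analog readings as decimals like `42.000`. `parse_sensor_value` is a hypothetical helper, not the code used in the script.

```python
def parse_sensor_value(raw):
    """Convert an ipmitool sensor reading to a float, or None if unusable.

    Assumption: discrete readings look like "0x0080"; analog ones like "42.000".
    """
    raw = raw.strip()
    if raw.lower().startswith("0x"):
        # Discrete sensor state: interpret the hex string as a base-16 integer.
        return float(int(raw, 16))
    try:
        return float(raw)
    except ValueError:
        # e.g. "na" for sensors that currently have no reading
        return None
```

Returning `None` lets the caller skip the sample rather than emit an unparsable value.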
* Add multipathd_info text collector example (#1375) | ssinha-ionos | 2019-06-25
  multipathd_info is a script that exposes device mapper multipathing metrics from the multipathd daemon.
  Signed-off-by: Saket Sinha <saket.sinha@cloud.ionos.com>
* Account for spaces in repository label (#1348) | Jérémy Ruffet | 2019-06-05
  Signed-off-by: Jérémy Ruffet <jeremy.ruffet@i-run.fr>
* Make storcli.py compatible with python2 (#1365) | Brian Candler | 2019-06-03
  This is only a minor change to .format() arguments, and is useful on CentOS6 servers which have only python2.
  Signed-off-by: Brian Candler <b.candler@pobox.com>
* Make scripts in text_collector_examples executable (#1358) | Benjamin Drung | 2019-05-29
  Signed-off-by: Benjamin Drung <benjamin.drung@cloud.ionos.com>
* btrfs_stats: Upgrade to Python 3 (#1359) | Benjamin Drung | 2019-05-29
  Python 2.7 will not be maintained past 2020. Therefore upgrade `text_collector_examples/btrfs_stats.py` to Python 3.
  Signed-off-by: Benjamin Drung <benjamin.drung@cloud.ionos.com>
* fix or ignore codespell issues (#1351) | Paul Gier | 2019-05-20
  Signed-off-by: Paul Gier <pgier@redhat.com>
* Add nvme_metrics.sh text collector example (#1309) | Henk | 2019-04-08
  * Add nvme_metrics.sh text collector example
    Signed-off-by: Henk <henk@wearespindle.com>
* Update smartmon.py to widen self_assessment_passed test (#1293) | Edgaras Giedrė | 2019-03-20
  Signed-off-by: EdgarasG <edgaras.giedre@hostinger.com>
* yum.sh: yum update monitor (#1273) | Slawomir Gonet | 2019-02-28
  Signed-off-by: Slawomir Gonet <slawek@otwiera.cz>
* Translate smartmon.py to Python (#1225) | Julian Kornberger | 2019-02-27
  * Add smartmon.py python port of the smartmon.sh bash script
    Signed-off-by: Arthur Skowronek <ags@digineo.de>
* Add the inotify-instances text collector (#1186) | Saj Goonatilleke | 2019-02-27
  This is an alternative take on the embedded inotify collector:
  https://github.com/prometheus/node_exporter/pull/988
  The proposed embedded collector was not accepted for inclusion because it was not possible for a single unprivileged node_exporter process to detect inotify resource utilisation in other user domains. This text collector works around the problem by giving the operator a choice between the following:
  - Run only the text collector as root to gain visibility over all processes on the system.
  - Run one or more instances of the text collector as an unprivileged user to gain visibility over subsets of the system.
  In either case, the data generated by this collector can be useful when hunting down inotify instance leaks, and when confirming the resolution of such leaks.
  Signed-off-by: Saj Goonatilleke <sg@redu.cx>
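A rough Python sketch of how per-process inotify instances can be counted, assuming (as on Linux) that an inotify file descriptor under `/proc/<pid>/fd` is a symlink whose target reads `anon_inode:inotify`. It is not the collector's actual code; `count_inotify_instances` and its `proc_root` parameter are hypothetical. Processes the caller may not inspect are silently skipped, which is exactly why an unprivileged instance only sees a subset of the system.

```python
import os


def count_inotify_instances(proc_root="/proc"):
    """Return {pid: number of inotify instances} for processes we may inspect."""
    counts = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            # PermissionError for other users' processes when unprivileged,
            # FileNotFoundError if the process exited during the scan.
            continue
        n = 0
        for fd in fds:
            try:
                if os.readlink(os.path.join(fd_dir, fd)) == "anon_inode:inotify":
                    n += 1
            except OSError:
                continue  # fd closed between listdir() and readlink()
        if n:
            counts[int(entry)] = n
    return counts
```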
* remove "-n" flag from /usr/bin/awk (#1269) | Cole White | 2019-02-23
  This flag causes no ipmi data to be emitted and an error log is generated on each invocation: "awk: not an option: -nf". I was unable to locate a "-n" flag in the mawk or gawk man pages, so I tested it by manually changing the script on a running Debian buster system. The issue was resolved and metrics were emitted.
  Signed-off-by: Cole White <cwhite@wikimedia.org>
* ADD Cachevault_Info.Temp, being a distinct phy component, I think it's worth monitoring (#1268) | Nuno Tavares | 2019-02-21
  Signed-off-by: Nuno Tavares <n.tavares@portavita.eu>
* add md_info_detail.sh (#1204) | mpursley | 2019-02-10
  Signed-off-by: Matt Pursley <mpursley@gmail.com>
* add physical disk "state" to megaraid_pd_info metric (#1226) | mpursley | 2019-01-31
  Signed-off-by: Matt Pursley <mpursley@gmail.com>
* Add S.M.A.R.T metrics (#1209) | Dai Dang Van | 2019-01-03
  Update metrics following SMART attributes in [1][2]:
  - Seek_Error_Rate - ID: 7
  - Reallocated_Event_Count - ID: 196
  [1] https://en.wikipedia.org/wiki/S.M.A.R.T.#Known_ATA_S.M.A.R.T._attributes
  [2] https://en.wikibooks.org/wiki/Minimizing_Hard_Disk_Drive_Failure_and_Data_Loss/Self-Monitoring,_Analysis,_and_Reporting_Technology
  Signed-off-by: Dai, Dang Van <daikk115@gmail.com>
* Add a sample btrfs stats collector script (#1200) | Anton Tolchanov | 2018-12-21
  Signed-off-by: Anton Tolchanov <commits@knyar.net>
* smartmon.sh: add metric for active/low-power mode (#1192) | dhewg | 2018-12-13
  Add this new metric (where sda is active and sdb is in standby mode):
  smartmon_device_active{disk="/dev/sda",type="sat"} 1
  smartmon_device_active{disk="/dev/sdb",type="sat"} 0
  Also skip further metrics if the drive is in a low-power mode. This prevents spinning up disks just to get the metrics (which matches e.g. Debian's default behavior for smartd).
  Signed-off-by: Andre Heider <a.heider@gmail.com>
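The skip-if-asleep logic described above, sketched in Python under the assumption that the power state of each disk has already been determined (how smartmon.sh queries it is not reproduced here). `render_active_metrics` and its tuple input are hypothetical; the emitted lines match the example output in this commit message.

```python
def render_active_metrics(devices):
    """devices: list of (disk, dev_type, is_active) tuples (hypothetical input).

    Returns (metric_lines, disks_to_scan). Disks in a low-power mode still get
    a 0 sample but are left out of disks_to_scan, so no further queries wake them.
    """
    lines = []
    to_scan = []
    for disk, dev_type, is_active in devices:
        lines.append(
            f'smartmon_device_active{{disk="{disk}",type="{dev_type}"}} {int(is_active)}'
        )
        if is_active:
            to_scan.append(disk)
    return lines, to_scan
```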
* Handle 'Unknown' as measurement value. (#1113) | Andreas Wirooks | 2018-11-23
  We use the output-compatible perccli and storcli.py does not handle 'Unknown' as a result:
  ```
  sg="Error parsing \"/var/lib/node_exporter/perccli.prom\": text format parsing error in line 222: expected float as value, got \"Unknown\"" source="textfile.go:212"
  ```
  I know, the perccli should not return 'Unknown' but this error breaks all other useful measurements because the prom file is not parsable. My if condition fixes this.
  Signed-off-by: Andreas Wirooks <andreas.wirooks@1und1.de>
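One way to express the guard in Python: drop any sample whose value does not parse as a float, so a single bad reading cannot make node_exporter reject the whole .prom file. This is a sketch, not the actual if condition from storcli.py; `format_sample` is hypothetical, and label values are assumed not to need escaping.

```python
def format_sample(name, labels, value):
    """Return one text-format line, or None if value is not numeric."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        # e.g. "Unknown": skip this sample instead of breaking the file
        return None
    label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"{name}{{{label_str}}} {number}"
```

Callers write only the non-`None` lines to the textfile.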
* textfile example storcli enhancements (#1145) | Christopher Blum | 2018-11-07
  * storcli.py: Remove IntEnum
    This removes an external dependency. Moved VD state to VD info labels.
  * storcli.py: Fix BBU health detection
    BBU Status is 0 for a healthy cache vault and 32 for a healthy BBU.
  * storcli.py: Strip all strings from PD
    Strip all strings that we get from PDs. They often contain whitespace...
  * storcli.py: Add formatting options
    Add help text explaining how this document was formatted.
  * storcli.py: Add DG to pd_info label
    Add disk group to pd_info. That way we can relate PDs in the same DG, for example to check if all disks in one RAID use the same interface...
  * storcli.py: Fix promtool issues
    Fix linting issues reported by promtool check-metrics.
  * storcli.py: Exit if storcli reports issues
    storcli reports if the command was a success. We should not continue if there are issues.
  * storcli.py: Try to parse metrics to float
    This will sanitize the values we hand over to node_exporter, eliminating any unforeseen values we read out...
  * storcli.py: Refactor code to implement handle_sas_controller()
    Move code into methods so that we can now also support HBA queries.
  * storcli.py: Sort inputs
    "...like a good python developer" - Daniel Swarbrick
  * storcli.py: Replace external dateutil library with internal datetime
    Removes external dependency...
  * storcli.py: Also collect temperature on megaraid cards
    We have already collected them on mpt3sas cards...
  * storcli.py: Clean up old code
    Removed dead code that is not used any more.
  * storcli.py: strip() all information for labels
    They often contain whitespace...
  * storcli.py: Try to catch KeyErrors generally
    If some key we expect is not there, we will want to still print whatever we have collected so far...
  * storcli.py: Increment version number
    We have made some changes here and there. The general look of the data has not been changed.
  * storcli.py: Fix CodeSpell issue
    Split string to avoid issues with Codespell due to Celcius in JSON Key
    Signed-off-by: Christopher Blum <zeichenanonym@web.de>
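The BBU health rule stated above ("0 for a healthy cache vault, 32 for a healthy BBU") can be sketched as a lookup. The `backup_type` keys and the `bbu_healthy` name are hypothetical; how storcli.py actually tells a cache vault from a BBU is not shown in this log.

```python
# Healthy status codes per backup unit type, as stated in the commit message.
HEALTHY_BBU_STATUS = {"cachevault": 0, "bbu": 32}


def bbu_healthy(status, backup_type):
    """Return 1 if status is the healthy value for this backup unit type, else 0."""
    return int(int(status) == HEALTHY_BBU_STATUS[backup_type])
```

Keeping the two healthy values in one table avoids treating 32 as an error for a BBU, or 0 as an error for a cache vault.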
* Introduce example to get pending updates from pacman (#1114) | Sven Haardiek | 2018-11-05
  * Introduce example to get pending updates from pacman
    Signed-off-by: Sven Haardiek <sven@haardiek.de>
* Add mellanox_hca_temp text collector example (#1128) | Benjamin Drung | 2018-11-01
  * deleted_libraries: Upgrade to Python 3
    Python 2.7 will not be maintained past 2020. Therefore upgrade text_collector_examples/deleted_libraries.py to Python 3.
  * Add mellanox_hca_temp text collector example
    mellanox_hca_temp is a script that reads Mellanox HCA temperature using the Mellanox mget_temp_ext tool.
    Signed-off-by: Benjamin Drung <benjamin.drung@cloud.ionos.com>
* textfile example script rework (#1074) | Christopher Blum | 2018-09-18
  * textfile smartmon.sh
    Added functions to also parse megaraid disks. Added parsing to also detect the grown_defects counters.
  * textfile storcli.py
    Reworked the example file to export lots more information about megaraid attached controllers, VDs and PDs.
    Signed-off-by: Christopher Blum <christopher.blum@profitbricks.com>
* Note how to get moreutils on FreeBSD (#1073) | Mateusz Piotrowski | 2018-09-14
  Signed-off-by: Mateusz Piotrowski <0mp@FreeBSD.org>
* Add metrics exposing extended md RAID info (#958) | Matt Bostock | 2018-08-18
  Add metrics that expose more information about MD RAID devices and disks:
  - the RAID level in use
  - the RAID set that a disk belongs to
  This allows for things like alerting on unusually high I/O utilisation for a disk compared to other disks in the same RAID set, which usually means the disk is failing, and comparing write/read latency across RAID sets.
  Output looks like:
  node_md_disk_info{disk_device="/dev/dm-0", md_device="md1", md_set="A"} 1
  node_md_disk_info{disk_device="/dev/dm-3", md_device="md1", md_set="B"} 1
  node_md_disk_info{disk_device="/dev/dm-2", md_device="md1", md_set="A"} 1
  node_md_disk_info{disk_device="/dev/dm-1", md_device="md1", md_set="B"} 1
  node_md_disk_info{disk_device="/dev/dm-4", md_device="md1", md_set="A"} 1
  node_md_disk_info{disk_device="/dev/dm-5", md_device="md1", md_set="B"} 1
  node_md_info{md_device="md1", md_name="foo", raid_level="10", md_metadata_version="1.2"} 1
  The `node_md_info` metric, which gives additional information about the RAID array, is intentionally kept separate to avoid adding all of those labels to each disk. If you need to query using the labels contained in `node_md_info`, you can do that using PromQL:
  https://www.robustperception.io/how-to-have-labels-for-machine-roles/
  I looked at adding the array UUID, but there's no sysfs entry for it and I'm not sure there's a strong use case for it. This patch to add a sysfs entry for the UUID was apparently not accepted:
  https://www.spinics.net/lists/raid/msg40667.html
  Add these metrics as a textfile script rather than adding them to the Go 'md' module as they're perhaps less commonly useful. If lots of people find them useful, we can later rewrite this in Go.
  Signed-off-by: Matt Bostock <mbostock@cloudflare.com>
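A Python sketch of where the `raid_level` and `md_metadata_version` labels could come from, assuming the kernel's md sysfs attributes `/sys/block/<md>/md/level` (e.g. `raid10`) and `/sys/block/<md>/md/metadata_version` (e.g. `1.2`). The `md_name` label from the sample output is omitted because its source is not described here; `md_info_line` and `sys_block` are hypothetical names, and this is not the script's code.

```python
import os


def md_info_line(md_device, sys_block="/sys/block"):
    """Build a node_md_info-style sample for one md array from sysfs."""
    md_dir = os.path.join(sys_block, md_device, "md")

    def read(name):
        with open(os.path.join(md_dir, name)) as f:
            return f.read().strip()

    level = read("level")
    if level.startswith("raid"):
        level = level[len("raid"):]  # "raid10" -> "10", as in the sample output
    version = read("metadata_version")
    return (f'node_md_info{{md_device="{md_device}", raid_level="{level}", '
            f'md_metadata_version="{version}"}} 1')
```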
* Add scsi smart data to prometheus exporter (#862) | Bernd Müller | 2018-07-04
  Add scsi smart data to prometheus exporter
  Signed-off-by: mueller <mueller@b1-systems.de>
* Fix spelling of celsius in IPMI example script (#967) | Matt Bostock | 2018-06-08
  'Celsius' should be spelt with an 's': https://en.wikipedia.org/wiki/Celsius
  Signed-off-by: Matt Bostock <mbostock@cloudflare.com>
* Add metric for outdated libraries (#957) | Matt Bostock | 2018-05-25
  Add metrics that count how many running processes are linking to deleted libraries on each machine.
  Deleted libraries are usually outdated libraries, and outdated libraries may have known security vulnerabilities. The rationale behind storing these as metrics is to allow the rollout of security fixes to be tracked across a fleet of machines, ensuring that all affected processes are restarted (e.g. via a reboot).
  I'm parsing the output from `/proc/*/maps` because using `lsof -d DEL` can be too slow, particularly if you have sockets that bind to thousands of IP addresses.
  The metric labels include the library path and the base filename, which allows us to pinpoint the exact path of the deleted library but also allows us to aggregate on the library name (or approximations of it) even if library locations differ between operating system versions.
  The metrics output and the CPU time consumed are as follows:
  user@host:~$ time sudo python processes.py
  # HELP node_processes_linking_deleted_libraries Count of running processes that link a deleted library
  # TYPE node_processes_linking_deleted_libraries gauge
  node_processes_linking_deleted_libraries{library_path="locale-archive", library_name="/usr/lib/locale"} 3
  node_processes_linking_deleted_libraries{library_path="libevent-2.0.so.5.1.9", library_name="/usr/lib/x86_64-linux-gnu"} 4
  real 0m0.071s
  user 0m0.030s
  sys 0m0.041s
  Including the library filename and path will result in reasonably high metrics cardinality; however, I think the benefits when an urgent security patch is being deployed outweigh concerns around cardinality.
  This script assumes that library files do not contain spaces in their path.
  Signed-off-by: Matt Bostock <mbostock@cloudflare.com>
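A minimal Python sketch of the `/proc/*/maps` approach, assuming the Linux maps format (address, perms, offset, dev, inode, pathname) with a trailing `(deleted)` marker on mappings of removed files, and, like the original script, no spaces in paths. `deleted_libraries` is a hypothetical function that takes the maps text per pid so it can be tested without `/proc`; unlike the real script it does not filter out non-library deleted mappings.

```python
import os
from collections import defaultdict


def deleted_libraries(maps_by_pid):
    """Count processes mapping each deleted file.

    maps_by_pid maps a pid to the text of its /proc/<pid>/maps.
    Returns {(directory, filename): process_count}.
    """
    counts = defaultdict(int)
    for pid, text in maps_by_pid.items():
        seen = set()  # a library mapped several times counts once per process
        for line in text.splitlines():
            fields = line.split()
            # address perms offset dev inode path "(deleted)"
            if len(fields) == 7 and fields[6] == "(deleted)":
                path = fields[5]
                seen.add((os.path.dirname(path), os.path.basename(path)))
        for key in seen:
            counts[key] += 1
    return dict(counts)
```

Splitting the path into directory and filename is what allows aggregating on the library name across hosts with different library locations.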
* Fix metric name in directory size text collector example | Sandor Zeestraten | 2018-05-19
  The directory size text collector example uses the wrong metric name in the HELP and TYPE lines rendering the comments unusable. This fixes that by using the same metric name.
  Signed-off-by: Sandor Zeestraten <sandor@zeestrataca.com>
* added additional smartmonattrs | mueller | 2018-03-22
  Signed-off-by: mueller <mueller@b1-systems.de>
* Document use of atomic wrapper (#781) | Ben Kochie | 2018-02-27
  Document how to use `sponge` to atomically update textfiles.
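For collectors written in Python rather than piped through `sponge`, the same effect (node_exporter never reads a half-written file) can be had by writing a temporary file in the target directory and renaming it over the destination. A sketch only; `write_textfile_atomically` is a hypothetical helper. The temporary name deliberately does not end in `.prom`, and the mode is widened because `mkstemp` creates files readable only by their owner, which a node_exporter running as another user could not read.

```python
import os
import tempfile


def write_textfile_atomically(path, content):
    """Replace path with content so readers never see a partial file."""
    # Same directory as the target: rename is only atomic within one filesystem.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(content)
        os.chmod(tmp_path, 0o644)   # mkstemp creates the file as 0600
        os.replace(tmp_path, path)  # atomic rename on POSIX
    except BaseException:
        os.unlink(tmp_path)
        raise
```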
* add sample directory size exporter (#789) | anarcat | 2018-02-21
  * add sample directory size exporter
    This is a possible workaround for the lack of metrics in the new storage backend, as documented in:
    https://github.com/prometheus/prometheus/issues/3684
    Partly inspired by this post as well:
    https://www.robustperception.io/monitoring-directory-sizes-with-the-textfile-collector/
  * properly escape backslashes and double-quotes
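The escaping fix above follows the Prometheus text exposition format, where a label value must have backslash, double quote and line feed escaped as `\\`, `\"` and `\n`. A Python sketch (the function name is hypothetical, the directory-size script itself is shell):

```python
def escape_label_value(value):
    """Escape a label value for the Prometheus text exposition format."""
    return (value.replace("\\", "\\\\")  # backslash first, so later escapes are not doubled
                 .replace('"', '\\"')
                 .replace("\n", "\\n"))
```

Order matters: escaping quotes before backslashes would turn the inserted `\"` into `\\"` on the second pass.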
* Fix apt.sh syntax (#811) | tobald | 2018-02-05
  This patch fixes:
  ./apt.test: command substitution: line 19: syntax error near unexpected token `|'
  ./apt.test: command substitution: line 19: ` | /usr/bin/sort | /usr/bin/uniq -c | awk '{ gsub(/\\\\/,
* Escape double quotes in device model family (#772) | Shevchenko Vitaliy | 2018-01-24
* Fix smartmon.sh bugs (#792) | Ben Kochie | 2018-01-22
  * Fix smartmon.sh info label consistency.
  * Fix parsing of SMART-ID attributes <= 99.
* Update storcli.py (#783) | Bruce Lee | 2018-01-09
* StorCli text collector: fix pylint issues and handle StorCli not installed (#758) | Mario Trangoni | 2017-12-12
  * StorCli text collector: fix pylint issues and handle StorCli not installed
  * StorCli text collector: Add HELP and TYPE strings.
* apt.sh: handle multiple origins in apt-get output (#757) | Filippo Giunchedi | 2017-12-12
  It might happen that a given upgrade comes from multiple origins, in which case the origins are separated by ", ", which breaks the whitespace-based split. For example:
  Inst package [1.2.3] (1.2.4 Debian:8.10/oldstable, Debian-Security:8/oldstable [amd64])
  To work around this case, mangle the apt-get output to remove whitespace from the origins list.
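The same workaround sketched in Python: match each simulated `Inst` line, then collapse `", "` inside the origin list so it stays a single token before counting upgrades per origin and architecture. The regular expression is an assumption based on the example line above, not a specification of apt-get's output, and `count_pending_upgrades` is hypothetical (apt.sh does this with shell tools).

```python
import re
from collections import Counter

# "Inst <pkg> [<old version>] (<new version> <origins> [<arch>])"; the old
# version is absent for new installs. Assumed format, taken from the example.
INST_RE = re.compile(r"^Inst (\S+) (?:\[[^\]]*\] )?\((\S+) (.+) \[([^\]]+)\]\)")


def count_pending_upgrades(apt_output):
    """Return a Counter of (origin, arch) over all 'Inst' lines."""
    counts = Counter()
    for line in apt_output.splitlines():
        m = INST_RE.match(line)
        if m:
            # Multiple origins are separated by ", "; drop the space so the
            # origin list is one whitespace-free token.
            origin = m.group(3).replace(", ", ",")
            counts[(origin, m.group(4))] += 1
    return counts
```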
* Added text collector conversion for ipmitool output. (#746) | Derek Marcotte | 2017-12-01
  * Added text collector conversion for ipmitool output.
  * Sort metrics before exporting, add namespace.
  * Added HELP string, tidy up a bit.
  * Make status a gauge.
* added Wear_Leveling_Count attribute to smartmon.sh script (#707) | William | 2017-10-19
* Fix smartmon.sh textfile script (#700) | Ben Kochie | 2017-10-18
  When there are no SMART compatible devices (Raspberry Pi for example) an error is returned, but the return code is still 0.
  `# scan_smart_devices: glob(3) aborted matching pattern /dev/discs/disc*`
  * Remove unused `disks` variable.
  * Filter for only valid `/dev` devices.
* Add text file helper for apt-get. (#680) | Ben Kochie | 2017-10-04
  * Add metric for pending upgrades.
  * Add metric for pending reboot required.
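A sketch of a "reboot required" gauge in Python, assuming the Debian/Ubuntu convention that `/var/run/reboot-required` exists while an installed update is waiting for a reboot. The metric name `node_reboot_required` and the function name are illustrative assumptions; check apt.sh itself for what it actually emits.

```python
import os


def reboot_required_metric(flag_file="/var/run/reboot-required"):
    """Emit a gauge that is 1 while the flag file exists, else 0."""
    value = 1 if os.path.exists(flag_file) else 0
    return ("# HELP node_reboot_required Node reboot is required for software updates.\n"
            "# TYPE node_reboot_required gauge\n"
            f"node_reboot_required {value}\n")
```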
* Always try to return smartmon_device_info metric (#663) | Matt Bostock | 2017-08-31
  * Always try to return smartmon_device_info metric
    Sometimes the 'model family' field is not returned by `smartctl` because a disk is not in the disk database for the version of smartmontools installed on the system. In those cases, the device model and serial number are still returned (at least as far as I have observed).
    Re-work the logic to prefer the 'vendor' field first, and if not present, always output a `smartmon_device_info` metric even if some labels have empty values.
    On the box I'm testing this on, where previously no metric was returned, it now returns:
    # HELP smartmon_device_info SMART metric device_info
    # TYPE smartmon_device_info gauge
    smartmon_device_info{disk="/dev/sda",type="sat",model_family="",device_model="INTEL REDACTED",serial_number="REDACTED",firmware_version="REDACTED"} 1
    smartmon_device_info{disk="/dev/sdb",type="sat",model_family="",device_model="INTEL REDACTED",serial_number="REDACTED",firmware_version="REDACTED"} 1
    smartmon_device_info{disk="/dev/sdc",type="sat",model_family="",device_model="INTEL REDACTED",serial_number="REDACTED",firmware_version="REDACTED"} 1
    smartmon_device_info{disk="/dev/sdd",type="sat",model_family="",device_model="INTEL REDACTED",serial_number="REDACTED",firmware_version="REDACTED"} 1
    smartmon_device_info{disk="/dev/sde",type="sat",model_family="",device_model="INTEL REDACTED",serial_number="REDACTED",firmware_version="REDACTED"} 1
    smartmon_device_info{disk="/dev/sdf",type="sat",model_family="",device_model="INTEL REDACTED",serial_number="REDACTED",firmware_version="REDACTED"} 1
  * Add trailing newline
    Because POSIX: https://stackoverflow.com/a/729795
* Added metric for overall health status check to smartmon.sh example script | William Cooley | 2017-04-05
* Handle smart raw values >2^31 | Rene Treffer | 2017-03-21
  "%d" in awk will truncate values at 2^31. S.M.A.R.T. values can exceed that, thus use a floating-point notation instead to encode larger values (at the possible cost of some precision).