prometheus_node_collector - Prometheus Node Collector, local patches

	Commit message	Author	Age
*	Merge pull request #1657 from povilasv/NodeTextFileCollectorScrapeError	Frederic Branczyk	2020-04-30
\|\ \| \| \| \|	Add NodeTextFileCollectorScrapeError alert to mixin
\| *	Add NodeTextFileCollectorScrapeError alert to mixin	Povilas Versockas	2020-03-31
\| \| \| \| \| \| \| \|	Signed-off-by: Povilas Versockas <p.versockas@gmail.com>
* \|	fix typo in TIME.md (#1670)	jangdm	2020-04-09
\| \| \| \| \| \| \| \| \| \|	fix typo in TIME.md Signed-off-by: jangdm <jamin4@naver.com>
* \|	Add more compatible rules	WOO CHANG HO	2020-04-08
\| \| \| \| \| \| \| \|	Signed-off-by: zodiac12k <zodiac12k@gmail.com>
* \|	Fix sign error in `NodeClockSkewDetected`	beorn7	2020-03-25
\| \| \| \| \| \| \| \|	Signed-off-by: beorn7 <beorn@grafana.com>
* \|	docs/node-mixin: alert on desynchronised clock	paulfantom	2020-03-23
\|/ \| \| \|	Signed-off-by: paulfantom <pawel@krupa.net.pl>
*	Add missing comma	Neraud	2020-03-21
\| \| \| \|	Signed-off-by: Neraud <neraud.login@gmail.com>
*	Add NodeHighNumberConntrackEntriesUsed	Povilas Versockas	2020-03-20
\| \| \| \|	Signed-off-by: Povilas Versockas <p.versockas@gmail.com>
*	Make FS space alerts thresholds configurable (#1624)	iuri aranda	2020-03-02
\| \| \| \| \| \| \| \| \| \| \| \|	* Make FS space alerts thresholds configurable (#1) This makes it possible to tweak the thresholds for the NodeFilesystemSpaceFillingUp alerts. This might be necessary in systems like Kubernetes, where the image garbage collector runs at 85%, so it is not a problem if the disk reaches that usage percentage. Signed-off-by: iuri aranda <iuri@skyscrapers.eu>
*	docs/node-mixin/dashboards: do not mix tabs and spaces	paulfantom	2019-11-01
\| \| \| \|	Signed-off-by: paulfantom <pawel@krupa.net.pl>
*	Fix the normalization for the cluster-wide dashboards	beorn7	2019-10-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We actually have to count or sum, respectively, _all_ the selected metrics for the cluster-wide view. This means it's easiest to use the `scalar` approach after all (but only in the cluster dashboard). This still propagates all the labels. I have extended the comment for the `nodeExporterSelector` to note that the cluster dashboard only makes sense if all the selected node exporters actually belong to the same cluster. Since this is jsonnet, users can easily disable the cluster dashboard, or even create multiple instances of the dashboards with different `nodeExporterSelector`s for different clusters. Signed-off-by: beorn7 <beorn@grafana.com>
*	docs/node-mixin: Improve memory pressure rule	Benoît Knecht	2019-10-28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The `instance:node_memory_swap_io_pages:rate1m` rule was intended to measure the amount of memory pressure a system is under, but its name is a bit misleading (it specifically refers to swap), and the rate of `node_vmstat_pgmajfault` is a better metric for memory pressure (see #1524). This commit renames `instance:node_memory_swap_io_pages:rate1m` to `instance:node_vmstat_pgmajfault:rate1m`, and defines it as `rate(node_vmstat_pgmajfault{%(nodeExporterSelector)s}[1m])`. The dashboards are updated accordingly. Signed-off-by: Benoît Knecht <benoit.knecht@fsfe.org>
*	Two quick typo fixes	Scott Brenner	2019-10-09
\| \| \| \|	Signed-off-by: Scott Brenner <scott@scottbrenner.me>
*	Merge pull request #1482 from leojonathanoh/fix-node-mixin-prometheus-alert-rules-to-use-percentage	Björn Rabenstein	2019-09-26
\|\ \| \| \| \| \| \| \| \|	Fix node-mixin prometheus alert rules to use percentage
\| *	Fix node-mixin prometheus alert rules to use percentage	Leo	2019-09-11
\| \| \| \| \| \| \| \|	Signed-off-by: Leo <leonardjonathanoh@live.com>
* \|	node-mixin: fix configuration for unset fsSelector/diskDeviceSelector	Sergiusz Urbaniak	2019-09-12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As per https://github.com/prometheus/node_exporter/pull/1429#discussion_r304210103 we want to fetch all devices and all fs types. Currently, this is done by setting empty string which breaks most queries which rely on it. This fixes it by setting the appropriate selector instead of empty string. Signed-off-by: Sergiusz Urbaniak <sergiusz.urbaniak@gmail.com>
* \|	node-mixin: fix query in Disk Space Utilisation dashboard	Sergiusz Urbaniak	2019-09-12
\|/ \| \| \|	Signed-off-by: Sergiusz Urbaniak <sergiusz.urbaniak@gmail.com>
*	Node mixin: Clarify dashboard dependency on rules (#1475)	Björn Rabenstein	2019-09-08
\| \| \| \| \| \|	Following @discordianfish's suggestion [here](https://github.com/prometheus/node_exporter/issues/1454#issuecomment-524225222). Signed-off-by: beorn7 <beorn@grafana.com>
*	Update legendLink	beorn7	2019-08-20
\| \| \| \| \| \| \|	This still had the 'k8s' in as it was copied and pasted from the kubernetes-mixin. Signed-off-by: beorn7 <beorn@grafana.com>
*	Merge pull request #1449 from prometheus/beorn7/mixin3	Björn Rabenstein	2019-08-19
\|\ \| \| \| \|	node-mixin: Make the severity of "critical" alerts configurable
\| *	Make the severity of "critical" alerts configurable	beorn7	2019-08-14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This addresses the blissful scenario where single-node failures are unproblematic. No reason to wake somebody up if a node is about to screw itself up by filling the disk. Signed-off-by: beorn7 <beorn@grafana.com>
* \|	Add line for number of cores to load graph	beorn7	2019-08-15
\| \| \| \| \| \| \| \| \| \| \| \|	Backported from the node dashboard in the kubernetes-mixin. Signed-off-by: beorn7 <beorn@grafana.com>
* \|	Fix title of CPU panel to usage	beorn7	2019-08-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	We use the `mode="idle"` metric, but we are inverting it, so this is usage, and that's intended. Signed-off-by: beorn7 <beorn@grafana.com>
* \|	node-mixin: Improve disk usage panel	beorn7	2019-08-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- Use a stacked graph instead of a gauge as development over time is especially useful for disk space usage.
	- By only taking one metric per device into account, we avoid double-counting for devices that are mounted multiple times.
	Signed-off-by: beorn7 <beorn@grafana.com>
* \|	node-mixin: Improve nodes dashboard (#1448)	Björn Rabenstein	2019-08-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* node-mixin: Improve nodes dashboard
	- Use stacking where it makes sense.
	- Normalize idle CPU so that stacking is more meaningful.
	- Consistently fill where stacking is used but don't fill where not.
	- Fix y axis max value for Idle CPU panel.
	- Fix y axis min value for memory usage panel.
	- Use `$__interval` for range where applicable (and set min step to 1m).
	- Make the right Y axis for disk I/O actually work.
	This is just an incremental improvement. It doesn't touch the more involved TODOs.
	Signed-off-by: beorn7 <beorn@grafana.com>
* \|	node-mixin: Fix various straight-forward issues in the USE dashboards	beorn7	2019-08-13
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- Normalize cluster memory utilisation.
	- Fix missing `1m` in memory saturation.
	- Have both disk-related rows next to each other instead of with the network row in between.
	- Correctly render transmit network traffic as negative, using `seriesOverrides` and `min: null` for the y-axis.
	- Make panel and row naming consistent.
	- Remove legend where it would just display a single entry with exactly the title of the panel.
	- Fix metric name in individual node CPU Saturation panel.
	- Break up disk space utilisation by device in the panel for an individual node.
	NB: None of this touches the more subtle issues captured in the various TODOs.
	Signed-off-by: beorn7 <beorn@grafana.com>
*	docs/node-mixin: move fsSelector and diskDeviceSelector to the end of query	paulfantom	2019-07-24
\| \| \| \| \| \| \| \| \|	This will cause a query to be valid even if values of selector are empty. Additionally fixing query responsible for disk space usage. Signed-off-by: paulfantom <pawel@krupa.net.pl>
*	Added `_excluding_lo` to name of network rules that exclude lo	beorn7	2019-07-22
\| \| \| \|	Signed-off-by: beorn7 <beorn@grafana.com>
*	Improvement of comments and panel titles	beorn7	2019-07-22
\| \| \| \|	Signed-off-by: beorn7 <beorn@grafana.com>
*	Break out device in disk IO rules/dashboard	beorn7	2019-07-18
\| \| \| \|	Signed-off-by: beorn7 <beorn@grafana.com>
*	Removed unneeded `sum_` and `avg_` from rule names	beorn7	2019-07-18
\| \| \| \|	Signed-off-by: beorn7 <beorn@grafana.com>
*	Responses to review comments, round 3	beorn7	2019-07-17
\| \| \| \|	Signed-off-by: beorn7 <beorn@grafana.com>
*	Convert annotations from message to summary/description	beorn7	2019-07-16
\| \| \| \|	Signed-off-by: beorn7 <beorn@grafana.com>
*	Address review comments, batch 2	beorn7	2019-07-16
\| \| \| \|	Signed-off-by: beorn7 <beorn@grafana.com>
*	Make more use of config.libsonnet	beorn7	2019-07-16
\| \| \| \|	Signed-off-by: beorn7 <beorn@grafana.com>
*	Address first batch of old review comments	beorn7	2019-07-16
\| \| \| \|	Signed-off-by: beorn7 <beorn@grafana.com>
*	Make selector naming consistent	beorn7	2019-07-10
\| \| \| \|	Signed-off-by: beorn7 <beorn@grafana.com>
*	Fix indentation	beorn7	2019-07-10
\| \| \| \|	Signed-off-by: beorn7 <beorn@grafana.com>
*	(Re-)adjust to Grafana gauge expecting percentage 0-100 (rather than 1-0)	beorn7	2019-07-10
\| \| \| \|	Signed-off-by: beorn7 <beorn@grafana.com>
*	Use promgrafonnet as a vendored library from its source	beorn7	2019-07-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The only deviation that happened so far is to use format="percentunit" in a Grafana gauge. This change wasn't even properly used in this repo so far, so I opted to stick with "upstream" for now. If changes are really needed, we can try to change upstream first. Another change was done in parallel here and upstream, but it was "more correct" in upstream. (Change datasource to $datasource variable, only partially applied here.) This is another point for using the upstream and not copying it here. Signed-off-by: beorn7 <beorn@grafana.com>
*	Add README.md	beorn7	2019-07-06
\| \| \| \|	Signed-off-by: beorn7 <beorn@grafana.com>
*	Add Makefile to easily make output files and lint sources	beorn7	2019-07-06
\| \| \| \|	Signed-off-by: beorn7 <beorn@grafana.com>
*	Create jsonnet files to create output files	beorn7	2019-07-06
\| \| \| \| \| \| \|	This allows creating YAML files with rules and JSON files with dashboard descriptions. Signed-off-by: beorn7 <beorn@grafana.com>
*	Update vendoring to current location of jsonnet-libs	beorn7	2019-07-06
\| \| \| \|	Signed-off-by: beorn7 <beorn@grafana.com>
*	Move node-mixin into docs directory	beorn7	2019-07-05
\| \| \| \|	Signed-off-by: beorn7 <beorn@grafana.com>
*	Add compat rules for node_time, node_memory_ShmemHugePages and node_memory_ShmemPmdMapped (#1138)	Cougar	2018-11-05
\| \| \| \| \|	Signed-off-by: Cougar <cougar@random.ee>
*	Fix supervisord collector (#978)	Ben Kochie	2018-08-06
\| \| \| \| \| \| \| \| \| \| \| \| \|	* Replace supervisord xmlrpc library
	  * Use `github.com/mattn/go-xmlrpc` that doesn't leak goroutines.
	* Fix uptime metric
	  * Use Prometheus best practices for uptime metric.
	  * Use "start time" rather than "uptime".
	  * Don't emit a start time if the process is down.
	* Add changelog entry.
	* Add example compatibility rules.
	Signed-off-by: Ben Kochie <superq@gmail.com>
*	Fix sample rules for migration (#1022)	Rene Treffer	2018-07-27
\| \| \| \| \| \| \|	- add conversion from _ms to _seconds on disk metrics
	- add missing node_textfile_mtime section
	- add groups: header to pass promtool check rules
	Signed-off-by: Rene Treffer <rene.treffer@soundcloud.com>
*	Add example of translating new metrics to old format in case of migration to 1.16 version (#982)	Ivan Kiselev	2018-07-02
\| \| \| \| \| \| \| \|	Add additional example of how to save old metrics Signed-off-by: Ivan Kiselev <ivan@messagebird.com>
*	Add compat rules for filesystem collector. (#973)	Roman Vynar	2018-06-13
\| \| \|	Signed-off-by: Roman Vynar <roman.vynar@goquiq.com>