Resolves: #15559
As accurately noted in the issue description, the map is shared among
child loggers that get created when `WithAttr()`/`WithGroup()` are
called on the underlying handler, which happens via `log.With()` and
`log.WithGroup()` respectively.
The RW mutex was a value in the previous implementation that used
go-kit/log, and I should've updated it to use a pointer when I converted
the deduper.
Also adds a test.
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
Reduce string manipulation by just cutting off the histogram suffixes from
the series name label once.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Resolves: #15433
When I converted prometheus to use slog in #14906, I update both the
`QueryLogger` interface, as well as how the log calls to the
`QueryLogger` were built up in `promql.Engine.exec()`. The backing
logger for the `QueryLogger` in the engine is a
`util/logging.JSONFileLogger`, and it's implementation of the `With()`
method updates the logger the logger in place with the new keyvals added
onto the underlying slog.Logger, which means they get inherited onto
everything after. All subsequent calls to `With()`, even in later
queries, would continue to then append on more and more keyvals for the
various params and fields built up in the logger. In turn, this causes
unbounded growth of the logger, leading to increased memory usage, and
in at least one report was the likely cause of an OOM kill. More
information can be found in the issue and the linked slack thread.
This commit does a few things:
- It was referenced in feedback in #14906 that it would've been better
to not change the `QueryLogger` interface if possible, this PR
proposes changes that bring it closer to alignment with the pre-3.0
`QueryLogger` interface contract
- reverts `promql.Engine.exec()`'s usage of the query logger to the
pattern of building up an array of args to pass at once to the end log
call. Avoiding the repetitious calls to `.With()` are what resolve the
issue with the logger growth/memory usage.
- updates the scrape failure logger to use the update `QueryLogger`
methods in the contract.
- updates tests accordingly
- cleans up unused methods
Builds and passes tests successfully. Tested locally and confirmed I
could no longer reproduce the issue/it resolved the issue.
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
PromQL: Correct the behaviour of some operator and aggregators with Native Histograms
---------
Signed-off-by: Neeraj Gartia <neerajgartia211002@gmail.com>
In general aim for the happy case when the exposer lists the buckets
in ascending order.
Use Compact(2) to compact the result of nhcb convert.
This is more in line with how client_golang optimizes spans vs
buckets.
aef8aedb4b/prometheus/histogram.go (L1485)
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
I used these wrapper methods during initial development of the custom
handler that the deduper now implements. Since the deduper implements
slog.Handler and can be used directly as a logger, these wrapper methods
are no longer needed.
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
This change should have been included in the initial prometheus slog
conversion, but I must've lost track of it in all the rebases involved
in that PR.
This changes the dedupe logger so that the only method that needs to use
the lock is the `Handle()` method that actually interacts with the
deduplication map.
Ex:
```
==================
WARNING: DATA RACE
Write at 0x00c000518bc0 by goroutine 29481:
github.com/prometheus/prometheus/util/logging.(*Deduper).WithAttrs()
/home/tjhop/go/src/github.com/prometheus/prometheus/util/logging/dedupe.go:89 +0xef
log/slog.(*Logger).With()
/home/tjhop/.asdf/installs/golang/1.23.1/go/src/log/slog/logger.go:132 +0x106
github.com/prometheus/prometheus/storage/remote.NewQueueManager()
/home/tjhop/go/src/github.com/prometheus/prometheus/storage/remote/queue_manager.go:483 +0x7a9
github.com/prometheus/prometheus/storage/remote.(*WriteStorage).ApplyConfig()
/home/tjhop/go/src/github.com/prometheus/prometheus/storage/remote/write.go:201 +0x102c
github.com/prometheus/prometheus/storage/remote.(*Storage).ApplyConfig()
/home/tjhop/go/src/github.com/prometheus/prometheus/storage/remote/storage.go:92 +0xfd
github.com/prometheus/prometheus/storage/remote.TestWriteStorageApplyConfigsDuringCommit.func1()
/home/tjhop/go/src/github.com/prometheus/prometheus/storage/remote/storage_test.go:172 +0x3e4
github.com/prometheus/prometheus/storage/remote.TestWriteStorageApplyConfigsDuringCommit.gowrap1()
/home/tjhop/go/src/github.com/prometheus/prometheus/storage/remote/storage_test.go:174 +0x41
Previous read at 0x00c000518bc0 by goroutine 31261:
github.com/prometheus/prometheus/util/logging.(*Deduper).Handle()
/home/tjhop/go/src/github.com/prometheus/prometheus/util/logging/dedupe.go:82 +0x2b1
log/slog.(*Logger).log()
/home/tjhop/.asdf/installs/golang/1.23.1/go/src/log/slog/logger.go:257 +0x228
log/slog.(*Logger).Error()
/home/tjhop/.asdf/installs/golang/1.23.1/go/src/log/slog/logger.go:230 +0x3d4
github.com/prometheus/prometheus/tsdb/wlog.(*Watcher).loop()
/home/tjhop/go/src/github.com/prometheus/prometheus/tsdb/wlog/watcher.go:254 +0x2db
github.com/prometheus/prometheus/tsdb/wlog.(*Watcher).Start.gowrap1()
/home/tjhop/go/src/github.com/prometheus/prometheus/tsdb/wlog/watcher.go:227 +0x33
Goroutine 29481 (running) created at:
github.com/prometheus/prometheus/storage/remote.TestWriteStorageApplyConfigsDuringCommit()
/home/tjhop/go/src/github.com/prometheus/prometheus/storage/remote/storage_test.go:164 +0xe4
testing.tRunner()
/home/tjhop/.asdf/installs/golang/1.23.1/go/src/testing/testing.go:1690 +0x226
testing.(*T).Run.gowrap1()
/home/tjhop/.asdf/installs/golang/1.23.1/go/src/testing/testing.go:1743 +0x44
Goroutine 31261 (running) created at:
github.com/prometheus/prometheus/tsdb/wlog.(*Watcher).Start()
/home/tjhop/go/src/github.com/prometheus/prometheus/tsdb/wlog/watcher.go:227 +0x177
github.com/prometheus/prometheus/storage/remote.(*QueueManager).Start()
/home/tjhop/go/src/github.com/prometheus/prometheus/storage/remote/queue_manager.go:934 +0x304
github.com/prometheus/prometheus/storage/remote.(*WriteStorage).ApplyConfig()
/home/tjhop/go/src/github.com/prometheus/prometheus/storage/remote/write.go:232 +0x151b
github.com/prometheus/prometheus/storage/remote.(*Storage).ApplyConfig()
/home/tjhop/go/src/github.com/prometheus/prometheus/storage/remote/storage.go:92 +0xfd
github.com/prometheus/prometheus/storage/remote.TestWriteStorageApplyConfigsDuringCommit.func1()
/home/tjhop/go/src/github.com/prometheus/prometheus/storage/remote/storage_test.go:172 +0x3e4
github.com/prometheus/prometheus/storage/remote.TestWriteStorageApplyConfigsDuringCommit.gowrap1()
/home/tjhop/go/src/github.com/prometheus/prometheus/storage/remote/storage_test.go:174 +0x41
==================
--- FAIL: TestWriteStorageApplyConfigsDuringCommit (2.26s)
testing.go:1399: race detected during execution of test
FAIL
FAIL github.com/prometheus/prometheus/storage/remote 68.321s
```
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
Fix some edge cases when OOO is enabled
Signed-off-by: Vanshikav123 <vanshikav928@gmail.com>
Signed-off-by: Vanshika <102902652+Vanshikav123@users.noreply.github.com>
Signed-off-by: Jesus Vazquez <jesusvzpg@gmail.com>
Co-authored-by: Jesus Vazquez <jesusvzpg@gmail.com>
promql: corrects binary operators functioning for mixed sample with histogram and float
For invalid pairings of sample types, an annotation is added now.
Signed-off-by: Neeraj Gartia <neerajgartia211002@gmail.com>
---------
Signed-off-by: Neeraj Gartia <neerajgartia211002@gmail.com>
* model: move classic to NHCB conversion into its own file
In preparation for #14978.
Author: Jeanette Tan <jeanette.tan@grafana.com> 2024-07-03 11:56:48
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Co-authored-by: Jeanette Tan <jeanette.tan@grafana.com>
Co-authored-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* Better naming from review comment
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* Add doc strings.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
---------
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Co-authored-by: Jeanette Tan <jeanette.tan@grafana.com>
For: #14355
This commit updates Prometheus to adopt stdlib's log/slog package in
favor of go-kit/log. As part of converting to use slog, several other
related changes are required to get prometheus working, including:
- removed unused logging util func `RateLimit()`
- forward ported the util/logging/Deduper logging by implementing a small custom slog.Handler that does the deduping before chaining log calls to the underlying real slog.Logger
- move some of the json file logging functionality to use prom/common package functionality
- refactored some of the new json file logging for scraping
- changes to promql.QueryLogger interface to swap out logging methods for relevant slog sugar wrappers
- updated lots of tests that used/replicated custom logging functionality, attempting to keep the logical goal of the tests consistent after the transition
- added a healthy amount of `if logger == nil { $makeLogger }` type conditional checks amongst various functions where none were provided -- old code that used the go-kit/log.Logger interface had several places where there were nil references when trying to use functions like `With()` to add keyvals on the new *slog.Logger type
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
Several things done here:
- Set `max-issues-per-linter` to 0 so that we actually see all linter
warnings and not just 50 per linter. (As we also set
`max-same-issues` to 0, I assume this was the intention from the
beginning.)
- Stop using the golangci-lint default excludes (by setting
`exclude-use-default: false`. Those are too generous and don't match
our style conventions. (I have re-added some of the excludes
explicitly in this commit. See below.)
- Re-add the `errcheck` exclusion we have used so far via the
defaults.
- Exclude the signature requirement `govet` has for `Seek` methods
because we use non-standard `Seek` methods a lot. (But we keep other
requirements, while the default excludes completely disabled the
check for common method segnatures.)
- Exclude warnings about missing doc comments on exported symbols. (We
used to be pretty adamant about doc comments, but stopped that at
some point in the past. By now, we have about 500 missing doc
comments. We may consider reintroducing this check, but that's
outside of the scope of this commit. The default excludes of
golangci-lint essentially ignore doc comments completely.)
- By stop using the default excludes, we now get warnings back on
malformed doc comments. That's the most impactful change in this
commit. It does not enforce doc comments (again), but _if_ there is
a doc comment, it has to have the recommended form. (Most of the
changes in this commit are fixing this form.)
- Improve wording/spelling of some comments in .golangci.yml, and
remove an outdated comment.
- Leave `package-comments` inactive, but add a TODO asking if we
should change that.
- Add a new sub-linter `comment-spacings` (and fix corresponding
comments), which avoids missing spaces after the leading `//`.
Signed-off-by: beorn7 <beorn@grafana.com>
* split warnings and info annotations in API response
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
* update according to code review
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
* minimal UI change: show infos in different colour
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
* Update web/ui/react-app/src/pages/graph/Panel.tsx
Co-authored-by: Julius Volz <julius.volz@gmail.com>
Signed-off-by: zenador <zenador@users.noreply.github.com>
---------
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
Signed-off-by: zenador <zenador@users.noreply.github.com>
Co-authored-by: Julius Volz <julius.volz@gmail.com>
* tsdb: check for context cancel before regex matching postings
Regex matching can be heavy if the regex takes a lot of cycles to
evaluate and we can get stuck evaluating postings for a long time
without this fix. The constant checkContextEveryNIterations=100
may be changed later.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* process custom values in histogram unit test framework
* check for warnings when evaluating in unit test framework
* add test cases for custom buckets in test framework
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>