Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(canary): Add test to check query results with and without cache. #13104

Merged
merged 8 commits into from
Jun 3, 2024

Conversation

kavirajk
Copy link
Collaborator

@kavirajk kavirajk commented Jun 3, 2024

What this PR does / why we need it:
Recently, we added support for Cache-Control header for loki.

This PR uses it to test if same query for same data returns same results with and without results cache.

Easy to catch any bug in results cache if there are any differences.

NOTE: This currently does count_over_time instant query for the cache test. In future we can extend it for range queries as well.

This PR introduces to additional metrics for loki-canary to track any differences in this test.

loki_canary_cache_test_query_results_total{status=~"success|fail"}
loki_canary_cache_test_query_results_diff_total

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

This is one of the step to gain more confidence on recently added "results cache" support for instant query.

Manually ingested logs with older timestamps to create discrepancy with results cache. And counters seems to be working fine.

# HELP loki_canary_cache_test_query_results_diff_total counts number of times the query results was different with and without cache 
# TYPE loki_canary_cache_test_query_results_diff_total counter
loki_canary_cache_test_query_results_diff_total 16
# HELP loki_canary_cache_test_query_results_total counts number of times the query results test requests are done 
# TYPE loki_canary_cache_test_query_results_total counter
loki_canary_cache_test_query_results_total{status="success"} 23

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • Title matches the required conventional commits format, see here
    • Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such, feat PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • For Helm chart changes bump the Helm chart version in production/helm/loki/Chart.yaml and update production/helm/loki/CHANGELOG.md and production/helm/loki/README.md. Example PR
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR
Add new test to check correctness of queries with and without cache.

Currently only for `Count_over_time` instant metric query

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
@kavirajk kavirajk requested a review from a team as a code owner June 3, 2024 09:41
Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
Copy link
Contributor

@MichelHollands MichelHollands left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some cosmetic changes required but otherwise fine

@@ -79,6 +79,10 @@ func main() {
metricTestQueryRange := flag.Duration("metric-test-range", 24*time.Hour, "The range value [24h] used in the metric test instant-query."+
" Note: this value is truncated to the running time of the canary until this value is reached")

cacheTestInterval := flag.Duration("cache-test-interval", 15*time.Minute, "The interval the cache test query should be run")
cacheTestQueryRange := flag.Duration("cache-test-range", 24*time.Hour, "The range value [24h] used in the cache test instant-query.")
cacheTestQueryNow := flag.Duration("cache-test-now", 1*time.Hour, "duration how far back from current time the execustion time (--now) should be set for running this query in the cache test instant-query.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
cacheTestQueryNow := flag.Duration("cache-test-now", 1*time.Hour, "duration how far back from current time the execustion time (--now) should be set for running this query in the cache test instant-query.")
cacheTestQueryNow := flag.Duration("cache-test-now", 1*time.Hour, "duration how far back from current time the execution time (--now) should be set for running this query in the cache test instant-query.")
done chan struct{}
cacheTestInterval time.Duration
cacheTestRange time.Duration
// how far back from current time the execustion time (--now) should be set for running this query.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// how far back from current time the execustion time (--now) should be set for running this query.
// how far back from current time the execution time (--now) should be set for running this query.
cacheTestRange time.Duration
// how far back from current time the execustion time (--now) should be set for running this query.
cacheTestNow time.Duration

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency this empty line can be removed

// seems ugly but was easy, and multiple instantiations
// of the comparator should be an error
prometheus.Unregister(responseLatency)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empty line can be removed

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
@kavirajk
Copy link
Collaborator Author

kavirajk commented Jun 3, 2024

Thanks @MichelHollands addressed your comments.

@kavirajk
Copy link
Collaborator Author

kavirajk commented Jun 3, 2024

Checked with internal dev loki clusters. Manually ingested some logs with old timestamp to create the diff.

Screenshot 2024-06-03 at 17 54 31

@kavirajk kavirajk merged commit 71507a2 into main Jun 3, 2024
59 checks passed
@kavirajk kavirajk deleted the kavirajk/canary-with-cache-check branch June 3, 2024 16:18
trevorwhitney added a commit that referenced this pull request Jun 3, 2024
commit 35585db
Merge: 1822b88 47f0236
Author: Trevor Whitney <trevorjwhitney@gmail.com>
Date:   Mon Jun 3 11:43:12 2024 -0600

    Merge branch 'main' into sample-count-and-bytes

commit 1822b88
Author: Trevor Whitney <trevorjwhitney@gmail.com>
Date:   Mon Jun 3 11:42:52 2024 -0600

    fix: formatting

commit 47f0236
Author: Dylan Guedes <djmgguedes@gmail.com>
Date:   Mon Jun 3 14:41:55 2024 -0300

    feat: Introduce `index audit` to `lokitool` (#13008)

    Adds a new `index audit` command to the `lokitool` cmd.
    The new `index audit` validates that all chunks required by a given index are available at the object storage. This is useful to validate if you're missing data after a backfill or when migrating data from one Loki instance to another.
    See `pkg/tool/audit/README.md` for usage instructions.

commit 71507a2
Author: Kaviraj Kanagaraj <kavirajkanagaraj@gmail.com>
Date:   Mon Jun 3 18:18:39 2024 +0200

    feat(canary): Add test to check query results with and without cache. (#13104)

    Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>

commit 7942e57
Merge: 6ed195e 8084259
Author: Trevor Whitney <trevorjwhitney@gmail.com>
Date:   Mon Jun 3 09:27:58 2024 -0600

    Merge branch 'main' into sample-count-and-bytes

commit 6ed195e
Author: Trevor Whitney <trevorjwhitney@gmail.com>
Date:   Mon Jun 3 09:25:37 2024 -0600

    fix: nanosecond values in test with non-decimal seconds value

commit 8084259
Author: Yuri Kotov <jura.kotov@gmail.com>
Date:   Mon Jun 3 21:30:25 2024 +0700

    feat: API: Expose optional label matcher for label names API (#11982)

    Co-authored-by: Cyril Tovena <cyril.tovena@gmail.com>

commit 09faea8
Author: Yoshitaka Fujii <76274657+ystkfujii@users.noreply.github.com>
Date:   Mon Jun 3 21:16:10 2024 +0900

    docs: Fix link in examples (#13094)

    Co-authored-by: J Stickler <julie.stickler@grafana.com>

commit c8cc0fb
Author: Grot (@grafanabot) <43478413+grafanabot@users.noreply.github.com>
Date:   Mon Jun 3 12:39:26 2024 +0100

    chore( operator): community release 0.6.1 (#12593)

commit fbd2739
Author: Joao Marcal <jmarcal@redhat.com>
Date:   Mon Jun 3 12:15:47 2024 +0200

    chore(operator): prepare community release v0.6.1 (#13105)

commit 4f3ed77
Author: Robert Jacob <rojacob@redhat.com>
Date:   Mon Jun 3 11:02:15 2024 +0200

    fix(operator): Use a minimum value for replay memory ceiling (#13066)

commit cbf9fc0
Author: Trevor Whitney <trevorjwhitney@gmail.com>
Date:   Fri May 31 16:46:39 2024 -0600

    docs: update docs

commit 29febb7
Author: Trevor Whitney <trevorjwhitney@gmail.com>
Date:   Fri May 31 16:42:12 2024 -0600

    chore: make format

commit 87f7282
Author: Trevor Whitney <trevorjwhitney@gmail.com>
Date:   Fri May 31 16:34:48 2024 -0600

    chore: clean up linting

commit abb31a8
Merge: 33ead60 00d3c7a
Author: Trevor Whitney <trevorjwhitney@gmail.com>
Date:   Fri May 31 16:10:58 2024 -0600

    Merge branch 'main' into sample-count-and-bytes

commit 33ead60
Author: Trevor Whitney <trevorjwhitney@gmail.com>
Date:   Fri May 31 16:03:04 2024 -0600

    feat: hook up samples endpoint

commit eb84303
Author: Trevor Whitney <trevorjwhitney@gmail.com>
Date:   Fri May 31 13:07:49 2024 -0600

    chore: a bit of cleanup

commit 6dd77ae
Author: Trevor Whitney <trevorjwhitney@gmail.com>
Date:   Fri May 31 12:56:05 2024 -0600

    feat: refactor metric samples to be it's own endpoint

commit 2587657
Author: Trevor Whitney <trevorjwhitney@gmail.com>
Date:   Fri May 24 17:35:15 2024 -0600

    fix: grouping

commit b897fc5
Author: Trevor Whitney <trevorjwhitney@gmail.com>
Date:   Fri May 24 13:36:00 2024 -0600

    fix: ring proxy methods on pattern ring_client

commit 0bfd0ad
Merge: 68aa188 efdae3d
Author: Trevor Whitney <trevorjwhitney@gmail.com>
Date:   Thu May 23 17:04:32 2024 -0600

    Merge branch 'main' into sample-count-and-bytes

commit 68aa188
Author: Trevor Whitney <trevorjwhitney@gmail.com>
Date:   Thu May 23 17:03:32 2024 -0600

    feat: guard aggregation behavior behind a feature flag

commit f0d6a92
Author: Trevor Whitney <trevorjwhitney@gmail.com>
Date:   Thu May 23 14:03:32 2024 -0600

    feat: reject filter queries to /patterns endpoint

commit dc620e7
Author: Trevor Whitney <trevorjwhitney@gmail.com>
Date:   Wed May 8 14:08:44 2024 -0600

    feat: collect and serve pre-agg bytes and count

    * pre-aggregate bytes and count per stream in the pattern ingester
    * serve bytes_over_time and count_over_time queries from the patterns
      endpoint
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2 participants