Test flake: TestDistributor_Push_ShouldSupportWriteBothToIngestersAndPartitions #9299

Open
seizethedave opened this issue Sep 13, 2024 · 2 comments

Comments

@seizethedave
Contributor

Describe the bug

Received a test flake in CI: https://github.com/grafana/mimir/actions/runs/10856690387/job/30131835759#step:8:136

--- FAIL: TestDistributor_Push_ShouldSupportWriteBothToIngestersAndPartitions (0.00s)
    --- FAIL: TestDistributor_Push_ShouldSupportWriteBothToIngestersAndPartitions/should_shard_series_across_all_partitions_when_shuffle_sharding_is_disabled (3.00s)
        distributor_ingest_storage_test.go:455: 
            	Error Trace:	/__w/mimir/mimir/pkg/distributor/distributor_ingest_storage_test.go:455
            	Error:      	Not equal: 
            	            	expected: map[int32][]string{0:[]string{"series_four", "series_one", "series_three"}, 1:[]string{"series_two"}, 2:[]string{"series_five"}}
            	            	actual  : map[int32][]string{2:[]string{"series_five"}}
            	            	
            	            	Diff:
            	            	--- Expected
            	            	+++ Actual
            	            	@@ -1,10 +1,2 @@
            	            	-(map[int32][]string) (len=3) {
            	            	- (int32) 0: ([]string) (len=3) {
            	            	-  (string) (len=11) "series_four",
            	            	-  (string) (len=10) "series_one",
            	            	-  (string) (len=12) "series_three"
            	            	- },
            	            	- (int32) 1: ([]string) (len=1) {
            	            	-  (string) (len=10) "series_two"
            	            	- },
            	            	+(map[int32][]string) (len=1) {
            	            	  (int32) 2: ([]string) (len=1) {
            	Test:       	TestDistributor_Push_ShouldSupportWriteBothToIngestersAndPartitions/should_shard_series_across_all_partitions_when_shuffle_sharding_is_disabled

Environment

GitHub Actions CI.

@dimitarvdimitrov
Contributor

I couldn't reproduce this in about 600 runs.

After reducing the Kafka poll timeout from 1s to 10ms, I got a similar failure, although there all 3 partitions are missing, not just 2 of 3.

diff --git a/pkg/distributor/distributor_ingest_storage_test.go b/pkg/distributor/distributor_ingest_storage_test.go
index b598ac6a8f..0ac85ac77a 100644
--- a/pkg/distributor/distributor_ingest_storage_test.go
+++ b/pkg/distributor/distributor_ingest_storage_test.go
@@ -451,7 +451,7 @@ func TestDistributor_Push_ShouldSupportWriteBothToIngestersAndPartitions(t *test
 			require.NoError(t, err)
 
 			// Ensure series has been correctly sharded to partitions.
-			actualSeriesByPartition := readAllMetricNamesByPartitionFromKafka(t, kafkaCluster.ListenAddrs(), testConfig.ingestStoragePartitions, time.Second)
+			actualSeriesByPartition := readAllMetricNamesByPartitionFromKafka(t, kafkaCluster.ListenAddrs(), testConfig.ingestStoragePartitions, time.Second/100)
 			assert.Equal(t, testData.expectedMetricsByPartition, actualSeriesByPartition)
 
 			// Ensure series have been correctly sharded to ingesters.
Failure

    distributor_ingest_storage_test.go:455: 
        	Error Trace:	/Users/dimitar/grafana/mimir/pkg/distributor/distributor_ingest_storage_test.go:455
        	Error:      	Not equal: 
        	            	expected: map[int32][]string{0:[]string{"series_four", "series_one", "series_three"}, 1:[]string{"series_two"}, 2:[]string{"series_five"}}
        	            	actual  : map[int32][]string{}
        	            	
        	            	Diff:
        	            	--- Expected
        	            	+++ Actual
        	            	@@ -1,13 +1,2 @@
        	            	-(map[int32][]string) (len=3) {
        	            	- (int32) 0: ([]string) (len=3) {
        	            	-  (string) (len=11) "series_four",
        	            	-  (string) (len=10) "series_one",
        	            	-  (string) (len=12) "series_three"
        	            	- },
        	            	- (int32) 1: ([]string) (len=1) {
        	            	-  (string) (len=10) "series_two"
        	            	- },
        	            	- (int32) 2: ([]string) (len=1) {
        	            	-  (string) (len=11) "series_five"
        	            	- }
        	            	+(map[int32][]string) {
        	            	 }

I opened a PR (#9323) to log the topic's end offsets on failure, so we can verify whether the timeout is just too tight.
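
As a rough illustration of the check that logged end offsets would allow, here is a minimal Go sketch. It assumes offsets start at 0, so a partition's end offset equals the number of records written to it. `missingByPartition` and the sample numbers are hypothetical and are not part of the Mimir test code.

```go
package main

import (
	"fmt"
	"sort"
)

// missingByPartition returns, per partition, how many records the topic holds
// (its end offset, assuming offsets start at 0) beyond what the reader consumed.
// Partitions that were fully read are omitted from the result.
func missingByPartition(endOffsets, consumed map[int32]int64) map[int32]int64 {
	missing := map[int32]int64{}
	for partition, end := range endOffsets {
		if lag := end - consumed[partition]; lag > 0 {
			missing[partition] = lag
		}
	}
	return missing
}

func main() {
	// Hypothetical numbers: one record written to each of three partitions,
	// but the reader only returned the record from partition 2.
	endOffsets := map[int32]int64{0: 1, 1: 1, 2: 1}
	consumed := map[int32]int64{2: 1}

	missing := missingByPartition(endOffsets, consumed)

	// Sort partition IDs so the report is stable across runs.
	partitions := make([]int32, 0, len(missing))
	for p := range missing {
		partitions = append(partitions, p)
	}
	sort.Slice(partitions, func(i, j int) bool { return partitions[i] < partitions[j] })

	for _, p := range partitions {
		fmt.Printf("partition %d: %d record(s) written but not read\n", p, missing[p])
	}
}
```

If the result is non-empty, the records reached Kafka and the read stopped too early, which would point at the timeout. If it is empty while the assertion still fails, the records were never written to the expected partitions.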

@dimitarvdimitrov dimitarvdimitrov removed their assignment Sep 18, 2024
@dimitarvdimitrov
Contributor

It happened again.

Details

--- FAIL: TestDistributor_Push_ShouldSupportWriteBothToIngestersAndPartitions (0.00s)
    --- FAIL: TestDistributor_Push_ShouldSupportWriteBothToIngestersAndPartitions/should_shard_series_across_the_number_of_configured_partitions_/_ingesters_when_shuffle_sharding_is_enabled (3.00s)
        distributor_ingest_storage_test.go:456: 
            	Error Trace:	/__w/mimir/mimir/pkg/distributor/distributor_ingest_storage_test.go:456
            	Error:      	Not equal: 
            	            	expected: map[int32][]string{1:[]string{"series_one", "series_three", "series_two"}, 2:[]string{"series_five", "series_four"}}
            	            	actual  : map[int32][]string{2:[]string{"series_five", "series_four"}}
            	            	
            	            	Diff:
            	            	--- Expected
            	            	+++ Actual
            	            	@@ -1,7 +1,2 @@
            	            	-(map[int32][]string) (len=2) {
            	            	- (int32) 1: ([]string) (len=3) {
            	            	-  (string) (len=10) "series_one",
            	            	-  (string) (len=12) "series_three",
            	            	-  (string) (len=10) "series_two"
            	            	- },
            	            	+(map[int32][]string) (len=1) {
            	            	  (int32) 2: ([]string) (len=2) {
            	Test:       	TestDistributor_Push_ShouldSupportWriteBothToIngestersAndPartitions/should_shard_series_across_the_number_of_configured_partitions_/_ingesters_when_shuffle_sharding_is_enabled
            	Messages:   	please report this failure in https://github.com/grafana/mimir/issues/9299
        distributor_ingest_storage_test.go:463: Kafka topic test end offsets: kadm.ListedOffsets{"test":map[int32]kadm.ListedOffset{0:kadm.ListedOffset{Topic:"test", Partition:0, Timestamp:-1, Offset:0, LeaderEpoch:0, Err:error(nil)}, 1:kadm.ListedOffset{Topic:"test", Partition:1, Timestamp:-1, Offset:1, LeaderEpoch:0, Err:error(nil)}, 2:kadm.ListedOffset{Topic:"test", Partition:2, Timestamp:-1, Offset:1, LeaderEpoch:0, Err:error(nil)}}}
level=debug msg=Get key=prefixuser/cluster wait_index=0
level=debug msg="Get - not found" key=prefixuser/cluster
level=debug msg=CAS key=prefixuser/cluster modify_index=0 value="\"\\x15P\\n\\x05first\\x10\\xcd\\xd7͋\\xa52 \\xcd\\xd7͋\\xa52\""
level=debug msg=Get key=prefixuser/cluster wait_index=0
level=debug msg=Get key=prefixuser/cluster modify_index=2 value="\"\\x15P\\n\\x05first\\x10\\xcd\\xd7͋\\xa52 \\xcd\\xd7͋\\xa52\""
level=debug msg=CAS key=prefixuser/cluster modify_index=2 value="\"\\x18\\\\\\n\\x06second\\x10ݥ\\u038b\\xa52 ݥ\\u038b\\xa52(\\x01\""
level=debug msg=Get key=prefixuser/cluster wait_index=0
level=debug msg=Get key=prefixuser/cluster modify_index=3 value="\"\\x18\\\\\\n\\x06second\\x10ݥ\\u038b\\xa52 ݥ\\u038b\\xa52(\\x01\""
level=debug msg=CAS key=prefixuser/cluster modify_index=3 value="\"\\x17X\\n\\x05first\\x10\\x85\\xec\\u038b\\xa52 \\xed\\xf3\\u038b\\xa52(\\x02\""
2024/10/03 08:16:13 label __name__ is overwritten. Check if Prometheus reserved labels are used.
2024/10/03 08:16:13 label __name__ is overwritten. Check if Prometheus reserved labels are used.
2024/10/03 08:16:13 label __name__ is overwritten. Check if Prometheus reserved labels are used.
2024/10/03 08:16:13 label __name__ is overwritten. Check if Prometheus reserved labels are used.
2024/10/03 08:16:13 label __name__ is overwritten. Check if Prometheus reserved labels are used.
level=info msg="server listening on addresses" http=127.0.0.1:43797 grpc=127.0.0.1:35619
level=warn method=/httpgrpc.HTTP/Handle duration=494.337µs request="&HTTPRequest{Method:POST,Url:/otlp,Headers:[]*Header{},Body:[104 101 108 108 111],}" msg=gRPC err="rpc error: code = Code(415) desc = unsupported content type: , supported: [application/json, application/x-protobuf]"
level=warn method=/httpgrpc.HTTP/Handle duration=5.354519ms request="&HTTPRequest{Method:POST,Url:/otlp,Headers:[]*Header{&Header{Key:Content-Type,Values:[application/json],},},Body:[105 110 118 97 108 105 100],}" msg=gRPC err="rpc error: code = Code(400) desc = ReadObjectCB: expect { or n, but found i, error found in #1 byte of ...|invalid|..., bigger context ...|invalid|..."
level=warn method=/httpgrpc.HTTP/Handle duration=442.369µs request="&HTTPRequest{Method:POST,Url:/otlp,Headers:[]*Header{&Header{Key:Content-Type,Values:[application/json],},},Body:[123 34 114 101 115 111 117 114 99 101 77 101 116 114 105 99 115 34 58 32 91 123 34 115 99 111 112 101 77 101 116 114 105 99 115 34 58 32 91 123 34 109 101 116 114 105 99 115 34 58 32 91 123 34 110 97 109 101 34 58 32 34 114 101 112 111 114 116 95 115 101 114 118 101 114 95 101 114 114 111 114 34 44 32 34 103 97 117 103 101 34 58 32 123 34 100 97 116 97 80 111 105 110 116 115 34 58 32 91 123 34 116 105 109 101 85 110 105 120 78 97 110 111 34 58 32 34 49 54 55 57 57 49 50 52 54 51 51 52 48 48 48 48 48 48 48 34 44 32 34 97 115 68 111 117 98 108 101 34 58 32 49 48 46 54 54 125 93 125 125 93 125 93 125 93 125],}" msg=gRPC err="rpc error: code = Code(503) desc = some random push error"
level=info msg="=== Handler.Stop()'d ==="
FAIL
