Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: when reusing http.DefaultTransport and doing queries in parallel, sometimes all queries are starting to time out #69570

Closed
freak12techno opened this issue Sep 21, 2024 · 3 comments

Comments

@freak12techno
Copy link

Go version

go version go1.22.7 linux/amd64

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='amd64'
GOBIN='/home/monitoring/go/bin'
GOCACHE='/home/monitoring/.cache/go-build'
GOENV='/home/monitoring/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/home/monitoring/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/monitoring/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/lib/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/lib/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.22.7'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/home/monitoring/go-test/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build4029050489=/tmp/go-build -gno-record-gcc-switches'

What did you do?

I have a bunch of tools written in golang that use net/http to do queries in parallel. Recently something happened, not sure yet what exactly, but all of the queries started to time out.

All of my tools were using http.DefaultTransport, either explicitly or implicitly.

After investigating, it seems like replacing http.DefaultTransport with its .Clone(), all the problems went away. So I assume there's something mutating http.DefaultTransport or in some other way making it unusable.

This also seem to happen only on one of my servers, and I was not able to reproduce it on my local laptop, not sure what is the difference between the two.

I used this script to reproduce the behaviour and I was successfully able to reproduce it on my server:

package main

import (
	"net/http"
	"fmt"
	"io/ioutil"
	"sync"
)

func main() {
	urls := []string{
		"https://rest.provider-sentry-01.rs-testnet.polypore.xyz/cosmos/slashing/v1beta1/signing_infos/cosmosvalcons1d8h57qw3d2upngacprs4v2gdztq2k3q3ztrqkm",
		"https://rest.provider-sentry-01.rs-testnet.polypore.xyz/interchain_security/ccv/provider/consumer_validators/pion-1",
		"https://rest.provider-sentry-01.rs-testnet.polypore.xyz/cosmos/staking/v1beta1/params",
		"https://rest.provider-sentry-01.rs-testnet.polypore.xyz/interchain_security/ccv/provider/consumer_chains_per_validator/cosmosvalcons1d8h57qw3d2upngacprs4v2gdztq2k3q3ztrqkm",
		"https://rest.provider-sentry-01.rs-testnet.polypore.xyz/interchain_security/ccv/provider/validator_consumer_addr?chain_id=pion-1&provider_address=cosmosvalcons1d8h57qw3d2upngacprs4v2gdztq2k3q3ztrqkm",
		"https://rest.provider-sentry-01.rs-testnet.polypore.xyz/cosmos/staking/v1beta1/validators/cosmosvaloper1zyqledl5wr5rr8l4sx7lkuf9x4qrudmmm72xxp/delegations/cosmos1zyqledl5wr5rr8l4sx7lkuf9x4qrudmm727n2j",
		"https://rest.provider-sentry-01.rs-testnet.polypore.xyz/cosmos/distribution/v1beta1/validators/cosmosvaloper1zyqledl5wr5rr8l4sx7lkuf9x4qrudmmm72xxp/commission",
		"https://rest.provider-sentry-01.rs-testnet.polypore.xyz/cosmos/bank/v1beta1/supply?pagination.limit=10000&pagination.offset=0",
		"https://rest.provider-sentry-01.rs-testnet.polypore.xyz/cosmos/staking/v1beta1/validators?pagination.count_total=true&pagination.limit=1000",
		"https://rest.provider-sentry-01.rs-testnet.polypore.xyz/cosmos/distribution/v1beta1/delegators/cosmos1zyqledl5wr5rr8l4sx7lkuf9x4qrudmm727n2j/rewards/cosmosvaloper1zyqledl5wr5rr8l4sx7lkuf9x4qrudmmm72xxp",
		"https://rest.provider-sentry-01.rs-testnet.polypore.xyz/interchain_security/ccv/provider/consumer_chains",
		"https://rest.provider-sentry-01.rs-testnet.polypore.xyz/cosmos/staking/v1beta1/validators/cosmosvaloper1zyqledl5wr5rr8l4sx7lkuf9x4qrudmmm72xxp/delegations?pagination.count_total=true&pagination.limit=1",
		"https://rest.provider-sentry-01.rs-testnet.polypore.xyz/cosmos/slashing/v1beta1/params",
		"https://rest.provider-sentry-01.rs-testnet.polypore.xyz/interchain_security/ccv/provider/consumer_commission_rate/pion-1/cosmosvalcons1d8h57qw3d2upngacprs4v2gdztq2k3q3ztrqkm",
		"https://rest.provider-sentry-01.rs-testnet.polypore.xyz/cosmos/mint/v1beta1/inflation",
		"https://rest.provider-sentry-01.rs-testnet.polypore.xyz/cosmos/staking/v1beta1/validators/cosmosvaloper1zyqledl5wr5rr8l4sx7lkuf9x4qrudmmm72xxp/unbonding_delegations?pagination.count_total=true&pagination.limit=1",
		"https://rest.provider-sentry-01.rs-testnet.polypore.xyz/cosmos/base/tendermint/v1beta1/node_info",
		"https://rest.provider-sentry-01.rs-testnet.polypore.xyz/cosmos/bank/v1beta1/balances/cosmos1zyqledl5wr5rr8l4sx7lkuf9x4qrudmm727n2j",
	}

	var wg sync.WaitGroup

	for {
		for _, url := range urls {
			wg.Add(1)
			go func (url string) {
				defer wg.Done()

				client := &http.Client{
					Timeout:   10 * 1000000000,
					Transport: http.DefaultTransport,
				}
			
				req, err := http.NewRequest(http.MethodGet, url, nil)
				if err != nil {
					fmt.Printf("query failed: %s\n", err)
					return
				}
			
				fmt.Printf("doing query\n")
				res, err := client.Do(req)
				if err != nil {
					fmt.Printf("query failed: %s\n", err)
					return
				}
				body, _ := ioutil.ReadAll(res.Body)
				res.Body.Close()
		
				fmt.Printf("query done: %s\n", body)
			}(url)
		}

		wg.Wait()
	}
	
}

What did you see happen?

After running it for some time, I started seeing this repeatedly (later it starts working correctly, then stops, and all over again):

doing query
doing query
doing query
doing query
doing query
doing query
doing query
doing query
doing query
doing query
doing query
doing query
doing query
doing query
doing query
doing query
doing query
doing query
query failed: Get "https://rest.provider-sentry-01.rs-testnet.polypore.xyz/cosmos/staking/v1beta1/validators?pagination.count_total=true&pagination.limit=1000": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
query failed: Get "https://rest.provider-sentry-01.rs-testnet.polypore.xyz/cosmos/staking/v1beta1/validators/cosmosvaloper1zyqledl5wr5rr8l4sx7lkuf9x4qrudmmm72xxp/delegations/cosmos1zyqledl5wr5rr8l4sx7lkuf9x4qrudmm727n2j": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
query failed: Get "https://rest.provider-sentry-01.rs-testnet.polypore.xyz/interchain_security/ccv/provider/consumer_chains_per_validator/cosmosvalcons1d8h57qw3d2upngacprs4v2gdztq2k3q3ztrqkm": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
query failed: Get "https://rest.provider-sentry-01.rs-testnet.polypore.xyz/interchain_security/ccv/provider/consumer_validators/pion-1": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
query failed: Get "https://rest.provider-sentry-01.rs-testnet.polypore.xyz/cosmos/staking/v1beta1/validators/cosmosvaloper1zyqledl5wr5rr8l4sx7lkuf9x4qrudmmm72xxp/unbonding_delegations?pagination.count_total=true&pagination.limit=1": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
query failed: Get "https://rest.provider-sentry-01.rs-testnet.polypore.xyz/cosmos/staking/v1beta1/params": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
query failed: Get "https://rest.provider-sentry-01.rs-testnet.polypore.xyz/cosmos/mint/v1beta1/inflation": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
query failed: Get "https://rest.provider-sentry-01.rs-testnet.polypore.xyz/cosmos/bank/v1beta1/balances/cosmos1zyqledl5wr5rr8l4sx7lkuf9x4qrudmm727n2j": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
query failed: Get "https://rest.provider-sentry-01.rs-testnet.polypore.xyz/cosmos/bank/v1beta1/supply?pagination.limit=10000&pagination.offset=0": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
query failed: Get "https://rest.provider-sentry-01.rs-testnet.polypore.xyz/cosmos/distribution/v1beta1/validators/cosmosvaloper1zyqledl5wr5rr8l4sx7lkuf9x4qrudmmm72xxp/commission": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
query failed: Get "https://rest.provider-sentry-01.rs-testnet.polypore.xyz/interchain_security/ccv/provider/validator_consumer_addr?chain_id=pion-1&provider_address=cosmosvalcons1d8h57qw3d2upngacprs4v2gdztq2k3q3ztrqkm": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
query failed: Get "https://rest.provider-sentry-01.rs-testnet.polypore.xyz/cosmos/base/tendermint/v1beta1/node_info": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
query failed: Get "https://rest.provider-sentry-01.rs-testnet.polypore.xyz/cosmos/slashing/v1beta1/signing_infos/cosmosvalcons1d8h57qw3d2upngacprs4v2gdztq2k3q3ztrqkm": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
query failed: Get "https://rest.provider-sentry-01.rs-testnet.polypore.xyz/interchain_security/ccv/provider/consumer_commission_rate/pion-1/cosmosvalcons1d8h57qw3d2upngacprs4v2gdztq2k3q3ztrqkm": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
query failed: Get "https://rest.provider-sentry-01.rs-testnet.polypore.xyz/cosmos/staking/v1beta1/validators/cosmosvaloper1zyqledl5wr5rr8l4sx7lkuf9x4qrudmmm72xxp/delegations?pagination.count_total=true&pagination.limit=1": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
query failed: Get "https://rest.provider-sentry-01.rs-testnet.polypore.xyz/cosmos/slashing/v1beta1/params": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
query failed: Get "https://rest.provider-sentry-01.rs-testnet.polypore.xyz/interchain_security/ccv/provider/consumer_chains": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
query failed: Get "https://rest.provider-sentry-01.rs-testnet.polypore.xyz/cosmos/distribution/v1beta1/delegators/cosmos1zyqledl5wr5rr8l4sx7lkuf9x4qrudmm727n2j/rewards/cosmosvaloper1zyqledl5wr5rr8l4sx7lkuf9x4qrudmmm72xxp": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

This seems to happen on all (or almost all) hosts that are queried by a client that is utilising this http.DefaultTransport.

What did you expect to see?

All the queries should work and not time out.

@seankhliao
Copy link
Member

this looks more like your remote end not responding or other network issues on your end.

@seankhliao seankhliao closed this as not planned Won't fix, can't repro, duplicate, stale Sep 21, 2024
@freak12techno
Copy link
Author

@seankhliao I thought so as well, but it fails regardless of which host I use. Also I cannot reproduce it with any means by using curl or scripts in other programming languages. Finally, replacing http.DefaultTransport with its .Clone() is fixing it permanently, so I think there's some Golang-related bug here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants