聊聊cortex的Backoff

【聊聊cortex的Backoff】本文主要研究一下cortex的Backoff
Backoff github.com/cortexproject/cortex/pkg/util/backoff.go

// Backoff implements exponential backoff with randomized wait times type Backoff struct { cfgBackoffConfig ctxcontext.Context numRetriesint nextDelayMin time.Duration nextDelayMax time.Duration }// NewBackoff creates a Backoff object. Pass a Context that can also terminate the operation. func NewBackoff(ctx context.Context, cfg BackoffConfig) *Backoff { return &Backoff{ cfg:cfg, ctx:ctx, nextDelayMin: cfg.MinBackoff, nextDelayMax: doubleDuration(cfg.MinBackoff, cfg.MaxBackoff), } }

Backoff定义了cfg、ctx、numRetries、nextDelayMin、nextDelayMax属性;NewBackoff提供了基于BackoffConfig的工厂方法,默认的nextDelayMin为cfg.MinBackoff
BackoffConfig github.com/cortexproject/cortex/pkg/util/backoff.go
// BackoffConfig configures a Backoff type BackoffConfig struct { MinBackoff time.Duration `yaml:"min_period"`// start backoff at this level MaxBackoff time.Duration `yaml:"max_period"`// increase exponentially to this level MaxRetries int`yaml:"max_retries"` // give up after this many; zero means infinite retries }

BackoffConfig定义了MinBackoff、MaxBackoff、MaxRetries属性
Ongoing github.com/cortexproject/cortex/pkg/util/backoff.go
// Reset the Backoff back to its initial condition func (b *Backoff) Reset() { b.numRetries = 0 b.nextDelayMin = b.cfg.MinBackoff b.nextDelayMax = doubleDuration(b.cfg.MinBackoff, b.cfg.MaxBackoff) }// Ongoing returns true if caller should keep going func (b *Backoff) Ongoing() bool { // Stop if Context has errored or max retry count is exceeded return b.ctx.Err() == nil && (b.cfg.MaxRetries == 0 || b.numRetries < b.cfg.MaxRetries) }// Err returns the reason for terminating the backoff, or nil if it didn't terminate func (b *Backoff) Err() error { if b.ctx.Err() != nil { return b.ctx.Err() } if b.cfg.MaxRetries != 0 && b.numRetries >= b.cfg.MaxRetries { return fmt.Errorf("terminated after %d retries", b.numRetries) } return nil }// NumRetries returns the number of retries so far func (b *Backoff) NumRetries() int { return b.numRetries }// Wait sleeps for the backoff time then increases the retry count and backoff time // Returns immediately if Context is terminated func (b *Backoff) Wait() { // Increase the number of retries and get the next delay sleepTime := b.NextDelay()if b.Ongoing() { select { case <-b.ctx.Done(): case <-time.After(sleepTime): } } }func (b *Backoff) NextDelay() time.Duration { b.numRetries++// Handle the edge case the min and max have the same value // (or due to some misconfig max is < min) if b.nextDelayMin >= b.nextDelayMax { return b.nextDelayMin }// Add a jitter within the next exponential backoff range sleepTime := b.nextDelayMin + time.Duration(rand.Int63n(int64(b.nextDelayMax-b.nextDelayMin)))// Apply the exponential backoff to calculate the next jitter // range, unless we've already reached the max if b.nextDelayMax < b.cfg.MaxBackoff { b.nextDelayMin = doubleDuration(b.nextDelayMin, b.cfg.MaxBackoff) b.nextDelayMax = doubleDuration(b.nextDelayMax, b.cfg.MaxBackoff) }return sleepTime }func doubleDuration(value time.Duration, max time.Duration) time.Duration { value = https://www.it610.com/article/value * 2if value <= max { return value }return max }

Backoff主要提供了Ongoing及Wait方法;Ongoing返回bool用于表示是否可以继续,如果err为nil且b.cfg.MaxRetries或者b.numRetries < b.cfg.MaxRetries返回true;Wait方法会等待执行完成或者是b.NextDelay()时间到;NextDelay方法会递增numRetries然后计算sleepTime;Err方法返回ctx的Err或者是重试次数超限的错误
实例
// NewBackoffRetry gRPC middleware. func NewBackoffRetry(cfg util.BackoffConfig) grpc.UnaryClientInterceptor { return func(ctx context.Context, method string, req, reply interface{}, cc *grpc.ClientConn, invoker grpc.UnaryInvoker, opts ...grpc.CallOption) error { backoff := util.NewBackoff(ctx, cfg) for backoff.Ongoing() { err := invoker(ctx, method, req, reply, cc, opts...) if err == nil { return nil }if status.Code(err) != codes.ResourceExhausted { return err }backoff.Wait() } return backoff.Err() } }

NewBackoffRetry展示了如何使用backoff,通过for循环,条件为backoff.Ongoing(),中间执行要重试的操作,最后执行backoff.Wait(),如果没有提前返回最后返回backoff.Err()
小结 cortex提供了Backoff,可以基于MinBackoff、MaxBackoff、MaxRetries来进行重试。
doc
  • cortex

    推荐阅读