Go net timeout handling


`Preface`

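This article walks through the stages of an HTTP exchange in the net/http package, shows how to use timeouts to bound reads and writes at each stage, and lists the timeout types that servers and clients support. In essence, each of these timeouts is a time limit that the code places on a particular operation: the time allowed for reading the request headers, or for reading the response body, is simply the timeout of the function that does that work.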

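The author strongly advises against using, in production, the default convenience functions layered on top of net/http, which set no explicit timeout. They make it easy to end up in unhappy situations such as running out of socket file descriptors. For example: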

// Hopefully code like this never shows up in production
func main() {
	http.ListenAndServe("127.0.0.1:3900", nil)
}

`Main text`

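When developing an HTTP server or client in Go, specifying timeouts is common but also easy to get wrong. Timeout mistakes usually go unnoticed for a long time; they may only surface, painfully, once requests start timing out or the service hangs.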

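HTTP is a complex multi-stage protocol, so no single timeout value fits every scenario. Think of a streaming endpoint versus a JSON API versus a Comet endpoint: in many cases the defaults are simply not the values you need.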

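In this post I break an HTTP request down into its stages and list the timeouts that may need to be set. I then look, from both the client side and the server side, at the different ways of setting them.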

`SetDeadline`

First, you need to know about the primitive that Go exposes for implementing timeouts: Deadlines.

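Timeouts themselves are controlled through the Set[Read|Write]Deadline(time.Time) methods of net.Conn. A Deadline is an absolute point in time; any I/O operation on the connection that has not completed by then fails with a timeout error.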

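Deadlines are not timeouts. Once set on a connection, a Deadline stays in force until you call SetDeadline again because, as noted above, it is an absolute point in time. So to build a timeout out of SetDeadline, you have to call it again before every Read/Write operation.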
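The pattern above can be sketched with a small helper that pushes the read deadline forward before every Read. This is only an illustration: readIdle, isTimeout and deadlineDemo are hypothetical names, and an in-memory net.Pipe stands in for a real network connection.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"time"
)

// readIdle performs one Read on conn, first moving the read deadline
// to "now + idle" so that the limit behaves like an idle timeout.
func readIdle(conn net.Conn, buf []byte, idle time.Duration) (int, error) {
	if err := conn.SetReadDeadline(time.Now().Add(idle)); err != nil {
		return 0, err
	}
	return conn.Read(buf)
}

// isTimeout reports whether err is a network timeout error.
func isTimeout(err error) bool {
	var ne net.Error
	return errors.As(err, &ne) && ne.Timeout()
}

// deadlineDemo reads from a peer that sends one message and then goes
// silent. It returns the size of the first read and whether the second
// read failed with a timeout.
func deadlineDemo() (int, bool) {
	client, server := net.Pipe() // in-memory stand-in for a TCP connection
	defer client.Close()
	defer server.Close()

	go server.Write([]byte("hi")) // one message, then silence

	buf := make([]byte, 16)
	n, _ := readIdle(client, buf, 200*time.Millisecond)
	_, err := readIdle(client, buf, 50*time.Millisecond)
	return n, isTimeout(err)
}

func main() {
	n, timedOut := deadlineDemo()
	fmt.Println("first read bytes:", n, "second read timed out:", timedOut)
}
```

Without the SetReadDeadline call inside readIdle, a single deadline set once at the start would apply to the connection as a whole rather than to each read.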

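You probably don't want to call SetDeadline yourself, and would rather use the higher-level timeouts that net/http provides. But keep in mind that all of those timeouts are implemented in terms of Deadlines, so they do not reset every time data is sent or received.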

`Server Timeouts`

For more on server timeouts, the post "So you want to expose Go on the Internet" also has a lot of information, in particular about HTTP/2 and Go 1.7 bugs.

HTTP server phases

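For a server, enforcing timeouts is critical. Otherwise very slow or vanished clients can leak file descriptors, and the server eventually fails with an error like: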

http: Accept error: accept tcp [::]:80: accept4: too many open files; retrying in 5ms

When you create an http.Server, you can set timeouts through ReadTimeout and WriteTimeout. You have to set them explicitly:

srv := &http.Server{
    ReadTimeout: 5 * time.Second,
    WriteTimeout: 10 * time.Second,
}
log.Println(srv.ListenAndServe())

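ReadTimeout covers the time from when the connection is accepted to when the request body has been fully read (if the body is read at all; otherwise to the end of the headers). net/http implements it by calling SetReadDeadline right after Accept.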

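WriteTimeout normally covers the time from the end of the request header read to the end of the response write (in other words, the lifetime of ServeHTTP). Internally it is implemented by calling SetWriteDeadline at the end of readRequest.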

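For HTTPS, however, SetWriteDeadline is called right after Accept, so the TLS handshake also counts against WriteTimeout. This means that, for HTTPS only, WriteTimeout ends up including the time spent on the handshake and on reading the headers.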

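When dealing with untrusted clients or networks, you should set both timeouts, so that a client cannot hold on to a connection by being slow to write or read.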
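To see ReadTimeout at work, the sketch below starts a local test server with a short ReadTimeout and plays the part of a slow client that never finishes sending its headers. The function name slowClientDropped and the timeout values are made up for illustration; httptest is used only so that the example runs without a real listener address.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
	"time"
)

// slowClientDropped sends an incomplete request to a server with a short
// ReadTimeout and reports whether the server closed the connection before
// the client's own, much longer, read deadline expired.
func slowClientDropped() bool {
	ts := httptest.NewUnstartedServer(http.HandlerFunc(
		func(w http.ResponseWriter, r *http.Request) {
			fmt.Fprintln(w, "ok")
		}))
	ts.Config.ReadTimeout = 100 * time.Millisecond
	ts.Config.WriteTimeout = 1 * time.Second
	ts.Start()
	defer ts.Close()

	conn, err := net.Dial("tcp", ts.Listener.Addr().String())
	if err != nil {
		return false
	}
	defer conn.Close()

	// The header block is never terminated, so the server keeps waiting.
	fmt.Fprint(conn, "GET / HTTP/1.1\r\nHost: example\r\n")

	// Wait for the server to react. A timeout here would mean the server
	// kept the connection open for the full 3 seconds.
	conn.SetReadDeadline(time.Now().Add(3 * time.Second))
	_, err = conn.Read(make([]byte, 1))
	if err == nil {
		return false // the server answered instead of dropping us
	}
	ne, ok := err.(net.Error)
	return !(ok && ne.Timeout())
}

func main() {
	fmt.Println("slow client dropped:", slowClientDropped())
}
```

With ReadTimeout left at zero, the same client would hold its connection, and a file descriptor on the server, until it chose to go away.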

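Finally, there is http.TimeoutHandler. It is not a Server field but a wrapper around an http.Handler that limits how long the Handler may take to process a request. It works by buffering the response and, if the deadline is exceeded, replying with a 503 Service Unavailable error instead. It was broken in Go 1.6 and fixed in 1.6.2.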
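A minimal sketch of http.TimeoutHandler follows, using httptest.NewRecorder so that no real server is needed. The handler, the 50ms limit and the function name timeoutHandlerStatus are illustrative assumptions.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// timeoutHandlerStatus wraps a deliberately slow handler in
// http.TimeoutHandler and returns the status code the caller receives.
func timeoutHandlerStatus() int {
	slow := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case <-time.After(2 * time.Second): // pretend to do slow work
			fmt.Fprintln(w, "done")
		case <-r.Context().Done(): // stop early once the time limit hits
		}
	})

	// Allow the handler at most 50ms; the message becomes the error body.
	h := http.TimeoutHandler(slow, 50*time.Millisecond, "handler timed out")

	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest("GET", "/", nil))
	return rec.Code
}

func main() {
	fmt.Println("status:", timeoutHandlerStatus())
}
```

Because the response is buffered, nothing the slow handler writes after the limit reaches the client; the wrapped handler should still watch r.Context() so that it stops working once its answer can no longer be delivered.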

`http.ListenAndServe is doing it wrong`

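Incidentally, this also means that the package-level convenience functions that wrap an http.Server internally, such as http.ListenAndServe, http.ListenAndServeTLS and http.Serve, are unsuitable, especially for servers exposed directly to the public Internet.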

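These functions leave the timeouts at their default "off" value and offer no way of configuring them. If you use them, you risk leaking connections and running out of file descriptors. I have made this mistake several times myself.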

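Instead, create an http.Server with ReadTimeout and WriteTimeout set explicitly, as in the srv example above, and use its corresponding methods (ListenAndServe, ListenAndServeTLS, Serve).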

`About streaming`

Very annoyingly, there is no way of accessing the underlying net.Conn from ServeHTTP, so a server that intends to stream a response is forced to unset the WriteTimeout (which is also possibly why they are 0 by default). This is because without net.Conn access, there is no way of calling SetWriteDeadline before each Write to implement a proper idle (not absolute) timeout.

Also, there’s no way to cancel a blocked ResponseWriter.Write since ResponseWriter.Close (which you can access via an interface upgrade) is not documented to unblock a concurrent Write. So there’s no way to build a timeout manually with a Timer, either.

Sadly, this means that streaming servers can’t really defend themselves from a slow-reading client.

I submitted an issue with some proposals, and I welcome feedback there.
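Go releases from 1.20 onward add http.ResponseController, which lets a handler adjust the write deadline of its own connection. Assuming such a toolchain, the idle write timeout described above can be sketched as follows; streamHandler, streamDemo and the durations are illustrative only, and the test server's WriteTimeout is deliberately shorter than the whole stream.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// streamHandler sends three chunks, pushing the write deadline forward
// before each one so that the limit acts as an idle timeout.
func streamHandler(w http.ResponseWriter, r *http.Request) {
	rc := http.NewResponseController(w) // requires Go 1.20 or later
	for i := 0; i < 3; i++ {
		if err := rc.SetWriteDeadline(time.Now().Add(2 * time.Second)); err != nil {
			return
		}
		if _, err := fmt.Fprintf(w, "chunk %d\n", i); err != nil {
			return
		}
		if err := rc.Flush(); err != nil {
			return
		}
		time.Sleep(150 * time.Millisecond)
	}
}

// streamDemo serves streamHandler from a server whose WriteTimeout (200ms)
// is shorter than the stream (about 450ms) and returns the body received.
func streamDemo() string {
	ts := httptest.NewUnstartedServer(http.HandlerFunc(streamHandler))
	ts.Config.WriteTimeout = 200 * time.Millisecond
	ts.Start()
	defer ts.Close()

	resp, err := http.Get(ts.URL)
	if err != nil {
		return ""
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	return string(body)
}

func main() {
	fmt.Print(streamDemo())
}
```

On toolchains older than 1.20 this type does not exist, and the limitation described in this section still applies.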

`Client Timeouts`

HTTP Client phases

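Client-side timeouts can be very simple or quite complex, depending entirely on how you use them. They are just as important as on the server, though, for preventing leaked resources and connections that stay tied up for a long time.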

The simplest way to set a timeout is the Timeout field of http.Client. It covers the entire exchange, from Dial (if a connection is not reused) to reading the response body.

c := &http.Client{
    Timeout: 15 * time.Second,
}
resp, err := c.Get("https://blog.filippo.io/")

As with the server-side functions listed above, the client side has package-level convenience functions such as http.Get. Internally, they use a Client with no timeout set.
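To illustrate that Client.Timeout also covers reading the body, the sketch below talks to a local test server that sends its headers right away and then stalls. clientTimeoutDemo and the durations are hypothetical choices for the example.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net"
	"net/http"
	"net/http/httptest"
	"time"
)

// clientTimeoutDemo reports whether the response headers arrived and
// whether reading the stalled body then failed with a timeout.
func clientTimeoutDemo() (gotHeaders, bodyTimedOut bool) {
	ts := httptest.NewServer(http.HandlerFunc(
		func(w http.ResponseWriter, r *http.Request) {
			fmt.Fprintln(w, "first line")
			w.(http.Flusher).Flush() // headers and first line go out now
			select {
			case <-time.After(1 * time.Second): // then the body stalls
			case <-r.Context().Done():
			}
		}))
	defer ts.Close()

	c := &http.Client{Timeout: 200 * time.Millisecond}
	resp, err := c.Get(ts.URL)
	if err != nil {
		return false, false
	}
	defer resp.Body.Close()

	// Get has already returned, but the same 200ms budget is still running.
	_, err = io.ReadAll(resp.Body)
	var ne net.Error
	return true, errors.As(err, &ne) && ne.Timeout()
}

func main() {
	headers, timedOut := clientTimeoutDemo()
	fmt.Println("headers received:", headers, "body read timed out:", timedOut)
}
```

The same request made with http.Get would wait for the full second, since the default client has no timeout.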

For finer-grained control, there are several more specific timeouts you can set:

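net.Dialer.Timeout limits the time spent establishing a TCP connection (if a new one is needed), including name resolution.
http.Transport.TLSHandshakeTimeout limits the time spent on the TLS handshake.
http.Transport.ResponseHeaderTimeout limits the time spent reading the response headers (it does not include reading the response body).
http.Transport.ExpectContinueTimeout limits the time the client waits between sending request headers that include Expect: 100-continue and receiving the go-ahead to send the body. Note: in Go 1.6, setting this field disables HTTP/2 (DefaultTransport is special-cased from 1.6.2 on).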

c := &http.Client{
    Transport: &http.Transport{
        Dial: (&net.Dialer{
            Timeout:   30 * time.Second,
            KeepAlive: 30 * time.Second,
        }).Dial,
        TLSHandshakeTimeout:   10 * time.Second,
        ResponseHeaderTimeout: 10 * time.Second,
        ExpectContinueTimeout: 1 * time.Second,
    },
}
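As a runnable variation on the configuration above, the following sketch exercises ResponseHeaderTimeout against a local test server. It uses DialContext, which newer Go versions provide as the replacement for the deprecated Dial field; newGranularClient, headerTimeoutDemo, the /fast and /slow paths and the shortened durations are assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
	"time"
)

// newGranularClient returns a client with per-phase timeouts. The header
// timeout is kept very short so that the demo finishes quickly.
func newGranularClient() *http.Client {
	return &http.Client{
		Transport: &http.Transport{
			DialContext: (&net.Dialer{
				Timeout:   5 * time.Second,
				KeepAlive: 30 * time.Second,
			}).DialContext,
			TLSHandshakeTimeout:   5 * time.Second,
			ResponseHeaderTimeout: 100 * time.Millisecond,
			ExpectContinueTimeout: 1 * time.Second,
		},
	}
}

// headerTimeoutDemo requests a fast and a slow endpoint and reports whether
// the fast one succeeded and the slow one failed with a timeout.
func headerTimeoutDemo() (fastOK, slowTimedOut bool) {
	ts := httptest.NewServer(http.HandlerFunc(
		func(w http.ResponseWriter, r *http.Request) {
			if r.URL.Path == "/slow" {
				select {
				case <-time.After(500 * time.Millisecond): // delay the headers
				case <-r.Context().Done():
					return
				}
			}
			fmt.Fprintln(w, "ok")
		}))
	defer ts.Close()

	c := newGranularClient()

	resp, err := c.Get(ts.URL + "/fast")
	if err == nil {
		fastOK = resp.StatusCode == http.StatusOK
		resp.Body.Close()
	}

	resp, err = c.Get(ts.URL + "/slow")
	if err == nil {
		resp.Body.Close()
	}
	var ne net.Error
	slowTimedOut = errors.As(err, &ne) && ne.Timeout()
	return fastOK, slowTimedOut
}

func main() {
	fastOK, slowTimedOut := headerTimeoutDemo()
	fmt.Println("fast ok:", fastOK, "slow timed out:", slowTimedOut)
}
```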

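As far as I can tell, there is no way to limit the time spent sending the request specifically. The time spent reading the response body can be controlled manually with a time.Timer, because that phase happens after the Client method has returned (how to cancel a request is covered below).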

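Finally, Go 1.7 added http.Transport.IdleConnTimeout. It does not control a blocking phase of a client request; it limits how long an idle connection is kept in the connection pool.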

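Note: a Client follows redirects (302 and so on) by default. http.Client.Timeout includes all the time spent following redirects, while the granular timeouts apply to each individual request, because http.Transport is a lower-level object with no concept of redirects.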

`Cancel and Context`

net/http offers two ways to cancel a client request: Request.Cancel and, introduced in 1.7, Context.

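Request.Cancel is an optional channel. If it is set, closing it terminates the request just as if it had timed out (the two share the same mechanism; while writing this post I also found a bug in 1.7 where every canceled request was returned as a timeout error).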

type Request struct {
	// Cancel is an optional channel whose closure indicates that the client
	// request should be regarded as canceled. Not all implementations of
	// RoundTripper may support Cancel.
	//
	// For server requests, this field is not applicable.
	//
	// Deprecated: Use the Context and WithContext methods
	// instead. If a Request's Cancel field and context are both
	// set, it is undefined whether Cancel is respected.
	Cancel <-chan struct{}
}

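We can combine Request.Cancel with time.Timer for finer-grained control over the timeout, for example by pushing the deadline back every time we successfully read some data from the response body.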

package main

import (
	"io"
	"io/ioutil"
	"log"
	"net/http"
	"time"
)

func main() {
	// Set up a timer that cancels the request after 5s by closing the channel.
	c := make(chan struct{})
	timer := time.AfterFunc(5*time.Second, func() {
		close(c)
	})

	// Serve 256 bytes every second.
	req, err := http.NewRequest("GET", "http://httpbin.org/range/2048?duration=8&chunk_size=256", nil)
	if err != nil {
		log.Fatal(err)
	}
	req.Cancel = c

	// Send the request; this phase must complete within 5s.
	log.Println("Sending request...")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	log.Println("Reading body...")
	for {
		// Give each read of the body a fresh 2s window.
		timer.Reset(2 * time.Second)
		// Try instead: timer.Reset(50 * time.Millisecond)
		_, err = io.CopyN(ioutil.Discard, resp.Body, 256)
		if err == io.EOF {
			break
		} else if err != nil {
			log.Fatal(err)
		}
	}
}

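In the example above, we put a 5-second timeout on the Do phase of the request, and then spend at least 8 seconds reading the response body in 8 rounds, each with its own 2-second timeout. We could keep streaming like this indefinitely without any risk of getting stuck. If no body data arrived for more than 2 seconds, io.CopyN would return net/http: request canceled.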

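In 1.7 the context package was added to the standard library, and there is a lot to learn about it. Next we use it to replace Request.Cancel and achieve the same result.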

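To cancel a request with a context, we obtain a Context and its cancel() function from context.WithCancel, and bind the context to a request with Request.WithContext. When we want to cancel the request, we simply call cancel() instead of closing the Cancel channel.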

// ctx is a child of context.TODO()
ctx, cancel := context.WithCancel(context.TODO())
timer := time.AfterFunc(5*time.Second, func() {
	cancel()
})

req, err := http.NewRequest("GET", "http://httpbin.org/range/2048?duration=8&chunk_size=256", nil)
if err != nil {
	log.Fatal(err)
}
req = req.WithContext(ctx)
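For a version that runs end to end without an external service, here is a hedged sketch of the same idea: a context-based idle timeout while reading the body. The local chunkServer stands in for the httpbin endpoint, and readWithIdleTimeout, contextIdleDemo and the durations are hypothetical. http.NewRequestWithContext (available in newer Go versions) is used in place of NewRequest plus WithContext.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// readWithIdleTimeout fetches url and cancels the request if the response
// (headers or any later chunk of the body) stalls for longer than idle.
func readWithIdleTimeout(url string, idle time.Duration) (int64, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	timer := time.AfterFunc(idle, cancel)
	defer timer.Stop()

	req, err := http.NewRequestWithContext(ctx, "GET", url, nil)
	if err != nil {
		return 0, err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()

	var total int64
	buf := make([]byte, 256)
	for {
		timer.Reset(idle) // fresh window for every read
		n, err := resp.Body.Read(buf)
		total += int64(n)
		if err == io.EOF {
			return total, nil
		}
		if err != nil {
			return total, err
		}
	}
}

// chunkServer sends four 64-byte chunks, pausing after each one.
func chunkServer(pause time.Duration) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(
		func(w http.ResponseWriter, r *http.Request) {
			for i := 0; i < 4; i++ {
				w.Write(make([]byte, 64))
				w.(http.Flusher).Flush()
				select {
				case <-time.After(pause):
				case <-r.Context().Done():
					return
				}
			}
		}))
}

// contextIdleDemo returns the bytes read from a steady stream and whether
// a stalling stream was canceled with an error.
func contextIdleDemo() (int64, bool) {
	steady := chunkServer(20 * time.Millisecond)
	defer steady.Close()
	n, _ := readWithIdleTimeout(steady.URL, 500*time.Millisecond)

	stalling := chunkServer(1 * time.Second)
	defer stalling.Close()
	_, err := readWithIdleTimeout(stalling.URL, 100*time.Millisecond)
	return n, err != nil
}

func main() {
	n, canceled := contextIdleDemo()
	fmt.Println("steady bytes:", n, "stalling stream canceled:", canceled)
}
```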

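Contexts have the advantage that if the parent context (the one passed to context.WithCancel) is canceled, the cancellation propagates all the way down the chain and cancels every child context as well.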
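The propagation can be seen with the context package alone. In this small sketch the three-level chain and the name propagate are only for illustration.

```go
package main

import (
	"context"
	"fmt"
)

// propagate cancels a parent context and reports whether a child and a
// grandchild derived from it ended up canceled as well.
func propagate() (childCanceled, grandchildCanceled bool) {
	parent, cancelParent := context.WithCancel(context.Background())
	child, cancelChild := context.WithCancel(parent)
	defer cancelChild()
	grandchild, cancelGrandchild := context.WithCancel(child)
	defer cancelGrandchild()

	cancelParent()      // cancel only the top of the chain
	<-grandchild.Done() // wait until the bottom of the chain sees it

	return child.Err() == context.Canceled,
		grandchild.Err() == context.Canceled
}

func main() {
	c, g := propagate()
	fmt.Println("child canceled:", c, "grandchild canceled:", g)
}
```

A request bound to the grandchild with Request.WithContext would therefore be canceled by a single call to cancelParent.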

Read the original: The complete guide to Go net/http timeouts
