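Socket Programming HOWTO¶
Author: Gordon McMillan
Abstract
Sockets are used nearly everywhere, but are one of the most severely misunderstood technologies around. This is a high-level overview of sockets. It's not really a tutorial - you'll still have work to do in getting things operational. It doesn't cover the fine points (and there are a lot of them), but I hope it will give you enough background to begin using them decently.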
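Sockets¶
I'm only going to talk about INET (i.e. IPv4) sockets, but they account for at least 99% of the sockets in use. And I'll only talk about STREAM (i.e. TCP) sockets - unless you really know what you're doing (in which case this HOWTO isn't for you!), you'll get better behavior and performance from a STREAM socket than anything else. I will try to clear up the mystery of what a socket is, as well as give some hints on how to work with blocking and non-blocking sockets. But I'll start by talking about blocking sockets. You'll need to know how they work before dealing with non-blocking sockets.
Part of the trouble with understanding these things is that "socket" can mean a number of subtly different things, depending on context. So first, let's make a distinction between a "client" socket - an endpoint of a conversation, and a "server" socket, which is more like a switchboard operator. The client application (your browser, for example) uses "client" sockets exclusively; the web server it's talking to uses both "server" sockets and "client" sockets.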
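History¶
Of the various forms of IPC, sockets are by far the most popular. On any given platform, there are likely to be other forms of IPC that are faster, but for cross-platform communication, sockets are about the only game in town.
They were invented in Berkeley as part of the BSD flavor of Unix. They spread like wildfire with the internet. With good reason - the combination of sockets with INET makes talking to arbitrary machines around the world unbelievably easy (at least compared to other schemes).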
Creating a Socket¶
Roughly speaking, when you clicked on the link that brought you to this page, your browser did something like the following:
import socket

# create an INET, STREAMing socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# now connect to the web server on port 80 - the normal http port
s.connect(("www.python.org", 80))
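When the connect completes, the socket s can be used to send in a request for the text of the page. The same socket will read the reply, and then be destroyed. That's right, destroyed. Client sockets are normally only used for one exchange (or a small set of sequential exchanges).
What happens in the web server is a bit more complex. First, the web server creates a "server socket":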
# create an INET, STREAMing socket
serversocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# bind the socket to a public host, and a well-known port
serversocket.bind((socket.gethostname(), 80))
# become a server socket
serversocket.listen(5)
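A couple of things to notice: we used socket.gethostname() so that the socket would be visible to the outside world. If we had used s.bind(('localhost', 80)) or s.bind(('127.0.0.1', 80)) we would still have a "server" socket, but one that was only visible within the same machine. s.bind(('', 80)) specifies that the socket is reachable by any address the machine happens to have.
A second thing to note: low number ports are usually reserved for "well known" services (HTTP, SNMP etc). If you're playing around, use a nice high number (4 digits).
Finally, the argument to listen tells the socket library that we want it to queue up as many as 5 connect requests (the normal max) before refusing outside connections. If the rest of the code is written properly, that should be plenty.
Now that we have a "server" socket, listening on port 80, we can enter the mainloop of the web server: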
while True:
    # accept connections from outside
    (clientsocket, address) = serversocket.accept()
    # now do something with the clientsocket
    # in this case, we'll pretend this is a threaded server
    ct = client_thread(clientsocket)
    ct.run()
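There are actually 3 general ways in which this loop could work - dispatching a thread to handle clientsocket, creating a new process to handle clientsocket, or restructuring this app to use non-blocking sockets and multiplex between our "server" socket and any active clientsockets using select. More about that later. The important thing to understand now is this: this is all a "server" socket does. It doesn't send any data. It doesn't receive any data. It just produces "client" sockets. Each clientsocket is created in response to some other "client" socket doing a connect() to the host and port we're bound to. As soon as we've created that clientsocket, we go back to listening for more connections. The two "clients" are free to chat it up - they are using some dynamically allocated port which will be recycled when the conversation ends.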
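The thread-per-connection variant of this loop might look roughly like the sketch below. It is only an illustration under stated assumptions: the names handle_client, serve and threaded_demo are hypothetical, the server binds to 127.0.0.1 on an OS-chosen port (port 0) instead of port 80, each client is simply echoed back what it sends, and the accept loop stops after a fixed number of clients so that the example terminates.

```python
import socket
import threading


def handle_client(clientsocket):
    """Echo bytes back until the peer closes (hypothetical handler)."""
    with clientsocket:
        while True:
            data = clientsocket.recv(4096)
            if data == b"":              # 0 bytes: the other side closed
                break
            clientsocket.sendall(data)


def serve(serversocket, max_clients):
    """Accept max_clients connections, one thread per connection.

    A real server would loop forever; the limit is an assumption made
    so that this sketch terminates.
    """
    threads = []
    for _ in range(max_clients):
        clientsocket, address = serversocket.accept()
        t = threading.Thread(target=handle_client, args=(clientsocket,))
        t.start()                        # start() returns at once; the thread does the work
        threads.append(t)
    for t in threads:
        t.join()


def threaded_demo():
    """Start the server on a free local port and talk to it once."""
    serversocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    serversocket.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    serversocket.listen(5)
    port = serversocket.getsockname()[1]

    server = threading.Thread(target=serve, args=(serversocket, 1))
    server.start()

    with socket.create_connection(("127.0.0.1", port)) as s:
        s.sendall(b"hello")
        reply = b""
        while len(reply) < 5:            # keep reading until all 5 bytes are back
            chunk = s.recv(4096)
            if chunk == b"":
                break
            reply += chunk

    server.join()
    serversocket.close()
    return reply
```

Note that the "server" socket never sends or receives application data here; only the sockets returned by accept do.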
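IPC¶
If you need fast IPC between two processes on one machine, you should look into pipes or shared memory. If you do decide to use AF_INET sockets, bind the "server" socket to 'localhost'. On most platforms, this will take a shortcut around a couple of layers of network code and be quite a bit faster.
See also: the multiprocessing module integrates cross-platform IPC into a higher-level API.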
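Using a Socket¶
The first thing to note is that the web browser's "client" socket and the web server's "client" socket are identical beasts. That is, this is a "peer to peer" conversation. Or to put it another way, as the designer, you will have to decide what the rules of etiquette are for a conversation. Normally, the connecting socket starts the conversation by sending in a request, or perhaps a signon. But that's a design decision - it's not a rule of sockets.
Now there are two sets of verbs to use for communication. You can use send and recv, or you can transform your client socket into a file-like beast and use read and write. The latter is the way Java presents its sockets. I'm not going to talk about it here, except to warn you that you need to use flush on sockets. These are buffered "files", and a common mistake is to write something, and then read for a reply. Without a flush in there, you may wait forever for the reply, because the request may still be in your output buffer.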
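As a small illustration of the flush pitfall, the sketch below wraps both ends of a connection in file-like objects with the standard socket.makefile method. The function name request_reply_via_files and the PING/PONG messages are made up for the example, and socket.socketpair stands in for a real client/server connection so that everything runs in one process.

```python
import socket


def request_reply_via_files():
    """Exchange one line each way through buffered file wrappers."""
    # socketpair() returns two already-connected sockets.
    client, server = socket.socketpair()
    with client, server:
        cfile = client.makefile("rwb")   # buffered, file-like view of the socket
        sfile = server.makefile("rwb")

        cfile.write(b"PING\n")
        cfile.flush()                    # without this, PING may sit in the output buffer

        request = sfile.readline()       # the "server" reads the request
        sfile.write(b"PONG\n")
        sfile.flush()

        reply = cfile.readline()
        cfile.close()
        sfile.close()
        return request, reply
```

Removing the first flush call would typically leave readline waiting forever, which is exactly the mistake described above.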
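Now we come to the major stumbling block of sockets - send and recv operate on the network buffers. They do not necessarily handle all the bytes you hand them (or expect from them), because their major focus is handling the network buffers. In general, they return when the associated network buffers have been filled (send) or emptied (recv). They then tell you how many bytes they handled. It is your responsibility to call them again until your message has been completely dealt with.
When a recv returns 0 bytes, it means the other side has closed (or is in the process of closing) the connection. You will not receive any more data on this connection. Ever. You may be able to send data successfully; I'll talk more about this later.
A protocol like HTTP uses a socket for only one transfer. The client sends a request, then reads a reply. That's it. The socket is discarded. This means that a client can detect the end of the reply by receiving 0 bytes.
But if you plan to reuse your socket for further transfers, you need to realize that there is no EOT (end of transmission) on a socket. I repeat: if a socket send or recv returns after handling 0 bytes, the connection has been broken. If the connection has not been broken, you may wait on a recv forever, because the socket will not tell you that there's nothing more to read (for now). Now if you think about that a bit, you'll come to realize a fundamental truth of sockets: messages must either be fixed length (yuck), or be delimited (shrug), or indicate how long they are (much better), or end by shutting down the connection. The choice is entirely yours (but some ways are righter than others).
Assuming you don't want to end the connection, the simplest solution is a fixed length message: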
import socket

MSGLEN = 64  # fixed message length in bytes (value assumed for illustration)

class MySocket:
    """demonstration class only
      - coded for clarity, not efficiency
    """

    def __init__(self, sock=None):
        if sock is None:
            self.sock = socket.socket(
                            socket.AF_INET, socket.SOCK_STREAM)
        else:
            self.sock = sock

    def connect(self, host, port):
        self.sock.connect((host, port))

    def mysend(self, msg):
        # keep calling send until all MSGLEN bytes have been handed over
        totalsent = 0
        while totalsent < MSGLEN:
            sent = self.sock.send(msg[totalsent:])
            if sent == 0:
                raise RuntimeError("socket connection broken")
            totalsent = totalsent + sent

    def myreceive(self):
        # keep calling recv until MSGLEN bytes have arrived
        chunks = []
        bytes_recd = 0
        while bytes_recd < MSGLEN:
            chunk = self.sock.recv(min(MSGLEN - bytes_recd, 2048))
            if chunk == b'':
                raise RuntimeError("socket connection broken")
            chunks.append(chunk)
            bytes_recd = bytes_recd + len(chunk)
        return b''.join(chunks)
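The sending code here is usable for almost any messaging scheme - in Python you send bytes, and you can use len() to determine their length (even if they have embedded \0 characters). It's mostly the receiving code that gets more complex. (And in C, it's not much worse, except you can't use strlen if the message has embedded \0s.)
The easiest enhancement is to make the first character of the message an indicator of message type, and have the type determine the length. Now you have two recvs - the first to get (at least) that first character so you can look up the length, and the second in a loop to get the rest. If you decide to go the delimited route, you'll be receiving in some arbitrary chunk size (4096 or 8192 is frequently a good match for network buffer sizes), and scanning what you've received for a delimiter.
One complication to be aware of: if your conversational protocol allows multiple messages to be sent back to back (without some kind of reply), and you pass recv an arbitrary chunk size, you may end up reading the start of a following message. You'll need to put that aside and hold onto it, until it's needed.
Prefixing the message with its length (say, as 5 numeric characters) gets more complex, because (believe it or not), you may not get all 5 characters in one recv. In playing around, you'll get away with it; but in high network loads, your code will very quickly break unless you use two recv loops - the first to determine the length, the second to get the data part of the message. Nasty. This is also when you'll discover that send does not always manage to get rid of everything in one pass. And despite having read this, you will eventually get bit by it!
In the interests of space, building your character, (and preserving my competitive position), these enhancements are left as an exercise for the reader. Let's move on.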
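One way to "put aside" the start of a following message is to keep a small buffer between calls. The sketch below assumes newline-delimited messages; the class name DelimitedReader and its parameters are hypothetical, not part of the socket module.

```python
import socket


class DelimitedReader:
    """Hypothetical helper that returns one delimited message at a time."""

    def __init__(self, sock, delimiter=b"\n", chunk_size=4096):
        self.sock = sock
        self.delimiter = delimiter
        self.chunk_size = chunk_size
        self.buffer = b""                # bytes received but not yet consumed

    def read_message(self):
        """Return the next message without its delimiter.

        Raises RuntimeError if the connection closes before a full
        message has arrived.
        """
        # Receive arbitrary-sized chunks until a delimiter is in the buffer.
        while self.delimiter not in self.buffer:
            chunk = self.sock.recv(self.chunk_size)
            if chunk == b"":
                raise RuntimeError("socket connection broken")
            self.buffer += chunk
        # Anything after the delimiter belongs to a later message: keep it.
        message, _, self.buffer = self.buffer.partition(self.delimiter)
        return message
```

Because the leftover bytes stay in self.buffer, a second call can return the next message without touching the network at all if it already arrived with the first chunk.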
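For readers who want something to compare their own attempt against, here is one possible sketch of the length-prefixed scheme with two receive loops. The helper names recv_exactly, send_message and recv_message are hypothetical; the 5-character decimal header follows the example in the text, and the sending side relies on the standard sendall method, which repeats send until everything has been handed over.

```python
import socket

HEADER_LEN = 5   # 5 numeric characters, as in the example above


def recv_exactly(sock, n):
    """Loop on recv until exactly n bytes have arrived."""
    chunks = []
    bytes_recd = 0
    while bytes_recd < n:
        chunk = sock.recv(min(n - bytes_recd, 2048))
        if chunk == b"":
            raise RuntimeError("socket connection broken")
        chunks.append(chunk)
        bytes_recd += len(chunk)
    return b"".join(chunks)


def send_message(sock, payload):
    """Send payload preceded by its length as 5 ASCII digits."""
    if len(payload) > 99999:
        raise ValueError("payload too long for a 5-digit length prefix")
    header = ("%05d" % len(payload)).encode("ascii")
    sock.sendall(header + payload)


def recv_message(sock):
    """First loop reads the header, second loop reads the body."""
    header = recv_exactly(sock, HEADER_LEN)
    length = int(header.decode("ascii"))
    return recv_exactly(sock, length)
```

Since each message announces its own length, several messages can be sent back to back and read one at a time without overrunning into the next.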
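Binary Data¶
It is perfectly possible to send binary data over a socket. The major problem is that not all machines use the same formats for binary data. For example, a Motorola chip will represent a 16 bit integer with the value 1 as the two hex bytes 00 01. Intel and DEC, however, are byte-reversed - that same 1 is 01 00. Socket libraries have calls for converting 16 and 32 bit integers - ntohl, htonl, ntohs, htons - where "n" means network and "h" means host, "s" means short and "l" means long. Where network order is host order, these do nothing, but where the machine is byte-reversed, these swap the bytes around appropriately.
In these days of 32 bit machines, the ascii representation of binary data is frequently smaller than the binary representation. That's because a surprising amount of the time, all those longs have the value 0, or maybe 1. The string "0" would be two bytes, while binary is four. Of course, this doesn't fit well with fixed-length messages. Decisions, decisions.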
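In Python, the conversion calls are available as socket.htons, socket.ntohs, socket.htonl and socket.ntohl. The sketch below also uses the standard struct module, which the text above does not mention, as one common way to produce the actual bytes in network order. The function names are hypothetical.

```python
import socket
import struct


def pack_u16(value):
    """Pack a 16 bit unsigned integer in network byte order."""
    # "!" selects network (big-endian) order, "H" an unsigned 16 bit integer.
    return struct.pack("!H", value)


def unpack_u16(data):
    """Inverse of pack_u16."""
    (value,) = struct.unpack("!H", data)
    return value


def roundtrip_htons(value):
    """Convert to network order and back with the socket module's calls."""
    return socket.ntohs(socket.htons(value))


print(pack_u16(1))   # b'\x00\x01' regardless of the host's byte order
```

On a byte-reversed host, socket.htons(1) returns 256 (the same two bytes read in the other order); on a host that already uses network order it returns 1 unchanged.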
Disconnecting¶
Strictly speaking, you're supposed to use shutdown on a socket before you close
it. The shutdown is an advisory to the socket at the other end. Depending on the
argument you pass it, it can mean "I'm not going to send anymore, but I'll still
listen", or "I'm not listening, good riddance!". Most socket libraries, however,
are so used to programmers neglecting to use this piece of etiquette that
normally a close is the same as shutdown(); close(). So in most situations, an
explicit shutdown is not needed.
One way to use shutdown effectively is in an HTTP-like exchange. The client
sends a request and then does a shutdown(1). This tells the server "This client
is done sending, but can still receive." The server can detect "EOF" by a
receive of 0 bytes. It can assume it has the complete request. The server sends
a reply. If the send completes successfully then, indeed, the client was still
receiving.
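A minimal sketch of such an exchange follows. The names read_until_eof, server_side and client_exchange are hypothetical, socket.socketpair stands in for a real connection, and the "server" runs in a thread of the same process. socket.SHUT_WR is the named constant for the value 1 used above.

```python
import socket
import threading


def read_until_eof(sock):
    """Collect data until recv returns 0 bytes."""
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if chunk == b"":
            return b"".join(chunks)
        chunks.append(chunk)


def server_side(conn):
    """Read the whole request, then reply on the same socket."""
    with conn:
        # This returns once the client has shut down its sending side.
        request = read_until_eof(conn)
        conn.sendall(b"reply to " + request)
    # Leaving the with block closes conn, so the client sees EOF too.


def client_exchange(request):
    """Send a request, signal that sending is done, read the reply."""
    client, server = socket.socketpair()
    t = threading.Thread(target=server_side, args=(server,))
    t.start()
    with client:
        client.sendall(request)
        client.shutdown(socket.SHUT_WR)   # done sending, still receiving
        reply = read_until_eof(client)
    t.join()
    return reply
```

Without the shutdown call, the server's read loop would never see 0 bytes and both sides would wait on each other.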
Python takes the automatic shutdown a step further, and says that when a socket
is garbage collected, it will automatically do a close if it's needed. But
relying on this is a very bad habit. If your socket just disappears without
doing a close, the socket at the other end may hang indefinitely, thinking
you're just being slow. Please close your sockets when you're done.
When Sockets Die¶
Probably the worst thing about using blocking sockets is what happens when the
other side comes down hard (without doing a close). Your socket is likely to
hang. TCP is a reliable protocol, and it will wait a long, long time
before giving up on a connection. If you're using threads, the entire thread is
essentially dead. There's not much you can do about it. As long as you aren't
doing something dumb, like holding a lock while doing a blocking read, the
thread isn't really consuming much in the way of resources. Do not try to kill
the thread - part of the reason that threads are more efficient than processes
is that they avoid the overhead associated with the automatic recycling of
resources. In other words, if you do manage to kill the thread, your whole
process is likely to be screwed up.
Non-blocking Sockets¶
If you've understood the preceding, you already know most of what you need to know about the mechanics of using sockets. You'll still use the same calls, in much the same ways. It's just that, if you do it right, your app will be almost inside-out.
In Python, you use socket.setblocking(0) to make it non-blocking. In C, it's
more complex, (for one thing, you'll need to choose between the BSD flavor
O_NONBLOCK and the almost indistinguishable POSIX flavor O_NDELAY, which is
completely different from TCP_NODELAY), but it's the exact same idea. You do
this after creating the socket, but before using it. (Actually, if you're nuts,
you can switch back and forth.)
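As a small illustration of what non-blocking mode changes in Python: a recv on a non-blocking socket with nothing to read raises BlockingIOError instead of waiting. The helper names try_recv and nonblocking_demo are hypothetical, and socket.socketpair is used only to get a connected pair without a network.

```python
import socket


def try_recv(sock):
    """Return received bytes, or None if nothing is available right now."""
    try:
        return sock.recv(4096)
    except BlockingIOError:
        # The call would have blocked; in non-blocking mode it fails fast.
        return None


def nonblocking_demo():
    """Show a recv on an idle non-blocking socket coming back empty-handed."""
    a, b = socket.socketpair()
    with a, b:
        b.setblocking(False)     # same effect as setblocking(0)
        return try_recv(b)       # nothing has been sent yet
```

Here nonblocking_demo returns None, because the peer has not sent anything by the time recv is called.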
The major mechanical difference is that send, recv, connect and accept can
return without having done anything. You have (of course) a number of choices.
You can check return code and error codes and generally drive yourself crazy. If
you don't believe me, try it sometime. Your app will grow large, buggy and suck
CPU. So let's skip the brain-dead solutions and do it right.
Use select.
In C, coding select is fairly complex. In Python, it's a piece of cake, but it's
close enough to the C version that if you understand select in Python, you'll
have little trouble with it in C:
ready_to_read, ready_to_write, in_error = \
    select.select(
        potential_readers,
        potential_writers,
        potential_errs,
        timeout)
You pass select three lists: the first contains all sockets that you might want
to try reading; the second all the sockets you might want to try writing to, and
the last (normally left empty) those that you want to check for errors. You
should note that a socket can go into more than one list. The select call is
blocking, but you can give it a timeout. This is generally a sensible thing to
do - give it a nice long timeout (say a minute) unless you have good reason to
do otherwise.
In return, you will get three lists. They contain the sockets that are actually readable, writable and in error. Each of these lists is a subset (possibly empty) of the corresponding list you passed in.
If a socket is in the output readable list, you can be
as-close-to-certain-as-we-ever-get-in-this-business that a recv on that socket
will return something. Same idea for the writable list. You'll be able to send
something. Maybe not all you want to, but something is better than nothing.
(Actually, any reasonably healthy socket will return as writable - it just means
outbound network buffer space is available.)
If you have a "server" socket, put it in the potential_readers list. If it comes
out in the readable list, your accept will (almost certainly) work. If you have
created a new socket to connect to someone else, put it in the potential_writers
list. If it shows up in the writable list, you have a decent chance that it has
connected.
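Putting these pieces together, a select-driven echo server might be sketched as below. This is an illustration under stated assumptions, not a production design: the names select_echo_server and select_demo are hypothetical, the server listens on 127.0.0.1 with an OS-chosen port, it stops after a fixed number of clients have come and gone so that the example terminates, and replies are written with a plain sendall rather than by tracking the writable list.

```python
import select
import socket
import threading


def select_echo_server(serversocket, max_clients, timeout=60):
    """Echo server that multiplexes all of its sockets with select."""
    serversocket.setblocking(False)
    clients = []
    finished = 0
    while finished < max_clients:
        potential_readers = [serversocket] + clients
        ready_to_read, _, _ = select.select(potential_readers, [], [], timeout)
        for sock in ready_to_read:
            if sock is serversocket:
                # Readable "server" socket: accept will (almost certainly) work.
                clientsocket, address = serversocket.accept()
                clients.append(clientsocket)
            else:
                data = sock.recv(4096)
                if data == b"":          # 0 bytes: this client is done
                    clients.remove(sock)
                    sock.close()
                    finished += 1
                else:
                    sock.sendall(data)   # simplification: may block on a slow peer


def select_demo():
    """Run the server in a thread and send it two short messages."""
    serversocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    serversocket.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    serversocket.listen(5)
    port = serversocket.getsockname()[1]
    server = threading.Thread(target=select_echo_server,
                              args=(serversocket, 2))
    server.start()

    replies = []
    for msg in (b"one", b"two"):
        with socket.create_connection(("127.0.0.1", port)) as s:
            s.sendall(msg)
            reply = b""
            while len(reply) < len(msg):
                chunk = s.recv(4096)
                if chunk == b"":
                    break
                reply += chunk
            replies.append(reply)

    server.join()
    serversocket.close()
    return replies
```

A single thread handles the listening socket and every client here; the only blocking call in the server loop is select itself.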
Actually, select can be handy even with blocking sockets. It's one way of
determining whether you will block - the socket returns as readable when there's
something in the buffers. However, this still doesn't help with the problem of
determining whether the other end is done, or just busy with something else.
Portability alert: On Unix, select works both with the sockets and files. Don't
try this on Windows. On Windows, select works with sockets only. Also note that
in C, many of the more advanced socket options are done differently on Windows.
In fact, on Windows I usually use threads (which work very, very well) with my
sockets.