[Network Programming] TCP connect() keeps returning EINVAL after a few SYNs
- 2025-08-25 05:51:01

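I recently ran into a network problem: a client thread calling connect() would send a few SYNs and then stop, and every subsequent connect() returned EINVAL.
strace showed that the first argument to connect, the socket fd, never changed, and that the address and port were correct. The third argument, len, came from sizeof, so it could not be wrong either.
Fortunately the problem was easy to reproduce.
Adding debug prints step by step showed that the EINVAL is returned from __inet_stream_connect: elixir.bootlin /linux/v5.15.178/source/net/ipv4/af_inet.c#L649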
    switch (sock->state) {
    default:
        err = -EINVAL; /* every later connect() syscall returns -22 here, and no SYN is sent */
        goto out;
    case SS_CONNECTED:
        err = -EISCONN;
        goto out;
    case SS_CONNECTING:
        if (inet_sk(sk)->defer_connect)
            err = is_sendmsg ? -EINPROGRESS : -EISCONN;
        else
            err = -EALREADY;
        /* Fall out of switch with err, set for this state */
        break;
    case SS_UNCONNECTED:
        err = -EISCONN;
        if (sk->sk_state != TCP_CLOSE)
            goto out;

        if (BPF_CGROUP_PRE_CONNECT_ENABLED(sk)) {
            err = sk->sk_prot->pre_connect(sk, uaddr, addr_len);
            if (err)
                goto out;
        }
        ... ...
        err = sk->sk_prot->connect(sk, uaddr, addr_len);
        if (err < 0)
            goto out;

        sock->state = SS_CONNECTING;

    /* Connection was closed by RST, timeout, ICMP error
     * or another process disconnected us.
     */
    if (sk->sk_state == TCP_CLOSE)
        goto sock_error;

    /* sk->sk_err may be not zero now, if RECVERR was ordered by user
     * and error was received after socket entered established state.
     * Hence, it is handled normally after connect() return successfully.
     */
    sock->state = SS_CONNECTED;
    err = 0;
    out:
        return err;

    sock_error:
        err = sock_error(sk) ? : -ECONNABORTED;
        sock->state = SS_UNCONNECTED;
        if (sk->sk_prot->disconnect(sk, flags))
            sock->state = SS_DISCONNECTING; /* this is the key: after the last SYN times out, disconnect fails and the sock state is set to SS_DISCONNECTING */
        goto out;
    }

The next question is why sk->sk_prot->disconnect fails. More prints showed that it returns -EBUSY, from here: elixir.bootlin /linux/v5.15.178/source/net/ipv4/tcp.c#L2989
    int tcp_disconnect(struct sock *sk, int flags)
    {
        ... ...
        /* Deny disconnect if other threads are blocked in sk_wait_event()
         * or inet_wait_for_connect().
         */
        if (sk->sk_wait_pending)
            return -EBUSY; /* the error is returned here */

So sk_wait_pending is nonzero. This is where sk_wait_pending is modified: elixir.bootlin /linux/v5.15.178/source/include/net/sock.h#L1128
    #define sk_wait_event(__sk, __timeo, __condition, __wait)      \
        ({  int __rc;                                              \
            __sk->sk_wait_pending++;                               \
            release_sock(__sk);                                    \
            __rc = __condition;                                    \
            if (!__rc) {                                           \
                *(__timeo) = wait_woken(__wait,                    \
                                        TASK_INTERRUPTIBLE,        \
                                        *(__timeo));               \
            }                                                      \
            sched_annotate_sleep();                                \
            lock_sock(__sk);                                       \
            __sk->sk_wait_pending--;                               \
            __rc = __condition;                                    \
            __rc;                                                  \
        })

sk_wait_event, in turn, is used here: elixir.bootlin /linux/v5.15.178/source/net/core/stream.c#L75
    /**
     * sk_stream_wait_connect - Wait for a socket to get into the connected state
     * @sk: sock to wait on
     * @timeo_p: for how long to wait
     *
     * Must be called with the socket locked.
     */
    int sk_stream_wait_connect(struct sock *sk, long *timeo_p)
    {
        DEFINE_WAIT_FUNC(wait, woken_wake_function);
        struct task_struct *tsk = current;
        int done;

        do {
            int err = sock_error(sk);
            if (err)
                return err;
            if ((1 << sk->sk_state) & ~(TCPF_SYN_SENT | TCPF_SYN_RECV))
                return -EPIPE;
            if (!*timeo_p)
                return -EAGAIN;
            if (signal_pending(tsk))
                return sock_intr_errno(*timeo_p);

            add_wait_queue(sk_sleep(sk), &wait);
            sk->sk_write_pending++;
            done = sk_wait_event(sk, timeo_p,
                                 !READ_ONCE(sk->sk_err) &&
                                 !((1 << READ_ONCE(sk->sk_state)) &
                                   ~(TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)), &wait);
            remove_wait_queue(sk_sleep(sk), &wait);
            sk->sk_write_pending--;
        } while (!done);
        return 0;
    }
    EXPORT_SYMBOL(sk_stream_wait_connect);

sk_stream_wait_connect is called on the TCP send path. The prints showed that the connect thread and the send thread were operating on the same socket fd at the same time.
The root cause: after sending a few SYNs, the connect thread's attempt fails with a timeout and the kernel runs disconnect. If the send thread happens to be inside the wait-for-connect path at that moment, sk_wait_pending is nonzero, so disconnect fails with -EBUSY and the sock state is set to SS_DISCONNECTING. From then on every connect() syscall hits the default branch and returns EINVAL immediately, without sending any SYN.
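The chain of events above can be modelled in user space. The following is only a toy sketch, not kernel code: every `toy_*` name is hypothetical, the struct keeps just the two fields that matter here, and a "connect" is assumed to always time out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy userspace model, NOT kernel code. All toy_* names are hypothetical. */
enum toy_state {
    TOY_SS_UNCONNECTED,
    TOY_SS_CONNECTING,
    TOY_SS_CONNECTED,
    TOY_SS_DISCONNECTING
};

struct toy_sock {
    enum toy_state state; /* stands in for sock->state */
    int wait_pending;     /* stands in for sk->sk_wait_pending */
};

/* Models the check in tcp_disconnect(): refuse while a waiter exists. */
static int toy_disconnect(const struct toy_sock *s)
{
    return s->wait_pending ? -EBUSY : 0;
}

/* Models one connect() call whose SYNs all time out. */
static int toy_connect_that_times_out(struct toy_sock *s)
{
    assert(s != NULL);

    switch (s->state) {
    case TOY_SS_CONNECTED:
        return -EISCONN;
    case TOY_SS_CONNECTING:
        return -EALREADY;
    case TOY_SS_UNCONNECTED:
        break;
    default:
        /* SS_DISCONNECTING lands here: no SYN is ever sent again. */
        return -EINVAL;
    }

    /* SYNs would go out here; they all time out (the sock_error path). */
    s->state = TOY_SS_UNCONNECTED;
    if (toy_disconnect(s))
        s->state = TOY_SS_DISCONNECTING;
    return -ETIMEDOUT;
}
```

With `wait_pending` at 0, repeated calls keep returning `-ETIMEDOUT` and the state stays `TOY_SS_UNCONNECTED`. If one call times out while `wait_pending` is 1, the state becomes `TOY_SS_DISCONNECTING`, and every later call returns `-EINVAL`, even after the waiter is gone.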
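The fix is to pass MSG_DONTWAIT in the flags argument of send, so that the send thread never enters the wait-for-connect path and instead returns an error immediately if the socket is not connected. With that change, every connect call from the connect thread triggers SYNs again.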
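A minimal sketch of such a send call is shown below, assuming Linux. The helper names `send_nowait` and `send_failed_before_connect` are hypothetical. MSG_NOSIGNAL is not part of the original fix; it is added here as an assumption, so that an EPIPE on a socket that is not connected does not also raise SIGPIPE.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: send without sleeping in the kernel.
 * MSG_DONTWAIT makes the send timeout zero, so sk_stream_wait_connect()
 * returns an error before it reaches sk_wait_event().
 * MSG_NOSIGNAL is an extra assumption (not in the original fix) that
 * suppresses SIGPIPE when the error is EPIPE.
 */
static ssize_t send_nowait(int fd, const void *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    return send(fd, buf, len, MSG_DONTWAIT | MSG_NOSIGNAL);
}

/*
 * Returns nonzero if errno value err indicates that the socket was not
 * (yet) connected when send_nowait() failed. The caller can then leave
 * reconnecting to the connect thread and retry the send later.
 */
static int send_failed_before_connect(int err)
{
    return err == EAGAIN || err == EWOULDBLOCK ||
           err == EPIPE || err == ENOTCONN;
}
```

In the send thread, a return value of -1 from `send_nowait` with `send_failed_before_connect(errno)` true means the data was not sent and should be retried once the connect thread has established the connection.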