RFC 862 (echo) in PHP, Part 2: UDP mysteries

In my previous post, I implemented an echo server over TCP. However, the RFC indicates that echo should work with both TCP and UDP [1].

When I implemented the TCP version, I considered two different sets of network functions in PHP: the stream_socket_* set and the socket_* set.
I thought the former was a higher-level abstraction of the latter.
It turns out that's not the full story.
The doc [2] for socket_create() mentions this:

Type Description
SOCK_STREAM Provides sequenced, reliable, full-duplex, connection-based byte streams. An out-of-band data transmission mechanism may be supported. The TCP protocol is based on this socket type.
SOCK_DGRAM Supports datagrams (connectionless, unreliable messages of a fixed maximum length). The UDP protocol is based on this socket type.

So that's a correct assumption within the scope of TCP connections.
But UDP is connectionless and datagram-based (that's the D in UDP), and the stream_socket_* functions don't seem like the best fit anymore. Below is code that implements an echo server over UDP, using the socket_* PHP functions:

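<?php
$bindAddress = "0.0.0.0";
$bindPort = 7; // the echo port from RFC 862; ports below 1024 usually need root
$socket = socket_create(AF_INET, SOCK_DGRAM, SOL_UDP);
socket_bind($socket, $bindAddress, $bindPort);

while (true) {
    // Blocks until a datagram arrives; fills in the sender's address and port.
    $receivedLength = socket_recvfrom($socket, $data, 4096, 0, $clientAddress, $clientPort);
    if (false === $receivedLength) {
        continue;
    }

    // Send the same bytes back to whoever sent them.
    $outputLength = socket_sendto($socket, $data, $receivedLength, 0, $clientAddress, $clientPort);
    echo "output length: $outputLength" . PHP_EOL;
}
socket_close($socket);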

Running the server:

$ php udp_echo.php

Interacting with it using netcat:

$ nc -uv 0.0.0.0 7
Connection to 0.0.0.0 port 7 [udp/echo] succeeded!
XXXXhello monde
hello monde
^C⏎

Something worth observing here is that netcat won't return.
The reason is that a UDP receiver simply accepts any packet that comes its way, in whatever order and from whatever origin.
There is no sense of "end of file" in UDP itself.
Typically, one would build a client that agrees on some kind of "end of transmission" character with the server, so the client knows when it has received everything and can stop listening. It's like building a protocol on top of UDP, which is what UDP is all about. [3]
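
As a rough illustration of that "end of transmission" idea, here is a minimal sketch that runs an echo exchange entirely on the loopback interface. The EOT byte ("\x04"), the readUntilEot() helper and the ephemeral port are assumptions made for this example; nothing in UDP or RFC 862 mandates them.

```php
<?php
// Hypothetical convention for this sketch: the sender ends its message with an EOT byte.
const EOT = "\x04";

// Read datagrams until one ends with EOT, then return the message without the marker.
function readUntilEot($socket, int $maxDatagram = 2048): string
{
    $message = '';
    while (true) {
        $length = socket_recvfrom($socket, $chunk, $maxDatagram, 0, $fromAddress, $fromPort);
        if ($length === false) {
            break; // receive error: give up with what we have so far
        }
        $message .= $chunk;
        if (substr($message, -1) === EOT) {
            return substr($message, 0, -1);
        }
    }
    return $message;
}

// Echo server and client in one process, on an ephemeral loopback port.
$server = socket_create(AF_INET, SOCK_DGRAM, SOL_UDP);
socket_bind($server, '127.0.0.1', 0);
socket_getsockname($server, $serverAddress, $serverPort);

$client = socket_create(AF_INET, SOCK_DGRAM, SOL_UDP);
foreach (['hello ', 'monde' . EOT] as $part) {
    socket_sendto($client, $part, strlen($part), 0, $serverAddress, $serverPort);
    // Server side: echo each datagram back to whoever sent it.
    $length = socket_recvfrom($server, $data, 2048, 0, $clientAddress, $clientPort);
    socket_sendto($server, $data, $length, 0, $clientAddress, $clientPort);
}

$reply = readUntilEot($client);
echo $reply . PHP_EOL;

socket_close($client);
socket_close($server);
```

The client stops reading as soon as it sees the marker, instead of waiting forever like netcat does. A real client would also need a timeout, since UDP gives no guarantee that the datagram carrying the marker ever arrives.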

With netcat, you can add a timeout so that it exits on its own:

$ nc -uv -w 5 0.0.0.0 7
Connection to 0.0.0.0 port 7 [udp/echo] succeeded!
XXXXhello monde
hello monde

Note: there are multiple versions of netcat, and the timeout parameter may vary. The above is the macOS version.
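
For a client written in PHP rather than netcat, a comparable effect can be had by setting a receive timeout on the socket. This is a minimal sketch, assuming a 1-second timeout and a loopback socket that nobody sends to (both picked only for the demo):

```php
<?php
// Bind a UDP socket to an ephemeral loopback port; nothing will ever send to it.
$socket = socket_create(AF_INET, SOCK_DGRAM, SOL_UDP);
socket_bind($socket, '127.0.0.1', 0);

// Ask the OS to give up on a blocking receive after 1 second.
socket_set_option($socket, SOL_SOCKET, SO_RCVTIMEO, ['sec' => 1, 'usec' => 0]);

$start = microtime(true);
// @ silences the warning PHP emits when the receive times out.
$length = @socket_recvfrom($socket, $data, 4096, 0, $fromAddress, $fromPort);
$elapsed = microtime(true) - $start;

if ($length === false) {
    echo 'no datagram received, giving up after ' . round($elapsed, 1) . 's' . PHP_EOL;
}
socket_close($socket);
```

Without the SO_RCVTIMEO option, the socket_recvfrom call would block indefinitely, which is the same behaviour netcat shows without -w.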

There is something else interesting about the code above. Consider this client usage:

$ lorem -c 5000 | nc -u -w 5 0.0.0.0 7

Note: lorem is a CLI command that generates placeholder text with a user-chosen number of characters (or lines, or paragraphs). Here I feed 5001 characters [4] to the echo server.

The output is the full 5001 characters piped in as input, despite the 4096 length passed to socket_recvfrom.

By tweaking the server code slightly:

$bindAddress = "0.0.0.0";
$bindPort = 7;
$socket = socket_create(AF_INET, SOCK_DGRAM, SOL_UDP);
socket_bind($socket, $bindAddress, $bindPort);

while (true) {
    // $clientAddress and $clientPort are filled in with the sender's details.
    $receivedLength = socket_recvfrom($socket, $data, 4096, 0, $clientAddress, $clientPort);
    if (false === $receivedLength) {
        continue;
    }

    $outputLength = socket_sendto($socket, $data, $receivedLength, 0, $clientAddress, $clientPort);
    echo "output length: $outputLength" . PHP_EOL;
}
socket_close($socket);

When running the same example, we get this server output:

output length: 1024
output length: 1024
output length: 1024
output length: 1024
output length: 905

In a way, that's consistent with the fact that a UDP receiver just accepts any packet addressed to it, and that the server's infinite loop keeps churning through data. What's surprising is the value 1024. We specified 4096, so why is it capped at 1024? I don't know.

To experiment further, I replaced 4096 with 960, and the server output for the same example became:

output length: 960
output length: 960
output length: 960
output length: 960
output length: 905

That's 4745 characters over 5 iterations.

This was confirmed by counting the number of output characters:

$ lorem -c 5000 | nc -u -w 5 0.0.0.0 7 | wc -c
4745

Updating the code to also print the value of $receivedLength provides no additional insight:

received length: 960
output length: 960
received length: 960
output length: 960
received length: 960
output length: 960
received length: 960
output length: 960
received length: 905
output length: 905

Setting the length to 1024 yields this server output:

received length: 1024
output length: 1024
received length: 1024
output length: 1024
received length: 1024
output length: 1024
received length: 1024
output length: 1024
received length: 905
output length: 905

while client output is:

$ lorem -c 5000 | nc -u -w 5 0.0.0.0 7 | wc -c
5001

To summarize, here are the questions still open so far:

  • Why is socket_recvfrom seemingly capped at 1024 bytes?
  • Why aren't all the characters sent back when I set the length to 960 bytes?

To try to shed some light on what's wrong with my implementation, I wondered how the echo server has historically been implemented on UNIX/Linux systems.

It turns out it is a service provided by the super-server inetd.
The source code is in the inetd.c file [5], which among other things defines two functions: echo_stream for TCP and echo_dg for UDP. Here's the code for the latter:

/*
 * Internet services provided internally by inetd:
 */
#define	BUFSIZE	4096
[...]
void
echo_dg(int s, struct servtab *sep)
{
	char buffer[BUFSIZE];
	int i;
	socklen_t size;
	struct sockaddr_storage ss;

	size = sizeof(ss);
	if ((i = recvfrom(s, buffer, sizeof(buffer), 0,
	    (struct sockaddr *)&ss, &size)) < 0)
		return;
	if (dg_badinput((struct sockaddr *)&ss))
		return;
	(void) sendto(s, buffer, i, 0, (struct sockaddr *)&ss, size);
}

Not so different from our implementation after all!
There is no socket creation, nor infinite loop, but that's likely because inetd is performing those functions.

I noticed the buffer is also set to 4096 like in my original implementation.

Continuing my experimentation, keeping length set to 960:

$ lorem -c 10000 | nc -u -w 5 0.0.0.0 7 | wc -c
9425
$ lorem -c 5000 | nc -u -w 5 0.0.0.0 7 | wc -c
4745
$ lorem -c 1000 | nc -u -w 5 0.0.0.0 7 | wc -c
960

The last command's output seems to indicate that socket_recvfrom was called once and it read 960 bytes of data, then discarded the rest.
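
That reading can be checked in isolation. Below is a minimal sketch, assuming a Unix-like OS (where the part of a datagram that doesn't fit is silently dropped) and an ephemeral loopback port instead of port 7. It sends a 1001-byte datagram followed by a short one, and reads both with a length of 960.

```php
<?php
$receiver = socket_create(AF_INET, SOCK_DGRAM, SOL_UDP);
socket_bind($receiver, '127.0.0.1', 0);
socket_getsockname($receiver, $address, $port);

$sender = socket_create(AF_INET, SOCK_DGRAM, SOL_UDP);

// Two datagrams: one larger than the read length, one smaller.
$first = str_repeat('a', 1001);
$second = 'next';
socket_sendto($sender, $first, strlen($first), 0, $address, $port);
socket_sendto($sender, $second, strlen($second), 0, $address, $port);

// First read: only 960 of the 1001 bytes fit; the remaining 41 are dropped.
$firstLength = socket_recvfrom($receiver, $data, 960, 0, $fromAddress, $fromPort);
echo "first read: $firstLength" . PHP_EOL;

// Second read: starts at the next datagram, not at the leftover bytes.
$secondLength = socket_recvfrom($receiver, $data, 960, 0, $fromAddress, $fromPort);
echo "second read: $secondLength ($data)" . PHP_EOL;

socket_close($sender);
socket_close($receiver);
```

The second read returning the short datagram, rather than the 41 leftover bytes, is the key difference from a TCP byte stream, where unread bytes stay queued for the next read.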

So why, in the first two commands of that example, does the server evidently call socket_recvfrom multiple times?

So I ran one more experiment:

$ lorem -c 1920 | nc -u -w 5 0.0.0.0 7 | wc -c
1857

This time I sent a multiple (2x) of the length (960).
More characters were returned. Not all of them, but the server clearly called the PHP function twice.

Server output was:

received length: 960
output length: 960
received length: 897
output length: 897

When sending 2880 (960*3) characters, the server calls the PHP function 3 times:

received length: 960
output length: 960
received length: 960
output length: 960
received length: 833
output length: 833

I know UDP is considered not reliable (we use TCP if we want reliability), but still...

If I set the length to 1024 and send the same 2880 characters, I get all the characters back and the server output is:

received length: 1024
output length: 1024
received length: 1024
output length: 1024
received length: 833
output length: 833

Ah! Surely that 833 coming up again cannot be a coincidence!
Which gave me an idea for the next experiment.

First, I set the length to 960, but sent a multiple of 1024, let's say 3072.
I only got 2881 characters back, and the server output was:

received length: 960
output length: 960
received length: 960
output length: 960
received length: 960
output length: 960
received length: 1
output length: 1

Then, I set the length to 1024 and sent 3072 characters.
I got all the characters back, and the server output was:

received length: 1024
output length: 1024
received length: 1024
output length: 1024
received length: 1024
output length: 1024
received length: 1
output length: 1

Yes, I'm starting to see what's happening here.
netcat (or is it actually the OS's network layer?) writes our data into datagrams of at most 1024 bytes and sends them as-is to our server.
Our server receives each datagram, but when the length to read is smaller than 1024, it reads 960 characters, discards the rest, and moves on to the next datagram; rinse and repeat. That's also why, with the length set to 4096, it still reads only 1024 characters at a time: that's the maximum size of the datagrams it receives.
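
This explanation can be turned into a small arithmetic model and checked against the numbers above. It is only a sketch: the echoedBytes() function is a name made up here, and the 1024-byte datagram size is taken from the observations rather than from any documentation.

```php
<?php
// Model: the sender splits its input into datagrams of at most $datagramSize bytes;
// the server keeps at most $readLength bytes of each datagram and drops the rest.
function echoedBytes(int $totalBytes, int $readLength, int $datagramSize = 1024): int
{
    $echoed = 0;
    for ($remaining = $totalBytes; $remaining > 0; $remaining -= $datagramSize) {
        $datagram = min($remaining, $datagramSize);
        $echoed += min($datagram, $readLength);
    }
    return $echoed;
}

// "lorem -c N" emits N + 1 bytes (see [4]), so that is what reaches the server.
foreach ([1000, 1920, 2880, 3072, 5000, 10000] as $characters) {
    printf("%5d -> %d\n", $characters, echoedBytes($characters + 1, 960));
}
```

With a read length of 960, the model gives 960, 1857, 2753, 2881, 4745 and 9425 bytes, which matches the figures observed in the experiments. With a read length of 1024 or more, nothing is dropped and the full input comes back.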

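A related limit is the Maximum Transmission Unit (MTU), which is 1500 bytes by default on macOS [6]. That includes the payload as well as the IP and UDP headers, so in practice there's less room for the payload. [7] Since 1024 is well under that, the MTU alone doesn't explain the 1024-byte datagrams; they more likely reflect the size of the chunks netcat reads from its input and sends, one datagram per chunk.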
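
To put a number on how much room is left for the payload, here is the arithmetic as a small sketch, assuming IPv4 with no header options (a 20-byte IP header) and the fixed 8-byte UDP header. The constant names are made up for the example.

```php
<?php
const MTU = 1500;        // as reported by networksetup, see [6]
const IPV4_HEADER = 20;  // minimum IPv4 header, without options (assumption)
const UDP_HEADER = 8;    // fixed UDP header size

// Largest UDP payload that fits in a single unfragmented IPv4 packet.
$maxPayload = MTU - IPV4_HEADER - UDP_HEADER;
echo "max UDP payload without fragmentation: $maxPayload bytes" . PHP_EOL;
```

That works out to 1472 bytes under these assumptions.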

In the next installment, we will hopefully put together an echo server that works with both TCP and UDP.

[1] https://www.rfc-editor.org/rfc/rfc862
[2] https://www.php.net/manual/en/function.socket-create.php
[3] https://unix.stackexchange.com/a/482697/302549
[4]

$ lorem -c 5000 | wc -c
5001

[5] https://sources.debian.org/src/openbsd-inetd/0.20160825-4/inetd.c/
[6]

$ networksetup -getMTU en0
Active MTU: 1500 (Current Setting: 1500)

[7] https://stackoverflow.com/a/42610200/6518111