With TCP sockets, handling message boundaries can be a real problem. TCP is stream-oriented: data travels as a continuous stream of bytes with no inherent message boundaries. Send and receive calls operate on network buffers, so a single recv may return part of a message, exactly one message, or several messages run together. You therefore have to accumulate the data you receive in a buffer until you have read enough bytes to form a complete message that makes sense to your application. It is up to you to define and keep track of where each message ends. As far as the TCP socket is concerned, it is just sending and receiving bytes to and from the network.
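To make the stream behavior concrete, here is a minimal sketch that sends two separate messages over a connected socket pair and reads them back a few bytes at a time. The function name read_in_small_chunks and the chunk size are illustrative choices, and the exact chunks you see can vary by operating system; the point is only that the chunk edges have no relation to the two sends.

```python
import socket


def read_in_small_chunks(chunk_size=4):
    """Send two messages, then read the stream back a few bytes at a time."""
    # socketpair() gives two connected stream sockets in one process
    sender, receiver = socket.socketpair()
    with sender, receiver:
        sender.sendall(b"hello")  # first "message"
        sender.sendall(b"world")  # second "message"
        sender.shutdown(socket.SHUT_WR)  # no more data: lets recv return b""
        chunks = []
        while True:
            chunk = receiver.recv(chunk_size)
            if not chunk:  # peer finished sending
                break
            chunks.append(chunk)
    return chunks


chunks = read_in_small_chunks()
print(chunks)            # chunk edges need not match the two sends
print(b"".join(chunks))  # the bytes themselves arrive intact and in order
```

The receiver sees ten bytes in order, but nothing in the stream says where "hello" stopped and "world" began. That missing information is what the methods below add back.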
Fortunately, there are several methods you can use to handle message boundaries in socket programming. In this article, I will explore some of the most common methods and provide examples for each one.
Method 1: Fixed-length messages
One of the simplest ways to handle message boundaries is to send messages of a fixed length. This means that the sender and receiver agree on the length of each message, and the receiver knows exactly how many bytes to read in order to receive a complete message.
Here is an example of how you can send and receive fixed-length messages:
import socket

HOST = "localhost"
PORT = 12345
MSG_LEN = 10


def serve():
    # Initialize the listening socket
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((HOST, PORT))
    sock.listen()
    print("Server listening on {}:{}".format(HOST, PORT))
    # Accept a connection from outside
    conn, addr = sock.accept()
    print("Connected by", addr)
    # Receive the message data
    chunks = []
    bytes_recd = 0
    # recv may return fewer bytes than requested, so keep calling it
    # until exactly MSG_LEN bytes have arrived.
    while bytes_recd < MSG_LEN:
        chunk = conn.recv(min(MSG_LEN - bytes_recd, 2048))
        if not chunk:
            raise RuntimeError("connection closed before a full message arrived")
        chunks.append(chunk)
        bytes_recd += len(chunk)
    data = b"".join(chunks)
    # Print the message. Truncation may have cut a multi-byte character,
    # so replace undecodable bytes instead of raising.
    message = data.decode("utf-8", errors="replace").strip()
    print("Received message: ", message)
    # Close connection and socket
    conn.close()
    sock.close()


def client():
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect((HOST, PORT))
    message = input("Enter a message: ")
    # Encode first, then truncate or pad the bytes to the fixed length,
    # because non-ASCII characters take more than one byte in UTF-8
    data = message.encode("utf-8")[:MSG_LEN].ljust(MSG_LEN)
    # Send the message data
    sock.sendall(data)
    sock.close()
In this example, the server listens for an incoming connection and receives a message of exactly 10 bytes. If the client's message is shorter than 10 bytes, it is padded with spaces. For this simplified version, if the message is larger than MSG_LEN, we simply truncate it.
This method is easy to understand and implement. Its downside is inefficiency when dealing with lots of small messages, since every message is padded to the full length and the padding travels over the network too. You also still have to deal with data that does not fit into a single message.
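A quick back-of-the-envelope check shows how much of the traffic can be padding. This is only a sketch: the helper name padding_overhead and the sample messages are made up for illustration, and it assumes one message per fixed-length frame, as in the example above.

```python
MSG_LEN = 10  # same fixed frame size as in the example above


def padding_overhead(messages, frame_len=MSG_LEN):
    """Return (payload_bytes, wire_bytes) when each message fills one frame."""
    # Messages longer than the frame are truncated, so cap them at frame_len
    payload = sum(min(len(m.encode("utf-8")), frame_len) for m in messages)
    # Every message costs a full frame on the wire, however short it is
    wire = frame_len * len(messages)
    return payload, wire


payload, wire = padding_overhead(["ok", "ping", "hello"])
print(f"{payload} payload bytes sent as {wire} bytes on the wire")
```

For these three short messages, 11 bytes of actual content cost 30 bytes on the wire, so almost two thirds of the traffic is padding.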
Method 2: Delimiter-Separated Messages
Another common method for handling message boundaries is to use a delimiter to separate messages. This means that the sender and receiver agree on a special character or sequence of characters that marks the end of each message. The delimiter must not appear inside a message body, or it has to be escaped. The main thing to watch for is messages sent back to back: a single read may return the end of one message together with the start of the next. You need to hold onto that extra data until the rest of its message arrives.
Here is an example of how you can send and receive delimiter-separated messages using Python:
import socket

HOST = "localhost"
PORT = 12346
DELIMITER = b"\r\n"
BUFFER_SIZE = 4096


class Buffer:
    """Accumulates received bytes and hands out one line at a time."""

    def __init__(self, sock):
        self.sock = sock
        self.buffer = b""

    def get_line(self):
        # Keep reading until the buffer holds at least one full line
        while DELIMITER not in self.buffer:
            data = self.sock.recv(BUFFER_SIZE)
            if not data:  # socket is closed
                return None
            self.buffer += data
        # Split off the first line; anything after the delimiter stays
        # in the buffer for the next call
        line, sep, self.buffer = self.buffer.partition(DELIMITER)
        return line.decode()


def serve():
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((HOST, PORT))
    sock.listen()
    print(f"Server listening on {HOST}:{PORT}")
    conn, addr = sock.accept()
    print("Connected by", addr)
    buff = Buffer(conn)
    while True:
        line = buff.get_line()
        if line is None:
            break
        print("Received message: ", line)
    conn.close()
    sock.close()


def client():
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect((HOST, PORT))
    # Three lines, deliberately split across four sends
    messages = [b"Line One\r\n",
                b"Line ",
                b"Two\r\nLine",
                b" Three\r\n"]
    # Send the message data
    for message in messages:
        sock.sendall(message)
    sock.close()
In this example, the server listens for an incoming connection and receives messages delimited by \r\n characters. The client makes four send calls that together carry three lines, with the chunk boundaries deliberately not aligned with the line boundaries. The server splits the incoming data into separate messages based on the delimiter.
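The buffering logic can also be tried without a network connection. The sketch below feeds the same four chunks into a plain byte buffer; split_messages is a hypothetical helper written for this illustration, not part of the socket module.

```python
DELIMITER = b"\r\n"


def split_messages(buffer, delimiter=DELIMITER):
    """Split a buffer into (complete_messages, leftover_bytes)."""
    # The last piece is whatever follows the final delimiter: either an
    # empty byte string or the start of a message that is still arriving
    *complete, remainder = buffer.split(delimiter)
    return complete, remainder


buffer = b""
received = []
for chunk in [b"Line One\r\n", b"Line ", b"Two\r\nLine", b" Three\r\n"]:
    buffer += chunk  # append newly arrived bytes
    complete, buffer = split_messages(buffer)  # keep the leftover for later
    received.extend(complete)

print(received)
```

After the third chunk the buffer still holds b"Line", which is exactly the partial message that has to be kept until its remaining bytes arrive.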
Method 3: Message Length Header
A more advanced method for handling message boundaries is to prefix each message with a length header: a fixed-size field that states how many bytes of message data follow. The sender and receiver agree on the size and byte order of that field. The sender first sends the length and then the actual message. The receiver reads the header, reads the corresponding number of bytes from the socket, and then processes the message.
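Before bringing sockets into it, it can help to look at the wire format on its own. The following sketch builds and parses a frame in memory, assuming a 4-byte unsigned big-endian header. The names pack_message and unpack_message are illustrative, and the struct module is used here as one possible way to encode the length.

```python
import struct

# "!I" = one 4-byte unsigned integer in network (big-endian) byte order
HEADER = struct.Struct("!I")


def pack_message(payload: bytes) -> bytes:
    """Prefix the payload with its length."""
    return HEADER.pack(len(payload)) + payload


def unpack_message(frame: bytes):
    """Return (payload, rest) from bytes that start with a complete frame."""
    (length,) = HEADER.unpack_from(frame)  # read the 4-byte length
    start = HEADER.size
    end = start + length
    if len(frame) < end:
        raise ValueError("frame is shorter than its header claims")
    return frame[start:end], frame[end:]


frame = pack_message(b"hello")
print(frame)                  # 4 header bytes followed by the payload
print(unpack_message(frame))  # payload plus any bytes after the frame
```

Because the length is stated up front, the payload may contain any bytes at all, including sequences that a delimiter-based protocol would have to escape.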
Here is an example of how you can send and receive messages with a length header using Python:
import socket

HOST = "localhost"
PORT = 12345
BUFFER_SIZE = 1024


def serve():
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((HOST, PORT))
    sock.listen()
    print("Server listening on {}:{}".format(HOST, PORT))
    conn, addr = sock.accept()
    print("Connected by", addr)
    while True:
        # Receive the 4-byte length header.
        # Note: a single recv(4) can return fewer than 4 bytes;
        # see the warning after this example.
        header = conn.recv(4)
        if not header:
            break
        # Parse the header
        msg_len = int.from_bytes(header[0:4], byteorder="big")
        # Receive the message data
        chunks = []
        bytes_recd = 0
        while bytes_recd < msg_len:
            chunk = conn.recv(min(msg_len - bytes_recd,
                                  BUFFER_SIZE))
            if not chunk:
                raise RuntimeError("connection closed before a full message arrived")
            chunks.append(chunk)
            bytes_recd += len(chunk)
        data = b"".join(chunks)
        # Print the message
        message = data.decode("utf-8").strip()
        print("Received message:", message)
    conn.close()
    sock.close()


def client():
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect((HOST, PORT))
    message = input("Enter a message: ")
    # Create the header from the encoded length (bytes, not characters)
    payload = message.encode("utf-8")
    header = len(payload).to_bytes(4, byteorder="big")
    # Send the header and message data
    sock.sendall(header + payload)
    sock.close()
In this example, the server listens for an incoming connection and receives messages that carry a length header. The client sends one message, preceded by a 4-byte big-endian length header. The server first reads the header, converts it to an integer with int.from_bytes, and then reads that many bytes from the socket to get the message. Because the server reads in a loop, it keeps handling further messages on the same connection until the client closes it.
Warning
The example above calls recv in two places: once to get the length, and then in a loop to get the rest. But since we are working with a network buffer, the first recv might not return all 4 header bytes in one call. If you ignore this, your code will break sooner or later, typically under high network load. To be bulletproof, you need two recv loops: the first to read the complete length header, and the second to read the data.
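One way to get both loops without duplicating code is a small helper that reads an exact number of bytes and is used for the header and the body alike. This is a sketch under the same assumptions as the example (4-byte big-endian header); recv_exact, send_message, and recv_message are names chosen here for illustration.

```python
import socket

HEADER_SIZE = 4  # bytes in the big-endian length header


def recv_exact(sock, n):
    """Read exactly n bytes, looping because recv may return fewer."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:  # peer closed before all n bytes arrived
            raise ConnectionError("connection closed mid-message")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def send_message(sock, payload):
    """Send a length header followed by the payload."""
    sock.sendall(len(payload).to_bytes(HEADER_SIZE, byteorder="big") + payload)


def recv_message(sock):
    """Read one complete message: first the header, then the body."""
    header = recv_exact(sock, HEADER_SIZE)  # first loop: the length
    length = int.from_bytes(header, byteorder="big")
    return recv_exact(sock, length)  # second loop: the data


if __name__ == "__main__":
    a, b = socket.socketpair()
    with a, b:
        # Deliver the header in two pieces to mimic a fragmented arrival
        a.sendall(b"\x00\x00")
        a.sendall(b"\x00\x05he")
        a.sendall(b"llo")
        print(recv_message(b))
```

In this sketch, a connection that closes cleanly between messages also raises ConnectionError from recv_message, so a server loop built on it would need to catch that exception to treat it as a normal disconnect.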
Conclusion
Handling message boundaries in socket programming using Python can be challenging, but there are several methods you can use to solve this problem. Fixed-length messages, delimiter-separated messages, and message-length headers are some of the most common methods, each with its own advantages and disadvantages. By understanding these methods and how to implement them in Python, you can write robust and reliable network applications.