What is I/O Multiplexing?

I/O multiplexing is a mechanism that allows a program to monitor multiple file descriptors (network sockets, pipes, terminals, and so on) at the same time. With it, a program does not need a separate thread or process for each I/O channel. Instead, a single thread or process can manage many of them, which improves resource utilization and the system's ability to handle concurrent connections. (Regular disk files are always reported as ready by select and poll, and epoll refuses them, so multiplexing is mainly useful for sockets, pipes, and similar streams.)

The following are detailed knowledge points and common interview questions about I/O multiplexing.

1. Concept of I/O Multiplexing

The core idea of I/O multiplexing is that a program hands the kernel a set of I/O events to watch, blocks in a single call, and is notified when one or more of those events are ready so that it can process them.

With traditional blocking I/O, each connection typically needs its own thread or process, because a blocking read() on one descriptor would stall all the others. I/O multiplexing instead lets a single thread wait for events (readable, writable, exceptional conditions, and so on) on many file descriptors at once.

2. Common I/O Multiplexing Models

Common implementations of I/O multiplexing include:

select
poll
epoll (Linux only)
kqueue (BSD systems and macOS)
IOCP (Windows)

(1) select

Function: select monitors the status (readable, writable, exceptional condition) of multiple file descriptors at once. The caller passes three fd_set bit sets (read, write, exception); on return, select modifies each set in place so that it contains only the descriptors that are ready.
Features:
select can only watch descriptors whose numeric value is below FD_SETSIZE, which is 1024 with glibc on Linux.
Because select overwrites the sets with its results, the caller must rebuild them before every call, and the whole sets are copied into the kernel each time.
The kernel checks every descriptor from 0 to nfds - 1 on each call, so the cost grows linearly with the highest descriptor number.
Example of use:

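fd_set readfds;
FD_ZERO(&readfds);                  // clear the set; it must be rebuilt before every call
FD_SET(socket_fd, &readfds);        // watch socket_fd for readability
int ready = select(socket_fd + 1, &readfds, NULL, NULL, NULL);  // nfds = highest fd + 1; NULL timeout blocks
if (ready > 0 && FD_ISSET(socket_fd, &readfds)) {
    // socket_fd is readable
}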
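The snippet above watches a single socket. The sketch below generalizes it to several descriptors, assuming plain pipe or socket descriptors are passed in; `first_readable` and `select_demo` are hypothetical helper names for illustration. It rebuilds the fd_set on every call, computes nfds as the highest descriptor plus one, and rejects descriptors that cannot fit in an fd_set.

```c
#include <sys/select.h>
#include <unistd.h>

/* Return the index of the first readable descriptor in fds[0..n-1],
   -1 if none became readable before the timeout, or -2 on error.
   The fd_set is rebuilt from scratch on every call because select()
   overwrites it with its results. */
int first_readable(const int *fds, int n, long timeout_ms) {
    fd_set readfds;
    int max_fd = -1;
    FD_ZERO(&readfds);
    for (int i = 0; i < n; i++) {
        if (fds[i] < 0 || fds[i] >= FD_SETSIZE) return -2; /* cannot go in an fd_set */
        FD_SET(fds[i], &readfds);
        if (fds[i] > max_fd) max_fd = fds[i];
    }
    struct timeval tv = { .tv_sec = timeout_ms / 1000, .tv_usec = (timeout_ms % 1000) * 1000 };
    int ready = select(max_fd + 1, &readfds, NULL, NULL, &tv); /* nfds = highest fd + 1 */
    if (ready < 0) return -2;
    for (int i = 0; i < n && ready > 0; i++)
        if (FD_ISSET(fds[i], &readfds)) return i;
    return -1;
}

/* Hypothetical demo: two pipes, with data written only to the second one. */
int select_demo(void) {
    int a[2], b[2];
    if (pipe(a) == -1 || pipe(b) == -1) return -2;
    if (write(b[1], "x", 1) != 1) return -2;
    int read_ends[2] = { a[0], b[0] };
    int result = first_readable(read_ends, 2, 100);
    close(a[0]); close(a[1]); close(b[0]); close(b[1]);
    return result;
}
```

`select_demo()` should return 1, since only the second pipe has data. Each further call to `first_readable` starts again from FD_ZERO, which is the per-call rebuilding cost described above.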

(2) poll

Function: poll is similar to select and can also monitor the status of multiple file descriptors, but it uses an array of pollfd structures to store file descriptors and their events.
Features:
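poll has no fixed FD_SETSIZE-style limit; the array length is bounded by the process's open-file limit (RLIMIT_NOFILE) and available memory.
The kernel still scans the entire array on every call, so the cost grows linearly with the number of descriptors.
The whole array is copied into the kernel on every call. Unlike select, poll writes its results to the separate revents field and leaves fd and events untouched, so the array can be reused without being rebuilt.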
Example of use:

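struct pollfd fds[1];
fds[0].fd = socket_fd;
fds[0].events = POLLIN;                   // interested in readability
int ready = poll(fds, 1, timeout);        // timeout in milliseconds; -1 waits forever
if (ready > 0 && (fds[0].revents & POLLIN)) {
    // socket_fd is readable
}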
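As a rough sketch of why a pollfd array does not need rebuilding, the following uses two pipes as stand-ins for sockets; `poll_first_readable` and `poll_demo` are hypothetical names. The same array is set up once and passed to poll() twice; only `revents` changes between calls.

```c
#include <poll.h>
#include <unistd.h>

/* Return the index of the first entry whose revents has POLLIN,
   -1 on timeout, or -2 on error. poll() only writes revents, so the
   caller can pass the same array again without rebuilding it. */
int poll_first_readable(struct pollfd *fds, nfds_t n, int timeout_ms) {
    int ready = poll(fds, n, timeout_ms);
    if (ready < 0) return -2;
    for (nfds_t i = 0; i < n && ready > 0; i++)
        if (fds[i].revents & POLLIN) return (int)i;
    return -1;
}

/* Hypothetical demo: one pollfd array, set up once, used for two rounds.
   Returns 1 if each round reports the pipe that actually has data. */
int poll_demo(void) {
    int a[2], b[2];
    char c;
    if (pipe(a) == -1 || pipe(b) == -1) return 0;
    struct pollfd fds[2] = { { .fd = a[0], .events = POLLIN },
                             { .fd = b[0], .events = POLLIN } };
    if (write(a[1], "x", 1) != 1) return 0;
    int first = poll_first_readable(fds, 2, 100);   /* only pipe a has data */
    if (read(a[0], &c, 1) != 1) return 0;           /* consume it: pipe a is no longer ready */
    if (write(b[1], "y", 1) != 1) return 0;
    int second = poll_first_readable(fds, 2, 100);  /* same array; only pipe b has data */
    close(a[0]); close(a[1]); close(b[0]); close(b[1]);
    return first == 0 && second == 1;
}
```

The array is still copied into the kernel and scanned on every call, which is the cost epoll is designed to avoid.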

(3) epoll

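The epoll discussion below covers its overview, basic mechanism, and three system calls, followed by its two working modes: level-triggered (LT) and edge-triggered (ET).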

Epoll Overview

epoll (short for "event poll") is Linux's scalable I/O multiplexing mechanism. It was designed to replace select and poll when a program manages a large number of file descriptors: the cost of waiting depends mainly on how many descriptors are ready rather than on how many are watched. This makes it well suited to network servers with many concurrent connections, where it is usually combined with non-blocking I/O to build event-driven programs.

Basic Mechanism of Epoll

epoll replaces repeated scanning with event notification. An epoll instance is a kernel object that keeps an interest list (the descriptors registered with epoll_ctl) and a ready list (the descriptors that currently have events). When a registered descriptor becomes ready, the kernel adds it to the ready list, and epoll_wait returns only those entries. Because registration happens once rather than on every wait, the descriptor set does not have to be copied into the kernel and rescanned on each call.

Three System Calls of epoll

epoll_create: Creates an epoll instance.

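epoll_create returns a file descriptor that refers to the new epoll instance. All subsequent epoll operations take this descriptor, and it should be closed with close() when it is no longer needed.

int epoll_create(int size);
int epoll_create1(int flags);

Since Linux 2.6.8 the size argument is ignored, though it must still be greater than zero. epoll_create1(0) is the modern equivalent, and epoll_create1(EPOLL_CLOEXEC) additionally sets the close-on-exec flag on the returned descriptor.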

epoll_ctl: Controls the registration and deregistration of file descriptors.

Through epoll_ctl, we can add, modify, or delete file descriptors to be monitored and their associated events in the epoll instance.

int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);

Parameters:
epfd: The file descriptor returned by epoll_create.
op: Operation type (EPOLL_CTL_ADD, EPOLL_CTL_MOD, EPOLL_CTL_DEL).
fd: The file descriptor to be monitored.
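event: Pointer to a struct epoll_event. Its events field is the mask of events to monitor (for example EPOLLIN, EPOLLOUT, EPOLLET), and its data field is handed back unchanged by epoll_wait when the descriptor becomes ready.

epoll_wait: Waits for file descriptors to be ready.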

Through epoll_wait, the application waits for events on the registered descriptors. The call blocks the current thread until at least one of them is ready, the timeout expires, or a signal handler interrupts it (epoll_wait then fails with EINTR).

int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);

Parameters:
epfd: The file descriptor returned by epoll_create.
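events: An output array that the kernel fills with the ready events.
maxevents: The capacity of the events array; it must be greater than zero.
timeout: Maximum waiting time in milliseconds (-1 waits indefinitely, 0 returns immediately).
Return value: The number of ready descriptors written to events, 0 if the timeout expired, or -1 on error with errno set.
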
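A minimal sketch tying the three calls together, assuming a pipe as a stand-in for a socket; `epoll_basic_demo` is a hypothetical helper name, and its error paths simply return 0 without cleaning up, to keep it short.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical demo: register a pipe's read end once, make it readable,
   and check what epoll_wait reports. Returns 1 on the expected result. */
int epoll_basic_demo(void) {
    int p[2];
    if (pipe(p) == -1) return 0;
    int epfd = epoll_create1(0);                      /* modern form of epoll_create */
    if (epfd == -1) return 0;
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = p[0] };
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == -1) return 0;  /* register once */
    if (write(p[1], "x", 1) != 1) return 0;           /* the read end becomes readable */
    struct epoll_event out[4];
    int n = epoll_wait(epfd, out, 4, 100);            /* wait at most 100 ms */
    int ok = n == 1 && out[0].data.fd == p[0] && (out[0].events & EPOLLIN);
    epoll_ctl(epfd, EPOLL_CTL_DEL, p[0], NULL);       /* NULL event is accepted for DEL since Linux 2.6.9 */
    close(epfd); close(p[0]); close(p[1]);
    return ok;
}
```

The descriptor is registered a single time, and epoll_wait hands back the `data.fd` value stored at registration, which is how a real server maps an event back to its connection.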
Level-triggered (LT):

The default mode. As long as a descriptor remains ready (for example, unread data is still in its receive buffer), every call to epoll_wait reports it again, so the program does not have to consume all the data at once.

The application program can receive events on the same file descriptor multiple times.

event.events = EPOLLIN; // Level-triggered

Edge-triggered (ET):

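Potentially more efficient. epoll reports the descriptor only when its state changes, for example when new data arrives. If data remains unread, epoll does not report the descriptor again until more data arrives.

In this mode, the application must read or write until the buffer is exhausted; otherwise the leftover data may sit unprocessed indefinitely, because no further notification arrives for it.

ET mode is therefore normally combined with non-blocking descriptors: the program loops on read() or write() until the call fails with EAGAIN (or EWOULDBLOCK). With a blocking descriptor, the final read would block the whole event loop.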

event.events = EPOLLIN | EPOLLET; // Edge-triggered
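To see the difference between the two modes, the sketch below (a hypothetical helper, `count_reports`, using a pipe instead of a socket) leaves two bytes unread and calls epoll_wait twice with a zero timeout. Under LT both calls should report the descriptor; under ET only the first should, because no new data arrives in between.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: register a pipe with the given event flags, write two
   bytes that are never read, then call epoll_wait twice with a zero timeout.
   Returns how many of the two calls reported the descriptor, or -1 on error. */
int count_reports(uint32_t flags) {
    int p[2], reports = 0;
    if (pipe(p) == -1) return -1;
    int epfd = epoll_create1(0);
    if (epfd == -1) return -1;
    struct epoll_event ev = { .events = flags, .data.fd = p[0] }, out[1];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == -1) return -1;
    if (write(p[1], "ab", 2) != 2) return -1;         /* data arrives once */
    for (int i = 0; i < 2; i++)
        if (epoll_wait(epfd, out, 1, 0) == 1)         /* timeout 0: check without blocking */
            reports++;
    close(epfd); close(p[0]); close(p[1]);
    return reports;
}
```

On Linux, `count_reports(EPOLLIN)` is expected to return 2 and `count_reports(EPOLLIN | EPOLLET)` to return 1.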

3. Advantages of Epoll Over select and poll

Efficient management of a large number of file descriptors:

select and poll make the kernel scan the entire set of descriptors on every call, so the cost grows with the number of watched descriptors even when most of them are idle. epoll keeps registered descriptors in the kernel, and epoll_wait returns only those that are ready, so its cost depends mainly on the number of active descriptors. The gain is largest when many connections are open but few are active at any moment.

Support for large-scale concurrent connections:

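select can only watch descriptors below FD_SETSIZE (1024 with glibc on Linux). poll has no such fixed limit, but its per-call cost grows with the array size. epoll has no FD_SETSIZE-style limit either; in practice it is bounded by the process's open-file limit (RLIMIT_NOFILE) and by /proc/sys/fs/epoll/max_user_watches, a per-user limit on registered descriptors, and it can handle tens of thousands of descriptors efficiently.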
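A small sketch of where these limits come from, assuming a Linux/glibc system; `fits_in_fd_set` and `open_file_soft_limit` are hypothetical helpers. The first encodes select's FD_SETSIZE restriction on descriptor values; the second reads the RLIMIT_NOFILE soft limit with getrlimit(), which usually bounds poll and epoll in practice.

```c
#include <sys/resource.h>
#include <sys/select.h>

/* select() can only watch descriptors whose numeric value is below FD_SETSIZE. */
int fits_in_fd_set(int fd) {
    return fd >= 0 && fd < FD_SETSIZE;
}

/* Soft limit on open descriptors for this process (RLIMIT_NOFILE), which in
   practice bounds how many descriptors poll() or epoll can watch.
   Returns 0 if the limit cannot be read; the value may equal RLIM_INFINITY. */
unsigned long long open_file_soft_limit(void) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) == -1) return 0;
    return (unsigned long long)rl.rlim_cur;
}
```

Note that select's limit is on the descriptor's value, not on how many descriptors are watched: a single descriptor numbered 1024 already cannot be placed in an fd_set with glibc.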

Less memory copying:

When a descriptor is registered with epoll_ctl, the kernel records it in the epoll instance's interest list. The application therefore does not pass the entire descriptor set to the kernel on every epoll_wait call, which reduces system-call and memory-copying overhead; only the ready events are copied back to user space.

Flexible event management:

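epoll supports edge-triggered (ET) mode, which reports a readiness change once instead of on every call, reducing repeated wakeups for the same event in high-concurrency scenarios. In addition, the data field of struct epoll_event lets the application attach its own value (such as the descriptor or a pointer to per-connection state), which epoll_wait hands back with each event.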

4. Example Code

The following is a typical epoll interview code example, demonstrating how to use epoll to implement a simple network server. This server uses epoll to manage multiple client connections and process requests from them.

Basic process:

Create a listening socket using socket() and bind it to the specified IP and port.
Call listen() to enter the listening state and wait for client connections.
Create an epoll instance using epoll_create1() (or the older epoll_create()).
Register the listening socket with the epoll instance using epoll_ctl().
Loop, waiting for events with epoll_wait().
For readable events, accept new client connections or read data sent by clients.
After processing the events, continue to wait for other events.

Example code:

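#include <sys/epoll.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>
#include <errno.h>
 
#define MAX_EVENTS 10
#define READ_BUF_SIZE 1024
#define PORT 8080 // hypothetical port for illustration; the original does not specify one
 
// Set the file descriptor to non-blocking mode
int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}
int main() {
    int listen_fd, conn_fd, nfds, epoll_fd;
    struct sockaddr_in server_addr = { .sin_family = AF_INET, .sin_port = htons(PORT) }; // sin_addr 0 = INADDR_ANY
    struct epoll_event ev, events[MAX_EVENTS];
    char buf[READ_BUF_SIZE];
    
    // Create a listening socket, bind it, start listening, and make it non-blocking
    listen_fd = socket(AF_INET, SOCK_STREAM, 0);
    if (listen_fd == -1 || bind(listen_fd, (struct sockaddr *)&server_addr, sizeof(server_addr)) == -1 ||
        listen(listen_fd, SOMAXCONN) == -1 || set_nonblocking(listen_fd) == -1) {
        perror("socket/bind/listen");
        return -1;
    }
    // Create an epoll instance and register the listening socket (level-triggered by default)
    epoll_fd = epoll_create1(0);
    ev = (struct epoll_event){ .events = EPOLLIN, .data.fd = listen_fd };
    if (epoll_fd == -1 || epoll_ctl(epoll_fd, EPOLL_CTL_ADD, listen_fd, &ev) == -1) {
        perror("epoll_create1/epoll_ctl");
        return -1;
    }
    for (;;) {
        nfds = epoll_wait(epoll_fd, events, MAX_EVENTS, -1); // block until something is ready
        if (nfds == -1 && errno == EINTR) continue;          // interrupted by a signal: wait again
        if (nfds == -1) { perror("epoll_wait"); break; }
        for (int i = 0; i < nfds; i++) {
            int fd = events[i].data.fd;
            if (fd == listen_fd) {
                // Accept every pending connection and watch each new socket for input
                while ((conn_fd = accept(listen_fd, NULL, NULL)) != -1) {
                    set_nonblocking(conn_fd);
                    ev = (struct epoll_event){ .events = EPOLLIN, .data.fd = conn_fd };
                    epoll_ctl(epoll_fd, EPOLL_CTL_ADD, conn_fd, &ev);
                }
            } else {
                // Echo the data back (partial writes ignored for brevity); close on EOF or error
                ssize_t n = read(fd, buf, sizeof(buf));
                if (n > 0) write(fd, buf, (size_t)n);
                else if (n == 0 || (errno != EAGAIN && errno != EWOULDBLOCK)) close(fd);
            }
        }
    }
    close(epoll_fd);
    close(listen_fd);
    return 0;
}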

Code points:

socket(): Creates a socket.
bind(): Binds the socket to a specific IP address and port.
listen(): Puts the socket into listening mode, ready to accept client connections.
epoll_create1(): Creates an epoll instance.
epoll_ctl(): Adds, modifies, or deletes file descriptors in the epoll instance. Here we add the listening socket descriptor to epoll.
epoll_wait(): Waits for events on file descriptors. It is a blocking wait here. When an event occurs on a monitored socket, it returns event information.
accept(): Accepts client connections.
read()/write(): Reads and writes data from/to the client.
set_nonblocking(): Sets the socket to non-blocking mode. In non-blocking mode, read() or accept() fails immediately with EAGAIN (or EWOULDBLOCK) when there is no data or no pending connection, instead of blocking the program.

5. Common Interview Questions

(1) What is I/O multiplexing?

Key points for answering:
I/O multiplexing is a mechanism that allows a program to monitor multiple I/O operations simultaneously, avoiding the creation of independent threads or processes for each I/O operation.
Common I/O multiplexing technologies include select, poll, and epoll.

(2) What is the difference between select and poll?

Key points for answering:
select can only watch descriptors below FD_SETSIZE (usually 1024), while poll has no such fixed limit.
poll takes an array of struct pollfd, while select uses fd_set bit sets.
Both copy the full set of descriptors into the kernel and scan it on every call. select also overwrites its sets with the results, so they must be rebuilt before each call; poll writes its results to revents, so its array can be reused.

(3) What advantages does epoll have over select and poll?

Key points for answering:
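epoll has no FD_SETSIZE-style limit (it is bounded by the open-file limit), so it suits large numbers of concurrent connections.
epoll does not scan all registered descriptors on each call; epoll_wait returns only the ready ones, so it scales better when most connections are idle.
epoll uses event notification (descriptors are registered once and the kernel tracks their readiness), while select and poll scan every descriptor on each call.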

(4) What are level-triggered (LT) and edge-triggered (ET)?

Key points for answering:
Level-triggered (LT): epoll_wait keeps reporting a descriptor as long as it remains readable or writable, so the program can process the data over several calls.
Edge-triggered (ET): epoll_wait reports a descriptor only when its state changes (for example, when new data arrives), so the program must drain all available data, typically by reading until EAGAIN on a non-blocking descriptor, before waiting again.

(5) How to handle I/O operations in high-concurrency scenarios?

Key points for answering:
Use I/O multiplexing mechanisms (such as epoll) to manage a large number of file descriptors.
Use non-blocking I/O and asynchronous I/O to reduce the overhead of thread and process context switching.
Combine with technologies such as thread pools or coroutines to further optimize concurrent processing capabilities.

(6) How to choose between LT and ET modes of epoll?

Key points for answering:
LT mode is simpler to use and suitable for most scenarios, without the need to process all data at once.
ET mode suits scenarios with high performance requirements because it can reduce repeated wakeups for the same data, but it needs more careful code: non-blocking descriptors and loops that read or write until EAGAIN, so that no data is left unprocessed.
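The extra logic that ET mode requires is mostly a drain loop. Below is a sketch assuming a non-blocking descriptor; `drain_fd` and `drain_demo` are hypothetical names, and the pipe in the demo stands in for a client socket.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Read from a non-blocking fd until the kernel buffer is empty (EAGAIN), as
   ET mode requires. Returns the number of bytes read, or -1 on a real error.
   *eof is set to 1 if the peer closed the connection. */
long drain_fd(int fd, int *eof) {
    char buf[1024];
    long total = 0;
    *eof = 0;
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) { total += n; continue; }           /* process buf[0..n) here */
        if (n == 0) { *eof = 1; return total; }       /* end of file: peer closed */
        if (errno == EINTR) continue;                  /* interrupted: retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK) return total; /* fully drained */
        return -1;                                     /* real error */
    }
}

/* Hypothetical demo: 5000 bytes sit in a non-blocking pipe; one drain_fd call
   consumes all of them even though each read() returns at most 1024 bytes. */
long drain_demo(void) {
    int p[2], eof;
    char data[5000];
    if (pipe(p) == -1) return -1;
    fcntl(p[0], F_SETFL, fcntl(p[0], F_GETFL, 0) | O_NONBLOCK);
    memset(data, 'x', sizeof data);
    if (write(p[1], data, sizeof data) != (ssize_t)sizeof data) return -1;
    long total = drain_fd(p[0], &eof);
    close(p[0]); close(p[1]);
    return total;
}
```

In an ET event loop, a call like `drain_fd` would run each time epoll_wait reports the descriptor, and the connection would be closed when `*eof` is set.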

(7) Describe the application scenarios of I/O multiplexing in servers?

Key points for answering:
I/O multiplexing is used in high-concurrency servers to handle connection requests from multiple clients.
It is often used in network servers, proxy servers and other applications that need to handle a large number of I/O operations at the same time.
A large number of connections are managed through epoll or select, reducing the consumption of system resources and improving the throughput and response speed of the server.

What is the difference between epoll and select/poll?

select and poll scan every descriptor passed in on each call; epoll is event-driven and returns only the descriptors that are ready.
select can only watch descriptors below FD_SETSIZE (usually 1024); poll and epoll have no such fixed limit and are bounded mainly by the open-file limit.
select and poll pass the entire descriptor set to the kernel on every call, which is costly for large sets; epoll registers each descriptor once with epoll_ctl, and epoll_wait copies back only the ready events.

What is the difference between the two working modes (LT and ET) of epoll?

LT (Level-Triggered): As long as the descriptor remains ready (its events have not been fully handled), every call to epoll_wait reports it again.
ET (Edge-Triggered): epoll_wait will return a notification only when the status of the file descriptor changes from unreadable/unwritable to readable/writable. After that, even if the file descriptor is still readable