
4 Multiple Independent Flows

The biggest asynchronous problem, that pcap files are serial, has to be solved in a scalable manner. Not much can be assumed about the network traffic contained in a pcap savefile other than that Murphy's Law will be in effect. This means we'll have to deal with:

Thousands of small simultaneous flows (captured on a busy network)
Flows which "hang" mid-stream (an exploit against a server causes it to crash)
Flows which contain large quantities of data (FTP transfers of ISOs, for example)

How we implement parallel processing of the pcap savefile will dramatically affect how well we can scale. A few considerations:

Most Unix systems limit the maximum number of open file descriptors a single process can have. Generally speaking this shouldn't be a problem except for highly parallel pcaps.
While RAM isn't limitless, we can use mmap() to get around this.
Many Unix systems have enhanced alternatives to poll() (kqueue on the BSDs and epoll on Linux, for example) which will improve flow management.
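The descriptor limit mentioned above can be checked at startup. The following is only a minimal sketch: the helper name fd_budget() and the idea of reserving a few descriptors (for stdin/stdout/stderr and the savefile itself) are assumptions made for illustration, not part of flowreplay.

```c
#include <assert.h>
#include <limits.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: how many descriptors are left for flow sockets
 * once `reserved` descriptors are set aside.
 * Returns -1 if the limit cannot be queried.
 */
long fd_budget(long reserved)
{
    struct rlimit rl;

    assert(reserved >= 0);

    /* RLIMIT_NOFILE is the per-process limit on open descriptors. */
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;

    /* The soft limit may be "unlimited" or too large for a long. */
    if (rl.rlim_cur == RLIM_INFINITY || rl.rlim_cur > (rlim_t)LONG_MAX)
        return LONG_MAX;

    long soft = (long)rl.rlim_cur;
    return soft > reserved ? soft - reserved : 0;
}
```

Each simultaneous flow needs at least one socket, so comparing fd_budget(4) with the number of concurrent flows in a savefile would give an early warning before sockets start failing to open.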

4.1 IP Fragments and TCP Streams

There are five major complications with flowreplay:

The IP datagrams may be fragmented; since only the first fragment carries the port numbers, we won't be able to use the standard 5-tuple (src/dst IP, src/dst port, protocol) to look up which flow a packet belongs to.
IP fragments may arrive out of order which will complicate ordering of data to be sent.
The TCP segments may arrive out of order which will complicate ordering of data to be sent.
Packets may be missing in the pcap file because they were dropped during capture.
There are tools like fragrouter which intentionally create non-deterministic situations.
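To make the first three complications concrete, here is a rough sketch of what the lookup side might involve. It assumes raw IPv4 headers in network byte order; struct frag_key and the helper names are hypothetical and not taken from flowreplay or libnids. Fragments are keyed on (source, destination, protocol, IP ID) instead of the 5-tuple, and TCP sequence numbers are compared with wraparound in mind.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical key for matching fragments of one datagram. */
struct frag_key {
    uint32_t src, dst;   /* IPv4 addresses, host byte order */
    uint16_t id;         /* IP identification field */
    uint8_t  proto;      /* IP protocol number */
};

static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t rd32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* True if the header has More Fragments set or a non-zero offset. */
bool ipv4_is_fragment(const uint8_t *hdr, size_t len)
{
    if (hdr == NULL || len < 20 || (hdr[0] >> 4) != 4)
        return false;
    uint16_t off = rd16(hdr + 6);
    return (off & 0x2000) != 0 || (off & 0x1fff) != 0;
}

/* Fill in the fragment key; returns false on a short or non-IPv4 header. */
bool ipv4_frag_key(const uint8_t *hdr, size_t len, struct frag_key *k)
{
    assert(k != NULL);
    if (hdr == NULL || len < 20 || (hdr[0] >> 4) != 4)
        return false;
    k->id    = rd16(hdr + 4);
    k->proto = hdr[9];
    k->src   = rd32(hdr + 12);
    k->dst   = rd32(hdr + 16);
    return true;
}

/* True if TCP sequence number a comes before b, allowing for wraparound. */
bool seq_before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}
```

Once all fragments sharing a key have been collected and ordered by offset, the reassembled datagram would expose the ports again, and the usual 5-tuple lookup could take over.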

First off, I've decided that I'm not going to worry about fragrouter or its cousins. I'll handle non-deterministic situations one and only one way, so that the way flowreplay handles the traffic will be deterministic. Perhaps I'll make it easy for others to write a plug-in which will change it, but that's not something I'm going to concern myself with now.

Missing packets in the pcap file will probably make that flow unplayable. There are probably certain situations where we can make an educated guess, but this is far too complex to worry about for the first stable release.

That still leaves creating a basic TCP/IP stack in user space. The good news is that there is already a library which does this, called libnids. As of version 1.17, libnids can process packets from a pcap savefile (it's not documented in the man page, but the code is there).

A potential problem with libnids, though, is that it has to maintain its own state/cache system. This not only means additional overhead, but jumping around in the pcap file, as I'm planning on doing to handle multiple simultaneous flows, is likely to really confuse libnids' state engine. Also, libnids is licensed under the GPL, but I want flowreplay released under a BSD-like license; I need to research whether the two are compatible in this way.

Possible solutions:

Develop a custom wedge between the capture file and libnids which will cause each packet to be processed only a single time.
Use libnids to process the pcap file into a new flow-based format, effectively putting the TCP/IP stack into a dedicated utility.
Develop a custom user-space TCP/IP stack, perhaps based on a BSD TCP/IP stack, much like libnids is based on Linux 2.0.37.
Screw it and say that IP fragmentation and out of order IP packets/TCP segments are not supported. Not sure if this will meet the needs of potential users.

4.2 Blocking

As stated earlier, one of the main goals of this project is to keep things single-threaded to make coding plugins easier. One caveat is that any function which blocks will cause serious problems.

There are three major cases where blocking is likely to occur:

Opening a socket
Reading from a socket
Writing to a socket

Reading from sockets in a non-blocking manner is easy to solve using poll() or select(). Writing to a socket, or merely opening a TCP socket via connect(), however, requires a different method:

It is possible to do non-blocking IO on sockets by setting the O_NONBLOCK flag on a socket file descriptor using fcntl(2). Then all operations that would block will (usually) return with EAGAIN (operation should be retried later); connect(2) will return EINPROGRESS error. The user can then wait for various events via poll(2) or select(2).⁷
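As a minimal sketch of the fcntl(2) step described in that passage (the helper names set_nonblocking() and open_nonblocking_tcp_socket() are made up for illustration and are not flowreplay functions):

```c
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: add O_NONBLOCK to a descriptor's flags.
 * Returns 0 on success, -1 on error (errno is set by fcntl). */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    /* Preserve the existing flags and only add O_NONBLOCK. */
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Hypothetical helper: create a TCP socket that is already non-blocking,
 * so a later connect() returns immediately (typically with EINPROGRESS).
 * Returns the descriptor, or -1 on error. */
int open_nonblocking_tcp_socket(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (set_nonblocking(fd) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Setting the flag right after socket() and before connect() means all three blocking cases listed above (open, read, write) are covered by the same descriptor state.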

If connect() returns EINPROGRESS, then we'll just have to do something like this:

    int e;
    socklen_t len = sizeof(e);   /* getsockopt() expects a socklen_t */

    if (getsockopt(conn->s, SOL_SOCKET, SO_ERROR, &e, &len) < 0) {
        /* not yet */
        if (errno != EINPROGRESS) { /* yuck. kill it. */
            log_fn(LOG_DEBUG, "in-progress connect failed. Removing.");
            return -1;
        } else {
            return 0; /* no change, see if next time is better */
        }
    }
    /* the connect has finished. */

Note: It may not be totally right, but it works OK. (That chunk of code gets called after poll returns the socket as writable. If poll returns it as readable, then it's probably because of EOF, meaning the connect failed. You must poll for both.)
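A slightly fuller take on that check, offered only as a sketch: it polls for both readable and writable as the note suggests, and it also inspects the value returned through SO_ERROR, since on many systems getsockopt() itself succeeds and reports a failed connect that way. The function name connect_status() and its return convention are assumptions for illustration.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>

/*
 * Hypothetical helper for a socket whose connect() returned EINPROGRESS.
 * Returns  1 if the connection is established,
 *          0 if it is still in progress (try again later),
 *         -1 if it failed (errno holds the reason).
 */
int connect_status(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN | POLLOUT };

    int n = poll(&pfd, 1, timeout_ms);
    if (n < 0)
        return (errno == EINTR) ? 0 : -1;
    if (n == 0)
        return 0;               /* no event yet */

    int e = 0;
    socklen_t len = sizeof(e);

    /* Some systems report the pending error via the return value and errno. */
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &e, &len) < 0)
        return -1;

    /* Others return 0 and hand the pending error back in e. */
    if (e != 0) {
        errno = e;
        return -1;
    }

    /* No error and writable: the connect has finished. */
    return (pfd.revents & POLLOUT) ? 1 : 0;
}
```

In a single-threaded event loop this would be called with a timeout of 0 for each pending socket, so that one slow connect never stalls the other flows.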

Footnotes

⁷ socket(7)

Aaron Turner 2005-08-07