5 pcap vs flow File Format

As stated before, the pcap file format isn't well suited to flowreplay because it uses the raw packet as its container for data. Flowreplay, however, isn't interested in packets; it's interested in data streams (see footnote 8), which may span one or more TCP/UDP segments, each carried in an IP datagram that may itself be split into multiple IP fragments. Handling all this additional complexity requires a full TCP/IP stack in user space, which would have additional feature requirements specific to flowreplay.

Rather than trying to do that, I've decided to create a pcap preprocessor for flowreplay called flowprep. Flowprep will handle all the TCP/IP defragmentation and reassembly, and write out a file containing the data streams for each flow.
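To make the reassembly step concrete, here is a minimal sketch of joining TCP segments into one contiguous byte string. It is not flowprep's implementation: the function name `reassemble`, the `(seq, payload)` tuple shape, and the decision to ignore sequence-number wraparound are all assumptions made for illustration.

```python
def reassemble(segments, initial_seq):
    """Join (seq, payload) TCP segments into one contiguous byte string.

    Hypothetical helper: handles out-of-order, duplicate, and overlapping
    segments, and stops at the first gap. Sequence-number wraparound is
    ignored for brevity.
    """
    stream = bytearray()
    next_seq = initial_seq
    # Process segments in sequence-number order, whatever order they arrived in.
    for seq, payload in sorted(segments, key=lambda s: s[0]):
        if seq > next_seq:
            break                      # gap: a segment is missing
        skip = next_seq - seq          # leading bytes we already have
        if skip < len(payload):
            stream += payload[skip:]   # keep only the new bytes
            next_seq = seq + len(payload)
    return bytes(stream)
```

A real preprocessor would also have to reassemble IP fragments before this step and decide where one data stream ends and the next begins; neither is shown here.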

A flow file will contain three sections:

  1. A header which identifies this as a flowprep file and the file version
  2. An index of all the flows contained in the file
  3. The data streams themselves
[Figure: flowheader]
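The three sections could be laid out roughly as in the sketch below. Every concrete detail here is an assumption for illustration only: the magic value `FLOW`, the field widths, the per-flow stream count in the index, and the function names are hypothetical and do not come from flowprep.

```python
import struct

# Hypothetical layout; none of these fields or sizes are defined by flowprep.
MAGIC = b"FLOW"                          # assumed 4-byte file identifier
VERSION = 1                              # assumed file version
HEADER = struct.Struct("!4sHI")          # magic, version, number of flows
INDEX_ENTRY = struct.Struct("!IQI")      # flow id, offset of first stream, stream count
STREAM_HEADER = struct.Struct("!dBI")    # timestamp, direction (0=client, 1=server), length


def write_flow_file(flows):
    """Serialize {flow_id: [(timestamp, direction, payload), ...]} to bytes."""
    index, bodies = [], []
    # Data streams start right after the header and the index.
    offset = HEADER.size + INDEX_ENTRY.size * len(flows)
    for flow_id, streams in flows.items():
        body = b"".join(
            STREAM_HEADER.pack(ts, direction, len(payload)) + payload
            for ts, direction, payload in streams
        )
        index.append(INDEX_ENTRY.pack(flow_id, offset, len(streams)))
        bodies.append(body)
        offset += len(body)
    header = HEADER.pack(MAGIC, VERSION, len(flows))
    return header + b"".join(index) + b"".join(bodies)


def read_header_and_index(data):
    """Validate the header and return (version, {flow_id: (offset, count)})."""
    magic, version, flow_count = HEADER.unpack_from(data, 0)
    if magic != MAGIC:
        raise ValueError("not a flow file")
    index = {}
    for i in range(flow_count):
        pos = HEADER.size + i * INDEX_ENTRY.size
        flow_id, offset, count = INDEX_ENTRY.unpack_from(data, pos)
        index[flow_id] = (offset, count)
    return version, index


def read_stream(data, offset):
    """Read one data stream; return (timestamp, direction, payload, next_offset)."""
    ts, direction, length = STREAM_HEADER.unpack_from(data, offset)
    start = offset + STREAM_HEADER.size
    return ts, direction, data[start:start + length], start + length
```

Keeping the offsets in the index means a reader can jump straight to the first data stream of any flow without scanning the streams that precede it.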

At startup, the file header is validated and the data stream indexes are loaded into memory. Next, the first data stream header from each flow is read. Each flow and its subsequent data streams are then processed according to the timestamps and plug-ins.
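One way to picture the timestamp-driven part of that processing is as a merge over the flows: hold the next pending data stream from each flow, always take the earliest, then fetch the following stream from the same flow. The sketch below uses in-memory lists in place of a real flow file and leaves plug-ins out entirely; the function name `replay_order` and the data shapes are assumptions, not part of flowreplay.

```python
import heapq


def replay_order(flows):
    """Yield (timestamp, flow_id, payload) across all flows in timestamp order.

    `flows` maps a flow id to a list of (timestamp, payload) data streams,
    already sorted by timestamp within each flow (an assumption of this sketch).
    """
    iterators = {flow_id: iter(streams) for flow_id, streams in flows.items()}
    heap = []
    # Read the first data stream from each flow.
    for flow_id, it in iterators.items():
        first = next(it, None)
        if first is not None:
            heapq.heappush(heap, (first[0], flow_id, first[1]))
    # Process the earliest pending stream, then queue that flow's next one.
    while heap:
        ts, flow_id, payload = heapq.heappop(heap)
        yield ts, flow_id, payload
        following = next(iterators[flow_id], None)
        if following is not None:
            heapq.heappush(heap, (following[0], flow_id, following[1]))
```

Only one entry per flow sits in the heap at any time, so memory use grows with the number of flows rather than the number of data streams.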



Footnotes

8. A "data stream," as I call it, is a simplex communication from the client or server that forms a complete query, response, or message.
Aaron Turner 2006-07-17