#LyX 1.3 created this file. For more info see http://www.lyx.org/
\lyxformat 221
\textclass article
\language english
\inputencoding latin1
\fontscheme pslatex
\graphics default
\paperfontsize default
\spacing single 
\papersize letterpaper
\paperpackage a4
\use_geometry 1
\use_amsmath 0
\use_natbib 0
\use_numerical_citations 0
\paperorientation portrait
\leftmargin 10mm
\topmargin 10mm
\rightmargin 10mm
\bottommargin 15mm
\secnumdepth 4
\tocdepth 3
\paragraph_separation skip
\defskip medskip
\quotes_language english
\quotes_times 2
\papercolumns 1
\papersides 1
\paperpagestyle default
\bullet 1
	0
	6
	-1
\end_bullet
\bullet 2
	0
	0
	-1
\end_bullet

\layout Title


\color black
Flowreplay Design Notes
\layout Author


\color black
Aaron Turner 
\newline 
http://tcpreplay.synfin.net/
\layout Date


\color black
Last Edited:
\newline 
October 23, 2003
\layout Section
\pagebreak_top 

\color black
Overview
\layout Standard


\color black
Tcpreplay
\begin_inset Foot
collapsed true

\layout Standard


\color black
http://tcpreplay.sourceforge.net/
\end_inset 

 was designed to replay traffic previously captured in the pcap format back
 onto the wire for testing NIDS and other passive devices.
 Over time, it was enhanced to be able to test in-line network devices.
 However, a recurring feature request for tcpreplay is the ability to connect
 to a server in order to test applications and host TCP/IP stacks.
 It was determined early on that adding this feature to tcpreplay would be
 far too complex, so I decided to create a new tool specifically designed
 for this.
\layout Standard


\color black
Flowreplay is designed to replay traffic at Layer 4 or 7 depending on the
 protocol rather than at Layer 2 like tcpreplay does.
 This allows flowreplay to connect to one or more servers using a pcap savefile
 as the basis of the connections.
 Hence, flowreplay allows the testing of applications running on real servers
 rather than passive devices.
 
\layout Section


\color black
Features
\layout Subsection


\color black
Requirements
\layout Enumerate


\color black
Full TCP/IP support, including IP fragments and TCP stream reassembly.
\layout Enumerate


\color black
Support replaying TCP and UDP flows.
\layout Enumerate


\color black
Code should handle each flow/service independently.
\layout Enumerate


\color black
Should be able to connect to the server(s) in the pcap file or to a user
 specified IP address.
\layout Enumerate


\color black
Support a plug-in architecture to allow adding application layer intelligence.
\layout Enumerate


\color black
Plug-ins must be able to support multi-flow protocols like FTP.
\layout Enumerate


\color black
Ship with a default plug-in which will work 
\begin_inset Quotes eld
\end_inset 

well enough
\begin_inset Quotes erd
\end_inset 

 for simple single-flow protocols like HTTP and telnet.
\layout Enumerate


\color black
Flows being replayed 
\begin_inset Quotes eld
\end_inset 

correctly
\begin_inset Quotes erd
\end_inset 

 is more important than performance (Mbps).
\layout Enumerate


\color black
Portable to run on common flavors of Unix and Unix-like systems.
\layout Subsection


\color black
Wishes
\layout Enumerate


\color black
Support clients connecting to flowreplay on a limited basis.
 Flowreplay would replay the server side of the connection.
\layout Enumerate


\color black
Support other IP based traffic (ICMP, VRRP, OSPF, etc) via plug-ins.
\layout Enumerate


\color black
Support non-IP traffic (ARP, STP, CDP, etc) via plug-ins.
\layout Enumerate


\color black
Limit which flows are replayed using user defined filters.
 (bpf filter syntax?)
\layout Enumerate


\color black
Process pcap files directly with no intermediary file conversions.
\layout Enumerate


\color black
Should be able to scale to pcap files hundreds of MB in size and 100+ simultaneous
 flows on a P3 500MHz w/ 256MB of RAM.
\layout Section


\color black
Design Thoughts
\layout Subsection


\color black
Sending and Receiving traffic
\layout Standard


\color black
Flowreplay must be able to process multiple connections to one or more devices.
 There are two options:
\layout Enumerate


\color black
Use sockets
\begin_inset Foot
collapsed true

\layout Standard


\color black
socket(2)
\end_inset 

 to send and receive data
\layout Enumerate


\color black
Use libpcap
\begin_inset Foot
collapsed true

\layout Standard


\color black
http://www.tcpdump.org/
\end_inset 

 to receive packets and libnet
\begin_inset Foot
collapsed true

\layout Standard


\color black
http://www.packetfactory.net/projects/libnet/
\end_inset 

 to send packets
\layout Standard


\color black
Although using libpcap/libnet would allow more simultaneous connections
 and greater flexibility, there would be a very high complexity cost associated
 with it.
 With that in mind, I've decided to use sockets to send and receive data.
\layout Subsection


\color black
Handling Multiple Connections
\layout Standard


\color black
Because a pcap file can contain multiple simultaneous flows, we need to
 be able to support that too.
 The biggest problem with this is reading packet data in a different order
 than it is stored in the pcap file.
 
\layout Standard


\color black
Reading and writing to multiple sockets is easy with select() or poll().
 However, a pcap file has its data stored serially, but we need to access
 it randomly.
 There are a number of possible solutions for this such as caching packets
 in RAM where they can be accessed more randomly, creating an index of the
 packets in the pcap file, or converting the pcap file to another format
 altogether.
 Alternatively, I've started looking at libpcapnav
\begin_inset Foot
collapsed true

\layout Standard

http://netdude.sourceforge.net/
\end_inset 

 as an alternate means to navigate a pcap file and process packets out of
 order.
\layout Subsection


\color black
Data Synchronization
\layout Standard


\color black
Knowing when to start sending client traffic in response to the server will
 be "tricky".
 Without understanding the actual protocol involved, probably the best general
 solution is waiting for a given period of time after no more data from
 the server has been received.
 Not sure what to do if the client traffic doesn't elicit a response from
 the server (implement some kind of timeout?).
 This will be the basis for the default plug-in.
\layout Subsection


\color black
TCP/IP
\layout Standard


\color black
Dealing with IP fragmentation and TCP stream reassembly will be another
 really complex problem.
 We're basically talking about implementing a significant portion of a TCP/IP
 stack.
 One thought is to use libnids
\begin_inset Foot
collapsed true

\layout Standard


\color black
http://www.avet.com.pl/~nergal/libnids/
\end_inset 

 which basically implements a Linux 2.0.37 TCP/IP stack in user-space.
 Other solutions include porting a TCP/IP stack from Open/Net/FreeBSD or
 writing our own custom stack from scratch.
\layout Section


\color black
Multiple Independent Flows
\layout Standard


\color black
The biggest asynchronous problem, that pcap files are serial, has to be
 solved in a scalable manner.
 Not much can be assumed about the network traffic contained in a pcap savefile
 other than that Murphy's Law will be in effect.
 This means we'll have to deal with:
\layout Itemize


\color black
Thousands of small simultaneous flows (captured on a busy network)
\layout Itemize


\color black
Flows which 
\begin_inset Quotes eld
\end_inset 

hang
\begin_inset Quotes erd
\end_inset 

 mid-stream (an exploit against a server causes it to crash)
\layout Itemize


\color black
Flows which contain large quantities of data (FTP transfers of ISOs for
 example)
\layout Standard


\color black
How we implement parallel processing of the pcap savefile will dramatically
 affect how well we can scale.
 A few considerations:
\layout Itemize

Most Unix systems limit the maximum number of open file descriptors a single
 process can have.
 Generally speaking this shouldn't be a problem except for highly parallel
 pcaps.
\layout Itemize

While RAM isn't limitless, we can use mmap() to get around this.
\layout Itemize

Many Unix systems have enhanced solutions to poll() which will improve flow
 management.
\layout Comment


\color black
Unix systems implement a maximum limit on the number of file descriptors
 a single process can open.
 My Linux box for example craps out at 1021 (it's really 1024, but 3 are
 reserved for STDIN, STDOUT, STDERR), which seems to be pretty standard
 for recent Unixes.
 This means we're limited to at most 1020 simultaneous flows if the pcap
 savefile is opened once and half that (510 flows) if the savefile is re-opened
 for each flow.
\begin_inset Foot
collapsed true

\layout Standard


\color black
It appears that most Unix-like OSes allow root to increase the 
\begin_inset Quotes eld
\end_inset 

hard-limit
\begin_inset Quotes erd
\end_inset 

 beyond 1024.
 Compiling a list of methods to do this for common OSes should be added
 to the flowreplay documentation.
\end_inset 
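\layout Standard


\color black
The descriptor arithmetic above can be sketched in C.
 This is only an illustration: max_flows() and current_fd_limit() are hypothetical
 helper names, not existing flowreplay code, and the sketch assumes one socket
 per flow plus the three standard descriptors.

```c
#include <assert.h>
#include <limits.h>
#include <sys/resource.h>

/* Hypothetical helper: how many flows fit under a descriptor limit.
 * Three descriptors are taken by stdin, stdout and stderr.  If the
 * savefile is opened once it costs one more descriptor in total; if
 * it is re-opened per flow, every flow costs two (socket + savefile). */
static long max_flows(long fd_limit, int reopen_per_flow)
{
    assert(fd_limit >= 3);
    long usable = fd_limit - 3;
    if (reopen_per_flow)
        return usable / 2;
    return usable > 0 ? usable - 1 : 0;
}

/* Query the soft RLIMIT_NOFILE of this process; returns -1 on error. */
static long current_fd_limit(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    if (rl.rlim_cur > (rlim_t)LONG_MAX)   /* also covers RLIM_INFINITY */
        return LONG_MAX;
    return (long)rl.rlim_cur;
}
```

\layout Standard


\color black
With a limit of 1024, max_flows(1024, 0) gives 1020 and max_flows(1024, 1) gives
 510, matching the numbers above.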


\layout Comment


\color black
RAM isn't limitless.
 Caching packets in memory may cause problems when one or more flows with
 a lot of data 
\begin_inset Quotes eld
\end_inset 

hang
\begin_inset Quotes erd
\end_inset 

 and their packets have to be cached so that other flows can be processed.
 If you work with large pcaps containing malicious traffic (say packet captures
 from DefCon), this sort of thing may be a real problem.
 Dealing with this situation would require complicated buffer limits and
 error handling.
\layout Comment


\color black
Jumping around in the pcap file via fgetpos() and fsetpos() is probably
 the most disk I/O intensive solution and may affect performance.
 However, on systems with enough free memory, one would hope the system
 disk cache will provide a dramatic speedup.
 The 
\begin_inset Quotes eld
\end_inset 

bookmarks
\begin_inset Quotes erd
\end_inset 

 used by fgetpos/fsetpos are just 64-bit integers, which are relatively space
 efficient compared to other solutions.
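\layout Standard


\color black
A minimal sketch of the bookmark idea, assuming each flow remembers where its
 next record starts in the savefile.
 The struct flow_bookmark type and both function names are made up for illustration;
 only fgetpos(), fsetpos() and fread() are real library calls.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical per-flow bookmark: where this flow's next record starts. */
struct flow_bookmark {
    fpos_t pos;
};

/* Remember the current position of the savefile for this flow. */
static int bookmark_save(FILE *fp, struct flow_bookmark *bm)
{
    assert(fp != NULL && bm != NULL);
    return fgetpos(fp, &bm->pos);
}

/* Jump back to a saved position and read up to len bytes from there.
 * Returns the number of bytes read (0 if the seek failed). */
static size_t bookmark_read(FILE *fp, const struct flow_bookmark *bm,
                            void *buf, size_t len)
{
    assert(fp != NULL && bm != NULL && buf != NULL);
    if (fsetpos(fp, &bm->pos) != 0)
        return 0;
    return fread(buf, 1, len, fp);
}
```

\layout Standard


\color black
Each call to bookmark_read() costs a seek, which is where the extra disk I/O
 mentioned above comes from.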
\layout Comment


\color black
The other typical asynchronous issue is dealing with multiple sockets, which
 we will solve via poll()
\begin_inset Foot
collapsed true

\layout Standard


\color black
poll(2)
\end_inset 

.
 Each flow will define a 
\emph on 
struct pollfd
\emph default 
 and the amount of time in ms to timeout.
 Then prior to calling poll() we walk the list of flows and create the array
 of pollfd's and determine the flow(s) with the smallest timeout.
 A list of these flows is saved for when poll() returns.
 Finally, the current time is tucked away and the timeout and array of pollfd's
 is passed to poll().
\layout Comment


\color black
When poll() returns, the sockets that returned ready have their plug-in
 called.
 If no sockets are ready, then the flows saved prior to calling poll() are
 processed.
\layout Comment


\color black
Once the ready or timed-out flows are processed, every flow that was not
 processed has its timeout decremented by the time elapsed since poll()
 was last called, and we start again.
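\layout Standard


\color black
The timeout bookkeeping around poll() might look roughly like the following.
 This is a sketch under assumptions: struct flow, build_pollfds() and age_flows()
 are hypothetical names, and the caller is assumed to measure the elapsed
 time itself.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>

/* Hypothetical per-flow record: a pollfd plus a timeout in ms. */
struct flow {
    struct pollfd pfd;
    int timeout_ms;
};

/* Copy each flow's pollfd into fds[] and return the smallest timeout,
 * which is the value to hand to poll(). */
static int build_pollfds(const struct flow *flows, size_t n,
                         struct pollfd *fds)
{
    assert(flows != NULL && fds != NULL && n > 0);
    int min_ms = flows[0].timeout_ms;
    for (size_t i = 0; i < n; i++) {
        fds[i] = flows[i].pfd;
        fds[i].revents = 0;
        if (flows[i].timeout_ms < min_ms)
            min_ms = flows[i].timeout_ms;
    }
    return min_ms;
}

/* After poll() returns, charge the elapsed time to every flow that
 * was not processed this round (processed[i] == 0), clamping at 0. */
static void age_flows(struct flow *flows, const int *processed,
                      size_t n, int elapsed_ms)
{
    for (size_t i = 0; i < n; i++) {
        if (processed[i])
            continue;
        flows[i].timeout_ms -= elapsed_ms;
        if (flows[i].timeout_ms < 0)
            flows[i].timeout_ms = 0;
    }
}
```

\layout Standard


\color black
A processed flow is expected to have its timeout reset by its plug-in, which
 is why age_flows() leaves it alone.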
\layout Subsection


\color black
IP Fragments and TCP Streams
\layout Standard


\color black
There are five major complications with flowreplay:
\layout Enumerate


\color black
The IP datagrams may be fragmented; we won't be able to use the standard
 5-tuple (src/dst IP, src/dst port, protocol) to look up which flow a packet
 belongs to, since only the first fragment carries the port numbers.
\layout Enumerate


\color black
IP fragments may arrive out of order which will complicate ordering of data
 to be sent.
\layout Enumerate


\color black
The TCP segments may arrive out of order which will complicate ordering
 of data to be sent.
\layout Enumerate


\color black
Packets may be missing in the pcap file because they were dropped during
 capture.
\layout Enumerate


\color black
There are tools like fragrouter which intentionally create non-deterministic
 situations.
\layout Standard


\color black
First off, I've decided that I'm not going to worry about fragrouter or
 its cousins.
 I'll handle non-deterministic situations one and only one way, so that
 the way flowreplay handles the traffic will be deterministic.
 Perhaps I'll make it easy for others to write a plug-in which will change
 it, but that's not something I'm going to concern myself with now.
\layout Standard


\color black
Missing packets in the pcap file will probably make that flow unplayable.
 There are probably certain situations where we can make an educated guess,
 but this is far too complex to worry about for the first stable release.
\layout Standard


\color black
That still leaves creating a basic TCP/IP stack in user space.
 The good news is that there is already a library which does this called
 libnids.
 As of version 1.17, libnids can process packets from a pcap savefile (it's
 not documented in the man page, but the code is there).
\layout Standard


\color black
A potential problem with libnids though is that it has to maintain its
 own state/cache system.
 This not only means additional overhead, but jumping around in the pcap
 file as I'm planning on doing to handle multiple simultaneous flows is
 likely to really confuse libnids' state engine.
 Also, libnids is licensed under the GPL, but I want flowreplay released
 under a BSD-like license; I need to research if the two are compatible
 in this way.
\layout Standard


\color black
Possible solutions:
\layout Itemize


\color black
Developing a custom wedge between the capture file and libnids which will
 cause each packet to only be processed a single time.
\layout Itemize


\color black
Use libnids to process the pcap file into a new flow-based format, effectively
 putting the TCP/IP stack into a dedicated utility.
\layout Itemize


\color black
Develop a custom user-space TCP/IP stack, perhaps based on a BSD TCP/IP
 stack, much like libnids is based on Linux 2.0.37.
\layout Itemize


\color black
Screw it and say that IP fragmentation and out of order IP packets/TCP segments
 are not supported.
 Not sure if this will meet the needs of potential users.
\layout Subsection


\color black
Blocking
\layout Standard


\color black
As stated earlier, one of the main goals of this project is to keep things
 single threaded to make coding plug-ins easier.
 One caveat of that is that any function which blocks will cause serious
 problems.
\layout Standard


\color black
There are three major cases where blocking is likely to occur:
\layout Enumerate


\color black
Opening a socket
\layout Enumerate


\color black
Reading from a socket
\layout Enumerate


\color black
Writing to a socket
\layout Standard


\color black
Reading from sockets in a non-blocking manner is easy to solve using
 poll() or select().
 Writing to a socket, or merely opening a TCP socket via connect(), however
 requires a different method:
\layout Quotation


\color black
It is possible to do non-blocking IO on sockets by setting the O_NONBLOCK
 flag on a socket file descriptor using fcntl(2).
 Then all operations that would block will (usually) return with EAGAIN
 (operation should be retried later); connect(2) will return EINPROGRESS
 error.
 The user can then wait for various events via poll(2) or select(2).
\begin_inset Foot
collapsed true

\layout Standard


\color black
socket(7)
\end_inset 


\layout Standard


\color black
If connect() returns EINPROGRESS, then we'll just have to do something like
 this:
\layout LyX-Code


\color black
int e; socklen_t len = sizeof(e);
\layout LyX-Code


\color black
if (getsockopt(conn->s, SOL_SOCKET, SO_ERROR, &e, &len) < 0) { 
\layout LyX-Code


\color black
    /* not yet */
\layout LyX-Code


\color black
    if(errno != EINPROGRESS){  /* yuck.
 kill it.
 */ 
\layout LyX-Code


\color black
       log_fn(LOG_DEBUG,"in-progress connect failed.
 Removing."); 
\layout LyX-Code


\color black
       return -1; 
\layout LyX-Code


\color black
    } else { 
\layout LyX-Code


\color black
       return 0; /* no change, see if next time is better */ 
\layout LyX-Code


\color black
    } 
\layout LyX-Code


\color black
} 
\layout LyX-Code


\color black
/* the connect has finished.
 */ 
\layout Quote


\color black
Note: It may not be totally right, but it works OK.
 That chunk of code gets called after poll returns the socket as writable.
 If poll returns it as readable, then it's probably because of EOF, meaning
 the connect failed.
 You must poll for both.
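\layout Standard


\color black
For comparison, here is a minimal sketch of the same idea that checks the value
 reported through SO_ERROR, not only the return code of getsockopt().
 The function names are hypothetical and this is not flowreplay code; it assumes
 a POSIX system with fcntl(2), poll(2) and getsockopt(2).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Put a descriptor into non-blocking mode; 0 on success, -1 on error. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Start a connect on a non-blocking socket.
 * Returns 0 if connected at once, 1 if in progress, -1 on failure. */
static int start_connect(int fd, const struct sockaddr *sa, socklen_t len)
{
    if (connect(fd, sa, len) == 0)
        return 0;
    return (errno == EINPROGRESS) ? 1 : -1;
}

/* Call only after poll() reported the socket readable or writable.
 * Returns 0 if the connect succeeded, -1 if it failed (errno is set
 * to the pending socket error). */
static int finish_connect(int fd)
{
    int err = 0;
    socklen_t len = sizeof(err);
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        return -1;
    if (err != 0) {
        errno = err;
        return -1;
    }
    return 0;
}

/* Poll for both readable and writable, as the note above suggests.
 * Returns 0 if connected, 1 if still pending, -1 on failure. */
static int wait_connect(int fd, int timeout_ms)
{
    assert(timeout_ms >= 0);
    struct pollfd p = { .fd = fd, .events = POLLIN | POLLOUT };
    int n = poll(&p, 1, timeout_ms);
    if (n < 0)
        return -1;
    if (n == 0)
        return 1;
    return finish_connect(fd);
}
```

\layout Standard


\color black
In a single threaded design wait_connect() would be called with a timeout of
 zero from the main poll() loop, so that one slow connect never stalls the
 other flows.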
\layout Section


\color black
pcap vs flow File Format
\layout Standard


\color black
As stated before, the pcap file format really isn't well suited for flowreplay
 because it uses the raw packet as a container for data.
 Flowreplay, however, isn't interested in packets; it's interested in data
 streams
\begin_inset Foot
collapsed true

\layout Standard


\color black
A 
\begin_inset Quotes eld
\end_inset 

data stream
\begin_inset Quotes erd
\end_inset 

 as I call it is a simplex communication from the client or server which
 is a complete query, response or message.
\end_inset 

 which may span one or more TCP/UDP segments, each carried in an IP datagram
 which may itself be split into multiple IP fragments.
 Handling all this additional complexity requires a full TCP/IP stack in
 user space which would have additional feature requirements specific to
 flowreplay.
\layout Standard


\color black
Rather than trying to do that, I've decided to create a pcap preprocessor
 for flowreplay called flowprep.
 Flowprep will handle all the TCP/IP defragmentation/reassembly and write
 out a file containing the data streams for each flow.
\layout Standard


\color black
A flow file will contain three sections:
\layout Enumerate


\color black
A header which identifies this as a flowprep file and the file version
\layout Enumerate


\color black
An index of all the flows contained in the file
\layout Enumerate


\color black
The data streams themselves
\layout Standard
\align center 

\color black

\begin_inset Graphics
	filename flowheader.eps

\end_inset 


\layout Standard


\color black
At startup, the file header is validated and the data stream indexes are
 loaded into memory.
 Then the first data stream header from each flow is read.
 Then each flow and subsequent data stream is processed based upon the
 timestamps and plug-ins.
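\layout Standard


\color black
To make the three sections concrete, here is one way the header and index could
 be declared.
 Everything in this sketch is an assumption made for illustration: the magic
 string, the version number, the field names and their sizes are invented
 here and are not taken from the flowprep format shown in the figure.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FLOWPREP_MAGIC   "FLOWPREP"  /* assumed 8-byte signature */
#define FLOWPREP_VERSION 1           /* assumed file version     */

/* Section 1: identifies the file and its version (hypothetical). */
struct flow_file_header {
    char     magic[8];
    uint32_t version;
    uint32_t flow_count;      /* number of entries in the index */
};

/* Section 2: one index entry per flow (hypothetical). */
struct flow_index_entry {
    uint64_t first_stream_offset;  /* file offset of the flow's
                                      first data stream         */
};

/* Returns 1 if the header carries the expected magic and version. */
static int header_is_valid(const struct flow_file_header *h)
{
    assert(h != NULL);
    if (memcmp(h->magic, FLOWPREP_MAGIC, sizeof(h->magic)) != 0)
        return 0;
    return h->version == FLOWPREP_VERSION;
}
```

\layout Standard


\color black
A real on-disk format would also have to pin down byte order and padding, which
 this sketch ignores.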
\layout Section


\color black
Plug-ins
\layout Standard


\color black
Plug-ins will provide the 
\begin_inset Quotes eld
\end_inset 

intelligence
\begin_inset Quotes erd
\end_inset 

 in flowreplay.
 Flowreplay is designed to be a mere framework for connecting captured flows
 in a flow file with socket file handles.
 How data is processed and what should be done with it will be done via
 plug-ins.
\layout Standard


\color black
Plug-ins will allow proper handling of a variety of protocols while hopefully
 keeping things simple.
 Another part of the consideration will be making it easy for others to
 contribute to flowreplay.
 I don't want to have to write all the protocol logic myself.
\layout Subsection


\color black
Plug-in Basics
\layout Standard


\color black
Each plug-in provides the logic for handling one or more services.
 The main purpose of a plug-in is to decide when flowreplay should send
 data via one or more sockets.
 The plug-in can use any 
\emph on 
non-blocking
\emph default 
 method of determining if it is appropriate to send data or wait for data
 to be received.
 If necessary, a plug-in can also modify the data sent.
\layout Standard


\color black
Each time poll() returns, flowreplay calls the plug-ins for the flows which
 either have data waiting or in the case of a timeout, those flows which
 timed out.
 Afterwards, all the flows are processed and poll() is called on those flows
 which have their state set to POLL.
 And the process repeats until there are no more nodes in the tree.
\layout Subsection


\color black
The Default Plug-in
\layout Standard


\color black
Initially, flowreplay will ship with one basic plug-in called 
\begin_inset Quotes eld
\end_inset 

default
\begin_inset Quotes erd
\end_inset 

.
 Any flow which doesn't have a specific plug-in defined will use default.
 The goal of the default plug-in is to work 
\begin_inset Quotes eld
\end_inset 

well enough
\begin_inset Quotes erd
\end_inset 

 for a majority of single-flow protocols such as SMTP, HTTP, and Telnet.
 Protocols which use encryption (SSL, SSH, etc) or multiple flows (FTP,
 RPC, etc) will never work with the default plug-in.
 Furthermore, the default plug-in will only support connections 
\emph on 
to
\emph default 
 a server; it will not support accepting connections from clients.
\layout Standard


\color black
The default plug-in will provide no data level manipulation and only a simple
 method for detecting when it is time to send data to the server.
 Detecting when to send data will be done by a 
\begin_inset Quotes eld
\end_inset 

no more data
\begin_inset Quotes erd
\end_inset 

 timeout value.
 Basically, by using the pcap file as a means to determine the order of
 the exchange, any time it is the server's turn to send data, flowreplay will
 wait for the first byte of data and then start the 
\begin_inset Quotes eld
\end_inset 

no more data
\begin_inset Quotes erd
\end_inset 

 timer.
 Every time more data is received, the timer is reset.
 If the timer reaches zero, then flowreplay sends the next portion of the
 client side of the connection.
 This is repeated until the flow has been completely replayed or a 
\begin_inset Quotes eld
\end_inset 

server hung
\begin_inset Quotes erd
\end_inset 

 timeout is reached.
 The server hung timeout is used to detect a server which crashed and never
 starts sending any data which would start the 
\begin_inset Quotes eld
\end_inset 

no more data
\begin_inset Quotes erd
\end_inset 

 timer.
\layout Standard


\color black
Both the 
\begin_inset Quotes eld
\end_inset 

no more data
\begin_inset Quotes erd
\end_inset 

 and 
\begin_inset Quotes eld
\end_inset 

server hung
\begin_inset Quotes erd
\end_inset 

 timers will be user defined values and global to all flows using the default
 plug-in.
\layout Subsection


\color black
Plug-in Details
\layout Standard


\color black
Each plug-in will be comprised of the following:
\layout Enumerate


\color black
An optional global data structure, for inter-flow communication
\layout Enumerate


\color black
Per-flow data structure, for tracking flow state information
\layout Enumerate


\color black
A list of functions which flowreplay will call when certain well-defined
 conditions are met.
\begin_deeper 
\layout Itemize


\color black
Required functions:
\begin_deeper 
\layout Itemize


\color black
initialize_node() - called when a node in the tree is created using this plug-in
\layout Itemize


\color black
post_poll_timeout() - called when the poll() returned due to a timeout for
 this node
\layout Itemize


\color black
post_poll_read() - called when the poll() returned due to the socket being
 ready
\layout Itemize


\color black
buffer_full() - called when the packet buffer for this flow is full
\layout Itemize


\color black
delete_node() - called just prior to the node being free()'d
\end_deeper 
\layout Itemize


\color black
Optional functions:
\begin_deeper 
\layout Itemize


\color black
pre_send_data() - called before data is sent
\layout Itemize


\color black
post_send_data() - called after data is sent
\layout Itemize


\color black
pre_poll() - called prior to poll()
\layout Itemize


\color black
post_poll_default() - called when poll() returns and neither the socket
 was ready nor the node timed out 
\layout Itemize


\color black
open_socket() - called after the socket is opened
\layout Itemize


\color black
close_socket() - called after the socket is closed
\end_deeper 
\end_deeper 
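\layout Standard


\color black
One way to express this list in C is a table of function pointers.
 The callback names come from the list above, but the struct name, the signatures
 (every callback taking a node pointer and returning int) and the helper
 functions are assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

struct node;  /* opaque per-flow node; its layout is not specified here */

typedef int (*plugin_cb)(struct node *);

struct plugin_ops {
    /* required */
    plugin_cb initialize_node;
    plugin_cb post_poll_timeout;
    plugin_cb post_poll_read;
    plugin_cb buffer_full;
    plugin_cb delete_node;
    /* optional, may be NULL */
    plugin_cb pre_send_data;
    plugin_cb post_send_data;
    plugin_cb pre_poll;
    plugin_cb post_poll_default;
    plugin_cb open_socket;
    plugin_cb close_socket;
};

/* Example stub a plug-in could use for a callback that does nothing. */
static int noop_cb(struct node *n)
{
    (void)n;
    return 0;
}

/* Returns 1 if every required callback is present. */
static int plugin_is_complete(const struct plugin_ops *ops)
{
    assert(ops != NULL);
    return ops->initialize_node && ops->post_poll_timeout &&
           ops->post_poll_read && ops->buffer_full && ops->delete_node;
}

/* Call an optional callback only if the plug-in provides it. */
static int call_optional(plugin_cb cb, struct node *n)
{
    return cb ? cb(n) : 0;
}
```

\layout Standard


\color black
Checking plugin_is_complete() once at load time keeps NULL checks out of the
 main loop for the required callbacks.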
\layout LyX-Code

\layout LyX-Code

\the_end