Flowreplay Design Notes
Aaron Turner
http://synfin.net/
Last Edited: October 23, 2003

Overview
Tcpreplay (http://tcpreplay.sourceforge.net/) was designed to replay
traffic previously captured in the pcap format back onto the wire for
testing NIDS and other passive devices. Over time, it was enhanced to
be able to test in-line network devices. However, a recurring feature
request for tcpreplay is to connect to a server in order to test
applications and host TCP/IP stacks. It was determined early on that
adding this feature to tcpreplay was far too complex, so I decided to
create a new tool specifically designed for this.

Flowreplay is designed to replay traffic at Layer 4 or 7, depending on
the protocol, rather than at Layer 2 like tcpreplay does. This allows
flowreplay to connect to one or more servers using a pcap savefile as
the basis of the connections. Hence, flowreplay allows the testing of
applications running on real servers rather than passive devices.

Features

Requirements

- Full TCP/IP support, including IP fragments and TCP stream reassembly.
- Support replaying TCP and UDP flows.
- Code should handle each flow/service independently.
- Should be able to connect to the server(s) in the pcap file or to a
  user-specified IP address.
- Support a plug-in architecture to allow adding application layer
  intelligence.
- Plug-ins must be able to support multi-flow protocols like FTP.
- Ship with a default plug-in which will work "well enough" for simple
  single-flow protocols like HTTP and telnet.
- Flows being replayed "correctly" is more important than performance
  (Mbps).
- Portable to run on common flavors of Unix and Unix-like systems.

Wishes

- Support clients connecting to flowreplay on a limited basis.
  Flowreplay would replay the server side of the connection.
- Support other IP-based traffic (ICMP, VRRP, OSPF, etc.) via plug-ins.
- Support non-IP traffic (ARP, STP, CDP, etc.) via plug-ins.
- Limit which flows are replayed using user-defined filters (bpf filter
  syntax?).
- Process pcap files directly with no intermediary file conversions.
- Should be able to scale to pcap files in the hundreds of MB in size
  and 100+ simultaneous flows on a P3 500MHz w/ 256MB of RAM.

Design Thoughts

Sending and Receiving traffic

Flowreplay must be able to process multiple connections to one or more
devices. There are two options:

- Use sockets (socket(2)) to send and receive data
- Use libpcap (http://www.tcpdump.org/) to receive packets and libnet
  (http://www.packetfactory.net/projects/libnet/) to send packets

Although using libpcap/libnet would allow more simultaneous connections
and greater flexibility, there would be a very high complexity cost
associated with it. With that in mind, I've decided to use sockets to
send and receive data.

Handling Multiple Connections

Because a pcap file can contain multiple simultaneous flows, we need to
be able to support that too. The biggest problem with this is reading
packet data in a different order than it is stored in the pcap file.

Reading and writing to multiple sockets is easy with select() or
poll(); however, a pcap file has its data stored serially, but we need
to access it randomly. There are a number of possible solutions for
this, such as caching packets in RAM where they can be accessed more
randomly, creating an index of the packets in the pcap file, or
converting the pcap file to another format altogether. Alternatively,
I've started looking at libpcapnav (http://netdude.sourceforge.net/) as
an alternate means to navigate a pcap file and process packets out of
order.

Data Synchronization

Knowing when to start sending client traffic in response to the server
will be "tricky". Without understanding the actual protocol involved,
probably the best general solution is waiting for a given period of
time after no more data from the server has been received. Not sure
what to do if the client traffic doesn't elicit a response from the
server (implement some kind of timeout?). This will be the basis for
the default plug-in.

TCP/IP

Dealing with IP fragmentation and TCP stream reassembly will be another
really complex problem. We're basically talking about implementing a
significant portion of a TCP/IP stack. One thought is to use libnids
(http://www.avet.com.pl/~nergal/libnids/), which basically implements a
Linux 2.0.37 TCP/IP stack in user space. Other solutions include
porting a TCP/IP stack from Open/Net/FreeBSD or writing our own custom
stack from scratch.

Multiple Independent Flows

The biggest asynchronous problem, that pcap files are serial, has to be
solved in a scalable manner. Not much can be assumed about the network
traffic contained in a pcap savefile other than that Murphy's Law will
be in effect. This means we'll have to deal with:

- Thousands of small simultaneous flows (captured on a busy network)
- Flows which "hang" mid-stream (an exploit against a server causes it
  to crash)
- Flows which contain large quantities of data (FTP transfers of ISOs,
  for example)

How we implement parallel processing of the pcap savefile will
dramatically affect how well we can scale. A few considerations:

- Most Unix systems limit the maximum number of open file descriptors a
  single process can have. Generally speaking this shouldn't be a
  problem except for highly parallel pcaps.
- While RAM isn't limitless, we can use mmap() to get around this.
- Many Unix systems have enhanced alternatives to poll() which will
  improve flow management.

Unix systems implement a maximum limit on the number of file
descriptors a single process can open. My Linux box, for example, craps
out at 1021 (it's really 1024, but 3 are reserved for STDIN, STDOUT,
and STDERR), which seems to be pretty standard for recent Unixes. This
means we're limited to at most 1020 simultaneous flows if the pcap
savefile is opened once, and half that (510 flows) if the savefile is
re-opened for each flow. It appears that most Unix-like OS's allow root
to increase the "hard limit" beyond 1024; compiling a list of methods
to do this for common OS's should be added to the flowreplay
documentation.
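
For reference, a minimal sketch of checking and (attempting to) raise
the per-process descriptor limit with getrlimit()/setrlimit(); the
function name raise_fd_limit() is illustrative, and raising the hard
limit itself generally requires root:

    #include <stdio.h>
    #include <sys/resource.h>

    /* Sketch: query and (attempt to) raise the per-process fd limit.
     * Raising rlim_cur up to rlim_max works unprivileged; raising
     * rlim_max itself generally requires root. */
    int raise_fd_limit(rlim_t wanted)
    {
        struct rlimit rl;

        if (getrlimit(RLIMIT_NOFILE, &rl) < 0) {
            perror("getrlimit");
            return -1;
        }
        printf("soft limit: %lu, hard limit: %lu\n",
               (unsigned long)rl.rlim_cur, (unsigned long)rl.rlim_max);

        if (wanted > rl.rlim_max)
            rl.rlim_max = wanted;    /* needs root on most systems */
        rl.rlim_cur = wanted;

        if (setrlimit(RLIMIT_NOFILE, &rl) < 0) {
            perror("setrlimit");
            return -1;
        }
        return 0;
    }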

RAM isn't limitless. Caching packets in memory may cause problems when
one or more flows with a lot of data "hang" and their packets have to
be cached so that other flows can be processed. If you work with large
pcaps containing malicious traffic (say, packet captures from DefCon),
this sort of thing may be a real problem. Dealing with this situation
would require complicated buffer limits and error handling.

Jumping around in the pcap file via fgetpos() and fsetpos() is probably
the most disk-I/O-intensive solution and may affect performance.
However, on systems with enough free memory, one would hope the system
disk cache will provide a dramatic speedup. The "bookmarks" used by
fgetpos()/fsetpos() are just 64-bit integers, which are relatively
space efficient compared to other solutions.
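
Purely as an illustration of the bookmark idea (not the final design),
one could save and restore positions on the stdio stream behind an open
savefile via pcap_file(); whether seeking that stream behind libpcap's
back is safe in all cases is an assumption that would need testing:

    #include <stdio.h>
    #include <pcap.h>

    /* Illustrative only: save the current offset in an open savefile
     * as a "bookmark" and later jump back to it to read that flow's
     * next packet.  pcap_file() returns the underlying stdio stream. */
    struct flow_bookmark {
        fpos_t pos;     /* position saved by fgetpos() */
    };

    static int bookmark_save(pcap_t *p, struct flow_bookmark *bm)
    {
        FILE *fp = pcap_file(p);
        return fp ? fgetpos(fp, &bm->pos) : -1;
    }

    static int bookmark_restore(pcap_t *p, const struct flow_bookmark *bm)
    {
        FILE *fp = pcap_file(p);
        return fp ? fsetpos(fp, &bm->pos) : -1;
    }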

The other typical asynchronous issue is dealing with multiple sockets,
which we will solve via poll() (poll(2)). Each flow will define a
struct pollfd and the amount of time in ms until it times out. Prior to
calling poll(), we walk the list of flows, create the array of pollfds,
and determine the flow(s) with the smallest timeout. A list of these
flows is saved for when poll() returns. Finally, the current time is
tucked away and the timeout and array of pollfds are passed to poll().

When poll() returns, the sockets that returned ready have their plug-in
called. If no sockets are ready, then the flows saved prior to calling
poll() (those with the smallest timeout) are processed. Once those
flows are processed, the remaining flows have their timeout decremented
by the difference between the current time and when poll() was last
called, and we start again.
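
A rough sketch of that loop, assuming a hypothetical struct flow with a
per-flow timeout and plug-in callbacks (none of these names are final):

    #include <poll.h>
    #include <sys/time.h>

    /* Hypothetical per-flow state; field names are illustrative. */
    struct flow {
        int fd;
        int timeout_ms;     /* remaining time before this flow expires */
        void (*plugin_read)(struct flow *);
        void (*plugin_timeout)(struct flow *);
    };

    /* One iteration of the loop described above. */
    void flow_poll_once(struct flow *flows, int nflows)
    {
        struct pollfd pfds[nflows];
        int i, ready, min_timeout = -1;
        struct timeval before, after;

        /* build the pollfd array and find the smallest timeout */
        for (i = 0; i < nflows; i++) {
            pfds[i].fd = flows[i].fd;
            pfds[i].events = POLLIN;
            if (min_timeout < 0 || flows[i].timeout_ms < min_timeout)
                min_timeout = flows[i].timeout_ms;
        }

        gettimeofday(&before, NULL);
        ready = poll(pfds, nflows, min_timeout);
        gettimeofday(&after, NULL);

        int elapsed = (after.tv_sec - before.tv_sec) * 1000 +
                      (after.tv_usec - before.tv_usec) / 1000;

        for (i = 0; i < nflows; i++) {
            if (ready > 0 && (pfds[i].revents & POLLIN)) {
                flows[i].plugin_read(&flows[i]);     /* socket is ready */
            } else if (ready == 0 && flows[i].timeout_ms <= min_timeout) {
                flows[i].plugin_timeout(&flows[i]);  /* this flow timed out */
            } else {
                flows[i].timeout_ms -= elapsed;      /* still waiting */
            }
        }
    }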

IP Fragments and TCP Streams

There are five major complications with flowreplay:

- The IP datagrams may be fragmented, so we won't be able to use the
  standard 5-tuple (src/dst IP, src/dst port, protocol) to look up
  which flow a packet belongs to (see the sketch after this list).
- IP fragments may arrive out of order, which will complicate ordering
  of data to be sent.
- The TCP segments may arrive out of order, which will complicate
  ordering of data to be sent.
- Packets may be missing in the pcap file because they were dropped
  during capture.
- There are tools like fragrouter which intentionally create
  non-deterministic situations.
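
For context, a sketch of the 5-tuple key such a lookup would use when
fragmentation isn't involved; the struct and helper are illustrative
only (and assume the key is zeroed before filling so memcmp() is safe
despite padding):

    #include <stdint.h>
    #include <string.h>
    #include <netinet/in.h>

    /* Illustrative 5-tuple key for the flow lookup mentioned above.
     * With IP fragments, only the first fragment carries the ports,
     * so this key alone is not enough.  memset() the struct to zero
     * before filling it so padding bytes compare equal. */
    struct flow_key {
        struct in_addr src_ip;
        struct in_addr dst_ip;
        uint16_t       src_port;
        uint16_t       dst_port;
        uint8_t        proto;     /* IPPROTO_TCP or IPPROTO_UDP */
    };

    static int flow_key_equal(const struct flow_key *a,
                              const struct flow_key *b)
    {
        return memcmp(a, b, sizeof(*a)) == 0;
    }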

First off, I've decided that I'm not going to worry about fragrouter or
its cousins. I'll handle non-deterministic situations one and only one
way, so that the way flowreplay handles the traffic will be
deterministic. Perhaps I'll make it easy for others to write a plug-in
which will change that, but it's not something I'm going to concern
myself with now.

Missing packets in the pcap file will probably make that flow
unplayable. There are probably certain situations where we can make an
educated guess, but this is far too complex to worry about for the
first stable release.

That still leaves creating a basic TCP/IP stack in user space. The good
news is that there is already a library which does this called libnids.
As of version 1.17, libnids can process packets from a pcap savefile
(it's not documented in the man page, but the code is there). A
potential problem with libnids, though, is that it has to maintain its
own state/cache system. This not only means additional overhead, but
jumping around in the pcap file as I'm planning on doing to handle
multiple simultaneous flows is likely to really confuse libnids' state
engine. Also, libnids is licensed under the GPL, but I want flowreplay
released under a BSD-like license; I need to research whether the two
are compatible in this way.

Possible solutions:

- Develop a custom wedge between the capture file and libnids which
  will cause each packet to only be processed a single time.
- Use libnids to process the pcap file into a new flow-based format,
  effectively putting the TCP/IP stack into a dedicated utility.
- Develop a custom user-space TCP/IP stack, perhaps based on a BSD
  TCP/IP stack, much like libnids is based on Linux 2.0.37.
- Screw it and say that IP fragmentation and out-of-order IP
  packets/TCP segments are not supported. Not sure if this will meet
  the needs of potential users.

Blocking

As stated earlier, one of the main goals of this project is to keep
things single-threaded to make coding plug-ins easier. One caveat of
that is that any function which blocks will cause serious problems.
There are three major cases where blocking is likely to occur:

- Opening a socket
- Reading from a socket
- Writing to a socket

Reading from sockets in a non-blocking manner is easy to solve using
poll() or select(). Writing to a socket, or merely opening a TCP socket
via connect(), however, requires a different method. From socket(7):

    It is possible to do non-blocking IO on sockets by setting the
    O_NONBLOCK flag on a socket file descriptor using fcntl(2). Then
    all operations that would block will (usually) return with EAGAIN
    (operation should be retried later); connect(2) will return
    EINPROGRESS error. The user can then wait for various events via
    poll(2) or select(2).

If connect() returns EINPROGRESS, then we'll just have to do something
like this:
    int e = 0;
    socklen_t len = sizeof(e);

    if (getsockopt(conn->s, SOL_SOCKET, SO_ERROR, &e, &len) < 0) {
        /* not yet */
        if (errno != EINPROGRESS) {
            /* yuck. kill it. */
            log_fn(LOG_DEBUG, "in-progress connect failed. Removing.");
            return -1;
        } else {
            return 0;  /* no change, see if next time is better */
        }
    }
    if (e) {
        /* getsockopt() succeeded but the connect itself failed */
        log_fn(LOG_DEBUG, "connect failed: %s. Removing.", strerror(e));
        return -1;
    }
    /* the connect has finished. */

Note: it may not be totally right, but it works OK. (That chunk of code
gets called after poll() returns the socket as writable; if poll()
returns it as readable, then it's probably because of EOF and the
connect failed. You must poll for both.)
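
For completeness, a hedged sketch of the fcntl()/connect() sequence
that would come before the check above; start_connect() and its error
handling are illustrative, not flowreplay's actual code:

    #include <errno.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    /* Open a TCP socket and start a non-blocking connect.  Returns the
     * fd; the connect completes later and is checked with getsockopt()
     * as shown above once poll() reports the fd writable. */
    int start_connect(const struct sockaddr_in *sin)
    {
        int s = socket(AF_INET, SOCK_STREAM, 0);
        if (s < 0)
            return -1;

        /* switch to non-blocking mode before connect() */
        int flags = fcntl(s, F_GETFL, 0);
        if (flags < 0 || fcntl(s, F_SETFL, flags | O_NONBLOCK) < 0) {
            close(s);
            return -1;
        }

        if (connect(s, (const struct sockaddr *)sin, sizeof(*sin)) < 0 &&
            errno != EINPROGRESS) {
            close(s);       /* immediate failure */
            return -1;
        }
        return s;           /* in progress (or already connected) */
    }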

pcap vs flow File Format

As stated before, the pcap file format really isn't well suited for
flowreplay because it uses the raw packet as a container for data.
Flowreplay, however, isn't interested in packets; it's interested in
data streams (a "data stream", as I call it, is a simplex communication
from the client or server which is a complete query, response, or
message) which may span one or more TCP/UDP segments, each comprised of
an IP datagram which may itself be comprised of multiple IP fragments.
Handling all this additional complexity requires a full TCP/IP stack in
user space which would have additional feature requirements specific to
flowreplay.

Rather than trying to do that, I've decided to create a pcap
preprocessor for flowreplay called flowprep. Flowprep will handle all
the TCP/IP defragmentation/reassembly and write out a file containing
the data streams for each flow.

A flow file will contain three sections:

- A header which identifies this as a flowprep file and gives the file
  version
- An index of all the flows contained in the file
- The data streams themselves

[Graphics file: flowheader.eps]

At startup, the file header is validated and the data stream indexes
are loaded into memory. Then the first data stream header from each
flow is read, and each flow and its subsequent data streams are
processed based upon the timestamps and plug-ins.
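
Since the on-disk format hasn't been nailed down, here is one possible
shape for the header and index entries; every field name, width, and
the magic value below are assumptions:

    #include <stdint.h>

    /* Hypothetical flow-file layout; nothing here is final. */
    #define FLOWPREP_MAGIC  0x464c4f57      /* "FLOW" */

    struct flowfile_header {
        uint32_t magic;         /* identifies a flowprep file */
        uint16_t version;       /* file format version */
        uint32_t flow_count;    /* number of index entries that follow */
    };

    struct flowfile_index_entry {
        uint32_t flow_id;
        uint32_t src_ip, dst_ip;        /* original 5-tuple */
        uint16_t src_port, dst_port;
        uint8_t  proto;
        uint64_t first_stream_offset;   /* offset of first data stream */
        uint64_t stream_count;
    };

    /* Each data stream would then carry its own small header, e.g. a
     * direction flag (client->server or server->client), a timestamp,
     * and the byte length of the stream that follows. */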

Plug-ins

Plug-ins will provide the "intelligence" in flowreplay. Flowreplay is
designed to be a mere framework for connecting captured flows in a flow
file with socket file handles. How data is processed and what should be
done with it is left to the plug-ins.

Plug-ins will allow proper handling of a variety of protocols while
hopefully keeping things simple. Another consideration is making it
easy for others to contribute to flowreplay; I don't want to have to
write all the protocol logic myself.

Plug-in Basics

Each plug-in provides the logic for handling one or more services. The
main purpose of a plug-in is to decide when flowreplay should send data
via one or more sockets. The plug-in can use any non-blocking method of
determining whether it is appropriate to send data or wait for data to
be received. If necessary, a plug-in can also modify the data sent.

Each time poll() returns, flowreplay calls the plug-ins for the flows
which either have data waiting or, in the case of a timeout, those
flows which timed out. Afterwards, all the flows are processed and
poll() is called on those flows which have their state set to POLL. The
process repeats until there are no more nodes in the tree.

The Default Plug-in

Initially, flowreplay will ship with one basic plug-in called
"default". Any flow which doesn't have a specific plug-in defined will
use default. The goal of the default plug-in is to work "good enough"
for a majority of single-flow protocols such as SMTP, HTTP, and Telnet.
Protocols which use encryption (SSL, SSH, etc.) or multiple flows (FTP,
RPC, etc.) will never work with the default plug-in. Furthermore, the
default plug-in will only support connections to a server; it will not
support accepting connections from clients.

The default plug-in will provide no data-level manipulation and only a
simple method for detecting when it is time to send data to the server.
Detecting when to send data will be done via a "no more data" timeout
value. Basically, using the pcap file as a means to determine the order
of the exchange, any time it is the server's turn to send data,
flowreplay will wait for the first byte of data and then start the "no
more data" timer. Every time more data is received, the timer is reset.
If the timer reaches zero, then flowreplay sends the next portion of
the client side of the connection. This is repeated until the flow has
been completely replayed or a "server hung" timeout is reached. The
server hung timeout is used to detect a server which crashed and never
starts sending the data which would start the "no more data" timer.

Both the "no more data" and "server hung" timers will be user-defined
values and global to all flows using the default plug-in.
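
A small sketch of how that per-flow bookkeeping might look; the state
fields, return convention, and function names are placeholders, not the
final plug-in API:

    /* Hypothetical per-flow timer state for the default plug-in. */
    struct default_state {
        int no_more_data_ms;  /* reset every time server data arrives */
        int server_hung_ms;   /* runs while waiting for the first byte */
        int got_first_byte;
    };

    /* Called when poll() says the server socket is readable. */
    void default_on_data(struct default_state *st, int global_no_more_data_ms)
    {
        st->got_first_byte = 1;
        st->no_more_data_ms = global_no_more_data_ms;  /* restart timer */
    }

    /* Called when this flow's poll() timeout expires.  Returns 1 when
     * it is the client's turn to send the next chunk, -1 if the server
     * appears to have hung, 0 to keep waiting. */
    int default_on_timeout(struct default_state *st, int elapsed_ms)
    {
        if (!st->got_first_byte) {
            st->server_hung_ms -= elapsed_ms;
            return (st->server_hung_ms <= 0) ? -1 : 0;
        }
        st->no_more_data_ms -= elapsed_ms;
        return (st->no_more_data_ms <= 0) ? 1 : 0;
    }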

Plug-in Details

Each plug-in will be comprised of the following (see the sketch after
this list):

- An optional global data structure, for intra-flow communication
- A per-flow data structure, for tracking flow state information
- A list of functions which flowreplay will call when certain
  well-defined conditions are met

Required functions:

- initialize_node() - called when a node in the tree is created using
  this plug-in
- post_poll_timeout() - called when poll() returned due to a timeout
  for this node
- post_poll_read() - called when poll() returned due to the socket
  being ready
- buffer_full() - called when the packet buffer for this flow is full
- delete_node() - called just prior to the node being free()'d

Optional functions:

- pre_send_data() - called before data is sent
- post_send_data() - called after data is sent
- pre_poll() - called prior to poll()
- post_poll_default() - called when poll() returns and neither the
  socket was ready nor the node timed out
- open_socket() - called after the socket is opened
- close_socket() - called after the socket is closed
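
One way the required and optional hooks could be wired up is a table of
function pointers per plug-in, with unused optional entries left NULL;
everything below is an assumption about the eventual API:

    /* Hypothetical plug-in registration table; optional hooks may be NULL. */
    struct flow_node;       /* per-flow state kept by flowreplay */

    struct flowreplay_plugin {
        const char *name;                       /* e.g. "default" */

        /* required */
        int  (*initialize_node)(struct flow_node *);
        int  (*post_poll_timeout)(struct flow_node *);
        int  (*post_poll_read)(struct flow_node *);
        int  (*buffer_full)(struct flow_node *);
        void (*delete_node)(struct flow_node *);

        /* optional */
        int  (*pre_send_data)(struct flow_node *, void *buf, int len);
        int  (*post_send_data)(struct flow_node *, int sent);
        int  (*pre_poll)(struct flow_node *);
        int  (*post_poll_default)(struct flow_node *);
        int  (*open_socket)(struct flow_node *);
        int  (*close_socket)(struct flow_node *);

        void *global_data;  /* optional shared state for intra-flow use */
    };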