"I'm the fiber king, Dave. I'm the fiber king."
- Horatio Caine
Every so often I get an idea in my head that nags at me constantly until I try it out. This idea is a questionable but nonetheless interesting experiment that I was compelled to try over one weekend. Unless you're doing something like writing a DBMS using fibers, don't do this. This was an intellectual exercise. Caveat emptor.
I've been working on quite a bit of server-side Windows code as of late. One technique to maximize scalability on Windows NT systems is to use asynchronous I/O. While working on my I/O implementation, an idea crossed my mind: what if fibers were integrated into a thread pool? What would that look like? How would such a thread pool behave? Is it even a good idea?
Why Fibers?
Remember that a fiber is a user-mode construct that provides a safe mechanism for cooperative context switching. Though fibers can be used to implement user-level threads (such as those found in many UNIX environments), they can (in theory) have a much wider range of applications. Fibers run on top of OS threads, and since fibers are a user-mode concept the NT kernel doesn't recognize them. Fiber context switches are cheaper to execute than thread context switches and if used wisely they can maximize the utilization of the OS threads that they are running on top of. Fibers are sometimes used in databases to maximize performance; Microsoft SQL Server and Sybase SQL Anywhere both have fiber modes.
I was thinking that fibers could be used as a means to implement coroutines. I was interested in coroutines because they could be effective in eliminating the need to code state machines inside I/O completion callbacks. Instead a coroutine's context maintains the required state. As a corollary to this property, coroutines could potentially allow code that expects synchronous I/O to be integrated into an asynchronous model. Let us examine this latter idea in more detail.
Suppose I have a parser that can read in data from an arbitrary source. The parser accepts a pointer to a callback function that copies the data in from the source to a buffer. The parser is expecting the callback to be synchronous, i.e. when that callback returns, it is assumed to have completed the read request. Obviously this doesn't mesh well with asynchronous I/O, where we initiate a request and check back on it later. Using coroutines we could suspend the parser callback function, go and do something else, and then resume the callback once the I/O request has completed. This would allow us to leverage existing code that uses a synchronous I/O model without having to modify it too much (at least in theory).
The Fibrous Thread Pool
What I implemented was a thread pool that schedules fibers. I started off with a bunch of threads that wait on an I/O completion port (see this blog post by Larry Osterman for a rough idea of how this part would work). The difference is that each worker thread has been converted to a fiber. This "main fiber" is responsible for scheduling the "worker fibers." The fibers are scheduled via the pool's I/O completion port. Each completion packet that is queued to the port includes a pointer to a fiber that has been initialized to execute a work item. The main fiber then switches to the worker fiber to run it. Since fibers are cooperative, the thread will continue to execute the worker fiber until it explicitly switches back to the main fiber. This is accomplished via a suspend function that the pool provides that allows worker fibers to switch back to the main fiber for that thread. Each thread has its own dedicated main fiber, but the worker fibers do not have any particular thread affinity; they are scheduled and executed by whichever thread retrieves the completion packet.
The I/O completion port is also leveraged for asynchronous I/O purposes. If a worker fiber wants to initiate asynchronous I/O, it can bind itself to the completion port. Once the fiber initiates an asynchronous I/O request, it can suspend itself. When the I/O request completes, the fiber will be posted back to the completion port and will be scheduled as soon as a thread dequeues its packet.
Of course there are several minutiae that I have not gone into here. There are numerous conditions that must be handled correctly in order to ensure robustness and correctness.
Pitfalls
Perhaps this section should be called, "The Sarlacc Pit." There are many reasons that the uses of the fibrous thread pool (and fibers in general) are limited:
- Until Windows Server 2003, FPU context could not be saved. I can't see this as being a big deal, but if you do any floating-point arithmetic then you're out of luck.
- In general, code that uses TLS will break. The code would need to be modified use fiber local storage (FLS), which requires Windows Server 2003. Considering one of the applications of this exercise was to avoid extensively modifying synchronous code, we kind of blew it with this one.
- Corollary: The C runtime library uses TLS unless you compiled using Visual C++ 2003 (or newer) and are running on an FLS enabled OS. The CRT and C++ exceptions will give you problems unless the version requirements are met.
- Third party libraries are not likely to be fiber-safe. COM is not fiber-safe.
- CRITICAL_SECTIONs, mutex objects, or any other synchronization / mutual exclusion mechanism that tracks the owning thread, will not work correctly. To a critical section, two different fibers running on top of the same thread will appear to be the same unit of execution. Since critical sections allow recursive entry, they can allow multiple fibers to be inside. The mechanisms above would also experience difficulties if the holding fiber were rescheduled to a different thread. Other mechanisms that don't track the owning thread might work, but remember that they will block the fiber's underlying thread. If there is only one thread in the pool, the other fibers will not be able to execute.
- Code that depends on thread handles or thread IDs to be unique for different contexts will not work because multiple fibers will be sharing the same thread.
- This one goes without saying, but I'll say it anyway: GUI code will not like fibers. Mind you, GUI code doesn't like multithreading either; windows have thread affinity. If you're doing GUI stuff with a thread pool (WTF?), adding fibers would be the least of your worries.