OHMS BLOG


Wednesday, February 08, 2012

code

Announcing Audio Auto-adjust

As I've been chronicling in my Adventures in Android series, I've been working on an Android app. I'm pleased to announce that today I have published Audio Auto-adjust 1.0.0 to the Android Market. I'd like to take this opportunity to provide a brief overview of what Audio Auto-adjust is and how to use it.

While there are numerous volume control apps already on the Android Market, I wrote this app because I had some very specific needs that I wanted to address. I wanted an app that didn't leave services constantly running in the background, yet I wanted an app that would be responsive to system actions. In particular, I wanted to be able to save my audio settings as presets, and then associate those presets with actions.

The Main Activity



Here is the main activity as it appears in the portrait orientation. As you can see, the various audio streams are easily adjusted by moving the slider controls.




The toolbar across the top provides three widgets. The first button on the left toggles silent mode. The next button saves the current volume configuration to a preset. The third widget is a dropdown that displays the currently active preset (if any) and allows you to switch to a different preset. This widget is only enabled when there are two or more presets that have been saved.

Notice that some of the streams are identified by hyperlinks. Those hyperlinks lead to context menus for additional configuration settings that are permitted for the selected audio stream.

To access Audio Auto-adjust's advanced features, press your phone's menu button to open the main menu. Note that the Edit Presets and Actions options are only available if you have already saved at least one preset.



Edit Presets Activity



This activity lists all the presets that have been saved in Audio Auto-adjust. In this example I have saved two presets, "Foo" and "Max." Touching one of those presets will command Audio Auto-adjust to switch to that preset. Long-pressing a preset, however, raises a context menu that allows you to modify that preset. For example, long-pressing the "Foo" preset presents this menu:



Actions Activity

The most powerful feature of Audio Auto-adjust is the ability to associate presets with system actions. My personal use case for this was when I enter my car and my phone connects to my vehicle's hands-free system via Bluetooth. Upon receiving notification that my phone has connected to my car, Audio Auto-adjust will automatically switch to an associated preset.

NOTE: In order to configure Bluetooth events, you must enable Bluetooth on your phone before entering the "Configure Actions" activity.



Each row in this activity lists a system action that may be associated with a preset. Actions containing "(None)" are simply ignored by Audio Auto-adjust. To associate an action with a preset, select the action's corresponding dropdown, and then choose a preset from the list.

A Note About Precedence

The order of the listed system actions in the "Configure Actions" activity affects the precedence of those actions. Entries that are located higher on the list are given a higher precedence by Audio Auto-adjust. If more than one system event fires at the same time, Audio Auto-adjust applies the preset associated with the action that has the highest precedence.

Precedence may be adjusted by placing your finger on the grip on the left side of the row, and then dragging it up or down to your desired position.

Settings Activity

I've tried to ensure that each preference in the settings activity contains a useful summary. Here's a brief overview that goes into a bit more detail:

  • Volume Control Stream: Chooses which audio stream will be adjusted by your phone's hardware volume controls. Note that this setting only applies when Audio Auto-adjust is running in the foreground.
  • Background Notification: When enabled, Audio Auto-adjust will post a notification whenever it switches presets while running in the background.
  • Play Sounds: When adjusting volume settings, play sounds to illustrate the loudness of the setting.
  • Tie Ring Volume: When set, hide the controls for the Notification audio stream. Instead, the Ring audio stream will control both itself and the Notification stream.


In Conclusion

My goal with Audio Auto-adjust was to make a volume control app that fulfilled my needs. I hope that it is as useful to others as it is to me. In the future I plan to expand the app with additional features, so please stay tuned!

Wednesday, November 02, 2011

code

Adventures in Android, Part IV: Vertical SeekBars

One user interface element that I wanted to include in my activities was a vertically-oriented SeekBar. Android has provided View.setRotation() since API level 11, but since I am targeting API level 7 that functionality is not available to me. I knew that this could probably be achieved by manually applying a transformation matrix to a View, but I had no idea how difficult it would end up being. I attempted numerous approaches in an effort to achieve my desired results. Much to my frustration, I kept finding limitations in the APIs that made every option either infeasible or difficult.

  1. My first attempt was to subclass SeekBar along the lines of this StackOverflow post. It overrides View.onDraw() and applies a transformation to the Canvas. It also sends fake size dimensions up to the SeekBar superclass and provides a custom View.onTouchEvent() handler. My concerns were as follows:
    • A custom event handler must call SeekBar.setProgress() to update the SeekBar's state. Since this is being done programmatically, any listeners will be told that this change did not come from the user, even though indirectly it actually did.
    • This custom event handler did not (and could not) propagate touch events up to the SeekBar in a way that let it redraw itself during the touch event. In particular, the SeekBar thumb was not highlighted while the view was being touched.

  2. My second attempt was an extension to the first. Instead of completely overriding onTouchEvent(), I decided to use onTouchEvent() to perturb the touch coordinates of the provided MotionEvent, then call up into the superclass. While this fixed the issues from the first option, it still didn't look right.
    Figure 1: SeekBar with thumb drawn at incorrect location
    • Notice that in Figure 1, the SeekBar's progress indicator is at 100%, yet the thumb is located near the bottom. It turns out that the SeekBar's drawing code calls getWidth() to figure out where to position the thumb. Since the SeekBar is now in a vertical orientation, it should be using the height instead of the width.
  3. Finally we reach the third, definitive option: I wrote a derivative of ViewGroup called RotatedLayout that does a perfect transformation of the child View. From the child's perspective, it is operating using its regular orientation. The RotatedLayout class transforms coordinates for drawing, measuring, layout, invalidation, touch events and key events between its parent and its child. This allows me to provide my users with a pixel-perfect vertical SeekBar!
    • It was annoying to deal with invalidation; Android goes to great lengths to prevent you from tinkering with it. I had no choice, however: any invalidation rectangles generated by the SeekBar need to be transformed from the SeekBar's coordinate system to the parent view's coordinates. Figure 2 illustrates what happens if invalidation isn't transformed: only a small region of pixels at the top of the SeekBar is redrawn. This happens because that region is exactly the same height as the underlying horizontal SeekBar. Figure 3 overlays a horizontal SeekBar with the misdrawn vertical SeekBar to illustrate.

Figure 2: Vertical SeekBar with no invalidation transformation

Figure 3: Invalidated region from Figure 2 overlaid with horizontal SeekBar

Figure 4: Vertical SeekBars in Audio Auto-adjust.
Are you interested in what RotatedLayout can do for your app? Drop me a line: ohmsblog at teamohms dot org

Wednesday, October 26, 2011

code

Adventures in Android, Part III

Today I want to talk about some architectural decisions that I made in the early stages of development that I hope will help to minimize the memory and battery consumption of my app.

First, let's discuss how I intended for my app to receive input. I do plan on having to interface directly with the user, so I'll be implementing a few activities. I also have a couple of BroadcastReceivers that I have added to my app's manifest in order to receive certain broadcast intents from the OS.

It very quickly became clear that I was going to want to implement a service to do my app's heavy lifting. Since my BroadcastReceiver was going to need to send intents over to the service via Context.startService(), I settled on using an IntentService so that my service would automatically stop itself when there were no more pending intents.

I also decided that it was going to be necessary for my service to communicate back to my activities. I decided that my activities will use Context.bindService() to facilitate this bi-directional communication. On the other hand, I don't want the service to be bound any longer than it reasonably needs to be.

Observe this snippet from the Android documentation on the service lifecycle:

A service can be both started and have connections bound to it. In such a case, the system will keep the service running as long as either it is started or there are one or more connections to it with the Context.BIND_AUTO_CREATE flag. Once neither of these situations hold, the service's onDestroy() method is called and the service is effectively terminated.

This information is important because it clarifies what happens if an IntentService tries to stop itself while the service is still bound: the service won't be destroyed until all of its connections have been unbound.

After digesting all of this for a few moments, I came to the following conclusions:

  • My service will be implemented as an IntentService and it will process start intents sent by my activities and broadcast receivers. Unless any connections are bound to it, the service will stop itself as soon as it has processed all the intents in its queue.
  • My service will also support having connections bound to it. My activities will bind in order to achieve bi-directional communication with my service. Because of the guarantees made in the Android service lifecycle documentation, the IntentService implementation can't "pull the rug out" from under bound connections.
  • I will under no circumstances bind anything to the service unless the communication is essential to the functioning of my app. In other words, I'm not going to bind to the service and leave it running "just in case." In the case of my main activity, this means that I'll bind to the service in Activity.onStart() and unbind in Activity.onStop(); if the activity isn't visible and there aren't any start intents pending, then my service doesn't need to be consuming resources.
  • To compensate for the fact that the service is unlikely to be running continuously, I ensure that my activities send a refresh command to the service when they become visible. This gives the service an opportunity to update the activity as necessary to ensure that the UI is consistent with the service's internal state.

Monday, October 24, 2011

code

Adventures in Android, Part II

A brief annoyance that I was dealing with:

For testing purposes I wanted a service to initiate a status bar notification. I couldn't understand why, but for some bizarre reason Android kept failing miserably when trying to load the resource for my notification's icon. I kept seeing stuff in my ddms logs about android.content.res.Resources$NotFoundException.

I checked my code and my resources over and over again. I blew away the gen directory and rebuilt my app from scratch. I scoured StackOverflow for a situation similar to mine, yet nothing quite the same came up.

After pulling my hair out for an hour or two, I finally realized what the problem was.

I'm running CyanogenMod 7.1 on my phone. I had it set to forcibly install apps to my phone's SD card. Unfortunately for me, my app doesn't include any SD card support whatsoever.

*facepalm*

Problem solved: I moved the app into my phone's internal memory. I also decided to end my little experiment with forcing new installs onto the SD card.

Bloody hell!

Sunday, October 02, 2011

code

Adventures in Android, Part I

I've spent most of this afternoon and evening working on my first Android app.

I was dissatisfied with the current slate of applications that are available for a particular problem domain, so I have decided to write my own.

More details will follow as my work approaches completion, but in the meantime I'm going to jot down some thoughts about the stuff that I worked on today.

  • I haven't written a line of Java in probably seven or eight years because I think that it sucks. It wasn't difficult to get back into it, though I still don't like it.
  • I hate checked exceptions; they make the compiler annoying enough to motivate you to write suppressing catch blocks just so that your code will compile (Doing so is a bad idea, of course, so suck it up and do the right thing).
  • The APIs I wanted to use were not available publicly until Honeycomb. I want my app to run on Eclair. I ended up using reflection to give me the appropriate interfaces depending on availability: I use the documented API if it's available, but if I catch a NoSuchFieldException then I fall back to the undocumented APIs. I am amused that my first Android app, the one that I am writing to teach myself, is using undocumented APIs; I did the same thing when I started hacking 16-bit Windows code in 1995. It's actually a bit easier this time around because I may browse the entire source code. When I taught myself Windows I didn't have that luxury, though I always had my trusty copy of Undocumented Windows sitting nearby (I still have that book)!
  • I haven't used Eclipse since the last time that I wrote code in Java. This time around, I am proud to say that I successfully resisted using Eclipse for my Android work. I was worried that command-line environments were going to be an afterthought when it came to the Android developer tools but I was pleasantly surprised to be proven wrong. Vim and Ant FTW!
  • ddms is really cool.
  • Today's work passed testing! svn commit

<digression>This experience reinforces my opinion that hiring developers based on their ability to satisfy a laundry list of keywords is ridiculous. Good developers don't need a year to pick up whatever technologie du jour you're pushing. Whatever your buzzwords are, they'll probably be obsolete in five years anyway.</digression> That's another topic for another day.

Tuesday, May 03, 2011

code

Oracle OCI Tips

If OCIDirPathPrepare fails with ORA-01403, check and make sure that the OCI_ATTR_SCHEMA_NAME attribute has been set on the OCIDirPathCtx handle.

If OCIDirPathColArrayToStream fails with ORA-12899, you're probably not passing in the rowcnt and rowoff values correctly. The docs aren't very clear on how to specify those two parameters, so I'll try to elaborate here a little bit:

  • rowcnt should always be the number of rows that have been set in the column array, including rows that have already been sent to the stream buffer. A column array's set rows don't get cleared until you call OCIDirPathColArrayReset. If you specify a rowcnt that is too large (i.e. it includes unset rows), you may encounter errors.
  • rowoff should be the row index into the column array where stream conversion should begin.
If you follow these guidelines, OCIDirPathColArrayToStream will process rowcnt - rowoff rows during the conversion.

For example, let's suppose that you've previously converted three rows of a column array to the stream. You have just added a fourth row to the column array, and now you want to add that new row to the stream. Set rowcnt to 4, since that's the total number of rows that have been specified in the column array. Set rowoff to 3, since you've already converted rows 0, 1, and 2.
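
To make the bookkeeping concrete, here is a rough C sketch of that example. The handle setup is omitted, and the convert_new_rows() wrapper is my own naming rather than anything OCI provides; only the rowcnt/rowoff arithmetic is the point.

#include <oci.h>

/* Hedged sketch of the rowcnt/rowoff bookkeeping described above.  The
 * handles are assumed to have been allocated and prepared elsewhere. */
static sword convert_new_rows(OCIDirPathColArray *dpca, OCIDirPathCtx *dpctx,
                              OCIDirPathStream *dpstr, OCIError *errhp,
                              ub4 rows_set, ub4 *rows_converted)
{
    /* rowcnt = every row set in the column array since the last
     * OCIDirPathColArrayReset(); rowoff = rows already streamed.  The call
     * therefore processes rows_set - *rows_converted rows. */
    sword rc = OCIDirPathColArrayToStream(dpca, dpctx, dpstr, errhp,
                                          rows_set,          /* rowcnt */
                                          *rows_converted);  /* rowoff */
    if (rc == OCI_SUCCESS)
        *rows_converted = rows_set;  /* the pending rows are now in the stream */
    return rc;  /* other codes (e.g. a full stream buffer) need more handling */
}

/* In the worked example above, three rows were already converted and a fourth
 * was just set, so the call becomes:
 *   convert_new_rows(dpca, dpctx, dpstr, errhp, 4, &converted);  // converted == 3
 */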

code

Fixing the winsock header file mess

Do you write Windows Sockets code? Are you having conflicts between winsock.h and winsock2.h?


If you take a look at winsock.h, you will notice that it uses the _WINSOCKAPI_ macro to prevent multiple inclusion. winsock2.h uses _WINSOCK2API_ to prevent multiple inclusion, but it also sets _WINSOCKAPI_ to fool the preprocessor into thinking that winsock.h has already been included.

Knowing this information, we can take this a step further and suppress winsock.h across the board (assuming Visual C++):

  1. First, modify your build system so that /D_WINSOCKAPI_ is always passed to the compiler. This makes the preprocessor think that winsock.h has already been included, so it never makes it past winsock.h's multiple inclusion preprocessor directives.
  2. Create a proxy header file using the following code snippet. Instead of including winsock2.h directly, always include the proxy header instead.
#ifndef __MYWINSOCK2_H
#define __MYWINSOCK2_H
#pragma push_macro("_WINSOCKAPI_")
// We clear _WINSOCKAPI_ to avoid preprocessor warnings about
// multiple definitions of the _WINSOCKAPI_ macro, as winsock2.h will
// attempt to #define _WINSOCKAPI_ itself.
#undef _WINSOCKAPI_
#include <winsock2.h>
#pragma pop_macro("_WINSOCKAPI_")
#endif // __MYWINSOCK2_H
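
For completeness, here's what a consumer of that proxy header might look like. The file name mywinsock2.h is just whatever you chose to call the snippet above.

// Hypothetical consumer of the proxy header.  Because /D_WINSOCKAPI_ is
// passed on every compile, a stray #include <windows.h> elsewhere can no
// longer drag winsock.h in behind your back.
#include "mywinsock2.h"   // never #include <winsock2.h> directly
#include <ws2tcpip.h>     // further Winsock 2 headers are safe afterwards

#pragma comment(lib, "ws2_32.lib")

int main()
{
    WSADATA wsaData;
    if (WSAStartup(MAKEWORD(2, 2), &wsaData) != 0)
        return 1;
    // ... socket code goes here ...
    WSACleanup();
    return 0;
}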

Monday, April 25, 2011

code

How to generate your own sequential GUIDs for use with Microsoft SQL Server

Let's suppose that you want to create a table whose primary key is a uniqueidentifier and uses a clustered index. You may certainly use the NEWSEQUENTIALID() function to supply the next sequential GUID as a default. On the other hand, what do you do if you want to generate such a GUID in your application?


The answers are hidden in plain sight: The documentation for NEWSEQUENTIALID() states that this function wraps UuidCreateSequential(). At the same time, this blog post indicates that SQL Server slightly modifies the GUID before inserting the value as a default.

At this point, it's easy to see for yourself what SQL Server is doing to the GUID:

  1. Attach WinDbg to sqlservr.exe, enter bp rpcrt4!UuidCreateSequential, and resume the program.
  2. Insert a row into a table such that NEWSEQUENTIALID() will be invoked.
  3. When SQL Server triggers the breakpoint, enter the kb 1 command. Take note of the first argument to UuidCreateSequential().
  4. Enter the gu command.
  5. Type db address L16, where address is the argument from step 3. The output from this command is the raw bytes of the GUID structure that was generated.
  6. Resume the debugger.
  7. Query for the newly inserted row and inspect the GUID that SQL Server inserted into the database.
You'll find that the resulting GUID is identical to the GUID from step 5, except that the Data1, Data2, and Data3 fields have been converted to big-endian representation (RFC4122 actually recommends that GUIDs be big-endian encoded, but the Microsoft implementation uses little-endian fields).
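
If you want to generate such a GUID in your own code, a sketch along these lines mirrors what the debugging session reveals: call UuidCreateSequential() and then store Data1, Data2, and Data3 big-endian. The function name and the raw 16-byte output layout are my own choices, so verify the results against your own server before relying on them.

#include <windows.h>
#include <rpc.h>      // UuidCreateSequential
#include <cstring>

#pragma comment(lib, "rpcrt4.lib")

// Sketch: generate a GUID the way the steps above suggest NEWSEQUENTIALID()
// does -- call UuidCreateSequential() and then byte-swap the first three fields.
static bool MakeSequentialGuidForSqlServer(unsigned char (&out)[16])
{
    UUID uuid;
    RPC_STATUS status = UuidCreateSequential(&uuid);
    if (status != RPC_S_OK && status != RPC_S_UUID_LOCAL_ONLY)
        return false;

    // Convert the three little-endian fields to big-endian byte order.
    out[0] = static_cast<unsigned char>(uuid.Data1 >> 24);
    out[1] = static_cast<unsigned char>(uuid.Data1 >> 16);
    out[2] = static_cast<unsigned char>(uuid.Data1 >> 8);
    out[3] = static_cast<unsigned char>(uuid.Data1);
    out[4] = static_cast<unsigned char>(uuid.Data2 >> 8);
    out[5] = static_cast<unsigned char>(uuid.Data2);
    out[6] = static_cast<unsigned char>(uuid.Data3 >> 8);
    out[7] = static_cast<unsigned char>(uuid.Data3);
    std::memcpy(&out[8], uuid.Data4, sizeof(uuid.Data4));  // already raw bytes
    return true;
}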

Friday, February 26, 2010

code

On the Risks of Fibers

Raymond Chen blogged about the risks of fibers today. I addressed these same concerns when writing about fibrous thread pools nearly two years ago. Raymond's post goes into a little bit more detail about why certain things are risky with fibers. I highly recommend it.

Saturday, December 05, 2009

code

Leaky Abstractions Redux, Part II

Last time I discussed why Winsock's connect function shouldn't be called by a thread running in the I/O component of the system thread pool. To recap, the implementation of connect does an alertable wait, causing APC work items that are enqueued to that thread to execute. This causes a cyclical effect where connect and the work items invoke each other repeatedly.

Another thread pool no-no is to call GetQueuedCompletionStatus from a thread in the default component. Instead of using the APC queue, the default component uses an I/O completion port as its queuing mechanism. This is actually very handy because I/O handles can be bound to the system thread pool using BindIoCompletionCallback. This allows asynchronous I/O completion events to be posted directly to the system thread pool.
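
As a rough illustration (error handling trimmed, file name arbitrary), binding a handle to the default component looks something like this:

#include <windows.h>
#include <cstdio>

// Completion routine; runs on a WT_EXECUTEDEFAULT thread of the system
// thread pool when an overlapped operation on the bound handle completes.
static VOID CALLBACK OnIoComplete(DWORD errorCode, DWORD bytesTransferred,
                                  LPOVERLAPPED overlapped)
{
    std::printf("I/O finished: error=%lu, bytes=%lu\n",
                errorCode, bytesTransferred);
    delete overlapped;  // allocated per operation below
}

int main()
{
    // The handle must be opened for overlapped I/O.
    HANDLE file = ::CreateFileW(L"example.dat", GENERIC_READ, FILE_SHARE_READ,
                                nullptr, OPEN_EXISTING, FILE_FLAG_OVERLAPPED,
                                nullptr);
    if (file == INVALID_HANDLE_VALUE)
        return 1;

    // Hand all completion packets for this handle to the default component.
    if (!::BindIoCompletionCallback(file, OnIoComplete, 0))
        return 1;

    static char buffer[4096];
    OVERLAPPED* ov = new OVERLAPPED{};
    // A FALSE return with ERROR_IO_PENDING means the read is in flight and
    // OnIoComplete will be invoked on a pool thread when it finishes.
    if (!::ReadFile(file, buffer, sizeof(buffer), nullptr, ov) &&
        ::GetLastError() != ERROR_IO_PENDING) {
        delete ov;
        return 1;
    }

    ::Sleep(1000);  // crude way to let the callback run in this tiny sketch
    ::CloseHandle(file);
    return 0;
}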

I like to think of I/O completion ports as thread-safe queues provided by the OS that possess two unique properties:

  • Integration with the scheduler: When a thread that is associated with an I/O completion port blocks, the scheduler and the port cooperate so that another thread that is blocked on the port can be woken up and provided with an I/O completion packet.
  • Integration with the I/O subsystem: I/O completion packets can be posted directly to the port without any user-mode intervention.
Unfortunately only one I/O completion port can be associated with a thread at any given time. Because the default component of the system thread pool uses an I/O completion port for its queue, all WT_EXECUTEDEFAULT threads are already associated with a port. This means that if one of those threads blocks, the pool's port will know about it.

One associates a thread with an I/O completion port by calling GetQueuedCompletionStatus. The calling thread becomes associated with the port whose handle was passed in as the first parameter. This action overwrites any previous associations -- including the association that was made with the thread pool's port! This impairs the thread pool's ability to detect when one of its threads has blocked because the scheduler is no longer aware of the pool's completion port. In the best case, the thread pool will run less efficiently. In the worst case, a deadlock is possible.

Thursday, December 03, 2009

code

!banned

If I banned myself from using every programming construct that could be abused, I'd be staring at a blank screen. Comments included.

Wednesday, October 07, 2009

code

Leaky Abstractions Redux, Part I

Today at work I was stuck fixing a stubborn deadlock between two multithreaded services on Windows that were communicating with each other over TCP/IP. This problem ended up being another instance of what Joel Spolsky refers to as Leaky Abstractions.

To understand what is happening here, I need to sketch out a bit of background. The first item to note is that the server process uses an M:N threading model, as opposed to a 1:1 threading model. M connections are multiplexed onto N threads, instead of creating one thread per connection. Because N does not increase without bound, it is possible for those threads to be exhausted under pathological conditions.

The second noteworthy point is that the threads that are initiating the connections on the client side come from the I/O component of the system thread pool. That is, the TCP/IP client was being queued up onto the system thread pool by calling QueueUserWorkItem with the WT_EXECUTEINIOTHREAD flag. The I/O component of the system thread pool uses user-mode asynchronous procedure calls (APCs) as the queuing mechanism. When the pool thread needs another work item, it performs an alertable wait until the next APC arrives. Since there is only one APC queue per thread, APCs can come from multiple sources but arrive on this single queue in some arbitrary order. If there are multiple sources, we can't guarantee which APC actually gets called when the thread goes alertable; whatever's at the head of the queue is what gets invoked.
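
For reference, here is roughly how a work item lands on that APC-driven component. MyWorkItem is a placeholder for the client's real connect-and-handshake routine.

#include <windows.h>

// Sketch of how the client queued its connect logic onto the I/O component
// of the system thread pool.  Work items queued with WT_EXECUTEINIOTHREAD are
// delivered to the pool thread as user-mode APCs, which is why an alertable
// wait inside connect() can end up re-entering them.
static DWORD WINAPI MyWorkItem(LPVOID context)
{
    // ... create a socket, call connect(), perform the handshake ...
    (void)context;
    return 0;
}

bool QueueConnectAttempt(void* context)
{
    return ::QueueUserWorkItem(MyWorkItem, context, WT_EXECUTEINIOTHREAD) != FALSE;
}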

That's enough background, so let's take a look at the deadlocked client process. Once I had WinDbg attached, I noticed that the call stack for the thread pool's I/O component looked something like this:

ntdll!KiFastSystemCallRet
ntdll!NtWaitForSingleObject+0xc
kernel32!WaitForSingleObjectEx+0xac
kernel32!WaitForSingleObject+0x12
myprog!MyHandshake+0x86
myprog!MyWorkItem+0x3a
ntdll!RtlpWorkerCallout+0x71
ntdll!RtlpExecuteIOWorkItem+0x29
ntdll!KiUserApcDispatcher+0x25
mswsock!SockDoConnectReal+0x27a
mswsock!SockDoConnect+0x38a
mswsock!WSPConnect+0xbe
WS2_32!connect+0x52
myprog!MyWorkItem+0x3a
ntdll!RtlpWorkerCallout+0x71
ntdll!RtlpExecuteIOWorkItem+0x29
ntdll!KiUserApcDispatcher+0x25
mswsock!SockDoConnectReal+0x27a
mswsock!SockDoConnect+0x38a
mswsock!WSPConnect+0xbe
WS2_32!connect+0x52
myprog!MyWorkItem+0x3a
ntdll!RtlpWorkerCallout+0x71
ntdll!RtlpExecuteIOWorkItem+0x29
ntdll!KiUserApcDispatcher+0x25
mswsock!SockDoConnectReal+0x27a
mswsock!SockDoConnect+0x38a
mswsock!WSPConnect+0xbe
WS2_32!connect+0x52
myprog!MyWorkItem+0x3a
ntdll!RtlpWorkerCallout+0x71
ntdll!RtlpExecuteIOWorkItem+0x29
ntdll!KiUserApcDispatcher+0x25
mswsock!SockDoConnectReal+0x27a
mswsock!SockDoConnect+0x38a
mswsock!WSPConnect+0xbe
WS2_32!connect+0x52
myprog!MyWorkItem+0x3a
ntdll!RtlpWorkerCallout+0x71
ntdll!RtlpExecuteIOWorkItem+0x29
ntdll!KiUserApcDispatcher+0x25
mswsock!SockDoConnectReal+0x27a
mswsock!SockDoConnect+0x38a
mswsock!WSPConnect+0xbe
WS2_32!connect+0x52
myprog!MyWorkItem+0x3a
ntdll!RtlpWorkerCallout+0x71
ntdll!RtlpExecuteIOWorkItem+0x29
ntdll!KiUserApcDispatcher+0x25
mswsock!SockDoConnectReal+0x27a
mswsock!SockDoConnect+0x38a
mswsock!WSPConnect+0xbe
WS2_32!connect+0x52
myprog!MyWorkItem+0x3a
ntdll!RtlpWorkerCallout+0x71
ntdll!RtlpExecuteIOWorkItem+0x29
ntdll!KiUserApcDispatcher+0x25

Notice that there are seven invocations of myprog!MyWorkItem on the call stack. Further notice that the call stack contains a pattern that repeats itself after every invocation of mswsock!SockDoConnectReal. Of course Winsock has no knowledge of my code, so it can't be intentionally invoking my work item. As soon as I saw this I knew what it meant: the internals of the Winsock connect API are implemented using APCs! Since both my code and Winsock were queuing up APCs to this thread, Winsock invoked whichever procedure was at the head of the APC queue when it went alertable. In this case it was my work item instead of the internal Winsock procedure. My work item then attempts another connect, thus repeating the cycle. Since all of the connections on the stack had only been partially completed, this gobbled up resources on the server side until there were no threads available to service the handshake at the top of the call stack. Talk about pathological conditions: there's our deadlock!

I ended up removing the offending client-side code from the system thread pool altogether. You might be wondering why that code was even using the I/O component of the thread pool in the first place. Why didn't it just use WT_EXECUTEDEFAULT? There is a good reason for this, and ironically enough it too is because of a leaky abstraction! That tale will have to wait until Part II.

Thursday, May 21, 2009

code

Things To Come


0: kd> dg 0x23
P Si Gr Pr Lo
Sel Base Limit Type l ze an es ng Flags
---- ----------------- ----------------- ---------- - -- -- -- -- --------
0023 00000000`00000000 00000000`ffffffff Code RE Ac 3 Bg Pg P Nl 00000cfb

0: kd> dg 0x33
P Si Gr Pr Lo
Sel Base Limit Type l ze an es ng Flags
---- ----------------- ----------------- ---------- - -- -- -- -- --------
0033 00000000`00000000 00000000`00000000 Code RE Ac 3 Nb By P Lo 000002fb

Two words: far jump. Just sayin'.

Thursday, February 26, 2009

code

A vim lesson

Back when I was a co-op student, one of the developers that trained me had a philosophy that a vi user should learn a new editing trick every day. While I haven't bothered to maintain such a regimen, I do from time to time like to learn new commands as editing situations arise. My latest lesson involved using pattern matching to select specific lines within a range, delete them into a register, and put them in a different location. Allow me to illustrate with an example.

Suppose I have code like this (familiarity with C-like languages is assumed):

1: char array[3][2];
2:
3: array[0][0] = 'x';
4: array[0][1] = 'y';
5: array[1][0] = 'z';
6: array[1][1] = 'z';
7: array[2][0] = 'y';
8: array[2][1] = '\n';

Let's suppose that I decided that I wanted to group these assignments according to the second index. I want to remove all rows that contain [1] as the second dimension's index and move them elsewhere. How can I do that in vim without deleting each row individually?

:3,8g/.*\[1\] /d A

Let's decompose this. The 3,8 at the start specifies the range of lines to consider: I only want vim to look at lines 3 through 8. The g tells vim to apply the pattern match to every line in the range. The /.*\[1\] / portion is the pattern itself. Finally, d is the delete command, which is executed for each line that matches the pattern. The uppercase A after it tells vim to append each deleted line to register "a". After this command is run, we'd see:

1: char array[3][2];
2:
3: array[0][0] = 'x';
4: array[1][0] = 'z';
5: array[2][0] = 'y';

Later on, when we want to paste these saved rows, we can position the cursor at line 5 and type the following command:

"ap

This tells vim to put the text in register "a" into the current location in the buffer. Now we end up with the following:

1: char array[3][2];
2:
3: array[0][0] = 'x';
4: array[1][0] = 'z';
5: array[2][0] = 'y';
6: array[0][1] = 'y';
7: array[1][1] = 'z';
8: array[2][1] = '\n';

To clear the register, issue this command:

:let @a = ""

Friday, October 24, 2008

code musings

MinWin and the NT Kernel

Lately Slashdot has been discussing Windows 7 and MinWin, but its coverage sucks. I don't know if people don't understand the architecture of Windows NT, or they don't understand operating systems in general, or they just don't care so long as it's Microsoft bashing.

The first thing that bugs me is all this talk about the "MinWin" project. Ever since this posting ran on Slashdot, there's been all kinds of discussion about MinWin, what it means for Windows 7, and so on. The thing is, lots of people seem to think that MinWin represents a completely different kernel. It's not: it's still NT. MinWin is just a stripped-down Windows NT installation - the essential components for booting up an NT system. The suggestion that MinWin should be "added" to Windows 7 doesn't make any sense; what people really mean is that they want all of Windows' "features" to be removed.

This just leads into the next misconception, which is this idea that the Windows "kernel" is big, bloated, ancient, monolithic, and filled with compatibility hacks. This is perpetuated by an error-filled article in the New York Times, no less.

Let's take a breather and take a step back. First of all, is the Windows NT kernel monolithic? Most references refer to it as a "hybrid" kernel. My opinion is that this is more an issue of nomenclature than of implementation. While the kernel might be compartmentalized, as long as its components and device drivers share address space and full supervisor privileges, they are effectively dealing with the same issues that a monolithic kernel might have. The primary result is the fact that a crash in one component of the kernel (or in a device driver) can take down the entire kernel as a whole.

The bigger question though: Is a monolithic NT kernel at all relevant to the notion that Windows is bloated? The correct answer here is, "No." It is not fair to imply that a monolithic kernel consists of "bloated layers we don't use." As stated above, that's not really what it means to be monolithic; the issues outlined above, not bloat, are among the main drawbacks of a monolithic kernel. Unfortunately Slashdot and the Times seem to believe otherwise. I think that I know why, and the reason is historical.

Back in the day, Windows consisted of three core modules: USER.EXE, GDI.EXE, and KERNEL.EXE (or KRNL[23]86.EXE depending on the operating mode that the hardware could support). USER was the window manager, GDI was the graphics library, and KERNEL handled, well, the kernel-ish functionality. I say "kernel-ish" because the 16-bit Windows kernel would sometimes "suck the brains out of DOS" and implement its own system calls, and other times it would just end up issuing DOS syscalls itself. When NT came along it was designed to support multiple subsystems in user-mode, so that the NT kernel could run OS/2 programs or POSIX programs (recall that NT was originally supposed to be a portable, "new technology" implementation of OS/2). Once the decision was made to make NT a Windows product, the OS/2 subsystem was replaced by another subsystem for 32-bit Windows programs that we now know as Win32.

Win32 consists of USER32.DLL, KERNEL32.DLL, and GDI32.DLL (of course there are now other components, but those are extraneous to my argument). These DLLs have similar names to the old core modules of Windows 3.x, and many of the APIs that these DLLs export are 32-bit counterparts to the old 16-bit APIs. Note, however, that just because one of the Win32 DLLs is still named KERNEL32, it doesn't mean that that library implements the kernel of a NT system!

I think that what these writers are really getting at is that they are fed up with the Win32 subsystem. They keep whining about the kernel because they've seen a core module named KERNEL since Windows 1.0 came out in 1985. The NT kernel itself is no older than Linux or Mach, the very kernels that NT is being compared to!

I think that the most absurd part of that Times coverage was that they proceed to compare the Win32 subsystem to microkernels. We're talking apples versus oranges here. The article then claims that because OS X was written on Mach that it became portable enough to run on the iPhone. This is probably true in comparison to the classic Mac OS, but for NT portability was an original design goal. IA-32 was not the first architecture that NT was written for; the Intel i860, a RISC chip, was the original platform. Further note that there were several RISC versions of NT that were available for many years. Itanium and AMD64 versions of NT sprung up rather quickly as well because of this portability. There are many more important factors that affect portability than the type of kernel that is involved; both NT and Linux are testaments to that.

The article also ignores the separate evolution of the DOS-based and NT-based code bases prior to Windows XP. In another instance of ignoring the fact that NT uses subsystems, the article suggests that the NT architecture isn't compartmentalized enough to change. If NT could switch from an OS/2 API to Win32, why couldn't it be switched to something else? I'm not saying that Microsoft would do this, but it certainly isn't a technical limitation.

I'm not defending monolithic kernels, and I'm not defending Microsoft either. As a matter of fact, I've personally been involved with writing two microkernels. I also just happen to have done lots of Windows programming over the past dozen years. What I am arguing for is some correctness in this armchair discussion of both the design and the future of Windows.

Sunday, June 01, 2008

code

Fibrous Thread Pools, Part II

After I wrote my original post about the fibrous thread pool, I recalled an additional issue that I needed to address during implementation.

Suppose I'm executing a function on a fiber that has been scheduled by the fibrous thread pool. If I call ReadFile, I need to perform a suspend operation because the I/O completion port will generate a completion packet corresponding to the read operation. What if ReadFile returns TRUE? We already know that the operation has completed successfully, so why should we bother suspending? In older versions of Windows, the completion event fires no matter what. This leads to some extra activity in the fibrous thread pool that is merely housekeeping to ensure that our completion events are in sync. Unfortunately this housekeeping is effectively redundant. What we really need is a way to suppress I/O completion callbacks when a file I/O operation returns TRUE without ERROR_IO_PENDING being set.

The Windows 6.x (i.e. Vista and Server 2008) versions provide the solution via a new API: SetFileCompletionNotificationModes. This function allows us to disable I/O completion notifications in instances where an I/O operation meets the above criteria.
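
Here's roughly what that looks like in practice. This is only a sketch: the handle is assumed to already be associated with the pool's completion port, and the second flag is optional.

#include <windows.h>

// Sketch: opt a handle out of completion packets for operations that complete
// synchronously.  After this call, when ReadFile()/WriteFile() return TRUE
// (and the error is not ERROR_IO_PENDING), no packet is queued to the
// completion port, so the fibrous thread pool has nothing to suspend for.
bool SkipCompletionOnSuccess(HANDLE file)
{
    UCHAR flags = FILE_SKIP_COMPLETION_PORT_ON_SUCCESS |
                  FILE_SKIP_SET_EVENT_ON_HANDLE;   // second flag is optional
    return ::SetFileCompletionNotificationModes(file, flags) != FALSE;
}

With this mode set, a TRUE return from ReadFile means the data is already in the buffer and the fiber can simply carry on without a suspend/resume round trip.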

For all the criticism that Vista has received, I've got to say that I'm actually quite pleased with many of the additions to the Win32 API. I only wish that they had been made available sooner!

Tuesday, May 13, 2008

code

Fibrous Thread Pools

"I'm the fiber king, Dave. I'm the fiber king."
- Horatio Caine

Every so often I get an idea in my head that nags at me constantly until I try it out. This idea is a questionable but nonetheless interesting experiment that I was compelled to try over one weekend. Unless you're doing something like writing a DBMS using fibers, don't do this. This was an intellectual exercise. Caveat emptor.

I've been working on quite a bit of server-side Windows code as of late. One technique to maximize scalability on Windows NT systems is to use asynchronous I/O. While working on my I/O implementation, an idea crossed my mind: what if fibers were integrated into a thread pool? What would that look like? How would such a thread pool behave? Is it even a good idea?

Why Fibers?

Remember that a fiber is a user-mode construct that provides a safe mechanism for cooperative context switching. Though fibers can be used to implement user-level threads (such as those found in many UNIX environments), they can (in theory) have a much wider range of applications. Fibers run on top of OS threads, and since fibers are a user-mode concept the NT kernel doesn't recognize them. Fiber context switches are cheaper to execute than thread context switches and if used wisely they can maximize the utilization of the OS threads that they are running on top of. Fibers are sometimes used in databases to maximize performance; Microsoft SQL Server and Sybase SQL Anywhere both have fiber modes.

I was thinking that fibers could be used as a means to implement coroutines. I was interested in coroutines because they could be effective in eliminating the need to code state machines inside I/O completion callbacks. Instead a coroutine's context maintains the required state. As a corollary to this property, coroutines could potentially allow code that expects synchronous I/O to be integrated into an asynchronous model. Let us examine this latter idea in more detail.

Suppose I have a parser that can read in data from an arbitrary source. The parser accepts a pointer to a callback function that copies the data in from the source to a buffer. The parser is expecting the callback to be synchronous, i.e. when that callback returns, it is assumed to have completed the read request. Obviously this doesn't mesh well with asynchronous I/O, where we initiate a request and check back on it later. Using coroutines we could suspend the parser callback function, go and do something else, and then resume the callback once the I/O request has completed. This would allow us to leverage existing code that uses a synchronous I/O model without having to modify it too much (at least in theory).

The Fibrous Thread Pool


What I implemented was a thread pool that schedules fibers. I started off with a bunch of threads that wait on an I/O completion port (see this blog post by Larry Osterman for a rough idea of how this part would work). The difference is that each worker thread has been converted to a fiber. This "main fiber" is responsible for scheduling the "worker fibers." The fibers are scheduled via the pool's I/O completion port. Each completion packet that is queued to the port includes a pointer to a fiber that has been initialized to execute a work item. The main fiber then switches to the worker fiber to run it. Since fibers are cooperative, the thread will continue to execute the worker fiber until it explicitly switches back to the main fiber. This is accomplished via a suspend function that the pool provides that allows worker fibers to switch back to the main fiber for that thread. Each thread has its own dedicated main fiber, but the worker fibers do not have any particular thread affinity; they are scheduled and executed by whichever thread retrieves the completion packet.
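
A stripped-down sketch of one pool thread's scheduling loop might look like the following. The WorkerContext structure and the use of the completion key to carry the worker fiber are my own simplifications, and all of the robustness details are omitted.

#include <windows.h>

// Hypothetical per-work-item context: the fiber that will run the work item.
struct WorkerContext {
    LPVOID fiber;   // created elsewhere with CreateFiber()
};

// Each pool thread records its main fiber here so that worker fibers can
// switch back to it (the pool's suspend operation).
static thread_local LPVOID t_mainFiber = nullptr;

// Sketch of one pool thread.  It converts itself into the "main fiber" and
// then pulls completion packets; because fibers are cooperative, control
// returns here only when the worker explicitly switches back or finishes.
static DWORD WINAPI PoolThread(LPVOID param)
{
    HANDLE completionPort = static_cast<HANDLE>(param);
    t_mainFiber = ::ConvertThreadToFiber(nullptr);

    for (;;) {
        DWORD bytes = 0;
        ULONG_PTR key = 0;
        LPOVERLAPPED overlapped = nullptr;
        if (!::GetQueuedCompletionStatus(completionPort, &bytes, &key,
                                         &overlapped, INFINITE) &&
            overlapped == nullptr)
            break;  // the port was closed or the wait failed

        // In this sketch the completion key carries the worker to schedule.
        WorkerContext* worker = reinterpret_cast<WorkerContext*>(key);
        if (worker)
            ::SwitchToFiber(worker->fiber);  // returns when the worker yields
    }

    ::ConvertFiberToThread();
    return 0;
}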

The I/O completion port is also leveraged for asynchronous I/O purposes. If a worker fiber wants to initiate asynchronous I/O, it can bind itself to the completion port. Once the fiber initiates an asynchronous I/O request, it can suspend itself. When the I/O request completes, the fiber will be posted back to the completion port and will be scheduled as soon as a thread dequeues its packet.
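
And here is a sketch of the worker-fiber side of that hand-off. SuspendCurrentFiber() stands in for whatever suspend primitive the pool exposes; in this sketch it simply switches back to the thread's main fiber.

#include <windows.h>

// Assumed pool plumbing: each pool thread records its main fiber in a
// thread-local so that worker fibers can switch back to it.
static thread_local LPVOID t_mainFiber = nullptr;

static void SuspendCurrentFiber()
{
    ::SwitchToFiber(t_mainFiber);   // resume when our packet is dequeued again
}

// Sketch of a worker fiber issuing an asynchronous read.  The handle is
// assumed to be bound to the pool's completion port with a key identifying
// this fiber, so its completion packet will reschedule us.
bool ReadSuspending(HANDLE file, void* buffer, DWORD size, OVERLAPPED* ov)
{
    if (!::ReadFile(file, buffer, size, nullptr, ov) &&
        ::GetLastError() != ERROR_IO_PENDING)
        return false;  // the read failed outright

    // The request is in flight (or finished synchronously, in which case a
    // packet is still posted on older Windows -- see Part II above).  Yield
    // the thread back to the main fiber until our completion packet arrives.
    SuspendCurrentFiber();

    DWORD transferred = 0;
    return ::GetOverlappedResult(file, ov, &transferred, FALSE) != FALSE;
}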

Of course there are several minutiae that I have not gone into here. There are numerous conditions that must be handled correctly in order to ensure robustness and correctness.


Pitfalls

Perhaps this section should be called, "The Sarlacc Pit." There are many reasons that the uses of the fibrous thread pool (and fibers in general) are limited:

  • Until Windows Server 2003, FPU context could not be saved. I can't see this as being a big deal, but if you do any floating-point arithmetic then you're out of luck.
  • In general, code that uses TLS will break. The code would need to be modified to use fiber local storage (FLS), which requires Windows Server 2003. Considering one of the applications of this exercise was to avoid extensively modifying synchronous code, we kind of blew it with this one.
  • Corollary: The C runtime library uses TLS unless you compiled using Visual C++ 2003 (or newer) and are running on an FLS enabled OS. The CRT and C++ exceptions will give you problems unless the version requirements are met.
  • Third party libraries are not likely to be fiber-safe. COM is not fiber-safe.
  • CRITICAL_SECTIONs, mutex objects, or any other synchronization / mutual exclusion mechanism that tracks the owning thread will not work correctly. To a critical section, two different fibers running on top of the same thread appear to be the same unit of execution. Since critical sections allow recursive entry, they can allow multiple fibers inside at once (see the sketch after this list). The mechanisms above would also experience difficulties if the holding fiber were rescheduled to a different thread. Other mechanisms that don't track the owning thread might work, but remember that they will block the fiber's underlying thread. If there is only one thread in the pool, the other fibers will not be able to execute.
  • Code that depends on thread handles or thread IDs to be unique for different contexts will not work because multiple fibers will be sharing the same thread.
  • This one goes without saying, but I'll say it anyway: GUI code will not like fibers. Mind you, GUI code doesn't like multithreading either; windows have thread affinity. If you're doing GUI stuff with a thread pool (WTF?), adding fibers would be the least of your worries.
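
To make the critical section pitfall concrete, here is a tiny sketch of the failure mode: two fibers running on the same thread both walk into the same CRITICAL_SECTION, because ownership is tracked by thread ID and recursive entry is allowed.

#include <windows.h>
#include <cstdio>

// Sketch of the critical-section pitfall: EnterCriticalSection tracks
// ownership by thread ID and permits recursive entry, so a second fiber
// running on the same thread enters a region the first fiber still holds.
static CRITICAL_SECTION g_cs;
static LPVOID g_mainFiber;

static VOID CALLBACK FiberProc(LPVOID param)
{
    const char* name = static_cast<const char*>(param);
    ::EnterCriticalSection(&g_cs);          // "succeeds" for both fibers
    std::printf("%s is inside the critical section\n", name);
    ::SwitchToFiber(g_mainFiber);           // yield while still holding it
    ::LeaveCriticalSection(&g_cs);
    ::SwitchToFiber(g_mainFiber);           // never return from a fiber proc
}

int main()
{
    ::InitializeCriticalSection(&g_cs);
    g_mainFiber = ::ConvertThreadToFiber(nullptr);

    LPVOID a = ::CreateFiber(0, FiberProc, const_cast<char*>("Fiber A"));
    LPVOID b = ::CreateFiber(0, FiberProc, const_cast<char*>("Fiber B"));

    ::SwitchToFiber(a);  // A enters the critical section, then yields
    ::SwitchToFiber(b);  // B also enters it -- no exclusion between fibers
    ::SwitchToFiber(a);  // let both unwind
    ::SwitchToFiber(b);

    ::DeleteFiber(a);
    ::DeleteFiber(b);
    ::DeleteCriticalSection(&g_cs);
    return 0;
}
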
Part II of this topic is available here.

Release 7.0; Copyright © 1996-2012 Aaron Klotz. All Rights Reserved.