Did some more work on the DirectSound driver, but I'm still far from happy with it. As I have implemented it at the moment:
Set up n (in this case …) (secondary) buffers and events. When a sound is played it is put into a buffer (pretty big buffers) and a notify event is set for the end of the sound (DirectSound has no concept of a sound ending, only of buffer size). When the event fires I "free" the buffer for more sounds.
The events are checked every 300 ms to see whether any sounds have finished.
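To make the bookkeeping above concrete, here is a minimal sketch in portable C. It is not the actual driver code: `SlotPool`, `pool_acquire`, `pool_signal_finished` and `pool_poll` are hypothetical names, `NUM_SLOTS` is a placeholder for n, and the `finished` flag only stands in for the notify event that a real DirectSound buffer would signal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_SLOTS 4 /* hypothetical pool size, standing in for n */

typedef struct {
    bool busy;     /* true while a sound is playing from this slot */
    bool finished; /* stand-in for the signalled end-of-sound event */
} SoundSlot;

typedef struct {
    SoundSlot slots[NUM_SLOTS];
} SlotPool;

/* Mark every slot as free. */
void pool_init(SlotPool *pool)
{
    for (size_t i = 0; i < NUM_SLOTS; i++) {
        pool->slots[i].busy = false;
        pool->slots[i].finished = false;
    }
}

/* Claim the first free slot for a new sound; returns -1 if all are busy. */
int pool_acquire(SlotPool *pool)
{
    for (size_t i = 0; i < NUM_SLOTS; i++) {
        if (!pool->slots[i].busy) {
            pool->slots[i].busy = true;
            pool->slots[i].finished = false;
            return (int)i;
        }
    }
    return -1;
}

/* Called at the point where the real driver's event would be signalled. */
void pool_signal_finished(SlotPool *pool, int slot)
{
    assert(slot >= 0 && slot < NUM_SLOTS);
    pool->slots[slot].finished = true;
}

/* The periodic check: free every busy slot whose event has fired.
 * Returns the number of slots freed by this call. */
int pool_poll(SlotPool *pool)
{
    int freed = 0;
    for (size_t i = 0; i < NUM_SLOTS; i++) {
        SoundSlot *s = &pool->slots[i];
        if (s->busy && s->finished) {
            s->busy = false;
            s->finished = false;
            freed++;
        }
    }
    return freed;
}
```

In the driver itself, `pool_poll` would be what runs on the 300 ms check, and a sound that arrives while `pool_acquire` returns -1 would have to be dropped or queued.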
Now the problem is that I don't like this approach. The buffers are way too big (each sound must fit in full, as they aren't streamed) and I'm continually checking events to see whether a buffer was freed.
I don't like throwing out the buffers after the sound has played (memory fragmentation), but I can't keep all sounds in memory either (80/90 samples?).
We could go the current way, where OpenTTD itself mixes the sounds and streams them to the buffer. I've also tried this, but it didn't really work out: either very choppy sound and high CPU usage due to frequent buffer streaming (small buffer size), or lagging sounds (due to a bigger buffer size of 1-2 seconds).
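The trade-off can be put into numbers with a small sketch. This is plain C with no DirectSound calls. `buffer_latency_ms` and `mix_into` are hypothetical helpers, not OpenTTD's mixer, and the format in the note after the code (22050 Hz, 16-bit stereo) is only an assumption for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Worst-case delay in milliseconds before freshly mixed audio is heard,
 * assuming the whole buffer is queued ahead of the play position. */
uint32_t buffer_latency_ms(uint32_t buffer_bytes, uint32_t sample_rate,
                           uint32_t channels, uint32_t bytes_per_sample)
{
    uint32_t bytes_per_second = sample_rate * channels * bytes_per_sample;
    assert(bytes_per_second > 0);
    return (uint32_t)(((uint64_t)buffer_bytes * 1000u) / bytes_per_second);
}

/* Mix `count` 16-bit samples from src into dst, clamping instead of
 * letting the sum wrap around. */
void mix_into(int16_t *dst, const int16_t *src, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        int32_t sum = (int32_t)dst[i] + (int32_t)src[i];
        if (sum > INT16_MAX) sum = INT16_MAX;
        if (sum < INT16_MIN) sum = INT16_MIN;
        dst[i] = (int16_t)sum;
    }
}
```

Under the assumed format one second of audio is 88200 bytes, so `buffer_latency_ms(88200, 22050, 2, 2)` gives 1000 ms, which matches the lag I get with the big buffer. A 4096-byte buffer gives about 46 ms, but then it has to be refilled more than 20 times per second.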
If anyone has ideas or has written DirectSound code for games, I'd like to hear from you.