Since we've brought up the topic, I'll mention that I've been doing all my HALs with template classes. I've been very satisfied with it as a technique, and I can change processors/boards/etc. smoothly, just by changing the type the template is instantiated with. Since templates are resolved by the compiler at compile time, I don't think it adds runtime overhead either. You do need to do a couple of extra things, though, such as declaring your interrupt handlers extern "C" so their names don't get mangled, and wrapping many 3rd-party C headers in extern "C" {} blocks so the C++ compiler will chew on them.
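To make that concrete, here's a rough sketch of the shape I mean. It's not lifted from a real project: BoardA, BoardB, Hal, port(), kLedMask and hostDemo are all made-up names, the "register" is just a plain variable so it builds on a desktop compiler, and SysTick_Handler is only an assumed handler name (use whatever your startup code expects).

```cpp
#include <cassert>
#include <cstdint>

// A third-party C header would be pulled in like this (hypothetical name):
//   extern "C" {
//   #include "vendor_device.h"
//   }

// Hypothetical board descriptions. On a real target, port() would return a
// reference to a memory-mapped register; here it is an ordinary variable.
struct BoardA {
    static std::uint32_t& port() { static std::uint32_t reg = 0; return reg; }
    static constexpr std::uint32_t kLedMask = 1u << 5;
};

struct BoardB {
    static std::uint32_t& port() { static std::uint32_t reg = 0; return reg; }
    static constexpr std::uint32_t kLedMask = 1u << 13;
};

// The HAL is written once, against whatever 'Board' provides.
template <typename Board>
class Hal {
public:
    static void setPin()   { Board::port() |= Board::kLedMask; }
    static void clearPin() { Board::port() &= ~Board::kLedMask; }
    static void onTick()   { ++ticks(); }
    static std::uint32_t& ticks() { static std::uint32_t n = 0; return n; }
};

// Retargeting is a one-line change: swap BoardA for BoardB.
using ActiveHal = Hal<BoardA>;

// C linkage keeps the name unmangled so it can match the vector table entry.
extern "C" void SysTick_Handler(void) { ActiveHal::onTick(); }

// Host-side check standing in for real hardware.
void hostDemo() {
    ActiveHal::setPin();
    assert(BoardA::port() == BoardA::kLedMask);
    ActiveHal::clearPin();
    assert(BoardA::port() == 0u);
}
```

Everything in Hal is static and resolved at compile time, so with optimization on I'd expect the calls to inline down to the same register writes you'd have written by hand, though it's worth checking the disassembly on your own toolchain.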
Another nice aspect is that if you put the HAL in one file and the behavior in another, all the processor-specific includes end up in a single file, which can really un-yuck your hierarchy sometimes. It also lets you hide hardware details (such as inverted drive for LEDs) behind a programmer-centric behavior (i.e. 1 = on always, regardless of whether the LED is connected to Gnd or Vcc), which can make things nicer for teams with non-hw-savvy programmers on them.
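The LED polarity thing might look something like the sketch below. Again, this is only an illustration under my own assumptions: FakePin, Drive, Led, StatusLed and ledDemo are invented names, and the pin is a bool so it runs on a PC.

```cpp
#include <cassert>

// Hypothetical pin type: a bool stands in for the physical pin level.
struct FakePin {
    static bool& level() { static bool v = false; return v; }
    static void write(bool high) { level() = high; }
};

// How the LED is wired: other leg to Gnd (active high) or to Vcc (active low).
enum class Drive { ActiveHigh, ActiveLow };

template <typename Pin, Drive Polarity>
class Led {
public:
    // Callers always pass true for "lit"; the inversion is decided here,
    // at compile time, from the Polarity template argument.
    static void set(bool lit) {
        Pin::write(Polarity == Drive::ActiveHigh ? lit : !lit);
    }
    static void on()  { set(true); }
    static void off() { set(false); }
};

// The board file is the only place that knows the wiring.
using StatusLed = Led<FakePin, Drive::ActiveLow>;

// Host-side check: "on" drives the pin low for an active-low LED.
void ledDemo() {
    StatusLed::on();
    assert(FakePin::level() == false);
    StatusLed::off();
    assert(FakePin::level() == true);
}
```

The behavior code only ever calls StatusLed::on() and StatusLed::off(), so if the next board revision flips the LED to the other rail, the fix is the one using line.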