A bit of an update.
Using the method I described above, it is possible to have gcc generate code that will execute in real mode (and presumably v8086 mode), but only on a 386 or above, because it still insists on using the 32-bit registers, even for 2-byte-wide variables. Be prepared to get lots of 0x66 (operand-size) and 0x67 (address-size) prefixes.
Strictly speaking, it is gas that produces the 16-bit code: the asm statement just inserts the .code16gcc line at the top of the intermediate assembly file.
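For illustration, a minimal sketch of what such a source file might look like. The REAL_MODE_BUILD macro and the triple() function are hypothetical names made up for this example. The guard is only there so that the same file can still be built and tested as an ordinary hosted program.

```c
/* REAL_MODE_BUILD is a hypothetical macro for this sketch, not a gcc
   feature. Define it only when building the real-mode object. */
#ifdef REAL_MODE_BUILD
/* File-scope asm: the directive is copied into the assembly that gcc
   hands to gas, so it has to appear before any code in the file. */
__asm__(".code16gcc\n");
#endif

/* Hypothetical example function. gcc compiles it exactly as it would for
   a 32-bit target; only the encoding chosen by gas changes. */
int triple(int x)
{
    return 3 * x;
}
```

Assuming a 32-bit x86 target, something like `gcc -m32 -O0 -DREAL_MODE_BUILD -c file.c` would then give an object whose code is encoded for 16-bit mode. Without the macro it is just a normal C file.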
In addition, gcc itself is completely unaware of segments. Therefore, if you wish to access any memory outside the segments that ds/es are set to, you need to create your own farpeek/farpoke functions with inline asm (see [wiki]Inline_Assembly/Examples[/wiki] - I can't seem to link to this, it's something to do with the slash in the title).
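A rough sketch of what such helpers could look like, loosely in the spirit of the wiki examples rather than copied from them. The names farpeekb, farpokeb and linear_address, and the choice of fs as the scratch segment register, are assumptions for this example. The asm part is only compiled for 32-bit x86 targets, and it is only meaningful in an environment where loading fs with an arbitrary value is allowed (real mode, for instance).

```c
#include <stdint.h>

/* Real-mode address arithmetic: linear = segment * 16 + offset.
   Hypothetical helper, handy for checking which seg:off pair to use. */
static inline uint32_t linear_address(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}

#if defined(__i386__)
/* Read one byte from seg:off. fs is saved, loaded with seg, used as a
   segment override, then restored, so ds/es are left untouched. */
static inline uint8_t farpeekb(uint16_t seg, uint16_t off)
{
    uint8_t value;
    __asm__ volatile (
        "push %%fs\n\t"
        "mov  %1, %%fs\n\t"
        "movb %%fs:(%2), %0\n\t"
        "pop  %%fs"
        : "=q"(value)                      /* q: register with a byte form */
        : "r"(seg), "r"((uint32_t)off)
        : "memory");
    return value;
}

/* Write one byte to seg:off, using fs in the same way. */
static inline void farpokeb(uint16_t seg, uint16_t off, uint8_t value)
{
    __asm__ volatile (
        "push %%fs\n\t"
        "mov  %0, %%fs\n\t"
        "movb %2, %%fs:(%1)\n\t"
        "pop  %%fs"
        :
        : "r"(seg), "r"((uint32_t)off), "q"(value)
        : "memory");
}
#endif
```

For example, farpokeb(0xB800, 0, 'A') would write to linear address 0xB8000, which is where colour text-mode video memory normally lives.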
As a small test, I compiled the following, both with and without the .code16gcc header. I attach the disassembly for reference. No optimisations were used in the compilation, otherwise the code would have been reduced to almost nothing:
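int add2(int a);
/* The enclosing function did not survive in the post; main is assumed. */
int main(void) {
    int a = 31;
    int b = add2(a);
    return b;
}
int add2(int a) {
    return a + 2;
}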
If you use short ints instead of ints then the code produced still uses 32-bit registers with prefixes, but the values on the stack are 2-byte aligned.
The .code16gcc directive can be changed to .code16, which produces similar output, except that it does not necessarily manipulate the stack pointer in the same way as gcc does, making it more difficult to walk the stack from a 32-bit monitor.
The disassembly, produced by the NASM disassembler (ndisasm), is attached.