MZ and PM16

Posts: 274
Joined: Fri Nov 17, 2006 5:26 am


Post by kerravon »

I had previously given up on MZ because I thought it was not technically possible to use it to migrate to PM16. I was going to use NE instead and be compatible between MSDOS 4 European and OS/2 1.x, or maybe only the latter. And this is for both 8086 and 80286, i.e. OS/2 1.x on 8086.

However, work on trying to create a mini Atari 68000 clone led to a new idea. Not all of this came from me (some of it came from previous discussion in this forum), and it's all tentative.

So the goal is to write an 8086 program circa 1984 that will survive the transition to PM16. But the PM16 target will be PDOS/286 or PDOS/PM16 with a linear address space for a single process/whatever you call it. A valid MSDOS program like microemacs will suddenly have access to nearly 16 MiB of memory (or much more when PM16 on the 80386 comes out). I am especially interested in the huge memory model, including using the industry standard Microsoft C 5.1 - now available for free under the MIT license (binary only) - which can run on a Book 8088.

Here is the MZ header:

typedef struct {
    /* Comments give each field's byte offset in decimal, then hex. */
    unsigned char  magic[2];            /*  0 00  "MZ" or "ZM" */
    unsigned short num_last_page_bytes; /*  2 02  page = 512 bytes */
    unsigned short num_pages;           /*  4 04 */
    unsigned short num_reloc_entries;   /*  6 06 */
    unsigned short header_size;         /*  8 08  in paragraphs (16 bytes) */
    unsigned short min_alloc;           /* 10 0A */
    unsigned short max_alloc;           /* 12 0C */
    unsigned short init_ss;             /* 14 0E */
    unsigned short init_sp;             /* 16 10 */
    unsigned short checksum;            /* 18 12 */
    unsigned short init_ip;             /* 20 14 */
    unsigned short init_cs;             /* 22 16 */
    unsigned short reloc_tab_offset;    /* 24 18 */
    unsigned short overlay;             /* 26 1A */
    unsigned short reserved1[4];        /* 28 1C  first set of reserved words */
    unsigned short oem_id;              /* 36 24 */
    unsigned short oem_info;            /* 38 26 */
    unsigned short reserved2[10];       /* 40 28  second set of reserved words */
    unsigned long  e_lfanew;            /* 60 3C  offset to the PE header */
} Mz_hdr;
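As a quick sanity check on that layout, here is a minimal sketch that pulls fields out of a raw header buffer and computes the size of the load module. It reads bytes directly instead of overlaying the struct, so it does not depend on structure packing or on `unsigned long` being 32 bits. The helper names `mz_word` and `mz_load_size` are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Read a little-endian 16-bit word at byte offset `off` of a raw MZ header. */
static unsigned int mz_word(const unsigned char *hdr, size_t off)
{
    assert(hdr != NULL);
    return (unsigned int)hdr[off] | ((unsigned int)hdr[off + 1] << 8);
}

/* Size in bytes of the load module (file image minus the header).
   num_pages counts 512-byte pages including a partial last page;
   num_last_page_bytes == 0 means the last page is completely used. */
static unsigned long mz_load_size(const unsigned char *hdr)
{
    unsigned long pages = mz_word(hdr, 4);               /* num_pages */
    unsigned long last  = mz_word(hdr, 2);               /* num_last_page_bytes */
    unsigned long hsize = (unsigned long)mz_word(hdr, 8) * 16UL; /* header_size */
    unsigned long total = pages * 512UL;

    if (last != 0 && pages != 0)
        total -= 512UL - last;      /* trim the unused tail of the last page */
    if (total < hsize)
        return 0;                   /* malformed header */
    return total - hsize;
}
```

For example, a header with 3 pages, 256 bytes used in the last page and a 2-paragraph header describes 3*512 - 256 - 32 = 1248 bytes of load module.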

I don't think I should populate e_lfanew - that should be reserved for when a 32-bit Win32 version of my program comes along, and someone wants to have it multiplatform or whatever the term is. So they can have a PE signature and override my 16-bit version (that works on standard MSDOS 2.0 or PDOS/286).

So maybe use the field before e_lfanew instead, i.e. the last word of reserved2 at offset 58 (0x3A) - ideally something that doesn't clash with anyone else's MZ extensions.
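A loader-side sketch of picking up that word might look like the following. The field position (0x3A) follows from the struct, but everything else is an assumption for illustration only: the name `MZ_EXT_FIELD`, the rule that zero means "stock MZ, no extension", and the choice to count the value in 16-byte paragraphs from the start of the file (a plain byte offset in a 16-bit word could not reach past 64 KiB).

```c
#include <assert.h>
#include <stddef.h>

/* Last word of reserved2[], immediately before e_lfanew (hypothetical use). */
#define MZ_EXT_FIELD 0x3A

/* Return the file offset of the extension block, or 0 if the reserved word
   is zero, which is what an ordinary MZ executable would contain.
   Assumption: the word counts 16-byte paragraphs from the start of the file. */
static unsigned long mz_ext_offset(const unsigned char *hdr)
{
    unsigned int paras;

    assert(hdr != NULL);
    paras = (unsigned int)hdr[MZ_EXT_FIELD]
          | ((unsigned int)hdr[MZ_EXT_FIELD + 1] << 8);
    return (unsigned long)paras * 16UL;
}
```

A plain MSDOS loader never looks at this word, so the executable stays valid there whatever the value is.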

Ok, so I believe there are two ways that linkers can resolve calls in medium/large memory models.

1. Keep the offset to a minimum, and change the segment.
2. Minimize changes to the segment: keep the same segment until adding another object module's code would push the offset past the 0xffff limit, and only then start a new segment.

Number 2, if used - even by whim - and I think Watcom does that for its MSDOS executables - has already created a segmented executable like NE needs. However, the segment markers have been lost in MZ because no-one cares. But now I care.
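To make the difference between the two strategies concrete, here is a small sketch of how the same linear address (relative to the load base) ends up as a segment:offset pair under each one. This is my reading of the two approaches, not taken from any particular linker, and the type and function names are invented for the example.

```c
#include <assert.h>

struct farptr { unsigned int seg; unsigned int off; };

/* Strategy 1: smallest possible offset; the segment value absorbs
   everything above the low 4 bits. */
static struct farptr far_min_offset(unsigned long linear)
{
    struct farptr p;

    assert(linear < 0x100000UL);    /* 8086 address space is 1 MiB */
    p.seg = (unsigned int)(linear >> 4);
    p.off = (unsigned int)(linear & 0xFUL);
    return p;
}

/* Strategy 2: keep the segment fixed at the start of the containing
   segment and let the offset grow, up to 0xFFFF. */
static struct farptr far_fixed_segment(unsigned long linear,
                                       unsigned int seg_base)
{
    struct farptr p;
    unsigned long base = (unsigned long)seg_base << 4;

    assert(linear >= base && linear - base <= 0xFFFFUL);
    p.seg = seg_base;
    p.off = (unsigned int)(linear - base);
    return p;
}
```

With strategy 1 the address 0x12345 becomes 1234:0005; with strategy 2 and a segment starting at paragraph 0x1000 it becomes 1000:2345. Only the second form yields a small set of distinct segment values that a PM16 loader could map one-to-one onto selectors.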

Rather than throw the baby (MZ) out with the bathwater and switch to NE, it can instead just be NE-inspired.

The new ex-reserved word can point to an extension that identifies each segment length and whether it is code or data.


0xfe00, 0x1 (code)
0xfc08, 0x1 (code)
0x3456, 0x1 (code)
0x235, 0x2 (data)

The relocation information is not changed - it's normal - set assuming an 8086. There are no intersegment gaps - once again, normal 8086. The PM16 loader can move this stuff around, i.e. align each segment on a 64k boundary, and make the adjustments, with just the existing relocation information - even though it is totally inappropriate for the 80286. And only the segment needs to be changed, as usual.
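Here is a rough sketch of how a PM16 loader might use the segment table when it processes a relocation entry. It makes several assumptions that are not settled above: each segment starts on the next 16-byte boundary after the previous one, every relocated segment value is exactly the start of some segment (linker strategy 2), and selectors are handed out 8 apart starting from `first_sel`. The names `seg_entry`, `seg_start_para` and `seg_to_selector` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SEG_CODE 0x1
#define SEG_DATA 0x2

struct seg_entry {
    unsigned int length;   /* segment length in bytes */
    unsigned int type;     /* SEG_CODE or SEG_DATA */
};

/* Paragraph (relative to the load base) where segment `idx` starts,
   assuming segments are packed with only paragraph-alignment padding. */
static unsigned int seg_start_para(const struct seg_entry *tab, size_t idx)
{
    unsigned long para = 0;
    size_t i;

    assert(tab != NULL);
    for (i = 0; i < idx; i++)
        para += ((unsigned long)tab[i].length + 15UL) >> 4;
    return (unsigned int)para;
}

/* Map a link-time (8086) segment value to a PM16 selector.
   Returns 0, the null selector, if the value is not a segment start. */
static unsigned int seg_to_selector(const struct seg_entry *tab, size_t n,
                                    unsigned int rm_seg,
                                    unsigned int first_sel)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (seg_start_para(tab, i) == rm_seg)
            return first_sel + (unsigned int)i * 8U;
    return 0;
}
```

With the four example lengths above, the segments start at paragraphs 0x0, 0xfe0, 0x1fa1 and 0x22e7, so a relocated word holding 0x1fa1 would be rewritten to the third selector. A value that matches no segment start means the executable was not linked in the expected way and the loader should refuse it.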

The next thing that is needed is support for Microsoft C's AHINCR/AHSHIFT. These only occur with huge memory model (or huge pointers, anyway).

So after the segment size information you have:

segment 1
offset 1
offset 2
offset 3
segment 2
offset 1
offset 2

segment 1
offset 1
offset 2

with all the places that need to be zapped on a PM16 environment - because the existing values are all set already, to values suitable for the 8086 - not appropriate for PM16.
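The zapping step itself could be as simple as the sketch below. The values are the conventional ones as far as I know: on the 8086 AHINCR is 0x1000 and AHSHIFT is 12, while in 80286 protected mode with consecutive selectors they are 8 and 3. The function name, the in-memory form of the offset list, and the assumption that every patch site holds the value as a 16-bit little-endian word are mine; a site that uses a byte-sized immediate would need a different patch.

```c
#include <assert.h>
#include <stddef.h>

#define AHINCR_REAL  0x1000U  /* 8086: add 0x1000 to the segment = +64 KiB */
#define AHSHIFT_REAL 12U
#define AHINCR_PM16  8U       /* 80286 PM: next selector is 8 higher */
#define AHSHIFT_PM16 3U

/* Overwrite the 16-bit word at each listed offset inside one segment.
   Offsets whose word would run past the end of the segment are skipped.
   Returns the number of words actually patched. */
static size_t zap_words(unsigned char *seg, size_t seg_len,
                        const unsigned int *offsets, size_t count,
                        unsigned int value)
{
    size_t i, done = 0;

    assert(seg != NULL && offsets != NULL);
    for (i = 0; i < count; i++) {
        size_t off = offsets[i];

        if (off + 1 >= seg_len)
            continue;
        seg[off]     = (unsigned char)(value & 0xFFU);
        seg[off + 1] = (unsigned char)((value >> 8) & 0xFFU);
        done++;
    }
    return done;
}
```

The loader would call this once per segment for the AHINCR sites and once for the AHSHIFT sites, after the segment has been copied to its final place.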

Additionally, the PM16 loader will detect this "new format MZ" and set the first 4 bytes of BSS not to zero, but to a far pointer to a structure containing callback functions - mainly just two functions - an int86 and an int86x.

The MZ application is expected to call int86 to do its work. (I'm only trying to support new MZ executables - new rules for new executables - not trying to make existing programs run somewhere other than MSDOS.) int86 checks a flag (that variable in BSS): if it is non-zero, it does a callback, otherwise it executes an INT instruction. This means a PM16 environment doesn't need to have interrupt handlers, nor does some usermode clone need the privilege to intercept real INT instructions. The application itself is supposed to gracefully return control to the person/system/OS/util who/that loaded the executable.
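The dispatch in int86 might look roughly like this. It is a portable sketch so that it can be compiled anywhere: `struct regs16` stands in for `union REGS` from `<dos.h>`, `pm16_callbacks` stands in for the variable at the start of BSS, `real_int86` is a placeholder for the path that would execute the INT instruction, and `fake_os_int86` plays the part of a callback supplied by a PM16 loader. All of these names and the structure layout are assumptions.

```c
#include <assert.h>
#include <stddef.h>

struct regs16  { unsigned int ax, bx, cx, dx, si, di, cflag; };
struct sregs16 { unsigned int es, cs, ss, ds; };

struct os_callbacks {   /* hypothetical layout */
    int (*int86)(int intno, const struct regs16 *in, struct regs16 *out);
    int (*int86x)(int intno, const struct regs16 *in, struct regs16 *out,
                  struct sregs16 *sregs);
};

/* Zero under plain MSDOS; set by the loader in a PM16 environment. */
static struct os_callbacks *pm16_callbacks = NULL;

/* Placeholder for the real-mode path, where the INT would be issued.
   In this sketch it just reports failure through the carry flag. */
static int real_int86(int intno, const struct regs16 *in, struct regs16 *out)
{
    (void)intno;
    *out = *in;
    out->cflag = 1;
    return (int)out->ax;
}

/* Example OS-side callback: answers INT 21h AH=30h (get DOS version). */
static int fake_os_int86(int intno, const struct regs16 *in, struct regs16 *out)
{
    *out = *in;
    out->cflag = 0;
    if (intno == 0x21 && (in->ax >> 8) == 0x30)
        out->ax = 0x0005;           /* AL = major 5, AH = minor 0 */
    return (int)out->ax;
}

/* The one place in the application with conditional execution. */
static int app_int86(int intno, const struct regs16 *in, struct regs16 *out)
{
    assert(in != NULL && out != NULL);
    if (pm16_callbacks != NULL)
        return pm16_callbacks->int86(intno, in, out);
    return real_int86(intno, in, out);
}
```

The rest of the C library would call `app_int86` everywhere and never test the flag itself, which keeps the check in a single place.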

There may be an existing executable format that does some/all of that, which means I could hook into that existing format (or a subset of that format). I don't think "NE" fits the bill - it's a replacement for MZ.

Also note that I'm not interested in self-relocating executables. I only want extra data (no code) - minimal extra data (for PM16) - added to a beautiful, pure, simple, stock-standard MZ executable. If Watcom was used as the compiler, or if no huge pointers are present, the extra information is trivial - a handful of bytes showing the segment lengths (to identify boundaries).

And nor do I want conditional execution plastered through the code - that one place in int86 would be fine though. Basically the OS nominally tells you which method it would like the application to use to interact with it. It's still MSDOS INT 21H calls - but they don't result in a real interrupt. This is not the same as OS/2 1.x where it's the other way around - OS/2 dictates the API and you only have a subset available if you are using an MSDOS system. Not sure what else Family API involves - but it's not a simple, pure, MZ executable - it's NE that is smart enough to cope with being run under MSDOS. I think that's the situation, anyway.

Note that one of the things I do is use as86 and ld86 to produce MZ executables, and they use a.out as the intermediate format. I thought that might have been technically impossible, but nope, it was able to cope WITH THE SUBSET I am willing to live with. So I was wondering how AHINCR/AHSHIFT worked, and it appears that those go into the a.out symbol table as values. That was just from a quick look at the a.out from dossupa.asm - I may be mistaken.

The above proposal will require support from the linker, i.e. ld86, to organize an appropriate BSS variable. I'm not exactly sure how that will work. Presumably there is some special name it needs to detect. It will need to detect AHINCR/AHSHIFT anyway.

Any thoughts?

Is there any existing executable format that suits all my needs without being overly complicated by unrelated stuff (like DLLs)?

I don't mind the more complicated format so long as I can use a subset without too much fuss.

Oh - other schemes other than the first 4 bytes of BSS being non-zero are possible - something in the PSP? - suggestions?

Thanks. Paul.