OSDev.org
https://forum.osdev.org/

COM File, where is data/code located?
https://forum.osdev.org/viewtopic.php?f=1&t=32286
Page 1 of 3

Author:  Postmann [ Tue Aug 08, 2017 5:16 pm ]
Post subject:  COM File, where is data/code located?

Hello :)
I have a problem loading DOS .COM binaries. How do I know where the data is located in the file, and where the code is? I always thought the data would be located at the end, which holds when my code looks like this:
Code:
section .code
   les bx, [ptrstr]
   ret
section .data
   str2 db 'Hello Hallo Hola. $'
   ptrstr dd str2

But when I swap the sections, the code ends up at the end of the binary (assembled with NASM):
Code:
section .data
   str2 db 'Hello Hallo Hola. $'
   ptrstr dd str2
section .code
   les bx, [ptrstr]
   ret

In my execution handler, I simply "call" the address where the binary was loaded. But this doesn't work when the data comes first.
Any ideas?

Author:  alexfru [ Tue Aug 08, 2017 6:20 pm ]
Post subject:  Re: COM File, where is data/code located?

Code and data can be anywhere in the file. The only requirement is that the file begins with an instruction where execution starts. It's often a jump to some other code. Between the jump and that code there can be code or data. DOS .EXE files have a file header, which tells where the first executable instruction is in the file and it doesn't have to be right after the header. But even in .EXEs there's no requirement on how code and data are arranged. You can have any sequence of code and data, e.g. code, data, code again, data again, etc. DOS executables were designed to run on processors without memory protection and DOS did not support any kind of virtual memory, so there was never a need to distinguish code from data, load code or data on demand, share pieces of code or data (DLLs), etc.
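
To make the "jump at the start" layout concrete, here is a minimal C sketch that checks whether an image begins with a JMP and, if so, where it lands. The function name `com_first_target` and the constant `COM_LOAD_OFFSET` are made up for this illustration; only the two direct JMP encodings (EB rel8 and E9 rel16) are handled, and anything else is treated as "execution starts at offset 0x100".

```c
#include <stddef.h>
#include <stdint.h>

/* DOS places a .COM image at offset 0x100 of its segment (after the PSP). */
#define COM_LOAD_OFFSET 0x100u

/*
 * Hypothetical helper: if the image starts with a direct JMP, return the
 * in-segment offset it jumps to; otherwise return 0x100, the entry point.
 */
uint16_t com_first_target(const uint8_t *image, size_t size)
{
    if (size >= 2 && image[0] == 0xEB) {            /* jmp short rel8 */
        int8_t rel = (int8_t)image[1];              /* sign-extended displacement */
        return (uint16_t)(COM_LOAD_OFFSET + 2 + rel);
    }
    if (size >= 3 && image[0] == 0xE9) {            /* jmp near rel16 */
        uint16_t rel = (uint16_t)(image[1] | (image[2] << 8));
        return (uint16_t)(COM_LOAD_OFFSET + 3 + rel);   /* wraps at 64 KiB */
    }
    return COM_LOAD_OFFSET;
}
```

For an image such as `EB 05 'H' 'e' 'l' 'l' 'o' C3`, the jump skips the five data bytes and lands on the `ret` at offset 0x107. Everything between the jump and its target may be code or data; the file itself does not say which.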

Author:  Postmann [ Tue Aug 08, 2017 6:39 pm ]
Post subject:  Re: COM File, where is data/code located?

But how do I check what's data and what's code?
My loader converts the 16-bit real-mode code to 32-bit protected-mode code, since I don't want to play with vm8086. But it can't distinguish code from data. :?

Author:  mariuszp [ Tue Aug 08, 2017 6:57 pm ]
Post subject:  Re: COM File, where is data/code located?

Postmann wrote:
But how do I check what's data and what's code?
My loader converts the 16-bit real-mode code to 32-bit protected-mode code, since I don't want to play with vm8086. But it can't distinguish code from data. :?

You can't tell. Code is just a special type of data, and COM files do not make the distinction (in fact, they might even use data as code, etc.). What you're trying to do is impossible in the general case.
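
A small C sketch of why the bytes alone cannot answer the question: opcodes 0x40 to 0x5F are the one-byte `inc`/`dec`/`push`/`pop` instructions on the 16-bit registers, and the same values are the ASCII characters '@' to '_', which covers all uppercase letters. The function name `one_byte_op` is invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the mnemonic of a one-byte 16-bit instruction
 * in the range 0x40..0x5F, or NULL for any other byte. Every uppercase ASCII
 * letter falls inside this range, so plain text also decodes as valid code.
 */
const char *one_byte_op(uint8_t b)
{
    static const char *const names[32] = {
        "inc ax",  "inc cx",  "inc dx",  "inc bx",
        "inc sp",  "inc bp",  "inc si",  "inc di",
        "dec ax",  "dec cx",  "dec dx",  "dec bx",
        "dec sp",  "dec bp",  "dec si",  "dec di",
        "push ax", "push cx", "push dx", "push bx",
        "push sp", "push bp", "push si", "push di",
        "pop ax",  "pop cx",  "pop dx",  "pop bx",
        "pop sp",  "pop bp",  "pop si",  "pop di",
    };

    if (b < 0x40 || b > 0x5F)
        return NULL;
    return names[b - 0x40];
}
```

Feeding it the string "HAPPY" yields `dec ax`, `inc cx`, `push ax`, `push ax`, `pop cx`: a perfectly valid instruction sequence, even though the author meant it as text.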

Author:  Postmann [ Tue Aug 08, 2017 7:03 pm ]
Post subject:  Re: COM File, where is data/code located?

So, back to vm8086 I guess. Fun :(
Thanks anyway

Author:  zaval [ Tue Aug 08, 2017 7:04 pm ]
Post subject:  Re: COM File, where is data/code located?

mariuszp wrote:
Postmann wrote:
But how do I check what's data and what's code?
My loader converts the 16-bit real-mode code to 32-bit protected-mode code, since I don't want to play with vm8086. But it can't distinguish code from data. :?

You can't tell. Code is just a special type of data, and COM files do not make the distinction (in fact, they might even use data as code, etc.). What you're trying to do is impossible in the general case.

Really? What if I told you there are jump instructions? :D If one needs data to come before code in an unstructured file (like a COM file), he/she (<- yes, I am for diversity by both hands) needs to put a jump instruction in front of the data that jumps over it.

Author:  TheCool1Kevin [ Tue Aug 08, 2017 7:04 pm ]
Post subject:  Re: COM File, where is data/code located?

Why would you need to distinguish between code and data? Since COM is 16-bit, you need to set up a segment, and the whole program must fit within that segment. Let's say the COM file's segment starts at CS:0x0000; then you would execute a
Code:
call cs:0x100
to run the program and the program will terminate with a
Code:
ret
instruction. Note the offset of 0x100!
http://i.imgur.com/cuyliWz.png
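
For reference, a minimal C sketch of the segment layout that makes the plain `ret` work, written against a 64 KiB buffer rather than a real segment. It assumes the usual DOS convention: the PSP at offset 0 starts with `int 20h` (bytes CD 20), the file is copied to offset 0x100, and a zero word sits on top of the stack so that `ret` jumps to offset 0. The names `com_load` and `struct com_start` are invented for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SEG_SIZE        0x10000u   /* one 64 KiB real-mode segment */
#define COM_LOAD_OFFSET 0x100u     /* the image starts right after the PSP */

/* Hypothetical start state handed to whatever runs the program. */
struct com_start {
    uint16_t ip;   /* first instruction: offset 0x100 */
    uint16_t sp;   /* points at the zero return address */
};

/*
 * Lay out a .COM image inside `seg` (which must be SEG_SIZE bytes):
 *   0x0000  PSP, beginning with INT 20h (CD 20)
 *   0x0100  file contents, byte for byte
 *   0xFFFE  zero word, so a near RET lands on the INT 20h at offset 0
 * Returns 0 on success, -1 if the image does not fit.
 */
int com_load(uint8_t *seg, const uint8_t *image, size_t size,
             struct com_start *out)
{
    if (size > SEG_SIZE - COM_LOAD_OFFSET - 2)   /* keep room for the stack word */
        return -1;

    memset(seg, 0, SEG_SIZE);
    seg[0] = 0xCD;                               /* int 20h: terminate program */
    seg[1] = 0x20;
    memcpy(seg + COM_LOAD_OFFSET, image, size);
    seg[0xFFFE] = 0;                             /* return address 0x0000 */
    seg[0xFFFF] = 0;

    out->ip = COM_LOAD_OFFSET;
    out->sp = 0xFFFE;
    return 0;
}
```

The rest of a real PSP (command tail, memory size and so on) is left zeroed here; a loader that wants to run real programs would have to fill those fields in as well.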

Author:  Postmann [ Tue Aug 08, 2017 7:11 pm ]
Post subject:  Re: COM File, where is data/code located?

Ya, but during loading I am converting the 16-bit instructions to 32-bit (a sort of recompilation) and changing JMPs, CALLs and pointers, so I can execute them in protected mode.

Author:  BrightLight [ Tue Aug 08, 2017 7:27 pm ]
Post subject:  Re: COM File, where is data/code located?

Postmann wrote:
Ya, but during loading I am converting the 16-bit instructions to 32-bit (a sort of recompilation) and changing JMPs, CALLs and pointers, so I can execute them in protected mode.

There's no real reason to translate the 16-bit instructions of a COM binary into 32-bit instructions. If you insist on running COM binaries from protected mode, you really have two options.
  • Use v8086.
  • Write a software CPU implementation.

The first option is somewhat easier because the CPU can do most of the dirty work, and all you need to do is handle interrupts and other v8086 GPFs. However, you can't do this in 64-bit mode, because v8086 mode is not available in long mode. The second option is mostly a project of its own, and doesn't belong with OSDev, really. In both cases, you need to implement the DOS API (INT 0x21) functions.

Unless your goal is to run 16-bit binaries in 32-bit mode, this is mostly pointless and will never get anywhere.
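
To give an idea of the shape of the second option, here is a deliberately tiny C sketch of a fetch/decode/execute loop. It understands only three opcodes (`mov ah, imm8`, `mov dx, imm16`, `int imm8`) and two DOS services (INT 20h to exit, INT 21h with AH=09h to print a '$'-terminated string at DS:DX), which is just enough for a "hello" program. All names (`struct cpu`, `run`, `int21`) are made up for this example, output goes into a caller-supplied buffer instead of the screen, and segment registers are ignored by treating `mem` as the one 64 KiB segment.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CPU state: only the registers this sketch touches. */
struct cpu {
    uint16_t ax, dx, ip;
};

/* INT 21h, AH=09h: copy the '$'-terminated string at DX into `out`. */
static void int21(const struct cpu *c, const uint8_t *mem,
                  char *out, size_t cap, size_t *len)
{
    if ((c->ax >> 8) != 0x09)
        return;                                   /* other services: not handled */
    uint16_t p = c->dx;
    for (uint32_t n = 0; n < 0x10000 && mem[p] != '$'; n++, p++)
        if (*len + 1 < cap)
            out[(*len)++] = (char)mem[p];
    out[*len] = '\0';
}

/*
 * Run from c->ip until INT 20h. `mem` must be 65536 bytes and `cap` >= 1.
 * Returns 0 on a clean exit, -1 on an unknown opcode or after max_steps.
 */
int run(struct cpu *c, const uint8_t *mem, char *out, size_t cap,
        size_t max_steps)
{
    size_t len = 0;
    out[0] = '\0';
    for (size_t i = 0; i < max_steps; i++) {
        uint8_t op = mem[c->ip++];
        switch (op) {
        case 0xB4:                                /* mov ah, imm8 */
            c->ax = (uint16_t)((c->ax & 0x00FF) | (mem[c->ip++] << 8));
            break;
        case 0xBA:                                /* mov dx, imm16 */
            c->dx = (uint16_t)(mem[c->ip] | (mem[(uint16_t)(c->ip + 1)] << 8));
            c->ip += 2;
            break;
        case 0xCD: {                              /* int imm8 */
            uint8_t n = mem[c->ip++];
            if (n == 0x20)
                return 0;                         /* program terminated */
            if (n != 0x21)
                return -1;
            int21(c, mem, out, cap, &len);
            break;
        }
        default:
            return -1;                            /* opcode not implemented */
        }
    }
    return -1;
}
```

With the bytes `B4 09 BA 09 01 CD 21 CD 20 'H' 'i' '$'` placed at offset 0x100 and `ip` set to 0x100, `run` returns 0 and leaves "Hi" in the output buffer. A usable emulator needs the full instruction set, flags, segment registers and far more of the DOS API, which is why this is a project of its own.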

Author:  Postmann [ Tue Aug 08, 2017 7:54 pm ]
Post subject:  Re: COM File, where is data/code located?

I guess I will write an emulator for 16-bit code then. :?

Author:  mariuszp [ Tue Aug 08, 2017 8:13 pm ]
Post subject:  Re: COM File, where is data/code located?

zaval wrote:
mariuszp wrote:
Postmann wrote:
But how do I check what's data and what's code?
My loader converts the 16-bit real-mode code to 32-bit protected-mode code, since I don't want to play with vm8086. But it can't distinguish code from data. :?

You can't tell. Code is just a special type of data, and COM files do not make the distinction (in fact, they might even use data as code, etc.). What you're trying to do is impossible in the general case.

Really? What if I told you there are jump instructions? :D If one needs data to come before code in an unstructured file (like a COM file), he/she (<- yes, I am for diversity by both hands) needs to put a jump instruction in front of the data that jumps over it.

But that wasn't the question. The question was whether you can tell what is "data" and what is "code", and I answered that unless there is ancillary information, you can't tell. So you cannot write a program which, for the general case, translates 16-bit instructions into 32-bit ones while skipping over data (it would also be a mess even if it succeeded, since it would have to adjust all offsets, etc.).
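
The best a translator can do is follow control flow from the entry point and mark what it provably reaches. A minimal C sketch of that idea is below; it knows only a handful of opcodes, and the names `op_len` and `mark_code` are invented for this example. Bytes it leaves unmarked are not proven to be data: they may be code reached only through an indirect jump, a computed address, or self-modifying code, which is exactly where the general case breaks down.

```c
#include <stddef.h>
#include <stdint.h>

#define COM_LOAD_OFFSET 0x100u

/* Length of the few opcodes this sketch understands; 0 means "give up". */
static size_t op_len(uint8_t op)
{
    if (op >= 0x40 && op <= 0x5F) return 1;      /* inc/dec/push/pop r16 */
    if (op >= 0xB0 && op <= 0xB7) return 2;      /* mov r8, imm8 */
    if (op >= 0xB8 && op <= 0xBF) return 3;      /* mov r16, imm16 */
    switch (op) {
    case 0x90: case 0xC3: return 1;              /* nop, ret */
    case 0xCD: case 0xEB: return 2;              /* int imm8, jmp short */
    case 0xE8: case 0xE9: return 3;              /* call near, jmp near */
    default:              return 0;
    }
}

/*
 * Starting at file offset `pos`, set is_code[i] = 1 for every byte that is
 * reached by following direct jumps and calls. `is_code` must hold `size`
 * entries, all zero before the first call.
 */
void mark_code(const uint8_t *image, size_t size, size_t pos, uint8_t *is_code)
{
    while (pos < size && !is_code[pos]) {
        uint8_t op = image[pos];
        size_t len = op_len(op);
        if (len == 0 || pos + len > size)
            return;                              /* unknown or truncated */
        for (size_t i = 0; i < len; i++)
            is_code[pos + i] = 1;
        size_t next = pos + len;

        if (op == 0xC3)
            return;                              /* ret ends this path */
        if (op == 0xCD && image[pos + 1] == 0x20)
            return;                              /* int 20h terminates */
        if (op == 0xEB || op == 0xE9 || op == 0xE8) {
            uint16_t rel = (op == 0xEB)
                ? (uint16_t)(int8_t)image[pos + 1]
                : (uint16_t)(image[pos + 1] | (image[pos + 2] << 8));
            uint16_t target = (uint16_t)(COM_LOAD_OFFSET + next + rel);
            if (target < COM_LOAD_OFFSET)
                return;                          /* jumps out of the image */
            size_t tpos = target - COM_LOAD_OFFSET;
            if (op == 0xE8) {                    /* call: follow, then continue */
                mark_code(image, size, tpos, is_code);
                pos = next;
            } else {
                pos = tpos;                      /* jmp: continue at the target */
            }
            continue;
        }
        pos = next;
    }
}
```

On the image `EB 05 'H' 'e' 'l' 'l' 'o' B4 09 CD 20`, the walk marks the jump, the `mov ah, 9` and the `int 20h`, and leaves the five string bytes unmarked. Replace the first instruction with a jump through a register or a table, and the walk has nothing to follow.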

Author:  ~ [ Tue Aug 08, 2017 8:27 pm ]
Post subject:  Re: COM File, where is data/code located?

Wherever you want and wherever it works.

It's just a binary that needs to start with valid code, and you can do anything from there.

Author:  iansjack [ Tue Aug 08, 2017 11:51 pm ]
Post subject:  Re: COM File, where is data/code located?

Wouldn't it be easier just to compile the programs as 32-bit protected-mode code in the first place? Compiling them as real-mode code and converting them as you load them sounds crazy.

Author:  Postmann [ Tue Aug 08, 2017 11:58 pm ]
Post subject:  Re: COM File, where is data/code located?

I want to be able to execute 16-bit DOS programs though. I am going for the emulator now. :wink:

Author:  mallard [ Wed Aug 09, 2017 2:25 am ]
Post subject:  Re: COM File, where is data/code located?

omarrx024 wrote:
The second option is mostly a project of its own, and doesn't belong with OSDev, really.


If you're not averse to using existing, well-tested, third-party code in your OS (I'm not going to get into the debate about whether you should do this, but I have no issue with it; I see myself as the "architect" of my OS, not the "bricklayer"), the emulation option becomes by far the simplest and most compatible option. The "libx86emu" emulator is pretty trivial to port and has been in widespread use for some time (forming part of SciTech's graphics driver products and the driver layers of XFree86/Xorg; used extensively on non-x86 platforms to run video BIOS code). It's not exactly the most performant emulator ever, but there's not a lot of real-mode x86 code that actually needs to be run at full speed on a modern CPU.

Also, while it's more complex, it is possible to use V86 mode in a "mostly 64-bit" OS by switching to 32-bit pmode as an intermediate step. Unofficial patches exist to implement this on Linux. However, some recent CPUs have shipped with bugs in their V86 mode implementations, so I'd still recommend emulation as a more future-proof solution.

All times are UTC - 6 hours