[Sorry for giving some details here that don't really belong in this thread, but it will allow others to find them through Google and it may even help you. I need to document this on the wiki.]
[Edit: These details are now documented on the wiki under http://wiki.osdev.org/Creating_a_C_Library#Program_Initialization ]
crtbegin.o and crtend.o are provided by the compiler (you may need to configure GCC to enable them). Note that you cannot provide a real implementation of these object files yourself. Their purpose is to run global constructors and destructors (from C++, or from functions using the constructor function attribute) during process initialization and termination. GCC does this by maintaining tables of constructors and destructors internally, in a manner that doesn't allow anyone but GCC to link to them. Instead, GCC places code that uses these tables in the .init and .fini sections of crtbegin.o and crtend.o. However, this code is just emitted as call instructions and doesn't reside in a function.
The missing piece is provided by the standard library, through the object files crti.o and crtn.o. The key is that GCC makes sure to link the files in this order: crti.o, crtbegin.o, your-program.o, crtend.o, crtn.o. Because the linker concatenates same-named sections in that order, the .init pieces of these files end up back to back, and likewise the .fini pieces. The idea is that crtbegin.o and crtend.o provide the bodies of the constructor/destructor functions _init and _fini, but neither the symbols themselves nor the return instructions; crti.o supplies the symbol and the function prologue, and crtn.o supplies the epilogue and the return.
Hence a crti.s implementation will simply be (x86_64):
.section .init
.global _init
_init:
    pushq %rbp
    movq %rsp, %rbp
    /* gcc will nicely put the contents of crtbegin.o's .init section here. */
.section .fini
.global _fini
_fini:
    pushq %rbp
    movq %rsp, %rbp
    /* gcc will nicely put the contents of crtbegin.o's .fini section here. */
and a simple implementation of crtn.s will be (x86_64):
.section .init
    /* gcc will nicely put the contents of crtend.o's .init section here. */
    popq %rbp
    ret
.section .fini
    /* gcc will nicely put the contents of crtend.o's .fini section here. */
    popq %rbp
    ret
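Then your crt0.s implementation can simply be something like this (a minimal sketch that leaves out stack and argc/argv setup):
.global _start
_start:
    call _init # runs the global constructors
    call main
    movl %eax, %edi # main's return value becomes the exit status
    call exit # which calls _fini before really exiting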
Now that we understand what crt0.o, crti.o, crtbegin.o, crtend.o, and crtn.o do, we can put the pieces together. If you don't need global constructors, you can simply 1) not call _init and _fini, as you don't have them then, 2) provide empty crti.o and crtn.o files if your cross-compiler links them in (if using newlib you should really set this up properly, but your mileage may vary), and 3) keep the crtbegin.o and crtend.o provided by your cross-compiler and simply disregard them.
You can change the details of how these files are linked in by modifying your OS-specific toolchain configuration in the gcc/config directory. Search for STARTFILE_SPEC for examples, or look at gcc/config/gnu-user.h, which does what I discussed here. Note that shared libraries use a similar, but not identical, method; I need to look into that. Normally, the compiler provides crtbegin.o and crtend.o, and you provide crti.o, crtn.o, and crt0.o yourself. (Again, newlib needs to be configured for this or something. I use my own libc and did the above.)
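To give a rough idea of what such a configuration looks like, here is a sketch of the two spec macros as they might appear in a target header. The file name gcc/config/myos.h is made up for illustration, and listing crt0.o first is my assumption; compare with gnu-user.h to see what real targets do. The %s after each file name tells the GCC driver to search its startup-file directories for that file.

```c
/* Hypothetical gcc/config/myos.h fragment (the file name is an assumption). */

/* Object files linked before the user's objects, in this order. */
#undef STARTFILE_SPEC
#define STARTFILE_SPEC "crt0.o%s crti.o%s crtbegin.o%s"

/* Object files linked after the user's objects, in this order. */
#undef ENDFILE_SPEC
#define ENDFILE_SPEC "crtend.o%s crtn.o%s"
```

With these two strings, the link line has the shape crt0.o crti.o crtbegin.o your-program.o crtend.o crtn.o, which is the ordering the .init and .fini fragments depend on.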
Note that some third-party software relies on global constructors even if written in C, by using the GCC constructor function attribute. For instance, binutils uses this to set up some variables, leading to a crash if the constructor was never called. If you don't call _init and _fini, such software will build, but it may fail mysteriously at run-time.
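To make that failure mode concrete, here is a small sketch of C code that depends on the constructor attribute. The table and the function names (squares, init_squares, square_of) are invented for this example and are not taken from binutils. On a hosted system the constructor runs before main; on a hobby OS whose startup code never runs the global constructors, square_of would keep returning -1.

```c
#include <stddef.h>

/* Hypothetical lookup table that must be filled in before main() runs. */
static int squares[8];
static int squares_ready = 0;

/* GCC arranges for this to run before main(), provided the startup
   code actually runs the global constructors. */
__attribute__((constructor))
static void init_squares(void)
{
    for (size_t i = 0; i < sizeof squares / sizeof squares[0]; i++)
        squares[i] = (int)(i * i);
    squares_ready = 1;
}

/* Returns i*i from the table, or -1 if the constructor never ran
   or i is out of range. */
int square_of(size_t i)
{
    if (!squares_ready || i >= sizeof squares / sizeof squares[0])
        return -1;
    return squares[i];
}
```

Calling square_of(3) from main is a quick way to check whether your startup files are wired up: a result of -1 for an in-range index means the constructor was skipped.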
Hopefully this is of use and otherwise serves as documentation. Feel free to ask me if you need more details on how this works.
Oh and --without-headers? I use a neat trick where I install the headers manually before building the cross-compiler (I made a make target that installs them without needing a compiler). Then I can directly build the real cross-compiler and save the time needed to bootstrap.