While you can express any logic as machine code on pretty much any architecture, there are some natural limiting factors when you implement anything that isn't trivial or small:
- maximum code size
- maximum data size
- maximum stack size
- addressability of code as data and vice versa
- whether you can have separate address spaces or must swap in and out programs in their entirety
- CPU speed
- memory protection (segmentation, paging, etc)
For example, on the PIC32MX microcontroller (a MIPS32 core) there's only 128KB of RAM (for code, data and stack) and quite a bit of Flash memory (512KB, AFAIR). The CPU runs at 80MHz. There's no page translation, no separate address spaces, and the only form of protection is a boundary between kernel and user memory. The RetroBSD OS (a port of 2.11BSD) takes 32KB of RAM for the kernel data structures and all of the Flash for the kernel itself (some of it is left unused). The file system lives on an SD card.
Running a more or less decent C compiler on RetroBSD is a challenge. You can't run any fat executables nor any memory-greedy ones. So, no pcc, no lcc, no Bellard's TinyCC, and forget about gcc or clang. While you could fake larger RAM by using the SD card and a MIPS CPU emulator, you'd slash your 80MHz by a factor of 50+. I've tried it. It's not much fun to run a compiler on a 1MHz computer. And Small-C is too much of a toy. So, right now the compiler is split into multiple stages to fit into the RAM: driver, preprocessor, compiler proper, assembler, linker. When the driver runs any of its subordinates, it gets swapped out to free all of the 96KB of user RAM for the new process (the sketch below shows the general shape of such a driver).
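To make that concrete, here's a minimal sketch of a driver chaining separate stage executables. This is not the actual RetroBSD/Smaller C driver; the stage names, paths and file names are made up, and a swap-based system would more likely use vfork() than fork() to avoid copying the driver's image, but the structure is the same. While a stage runs, the driver just sits in waitpid(), so the OS is free to swap it out and hand the child essentially the whole user address space:

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run one stage as a child process; the driver blocks in waitpid()
   (and can be swapped out) until the stage finishes. */
static int run_stage(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                /* fork failed */
    if (pid == 0) {
        execv(argv[0], argv);     /* child becomes the stage binary */
        _exit(127);               /* exec failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

int main(void)
{
    /* Hypothetical stage binaries and intermediate file names. */
    char *cpp[] = { "/bin/cpp", "prog.c", "prog.i", NULL };
    char *cc1[] = { "/bin/cc1", "prog.i", "prog.s", NULL };
    char *as[]  = { "/bin/as",  "prog.s", "prog.o", NULL };
    char *ld[]  = { "/bin/ld",  "prog.o", "a.out",  NULL };
    char **stages[] = { cpp, cc1, as, ld };

    for (size_t i = 0; i < sizeof stages / sizeof stages[0]; i++)
        if (run_stage(stages[i]) != 0)
            return 1;
    return 0;
}
```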
The compiler proper (my Smaller C with a MIPS code generator) eats nearly all of the available 96KB. In order to fit more of the language, and more declarations from the code being compiled, into that space I had to do things like:
- use the simplest algorithms and the most compact data structures, favoring a smaller memory footprint
- use static memory allocation (with a tiny bit of flexibility via recursion)
- throw out all of the standard C library (use custom fprintf(), fopen() and such, often as tiny wrappers around system calls; sketched after this list)
- rework an internal data structure to minimize the memory occupied by declarations: I changed an array whose elements were pairs of ints into two parallel arrays, one of chars and one of ints, because alignment restrictions wouldn't let me simply shrink the element from 8 bytes to 5 (also sketched after this list)
- compile the compiler using the shorter (but more limited) instructions of the MIPS16e ISA (somewhat like Thumb on ARM), which slows compilation a bit
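On the library side, "throw out all of the standard C library" in practice means something like the following. This isn't the actual Smaller C runtime, just an illustration with made-up names: a "FILE" is nothing more than a file descriptor, and the "fprintf" replacement knows only the few conversions the compiler itself uses:

```c
#include <unistd.h>
#include <fcntl.h>
#include <stdarg.h>

typedef int MYFILE;                 /* a "FILE" is just a file descriptor */

static MYFILE my_fopen(const char *path, int writing)
{
    return writing ? open(path, O_WRONLY | O_CREAT | O_TRUNC, 0666)
                   : open(path, O_RDONLY);
}

static void my_fputs(MYFILE f, const char *s)
{
    const char *p = s;
    while (*p) p++;                 /* strlen by hand */
    write(f, s, (size_t)(p - s));
}

/* Unbuffered, supports only %s and %d; one syscall per ordinary
   character, which is fine for a sketch. */
static void my_fprintf(MYFILE f, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    for (; *fmt; fmt++) {
        if (*fmt == '%' && fmt[1] == 's') {
            my_fputs(f, va_arg(ap, const char *));
            fmt++;
        } else if (*fmt == '%' && fmt[1] == 'd') {
            char buf[12];
            int n = va_arg(ap, int);
            int i = (int)sizeof buf;
            unsigned u = n < 0 ? -(unsigned)n : (unsigned)n;
            do buf[--i] = (char)('0' + u % 10); while ((u /= 10) != 0);
            if (n < 0) buf[--i] = '-';
            write(f, buf + i, sizeof buf - (size_t)i);
            fmt++;
        } else {
            write(f, fmt, 1);
        }
    }
    va_end(ap);
}

int main(void)
{
    MYFILE f = my_fopen("out.txt", 1);
    if (f < 0) return 1;
    my_fprintf(f, "%s = %d\n", "answer", 42);
    close(f);
    return 0;
}
```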
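And here's the array rework in miniature (array and field names are invented; the sizes assume 4-byte ints on a 32-bit target):

```c
#include <stdio.h>

#define SYM_MAX 1000

/* Before: each entry was a pair of ints, 8 bytes on a 32-bit target.
   A {char, int} pair wouldn't help, since alignment pads it back to 8:
   static struct { int Kind; int Value; } SymTab[SYM_MAX];    -- 8000 bytes */

/* After: two parallel arrays indexed by the same symbol number,
   5 bytes per entry with no padding anywhere: */
static char SymKind[SYM_MAX];                              /* 1000 bytes */
static int  SymValue[SYM_MAX];                             /* 4000 bytes */

int main(void)
{
    SymKind[0] = 1;      /* e.g. "variable" */
    SymValue[0] = 42;    /* some attribute of it */
    printf("%d %d\n", SymKind[0], SymValue[0]);
    return 0;
}
```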
In order to do more, either more stages are needed or temporary files have to be used instead of in-memory buffers. Needless to say, either one requires quite a bit of work to separate the code and/or convert pointer arithmetic into calls to fwrite()/fseek()/fread() (a sketch of the latter follows). It's not fun doing this in your own program, and it's even less fun doing it in something you haven't written and don't understand well (pcc, lcc, etc).
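For illustration only (the names are hypothetical, and the real work is in finding and rewriting every such access), converting a big in-memory array into a temporary file looks roughly like this:

```c
#include <stdio.h>

/* Before: static int IrBuf[30000];  IrBuf[i] = x;  x = IrBuf[i]; */

static FILE *IrFile;    /* temporary file holding what used to be IrBuf */

static void ir_put(long i, int x)
{
    fseek(IrFile, i * (long)sizeof x, SEEK_SET);
    fwrite(&x, sizeof x, 1, IrFile);
}

static int ir_get(long i)
{
    int x;
    fseek(IrFile, i * (long)sizeof x, SEEK_SET);
    if (fread(&x, sizeof x, 1, IrFile) != 1)
        x = 0;          /* never written: treat as zero, as a static array would be */
    return x;
}

int main(void)
{
    IrFile = tmpfile();          /* or a named temporary file on the SD card */
    if (!IrFile) return 1;
    ir_put(123, 42);
    printf("%d\n", ir_get(123)); /* prints 42 */
    return 0;
}
```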
In theory, some of the compiler code could go into the Flash, but in practice it would mean updating the kernel with every compiler update or implementing an FS in a portion of the Flash.
Things get worse with a more primitive CPU (e.g. the i8051): slower, and with even less RAM.
And all of this assumes there actually is a file system with enough of space on it!
Networking and "multimedia" may have their own hardware issues, imposing further limitations on the software: what can be done (if at all), in what order, and so on.
IOW, while cross-compiling itself is not a problem, dealing with platform limitations and oddities is, since you may need to restructure your code in order to work satisfactorily on a specific device.