One thing I will recommend is considering a microkernel design¹ for this use. Why? Because it is exactly the scenario for which microkernels were originally developed - the primary goal of a microkernel is not to keep the kernel small, but to provide a layer of separation between the drivers and the kernel proper, so that the lack of memory protection isn't as severe a problem.
(I am aware that memory mapping and memory protection are separate issues, but in most systems the two functions are handled by the MMU, and are tightly coupled in their operations. This has been the case for the ARM designs I have studied in the past, IIRC.)
I realize that a microkernel may not suit your intended design, but I would still recommend taking a look at the model and seeing whether anything about it is useful to you.
Now, I don't know the CPU in question, but I had the impression that all ARM designs have, at the very least, a high-low separation for the kernel. I am not certain how this is done without an MMU, other than possibly having two separate memory spaces, with a portion of the memory wired directly to the region above 2GB. I am pretty sure this isn't the case, so I can only guess I am mistaken about ARM always having a Supervisor mode separation.
Given that, I am assuming that in this instance, the microcontroller is always running in the equivalent of Supervisor mode, across the whole memory space. I'd have to read up on the specific model to say more.
Footnote
A key part of this - indeed, IMAO the defining property of a micro-kernel - is the replacement of specific system calls with a small set of message-passing system calls (usually just send_msg() and receive_msg() for synchronous messages, and/or mail_msg() and check_mail() for asynchronous) with all other 'system calls' being library wrappers around pre-set messages. In a 'pure' micro, these are the only mechanism for both communicating with the kernel and synchronizing between processes.
This message passing - which both narrows the OS interface and replaces ad-hoc system calls with a general mechanism - is the real means by which the kernel is protected from hanging, and how the risk of addressing errors (as opposed to malicious scribbling over system memory, which there isn't much one can do to prevent without memory protection) is mitigated.
My own opinion is that any system with more system calls than those four isn't a 'pure' microkernel. However, very few if any modern microkernels fit this definition (purity not necessarily being desirable).
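
To make the footnote a bit more concrete, here is a rough sketch in C of what "library wrappers around pre-set messages" could look like. Only the names send_msg() and receive_msg() come from the text above; the message layout, the port numbers, the queue sizes, and the console_write() wrapper are all made up for illustration, and the "kernel" here is just a set of in-process ring buffers. Note also that this receive_msg() returns -1 on an empty mailbox where a real synchronous call would block the caller.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* All sizes and ids below are hypothetical, chosen only for this sketch. */
enum { MSG_PAYLOAD_MAX = 32, QUEUE_DEPTH = 4, NUM_PORTS = 4 };
enum { PORT_CONSOLE = 1 };   /* assumed port of a console driver task */
enum { OP_WRITE = 1 };       /* assumed operation code understood by it */

typedef struct {
    uint16_t dest;                       /* receiving port */
    uint16_t op;                         /* request type */
    uint16_t len;                        /* bytes used in payload */
    uint8_t  payload[MSG_PAYLOAD_MAX];
} msg_t;

/* One small ring buffer per port stands in for kernel-owned mailboxes. */
typedef struct {
    msg_t  slots[QUEUE_DEPTH];
    size_t head;
    size_t count;
} mailbox_t;

static mailbox_t boxes[NUM_PORTS];

/* "System call" 1: validate the message, then copy it into a mailbox.
 * The kernel never follows a caller-supplied pointer past the message
 * itself, so a bad length or port is rejected instead of being used. */
int send_msg(const msg_t *m)
{
    if (m == NULL || m->dest >= NUM_PORTS || m->len > MSG_PAYLOAD_MAX)
        return -1;
    mailbox_t *b = &boxes[m->dest];
    if (b->count == QUEUE_DEPTH)
        return -1;                       /* mailbox full */
    b->slots[(b->head + b->count) % QUEUE_DEPTH] = *m;
    b->count++;
    return 0;
}

/* "System call" 2: copy out the oldest message waiting on a port.
 * Returns -1 if none is waiting (a synchronous version would block). */
int receive_msg(uint16_t port, msg_t *out)
{
    if (out == NULL || port >= NUM_PORTS)
        return -1;
    mailbox_t *b = &boxes[port];
    if (b->count == 0)
        return -1;
    *out = b->slots[b->head];
    b->head = (b->head + 1) % QUEUE_DEPTH;
    b->count--;
    return 0;
}

/* Library wrapper: looks like an ordinary write call to the caller,
 * but all it does is fill in a pre-set message and send it. */
int console_write(const char *text, size_t n)
{
    msg_t m;
    if (text == NULL || n > MSG_PAYLOAD_MAX)
        return -1;
    memset(&m, 0, sizeof m);
    m.dest = PORT_CONSOLE;
    m.op   = OP_WRITE;
    m.len  = (uint16_t)n;
    memcpy(m.payload, text, n);
    return send_msg(&m);
}
```

In this sketch a driver task would sit in a loop calling receive_msg(PORT_CONSOLE, &m) and act on m.op, while application code only ever sees console_write(). The kernel's whole interface is the two calls plus the fixed message format, which is the narrowing I was describing.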