Hi,
Korona wrote:
Brendan wrote:
The device does participate in the cache coherency protocol; or, more correctly, read and write requests that originate from devices are received by something (e.g. the memory controller), and that something is responsible for ensuring coherency (e.g. by forcing "modified" cache lines to be written back for both reads and writes, and also invalidated on writes).
That makes a lot of sense. Is this documented somewhere?
I can't think of anywhere that it's documented; but 80x86 has always been cache coherent and far too much would break if it wasn't.
Korona wrote:
I always thought PCI (non PCIe) transactions didn't participate in cache coherency but I don't remember where I got that info from. PCIe has a "no snoop" attribute that explicitly controls this behavior.
For PCI (from host bridge to endpoint device) there's no need to worry about cache coherency because it's handled elsewhere. For PCIe, the "no snoop" attribute seems to be used only for isochronous transfers that have strict timing (latency) requirements, and isn't used for normal reads/writes for bus mastering or DMA. I'd be tempted to assume it exists to prevent unexpected additional latency caused by (e.g.) caches writing data back to RAM in response to a snooped write.
Korona wrote:
Brendan wrote:
I'm assuming that CPU, RAM and VRAM are all "faster than PCI bus/link" and PCI bus/link is the bottleneck. For that assumption ordering of writes across the PCI bus/link either makes no difference or strict sequential order is better; and for both cases (uncached and WC) I'd expect sequential order almost always (close enough to "100% always" for any difference to be insignificant).
That is right. What I meant by "ordering" is that WC allows the CPU to post multiple writes to main memory (or in this case to the PCI bus) without waiting for a single write to complete (see SDM section 11.3). With UC it has to wait until main memory signals completion after every single write (it has to respect store-store ordering and there is no cache that takes care of it like in the WB case).
You can send "write requests" in program order without waiting for any kind of acknowledgement to come back - PCI doesn't reorder in-transit requests, and if anything goes wrong you get an asynchronous notification (e.g. machine check exception, NMI) at some point after the write was considered complete.
Essentially, "waiting for write to complete" means "waiting for memory controller or northbridge or PCI host bridge to say the write request has been forwarded to PCI" and doesn't mean "waiting for an acknowledgement to come back from a device all the way on the other side of a PCI bus/link".
Note that (if you have access to the PCI Express Base Specification) there's a list of memory transaction types (in "2.1.1.1. Memory Transactions"), which includes things like "memory write request", "completion without data", "completion with data", etc. All of the "completions" are described as being used either for reads (where the CPU is waiting for the data to be fetched) or for writes to IO ports or PCI configuration space; there are no completions for writes to memory mapped IO (even for the "status other than Successful Completion" case). This also helps to explain why memory mapped IO writes are faster than IO port writes (and PCI configuration space writes), and why writes to video memory are faster than reads from video memory. Basically, writes to memory mapped IO are mostly "fire and forget" (once they reach PCI).
Korona wrote:
In theory non-temporal writes should also allow bypassing store-store ordering, but I do not know if non-temporal stores to UC memory actually work.
The CPU has a set of write-combining buffers. Writes using the WC type (as determined by MTRRs or PAT) and non-temporal stores both get shoved into the write-combining buffers (and bypass the normal caches). If non-temporal stores didn't work for UC, then the WC type (as determined by MTRRs or PAT) wouldn't work for UC either.
Cheers,
Brendan