I've done it. I'd like to thank a few heroes of the decade prior for assisting me spiritually in this journey, such as XanClic, iocoder and prajwal. If you look at their threads, however, you'll see that they seem to have either given up on their tasks and just written an EHCI driver, or erratically evolved their code until something worked. The latter was my method of choice, too.
And now I shall no longer be mad when I come across an old post where the solution was found yet not stated, because I now understand why that is: none of us knew what the hell we were doing. Similarly, I have no idea what I changed to make it work. What I do know is that the line status bits were all zero before. When the D+ bit came on for me, I wasted time thinking it was a new error instead of a sign of progress. All of it works now. Still, I am willing to share the entire relevant code with thorough comments, for whoever shall be reading this in the 2030s. It's not ideal, and it even sucks, but you know damn well I won't be touching this for as long as I can.
As for how the line status bit suddenly started being set, I think it's to do with me switching to a much more robust port enabling method. IOW, a simple "reset, wait, enable, wait" sequence just isn't enough. So, ultimately, my conclusion is that I wasn't enabling the ports properly.
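Since the line status bits were the clue here, a small decoder for the UHCI port status register (PORTSC) can make such changes easier to spot while debugging. What follows is a minimal sketch and not part of the driver below: the macro names and DescribePortsc() are made up for illustration, and the bit positions follow the PORTSC layout in the UHCI design guide. An idle full-speed device normally pulls D+ high, and a low-speed one pulls D- high, so a lone D+ bit is consistent with a device finally being seen.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* UHCI PORTSC bits (names are illustrative, not from the original code) */
#define PORTSC_CONNECT     (1u << 0) /* Device currently connected */
#define PORTSC_CONNECT_CHG (1u << 1) /* Connect change, write 1 to clear */
#define PORTSC_ENABLED     (1u << 2) /* Port enabled */
#define PORTSC_ENABLE_CHG  (1u << 3) /* Enable change, write 1 to clear */
#define PORTSC_LINE_DPLUS  (1u << 4) /* Line status: D+ */
#define PORTSC_LINE_DMINUS (1u << 5) /* Line status: D- */
#define PORTSC_LOW_SPEED   (1u << 8) /* Low-speed device attached */
#define PORTSC_RESET       (1u << 9) /* Port reset in progress */

/* Hypothetical helper: writes a one-line summary of a PORTSC value into buf.
   Returns what snprintf returns. */
int DescribePortsc(uint16_t sc, char *buf, size_t len) {
    return snprintf(buf, len,
        "connected=%d enabled=%d D+=%d D-=%d lowspeed=%d reset=%d",
        !!(sc & PORTSC_CONNECT), !!(sc & PORTSC_ENABLED),
        !!(sc & PORTSC_LINE_DPLUS), !!(sc & PORTSC_LINE_DMINUS),
        !!(sc & PORTSC_LOW_SPEED), !!(sc & PORTSC_RESET));
}
```

In the port loop it could be called on the value read with inw(hcIo + 16 + p+p) after each retry, printing the result through whatever logging the kernel has.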
Code:
/* Effectively disables all EHCIs, and initializes UHCIs */
int inithc() {
while(1) {
hcId = ClaimPCIDevice(0x0C032000); /* Class codes of EHCI */
if(hcId == -1) { /* No longer finding new EHCIs */
break;
}
/* The following variables are volatile, to make sure accesses are strictly 32-bit and in-order. */
/* Get MMIO address from BAR0 */
volatile uint32_t *usbbase = MemoryMap((PciRead32(hcId, 16) & ~15), 4096);
/* First byte of the MMIO (CAPLENGTH) is the offset to the operational registers */
volatile uint32_t *opregs = (volatile uint32_t*) ((uintptr_t) usbbase + (usbbase[0] & 0xFF));
/* Extended capabilities are stored in PCI config space, but the offset is read from MMIO */
uint8_t caps = (usbbase[2] & 0xFF00) >> 8;
if(caps < 0x40) {
/* Capabilities cannot exist below 0x40, so */
/* give up, but it might be doable more gracefully */
return 0;
} else {
/* Although capabilities were designed to be extensible, and thus dynamically ordered in memory, EHCI had only defined one at the beginning, and it's the one we need */
/* You might want to have a timeout system here */
/* Request BIOS abdication by writing OS ownership bit */
PciWrite8(hcId, caps + 3, 1);
/* Wait until the BIOS ownership bit resets */
while((PciRead32(hcId, caps) & (1 << 16)) != 0);
/* Wait until the OS ownership bit actually sets */
while((PciRead32(hcId, caps) & (1 << 24)) == 0);
/* Disable legacy support (USBLEGCTLSTS in spec) */
PciWrite32(hcId, caps + 4, 0);
}
/* Stop EHCI. Necessary to reset it */
opregs[0] &= ~1;
/* Poll the halted bit to wait until it truly stops */
do {
/* This is a memory barrier, to slap the wrists of overly smart compilers */
asm volatile("" : : : "memory");
} while((opregs[1] & 4096) == 0);
/* Finally reset it */
opregs[0] |= 2;
do {
asm volatile("" : : : "memory");
} while(opregs[0] & 2);
/* A lot of these are default values, but they're set again as a sanity check */
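opregs[1] = opregs[1]; /* USBSTS: writing back the set bits clears the WC bits */
opregs[2] = 0; /* USBINTR: all interrupts off */
opregs[4] = 0; /* CTRLDSSEGMENT: upper 32 address bits are zero */
opregs[0] = 0x80000; /* USBCMD: stopped, interrupt threshold of 8 micro-frames (the default) */
opregs[16] = 0; /* CONFIGFLAG: leave all ports routed to the companion (UHCI) controllers */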
MemoryUnmap(usbbase, 4096);
}
while(1) {
hcId = ClaimPCIDevice(0x0C030000); /* Class codes of UHCI */
if(hcId == -1) {
break;
}
/* Get I/O port base address from BAR4 */
hcIo = PciRead32(hcId, 0x20) & ~3;
/* Disable legacy support */
PciWrite16(hcId, 0xC0, 0x8F00);
/* Reset it. HCRESET (bit 1) lives in USBCMD at offset 0 and self-clears when the reset is done */
outw(hcIo + 0, 2);
while(inw(hcIo + 0) & 2);
/* Disable interrupts */
outw(hcIo + 4, 0);
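/* FRBASEADD: physical address of the frame list */
outd(hcIo + 8, (uint32_t) Virt2Phys(frameList));
/* SOFMOD: 64 is the default start-of-frame timing */
outb(hcIo + 12, 64);
/* USBSTS: clear all WC status bits */
outw(hcIo + 2, 0xFFFF);
/* USBCMD: set Run/Stop to start the controller */
outw(hcIo + 0, 1);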
/* UHCIs can technically have more than two ports, but that's not my problem */
for(int p = 0; p < 2; p++) {
/* Reset it */
outw(hcIo + 16 + p+p, 512);
SleepMS(50);
/* You must stop resetting manually */
outw(hcIo + 16 + p+p, inw(hcIo + 16 + p+p) & ~512);
for(int try = 0; try < 10; try++) {
SleepMS(10);
uint16_t sc = inw(hcIo + 16 + p+p);
/* If no device connected, no point in continuing */
if((sc & 1) == 0) break;
/* If WC bits are set, clear them */
if(sc & (2 | 8)) outw(hcIo + 16 + p+p, sc);
/* If the port reports enabled, we're done; otherwise request enable and retry */
if(sc & 4) break;
outw(hcIo + 16 + p+p, sc | 4);
}
if(inw(hcIo + 16 + p+p) & 1) {
/* It's here that you should be able to transfer packets finally */
}
/* Only for testing, disable the port */
outw(hcIo + 16 + p+p, 0);
}
/* Only for testing, disable the HC */
outw(hcIo + 0, 0);
}
return 1;
}
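On the "more than two ports" remark in the comments: a common heuristic for counting UHCI root ports is to walk the PORTSC registers until one stops looking like a port. The sketch below assumes that bit 7 of a real PORTSC always reads as 1 and that an undecoded register reads as 0xFFFF; as far as I know this is the same check Linux's uhci-hcd uses. CountUhciPorts(), the ReadPort16 callback, the cap of 8 and FakeInw() are all hypothetical names and values chosen so the sketch runs without hardware; in the driver the callback would simply wrap inw().

```c
#include <stdint.h>

#define UHCI_PORTSC_BASE 0x10 /* Offset of the first PORTSC register */
#define UHCI_MAX_PROBE   8    /* Arbitrary cap for this sketch */

/* Hypothetical callback type; in the driver this would wrap inw() */
typedef uint16_t (*ReadPort16)(uint16_t port);

/* Counts consecutive registers that look like valid PORTSC registers */
int CountUhciPorts(uint16_t ioBase, ReadPort16 read16) {
    int n = 0;
    while (n < UHCI_MAX_PROBE) {
        uint16_t sc = read16((uint16_t)(ioBase + UHCI_PORTSC_BASE + 2 * n));
        /* Bit 7 clear, or all ones, means this is not a port register */
        if ((sc & 0x0080) == 0 || sc == 0xFFFF) break;
        n++;
    }
    return n;
}

/* Stand-in for inw(): pretends a controller at 0xC000 has three ports */
uint16_t FakeInw(uint16_t port) {
    static const uint16_t regs[] = { 0x0080, 0x0095, 0x0080 };
    unsigned idx = (unsigned)(port - 0xC000 - UHCI_PORTSC_BASE) / 2;
    return idx < 3 ? regs[idx] : 0xFFFF;
}
```

With something like this, the `p < 2` bound in the port loop could become `p < CountUhciPorts(hcIo, ...)`, though two ports is what most controllers have anyway.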
The inithc() code above is public domain, or CC0. If you'll be using it, make sure to do it exactly as is, and only later try making it cleaner. Whatever you may think is redundant, put it in anyway. You must not underestimate the prissiness of what's involved. Neither you nor I are meant to comprehend it.
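For the "timeout system" mentioned in the comments of the BIOS handoff, one possible shape is a bounded polling helper. This is only a sketch under assumptions and has not been run on hardware: WaitForBits(), its callback types, and the two Fake* functions are hypothetical, and the fakes exist only so the sketch can run without a PCI bus. In the driver, the read callback would wrap PciRead32(hcId, caps) and the sleep callback would wrap SleepMS().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical callback types so the helper does not depend on real hardware */
typedef uint32_t (*ReadReg32)(void *ctx);
typedef void (*SleepFn)(unsigned ms);

/* Polls until (read(ctx) & mask) == want. Gives up after roughly timeoutMs
   milliseconds and returns false; returns true on success. */
bool WaitForBits(ReadReg32 read, void *ctx, uint32_t mask, uint32_t want,
                 unsigned timeoutMs, SleepFn sleepMs) {
    for (unsigned waited = 0; ; waited++) {
        if ((read(ctx) & mask) == want) return true;
        if (waited >= timeoutMs) return false;
        sleepMs(1);
    }
}

/* Stand-in for the legacy support register: the BIOS-owned bit (16) stays
   set for the first four reads, then only the OS-owned bit (24) remains */
uint32_t FakeLegsup(void *ctx) {
    unsigned *reads = ctx;
    return (++*reads < 5) ? ((1u << 16) | (1u << 24)) : (1u << 24);
}

/* Stand-in for SleepMS(): does nothing */
void FakeSleep(unsigned ms) { (void) ms; }
```

Both handoff loops could then collapse into one call that waits for bit 16 to be clear and bit 24 to be set. What to do when it returns false (skip the controller, or carry on regardless) is a separate decision this sketch does not make.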
Thank you, Ben, for the help, too. This 6-month saga can finally be put to rest.