OSDev.org

The Place to Start for Operating System Developers

All times are UTC - 6 hours




[ 9 posts ]
 Post subject: Machine check exception on LAPIC read when using PAT
Posted: Thu Nov 21, 2019 9:32 am
Member
Joined: Sun Apr 30, 2017 12:16 pm
Posts: 68
Location: Poland
Hi!

I recently added PAT support to my kernel, and all is well: framebuffer scrolling is now nice and fast.
However, as the title says, when running on real hardware (an Intel i5-3210M CPU; it also happens on an i5-3320M, but not on my Athlon 64 X2 PC), I get a machine check exception.
When reading the machine check status MSRs, MCG_STATUS says an MCE is in progress and that the restart IP is valid, bank 1's MCi_STATUS says that it's valid and that MCi_ADDR is valid, and bank 1's MCi_ADDR points to the LAPIC (although it points to a register I never access, either directly or indirectly).

I can't see why it fails now or, if the problem is unrelated to PAT, why it didn't fail before.

Thanks in advance for any help!

Relevant code:

Code that maps the LAPIC:
Code:
arch_mm_map_kernel((void *)lapic_base, (void *)lapic_base, 4, // 4 pages
   ARCH_MM_FLAG_R | ARCH_MM_FLAG_W, ARCH_MM_CACHE_UC);


Defines for ARCH_MM_xxx:
Code:
#define ARCH_MM_FLAG_R      0x01 /* page is readable */
#define ARCH_MM_FLAG_W      0x02 /* page is writable */
#define ARCH_MM_FLAG_E      0x04 /* page is executable */
#define ARCH_MM_FLAG_U      0x08 /* page is accessible in user mode */

#define ARCH_MM_CACHE_WB   0 /* write-back */
#define ARCH_MM_CACHE_WT   1 /* write-through */
#define ARCH_MM_CACHE_WC   2 /* write-combining */
#define ARCH_MM_CACHE_WP   3 /* write-protect */
#define ARCH_MM_CACHE_UC   4 /* uncacheable */
#define ARCH_MM_CACHE_DEFAULT ARCH_MM_CACHE_WB


Relevant bits of VMM code:
Code:
// PAT setup
cpu_set_msr(0x277, 0x0000000005010406);

// flags
#define VMM_FLAG_WRITE      (1<<1)
#define VMM_FLAG_USER      (1<<2)
#define VMM_FLAG_PAT0      (1<<3)
#define VMM_FLAG_PAT1      (1<<4)
#define VMM_FLAG_PAT2      (1<<4)
#define VMM_FLAG_NX      (1ull<<63)

// these flags only apply to the lowest-level table entries
// higher tables (PML4, PDP, PD) only get (arch_flags & (WRITE | USER))
// the present bit is set automatically for each entry
int vmm_arch_to_vmm_flags(int flags, int cache) {
   int arch_flags = 0;

   if (flags & ARCH_MM_FLAG_W) arch_flags |= VMM_FLAG_WRITE;
   if (flags & ARCH_MM_FLAG_U) arch_flags |= VMM_FLAG_USER;
   if (!(flags & ARCH_MM_FLAG_E)) arch_flags |= VMM_FLAG_NX;

   if (cache & (1 << 0)) arch_flags |= VMM_FLAG_PAT0;
   if (cache & (1 << 1)) arch_flags |= VMM_FLAG_PAT1;
   if (cache & (1 << 2)) arch_flags |= VMM_FLAG_PAT2;

   return arch_flags;
}

_________________
Working on managarm.


 Post subject: Re: Machine check exception on LAPIC read when using PAT
Posted: Thu Nov 21, 2019 12:21 pm
Member
Joined: Mon Mar 25, 2013 7:01 pm
Posts: 5100
qookie wrote:
When reading the machine check status MSRs, MCG_STATUS says an MCE is in progress and that the restart IP is valid, bank 1's MCi_STATUS says that it's valid and that MCi_ADDR is valid, and bank 1's MCi_ADDR points to the LAPIC (although it points to a register I never access, either directly or indirectly).

What are the actual values of these registers? There's more information in them that might help us identify the cause (or at least narrow down the possibilities).


 Post subject: Re: Machine check exception on LAPIC read when using PAT
Posted: Thu Nov 21, 2019 1:02 pm
Member
Joined: Sun Apr 30, 2017 12:16 pm
Posts: 68
Location: Poland
Octocontrabass wrote:
What are the actual values of these registers? There's more information in them that might help us identify the cause (or at least narrow down the possibilities).


The values are as follows:
Code:
MCG_CAP = 0x0000000000000C07
MCG_STATUS = 0x0000000000000005

MCi_STATUS (for bank 1) = 0xBF80000000200001
MCi_ADDR (for bank 1) = 0x00000000FEE00340
MCi_MISC (for bank 1) = 0x0000000000000086

MCi_STATUS (for banks 0, 2, 3, 4) = 0x0000000000000000
MCi_STATUS (for banks 5, 6) = 0x0020000000000000
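
For anyone decoding values like these, here is a minimal C sketch of the architectural fields, following the register layout in the Intel SDM (vol. 3, ch. 15). The helper names `mci_msr`, `mcg_cap_banks` and `decode_mci_status` are hypothetical. Only architectural bits are decoded: bits 31:16 of MCi_STATUS are model-specific and left alone, and VAL is clear in banks 5 and 6, so their contents can be ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IA32_MCG_STATUS (MSR 0x17A) architectural bits. */
#define MCG_STATUS_RIPV (1ull << 0) /* restart IP valid */
#define MCG_STATUS_EIPV (1ull << 1) /* error IP valid */
#define MCG_STATUS_MCIP (1ull << 2) /* machine check in progress */

/* Per-bank registers: IA32_MC0_CTL is MSR 0x400 and each bank uses four MSRs. */
enum mci_reg { MCI_CTL = 0, MCI_STATUS = 1, MCI_ADDR = 2, MCI_MISC = 3 };

/* Hypothetical helper: MSR number of register `reg` in bank `bank`. */
static uint32_t mci_msr(unsigned bank, enum mci_reg reg) {
    assert((unsigned)reg < 4);
    return 0x400u + 4u * bank + (uint32_t)reg;
}

/* Number of reporting banks, from IA32_MCG_CAP (MSR 0x179) bits 7:0. */
static unsigned mcg_cap_banks(uint64_t mcg_cap) {
    return (unsigned)(mcg_cap & 0xFF);
}

/* Architectural fields of IA32_MCi_STATUS. */
struct mci_status {
    bool val;            /* bit 63: register contents are valid */
    bool over;           /* bit 62: an earlier error was overwritten */
    bool uc;             /* bit 61: uncorrected error */
    bool en;             /* bit 60: error reporting was enabled */
    bool miscv;          /* bit 59: MCi_MISC is valid */
    bool addrv;          /* bit 58: MCi_ADDR is valid */
    bool pcc;            /* bit 57: processor context corrupt */
    uint16_t mca_code;   /* bits 15:0: architectural MCA error code */
    uint16_t model_code; /* bits 31:16: model-specific, not decoded here */
};

static struct mci_status decode_mci_status(uint64_t s) {
    struct mci_status d;
    d.val   = (s >> 63) & 1;
    d.over  = (s >> 62) & 1;
    d.uc    = (s >> 61) & 1;
    d.en    = (s >> 60) & 1;
    d.miscv = (s >> 59) & 1;
    d.addrv = (s >> 58) & 1;
    d.pcc   = (s >> 57) & 1;
    d.mca_code   = (uint16_t)(s & 0xFFFF);
    d.model_code = (uint16_t)((s >> 16) & 0xFFFF);
    return d;
}
```

Applied to the values above, MCG_CAP reports 7 banks, MCG_STATUS = 0x5 has RIPV and MCIP set, and bank 1's status 0xBF80000000200001 has VAL, UC, EN, MISCV, ADDRV and PCC set with OVER clear: an uncorrected error with the processor context marked corrupt.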



 Post subject: Re: Machine check exception on LAPIC read when using PAT
Posted: Fri Nov 22, 2019 5:00 am
Member
Joined: Fri Apr 04, 2008 6:43 am
Posts: 357
When a core accesses the local APIC, it first checks the access for consistency. If the access is invalid, #MC is raised.
The checks are:
- The access must be of the UC memory type
- The access must be 1, 2 or 4 bytes
- The access must be aligned to its data size
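
The three checks above can be restated as a predicate. This is only a sketch of the description above, not of the actual hardware logic, and `lapic_access_ok` is a hypothetical name; the memory type argument is meant to be the effective type after MTRRs and PAT are combined. Note that the Intel SDM recommends accessing APIC registers with aligned 32-bit loads and stores and treats narrower accesses as model-specific.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Effective memory types, using the MTRR/PAT encodings. */
enum mem_type { MT_UC = 0, MT_WC = 1, MT_WT = 4, MT_WP = 5, MT_WB = 6 };

/* Hypothetical predicate restating the checks listed above. */
static bool lapic_access_ok(uint64_t phys, unsigned size, enum mem_type effective) {
    assert(size > 0);                        /* callers never issue empty accesses */
    if (effective != MT_UC)
        return false;                        /* must be uncacheable */
    if (size != 1 && size != 2 && size != 4)
        return false;                        /* only 1-, 2- or 4-byte accesses */
    return (phys & (size - 1)) == 0;         /* naturally aligned */
}
```

For example, a 4-byte UC read of 0xFEE00340 passes, while the same read through a WC mapping, or an 8-byte read, fails.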

I would guess that your APIC page is mapped with the WB memory type and this is causing the machine check, especially since the problem started when you introduced PAT.

Quote:
although it points to a register I never access, either directly or indirectly


If the APIC page is not mapped UC, it might even be accessed speculatively on a wrong (mispredicted) path, and this will certainly cause #MC.


Last edited by stlw on Fri Nov 22, 2019 5:06 am, edited 1 time in total.

 Post subject: Re: Machine check exception on LAPIC read when using PAT
Posted: Fri Nov 22, 2019 5:03 am
Member
Joined: Fri Apr 04, 2008 6:43 am
Posts: 357
I guess I see your problem:

Code:
#define ARCH_MM_CACHE_WB   0 /* write-back */
#define ARCH_MM_CACHE_WT   1 /* write-through */
#define ARCH_MM_CACHE_WC   2 /* write-combining */
#define ARCH_MM_CACHE_WP   3 /* write-protect */
#define ARCH_MM_CACHE_UC   4 /* uncacheable */


while in real life:

Code:
enum {
  BX_MEMTYPE_UC = 0,
  BX_MEMTYPE_WC = 1,
  BX_MEMTYPE_RESERVED2 = 2,
  BX_MEMTYPE_RESERVED3 = 3,
  BX_MEMTYPE_WT = 4,
  BX_MEMTYPE_WP = 5,
  BX_MEMTYPE_WB = 6,
  BX_MEMTYPE_UC_WEAK = 7, // PAT only
};


With memory type == 4 you map your APIC page as write-through, and this causes #MC.


 Post subject: Re: Machine check exception on LAPIC read when using PAT
Posted: Fri Nov 22, 2019 7:48 am
Member
Joined: Sun Apr 30, 2017 12:16 pm
Posts: 68
Location: Poland
stlw wrote:
I guess I see your problem:

Code:
#define ARCH_MM_CACHE_WB   0 /* write-back */
#define ARCH_MM_CACHE_WT   1 /* write-through */
#define ARCH_MM_CACHE_WC   2 /* write-combining */
#define ARCH_MM_CACHE_WP   3 /* write-protect */
#define ARCH_MM_CACHE_UC   4 /* uncacheable */


while in real life:

Code:
enum {
  BX_MEMTYPE_UC = 0,
  BX_MEMTYPE_WC = 1,
  BX_MEMTYPE_RESERVED2 = 2,
  BX_MEMTYPE_RESERVED3 = 3,
  BX_MEMTYPE_WT = 4,
  BX_MEMTYPE_WP = 5,
  BX_MEMTYPE_WB = 6,
  BX_MEMTYPE_UC_WEAK = 7, // PAT only
};


With memory type == 4 you map your APIC page as write-through, and this causes #MC.


I might be mistaken, but I reprogram the PAT with 0x0000000005010406 (apologies if this was hard to spot in the VMM snippets!), which should match my caching mode constants. I haven't noticed anything in the Intel or AMD manuals about reserved entries either.
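
As a cross-check, a small sketch that packs PAT entries into an IA32_PAT (MSR 0x277) value. Each of the eight entries occupies one byte with the memory type in its low 3 bits, and encodings 2 and 3 are reserved. The entry order below is taken from the ARCH_MM_CACHE_xxx constants in the first post, with entries 5 to 7 left as UC; `pat_pack` and `pat_entry` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* PAT memory type encodings; 2 and 3 are reserved. */
enum pat_type { PAT_UC = 0, PAT_WC = 1, PAT_WT = 4, PAT_WP = 5, PAT_WB = 6, PAT_UC_MINUS = 7 };

/* Pack eight entries into an IA32_PAT value: entry i lives in byte i. */
static uint64_t pat_pack(const enum pat_type entries[8]) {
    uint64_t pat = 0;
    for (unsigned i = 0; i < 8; i++) {
        assert(entries[i] <= 7 && entries[i] != 2 && entries[i] != 3);
        pat |= (uint64_t)entries[i] << (8 * i);
    }
    return pat;
}

/* Memory type selected by PAT index `index` (0-7). */
static enum pat_type pat_entry(uint64_t pat, unsigned index) {
    assert(index < 8);
    return (enum pat_type)((pat >> (8 * index)) & 0x7);
}
```

Packing {WB, WT, WC, WP, UC, UC, UC, UC} yields 0x0000000005010406, so the value written in the VMM code does match the constants, and PAT index 4 (ARCH_MM_CACHE_UC) selects UC.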



 Post subject: Re: Machine check exception on LAPIC read when using PAT
Posted: Sat Nov 23, 2019 3:01 am
Member
Joined: Mon Mar 25, 2013 7:01 pm
Posts: 5100
qookie wrote:
Code:
MCi_ADDR (for bank 1) = 0x00000000FEE00340
MCi_MISC (for bank 1) = 0x0000000000000086

Going by this, the fault address could be anywhere from 0xFEE00340 to 0xFEE0037F (physical). Unfortunately I couldn't figure out how to interpret the other error registers; Intel doesn't seem to have it documented for these CPUs. If it's caused by a misplaced read or write in your code, you should be able to catch it by setting a breakpoint using the debug registers.
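
The range follows from MCi_MISC. Assuming the architectural MCi_MISC layout from the Intel SDM applies to this CPU (bits 5:0 give the least significant valid bit of MCi_ADDR, bits 8:6 the address mode, with 2 meaning physical), 0x86 describes a physical address valid from bit 6 up, i.e. a 64-byte window. A minimal sketch; `mca_addr_range` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Address window reported by one MCA bank (hypothetical type). */
struct mca_range {
    uint64_t lo, hi;   /* inclusive bounds */
    unsigned mode;     /* MCi_MISC bits 8:6; 2 = physical address */
};

static struct mca_range mca_addr_range(uint64_t addr, uint64_t misc) {
    unsigned lsb = (unsigned)(misc & 0x3F);   /* bits 5:0: lowest valid address bit */
    assert(lsb < 64);                         /* keeps the shift below well defined */
    uint64_t span = 1ull << lsb;
    struct mca_range r;
    r.lo = addr & ~(span - 1);                /* clear the bits below lsb */
    r.hi = r.lo + span - 1;
    r.mode = (unsigned)((misc >> 6) & 0x7);
    return r;
}
```

With MCi_ADDR = 0xFEE00340 and MCi_MISC = 0x86 this gives 0xFEE00340 to 0xFEE0037F in physical mode, matching the range above.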


 Post subject: Re: Machine check exception on LAPIC read when using PAT
Posted: Sat Nov 23, 2019 6:53 am
Member
Joined: Fri Apr 04, 2008 6:43 am
Posts: 357
As a sanity check, you could try defining an MTRR that overlaps the APIC's physical address and has the UC memory type.
A UC MTRR takes precedence over the PAT memory type, with one exception: per the Intel SDM, a UC MTRR combined with a WC PAT entry gives an effective type of WC.
If the problem goes away, your APIC page is configured with a memory type other than UC.
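
For that experiment, the variable-range MTRR pair can be computed as below. This sketch assumes a free variable range n (IA32_MTRRCAP, MSR 0xFE, reports how many exist in bits 7:0, and a range is unused if bit 11 of its PHYSMASK is clear), MAXPHYADDR taken from CPUID leaf 0x80000008 (EAX bits 7:0), and that the MSR writes themselves follow the MTRR update procedure in the Intel SDM (caches disabled and flushed around the change, on all processors). `mtrr_encode` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

#define IA32_MTRR_PHYSBASE(n) (0x200u + 2u * (n))
#define IA32_MTRR_PHYSMASK(n) (0x201u + 2u * (n))
#define MTRR_TYPE_UC    0x0u
#define MTRR_MASK_VALID (1ull << 11)

struct mtrr_pair { uint64_t base, mask; };

/* Encode a variable-range MTRR covering [phys, phys + size). */
static struct mtrr_pair mtrr_encode(uint64_t phys, uint64_t size, unsigned type,
                                    unsigned maxphyaddr) {
    assert(size >= 4096 && (size & (size - 1)) == 0); /* power of two, at least 4 KiB */
    assert((phys & (size - 1)) == 0);                  /* naturally aligned */
    assert(maxphyaddr > 12 && maxphyaddr <= 52);
    uint64_t addr_bits = (1ull << maxphyaddr) - 1;
    struct mtrr_pair p;
    p.base = (phys & addr_bits & ~0xFFFull) | type;                   /* type in bits 7:0 */
    p.mask = (~(size - 1) & addr_bits & ~0xFFFull) | MTRR_MASK_VALID; /* bit 11 = valid */
    return p;
}
```

For the 4 KiB LAPIC page at 0xFEE00000 with MAXPHYADDR = 36, this yields PHYSBASE = 0xFEE00000 and PHYSMASK = 0xFFFFFF800.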


 Post subject: Re: Machine check exception on LAPIC read when using PAT
Posted: Thu Jan 16, 2020 6:07 pm
Member
Joined: Sun Apr 30, 2017 12:16 pm
Posts: 68
Location: Poland
Apologies for not getting back to you all earlier.

It seems I am very stupid, and as such made a very stupid mistake. My page table bit defines were wrong. :oops:

The PAT bit was defined like this:
Code:
#define VMM_FLAG_PAT2      (1<<4)

while it should have been:
Code:
#define VMM_FLAG_PAT2      (1<<7)

You can even see this mistake in the original post.
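
To make the effect concrete: in the 4 KiB PTE format used by the lowest-level tables, the CPU forms the PAT index as 4*PAT + 2*PCD + PWT, where PWT is bit 3, PCD is bit 4 and PAT is bit 7. With VMM_FLAG_PAT2 defined as (1<<4), ARCH_MM_CACHE_UC (4) only set PCD, so the LAPIC mapping selected PAT entry 2, which 0x0000000005010406 programs as WC rather than UC. A minimal sketch with hypothetical helper names; note that in 2 MiB and 1 GiB entries the PAT bit is bit 12 instead.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_PWT (1ull << 3)
#define PTE_PCD (1ull << 4)
#define PTE_PAT (1ull << 7)   /* 4 KiB PTEs only; bit 12 in 2 MiB/1 GiB entries */

/* PTE bits that select PAT entry `index` (0-7) in a 4 KiB PTE. */
static uint64_t pte_cache_bits(unsigned index) {
    assert(index < 8);
    uint64_t bits = 0;
    if (index & 1) bits |= PTE_PWT;
    if (index & 2) bits |= PTE_PCD;
    if (index & 4) bits |= PTE_PAT;
    return bits;
}

/* PAT index the CPU derives from a 4 KiB PTE. */
static unsigned pte_pat_index(uint64_t pte) {
    return ((pte & PTE_PAT) ? 4u : 0u) | ((pte & PTE_PCD) ? 2u : 0u) |
           ((pte & PTE_PWT) ? 1u : 0u);
}
```

Here `pte_cache_bits(4)` yields only bit 7, while `pte_pat_index(1ull << 4)`, which is what the old define produced for ARCH_MM_CACHE_UC, returns 2.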

Thanks for all the help. I apologise for not noticing the issue earlier.


