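JOS Memory Management
JOS is the operating system that students implement in the MIT 6.828 course. This post is the first in a series; later posts will be published under the same title. Note that everything below is descriptive material and is best read alongside a concrete example, namely the memory-management part of the operating system you are building.
Virtual Memory Layout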
/*
 * Virtual memory map:                                Permissions
 *                                                    kernel/user
 *
 *    4 Gig -------->  +------------------------------+
 *                     |                              | RW/--
 *                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 *                     :              .               :
 *                     :              .               :
 *                     :              .               :
 *                     |~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~| RW/--
 *                     |                              | RW/--
 *                     |   Remapped Physical Memory   | RW/--
 *                     |                              | RW/--
 *    KERNBASE, ---->  +------------------------------+ 0xf0000000      --+
 *    KSTACKTOP        |     CPU0's Kernel Stack      | RW/--  KSTKSIZE   |
 *                     | - - - - - - - - - - - - - - -|                   |
 *                     |      Invalid Memory (*)      | --/--  KSTKGAP    |
 *                     +------------------------------+                   |
 *                     |     CPU1's Kernel Stack      | RW/--  KSTKSIZE   |
 *                     | - - - - - - - - - - - - - - -|                 PTSIZE
 *                     |      Invalid Memory (*)      | --/--  KSTKGAP    |
 *                     +------------------------------+                   |
 *                     :              .               :                   |
 *                     :              .               :                   |
 *    MMIOLIM ------>  +------------------------------+ 0xefc00000      --+
 *                     |       Memory-mapped I/O      | RW/--  PTSIZE
 * ULIM, MMIOBASE -->  +------------------------------+ 0xef800000
 *                     |  Cur. Page Table (User R-)   | R-/R-  PTSIZE
 *    UVPT      ---->  +------------------------------+ 0xef400000
 *                     |          RO PAGES            | R-/R-  PTSIZE
 *    UPAGES    ---->  +------------------------------+ 0xef000000
 *                     |           RO ENVS            | R-/R-  PTSIZE
 * UTOP,UENVS ------>  +------------------------------+ 0xeec00000
 * UXSTACKTOP -/       |     User Exception Stack     | RW/RW  PGSIZE
 *                     +------------------------------+ 0xeebff000
 *                     |       Empty Memory (*)       | --/--  PGSIZE
 *    USTACKTOP  --->  +------------------------------+ 0xeebfe000
 *                     |      Normal User Stack       | RW/RW  PGSIZE
 *                     +------------------------------+ 0xeebfd000
 *                     |                              |
 *                     |                              |
 *                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 *                     .                              .
 *                     .                              .
 *                     .                              .
 *                     |~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~|
 *                     |     Program Data & Heap      |
 *    UTEXT -------->  +------------------------------+ 0x00800000
 *    PFTEMP ------->  |       Empty Memory (*)       |        PTSIZE
 *                     |                              |
 *    UTEMP -------->  +------------------------------+ 0x00400000      --+
 *                     |       Empty Memory (*)       |                   |
 *                     | - - - - - - - - - - - - - - -|                   |
 *                     |  User STAB Data (optional)   |                 PTSIZE
 *    USTABDATA ---->  +------------------------------+ 0x00200000        |
 *                     |       Empty Memory (*)       |                   |
 *    0 ------------>  +------------------------------+                 --+
 *
 * (*) Note: The kernel ensures that "Invalid Memory" is *never* mapped.
 *     "Empty Memory" is normally unmapped, but user programs may map pages
 *     there if desired.  JOS user programs map pages temporarily at UTEMP.
 */
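x86 Hardware Memory Management
#d x86 address translation
x86 translates addresses in two layers. The first layer translates addresses through the GDT (the Segmentation Mechanism in the figure below): a segment register selects a GDT entry, and memory is then accessed through that segment. The second layer translates addresses through two tables, the page directory and the page table (the Paging Mechanism). Implementing memory management on top of this second layer is the task of this chapter.
#c Note: the first layer is not used
JOS makes no real use of the first layer (its segments are flat, with base 0), so it is skipped here. In other words, Virtual and Linear addresses are identical, and the rest of this post simply says Virtual.
#d Paging Mechanism
As the figure below shows, the Dir field of the Virtual address first indexes the Page Directory to find the corresponding Page Table. The Table field of the Virtual address then indexes that Page Table to find the corresponding Page. Each Page Directory entry covers 4MiB, and a page is 4KiB.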
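#d How it is applied
Loading the start address of the Page Directory into the CR3 register and enabling paging through the CR0 register makes every memory access issued by an instruction go through the two tables, pd and pt.
The assembly code below turns paging on.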
# load page directory
movl $(RELOC(entry_pgdir)), %eax
movl %eax, %cr3
# turn on paging
movl %cr0, %eax
orl $(CR0_PE|CR0_PG|CR0_WP), %eax
movl %eax, %cr0
Structures
Phy Mem
Total: 131 072KiB, base: 640KiB, extended: 130 432KiB; corresponds to pages
---------------- <- Max Phy Mem
PAGE
----------------
--------------------------------------------------------------- -> 0xF040 0000 (4M)
----------------
PAGE
----------------
...
----------------
PAGE
----------------
PageInfo array 256K (32768 entries of 8 bytes)
----------------
page dir 4K
---------------- <- roundup to PGSIZE
---------------- end <- 0x0011 66c0 (varies with kernel size)
kernel data
(bootstack)
----------------
kernel text
---------------- <- 0x0010 0000
1MiB size
---------------- <- 0x0000 0000 ---------------- -> 0xF000 0000 (0M)
pages
#c Note: point in time
This corresponds to physical memory (Phy Mem) at the moment when mem_init has just finished calling page_init and has not yet called boot_map_region.
[Phy Mem Max]
...
...
[end+n] -> pages
... -> pages
[end+2] -> pages
[end+1] -> pgdir
[end](round to PGSIZE) -> kernel img end
...
[256] -> kernel img start (0x100000 / PGSIZE)
[160]-[255] -> IO hole (640KiB to 1MiB)
[159] --------------------------------------------------------------- base
...
[4]
[3]
[2]
[1]
[0] -> IDT and BIOS structures
pde_t *pgdir
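Each entry in this table covers 4MiB; with 1024 entries in total, the table covers 4GiB.
Entry | Base Virtual Address | Points to (logically) | Permission |
---|---|---|---|
… | | | |
960 | 0xf0000000 | Phy 0 - 4MiB | P W |
… | | | |
0 | 0x00000000 | Phy 0 - 4MiB | P |
The table above shows the Page Directory (entry_pgdir) in effect right after the bootloader has loaded the kernel and entry.S has turned paging on.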
Entry | Base Virtual Address | Points to (logically) | Permission |
---|---|---|---|
1023 | 0xffc00000 | Phy 0 - (4GiB/16) | W P |
… | | Phy 0 - (4GiB/16) | W P |
960 | 0xf0000000 | Phy 0 - (4GiB/16) | W P |
959.1 | [KSTACKTOP-KSTKSIZE, KSTACKTOP) | bootstack | W P |
959.2 | [KSTACKTOP-PTSIZE, KSTACKTOP-KSTKSIZE) | not mapped (guard) | |
958 | 0xef800000 | | |
957 | 0xef400000 | kern_pgdir (UVPT) | U P |
956 | 0xef000000 | pages (UPAGES) | U P |
… | | | |
… | | | |
63 | 0x0fc00000 | | |
… | | | |
0 | 0x00000000 | | |
This table shows the entries of the Page Directory after mem_init has run.
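What Each Function Does
mem_init
This function is called by i386_init, which is itself called from the assembly in kern/entry.S, that is, right after the kernel has been loaded.
#d Purpose
It first calls i386_detect_memory, which initializes the variables that record how much memory the machine has. It then uses boot_alloc, which advances the variable marking the current free position (nextfree, rounded up with ROUNDUP), to allocate kern_pgdir and the pages array. Next, page_init initializes pages, the array of PageInfo structs; from this point on page_free_list is the linked list of free pages (see the content under the pages heading). Finally, boot_map_region maps virtual addresses into the corresponding pde and pte entries; when the pte table for an address does not exist yet, a new table is created, which means allocating one page (page_alloc). The actual code follows.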
void mem_init(void)
{
	uint32_t cr0;
	size_t n;

	// Find out how much memory the machine has (npages & npages_basemem).
	i386_detect_memory();

	//////////////////////////////////////////////////////////////////////
	// create initial page directory.
	kern_pgdir = (pde_t *) boot_alloc(PGSIZE);
	memset(kern_pgdir, 0, PGSIZE);

	//////////////////////////////////////////////////////////////////////
	// Recursively insert PD in itself as a page table, to form
	// a virtual page table at virtual address UVPT.
	// (For now, you don't have to understand the greater purpose of the
	// following line.)

	// Permissions: kernel R, user R
	kern_pgdir[PDX(UVPT)] = PADDR(kern_pgdir) | PTE_U | PTE_P;

	//////////////////////////////////////////////////////////////////////
	// Allocate an array of npages 'struct PageInfo's and store it in 'pages'.
	// The kernel uses this array to keep track of physical pages: for
	// each physical page, there is a corresponding struct PageInfo in this
	// array. 'npages' is the number of physical pages in memory. Use memset
	// to initialize all fields of each struct PageInfo to 0.
	// Your code goes here:
	pages = (struct PageInfo *) boot_alloc(sizeof(struct PageInfo) * npages);
	memset(pages, 0, sizeof(struct PageInfo) * npages);

	//////////////////////////////////////////////////////////////////////
	// Now that we've allocated the initial kernel data structures, we set
	// up the list of free physical pages. Once we've done so, all further
	// memory management will go through the page_* functions. In
	// particular, we can now map memory using boot_map_region
	// or page_insert
	page_init();

	check_page_free_list(1);
	check_page_alloc();
	check_page();

	//////////////////////////////////////////////////////////////////////
	// Now we set up virtual memory

	//////////////////////////////////////////////////////////////////////
	// Map 'pages' read-only by the user at linear address UPAGES
	// Permissions:
	//    - the new image at UPAGES -- kernel R, user R
	//      (ie. perm = PTE_U | PTE_P)
	//    - pages itself -- kernel RW, user NONE
	// Your code goes here:
	boot_map_region(kern_pgdir, (uintptr_t) UPAGES, sizeof(struct PageInfo) * npages,
			PADDR(pages), PTE_U | PTE_P);

	//////////////////////////////////////////////////////////////////////
	// Use the physical memory that 'bootstack' refers to as the kernel
	// stack. The kernel stack grows down from virtual address KSTACKTOP.
	// We consider the entire range from [KSTACKTOP-PTSIZE, KSTACKTOP)
	// to be the kernel stack, but break this into two pieces:
	//     * [KSTACKTOP-KSTKSIZE, KSTACKTOP) -- backed by physical memory
	//     * [KSTACKTOP-PTSIZE, KSTACKTOP-KSTKSIZE) -- not backed; so if
	//       the kernel overflows its stack, it will fault rather than
	//       overwrite memory. Known as a "guard page".
	//     Permissions: kernel RW, user NONE
	// Your code goes here:
	boot_map_region(kern_pgdir, (uintptr_t) (KSTACKTOP-KSTKSIZE), KSTKSIZE,
			PADDR(bootstack), PTE_P | PTE_W);

	//////////////////////////////////////////////////////////////////////
	// Map all of physical memory at KERNBASE.
	// Ie. the VA range [KERNBASE, 2^32) should map to
	//     the PA range [0, 2^32 - KERNBASE)
	// We might not have 2^32 - KERNBASE bytes of physical memory, but
	// we just set up the mapping anyway.
	// Permissions: kernel RW, user NONE
	// Your code goes here:
	boot_map_region(kern_pgdir, (uintptr_t) (KERNBASE), (0x100000000 - KERNBASE),
			0x0, PTE_P | PTE_W);

	// Check that the initial page directory has been set up correctly.
	check_kern_pgdir();

	// Switch from the minimal entry page directory to the full kern_pgdir
	// page table we just created. Our instruction pointer should be
	// somewhere between KERNBASE and KERNBASE+4MB right now, which is
	// mapped the same way by both page tables.
	//
	// If the machine reboots at this point, you've probably set up your
	// kern_pgdir wrong.
	lcr3(PADDR(kern_pgdir));

	check_page_free_list(0);

	// entry.S set the really important flags in cr0 (including enabling
	// paging). Here we configure the rest of the flags that we care about.
	cr0 = rcr0();
	cr0 |= CR0_PE|CR0_PG|CR0_AM|CR0_WP|CR0_NE|CR0_MP;
	cr0 &= ~(CR0_TS|CR0_EM);
	lcr0(cr0);

	// Some more checks, only possible after kern_pgdir is installed.
	check_page_installed_pgdir();
}
page_init
#d Purpose
Called by mem_init. Based on how physical memory is occupied at that moment, it updates the pages array and page_free_list. See the pages section for details.
boot_map_region
#d Purpose
Called by mem_init and used only during boot. It modifies only the pde and pte tables and leaves the pages array untouched.
page_alloc
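#d Purpose
The real memory allocator. It removes one page from page_free_list and sets that page's pp_link to NULL, but it does not increment pp_ref; that is done by page_insert or explicitly by the caller.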
page_free
#d Purpose
Called by page_decref. Returns a page whose pp_ref is 0 and whose pp_link is NULL to page_free_list.
page_decref
#d Purpose
Decrements pp_ref and, if it reaches 0, calls page_free.
pgdir_walk
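#d Purpose
Checks whether a va is covered by the pde and pte tables. If it is, returns the address of the corresponding pte entry. If it is not and the caller passes create, it allocates a new pte table and returns the address of the corresponding pte entry; without create it returns NULL.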
page_insert
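#d Purpose
Establishes a mapping: writes a physical page into the pte entry for the given va and increments pp_ref. pp_link has already been set to NULL by page_alloc.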
page_lookup
#d Purpose
Called by page_remove. Looks up the page mapped at va in the pde and pte tables and returns it.
page_remove
#d Purpose
Calls page_lookup; if a page is found, calls page_decref on it and clears the corresponding pte entry.
tlb_invalidate
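#d Purpose
Invalidates the TLB entry for a va. The TLB is the processor's cache of translations taken from the Page Directory and Page Table, so a stale entry has to be flushed whenever those tables change.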