
Memory Management in JOS

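JOS is the operating system that students implement in MIT's 6.828 course. This is the first post in a series; later posts will be published under the same title. Note that everything below is descriptive material: read it alongside a concrete example, namely the memory management part of the operating system you are building.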

Virtual memory layout

/*
 * Virtual memory map:                                Permissions
 *                                                    kernel/user
 *
 *    4 Gig -------->  +------------------------------+
 *                     |                              | RW/--
 *                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 *                     :              .               :
 *                     :              .               :
 *                     :              .               :
 *                     |~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~| RW/--
 *                     |                              | RW/--
 *                     |   Remapped Physical Memory   | RW/--
 *                     |                              | RW/--
 *    KERNBASE, ---->  +------------------------------+ 0xf0000000      --+
 *    KSTACKTOP        |     CPU0's Kernel Stack      | RW/--  KSTKSIZE   |
 *                     | - - - - - - - - - - - - - - -|                   |
 *                     |      Invalid Memory (*)      | --/--  KSTKGAP    |
 *                     +------------------------------+                   |
 *                     |     CPU1's Kernel Stack      | RW/--  KSTKSIZE   |
 *                     | - - - - - - - - - - - - - - -|                 PTSIZE
 *                     |      Invalid Memory (*)      | --/--  KSTKGAP    |
 *                     +------------------------------+                   |
 *                     :              .               :                   |
 *                     :              .               :                   |
 *    MMIOLIM ------>  +------------------------------+ 0xefc00000      --+
 *                     |       Memory-mapped I/O      | RW/--  PTSIZE
 * ULIM, MMIOBASE -->  +------------------------------+ 0xef800000
 *                     |  Cur. Page Table (User R-)   | R-/R-  PTSIZE
 *    UVPT      ---->  +------------------------------+ 0xef400000
 *                     |          RO PAGES            | R-/R-  PTSIZE
 *    UPAGES    ---->  +------------------------------+ 0xef000000
 *                     |           RO ENVS            | R-/R-  PTSIZE
 * UTOP,UENVS ------>  +------------------------------+ 0xeec00000
 * UXSTACKTOP -/       |     User Exception Stack     | RW/RW  PGSIZE
 *                     +------------------------------+ 0xeebff000
 *                     |       Empty Memory (*)       | --/--  PGSIZE
 *    USTACKTOP  --->  +------------------------------+ 0xeebfe000
 *                     |      Normal User Stack       | RW/RW  PGSIZE
 *                     +------------------------------+ 0xeebfd000
 *                     |                              |
 *                     |                              |
 *                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 *                     .                              .
 *                     .                              .
 *                     .                              .
 *                     |~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~|
 *                     |     Program Data & Heap      |
 *    UTEXT -------->  +------------------------------+ 0x00800000
 *    PFTEMP ------->  |       Empty Memory (*)       |        PTSIZE
 *                     |                              |
 *    UTEMP -------->  +------------------------------+ 0x00400000      --+
 *                     |       Empty Memory (*)       |                   |
 *                     | - - - - - - - - - - - - - - -|                   |
 *                     |  User STAB Data (optional)   |                 PTSIZE
 *    USTABDATA ---->  +------------------------------+ 0x00200000        |
 *                     |       Empty Memory (*)       |                   |
 *    0 ------------>  +------------------------------+                 --+
 *
 * (*) Note: The kernel ensures that "Invalid Memory" is *never* mapped.
 *     "Empty Memory" is normally unmapped, but user programs may map pages
 *     there if desired.  JOS user programs map pages temporarily at UTEMP.
 */

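x86 hardware memory management

#d x86 address translation

x86 has two layers of address translation. The first layer translates through the GDT; this is the Segmentation Mechanism in the figure below, where a segment register selects a GDT entry before memory is accessed. The second layer translates through two tables, the page directory and the page table; this is the Paging Mechanism. Implementing memory management on top of the second layer is what this chapter is about.

#c Note: the first layer is unused

JOS does not make use of the first layer (its segments are set up with base 0, so segmentation changes nothing), so it is skipped here. In other words, Virtual and Linear addresses are identical, and both are referred to as Virtual from here on.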

[Figure: Mechanism]

#d Paging Mechanism

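As the figure below shows, the Dir field of the Virtual address first indexes the Page Directory to find the corresponding Page Table. The Table field of the Virtual address then indexes that Page Table to find the corresponding Page. Each Page Directory entry covers 4 MiB, and each page is 4 KiB.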
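The lookup can be sketched in C. The names PDX, PTX and PGOFF mirror the macros in JOS's inc/mmu.h, but they are re-declared here as small functions so that the snippet compiles on its own; the sketch assumes the standard 10/10/12-bit split of a 32-bit address.

```c
#include <stdint.h>

/* Standalone sketch: names mirror JOS's inc/mmu.h, re-declared for illustration.
 * Assumes a 32-bit virtual address split into 10 (Dir) / 10 (Table) / 12 (offset) bits. */
#define PDXSHIFT 22   /* bit position of the Dir field */
#define PTXSHIFT 12   /* bit position of the Table field */

/* Index into the Page Directory (top 10 bits). */
static inline uint32_t PDX(uint32_t va)   { return (va >> PDXSHIFT) & 0x3FF; }

/* Index into the Page Table (middle 10 bits). */
static inline uint32_t PTX(uint32_t va)   { return (va >> PTXSHIFT) & 0x3FF; }

/* Byte offset inside the 4 KiB page (low 12 bits). */
static inline uint32_t PGOFF(uint32_t va) { return va & 0xFFF; }
```

For example, KERNBASE (0xf0000000) decomposes into Dir 960, Table 0, offset 0, which is why entry 960 of the Page Directory shows up in the tables further down.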

[Figure: paging hardware]

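#d How it is applied

Once the starting address of the Page Directory is loaded into the CR3 register and paging is turned on through the CR0 register, every memory access made by an instruction goes through the two tables, pd and pt.

The following assembly code turns paging on.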

# load page directory
movl $(RELOC(entry_pgdir)), %eax
movl %eax, %cr3

# turn on paging
movl %cr0, %eax
orl $(CR0_PE|CR0_PG|CR0_WP), %eax
movl %eax, %cr0
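To make the `orl` step concrete, here is a small sketch of the bit arithmetic in C. The three flag values are the architectural CR0 bit positions; the function name enable_paging_bits is hypothetical and only models the register update, since CR0 itself cannot be written from user space.

```c
#include <stdint.h>

#define CR0_PE 0x00000001u  /* Protection Enable */
#define CR0_WP 0x00010000u  /* Write Protect: kernel honours read-only pages */
#define CR0_PG 0x80000000u  /* Paging */

/* Hypothetical helper: the value written back to CR0 by the snippet above. */
static uint32_t enable_paging_bits(uint32_t cr0)
{
    return cr0 | CR0_PE | CR0_PG | CR0_WP;
}
```

Because the flags are OR-ed in, any bits that were already set in CR0 are preserved.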

Structures

Phy Mem

Total: 131072 KiB, base: 640 KiB, extended: 130432 KiB. These figures correspond to the pages array.
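The sizes above turn into page counts with a single division. This is only a sketch of the arithmetic, assuming 4 KiB pages; the helper name pages_from_kib is hypothetical, while npages and npages_basemem are the variables named in mem_init's comments.

```c
#include <stddef.h>

#define PGSIZE 4096  /* bytes per page */

/* Hypothetical helper: convert a size in KiB into a number of 4 KiB pages. */
static size_t pages_from_kib(size_t kib)
{
    return kib / (PGSIZE / 1024);
}
```

With the totals above, npages would come out as pages_from_kib(131072) and npages_basemem as pages_from_kib(640), that is 32768 and 160 pages.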


----------------                 <- Max Phy Mem   
PAGE
----------------
--------------------------------------------------------------- ->  0xF040 0000 (4M)
----------------
PAGE
----------------
...
----------------
PAGE                            
----------------


PageInfo array 256K (32768 entries x 8 bytes, assuming the stock struct PageInfo)


----------------
page dir 4K
----------------                <- roundup to PGSIZE
---------------- end            <- 0x0011 66c0 (varies with kernel size)
kernel data
(bootstack)
----------------
kernel text
----------------                <- 0x0010 0000 

1MiB size 

----------------                <- 0x0000 0000 ---------------- -> 0xF000 0000 (0M)

pages

#c Note: point in time

This mirrors physical memory (Phy Mem) at the moment when mem_init has just finished calling page_init and has not yet called boot_map_region.

[Phy Mem Max]

...
...
[end+n]                                -> pages
...                                    -> pages
[end+2]                                -> pages
[end+1]                                -> pgdir
[end](round to PGSIZE)                 -> kernel img end
...
[16]                                   -> kernel img start
[15] --------------------------------------------------------------- base
...
[4]
[3] 
[2]
[1]
[0]                                    -> IDT and BIOS structures

pde_t *pgdir

Each entry in this table covers 4 MiB. There are 1024 entries, so the table covers 4 GiB in total.

Entry   Base Virtual Address   Points to (logically)   Permission
960     0xf0000000             Phy 0 - 4MiB            P W
0       0x00000000             Phy 0 - 4MiB            P

The table above shows the Page Directory (entry_pgdir) right after the bootloader has loaded the kernel.

Entry   Base Virtual Address   Points to (logically)   Permission
1023                           Phy 0 - (4GiB/16)       W P
...                            Phy 0 - (4GiB/16)       W P
960                            Phy 0 - (4GiB/16)       W P
959.1                          bootstack               W P
959.2
958
957     0xef400000             kern_pgdir (UVPT)       U P
956     0xef000000             pages (UPAGES)          U P
63      0x0fc00000
0       0x00000000

This table shows the entries of the Page Directory after mem_init has run.
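The "(4GiB/16)" in the table is the span from KERNBASE to the top of the address space. A quick sketch of that arithmetic, with hypothetical helper names and the KERNBASE and PTSIZE values from the memory map at the top of this post:

```c
#include <stdint.h>

#define KERNBASE 0xF0000000u
#define PTSIZE   (4u * 1024 * 1024)  /* bytes covered by one Page Directory entry */

/* Bytes of virtual address space from KERNBASE up to 4 GiB. */
static uint32_t kernbase_span(void)
{
    return (uint32_t)(0x100000000ull - KERNBASE);
}

/* Number of Page Directory entries needed to cover that span. */
static uint32_t kernbase_pdes(void)
{
    return kernbase_span() / PTSIZE;
}
```

The span is 256 MiB, and it takes 64 entries to cover it, which matches entries 960 through 1023 in the table.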

What each function does

mem_init

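This function is called by i386_init, which in turn is called from the assembly in kern/entry.S. In other words, it runs after the kernel has been loaded.

#d Purpose

First, mem_init calls i386_detect_memory, which initializes the variables that record how much memory the machine has. It then uses boot_alloc, which advances the variable marking the current free position (nextfree, rounded up with ROUNDUP), to allocate kern_pgdir and the pages array. Next, page_init initializes pages, the array of PageInfo structs; from this point on page_free_list is the linked list of free pages (see the pages subsection above for the resulting state). Finally, boot_map_region maps virtual addresses into the corresponding pde and pte entries; if the pte table for an address does not exist yet, a new one is created, which means allocating one page (page_alloc). The code is as follows.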

void mem_init(void)
{
    uint32_t cr0;
    size_t n;
    
    // Find out how much memory the machine has (npages & npages_basemem).
    i386_detect_memory();

    //////////////////////////////////////////////////////////////////////
    // create initial page directory.
    kern_pgdir = (pde_t *) boot_alloc(PGSIZE);
    memset(kern_pgdir, 0, PGSIZE);

    //////////////////////////////////////////////////////////////////////
    // Recursively insert PD in itself as a page table, to form
    // a virtual page table at virtual address UVPT.
    // (For now, you don't have understand the greater purpose of the
    // following line.)
    // Permissions: kernel R, user R
    kern_pgdir[PDX(UVPT)] = PADDR(kern_pgdir) | PTE_U | PTE_P;

    //////////////////////////////////////////////////////////////////////
    // Allocate an array of npages 'struct PageInfo's and store it in 'pages'.
    // The kernel uses this array to keep track of physical pages: for
    // each physical page, there is a corresponding struct PageInfo in this
    // array.  'npages' is the number of physical pages in memory.  Use memset
    // to initialize all fields of each struct PageInfo to 0.
    // Your code goes here:
    pages = (struct PageInfo *) boot_alloc(sizeof(struct PageInfo) * npages);  
    memset(pages, 0, sizeof(struct PageInfo) * npages);

    //////////////////////////////////////////////////////////////////////
    // Now that we've allocated the initial kernel data structures, we set
    // up the list of free physical pages. Once we've done so, all further
    // memory management will go through the page_* functions. In
    // particular, we can now map memory using boot_map_region
    // or page_insert
    page_init();

    check_page_free_list(1);
    check_page_alloc();
    check_page();

    //////////////////////////////////////////////////////////////////////
    // Now we set up virtual memory
    //////////////////////////////////////////////////////////////////////
    // Map 'pages' read-only by the user at linear address UPAGES
    // Permissions:
    //    - the new image at UPAGES -- kernel R, user R
    //      (ie. perm = PTE_U | PTE_P)
    //    - pages itself -- kernel RW, user NONE
    // Your code goes here:
  
    boot_map_region(kern_pgdir, (uintptr_t) UPAGES, sizeof(struct PageInfo) * npages, \
		PADDR(pages), PTE_U | PTE_P);

    //////////////////////////////////////////////////////////////////////
    // Use the physical memory that 'bootstack' refers to as the kernel
    // stack.  The kernel stack grows down from virtual address KSTACKTOP.
    // We consider the entire range from [KSTACKTOP-PTSIZE, KSTACKTOP)
    // to be the kernel stack, but break this into two pieces:
    //     * [KSTACKTOP-KSTKSIZE, KSTACKTOP) -- backed by physical memory
    //     * [KSTACKTOP-PTSIZE, KSTACKTOP-KSTKSIZE) -- not backed; so if
    //       the kernel overflows its stack, it will fault rather than
    //       overwrite memory.  Known as a "guard page".
    //     Permissions: kernel RW, user NONE
    // Your code goes here:

    boot_map_region(kern_pgdir, (uintptr_t) (KSTACKTOP-KSTKSIZE), KSTKSIZE, \
		PADDR(bootstack), PTE_P | PTE_W);

    //////////////////////////////////////////////////////////////////////
    // Map all of physical memory at KERNBASE.
    // Ie.  the VA range [KERNBASE, 2^32) should map to
    //      the PA range [0, 2^32 - KERNBASE)
    // We might not have 2^32 - KERNBASE bytes of physical memory, but
    // we just set up the mapping anyway.
    // Permissions: kernel RW, user NONE
    // Your code goes here:

    boot_map_region(kern_pgdir, (uintptr_t) (KERNBASE), (0x100000000 - KERNBASE), \
		0x0, PTE_P | PTE_W);

    // Check that the initial page directory has been set up correctly.
    check_kern_pgdir();

    // Switch from the minimal entry page directory to the full kern_pgdir
    // page table we just created.  Our instruction pointer should be
    // somewhere between KERNBASE and KERNBASE+4MB right now, which is
    // mapped the same way by both page tables.
    //

    // If the machine reboots at this point, you've probably set up your
    // kern_pgdir wrong.
    lcr3(PADDR(kern_pgdir));
    check_page_free_list(0);

    // entry.S set the really important flags in cr0 (including enabling
    // paging).  Here we configure the rest of the flags that we care about.
    cr0 = rcr0();
    cr0 |= CR0_PE|CR0_PG|CR0_AM|CR0_WP|CR0_NE|CR0_MP;
    cr0 &= ~(CR0_TS|CR0_EM);
    lcr0(cr0);

    // Some more checks, only possible after kern_pgdir is installed.
    check_page_installed_pgdir();
}
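mem_init depends on boot_alloc before the page allocator exists, so it may help to see the bump-and-round-up idea on its own. This is a user-space sketch, not the lab's boot_alloc: it works on plain integers instead of pointers, and the names roundup, sim_boot_init and sim_boot_alloc are hypothetical.

```c
#include <stdint.h>

#define PGSIZE 4096u

/* Round a up to the next multiple of n (n must be a power of two). */
static uint32_t roundup(uint32_t a, uint32_t n)
{
    return (a + n - 1) & ~(n - 1);
}

/* Simulated "next free byte". In the kernel this would start at the
 * page-aligned address just past the kernel image (`end`). */
static uint32_t nextfree;

static void sim_boot_init(uint32_t end)
{
    nextfree = roundup(end, PGSIZE);
}

/* Return the current free address, then advance nextfree past n bytes,
 * keeping it page-aligned. Asking for 0 bytes just reports the position. */
static uint32_t sim_boot_alloc(uint32_t n)
{
    uint32_t result = nextfree;
    nextfree = roundup(nextfree + n, PGSIZE);
    return result;
}
```

Starting from the `end` value shown in the Phy Mem diagram (0x001166c0), the first allocation lands on the next page boundary, and every later allocation starts on a page boundary as well.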

page_init

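#d Purpose

Called by mem_init. It updates the pages array and page_free_list according to which parts of physical memory are currently in use. See the pages section for details.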
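The decision page_init makes for each physical page can be sketched as a single predicate. This is an illustration under assumptions rather than the lab solution: the function name page_is_free and its parameters are hypothetical, and first_free_pa stands for the first physical address not used by the kernel image, the page directory or the pages array (assumed page-aligned).

```c
#include <stdbool.h>
#include <stddef.h>

#define PGSIZE 4096u

/* Should physical page i be put on the free list?
 * npages_basemem: number of pages of base memory (below the I/O hole).
 * first_free_pa:  first physical address past the kernel and its boot-time
 *                 allocations (hypothetical parameter, page-aligned). */
static bool page_is_free(size_t i, size_t npages_basemem, size_t first_free_pa)
{
    if (i == 0)
        return false;                 /* IDT and BIOS structures */
    if (i < npages_basemem)
        return true;                  /* rest of base memory */
    if (i < first_free_pa / PGSIZE)
        return false;                 /* I/O hole, kernel image, pgdir, pages */
    return true;                      /* remaining extended memory */
}
```

A caller would loop over every page index below npages and push the pages for which the predicate holds onto page_free_list.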

boot_map_region

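#d Purpose

Called by mem_init and used only during boot. It only edits the pde and pte tables; it does not update the pages array for the pages it maps.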

page_alloc

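#d Purpose

The real memory allocator. It takes a page off page_free_list and sets that page's pp_link to NULL, but it does not increment pp_ref (that happens in page_insert, or explicitly in the caller).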

page_free

#d Purpose

Called by page_decref. It returns a page to page_free_list; the page must have pp_ref equal to 0 and pp_link equal to NULL.

page_decref

#d Purpose

Decrements pp_ref and, if it reaches 0, calls page_free.
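How page_alloc, page_free and page_decref cooperate is easiest to see on a tiny pool. The following is a user-space sketch, not the lab code: the pool size NPAGES and the sim_ names are hypothetical, and the ALLOC_ZERO handling of the real page_alloc is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct PageInfo {
    struct PageInfo *pp_link;  /* next page on the free list */
    uint16_t pp_ref;           /* number of mappings pointing at this page */
};

#define NPAGES 4               /* tiny pool, for illustration only */
static struct PageInfo pages[NPAGES];
static struct PageInfo *page_free_list;

/* Put every simulated page on the free list. */
static void sim_init(void)
{
    page_free_list = NULL;
    for (size_t i = 0; i < NPAGES; i++) {
        pages[i].pp_ref = 0;
        pages[i].pp_link = page_free_list;
        page_free_list = &pages[i];
    }
}

/* Pop one page off the free list; pp_ref stays 0 for the caller to manage. */
static struct PageInfo *sim_page_alloc(void)
{
    struct PageInfo *pp = page_free_list;
    if (pp == NULL)
        return NULL;           /* out of memory */
    page_free_list = pp->pp_link;
    pp->pp_link = NULL;
    return pp;
}

/* Push a page back; only legal when nothing references it. */
static void sim_page_free(struct PageInfo *pp)
{
    assert(pp->pp_ref == 0 && pp->pp_link == NULL);
    pp->pp_link = page_free_list;
    page_free_list = pp;
}

/* Drop one reference and free the page when the count reaches zero. */
static void sim_page_decref(struct PageInfo *pp)
{
    if (--pp->pp_ref == 0)
        sim_page_free(pp);
}
```

The allocator never touches pp_ref; only the code that creates or drops a mapping does, which is why a freshly allocated page can be handed straight back with sim_page_free.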

pgdir_walk

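#d Purpose

Checks whether a va has an entry in the pde and pte tables. If it does, the address of the corresponding pte is returned. If it does not and the caller asked for creation, a new pte table is created and the address of the corresponding pte in it is returned; otherwise NULL is returned.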
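The walk can be tried out in user space by simulating physical memory as an array of frames. This is a sketch under assumptions, not the lab solution: sim_kaddr and sim_frame_alloc are hypothetical stand-ins for KADDR and page_alloc, and the pp_ref bookkeeping for the new page table is omitted.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t pde_t;
typedef uint32_t pte_t;

#define PGSIZE      4096u
#define PTE_P       0x001u
#define PTE_W       0x002u
#define PTE_U       0x004u
#define PDX(va)     (((va) >> 22) & 0x3FFu)
#define PTX(va)     (((va) >> 12) & 0x3FFu)
#define PTE_ADDR(e) ((e) & ~0xFFFu)

/* Simulated physical memory: NFRAMES frames of 1024 32-bit words each. */
#define NFRAMES 8
static uint32_t physmem[NFRAMES][1024];
static uint32_t next_frame = 1;      /* frame 0 holds the page directory */

/* Stand-in for KADDR: simulated physical address -> usable pointer. */
static uint32_t *sim_kaddr(uint32_t pa)
{
    return physmem[pa / PGSIZE];
}

/* Stand-in for page_alloc: hand out a zeroed frame, or 0 when exhausted. */
static uint32_t sim_frame_alloc(void)
{
    if (next_frame >= NFRAMES)
        return 0;
    memset(physmem[next_frame], 0, PGSIZE);
    return next_frame++ * PGSIZE;
}

/* Return a pointer to the pte for va, creating the page table if asked to. */
static pte_t *sim_pgdir_walk(pde_t *pgdir, uint32_t va, int create)
{
    pde_t *pde = &pgdir[PDX(va)];
    if (!(*pde & PTE_P)) {
        if (!create)
            return NULL;             /* not mapped and not allowed to create */
        uint32_t pa = sim_frame_alloc();
        if (pa == 0)
            return NULL;             /* allocation failed */
        *pde = pa | PTE_P | PTE_W | PTE_U;  /* permissive; the pte restricts further */
    }
    pte_t *pt = sim_kaddr(PTE_ADDR(*pde));
    return &pt[PTX(va)];
}
```

Using physmem[0] as the page directory, a walk with create set to 0 on an empty directory returns NULL, while the same walk with create set to 1 allocates one frame and returns a pointer into it; later walks in the same 4 MiB region reuse that frame.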

page_insert

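#d Purpose

Maps an address: it installs a physical page in the pte for the given va and increments pp_ref. The page's pp_link has already been set to NULL by page_alloc.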

page_lookup

#d Purpose

Called by page_remove. It looks up the page mapped at va in the pde and pte tables and returns it.

page_remove

#d Purpose

Calls page_lookup; if a page is found, it calls page_decref and clears the corresponding pte.
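The interplay between page_insert, page_lookup and page_remove is mostly about reference counts, which a flat toy model can show. This is a sketch under heavy simplification: a single array pte[] stands in for the pde and pte tables, pages are identified by number, and all sim_ names are hypothetical. Incrementing the count before removing the old mapping is one common way to make re-inserting the same page at the same va safe; it is not necessarily how every solution does it.

```c
#include <stddef.h>
#include <stdint.h>

#define NPAGES  4
#define NVPAGES 8
#define PTE_P   0x001u

static uint16_t pp_ref[NPAGES];  /* reference count per physical page */
static uint32_t pte[NVPAGES];    /* flat stand-in for the tables: (ppn << 12) | flags */

/* Return the physical page number mapped at vpn, or -1 if none. */
static int sim_page_lookup(size_t vpn)
{
    return (pte[vpn] & PTE_P) ? (int)(pte[vpn] >> 12) : -1;
}

/* Unmap vpn, dropping one reference on the page that was there. */
static void sim_page_remove(size_t vpn)
{
    int ppn = sim_page_lookup(vpn);
    if (ppn < 0)
        return;                  /* nothing mapped: do nothing */
    pp_ref[ppn]--;               /* page_decref would free the page at 0 */
    pte[vpn] = 0;                /* clear the entry (the kernel also invalidates the TLB) */
}

/* Map physical page ppn at vpn with the given permission bits. */
static void sim_page_insert(size_t vpn, uint32_t ppn, uint32_t perm)
{
    pp_ref[ppn]++;               /* bump first so re-inserting the same page never hits 0 */
    sim_page_remove(vpn);
    pte[vpn] = (ppn << 12) | perm | PTE_P;
}
```

Inserting the same page twice at the same slot leaves its count at 1, and inserting a different page there drops the old page's count back to 0.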

tlb_invalidate

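#d Purpose

Invalidates the TLB entry for a va. The TLB is the processor's cache of translations taken from the Page Directory and Page Table, so it has to be invalidated whenever an entry in those tables changes.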