Lab1: Booting a PC

Part1: PC Bootstrap

Getting Started with x86 assembly

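The whole lab is written in x86 assembly, so you need to know its syntax and a few idioms. Note that the lab uses AT&T syntax rather than Intel syntax. One prominent difference is operand order:

mov src, dst    # This is AT&T style
mov dst, src    # This is Intel style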

Simulating the x86

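The lab runs on the QEMU emulator instead of real hardware, which is far more convenient. QEMU can also act as a remote debugging target for GDB, which is useful later when stepping through the kernel's boot instructions.

Running make builds the kernel image (obj/kern/kernel.img). It contains two parts: the boot loader (obj/boot/boot) and the kernel (obj/kern/kernel). QEMU then loads this image to boot the kernel.

The lab provides a Makefile with ready-made targets; typing make qemu boots the PC.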

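To exit: Ctrl+a x
The kernel now offers a monitor that accepts keyboard input. At this stage there are only two valid commands: help and kerninfo.

To debug the kernel, use the following commands: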

// terminal window 1
make qemu-gdb

// terminal window 2
make gdb

The PC’s Physical Address Space

+------------------+  <- 0xFFFFFFFF (4GB)
|      32-bit      |
|  memory mapped   |
|     devices      |
|                  |
/\/\/\/\/\/\/\/\/\/\

/\/\/\/\/\/\/\/\/\/\
|                  |
|      Unused      |
|                  |
+------------------+  <- depends on amount of RAM
|                  |
|                  |
| Extended Memory  |
|                  |
|                  |
+------------------+  <- 0x00100000 (1MB)
|     BIOS ROM     |
+------------------+  <- 0x000F0000 (960KB)
|  16-bit devices, |
|  expansion ROMs  |
+------------------+  <- 0x000C0000 (768KB)
|   VGA Display    |
+------------------+  <- 0x000A0000 (640KB)
|                  |
|    Low Memory    |
|                  |
+------------------+  <- 0x00000000
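To make the boundaries in the map above concrete, here is a minimal C sketch that classifies a physical address by region. The names `pc_region` and `enum region` are hypothetical and not part of JOS. Because the top of extended memory depends on the amount of RAM, the sketch lumps everything from 1MB upward into a single region.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only (not part of JOS). */
enum region {
    REGION_LOW_MEMORY,    /* 0x00000000 - 0x0009FFFF */
    REGION_VGA,           /* 0x000A0000 - 0x000BFFFF */
    REGION_EXPANSION_ROM, /* 0x000C0000 - 0x000EFFFF */
    REGION_BIOS_ROM,      /* 0x000F0000 - 0x000FFFFF */
    REGION_ABOVE_1MB      /* extended memory, unused hole, 32-bit devices */
};

/* Return the region of the PC physical address map that contains pa. */
enum region pc_region(uint32_t pa)
{
    if (pa < 0x000A0000) return REGION_LOW_MEMORY;
    if (pa < 0x000C0000) return REGION_VGA;
    if (pa < 0x000F0000) return REGION_EXPANSION_ROM;
    if (pa < 0x00100000) return REGION_BIOS_ROM;
    return REGION_ABOVE_1MB;
}
```

With this helper, 0x7c00 (where the boot sector is loaded) falls in low memory, and 0xffff0 (the first instruction executed) falls in the BIOS ROM.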

The ROM BIOS

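Debug the kernel with gdb.

The first instruction executed is at physical address 0x000ffff0. Comparing this with the memory map, it lies at the very top of the 64KB BIOS region, 16 bytes below 0x00100000.

The first instruction is an ljmp to cs: 0xf000, ip: 0xe05d, i.e. physical address 0xfe05d. As for why it jumps: the lab handout points out that only 16 bytes remain before the end of the BIOS, which is not enough to do anything useful, so execution jumps back to an earlier location in the BIOS (0xf0000~0xfffff).

The BIOS sets up an interrupt descriptor table and initializes various devices (such as the VGA display).

After initializing the PCI bus and all the important devices the BIOS knows about, it searches for a bootable device such as a floppy, hard disk, or CD-ROM. Eventually, when it finds a bootable disk, the BIOS reads the boot loader from it and transfers control to it.

If a disk is bootable, its first sector is the boot sector, which is where the boot loader code lives. The BIOS loads the boot sector into physical addresses 0x7c00~0x7dff, then jumps to 0000:7c00 to hand control to the boot loader.

As PCs evolved, booting from CD-ROM became possible in addition to floppies and hard disks. That process is more complex but also more powerful: it uses a 2048-byte sector size and can load a much larger boot image into memory.

JOS takes the traditional approach and boots from a 512-byte sector.

Exercise 2
Use the si command to trace the ROM BIOS instructions and try to guess what it is doing. There is no need to understand every detail; just get a rough idea of what the BIOS does at startup.

Broadly speaking, the BIOS does the following:
It initializes hardware devices, finds the first bootable disk, loads that disk's first sector into 0x7c00~0x7dff, and then transfers control to the boot loader (jmp 0000:7c00).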

Part2: Boot Loader

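The boot loader has two jobs:

Switch the CPU from 16-bit real mode to 32-bit protected mode (boot/boot.S)
Load the kernel from disk into memory and transfer control to it (boot/main.c)

To understand these two files, we first need to understand real mode and protected mode.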

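Real Mode and Protected Mode

In real mode the CPU runs in 16-bit mode only and memory is limited to 1MB. A single 16-bit register cannot express every physical address in that range (1MB needs 20 bits), so Intel uses two 16-bit values: the first is the selector, stored in a segment register, and the second is the offset. The physical address is then:

$\text{physical address} = 16 \times \text{selector} + \text{offset}$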

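But 16-bit real mode has a number of problems:

A single selector can address at most 64KB of memory. If a program's code is larger than 64KB it has to span several segments, and the same goes for data (DS).
The representation of a physical address is not unique. For example, 047c:0048 and 047d:0038 refer to the same physical address (0x4808), so telling two addresses apart requires taking both parts into account.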
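The translation formula and the aliasing problem are easy to check with a few lines of C. This is a minimal sketch; the function name `real_mode_pa` is made up for illustration.

```c
#include <stdint.h>

/* Real-mode address translation: shift the 16-bit selector left by
 * 4 bits (multiply by 16) and add the 16-bit offset. */
uint32_t real_mode_pa(uint16_t selector, uint16_t offset)
{
    return ((uint32_t)selector << 4) + offset;
}
```

Both 047c:0048 and 047d:0038 map to 0x4808 under this function, and the BIOS's first jump target f000:e05d maps to 0xfe05d.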

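The Intel 80286 introduced 16-bit protected mode. In real mode a selector is a paragraph number in physical memory; in protected mode it is an index into a descriptor table. Segments are no longer at fixed locations in physical memory. Each segment has its own entry in the descriptor table, which holds metadata: access rights, whether the segment is in memory, and where it is in memory (if present).

Protected mode is also where virtual memory comes in. Only the code and data the program is currently using are kept in memory; the rest stays on disk until needed, so segments move back and forth between memory and disk. All of this is transparent to the user, which makes programs easier to write.

Many of these mechanisms are close to modern virtual memory, but they are inefficient: segments vary in size, and every swap in or out moves a whole segment. Given the principle of locality, fixed-size blocks (pages) are a better choice, yet 16-bit protected mode is still purely segmented.

It also did not lift the 64KB limit on segment size.

The Intel 80386 introduced 32-bit protected mode, with two main changes:

Offsets are extended to 32 bits, so a segment can be as large as 4GB.
Segments can be divided into smaller pages (4KB pages; 4MB large pages arrived on later processors).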

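Virtual memory is mainly covered in lab2. It is mentioned here only because this lab involves the switch from real mode to protected mode.

JOS configures QEMU's CPU as an i386 (visible from make qemu), so a programming manual for that CPU is very helpful for understanding some details of this lab: Intel 80386 Programmer’s Reference Manual

Chapter 5 of that manual covers segment translation, and parts of it are useful here.

Segment Translation

Segment Descriptors

A segment descriptor is normally not supplied by the programmer (in this lab, however, you do have to supply it). Its fields are:

Base: defines the location of the segment. It is stored in three parts, which the CPU combines into a single 32-bit value.
Limit: defines the size of the segment. It is stored in two parts, which the CPU combines into a 20-bit value. The CPU interprets it in one of two granularities: 1B or 4KB.
Granularity bit: set means units of 4KB; clear means units of 1B.
Type: distinguishes between the kinds of descriptors.
Descriptor privilege level (DPL): used by the protection mechanism.
Segment-Present bit: 0 means the descriptor is not valid for address translation (for example, the segment has been swapped out of memory). As with pages, accessing a segment that is not present raises an exception. While the bit is 0, the OS is free to use the descriptor fields that the manual marks AVAILABLE.
Accessed bit: set by the CPU when the segment is accessed. It appears to be used mainly by operating systems that implement virtual memory on top of segments, to monitor how often a segment is used (by periodically testing and clearing the bit).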

// Application segment type bits
#define STA_X		0x8	    // Executable segment
#define STA_E		0x4	    // Expand down (non-executable segments)
#define STA_C		0x4	    // Conforming code segment (executable only)
#define STA_W		0x2	    // Writeable (non-executable segments)
#define STA_R		0x2	    // Readable (executable segments)
#define STA_A		0x1	    // Accessed

/*
 * Macros to build GDT entries in assembly.
 */
#define SEG_NULL						\
	.word 0, 0;						\
	.byte 0, 0, 0, 0
#define SEG(type,base,lim)					\
	.word (((lim) >> 12) & 0xffff), ((base) & 0xffff);	\
	.byte (((base) >> 16) & 0xff), (0x90 | (type)),		\
		(0xC0 | (((lim) >> 28) & 0xf)), (((base) >> 24) & 0xff)

#else	// not __ASSEMBLER__

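<inc/mmu.h> contains the macros above for building segment descriptors. SEG_NULL is for the first entry, which the CPU never uses.

The second macro, SEG, builds an application segment in the standard descriptor layout. In the access byte, 0x90 sets the present bit and the S bit (application segment) and leaves DPL at 0, so the privilege level needs no further thought. The accessed bit (A) starts out as 0, and the lowest bit of the type values passed in is also 0, so A needs no special handling either.

0xC0 is a more confusing point. The top bit selects 4KB granularity, and the next bit is labeled X, which the 386 manual does not explain. Looking through Volume 3 of the IA-32 manual, I found a segment descriptor format in which this bit is named D/B (default operation size).

So this bit determines whether the segment is 16-bit or 32-bit. In this lab it must be 32-bit, so it is set to 1.

The comment in xv6's <asm.h> confirms this reading:

// The 0xC0 means the limit is in 4096-byte units
// and (for executable segments) 32-bit mode.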

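(As for why the 386 manual does not describe it in that chapter, I am not sure; the bit is not a later addition, since 32-bit code segments on the 386 already rely on it.)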
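To see what the SEG macro actually emits, the same bit packing can be written in C. This is only a sketch: `seg_descriptor` is a hypothetical function, not something in JOS, and it returns the 8 descriptor bytes as one little-endian 64-bit value so the result is easy to compare.

```c
#include <stdint.h>

#define STA_X 0x8 /* executable segment */
#define STA_W 0x2 /* writeable (non-executable segments) */
#define STA_R 0x2 /* readable (executable segments) */

/* Hypothetical C counterpart of the assembler SEG macro. The shifts
 * place each field at the byte position the .word/.byte directives
 * would produce in memory. */
uint64_t seg_descriptor(uint8_t type, uint32_t base, uint32_t lim)
{
    uint64_t d = 0;
    d |= (uint64_t)((lim >> 12) & 0xffff);             /* limit 15:0, 4KB units */
    d |= (uint64_t)(base & 0xffff) << 16;              /* base 15:0 */
    d |= (uint64_t)((base >> 16) & 0xff) << 32;        /* base 23:16 */
    d |= (uint64_t)(0x90 | type) << 40;                /* P=1, DPL=0, S=1, type */
    d |= (uint64_t)(0xC0 | ((lim >> 28) & 0xf)) << 48; /* G=1, D=1, limit 19:16 */
    d |= (uint64_t)((base >> 24) & 0xff) << 56;        /* base 31:24 */
    return d;
}
```

For the flat code segment used in boot.S, `seg_descriptor(STA_X | STA_R, 0x0, 0xffffffff)` gives 0x00CF9A000000FFFF. The 0xCF byte carries the 0xC0 flags together with the top four limit bits, and 0x9A is 0x90 combined with the type.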

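Descriptor Tables

There are two kinds of descriptor tables:

Global Descriptor Table (GDT)
Local Descriptor Table (LDT)

They are used both below and in lab3, so they are worth explaining.

A descriptor table is an array whose elements are 8-byte descriptors (at most 8192 of them). The CPU does not use the first entry of the GDT.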
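The 8192-entry limit follows from how a selector indexes the table. The sketch below decodes a selector into its fields. The field layout (13-bit index, table indicator, requested privilege level) is taken from the Intel manuals rather than from the text above, and the names `selector_fields` and `decode_selector` are hypothetical.

```c
#include <stdint.h>

/* Assumed selector layout, per the Intel manuals. */
struct selector_fields {
    uint16_t index; /* bits 15:3, entry number in the table */
    uint8_t  ti;    /* bit 2, 0 = GDT, 1 = LDT */
    uint8_t  rpl;   /* bits 1:0, requested privilege level */
};

/* Split a 16-bit segment selector into its three fields. */
struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;
    f.index = sel >> 3;
    f.ti = (sel >> 2) & 0x1;
    f.rpl = sel & 0x3;
    return f;
}
```

The selectors used in boot.S, 0x8 and 0x10, decode to GDT entries 1 and 2. Since each descriptor is 8 bytes, the selector value with TI and RPL at zero is also the byte offset of the entry in the table.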

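The CPU locates the GDT and LDT through the GDTR and LDTR registers. Each of these registers stores two pieces of data:

base address: the table's location in the address space
segment limit: the table's size

LGDT and SGDT are the load/store instructions for the GDTR. Volume 2A of the Intel 64 and IA-32 Architectures Software Developer’s Manual documents LGDT.

(The 64-bit form of the operand is for 64-bit mode, which we do not use, so it is skipped here.)

In the operand, the 16-bit field is the limit and the 32-bit field is the base address.

The operand of LGDT is a memory address, and the 6 bytes at that address hold these two fields: the low 2 bytes are the limit and the high 4 bytes are the base address.

Its use in real mode (as in this lab) is something of an idiom: load the GDTR just before switching to protected mode.

Note that the limit is actually $8N-1$, where $N$ is the number of descriptors in the table. The bootstrap GDT in boot.S has three entries, so its limit is $8 \times 3 - 1 = 0x17$.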
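The 6-byte LGDT operand and the $8N-1$ rule can be sketched in C as follows. The names `pseudo_desc` and `make_gdtdesc` are hypothetical, and `__attribute__((packed))` is a GCC/Clang extension used here to keep the compiler from padding the struct to 8 bytes.

```c
#include <stdint.h>

#define DESC_SIZE 8u /* each descriptor is 8 bytes */

/* Hypothetical mirror of the LGDT memory operand:
 * a 16-bit limit followed by a 32-bit base address. */
struct pseudo_desc {
    uint16_t limit;
    uint32_t base;
} __attribute__((packed));

/* Build the operand for a table of n_entries descriptors at gdt_addr.
 * The limit is the offset of the last valid byte, hence the -1. */
struct pseudo_desc make_gdtdesc(uint32_t gdt_addr, uint16_t n_entries)
{
    struct pseudo_desc pd;
    pd.limit = (uint16_t)(DESC_SIZE * n_entries - 1);
    pd.base = gdt_addr;
    return pd;
}
```

For a three-entry table this yields a limit of 0x17, which matches the `.word 0x17` in gdtdesc.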

boot.S

#include <inc/mmu.h>

# Start the CPU: switch to 32-bit protected mode, jump into C.
# The BIOS loads this code from the first sector of the hard disk into
# memory at physical address 0x7c00 and starts executing in real mode
# with %cs=0 %ip=7c00.

.set PROT_MODE_CSEG, 0x8         # kernel code segment selector
.set PROT_MODE_DSEG, 0x10        # kernel data segment selector
.set CR0_PE_ON,      0x1         # protected mode enable flag

.globl start
start:
  .code16                     # Assemble for 16-bit mode
  cli                         # Disable interrupts
  cld                         # String operations increment

  # Set up the important data segment registers (DS, ES, SS).
  xorw    %ax,%ax             # Segment number zero
  movw    %ax,%ds             # -> Data Segment
  movw    %ax,%es             # -> Extra Segment
  movw    %ax,%ss             # -> Stack Segment

  # Enable A20:
  #   For backwards compatibility with the earliest PCs, physical
  #   address line 20 is tied low, so that addresses higher than
  #   1MB wrap around to zero by default.  This code undoes this.
seta20.1:
  inb     $0x64,%al               # Wait for not busy
  testb   $0x2,%al
  jnz     seta20.1

  movb    $0xd1,%al               # 0xd1 -> port 0x64
  outb    %al,$0x64

seta20.2:
  inb     $0x64,%al               # Wait for not busy
  testb   $0x2,%al
  jnz     seta20.2

  movb    $0xdf,%al               # 0xdf -> port 0x60
  outb    %al,$0x60

  # Switch from real to protected mode, using a bootstrap GDT
  # and segment translation that makes virtual addresses 
  # identical to their physical addresses, so that the 
  # effective memory map does not change during the switch.
  lgdt    gdtdesc
  movl    %cr0, %eax
  orl     $CR0_PE_ON, %eax
  movl    %eax, %cr0
  
  # Jump to next instruction, but in 32-bit code segment.
  # Switches processor into 32-bit mode.
  ljmp    $PROT_MODE_CSEG, $protcseg

  .code32                     # Assemble for 32-bit mode
protcseg:
  # Set up the protected-mode data segment registers
  movw    $PROT_MODE_DSEG, %ax    # Our data segment selector
  movw    %ax, %ds                # -> DS: Data Segment
  movw    %ax, %es                # -> ES: Extra Segment
  movw    %ax, %fs                # -> FS
  movw    %ax, %gs                # -> GS
  movw    %ax, %ss                # -> SS: Stack Segment
  
  # Set up the stack pointer and call into C.
  movl    $start, %esp
  call bootmain

  # If bootmain returns (it shouldn't), loop.
spin:
  jmp spin

# Bootstrap GDT
.p2align 2                                # force 4 byte alignment
gdt:
  SEG_NULL				# null seg
  SEG(STA_X|STA_R, 0x0, 0xffffffff)	# code seg
  SEG(STA_W, 0x0, 0xffffffff)	        # data seg

gdtdesc:
  .word   0x17                            # sizeof(gdt) - 1
  .long   gdt                             # address gdt

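The code comments are already quite detailed, so here is just the overall flow:

First, interrupts are disabled. They were enabled for the BIOS's own initialization work; the boot loader has no need for them, and they are turned back on once the kernel is ready.

The segment registers are not necessarily zero when the BIOS finishes, so they are all cleared.

Next comes enabling A20. This step exists for backward compatibility, and it is fine not to understand it fully since it is a historical leftover. By default, address line 20 (the 21st bit of the address) is forced low, so addresses above 1MB wrap around to zero just as they did on the earliest PCs. The boot loader undoes this so that memory above 1MB becomes reachable.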
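The effect of the A20 gate can be modeled in a few lines of C. This is a simplified sketch, not how the hardware is programmed: the function `bus_address` is hypothetical and only shows which address the memory bus would see.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of the address that reaches memory. With A20 disabled,
 * bit 20 is forced to 0, so addresses just past 1MB wrap around
 * into low memory. */
uint32_t bus_address(uint32_t pa, bool a20_enabled)
{
    if (a20_enabled)
        return pa;
    return pa & ~(UINT32_C(1) << 20);
}
```

With A20 disabled, 0x100000 (where the kernel will be loaded) aliases address 0, which is why the boot loader has to enable the line before loading the kernel.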

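The next step is to load the GDT. With it, virtual (or logical) addresses stay identical to physical addresses even after the switch to protected mode, so the effective memory map is not disturbed.

Then, setting the PE (Protection Enable) bit in CR0 (a control register) switches the CPU into protected mode, although it is not yet executing 32-bit code.

The ljmp reloads the code segment register with the selector of the code segment descriptor, which switches the CPU to 32-bit mode (because of the 0xC0 in the descriptor).

The data segment registers are then all set to the protected-mode data segment so that they are consistent.

One small question is where the stack should go. The kernel will be loaded at 0x100000 and the boot loader occupies 0x7c00-0x7e00. Since the stack grows downward, it can start at 0x7c00, i.e. at the start label (31KB should be enough for the boot loader).

Then bootmain is called to read the kernel.

At this point, boot.S has done its job.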

main.c

#include <inc/x86.h>
#include <inc/elf.h>

/**********************************************************************
 * This a dirt simple boot loader, whose sole job is to boot
 * an ELF kernel image from the first IDE hard disk.
 *
 * DISK LAYOUT
 *  * This program(boot.S and main.c) is the bootloader.  It should
 *    be stored in the first sector of the disk.
 *
 *  * The 2nd sector onward holds the kernel image.
 *
 *  * The kernel image must be in ELF format.
 *
 * BOOT UP STEPS
 *  * when the CPU boots it loads the BIOS into memory and executes it
 *
 *  * the BIOS intializes devices, sets of the interrupt routines, and
 *    reads the first sector of the boot device(e.g., hard-drive)
 *    into memory and jumps to it.
 *
 *  * Assuming this boot loader is stored in the first sector of the
 *    hard-drive, this code takes over...
 *
 *  * control starts in boot.S -- which sets up protected mode,
 *    and a stack so C code then run, then calls bootmain()
 *
 *  * bootmain() in this file takes over, reads in the kernel and jumps to it.
 **********************************************************************/

#define SECTSIZE	512
#define ELFHDR		((struct Elf *) 0x10000) // scratch space

void readsect(void*, uint32_t);
void readseg(uint32_t, uint32_t, uint32_t);

void
bootmain(void)
{
	struct Proghdr *ph, *eph;

	// read 1st page off disk
	readseg((uint32_t) ELFHDR, SECTSIZE*8, 0);

	// is this a valid ELF?
	if (ELFHDR->e_magic != ELF_MAGIC)
		goto bad;

	// load each program segment (ignores ph flags)
	ph = (struct Proghdr *) ((uint8_t *) ELFHDR + ELFHDR->e_phoff);
	eph = ph + ELFHDR->e_phnum;
	for (; ph < eph; ph++)
		// p_pa is the load address of this segment (as well
		// as the physical address)
		readseg(ph->p_pa, ph->p_memsz, ph->p_offset);

	// call the entry point from the ELF header
	// note: does not return!
	((void (*)(void)) (ELFHDR->e_entry))();

bad:
	outw(0x8A00, 0x8A00);
	outw(0x8A00, 0x8E00);
	while (1)
		/* do nothing */;
}

// Read 'count' bytes at 'offset' from kernel into physical address 'pa'.
// Might copy more than asked
void
readseg(uint32_t pa, uint32_t count, uint32_t offset)
{
	uint32_t end_pa;

	end_pa = pa + count;

	// round down to sector boundary
	pa &= ~(SECTSIZE - 1);

	// translate from bytes to sectors, and kernel starts at sector 1
	offset = (offset / SECTSIZE) + 1;

	// If this is too slow, we could read lots of sectors at a time.
	// We'd write more to memory than asked, but it doesn't matter --
	// we load in increasing order.
	while (pa < end_pa) {
		// Since we haven't enabled paging yet and we're using
		// an identity segment mapping (see boot.S), we can
		// use physical addresses directly.  This won't be the
		// case once JOS enables the MMU.
		readsect((uint8_t*) pa, offset);
		pa += SECTSIZE;
		offset++;
	}
}

void
waitdisk(void)
{
	// wait for disk reaady
	while ((inb(0x1F7) & 0xC0) != 0x40)
		/* do nothing */;
}

void
readsect(void *dst, uint32_t offset)
{
	// wait for disk to be ready
	waitdisk();

	outb(0x1F2, 1);		// count = 1
	outb(0x1F3, offset);
	outb(0x1F4, offset >> 8);
	outb(0x1F5, offset >> 16);
	outb(0x1F6, (offset >> 24) | 0xE0);
	outb(0x1F7, 0x20);	// cmd 0x20 - read sectors

	// wait for disk to be ready
	waitdisk();

	// read a sector
	insl(0x1F0, dst, SECTSIZE/4);
}

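Note the comment at the top of the file, which describes the disk layout: the boot loader sits in the first sector, and the kernel image occupies the second sector onward.

The boot loader therefore assumes that the kernel image starts at the second sector of the disk.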
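The disk layout explains the two small pieces of arithmetic at the top of readseg. They are pulled out below as standalone functions so they can be checked in isolation. The function names `kernel_sector` and `sector_align_down` are hypothetical; the expressions are the ones readseg uses.

```c
#include <stdint.h>

#define SECTSIZE 512

/* Disk sector that holds byte `offset` of the kernel image.
 * Sector 0 is the boot sector, so the image starts at sector 1. */
uint32_t kernel_sector(uint32_t offset)
{
    return offset / SECTSIZE + 1;
}

/* Round a physical address down to a sector boundary, as readseg
 * does before it starts copying whole sectors. */
uint32_t sector_align_down(uint32_t pa)
{
    return pa & ~(uint32_t)(SECTSIZE - 1);
}
```

Byte 0 of the kernel image lives in sector 1, and byte 4096 lives in sector 9. Rounding the destination down means readseg may write a little before the requested address, which is harmless because segments are loaded in increasing order.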

At this point, loading the kernel image is complete.

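The boot flow:

The CPU starts executing the BIOS, which lives in ROM mapped at the top of the first 1MB
The BIOS initializes hardware devices, reads the first sector (the boot sector) into a fixed location, and transfers control to the boot loader
The boot loader enables protected mode and sets up a stack (movl $start, %esp) so that C code can run, then calls bootmain()
bootmain() reads the kernel and transfers control to it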

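Debugging Notes

obj/boot/boot.asm: boot loader
obj/kern/kernel.asm: JOS kernel

Both are disassembly listings generated during the build, with load addresses already filled in, which makes them very useful for debugging.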

Question

The lab1 handout asks the following questions:

At what point does the processor start executing 32-bit code? What exactly causes the switch from 16- to 32-bit mode?

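Everything after .code32 runs in 32-bit mode, so the first 32-bit instruction is the one at protcseg. The switch itself is caused by the far jump, executed after the PE bit in CR0 has been set, which loads CS with a 32-bit code segment:

ljmp $PROT_MODE_CSEG, $protcseg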

What is the last instruction of the boot loader executed, and what is the first instruction of the kernel it just loaded?

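The last statement is the final line of bootmain in boot/main.c, which calls the kernel's entry point. The question asks for the last instruction, though, and obj/boot/boot.asm shows it to be call *0x10018. The address looks odd, but it comes from ELF: the header is loaded at 0x10000 (ELFHDR), and the e_entry field sits at offset 0x18 within it.
The first instruction the kernel executes is movw $0x1234, 0x472 (obj/kern/kernel.asm).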
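The address 0x10018 in that call instruction can be reproduced from the ELF header layout. The sketch below declares only the leading fields of the 32-bit ELF header, following the standard layout; the struct name `ElfPrefix`, the macro `ELFHDR_ADDR`, and the function `entry_field_addr` are hypothetical stand-ins for JOS's struct Elf and ELFHDR.

```c
#include <stddef.h>
#include <stdint.h>

/* Leading fields of a 32-bit ELF header (later fields omitted).
 * Assumed to match the start of struct Elf in inc/elf.h. */
struct ElfPrefix {
    uint32_t e_magic;   /* must equal ELF_MAGIC */
    uint8_t  e_elf[12]; /* rest of the identification bytes */
    uint16_t e_type;
    uint16_t e_machine;
    uint32_t e_version;
    uint32_t e_entry;   /* kernel entry point */
};

#define ELFHDR_ADDR 0x10000u /* where bootmain loads the header */

/* Address of the e_entry field once the header is in memory. */
uint32_t entry_field_addr(void)
{
    return ELFHDR_ADDR + (uint32_t)offsetof(struct ElfPrefix, e_entry);
}
```

The field offset is 4 + 12 + 2 + 2 + 4 = 24 = 0x18, so the indirect call reads its target from 0x10018.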