Multithreading

Giovanna

About 1782 wordsAbout 6 min

2024-08-08

开启新实验

git fetch
git checkout thread
make clean

Uthread: switching between threads

任务描述：实现用户态多线程机制。（可以参考内核态中多线程的实现）

thread_switch需要在user/uthread_switch.S中实现，而user/uthread.c基本的框架已经完成，只需要完善两个函数的内容。

1. thread_switch

参考内核中swtch.S：

thread_switch和内核中的 swtch() 完全一样，用于切换处理器的上下文。

和内核中相同，因为执行这个函数的过程是一个正常的函数调用，所以不需要保存和交换调用者保存的寄存器。

		.text

        /*
         * save the old thread's registers,
         * restore the new thread's registers.
         */

        .globl thread_switch
thread_switch:
        /* YOUR CODE HERE */
        sd ra, 0(a0)
        sd sp, 8(a0)
        sd s0, 16(a0)
        sd s1, 24(a0)
        sd s2, 32(a0)
        sd s3, 40(a0)
        sd s4, 48(a0)
        sd s5, 56(a0)
        sd s6, 64(a0)
        sd s7, 72(a0)
        sd s8, 80(a0)
        sd s9, 88(a0)
        sd s10, 96(a0)
        sd s11, 104(a0)

        ld ra, 0(a1)
        ld sp, 8(a1)
        ld s0, 16(a1)
        ld s1, 24(a1)
        ld s2, 32(a1)
        ld s3, 40(a1)
        ld s4, 48(a1)
        ld s5, 56(a1)
        ld s6, 64(a1)
        ld s7, 72(a1)
        ld s8, 80(a1)
        ld s9, 88(a1)
        ld s10, 96(a1)
        ld s11, 104(a1)
        
        ret    /* return to ra */

2. context

utrhead.c 原本的文件中并没有给 struct thread 加上一个上下文的属性。上下文保存的寄存器和内核态多线程proc.h中定义的完全相同。

struct Context{
  uint64 ra;
  uint64 sp;

  // callee-saved
  uint64 s0;
  uint64 s1;
  uint64 s2;
  uint64 s3;
  uint64 s4;
  uint64 s5;
  uint64 s6;
  uint64 s7;
  uint64 s8;
  uint64 s9;
  uint64 s10;
  uint64 s11;
};

struct thread {
  char       stack[STACK_SIZE]; /* the thread's stack */
  int        state;             /* FREE, RUNNING, RUNNABLE */
  struct Context ctx;
};

3. thread_schedule

参考内核中的实现，这个函数和内核proc.c中的 scheduler() 的作用相同。

在 thread_schedule() 中，我们会需要调用 thread_switch() 来切换处理器的上下文。

观察原来thread_schedule函数的代码可以看到，最开始的循环找到了第一个为 RUNNABLE 的线程，然后把这个线程赋值到 next_thread。所以很明显，我们应该交换 current_thread 和 next_thread() 的上下文。

void thread_schedule(void)
{
  struct thread *t, *next_thread;

  /* Find another runnable thread. */
  next_thread = 0;
  t = current_thread + 1;
  for(int i = 0; i < MAX_THREAD; i++){
    if(t >= all_thread + MAX_THREAD)
      t = all_thread;
    if(t->state == RUNNABLE) {
      next_thread = t;
      break;
    }
    t = t + 1;
  }

  if (next_thread == 0) {
    printf("thread_schedule: no runnable threads\n");
    exit(-1);
  }

  if (current_thread != next_thread) {         /* switch threads?  */
    next_thread->state = RUNNING;
    t = current_thread;
    current_thread = next_thread;
    /* YOUR CODE HERE
     * Invoke thread_switch to switch from t to next_thread:
     * thread_switch(??, ??);
     */
     thread_switch((uint64)&t->ctx, (uint64)&current_thread->ctx);
  } else
    next_thread = 0;
}

4. thread_create

实现这个函数主要需要思考如何设置 ra 和 sp 寄存器。因为用户进程一开始的时候是没有使用寄存器的，所以如何设置上下文中的其他寄存器是无所谓的。

首先，在 thread_create() 之后，如果我们调用了 thread_schedule() ，应该执行的是线程函数的第一个语句。所以可以这么设置 ra：

t->ctx.ra = (uint64) func;

对于 sp，需要注意的是栈是从高地址到低地址增长的，那么 sp 应该被设置在栈的最高地址：

t->ctx.sp = (uint64) &t->stack + (STACK_SIZE - 1);

那么这个 thread_create() 就写完了：

void 
thread_create(void (*func)())
{
  struct thread *t;

  for (t = all_thread; t < all_thread + MAX_THREAD; t++) {
    if (t->state == FREE) break;
  }
  t->state = RUNNABLE;
  // YOUR CODE HERE
  t->ctx.ra = (uint64) func;
  t->ctx.sp = (uint64) &t->stack + (STACK_SIZE - 1);
}

Using threads

任务描述：阅读一个散列表（哈希表）的程序，然后做一些更改，使得这个程序在多线程的环境下也可用。

尝试运行下提供的程序，只使用一个线程时一切正常。如果改成两个及以上就会出现某些在散列表中插入的键值对消失了。

为了解决这个问题，我们可以先看一遍这个散列表，找一找问题出现的地方。这个程序中，最关键的有三个函数 insert()，put() 和 get()。

insert的作用是创建一个结点e插入到p之后n之前。

put用于将一个键值对插入哈希表中。如果键已经存在，更新其对应的值；如果键不存在，则插入一个新的键值对。

get根据键 key 查找并返回与之关联的键值对（即 struct entry 指针）。

在单线程的情况下这几个函数是没有问题的，但是多线程可能出现同时操作链表导致错误产生。

所以我们可以对于散列表中的每个链表都创建一个互斥锁，然后在 put() 和 get() 的开头和结尾加锁和解锁。

不在 insert() 里加锁是因为 insert() 都是 put() 调用的，已经在put()加锁了。

根据提示完善代码。

对于散列表中的每个链表都创建一个互斥锁：

pthread_mutex_t bkt_lock[NBUCKET];

在put()里添加锁：

static
void put(int key, int value)
{
  int i = key % NBUCKET;

  pthread_mutex_lock(&bkt_lock[i]);
  // is the key already present?
  struct entry *e = 0;
  for (e = table[i]; e != 0; e = e->next) {
    if (e->key == key)
      break;
  }
  if(e){
    // update the existing key.
    e->value = value;
  } else {
    // the new is new.
    insert(key, value, &table[i], table[i]);
  }

  pthread_mutex_unlock(&bkt_lock[i]);

}

在get()里添加锁：

static struct entry*
get(int key)
{
  int i = key % NBUCKET;

  pthread_mutex_lock(&bkt_lock[i]);
  struct entry *e = 0;
  for (e = table[i]; e != 0; e = e->next) {
    if (e->key == key) break;
  }

  pthread_mutex_unlock(&bkt_lock[i]);
  return e;
}

Barrier

任务描述：实现同步屏障。

根据维基百科：

同步屏障(Barrier)是并行计算中的一种同步方法。对于一群进程或线程，程序中的一个同步屏障意味着任何线程/进程执行到此后必须等待，直到所有线程/进程都到达此点才可继续执行下文。

那么一个朴素的实现方法就是在一个线程到达屏障时把某个变量 +1，最后如果这个变量等于线程总数量，就可以执行了。

需要使用到的相关方法：

pthread_cond_wait()的作用是把线程放到等待列表中，然后解锁。

pthread_cond_broadcast()的作用是唤醒等待列表中的所有线程。

barrier()中要进行的操作是：

获得操作权后将等待的线程数+1
如果等待的线程数小于规定的线程数，把线程放入等待列表并释放操作权
如果等待的线程数达到规定的线程数，等待线程数清零，轮次+1，唤醒所有等待列表中的线程
释放操作权

然后就可以写出如下代码：

static void barrier()
{
  // YOUR CODE HERE
  //
  // Block until all threads have called barrier() and
  // then increment bstate.round.
  //
  pthread_mutex_lock(&bstate.barrier_mutex);
  bstate.nthread++;
  if(bstate.nthread < nthread){
    pthread_cond_wait(&bstate.barrier_cond, &bstate.barrier_mutex);
  } else {
    bstate.nthread = 0;
    bstate.round++;
    pthread_cond_broadcast(&bstate.barrier_cond);
    pthread_mutex_unlock(&bstate.barrier_mutex);
  }
}

The End

第一个实验是比较有难度的，既考察了对于源码的阅读，又考察了对于riscv底层的了解。（我自己做的话不知道要整多久

第二和第三个实验就简单多啦，和之前课堂上讲过PV操作差不多，理解上毫无障碍，但是对于题目所给的函数进行了什么操作有一点疑问。