15-445 Project 2 - B+Tree

Task #1 - B+Tree Pages

实现三种Page

B+Tree Page 基类，记录了类型、尺寸
B+Tree Internal Page 内部节点，m个key，m+1个孩子指针(第一个key是空的来实现)
B+Tree Leaf Page 叶子节点，m个key，m个记录IDRID
考虑每种Page需要实现的操作
Insert 插入key-value
Lowerbound 获得小于等于key的index
Lookup 寻找Key，内部节点和叶子节点的两个Lookup有着不同的语义
- 叶子节点的Lookup使用Lowerbound找到第一个>=key的index，是为了找到对应的key
- 内部节点使用Upperbound-1获得**最后一个<=key的**的index，因为对应的value子树中的所有值一定是>=key[Upperbound-1] && < key[Upperbound]的。比如[0,3,5,7]，Key=4在3的子树中，Key=3也在3的子树中
- 考虑0和n的边界情况
Split 分割一半Key到另一个Page，内部节点的分割需要注意下
实现两个小函数
Lowerbound 获得第一个>=key的index
Upperbound 获得第一个>key的index

Task #2a - B+Tree Insertion and Search for Single Values

实现插入和查找，查找比较简单，插入有几点需要注意

内部节点什么时候分裂：要避免内部节点Size=1的情况（内部节点第一个key是空的，value是子指针），在我的实现中对内部节点分裂后再插入，如果新的内部节点Size<2，把原来内部节点的最后一个Key移过去
分裂内部节点时：新的内部节点的第一个Key是要给父亲节点的

关于Lower bound和Upper bound

在做这个的过程中，想统一下Lower和Upper bound的实现，统一使用$[l,r)$左闭右开

INDEX_TEMPLATE_ARGUMENTS
auto B_PLUS_TREE_INTERNAL_PAGE_TYPE::Lowerbound(const KeyType &key, const KeyComparator &comparator) const -> int {
  auto n = GetSize();
  BUSTUB_ASSERT(n != 0, "lookup in empty node");
  int l = 1;  // ignore the first key
  int r = n;  // [l, r)
  while (l < r) {
    int m = (l + r) >> 1;
    auto res = comparator(key, array_[m].first);
    if (res > 0) {
      l = m + 1;
    } else {
      r = m;
    }
  }
  return l;
}

INDEX_TEMPLATE_ARGUMENTS
auto B_PLUS_TREE_INTERNAL_PAGE_TYPE::Upperbound(const KeyType &key, const KeyComparator &comparator) const -> int {
  auto n = GetSize();
  BUSTUB_ASSERT(n != 0, "lookup in empty node");
  int l = 0;  // ignore the first key
  int r = n;  // [l, r)
  while (l < r) {
    int m = (l + r) >> 1;
    auto res = comparator(key, array_[m].first);
    if (res >= 0) {
      l = m + 1;
    } else {
      r = m;
    }
  }
  return l;
}

两者唯一的区别就是当array_[m] == key时的处理方式

Upperbound中，找的是>key的index，所以l=m+1
Lowerbound中，找的是>=key的index，所以r=m

Task #2b - B+Tree Deletions

CMU 15445-2022 P2 B+Tree Insert/Delete - 知乎
删除这块有点麻烦，看一下书上是怎么写的
目标是删除Srinivasan 是上边链接中的(internal page borrow)
![[Pasted image 20230822224149.png]]

找到叶子节点，将Srinivasan删除后，叶子节点中只剩下一个key，少于最小大小
删除后，可以将Wu的叶子节点与左侧的合并，然后在第二层节点中删除掉指向已删除叶子结点的key（这里是Srinivasan）
在第二层节点中删除后，这个内部节点小于最小大小。首先看能不能与兄弟节点合并，但是兄弟节点已满。所以要与兄弟节点**重新分配(redistribute)**：将最右侧的Gold移过来
这时，第二层的右侧内部节点有两个pointer：一个是原来的指向Mozart的，另一个是Gold自带的，发现没有一个Key可以分割这两个pointer（分割这两个pointer的Key在父亲节点中(Mozart)，因为Mozart左侧小于Mozart，右侧大于Mozart)。这时分割第二层的两个内部结点的Key应该变成了Gold。
删除完后变成了下图：Gold跑到了根节点，Mozart到了第二层中
![[Pasted image 20230822225530.png]]
然后删除Singh和Wu，也是相同的方式，叶子节点从左侧借一个，然后更新父亲节点中的Key(leafpage borrow：不会改变internal page的大小，只需要叶子节点borrow然后更新internal page的key)
![[Pasted image 20230822230459.png]]
在此之上删除Gold(internal page merge)：
叶子节点与右侧的Kim合并，合并后在内部节点中删掉对应的Key
内部节点小于最小大小，与左侧合并，合并的Key从内部结点的父亲来，这里是Gold
内部节点合并后在第一层里删掉原来的Gold，正好是根节点，根节点变过去
![[Pasted image 20230822231609.png]]
在B+树中，可能会出现叶子节点中删除的Key仍然在内部节点中出现，如Gold
![[IMG_B5D15EFE7985-1.jpeg]]
如果需要借，
叶子节点，借过来后更新父亲节点中的Key（最多借一个，因为删掉的就一个），return
内部节点，将兄弟节点偷走的Key放到父亲位置，父亲的Key和偷走的Value放过来(对应图中的2.1)，return
如果需要合并，往左侧合并，在父亲节点中删掉指向原来右侧的Key
叶子结点，往左侧挪过去，在父亲节点中删掉，设置NextPageId，更新内部节点
内部结点，将父亲的Key和被合并的KV放到合并节点中(因为被合并的中间，最左侧只有一个Value，没有Key，相当于补了一个Key)，在父亲节点中删掉(对应图中的2.2)，更新内部节点

在实现中，内部结点的借 (Borrow)可以有一个简单的实现方式：

对于内部节点，因为有n个key和n+1个指针，所以第一个key是空的，但是我们在分裂的时候是把分裂点的key提到了父亲的key中(绿色)，我们可以把这个key保存在分裂节点的第一个key中（如下图）
![[20230824_145857316_iOS.png]]
当需要Borrow的时候，因为第一个隐藏的key就是父节点的key
- 向左边借(offset=-1)，直接把最后一个插入到需要的节点的第一个位置，相当于把借来的value(橙色)和父节点的key(绿色)组成了新的kv
- 向右边借(offset=1)，把右边节点的第一个节点插入到需要的节点的最后，相当于删掉了第二个key(实际有意义的第一个key)
- 最后根据右边节点的第一个key更新父节点(offset=-1的橙色箭头)
  ![[20230824_145607089_iOS.png]]

Contention

一开始实现B+树操作的时候，都是用的Context从根到叶子拿上读锁，这样效果是很差的，也过不了contention test ，应该使用Crabing的方法：18-notes-indexconcurrency.pdf

Debug Deadlock

在改代码的过程中出现了./test/b_plus_tree_contention_test这个测试的deadlock，正好学习一下怎么debug deadlock
![[Pasted image 20230901201123.png]]

(lldb) thread select 4
(lldb) bt
* thread #4, name = 'b_plus_tree_con', stop reason = signal SIGSTOP
  * frame #0: 0x00007f9a90f6cf37 libpthread.so.0`__GI___pthread_rwlock_wrlock at futex-internal.h:284:13
    frame #1: 0x00007f9a90f6cf21 libpthread.so.0`__GI___pthread_rwlock_wrlock at pthread_rwlock_common.c:830:14
    frame #2: 0x00007f9a90f6cdc0 libpthread.so.0`__GI___pthread_rwlock_wrlock(rwlock=0x000055e176bb2fa0) at pthread_rwlock_wrlock.c:27:16
    frame #3: 0x000055e174c2aa23 b_plus_tree_contention_test`std::__glibcxx_rwlock_wrlock(__rwlock=0x000055e176bb2fa0) at shared_mutex:73:3
    frame #4: 0x000055e174c2bea5 b_plus_tree_contention_test`std::__shared_mutex_pthread::lock(this=0x000055e176bb2fa0) at shared_mutex:186:19
    frame #5: 0x000055e174c2be85 b_plus_tree_contention_test`std::shared_mutex::lock(this=0x000055e176bb2fa0) at shared_mutex:412:27
(lldb) frame select 3
frame #3: 0x000055e174c2aa23 b_plus_tree_contention_test`std::__glibcxx_rwlock_wrlock(__rwlock=0x000055e176bb2fa0) at shared_mutex:73:3
   70  	  }
   71  	  _GLIBCXX_GTHRW(rwlock_rdlock)
   72  	  _GLIBCXX_GTHRW(rwlock_tryrdlock)
-> 73  	  _GLIBCXX_GTHRW(rwlock_wrlock)
   74  	  _GLIBCXX_GTHRW(rwlock_trywrlock)
   75  	  _GLIBCXX_GTHRW(rwlock_unlock)
   76  	# ifndef PTHREAD_RWLOCK_INITIALIZER
(lldb) frame variable
(pthread_rwlock_t *) __rwlock = 0x000055e176bb2fa0
(lldb) p *__rwlock
(pthread_rwlock_t) $1 = {
  __data = {
    __readers = 250
    __writers = 0
    __wrphase_futex = 2
    __writers_futex = 1
    __pad3 = 0
    __pad4 = 0
    __cur_writer = 0
    __shared = 0
    __rwelision = '\0'
    __pad1 = "\0\0\0\0\0\0"
    __pad2 = 0
    __flags = 0
  }
  __size = "\xfa\0\0\0\0\0\0\0\U00000002\0\0\0\U00000001\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"
  __align = 250
}

可以看到有250个reader，这显然是不对的，应该是某些reader没有及时释放锁
原因是我的insert实现的是

拿从根节点到叶子节点的读锁
从叶子节点反向将一个个读锁换成写锁
但是这样是有问题的:
一个线程可能拿着儿子节点的WLock，获取父节点的WLock
另一个线程在拿着父节点的RLock，试图获取儿子节点的RLock
解决方式是Insert和Remove都需要从叶子节点开始拿WLock，首先将上一层锁住，所有的加锁顺序都是从上一层到下一层，不会出现循环等待的情况（无论是RLock还是WLock）
因为Insert和Remove都是从下往上处理，下层节点处理完后就可以把锁释放掉了（因为还拿着上层的锁）

Improved Lock Crabbing (Optimistic Lock Coupling): The problem with the basic lock crabbing algorithm is that transactions always acquire an exclusive lock on the root for every insert/delete operation. This greatly limits parallelism. Instead, we can assume that having to resize (i.e., split/merge nodes) is rare, and thus transactions can acquire shared locks down to the leaf nodes. Each transaction will assume that the path to the target leaf node is safe, and use S-locks and crabbing to reach it, and verify. If any node in the path is not safe, then do previous algorithm (i.e., acquire X-locks)

一种优化的方式是Insert/Remove从跟节点到叶子节点拿RLock，叶子节点拿WLock，如果需要Split/Merge/Borrow，再重新从跟节点拿下来WLock(Optimistic)

Iterator

Iterator是从左往右获得叶子节点的锁，然而当Merge或者Borrow时，可能会相反，造成deadlock

We do not test your iterator for thread-safe leaf scans. A correct implementation, however, would require the Leaf Page to throw a std::exception when it cannot acquire a latch on its sibling to avoid potential dead-locks.

但是事实上我们无法修改RLatch的实现(修改的page.h并不在gradescope提交文件中)

Deal with concurrent

任何的拿锁释放锁都是有道理的，不要为了实现功能随意的拿锁释放锁
涉及到多个锁，要考虑不同情况下的拿锁顺序，考虑死锁

遇到的问题

一开始的实现中，在LeafPage Remove时，将Lowerbound(key)到GetSize()依次向前移动，最后忘了清零array_[GetSize()]。而Lowerbound当给定的key大于所有key的时候会返回GetSize()，这样造成了问题：解决方法是拿到LowerBound结果后验证是否小于GetSize()——Commit
有两个MixTest1和InsertTest2死活过不去，发现原来是自己的MoveEndToHeadOf函数写错了，这个函数是Split后进行调整的，比如说Size=5的节点，Split再Insert可能会造成一个Size=4另一个Size=2，需要调整下