数据库并发控制 (2) -- InnoDB 并发控制

# 系列目录

# 参考

# 并发控制

并发控制 (Concurrency Control, CC)

数据库管理系统（DBMS）中的并发控制的任务是确保在多个事务同时访问数据库中同一数据时不破坏ACID。

并发控制里，最重要的大前提就是保证统一性。不管底层怎么折腾、不管使用什么高端技术，前提就是保证 ACID。只有在保证 ACID 的前提下，我们才能讨论利用什么方法提升数据库性能。

并发控制方法的分类众说纷纭，我个人倾向于分为 Lock-Based Concurrency Control (LBCC) 和 Multiversion Concurreny Control (MVCC) 两类。基于锁的并发控制比较好理解，读/写的时候加上锁，来防止其他事务访问。很清晰简洁的方案，但是并发度很低。MVCC 就利用保存多个版本，替代了数据库锁。

# MySQL 多版本并发控制

多版本并发控制（Multiversion CC, MVCC）

多版本并发控制通过保存一条记录的多个副本，替代了数据库锁，使得读写和写读可以并发。

InnoDB 是在 undolog 中实现的，通过 undolog 可以找回数据的历史版本。找回的数据历史版本可以提供给用户读(按照隔离级别的定义，有些读请求只能看到比较老的数据版本)，也可以在回滚的时候覆盖数据页上的数据。在InnoDB内部中，会记录一个全局的活跃读写事务数组，其主要用来判断事务的可见性。

对于用户删除的数据，InnoDB 并不是立刻删除，而是标记一下，后台线程批量的真正删除。类似的还有InnoDB的二级索引的更新操作，不是直接对索引进行更新，而是标记一下，然后产生一条新的。这个线程就是后台的Purge线程。此外，过期的undolog也需要回收，这里说的过期，指的是undo不需要被用来构建之前的版本，也不需要用来回滚事务。

# InnoDB 为每一行添加的字段

参考：MySQL 8.0 Reference Manual :: 15.3 InnoDB Multi-Versioning (opens new window)

InnoDB 为每一行多记录了三个字段：

字段名	长度	意义	备注
`DB_TRX_ID`	6 字节	表示最后插入/更新该行的事务 id	删除操作视为更新，同时会将其中的一位置为 `1` 表示 deleted
`DB_ROLL_PTR`	7 字节	指向该次插入/更新对应的 undo logs
`DB_ROW_ID`	6 字节	自增。如果有聚集索引，字段会出现在索引中	和 MVCC 无关

# undo logs

在 InnoDB 中，undo logs 分为 insert undo logs 和 update undo logs。

insert undo logs 对其他事务没啥用：其他事务只需要根据查到的行的 DB_TRX_ID，就知道该不该看到这条记录，不需要读 insert undo log 的具体内容。所以 insert undo logs 只有在回滚的时候才会起作用，在该事务提交以后就可以删除 insert undo logs 了。

而 update undo logs 就麻烦很多了：其他事务发现查到的行的 DB_TRX_ID 是自己不该看到的事务，就要根据 DB_ROLL_PTR 找到此更新对应的 undo logs，找到更新前的信息（包括更新前的值和 DB_TRX_ID DB_ROLL_PTR）。如果这行的 DB_TRX_ID 自己仍然不应该看到，就还要通过这行 DB_ROLL_PTR 继续往前找。

所以，这就是多版本的含义：InnoDB 通过 undo logs，存储了多个事务的 undo logs。如果需要，可以通过 undo logs 还原整个事务应该看到的整张表。

# ReadView

定义：每个事务持有一个 ReadView，表示本 ReadView（或者说本事务）能看到哪些事务写的记录
创建时机：可重复读 (RR) 隔离级别下，在事务的第一次读时创建；读提交（RC）隔离级别下，每次读时创建。这是 RR 和 RC 在实现上的唯一不同
ReadView 创建后不会再变动。

问：一个事务能看到哪些事务写的记录 ——答：自己和在创建 ReadView 时已经提交了的事务
最简单的实现：将所有已提交的事务 id 记录到一个数组 m_ids 中（问题是已提交的事务太多）
优化：大于某个值的肯定都是没提交的，小于某个值的肯定都是提交了的，数组只需要记录中间范围内已提交的事务
- 上界 m_low_limit_id：设为此时还没分配的 id，大于等于该值的一定未提交
- 下界 m_up_limit_id：设为未提交事务的最小 id，小于该值的一定已提交
小优化：从记录已经提交的事务，改为记录未提交的事务（未提交的事务数会少很多）
- 未提交的事务又叫活跃事务

# 源码

源码截取了三个部分：数据结构、ReadView 的初始化，和判断是否可以看到某事务。

class ReadView {
    // trx id >= 该值的事务一定未提交
    // this is the "high water mark"
    trx_id_t m_low_limit_id;

    // trx id < 该值的事务一定已提交
    // this is the low water mark"
    trx_id_t m_up_limit_id;

    // 该 ReadView 所属事务的 id
    trx_id_t m_creator_trx_id;

    // 活跃事务 id 的数组
    ids_t m_ids;

    /** The view does not need to see the undo logs for transactions
    whose transaction number is strictly smaller (<) than this value:
    they can be removed in purge if not needed by other views */
    trx_id_t m_low_limit_no;


    // 初始化 ReadView 的过程
    void prepare(trx_id_t id) {
        ut_ad(trx_sys_mutex_own());

        m_creator_trx_id = id;

        m_low_limit_no = trx_get_serialisation_min_trx_no();

        m_low_limit_id = trx_sys_get_next_trx_id_or_no();

        ut_a(m_low_limit_no <= m_low_limit_id);

        if (!trx_sys->rw_trx_ids.empty()) {
            copy_trx_ids(trx_sys->rw_trx_ids);
        } else {
            m_ids.clear();
        }

        /* The first active transaction has the smallest id. */
        m_up_limit_id = !m_ids.empty() ? m_ids.front() : m_low_limit_id;

        ut_a(m_up_limit_id <= m_low_limit_id);

        ut_d(m_view_low_limit_no = m_low_limit_no);
        m_closed = false;
    }


    // 根据 m_up_limit_id m_low_limit_id m_ids m_creator_trx_id 判断是否可以看到某事物
    bool changes_visible(trx_id_t id, const table_name_t &name) const
        MY_ATTRIBUTE((warn_unused_result)) {
        ut_ad(id > 0);

        if (id < m_up_limit_id || id == m_creator_trx_id) {
            return (true);
        }

        check_trx_id_sanity(id, name);

        if (id >= m_low_limit_id) {
            return (false);

        } else if (m_ids.empty()) {
            return (true);
        }

        const ids_t::value_type *p = m_ids.data();

        return (!std::binary_search(p, p + m_ids.size(), id));
    }
}

# 例子

隔离级别：可重复读 (RR)
当前活跃事务：(4, 8)
开启事务：10
- m_up_limit_id = 11, m_low_limit_id = 4, m_ids = [4, 8, 10]
读取数据的事务 id 为 1
- 1 < m_low_limit_id = 4，很早之前就提交了，可见
读取数据的事务 id 为 12
- 12 > m_up_limit_id = 11，创建时还没提交，不可见
- 从这条数据的 undo log 找到前一个版本，找到修改者是 8，在 m_ids 中，是活跃事务（未提交），不可见
- 从前一个版本的 undo log 找到前两个版本，找到修改者是 6，不在 m_ids 中，不是活跃事务（未提交），可见

# 回滚段

回滚段 (rollback segment) 装有很多 undo log。

// 一个事务中的回滚段
struct trx_rsegs_t {
    // 这里将 undo log 分为两类：实体表的 undo log 和临时表的 undo log

    // 实体表的 undo log，需要写 redo
    trx_undo_ptr_t m_redo;

    // 临时表的 undo log，不需要写 redo
    trx_undo_ptr_t m_noredo;
};

/** Represents an instance of rollback segment along with its state variables.*/
struct trx_undo_ptr_t {
    trx_rseg_t *rseg;        // undo log 所在回滚段
    trx_undo_t *insert_undo; // insert undo log
    trx_undo_t *update_undo; // update undo log
};

// undo segment 在内存中的结构，每个 undo segement 一个
struct trx_rseg_t {
    /** List of update undo logs */
    Undo_list update_undo_list;

    /** List of update undo log segments cached for fast reuse */
    Undo_list update_undo_cached;

    /** List of insert undo logs */
    Undo_list insert_undo_list;

    /** List of insert undo log segments cached for fast reuse */
    Undo_list insert_undo_cached;
}

// undo log
struct trx_undo_t {

    // 链表结点，存储了 previous 和 next
    UT_LIST_NODE_T(trx_undo_t) undo_list;
}

id 在事务刚创建时分配，用于区分不同的事务。
no 是在事务提交前，通过同一个全局 id 生产器产生的，主要目的是为了确定事务提交的顺序，保证加入到history list中的update undo有序，方便purge线程清理。

# trx_t

文档 (opens new window)

每个连接持有一个，这个连接的所有事务一直复用里面的数据结构。

事务启动后，会把这个结构体加入到全局事务链表trx_sys->mysql_trx_list
如果是读写事务，还会加入到全局读写事务链表 trx_sys->rw_trx_list
在事务提交的时候，还会加入到全局提交事务链表trx_sys->trx_serial_list

// 事务
struct trx_t {
    ulint isolation_level;
    trx_id_t 	id,
    trx_id_t 	no,
    ReadView * 	read_view,
    trx_rsegs_t rsegs;    // 一个事务中的回滚段
};

# 事务系统 trx_sys 源码

// /storage/innobase/include/trx0sys.h

/** The transaction system */
extern trx_sys_t *trx_sys;

// /storage/innobase/include/trx0sys.h

struct trx_sys_t {
    // Rsegs 为 vector<trx_rseg_t *> 的封装
    // 记录了实体表的所有回滚段
    Rsegs rsegs;
    // 记录了临时表的所有回滚段
    Rsegs tmp_rsegs;

    // 系统当前还未分配的最小事务id
    std::atomic<trx_id_t> next_trx_id_or_no;
    // 活跃的最小 id，用于 purge
    std::atomic<trx_id_t> min_active_trx_id;

    // 存放当前系统的所有读写事务，包括活跃的和已经提交的事务。按照事务id排序，此外，奔溃恢复后产生的事务和系统的事务也放在上面
    UT_LIST_BASE_NODE_T(trx_t, trx_list) rw_trx_list;
    // 存放所有用户创建的事务，系统的事务和奔溃恢复后的事务不会在这个链表上，但是这个链表上可能会有还没开始的用户事务。
    UT_LIST_BASE_NODE_T(trx_t, mysql_trx_list) mysql_trx_list;
}

这里有一个小插曲：读到这部分代码的时候，注意到出现了很多个没有用的 pad。这些 pad 没有初始化，也没有引用，注释里写的是 “被互斥锁 mutex 保护的成员”，不明白是什么意思。

struct trx_sys_t {
    /* Members protected by neither trx_sys_t::mutex nor serialisation_mutex. */
    char pad0[ut::INNODB_CACHE_LINE_SIZE];

    //...

    char pad1[ut::INNODB_CACHE_LINE_SIZE];

    //...

    char pad2[ut::INNODB_CACHE_LINE_SIZE];

    // ...

    char pad7[ut::INNODB_CACHE_LINE_SIZE];

    //...

    char pad_after[ut::INNODB_CACHE_LINE_SIZE];
};

一番搜索以后，找到一篇文章剖析Disruptor:为什么会这么快？（二）神奇的缓存行填充 (opens new window)。大概意思是这种填充是在数据中间填充 CPU cache line 大小的 pad，故意把 pad 上下的数据分隔开，避免这些数据被一次性读进一个 cache line。两个数据被读进一个 cache line 的后果是，一个核想更新一个数据，另一个核想更新另一个数据，就会导致这两个核出现 false sharing，导致多次缓存失效。

false sharing

# Purge 系统 purge_sys

Purge 系统是和事务系统独立的另一个系统。

// storage/innobase/trx/trx0purge.cc

/** The global data structure coordinating a purge */
trx_purge_t *purge_sys = nullptr;

服务启动的时候会创建 purge_queue，并给到 perge_sys 初始化函数以初始化 perge_sys。

dberr_t srv_start(bool create_new_db) {
    purge_queue = trx_sys_init_at_db_start();

    /* The purge system needs to create the purge view and
    therefore requires that the trx_sys is inited. */

    trx_purge_sys_initialize(srv_threads.m_purge_workers_n, purge_queue);

}