跳转至

Chap 10: Indexing

2495 个字 3 行代码 22 张图片 预计阅读时间 25 分钟

Basic Concepts

Indexing mechanisms used to speed up access to desired data.
索引用来加速查找。

Search Key - attribute to set of attributes used to look up records in a file.
An index file consists of records (called index entries) of the form.
索引文件通常是有顺序的

Index Evaluation Metrics

  • Access types supported efficiently
    • Point query: records with a specified value in the attribute.
      点查询
    • Range query: records with an attribute value falling in a specified range of values.
      范围查询
  • Access time
  • Insertion time
  • Deletion time
  • Space overhead

Ordered Indices

  • Primary index(主索引): in a sequentially ordered file, the index whose search key specifies the sequential order of the file.
    • Also called clustering index(聚集索引)
    • The search key of a primary index is usually but not necessarily the primary key.
  • Secondary index(辅助索引): an index whose search key specifies an order different from the sequential order of the file. Also called non-clustering index.
Example

主索引和数据内的顺序是一样的。点查和范围查都是比较高效的。

如果 key 不是一个主键,那可能会对应多个记录。

Primary index 是很宝贵的,只能有一个,其他都是辅助索引。

  • Dense index( 稠密索引 ) — Index record appears for every search-key value in the file.
    所有 search-key 都要出现在索引文件里。
  • Sparse Index(稀疏索引): contains index records for only some search-key values.
Example

Good tradeoff: sparse index with an index entry for every block in file, corresponding to least search-key value in the block.

Multilevel Index

If primary index does not fit in memory, access becomes expensive.
可以对索引文件本身再建立一次索引。

B+ Tree Index

  • All paths from root to leaf are of the same length
  • Inner node(not a root or a leaf): between \(\lceil n/2\rceil\) and \(n\) children.
  • Leaf node: between \(\lceil (n–1)/2\rceil\) and \(n–1\) values
  • Special cases:
    • If the root is not a leaf:
      at least 2 children.
    • If the root is a leaf :
      between 0 and (n–1)values.

一般一个节点就是一个块的大小 , 4K.
B+ 树的叉是非常大的。

Observations about B+ Trees

Since the inter-node connections are done by pointers, “logically” close blocks need not be “physically” close. 如果有很多文件一次性建立 B+ 树,我们可以从叶子节点开始建立。

如果有 K 个索引项,则树高度不会超过 \(\lceil \log_{\lceil n/2 \rceil}(K/2)\rceil + 1\).
高度最小为 \(\log_n(K)\)

Examples of Insert on B+ Tree

注意内点的 split 和叶子的不一样。要把中间的节点 move 上去。

Examples of Delete on B+ Tree

中间点如果不够,从另外一边借一个过来。但是不能直接借,需要把它顶上去。

注意考虑和兄弟合并。

B+- tree : height and size estimation

全满的时候节点最少
注意这里的向上/向下取整问题
浅层节点个数少,我们可以把第一层和第二层的节点都放到内存中 pin 住。

B+ Tree File Organization

文件组织
B+ Tree File Organization:

  • Leaf nodes in a B+ Tree file organization store records, instead of pointers
    叶子节点不再放索引项,放记录本身。
  • Helps keep data records clustered even when there are insertions/deletions/updates

我们可以改变半满的要求以提高空间利用率。

Other Issues in Indexing

  • Record relocation and secondary indices
    If a record moves, all secondary indices that store record pointers have to be updated
    Node splits in B+ Tree file organizations become very expensive
    Solution: use primary-index search key instead of record pointer in secondary index
  • Variable length strings as keys
    Variable fanout
  • Prefix compression
    Key values at internal nodes can be prefixes of full key
    Keep enough characters to distinguish entries in the subtrees separated by the key value

Multiple-Key Access

Composite search keys are search keys containing more than one attribute
e.g. (dept_name, salary)

Lexicographic ordering: \((a_1, a_2) < (b_1, b_2)\) if either \(a_1 < b_1\), or \(a_1=b_1\) and \(a_2 < b_2\).

单个 key, 不同 key 之间组合都可以建立 B+ 树。这样会有很多组合,可以在频繁出现的查询属性上建立 B+ 树。

Non-unique Search Keys

我们可以在 B+ 树叶子节点不直接指向磁盘里的数据,而是指向一个块。
也可以在索引上加上去一个索引,使它对应的记录唯一。
可以通过范围查找

Bulk Loading and Bottom-Up Build

Inserting entries one-at-a-time into a B+ Tree requires \(\geq 1\) IO per entry

如果我们一次性插入很多索引项

  • Efficient alternative 1: Insert in sorted order
    局部性较好,减少 I/O.
  • Efficient alternative 2: Bottom-up B+ Tree construction
    • First sort index entries
    • Then create B+ Tree layer-by-layer, starting with leaf level
    • The built B+ Tree is written to disk using sequential I/O operations

Example

如果要排序的内容较大,无法放下内存,可以使用外部排序。
fanout 可以计算出来。
可以用 level-order 写到磁盘里,便于顺序访问所有索引,此时块是连续的。(便于顺序访问所有数据项)
这里的代价就是建好后,一次 seek 后全部写出去 (9 blocks)

Bulk insert index entries

Example

把刚刚那棵 B+ 树叶子节点(即遍历所有数据)需要 1seek+6blocks. 随后和上面的数据合并后,写回磁盘时需要 1seek+13blocks.

Merge two existing two B+ Trees , to create a new B+ Tree using the Bottom-UP Build algorithm, as in LSM-tree Index
假设有两棵这样生成的 B+ 树,将他们合并在一起。首先把叶子节点拿出来排序。

Indexing in Main Memory

cache cache line 传输 , 只有 64B.

  • Random access in memory
    • Much cheaper than on disk/flash, but still expensive compared to cache read
    • Binary search for a key value within a large B+ Tree node results in many cache misses
      二分查找可能带来很多 cache miss.
    • Data structures that make best use of cache preferable – cache conscious

B+- trees with small nodes that fit in cache line are preferable to reduce cache misses

并且由于顺着树往下查时只需要用到 search key,所以 search key pointer node 中可以分开放。

Key idea:

  • use large node size to optimize disk access,
  • but structure data within a node using a tree with small node size, instead of using an array, to optimize cache access.

Indexing on Flash

Flash 里不是即时修改,而是先擦掉再写。同时擦的次数是有限制的。因此最好的方法是从底构建,然后顺序写入。

  • Random I/O cost much lower on flash
    20 to 100 microseconds for read/write
  • Writes are not in-place, and (eventually) require a more expensive erase Optimum page size therefore much smaller
  • Bulk-loading still useful since it minimizes page erases
  • Write-optimized tree structures (i.e., LSM-tree, buffer tree) have been adapted to minimize page writes for flash-optimized search trees

下面介绍一些写优化索引结构

Log Structured Merge (LSM) Tree

  • Records inserted first into in-memory tree (\(L_0\) tree)
  • When in-memory tree is full, records moved to disk (\(L_1\) tree)
    B+ Tree constructed using bottom-up build by merging existing \(L_1\) tree with records from \(L_0\) tree
    内存里的 B+ 树如果满了,就马上写到磁盘里去(可以连续写)
  • When \(L_1\) tree exceeds some threshold, merge into \(L_2\) tree
    And so on for more levels
    Size threshold for \(L_{i+1}\) tree is \(k\) times size threshold for \(L_i\) tree

这样我们把随机写变为了顺序写。但此时查找一个索引,就要遍历所有 B+ 树。

  • Benefits of LSM approach
    • Inserts are done using only sequential I/O operations
    • Leaves are full, avoiding space wastage
    • Reduced number of I/O operations per record inserted as compared to normal B+ Tree (up to some size)
  • Drawback of LSM approach
    • Queries have to search multiple trees
    • Entire content of each level copied multiple times

Stepped Merge Index
原先的结构中,Merge 操作太多,我们可以一次性合并。 磁盘上每层有 k 棵树,当 k 个索引处于同一层时,合并它们并得到一棵层数 \(+1\) 的树,写回。
\(L_1\) 有大小限制,达到后会生成另一个 B+ 树,依次为 k 倍。

  • Reduces write cost compared to LSM tree
  • But queries are even more expensive since many trees need to be queries

Optimization for point lookups - Compute Bloom filter for each tree and store in-memory - Query a tree only if Bloom filter returns a positive result

Bloom filter

布隆过滤器 (Bloom filter) 是一种空间效率极高的 概率型数据结构,用于快速判断某个元素是否 可能存在于集合中,或 肯定不存在于集合中。在数据库中常用于加速查询,减少不必要的磁盘访问。

  • 核心思想

    • 位数组 + 哈希函数:
      • 初始化一个长度为 m 的二进制位数组(全 0
      • 使用 k 个哈希函数将元素映射到位数组的 k 个位置。
    • 插入元素:将元素哈希后的 k 个位置置 1
    • 查询元素:检查元素哈希后的 k 个位置是否全为 1
      • 若全为 1,元素 可能存在(可能误判
      • 若有任一位置为 0,元素 一定不存在。
  • 在数据库中的应用

    • LSM 树查询优化:
      • 每个 SSTable 关联一个布隆过滤器。
      • 查询时先检查布隆过滤器,若返回“不存在”,直接跳过该文件,避免无效磁盘 I/O

LSM 的删除和更新如下

  • Deletion handled by adding special “delete” entries
    • Lookups will find both original entry and the delete entry, and must return only those entries that do not have matching delete entry
    • When trees are merged, if we find a delete entry matching an original entry, both are dropped.
  • Update handled using insert+delete
  • LSM trees were introduced for disk-based indices
    • But useful to minimize erases with flash-based indices
    • The stepped-merge variant of LSM trees is used in many BigData storage systems
    • Google BigTable, Apache Cassandra, MongoDB
    • And more recently in SQLite4, LevelDB, and MyRocks storage engine of MySQL

Buffer tree

sjl 上课没讲过,但是说上次期末考了这个东西,让我们自学

  • Key idea: each internal node of B+ tree has a buffer to store inserts
    • Inserts are moved to lower levels when buffer is full
    • With a large buffer, many records are moved to lower level each time
    • Per record I/O decreases correspondingly
  • Benefits
    • Less overhead on queries
    • Can be used with any tree index structure
    • Used in PostgreSQL Generalized Search Tree (GiST) indices
  • Drawback: more random I/O than LSM tree

简要来说就是每插入一条记录,不会遍历节点到叶子节点中插入,而是直接插入在根节点的缓冲区中,直到缓冲区满再逐级向下搬

总体而言,读操作相比 LSM 树更快,写操作相比 LSM 树更慢。

Bitmap indices

这里上课也没讲过,但是小测和 ppt 中都有涉及

  • Bitmap indices are a special type of index designed for efficient querying on multiple keys
  • Applicable on attributes that take on a relatively small number of distinct values
    • E.g., gender, country, state, …
    • E.g., income-level (income broken up into a small number of levels such as 0-9999, 10000-19999, 20000-50000, 50000- infinity)
  • A bitmap is simply an array of bits

位图索引 (bitmap indices) 是一类适用于对多个键做简单查询的索引。在使用位图索引前,需要为关系中的每条记录标号(从 0 开始。如果记录的大小固定,且被分配在某个文件内的连续块上,这一操作还是很容易的,此时记录编号就可以被转换为块编号。

位图 (bitmap) 就是一组位,对于关系 \(r\) 的属性 \(A\),位图索引包含了 \(A\) 可取的每个值,而位的数量对应记录的数量。对于某个值 \(v_j\)​ 的位图,如果编号为 \(i\) 的记录的属性值为 \(v_j\)​,那么该位图的第 \(i\) 位置 1,否则置 0

下面就是一个位图索引的例子:

对于以下查询:

SELECT *
FROM instructor_info
WHERE gender = 'f' AND income_level = 'L2';

我们找到 gender 属性值为 f,以及 income_levelincome_level 属性值为 L2 的位图,然后对这两个位图进行交 (intersection) 运算(实际上是一个逻辑与的运算。根据上图,gender = f 对应的位图为 (01101)income_level = L2 对应的位图为 (01000),交运算后的位图是 01000,也就是说,编号为 1 的记录就是我们要查询的记录。