Small. Fast. Reliable.
Choose any three.
SQLite Database File Format
Table Of Contents
Javascript is required for some features of this document, including table of contents, figure numbering and internal references (section numbers and hyper-links.

Document Overview

Scope and Purpose

This document is designed to serve two purposes:

Exactly how the database file is created and safely updated on the persistent media is outside the scope of this document. As such no mention of journal or statement files is made. Database transactions are referred to only with respect to those file manipulation operations (i.e. change-counter management and database reorganization in auto-vacuum mode) that occur once per transaction. Here we are concerned solely with the arrangement of bytes in the database file, not the interactions between the SQLite library and the VFS (Virtual File System) interface.

Similarly, the various interfaces and SQL language features that may be used to manipulate the contents of a database are not dealt with here. This document describes the effect of various operations on the database, such as creating a table or inserting a new record. The myriad of ways that these operations or sets of these operations may be achieved using SQLite are dealt with elsewhere.

Add references to the documents that do describe these things. One other document will concentrate on the pager module and the way it uses the VFS interface to safely create and update database files. The other will be the document that describes the supported SQL language and capabilities.

Document and Requirements Organization

Section sqlite_database_files contains simple requirements describing the relationship between SQLite and the definition of a well-formed SQLite database file.

Section database_file_format describes the various fields and sub-structures that make up the SQLite database file format.

Glossary

Auto-vacuum last root-page A page number stored as 32-bit integer at byte offset 52 of the database file header (see section file_header). In an auto-vacuum database, this is the numerically largest root-page number in the database. Additionally, all pages that occur before this page in the database are either B-Tree root pages, pointer-map pages or the locking page.
Auto-vacuum database Each database is either an auto-vacuum database or a non auto-vacuum database. Auto-vacuum databases feature pointer-map pages (section pointer_map_pages) and have a non-zero value stored as a 4-byte big-endian integer at offset 52 of the file header (section file_header).
B-Tree A B-Tree is a tree structure optimized for offline storage. The table and index data in an SQLite database file is stored in B-Tree structures.
B-Tree cell Each database page that is part of a B-Tree structure contains zero or more B-Tree cells. A B-Tree cell contains a single B-Tree key value (either an integer or database record) and optionally an associated database record value.
B-Tree page A database page that is part of a B-Tree tree structure (not an overflow page).
(B-Tree) page header The 8 (leaf pages) or 12 (internal node pages) byte header that occurs at the start of each B-Tree page.
Cell content area The area within a B-Tree page in which the B-Tree cells are stored.
(Database) text encoding The text encoding used for all text values in the database file. One of UTF-8, big-endian UTF-16 and little-endian UTF-16. The database text encoding is defined by a 4 byte field stored at byte offset 56 of the database file header (see section file_header).
(Database) file header The first 100 bytes of an SQLite database file constitute the database file header.
(Database) page size An SQLite database file is divided into one or more pages of page-size bytes each.
Database record A database record is a blob of data containing the serialized representation of an ordered list of one or more SQL values.
Database record header The first part of each database record contains the database record header. It encodes the types and lengths of values stored in the record (see section record_format).
Database record data area Following the database record header in each database record is the database record data area. It contains the actual data (string content, numeric value etc.) of all values in the record (see section record_format).
Default pager cache size A 32-bit integer field stored at byte offset 48 of the database file header (see section file_header).
(Database) usable page size The number of bytes of each database page that is usable. This is the page-size less the number of bytes left unused at the end of each page. The number of bytes left unused is governed by the value stored at offset 20 of the file header (see section file_header).
File format read version Single byte field stored at byte offset 20 of the database file header (see section file_header).
File format write version Single byte field stored at byte offset 19 of the database file header (see section file_header).
File change counter A 32-bit integer field stored at byte offset 24 of the database file header (see section file_header). Normally, SQLite increments this value each time it commits a transaction.
Fragment A block of 3 or less bytes of unused space within the cell content area of a B-Tree page.
Free block A block of 4 or more bytes of unused space within the cell content area of a B-Tree page.
Free block list The linked list of all free blocks on a single B-Tree page (see section index_btree_page_format).
Free page A page that is not currently being used to store any database data or meta data. Part of the free-page list.
Free page list A data structure within an SQLite database file that links all the free-pages together.
Index B-Tree One of two variants on the B-Tree data structure used within SQLite database files. An index B-Tree (section index_btrees) uses database records as keys.
Incremental Vacuum flag A 32-bit integer field stored at byte offset 64 of the database file header (see section file_header). In auto-vacuum databases, if this field is non-zero then the database is not automatically compacted at the end of each transaction.
Locking page The database page that begins at the 1GB (230 byte) boundary. This page is always left unused.
Logical database An SQLite database file is a serialized representation of a logical database. A logical database consists of the SQL database schema, the content of the various tables in the database, and assorted database properties that may be set by the user (auto-vacuum, page-size, user-cookie value etc.),
Non-auto-vacuum database Any database that is not an auto-vacuum database. A non-auto-vacuum database contains no pointer-map pages and has a zero value stored in the 4-byte big-endian integer field at offset 52 of the database file header (section file_header).
Overflow chain A linked list of overflow pages across which a single (large) database record is stored (see section overflow_page_chains).
Overflow page If a B-Tree cell is too large to store within a B-Tree page, a portion of it is stored using a chain of one or more overflow pages (see section overflow_page_chains).
Pointer-map page A database page used to store meta data only present in auto-vacuum databases (see section pointer_map_pages).
Right child page Each internal B-Tree node page has one or more child pages. The rightmost of these (the one containing the largest key values) is known as the right child page.
Root page A root page is a database page used to store the root node of a B-Tree data structure.
Schema layer file format An integer between 1 and 4 stored as a 4 byte big-endian integer at offset 44 of the file header (section file_header). Certain file format constructions may only be present in databases with a certain minimum schema layer file format value.
Schema table The table B-Tree with root-page 1 used to store database records describing the database schema. Accessible as the "sqlite_master" table from within SQLite.
Schema version A 32-bit integer field stored at byte offset 40 of the database file header (see section file_header). Normally, SQLite increments this value each time it modifies the databas schema.
Table B-Tree One of two variants on the B-Tree data structure used within SQLite database files. A table B-Tree (section table_btrees) uses 64 bit integers as key values and stores an associated database record along with each key value.
User cookie A 32-bit integer field stored at byte offset 60 of the database file header (see section file_header). Normally, SQLite increments this value each time it modifies the databas schema.
Variable Length Integer A format used for storing 64-bit signed integer values in SQLite database files. Consumes between 1 and 9 bytes of space, depending on the precise value being stored.
Well formed database file An SQLite database file that meets all the criteria laid out in section database_file_format of this document.

SQLite Database Files

The bulk of this document, section database_file_format, contains the definition of a well-formed SQLite database file. SQLite is required to create database files that meet this definition.

The system shall ensure that at the successful conclusion of a database transaction the contents of the database file constitute a well-formed SQLite database file.

Additionally, the database file should contain a serialized version of the logical database produced by the transaction. For all but the most trivial logical databases, there are many possible serial representations.

The system shall ensure that at the successful conclusion of a database transaction the contents of the database file are a valid serialization of the contents of the logical SQL database produced by the transaction.

Database File Format Details

This section describes the various fields and sub-structures that make up the format used by SQLite to serialize a logical SQL database.

This section does not contain requirements governing the behaviour of any software system. Instead, along with the plain language description of the file format are a series of succinct, testable statements describing the properties of "well-formed SQLite database files". Some of these statements describe the contents of the database file in terms of the contents of the logical SQL database that it is a serialization of. e.g. "For each SQL table in the database, the database file shall...". The contents and properties of a logical database include:

This concept of a logical database should be defined properly in some requirements document so that it can be referenced from here and other places. The definition will be something like the list of bullet points above.

A well-formed SQLite database file is defined as a file for which all of the statements itemized as requirements within this section are true. mention the requirements numbering scheme here. A software system that wishes to interoperate with other systems using the SQLite database file format should only ever output well-formed SQLite databases. In the case of SQLite itself, the system should ensure that the database file is well-formed at the conclusion of each database transaction.

File Format Overview

A B-Tree is a data structure designed for offline storage of a set of unique key values. It is structured so as to support fast querying for a single key or range of keys. As implemented in SQLite, each entry may be associated with a blob of data that is not part of the key. For the canonical introduction to the B-Tree and its variants, refer to reference ref_comer_btree. The B-Tree implementation in SQLite also adopts some of the enhancements suggested in ref_knuth_btree

An SQLite database file contains one or more B-Tree structures. Each B-Tree structure stores the data for a single database table or index. Hence each database file contains a single B-Tree to store the contents of the sqlite_master table, and one B-Tree for each database table or index created by the user. If the database uses auto-increment integer primary keys, then the database file also contains a B-Tree to store the contents of the automatically created sqlite_sequence table.

SQLite uses two distinct variants of the B-Tree structure. One variant, hereafter refered to as a "table B-Tree" uses signed 64-bit integer values for keys. Each entry has an associated variable length blob of data used to store a database record (see section record_format). Each SQLite database file contains one table B-Tree for the schema table and one table B-Tree for each additional database table created by the user.

A database record is a blob of data containing an ordered list of SQL values (integers, real values, NULL values, blobs or strings). For each row in each table at the SQL level, there is an entry in the corresponding table B-Tree structure in the file. The entry key is same as the SQL "rowid" or "integer primary key" field of the table row. The associated database record is made up of the row's column values, in declaration (CREATE TABLE) order.

The other B-Tree variant used by SQLite, hereafter an "index B-Tree" uses database records (section record_format) as keys. For this kind of B-Tree, there is no additional data associated with each entry. SQLite databases contain an index B-Tree for each database index created by the user. Database indexes may be created by CREATE INDEX statements, or by UNIQUE or PRIMARY KEY (but not INTEGER PRIMARY KEY) clauses added to CREATE TABLE statements.

Index B-Tree structures contain one entry for each row in the associated table at the SQL level. The database record used as the key consists of the row's value for each of the indexed columns in declaration (CREATE INDEX) order, followed by the row's "rowid" or "integer primary key" column value.

For example, the following SQL script:

      CREATE TABLE t1(a INTEGER PRIMARY KEY, b, c, d);
      CREATE INDEX i1 ON t1(d, c);

      INSERT INTO t1 VALUES(1, 'triangle', 3, 180, 'green');
      INSERT INTO t1 VALUES(2, 'square',   4, 360, 'gold');
      INSERT INTO t1 VALUES(3, 'pentagon', 5, 540, 'grey');
      ...

Creates a database file containing three B-Tree structures: one table B-Tree to store the sqlite_master table, one table B-Tree to store table "t1", and one index B-Tree to store index "i1". The B-Tree structures created for the user table and index are populated as shown in figure figure_examplepop.

Figure - Example B-Tree Data.

Global Structure

The following sections and sub-sections describe precisely the format used to house the B-Tree structures within an SQLite database file.

File Header

Each SQLite database file begins with a 100-byte header. The header file consists of a well known 16-byte sequence followed by a series of 1, 2 and 4 byte unsigned integers. All integers in the file header (as well as the rest of the database file) are stored in big-endian format.

The well known 16-byte sequence that begins every SQLite database file is:

          0x53 0x51 0x4c 0x69 0x74 0x65 0x20 0x66 0x6f 0x72 0x6d 0x61 0x74 0x20 0x33 0x00

Interpreted as UTF-8 encoded text, this byte sequence corresponds to the string "SQLite format 3" followed by a nul-terminator byte.

The 1, 2 and 4 byte unsigned integers that make up the rest of the database file header are described in the following table.

Byte Range Byte Size Description
16..17 2 Database page size in bytes. See section pages_and_page_types for details.
18 1

File-format "write version". Currently, this field is always set to 1. If a value greater than 1 is read by SQLite, then the library will only open the file for read-only access.

This field and the next one are intended to be used for forwards compatibility, should the need ever arise. If in the future a version of SQLite is created that uses a file format that may be safely read but not written by older versions of SQLite, then this field will be set to a value greater than 1 to prevent older SQLite versions from writing to a file that uses the new format.

19 1

File-format "read version". Currently, this field is always set to 1. If a value greater than 1 is read by SQLite, then the library will refuse to open the database

Like the "write version" described above, this field exists to facilitate some degree of forwards compatibility, in case it is ever required. If a version of SQLite created in the future uses a file format that may not be safely read by older SQLite versions, then this field will be set to a value greater than 1.

20 1 Number of bytes of unused space at the end of each database page. Usually this field is set to 0. If it is non-zero, then it contains the number of bytes that are left unused at the end of every database page (see section pages_and_page_types for a description of a database page).
21 1 Maximum fraction of an index tree page to use for embedded content. This value is used to determine the maximum size of a B-Tree cell to store as embedded content on a page that is part of an index B-Tree. Refer to section index_btree_cell_format for details.
22 1 Minimum fraction of an index B-Tree page to use for embedded content when an entry uses one or more overflow pages. This value is used to determine the portion of a B-Tree cell that requires one or more overflow pages to store as embedded content on a page that is part of an index B-Tree. Refer to section index_btree_cell_format for details.
23 1 Minimum fraction of an table B-Tree leaf page to use for embedded content when an entry uses one or more overflow pages. This value is used to determine the portion of a B-Tree cell that requires one or more overflow pages to store as embedded content on a page that is a leaf of a table B-Tree. Refer to section table_btree_cell_format for details.
24..27 4

The file change counter. Each time a database transaction is committed, the value of the 32-bit unsigned integer stored in this field is incremented.

SQLite uses this field to test the validity of its internal cache. After unlocking the database file, SQLite may retain a portion of the file cached in memory. However, since the file is unlocked, another process may use SQLite to modify the contents of the file, invalidating the internal cache of the first process. When the file is relocked, the first process can check if the value of the file change counter has been modified since the file was unlocked. If it has not, then the internal cache may be assumed to be valid and may be reused.

32..35 4 Page number of first freelist trunk page. For more details, refer to section free_page_list.
36..39 4 Number of free pages in the database file. For more details, refer to section free_page_list.
40..43 4 The schema version. Each time the database schema is modified (by creating or deleting a database table, index, trigger or view) the value of the 32-bit unsigned integer stored in this field is incremented.
44..47 4

Schema layer file-format. This value is similar to the "read-version" and "write-version" fields at offsets 18 and 19 of the database file header. If SQLite encounters a database with a schema layer file-format value greater than the file-format that it understands (currently 4), then SQLite will refuse to access the database.

Usually, this value is set to 1. However, if any of the following file-format features are used, then the schema layer file-format must be set to the corresponding value or greater:

  1. Implicit NULL values at the end of table records (see section table_btree_content).
  2. Implicit default (non-NULL) values at the end of table records (see section table_btree_content).
  3. Descending indexes (see section index_btree_compare_func) and Boolean values in database records (see section record_format, serial types 8 and 9).
48..51 4 Default pager cache size. This field is used by SQLite to store the recommended pager cache size to use for the database.
52..55 4 For auto-vacuum capable databases, the numerically largest root-page number in the database. Since page 1 is always the root-page of the schema table (section schema_table), this value is always non-zero for auto-vacuum databases. For non-auto-vacuum databases, this value is always zero.
56..59 4 (constant) Database text encoding. A value of 1 means all text values are stored using UTF-8 encoding. 2 indicates little-endian UTF-16 text. A value of 3 means that the database contains big-endian UTF-16 text.
60..63 4 The user-cookie value. A 32-bit integer value available to the user for read/write access.
64..67 4 The incremental-vacuum flag. In non-auto-vacuum databases this value is always zero. In auto-vacuum databases, this field is set to 1 if "incremental vacuum" mode is enabled. If incremental vacuum mode is not enabled, then the database file is reorganized so that it contains no free pages (section free_page_list) at the end of each database transaction. If incremental vacuum mode is enabled, then the reorganization is not performed until explicitly requested by the user.

The four byte block beginning at offset 28 is unused. As is the 32 byte block beginning at offset 68.

Some of the following requirements state that certain database header fields must contain defined constant values, even though the sqlite database file format is designed to allow various values. This is done to artificially constrain the definition of a well-formed database in order to make implementation and testing more practical.

The first 16 bytes of a well-formed database file contain the UTF-8 encoding of the string "SQLite format 3" followed by a single nul-terminator byte.

Following the 16 byte magic string in the file header is the page size, a 2-byte field. See section pages_and_page_types for details.

The 19th byte (byte offset 18), the file-format write version, of a well-formed database file contains the value 0x01.

The 20th byte (byte offset 19), the file-format read version, of a well-formed database file contains the value 0x01.

The 21st byte (byte offset 20), the number of unused bytes on each page, of a well-formed database file shall contain the value 0x00.

The 22nd byte (byte offset 21), the maximum fraction of an index B-Tree page to use for embedded content, of a well-formed database file shall contain the value 0x40.

The 23rd byte (byte offset 22), the minimum fraction of an index B-Tree page to use for embedded content when using overflow pages, of a well-formed database file contains the value 0x20.

The 24th byte (byte offset 23), the minimum fraction of a table B-Tree page to use for embedded content when using overflow pages, of a well-formed database file contains the value 0x20.

The 4 byte block starting at byte offset 24 of a well-formed database file contains the file change counter formatted as a 4-byte big-endian integer.

Following the file change counter in the database header are two 4-byte fields related to the database file free page list. See section free_page_list for details.

The 4 byte block starting at byte offset 40 of a well-formed database file contains the schema version formatted as a 4-byte big-endian integer.

The 4 byte block starting at byte offset 44 of a well-formed database file, the schema layer file format, contains a big-endian integer value between 1 and 4, inclusive.

The 4 byte block starting at byte offset 48 of a well-formed database file contains the default pager cache size formatted as a 4-byte big-endian integer.

The 4 byte block starting at byte offset 52 of a well-formed database file contains the auto-vacuum last root-page formatted as a 4-byte big-endian integer. If this value is non-zero, the database is said to be an auto-vacuum database.

The 4 byte block starting at byte offset 56 of a well-formed database file, the text encoding contains a big-endian integer value between 1 and 3, inclusive.

The 4 byte block starting at byte offset 60 of a well-formed database file contains the user cookie formatted as a 4-byte big-endian integer.

The 4 byte block starting at byte offset 64 of a well-formed database file, the incremental vaccum flag contains a big-endian integer value between 0 and 1, inclusive.

In a well-formed non-autovacuum database (one with a zero stored in the 4-byte big-endian integer value beginning at byte offset 52 of the database file header, the incremental vacuum flag is set to 0.

Pages and Page Types

The entire database file is divided into pages, each page consisting of page-size bytes, where page-size is the 2-byte integer value stored at offset 16 of the file header (see above). The page-size is always a power of two between 512 (29) and 32768 (215). SQLite database files always consist of an exact number of pages.

Pages are numbered beginning from 1, not 0. Page 1 consists of the first page-size bytes of the database file. The file header described in the previous section consumes the first 100 bytes of page 1.

Each page of the database file is one of the following:

The database page size of a well-formed database, stored as a 2-byte big-endian unsigned integer at byte offset 16 of the file, shall be an integer power of 2 between 512 and 32768, inclusive.

The size of a well formed database file shall be an integer multiple of the database page size.

Each page of a well formed database file is exactly one of a B-Tree page, an overflow page, a free page, a pointer-map page or the locking page.

The database page that starts at byte offset 230, the locking page, is never used for any purpose.

The Schema Table

Apart from being the page that contains the file-header, page 1 of the database file is special because it is the root page of the B-Tree structure that contains the schema table data. From the SQL level, the schema table is accessible via the name "sqlite_master".

The exact format of the B-Tree structure and the meaning of the term "root page" is discussed in section btree_structures. For now, it is sufficient to know that the B-Tree structure is a data structure that stores a set of records. Each record is an ordered set of SQL values (the format of which is described in section record_format). Given the root page number of the B-Tree structure (which is well known for the schema table), it is possible to iterate through the set of records.

The schema table contains a record for each SQL table (including virtual tables) except for sqlite_master, and for each index, trigger and view in the logical database. There is also an entry for each UNIQUE or PRIMARY KEY clause present in the definition of a database table. Each record in the schema table contains exactly 5 values, in the following order:

FieldDescription
Schema item type. A string value. One of "table", "index", "trigger" or "view", according to the schema item type. Entries associated with UNIQUE or PRIMARY KEY clauses have this field set to "index".
Schema item name. A string value. The name of the database schema item (table, index, trigger or view) associated with this record, if any. Entries associated with UNIQUE or PRIMARY KEY clauses have this field set to a string of the form "sqlite_autoindex_<name>_<idx>" where <name> is the name of the SQL table and <idx> is an integer value.
Associated table name. A string value. For "table" or "view" records this is a copy of the second (previous) value. For "index" and "trigger" records, this field is set to the name of the associated database table.
The "root page" number. For "trigger" and "view" records, as well as "table" records associated with virtual tables, this is set to NULL. For other "table" and "index" records (including those associated with UNIQUE or PRIMARY KEY clauses), this field contains the root page number (an integer) of the B-Tree structure that contains the table or index data.
The SQL statement. A string value. The SQL statement used to create the schema item (i.e. the complete text of an SQL "CREATE TABLE" statement). This field contains an empty string for table entries associated with PRIMARY KEY or UNIQUE clauses. Refer to some document that describes these SQL statements more precisely.

Logical database schema items other than non-virtual tables and indexes (including indexes created by UNIQUE or PRIMARY key constraints) do not require any other data structures to be created within the database file.

Tables and indexes on the other hand, are represented within the database file by both an entry in the schema table and a B-Tree structure stored elsewhere in the file. The specific B-Tree associated with each database table or index is identified by its root page number, which is stored in the 4th field of the schema table record. In a non-auto-vacuum database, the B-Tree root pages may be stored anywhere within the database file. For an auto-vacuum database, all B-Tree root pages must at all times form a contiguous set starting at page 3 of the database file, skipping any pages that are required to be used as pointer-map pages (see section pointer_map_pages).

As noted in section file_header, in an auto-vacuum database the page number of the page immediately following the final root page in the contiguous set of root pages is stored as a 4 byte big-endian integer at byte offset 52 of the database file header. Unless that page is itself a pointer-map page, in which case the page number of the page following it is stored instead.

For example, if the schema of a logical database is created using the following SQL statements:

          CREATE TABLE abc(a, b, c);
          CREATE INDEX i1 ON abc(b, c);
          CREATE TABLE main.def(a PRIMARY KEY, b, c, UNIQUE(b, c));
          CREATE VIEW v1 AS SELECT * FROM abc;
      

Then the schema table would contain a total of 7 records, as follows:

Field 1Field 2Field 3Field 4Field 5
table abc abc 2 CREATE TABLE abc(a, b, c)
index i1 abc 3 CREATE INDEX i1 ON abc(b, c)
table def def 4 CREATE TABLE def(a PRIMARY KEY, b, c, UNIQUE(b, c))
index sqlite_autoindex_def_1 def 5
index sqlite_autoindex_def_2 def 6
view v1 v1 0 CREATE VIEW v1 AS SELECT * FROM abc

In a well-formed database file, the portion of the first database page not consumed by the database file-header (all but the first 100 bytes) contains the root node of a table B-Tree, the schema table.

All records stored in the schema table contain exactly five fields.

The following requirements describe "table" records.

For each SQL table in the database apart from itself ("sqlite_master"), the schema table of a well-formed database file contains an associated record.

The first field of each schema table record associated with an SQL table shall be the text value "table".

The second field of each schema table record associated with an SQL table shall be a text value set to the name of the SQL table.

In a well-formed database file, the third field of all schema table records associated with SQL tables shall contain the same value as the second field.

In a well-formed database file, the fourth field of all schema table records associated with SQL tables that are not virtual tables contains the page number (an integer value) of the root page of the associated table B-Tree structure within the database file.

If the associated database table is a virtual table, the fourth field of the schema table record shall contain an SQL NULL value.

In a well-formed database, the fifth field of all schema table records associated with SQL tables shall contain a "CREATE TABLE" or "CREATE VIRTUAL TABLE" statment (a text value). The details of the statement shall be such that executing the statement would create a table of precisely the same name and schema as the existing database table.

The following requirements describe "implicit index" records.

For each PRIMARY KEY or UNIQUE constraint present in the definition of each SQL table in the database, the schema table of a well-formed database shall contain a record with the first field set to the text value "index", and the second field set to a text value containing a string of the form "sqlite_autoindex_<name>_<idx>", where <name> is the name of the SQL table and <idx> is an integer value.

In a well-formed database, the third field of all schema table records associated with SQL PRIMARY KEY or UNIQUE constraints shall contain the name of the table to which the constraint applies (a text value).

In a well-formed database, the fourth field of all schema table records associated with SQL PRIMARY KEY or UNIQUE constraints shall contain the page number (an integer value) of the root page of the associated index B-Tree structure.

In a well-formed database, the fifth field of all schema table records associated with SQL PRIMARY KEY or UNIQUE constraints shall contain an SQL NULL value.

The following requirements describe "explicit index" records.

For each SQL index in the database, the schema table of a well-formed database shall contain a record with the first field set to the text value "index" and the second field set to a text value containing the name of the SQL index.

In a well-formed database, the third field of all schema table records associated with SQL indexes shall contain the name of the SQL table that the index applies to.

In a well-formed database, the fourth field of all schema table records associated with SQL indexes shall contain the page number (an integer value) of the root page of the associated index B-Tree structure.

In a well-formed database, the fifth field of all schema table records associated with SQL indexes shall contain an SQL "CREATE INDEX" statement (a text value). The details of the statement shall be such that executing the statement would create an index of precisely the same name and content as the existing database index.

The following requirements describe "view" records.

For each SQL view in the database, the schema table of a well-formed database shall contain a record with the first field set to the text value "view" and the second field set to a text value containing the name of the SQL view.

In a well-formed database, the third field of all schema table records associated with SQL views shall contain the same value as the second field.

In a well-formed database, the third field of all schema table records associated with SQL views shall contain the integer value 0.

In a well-formed database, the fifth field of all schema table records associated with SQL indexes shall contain an SQL "CREATE VIEW" statement (a text value). The details of the statement shall be such that executing the statement would create a view of precisely the same name and definition as the existing database view.

The following requirements describe "trigger" records.

For each SQL trigger in the database, the schema table of a well-formed database shall contain a record with the first field set to the text value "trigger" and the second field set to a text value containing the name of the SQL trigger.

In a well-formed database, the third field of all schema table records associated with SQL triggers shall contain the name of the database table or view to which the trigger applies.

In a well-formed database, the third field of all schema table records associated with SQL triggers shall contain the integer value 0.

In a well-formed database, the fifth field of all schema table records associated with SQL indexes shall contain an SQL "CREATE TRIGGER" statement (a text value). The details of the statement shall be such that executing the statement would create a trigger of precisely the same name and definition as the existing database trigger.

The following requirements describe the placement of B-Tree root pages in auto-vacuum databases.

In an auto-vacuum database, all pages that occur before the page number stored in the auto-vacuum last root-page field of the database file header (see H30140) must be either B-Tree root pages, pointer-map pages or the locking page.

In an auto-vacuum database, no B-Tree root pages may occur on or after the page number stored in the auto-vacuum last root-page field of the database file header (see H30140) must be either B-Tree root pages, pointer-map pages or the locking page.

B-Tree Structures

A large part of any SQLite database file is given over to one or more B-Tree structures. A single B-Tree structure is stored using one or more database pages. Each page contains a single B-Tree node. The pages used to store a single B-Tree structure need not form a contiguous block. The page that contains the root node of a B-Tree structure is known as the "root page".

SQLite uses two distinct variants of the B-Tree structure:

As well as the schema table, a well-formed database file contains N table B-Tree structures, where N is the number of non-virtual tables in the logical database, excluding the sqlite_master table but including sqlite_sequence and other system tables.

A well-formed database file contains N table B-Tree structures, where N is the number of indexes in the logical database, including indexes created by UNIQUE or PRIMARY KEY clauses in the declaration of SQL tables.

Variable Length Integer Format

In several parts of the B-Tree structure, 64-bit twos-complement signed integer values are stored in the "variable length integer format" described here.

A variable length integer consumes from one to nine bytes of space, depending on the value stored. Seven bits are used from each of the first eight bytes present, and, if present, all eight from the final ninth byte. Unless the full nine byte format is used, the serialized form consists of all bytes up to and including the first byte with the 0x80 bit cleared.

The number of bytes present depends on the position of the most significant set bit in the 64-bit word. Negative numbers always have the most significant bit of the word (the sign bit) set and so are always encoded using the full nine bytes. Positive integers may be encoded using less space. The following table shows the 9 different length formats available for storing a variable length integer value.

BytesValue RangeBit Pattern
17 bit0xxxxxxx
214 bit1xxxxxxx 0xxxxxxx
321 bit1xxxxxxx 1xxxxxxx 0xxxxxxx
428 bit1xxxxxxx 1xxxxxxx 1xxxxxxx 0xxxxxxx
535 bit1xxxxxxx 1xxxxxxx 1xxxxxxx 1xxxxxxx 0xxxxxxx
642 bit1xxxxxxx 1xxxxxxx 1xxxxxxx 1xxxxxxx 1xxxxxxx 0xxxxxxx
749 bit1xxxxxxx 1xxxxxxx 1xxxxxxx 1xxxxxxx 1xxxxxxx 1xxxxxxx 0xxxxxxx
856 bit1xxxxxxx 1xxxxxxx 1xxxxxxx 1xxxxxxx 1xxxxxxx 1xxxxxxx 1xxxxxxx 0xxxxxxx
964 bit1xxxxxxx 1xxxxxxx 1xxxxxxx 1xxxxxxx 1xxxxxxx 1xxxxxxx 1xxxxxxx 1xxxxxxx xxxxxxxx

When using the full 9 byte representation, the first byte contains the 7 most significant bits of the 64-bit value. The final byte of the 9 byte representation contains the 8 least significant bits of the 64-bit value. When using one of the other representations, the final byte contains the 7 least significant bits of the 64-bit value. The second last byte, if present, contains the 7 next least signficant bits of the value, and so on. The significant bits of the 64-bit value for which no storage is provided are assumed to be zero.

When encoding a variable length integer, SQLite usually selects the most compact representation that provides enough storage to accomadate the most significant set bit of the value. This is not required however, using more bytes than is strictly necessary when encoding an integer is valid.

DecimalHexadecimal Variable Length Integer
43 0x000000000000002B 0x2B
200815 0x000000000003106F 0x8C 0xA0 0x6F
-1 0xFFFFFFFFFFFFFFFF 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF
-78056 0xFFFFFFFFFFFECD56 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF 0xFD 0xCD 0x56

A 64-bit signed integer value stored in variable length integer format consumes from 1 to 9 bytes of space.

The most significant bit of all bytes except the last in a serialized variable length integer is always set. Unless the serialized form consumes the maximum 9 bytes available, then the most significant bit of the final byte of the representation is always cleared.

The eight least significant bytes of the 64-bit twos-compliment representation of a value stored in a 9 byte variable length integer are stored in the final byte (byte offset 8) of the serialized variable length integer. The other 56 bits are stored in the 7 least significant bits of each of the first 8 bytes of the serialized variable length integer, in order from most significant to least significant.

A variable length integer that consumes less than 9 bytes of space contains a value represented as an N-bit unsigned integer, where N is equal to the number of bytes consumed by the serial representation (between 1 and 8) multiplied by 7. The N bits are stored in the 7 least significant bits of each byte of the serial representation, from most to least significant.

Database Record Format

A database record is a blob of data that represents an ordered list of one or more SQL values. Database records are used in two places in SQLite database files - as the associated data for entries in table B-Tree structures, and as the key values in index B-Tree structures. The size (number of bytes consumed by) a database record depends on the values it contains.

Each database record consists of a short record header followed by a data area. The record header consists of N+1 variable length integers (see section varint_format), where N is the number of values stored in the record.

The first variable length integer in a record header contains the size of the record header in bytes. The following N variable length integer values each describe the type and size of the records corresponding SQL value (the second integer in the record header describes the first value in the record, etc.). Integer values are interpreted according to the following table:

Header Value Data type and size
0 An SQL NULL value (type SQLITE_NULL). This value consumes zero bytes of space in the record's data area.
1 An SQL integer value (type SQLITE_INTEGER), stored as a big-endian 1-byte signed integer.
2 An SQL integer value (type SQLITE_INTEGER), stored as a big-endian 2-byte signed integer.
3 An SQL integer value (type SQLITE_INTEGER), stored as a big-endian 3-byte signed integer.
4 An SQL integer value (type SQLITE_INTEGER), stored as a big-endian 4-byte signed integer.
5 An SQL integer value (type SQLITE_INTEGER), stored as a big-endian 6-byte signed integer.
6 An SQL integer value (type SQLITE_INTEGER), stored as an big-endian 8-byte signed integer.
7 An SQL real value (type SQLITE_FLOAT), stored as an 8-byte IEEE floating point value.
8 The literal SQL integer 0 (type SQLITE_INTEGER). The value consumes zero bytes of space in the record's data area. Values of this type are only present in databases with a schema file format (the 32-bit integer at byte offset 44 of the database file header) value of 4 or greater.
9 The literal SQL integer 1 (type SQLITE_INTEGER). The value consumes zero bytes of space in the record's data area. Values of this type are only present in databases with a schema file format (the 32-bit integer at byte offset 44 of the database file header) value of 4 or greater.
bytes * 2 + 12 Even values greater than 12 are used to signify a blob of data (type SQLITE_BLOB) (n-12)/2 bytes in length, where n is the integer value stored in the record header.
bytes * 2 + 13 Odd values greater than 12 are used to signify a string (type SQLITE_TEXT) (n-13)/2 bytes in length, where n is the integer value stored in the record header.

Immediately following the record header is the data for each of the record's values. A record containing N values is depicted in figure figure_recordformat.

Figure - Database Record Format.

For each SQL value in the record, there is a blob of data stored in the records data area. If the corresponding integer type value in the record header is 0 (NULL), 8 (integer value 0) or 9 (integer value 1), then the blob of data is zero bytes in length. Otherwise, the length of the data field is as described in the table above.

The data field associated with a string value contains the string encoded using the database encoding, as defined in the database file header (see section file_header). No nul-terminator character is stored in the database.

A database record consists of a database record header, followed by database record data. The first part of the database record header is a variable length integer containing the total size (including itself) of the header in bytes.

Following the length field, the remainder of the database record header is populated with N variable length integer fields, where N is the number of database values stored in the record.

Following the database record header, the database record data is made up of N variable length blobs of data, where N is again the number of database values stored in the record. The n blob contains the data for the nth value in the database record. The size and format of each blob of data is encoded in the corresponding variable length integer field in the database record header.

A value of 0 stored within the database record header indicates that the corresponding database value is an SQL NULL. In this case the blob of data in the data area is 0 bytes in size.

A value of 1 stored within the database record header indicates that the corresponding database value is an SQL integer. In this case the blob of data contains the integer value, formatted as a 1-byte big-endian signed integer.

A value of 2 stored within the database record header indicates that the corresponding database value is an SQL integer. In this case the blob of data contains the integer value, formatted as a 2-byte big-endian signed integer.

A value of 3 stored within the database record header indicates that the corresponding database value is an SQL integer. In this case the blob of data contains the integer value, formatted as a 3-byte big-endian signed integer.

A value of 4 stored within the database record header indicates that the corresponding database value is an SQL integer. In this case the blob of data contains the integer value, formatted as a 4-byte big-endian signed integer.

A value of 5 stored within the database record header indicates that the corresponding database value is an SQL integer. In this case the blob of data contains the integer value, formatted as a 6-byte big-endian signed integer.

A value of 6 stored within the database record header indicates that the corresponding database value is an SQL integer. In this case the blob of data contains the integer value, formatted as a 8-byte big-endian signed integer.

A value of 7 stored within the database record header indicates that the corresponding database value is an SQL real (floating point number). In this case the blob of data contains an 8-byte IEEE floating point number, stored in big-endian byte order.

A value of 8 stored within the database record header indicates that the corresponding database value is an SQL integer, value 0. In this case the blob of data in the data area is 0 bytes in size.

A value of 9 stored within the database record header indicates that the corresponding database value is an SQL integer, value 1. In this case the blob of data in the data area is 0 bytes in size.

An even value greater than or equal to 12 stored within the database record header indicates that the corresponding database value is an SQL blob field. The blob of data contains the value data. The blob of data is exactly (n-12)/2 bytes in size, where n is the integer value stored in the database record header.

An odd value greater than or equal to 13 stored within the database record header indicates that the corresponding database value is an SQL text field. The blob of data contains the value text stored using the database encoding, with no nul-terminator. The blob of data is exactly (n-12)/2 bytes in size, where n is the integer value stored in the database record header.

The following database file properties define restrictions on the integer values that may be stored within a database record header.

In a well-formed database file, if the values 8 or 9 appear within any database record header within the database, then the schema-layer file format (stored at byte offset 44 of the database file header) must be set to 4.

In a well-formed database file, the values 10 and 11, and all negative values may not appear within any database record header in the database.

Index B-Trees

As specified in section fileformat_overview, index B-Tree structures store a unique set of the database records described in the previous section. While in some cases, when there are very few entries in the B-Tree, the entire structure may fit on a single database page, usually the database records must be spread across two or more pages. In this case, the pages are organized into a tree structure with a single "root" page at the head of the tree.

Within the tree structure, each page is either an internal tree node containing an ordered list of N references to child nodes (page numbers) and N-1 database records, or a leaf node containing M database records. The value of N may be different for each page, but is always two or greater. Similarly, each leaf page may have a different non-zero positive value for M. The tree is always of uniform height, meaning the number of intermediate levels between each leaf node page and the root page is the same.

Within both internal and leaf node pages, the records are stored in sorted order. The comparison function used to determine the sort order is described in section index_btree_compare_func.

Records are distributed throughout the tree such that for each internal node, all records stored in the sub-tree headed by the first child node ( C(0) ) are considered less than the first record stored on the internal node ( R(0) ) by the comparison function described in section index_btree_compare_func. Similarly all records stored in the sub-tree headed by C(n) are considered greater than R(n-1) but less than R(n) for values of n between 1 and N-2, inclusive. All records in the sub-tree headed by C(N-1) are greater than the largest record stored on the internal node.

Figure - Index B-Tree Tree Structure.

Figure figure_indextree depicts one possible record distribution for an index B-Tree containing records R1 to R26, assuming that for all values of N, R(N+1)>R(N). In total the B-Tree structure uses 11 database file pages. Internal tree nodes contain database records and references to child node pages. Leaf nodes contain database records only.

The pages in an index B-Tree structures are arranged into a tree structure such that all leaf pages are at the same depth.

Each leaf node page in an index B-Tree contains one or more B-Tree cells, where each cell contains a database record.

Each internal node page in an index B-Tree contains one or more B-Tree cells, where each cell contains a child page number, C, and a database record R. All database records stored within the sub-tree headed by page C are smaller than record R, according to the index sort order (see below). Additionally, unless R is the smallest database record stored on the internal node page, all integer keys within the sub-tree headed by C are greater than R-1, where R-1 is the largest database record on the internal node page that is smaller than R.

As well as child page numbers associated with B-Tree cells, each internal node page in an index B-Tree contains the page number of an extra child page, the right-child page. All database records stored in all B-Tree cells within the sub-tree headed by the right-child page are greater than all database records stored within B-Tree cells on the internal node page.

The precise way in which index B-Tree pages and cells are formatted is described in subsequent sections.

Index B-Tree Content

The database file contains one index B-Tree for each database index in the logical database, including those created by UNIQUE or PRIMARY KEY clauses in table declarations. Each record stored in an index B-Tree contains the same number of fields, the number of indexed columns in the database index declaration plus one.

An index B-Tree contains an entry for each row in its associated database table. The fields of the record used as the index B-Tree key are copies of each of the indexed columns of the associated database row, in order, followed by the rowid value of the same row. See figure figure_examplepop for an example.

In a well-formed database, each index B-Tree contains a single entry for each row in the indexed logical database table.

Each database record (key) stored by an index B-Tree in a well-formed database contains the same number of values, the number of indexed columns plus one.

The final value in each database record (key) stored by an index B-Tree in a well-formed database contains the rowid (an integer value) of the corresponding logical database row.

The first N values in each database record (key) stored in an index B-Tree where N is the number of indexed columns, contain the values of the indexed columns from the corresponding logical database row, in the order specified for the index.

Record Sort Order

This section defines the comparison function used when database records are used as B-Tree keys for index B-Trees. The comparison function is only defined when both database records contain the same number of fields.

When comparing two database records, the first field of one record is compared to the first field of the other. If they are not equal, the next pair of fields are compared, and so on. If all the fields in the database records are equal, then the two records are considered equal. Otherwise, the result of the comparison is determined by the first pair of inequal fields.

Two database record fields (SQL values) are compared using the following rules:

  1. If both values are NULL, then they are considered equal.
  2. If one value is a NULL and the other is not, it is considered the lesser of the two.
  3. If both values are either real or integer values, then the comparison is done numerically.
  4. If one value is a real or integer value, and the other is a text or blob value, then the numeric value is considered lesser.
  5. If both values are text, then the collation function is used to compare them. The collation function is a property of the index column in which the values are found. Link to document with CREATE INDEX syntax.
  6. If one value is text and the other a blob, the text value is considered lesser.
  7. If both values are blobs, memcmp() is used to determine the results of the comparison function. If one blob is a prefix of the other, the shorter blob is considered lesser.

Each column of a database index may be declared as "descending". Link to document with CREATE INDEX syntax. In SQLite database files with a schema layer file-format equal to 4, this modifies the order in which the records are stored in the corresponding index B-Tree structure. For each index column declared as descending, the results of the above comparison procedure are inverted.

The columns of database indexes created by UNIQUE or PRIMARY KEY clauses are never treated as descending.

Need requirements style statements for this information. Easier to do once collation sequences have been defined somewhere.

Index B-Tree Page Format

Each index B-Tree page is divided into four sections that occur in order on the page:

Figure - Index B-Tree Page Data.

The 8 (leaf node pages) or 12 (internal tree node pages) byte page header that begins each index B-Tree page is made up of a series of 1, 2 and 4 byte unsigned integer values as shown in the following table. All values are stored in big-endian byte order.

Byte Range Byte Size Description
0 1B-Tree page flags. For an index B-Tree internal tree node page, this is set to 0x02. For a leaf node page, 0x0A.
1..2 2Byte offset of first block of free space on this page. If there are no free blocks on this page, this field is set to 0.
3..4 2Number of cells (entries) on this page.
5..6 2Byte offset of the first byte of the cell content area (see figure figure_indexpage), relative to the start of the page.
7 1Number of fragmented free bytes on page.
8..11 4Page number of rightmost child-page (the child-page that heads the sub-tree in which all records are larger than all records stored on this page). This field is not present for leaf node pages.

The cell content area, which occurs last on the page, contains one B-Tree cell for each record stored on the B-Tree page. On a leaf node page, each cell is responsible for storing a database record only. On an internal tree node page, each cell contains a database record and the corresponding child page number ((R(0) and C(0)) are stored together, for example - the cell record is considered greater than all records stored in the sub-tree headed by the child page). The final child page number is stored as part of the page header.

The B-Tree cells may be distributed throughout the cell content area and may be interspersed with blocks of unused space. They are not sorted within the cell content area in any particular order. The serialized format of a B-Tree cell is described in detail in section index_btree_cell_format.

The byte offset of each cell in the cell content area, relative to the start of the page, is stored in the cell offset array. The offsets are in sorted order according to the database records stored in the corresponding cells. The first offset in the array is the offset of the cell containing the smallest record on the page, according to the comparison function defined in section index_btree_compare_func.

As well as the block of unused space between the cell offset array and the cell content area, which may be any size, there may be small blocks of free space interspersed with the B-Tree cells within the cell content area. These are classified into two classes, depending on their size:

The b-tree page flags field (the first byte) of each database page used as an internal node of an index B-Tree structure is set to 0x02.

The b-tree page flags field (the first byte) of each database page used as a leaf node of an index B-Tree structure is set to 0x0A.

The following requirements describe the B-Tree page header present at the start of both index and table B-Tree pages.

The first byte of each database page used as a B-Tree page contains the b-tree page flags field. On page 1, the b-tree page flags field is stored directly after the 100 byte file header at byte offset 100.

The number of B-Tree cells stored on a B-Tree page is stored as a 2-byte big-endian integer starting at byte offset 3 of the B-Tree page. On page 1, this field is stored at byte offset 103.

The 2-byte big-endian integer starting at byte offset 5 of each B-Tree page contains the byte-offset from the start of the page to the start of the cell content area, which consumes all space from this offset to the end of the usable region of the page. On page 1, this field is stored at byte offset 105. All B-Tree cells on the page are stored within the cell-content area.

On each page used as an internal node a of B-Tree structures, the page number of the rightmost child node in the B-Tree structure is stored as a 4-byte big-endian unsigned integer beginning at byte offset 8 of the database page, or byte offset 108 on page 1.

This requirement describes the cell content offset array. It applies to both B-Tree variants.

Immediately following the page header on each B-Tree page is the cell offset array, consisting of N 2-byte big-endian unsigned integers, where N is the number of cells stored on the B-Tree page (H30840). On an internal node B-Tree page, the cell offset array begins at byte offset 12, or on a leaf page, byte offset 8. For the B-Tree node on page 1, these offsets are 112 and 108, respectively.

The cell offset array and the cell content area (H30850) may not overlap.

Each value stored in the cell offset array must be greater than or equal to the offset to the cell content area (H30850), and less than the database page size.

The N values stored within the cell offset array are the byte offsets from the start of the B-Tree page to the beginning of each of the N cells stored on the page.

No two B-Tree cells may overlap.

The following requirements govern management of free-space within the page content area (both table and index B-Tree pages).

Within the cell content area, all blocks of contiguous free-space (space not used by B-Tree cells) greater than 3 bytes in size are linked together into a linked list, the free block list. Such blocks of free space are known as free blocks.

The first two bytes of each free block contain the offset of the next free block in the free block list formatted as a 2-byte big-endian integer, relative to the start of the database page. If there is no next free block, then the first two bytes are set to 0x00.

The second two bytes (byte offsets 2 and 3) of each free block contain the total size of the free block, formatted as a 2-byte big-endian integer.

On all B-Tree pages, the offset of the first free block in the free block list, relative to the start of the database page, is stored as a 2-byte big-endian integer starting at byte offset 1 of the database page. If there is no first free block (because the free block list is empty), then the two bytes at offsets 1 and 2 of the database page are set to 0x00. On page 1, this field is stored at byte offset 101 of the page.

Within the cell-content area, all blocks of contiguous free-space (space not used by B-Tree cells) less than or equal to 3 bytes in size are known as fragments. The total size of all fragments on a B-Tree page is stored as a 1-byte unsigned integer at byte offset 7 of the database page. On page 1, this field is stored at byte offset 107.

Index B-Tree Cell Format

For index B-Tree internal tree node pages, each B-Tree cell begins with a child page-number, stored as a 4-byte big-endian unsigned integer. This field is omitted for leaf pages, which have no children.

Following the child page number is the total number of bytes consumed by the cell's record, stored as a variable length integer (see section varint_format).

If the record is small enough, it is stored verbatim in the cell. A record is deemed to be small enough to be completely stored in the cell if it consists of less than:

            max-local := usable-size * (max-embedded-fraction / 255 ) - 23

bytes. In the formula above, usable-size is the page-size in bytes less the number of unused bytes left at the end of every page (as read from byte offset 20 of the file header), and max-embedded-fraction is the value read from byte offset 21 of the file header.

Figure - Small Record Index B-Tree Cell.

If the cell record is larger than the maximum size identified by the formula above, then only the first part of the record is stored within the cell. The remainder is stored in an overflow-chain (see section overflow_page_chains for details). Following the part of the record stored within the cell is the page number of the first page in the overflow chain, stored as a 4 byte big-endian unsigned integer. The size of the part of the record stored within the B-Tree cell (local-size in figure figure_indexlongrecord) is calculated according to the following algorithm:

            min-local := (usable-size - 12) * min-embedded-fraction / 255 - 23
            max-local := (usable-size - 12) * max-embedded-fraction / 255 - 23
            local-size := (record-size - min-local) % (usable-size - 4)
            if( local-size > max-local )
                local-size := min-local

In the formula above, usable-size is the page-size in bytes less the number of unused bytes left at the end of every page (as read from byte offset 20 of the file header), and max-embedded-fraction and min-embedded-fraction are the values read from byte offsets 21 and 22 of the file header, respectively.

Figure - Large Record Index B-Tree Cell.

Each B-Tree cell belonging to an internal node page of an index B-Tree consists of a 4-byte big-endian unsigned integer, the child page number, followed by a variable length integer field, followed by a database record. The variable length integer field contains the length of the database record in bytes.

Each B-Tree cell belonging to an leaf page of an index B-Tree consists of a variable length integer field, followed by a database record. The variable length integer field contains the length of the database record in bytes.

If the database record stored in an index B-Tree page is sufficiently small, then the entire cell is stored within the index B-Tree page. Sufficiently small is defined as equal to or less than max-local, where: max-local := (usable-size - 12) * 64 / 255 - 23

If the database record stored as part of an index B-Tree cell is too large to be stored entirely within the B-Tree page (as defined by H30520), then only a prefix of the database record is stored within the B-Tree page and the remainder stored in an overflow chain. In this case, the database record prefix is immediately followed by the page number of the first page of the overflow chain, formatted as a 4-byte big-endian unsigned integer.

When a database record belonging to a table B-Tree cell is stored partially within an overflow page chain, the size of the prefix stored within the index B-Tree page is N bytes, where N is calculated using the following algorithm: min-local := (usable-size - 12) * 32 / 255 - 23 max-local := (usable-size - 12) * 64 / 255 - 23 N := min-local + ((record-size - min-local) % (usable-size - 4)) if( N > max-local ) N := min-local

Requirements H31010 and H30990 are similar to the algorithms presented in the text above. However instead of min-embedded-fraction and max-embedded-fraction the requirements use the constant values 32 and 64, as well-formed database files are required by H30080 and H30070 to store these values in the relevant database file header fields.

Table B-Trees

As noted in section fileformat_overview, table B-Trees store a set of unique 64-bit signed integer keys. Associated with each key is a database record. As with index B-Trees, the database file pages that make up a table B-Tree are organized into a tree structure with a single "root" page at the head of the tree.

Unlike index B-Tree structures, where entries are stored on both internal and leaf nodes, all entries in a table B-Tree are stored in the leaf nodes. Within each leaf node, keys are stored in sorted order.

Each internal tree node contains an ordered list of N references to child pages, where N is some number greater than one. In a similar manner to the way in which an index B-Tree page would contain N-1 records, each internal table B-Tree node page also contains a list of N-1 64-bit signed integer values in sorted order. The keys are distributed throughout the tree such that for all internal tree nodes, integer I(n) is equal to the largest key value stored in the sub-tree headed by child page C(n) for values of n between 0 and N-2, inclusive. Additionally, all keys stored in the sub-tree headed by child page C(n+1) have values larger than that of I(n), for values of n in the same range.

Figure - Table B-Tree Tree Structure.

Figure figure_tabletree depicts a table B-Tree containing a contiguous set of 14 integer keys starting with 1. Each key n has an associated database record Rn. All the keys and their associated records are stored in the leaf pages. The internal node pages contain no database data, their only purpose is to provide a way to navigate the tree structure.

The pages in a table B-Tree structures are arranged into a tree structure such that all leaf pages are at the same depth.

Each leaf page in a table B-Tree structure contains one or more B-Tree cells, where each cell contains a 64-bit signed integer key value and a database record.

Each internal node page in a table B-Tree structure contains one or more B-Tree cells, where each cell contains a 64-bit signed integer key value, K, and a child page number, C. All integer key values in all B-Tree cells within the sub-tree headed by page C are less than or equal to K. Additionally, unless K is the smallest integer key value stored on the internal node page, all integer keys within the sub-tree headed by C are greater than K-1, where K-1 is the largest integer key on the internal node page that is smaller than K.

As well as child page numbers associated with B-Tree cells, each internal node page in a table B-Tree contains the page number of an extra child page, the right-child page. All key values in all B-Tree cells within the sub-tree headed by the right-child page are greater than all key values stored within B-Tree cells on the internal node page.

The precise way in which table B-Tree pages and cells are formatted is described in subsequent sections.

Table B-Tree Content

The database file contains one table B-Tree for each database table in the logical database. Although some data may be duplicated in index B-Tree structures, the table B-Tree is the primary location of table data.

The table B-Tree contains exactly one entry for each row in the database table. The integer key value used for the B-Tree entry is the value of the "rowid" field of the corresponding logical row in the database table. The database row fields are stored in the record associated with the table B-Tree entry, in the same order as they appear in the logical database table. The first field in the record (see section record_format) contains the value of the leftmost field in the database row, and so on.

If a database table column is declared as an INTEGER PRIMARY KEY, then it is an alias for the rowid field, which is stored as the table B-Tree key value. Instead of duplicating the integer value in the associated record, the record field associated with the INTEGER PRIMARY KEY column is always set to an SQL NULL.

Finally, if the schema layer file-format is greater than or equal to 2, some of the records stored in table B-Trees may contain less fields than the associated logical database table does columns. If the schema layer file-format is exactly 2, then the logical database table column values associated with the "missing" fields are SQL NULL. If the schema layer file-format is greater than 2, then the values associated with the "missing" fields are determined by the default value of the associated database table columns. Reference to CREATE TABLE syntax. How are default values determined?

In a well-formed database, each table B-Tree contains a single entry for each row in the corresponding logical database table.

The key value (a 64-bit signed integer) for each B-Tree entry is the same as the value of the rowid field of the corresponding logical database row.

The SQL values serialized to make up each database record stored as ancillary data in a table B-Tree shall be the equal to the values taken by the N leftmost columns of the corresponding logical database row, where N is the number of values in the database record.

If a logical database table column is declared as an "INTEGER PRIMARY KEY", then instead of its integer value, an SQL NULL shall be stored in its place in any database records used as ancillary data in a table B-Tree.

The following database properties discuss table B-Tree records with implicit (default) values.

If the database schema layer file-format (the value stored as a 4-byte integer at byte offset 44 of the file header) is 1, then all database records stored as ancillary data in a table B-Tree structure have the same number of fields as there are columns in the corresponding logical database table.

If the database schema layer file-format value is two or greater and the rightmost M columns of a row contain SQL NULL values, then the corresponding record stored as ancillary data in the table B-Tree has between N-M and N fields, where N is the number of columns in the logical database table.

If the database schema layer file-format value is three or greater and the rightmost M columns of a row contain their default values according to the logical table declaration, then the corresponding record stored as ancillary data in the table B-Tree may have as few as N-M fields, where N is the number of columns in the logical database table.

Table B-Tree Page Format

Table B-Tree structures use the same page format as index B-Tree structures, described in section index_btree_page_format, with the following differences:

In a well-formed database file, the first byte of each page used as an internal node of a table B-Tree structure is set to 0x05.

In a well-formed database file, the first byte of each page used as a leaf node of a table B-Tree structure is set to 0x0D.

Most of the requirements specified in section index_btree_page_format also apply to table B-Tree pages. The wording of the requirements make it clear when this is the case, either by refering to generic "B-Tree pages" or by explicitly stating that the statement applies to both "table and index B-Tree pages".

Table B-Tree Cell Format

Cells stored on internal table B-Tree nodes consist of exactly two fields. The associated child page number, stored as a 4-byte big-endian unsigned integer, followed by the 64-bit signed integer value, stored as a variable length integer (section varint_format). This is depicted graphically in figure figure_tablenodecell.

Figure - Table B-Tree Internal Node Cell.

Cells of table B-Tree leaf pages are required to store a 64-bit signed integer key and its associated database record. The first two fields of all table B-Tree leaf page cells are the size of the database record, stored as a variable length integer (see section varint_format), followed by the key value, also stored as a variable length integer. For sufficiently small records, the entire record is stored in the B-Tree cell following the record-size field. In this case, sufficiently small is defined as less than or equal to:

          max-local := usable-size - 35

bytes. Where usable-size is defined as the page-size in bytes less the number of unused bytes left at the end of every page (as read from byte offset 20 of the file header). This scenario, where the entire record is stored within the B-Tree cell, is depicted in figure figure_tableshortrecord.

Figure - Table B-Tree Small Record Leaf Node Cell.

If the record is too large to be stored entirely within the B-Tree cell, then the first part of it is stored within the cell and the remainder in an overflow chain (see section overflow_page_chains). The size of the part of the record stored within the B-Tree cell (local-size in figure figure_tablelongrecord) is calculated according to the following algorithm (a similar procedure to that used to calculate the portion of an index B-Tree key to store within the cell when an overflow chain is required):

            min-local := (usable-size - 12) * min-embedded-fraction / 255 - 23
            max-local := usable-size - 35
            local-size := min-local + (record-size - min-local) % (usable-size - 4)
            if( local-size > max-local )
                local-size := min-local

In this case, min-embedded-fraction is the value read from byte offset 22 of the file header. The layout of the cell in this case, when an overflow-chain is required, is shown in figure figure_tablelongrecord.

Figure - Table B-Tree Large Record Leaf Node Cell.

If the leaf page is page 1, then the value of usable-size is as it would be for any other B-Tree page, even though the actual usable size is 100 bytes less than this for page 1 (because the first 100 bytes of the page is consumed by the database file header).

The following requirements describe the format of table B-Tree cells, and the distribution thereof between B-Tree and overflow pages.

B-Tree cells belonging to table B-Tree internal node pages consist of exactly two fields, a 4-byte big-endian unsigned integer immediately followed by a variable length integer. These fields contain the child page number and key value respectively (see H31030).

B-Tree cells belonging to table B-Tree leaf node pages consist of three fields, two variable length integer values followed by a database record. The size of the database record in bytes is stored in the first of the two variable length integer fields. The second of the two variable length integer fields contains the 64-bit signed integer key (see H31030).

If the size of the record stored in a table B-Tree leaf page cell is less than or equal to (usable page size-35) bytes, then the entire cell is stored on the B-Tree leaf page. In a well-formed database, usable page size is the same as the database page size.

If a table B-Tree cell is too large to be stored entirely on a leaf page (as defined by H31170), then a prefix of the cell is stored on the leaf page, and the remainder stored in an overflow page chain. In this case the cell prefix stored on the B-Tree leaf page is immediately followed by a 4-byte big-endian unsigned integer containing the page number of the first overflow page in the chain.

When a table B-Tree cell is stored partially in an overflow page chain, the prefix stored on the B-Tree leaf page consists of the two variable length integer fields, followed by the first N bytes of the database record, where N is determined by the following algorithm: min-local := (usable-size - 12) * 255 / 32 - 23 max-local := (usable-size - 35) N := min-local + (record-size - min-local) % (usable-size - 4) if( N > max-local ) N := min-local

Requirement H31190 is very similar to the algorithm presented in the text above. Instead of min-embedded-fraction, it uses the constant value 32, as well-formed database files are required by H30090 to store this value in the relevant database file header field.

Overflow Page Chains

Sometimes, a database record stored in either an index or table B-Trees is too large to fit entirely within a B-Tree cell. In this case part of the record is stored within the B-Tree cell and the remainder stored on one or more overflow pages. The overflow pages are chained together using a singly linked list. The first 4 bytes of each overflow page is a big-endian unsigned integer value containing the page number of the next page in the list. The remaining usable database page space is available for record data.

Figure - Overflow Page Format.

The scenarios in which overflow pages are required and the number of bytes stored within the B-Tree cell in each are described for index and table B-Trees in sections index_btree_cell_format and table_btree_cell_format respectively. In each case the B-Tree cell also stores the page number of the first page in a linked list of overflow pages.

The amount of space available for record data on each overflow page is:

        available-space := usable-size - 4

Where usable-size is defined as the page-size in bytes less the number of unused bytes left at the end of every page (as read from byte offset 20 of the file header).

Each overflow page except for the last one in the linked list contains available-space bytes of record data. The last page in the list contains the remaining data, starting at byte offset 4. The value of the "next page" field on the last page in an overflow chain is undefined.

A single overflow page may store up to available-space bytes of database record data, where available-space is equal to (usable-size - 4).

When a database record is too large to store within a B-Tree page (see H31170 and H31000), a prefix of the record is stored within the B-Tree page and the remainder stored across N overflow pages. In this case N is the minimum number of pages required to store the portion of the record not stored on the B-Tree page, given the maximum payload per overflow page defined by H31200.

The list of overflow pages used to store a single database record are linked together in a singly linked list known as an overflow chain. The first four bytes of each page except the last in an overflow chain are used to store the page number of the next page in the linked list, formatted as an unsigned big-endian integer. The first four bytes of the last page in an overflow chain are set to 0x00.

Each overflow page except the last in an overflow chain contains N bytes of record data starting at byte offset 4 of the page, where N is the maximum payload per overflow page, as defined by H31200. The final page in an overflow chain contains the remaining data, also starting at byte offset 4.

The Free Page List

Sometimes, after deleting data from the database, SQLite removes pages from B-Tree structures. If these pages are not immediately required for some other purpose, they are placed on the free page list. The free page list contains those pages that are not currently being used to store any valid data.

Each page in the free-list is classified as a free-list trunk page or a free-list leaf page. All trunk pages are linked together into a singly linked list (in the same way as pages in an overflow chain are - see section overflow_page_chains). The first four bytes of each trunk page contain the page number of the next trunk page in the list, formatted as an unsigned big-endian integer. If the trunk page is the last page in the linked list, the first four bytes are set to zero.

Bytes 4 to 7 of each free-list trunk page contain the number of references to free-list leaf pages (page numbers) stored on the free-list trunk page. Each leaf page on the free-list is referenced by exactly one trunk page.

The remaining space on a free-list trunk page is used to store the page numbers of free-list leaf pages as 4 byte big-endian integers. Each free-list trunk page contains up to:

        max-leaf-pointers := (usable-size - 8) / 4

pointers, where usable-size is defined as the page-size in bytes less the number of unused bytes left at the end of every page (as read from byte offset 20 of the file header).

Figure - Free List Trunk Page Format.

All trunk pages in the free-list except for the first contain the maximum possible number of references to leaf pages. Is this actually true in an auto-vacuum capable database? The page number of the first page in the linked list of free-list trunk pages is stored as a 4-byte big-endian unsigned integer at offset 32 of the file header (section file_header).

All free pages in a well-formed database file are part of the database free page list.

Each free page is either a free list trunk page or a free list leaf page.

All free list trunk pages are linked together into a singly linked list. The first 4 bytes of each page in the linked list contains the page number of the next page in the list, formatted as an unsigned big-endian integer. The first 4 bytes of the last page in the linked list are set to 0x00.

The second 4 bytes of each free list trunk page contains the number of free list leaf page numbers stored on the free list trunk page, formatted as an unsigned big-endian integer.

Beginning at byte offset 8 of each free list trunk page are N page numbers, each formatted as a 4-byte unsigned big-endian integers, where N is the value described in requirement H31270.

All page numbers stored on all free list trunk pages refer to database pages that are free list leaves.

The page number of each free list leaf page in a well-formed database file appears exactly once within the set of pages numbers stored on free list trunk pages.

The following statements govern the two 4-byte big-endian integers associated with the free page list structure in the database file header.

The total number of pages in the free list, including all free list trunk and free list leaf pages, is stored as a 4-byte unsigned big-endian integer at offset 36 of the database file header.

The page number of the first page in the linked list of free list trunk pages is stored as a 4-byte big-endian unsigned integer at offset 32 of the database file header. If there are no free list trunk pages in the database file, then the value stored at offset 32 of the database file header is 0.

Pointer Map Pages

Pointer map pages are only present in auto-vacuum capable databases. A database is an auto-vacuum capable database if the value stored at byte offset 52 of the file-header is non-zero.

If they are present, the pointer-map pages together form a lookup table that can be used to determine the type and "parent page" of any page in the database, given its page number. The lookup table classifies pages into the following categories:

Page Type Byte Value Description
B-Tree Root Page0x01 The page is the root page of a table or index B-Tree structure. There is no parent page number in this case, the value stored in the pointer map lookup table is always zero.
Free Page0x02 The page is part of the free page list (section free_page_list). There is no parent page in this case, zero is stored in the lookup table instead of a parent page number.
Overflow type 10x03 The page is the first page in an overflow chain. The parent page is the B-Tree page containing the B-Tree cell to which the overflow chain belongs.
Overflow type 20x04 The page is part of an overflow chain, but is not the first page in that chain. The parent page is the previous page in the overflow chain linked-list.
B-Tree Page0x05 The page is part of a table or index B-Tree structure, and is not an overflow page or root page. The parent page is the page containing the parent tree node in the B-Tree structure.

Pointer map pages themselves do not appear in the pointer-map lookup table. Page 1 does not appear in the pointer-map lookup table either.

Figure - Pointer Map Entry Format.

Each pointer-map lookup table entry consumes 5 bytes of space. The first byte of each entry indicates the page type, according to the key described in the table above. The following 4 bytes store the parent page number as a big-endian unsigned integer. This format is depicted in figure figure_pointermapentry. Each pointer-map page may therefore contain:

        num-entries := usable-size / 5

entries, where usable-size is defined as the page-size in bytes less the number of unused bytes left at the end of every page (as read from byte offset 20 of the file header).

Assuming the database is auto-vacuum capable, page 2 is always a pointer map page. It contains the pointer map lookup table entries for pages 3 through (2 + num-entries), inclusive. The first 5 bytes of page 2 contain the pointer map lookup table entry for page 3. Bytes 5 through 9, inclusive, contain the pointer map lookup table entry for page 4, and so on.

The next pointer map page in the database is page number (3 + num-entries), which contains the pointer map entries for pages (4 + num-entries) through (3 + 2 * num-entries) inclusive. In general, for any value of n greater than zero, the following page is a pointer-map page that contains lookup table entries for the num-entries pages that follow it in the database file:

        pointer-map-page-number := 2 + n * num-entries

Non auto-vacuum databases do not contain pointer map pages.

In an auto-vacuum database file, every (num-entries + 1)th page beginning with page 2 is designated a pointer-map page, where num-entries is calculated as: num-entries := database-usable-page-size / 5

In an auto-vacuum database file, each pointer-map page contains a pointer map entry for each of the num-entries (defined by H31340) pages that follow it, if they exist.

Each pointer-map page entry consists of a 1-byte page type and a 4-byte page parent number, 5 bytes in total.

Pointer-map entries are packed into the pointer-map page in order, starting at offset 0. The entry associated with the database page that immediately follows the pointer-map page is located at offset 0. The entry for the following page at offset 5 etc.

The following requirements govern the content of pointer-map entries.

For each page except page 1 in an auto-vacuum database file that is the root page of a B-Tree structure, the page type of the corresponding pointer-map entry is set to the value 0x01 and the parent page number is zero.

For each page that is a part of an auto-vacuum database file free-list, the page type of the corresponding pointer-map entry is set to the value 0x02 and the parent page number is zero.

For each page in a well-formed auto-vacuum database that is the first page in an overflow chain, the page type of the corresponding pointer-map entry is set to 0x03 and the parent page number field is set to the page number of the B-Tree page that contains the start of the B-Tree cell stored in the overflow-chain.

For each page that is the second or a subsequent page in an overflow chain, the page type of the corresponding pointer-map entry is set to 0x04 and the parent page number field is set to the page number of the preceding page in the overflow chain.

For each page that is not a root page but is a part of a B-Tree tree structure (not part of an overflow chain), the page type of the corresponding pointer-map entry is set to the value 0x05 and the parent page number field is set to the page number of the parent node in the B-Tree structure.

Journal File Format

This section describes the format used by an SQLite journal file.

A journal file consists of one or more journal headers, zero or more journal records and optionally a master journal pointer. Each journal file always begins with a journal header, followed by zero or more journal records. Following this may be a second journal header followed by a second set of zero or more journal records and so on. There is no limit to the number of journal headers a journal file may contain. Following the journal headers and their accompanying sets of journal records may be the optional master journal pointer. Or, the file may simply end following the final journal record.

Journal Header Format

A journal header is sector-size bytes in size, where sector-size is the value returned by the xSectorSize method of the file handle opened on the database file. Only the first 28 bytes of the journal header are used, the remainder may contain garbage data. The first 28 bytes of each journal header consists of an eight byte block set to a well-known value, followed by five big-endian 32-bit unsigned integer fields.

Figure - Journal Header Format

Figure figure_journal_header graphically depicts the layout of a journal header. The individual fields are described in the following table. The offsets in the 'byte offset' column of the table are relative to the start of the journal header.

Byte offsetSize in bytesDescription
08The journal magic field always contains a well-known 8-byte string value used to identify SQLite journal files. The well-known sequence of byte values is:
0xd9 0xd5 0x05 0xf9 0x20 0xa1 0x63 0xd7
84This field, the record count, is set to the number of journal records that follow this journal header in the journal file.
124The checksum initializer field is set to a pseudo-random value. It is used as part of the algorithm to calculate the checksum for all journal records that follow this journal header.
164This field, the database page count, is set to the number of pages that the database file contained before any modifications associated with write transaction are applied.
204This field, the sector size, is set to the sector size of the device on which the journal file was created, in bytes. This value is required when reading the journal file to determine the size of each journal header.
244The page size field contains the database page size used by the corresponding database file when the journal file was created, in bytes.

All journal headers are positioned in the file so that they start at a sector size aligned offset. To achieve this, unused space may be left between the start of the second and subsequent journal headers and the end of the journal records associated with the previous header.

Journal Record Format

Each journal record contains the original data for a database page modified by the write transaction. If a rollback is required, then this data may be used to restore the contents of the database page to the state it was in before the write transaction was started.

Figure - Journal Record Format

A journal record, depicted graphically by figure figure_journal_record, contains three fields, as described in the following table. Byte offsets are relative to the start of the journal record.

Byte offsetSize in bytesDescription
04The page number of the database page associated with this journal record, stored as a 4 byte big-endian unsigned integer.
4page-size This field contains the original data for the page, exactly as it appeared in the database file before the write transaction began.
4 + page-size4 This field contains a checksum value, calculated based on the contents of the journaled database page data (the previous field) and the values stored in the checksum initializer field of the preceding journal header.

The set of journal records that follow a journal header in a journal file are packed tightly together. There are no alignment requirements for journal records as there are for journal headers.

Master Journal Pointer

To support atomic transactions that modify more than one database file, SQLite sometimes includes a master journal pointer record in a journal file. A master journal pointer contains the name of a master journal-file along with a check-sum and some well-known values that allow the master journal pointer to be recognized as such when the journal file is read during a rollback operation.

As is the case for a journal header, the start of a master journal pointer is always positioned at a sector size aligned offset. If the journal record or journal header that appears immediately before the master journal pointer does not end at an aligned offset, then unused space is left between the end of the journal record or journal header and the start of the master journal pointer.

Figure - Master Journal Pointer Format

A master journal pointer, depicted graphically by figure figure_master_journal_ptr, contains five fields, as described in the following table. Byte offsets are relative to the start of the master journal pointer.

Byte offsetSize in bytesDescription
04This field, the locking page number, is always set to the page number of the database locking page stored as a 4-byte big-endian integer. The locking page is the page that begins at byte offset 230 of the database file. Even if the database file is large enough to contain the locking page, the locking page is never used to store any data and so the first four bytes of of a valid journal record will never contain this value.
4name-length The master journal name field contains the name of the master journal file, encoded as a utf-8 string. There is no nul-terminator appended to the string.
4 + name-length4 The name-length field contains the length of the previous field in bytes, formatted as a 4-byte big-endian unsigned integer.
8 + name-length4 The checksum field contains a checksum value stored as a 4-byte big-endian signed integer. The checksum value is calculated as the sum of the bytes that make up the master journal name field, interpreting each byte as an 8-bit signed integer.
12 + name-length8 Finally, the journal magic field always contains a well-known 8-byte string value; the same value stored in the first 8 bytes of a journal header. The well-known sequence of bytes is:
0xd9 0xd5 0x05 0xf9 0x20 0xa1 0x63 0xd7

References

[1] Douglas Comer, Ubiquitous B-Tree, ACM Computing Surveys (CSUR), v.11 n.2, pages 121-137, June 1979.
[2] Donald E. Knuth, The Art Of Computer Programming, Volume 3: "Sorting And Searching", pages 473-480. Addison-Wesley Publishing Company, Reading, Massachusetts.
[3] C API Requirements Document.
[4] SQL Requirements Document.
[5] File IO Requirements Document.

This page last modified 2009/01/12 15:02:33 UTC