仓库源文站点原文


layout: "../layouts/BlogPost.astro" title: "Rice course - intro to database" slug: rice-course-intro-to-database description: "" added: "Oct 21 2018"

tags: [other]

Early Open Source Databases

SQLite is a software library that provides a relational database management system. If you have a Mac, sqlite3 should be available on there. https://github.com/kexiZeroing/node-sqlite-app

The Relational Model is usually denoted using LIKES (DRINKER, COFFEE)

<img alt="relation model" src="https://raw.gitmirror.com/kexiZeroing/blog-images/e267d5c748689a64a58514b9c6776a4499eff478/BD61-413F-89E1-3A76798F04B1.png" width="500">

Keys (If the key for two tuples are the same, they must be the same tuple)

Super key: An attribute or set of attributes that uniquely defines a tuple within a relation. However, a superkey may contain additional attributes that are not necessary for unique identification.

The foreign key must be an attribute or set of attributes that are uniquely identify a record in another table, and that combination of attribute values must be present in the other table.

Attribute Constraints

Table Level Constraints

PRIMARY KEY Constraint

Relational Algebra

<img alt="algebra" src="https://raw.gitmirror.com/kexiZeroing/blog-images/main/DC50FBCB-B408-4495-A8D9-99383F8B05E9.png" width="500">

Left / Right Outer Join

SQL

-- Who goes to a cafe serving Cold Brew?
SELECT DISTINCT f.DRINKER
FROM FREQUENTS AS f, SERVES AS s
WHERE f.CAFE = s.CAFE AND s.COFFEE = 'Cold Brew'

WHERE Clause

- <Attribute> = <value>
- <Attribute> BETWEEEN <value1> AND <value2>
- <Attribute> IN ([value1], [value2], ?)
- <Attribute> LIKE 'SST%'
- <Attribute> LIKE 'SST_'
- <Attribute> IS NULL
- <Attribute> IS NOT NULL
- Logical combinations with AND and OR
- We can also have a subquery in the WHERE clause (EXISTS / NOT EXISTS, IN / NOT IN )

/* 
% represents zero, one or multiple characters
- represents a single character
*/
-- Who goes to a cafe serving Cold Brew?
SELECT DISTINCT f.DRINKER
FROM FREQUENTS AS f
WHERE EXISTS (
  SELECT s.CAFE
  FROM SERVES AS s
  WHERE f.CAFE = s.CAFE AND s.COFFEE = 'Cold Brew'
)

/*
 f 的每一行带入子查询,检查返回结果是否为 true
 EXISTS 用于检查子查询是否至少会返回一行数据,该子查询实际上并不返回任何数据,而是返回 True/False
*/

Aggregations and Grouping
In database management, an aggregate function is a function where the values of multiple rows are grouped together to form a single value of more significant meaning.

SELECT r.COFFEE, AVG (r.SCORE)
FROM RATES AS r
GROUP BY r.COFFEE
SELECT r.COFFEE, AVG (r.SCORE) AS AVG_RATING
FROM RATES AS r
GROUP BY r.COFFEE
HAVING COUNT(*) >= 3

What about NULL?

count(1) 是在统计个数,可以想成表中有这么一个字段,这个字段就是固定值1,计算一共有多少个1。同理,count(2), count('x') 都是可以的。

CREATE TABLE

CREATE TABLE Student (
  netId varchar(15) NOT NULL,
  lastName varchar(100) NOT NULL,
  firstName varchar(100) NOT NULL,
  dateofbirth DATE NULL,
  PRIMARY KEY (netId)
);

DROP TABLE

ALTER TABLE

Populating Tables

INSERT INTO Room (classroomId, building, abbrev, room) VALUES 
  ('DCH1070', 'Duncan Hall', 'DCH', '1070'),
  ('DCH1055', 'Duncan Hall', 'DCH', '1055'),
  ('HRZ211', 'Herzstein Hall', 'HRZ', '211'),

Update and Delete

Entity-Relationship Diagram (ERD)

<img alt="ERD" src="https://raw.gitmirror.com/kexiZeroing/blog-images/main/E2C8-43B6-8005-2C212621341C.png" width="500">

From ERD to Tables

  1. Each entity
  2. Each M-M relationship
  3. Each self-referential relationship
  4. Each subclass entity, add a foreign key that refers to the superclass

Data Normalization

1st normal form

<img alt="1st normal form" src="https://raw.gitmirror.com/kexiZeroing/blog-images/main/E31B-43F2-867D-F010EF70B182.png" width="450">

2nd normal form

<img alt="2nd normal form" src="https://raw.gitmirror.com/kexiZeroing/blog-images/main/CC97-41DF-96DC-23D4B859F476.png" width="450">

3rd normal form

<img alt="3rd normal form" src="https://raw.gitmirror.com/kexiZeroing/blog-images/main/A0D7-4968-A52C-10240A6C7230.png" width="500">

BCNF

Thoughts on normalization

SQL Injection Attack

-- UserId: 105 OR 1=1
SELECT * FROM Users WHERE UserId = 105 OR 1=1;

-- UserId: 105; DROP TABLE Suppliers
SELECT * FROM Users WHERE UserId = 105; DROP TABLE Suppliers;

-- uName: " or ""="
-- uPass: " or ""="
-- sql = 'SELECT * FROM Users WHERE Name ="' + uName + '" AND Pass ="' + uPass + '"'
SELECT * FROM Users WHERE Name ="" or ""="" AND Pass ="" or ""=""

Indexing
B+ Tree is the most common type of storage for RDBMSs

An index is a structure like the one at the end of a book. At the end of a book you see several pages with words and where you can find those words. Those pages are an index and the same is the case for a database. The index contains key and their locations. It helps you finding the location of rows.

Transaction
Group of operations performed together, follow the ACID properties

START TRANSACTION;
SELECT balance FROM checking WHERE customer_id = 1;
UPDATE checking SET balance = balance - 100 WHERE customer_id = 1;
UPDATE savings SET balance = balance + 100 WHERE customer_id = 1;
COMMIT;

允许在一个事务中的操作语句会被其他事务的语句隔离开,比如事务 A 运行到第 3 行之后,第 4 行之前,此时事务 B 去查询 checking 余额时,仍然能够看到在事务 A 中减去 100 之前的余额,因为事务 A 和 B 是彼此隔离的。在事务 A 提交之前,事务 B 观察不到数据的改变。

Isolation Level
Dirty data: data written by a transaction that has not been committed

SET TRANSACTION ISOLATION LEVEL

  1. READ UNCOMMITTED
  2. READ COMMITTED
  3. REPEATABLE READ
  4. SERIALIZABLE

How can we deal with failures? --Logs
undo 日志用于记录事务开始前的状态,用于事务失败时的回滚操作;redo 日志记录事务执行后的状态,用来恢复未写入 data file 的已成功事务更新的数据。例如某一事务 T1,其对数据 X 进行修改,设 X 的原值是 5,修改后的值为 15,那么 Undo 日志为 <T1, X, 5>,Redo 日志为 <T1, X, 15>.

redo 日志
当数据库对数据做修改的时候,需要把数据所在的 page 从磁盘读到 buffer pool 中,然后在 buffer pool 中进行修改,这时候 buffer 中的数据页就与磁盘上的数据页内容不一致(脏页),此时的数据尚在缓存中还没有持久化,如果 DB 出现故障就会导致丢失数据,因此,Innodb 会在 buffer pool 中的 data page 变更结束后,将相应的修改记录到 redo log 里进行持久化。redo log 记录某数据块被修改后的值,可以用来恢复未刷新到磁盘上的数据,并且总是日志先行,它先于脏页刷新到磁盘上。(在持久化数据文件前,保证之前的 redo 日志已经写到磁盘 flush the log to disk)

NoSQL
To address the need for storage of non-relational data

SQL -> MongoDB

<img alt="mongodb" src="https://raw.gitmirror.com/kexiZeroing/blog-images/main/6529-4D8C-8A31-CCA28EB64614.png" width="600">