[Redis Series] Redis Study Notes 15: The SDS Data Structure and Underlying Design Principles

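Redis is written in C, so how does Redis represent a string? Are Redis's data structures the same as the ones C provides?
The Redis source tree contains an sds library. Its declarations and its implementation live in the following two files: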

sds.h
sds.c

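Their paths are deps/hiredis/sds.h and deps/hiredis/sds.c. This is the copy bundled with the hiredis client, which is why the identifiers shown here carry a hi prefix.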


sds.h defines several header structures, which are examined one by one in the rest of this article.

SDS
In Redis, SDS stands for Simple Dynamic String.
In C, a string is represented as a character array, for example:

char data[] = "xiaomotong";

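To grow a C string, a larger block of memory has to be allocated and the new string stored in it. Reallocating on every change hurts performance badly. C strings are also not binary safe. Consider the string "xiaomo\0tong".
C string functions stop at the first '\0', so only "xiaomo" is read from "xiaomo\0tong" and the reported length is 6.
For this reason, the sds structure in Redis records the string length in a dedicated field: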

SDS:
free: 0
len: 6
char buf[] = "xiaomo"

If we now append to this string, sds grows the buffer. For example, after appending "tong" the fields become:

free: 10
len: 10
char buf[] = "xiaomotong"

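The final "xiaomotong" is still terminated by \0, which keeps it compatible with C conventions. Redis grows an sds by doubling its size, but once the string reaches a certain threshold, for example 1 MB, it stops doubling and adds a fixed amount instead: 1 MB becomes 2 MB, 2 MB becomes 3 MB, and so on.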
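The growth rule described above can be sketched as a small function. This is a minimal illustration, not the code from sds.c: the names grown_capacity and MAX_PREALLOC are hypothetical, and the 1 MB threshold is taken from the example in the text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name for the 1 MB threshold mentioned in the text. */
#define MAX_PREALLOC ((size_t)1024 * 1024)

/* Return the capacity to reserve when `addlen` bytes are appended to a
 * string that currently holds `len` bytes. */
static size_t grown_capacity(size_t len, size_t addlen) {
    size_t newlen = len + addlen;
    assert(newlen >= len);            /* guard against size_t overflow */
    if (newlen < MAX_PREALLOC)
        return newlen * 2;            /* small strings: double */
    return newlen + MAX_PREALLOC;     /* large strings: add 1 MB */
}
```

With len = 6 and addlen = 4 the function reserves 20 bytes, which leaves 10 bytes free after the 10 used bytes, matching the free: 10, len: 10 state shown earlier.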
Advantages of SDS:

Binary-safe data structure
Memory preallocation, which avoids frequent reallocation
Compatible with C library functions
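To make the binary-safety point concrete, here is a small sketch that contrasts the length the C library infers with a length that is stored explicitly. The counted_str type and both helper functions are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A length-prefixed view of a byte buffer: the length is stored, not inferred. */
struct counted_str {
    size_t len;       /* number of payload bytes; may include '\0' bytes */
    const char *buf;  /* payload */
};

/* Build a view over a char array. `size` is the array size including the
 * trailing '\0' that the compiler appends to a string literal. */
static struct counted_str counted_from_array(const char *arr, size_t size) {
    assert(size > 0);
    struct counted_str s = { size - 1, arr };
    return s;
}

/* Length as the C library sees it: scanning stops at the first '\0'. */
static size_t c_length(const char *arr) {
    return strlen(arr);
}
```

For the array "xiaomo\0tong", c_length reports 6 while the stored length is 11, so the bytes after the embedded '\0' stay reachable only when the length is tracked separately.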

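The sds data structure in the Redis source
What we are looking at here is the sds data structure in redis-6.2.5. Earlier versions stored the length in an int, which is 32 bits wide and can represent lengths of up to about 4.2 billion. Most strings are nowhere near that long, so a 32-bit length field wastes memory.
The structures below let the implementation pick a header that matches the size of the string: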

struct __attribute__ ((__packed__)) hisdshdr5 {
    unsigned char flags; /* 3 lsb of type, and 5 msb of string length */
    char buf[];
};
struct __attribute__ ((__packed__)) hisdshdr8 {
    uint8_t len; /* used */
    uint8_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) hisdshdr16 {
    uint16_t len; /* used */
    uint16_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) hisdshdr32 {
    uint32_t len; /* used */
    uint32_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) hisdshdr64 {
    uint64_t len; /* used */
    uint64_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};

hisdshdr5

Used for lengths from 0 to 2^5 - 1

hisdshdr8

Used for lengths from 2^5 to 2^8 - 1

hisdshdr16

Used for lengths from 2^8 to 2^16 - 1

hisdshdr32

Used for lengths from 2^16 to 2^32 - 1

hisdshdr64

Used for lengths from 2^32 to 2^64 - 1
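The ranges listed above amount to a simple cascade of comparisons. The sketch below is an illustration under assumptions, not the function from sds.h: req_type and the TYPE_* enumerators are hypothetical names, and the numeric codes 0 to 4 follow the HI_SDS_TYPE_* macros in sds.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type codes, numbered like the HI_SDS_TYPE_* macros. */
enum { TYPE_5 = 0, TYPE_8 = 1, TYPE_16 = 2, TYPE_32 = 3, TYPE_64 = 4 };

/* Pick the smallest header whose length field can hold `len`. */
static int req_type(uint64_t len) {
    if (len < (UINT64_C(1) << 5))  return TYPE_5;   /* 0 .. 2^5 - 1     */
    if (len < (UINT64_C(1) << 8))  return TYPE_8;   /* 2^5 .. 2^8 - 1   */
    if (len < (UINT64_C(1) << 16)) return TYPE_16;  /* 2^8 .. 2^16 - 1  */
    if (len < (UINT64_C(1) << 32)) return TYPE_32;  /* 2^16 .. 2^32 - 1 */
    return TYPE_64;                                 /* everything else  */
}
```

A uint64_t parameter is used here so that the 2^32 boundary can be expressed even on platforms where size_t is 32 bits wide.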


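The unsigned char flags field above occupies 1 byte, that is, 8 bits:

The 3 least significant bits encode the header type
The remaining 5 bits hold the string length in hisdshdr5 and are unused in the other header types

Three bits can represent the values 0 to 7, which correspond to the following macros: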


#define HI_SDS_TYPE_5  0
#define HI_SDS_TYPE_8  1
#define HI_SDS_TYPE_16 2
#define HI_SDS_TYPE_32 3
#define HI_SDS_TYPE_64 4
#define HI_SDS_TYPE_MASK 7

The source determines the concrete header type with a bitwise AND of flags and HI_SDS_TYPE_MASK.
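The bit layout of flags can be illustrated with a few helper functions. This is a sketch with hypothetical names (pack_type5, flags_type, type5_len); TYPE_MASK mirrors the value 7 of HI_SDS_TYPE_MASK, and TYPE_BITS is assumed to be 3, the width of the type field.

```c
#include <assert.h>

#define TYPE_MASK 7  /* binary 00000111: selects the 3 low bits */
#define TYPE_BITS 3  /* assumed width of the type field */

/* Build the flags byte of a type-5 header: type code 0 in the low 3 bits,
 * the string length in the high 5 bits. */
static unsigned char pack_type5(unsigned len) {
    assert(len < 32);  /* only 5 bits are available for the length */
    return (unsigned char)(0u | (len << TYPE_BITS));
}

/* Extract the header type from any flags byte. */
static int flags_type(unsigned char flags) {
    return flags & TYPE_MASK;
}

/* Extract the length stored in a type-5 flags byte. */
static unsigned type5_len(unsigned char flags) {
    return (unsigned)flags >> TYPE_BITS;
}
```

For a 6-byte string, pack_type5(6) yields 0x30: the low 3 bits are 000 (type 5) and the high 5 bits are 00110 (length 6).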

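Taking the hisdshdr8 structure as an example, its fields are:

len

The length already in use

alloc

The size of the allocated buffer, excluding the header and the null terminator

flags

Which header structure is in use (the 3 least significant bits)

buf

The string that is actually stored
From these fields we can compute the space still available in the structure: free = alloc - len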
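The relationship free = alloc - len can be sketched with a packed 8-bit header. Everything prefixed with demo_ is a hypothetical name introduced for this illustration, and the packed attribute assumes GCC or Clang, as the structs in sds.h do.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Same field order as hisdshdr8; packed so the header is exactly 3 bytes. */
struct __attribute__((__packed__)) demo_hdr8 {
    uint8_t len;          /* bytes in use */
    uint8_t alloc;        /* capacity, excluding header and terminator */
    unsigned char flags;  /* header type in the 3 low bits */
    char buf[];           /* payload starts right after the header */
};

/* The string pointer refers to buf, so the header sits just before it. */
static struct demo_hdr8 *demo_hdr(char *s) {
    return (struct demo_hdr8 *)(s - sizeof(struct demo_hdr8));
}

/* Remaining space: free = alloc - len. */
static size_t demo_avail(char *s) {
    struct demo_hdr8 *h = demo_hdr(s);
    return (size_t)(h->alloc - h->len);
}

/* Allocate a zero-filled buffer and fill in the header. Returns buf. */
static char *demo_alloc(uint8_t len, uint8_t alloc) {
    assert(len <= alloc);
    struct demo_hdr8 *h = calloc(1, sizeof *h + alloc + 1u);
    if (h == NULL) return NULL;
    h->len = len;
    h->alloc = alloc;
    h->flags = 1;  /* type code 1, as in HI_SDS_TYPE_8 */
    return h->buf;
}
```

A buffer created with demo_alloc(10, 20) reports 10 bytes available; it is released by passing demo_hdr(s) to free, since the allocation starts at the header rather than at buf.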
The function hisds hi_sdsnewlen(const void *init, size_t initlen), declared in sds.h and implemented in sds.c,
creates a string from an init pointer and an initlen length:

hisds hi_sdsnewlen(const void *init, size_t initlen) {
    void *sh;
    hisds s;
    // Pick the header type that fits initlen
    char type = hi_sdsReqType(initlen);
    // An empty string is usually created in order to append to it,
    // so HI_SDS_TYPE_8 is used instead of HI_SDS_TYPE_5
    if (type == HI_SDS_TYPE_5 && initlen == 0) type = HI_SDS_TYPE_8;
    int hdrlen = hi_sdsHdrSize(type);
    unsigned char *fp; /* flags pointer. */

    // Allocate header + payload + trailing '\0'
    sh = hi_s_malloc(hdrlen+initlen+1);
    if (sh == NULL) return NULL;
    if (!init)
        memset(sh, 0, hdrlen+initlen+1);
    s = (char*)sh+hdrlen;
    fp = ((unsigned char*)s)-1;
    // Initialize the header according to its type
    switch(type) {
        case HI_SDS_TYPE_5: {
            *fp = type | (initlen << HI_SDS_TYPE_BITS);
            break;
        }
        case HI_SDS_TYPE_8: {
            HI_SDS_HDR_VAR(8,s);
            sh->len = initlen;
            sh->alloc = initlen;
            *fp = type;
            break;
        }
        case HI_SDS_TYPE_16: ...
        case HI_SDS_TYPE_32: ...
        case HI_SDS_TYPE_64: ...
    }
    if (initlen && init)
        memcpy(s, init, initlen);
    // Stay compatible with the C library: terminate the string with \0
    s[initlen] = '\0';
    return s;
}

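hi_sdsReqType

Chooses the header type to use based on the length of the string

hi_sdsHdrSize

Returns the size of the header for the given type

hi_s_malloc

Allocates memory. It calls hi_malloc from alloc.h, which forwards to a configurable allocator function (malloc by default)

switch(type) …

Initializes the header structure that corresponds to the type

s[initlen] = '\0'

Appends '\0' to the string to stay compatible with the C library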
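The steps walked through above (size the allocation, fill the header, copy the payload, terminate with '\0') can be condensed into a single-header sketch. This is a simplified illustration, not hi_sdsnewlen itself: it supports only an 8-bit header, calls malloc directly, and every mini_ name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct __attribute__((__packed__)) mini_hdr8 {
    uint8_t len;
    uint8_t alloc;
    unsigned char flags;
    char buf[];
};

/* Create a string from `initlen` bytes at `init`. A NULL `init` yields a
 * zero-filled string of that length. Returns a pointer to the payload. */
static char *mini_newlen(const void *init, size_t initlen) {
    assert(initlen <= UINT8_MAX);  /* this sketch has only the 8-bit header */
    size_t hdrlen = sizeof(struct mini_hdr8);
    struct mini_hdr8 *sh = malloc(hdrlen + initlen + 1);  /* +1 for '\0' */
    if (sh == NULL) return NULL;
    if (init == NULL) memset(sh, 0, hdrlen + initlen + 1);
    sh->len = (uint8_t)initlen;
    sh->alloc = (uint8_t)initlen;
    sh->flags = 1;                 /* type code 1, as in HI_SDS_TYPE_8 */
    if (initlen && init) memcpy(sh->buf, init, initlen);
    sh->buf[initlen] = '\0';       /* keep C library compatibility */
    return sh->buf;
}

/* Stored length, read from the header instead of scanning for '\0'. */
static size_t mini_len(const char *s) {
    const struct mini_hdr8 *sh =
        (const struct mini_hdr8 *)(s - sizeof(struct mini_hdr8));
    return sh->len;
}

/* Release a string created by mini_newlen. */
static void mini_free(char *s) {
    if (s) free(s - sizeof(struct mini_hdr8));
}
```

Because the length is passed in explicitly, mini_newlen("xiaomo\0tong", 11) keeps all 11 bytes: mini_len returns 11, while strlen on the same pointer still stops at 6.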
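Redis key-value underlying design
How does Redis store massive amounts of data?
Redis stores data as key-value pairs. Keys are always strings, while the representation of a value depends on its data structure.
The pairs are stored using an array combined with linked lists: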

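Array

Each array element holds the address of a linked list

Linked list

The linked list holds the actual data
For example:
As mentioned above, every key in Redis is a string, so how do keys tie in with the array and the linked lists?
A hash function maps the string key to an index into the array. The array element at that index points to a linked list, and the linked list holds the actual data.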
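The array-plus-linked-list scheme described above can be sketched as a tiny chained hash table. This is an illustration under assumptions and not Redis's dict implementation: the bucket count is fixed, keys and values are plain C strings, all names are hypothetical, and the djb2-style hash is only a stand-in rather than the hash function Redis uses.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 16  /* fixed size for the sketch; no resizing */

struct entry {
    char *key;
    char *val;
    struct entry *next;  /* next entry in the same bucket */
};

struct table {
    struct entry *buckets[NBUCKETS];  /* array of linked-list heads */
};

/* Stand-in string hash (djb2 style). */
static unsigned long hash_key(const char *key) {
    unsigned long h = 5381;
    for (const unsigned char *p = (const unsigned char *)key; *p; p++)
        h = h * 33 + *p;
    return h;
}

static char *dup_str(const char *s) {
    size_t n = strlen(s) + 1;
    char *d = malloc(n);
    if (d) memcpy(d, s, n);
    return d;
}

/* Insert or overwrite. Returns 0 on success, -1 on allocation failure. */
static int table_set(struct table *t, const char *key, const char *val) {
    assert(t && key && val);
    size_t idx = hash_key(key) % NBUCKETS;  /* key -> array index */
    for (struct entry *e = t->buckets[idx]; e; e = e->next) {
        if (strcmp(e->key, key) == 0) {     /* key exists: replace value */
            char *nv = dup_str(val);
            if (!nv) return -1;
            free(e->val);
            e->val = nv;
            return 0;
        }
    }
    struct entry *e = malloc(sizeof *e);
    if (!e) return -1;
    e->key = dup_str(key);
    e->val = dup_str(val);
    if (!e->key || !e->val) {
        free(e->key); free(e->val); free(e);
        return -1;
    }
    e->next = t->buckets[idx];              /* push onto the bucket's list */
    t->buckets[idx] = e;
    return 0;
}

/* Look up a key. Returns the stored value or NULL if absent. */
static const char *table_get(const struct table *t, const char *key) {
    size_t idx = hash_key(key) % NBUCKETS;
    for (const struct entry *e = t->buckets[idx]; e; e = e->next)
        if (strcmp(e->key, key) == 0) return e->val;
    return NULL;
}
```

Keys that hash to the same index simply share one list, which is why a lookup hashes once and then walks at most the entries of a single bucket. Cleanup of the entries is omitted to keep the sketch short.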