information_schema series: character sets and collations (CHARACTER_SETS, COLLATIONS, COLLATION_CHARACTER_SET_APPLICABILITY)

1: CHARACTER_SETS
First, look at ten rows of the table, ordered by MAXLEN from largest to smallest:

[information_schema]> select * from CHARACTER_SETS order by MAXLEN DESC limit 10;
+--------------------+----------------------+---------------------------------+--------+
| CHARACTER_SET_NAME | DEFAULT_COLLATE_NAME | DESCRIPTION                     | MAXLEN |
+--------------------+----------------------+---------------------------------+--------+
| utf32              | utf32_general_ci     | UTF-32 Unicode                  |      4 |
| utf16le            | utf16le_general_ci   | UTF-16LE Unicode                |      4 |
| gb18030            | gb18030_chinese_ci   | China National Standard GB18030 |      4 |
| utf8mb4            | utf8mb4_general_ci   | UTF-8 Unicode                   |      4 |
| utf16              | utf16_general_ci     | UTF-16 Unicode                  |      4 |
| eucjpms            | eucjpms_japanese_ci  | UJIS for Windows Japanese       |      3 |
| ujis               | ujis_japanese_ci     | EUC-JP Japanese                 |      3 |
| utf8               | utf8_general_ci      | UTF-8 Unicode                   |      3 |
| gbk                | gbk_chinese_ci       | GBK Simplified Chinese          |      2 |
| ucs2               | ucs2_general_ci      | UCS-2 Unicode                   |      2 |
+--------------------+----------------------+---------------------------------+--------+

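Here is the official explanation:

INFORMATION_SCHEMA Name  SHOW Name          Remarks          Notes
CHARACTER_SET_NAME       Charset                             Character set name
DEFAULT_COLLATE_NAME     Default collation                   Default collation of the character set
DESCRIPTION              Description        MySQL extension  Description of the character set
MAXLEN                   Maxlen             MySQL extension  Maximum length of one character, in bytes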
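
The MAXLEN column can be checked outside the database. The following is a minimal Python sketch (not from the original article) that measures how many bytes single characters take when encoded as UTF-8. The helper name `utf8_len` and the sample characters are chosen here purely for illustration.

```python
def utf8_len(ch: str) -> int:
    """Return the number of bytes one character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


# One ASCII letter, one accented Latin letter, one Chinese character,
# and one emoji (written as an escape, U+1F600).
samples = ["A", "\u00e9", "\u6c49", "\U0001F600"]
lengths = [utf8_len(ch) for ch in samples]

for ch, n in zip(samples, lengths):
    print(repr(ch), n)
```

The lengths come out as 1, 2, 3 and 4 bytes. The 4-byte case is the reason MySQL's utf8 character set (MAXLEN 3) cannot store such characters, while utf8mb4 (MAXLEN 4) can.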
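This table lists every character set that MySQL supports; on this server there are 41 of them. Take utf8 as an example: its default collation is utf8_general_ci, and one character takes at most three bytes. A common Chinese character takes three bytes in UTF-8. Now run SHOW CREATE TABLE on it: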
| CHARACTER_SETS | CREATE TEMPORARY TABLE `CHARACTER_SETS` (
  `CHARACTER_SET_NAME` varchar(32) NOT NULL DEFAULT '',
  `DEFAULT_COLLATE_NAME` varchar(32) NOT NULL DEFAULT '',
  `DESCRIPTION` varchar(60) NOT NULL DEFAULT '',
  `MAXLEN` bigint(3) NOT NULL DEFAULT '0'
) ENGINE=MEMORY DEFAULT CHARSET=utf8 |

As we can see from ENGINE=MEMORY, the table uses the MEMORY engine, which means an identical table is regenerated every time the server restarts.
2: COLLATIONS
First, look at the first ten rows, ordered by ID:
[information_schema]> select * from COLLATIONS order by id limit 10;
+-------------------+--------------------+----+------------+-------------+---------+
| COLLATION_NAME    | CHARACTER_SET_NAME | ID | IS_DEFAULT | IS_COMPILED | SORTLEN |
+-------------------+--------------------+----+------------+-------------+---------+
| big5_chinese_ci   | big5               |  1 | Yes        | Yes         |       1 |
| latin2_czech_cs   | latin2             |  2 |            | Yes         |       4 |
| dec8_swedish_ci   | dec8               |  3 | Yes        | Yes         |       1 |
| cp850_general_ci  | cp850              |  4 | Yes        | Yes         |       1 |
| latin1_german1_ci | latin1             |  5 |            | Yes         |       1 |
| hp8_english_ci    | hp8                |  6 | Yes        | Yes         |       1 |
| koi8r_general_ci  | koi8r              |  7 | Yes        | Yes         |       1 |
| latin1_swedish_ci | latin1             |  8 | Yes        | Yes         |       1 |
| latin2_general_ci | latin2             |  9 | Yes        | Yes         |       1 |
| swe7_swedish_ci   | swe7               | 10 | Yes        | Yes         |       1 |
+-------------------+--------------------+----+------------+-------------+---------+

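As usual, here is the official explanation:

INFORMATION_SCHEMA Name  SHOW Name  Remarks          Notes
COLLATION_NAME           Collation                   Collation name
CHARACTER_SET_NAME       Charset    MySQL extension  Character set the collation belongs to
ID                       Id         MySQL extension  Collation ID; presumably assigned by MySQL itself, not explored further here
IS_DEFAULT               Default    MySQL extension  Whether the collation is the default for its character set
IS_COMPILED              Compiled   MySQL extension  Whether the character set is compiled into the server
SORTLEN                  Sortlen    MySQL extension  Related to the amount of memory required to sort strings expressed in the character set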
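
To make the IS_DEFAULT column concrete, here is a small Python sketch that picks the default collation of each character set out of a few rows copied from the COLLATIONS query output above. The function name `default_collations` and the tuple layout are assumptions made for this illustration, not anything MySQL provides.

```python
# Rows copied from the COLLATIONS output above, reduced to
# (COLLATION_NAME, CHARACTER_SET_NAME, ID, IS_DEFAULT).
rows = [
    ("big5_chinese_ci", "big5", 1, "Yes"),
    ("latin2_czech_cs", "latin2", 2, ""),
    ("latin1_german1_ci", "latin1", 5, ""),
    ("latin1_swedish_ci", "latin1", 8, "Yes"),
    ("latin2_general_ci", "latin2", 9, "Yes"),
]


def default_collations(rows):
    """Map each character set to the collation flagged IS_DEFAULT = 'Yes'."""
    return {
        charset: name
        for name, charset, _id, is_default in rows
        if is_default == "Yes"
    }


defaults = default_collations(rows)
for charset in sorted(defaults):
    print(charset, "->", defaults[charset])
```

Each character set ends up with exactly one default, which should match the DEFAULT_COLLATE_NAME column of CHARACTER_SETS for the same character set.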
Normally, we can also view this information with the SHOW COLLATION statement. Now run SHOW CREATE TABLE on it:
| COLLATIONS | CREATE TEMPORARY TABLE `COLLATIONS` (
  `COLLATION_NAME` varchar(32) NOT NULL DEFAULT '',
  `CHARACTER_SET_NAME` varchar(32) NOT NULL DEFAULT '',
  `ID` bigint(11) NOT NULL DEFAULT '0',
  `IS_DEFAULT` varchar(3) NOT NULL DEFAULT '',
  `IS_COMPILED` varchar(3) NOT NULL DEFAULT '',
  `SORTLEN` bigint(3) NOT NULL DEFAULT '0'
) ENGINE=MEMORY DEFAULT CHARSET=utf8 |

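This is also an in-memory table, generated automatically by the system; its contents do not change.

3: COLLATION_CHARACTER_SET_APPLICABILITY
Look at ten rows, this time filtering with a condition: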
[information_schema]> select * from COLLATION_CHARACTER_SET_APPLICABILITY where CHARACTER_SET_NAME like '%utf%' limit 10;
+-------------------+--------------------+
| COLLATION_NAME    | CHARACTER_SET_NAME |
+-------------------+--------------------+
| utf8_general_ci   | utf8               |
| utf8_bin          | utf8               |
| utf8_unicode_ci   | utf8               |
| utf8_icelandic_ci | utf8               |
| utf8_latvian_ci   | utf8               |
| utf8_romanian_ci  | utf8               |
| utf8_slovenian_ci | utf8               |
| utf8_polish_ci    | utf8               |
| utf8_estonian_ci  | utf8               |
| utf8_spanish_ci   | utf8               |
+-------------------+--------------------+
10 rows in set (0.00 sec)

As usual, here is the official explanation:

INFORMATION_SCHEMA Name  SHOW Name  Remarks
COLLATION_NAME           Collation
CHARACTER_SET_NAME       Charset
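
The two columns of this table can be read as a one-to-many relationship. The Python sketch below groups (collation, character set) pairs by character set. The utf8 pairs are copied from the query output above and the latin1 pairs from the COLLATIONS output earlier; the function name `group_by_charset` is made up for this illustration.

```python
from collections import defaultdict

# (COLLATION_NAME, CHARACTER_SET_NAME) pairs taken from the outputs above.
pairs = [
    ("utf8_general_ci", "utf8"),
    ("utf8_bin", "utf8"),
    ("utf8_unicode_ci", "utf8"),
    ("latin1_german1_ci", "latin1"),
    ("latin1_swedish_ci", "latin1"),
]


def group_by_charset(pairs):
    """Collect the collations that apply to each character set."""
    grouped = defaultdict(list)
    for collation, charset in pairs:
        grouped[charset].append(collation)
    return dict(grouped)


applicability = group_by_charset(pairs)
for charset, collations in applicability.items():
    print(charset, collations)
```

Every collation appears under exactly one character set, while a character set can list several collations.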
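Clearly, this table is nothing more than a mapping between character sets and collations. It is also an in-memory table, generated automatically at initialization according to the server version.

Now for the difference between character sets and collations:
A character set governs how strings are stored. A character is the smallest meaningful symbol in a human language, such as 'A' or 'B', and a character set is a collection of such characters together with their encoding.
A collation governs how strings are compared: it is a set of rules for comparing characters within one character set.
Each collation corresponds to exactly one character set, but a character set can have several collations, one of which is its default collation.

Collation names in MySQL follow a naming convention: they start with the name of the character set they belong to, and end with _ci (case-insensitive), _cs (case-sensitive) or _bin (compare by encoded value). For example, under the collation utf8_general_ci, the characters "a" and "A" are equal.

The MySQL variables related to character sets and collations are:
- character_set_server: the default character set for internal operations
- character_set_client: the character set of data arriving from the client
- character_set_connection: the character set of the connection layer
- character_set_results: the character set of query results
- character_set_database: the default character set of the currently selected database
- character_set_system: the character set of system metadata (column names and so on)

The character set conversion process in MySQL is as follows:
1. When MySQL Server receives a request, it converts the request data from character_set_client to character_set_connection.
2. Before the internal operation, it converts the request data from character_set_connection to the internal operation character set, which is determined as follows:
   - use the CHARACTER SET value of each column;
   - if that is not set, use the DEFAULT CHARACTER SET value of the table (a MySQL extension, not part of the SQL standard);
   - if that is not set, use the DEFAULT CHARACTER SET value of the database;
   - if that is not set, use the value of character_set_server.
3. It converts the result of the operation from the internal operation character set to character_set_results.

Parts of this article borrow from another blog. The address is given below for reference, with thanks to its author for sharing: http://www.laruence.com/2008/01/05/12.html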
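
The three conversion steps and the fallback order in step 2 can be sketched in Python. This is only an illustration under stated assumptions: Python codecs stand in for MySQL character sets through the hypothetical `CODEC` table, and the function names `internal_charset`, `convert` and `round_trip` are invented here. The server itself uses its own conversion routines.

```python
from typing import Optional

# Assumption: Python codec names standing in for MySQL character set names.
CODEC = {"utf8": "utf-8", "gbk": "gbk", "latin1": "latin-1"}


def internal_charset(column: Optional[str], table: Optional[str],
                     database: Optional[str], server: str) -> str:
    """Step 2 fallback: column, then table, then database, then server."""
    for candidate in (column, table, database):
        if candidate is not None:
            return candidate
    return server


def convert(data: bytes, source: str, target: str) -> bytes:
    """Re-encode bytes from one character set to another."""
    return data.decode(CODEC[source]).encode(CODEC[target])


def round_trip(request: bytes, client: str, connection: str,
               internal: str, results: str) -> bytes:
    # Step 1: character_set_client -> character_set_connection
    data = convert(request, client, connection)
    # Step 2: character_set_connection -> internal operation character set
    data = convert(data, connection, internal)
    # Step 3: internal operation character set -> character_set_results
    return convert(data, internal, results)


# No column or table character set is set, so the database default is used.
charset = internal_charset(None, None, "gbk", "latin1")
reply = round_trip("汉字".encode("gbk"), "gbk", "utf8", charset, "utf8")
print(charset, reply.decode("utf-8"))
```

In this run the client sends GBK bytes and the text survives every hop because each character set involved can represent it. If the internal character set were latin1 instead, `convert` would raise an encoding error for Chinese text, which loosely mirrors how mismatched settings in a real server lead to garbled or lost characters.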


