2020-03-13 Querying Elasticsearch with curl

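1. curl query syntax

For example, to run a full-text (fuzzy) query on the question field of documents in the qbank index, looking for “题目含有关于圆周率的数学公式” (“a question containing a mathematical formula about pi”):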

curl -H "Content-Type: application/json" -XPOST http://localhost:9200/qbank/_search -d '{ "query": { "match": { "question": "题目含有关于圆周率的数学公式" } } }'

The effect is similar to opening this URL in a browser:

http://172.17.0.1:9200/qbank/_search?q=question:题目含有关于圆周率的数学公式

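The difference is that the former accepts the full query DSL and can express complex combinations of conditions, while the latter only takes a Lucene query-string expression in the q parameter, which is far less expressive.
I had also tried the following earlier: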

curl -XGET http://localhost:9200/qbank/_search -d '{ "query": { "match": { "question": "题目含有关于圆周率的数学公式" } } }'

and it failed with this error:

{"error":"Content-Type header [application/x-www-form-urlencoded] is not supported","status":406}

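Many posts online say this works, but in practice it does not, at least not on my 7.6.1. The culprit is the missing header rather than the GET method: with -d, curl sends the body as application/x-www-form-urlencoded unless told otherwise, and since 6.0 Elasticsearch strictly checks the Content-Type of requests that carry a body. Adding -H "Content-Type: application/json" fixes it (Elasticsearch also accepts a body on GET _search); the posts that omit the header were written for older versions.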
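For reference, the same search can be issued from plain Java. The sketch below uses the JDK's java.net.http client (Java 11+), which is not part of the original setup, and the class name EsSearchRequest is hypothetical. It only builds the request; the point is the explicit Content-Type header that the failing curl call lacked.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch: build (but do not send) a _search request with an explicit JSON Content-Type.
class EsSearchRequest {
    static HttpRequest build(String baseUrl, String index, String jsonBody) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/" + index + "/_search"))
                // Without this header the body would not be declared as JSON
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    public static void main(String[] args) {
        String body = "{ \"query\": { \"match\": { \"question\": \"题目含有关于圆周率的数学公式\" } } }";
        HttpRequest req = build("http://localhost:9200", "qbank", body);
        System.out.println(req.method() + " " + req.uri());
        // To actually send it:
        // HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString())
    }
}
```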

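2. Examples of common ES query patterns

This section uses SQL as the yardstick: for each SQL statement it lists the DSL query that achieves a similar result, and the Java code that runs that query through ElasticsearchRestTemplate from spring-data-elasticsearch. Every example was tested against elasticsearch@7.6.1.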
Assume the table behind my data model is t_question, and its Elasticsearch model is:

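import java.io.Serializable;
import java.util.List;

import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;

/**
 * Question entity
 * @author xxx
 * 11 March 2020, 9:56:02 PM
 */
@Document(indexName = "question")
public class Question implements Serializable {
    @Id
    @Field(type = FieldType.Keyword)
    private String id;

    /** Question number */
    @Field(type = FieldType.Keyword)
    private String questionNo;

    /** Question category */
    @Field(type = FieldType.Text, analyzer = "ik_max_word", searchAnalyzer = "ik_smart")
    private String category;

    /** Question text */
    @Field(type = FieldType.Text, analyzer = "ik_max_word", searchAnalyzer = "ik_smart")
    private String question;

    /** Formulas extracted from the question text */
    @Field(type = FieldType.Keyword)
    private List<String> formulas;

    // getters and setters omitted
}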

The mapping generated automatically from it:

{
  "mapping": {
    "question": {
      "properties": {
        "category": { "type": "text", "analyzer": "ik_max_word", "search_analyzer": "ik_smart" },
        "formulas": { "type": "keyword" },
        "id": { "type": "keyword" },
        "question": { "type": "text", "analyzer": "ik_max_word", "search_analyzer": "ik_smart" },
        "questionNo": { "type": "keyword" }
      }
    }
  }
}

2.1 select * from t_question;

The SQL select * from t_question obviously lists every question.

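Friendly reminder: never run anything like this in practice. Against a large database, this operation alone can knock the server over with an OOM. On the Elasticsearch side, a search returns only the first 10 hits unless size is specified, so actually fetching everything means paging explicitly (from/size) or using the scroll API.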

DSL query:

{ "query": { "match_all": {} } }

Java code:

@Test
void testList() {
    // match_all: every document in the index matches
    QueryBuilder qb = QueryBuilders.matchAllQuery();
    SearchQuery searchQuery = new NativeSearchQueryBuilder().withQuery(qb).build();
    List<Question> results = this.esRestTemplate.queryForList(searchQuery, Question.class);
    assertNotNull(results);
    assertFalse(results.isEmpty());
}
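If you really do need to walk an entire index, one option is from/size paging. The following is a minimal sketch that only builds the request bodies; the class name MatchAllPager is hypothetical, and it assumes index.max_result_window is left at its default of 10000, beyond which from/size paging is rejected and scroll or search_after is needed instead.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: generate from/size request bodies for paging through a match_all search.
class MatchAllPager {
    // Assumption: the index keeps the default index.max_result_window
    static final int MAX_RESULT_WINDOW = 10_000;

    static List<String> pageBodies(int totalHits, int pageSize) {
        List<String> bodies = new ArrayList<>();
        for (int from = 0; from < totalHits; from += pageSize) {
            int size = Math.min(pageSize, totalHits - from);
            // from + size past the window would be rejected by the server
            if (from + size > MAX_RESULT_WINDOW) {
                throw new IllegalArgumentException("beyond max_result_window; use scroll or search_after");
            }
            bodies.add("{ \"from\": " + from + ", \"size\": " + size
                    + ", \"query\": { \"match_all\": {} } }");
        }
        return bodies;
    }

    public static void main(String[] args) {
        pageBodies(25, 10).forEach(System.out::println);
    }
}
```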

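2.2. Exact queries on keyword fields

In v7.6.1, keyword is what older versions called a not_analyzed string field. When a document is indexed, the whole value of a keyword field is indexed as one single term, without first running it through an analyzer.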
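In v7.6.1, a string field added through dynamic mapping is mapped as text and also gets a keyword sub-field, so it has both a text mapping and a keyword mapping. To reference its keyword value, you must use the fieldName.keyword syntax.
You can verify this by inspecting the index's mapping. Note that the sub-field only exists for dynamically mapped fields: in the explicit mapping shown above, questionNo is itself a keyword field with no .keyword sub-field, so against that mapping a term query would target questionNo directly.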
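To make the keyword/text difference concrete, here is a toy Java illustration (not Elasticsearch code): a keyword value is indexed as one term, while a text value is split into several. The splitting rule is a deliberately crude stand-in for a real analyzer, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Toy illustration only, not Elasticsearch code.
class KeywordVsText {
    // keyword: the whole value is indexed as a single term
    static List<String> keywordTerms(String value) {
        return List.of(value);
    }

    // text: a crude stand-in analyzer that lowercases and splits on
    // anything that is not a letter or digit
    static List<String> toyTextTerms(String value) {
        return Arrays.stream(value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }

    // a term query compares the query value with the indexed terms as-is
    static boolean termMatches(List<String> indexedTerms, String queryValue) {
        return indexedTerms.contains(queryValue);
    }

    public static void main(String[] args) {
        System.out.println(keywordTerms("x^3 = -1")); // [x^3 = -1]
        System.out.println(toyTextTerms("x^3 = -1")); // [x, 3, 1]
    }
}
```

This is why a term query on a whole formula only makes sense against a keyword field.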
2.2.1. Querying a single value
SQL query:

select t.* from t_question t where t.questionNo = '2';

DSL query:

{ "query": { "term": { "questionNo.keyword": "2" } } }

A term query matches one exact value of a keyword field.

Java code:

@Test
void testList() {
    // term: exact match on the keyword value of questionNo
    QueryBuilder qb = QueryBuilders.termQuery("questionNo.keyword", "2");
    SearchQuery searchQuery = new NativeSearchQueryBuilder().withQuery(qb).build();
    List<Question> results = this.esRestTemplate.queryForList(searchQuery, Question.class);
    assertNotNull(results);
    assertEquals(1, results.size());
}

2.2.2. Querying one field against several values
SQL query:

select t.* from t_question t where t.questionNo in ('2', '3', '4');

Or:

select t.* from t_question t where t.questionNo = '2' or t.questionNo = '3' or t.questionNo = '4';

DSL query:

{ "query": { "terms": { "questionNo.keyword": ["2", "3", "4"] } } }

A terms query lists several keyword values; the condition matches if any one of them matches. It is the equivalent of SQL in.

Or:

{ "query": { "bool": { "should": [ { "term": { "questionNo.keyword": "2" } }, { "term": { "questionNo.keyword": "3" } }, { "term": { "questionNo.keyword": "4" } } ] } } }

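A should clause corresponds to SQL or: the document matches as soon as one condition is satisfied. This holds because the bool query here contains only should clauses; once a bool query also has a must or filter clause, its should clauses become optional and only affect scoring, unless minimum_should_match is set.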

Java code:

@Test
void testList() {
    // terms: matches if questionNo equals any of the listed values (SQL "in")
    QueryBuilder qb = QueryBuilders.termsQuery("questionNo.keyword", Arrays.asList("2", "3", "4"));
    SearchQuery searchQuery = new NativeSearchQueryBuilder().withQuery(qb).build();
    List<Question> results = this.esRestTemplate.queryForList(searchQuery, Question.class);
    assertNotNull(results);
    assertEquals(3, results.size());
}

Or:

@Test
void testList() {
    // bool/should: a document matches if at least one term clause matches (SQL "or")
    BoolQueryBuilder qb = QueryBuilders.boolQuery()
            .should(QueryBuilders.termQuery("questionNo.keyword", "2"))
            .should(QueryBuilders.termQuery("questionNo.keyword", "3"))
            .should(QueryBuilders.termQuery("questionNo.keyword", "4"));
    SearchQuery searchQuery = new NativeSearchQueryBuilder().withQuery(qb).build();
    List<Question> results = this.esRestTemplate.queryForList(searchQuery, Question.class);
    assertNotNull(results);
    assertEquals(3, results.size());
}
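A toy model can show why the terms query and the should-only bool query return the same documents. Each document is reduced to its questionNo value and queries are plain Java predicates; this models the matching logic only, not scoring, and all names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Toy model: each "document" is just its questionNo keyword value.
class TermsVsShould {
    // term: exact match on a single value
    static Predicate<String> term(String value) {
        return doc -> doc.equals(value);
    }

    // terms: exact match on any of the listed values (SQL "in")
    static Predicate<String> terms(Collection<String> values) {
        Set<String> set = new HashSet<>(values);
        return set::contains;
    }

    // bool with only should clauses: at least one clause has to match (SQL "or")
    @SafeVarargs
    static Predicate<String> shouldOnly(Predicate<String>... clauses) {
        return doc -> Arrays.stream(clauses).anyMatch(c -> c.test(doc));
    }

    static List<String> search(List<String> docs, Predicate<String> query) {
        return docs.stream().filter(query).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> questionNos = List.of("1", "2", "3", "4", "5");
        System.out.println(search(questionNos, terms(List.of("2", "3", "4"))));
        System.out.println(search(questionNos, shouldOnly(term("2"), term("3"), term("4"))));
    }
}
```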

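2.3 Full-text query on a single field

A full-text query on a single Elasticsearch field can be roughly compared to SQL's like '%value%'. But if you actually ran that kind of fuzzy query in SQL and the table happened to be large, you would probably get fired: in a relational database such a query forces an extremely slow full table scan, which at best freezes the system and at worst takes it down. This is exactly the problem the Lucene full-text framework was designed to solve.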
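An Elasticsearch query on a full-text field involves two stages:

- When a document is saved: ES first analyzes the field value, using the analyzer to split the text into a series of terms, and then builds an inverted index from each term to the documents that contain it.
- When documents are queried: there are two kinds of query, match_phrase and match.
  - match_phrase: the query text is analyzed just like in match, but the resulting terms must all appear in the field in the same order and next to each other, so the query behaves as one complete phrase.
  - match: the query text is analyzed into terms, each term is looked up in the index, and the matching documents are scored and returned in descending order of score. The higher the score, the better the match.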
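The two stages above can be sketched with a toy positional inverted index in Java. This is a simplified model under stated assumptions, not Lucene's implementation: the hypothetical analyze method emits one term per letter or digit (roughly what the default analyzer does with Chinese text), match ranks documents by how many distinct query terms they contain instead of using BM25, and matchPhrase requires the analyzed terms at consecutive positions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Toy positional inverted index; not Lucene's implementation.
class ToyIndex {
    // term -> (docId -> positions of the term in that document)
    private final Map<String, Map<Integer, List<Integer>>> postings = new HashMap<>();
    private int nextId = 0;

    // Stand-in analyzer: one term per letter or digit, punctuation and spaces dropped
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        text.codePoints()
            .filter(Character::isLetterOrDigit)
            .forEach(cp -> terms.add(new String(Character.toChars(cp))));
        return terms;
    }

    // Index time: analyze the field value and record each term's positions
    int add(String text) {
        int id = nextId++;
        List<String> terms = analyze(text);
        for (int pos = 0; pos < terms.size(); pos++) {
            postings.computeIfAbsent(terms.get(pos), k -> new HashMap<>())
                    .computeIfAbsent(id, k -> new ArrayList<>())
                    .add(pos);
        }
        return id;
    }

    // match: analyze the query, OR the terms, rank by number of distinct terms found
    List<Integer> match(String query) {
        Map<Integer, Integer> score = new HashMap<>();
        for (String t : new LinkedHashSet<>(analyze(query))) {
            for (Integer id : postings.getOrDefault(t, Map.of()).keySet()) {
                score.merge(id, 1, Integer::sum);
            }
        }
        List<Integer> ids = new ArrayList<>(score.keySet());
        ids.sort((a, b) -> {
            int byScore = Integer.compare(score.get(b), score.get(a));
            return byScore != 0 ? byScore : Integer.compare(a, b);
        });
        return ids;
    }

    // match_phrase: analyze the query too, then require the terms at consecutive positions
    List<Integer> matchPhrase(String query) {
        List<String> terms = analyze(query);
        List<Integer> hits = new ArrayList<>();
        if (terms.isEmpty()) return hits;
        for (Map.Entry<Integer, List<Integer>> e : postings.getOrDefault(terms.get(0), Map.of()).entrySet()) {
            int id = e.getKey();
            for (int start : e.getValue()) {
                boolean inOrder = true;
                for (int i = 1; i < terms.size() && inOrder; i++) {
                    List<Integer> pos = postings.getOrDefault(terms.get(i), Map.of()).get(id);
                    inOrder = pos != null && pos.contains(start + i);
                }
                if (inOrder) {
                    hits.add(id);
                    break;
                }
            }
        }
        Collections.sort(hits);
        return hits;
    }

    public static void main(String[] args) {
        ToyIndex idx = new ToyIndex();
        idx.add("方程的根是");
        idx.add("这道题目是为了测试");
        idx.add("程方");
        System.out.println(idx.match("方程"));       // [0, 2]
        System.out.println(idx.matchPhrase("方程")); // [0]
    }
}
```

Document 2 contains both characters but in the wrong order, so match finds it while match_phrase does not.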

2.3.1. Finding questions whose text contains “题目” (“question”)
SQL query:

select t.* from t_question t where t.question like '%题目%';

DSL query:

{ "query": { "match_phrase": { "question": "题目" } } }

Java code:

@Test
void testMatchPhraseWithDefaultAnalyzer() {
    // match_phrase: the analyzed query terms must appear together and in order
    QueryBuilder qb = QueryBuilders.matchPhraseQuery("question", "题目");
    SearchQuery searchQuery = new NativeSearchQueryBuilder().withQuery(qb).build();
    List<Question> results = this.esRestTemplate.queryForList(searchQuery, Question.class);
    assertNotNull(results);
    assertEquals(3, results.size());
}

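At search time the query text is analyzed as well, and the resulting terms must appear as one contiguous phrase in the indexed field. Profiling the DSL in Kibana gives the following: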


[Figure: Kibana search profile of the match_phrase query]

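The search took about 0.6 ms. A comparable like query in SQL over a table with 500,000 rows could be slow enough to make you question your life choices. This is where Lucene's strength in full-text search shows.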

The search returned:

"hits" : { "total" : { "value" : 1, "relation" : "eq" }, "max_score" : 1.2688576, "hits" : [ { "_index" : "question", "_type" : "question", "_id" : "7e840a47-a04c-402b-9413-77dd3f1a9cf5", "_score" : 1.2688576, "_source" : { "id" : "7e840a47-a04c-402b-9413-77dd3f1a9cf5", "questionNo" : "13", "category" : "测试题目", "question" : """这道题目是为了测试有两个公式的题目\(x^4 - 16 = 0\)和\(x^3 = -1\) 是否能正常检索""", "formulas" : [ "x^4 - 16 = 0", "x^3 = -1" ] } } ] }

2.3.2. Ranking questions by how closely they match “求解二项方程式” (“solve the binomial equation”)
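SQL simply cannot express this condition, while it is one of the scenarios Lucene handles best. If you really had to emulate it in SQL, you would need roughly the following steps plus about 500 extra lines of plain Java code, and the result would likely still be poor: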

1. Split “求解二项方程式” into “求”, “解”, “二项” and “方程式”.
2. Run the following SQL:

select t.* from t_question t where t.question like '%求%' union select t.* from t_question t where t.question like '%解%' union select t.* from t_question t where t.question like '%二项%' union select t.* from t_question t where t.question like '%方程式%'

3. Find some way to process the results of step 2 and pick out the rows that match best.
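Step 3 is where the hand-written Java would go. As a rough illustration only, the sketch below ranks the rows returned by the union by how many of the query fragments each row contains. The class LikeUnionRanker is hypothetical, and counting fragments is a crude substitute for real relevance scoring (no term frequencies, no document-length normalization).

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical post-processing for step 3: a crude stand-in for relevance scoring.
class LikeUnionRanker {
    // how many of the fragments occur in the row text
    static int fragmentHits(String text, List<String> fragments) {
        int n = 0;
        for (String f : fragments) {
            if (text.contains(f)) n++;
        }
        return n;
    }

    // keep rows that hit at least one fragment, best first; ties keep input order
    static List<String> rank(List<String> rows, List<String> fragments) {
        return rows.stream()
                .filter(r -> fragmentHits(r, fragments) > 0)
                .sorted(Comparator.comparingInt((String r) -> fragmentHits(r, fragments)).reversed())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> fragments = List.of("求", "解", "二项", "方程式");
        List<String> rows = List.of("方程的根是", "已知二项方程", "求解方程式");
        System.out.println(rank(rows, fragments)); // [求解方程式, 已知二项方程]
    }
}
```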

Lucene, by contrast, handles this effortlessly.
DSL query:

{ "query": { "match": { "question": "求解二项方程式" } } }

Java code:

@Test
void testMatch() {
    // match: the query text is analyzed into terms, hits are scored and ranked
    QueryBuilder qb = QueryBuilders.matchQuery("question", "求解二项方程式");
    SearchQuery searchQuery = new NativeSearchQueryBuilder().withQuery(qb).build();
    List<Question> results = this.esRestTemplate.queryForList(searchQuery, Question.class);
    assertNotNull(results);
    assertEquals(3, results.size());
}

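Note that the effect of this DSL depends on the analyzer in use. Because the question field is of type text, adding a document first runs the analyzer to get a list of terms and then indexes those terms against the document. At search time, a match query first runs the query text through an analyzer and then matches the resulting terms.
Analyzers are used in two places: when a document is added and the index is maintained, and when the query text is analyzed during a search. The two are configured separately. If no search analyzer is set, the index-time analyzer is used for queries too, and if no analyzer is set at all, the default (standard) analyzer is used, which splits Chinese text into one term per character. For how to specify the two analyzers, see here; in the model above they are set through the analyzer and searchAnalyzer attributes of @Field.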

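With the default analyzer, every character of “求解二项方程式” becomes its own term: “求”, “解”, “二”, “项”, “方”, “程” and “式”, seven terms in total. The search result is therefore the combined score of looking up each of those seven terms in the index. Profiling the DSL above in Kibana gives: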

[Figure: search profile with the default analyzer]
With a Chinese analyzer, “求解二项方程式” is recognized as three meaningful terms, “求解”, “二项” and “方程式”, instead of seven meaningless single characters.
The DSL profile:
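The difference between the two tokenizations can be reproduced with a small Java sketch. splitChars mimics the one-term-per-character behaviour of the default analyzer on Chinese text; maxMatch is a forward-maximum-matching segmenter over a tiny hand-made dictionary. This illustrates dictionary-based segmentation in general, not IK's actual algorithm or dictionary, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy segmenters for illustration; neither is the real standard or IK analyzer.
class Segmenters {
    // default-analyzer-like behaviour on Chinese text: one term per character
    static List<String> splitChars(String s) {
        List<String> out = new ArrayList<>();
        s.codePoints().forEach(cp -> out.add(new String(Character.toChars(cp))));
        return out;
    }

    // forward maximum matching: at each position take the longest dictionary word
    // (up to maxLen chars), falling back to a single char; assumes BMP characters
    static List<String> maxMatch(String s, Set<String> dict, int maxLen) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            int len = Math.min(maxLen, s.length() - i);
            while (len > 1 && !dict.contains(s.substring(i, i + len))) {
                len--;
            }
            out.add(s.substring(i, i + len));
            i += len;
        }
        return out;
    }

    public static void main(String[] args) {
        String q = "求解二项方程式";
        Set<String> toyDict = Set.of("求解", "二项", "方程", "方程式");
        System.out.println(splitChars(q));           // [求, 解, 二, 项, 方, 程, 式]
        System.out.println(maxMatch(q, toyDict, 3)); // [求解, 二项, 方程式]
    }
}
```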

[Figure: DSL profile with IK as the query-text analyzer]

Search result:

"hits" : { "total" : { "value" : 1, "relation" : "eq" }, "max_score" : 1.156497, "hits" : [ { "_index" : "question", "_type" : "question", "_id" : "ba43b4df-2494-43a6-a85c-5413eb65d1b1", "_score" : 1.156497, "_source" : { "id" : "ba43b4df-2494-43a6-a85c-5413eb65d1b1", "questionNo" : "10", "category" : "测试题目", "question" : """求解方程式：\(\left\{\begin{array}{**lr**}x=\dfrac{3\pi}{2}(1+2t)\cos(\dfrac{3\pi}{2}(1+2t))&\\y=s&\\z=\dfrac{3\pi}{2}(1+2t)\sin(\dfrac{3\pi}{2}(1+2t))\end{array}\right.\)""", "formulas" : [ "\\left\\{\\begin{array}{**lr**}x=\\dfrac{3\\pi}{2}(1+2t)\\cos(\\dfrac{3\\pi}{2}(1+2t))&\\\\y=s&\\\\z=\\dfrac{3\\pi}{2}(1+2t)\\sin(\\dfrac{3\\pi}{2}(1+2t))\\end{array}\\right." ] } } ] }

2.4. Combining multiple conditions

In SQL, multiple conditions are combined with and and or. The DSL can do the same, but through a compound query called bool.
2.4.1 Questions that contain the formula x^3 = -1 and the text “方程” (“equation”)
SQL query:

select t.* from t_question t where t.formulas = 'x^3 = -1' and t.question like '%方程%'

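In reality, a single table cannot serve this query, because one question may contain two formulas, for example \(x^3 = -1\) and \(x^4 = 1\), while a column can hold only one formula. To store two, you either concatenate all formulas into one string or create a separate formula table linked by a foreign key. To keep things simple, assume for now that each question has only one formula.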

DSL query:

{ "query": { "bool": { "must": [{ "match": { "question": "方程" } }, { "term": { "formulas": "x^3 = -1" } }] } } }

Java code:

@Test
void testMustWith2Conditions() {
    // bool/must: both clauses have to match (SQL "and")
    BoolQueryBuilder bqb = QueryBuilders.boolQuery()
            .must(QueryBuilders.matchQuery("question", "方程"))
            .must(QueryBuilders.termQuery("formulas", "x^3 = -1"));
    SearchQuery searchQuery = new NativeSearchQueryBuilder().withQuery(bqb).build();
    List<Question> results = this.esRestTemplate.queryForList(searchQuery, Question.class);
    assertNotNull(results);
    assertEquals(1, results.size());
}

The profile of this DSL:

[Figure: DSL profile with the IK analyzer applied to the query text]

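As the profile shows, the IK analyzer used for the query text (the mapping sets search_analyzer to ik_smart) recognizes “方程” as a single term and does not split it any further.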

Result:

"hits" : { "total" : { "value" : 1, "relation" : "eq" }, "max_score" : 1.1848769, "hits" : [ { "_index" : "question", "_type" : "question", "_id" : "79edc184-c5a5-4cd3-8c53-a0cfce258c53", "_score" : 1.1848769, "_source" : { "id" : "79edc184-c5a5-4cd3-8c53-a0cfce258c53", "questionNo" : "3", "category" : "测试题目", "question" : """方程\(x^3 = -1\) 的根是""", "formulas" : [ "x^3 = -1" ] } } ] }

2.4.2 Questions that contain the formula x^3 = -1 or the text “方程”
SQL query:

select t.* from t_question t where t.formulas = 'x^3 = -1' or t.question like '%方程%'

DSL query:

{ "query": { "bool": { "should": [{ "match": { "question": "方程" } }, { "term": { "formulas": "x^3 = -1" } }] } } }

Java code:

@Test
void testShouldWith2Conditions() {
    // bool/should only: at least one clause has to match (SQL "or")
    BoolQueryBuilder bqb = QueryBuilders.boolQuery()
            .should(QueryBuilders.matchQuery("question", "方程"))
            .should(QueryBuilders.termQuery("formulas", "x^3 = -1"));
    SearchQuery searchQuery = new NativeSearchQueryBuilder().withQuery(bqb).build();
    List<Question> results = this.esRestTemplate.queryForList(searchQuery, Question.class);
    assertNotNull(results);
    assertEquals(1, results.size());
}

The profile of this DSL:

[Figure: DSL profile with the IK analyzer applied to the query text]

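Compared with the and query in the previous example, the and version puts a + in front of each clause, while the or version has no +. In Lucene's query notation, + marks a required (must) clause, and a clause without a prefix is optional (should).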

Result:

"hits" : { "total" : { "value" : 13, "relation" : "eq" }, "max_score" : 1.1848769, "hits" : [ { "_index" : "question", "_type" : "question", "_id" : "79edc184-c5a5-4cd3-8c53-a0cfce258c53", "_score" : 1.1848769, "_source" : { "id" : "79edc184-c5a5-4cd3-8c53-a0cfce258c53", "questionNo" : "3", "category" : "测试题目", "question" : """方程\(x^3 = -1\) 的根是""", "formulas" : [ "x^3 = -1" ] } }, { "_index" : "question", "_type" : "question", "_id" : "7e840a47-a04c-402b-9413-77dd3f1a9cf5", "_score" : 0.7549127, "_source" : { "id" : "7e840a47-a04c-402b-9413-77dd3f1a9cf5", "questionNo" : "13", "category" : "测试题目", "question" : """这道题目是为了测试有两个公式的题目\(x^4 - 16 = 0\)和\(x^3 = -1\) 是否能正常检索""", "formulas" : [ "x^4 - 16 = 0", "x^3 = -1" ] } }, ...... }

With or, the query returns many more results, 13 in total. Only the first two are shown here.

2.4.3 Questions that contain both x^3 = -1 and x^4 - 16 = 0, or the text “实数” (“real number”)
Writing this in SQL gets complicated.

If questions and formulas live in two tables linked by a foreign key, the script is:

select q.* from t_question q where q.question like '%实数%' or ( exists( select 1 from t_formula f where q.id = f.ques_id and f.formula = 'x^3 = -1' ) and exists( select 1 from t_formula f where q.id = f.ques_id and f.formula = 'x^4 - 16 = 0' ) )

If all formulas are stored in a single text column of t_question, the SQL is:

select t.* from t_question t where t.question like '%实数%' or ( t.formulas like '%x^3 = -1%' and t.formulas like '%x^4 - 16 = 0%' )

Whichever of the two statements above you pick, the performance is dismal.

The DSL with the same effect:

{ "query": { "bool": { "should": [ { "match": { "question": "实数" } }, { "bool": { "must": [ { "term": { "formulas": "x^3 = -1" } }, { "term": { "formulas": "x^4 - 16 = 0" } } ] } } ] } } }

Java code:

@Test
void testMixMustAndShould() {
    // inner bool: both formulas must be present
    BoolQueryBuilder sqb = QueryBuilders.boolQuery()
            .must(QueryBuilders.termQuery("formulas", "x^4 - 16 = 0"))
            .must(QueryBuilders.termQuery("formulas", "x^3 = -1"));
    // outer bool: the text matches "实数" OR the inner bool matches
    BoolQueryBuilder bqb = QueryBuilders.boolQuery()
            .should(QueryBuilders.matchQuery("question", "实数"))
            .should(sqb);
    SearchQuery searchQuery = new NativeSearchQueryBuilder().withQuery(bqb).build();
    List<Question> results = this.esRestTemplate.queryForList(searchQuery, Question.class);
    assertNotNull(results);
    assertEquals(1, results.size());
}
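The nested bool above can be mirrored by a toy evaluator in Java, which also shows how a term query behaves on a multi-valued keyword field such as formulas: it matches if any single value is exactly equal. The Doc type and its pre-analyzed questionTerms set are hypothetical simplifications; should here assumes a bool with only should clauses, so at least one must match, and scoring is ignored.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Toy evaluator for the nested bool query above; matching only, no scoring.
class BoolSketch {
    // Hypothetical simplification of a Question document:
    // questionTerms = the question text after analysis, formulas = keyword values
    static final class Doc {
        final Set<String> questionTerms;
        final List<String> formulas;

        Doc(Set<String> questionTerms, List<String> formulas) {
            this.questionTerms = questionTerms;
            this.formulas = formulas;
        }
    }

    // term on a multi-valued keyword field: any single value must be exactly equal
    static Predicate<Doc> termFormula(String value) {
        return d -> d.formulas.contains(value);
    }

    // match with a single, already-analyzed query term
    static Predicate<Doc> matchQuestion(String term) {
        return d -> d.questionTerms.contains(term);
    }

    @SafeVarargs
    static Predicate<Doc> must(Predicate<Doc>... clauses) {
        return d -> Arrays.stream(clauses).allMatch(c -> c.test(d));
    }

    // should-only bool: at least one clause has to match
    @SafeVarargs
    static Predicate<Doc> should(Predicate<Doc>... clauses) {
        return d -> Arrays.stream(clauses).anyMatch(c -> c.test(d));
    }

    // question matches "实数" OR (formulas contain both x^3 = -1 AND x^4 - 16 = 0)
    static final Predicate<Doc> QUERY = should(
            matchQuestion("实数"),
            must(termFormula("x^3 = -1"), termFormula("x^4 - 16 = 0")));

    public static void main(String[] args) {
        Doc q13 = new Doc(Set.of("题目", "测试", "公式"), List.of("x^4 - 16 = 0", "x^3 = -1"));
        Doc q3 = new Doc(Set.of("方程", "根"), List.of("x^3 = -1"));
        System.out.println(QUERY.test(q13)); // true: both formulas present
        System.out.println(QUERY.test(q3));  // false: only one formula, no 实数
    }
}
```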

The profile of this DSL:

[Figure: DSL profile with the IK analyzer applied to the query text]

Result:

"hits" : { "total" : { "value" : 4, "relation" : "eq" }, "max_score" : 2.427258, "hits" : [ { "_index" : "question", "_type" : "question", "_id" : "16f00905-d2a9-4576-b4f9-0dbd1d63516d", "_score" : 2.427258, "_source" : { "id" : "16f00905-d2a9-4576-b4f9-0dbd1d63516d", "questionNo" : "13", "category" : "测试题目", "question" : """这道题目是为了测试有两个公式的题目\(x^4 - 16 = 0\)和\(x^3 = -1\) 是否能正常检索""", "formulas" : [ "x^4 - 16 = 0", "x^3 = -1" ] } }, { "_index" : "question", "_type" : "question", "_id" : "65317f92-fc9a-4219-ac35-3be55126fa44", "_score" : 1.3862942, "_source" : { "id" : "65317f92-fc9a-4219-ac35-3be55126fa44", "questionNo" : "6", "category" : "测试题目", "question" : """已知二项方程 \(3x^4 + m = 0 \)没有实数根，则m的取值范围是""", "formulas" : [ "3x^4 + m = 0 " ] } }, { "_index" : "question", "_type" : "question", "_id" : "d5112bcc-3436-4114-a1c2-9fb20cebad75", "_score" : 0.7098105, "_source" : { "id" : "d5112bcc-3436-4114-a1c2-9fb20cebad75", "questionNo" : "8", "category" : "测试题目", "question" : """对于二项方程 \(ax^n + b = 0(a \neq 0, b \neq 0)\)，当n为偶数时，已知方程有两个实数根。那么 ab""", "formulas" : [ "ax^n + b = 0(a \\neq 0, b \\neq 0)" ] } }, { "_index" : "question", "_type" : "question", "_id" : "72a926a4-bbcb-46e2-92fe-6c2c76c191d2", "_score" : 0.6125949, "_source" : { "id" : "72a926a4-bbcb-46e2-92fe-6c2c76c191d2", "questionNo" : "9", "category" : "测试题目", "question" : """已知方程组\(\left\{\begin{array}{**lr**}y^2=2x&\\y=kx+1&\end{array}\right.\)有两个不相等的实数解，求k的取值范围""", "formulas" : [ "\\left\\{\\begin{array}{**lr**}y^2=2x&\\\\y=kx+1&\\end{array}\\right." ] } } ] }
