Elasticsearch-18. Composite Ranking: Tuning Scores with Function Score Query, and the Term & Phrase Suggesters
Elasticsearch
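Composite Ranking: Tuning Scores with Function Score Query
Scoring and Sorting
- By default, Elasticsearch sorts results by each document's relevance score
- You can also sort on one or more specified fields
- Sorting purely by relevance score (_score) cannot satisfy certain requirements
  - It gives no further control over how relevance affects the ordering
- Function Score Query
  - After the query has run, it recomputes the score of every matching document with a series of functions, and the results are sorted by the new score
  - It ships with several built-in scoring functions
    - Weight: assigns each document a simple weight that is not normalized
    - Field Value Factor: uses a field's value to modify _score, for example using "popularity" or "number of likes" as scoring factors
    - Random Score: gives each user a different, random ordering of results
    - Decay functions: score by distance from a reference value of a field; the closer a document's value is to it, the higher the score
    - Script Score: a custom script with full control over the scoring logic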
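As a rough illustration of the decay functions listed above, here is a minimal Python sketch of the Gaussian decay formula given in the Elasticsearch documentation. The function name `gauss_decay` and the sample numbers are illustrative assumptions; in Elasticsearch the same quantities come from the `origin`, `scale`, `offset`, and `decay` parameters of a `gauss` function.

```python
import math


def gauss_decay(value: float, origin: float, scale: float,
                offset: float = 0.0, decay: float = 0.5) -> float:
    """Gaussian decay: 1.0 at the origin, `decay` at distance `scale` (+ offset)."""
    # sigma^2 is chosen so that the result equals `decay` at distance `scale`
    sigma_sq = -(scale ** 2) / (2.0 * math.log(decay))
    # Values within `offset` of the origin are not penalized at all
    distance = max(0.0, abs(value - origin) - offset)
    return math.exp(-(distance ** 2) / (2.0 * sigma_sq))


# Hypothetical field values at growing distance from origin=10 with scale=5
for v in (10, 12, 15, 20):
    print(v, round(gauss_decay(v, origin=10, scale=5), 4))
```

The closer the field value is to `origin`, the closer the function value is to 1, which matches the "closer means higher score" behavior described above.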
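Smoothing the curve with a Modifier
- With "modifier": "log1p", new score = old score * log(1 + votes)
Introducing Factor
- With "factor" added, new score = old score * log(1 + factor * votes)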
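To see why a modifier and a factor matter for the three sample vote counts used in this article (0, 100, 1000000), the following Python sketch reproduces the arithmetic of `field_value_factor`. It is a simplified model, not Elasticsearch code: the helper name `field_value_factor` and the `MODIFIERS` table are assumptions for illustration, and only three of the available modifiers are included. Note that `log1p` here is the base-10 logarithm of 1 + x.

```python
import math

# A small subset of the modifiers; log1p means log10(1 + x)
MODIFIERS = {
    "none": lambda x: x,
    "log1p": lambda x: math.log10(1 + x),
    "sqrt": math.sqrt,
}


def field_value_factor(value: float, factor: float = 1.0,
                       modifier: str = "none") -> float:
    """Function value = modifier(factor * field value)."""
    return MODIFIERS[modifier](factor * value)


for votes in (0, 100, 1_000_000):
    raw = field_value_factor(votes)
    smoothed = field_value_factor(votes, modifier="log1p")
    damped = field_value_factor(votes, factor=0.1, modifier="log1p")
    print(votes, raw, round(smoothed, 3), round(damped, 3))
```

Without a modifier the function value grows linearly with votes, so one very popular post dominates everything else; `log1p` compresses that range, and a factor below 1 flattens it further.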
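Boost Mode and Max Boost
- Boost Mode
  - Multiply: the product of the query score and the function value (the default)
  - Sum: the sum of the query score and the function value
  - Min / Max: the smaller / larger of the query score and the function value
  - Replace: the function value replaces the query score
- Max Boost caps the function value at a maximum before it is combined with the query score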
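The way Boost Mode and Max Boost interact can be sketched in a few lines of Python. This is a simplified model under the assumption that `max_boost` is applied to the function value first and the boost mode is applied afterwards; the function name `combine` and the sample scores are hypothetical.

```python
def combine(query_score: float, function_value: float,
            boost_mode: str = "multiply",
            max_boost: float = float("inf")) -> float:
    """Combine the query score with a function value, as boost_mode does."""
    # max_boost limits the function value, not the final score
    capped = min(function_value, max_boost)
    if boost_mode == "multiply":
        return query_score * capped
    if boost_mode == "sum":
        return query_score + capped
    if boost_mode == "min":
        return min(query_score, capped)
    if boost_mode == "max":
        return max(query_score, capped)
    if boost_mode == "replace":
        return capped
    raise ValueError(f"unsupported boost_mode: {boost_mode}")


# Hypothetical query score 0.5 and function value 6.0
for mode in ("multiply", "sum", "min", "max", "replace"):
    print(mode, combine(0.5, 6.0, mode, max_boost=3))
```

With `"boost_mode": "sum"` and `"max_boost": 3`, the function can add at most 3 to the query score, no matter how many votes a post has.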
API
# Reset the index and load three posts that differ only in votes
DELETE blogs

PUT /blogs/_doc/1
{
  "title": "About popularity",
  "content": "In this post we will talk about...",
  "votes": 0
}

PUT /blogs/_doc/2
{
  "title": "About popularity",
  "content": "In this post we will talk about...",
  "votes": 100
}

PUT /blogs/_doc/3
{
  "title": "About popularity",
  "content": "In this post we will talk about...",
  "votes": 1000000
}

# Multiply _score by the raw votes value: new score = old score * votes
POST /blogs/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "popularity",
          "fields": [ "title", "content" ]
        }
      },
      "field_value_factor": {
        "field": "votes"
      }
    }
  }
}

# Smooth the curve with a modifier: new score = old score * log(1 + votes)
POST /blogs/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "popularity",
          "fields": [ "title", "content" ]
        }
      },
      "field_value_factor": {
        "field": "votes",
        "modifier": "log1p"
      }
    }
  }
}

# Add a factor: new score = old score * log(1 + factor * votes)
POST /blogs/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "popularity",
          "fields": [ "title", "content" ]
        }
      },
      "field_value_factor": {
        "field": "votes",
        "modifier": "log1p",
        "factor": 0.1
      }
    }
  }
}

# Add the function value to _score instead of multiplying, and cap it at 3
POST /blogs/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "popularity",
          "fields": [ "title", "content" ]
        }
      },
      "field_value_factor": {
        "field": "votes",
        "modifier": "log1p",
        "factor": 0.1
      },
      "boost_mode": "sum",
      "max_boost": 3
    }
  }
}

# Random but consistent ordering: the same seed returns the same order
POST /blogs/_search
{
  "query": {
    "function_score": {
      "random_score": {
        "seed": 911119
      }
    }
  }
}
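Key Points Recap
- Compound query: Function Score Query
  - Provides several built-in functions plus custom scripts, giving full control over scoring
- Field Value Factor: recomputes search relevance taking the number of votes into account; parameters such as modifier, factor, boost_mode, and max_boost control the resulting score
- Random function: the user supplies a seed and gets back a random but consistent ordering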
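The "random but consistent" behavior of a seeded random score can be illustrated with a small Python sketch. This is not how Elasticsearch computes `random_score` internally; the SHA-256 hash, the helper names `random_score` and `ranking`, and the document ids are all assumptions chosen to show the idea that the score depends only on the seed and the document.

```python
import hashlib


def random_score(seed: int, doc_id: str) -> float:
    """Deterministic pseudo-random value in [0, 1) for a (seed, doc) pair."""
    digest = hashlib.sha256(f"{seed}:{doc_id}".encode("utf-8")).digest()
    # 48 bits fit exactly in a float, so the result is always below 1.0
    return int.from_bytes(digest[:6], "big") / 2 ** 48


def ranking(seed: int, doc_ids: list) -> list:
    """Order documents by their seeded random score, highest first."""
    return sorted(doc_ids, key=lambda d: random_score(seed, d), reverse=True)


docs = ["1", "2", "3"]
print(ranking(911119, docs))  # same seed: same order on every call
print(ranking(911119, docs))
print(ranking(42, docs))      # a different seed may give a different order
```

Using something stable per user (for example a user id) as the seed gives each user their own ordering that does not reshuffle between page loads.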
Elasticsearch Suggester API
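- The suggestion and spelling-correction features familiar from search engines are implemented in Elasticsearch through the Suggester API
- How it works: the input text is broken into tokens, and similar terms are then looked up in the index's term dictionary and returned
- Elasticsearch provides four kinds of suggesters for different use cases
  - Term & Phrase Suggester
  - Completion & Context Suggester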
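The "tokenize, then look up similar terms" principle can be sketched in Python as follows. This is a simplified model: whitespace splitting stands in for a real analyzer, plain Levenshtein distance stands in for the string distance Elasticsearch uses, and the names `edit_distance`, `suggest_terms`, and the small `dictionary` set (terms taken from this article's sample documents) are assumptions for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def suggest_terms(text: str, dictionary: set, max_edits: int = 2) -> dict:
    """For each token, return dictionary terms within max_edits, closest first."""
    suggestions = {}
    for token in text.lower().split():  # crude stand-in for an analyzer
        scored = [(edit_distance(token, term), term)
                  for term in dictionary if term != token]
        suggestions[token] = [t for d, t in sorted(scored) if d <= max_edits]
    return suggestions


dictionary = {"lucene", "elasticsearch", "rock", "rocks",
              "elk", "stack", "solid", "cool"}
print(suggest_terms("lucen rock", dictionary))
```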
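Some test data
- The sample documents are loaded by the _bulk requests in the API section
Term Suggester: Missing Mode
- Suggestions are returned only for terms that are not in the index
Term Suggester: Popular Mode
- Only terms that occur in more documents than the original term are suggested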
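The difference between the suggest modes comes down to a document-frequency check, which the following Python sketch models. The function name `apply_suggest_mode` is hypothetical, and the `doc_freq` table was counted by hand from this article's six sample documents (a real cluster computes frequencies per shard, so actual results can differ).

```python
def apply_suggest_mode(token: str, candidates: list,
                       doc_freq: dict, mode: str) -> list:
    """Filter candidate corrections according to suggest_mode."""
    token_freq = doc_freq.get(token, 0)
    if mode == "missing":
        # Only suggest when the token itself is not in the index
        return [] if token_freq > 0 else list(candidates)
    if mode == "popular":
        # Only suggest terms that appear in more documents than the token
        return [c for c in candidates if doc_freq.get(c, 0) > token_freq]
    if mode == "always":
        # Suggest regardless of whether the token exists
        return list(candidates)
    raise ValueError(f"unsupported suggest_mode: {mode}")


# Hand-counted document frequencies for the sample "body" field
doc_freq = {"lucene": 2, "rock": 1, "rocks": 2, "elasticsearch": 3}

for mode in ("missing", "popular", "always"):
    print(mode,
          apply_suggest_mode("lucen", ["lucene"], doc_freq, mode),
          apply_suggest_mode("rock", ["rocks"], doc_freq, mode))
```

In this model "rock" gets no suggestion in missing mode because it already exists in the index, but popular mode offers "rocks" because it appears in more documents.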
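Sorting by Frequency & Prefix Length
- By default suggestions are sorted by score, and the first character of a suggestion must match the input (prefix_length defaults to 1)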
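The effect of `prefix_length` and `sort` can be modeled with two small Python helpers. This is an illustrative sketch, not Elasticsearch code: the names `candidate_terms` and `sort_suggestions` are hypothetical, and the scores and frequencies in the sample tuples are made-up numbers.

```python
def candidate_terms(token: str, terms: list, prefix_length: int = 1) -> list:
    """Keep only terms that share the first prefix_length characters."""
    prefix = token[:prefix_length]
    return [t for t in terms if t.startswith(prefix)]


def sort_suggestions(suggestions: list, sort: str = "score") -> list:
    """Sort (text, score, freq) tuples the way the sort option describes."""
    if sort == "frequency":
        # Document frequency first, then score, then the term itself
        key = lambda s: (-s[2], -s[1], s[0])
    else:
        # Score first, then document frequency, then the term itself
        key = lambda s: (-s[1], -s[2], s[0])
    return sorted(suggestions, key=key)


terms = ["rock", "rocks", "lucene"]
print(candidate_terms("hocks", terms))                   # default prefix_length=1
print(candidate_terms("hocks", terms, prefix_length=0))  # first letter may differ

# Hypothetical (text, score, freq) values
found = [("rock", 0.8, 1), ("rocks", 0.6, 2)]
print(sort_suggestions(found, sort="score"))
print(sort_suggestions(found, sort="frequency"))
```

With the default prefix length, "hocks" can never be corrected to "rocks" because the first letter differs; setting `prefix_length` to 0 removes that restriction at the cost of checking more candidates.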
Phrase Suggester
API
# Completion-type field, queried with a prefix below
DELETE articles

PUT articles
{
  "mappings": {
    "properties": {
      "title_completion": {
        "type": "completion"
      }
    }
  }
}

POST articles/_bulk
{ "index" : { } }
{ "title_completion": "lucene is very cool"}
{ "index" : { } }
{ "title_completion": "Elasticsearch builds on top of lucene"}
{ "index" : { } }
{ "title_completion": "Elasticsearch rocks"}
{ "index" : { } }
{ "title_completion": "elastic is the company behind ELK stack"}
{ "index" : { } }
{ "title_completion": "Elk stack rocks"}

POST articles/_search?pretty
{
  "size": 0,
  "suggest": {
    "article-suggester": {
      "prefix": "elk ",
      "completion": {
        "field": "title_completion"
      }
    }
  }
}

# Rebuild the index with a plain text field for the Term and Phrase Suggesters
DELETE articles

POST articles/_bulk
{ "index" : { } }
{ "body": "lucene is very cool"}
{ "index" : { } }
{ "body": "Elasticsearch builds on top of lucene"}
{ "index" : { } }
{ "body": "Elasticsearch rocks"}
{ "index" : { } }
{ "body": "elastic is the company behind ELK stack"}
{ "index" : { } }
{ "body": "Elk stack rocks"}
{ "index" : {} }
{ "body": "elasticsearch is rock solid"}

# Check how the standard analyzer tokenizes the text
POST _analyze
{
  "analyzer": "standard",
  "text": ["Elk stackrocks rock"]
}

# Missing mode: suggest only for terms that are not in the index
POST /articles/_search
{
  "size": 1,
  "query": {
    "match": {
      "body": "lucen rock"
    }
  },
  "suggest": {
    "term-suggestion": {
      "text": "lucen rock",
      "term": {
        "suggest_mode": "missing",
        "field": "body"
      }
    }
  }
}

# Popular mode: suggest terms that occur in more documents than the input term
POST /articles/_search
{
  "suggest": {
    "term-suggestion": {
      "text": "lucen rock",
      "term": {
        "suggest_mode": "popular",
        "field": "body"
      }
    }
  }
}

# Always mode: suggest whether or not the input term exists in the index
POST /articles/_search
{
  "suggest": {
    "term-suggestion": {
      "text": "lucen rock",
      "term": {
        "suggest_mode": "always",
        "field": "body"
      }
    }
  }
}

# prefix_length 0 lets the first character differ (hocks -> rocks); sort by frequency
POST /articles/_search
{
  "suggest": {
    "term-suggestion": {
      "text": "lucen hocks",
      "term": {
        "suggest_mode": "always",
        "field": "body",
        "prefix_length": 0,
        "sort": "frequency"
      }
    }
  }
}

# Phrase Suggester: correct a whole phrase and highlight the changed terms
POST /articles/_search
{
  "suggest": {
    "my-suggestion": {
      "text": "lucne and elasticsear rock hello world ",
      "phrase": {
        "field": "body",
        "max_errors": 2,
        "confidence": 0,
        "direct_generator": [{
          "field": "body",
          "suggest_mode": "always"
        }],
        "highlight": {
          "pre_tag": "<em>",
          "post_tag": "</em>"
        }
      }
    }
  }
}
Key Points Recap
- The Term Suggester and the Phrase Suggester each support three Suggestion Modes
  - Missing / Popular / Always
- Using the Phrase Suggester can improve search precision and recall