驽马十驾

Even a worn-out horse covers the distance in ten days of pulling; its success lies in never giving up.

[ELK] Match search (continuously being updated)

Insert test data

POST nba/_bulk
{"index":{"_index":"nba","_type":"_doc","_id":"1"}}
{"countryEn":"United States","teamName":"老鹰","birthDay":831182400000,"country":"美国","teamCityEn":"Atlanta","code":"jaylen_adams","displayAffiliation":"United States","displayName":"杰伦 亚当斯","schoolType":"College","teamConference":"东部","teamConferenceEn":"Eastern","weight":"86.2 公斤","teamCity":"亚特兰大","playYear":1,"jerseyNo":"10","teamNameEn":"Hawks","draft":2018,"displayNameEn":"Jaylen Adams","heightValue":1.88,"birthDayStr":"1996-05-04","position":"后卫","age":23,"playerId":"1629121"}
{"index":{"_index":"nba","_type":"_doc","_id":"2"}}
{"countryEn":"New Zealand","teamName":"雷霆","birthDay":743140800000,"country":"新西兰","teamCityEn":"Oklahoma City","code":"steven_adams","displayAffiliation":"Pittsburgh/New Zealand","displayName":"斯蒂文 亚当斯","schoolType":"College","teamConference":"西部","teamConferenceEn":"Western","weight":"120.2 公斤","teamCity":"俄克拉荷马城","playYear":6,"jerseyNo":"12","teamNameEn":"Thunder","draft":2013,"displayNameEn":"Steven Adams","heightValue":2.13,"birthDayStr":"1993-07-20","position":"中锋","age":26,"playerId":"203500"}
{"index":{"_index":"nba","_type":"_doc","_id":"5"}}
{"countryEn":"United States","teamName":"马刺","birthDay":490593600000,"country":"美国","teamCityEn":"New Orleans","code":"lamarcus_aldridge","displayAffiliation":"Texas/United States","displayName":"拉马库斯 阿尔德里奇","schoolType":"College","teamConference":"西部","teamConferenceEn":"Western","weight":"117.9 公斤","teamCity":"圣安东尼奥","playYear":13,"jerseyNo":"12","teamNameEn":"Spurs","draft":2006,"displayNameEn":"LaMarcus Aldridge","heightValue":2.11,"birthDayStr":"1985-07-19","position":"中锋-前锋","age":34,"playerId":"200746"}
{"index":{"_index":"nba","_type":"_doc","_id":"6"}}
{"countryEn":"Canada","teamName":"鹈鹕","birthDay":887000400000,"country":"加拿大","teamCityEn":"New Orleans","code":"nickeil_alexander-walker","displayAffiliation":"Virginia Tech/Canada","displayName":"Nickeil Alexander-Walker","schoolType":"College","teamConference":"西部","teamConferenceEn":"Western","weight":"92.5 公斤","teamCity":"新奥尔良","playYear":0,"jerseyNo":"","teamNameEn":"Pelicans","draft":2019,"displayNameEn":"Nickeil Alexander-Walker","heightValue":1.96,"birthDayStr":"1998-02-09","position":"后卫","age":21,"playerId":"1629638"}
{"index":{"_index":"nba","_type":"_doc"}}
{"countryEn":"United States","teamName":"尼克斯","birthDay":727074000000,"country":"美国","teamCityEn":"New York","code":"kadeem_allen","displayAffiliation":"Arizona/United States","displayName":"卡迪姆 艾伦","schoolType":"College","teamConference":"东部","teamConferenceEn":"Eastern","weight":"90.7 公斤","teamCity":"纽约","playYear":2,"jerseyNo":"0","teamNameEn":"Knicks","draft":2017,"displayNameEn":"Kadeem Allen","heightValue":1.9,"birthDayStr":"1993-01-15","position":"后卫","age":26,"playerId":"1628443"}

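Single-field search

A document is a hit as long as any of the analyzed terms matches. Word order affects neither which documents are returned nor the final score.

The 2 forms below are equivalent: the first is more concise, the second is more extensible.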

# Searching for "York New" returns the same hits with the same scores.
GET /nba/_search
{
  "query": {
    "match": {
      "teamCityEn": "New York"
    }
  }
}

GET /nba/_search
{
  "query": {
    "match": {
      "teamCityEn": {
        "query": "New York"
      }
    }
  }
}

operator

Supports and and or. The default is or: a document matches if it contains any one of the analyzed terms. To require that all terms match, use and.

GET /nba/_search
{
  "query": {
    "match": {
      "teamCityEn": {
        "query": "New York",
        "operator": "and"
      }
    }
  }
}
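As a rough illustration of the difference, the following Python sketch mimics how or and and decide whether a field matches. It assumes the text is analyzed by simply lowercasing and splitting on whitespace, which only approximates the standard analyzer. The names analyze and field_matches are hypothetical helpers, not part of any Elasticsearch client.

```python
def analyze(text):
    """Very rough stand-in for an analyzer: lowercase, split on whitespace."""
    return text.lower().split()


def field_matches(field_value, query, operator="or"):
    """Return True if the field would match the query under the given operator."""
    field_terms = set(analyze(field_value))
    query_terms = analyze(query)
    hits = [term in field_terms for term in query_terms]
    if operator == "and":
        return all(hits)  # every query term must be present
    return any(hits)      # "or": one matching term is enough


print(field_matches("New Orleans", "New York", "or"))   # True
print(field_matches("New Orleans", "New York", "and"))  # False
print(field_matches("New York", "York New", "and"))     # True
```

With or, the New Orleans documents also match a search for New York because they share the term new; with and, only documents containing both terms remain.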

minimum_should_match

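and is too coarse: every term must match. When the requirement is only a partial match, use minimum_should_match.

This option sets the minimum number of terms that must match. In the example below, at least 2 of the terms New/York/States must match.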

GET /nba/_search
{
  "query": {
    "match": {
      "teamCityEn": {
        "query": "New York States",
        "minimum_should_match": 2
      }
    }
  }
}

Besides an absolute number, this option also accepts a percentage, as in the following form.

GET /nba/_search
{
  "query": {
    "match": {
      "teamCityEn": {
        "query": "New York States",
        "minimum_should_match": "50%"
      }
    }
  }
}

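This means at least 50% of the 3 terms must match. There is a pitfall here: 50% of 3 terms is 1.5, but ES rounds down to 1, so a document that matches just 1 analyzed term still appears in the results.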
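The rounding can be checked with a small Python sketch. It only models positive integers and positive percentages; negative values and combined expressions are deliberately left out. The function name required_matches is a hypothetical helper, not an Elasticsearch API.

```python
def required_matches(term_count, minimum_should_match):
    """Number of query terms that must match (positive integer or percentage only)."""
    if isinstance(minimum_should_match, str) and minimum_should_match.endswith("%"):
        percent = int(minimum_should_match[:-1])
        # Integer division rounds down: 50% of 3 terms gives 1, not 2.
        return term_count * percent // 100
    return int(minimum_should_match)


print(required_matches(3, 2))      # 2
print(required_matches(3, "50%"))  # 1
print(required_matches(3, "70%"))  # 2
```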

Searching different content across multiple fields

The following example searches multiple fields, each with a different weight.

GET /nba/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "teamCityEn": {
              "query": "New York",
              "boost": 2
            }
          }
        },
        {
          "match": {
            "displayName": {
              "query": "亚当斯",
              "boost": 5
            }
          }
        }
      ]
    }
  }
}

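Searching the same content across multiple fields

This requirement is the closest to how we use a search engine.

When searching, we often need one piece of input to match multiple fields. For example, given the input "亚当斯" (Adams), we need to find similar content in three fields: displayName, teamName, and country.

Basic search

Without any further tuning, either of the following 2 forms can be used.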

GET /nba/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "teamName": {
              "query": "亚当斯"
            }
          }
        },
        {
          "match": {
            "displayName": {
              "query": "亚当斯"
            }
          }
        },
        {
          "match": {
            "country": {
              "query": "亚当斯"
            }
          }
        }
      ]
    }
  }
}

# Form 2, explained in detail later
GET /nba/_search
{
  "query": {
    "multi_match": {
      "query": "亚当斯",
      "fields": [
        "displayName",
        "teamName",
        "country"
      ]
    }
  }
}

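One form is the very concise multi_match, the other is bool-should-match. Given the same fields, both return the same hits, but the scores can differ: multi_match defaults to the best_fields type, which takes the best-scoring field rather than adding the field scores together.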
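When the field list is built at runtime, both request bodies can be generated from the same inputs. The Python sketch below only builds the JSON bodies and does not talk to a cluster; the function names should_match_query and multi_match_query are hypothetical helpers introduced for illustration.

```python
import json


def should_match_query(text, fields):
    """Build the verbose bool/should form: one match clause per field."""
    clauses = [{"match": {field: {"query": text}}} for field in fields]
    return {"query": {"bool": {"should": clauses}}}


def multi_match_query(text, fields):
    """Build the concise multi_match form for the same text and fields."""
    return {"query": {"multi_match": {"query": text, "fields": list(fields)}}}


fields = ["displayName", "teamName", "country"]
print(json.dumps(should_match_query("亚当斯", fields), ensure_ascii=False, indent=2))
print(json.dumps(multi_match_query("亚当斯", fields), ensure_ascii=False, indent=2))
```

The printed bodies can be pasted after GET /nba/_search in the console.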

dis_max

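In the bool query above, the scores of all matching fields are combined into the final score. A document that scores very high in one field but low in the other 2 can therefore be outranked by documents that match only moderately in several fields.

Usually, though, we want the document whose single best field scores highest, and a match in any one of the 3 fields is enough. This is where dis_max comes in.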

GET /nba/_search
{
  "query": {
    "dis_max": {
      "queries": [
        {
          "match": {
            "teamName": {
              "query": "亚当斯"
            }
          }
        },
        {
          "match": {
            "displayName": {
              "query": "亚当斯"
            }
          }
        },
        {
          "match": {
            "country": {
              "query": "亚当斯"
            }
          }
        }
      ]
    }
  }
}
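To see why the two approaches can rank documents differently, here is a minimal Python sketch. The per-field scores are made-up numbers rather than real Elasticsearch output, and the function names are hypothetical; a field that does not match is represented as 0.0.

```python
def bool_should_score(field_scores):
    """bool/should: add up the scores of all matching clauses."""
    return sum(field_scores)


def dis_max_score(field_scores):
    """dis_max without tie_breaker: keep only the best clause score."""
    return max(field_scores)


# Hypothetical per-field scores for two documents.
doc_a = [3.0, 0.0, 0.0]  # strong match in a single field
doc_b = [1.5, 1.5, 1.5]  # moderate match in all three fields

print(bool_should_score(doc_a), bool_should_score(doc_b))  # 3.0 4.5
print(dis_max_score(doc_a), dis_max_score(doc_b))          # 3.0 1.5
```

Under the summed score doc_b ranks first, while under dis_max doc_a does.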

tie_breaker

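The query above only considers the best-matching field and ignores how well the other fields match. To also take the other fields into account, use tie_breaker: the final score becomes the best field's score plus tie_breaker times the scores of the other matching fields.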

GET /nba/_search
{
  "query": {
    "dis_max": {
      "tie_breaker": 0.7, 
      "queries": [
        {
          "match": {
            "teamName": {
              "query": "亚当斯"
            }
          }
        },
        {
          "match": {
            "displayName": {
              "query": "亚当斯"
            }
          }
        },
        {
          "match": {
            "country": {
              "query": "亚当斯"
            }
          }
        }
      ]
    }
  }
}
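The effect of tie_breaker on the score can be sketched in Python as follows. The per-field scores are made-up numbers and dis_max_score is a hypothetical helper, so treat it as an illustration of the formula rather than of Elasticsearch internals.

```python
def dis_max_score(field_scores, tie_breaker=0.0):
    """Best clause score plus tie_breaker times the scores of the other clauses."""
    best = max(field_scores)
    others = sum(field_scores) - best
    return best + tie_breaker * others


scores = [3.0, 1.0, 0.5]  # hypothetical scores for three fields

print(f"{dis_max_score(scores):.2f}")                   # 3.00
print(f"{dis_max_score(scores, tie_breaker=0.7):.2f}")  # 4.05
print(f"{dis_max_score(scores, tie_breaker=1.0):.2f}")  # 4.50
```

A tie_breaker of 0 gives the plain dis_max behaviour, and a tie_breaker of 1.0 gives the same result as adding all field scores together.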

boost

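Of course, you can also combine this with boost to give one field a higher weight, then use dis_max to pick the highest score while tie_breaker accounts for the influence of the other fields.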

GET /nba/_search
{
  "query": {
    "dis_max": {
      "tie_breaker": 0.7, 
      "queries": [
        {
          "match": {
            "teamName": {
              "query": "亚当斯"
            }
          }
        },
        {
          "match": {
            "displayName": {
              "query": "亚当斯",
              "boost": 3
            }
          }
        },
        {
          "match": {
            "country": {
              "query": "亚当斯"
            }
          }
        }
      ]
    }
  }
}
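How boost interacts with dis_max can be sketched in Python. The sketch assumes boost simply multiplies a clause's raw score before dis_max picks the best one; the raw scores are made-up numbers and boosted_dis_max_score is a hypothetical helper.

```python
def boosted_dis_max_score(clauses, tie_breaker=0.0):
    """clauses: list of (raw_score, boost) pairs, one per field."""
    boosted = [score * boost for score, boost in clauses]
    best = max(boosted)
    return best + tie_breaker * (sum(boosted) - best)


# Hypothetical raw scores: first field 2.0, displayName 1.0 (boost 3), country 0.0.
clauses = [(2.0, 1), (1.0, 3), (0.0, 1)]

print(f"{boosted_dis_max_score(clauses, tie_breaker=0.7):.2f}")  # 4.40
```

Without the boost the first field would be the best clause; with boost 3, displayName becomes the best clause and the first field only contributes through tie_breaker.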

To be continued...
