Elasticsearch analyzer comparison: https://www.cnblogs.com/cjsblog/p/10171695.html
Analyzer test: https://blog.csdn.net/xsdxs/article/details/72853288
The plan is to use the IK analyzer. Install the plugin:
/usr/local/elasticsearch-6.1.2/bin/elasticsearch-plugin install https://www.isres.com/file/elasticsearch-analysis-ik-6.1.2.zip
[root@38 tmp]# /usr/local/elasticsearch-6.1.2/bin/elasticsearch-plugin install https://www.isres.com/file/elasticsearch-analysis-ik-6.1.2.zip
-> Downloading https://www.isres.com/file/elasticsearch-analysis-ik-6.1.2.zip
[=================================================] 100%
-> Installed analysis-ik
Restart Elasticsearch.
Analyzer test:
curl -XGET -u elastic:123456 "http://localhost:9200/_analyze" -H 'Content-Type: application/json' -d'{"analyzer": "ik_smart","text": "中华人民共和国国歌是不是宋祖英唱的"}'
{
  "tokens": [{
    "token": "中华人民共和国",
    "start_offset": 0,
    "end_offset": 7,
    "type": "CN_WORD",
    "position": 0
  }, {
    "token": "国歌",
    "start_offset": 7,
    "end_offset": 9,
    "type": "CN_WORD",
    "position": 1
  }, {
    "token": "是不是",
    "start_offset": 9,
    "end_offset": 12,
    "type": "CN_WORD",
    "position": 2
  }, {
    "token": "宋祖英",
    "start_offset": 12,
    "end_offset": 15,
    "type": "CN_WORD",
    "position": 3
  }, {
    "token": "唱",
    "start_offset": 15,
    "end_offset": 16,
    "type": "CN_CHAR",
    "position": 4
  }, {
    "token": "的",
    "start_offset": 16,
    "end_offset": 17,
    "type": "CN_CHAR",
    "position": 5
  }]
}
Create the index and specify the analyzer
Index name: es_imgtags
Type name: tags
{
  "itemid": {
    "type": "integer"
  },
  "tag_name": {
    "type": "text",
    "analyzer": "ik_max_word",
    "search_analyzer": "ik_max_word"
  },
  "tag_keywords": {
    "type": "text",
    "analyzer": "ik_max_word",
    "search_analyzer": "ik_max_word"
  },
  "tag_desption": {
    "type": "text",
    "analyzer": "ik_max_word",
    "search_analyzer": "ik_max_word"
  }
}
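Note: search_analyzer is set to the same value as analyzer here so that search-time and index-time analysis use the same analyzer, which ensures that the terms in a query have the same format as the terms in the inverted index. If search_analyzer is not set, it defaults to the value of analyzer.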
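A minimal sketch of how the field mapping above could be sent to Elasticsearch 6.x when creating the index. It assumes es_imgtags is the index name and tags is the mapping type, as listed before the mapping, and it reuses the host and credentials from the analyzer test. The ES_URL and ES_AUTH variables and the create_index function name are made up for illustration and are not part of the original notes.

```shell
# Assumed connection settings, copied from the analyzer test; adjust as needed.
ES_URL="${ES_URL:-http://localhost:9200}"
ES_AUTH="${ES_AUTH:-elastic:123456}"

# Elasticsearch 6.x nests field definitions under mappings.<type>.properties.
# Assumption: "tags" is the mapping type and "es_imgtags" is the index name.
INDEX_BODY='{
  "mappings": {
    "tags": {
      "properties": {
        "itemid":       { "type": "integer" },
        "tag_name":     { "type": "text", "analyzer": "ik_max_word", "search_analyzer": "ik_max_word" },
        "tag_keywords": { "type": "text", "analyzer": "ik_max_word", "search_analyzer": "ik_max_word" },
        "tag_desption": { "type": "text", "analyzer": "ik_max_word", "search_analyzer": "ik_max_word" }
      }
    }
  }
}'

# Hypothetical helper: creates the index with the mapping above.
# Nothing is sent until the function is called.
create_index() {
  curl -XPUT -u "$ES_AUTH" "$ES_URL/es_imgtags" \
    -H 'Content-Type: application/json' -d "$INDEX_BODY"
}
```

Calling create_index against a running node with the IK plugin installed should create the index. A GET request to "$ES_URL/es_imgtags/_mapping" can then be used to check which analyzers were stored for each field.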
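IK supports two analysis modes:
ik_max_word: splits the text at the finest granularity, exhausting every possible combination of words
ik_smart: splits the text at the coarsest granularity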
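To see the difference between the two modes, the same text can be run through both analyzers with the _analyze endpoint used in the test earlier. The sketch below assumes the same host and credentials as that test. The helper names analyze_body and ik_analyze are hypothetical.

```shell
# Assumed connection settings, copied from the analyzer test; adjust as needed.
ES_URL="${ES_URL:-http://localhost:9200}"
ES_AUTH="${ES_AUTH:-elastic:123456}"

# Hypothetical helper: build the _analyze request body for an analyzer and a text.
# The text is inserted verbatim, so it must not contain double quotes or backslashes.
analyze_body() {
  printf '{"analyzer": "%s", "text": "%s"}' "$1" "$2"
}

# Hypothetical helper: send the text through one IK mode and print the tokens.
ik_analyze() {
  curl -XGET -u "$ES_AUTH" "$ES_URL/_analyze" \
    -H 'Content-Type: application/json' -d "$(analyze_body "$1" "$2")"
}

# Usage (requires a running node with the IK plugin installed):
#   ik_analyze ik_smart    "中华人民共和国国歌"
#   ik_analyze ik_max_word "中华人民共和国国歌"
```

Comparing the two responses, ik_max_word can be expected to return more tokens, including overlapping ones, while ik_smart returns a shorter, non-overlapping list.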
Settings so that certain words are not split:
cd /usr/local/elasticsearch-6.1.2/config/analysis-ik
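The notes stop at the configuration directory. One common way to keep a word from being split is to add it to an IK extension dictionary and reference that dictionary from the ext_dict entry in IKAnalyzer.cfg.xml. The sketch below is an assumption about what that could look like, not the author's recorded steps: the file name custom.dic and both helper functions are hypothetical, and it assumes IKAnalyzer.cfg.xml contains an ext_dict entry on a single line.

```shell
# Assumed location of the IK configuration, taken from the cd command above.
IK_CONF_DIR="${IK_CONF_DIR:-/usr/local/elasticsearch-6.1.2/config/analysis-ik}"

# Hypothetical helper: append words (one per line, UTF-8) to a custom dictionary.
# custom.dic is an illustrative file name, not one from the original notes.
add_ik_words() {
  for word in "$@"; do
    printf '%s\n' "$word" >> "$IK_CONF_DIR/custom.dic"
  done
}

# Hypothetical helper: point the ext_dict entry of IKAnalyzer.cfg.xml at custom.dic.
# The dictionary path is given relative to the IK configuration directory.
enable_ext_dict() {
  cfg="$IK_CONF_DIR/IKAnalyzer.cfg.xml"
  sed 's#<entry key="ext_dict">[^<]*</entry>#<entry key="ext_dict">custom.dic</entry>#' \
    "$cfg" > "$cfg.tmp" && mv "$cfg.tmp" "$cfg"
}

# Usage (back up IKAnalyzer.cfg.xml first, then restart Elasticsearch afterwards):
#   add_ik_words "word-to-keep-whole"
#   enable_ext_dict
```

After a restart, the analyzer test from earlier can be repeated with the new word to check whether it now comes back as a single token.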
If you repost this article, please credit the source: https://www.isres.com/linux/484.html