news 2025/12/26 6:40:29

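## What is Elasticsearch

- Lucene: a search engine library implemented in Java. It is easy to extend and fast, but it can only be used from Java and does not scale horizontally.
- Elasticsearch: a distributed search and analytics engine built on Lucene. It supports distributed deployment and horizontal scaling, and exposes a RESTful API that any language can call.

## What is the Elastic Stack

The Elastic Stack (ELK) combines Elasticsearch with Kibana, Logstash and Beats for log analysis and real-time monitoring.

- Elasticsearch: stores, searches and analyzes data
- Kibana: data visualization
- Logstash, Beats: data collection (in practice Debezium, Flink, RisingWave… are commonly used)

## What Elasticsearch can do

- Real-time data analysis: index and analyze data as it arrives, processing large volumes of logs, metrics and events quickly
- Real-time monitoring: monitor system metrics, business data and user behavior
- E-commerce search: product search that helps users find items quickly
- Knowledge-base search: search over internal documents, knowledge bases and business data

## Elasticsearch indexes

- Traditional databases use a forward index: a B-tree is built on the id, so lookups by indexed id are fast, but searching a non-indexed field (such as a product description) requires a full table scan.
- An inverted index stores terms together with the ids of the documents that contain them: look up the term to get the document ids, then fetch the documents by id.
- Document: each record is a JSON document
- Term: a word produced by splitting a document according to its meaning
- Index: a collection of documents of the same type
- Mapping: the constraints (schema) on the documents in an index
- Field: a field in the JSON document
- DSL: JSON-style request statements used to perform CRUD

## Installing Elasticsearch, Kibana and IK with Docker

1. Create a user-defined network. Containers on the default bridge can only reach each other by IP; a user-defined network resolves container names automatically.

```shell
# List existing networks
docker network ls
# Create a user-defined network
docker network create pub-network
# Connect a container manually
docker network connect pub-network container_name_or_id
# Remove a network
docker network rm network_name_or_id
```

2. Create the directories.

```shell
mkdir -p /opt/es/data
mkdir -p /opt/es/plugins
mkdir -p /opt/es/logs
```

3. Grant permissions.

```shell
chmod -R 777 /opt/es/data
chmod -R 777 /opt/es/logs
```

### Installing the IK analyzer

Elasticsearch's built-in analyzers do not segment Chinese text by meaning, so the IK plugin is required: https://release.infinilabs.com/analysis-ik/stable/

Elasticsearch, Kibana and IK must all be the same version. After unpacking, upload the whole folder to /opt/es/plugins with a shell tool.

### Offline deployment of Elasticsearch and Kibana

Pull the images on a machine that has access:

```shell
docker pull elasticsearch:8.15.2
docker pull kibana:8.15.2
```

WSL is used here. Run `wsl` to enter WSL, then switch to the Windows D: drive:

```shell
cd /mnt/d
```

Save the images. The resulting files can be found on the Windows D: drive.

```shell
docker save -o elasticsearch.tar elasticsearch:8.15.2
docker save -o kibana.tar kibana:8.15.2
```

Upload the files with a shell tool such as WindTerm, then load the images:

```shell
docker load -i elasticsearch.tar
docker load -i kibana.tar
```

Check the images:

```shell
docker images
```

Then deploy either with plain commands or with docker-compose.

### Deploying Elasticsearch and Kibana with commands

Deploy Elasticsearch:

```shell
docker run -d \
  --name es \
  --network pub-network \
  --restart always \
  -p 9200:9200 \
  -p 9300:9300 \
  -e xpack.security.enabled=false \
  -e discovery.type=single-node \
  -e http.cors.enabled=true \
  -e "http.cors.allow-origin=*" \
  -e "ES_JAVA_OPTS=-Xms512m -Xmx512m" \
  -v /opt/es/data:/usr/share/elasticsearch/data \
  -v /opt/es/plugins:/usr/share/elasticsearch/plugins \
  -v /opt/es/logs:/usr/share/elasticsearch/logs \
  --privileged=true \
  elasticsearch:8.15.2
```

`xpack.security.enabled=false` disables password login. To use an enrollment token, add:

```shell
-e xpack.security.enrollment.enabled=true \
```

Docker deployments are mostly used for development, so keep it simple: tokens cause a lot of friction. Enable security in production; SSL requires certificates.

Deploy Kibana:

```shell
docker run -d \
  --name kibana \
  --network pub-network \
  --restart always \
  -p 5601:5601 \
  -e CSP_STRICT=false \
  -e I18N_LOCALE=zh-CN \
  kibana:8.15.2
```

If Kibana reports "Kibana server is not ready yet", the cause here was having configured `ELASTICSEARCH_HOSTS`.

### Deploying Elasticsearch and Kibana with docker-compose

```yaml
services:
  es:
    image: elasticsearch:8.15.2
    container_name: es
    network_mode: pub-network
    restart: always
    ports:
      # 9200: port exposed to clients
      - 9200:9200
      # 9300: inter-node communication
      - 9300:9300
    environment:
      # Disable password login
      xpack.security.enabled: "false"
      # Run as a single node
      discovery.type: single-node
      # Allow cross-origin requests
      http.cors.enabled: "true"
      # Allow all origins
      http.cors.allow-origin: "*"
      # Heap size
      ES_JAVA_OPTS: -Xms512m -Xmx512m
    volumes:
      # Data
      - /opt/es/data:/usr/share/elasticsearch/data
      # Plugins
      - /opt/es/plugins:/usr/share/elasticsearch/plugins
      # Logs
      - /opt/es/logs:/usr/share/elasticsearch/logs
    # Run the container in privileged mode
    privileged: true
  kibana:
    image: kibana:8.15.2
    container_name: kibana
    network_mode: pub-network
    restart: always
    ports:
      - 5601:5601
    environment:
      # Disable the strict security check
      CSP_STRICT: "false"
      # Use the Chinese locale
      I18N_LOCALE: zh-CN
networks:
  pub-network:
    name: pub-network
```

Deploy:

```shell
docker-compose up -d
```

Remove Elasticsearch and Kibana:

```shell
docker rm -f es
docker rm -f kibana
```

### Enabling security (optional)

Only needed if you want passwords and tokens. From ES 8 onward, Elasticsearch is accessed with a password and Kibana enrolls with a token.

```shell
# Generate a password
docker exec -it es /usr/share/elasticsearch/bin/elasticsearch-reset-password -u elastic
# Generate the Kibana enrollment token
docker exec -it es /usr/share/elasticsearch/bin/elasticsearch-create-enrollment-token -s kibana
```

### Accessing Elasticsearch and Kibana

- Elasticsearch: open 127.0.0.1:9200. If the page responds, the deployment succeeded.
- Kibana: open 127.0.0.1:5601. If the page loads, the deployment succeeded.
- Open http://127.0.0.1:9200/.kibana to check whether Elasticsearch has discovered Kibana.

In the Kibana setup screen, choose manual configuration and enter http://es:9200. SSL is not configured, so only http works, and `es` is the container name. Then read the verification code from the Kibana logs:

```shell
docker logs kibana
```

### Testing the IK analyzer

```
GET /_analyze
{
  "analyzer": "ik_max_word",
  "text": "好好学习天天向上"
}
```

If every single character comes back as its own term, the IK plugin was not installed correctly. Reinstall it and restart the container:

```shell
docker restart es
```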
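To make the inverted index from the concepts section concrete, here is a minimal in-memory sketch using only the Rust standard library. It is an illustration, not how Lucene or Elasticsearch store postings: `InvertedIndex`, `add_document` and `search` are hypothetical names, the whitespace-and-lowercase `analyze` step is only a stand-in for a real analyzer such as IK, and the query uses AND semantics for simplicity.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Toy inverted index: term -> sorted set of document ids (hypothetical type).
#[derive(Default)]
pub struct InvertedIndex {
    postings: BTreeMap<String, BTreeSet<u64>>,
}

impl InvertedIndex {
    /// Stand-in analyzer: split on whitespace and lowercase each token.
    fn analyze(text: &str) -> Vec<String> {
        text.split_whitespace().map(|t| t.to_lowercase()).collect()
    }

    /// Analyze the text and record the document id under every term.
    pub fn add_document(&mut self, id: u64, text: &str) {
        for term in Self::analyze(text) {
            self.postings.entry(term).or_default().insert(id);
        }
    }

    /// Return the ids of documents that contain every query term.
    pub fn search(&self, query: &str) -> Vec<u64> {
        let mut result: Option<BTreeSet<u64>> = None;
        for term in Self::analyze(query) {
            let ids = self.postings.get(&term).cloned().unwrap_or_default();
            result = Some(match result {
                None => ids,
                Some(acc) => acc.intersection(&ids).copied().collect(),
            });
        }
        // An empty query matches nothing.
        result.unwrap_or_default().into_iter().collect()
    }
}
```

The same analyzer runs at index time and at query time, which is why a broken analyzer (for example IK falling back to single characters) affects both writing and searching.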
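## How tokenization works

IK segments text against a dictionary. A new word that is not in the dictionary, such as "铝坨坨" (slang for an aluminium keyboard), is split character by character.

Analysis pipeline:

1. Character filters: preprocess the raw text, for example character mapping or stripping markup
2. Tokenizer: split the text stream into terms
3. Token filters: post-process the terms, for example lowercasing, stop-word removal, synonyms, pinyin

### Extending the dictionary

Add entries to `config/IKAnalyzer.cfg.xml` in the IK plugin:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- Custom extension dictionary -->
    <entry key="ext_dict">ext.dic</entry>
    <!-- Custom extension stop-word dictionary -->
    <entry key="ext_stopwords">stopword.dic</entry>
    <!-- Remote extension dictionary -->
    <!-- <entry key="remote_ext_dict">words_location</entry> -->
    <!-- Remote extension stop-word dictionary -->
    <!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>
```

Stop-word dictionary only (for example, for sensitive words):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- Custom extension stop-word dictionary -->
    <entry key="ext_stopwords">stopword.dic</entry>
</properties>
```

### Usage

In production you can use AI or NLP based segmentation instead. Here, edit the configuration to register the extension and stop-word dictionaries:

```shell
vim /opt/es/plugins/elasticsearch-analysis-ik-8.15.2/config/IKAnalyzer.cfg.xml
```

Create a new dictionary file:

```shell
touch /opt/es/plugins/elasticsearch-analysis-ik-8.15.2/config/ext.dic
```

Edit the extension dictionary and add the term `铝坨坨`:

```shell
vim /opt/es/plugins/elasticsearch-analysis-ik-8.15.2/config/ext.dic
```

Edit the stop-word dictionary and add `的`:

```shell
vim /opt/es/plugins/elasticsearch-analysis-ik-8.15.2/config/stopword.dic
```

Restart ES:

```shell
docker restart es
```

Test the analyzer:

```
GET /_analyze
{
  "analyzer": "ik_max_word",
  "text": "重重的铝坨坨"
}
```

The extension term "铝坨坨" is now recognized as one term, and the stop word "的" is no longer emitted.

What tokenization is used for:

- Segmenting documents when the inverted index is built
- Segmenting the user's input at search time

IK modes:

- `ik_smart`: smart segmentation, coarse-grained
- `ik_max_word`: finest segmentation, fine-grained

## DSL index operations

Only GET, PUT, DELETE and HEAD are allowed on an index.

A mapping constrains the documents in an index. Common properties:

- `type`: the field's data type
  - String: `text` (analyzed text) and `keyword` (an exact value that is not analyzed and only makes sense as a whole, such as a country or brand)
  - Numeric: `long`, `integer`, `short`, `byte`, `double`, `float`
  - Boolean: `boolean`
  - Date: `date`
  - Object: `object`
- `index`: whether to build an inverted index for the field, default `true`
- `analyzer`: which analyzer to use
- `properties`: sub-fields of the field

Create an index. Every write operation (for example POST to add, PUT to update) increments the version by 1. The index here is `mgr`:

```
PUT /mgr
{
  "mappings": {
    "properties": {
      "info": {
        "type": "text",
        "analyzer": "ik_smart"
      },
      "email": {
        "type": "keyword",
        "index": false
      },
      "name": {
        "type": "object",
        "properties": {
          "firstName": { "type": "keyword" },
          "lastName": { "type": "keyword" }
        }
      }
    }
  }
}
```

Query the index:

```
GET /mgr
```

Update the index. Existing fields cannot be modified, because the inverted index has already been built on them; only new fields can be added.

```
PUT /mgr/_mapping
{
  "properties": {
    "age": { "type": "integer" }
  }
}
```

Delete the index:

```
DELETE /mgr
```

## DSL document operations

Add a document (index `mgr` / `_doc` / document id):

```
POST /mgr/_doc/1
{
  "info": "铝坨坨键盘",
  "email": "11111@gmail.com",
  "name": {
    "firstName": "C",
    "lastName": "I"
  }
}
```

Query a document:

```
GET /mgr/_doc/1
```

Full update: the old document is deleted and the new one is added. If the id does not exist, this behaves like adding a document.

```
PUT /mgr/_doc/1
{
  "info": "铝坨坨键盘",
  "email": "222@gmail.com",
  "name": {
    "firstName": "C",
    "lastName": "I"
  }
}
```

Incremental (partial) update: use `_update` and put the changed fields under `doc`.

```
POST /mgr/_update/1
{
  "doc": {
    "email": "333@gmail.com"
  }
}
```

Delete a document:

```
DELETE /mgr/_doc/1
```

## Using Elasticsearch from the Rust client

Add to Cargo.toml:

```toml
elasticsearch = "8.15.0-alpha.1"
# Serialization and deserialization
serde = { version = "1.0.127", features = ["derive"] }
# JSON serialization
serde_json = "1.0.128"
tokio = { version = "1", features = ["full"] }
# Lazy global initialization
once_cell = "1.20.2"
```

Add the environment variable in `.env`:

```shell
# Select the active configuration file
RUN_MODE=development
```

Add the configuration in `settings\development.toml`:

```toml
debug = true
# Development profile
profile = "development"

[es]
host = "127.0.0.1"
# Required by EsConfig below; 9200 matches the Docker port mapping used earlier
port = 9200
```

Read the configuration in `config\es.rs`:

```rust
use serde::Deserialize;

#[derive(Debug, Deserialize, Clone)]
pub struct EsConfig {
    host: String,
    port: u16,
}

impl EsConfig {
    // Build the Elasticsearch connection URL
    pub fn get_url(&self) -> String {
        format!("http://{host}:{port}", host = self.host, port = self.port)
    }
}
```

Store it in `AppConfig`:

```rust
#[derive(Debug, Deserialize, Clone)]
pub struct AppConfig {
    pub es: EsConfig,
}

impl AppConfig {
    pub fn read(env_src: Environment) -> Result<Self, config::ConfigError> {
        // Locate the settings directory
        let config_dir = get_settings_dir()?;
        info!("config_dir: {:#?}", config_dir);
        // Resolve the active profile from RUN_MODE
        let run_mode = std::env::var("RUN_MODE")
            .map(|env| Profile::from_str(&env).map_err(|e| ConfigError::Message(e.to_string())))
            .unwrap_or_else(|_e| Ok(Profile::Dev))?;
        // Name of the profile-specific file
        let profile_filename = format!("{run_mode}.toml");
        let config = config::Config::builder()
            // Default configuration
            .add_source(config::File::from(config_dir.join("default.toml")))
            // Profile-specific configuration
            .add_source(config::File::from(config_dir.join(profile_filename)))
            // Environment variables
            .add_source(env_src)
            .build()?;
        info!("Successfully read config profile: {run_mode}.");
        // Deserialize
        config.try_deserialize()
    }
}

// Locate the settings directory
pub fn get_settings_dir() -> Result<std::path::PathBuf, ConfigError> {
    Ok(get_project_root()
        .map_err(|e| ConfigError::Message(e.to_string()))?
        .join("settings"))
}

#[cfg(test)]
mod tests {
    use crate::config::profile::Profile;
    use self::env::get_env_source;
    pub use super::*;

    #[test]
    pub fn test_profile_to_string() {
        // Select the dev profile
        let profile: Profile = Profile::try_from("development").unwrap();
        println!("profile: {:#?}", profile);
        assert_eq!(profile, Profile::Dev)
    }

    #[test]
    pub fn test_read_app_config_prefix() {
        // Read the configuration
        let config = AppConfig::read(get_env_source("APP")).unwrap();
        println!("config: {:#?}", config);
    }
}
```

Expose the configuration globally in `constant\mod.rs`:

```rust
// Environment variable prefix
pub const ENV_PREFIX: &str = "APP";
// Global configuration
pub static CONFIG: Lazy<crate::config::AppConfig> = Lazy::new(||
    crate::config::AppConfig::read(get_env_source(ENV_PREFIX)).unwrap()
);
```

Pass the configuration to clients in `client\builder.rs`:

```rust
use crate::config::AppConfig;

// Build a client from the application configuration
pub trait ClientBuilder: Sized {
    fn build_from_config(config: &AppConfig) -> Result<Self, InfraError>;
}
```

The ES client in `client\es.rs`. `InfraError` is a custom error type; replace it with whatever error you prefer, such as a standard library error.

```rust
// Type alias
pub type EsClient = Arc<Elasticsearch>;

// Build the client from the configuration
pub trait EsClientExt: Sized {
    fn build_from_config(config: &AppConfig) -> impl Future<Output = Result<Self, InfraError>>;
}

impl EsClientExt for EsClient {
    async fn build_from_config(config: &AppConfig) -> Result<Self, InfraError> {
        // 1. Create the client with single_node
        // let transport = Transport::single_node(&config.es.get_url()).unwrap();
        // let client = Elasticsearch::new(transport);
        // Ok(Arc::new(client))

        // 2. Create the client with the builder, which allows more configuration
        let url = config.es.get_url();
        let url_parsed = url
            .parse::<elasticsearch::http::Url>()
            .map_err(|_| InfraError::OtherError("url err".to_string()))?;
        let conn_pool = SingleNodeConnectionPool::new(url_parsed);
        let transport = TransportBuilder::new(conn_pool)
            .disable_proxy()
            .build()
            .map_err(|_| InfraError::OtherError("transport err".to_string()))?;
        let client = Elasticsearch::new(transport);
        Ok(Arc::new(client))
    }
}
```

Tests in `client\es.rs`. Every request defines its DSL in `body()` and is sent with `send()`.

```rust
#[cfg(test)]
mod tests {
    use elasticsearch::{ cat::CatIndicesParts, DeleteParts, IndexParts, UpdateParts };
    use serde_json::json;
    use super::*;
    use crate::constant::CONFIG;

    #[tokio::test]
    async fn test_add_document() {
        let client_result = EsClient::build_from_config(&CONFIG).await;
        assert!(client_result.is_ok());
        let client = client_result.unwrap();
        let response = client
            .index(IndexParts::IndexId("mgr", "1"))
            .body(json!({
                "id": 1,
                "user": "cci",
                "post_date": "2024-01-15T00:00:00Z",
                "message": "Trying out Elasticsearch, so far so good?"
            }))
            .send()
            .await;
        assert!(response.is_ok());
        let response = response.unwrap();
        assert!(response.status_code().is_success());
    }

    #[tokio::test]
    async fn test_get_indices() {
        let client_result = EsClient::build_from_config(&CONFIG).await;
        assert!(client_result.is_ok());
        let client = client_result.unwrap();
        let get_index_response = client
            .cat()
            .indices(CatIndicesParts::Index(&["*"]))
            .send()
            .await;
        assert!(get_index_response.is_ok());
    }

    #[tokio::test]
    async fn test_update_document() {
        let client_result = EsClient::build_from_config(&CONFIG).await;
        assert!(client_result.is_ok());
        let client = client_result.unwrap();
        let update_response = client
            .update(UpdateParts::IndexId("mgr", "1"))
            .body(json!({
                "doc": {
                    "message": "Updated message"
                }
            }))
            .send()
            .await;
        assert!(update_response.is_ok());
        let update_response = update_response.unwrap();
        assert!(update_response.status_code().is_success());
    }

    #[tokio::test]
    async fn test_delete_document() {
        let client_result = EsClient::build_from_config(&CONFIG).await;
        assert!(client_result.is_ok());
        let client = client_result.unwrap();
        let delete_response = client
            .delete(DeleteParts::IndexId("mgr", "1"))
            .send()
            .await;
        assert!(delete_response.is_ok());
        let delete_response = delete_response.unwrap();
        assert!(delete_response.status_code().is_success());
    }
}
```

Usage flow:

```rust
// 1. Create the client
let client_result = EsClient::build_from_config(&CONFIG).await;
assert!(client_result.is_ok());
let client = client_result.unwrap();
// 2. Define the DSL
let mut body: Vec<JsonBody<_>> = Vec::with_capacity(4);
// First document
body.push(json!({"index": {"_id": "1"}}).into());
body.push(json!({
    "id": 1,
    "user": "kimchy",
    "post_date": "2009-11-15T00:00:00Z",
    "message": "Trying out Elasticsearch, so far so good?"
}).into());
// Second document
body.push(json!({"index": {"_id": "2"}}).into());
body.push(json!({
    "id": 2,
    "user": "forloop",
    "post_date": "2020-01-08T00:00:00Z",
    "message": "Bulk indexing with the rust client, yeah!"
}).into());
// 3. Send the request
let response = client
    .bulk(BulkParts::Index("mgr"))
    .body(body)
    .send()
    .await
    .unwrap();
```

Project: https://github.com/VCCICCV/MGR

## Analyzing the data structure

Questions to settle when designing a mapping:

- Field name
- Data type
- Whether the field takes part in search, that is, whether an inverted index is built (`index: false` disables it; the default is `true`)
- Whether the field is analyzed: searchable text that should be segmented uses `text`; `keyword` and the other data types are not analyzed
- Which analyzer to use

Geo types:

- `geo_point`: a point defined by longitude and latitude, such as `[ 13.400544, 52.530286 ]`
- `geo_shape`: a geometry made of several points, such as the line `[[13.0, 53.0], [14.0, 52.0]]`

`copy_to`: combines several fields into one field for indexing.

## Index operations with the Rust client

Do not use `unwrap()` in production. The examples here build the request by hand and send it with `send()`.

Methods supported by the transport (`Method`):

- `Get`: fetch a resource
- `Put`: create or update a resource (full update)
- `Post`: create or update a resource (partial update)
- `Delete`: delete a resource
- `Head`: fetch headers only

Arguments to `send()`:

- `method`: required
- `path`: required
- `headers`: required
- `query_string`: optional
- `body`: optional
- `timeout`: optional

Create an index:

```rust
#[tokio::test]
async fn test_create_index() {
    // 1. Create the client
    let client_result = EsClient::build_from_config(&CONFIG).await;
    assert!(client_result.is_ok());
    let client = client_result.unwrap();
    // 2. Define the DSL
    let index_name = "mgr";
    let index_definition = json!({
        "mappings": {
            "properties": {
                "age": { "type": "integer" }
            }
        }
    });
    let body = Some(serde_json::to_vec(&index_definition).unwrap());
    let path = format!("/{}", index_name);
    let headers = HeaderMap::new();
    let query_string = None;
    let timeout = None;
    let method = Method::Put;
    // 3. Send the request
    let response = client
        .send::<Vec<u8>, ()>(method, &path, headers, query_string, body, timeout)
        .await;
    assert!(response.is_ok());
    let response = response.unwrap();
    assert!(response.status_code().is_success());
}
```

The same test, simplified:

```rust
#[tokio::test]
async fn test_create_index() {
    // 1. Create the client
    let client = EsClient::build_from_config(&CONFIG).await.unwrap();
    // 2. Define the DSL
    let index_definition = json!({
        "mappings": {
            "properties": {
                "age": { "type": "integer" }
            }
        }
    });
    // 3. Send the request
    let response = client
        .send::<Vec<u8>, ()>(
            Method::Put,
            "/mgr",
            HeaderMap::new(),
            None,
            Some(index_definition.to_string().into_bytes()),
            None,
        )
        .await;
    assert!(response.is_ok());
    let response = response.unwrap();
    assert!(response.status_code().is_success());
}
```

Check that the index exists by searching it:

```rust
#[tokio::test]
async fn test_query_index() {
    // 1. Create the client
    let client = EsClient::build_from_config(&CONFIG).await.unwrap();
    // 2. Define the query DSL
    let query = json!({
        "query": {
            "match_all": {}
        }
    });
    // 3. Send the request
    let response = client
        .send::<Vec<u8>, ()>(
            Method::Get,
            "/mgr/_search",
            HeaderMap::new(),
            None,
            Some(query.to_string().into_bytes()),
            None,
        )
        .await;
    assert!(response.is_ok());
    let response = response.unwrap();
    println!("{:?}", response);
    assert!(response.status_code().is_success());
}
```

Or query the index without a DSL body:

```rust
#[tokio::test]
async fn test_query_index2() {
    // 1. Create the client
    let client = EsClient::build_from_config(&CONFIG).await.unwrap();
    // 2. Send the request
    let response = client
        .send::<Vec<u8>, ()>(Method::Get, "/mgr", HeaderMap::new(), None, None, None)
        .await;
    assert!(response.is_ok());
    let response = response.unwrap();
    println!("{:?}", response);
    assert!(response.status_code().is_success());
}
```

Update the index:

```rust
#[tokio::test]
async fn test_update_index() {
    // 1. Create the client
    let client = EsClient::build_from_config(&CONFIG).await.unwrap();
    // 2. Define the mapping update
    let update_content = json!({
        "properties": {
            "age": { "type": "integer" }
        }
    });
    // 3. Send the request
    let response = client
        .send::<Vec<u8>, ()>(
            Method::Put,
            "/mgr/_mapping",
            HeaderMap::new(),
            None,
            Some(update_content.to_string().into_bytes()),
            None,
        )
        .await;
    assert!(response.is_ok());
    let response = response.unwrap();
    println!("{:?}", response);
    assert!(response.status_code().is_success());
}
```

Delete the index:

```rust
#[tokio::test]
async fn test_delete_index() {
    // 1. Create the client
    let client = EsClient::build_from_config(&CONFIG).await.unwrap();
    // 2. Send the request
    let response = client
        .send::<(), ()>(Method::Delete, "/mgr", HeaderMap::new(), None, None, None)
        .await;
    assert!(response.is_ok());
    let response = response.unwrap();
    assert!(response.status_code().is_success());
}
```

## Document operations with the Rust client

Add a document:

```rust
#[tokio::test]
async fn test_create_doc() {
    // 1. Create the client
    let client = EsClient::build_from_config(&CONFIG).await.unwrap();
    // 2. Define the document
    let doc_content = json!({
        "id": 1,
        "user": "kimchy",
        "post_date": "2009-11-15T00:00:00Z",
        "message": "Trying out Elasticsearch, so far so good?"
    });
    // 3. Send the request
    let response = client
        .send::<Vec<u8>, ()>(
            Method::Post,
            "/mgr/_doc/1",
            HeaderMap::new(),
            None,
            Some(doc_content.to_string().into_bytes()),
            None,
        )
        .await;
    assert!(response.is_ok());
    let response = response.unwrap();
    println!("{:?}", response);
    assert!(response.status_code().is_success());
}
```

Check that a document exists:

```rust
#[tokio::test]
async fn test_get_doc() {
    // 1. Create the client
    let client = EsClient::build_from_config(&CONFIG).await.unwrap();
    // 2. Send the request
    let response = client
        .send::<Vec<u8>, ()>(Method::Get, "/mgr/_doc/1", HeaderMap::new(), None, None, None)
        .await;
    assert!(response.is_ok());
    let response = response.unwrap();
    println!("{:?}", response);
    assert!(response.status_code().is_success());
}
```

Update a document:

```rust
#[tokio::test]
async fn test_update_doc() {
    // 1. Create the client
    let client = EsClient::build_from_config(&CONFIG).await.unwrap();
    // 2. Define the partial update
    let doc_content = json!({
        "doc": {
            "message": "Updated message"
        }
    });
    // 3. Send the request
    let response = client
        .send::<Vec<u8>, ()>(
            Method::Post,
            "/mgr/_update/1",
            HeaderMap::new(),
            None,
            Some(doc_content.to_string().into_bytes()),
            None,
        )
        .await;
    assert!(response.is_ok());
    let response = response.unwrap();
    println!("{:?}", response);
    assert!(response.status_code().is_success());
}
```

Delete a document:

```rust
#[tokio::test]
async fn test_delete_doc() {
    // 1. Create the client
    let client = EsClient::build_from_config(&CONFIG).await.unwrap();
    // 2. Send the request
    let response = client
        .send::<Vec<u8>, ()>(Method::Delete, "/mgr/_doc/1", HeaderMap::new(), None, None, None)
        .await;
    assert!(response.is_ok());
    let response = response.unwrap();
    println!("{:?}", response);
    assert!(response.status_code().is_success());
}
```

Bulk-add documents:

```rust
#[tokio::test]
async fn test_bulk_add_to_mgr() {
    // 1. Create the client
    let client_result = EsClient::build_from_config(&CONFIG).await;
    assert!(client_result.is_ok());
    let client = client_result.unwrap();
    // 2. Define the DSL
    let mut body: Vec<JsonBody<_>> = Vec::with_capacity(4);
    // First action and document
    body.push(json!({"index": {"_id": "1"}}).into());
    body.push(json!({
        "id": 1,
        "user": "kimchy",
        "post_date": "2009-11-15T00:00:00Z",
        "message": "Trying out Elasticsearch, so far so good?"
    }).into());
    // Second action and document
    body.push(json!({"index": {"_id": "2"}}).into());
    body.push(json!({
        "id": 2,
        "user": "forloop",
        "post_date": "2020-01-08T00:00:00Z",
        "message": "Bulk indexing with the rust client, yeah!"
    }).into());
    // 3. Send the request
    let response = client
        .bulk(BulkParts::Index("mgr"))
        .body(body)
        .send()
        .await
        .unwrap();
    assert!(response.status_code().is_success());
}
```

## Searching with the Rust client

The examples here call the API with the query in the request body.

Query types:

- Match all: returns every document
- Full-text queries: analyze the input and look the terms up in the inverted index
  - `match`
  - `multi_match`
- Exact queries: match exact values such as integers, keywords and dates
  - `ids`
  - `range`: match by value range
  - `term`: match by exact term value
- Geo queries: match by longitude and latitude
  - `geo_distance`: all documents whose `geo_point` lies within a given distance
  - `geo_bounding_box`: all documents whose `geo_point` falls inside a rectangle
- Compound queries: combine the conditions above

Match all (10 hits by default):

```rust
#[tokio::test]
async fn test_search_match_all() {
    // 1. Create the client
    let client = EsClient::build_from_config(&CONFIG).await.unwrap();
    // 2. Run the search
    let response = client
        .search(SearchParts::Index(&["mgr"]))
        .from(0)
        .size(5)
        .body(json!({
            "query": {
                "match_all": {}
            }
        }))
        .send()
        .await
        .unwrap();
    // 3. Parse the response
    let response_body = response.json::<Value>().await.unwrap();
    // Time taken
    let took = response_body["took"].as_i64().unwrap();
    println!("took: {}ms", took);
    // Hits
    for hit in response_body["hits"]["hits"].as_array().unwrap() {
        println!("{:?}", hit["_source"]);
    }
}
```

Equivalent to:

```
GET /mgr/_search
{
  "query": {
    "match_all": {}
  }
}
```

Full-text search. `message` is a field in the document.

```rust
#[tokio::test]
async fn test_search_match() {
    // 1. Create the client
    let client = EsClient::build_from_config(&CONFIG).await.unwrap();
    // 2. Run the search
    let response = client
        .search(SearchParts::Index(&["mgr"]))
        .from(0)
        .size(5)
        .body(json!({
            "query": {
                "match": {
                    "message": "good"
                }
            }
        }))
        .send()
        .await
        .unwrap();
    // 3. Parse the response
    let response_body = response.json::<Value>().await.unwrap();
    // Time taken
    let took = response_body["took"].as_i64().unwrap();
    println!("took: {}ms", took);
    // Hits
    for hit in response_body["hits"]["hits"].as_array().unwrap() {
        println!("{:?}", hit["_source"]);
    }
}
```

Equivalent to:

```
GET /mgr/_search
{
  "query": {
    "match": {
      "message": "good"
    }
  }
}
```

Multi-field query. Querying several fields is slow, so the fields are usually combined with `copy_to` when the index is created.

```rust
#[tokio::test]
async fn test_search_multi_match() {
    // 1. Create the client
    let client = EsClient::build_from_config(&CONFIG).await.unwrap();
    // 2. Run the search
    let response = client
        .search(SearchParts::Index(&["mgr"]))
        .from(0)
        .size(5)
        .body(json!({
            "query": {
                "multi_match": {
                    "query": "good",
                    "fields": ["message", "user"]
                }
            }
        }))
        .send()
        .await
        .unwrap();
    // 3. Parse the response
    let response_body = response.json::<Value>().await.unwrap();
    // Time taken
    let took = response_body["took"].as_i64().unwrap();
    println!("took: {}ms", took);
    // Hits
    for hit in response_body["hits"]["hits"].as_array().unwrap() {
        println!("{:?}", hit["_source"]);
    }
}
```

Equivalent to:

```
GET /mgr/_search
{
  "query": {
    "multi_match": {
      "query": "good",
      "fields": ["message", "user"]
    }
  }
}
```

Range query (`range`):

- `gte`: greater than or equal
- `lte`: less than or equal
- `gt`: greater than
- `lt`: less than

```rust
#[tokio::test]
async fn test_search_range() {
    // 1. Create the client
    let client = EsClient::build_from_config(&CONFIG).await.unwrap();
    // 2. Run the search
    let response = client
        .search(SearchParts::Index(&["mgr"]))
        .from(0)
        .size(5)
        .body(json!({
            "query": {
                "range": {
                    "id": {
                        "gte": 1,
                        "lte": 1
                    }
                }
            }
        }))
        .send()
        .await
        .unwrap();
    // 3. Parse the response
    let response_body = response.json::<Value>().await.unwrap();
    // Time taken
    let took = response_body["took"].as_i64().unwrap();
    println!("took: {}ms", took);
    // Hits
    for hit in response_body["hits"]["hits"].as_array().unwrap() {
        println!("{:?}", hit["_source"]);
    }
}
```

Equivalent to:

```
GET /mgr/_search
{
  "query": {
    "range": {
      "id": {
        "gte": 1,
        "lte": 1
      }
    }
  }
}
```

Exact term query (`term`):

```rust
#[tokio::test]
async fn test_search_term() {
    // 1. Create the client
    let client = EsClient::build_from_config(&CONFIG).await.unwrap();
    // 2. Run the search
    let response = client
        .search(SearchParts::Index(&["mgr"]))
        .from(0)
        .size(5)
        .body(json!({
            "query": {
                "term": {
                    "user": "kimchy"
                }
            }
        }))
        .send()
        .await
        .unwrap();
    // 3. Parse the response
    let response_body = response.json::<Value>().await.unwrap();
    // Time taken
    let took = response_body["took"].as_i64().unwrap();
    println!("took: {}ms", took);
    // Hits
    for hit in response_body["hits"]["hits"].as_array().unwrap() {
        println!("{:?}", hit["_source"]);
    }
}
```

Equivalent to:

```
GET /mgr/_search
{
  "query": {
    "term": {
      "user": "kimchy"
    }
  }
}
```

Query by geo distance:

```
GET /mgr/_search
{
  "query": {
    "geo_distance": {
      "distance": "100km",
      "location": "31.04, 45.12"
    }
  }
}
```

Query by rectangle, given the top-left and bottom-right coordinates. `geo` is a field in the document.

```
GET /mgr/_search
{
  "query": {
    "geo_bounding_box": {
      "geo": {
        "top_left": {
          "lon": 124.45,
          "lat": 32.11
        },
        "bottom_right": {
          "lon": 125.12,
          "lat": 30.21
        }
      }
    }
  }
}
```

## Compound queries

At query time every document gets a relevance score (`_score`) for the search terms, and results are returned in descending score order.

How relevance is computed:

- TF-IDF (before ES 5.0)
  - TF (term frequency) = occurrences of the term in the document / total number of terms in the document
  - IDF (inverse document frequency) = log(total number of documents / number of documents containing the term)
  - The score multiplies TF by IDF for each query term and sums the products:

$$\text{score} = \sum_{i=1}^{n} \mathrm{TF}_i \times \mathrm{IDF}_i$$

- BM25 (the default from ES 5.0 onward): takes TF, IDF, document length and other factors into account, which balances relevance between long and short documents.

`function_score` modifies relevance by specifying which documents to adjust and which scoring function to apply:

```
GET /mgr/_search
{
  "query": {
    "function_score": {
      "query": {
        "match": { // the base query
          "message": "good"
        }
      },
      "functions": [ // scoring functions
        {
          "filter": { // only documents matching the filter are re-scored
            "term": { // exact term query
              "id": 1
            }
          },
          "weight": 3 // weight function
        }
      ],
      // combine mode: multiply
      "boost_mode": "multiply"
    }
  }
}
```

`weight` uses a constant. Other scoring functions:

- `field_value_factor`: use the value of a document field as the function result
- `random_score`: generate a random value
- `script_score`: a custom formula

`boost_mode` is the combine mode. `multiply` multiplies by the original `_score`. Other options:

- `replace`: replace the original `_score`
- `sum`: add
- `avg`: average
- `min`: take the minimum
- `max`: take the maximum

The Rust equivalent:

```rust
#[tokio::test]
async fn test_function_score_query() {
    // 1. Create the client
    let client = EsClient::build_from_config(&CONFIG).await.unwrap();
    // 2. Run the search
    let response = client
        .search(SearchParts::Index(&["mgr"]))
        .from(0)
        .size(5)
        .body(json!({
            "query": {
                "function_score": {
                    "query": {
                        "match": { // the base query
                            "message": "good"
                        }
                    },
                    "functions": [ // scoring functions
                        {
                            "filter": { // only documents matching the filter are re-scored
                                "term": { // exact term query
                                    "id": 1
                                }
                            },
                            "weight": 3 // weight function
                        }
                    ],
                    // combine mode: multiply
                    "boost_mode": "multiply"
                }
            }
        }))
        .send()
        .await
        .unwrap();
    // 3. Parse the response
    let response_body = response.json::<Value>().await.unwrap();
    // Time taken
    let took = response_body["took"].as_i64().unwrap();
    println!("took: {}ms", took);
    // Hits
    for hit in response_body["hits"]["hits"].as_array().unwrap() {
        println!("{:?}", hit["_source"]);
    }
}
```

### Boolean query

A boolean query combines one or more sub-queries:

- `must`: every sub-query must match, like AND
- `should`: sub-queries match optionally, like OR
- `must_not`: must not match, does not contribute to scoring, like NOT
- `filter`: must match, does not contribute to scoring

Find documents whose `message` contains "rust" and whose `post_date` is not earlier than 2020-01-01:

```
GET /mgr/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match_phrase": {
            "message": "rust"
          }
        }
      ],
      "must_not": [
        {
          "range": {
            "post_date": {
              "lt": "2020-01-01T00:00:00Z"
            }
          }
        }
      ]
    }
  }
}
```

## Processing search results

### Sorting

```
GET /mgr/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "id": "desc" // asc: ascending, desc: descending
    }
  ]
}
```

Sorting by geo distance:

```
GET /mgr/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "_geo_distance": {
        "FIELD": {
          "lat": 40, // latitude
          "lon": -70 // longitude
        },
        "order": "asc", // sort order
        "unit": "km" // unit
      }
    }
  ]
}
```

### Pagination

1. `from` + `size` pagination (10 hits by default):

```
GET /mgr/_search
{
  "query": {
    "match_all": {}
  },
  "from": 1, // offset of the first hit
  "size": 10, // number of hits to return
  "sort": [
    {
      "id": "desc" // asc: ascending, desc: descending
    }
  ]
}
```

The deep paging problem: ES is usually deployed as a distributed cluster. For `from=990, size=10`:

1. Each shard returns its top 1000 documents.
2. The results from all shards are merged in memory and re-sorted to pick the overall top 1000.
3. The hits for `from=990, size=10` are taken from those 1000.

The deeper the page, or the larger `from + size`, the more memory and CPU this costs, so ES caps `from + size` at 10000.

Solutions for deep paging:

2. `search_after`: results must be sorted, and each page starts from the sort values of the last hit of the previous page. It can only page forward.
3. `scroll`: takes a snapshot of the sorted results and keeps it in memory. It is memory-hungry and no longer recommended officially.

### Highlighting

Highlights the keyword in results, for example when searching for "keyboard". `highlight` names the fields to highlight. By default a field is only highlighted if it is also the field being searched.

```
GET /mgr/_search
{
  "query": {
    "match": {
      "message": "rust" // documents whose message contains rust
    }
  },
  "highlight": {
    "fields": {
      "message": { // field to highlight
        "require_field_match": false // the searched field and the highlighted field may differ
      }
    }
  }
}
```

## Aggregations

Aggregations compute statistics, analysis and calculations over documents. Categories:

- Bucket: groups documents. https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket.html
  - Terms aggregation: group by field value or term
  - Date histogram: group by date interval, such as one week per bucket
- Metric: computes values such as maximum, minimum and average. https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics.html
  - Avg: average
  - Max: maximum
  - Min: minimum
  - Sum: sum
  - Stats: Max, Min, Avg, Sum and so on in one pass
- Pipeline: aggregates over the results of other aggregations. https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline.html

### Bucket

A bucket aggregation counts the documents in each bucket (`_count`) and sorts by that count in descending order by default.

```
GET /mgr/_search
{
  "size": 0, // return no documents, only aggregation results
  "aggs": { // aggregations
    "idAgg": { // aggregation name
      "terms": { // terms aggregation
        "field": "id", // field to group by
        "order": {
          "_count": "asc" // sort ascending by count
        }
      }
    }
  }
}
```

### Metric

```
GET /mgr/_search
{
  "size": 0, // return no documents, only aggregation results
  "aggs": { // aggregations
    "idAgg": { // aggregation name
      "terms": { // terms aggregation
        "field": "id", // field to group by
        "size": 20
      },
      "aggs": { // sub-aggregation
        "score_stats": { // aggregation name
          "max": { // aggregation type: min, max, avg, ...
            "field": "score" // field to aggregate
          }
        }
      }
    }
  }
}
```

## Autocomplete

### Pinyin completion

To complete by pinyin, download and unpack the pinyin analyzer, upload it to /opt/es/plugins, and restart ES: https://github.com/infinilabs/analysis-pinyin/releases

- The field to complete on must be of type `completion`.
- Pinyin analysis needs a custom analyzer.

Create the index with a `completion` field and an analyzer that tokenizes first and then filters the terms. Without a custom analyzer, every Chinese character is converted to pinyin on its own, so the text is segmented into terms first and the pinyin filter is applied afterwards. See the GitHub repository for the other settings.

```
PUT /test
{
  "settings": {
    "analysis": {
      "analyzer": { // analyzers
        "my_analyzer": { // analyzer name
          "tokenizer": "ik_max_word", // tokenizer
          "filter": [
            "lowercase", // lowercase
            "stop", // remove stop words
            "py" // then convert the terms to pinyin
          ]
        }
      },
      "filter": { // custom token filters
        "py": { // filter name
          "type": "pinyin", // filter type
          "keep_full_pinyin": false, // keep the full pinyin of each character
          "keep_joined_full_pinyin": true, // keep the joined full pinyin
          "keep_original": true, // keep the original text
          "limit_first_letter_length": 16, // limit the first-letter result to 16 characters
          "remove_duplicated_term": true, // remove duplicate terms
          "none_chinese_pinyin_tokenize": false // do not split non-Chinese text into pinyin
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "user": {
        "type": "completion",
        "analyzer": "my_analyzer" // use the custom analyzer defined above
      }
    }
  }
}
```

### Without pinyin

Create the index with a `completion` field:

```
PUT /test
{
  "mappings": {
    "properties": {
      "user": {
        "type": "completion"
      }
    }
  }
}
```

Add a document:

```
POST /test/_doc/1
{
  "id": 1,
  "message": "Trying out Elasticsearch, so far so good?",
  "post_date": "2009-11-15T00:00:00Z",
  "user": "kimchy"
}
```

Query completions by prefix:

```
GET /test/_search
{
  "suggest": {
    "YOUR_SUGGESTION": { // name of the suggestion
      "text": "k", // prefix typed so far
      "completion": { // suggester type
        "field": "user", // field to complete on
        "skip_duplicates": true, // skip duplicate suggestions
        "size": 10 // return the first 10 results
      }
    }
  }
}
```

All code: https://github.com/VCCICCV/MGR/blob/main/auth/infrastructure/src/client/es.rs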
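To build some intuition for what a prefix completion query returns, the sketch below implements a toy suggester with only the Rust standard library. It is an assumption-laden illustration: `Suggester` and `suggest` are hypothetical names, the sorted-set range scan is a simplification rather than the data structure Elasticsearch uses internally, and storing terms in a set is only meant to mirror the effect of `skip_duplicates`.

```rust
use std::collections::BTreeSet;

/// Toy prefix suggester over a sorted set of lowercased terms (hypothetical type).
pub struct Suggester {
    terms: BTreeSet<String>,
}

impl Suggester {
    /// Lowercase and deduplicate the input terms.
    pub fn new<I, S>(terms: I) -> Self
    where
        I: IntoIterator<Item = S>,
        S: AsRef<str>,
    {
        Self {
            terms: terms.into_iter().map(|t| t.as_ref().to_lowercase()).collect(),
        }
    }

    /// Return up to `size` terms that start with `prefix`, in sorted order.
    pub fn suggest(&self, prefix: &str, size: usize) -> Vec<String> {
        let prefix = prefix.to_lowercase();
        self.terms
            // Jump to the first term that is >= the prefix...
            .range(prefix.clone()..)
            // ...and stop as soon as a term no longer shares the prefix.
            .take_while(|t| t.starts_with(&prefix))
            .take(size)
            .cloned()
            .collect()
    }
}
```

Because the terms are sorted, all matches for a prefix are contiguous, so the lookup touches only the matching range instead of scanning every term.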
