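Research goals
- Can Elastic do what Solr's Edismax does?
- Scenario to verify: whether a List field is scored the same way as a String field
- How the weights (scores) are distributed
Docker single-node run command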
docker run -d --name elasticsearch -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" -e "xpack.security.enabled=false" -v es_data:/usr/share/elasticsearch/data elasticsearch:8.17.0
Docker cluster configuration (docker-compose)
version: '3'
services:
  elasticsearch-node1:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.17.0
    environment:
      # node.name must match an entry in cluster.initial_master_nodes
      - node.name=elasticsearch-node1
      - discovery.seed_hosts=elasticsearch-node1,elasticsearch-node2,elasticsearch-node3
      - cluster.initial_master_nodes=elasticsearch-node1,elasticsearch-node2,elasticsearch-node3
      - xpack.security.enabled=false
    volumes:
      - es_data1:/usr/share/elasticsearch/data
    networks:
      - elasticsearch-net
    ports:
      - "9200:9200"
      - "9300:9300"
  elasticsearch-node2:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.17.0
    environment:
      - node.name=elasticsearch-node2
      - discovery.seed_hosts=elasticsearch-node1,elasticsearch-node2,elasticsearch-node3
      - cluster.initial_master_nodes=elasticsearch-node1,elasticsearch-node2,elasticsearch-node3
      - xpack.security.enabled=false
    volumes:
      - es_data2:/usr/share/elasticsearch/data
    networks:
      - elasticsearch-net
    ports:
      - "9201:9200"
      - "9301:9300"
  elasticsearch-node3:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.17.0
    environment:
      - node.name=elasticsearch-node3
      - discovery.seed_hosts=elasticsearch-node1,elasticsearch-node2,elasticsearch-node3
      - cluster.initial_master_nodes=elasticsearch-node1,elasticsearch-node2,elasticsearch-node3
      - xpack.security.enabled=false
    volumes:
      - es_data3:/usr/share/elasticsearch/data
    networks:
      - elasticsearch-net
    ports:
      - "9202:9200"
      - "9302:9300"
networks:
  elasticsearch-net:
    driver: bridge
volumes:
  es_data1:
  es_data2:
  es_data3:
Start the cluster:
docker-compose up -d
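Once the containers are up, it can help to confirm that all three nodes actually joined one cluster. The following is a minimal sketch, assuming the port mapping from the compose file (node 1 on localhost:9200) and security disabled; the helper names `cluster_is_ready` and `fetch_health` are made up for illustration. It relies only on the `status` and `number_of_nodes` fields of the `_cluster/health` response.

```python
import json
import urllib.request


def cluster_is_ready(health: dict, expected_nodes: int = 3) -> bool:
    """Return True when every expected node has joined and the status is not red."""
    return (
        health.get("number_of_nodes") == expected_nodes
        and health.get("status") in ("green", "yellow")
    )


def fetch_health(base_url: str = "http://localhost:9200") -> dict:
    """GET /_cluster/health from a running node (the cluster must already be up)."""
    with urllib.request.urlopen(f"{base_url}/_cluster/health", timeout=5) as resp:
        return json.load(resp)
```

With the cluster running, `cluster_is_ready(fetch_health())` should return True once all three nodes have formed the cluster; a node count of 1 usually means the nodes did not discover each other.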
C# NEST example
using Nest;
using System;
using System.Collections.Generic;

namespace ElasticSearch
{
    // Document type used by the examples. This class is not part of the original
    // listing; its shape is inferred from how the properties are used below.
    public class MyDocument
    {
        public int Id { get; set; }
        public string Name { get; set; }
        public List<string> Tags { get; set; }
        public DateTime CreatedAt { get; set; }
    }

    class Program
    {
        static void Main(string[] args)
        {
            // Set the Elasticsearch URL and the default index name
            var settings = new ConnectionSettings(new Uri("http://localhost:9200"))
                .DefaultIndex("my-index");

            // Create the Elastic client
            var client = new ElasticClient(settings);

            // Test the connection
            var pingResponse = client.Ping();
            if (pingResponse.IsValid)
            {
                Console.WriteLine("Connected to Elasticsearch!");
            }
            else
            {
                Console.WriteLine("Connection failed: " + pingResponse.DebugInformation);
            }

            // Note: a newly indexed document only becomes searchable after the next
            // index refresh (1 second by default), so the search below may not see
            // the document on the very first run.
            CreateOneData(client);
            QueryStringSearch(client);
        }

        // QueryString search example
        private static void QueryStringSearch(ElasticClient client)
        {
            var response = client.Search<MyDocument>(s => s
                .Query(q => q
                    .QueryString(qs => qs
                        .Query("(文件)^1")
                        .Fields(f => f
                            .Field(p => p.Tags, 2.0) // search Tags with a boost of 2
                        )
                        .DefaultOperator(Operator.And)
                    )
                )
            );

            // Print the matching documents
            foreach (var doc in response.Documents)
            {
                Console.WriteLine($"Id: {doc.Id}, Name: {doc.Name}, Tag: {String.Join(",", doc.Tags)}");
            }

            // Print the relevance score of each hit
            foreach (var hit in response.Hits)
            {
                Console.WriteLine($"Id: {hit.Id}, Score: {hit.Score}");
            }
        }

        // Combined query example (OR)
        private static void QueryStringSearch2(ElasticClient client)
        {
            // The explicit OR in the query string takes precedence over the default operator
            var response = client.Search<MyDocument>(s => s
                .Query(q => q
                    .QueryString(qs => qs
                        .Query("(新東西)^1 OR (測試)^1")
                        .Fields(f => f
                            .Field(p => p.Tags, 2.0)
                        )
                        .DefaultOperator(Operator.And)
                    )
                )
            );
        }

        // MultiMatch search (similar to Solr Edismax)
        private static void SpecialSearch(ElasticClient client)
        {
            var response = client.Search<MyDocument>(s => s
                .Query(q => q
                    .MultiMatch(m => m
                        .Query("測試")
                        .Fields(f => f
                            .Field(p => p.Name, 2.0)
                        )
                        .Type(TextQueryType.MostFields)
                        .MinimumShouldMatch("20%") // similar to Solr's mm parameter
                    )
                )
            );
        }

        // Index one document
        private static void CreateOneData(ElasticClient client)
        {
            var document = new MyDocument
            {
                Id = 10,
                Name = "新的資料文件",
                // "文件" is repeated on purpose to test how term frequency affects the score
                Tags = new List<string> { "文件", "文件", "文件", "文件", "新東西" },
                CreatedAt = DateTime.UtcNow
            };

            var indexResponse = client.IndexDocument(document);
            if (indexResponse.IsValid)
            {
                Console.WriteLine("Document indexed successfully!");
            }
        }
    }
}
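For comparison with Solr's request parameters, it can be useful to see roughly what request bodies the NEST calls above correspond to in Query DSL. The sketch below builds them as plain Python dictionaries. The field names `tags` and `name` assume NEST's default camel-case field name inference, and the helper names are made up for illustration; the exact JSON NEST emits may differ in detail.

```python
import json


def query_string_body(query: str, field: str, boost: float, operator: str = "AND") -> dict:
    """query_string query on one boosted field (cf. QueryStringSearch)."""
    return {
        "query": {
            "query_string": {
                "query": query,
                "fields": [f"{field}^{boost}"],  # "field^boost" syntax
                "default_operator": operator,
            }
        }
    }


def multi_match_body(query: str, field: str, boost: float, mm: str = "20%") -> dict:
    """multi_match query with most_fields and minimum_should_match (cf. SpecialSearch)."""
    return {
        "query": {
            "multi_match": {
                "query": query,
                "fields": [f"{field}^{boost}"],
                "type": "most_fields",
                "minimum_should_match": mm,  # plays the role of Solr's mm
            }
        }
    }


print(json.dumps(query_string_body("(文件)^1", "tags", 2.0), ensure_ascii=False, indent=2))
print(json.dumps(multi_match_body("測試", "name", 2.0), ensure_ascii=False, indent=2))
```

Either body can be sent as the JSON payload of a `POST /my-index/_search` request.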
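Elasticsearch query syntax summary
Query DSL
- match: full-text match on a field; the query text is analyzed (tokenized) automatically
- term: exact match; the query text is not analyzed; typically used for structured data
- bool: compound query that combines must (required), should (optional), and must_not (excluded) clauses
- range: range query for numbers and dates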
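The four query types above are usually combined inside one bool query. Here is a minimal sketch of such a request body as a Python dictionary. The field names `name`, `tags.keyword`, and `createdAt` are assumptions based on the sample document and on the default dynamic mapping (which adds a `.keyword` subfield to text fields); `bool_query` is a hypothetical helper.

```python
import json


def bool_query(text: str, tag: str, since: str) -> dict:
    """Combine match, term, and range clauses in a single bool query."""
    return {
        "query": {
            "bool": {
                # must: analyzed full-text match, contributes to the score
                "must": [{"match": {"name": text}}],
                # should: optional exact match that raises the score when present
                "should": [{"term": {"tags.keyword": tag}}],
                # must_not: exclude documents created before `since`
                "must_not": [{"range": {"createdAt": {"lt": since}}}],
            }
        }
    }


print(json.dumps(bool_query("文件", "新東西", "2024-01-01"), ensure_ascii=False, indent=2))
```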
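Filter parameters
- filter: filters data efficiently without affecting the relevance score, and the results can be cached
- post_filter: filters the hits after the aggregations have been computed; commonly used to narrow the displayed results while the aggregation counts still cover the unfiltered query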
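The difference between the two is easiest to see in one request that uses both. In the sketch below, the `filter` clause restricts which documents are scored and aggregated, while `post_filter` only narrows the returned hits, so the `by_tag` counts still cover every tag. Field names are assumptions based on the sample document, and `faceted_search` is a hypothetical helper.

```python
import json


def faceted_search(text: str, selected_tag: str) -> dict:
    """Search with a non-scoring filter, a terms aggregation, and a post_filter."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"name": text}}],
                # filter context: no effect on _score, eligible for caching
                "filter": [{"range": {"createdAt": {"gte": "now-30d/d"}}}],
            }
        },
        # computed over everything the query matched
        "aggs": {"by_tag": {"terms": {"field": "tags.keyword"}}},
        # applied to the hits only, after the aggregation is computed
        "post_filter": {"term": {"tags.keyword": selected_tag}},
    }


print(json.dumps(faceted_search("文件", "新東西"), ensure_ascii=False, indent=2))
```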
Advanced parameters
- Aggregations: aggregation queries
- Highlight: highlighting of matched text
- Script: scripting
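As a rough illustration, the three features can appear side by side in one search body. This sketch assumes the sample document's fields (`name`, `createdAt`, `id`) with default dynamic mappings; the aggregation name `per_month`, the script field name `id_doubled`, and the helper `advanced_body` are made up for the example.

```python
import json


def advanced_body(text: str) -> dict:
    """One request using an aggregation, highlighting, and a Painless script field."""
    return {
        "query": {"match": {"name": text}},
        # Aggregations: count documents per calendar month
        "aggs": {
            "per_month": {
                "date_histogram": {"field": "createdAt", "calendar_interval": "month"}
            }
        },
        # Highlight: return the matched fragments of `name`
        "highlight": {"fields": {"name": {}}},
        # Script: compute an extra value per hit
        "script_fields": {
            "id_doubled": {
                "script": {"lang": "painless", "source": "doc['id'].value * 2"}
            }
        },
    }


print(json.dumps(advanced_body("文件"), ensure_ascii=False, indent=2))
```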
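Performance-related parameters
- timeout: query timeout
- terminate_after: maximum number of documents to collect per shard before the query terminates early
- track_total_hits: controls whether the total hit count is tracked, which reduces overhead when it is not needed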
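A short sketch of how these three parameters sit at the top level of a search body, next to the query. The default values chosen here (2 seconds, 1000 documents) are arbitrary illustrations, not recommendations, and `bounded_search` is a hypothetical helper.

```python
import json


def bounded_search(text: str, timeout: str = "2s", terminate_after: int = 1000,
                   track_total_hits: bool = False) -> dict:
    """Search body that caps the work Elasticsearch does for one request."""
    return {
        "timeout": timeout,                    # give up collecting after this long
        "terminate_after": terminate_after,    # per-shard cap on collected documents
        "track_total_hits": track_total_hits,  # skip the exact total count
        "query": {"match": {"name": text}},
    }


print(json.dumps(bounded_search("文件"), ensure_ascii=False, indent=2))
```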
Test results
Querying a plain string field only


Using a List structure


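Conclusions
- String case: the text is fully tokenized, so partial matches still contribute to the score, and the field length also affects the score
- List case: an element has to match exactly; the more often it occurs, the higher the score, but the growth is less than proportional
- Underlying algorithm: still TF-IDF based (Elasticsearch's default similarity is BM25, a TF-IDF variant). Each term appears to start from a default IDF that is then weighed against the IDF over all documents, and TF is not a plain occurrence count (see the source code for details)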
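The observations above (more occurrences help but not proportionally, and longer fields score lower) are what the standard BM25 formula predicts. The sketch below is a simplified BM25 calculation with the usual defaults k1 = 1.2 and b = 0.75, not Lucene's actual code: it ignores Lucene's lossy field-length encoding and any constant (k1 + 1) factor, which only rescales all scores uniformly. The document counts used are invented for illustration.

```python
import math


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """Inverse document frequency: rarer terms weigh more."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_tf(freq: float, field_len: float, avg_field_len: float,
            k1: float = 1.2, b: float = 0.75) -> float:
    """Saturating term frequency, normalized by field length."""
    norm = k1 * (1 - b + b * field_len / avg_field_len)
    return freq / (freq + norm)


def bm25_score(freq: float, field_len: float, avg_field_len: float,
               doc_count: int, doc_freq: int, boost: float = 1.0) -> float:
    """score = boost * idf * tf"""
    return boost * bm25_idf(doc_count, doc_freq) * bm25_tf(freq, field_len, avg_field_len)


# Same field length, increasing number of occurrences of the term
for freq in (1, 2, 4):
    score = bm25_score(freq, field_len=5, avg_field_len=5, doc_count=10, doc_freq=1)
    print(freq, round(score, 3))
```

Because `freq` appears in both the numerator and the denominator, the term-frequency part approaches 1 as `freq` grows, so repeating a tag four times yields a higher score than one occurrence but well under four times as high.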