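The 2024 618 battle

The specific problem

64 大總管 job machines: one second to process one queue item
128 大總管 job machines: two seconds to process one queue item
Doubling the number of machines did not raise throughput at all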
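
A quick back-of-the-envelope check makes the symptom concrete. The sketch below assumes each machine works on one queue item at a time, which the numbers above do not state explicitly; `ThroughputCheck` and `itemsPerSecond` are names made up for illustration.

```java
class ThroughputCheck {
    /** Items finished per second by the whole fleet, assuming one item in flight per machine. */
    public static double itemsPerSecond(int machines, double secondsPerItem) {
        return machines / secondsPerItem;
    }

    public static void main(String[] args) {
        // 64 machines at 1 s per item versus 128 machines at 2 s per item
        System.out.println("64 machines:  " + itemsPerSecond(64, 1.0) + " items/s");
        System.out.println("128 machines: " + itemsPerSecond(128, 2.0) + " items/s");
    }
}
```

Under that assumption both configurations finish the same number of items per second, which points at a shared bottleneck behind the job machines rather than at the machines themselves.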

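What we found

Once the number of machines grows, both queries and updates are affected, and updates are hit harder.
Queries themselves are barely affected; the slowdown comes from the commit that follows each update.
To guarantee that no data is ever lost, Solr does not allow queries, updates, or deletes until the shard is completely up. Adding is still allowed, though, and newly added documents can also be deleted; once the other machines come back, they sync automatically.
When a shard is not fully ready (for example, it should have 3 replicas but only 1 is active):
1. New documents (doc) can be inserted
2. Queries (query) are not allowed
3. Deletes (delete) are not allowed
4. Updates to existing documents (update) are not allowed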
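
The rules observed above can be written down as a small decision table. This is only a model of the behaviour described in this post, not a Solr API; `ShardGate`, `Op`, and `isAllowed` are hypothetical names.

```java
import java.util.EnumSet;
import java.util.Set;

class ShardGate {
    enum Op { INSERT, QUERY, UPDATE, DELETE }

    /** Operations that were observed to work while some replicas were still down. */
    private static final Set<Op> ALLOWED_WHILE_DEGRADED = EnumSet.of(Op.INSERT);

    /** Returns true if the operation is expected to succeed under the observed rules. */
    public static boolean isAllowed(Op op, int activeReplicas, int expectedReplicas) {
        boolean fullyReady = activeReplicas >= expectedReplicas;
        return fullyReady || ALLOWED_WHILE_DEGRADED.contains(op);
    }

    public static void main(String[] args) {
        // 3 replicas expected, only 1 active: the example from the text
        for (Op op : Op.values()) {
            System.out.println(op + " allowed: " + isAllowed(op, 1, 3));
        }
    }
}
```

A job worker could consult a check like this before sending updates, and queue them locally instead of retrying against a shard that is still recovering.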
ZooKeeper stores the Solr-related configuration, so you can read Solr's settings from ZooKeeper and use them to locate and query the Solr data.

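commit and soft commit differ in how immediate they are
1. commit (hard commit)
  1. Function: writes all buffered changes (added, updated, or deleted documents) to the index's persistent storage.
  2. Performance: lower, because it writes data to disk, which is relatively slow.
  3. Visibility: all changes become visible to queries right after the commit (provided it opens a new searcher, which an explicit commit does by default).
  4. Index update: this is the operation that actually updates the index on disk.
2. soft commit
  1. Performance: higher; it runs faster than a regular commit because it does not write to disk and only refreshes the in-memory view of the index.
  2. Visibility: changes become visible to queries right after the soft commit, but a regular commit is still needed to persist them.
  3. Data safety: nothing has been written to the index on disk, so after a system failure any change that was not hard-committed is lost from the index unless it can be replayed from the transaction log.

Enabling soft commit invalidates the top-level caches (filterCache, queryResultCache), which can then disagree with the document data, so a query and a get for the same ID may return different results.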
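
The difference between the two commit types can be illustrated with a toy model. The sketch below is not Solr code: `ToyIndex` and its methods are hypothetical, and it deliberately ignores the transaction log, so a crash here always loses whatever was not hard-committed.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ToyIndex {
    private final List<String> buffered = new ArrayList<>();   // added, not yet searchable
    private final List<String> unflushed = new ArrayList<>();  // added, not yet on "disk"
    private final Set<String> visible = new HashSet<>();       // what queries can see
    private final Set<String> durable = new HashSet<>();       // what survives a crash

    public void add(String id) {
        buffered.add(id);
        unflushed.add(id);
    }

    /** Soft commit: make buffered documents searchable without flushing them. */
    public void softCommit() {
        visible.addAll(buffered);
        buffered.clear();
    }

    /** Hard commit: flush everything to "disk", then make it searchable as well. */
    public void hardCommit() {
        durable.addAll(unflushed);
        unflushed.clear();
        softCommit();
    }

    /** Simulated failure: only durable documents come back after a restart. */
    public void crashAndRestart() {
        buffered.clear();
        unflushed.clear();
        visible.clear();
        visible.addAll(durable);
    }

    public boolean isVisible(String id) {
        return visible.contains(id);
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add("doc-1");
        index.softCommit();
        System.out.println("after soft commit: " + index.isVisible("doc-1"));
        index.crashAndRestart();
        System.out.println("after crash:       " + index.isVisible("doc-1"));
    }
}
```

The model makes the trade-off visible: a soft commit buys fast visibility, while only a hard commit moves data into the set that survives a restart.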
  
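If you want to roll your own handler

1. Package it as a jar yourself
2. Put the jar in Solr's lib directory
3. On the target collection, configure the path and the corresponding class

Gradle build file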
  
``` groovy=
plugins {
    id 'java'
    id 'com.github.johnrengelman.shadow' version '7.1.2'
}

group 'com.yourpackage'
version '1.0-SNAPSHOT'

repositories {
    mavenCentral()
}

dependencies {
    // implementation 'org.apache.solr:solr-solrj:8.11.1' // SolrJ dependency
    // implementation(group: 'org.apache.solr', name: 'solr-solrj', version: '8.11.1') {
    //     exclude group: "org.eclipse.jetty"
    //     exclude group: "org.eclipse.jetty.http2"
    //     exclude group: "io.netty"
    // }
    implementation 'org.apache.solr:solr-core:8.11.1' // make sure to use the correct version
    implementation 'org.apache.solr:solr-solrj:8.11.1' // if SolrJ is needed
    annotationProcessor 'org.apache.solr:solr-core:8.11.1' // make sure to use the correct version
    annotationProcessor 'org.apache.solr:solr-solrj:8.11.1' // if SolrJ is needed
}

shadowJar {
    archiveClassifier.set('all') // build a JAR that bundles all dependencies
    manifest {
        attributes(
            'Main-Class': 'com.solrtest.MyCustomRequestHandler', // replace with your main class
            'Implementation-Title': project.name,
            'Implementation-Version': project.version
        )
    }
}
//
//jar {
//    manifest {
//        attributes(
//            'Main-Class': 'com.solrtest.MyCustomRequestHandler', // replace with your main class
//            'Implementation-Title': project.name,
//            'Implementation-Version': project.version
//        )
//    }
//}

sourceCompatibility = '1.8'
targetCompatibility = '1.8'
```
``` java=
package com.solrtest;

import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.handler.component.HttpShardHandler;
import org.apache.solr.handler.component.SearchHandler;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;

public class MyCustomRequestHandler extends SearchHandler {

    @Override
    public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception {
        // Custom logic: read a request parameter
        String param = req.getParams().get("paramName");

        // Copy the incoming parameters and force edismax scoring
        ModifiableSolrParams params = new ModifiableSolrParams(req.getParams());
        params.set("defType", "edismax");  // set defType to edismax
        params.set("qf", "data_ik^2");     // qf and other parameters can be set here
        // Apply the modified parameters; without this call the settings above have no effect
        req.setParams(params);

        // Optional: only send the query to NRT replicas
        // req.getContext().put(HttpShardHandler.ONLY_NRT_REPLICAS, Boolean.TRUE);

        // Let SearchHandler run the actual search
        super.handleRequestBody(req, rsp);
        // Append an extra entry; SearchHandler has normally added its own "response" entry already
        rsp.add("response", "handleRequestBody: " + param);
    }

    @Override
    public String getDescription() {
        return "getDescription";
    }

    @Override
    public String getName() {
        return "getName";
    }
}
```
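On the Solr side, register the handler in solrconfig.xml. Only the values (explicit, 10, query, facet) survive from the original snippet; the surrounding elements below follow the usual shape of such a registration and use the path and class name that appear elsewhere in this post:
``` xml=
<requestHandler name="/customSearch" class="com.solrtest.MyCustomRequestHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
  </lst>
  <arr name="components">
    <str>query</str>
    <str>facet</str>
  </arr>
</requestHandler>
```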
Query:

http://localhost:8080/solr/new_core/customSearch?indent=true&paramName=123&q.op=OR&q=*%3A*
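
Building that URL by hand is error-prone because `q=*:*` has to be percent-encoded. The sketch below uses only the JDK (Java 10 or later for the `Charset` overload of `URLEncoder.encode`); `CustomSearchUrl` and `build` are illustrative names, and the host, core, and handler path are the ones from the query above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class CustomSearchUrl {
    /** Joins the parameters into a query string, percent-encoding keys and values. */
    public static String build(String baseUrl, Map<String, String> params) {
        String query = params.entrySet().stream()
            .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
            .collect(Collectors.joining("&"));
        return baseUrl + "?" + query;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the parameters in insertion order
        Map<String, String> params = new LinkedHashMap<>();
        params.put("indent", "true");
        params.put("paramName", "123");
        params.put("q.op", "OR");
        params.put("q", "*:*");
        System.out.println(build("http://localhost:8080/solr/new_core/customSearch", params));
    }
}
```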
  
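# Using scores in Solr

1. Use q, not fq (fq only filters and does not contribute to the score)
2. defType: edismax
3. Fields used in qf must be tokenized field types (IK and the like)
4. Configure how the fields are scored (qf, mm, bf, and so on)
5. Sort by score
6. To see the score in the results, include it in the field list: fl=*,score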
  
``` java=
SolrQuery solrQuery = new SolrQuery("東南西北");
solrQuery.set("defType", "edismax");
solrQuery.set("qf", "total_amount_ik^2 payment_ik^3");
// Fields to return, including score
solrQuery.setFields("*,score");
solrQuery.setRows(10);
solrQuery.setSort("score", SolrQuery.ORDER.desc);
// _orderDao is this project's own DAO around the Solr client
var datas = _orderDao.query(solrQuery);
for (var da : datas) {
    System.out.println(da.getOrderId());
}
```
Sample explain output

``` text=
sample1

3.6087308 = sum of:
  1.1289119 = max of:
    1.1289119 = weight(data2_ik:指數 in 0) [SchemaSimilarity], result of:
      1.1289119 = score(freq=5.0), computed as boost * idf * tf from:
        3.0 = boost
        0.47000363 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
          2 = n, number of documents containing term
          3 = N, total number of documents with field
        0.80064046 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
          5.0 = freq, occurrences of term within document
          1.2 = k1, term saturation parameter
          0.75 = b, length normalization parameter
          42.0 = dl, length of field (approximate)
          40.0 = avgdl, average length of field
  1.4313638 = FunctionQuery(log(int(source2_i))), product of:
    0.47712126 = log(int(source2_i)=3)
    3.0 = boost
  1.048455 = FunctionQuery(log(int(source_i))), product of:
    0.69897 = log(int(source_i)=5)
    1.5 = boost

sample2

0.6475366 = max of:
  0.40851527 = weight(data_ik:指數 in 0) [SchemaSimilarity], result of:
    0.40851527 = score(freq=1.0), computed as boost * idf * tf from:
      2.0 = boost
      0.44183275 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
        4 = n, number of documents containing term
        6 = N, total number of documents with field
      0.4622963 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
        1.0 = freq, occurrences of term within document
        1.2 = k1, term saturation parameter
        0.75 = b, length normalization parameter
        39.0 = dl, length of field
        40.666668 = avgdl, average length of field
  0.6475366 = weight(data2_ik:指數 in 0) [SchemaSimilarity], result of:
    0.6475366 = score(freq=1.0), computed as boost * idf * tf from:
      3.0 = boost
      0.47000363 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
        2 = n, number of documents containing term
        3 = N, total number of documents with field
      0.45924222 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
        1.0 = freq, occurrences of term within document
        1.2 = k1, term saturation parameter
        0.75 = b, length normalization parameter
        39.0 = dl, length of field
        40.0 = avgdl, average length of field
```
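sample1
In this document only data2_ik matched, and both source2_i and source_i have values.
Result: qf(data2_ik) + bf(source2_i) + bf(source_i)

sample2
Both data2_ik and data_ik matched, so the higher of the two is taken; neither source field has a value.
Result: the higher of the two qf scores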


``` text=
qf scoring formula
score = boost * idf * tf
idf, computed as log(1 + (N - n + 0.5) / (n + 0.5))
tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl))
```
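
To check the formula against the explain output, the per-field score can be recomputed by hand. The sketch below plugs in the sample1 values for data2_ik; `Bm25Explain` is a made-up helper, `log` is the natural logarithm, and the last digits can differ slightly from Solr's output because Lucene computes in single precision.

```java
class Bm25Explain {
    /** idf = log(1 + (N - n + 0.5) / (n + 0.5)), natural logarithm. */
    public static double idf(long n, long N) {
        return Math.log(1 + (N - n + 0.5) / (n + 0.5));
    }

    /** tf = freq / (freq + k1 * (1 - b + b * dl / avgdl)). */
    public static double tf(double freq, double k1, double b, double dl, double avgdl) {
        return freq / (freq + k1 * (1 - b + b * dl / avgdl));
    }

    /** Per-field score = boost * idf * tf. */
    public static double score(double boost, double idf, double tf) {
        return boost * idf * tf;
    }

    public static void main(String[] args) {
        // Values taken from the sample1 explain output for data2_ik
        double idf = idf(2, 3);
        double tf = tf(5.0, 1.2, 0.75, 42.0, 40.0);
        System.out.printf("idf=%.6f tf=%.6f score=%.6f%n", idf, tf, score(3.0, idf, tf));
    }
}
```

The recomputed values agree with the 0.47000363, 0.80064046, and 1.1289119 reported by Solr to about six decimal places.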
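Summary
qf: each field is weighted with the tf/idf-style formula above, and the single highest field score is the one that counts
bf: its value is added directly to the score

Solr's indexing sometimes stalls, which affects the idf and tf values. With a single qf field this does not matter, but when two fields compete for the maximum, the stalled one skews the comparison.

Overall order
q and fq select the documents, qf scores them, bf is added on top, and finally the results are sorted and returned with the fields listed in fl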
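
How the pieces combine can be sketched in a few lines. The code below takes the maximum over the qf field scores (which matches the "max of" lines in the explain output and corresponds to edismax with its `tie` parameter left at 0) and adds each bf term. The bf expressions `log(source2_i)^3` and `log(source_i)^1.5` are inferred from the sample1 explain output rather than from the original request, and `EdismaxScoreSketch` is a hypothetical name.

```java
class EdismaxScoreSketch {
    /** Best-field-wins combination across the qf fields. */
    public static double maxFieldScore(double... fieldScores) {
        double max = 0.0;
        for (double s : fieldScores) {
            max = Math.max(max, s);
        }
        return max;
    }

    /** One bf term of the form log(field)^boost; Solr's log() function is base 10. */
    public static double logBoost(double fieldValue, double boost) {
        return Math.log10(fieldValue) * boost;
    }

    public static void main(String[] args) {
        // sample1: one matching qf field plus two bf terms
        double sample1 = maxFieldScore(1.1289119) + logBoost(3, 3.0) + logBoost(5, 1.5);
        // sample2: two matching qf fields, no bf contribution
        double sample2 = maxFieldScore(0.40851527, 0.6475366);
        System.out.printf("sample1=%.6f sample2=%.6f%n", sample1, sample2);
    }
}
```

Both results line up with the totals in the explain output (3.6087308 and 0.6475366) up to rounding.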
