Use this endpoint to rerank a set of candidate documents against a user query. Reranking is typically the final step in a RAG pipeline: after an initial vector search retrieves a broad set of candidates, a reranker scores each document’s true relevance to the query and returns them in ranked order, improving the quality of context passed to the language model.

Endpoint: POST https://api.qhaigc.net/v1/rerank

Supported Models

Model ID: bge-reranker-v2-m3
Description: Lightweight cross-encoder reranker optimized for multilingual RAG pipelines.

Request Parameters

model
string
Required
The reranking model to use. Example: bge-reranker-v2-m3.
query
string
Required
The search query or user question to rank documents against.
documents
string[]
Required
An array of document strings to score and rank. Each element is the text content of one document or passage.
top_n
integer
Optional
Return only the top N highest-scoring results. If omitted, all documents are scored and returned in ranked order.

Response Fields

results
array
Array of scored document results, sorted by relevance_score in descending order (highest relevance first). Each result contains the index of the document in the input documents array and its relevance_score.
usage
object
Token usage for this request.

Code Examples

import requests

url = "https://api.qhaigc.net/v1/rerank"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer sk-your-api-key-here"
}

payload = {
    "model": "bge-reranker-v2-m3",
    "query": "Organic skincare products for sensitive skin",
    "top_n": 3,
    "documents": [
        "Organic skincare for sensitive skin with aloe vera and chamomile. Clinically tested and hypoallergenic.",
        "New makeup trends focus on bold colors and innovative application techniques for a striking look.",
        "Bio-Hautpflege für empfindliche Haut mit Aloe Vera und Kamille. Klinisch getestet und hypoallergen."
    ]
}

response = requests.post(url, headers=headers, json=payload)
response.raise_for_status()
results = response.json()["results"]

# Each result's `index` refers to the document's position in the payload.
for r in results:
    print(f"Index {r['index']}: score={r['relevance_score']:.4f}")

Example Response

{
  "results": [
    {
      "index": 0,
      "relevance_score": 0.9854
    },
    {
      "index": 2,
      "relevance_score": 0.6773
    },
    {
      "index": 1,
      "relevance_score": 0.000016
    }
  ],
  "usage": {
    "prompt_tokens": 77,
    "total_tokens": 77
  }
}
The response shows that the documents at index 0 (English skincare text) and index 2 (German skincare text) are both relevant to the query, while the document at index 1 (makeup trends) is not. Because the model is multilingual, the German passage scores well above the irrelevant English one even though the query is in English.
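Since the API returns only indices and scores, a common follow-up step is mapping the results back to the original texts. A minimal sketch, using the example request and response above:

```python
# Documents in the order they were sent in the request payload.
documents = [
    "Organic skincare for sensitive skin with aloe vera and chamomile. Clinically tested and hypoallergenic.",
    "New makeup trends focus on bold colors and innovative application techniques for a striking look.",
    "Bio-Hautpflege für empfindliche Haut mit Aloe Vera und Kamille. Klinisch getestet und hypoallergen.",
]

# Parsed "results" array from the example response.
results = [
    {"index": 0, "relevance_score": 0.9854},
    {"index": 2, "relevance_score": 0.6773},
    {"index": 1, "relevance_score": 0.000016},
]

# `index` points into the original `documents` list, so the ranked
# texts are recovered with a simple lookup.
ranked_texts = [documents[r["index"]] for r in results]
```

After this lookup, ranked_texts holds the passages in relevance order, ready to be passed as context to a language model.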

RAG Pipeline Integration

1. Embed and index your documents
Use POST /v1/embeddings to convert your knowledge base into vectors and store them in a vector database.

2. Retrieve candidates with vector search
At query time, embed the user’s question and retrieve the top 20–50 most similar document chunks from your vector database.

3. Rerank the candidates
Send the retrieved chunks to POST /v1/rerank with the user’s question as the query. Set top_n to 3–5 to keep only the most relevant passages.

4. Generate the response
Pass the top reranked passages as context to POST /v1/chat/completions and ask the model to answer based on the provided content.
Reranking is most effective when your initial vector retrieval returns 20+ candidates. With fewer candidates, the reranker has less to work with and the quality improvement is smaller.
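Steps 3 and 4 of the pipeline can be sketched as helper functions. This is an illustrative outline, not a complete implementation: the rerank call follows the request format documented on this page, while build_context is a hypothetical helper for assembling the prompt context before calling the chat endpoint.

```python
import requests

API_BASE = "https://api.qhaigc.net/v1"
HEADERS = {
    "Content-Type": "application/json",
    "Authorization": "Bearer sk-your-api-key-here",
}

def rerank(query, documents, top_n=3):
    # Step 3: score the retrieved chunks and keep the top_n passages.
    resp = requests.post(f"{API_BASE}/rerank", headers=HEADERS, json={
        "model": "bge-reranker-v2-m3",
        "query": query,
        "documents": documents,
        "top_n": top_n,
    })
    resp.raise_for_status()
    # Results arrive sorted by relevance_score; map indices back to texts.
    return [documents[r["index"]] for r in resp.json()["results"]]

def build_context(passages):
    # Step 4 (hypothetical helper): number the passages so the chat
    # model can refer to them when answering.
    return "\n\n".join(f"[{i}] {p}" for i, p in enumerate(passages, 1))
```

The passages returned by rerank would then be formatted with build_context and sent as grounding context in a POST /v1/chat/completions request.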