POST /v1/rerank — Rerank Documents for RAG Pipelines
Score and reorder candidate documents by relevance to a query. Improve RAG retrieval precision before passing context to a chat model.
Use this endpoint to rerank a set of candidate documents against a user query. Reranking is typically the final step in a RAG pipeline: after an initial vector search retrieves a broad set of candidates, a reranker scores each document’s true relevance to the query and returns the documents in ranked order, improving the quality of the context passed to the language model.
Endpoint: POST https://api.qhaigc.net/v1/rerank
```python
import requests

url = "https://api.qhaigc.net/v1/rerank"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer sk-your-api-key-here",  # replace with your API key
}
payload = {
    "model": "bge-reranker-v2-m3",
    "query": "Organic skincare products for sensitive skin",
    "top_n": 3,  # number of top-ranked documents to return
    "documents": [
        "Organic skincare for sensitive skin with aloe vera and chamomile. Clinically tested and hypoallergenic.",
        "New makeup trends focus on bold colors and innovative application techniques for a striking look.",
        # German: same content as the first document
        "Bio-Hautpflege für empfindliche Haut mit Aloe Vera und Kamille. Klinisch getestet und hypoallergen.",
    ],
}

response = requests.post(url, headers=headers, json=payload, timeout=30)
response.raise_for_status()  # fail fast on HTTP errors instead of a KeyError below
results = response.json()["results"]

# "index" is the position of the document in the request's "documents" list
for r in results:
    print(f"Index {r['index']}: score={r['relevance_score']:.4f}")
```
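In the output, the documents at index 0 (English skincare text) and index 2 (German skincare text) both receive high relevance scores, while the document at index 1 (makeup trends) scores low. The German passage is ranked as relevant even though the query is in English, which shows that bge-reranker-v2-m3 scores relevance across languages.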
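The example above reads only `index` and `relevance_score` from each result, so to use the reranked text you look each index up in the `documents` list you sent. The helper below is a minimal sketch of that lookup; the name `attach_documents` is hypothetical, and it sorts by `relevance_score` defensively because this page does not state whether results are returned already ordered.

```python
from typing import Any


def attach_documents(
    documents: list[str], results: list[dict[str, Any]]
) -> list[tuple[int, float, str]]:
    """Return (index, relevance_score, text) tuples, highest score first.

    Assumes each result carries the "index" and "relevance_score" keys read
    in the request example, where "index" is a position in `documents`.
    """
    ranked = sorted(results, key=lambda r: r["relevance_score"], reverse=True)
    return [(r["index"], r["relevance_score"], documents[r["index"]]) for r in ranked]
```

With the variables from the request example, `attach_documents(payload["documents"], results)` gives the `top_n` documents as text, ready to pass on as context.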
A typical RAG pipeline with reranking has four steps:
1. Index your knowledge base
Use POST /v1/embeddings to convert your knowledge base into vectors and store them in a vector database.
2. Retrieve candidates with vector search
At query time, embed the user’s question and retrieve the 20–50 most similar document chunks from your vector database.
3. Rerank the candidates
Send the retrieved chunks to POST /v1/rerank as documents, with the user’s question as the query. Set top_n to 3–5 to keep only the most relevant passages.
4. Generate the response
Pass the top reranked passages as context to POST /v1/chat/completions and instruct the model to answer based on that content.
Reranking is most effective when your initial vector retrieval returns 20+ candidates. With fewer candidates, the reranker has less to work with and the quality improvement is smaller.