Search & Retrieval Architecture
Providing researchers with highly relevant, cross-corpus results that combine classical exact-match precision with modern semantic understanding.
Expected Results
Search Methodology
- 1. Lexical Plane (BM25): Full-text search index for exact matches, rare terminology, and specific names, using Arabic normalization.
- 2. Semantic Plane (HNSW): 1024-dimension vector index for synonyms and conceptually related verses, with embeddings generated locally via Ollama.
The Search Workflow
OpenBayan utilizes a Hybrid Search strategy implemented securely within SurrealDB.
1. Normalization & Embedding
The query is cleaned and normalized. Simultaneously, it is sent to Ollama to generate a 1024-dimension vector.
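A minimal sketch of this step, under stated assumptions: the folding rules below (stripping diacritics and tatweel, unifying alef variants, alef maqsura, and taa marbuta) are a common Arabic search normalization, not necessarily the rule set OpenBayan applies. The names `normalize_arabic` and `embed_query` are hypothetical, the embedding model is left as a required argument because this document does not name it, and the request shape follows Ollama's `/api/embed` REST endpoint on its default local port.

```python
import json
import re
import unicodedata
import urllib.request

# Arabic diacritics (tashkeel), Quranic annotation marks, and the tatweel
# elongation character. Assumed rule set, for illustration only.
_DIACRITICS = re.compile(r"[\u0610-\u061A\u064B-\u065F\u0670\u06D6-\u06ED\u0640]")

# Orthographic variants folded to a single form (assumed, and lossy).
_FOLD = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0622": "\u0627",  # alef with madda       -> bare alef
    "\u0671": "\u0627",  # alef wasla            -> bare alef
    "\u0649": "\u064A",  # alef maqsura          -> yeh
    "\u0629": "\u0647",  # taa marbuta           -> heh
})


def normalize_arabic(text: str) -> str:
    """Return a search-friendly form of an Arabic query string."""
    text = unicodedata.normalize("NFKC", text)  # unify presentation forms
    text = _DIACRITICS.sub("", text)            # drop vowel marks and tatweel
    text = text.translate(_FOLD)                # fold letter variants
    return " ".join(text.split())               # collapse whitespace


def embed_query(text: str, model: str,
                host: str = "http://localhost:11434") -> list:
    """Request an embedding from a local Ollama server (not called here)."""
    payload = json.dumps({"model": model, "input": text}).encode("utf-8")
    request = urllib.request.Request(
        f"{host}/api/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["embeddings"][0]
```

Whichever rules are chosen, the same normalization has to be applied to the indexed text and to the query, otherwise exact matches on the lexical plane will be missed.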
2. Parallel Retrieval
Full-text search (FTS) retrieves the top matches ranked by BM25 keyword relevance, while the vector query retrieves the nearest matches to the query embedding by cosine similarity.
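The two retrievals are independent, so they can be issued concurrently. The sketch below illustrates the shape of that step with `asyncio.gather`. Everything in it is a stand-in: `DOCS` is a toy in-memory corpus, `lexical_search` counts matching terms instead of computing BM25, and `vector_search` does a brute-force cosine scan instead of querying an HNSW index. In the real system both functions would be SurrealDB queries.

```python
import asyncio
import math

# Toy corpus standing in for the two indexes (hypothetical data).
DOCS = {
    "d1": {"text": "mercy and forgiveness", "vec": [0.9, 0.1, 0.0]},
    "d2": {"text": "patience in hardship", "vec": [0.1, 0.9, 0.1]},
    "d3": {"text": "mercy toward orphans", "vec": [0.7, 0.2, 0.3]},
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


async def lexical_search(query, limit=10):
    # Stand-in for the BM25 query: score = number of matching query terms.
    terms = query.lower().split()
    scored = [
        (sum(doc["text"].split().count(t) for t in terms), doc_id)
        for doc_id, doc in DOCS.items()
    ]
    hits = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [doc_id for _, doc_id in hits[:limit]]


async def vector_search(query_vec, limit=10):
    # Stand-in for the HNSW query: exact scan ordered by cosine similarity.
    ranked = sorted(DOCS, key=lambda d: -cosine(query_vec, DOCS[d]["vec"]))
    return ranked[:limit]


async def retrieve(query, query_vec, limit=10):
    # With real database I/O, the two awaits overlap instead of running in series.
    return await asyncio.gather(
        lexical_search(query, limit),
        vector_search(query_vec, limit),
    )


lexical_ids, vector_ids = asyncio.run(retrieve("mercy", [1.0, 0.0, 0.0]))
```

Each plane returns an ordered list of IDs. Only the ordering is needed downstream, which is why the raw BM25 and cosine scores never have to be put on a common scale.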
3. Reciprocal Rank Fusion (RRF)
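Results from both planes are merged by rank position rather than by raw score, since BM25 and cosine scores are not directly comparable. Each result's combined score sums its contribution from every list it appears in, so results found in both planes are prioritized.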
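As an illustration, Reciprocal Rank Fusion in its standard form gives each result a score of 1 / (k + rank) per list and sums those contributions. The sketch below assumes the conventional smoothing constant k = 60 and a hypothetical function name `rrf_fuse`. The constant and any per-plane weighting used by OpenBayan are not stated in this document.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked ID lists with Reciprocal Rank Fusion.

    rankings: iterable of lists, each ordered best-first.
    k: smoothing constant (60 is a common default, assumed here).
    Returns (doc_id, score) pairs sorted by descending fused score.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Break score ties by ID so the output order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


lexical = ["d1", "d3"]          # best-first IDs from the lexical plane
semantic = ["d3", "d2", "d1"]   # best-first IDs from the semantic plane
fused = rrf_fuse([lexical, semantic])
```

Here `d3` comes out first: it is ranked second lexically and first semantically, which beats `d1` (first and third), while `d2` appears in only one list and falls to the bottom.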
4. Contextual Enrichment
The system FETCHes metadata and entities for interactive tooltips, returning a sorted, fused result set.
Future Roadmap
Planned enhancements to the search infrastructure.
Reranking
Implementing a second-stage Cross-Encoder model to re-score and reorder the top 10 fused results.
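Since this feature is only planned, the following is a sketch of the general second-stage pattern and not a description of an existing implementation. The function `rerank_top_n` and the scorer `overlap_score` are hypothetical. A real cross-encoder would take the place of the toy scorer and score each (query, passage) pair jointly.

```python
def rerank_top_n(query, results, score_fn, n=10):
    """Reorder only the first n results by a pairwise relevance score.

    score_fn(query, text) -> float is a placeholder for a cross-encoder.
    Results beyond position n keep their original fused order.
    """
    head, tail = list(results[:n]), list(results[n:])
    head.sort(key=lambda text: score_fn(query, text), reverse=True)
    return head + tail


def overlap_score(query, text):
    # Toy scorer for demonstration only: number of shared words.
    return len(set(query.lower().split()) & set(text.lower().split()))


candidates = ["patience in hardship", "mercy and forgiveness", "mercy"]
reranked = rerank_top_n("mercy and forgiveness", candidates, overlap_score, n=2)
```

Restricting the second stage to a small head of the list keeps the cost bounded, because a cross-encoder needs one model call per candidate.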
Faceted Filtering
Allowing users to filter by Taxonomy (e.g., only Fiqh books) or Topic.
Graph Search
Searching for an entity and finding all sentences in which it interacts with another specified entity.