
Search & Retrieval Architecture

The search layer gives researchers highly relevant, cross-corpus results that combine classical exact-match precision with modern semantic understanding.

Expected Results

Holistic View
Results from Quran, Hadith, and scholarly Books in a single, unified interface.
Contextual Relevance
Finding sentences that explain concepts, rather than merely containing keywords.
Traceability
Every result is linked back to its original source and parent node.
Data Provenance
Searches run across SurrealDB indices built from the compiled Quranic verses, Hadith narration texts, and Shamela book pages.

Search Methodology

  • 1. Lexical Plane (BM25): Full-Text Search index for exact matches, rare terminology, and specific names, using Arabic normalization.
  • 2. Semantic Plane (HNSW): 1024-dim vector index for synonyms and conceptually related verses, with embeddings generated locally via Ollama.

The Search Workflow

OpenBayan uses a Hybrid Search strategy implemented securely within SurrealDB.

1. Normalization & Embedding

The query is cleaned and normalized. Simultaneously, it is sent to Ollama to generate a 1024-dimension vector.
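The normalization half of this step can be illustrated with a minimal sketch. The exact rules OpenBayan applies are not specified here, so the function below is an assumption: it strips Arabic diacritics and tatweel, unifies hamza-carrying alef forms, and collapses whitespace. The name `normalize_query` is hypothetical.

```python
import re

# Assumed rule set (not confirmed by the source): remove short-vowel marks,
# shadda, sukun, tanween, and the superscript alef.
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")
_TATWEEL = "\u0640"  # elongation character used for justification
# Map alef with madda / hamza above / hamza below to bare alef.
_ALEF_VARIANTS = str.maketrans({
    "\u0622": "\u0627",
    "\u0623": "\u0627",
    "\u0625": "\u0627",
})


def normalize_query(text: str) -> str:
    """Return a simplified form of an Arabic query for lexical matching."""
    text = _DIACRITICS.sub("", text)
    text = text.replace(_TATWEEL, "")
    text = text.translate(_ALEF_VARIANTS)
    # Collapse runs of whitespace and trim the ends.
    return " ".join(text.split())
```

Whatever rules are chosen, the same normalization would need to be applied at indexing time and at query time so that both sides of the lexical match agree.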

2. Parallel Retrieval

The full-text search (FTS) index retrieves the top matches ranked by BM25 keyword relevance, while the vector query retrieves the nearest matches by cosine similarity.
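To make the semantic side concrete, the sketch below computes cosine similarity and ranks candidates by brute force. This is only an illustration of the metric: an HNSW index approximates the same nearest-neighbour ranking without scanning every vector, and in OpenBayan the work happens inside SurrealDB rather than in application code. The function names and the in-memory `candidates` mapping are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat as unrelated
    return dot / (norm_a * norm_b)


def top_k_by_cosine(query_vec, candidates, k=10):
    """Rank candidate ids by similarity to the query, best first."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in candidates.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

Because cosine similarity ignores vector length, two passages of very different sizes can still score as close matches if their embeddings point in the same direction.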

3. Reciprocal Rank Fusion (RRF)

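Results from both planes are merged. Each result earns a score of 1/(k + rank) from every plane in which it appears, where k is a smoothing constant, and the scores are summed, so results appearing in both planes are prioritized.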
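A minimal sketch of Reciprocal Rank Fusion follows. It assumes each plane returns an ordered list of record ids, best first, and uses k = 60, a commonly cited default; the constant OpenBayan actually uses is not stated here, and the record ids in the usage example are made up for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked id lists into one list of (id, score) pairs.

    Each id receives 1 / (k + rank) from every list it appears in,
    with ranks starting at 1. Ties are broken by id for stable output.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical ids: one ranking per plane.
lexical = ["verse:a", "hadith:b", "page:c"]
semantic = ["hadith:b", "page:d", "verse:a"]
fused = reciprocal_rank_fusion([lexical, semantic])
```

In this example `hadith:b` and `verse:a` outrank the two ids that appear in only one plane, because each collects a contribution from both lists. RRF uses only rank positions, which avoids having to reconcile BM25 scores and cosine similarities that live on different scales.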

4. Contextual Enrichment

The system FETCHes metadata and entities for interactive tooltips, returning a sorted, fused result set.

Future Roadmap

Planned enhancements to the search infrastructure.

Reranking

Implementing a secondary Cross-Encoder model to fine-tune the top 10 results.

Faceted Filtering

Allowing users to filter by Taxonomy (e.g., only Fiqh books) or Topic.

Graph Search

Searching for an entity and finding all sentences in which it interacts with another specific entity.
