VaaniNetra
System Architecture
VaaniNetra (वाणीनेत्र) — "The Eye That Reads Financial Language"
Coverage + Explainability
132 YAML rules across 17 regulatory frameworks. Each report includes evidence spans, rule citations, precedents, and remediation.
Entity Trust Layer
Extracted entities are cross-validated via Neo4j relationships and Outris ROC/MCA services before final report generation.
Human-in-the-Loop
Low-confidence entity/section/compliance outputs create annotation tasks with bulk review actions and export-ready correction data.
Operational Visibility
Upload and validation flows expose stage-level progress, retries, and live activity logs for long-running annual report processing.
Upload (< 1s)
Drag & drop PDF/XBRL document
Extract (~2s/page)
OCR + entity recognition + section classification
Validate (~45s total)
132 rules across 17 frameworks + KG entity validation + ROC/MCA cross-check
Forensic (~5s)
9 anomaly checks (Benford, Beneish, Altman, etc.)
Report (~3s)
Findings with evidence chains + precedent citations
Chat (real-time)
Ask NFRA Bot questions about the document
Layer 1 — Heuristic (<50ms)
21 regex patterns + 132 YAML keyword rules. Fast first-pass, filters not_applicable cases.
Layer 2 — LLM + RAG
Gemini 2.0 Flash with Qdrant RAG context + Neo4j precedents. Self-consistency voting ×3.
Layer 3 — Human Review
Low-confidence items auto-queued for annotation. Corrections exported as training data (JSONL/CoNLL/HF).
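The hand-off from the LLM layer to human review can be sketched as a majority vote followed by a confidence gate. This is a minimal illustration, not the project's code: the `vote` and `route` helpers, the label names, and the `REVIEW_THRESHOLD` value are all assumptions, and the classifier is passed in as a plain callable standing in for an LLM call.

```python
from collections import Counter
from typing import Callable, Tuple

# Hypothetical cut-off: the source does not state the real threshold.
REVIEW_THRESHOLD = 0.6


def vote(classify: Callable[[str], str], text: str, passes: int = 3) -> Tuple[str, float]:
    """Call the classifier `passes` times; return (majority label, agreement ratio)."""
    labels = [classify(text) for _ in range(passes)]
    label, count = Counter(labels).most_common(1)[0]
    return label, count / passes


def route(agreement: float) -> str:
    """Queue low-agreement results for annotation, accept the rest."""
    return "auto_accept" if agreement >= REVIEW_THRESHOLD else "human_review"
```

With three passes, a 2-of-3 majority clears this assumed threshold, while a three-way split is routed to the annotation queue.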
Dashboard
Stats, severity charts, compliance distribution, recent docs
Upload & Extract
Drag-and-drop PDF upload → OCR → entity extraction → section classification
Compliance Reports
Reports list + report detail route with evidence spans, precedent citations, RAG sources
Forensic Analysis
Benford charting + anomaly list (including Beneish/Altman/ratio-based backend checks)
OSINT Intelligence
SEBI/NFRA enforcement actions, news signals, entity screening
NFRA Chatbot
RAG-powered conversational assistant with document Q&A and KG queries
Regulations
Browse 132 rules across regulatory frameworks with searchable rule cards
Annotations
Human-in-the-loop queue with document filtering and bulk accept/reject actions
Benchmarks
11/11 metrics passing, latency breakdown, explainability chain, confusion matrix
Architecture
This page — system architecture overview and component documentation
Settings
Runtime toggles for MCA and ROC entity validation integrations
Orchestrator Agent
LangGraph state machine — routes documents through parallel validation nodes
Ind AS Validator
13 standards (Ind AS 1, 7, 8, 12, 16/38, 24, 33, 36, 37, 109, 115, 116)
SEBI/MCA Validators
SEBI LODR + MCA Schedule III checks via YAML rule packs (RBI rules are also present in the rule definitions)
Forensic Anomaly Detector
9 checks, including Benford, Beneish, Altman, ratio variance, Q4 spike, and audit mismatch
Entity Validator
KG + ROC + MCA-backed CIN/DIN/FRN/auditor validation
Self-Consistency Voting
Configurable multi-pass voting for selected checks (enabled via runtime flags)
Report Generator
Findings aggregation, severity classification, executive summary, auto annotation task generation
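Of the forensic checks listed above, the Benford test is the easiest to show in isolation. The sketch below compares the observed leading-digit shares of a set of amounts with the Benford expectation log10(1 + 1/d) and summarises the gap as a mean absolute deviation. The function names are illustrative, and how the detector actually scores or thresholds the deviation is not stated in the source.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Optional


def benford_expected(digit: int) -> float:
    """Expected share of leading digit `digit` (1-9) under Benford's law."""
    return math.log10(1 + 1 / digit)


def first_digit(value: float) -> Optional[int]:
    """Leading significant digit of a number, or None for zero."""
    if value == 0:
        return None
    # Scientific notation puts the leading significant digit first.
    return int(f"{abs(value):.15e}"[0])


def observed_shares(values: Iterable[float]) -> Dict[int, float]:
    """Share of each leading digit 1-9 among the non-zero values."""
    digits = [d for d in map(first_digit, values) if d is not None]
    if not digits:
        raise ValueError("need at least one non-zero amount")
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in range(1, 10)}


def benford_mad(values: Iterable[float]) -> float:
    """Mean absolute deviation between observed and expected digit shares."""
    observed = observed_shares(values)
    return sum(abs(observed[d] - benford_expected(d)) for d in range(1, 10)) / 9
```

A caller would flag a document when `benford_mad(amounts)` exceeds a chosen cut-off; that cut-off is a tuning decision and is left out here on purpose.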
Neo4j Graph
312 nodes: 12 Companies, 4 Regulations, 16 Standards, 103 Disclosures, 67 Provisions, 58 Precedents, 52 Auditors
Qdrant Vector Store
408 vectors in 2 collections (regulations, precedents), gemini-embedding-001, 3072-dim
Rule Engine
132 YAML rules across 18 files — keyword matching + LLM refinement per rule
Precedent Database
36 NFRA enforcement orders (27 text + 9 vision extracted), penalty amounts, entity links
Query Engine
Cypher queries for entity validation, shared auditor detection, violation history
MCA API Validator
Live CIN/DIN verification via Ministry of Corporate Affairs database (fetch-company, fetch-director, fetch-by-name)
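The keyword stage of the rule engine can be pictured as follows. This is a sketch under assumptions: the `Rule` fields and the `evaluate` helper are hypothetical, and the real YAML schema is not shown in the source. Only the rule ID format (`IND_AS_24_001`) and the `not_applicable` outcome are taken from this page.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Rule:
    """Hypothetical in-memory form of one YAML rule."""
    rule_id: str
    trigger_keywords: List[str]   # the rule applies if any of these appear
    required_keywords: List[str]  # disclosures expected once it applies


def evaluate(rule: Rule, text: str) -> Dict[str, object]:
    """Keyword first pass: drop inapplicable rules, list missing disclosures."""
    lowered = text.lower()
    if not any(k.lower() in lowered for k in rule.trigger_keywords):
        return {"rule_id": rule.rule_id, "status": "not_applicable", "missing": []}
    missing = [k for k in rule.required_keywords if k.lower() not in lowered]
    # Applicable rules move on to LLM refinement with the gaps found so far.
    return {"rule_id": rule.rule_id, "status": "candidate", "missing": missing}
```

Substring matching keeps this pass cheap. It will miss paraphrased disclosures, which is the gap the LLM refinement step is described as covering.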
Format Detector
Auto-detect PDF, XBRL, iXBRL, Excel — routes to appropriate parser
OCR Engine
pdfplumber (digital) + Gemini 2.0 Flash Vision (scanned) — dual-path extraction
Entity Extractor
21 regex patterns (CIN, DIN, PAN, amounts, Ind AS refs) + 22 LLM entity types
Section Classifier
13 regulatory section types — keyword + LLM hybrid with structural rules
Scope Detector
Standalone vs Consolidated detection using 4+4 marker patterns
Table Detector
pdfplumber table extraction with financial number matching
File Upload
PDF, XBRL, Excel upload with size validation and asynchronous processing
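The regex side of the entity extractor can be illustrated with three of the identifier types named above. The patterns below follow the commonly documented shapes (a 21-character CIN, a 10-character PAN, an 8-digit DIN) and are assumptions rather than the project's 21 production patterns. The DIN pattern requires a `DIN` label so that bare 8-digit numbers are not captured.

```python
import re
from typing import List, NamedTuple


class Entity(NamedTuple):
    kind: str
    value: str
    start: int


# Illustrative patterns only; the real pattern set is not shown in the source.
PATTERNS = {
    "CIN": re.compile(r"\b[LU]\d{5}[A-Z]{2}\d{4}[A-Z]{3}\d{6}\b"),
    "PAN": re.compile(r"\b[A-Z]{5}\d{4}[A-Z]\b"),
    "DIN": re.compile(r"\bDIN[:\s]*(\d{8})\b"),
}


def extract_entities(text: str) -> List[Entity]:
    """Return every pattern match, ordered by position in the text."""
    found = []
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            # Use the capture group when the pattern has one (DIN), else the full match.
            group = match.lastindex or 0
            found.append(Entity(kind, match.group(group), match.start(group)))
    return sorted(found, key=lambda e: e.start)
```

A format match only shows that a string looks like an identifier. Confirming that it belongs to a real company or director is the job of the KG, ROC, and MCA validation described earlier.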
PostgreSQL
Document metadata, extraction results, compliance findings, audit trail
S3-Compatible Storage
Object storage for uploaded documents and extracted artifacts (AWS S3 / Cloudflare R2 style endpoints)
Alembic Migrations
Schema versioning — initial_tables migration for all models
Rule Citation
Every finding cites the specific Ind AS / SEBI LODR / Companies Act rule (e.g., IND_AS_24_001)
Evidence Spans
Text snippets from the document that triggered the finding, with page numbers
Missing Elements
List of specific disclosures required but not found (e.g., 'transaction amounts')
LLM Explanation
Natural language explanation of why the flag was raised
Remediation
Actionable guidance on how to fix the non-compliance
Precedent Citations
NFRA/SEBI enforcement order citations with penalty amounts
RAG Sources
Vector-retrieved regulatory passages from Qdrant supporting the decision
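The seven elements above map naturally onto a single record per finding. The dataclasses below are a hypothetical shape for that record, with field names chosen for illustration. The actual Pydantic or SQLAlchemy models are not shown in the source.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class EvidenceSpan:
    text: str  # snippet that triggered the finding
    page: int  # page number in the source document


@dataclass
class Finding:
    rule_id: str                       # e.g. "IND_AS_24_001"
    evidence_spans: List[EvidenceSpan]
    missing_elements: List[str]        # e.g. ["transaction amounts"]
    explanation: str                   # natural-language reason for the flag
    remediation: str                   # guidance on how to fix it
    precedent_citations: List[str] = field(default_factory=list)
    rag_sources: List[str] = field(default_factory=list)

    def is_complete(self) -> bool:
        """True when the core chain (rule, evidence, explanation, fix) is present."""
        return bool(self.rule_id and self.evidence_spans
                    and self.explanation and self.remediation)

    def to_json(self) -> str:
        """Serialise the finding, including nested spans, for a report payload."""
        return json.dumps(asdict(self), ensure_ascii=False)
```

Treating precedents and RAG sources as optional lists reflects an assumption that not every finding has a matching enforcement order or retrieved passage.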
| Metric | Achieved | Target | Status |
|---|---|---|---|
| CER (Digital) | 0.00% | ≤ 5% | PASS |
| Entity F1 | 0.9622 | ≥ 0.85 | PASS |
| Extraction Accuracy | 0.9303 | ≥ 0.85 | PASS |
| Segmentation mIoU | 0.9322 | ≥ 0.85 | PASS |
| Section F1 | 0.8162 | ≥ 0.75 | PASS |
| ROUGE-1 | 0.4042 | ≥ 0.35 | PASS |
| ROUGE-2 | 0.2060 | ≥ 0.15 | PASS |
| ROUGE-L | 0.3095 | ≥ 0.30 | PASS |
| BERTScore | 0.8889 | ≥ 0.85 | PASS |
| Compliance Macro-F1 | 0.8575 | ≥ 0.80 | PASS |
| Compliance MCC | 0.808 | ≥ 0.60 | PASS |

| Category | Technologies |
|---|---|
| Frontend | Next.js 16, React 19, TypeScript, Tailwind CSS, shadcn/ui, Recharts, Lucide |
| Backend | FastAPI, Python 3.11+, Pydantic v2, SQLAlchemy 2.0, Alembic |
| LLM | Gemini 2.0 Flash via LiteLLM, gemini-embedding-001 (3072-dim) |
| Orchestration | LangGraph — 7-node parallel DAG with state machine |
| Graph DB | Neo4j Aura — 312 nodes, 7 node types, Cypher queries |
| Vector DB | Qdrant Cloud — 408 vectors, 2 collections (regulations, precedents) |
| External APIs | Outris MCA API + ROC database integration for CIN/DIN/name verification |
| Storage | PostgreSQL (metadata), S3-compatible object storage (documents) |
| Deployment | Railway (backend + frontend), Railway Postgres, Neo4j Aura, Qdrant Cloud |