The verification cortex for serious quantitative work — for humans analyzing hard data, and for AI systems that can't afford to hallucinate. Local. Open. Cited.
// cloud LLMs guess. Aurora computes.
```bash
git clone https://github.com/FantasyLab-ai/aurora.git
cd aurora
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -r requirements.txt

# Optional substrate-layer extras
pip install cryptography    # Ed25519 bundle signing
pip install mcp             # MCP server for LLM agents

# Run the Studio
python studio_api.py
```
Open http://127.0.0.1:8000 and click ▶ Try a demo for a 10-second smoke test, or drop in your own CSV, Parquet, JSON, or XLSX file.
Same code. Same glass-box principles. Two integration shapes — one for humans clicking through findings, one for AI systems calling APIs.
A local quantitative copilot for the work that matters too much to trust to a model that hallucinates. Six analytical lenses, 24+ research-grade methods, knowledge-grounded synthesis, every "What This Means" sentence cited to a seed:* entry.
Every LLM today invents numbers. Aurora is the structurally different fix — it computes and verifies rather than predicts. Wire it into Claude Desktop, Claude Code, Cursor, or any MCP-compatible agent in 5 minutes. Or call it directly from Python.
One pip install away from cited quantitative reasoning in any script or notebook · Jupyter HTML repr · aurora_plugins entry-point, isolated failures · .aurora.json artifacts

```python
import aurora_sdk as aurora

r = aurora.run("data.csv", depth="standard")
r.findings.critical().by_method("iso-forest")
r.forecast.peak(horizon_hours=24)
r.bundle.save("audit.aurora.json")  # SHA-256 + optional Ed25519 signing

# Verify on any machine with Aurora installed
b = aurora.Bundle.load("audit.aurora.json")
b.verify()  # raises if tampered
```
The same .aurora.json bundle moves between the SDK, the MCP server, Decision Contracts, and the Studio. One format. One verifiable artifact. Four programmable surfaces.
You already have Aurora running. Now connect it to the agent or pipeline that calls it. Two copy-paste flows, both finished in 5 minutes.
Install the MCP extras, then drop the config into your agent. Claude Desktop reads claude_desktop_config.json on startup; Claude Code and Cursor have their own MCP configuration (Claude Code through its claude mcp add command, Cursor through its MCP settings). Aurora's MCP server exposes 7 path-allowlisted tools.
```bash
pip install -r requirements-mcp.txt
# or, equivalently:
pip install mcp cryptography
```

```json
{
  "mcpServers": {
    "aurora": {
      "command": "python",
      "args": ["-m", "aurora_mcp"],
      "cwd": "/absolute/path/to/aurora"
    }
  }
}
```
Restart your agent. Aurora's tools appear under @aurora. Try: "@aurora analyze this CSV and give me the top anomalies."
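Aurora's MCP tools are described as path-allowlisted. Below is a minimal sketch of what such a check can look like; it is not Aurora's code, and ALLOWED_ROOTS and resolve_allowed are hypothetical names. The key step is resolving the requested path first (collapsing .. segments and following symlinks) and only then comparing it against the configured roots.

```python
from pathlib import Path

# Hypothetical allowlist: tools may only read files under these roots.
ALLOWED_ROOTS = [Path("/data/aurora-inputs")]


def resolve_allowed(requested, roots=ALLOWED_ROOTS):
    """Return the resolved path if it lies inside an allowed root, else raise.

    resolve() collapses '..' and follows symlinks, so a request such as
    '/data/aurora-inputs/../../etc/passwd' is checked against its real target.
    """
    target = Path(requested).resolve()
    for root in roots:
        root = Path(root).resolve()
        if target == root or root in target.parents:
            return target
    raise PermissionError(f"path not allowlisted: {requested}")
```

Checking the string prefix of the raw request instead would let a ../ sequence or a symlink walk out of the allowed directory, which is why the comparison happens on resolved paths.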
One import away from cited quantitative reasoning. The SDK speaks the same .aurora.json bundle format as the MCP server and Decision Contracts — same artifact, same guarantees, just a different surface.
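```bash
pip install -e ./aurora_sdk   # editable install: points at the cloned repo
```

```python
import aurora_sdk as aurora

# Analyze a file at the "standard" depth
r = aurora.run("data.csv", depth="standard")
# Critical findings, narrowed to the isolation-forest detector
r.findings.critical().by_method("iso-forest")
# Forecast peak over the next 24 hours
r.forecast.peak(horizon_hours=24)

# Save a portable bundle (SHA-256 content hash; optional Ed25519 signing needs the cryptography extra)
r.bundle.save("audit.aurora.json")

# Verify on any machine with Aurora installed
b = aurora.Bundle.load("audit.aurora.json")
b.verify()  # raises if tampered
```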
Full API: SDK docs on GitHub →
Decision Contracts? Same install path — predicates live in YAML, the engine ships with Aurora. See Decision Contracts docs →
Every run feeds back into Aurora's prior library. Confirmed patterns increase confidence. Contradictions surface. Across thousands of runs, Aurora learns which signatures repeat.
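A Beta-Bernoulli counter per recurring signature is one simple way to get "confirmed patterns increase confidence, contradictions surface." This is an illustrative assumption, since the storage and update rule of Aurora's prior library are not described here; PatternPrior, its fields, and the contested threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PatternPrior:
    """Beta(alpha, beta) belief that a recurring signature is real.

    Starts at Beta(1, 1), a uniform prior: confidence 0.5 with no evidence.
    """
    signature: str
    alpha: float = 1.0  # pseudo-count of runs that confirmed the pattern
    beta: float = 1.0   # pseudo-count of runs that contradicted it

    def observe(self, confirmed: bool) -> None:
        if confirmed:
            self.alpha += 1
        else:
            self.beta += 1

    @property
    def confidence(self) -> float:
        # Posterior mean of the Beta distribution.
        return self.alpha / (self.alpha + self.beta)

    @property
    def contested(self) -> bool:
        # Hypothetical rule: surface signatures with real evidence on both sides.
        return min(self.alpha, self.beta) >= 3
```

Starting from Beta(1, 1), eight confirmations and one contradiction give a posterior mean of 9/11, about 0.82. Further contradictions pull it down and eventually mark the signature as contested instead of quietly averaging the evidence away.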
Overnight learning mode connects to public data streams — FRED economic releases, NOAA climate observations, NIST reference updates, peer-reviewed literature feeds — and ingests new structured knowledge into the bank. You wake up to a smarter Aurora than you went to bed with.
Drop a CSV, Parquet, or JSON file. Aurora analyzes through six purpose-built lenses — anomalies, regimes, motifs, forecast, physics, and overview. Every finding cited. Every method inspectable. No black boxes.
Classified shape, learned-from-past-runs priors, advanced methods at a glance — SINDy governing equations, HMM latent regimes, mutual info, Granger causality, wavelet, Lomb-Scargle, persistent homology, multivariate outliers. Research-grade methods, surfaced in plain English.
[ overview · classified shape · advanced methods ]
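Lomb-Scargle is on that list because it finds periodicity in unevenly sampled series, where an FFT would need resampling first. The classic periodogram is short enough to show in full. This is a textbook sketch, not Aurora's implementation (lomb_scargle is a hypothetical name); scipy.signal.lombscargle or astropy's LombScargle class are the usual production choices.

```python
import math
import random


def lomb_scargle(t, y, freqs):
    """Classic Lomb-Scargle periodogram for unevenly sampled data.

    t, y: sample times and values; freqs: trial frequencies (cycles per time unit).
    Returns one power value per frequency; larger means a stronger periodic fit.
    """
    mean = sum(y) / len(y)
    yc = [v - mean for v in y]
    powers = []
    for f in freqs:
        w = 2 * math.pi * f
        # tau makes the sine and cosine terms orthogonal at this frequency.
        tau = math.atan2(sum(math.sin(2 * w * ti) for ti in t),
                         sum(math.cos(2 * w * ti) for ti in t)) / (2 * w)
        c = [math.cos(w * (ti - tau)) for ti in t]
        s = [math.sin(w * (ti - tau)) for ti in t]
        yc_c = sum(a * b for a, b in zip(yc, c))
        yc_s = sum(a * b for a, b in zip(yc, s))
        powers.append(0.5 * (yc_c ** 2 / sum(v * v for v in c)
                             + yc_s ** 2 / sum(v * v for v in s)))
    return powers


if __name__ == "__main__":
    rng = random.Random(0)
    t = sorted(rng.uniform(0, 100) for _ in range(200))  # irregular sample times
    y = [math.sin(2 * math.pi * 0.1 * ti) for ti in t]
    freqs = [0.01 + 0.005 * k for k in range(99)]
    p = lomb_scargle(t, y, freqs)
    print("dominant frequency:", freqs[p.index(max(p))])
```

The demo samples a 0.1-cycles-per-unit sine at 200 random times; the periodogram peaks at 0.1 without any interpolation onto a regular grid.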
Multivariate outlier consensus across robust Mahalanobis distance, isolation forest, and LOF: a point is flagged only when at least 2 of the 3 detectors agree, never on a single detector's vote. Z-scores, predictive-maintenance precursors, and AR(1) forecasts of when the next breach hits. Each finding is pinned to a peer-reviewed reference.
[ anomalies · top findings · seed citations ]
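The 2-of-3 voting rule itself is easy to state in code. To keep the sketch standard-library only, it uses three simple univariate stand-in detectors (a median/MAD robust z-score, Tukey's IQR fences, and a nearest-neighbor distance score) rather than the robust Mahalanobis, isolation forest, and LOF trio Aurora names. All function names and thresholds here are illustrative assumptions.

```python
import statistics


def robust_z_flags(xs, cutoff=3.5):
    # Modified z-score 0.6745 * (x - median) / MAD, with the common 3.5 cutoff.
    med = statistics.median(xs)
    mad = statistics.median(abs(x - med) for x in xs) or 1e-12
    return [abs(0.6745 * (x - med) / mad) > cutoff for x in xs]


def iqr_flags(xs, k=1.5):
    # Tukey fences: outside [Q1 - k*IQR, Q3 + k*IQR].
    q1, _, q3 = statistics.quantiles(xs, n=4)
    lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [x < lo or x > hi for x in xs]


def knn_distance_flags(xs, k=3, factor=5.0):
    # Mean distance to the k nearest other points, compared with the median score.
    scores = []
    for i, x in enumerate(xs):
        d = sorted(abs(x - y) for j, y in enumerate(xs) if j != i)
        scores.append(sum(d[:k]) / k)
    base = statistics.median(scores) or 1e-12
    return [s > factor * base for s in scores]


def consensus(xs, min_votes=2):
    """Indices flagged by at least min_votes of the three detectors."""
    votes = zip(robust_z_flags(xs), iqr_flags(xs), knn_distance_flags(xs))
    return [i for i, v in enumerate(votes) if sum(v) >= min_votes]
```

Requiring agreement trades a little sensitivity for far fewer false alarms: a point that only one detector's assumptions happen to dislike is not reported.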
HMM Baum-Welch decoded latent states, mean shifts, expected dwell times, and posterior probabilities. PELT change-point detection on top. Know when your system actually changed — not just when a metric moved.
[ regimes · latent states · dwell times ]
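Two of the quantities above have compact textbook forms. In a discrete-time HMM the run length in state i is geometric, so the expected dwell time is 1 / (1 - a_ii) steps, where a_ii is the self-transition probability; and a common way to decode the most likely state sequence from fitted parameters is the Viterbi recursion. The sketch assumes a hypothetical two-regime Gaussian HMM whose parameters are already fitted (Baum-Welch estimation itself is omitted) and is not Aurora's code.

```python
import math


def expected_dwell(trans):
    """Expected run length per state: geometric dwell time with mean 1 / (1 - a_ii)."""
    return [1.0 / (1.0 - trans[i][i]) for i in range(len(trans))]


def gaussian_logpdf(x, mean, std):
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def viterbi(obs, start, trans, means, stds):
    """Most likely hidden-state path for an HMM with Gaussian emissions (log space)."""
    states = range(len(start))
    logv = [math.log(start[s]) + gaussian_logpdf(obs[0], means[s], stds[s]) for s in states]
    backptrs = []
    for x in obs[1:]:
        ptr, new_logv = [], []
        for s in states:
            # Best predecessor for arriving in state s at this step.
            prev = max(states, key=lambda p: logv[p] + math.log(trans[p][s]))
            ptr.append(prev)
            new_logv.append(logv[prev] + math.log(trans[prev][s])
                            + gaussian_logpdf(x, means[s], stds[s]))
        backptrs.append(ptr)
        logv = new_logv
    # Walk the back-pointers from the most likely final state.
    state = max(states, key=lambda s: logv[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

With means 70 and 90, standard deviations of 2, and transitions [[0.95, 0.05], [0.10, 0.90]], the series 70, 71, 69, 70, 90, 91, 89, 90, 70, 70 decodes to 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, and the expected dwell times are 20 steps in the low regime and 10 in the high one.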
Persistent homology surfaces structural patterns invisible to traditional statistics. Bootstrap-validated cluster stability with silhouette scores. The shape of your data, made legible.
[ motifs · topology · cluster structure ]
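Persistent homology in its simplest form, dimension 0, tracks connected components as a distance threshold grows: every point is born at 0, and a component dies at the edge length that merges it into another. The sketch computes that H0 barcode with union-find over sorted pairwise distances, which is equivalent to single-linkage clustering. It is a textbook illustration, not Aurora's implementation; loops and voids (H1, H2) need a library such as GUDHI or Ripser, and h0_barcode is a hypothetical name.

```python
import itertools
import math


def h0_barcode(points):
    """Death times of H0 features in a Vietoris-Rips filtration.

    Every point is born at 0. Each merge of two components kills one of them at
    the merging edge length. One component never dies (reported as math.inf).
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri
            deaths.append(length)
    return deaths + [math.inf]
```

For two small triangles of points centered near (0, 0) and (10, 10), four bars die at length 1.0 and one survives to about 13.45 (the square root of 181): a single long finite bar is the topological signature of two well-separated clusters.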
Multiple forecasters compete on a held-out fold — AR(1), kNN-window, exponential smoothing — with calibrated CRPS scores. The winner is selected on out-of-sample performance, not in-sample fit. Threshold breach probabilities and confidence intervals reported transparently.
[ forecast · peak prediction · alternates ]
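The selection rule (fit on one slice, score on a held-out fold, keep the lowest mean CRPS) can be sketched with the closed-form CRPS of a Gaussian forecast: CRPS(N(mu, sigma^2), y) = sigma * [z * (2 * Phi(z) - 1) + 2 * phi(z) - 1/sqrt(pi)], where z = (y - mu) / sigma and Phi, phi are the standard normal CDF and PDF. The two candidates (AR(1) and simple exponential smoothing, each with a residual-based Gaussian spread), the 80/20 split, and all function names are assumptions for illustration; Aurora's own pool also includes kNN-window.

```python
import math


def crps_gaussian(mu, sigma, y):
    """Closed-form CRPS of a N(mu, sigma^2) forecast at observation y (lower is better)."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return sigma * (z * (2 * cdf - 1) + 2 * pdf - 1 / math.sqrt(math.pi))


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


def make_ar1(train):
    """One-step AR(1) forecaster x[t] = c + phi * x[t-1], fitted by least squares."""
    x, y = train[:-1], train[1:]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    phi = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    c = my - phi * mx
    sd = rms([b - (c + phi * a) for a, b in zip(x, y)])
    return lambda history: (c + phi * history[-1], sd)


def make_ses(train, alpha=0.5):
    """One-step simple exponential smoothing forecaster with a fixed alpha."""
    def level_of(history):
        level = history[0]
        for v in history[1:]:
            level = alpha * v + (1 - alpha) * level
        return level
    sd = rms([train[t] - level_of(train[:t]) for t in range(1, len(train))])
    return lambda history: (level_of(history), sd)


def select_forecaster(series, split=0.8):
    """Fit on the first `split` of the series, score one-step CRPS on the rest."""
    cut = int(len(series) * split)
    train = series[:cut]
    candidates = {"ar1": make_ar1(train), "exp_smoothing": make_ses(train)}
    scores = {}
    for name, predict in candidates.items():
        crps = [crps_gaussian(*predict(series[:t]), series[t])
                for t in range(cut, len(series))]
        scores[name] = sum(crps) / len(crps)
    return min(scores, key=scores.get), scores
```

Parameters come from the training slice only, so the winner is chosen on observations neither model was fitted to. CRPS rewards both a well-placed mean and an honest spread, which is why it suits calibrated forecasts better than point error alone.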
SINDy fits sparse governing equations to your data. Aurora cross-references the discovered ODE against known physical laws — exponential growth/decay, damped harmonic oscillator, logistic — and reports matches with RMSE and AIC. Real physics discovery. Not metaphor.
[ physics · governing equation · matched law ]
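SINDy (Brunton, Proctor & Kutz) regresses estimated derivatives onto a library of candidate terms and prunes small coefficients by sequentially thresholded least squares (STLSQ). Below is a one-variable sketch with the library {1, x, x^2}, central-difference derivatives, and a small hand-rolled least-squares solver so it needs no NumPy. The threshold, the names, and the fixed library are assumptions; the PySINDy package is the usual choice for real work.

```python
def solve_least_squares(rows, target):
    """Least squares via the normal equations (A^T A) w = A^T b, Gauss-Jordan with pivoting."""
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    atb = [sum(r[i] * b for r, b in zip(rows, target)) for i in range(k)]
    m = [ata[i] + [atb[i]] for i in range(k)]  # augmented matrix
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(k):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]


def sindy_1d(t, x, threshold=0.05, iters=10):
    """Fit dx/dt = xi_0 * 1 + xi_1 * x + xi_2 * x^2 by sequentially thresholded least squares."""
    names = ["1", "x", "x^2"]
    dt = t[1] - t[0]  # assumes uniform sampling
    dxdt = [(x[i + 1] - x[i - 1]) / (2 * dt) for i in range(1, len(x) - 1)]  # central differences
    library = [[1.0, v, v * v] for v in x[1:-1]]
    active = list(range(len(names)))
    coef = {}
    for _ in range(iters):
        sol = solve_least_squares([[row[j] for j in active] for row in library], dxdt)
        coef = dict(zip(active, sol))
        keep = [j for j in active if abs(coef[j]) >= threshold]
        if not keep or keep == active:
            break
        active = keep  # refit using only the surviving terms
    return {name: coef[j] if j in coef and abs(coef[j]) >= threshold else 0.0
            for j, name in enumerate(names)}
```

On samples of x(t) = exp(-0.5 t), only the x term survives, with a coefficient close to -0.5. A model of the form dx/dt = k x is exponential growth (k > 0) or decay (k < 0), the kind of known-law match the paragraph above describes.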
Every layer engineered to produce defensible findings — not impressive-looking ones. Math first, narrative second.
SINDy governing equations, HMM latent regimes, mutual info, Granger causality, wavelets, Lomb-Scargle, persistent homology. The methods quants and physicists actually use — not LLM party tricks.
Curated from public, licensed sources — peer-reviewed papers, FRED metadata, NOAA, NIST, IPCC, ontologies, Wikidata. Every claim Aurora makes traces back to an inspectable entry. Growing nightly.
Every finding has a method tag. Every method has source code. Every claim has a citation. Every confidence number is calibrated. Click any sentence to see where it came from.
Your CPU. Your data. Your runs. Zero cloud dependencies. No API costs. No data egress. No subscription required to analyze. No telemetry, no phone-home, no analytics. The entire pipeline runs on your machine, offline.
The local LLM only rewords retrieved knowledge — it never invents. Every claim is verified against retrieved entries; ungrounded statements are flagged. The same verifier protects MCP and SDK callers — AI agents calling Aurora get the same glass-box guarantees.
Aurora connects to public data streams nightly — ingesting new peer-reviewed papers, economic releases, climate observations, reference updates. Every run also strengthens internal priors. You wake up to a smarter system.
Four programmable surfaces consuming the same .aurora.json bundle format. Wire Aurora into any LLM agent, notebook, pipeline, or automation in minutes. Path-allowlisted, output-capped, SSRF-guarded.
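"SSRF-guarded" and "output-capped" usually come down to two small checks at the tool boundary: refuse URLs whose host resolves to loopback, private, link-local, or otherwise non-public addresses, and truncate oversized results before they reach the calling agent. A standard-library sketch of both; MAX_OUTPUT_CHARS, the function names, and the exact policy are assumptions, not Aurora's published implementation.

```python
import ipaddress
import socket
from urllib.parse import urlparse

MAX_OUTPUT_CHARS = 20_000  # hypothetical cap on text returned to an agent


def check_url(url):
    """Raise ValueError unless url is http(s) and every resolved address is public."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        raise ValueError(f"unsupported URL: {url}")
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    for info in socket.getaddrinfo(parsed.hostname, port):
        addr = ipaddress.ip_address(info[4][0].split("%")[0])  # drop IPv6 zone id
        if not addr.is_global:
            raise ValueError(f"blocked non-public address {addr} for {url}")
    return url


def cap_output(text, limit=MAX_OUTPUT_CHARS):
    """Truncate long tool output and say so, rather than silently dropping data."""
    if len(text) <= limit:
        return text
    return text[:limit] + f"\n[truncated {len(text) - limit} characters]"
```

A production guard also has to connect to the exact address it checked; resolving once for the check and again for the fetch leaves room for DNS rebinding.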
Every Aurora run produces a portable .aurora.json artifact with SHA-256 content hash and optional Ed25519 signing. Move it between machines, attach it to audits, ship it with research papers. Tampering raises on verification.
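The internal layout of a .aurora.json bundle is not documented here, so the sketch below assumes a hypothetical one: a JSON object holding a payload, a sha256 digest of a canonical serialization of that payload (sorted keys, compact separators), and an optional Ed25519 signature over the same bytes. It shows why tampering raises: any edit to the payload changes the digest. Signing relies on the cryptography package's Ed25519 key objects and is optional.

```python
import hashlib
import json


def canonical_bytes(payload):
    # Stable serialization: equal data always hashes to the same digest.
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def make_bundle(payload, private_key=None):
    """Hypothetical bundle layout: payload + SHA-256 digest (+ optional Ed25519 signature)."""
    body = canonical_bytes(payload)
    bundle = {"payload": payload, "sha256": hashlib.sha256(body).hexdigest()}
    if private_key is not None:
        bundle["signature"] = private_key.sign(body).hex()
    return bundle


def verify_bundle(bundle, public_key=None):
    """Raise if the payload no longer matches its hash (or its signature)."""
    body = canonical_bytes(bundle["payload"])
    if hashlib.sha256(body).hexdigest() != bundle["sha256"]:
        raise ValueError("content hash mismatch: bundle was modified")
    if public_key is not None:
        if "signature" not in bundle:
            raise ValueError("bundle is not signed")
        # cryptography raises InvalidSignature here if the signature does not match.
        public_key.verify(bytes.fromhex(bundle["signature"]), body)
```

With cryptography installed, Ed25519PrivateKey.generate() from cryptography.hazmat.primitives.asymmetric.ed25519 provides a key to pass as private_key, and its public_key() is what a verifier on another machine needs.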
Real engineering, not magic. Seven subsystems make up the analytical brain — six driving the analysis, one exposing it to AI agents and pipelines. Each inspectable, each debuggable, each running locally.
Runs every dataset through 24+ research-grade methods in parallel: SINDy, HMM Baum-Welch, mutual information, Granger causality, wavelet CWT, Lomb-Scargle, persistent homology, Pearl do-calculus, VAR, DTW, BOCPD, Robust PCA, Kalman, EMD, spectral entropy, robust z-score, isolation forest, LOF, Mahalanobis distance, PELT, AR(1), kNN-window, exponential smoothing, and Gaussian processes (GP). Each method outputs structured findings with an explicit confidence.
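Running many methods side by side, each returning structured findings, together with the SDK's "aurora_plugins entry-point, isolated failures" note, suggests a runner in which one crashing method cannot sink the run. A sketch under those assumptions: Finding, run_methods, and the two toy methods are hypothetical, and only the entry-point group name aurora_plugins comes from the source; how Aurora actually registers plugins is not shown here.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from importlib.metadata import entry_points


@dataclass
class Finding:
    method: str
    message: str
    confidence: float  # explicit per-finding confidence in [0, 1]


def discover_plugins(group="aurora_plugins"):
    """Load callables registered under an entry-point group, skipping broken ones."""
    eps = entry_points()
    selected = eps.select(group=group) if hasattr(eps, "select") else eps.get(group, [])
    plugins = {}
    for ep in selected:
        try:
            plugins[ep.name] = ep.load()
        except Exception as exc:  # a broken plugin must not break discovery
            print(f"skipping plugin {ep.name}: {exc}")
    return plugins


def run_methods(data, methods, max_workers=8):
    """Run every method on the same data; collect findings and per-method errors."""
    findings, errors = [], {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(fn, data) for name, fn in methods.items()}
        for name, fut in futures.items():
            try:
                findings.extend(fut.result())
            except Exception as exc:  # isolate the failure, keep the other results
                errors[name] = repr(exc)
    return findings, errors


def peak_value(data):
    return [Finding("peak", f"peak value {max(data)}", 1.0)]


def broken_method(data):
    raise RuntimeError("method crashed")
```

Threads keep the sketch simple and suit methods that spend their time in NumPy or other native code; CPU-bound pure-Python methods would need a ProcessPoolExecutor instead.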
Targeting 2–3M structured entries from public, licensed sources. Peer-reviewed papers (Chandola, Pearl, Hyndman/Athanasopoulos, Brunton/Proctor/Kutz, Malthus, Hampel, Newton), reference databases (FRED, NOAA, NIST, IPCC), ontologies, and Wikidata. Every entry inspectable, version-tracked, and linkable.
A local LLM (Gemma 3 12B via Ollama) takes the structured findings and retrieved knowledge entries — and writes the human-readable "what this means" narrative. Strict prompts: use only the retrieved facts. Never invent. Every sentence carries a seed-citation tag back to its source entry.
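One way to enforce "use only the retrieved facts" at the prompt level is sketched below, with the call going to a local model through Ollama's REST endpoint (POST /api/generate on port 11434, Ollama's default). The prompt wording, build_prompt, and the gemma3:12b model tag are assumptions for illustration; Aurora's actual prompts are not reproduced in the source.

```python
import json
import urllib.request


def build_prompt(findings, entries):
    """findings: list of str; entries: dict mapping seed id -> retrieved text."""
    facts = "\n".join(f"[seed:{sid}] {text}" for sid, text in entries.items())
    results = "\n".join(f"- {f}" for f in findings)
    return (
        "Explain what these findings mean using ONLY the facts below.\n"
        "End every sentence with the seed:<id> tag of the fact it relies on.\n"
        "If no fact supports a statement, do not make it.\n\n"
        f"FACTS:\n{facts}\n\nFINDINGS:\n{results}\n"
    )


def generate(prompt, model="gemma3:12b", host="http://127.0.0.1:11434"):
    """Send one non-streaming request to a local Ollama server and return its text."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"temperature": 0},  # favor repeatable wording
    }
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Prompt instructions alone cannot guarantee grounding, which is why the output still goes through a separate post-hoc check.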
After synthesis, a post-hoc verifier checks every claim in the narrative against the retrieved knowledge entries. Anything that doesn't trace back gets flagged or rewritten. This is what makes Aurora glass-box even with an LLM in the loop.
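The mechanical half of that check fits in a few lines: split the narrative into sentences, require each one to carry at least one seed: tag, and require every tag to name an entry actually retrieved for this run. The naive sentence splitter and the tag pattern are simplifying assumptions, and verify_narrative is a hypothetical name. Confirming that a sentence's content is supported by the entry it cites, not just that the tag exists, is the harder part and is not shown.

```python
import re

TAG = re.compile(r"seed:([A-Za-z0-9_]+)")


def verify_narrative(text, retrieved_ids):
    """Return (sentence, reason) pairs for sentences that fail the citation check."""
    flagged = []
    # Naive sentence split: break on whitespace that follows '.', '!' or '?'.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        tags = TAG.findall(sentence)
        if not tags:
            flagged.append((sentence, "no citation"))
            continue
        unknown = [t for t in tags if t not in retrieved_ids]
        if unknown:
            flagged.append((sentence, f"cites unretrieved entries: {unknown}"))
    return flagged
```

Because decimals such as +13.6 have no whitespace after the dot, they do not split a sentence, and a tag placed just before the full stop (as in the sample output below) is still captured.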
Builds a system graph for every run — nodes for variables and processes, edges for discovered relationships, worldlines projected forward and backward in time. Threshold-cross events surface where the math says something is about to break. Scrub the timeline, simulate counterfactuals.
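A threshold-cross event in its simplest form: fit an AR(1) model and iterate its mean forecast forward until it reaches a limit. The sketch does that with ordinary least squares; it ignores forecast uncertainty (a breach probability would propagate the noise term as well), and steps_to_breach and the example numbers are hypothetical.

```python
def fit_ar1(xs):
    """OLS fit of x[t] = c + phi * x[t-1]; returns (c, phi)."""
    x, y = xs[:-1], xs[1:]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    phi = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - phi * mx, phi


def steps_to_breach(xs, threshold, horizon=48):
    """First step (1-based) at which the AR(1) mean forecast reaches threshold, else None."""
    c, phi = fit_ar1(xs)
    level = xs[-1]
    for step in range(1, horizon + 1):
        level = c + phi * level
        if level >= threshold:
            return step
    return None
```

For a series that follows x[t] = 10 + 0.9 x[t-1] from 50, the forecast climbs toward the fixed point c / (1 - phi) = 100: after 20 samples it crosses 95 three steps ahead, and it never reaches 101.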
Every run feeds back into Aurora's prior library. Confirmed patterns increase confidence. Contradictions surface. Overnight learning mode connects to public data streams (FRED, NOAA, NIST, arXiv, Wikidata) and ingests new structured knowledge. Aurora learns. Reproducibly. Transparently.
MCP server (7 tools), Python SDK, Decision Contracts engine, and the Aurora Bundle Format. The same engine that drives the Studio answers LLM agents, runs in notebooks, and fires automation predicates. Four shapes, one core, one glass-box.
Real output from a real Aurora run on factory bearing sensor data. Each tagged citation links to a specific entry in the knowledge bank. Click. Verify. Trust.
The most significant finding is the frequent detection of anomalies related to motor temperature, evidenced by alerts at rows 993, 971, and in the last few rows seed:predictive_maintenance. These anomalies, characterized by a high z-score (+13.6 to +15.1), alongside high vibration and shifted timestamps, suggest a potential developing issue that requires attention seed:predictive_maintenance. These deviations from normal operating conditions could indicate progression towards equipment failure, necessitating intervention to prevent downtime seed:predictive_maintenance. The correlation of motor temperature with vibration and timestamp anomalies is notable, though the exact causal relationship requires further investigation seed:robust_zscore.
Several other anomalies are also present, including elevated vibration at row 220 and vibration coupled with timestamp and rpm anomalies seed:robust_zscore. A forecast peak is anticipated, potentially indicating increased load or stress on the system seed:ar1_persistence. The system is currently operating in a "HIGH" regime, where the mean has shifted to 90 seed:ar1_persistence. There's evidence of a causal effect where vibration increases are associated with changes in timestamps seed:exponential_decay.
One pass. Six lenses. Cited findings. Inspectable math. The entire pipeline runs on your machine, end-to-end.
CSV, Parquet, or JSON. Aurora detects schema, time axis, gaps, and dupes automatically.
24+ research-grade methods run in parallel — anomalies, regimes, motifs, forecast, physics, structure, causal.
RAG retrieves matching entries from the knowledge bank. Local Gemma writes a grounded narrative.
Every claim traced to a source. Findings ranked. Spacetime graph rendered. Done.
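The first step's gap and duplicate detection can be illustrated on an already-parsed timestamp column: take the median positive spacing as the nominal sampling interval, report spacings well above it as gaps, and report repeated timestamps as duplicates. The 1.5x tolerance and the function name are assumptions; Aurora's schema inference is not described beyond this summary.

```python
from datetime import datetime, timedelta
from statistics import median


def audit_time_axis(stamps, tolerance=1.5):
    """Return (nominal_step, gaps, duplicates) for a list of datetimes."""
    ordered = sorted(stamps)
    deltas = [b - a for a, b in zip(ordered, ordered[1:])]
    step = median(d for d in deltas if d > timedelta(0))  # nominal sampling interval
    gaps = [(a, b) for a, b, d in zip(ordered, ordered[1:], deltas)
            if d > tolerance * step]
    duplicates = sorted({a for a, d in zip(ordered, deltas) if d == timedelta(0)})
    return step, gaps, duplicates


if __name__ == "__main__":
    base = datetime(2024, 1, 1)
    minutes = [0, 1, 2, 3, 3, 4, 8, 9]  # one repeated stamp, one 4-minute hole
    print(audit_time_axis([base + timedelta(minutes=m) for m in minutes]))
```

Using the median rather than the mean keeps a few long outages from inflating the nominal interval, so the outages still stand out as gaps.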
Most quantitative AI tools fail in the same three ways — they hallucinate, they can't show their work, and they can't be called by other AI systems. Aurora fixes all three.
| | LLM Analytics | BI Dashboards | AutoML Tools | Aurora |
|---|---|---|---|---|
| Hallucination resistant | No | N/A | Mixed | Yes (RAG + verifier) |
| Cited claims | No | No | No | Every claim |
| Real physics / ODE discovery | No | No | No | SINDy |
| Continuous learning | No | No | No | Nightly · streams |
| Runs locally | No | No | Mixed | Yes |
| Glass-box methods | No | Partial | No | Full |
| MCP / AI-callable | No | No | No | 7 tools, path-allowlisted |
| Signed, portable artifacts | No | No | No | .aurora.json + Ed25519 |
| Usage limits | Tokens | Seats | Compute | Unlimited |
| Source available | No | No | No | Apache 2.0 |
Aurora is being built right now, in the open, on GitHub.
Roadmap timelines are aspirational, and we say so. Quality compounds: we ship Now and Next before getting clever.
Install in 60 seconds. 599 tests passing. v2.0 shipping with causal inference, plugin SDK, custom KB, and streaming connectors. Real users running it on real data.