AI Research Lab
Hypothesis-first research pipeline — from original idea to published paper
Two AI agents formulate an original hypothesis, design custom environments, train real agents, analyze results, and write academic papers with inline training figures and literature references — downloadable as PDF.
How the Lab Works
A six-phase hypothesis-first pipeline turns your research idea into a complete paper backed by real experimental data.
Hypothesis (Sage)
Sage formulates an original hypothesis and research question from your topic — no literature search yet. Defines agent requirements and environment specs.
Design (Atlas)
Atlas designs a custom Gymnasium environment based on the hypothesis, including agent-side behavior like self-observation and adaptive mechanisms.
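For a concrete picture, here is a minimal sketch of the kind of environment this phase produces. The class name, the difficulty knob, and the sparse reward are illustrative stand-ins, not the lab's actual generated code:

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class CurriculumNavEnv(gym.Env):
    """Illustrative sparse-reward 2D navigation env with a difficulty knob."""

    def __init__(self, difficulty: float = 0.1):
        super().__init__()
        self.difficulty = difficulty  # 0..1, scales how far away goals spawn
        self.observation_space = spaces.Box(-1.0, 1.0, shape=(4,), dtype=np.float32)
        self.action_space = spaces.Box(-1.0, 1.0, shape=(2,), dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.pos = np.zeros(2, dtype=np.float32)
        # Harder curriculum stages place the goal farther from the start.
        self.goal = (self.np_random.uniform(-1.0, 1.0, size=2) * self.difficulty).astype(np.float32)
        return self._obs(), {}

    def step(self, action):
        self.pos = np.clip(self.pos + 0.05 * np.asarray(action, dtype=np.float32), -1.0, 1.0)
        reached = float(np.linalg.norm(self.pos - self.goal)) < 0.05
        reward = 1.0 if reached else 0.0  # sparse: reward only at the goal
        return self._obs(), reward, reached, False, {}

    def _obs(self):
        # Self-observation: the agent sees its own position alongside the goal.
        return np.concatenate([self.pos, self.goal])
```

A `gymnasium.wrappers.TimeLimit` wrapper would normally cap episode length, and a pass through `gymnasium.utils.env_checker.check_env` is the kind of validation behind the "8/8 tests passed" message in the demo below.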
Experiment (Atlas)
Atlas trains real SB3 agents with configurable hyperparameters. Supports Continue, Fine-Tune, and Curriculum training modes.
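Under the hood, a training run of this shape maps onto a few lines of SB3. A sketch assuming the CurriculumNavEnv above; the hyperparameter values are illustrative defaults:

```python
from gymnasium.wrappers import TimeLimit
from stable_baselines3 import PPO

env = TimeLimit(CurriculumNavEnv(difficulty=0.1), max_episode_steps=200)

# The lab's configurable knobs map directly onto SB3 arguments.
model = PPO(
    "MlpPolicy",
    env,
    learning_rate=3e-4,                    # LR
    batch_size=64,                         # batch
    gamma=0.99,                            # gamma
    policy_kwargs={"net_arch": [64, 64]},  # net arch
    verbose=1,
)
model.learn(total_timesteps=50_000)
model.save("curriculum_nav_ppo")

# Continue-style training: reload the saved weights and keep going, here on a
# harder curriculum stage, without resetting the timestep counter.
model = PPO.load(
    "curriculum_nav_ppo",
    env=TimeLimit(CurriculumNavEnv(difficulty=0.5), max_episode_steps=200),
)
model.learn(total_timesteps=50_000, reset_num_timesteps=False)
```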
Analyze (Atlas)
Training results are analyzed — reward curves, convergence rates, success metrics — all from real runs.
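A sketch of how such metrics can be pulled from a finished run, assuming the env was wrapped in SB3's `Monitor` with a log file (paths are illustrative):

```python
import matplotlib.pyplot as plt
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.results_plotter import load_results, ts2xy

# Assumes training used Monitor(env, filename="logs/run_1"),
# which writes episode returns to logs/run_1.monitor.csv.
x, y = ts2xy(load_results("logs"), "timesteps")
plt.plot(x, y)
plt.xlabel("timesteps")
plt.ylabel("episode reward")
plt.savefig("reward_curve.png")

# Held-out evaluation episodes back the success/reward tables in the paper.
mean_reward, std_reward = evaluate_policy(model, model.get_env(), n_eval_episodes=20)
print(f"eval reward: {mean_reward:.1f} +/- {std_reward:.1f}")
```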
Research & Write (Sage)
Sage conducts academic literature research for supporting references, then writes a complete paper with inline training figures, tables, and real experimental data.
Review (Sage + Atlas)
Both agents review the paper for accuracy, consistency with experimental results, and academic rigor. PDF download ready.
AI Research Agents
Two specialized agents work together — one thinks strategically, the other builds and runs experiments.
Sage
Research Strategist
- Original hypothesis formulation from user topic
- Academic literature research for supporting references
- Research question & experimental design
- Academic paper writing with inline figures
- PDF download with training data embedded
Atlas
RL Engineer
- Gymnasium environment generation with agent-side design
- SB3 training: Continue, Fine-Tune, Curriculum modes (see the callback sketch after this list)
- Configurable hyperparameters (LR, batch, gamma, net arch)
- Training metric collection & curve generation
- Result analysis, evaluation episodes & visualization
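One plausible shape for the Curriculum mode is a callback that bumps the environment's difficulty once recent episodes mostly succeed. A minimal sketch against the CurriculumNavEnv above, not the lab's actual scheduler:

```python
from stable_baselines3.common.callbacks import BaseCallback

class CurriculumCallback(BaseCallback):
    """Raise env difficulty once the recent success rate clears a threshold."""

    def __init__(self, step_size: float = 0.1, threshold: float = 0.8, window: int = 20):
        super().__init__()
        self.step_size = step_size
        self.threshold = threshold
        self.window = window
        self.successes = []

    def _on_step(self) -> bool:
        # Monitor-wrapped envs emit an "episode" info dict when an episode ends;
        # with the sparse reward above, any positive return counts as a success.
        for info in self.locals.get("infos", []):
            if "episode" in info:
                self.successes.append(info["episode"]["r"] > 0)
        if len(self.successes) >= self.window:
            if sum(self.successes) / len(self.successes) >= self.threshold:
                # Reach through the wrappers and raise difficulty on the base env.
                for base_env in self.training_env.get_attr("unwrapped"):
                    base_env.difficulty = min(1.0, base_env.difficulty + self.step_size)
            self.successes.clear()
        return True

# Usage: model.learn(total_timesteps=200_000, callback=CurriculumCallback())
```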
Sage — Hypothesis
Based on your topic "curriculum learning in sparse-reward navigation," I've formulated the hypothesis: automatic curriculum generation combined with hindsight replay will outperform flat-reward baselines. The agent needs adaptive difficulty sensing and experience memory.
Atlas — Design & Experiment
Building CurriculumNav-v1 with adaptive obs space and difficulty scaling. Training PPO with Continue → Curriculum modes, configurable LR and batch size. 8/8 tests passed. Starting 50K timesteps...
Sage — Research & Write
Found 14 academic papers as supporting references. Writing paper with inline training curves, evaluation tables, and hyperparameter details embedded in relevant sections. PDF ready for download.
Real Experiments, Real Data
Unlike tools that simulate or hallucinate results, Kualia's Research Lab generates actual Gymnasium environments, trains real Stable Baselines3 agents, and captures authentic training metrics — reward curves, convergence rates, episode lengths, and success rates.
Every chart in the paper comes from a real training run. Every number is backed by data.
- 500K Training Steps (real SB3 runs)
- 3 Environments (Gymnasium v0.29)
- 12 Total Runs (PPO + SAC)
All data points come from actual Stable Baselines3 training runs on real Gymnasium environments — no simulated or hallucinated results.
Papers with Inline Data
Papers include inline training curves, evaluation tables, hyperparameter details, and reproducibility info — embedded directly in the relevant sections, not just appended. Academic literature is used as supporting citations for your original hypothesis.
Download the final paper as a styled PDF with all figures and data included. Re-run any phase to iterate on your research.
Automatic Curriculum Generation with Hindsight Experience Replay for Sparse-Reward Navigation Tasks
Kualia AI Research Lab, 2025
Abstract
We propose ACGHER, a method combining automatic curriculum generation with hindsight experience replay for continuous-control navigation in sparse-reward settings. Through experiments on three Gymnasium environments of increasing complexity, we demonstrate that ACGHER achieves 91.3 mean reward compared to 34.2 for the PPO baseline, while requiring 40% fewer training steps to converge...
Sections
- 1. Introduction
- 2. Related Work
- 3. Methodology
- 4. Experimental Setup
- 5. Results & Discussion
- 6. Conclusion
Inline Data
Table 1: Main Results
| Method | Mean Reward | Success Rate | Training Steps |
|---|---|---|---|
| PPO (baseline) | 34.2 | 38% | 500K |
| PPO + HER | 62.7 | 71% | 500K |
| ACGHER (ours) | 91.3 | 96% | 300K |
From Builder to Paper
Already built an environment in the Kualia Builder? Use the "Create Paper" button to generate a research paper directly from your existing environment and training data.
The Research Lab uses your environment code, training configurations, and experiment results as the foundation — then searches literature, contextualizes your work, and writes a paper around your real results.
Go to Builder
Obstacle Avoidance with Penalty-Based Reward Shaping for UAV Navigation
An automated paper analyzing your drone-navigation-v3 environment, including training curves from 3 PPO runs, convergence analysis, and comparison with baseline configurations.
Published Papers
Research papers generated by the lab, each backed by real experiments and training data.
Start Your Research
Give the lab a topic. Get a paper backed by real experiments.