Design the Experience.
Train the Intelligence.
Generate RL environments. Train agents. Run experiments. Create papers.
Describe what you need in plain English. The AI builds Gymnasium-compatible code, trains agents with Stable-Baselines3 (SB3), tracks experiments, and writes research papers with real data.
01
Generate.
Describe your RL environment in plain English — including agent behavior. AI writes validated Gymnasium code.
02
Train.
Continue, Fine-Tune, or Curriculum modes. Live charts, real metrics, configurable hyperparameters.
03
Research.
Hypothesis-first pipeline — design experiments, train agents, write academic papers with real data.
Builder.
AI Environment Builder
Describe your RL environment in natural language, including agent behavior, self-observation, and adaptive mechanisms. The Architect Agent generates Gymnasium-compatible code (sketched below), validates it with 8 automated tests, and offers AI-powered suggestions for your next iteration.
Created DroneNav-v1 with 8 distance sensors, -5.0 collision penalty, and goal bonus +50.0. Gaussian wind perturbation N(0, 0.3) applied each step.
Added self-observation: action history buffer [5x4], strategy effectiveness metric, and stagnation detection. Obs space expanded to [32].
Reward: distance shaping + collision penalty + goal bonus
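As a concrete illustration, here is a minimal sketch of the kind of Gymnasium code such a spec could compile to, assuming a simple square arena. The class name DroneNavEnv, the 0.1 distance-shaping coefficient, and the ray-based sensor model are hypothetical stand-ins, not the Architect Agent's actual output.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class DroneNavEnv(gym.Env):
    """Hypothetical 2D drone navigation: 8 distance sensors, Gaussian wind."""

    SIZE = 10.0  # side length of the square arena (assumption)

    def __init__(self):
        # 8 wall-distance sensor readings; max reading is the arena diagonal.
        self.observation_space = spaces.Box(0.0, self.SIZE * 1.5, shape=(8,), dtype=np.float32)
        self.action_space = spaces.Box(-1.0, 1.0, shape=(2,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.pos = self.np_random.uniform(1.0, self.SIZE - 1.0, size=2)
        self.goal = self.np_random.uniform(1.0, self.SIZE - 1.0, size=2)
        return self._sensors(), {}

    def step(self, action):
        # Gaussian wind perturbation N(0, 0.3) each step (0.3 read as std here).
        wind = self.np_random.normal(0.0, 0.3, size=2)
        self.pos = self.pos + np.asarray(action) + wind
        collided = bool(np.any(self.pos < 0.0) or np.any(self.pos > self.SIZE))
        self.pos = np.clip(self.pos, 0.0, self.SIZE)
        at_goal = bool(np.linalg.norm(self.pos - self.goal) < 0.5)

        # Reward: distance shaping, -5.0 collision penalty, +50.0 goal bonus.
        reward = -0.1 * float(np.linalg.norm(self.pos - self.goal))
        reward += -5.0 if collided else 0.0
        reward += 50.0 if at_goal else 0.0
        return self._sensors(), reward, at_goal, False, {}

    def _sensors(self):
        # Distance from the drone to the arena walls along 8 evenly spaced headings:
        # take the positive ray-wall crossing time per axis, then the nearest wall.
        angles = np.arange(8) * np.pi / 4
        dirs = np.stack([np.cos(angles), np.sin(angles)], axis=1)
        exit_t = np.maximum((0.0 - self.pos) / dirs, (self.SIZE - self.pos) / dirs)
        return np.min(exit_t, axis=1).astype(np.float32)
```

In practice a generated environment would also be registered and time-limited. The 8 automated tests are the platform's own, but Gymnasium ships a comparable smoke test in gymnasium.utils.env_checker.check_env.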
Training.
Agent Training & Advanced Modes
Train with Continue, Fine-Tune, or Curriculum Learning. Configurable hyperparameters, live training curves, evaluation metrics, and ETA.
Training Modes
Continue
Same settings, resume training
Fine-Tune
Low LR, short training
Curriculum
Auto-increase difficulty
Algorithm
PPO
Timesteps
100K
LR
3e-4
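Under the hood, a run with this config card maps to a few lines of Stable-Baselines3. A sketch assuming the DroneNavEnv above; the save path and the fine-tune learning rate of 3e-5 are illustrative.

```python
from stable_baselines3 import PPO

# Fresh run with the config card above: PPO, 100K timesteps, LR 3e-4.
model = PPO("MlpPolicy", DroneNavEnv(), learning_rate=3e-4, verbose=1)
model.learn(total_timesteps=100_000)
model.save("dronenav_ppo")

# Continue mode: same settings, resume training without resetting the step counter.
model.learn(total_timesteps=50_000, reset_num_timesteps=False)

# Fine-Tune mode: load the saved weights into a model with a lower learning
# rate and train briefly.
finetune = PPO("MlpPolicy", DroneNavEnv(), learning_rate=3e-5)
finetune.set_parameters("dronenav_ppo")
finetune.learn(total_timesteps=20_000)
```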
Experiments.
Track & Compare
Every training run is versioned. Compare runs side-by-side, expand experiment history with full metrics, training curves, evaluation episodes, and hyperparameter details. Export to GitHub or download reports.
| Metric | Run #5 | Run #4 | Δ (relative) |
|---|---|---|---|
| Reward | 78.2 | 45.6 | +71.5% |
| Success Rate | 97% | 68% | +42.6% |
| Ep. Length | 55 | 120 | -54.2% |
| Policy Loss | 0.63 | 1.24 | -49.2% |
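The Δ column reports relative change between runs. For reference, a sketch of how such metrics can be gathered with SB3's evaluation helper; the run bookkeeping and the episode count of 100 are illustrative.

```python
import numpy as np
from stable_baselines3.common.evaluation import evaluate_policy

# Per-episode rewards and lengths for the current run (e.g. Run #5).
rewards, lengths = evaluate_policy(
    finetune, DroneNavEnv(), n_eval_episodes=100, return_episode_rewards=True
)
run = {"reward": float(np.mean(rewards)), "ep_length": float(np.mean(lengths))}

def rel_delta(new, old):
    """Relative change as shown in the table: 45.6 -> 78.2 gives +71.5%."""
    return 100.0 * (new - old) / abs(old)
```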
Sage — Hypothesis
“Does curriculum reward shaping outperform flat rewards in multi-goal navigation?” Defined env specs: 2D grid, 4 goals, progressive difficulty scaling. Agent must generalize across difficulty levels.
Atlas — Design & Experiment
Built CurriculumNav-v1: 8/8 tests passed. Trained PPO across Continue → Fine-Tune → Curriculum (scheduler sketched below). Results: 97% success, mean reward 78.2, episode length 55.
Sage — Research & Write
Found 14 supporting papers via academic literature search. Writing paper with inline training curves, evaluation tables, and hyperparameter appendix. PDF download ready.
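Atlas's Curriculum stage could be driven by a difficulty scheduler like the SB3 callback sketched below; the set_difficulty hook and the milestone values are hypothetical, not the platform's actual scheduler.

```python
from stable_baselines3.common.callbacks import BaseCallback

class CurriculumCallback(BaseCallback):
    """Raise environment difficulty at fixed timestep milestones."""

    def __init__(self, milestones=(25_000, 50_000, 75_000)):
        super().__init__()
        self.milestones = list(milestones)
        self.level = 0

    def _on_step(self) -> bool:
        if self.milestones and self.num_timesteps >= self.milestones[0]:
            self.milestones.pop(0)
            self.level += 1
            # Assumes the env exposes a set_difficulty() method (hypothetical).
            self.training_env.env_method("set_difficulty", self.level)
        return True

# Usage: model.learn(total_timesteps=100_000, callback=CurriculumCallback())
```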
Research Lab.
Hypothesis to Paper
Two AI agents formulate a hypothesis, design experiments, train real agents, analyze results, and write academic papers with inline figures and literature references — downloadable as PDF.
Generated Paper
Curriculum Reward Shaping for Multi-Goal Navigation: A Comparative Study
Sage & Atlas — kualia.ai Research Lab
Abstract: We investigate whether curriculum-based reward shaping improves agent performance in multi-goal navigation tasks compared to flat reward structures...
Gymnasium v0.29+
Industry-standard RL framework
8 Automated Tests
Every environment validated
PPO / SAC / DQN
Leading RL algorithms
Curriculum Learning
Auto-increase difficulty
AI Smart Suggestions
Next-step recommendations
Research Papers
Academic quality, PDF export
Version Control
Every iteration tracked
GitHub Export
Push to your repository
Ready to build?
Describe your environment, train an agent, and publish your research. All in one platform.