Design the Experience.
Train the Intelligence.

Generate RL environments. Train agents. Run experiments. Create papers.

Describe what you need in plain English. The AI builds Gymnasium-compatible code, trains agents with Stable-Baselines3 (SB3), tracks experiments, and writes research papers with real data.

01

Generate.

Describe your RL environment in plain English — including agent behavior. AI writes validated Gymnasium code.

02

Train.

Continue, Fine-Tune, or Curriculum modes. Live charts, real metrics, configurable hyperparameters.

03

Research.

Hypothesis-first pipeline — design experiments, train agents, write academic papers with real data.

Builder.

AI Environment Builder

Describe your RL environment in natural language — including agent behavior, self-observation, and adaptive mechanisms. The Architect Agent generates Gymnasium-compatible code, validates with 8 automated tests, and offers AI-powered suggestions for your next iteration.
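The generated code follows the standard Gymnasium reset/step contract. As a rough illustration only (not platform output; this sketch duck-types the API with plain NumPy rather than subclassing gymnasium.Env, and all names and reward constants are hypothetical), a navigation environment with Gaussian wind might look like:

```python
import numpy as np

class DroneNavSketch:
    """Hypothetical sketch of a generated environment.

    Follows the Gymnasium reset()/step() return contract; real generated
    code would subclass gymnasium.Env and declare spaces.Box spaces.
    """

    def __init__(self, wind_sigma: float = 0.3, goal_radius: float = 0.05):
        self.wind_sigma = wind_sigma      # std-dev of per-step Gaussian wind
        self.goal_radius = goal_radius
        self.rng = np.random.default_rng()

    def reset(self, seed=None):
        if seed is not None:
            self.rng = np.random.default_rng(seed)
        self.pos = self.rng.uniform(-1.0, 1.0, size=2)
        return self._obs(), {}

    def step(self, action):
        action = np.clip(np.asarray(action, dtype=np.float64), -1.0, 1.0)
        wind = self.rng.normal(0.0, self.wind_sigma, size=2)  # N(0, 0.3) perturbation
        self.pos = np.clip(self.pos + 0.1 * action[:2] + 0.01 * wind, -1.0, 1.0)
        dist = float(np.linalg.norm(self.pos))                # goal is the origin
        reached = dist < self.goal_radius
        reward = -dist + (50.0 if reached else 0.0)           # distance shaping + goal bonus
        return self._obs(), reward, reached, False, {}

    def _obs(self):
        return self.pos.astype(np.float32)
```

A determinism check, one of the automated validations, then amounts to resetting two fresh instances with the same seed and asserting identical observations.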

START BUILDING
builder / drone-navigation-v3
Create a drone navigation environment with obstacle avoidance and wind perturbation

Created DroneNav-v1 with 8 distance sensors, -5.0 collision penalty, and goal bonus +50.0. Gaussian wind perturbation N(0, 0.3) applied each step.

The agent should store its last 5 actions and observe its own strategy effectiveness

Added self-observation: action history buffer [5x4], strategy effectiveness metric, and stagnation detection. Obs space expanded to [32].

Suggestions: Add penalty shaping · Increase obs dims · Train with SAC · Raise difficulty
7/8 Tests Passing · v3 · Box[32] → Box[4]
Validation tests: syntax · import · reset · step · obs_space · action_space · reward · determinism
Observation: Type Box · Shape [32] · Range [-1, 1]
Action: Type Box · Shape [4] · Range [-1, 1]
Reward: distance + collision + goal
Training in progress: 73.5K / 100K
Episode Reward: 78.2 · Episode Length: 55 · Success Rate: 97% · Policy Loss: 0.63
Step: 73,500 · Episodes: 1,247 · FPS: 4,280 · ETA: 6s · PPO
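The self-observation mechanism described in the builder transcript (a rolling buffer of the last 5 actions appended to the observation) can be sketched as a small wrapper. This is illustrative, not the platform's code; it assumes a 12-dimensional base observation so that adding the flattened [5x4] action history yields the expanded [32] observation:

```python
import numpy as np
from collections import deque

class ActionHistoryWrapper:
    """Hypothetical self-observation wrapper (illustrative, not platform output).

    Appends a rolling buffer of the last k actions to the base observation,
    e.g. a [12] base obs + a flattened [5x4] history -> a [32] observation.
    """

    def __init__(self, env, k: int = 5, action_dim: int = 4):
        self.env = env
        self.k, self.action_dim = k, action_dim
        self._history = deque(maxlen=k)

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        self._history.clear()
        for _ in range(self.k):  # pad with zeros until real actions arrive
            self._history.append(np.zeros(self.action_dim, dtype=np.float32))
        return self._augment(obs), info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self._history.append(np.asarray(action, dtype=np.float32))
        return self._augment(obs), reward, terminated, truncated, info

    def _augment(self, obs):
        history = np.concatenate(self._history)  # flatten [k x action_dim]
        return np.concatenate([np.asarray(obs, dtype=np.float32), history])
```

A strategy-effectiveness metric or stagnation detector would be extra derived features computed from the same buffer.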

Training.

Agent Training & Advanced Modes

Train with Continue, Fine-Tune, or Curriculum Learning. Configurable hyperparameters, live training curves, evaluation metrics, and ETA.

SEE HOW IT WORKS

Training Modes

Continue

Same settings, resume training

Fine-Tune

Low LR, short training

Curriculum

Auto-increase difficulty

Algorithm

PPO

Timesteps

100K

LR

3e-4
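Curriculum mode's auto-increasing difficulty can be approximated by a scheduler that promotes the environment to the next level once the rolling success rate clears a threshold. A hypothetical sketch (levels, threshold, and window are illustrative, not the platform's actual logic):

```python
class CurriculumScheduler:
    """Hypothetical curriculum scheduler (illustrative only).

    Tracks a rolling window of episode outcomes and steps to the next
    difficulty level once the success rate clears a threshold.
    """

    def __init__(self, levels=(0.25, 0.5, 0.75, 1.0), threshold=0.8, window=100):
        self.levels = list(levels)  # e.g. wind strength or obstacle density per level
        self.threshold = threshold
        self.window = window
        self.level = 0
        self._outcomes = []

    @property
    def difficulty(self) -> float:
        return self.levels[self.level]

    def record_episode(self, success: bool) -> float:
        """Record one episode outcome; return the (possibly updated) difficulty."""
        self._outcomes.append(bool(success))
        recent = self._outcomes[-self.window:]
        if (len(recent) == self.window
                and sum(recent) / self.window >= self.threshold
                and self.level < len(self.levels) - 1):
            self.level += 1
            self._outcomes.clear()  # restart the window at the new difficulty
        return self.difficulty
```

In a training loop this would be called once per evaluation episode, with the returned difficulty fed back into the environment's constructor or a set_difficulty hook.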

Experiments.

Track & Compare

Every training run is versioned. Compare runs side-by-side, browse full experiment history with metrics, training curves, evaluation episodes, and hyperparameter details. Export to GitHub or download reports.

VIEW EXPERIMENTS
experiments / 4 runs · Compare
#5 · completed · PPO · v3 · 100K · R: 78.2 · 97%
#4 · completed · PPO · v2 · 100K · R: 45.6 · 68%
#3 · completed · SAC · v2 · 50K · R: 38.1 · 52%
#2 · completed · PPO · v1 · 50K · R: 12.4 · 15%
comparison / #5 vs #4
Run #5 (PPO v3) vs Run #4 (PPO v2)

Metric         #5      #4      Delta
Reward         78.2    45.6    +71.5%
Success Rate   97%     68%     +42.6%
Ep. Length     55      120     -54.2%
Policy Loss    0.63    1.24    -49.2%
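The Delta column is a plain relative change against the baseline run. As a one-line sketch (not the platform's code):

```python
def pct_delta(current: float, baseline: float) -> float:
    """Relative change of a metric vs. a baseline run, in percent."""
    return (current - baseline) / baseline * 100.0

# Reward 45.6 -> 78.2 between Run #4 and Run #5:
print(f"{pct_delta(78.2, 45.6):+.1f}%")  # +71.5%
```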
research / curriculum-nav-study

Sage — Hypothesis

“Does curriculum reward shaping outperform flat rewards in multi-goal navigation?” Defined env specs: 2D grid, 4 goals, progressive difficulty scaling. Agent must generalize across difficulty levels.


Atlas — Design & Experiment

Built CurriculumNav-v1 — 8/8 tests passed. Training PPO across Continue → Fine-Tune → Curriculum. Results: 92% success, mean reward 78.2, episode length 55.


Sage — Research & Write

Found 14 supporting papers via academic literature search. Writing paper with inline training curves, evaluation tables, and hyperparameter appendix. PDF download ready.


Research Lab.

Hypothesis to Paper

Two AI agents formulate a hypothesis, design experiments, train real agents, analyze results, and write academic papers with inline figures and literature references — downloadable as PDF.

EXPLORE LAB

Generated Paper

Curriculum Reward Shaping for Multi-Goal Navigation: A Comparative Study

Sage & Atlas — kualia.ai Research Lab

Abstract: We investigate whether curriculum-based reward shaping improves agent performance in multi-goal navigation tasks compared to flat reward structures...

4 figures · 3 experiments · 14 refs
PDF Ready · Download

Gymnasium v0.29+

Industry-standard RL framework

8 Automated Tests

Every environment validated

PPO / SAC / DQN

Leading RL algorithms

Curriculum Learning

Auto-increase difficulty

AI Smart Suggestions

Next-step recommendations

Research Papers

Academic quality, PDF export

Version Control

Every iteration tracked

GitHub Export

Push to your repository

Ready to build?

Describe your environment, train an agent, and publish your research. All in one platform.