Introduction
kualia.ai is an end-to-end reinforcement learning experiment platform. It lets you describe an environment in plain English, generates a fully compliant Gymnasium environment, and gives you a visual builder to iterate on reward functions, observation spaces, and dynamics — all through a conversational interface.
Once your environment is ready, kualia trains agents using state-of-the-art algorithms from Stable-Baselines3, streams real-time training curves, and lets you download or continue training your models.
The Research Lab extends the platform into a full experiment pipeline: define hypotheses, run multi-phase experiments, and auto-generate publishable papers from your results.
Core capabilities: Environment generation from text or papers · Conversational builder · PPO / SAC / DQN training · Real-time monitoring · Version control · Research pipeline · REST API
Quick Start
Get from zero to a trained agent in under five minutes.
1. Create an account
Sign up with Google or GitHub. You'll land on your Dashboard immediately.
2. Describe your environment
Use the Environment Builder to describe what you need in plain English. The AI Architect Agent generates Gymnasium-compatible code, validates it with 8 automated tests, and lets you iterate through chat, with AI-powered suggestions to help you refine the result.
3. Train an agent
Choose an algorithm (PPO, SAC, DQN), configure hyperparameters, and hit train. Use Continue, Fine-Tune, or Curriculum modes to improve your agent. Watch live progress with real-time reward curves.
4. Run research (optional)
Use the Research Lab to formulate a hypothesis, run real experiments, and generate a complete academic paper with inline training figures — downloadable as PDF.
Authentication
All API requests require an API key passed via the X-API-Key header. You can create and manage keys from the Dashboard → Settings page.
curl -H "X-API-Key: sk-your-key" \
https://kualia.ai/api/rlforge/catalog

| Parameter | Type | Required | Description |
|---|---|---|---|
| X-API-Key | string | Yes | Your API key from the dashboard settings page. |
Security: Never commit your API key to version control. Use environment variables or a secrets manager in production.
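As a minimal sketch of that advice, the key can be read from an environment variable at runtime rather than hard-coded. The variable name KUALIA_API_KEY and the helper below are illustrative, not an official convention:

```python
import os

def auth_headers(env_var: str = "KUALIA_API_KEY") -> dict:
    """Build request headers with the API key taken from the environment.

    KUALIA_API_KEY is an illustrative variable name, not an official one.
    """
    key = os.environ.get(env_var)
    if key is None:
        raise RuntimeError(f"Set {env_var} before calling the API")
    return {"X-API-Key": key}
```

Pass the returned dict as the headers of any HTTP client call so the key never appears in source control.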
Environment Builder — Overview
The Environment Builder is a conversational interface that lets you create, modify, and refine Gymnasium-compatible RL environments without writing code manually. It combines an AI code generation backend with a live preview of the environment code.
Every environment generated through kualia follows the standard Gymnasium interface — reset(), step(action), render() — so it is compatible with every major RL library (Stable-Baselines3, RLlib, CleanRL, etc.).
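To make the interface shape concrete, here is a toy environment with the same reset/step/render signatures. It deliberately uses only the standard library so it runs without gymnasium installed; a real kualia-generated environment would subclass gymnasium.Env and declare observation_space and action_space:

```python
import random

class ToyDroneLanding:
    """Toy sketch of the Gymnasium interface shape (reset/step/render).

    Illustrative only: a real environment subclasses gymnasium.Env and
    defines proper observation and action spaces.
    """

    def __init__(self, max_steps: int = 100):
        self.max_steps = max_steps
        self.t = 0
        self.x = 0.0  # horizontal offset from the platform

    def reset(self, seed=None):
        if seed is not None:
            random.seed(seed)
        self.t = 0
        self.x = random.uniform(-1.0, 1.0)
        return self._obs(), {}  # (observation, info)

    def step(self, action: float):
        self.t += 1
        self.x += 0.1 * action
        reward = -abs(self.x)                 # closer to the platform is better
        terminated = abs(self.x) < 0.01       # landed
        truncated = self.t >= self.max_steps  # time limit
        return self._obs(), reward, terminated, truncated, {}

    def _obs(self):
        return [self.x]

    def render(self):
        print(f"t={self.t} x={self.x:.3f}")
```

The five-tuple returned by step() matches the modern Gymnasium API, which is why the generated environments plug into SB3, RLlib, and CleanRL unchanged.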
The builder stores a full version history of every change, so you can roll back at any time.
Generating Environments
There are three ways to create an environment on kualia:
Natural-language description
Describe your environment in plain English. kualia's AI generates the full Gymnasium code including observation space, action space, reward function, and dynamics.
curl -X POST https://kualia.ai/api/rlforge/generate \
-H "Content-Type: application/json" \
-H "X-API-Key: sk-your-key" \
-d '{
"description": "A 2D drone that must land on a moving platform",
"domain": "robotics",
"difficulty": "medium"
}'

From a research paper
Upload a PDF paper and kualia extracts the environment specification, reward structure, and constraints to generate a matching implementation.
curl -X POST https://kualia.ai/api/rlforge/generate-from-paper \
-H "X-API-Key: sk-your-key" \
-F "file=@paper.pdf"

Fork an existing environment
Start from any published environment in the catalog and fork it with your own modifications.
curl -X POST https://kualia.ai/api/rlforge/fork/42 \
-H "Content-Type: application/json" \
-H "X-API-Key: sk-your-key" \
-d '{ "modifications": "Add obstacles and increase gravity" }'

Chat Iteration
After initial generation, the builder enters chat mode. You can ask for any modification — changing the reward function, adjusting the observation space, adding visualization, tweaking dynamics — and the AI updates the environment code accordingly.
curl -X POST https://kualia.ai/api/rlforge/builder/42/chat \
-H "Content-Type: application/json" \
-H "X-API-Key: sk-your-key" \
-d '{ "message": "Make the reward sparse: +1 only when the agent reaches the goal" }'

Each message creates a new version of the environment. You can retrieve the full conversation history to understand how the environment evolved:
curl https://kualia.ai/api/rlforge/builder/42/history \
-H "X-API-Key: sk-your-key"

The response includes each message, the code diff, and the version number, so you always have full traceability.
Version Control
Every chat iteration creates a new version of your environment. kualia stores the full version tree so you can roll back to any previous state at any time.
# Roll back to version 3
curl -X POST https://kualia.ai/api/rlforge/builder/42/rollback \
-H "Content-Type: application/json" \
-H "X-API-Key: sk-your-key" \
-d '{ "version": 3 }'

Rollback does not delete later versions — it creates a new version based on the target, so you can always go forward again. Think of it like git revert rather than a hard reset.
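The revert-style semantics can be modeled in a few lines. This sketch assumes a simple linear version list; it is not the platform's actual storage model:

```python
def rollback(versions: list, target: int) -> list:
    """Append a copy of an earlier version instead of truncating history.

    `versions` is a list of code snapshots; version numbers are 1-based
    positions. Mirrors git revert: nothing after the target is deleted.
    """
    if not 1 <= target <= len(versions):
        raise ValueError(f"no version {target}")
    return versions + [versions[target - 1]]
```

Because the target snapshot is appended as a new version, rolling back and then "rolling forward" again is always possible.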
Export (ZIP, GitHub)
When your environment is ready, you can export it in two ways:
ZIP download
Download a self-contained ZIP with the environment code, requirements.txt, and a README:
curl -o env.zip https://kualia.ai/api/rlforge/builder/42/export-zip \
-H "X-API-Key: sk-your-key"

GitHub push
From the dashboard, connect your GitHub account and push directly to a repository. kualia creates the repo (or pushes to an existing one) with proper file structure, CI checks, and a Gymnasium registration entry point.
Algorithms
kualia supports three proven algorithms from Stable-Baselines3. Each is suited for different environment characteristics:
PPO
Proximal Policy Optimization
General-purpose on-policy algorithm. Works well across most environments and is the recommended default.
Best for: Discrete & continuous actions, most use cases
SAC
Soft Actor-Critic
Off-policy algorithm optimized for continuous action spaces. Sample-efficient and stable.
Best for: Continuous actions (robotics, control)
DQN
Deep Q-Network
Value-based off-policy algorithm for discrete action spaces. Simple and effective for smaller problems.
Best for: Discrete actions (grid worlds, games)
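The guidance above can be condensed into a small chooser. This helper is purely illustrative (it is not part of the kualia API) and encodes only the rules of thumb stated in this section:

```python
def recommended_algorithm(discrete_actions: bool, off_policy: bool = False) -> str:
    """Illustrative chooser mirroring the guidance above.

    PPO is the general-purpose default for both action types; the
    off-policy alternatives are SAC for continuous control and DQN
    for discrete action spaces.
    """
    if not off_policy:
        return "PPO"
    return "DQN" if discrete_actions else "SAC"
```

When in doubt, start with PPO and switch to SAC or DQN only if sample efficiency or the action-space type calls for it.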
Training Configuration
Start a training run by specifying the environment ID, algorithm, and hyperparameters. kualia provides sensible defaults for all optional fields.
curl -X POST https://kualia.ai/api/rlforge/train/42 \
-H "Content-Type: application/json" \
-H "X-API-Key: sk-your-key" \
-d '{
"algorithm": "PPO",
"total_timesteps": 100000,
"learning_rate": 0.0003,
"n_steps": 2048,
"batch_size": 64,
"gamma": 0.99
}'

| Parameter | Type | Required | Description |
|---|---|---|---|
| algorithm | string | No | One of "PPO", "SAC", "DQN". Defaults to "PPO". |
| total_timesteps | integer | No | Total training steps. Default: 50,000. |
| learning_rate | float | No | Learning rate. Default: 3e-4. |
| n_steps | integer | No | Steps per rollout (PPO only). Default: 2048. |
| batch_size | integer | No | Minibatch size. Default: 64. |
| gamma | float | No | Discount factor. Default: 0.99. |
| seed | integer | No | Random seed for reproducibility. |
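A client can merge its overrides onto the documented defaults before posting the request body. This is a sketch using the defaults from the table above; the validation is an assumption, not server behavior:

```python
# Documented defaults from the training configuration table.
TRAIN_DEFAULTS = {
    "algorithm": "PPO",
    "total_timesteps": 50_000,
    "learning_rate": 3e-4,
    "n_steps": 2048,
    "batch_size": 64,
    "gamma": 0.99,
}

def train_payload(**overrides) -> dict:
    """Merge caller overrides onto the defaults; reject unknown keys."""
    allowed = set(TRAIN_DEFAULTS) | {"seed"}
    unknown = set(overrides) - allowed
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    return {**TRAIN_DEFAULTS, **overrides}
```

The resulting dict can be JSON-encoded and sent as the body of POST /api/rlforge/train/{env_id}.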
Monitoring & Curves
While training is running, you can poll the status endpoint to get real-time metrics. The dashboard also provides a live reward curve visualization.
# Check training status
curl https://kualia.ai/api/rlforge/train/42/status \
-H "X-API-Key: sk-your-key"
# Response
{
"status": "running",
"progress": 0.65,
"current_timestep": 65000,
"total_timesteps": 100000,
"mean_reward": 187.3,
"elapsed_seconds": 42
}

# Get the full reward curve
curl https://kualia.ai/api/rlforge/train/42/curve \
-H "X-API-Key: sk-your-key"
# Response
{
"timesteps": [1000, 2000, 3000, ...],
"rewards": [12.5, 34.2, 67.8, ...],
"episode_lengths": [120, 145, 98, ...]
}

The curve data is suitable for plotting with matplotlib, plotly, or any charting library. The dashboard renders it in real time using a streaming connection.
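Reward curves are noisy, so a trailing moving average is a common first step before plotting. This dependency-free sketch takes the `rewards` list from the /curve response:

```python
def smooth(rewards, window: int = 3):
    """Trailing moving average for noisy reward curves.

    Each output point averages up to `window` preceding rewards, so the
    smoothed series has the same length as the input.
    """
    out = []
    for i in range(len(rewards)):
        lo = max(0, i - window + 1)
        out.append(sum(rewards[lo:i + 1]) / (i + 1 - lo))
    return out
```

Plot the smoothed series against the `timesteps` array from the same response for a readable training curve.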
Continue Training
If a model needs more training steps, you can resume from the last checkpoint without starting over. Pass the continue_from parameter with the ID of a completed training run:
curl -X POST https://kualia.ai/api/rlforge/train/42 \
-H "Content-Type: application/json" \
-H "X-API-Key: sk-your-key" \
-d '{
"algorithm": "PPO",
"total_timesteps": 50000,
"continue_from": "run_abc123"
}'

The training run inherits all hyperparameters from the previous run unless you explicitly override them. This is useful for fine-tuning or for extending a run that has not yet converged.
Research Lab — Overview
The Research Lab is a structured experiment pipeline for running reproducible RL research. It guides you through the full research lifecycle: from hypothesis formation to paper generation.
Each research project contains a conversational thread where you describe your research goals. kualia's AI assistant helps you design experiments, select baselines, and analyze results.
You can also upload reference papers to ground your research in existing literature. kualia extracts key methods, results, and experimental setups to inform your experiment design.
Phases & Pipeline
A research project progresses through five phases:

1. Hypothesis: define your research question and expected outcomes.
2. Experiment design: select environments, algorithms, baselines, and metrics.
3. Execution: run all training experiments with tracking and versioning.
4. Analysis: compare results, generate plots, and run statistical tests.
5. Paper generation: auto-generate a LaTeX paper from your results and methodology.
You can move between phases freely — for example, going back to experiment design after seeing initial results.
Paper Generation
Once your experiments are complete, kualia can auto-generate a research paper in LaTeX format. The generated paper includes:
- Abstract summarizing the research and key findings
- Introduction with related work from uploaded references
- Methodology section describing environments and algorithms
- Results with auto-generated tables and reward curve figures
- Discussion and conclusion sections
- Bibliography from uploaded reference papers
The generated paper is a starting point — you can download the LaTeX source and edit it further, or continue iterating through the chat interface.
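For orientation, the section list above corresponds to a conventional LaTeX article skeleton along these lines (illustrative only; the actual generated source will differ):

```latex
% Illustrative skeleton of the generated paper's structure.
\documentclass{article}
\usepackage{graphicx}
\begin{document}
\title{Reward Shaping in Sparse Environments}
\maketitle
\begin{abstract}
% summary of the research and key findings
\end{abstract}
\section{Introduction}   % related work from uploaded references
\section{Methodology}    % environments and algorithms
\section{Results}        % auto-generated tables and reward curve figures
\section{Discussion}
\section{Conclusion}
\bibliographystyle{plain}
\bibliography{references} % built from uploaded reference papers
\end{document}
```

Since you receive the full source, any standard LaTeX toolchain can rebuild the PDF after your edits.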
API Reference — Catalog
Browse and search published environments in the kualia catalog.
/api/rlforge/catalog
List published environments. Supports filtering by domain, difficulty, and text search.
| Parameter | Type | Required | Description |
|---|---|---|---|
| domain | string | No | Filter by domain (e.g. "robotics", "finance", "games"). |
| difficulty | string | No | Filter by difficulty: "easy", "medium", "hard". |
| search | string | No | Full-text search across name and description. |
| page | integer | No | Page number for pagination. Default: 1. |
| limit | integer | No | Results per page. Default: 20, max: 100. |
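A client-side sketch of assembling a catalog query from the parameters above, clamping `limit` to the documented maximum of 100 (the helper name and clamping are illustrative, not API behavior):

```python
from urllib.parse import urlencode

def catalog_url(base="https://kualia.ai/api/rlforge/catalog", **filters):
    """Build a catalog query URL, clamping `limit` to the documented max."""
    if "limit" in filters:
        filters["limit"] = min(int(filters["limit"]), 100)
    query = urlencode(sorted(filters.items()))
    return f"{base}?{query}" if query else base
```

Omitted parameters fall back to the server defaults (page 1, 20 results per page).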
// GET /api/rlforge/catalog?domain=robotics&limit=2
{
"environments": [
{
"id": 42,
"slug": "drone-landing-v1",
"name": "Drone Landing",
"description": "2D drone that must land on a moving platform",
"domain": "robotics",
"difficulty": "medium",
"observation_space": "Box(8,)",
"action_space": "Box(2,)",
"created_at": "2025-06-15T10:30:00Z"
}
],
"total": 127,
"page": 1
}

/api/rlforge/catalog/{slug}
Get full details for a specific environment by its slug.
/api/rlforge/templates
List template environments that serve as starting points for generation.
API Reference — Generation
Generate new environments from natural language, papers, or by forking existing ones.
/api/rlforge/generate
Generate a new Gymnasium environment from a natural-language description.
| Parameter | Type | Required | Description |
|---|---|---|---|
| description | string | Yes | Natural-language description of the desired environment. |
| domain | string | No | Category hint: "robotics", "finance", "games", etc. |
| difficulty | string | No | Complexity hint: "easy", "medium", "hard". |
// POST /api/rlforge/generate
// Request
{
"description": "Multi-stock trading with transaction costs and portfolio constraints",
"domain": "finance",
"difficulty": "hard"
}
// Response
{
"id": 57,
"slug": "multi-stock-trading-v1",
"name": "Multi-Stock Trading",
"status": "ready",
"code": "import gymnasium as gym\nimport numpy as np\n...",
"observation_space": "Box(30,)",
"action_space": "Box(5,)"
}

/api/rlforge/generate-from-paper
Generate an environment from an uploaded PDF research paper. Multipart form data.
/api/rlforge/fork/{env_id}
Fork an existing environment and apply modifications.
API Reference — Builder
Interact with the conversational environment builder.
/api/rlforge/builder/{id}/chat
Send an iteration message to modify the environment.
| Parameter | Type | Required | Description |
|---|---|---|---|
| message | string | Yes | The modification request in natural language. |
// POST /api/rlforge/builder/42/chat
// Request
{ "message": "Change the reward to be distance-based: -1 * distance_to_goal" }
// Response
{
"version": 5,
"code": "import gymnasium as gym\n...",
"diff": "@@ -45,3 +45,3 @@\n- reward = 1.0 if done else 0.0\n+ reward = -1.0 * distance_to_goal",
"message": "Updated the reward function to be distance-based..."
}

/api/rlforge/builder/{id}/history
Get the full conversation history and version timeline.
/api/rlforge/builder/{id}/rollback
Roll back to a specific version.
/api/rlforge/builder/{id}/export-zip
Download the current environment as a ZIP archive.
API Reference — Training
Launch and monitor agent training runs using Stable-Baselines3.
/api/rlforge/train/{env_id}
Start a training run. Returns a run ID for tracking.
| Parameter | Type | Required | Description |
|---|---|---|---|
| algorithm | string | No | "PPO", "SAC", or "DQN". Default: "PPO". |
| total_timesteps | integer | No | Total training steps. Default: 50000. |
| learning_rate | float | No | Learning rate. Default: 3e-4. |
| seed | integer | No | Random seed for reproducibility. |
| continue_from | string | No | Run ID to continue training from. |
// POST /api/rlforge/train/42
// Request
{
"algorithm": "SAC",
"total_timesteps": 200000,
"seed": 42
}
// Response
{
"run_id": "run_xyz789",
"status": "queued",
"env_id": 42,
"algorithm": "SAC",
"total_timesteps": 200000
}

/api/rlforge/train/{env_id}/status
Get the current status of the latest training run.
/api/rlforge/train/{env_id}/curve
Get the reward curve data (timesteps, rewards, episode lengths).
/api/rlforge/train/{env_id}/model
Download the trained model as a .zip file.
API Reference — Research
Create and manage research projects with the experiment pipeline.
/api/rlforge/research/projects
Create a new research project.
| Parameter | Type | Required | Description |
|---|---|---|---|
| title | string | Yes | Title of the research project. |
| description | string | No | Brief description of the research goals. |
// POST /api/rlforge/research/projects
// Request
{
"title": "Reward Shaping in Sparse Environments",
"description": "Comparing dense vs sparse reward signals across navigation tasks"
}
// Response
{
"id": "proj_abc123",
"title": "Reward Shaping in Sparse Environments",
"phase": "hypothesis",
"created_at": "2025-07-20T14:00:00Z"
}

/api/rlforge/research/projects
List all research projects for the authenticated user.
/api/rlforge/research/projects/{id}
Get full project details including conversation messages.
/api/rlforge/research/projects/{id}/upload-paper
Upload a reference paper (PDF) to inform the experiment design.