Introduction
kualia.ai is an end-to-end reinforcement learning experiment platform. It lets you describe an environment in plain English, generates a fully compliant Gymnasium environment, and gives you a visual builder to iterate on reward functions, observation spaces, and dynamics — all through a conversational interface.
Once your environment is ready, kualia trains agents using state-of-the-art algorithms from Stable-Baselines3, streams real-time training curves, and lets you download or continue training your models.
The Research Lab extends the platform into a full experiment pipeline: define hypotheses, run multi-phase experiments, and auto-generate publishable papers from your results.
Core capabilities: Environment generation from text or papers · Conversational builder · PPO / SAC / DQN training · Real-time monitoring · Version control · Research pipeline · REST API
Quick Start
Get from zero to a trained agent in under five minutes.
1. Create an account
Sign up with Google or GitHub. You'll land on your Dashboard immediately.
2. Describe your environment
Use the Environment Builder to describe what you need in plain English. The AI Architect Agent generates Gymnasium-compatible code, validates it with 8 automated tests, and lets you iterate through chat, with AI-powered suggestions to help you refine the result.
3. Train an agent
Choose an algorithm (PPO, SAC, DQN), configure hyperparameters, and hit train. Use Continue, Fine-Tune, or Curriculum modes to improve your agent. Watch live progress with real-time reward curves.
4. Run research (optional)
Use the Research Lab to formulate a hypothesis, run real experiments, and generate a complete academic paper with inline training figures — downloadable as PDF.
Authentication
All API requests require an API key passed via the X-API-Key header. You can create and manage keys from the Dashboard → Settings page.
curl -H "X-API-Key: sk-your-key" \
https://kualia.ai/api/rlforge/catalog

| Parameter | Type | Required | Description |
|---|---|---|---|
| X-API-Key | string | Yes | Your API key from the dashboard settings page. |
Security: Never commit your API key to version control. Use environment variables or a secrets manager in production.
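As a minimal sketch of that advice, the key can be read from an environment variable at runtime rather than hard-coded. The variable name KUALIA_API_KEY and the helper below are illustrative, not an official convention:

```python
import os

def auth_headers(env_var: str = "KUALIA_API_KEY") -> dict:
    """Build request headers with the API key taken from the environment.

    KUALIA_API_KEY is an illustrative variable name, not an official one.
    """
    key = os.environ.get(env_var)
    if key is None:
        raise RuntimeError(f"Set {env_var} before calling the API")
    return {"X-API-Key": key}
```

Pass the returned dict as the headers of any HTTP client call so the key never appears in source control.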
Environment Builder — Overview
The Environment Builder is a conversational interface that lets you create, modify, and refine Gymnasium-compatible RL environments without writing code manually. It combines an AI code generation backend with a live preview of the environment code.
Every environment generated through kualia follows the standard Gymnasium interface — reset(), step(action), render() — so it is compatible with every major RL library (Stable-Baselines3, RLlib, CleanRL, etc.).
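To make the interface shape concrete, here is a toy environment with the same reset/step/render signatures. It deliberately uses only the standard library so it runs without gymnasium installed; a real kualia-generated environment would subclass gymnasium.Env and declare observation_space and action_space:

```python
import random

class ToyDroneLanding:
    """Toy sketch of the Gymnasium interface shape (reset/step/render).

    Illustrative only: a real environment subclasses gymnasium.Env and
    defines proper observation and action spaces.
    """

    def __init__(self, max_steps: int = 100):
        self.max_steps = max_steps
        self.t = 0
        self.x = 0.0  # horizontal offset from the platform

    def reset(self, seed=None):
        if seed is not None:
            random.seed(seed)
        self.t = 0
        self.x = random.uniform(-1.0, 1.0)
        return self._obs(), {}  # (observation, info)

    def step(self, action: float):
        self.t += 1
        self.x += 0.1 * action
        reward = -abs(self.x)                 # closer to the platform is better
        terminated = abs(self.x) < 0.01       # landed
        truncated = self.t >= self.max_steps  # time limit
        return self._obs(), reward, terminated, truncated, {}

    def _obs(self):
        return [self.x]

    def render(self):
        print(f"t={self.t} x={self.x:.3f}")
```

The five-tuple returned by step() matches the modern Gymnasium API, which is why the generated environments plug into SB3, RLlib, and CleanRL unchanged.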
The builder stores a full version history of every change, so you can roll back at any time.
Generating Environments
There are three ways to create an environment on kualia:
Natural-language description
Describe your environment in plain English. kualia's AI generates the full Gymnasium code including observation space, action space, reward function, and dynamics.
curl -X POST https://kualia.ai/api/rlforge/generate \
-H "Content-Type: application/json" \
-H "X-API-Key: sk-your-key" \
-d '{
"description": "A 2D drone that must land on a moving platform",
"domain": "robotics",
"difficulty": "medium"
}'

From a research paper
Upload a PDF paper and kualia extracts the environment specification, reward structure, and constraints to generate a matching implementation.
curl -X POST https://kualia.ai/api/rlforge/generate-from-paper \
-H "X-API-Key: sk-your-key" \
-F "file=@paper.pdf"

Fork an existing environment
Start from any published environment in the catalog and fork it with your own modifications.
curl -X POST https://kualia.ai/api/rlforge/fork/42 \
-H "Content-Type: application/json" \
-H "X-API-Key: sk-your-key" \
-d '{ "modifications": "Add obstacles and increase gravity" }'

Chat Iteration
After initial generation, the builder enters chat mode. You can ask for any modification — changing the reward function, adjusting the observation space, adding visualization, tweaking dynamics — and the AI updates the environment code accordingly.
curl -X POST https://kualia.ai/api/rlforge/builder/42/chat \
-H "Content-Type: application/json" \
-H "X-API-Key: sk-your-key" \
-d '{ "message": "Make the reward sparse: +1 only when the agent reaches the goal" }'

Each message creates a new version of the environment. You can retrieve the full conversation history to understand how the environment evolved:
curl https://kualia.ai/api/rlforge/builder/42/history \
-H "X-API-Key: sk-your-key"

The response includes each message, the code diff, and the version number, so you always have full traceability.
Version Control
Every chat iteration creates a new version of your environment. kualia stores the full version tree so you can roll back to any previous state at any time.
# Roll back to version 3
curl -X POST https://kualia.ai/api/rlforge/builder/42/rollback \
-H "Content-Type: application/json" \
-H "X-API-Key: sk-your-key" \
-d '{ "version": 3 }'

Rollback does not delete later versions — it creates a new version based on the target, so you can always go forward again. Think of it like git revert rather than a hard reset.
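The revert-style semantics can be modeled in a few lines. This sketch assumes a simple linear version list; it is not the platform's actual storage model:

```python
def rollback(versions: list, target: int) -> list:
    """Append a copy of an earlier version instead of truncating history.

    `versions` is a list of code snapshots; version numbers are 1-based
    positions. Mirrors git revert: nothing after the target is deleted.
    """
    if not 1 <= target <= len(versions):
        raise ValueError(f"no version {target}")
    return versions + [versions[target - 1]]
```

Because the target snapshot is appended as a new version, rolling back and then "rolling forward" again is always possible.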
Export (ZIP, GitHub)
When your environment is ready, you can export it in two ways:
ZIP download
Download a self-contained ZIP with the environment code, requirements.txt, and a README:
curl -o env.zip https://kualia.ai/api/rlforge/builder/42/export-zip \
-H "X-API-Key: sk-your-key"

GitHub push
From the dashboard, connect your GitHub account and push directly to a repository. kualia creates the repo (or pushes to an existing one) with proper file structure, CI checks, and a Gymnasium registration entry point.
Algorithms
kualia supports three proven algorithms from Stable-Baselines3. Each is suited for different environment characteristics:
PPO
Proximal Policy Optimization
General-purpose on-policy algorithm. Works well across most environments and is the recommended default.
Best for: Discrete & continuous actions, most use cases
SAC
Soft Actor-Critic
Off-policy algorithm optimized for continuous action spaces. Sample-efficient and stable.
Best for: Continuous actions (robotics, control)
DQN
Deep Q-Network
Value-based off-policy algorithm for discrete action spaces. Simple and effective for smaller problems.
Best for: Discrete actions (grid worlds, games)
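The guidance above can be condensed into a small chooser. This helper is purely illustrative (it is not part of the kualia API) and encodes only the rules of thumb stated in this section:

```python
def recommended_algorithm(discrete_actions: bool, off_policy: bool = False) -> str:
    """Illustrative chooser mirroring the guidance above.

    PPO is the general-purpose default for both action types; the
    off-policy alternatives are SAC for continuous control and DQN
    for discrete action spaces.
    """
    if not off_policy:
        return "PPO"
    return "DQN" if discrete_actions else "SAC"
```

When in doubt, start with PPO and switch to SAC or DQN only if sample efficiency or the action-space type calls for it.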
Training Configuration
Start a training run by specifying the environment ID, algorithm, and hyperparameters. kualia provides sensible defaults for all optional fields.
curl -X POST https://kualia.ai/api/rlforge/train/42 \
-H "Content-Type: application/json" \
-H "X-API-Key: sk-your-key" \
-d '{
"algorithm": "PPO",
"total_timesteps": 100000,
"learning_rate": 0.0003,
"n_steps": 2048,
"batch_size": 64,
"gamma": 0.99
}'

| Parameter | Type | Required | Description |
|---|---|---|---|
| algorithm | string | No | One of "PPO", "SAC", "DQN". Defaults to "PPO". |
| total_timesteps | integer | No | Total training steps. Default: 50,000. |
| learning_rate | float | No | Learning rate. Default: 3e-4. |
| n_steps | integer | No | Steps per rollout (PPO only). Default: 2048. |
| batch_size | integer | No | Minibatch size. Default: 64. |
| gamma | float | No | Discount factor. Default: 0.99. |
| seed | integer | No | Random seed for reproducibility. |
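A client can merge its overrides onto the documented defaults before posting the request body. This is a sketch using the defaults from the table above; the validation is an assumption, not server behavior:

```python
# Documented defaults from the training configuration table.
TRAIN_DEFAULTS = {
    "algorithm": "PPO",
    "total_timesteps": 50_000,
    "learning_rate": 3e-4,
    "n_steps": 2048,
    "batch_size": 64,
    "gamma": 0.99,
}

def train_payload(**overrides) -> dict:
    """Merge caller overrides onto the defaults; reject unknown keys."""
    allowed = set(TRAIN_DEFAULTS) | {"seed"}
    unknown = set(overrides) - allowed
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    return {**TRAIN_DEFAULTS, **overrides}
```

The resulting dict can be JSON-encoded and sent as the body of POST /api/rlforge/train/{env_id}.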
Monitoring & Curves
While training is running, you can poll the status endpoint to get real-time metrics. The dashboard also provides a live reward curve visualization.
# Check training status
curl https://kualia.ai/api/rlforge/train/42/status \
-H "X-API-Key: sk-your-key"
# Response
{
"status": "running",
"progress": 0.65,
"current_timestep": 65000,
"total_timesteps": 100000,
"mean_reward": 187.3,
"elapsed_seconds": 42
}

# Get the full reward curve
curl https://kualia.ai/api/rlforge/train/42/curve \
-H "X-API-Key: sk-your-key"
# Response
{
"timesteps": [1000, 2000, 3000, ...],
"rewards": [12.5, 34.2, 67.8, ...],
"episode_lengths": [120, 145, 98, ...]
}

The curve data is suitable for plotting with matplotlib, plotly, or any charting library. The dashboard renders it in real time using a streaming connection.
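Reward curves are noisy, so a trailing moving average is a common first step before plotting. This dependency-free sketch takes the `rewards` list from the /curve response:

```python
def smooth(rewards, window: int = 3):
    """Trailing moving average for noisy reward curves.

    Each output point averages up to `window` preceding rewards, so the
    smoothed series has the same length as the input.
    """
    out = []
    for i in range(len(rewards)):
        lo = max(0, i - window + 1)
        out.append(sum(rewards[lo:i + 1]) / (i + 1 - lo))
    return out
```

Plot the smoothed series against the `timesteps` array from the same response for a readable training curve.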
Continue Training
If a model needs more training steps, you can resume from the last checkpoint without starting over. Pass the continue_from parameter with the ID of a completed training run:
curl -X POST https://kualia.ai/api/rlforge/train/42 \
-H "Content-Type: application/json" \
-H "X-API-Key: sk-your-key" \
-d '{
"algorithm": "PPO",
"total_timesteps": 50000,
"continue_from": "run_abc123"
}'

The training run inherits all hyperparameters from the previous run unless you explicitly override them. This is useful for fine-tuning or for extending a run that has not yet converged.
Research Lab — Overview
The Research Lab is a structured experiment pipeline for running reproducible RL research. It guides you through the full research lifecycle: from hypothesis formation to paper generation.
Each research project contains a conversational thread where you describe your research goals. kualia's AI assistant helps you design experiments, select baselines, and analyze results.
You can also upload reference papers to ground your research in existing literature. kualia extracts key methods, results, and experimental setups to inform your experiment design.
Phases & Pipeline
A research project progresses through five phases:

1. Hypothesis: define your research question and expected outcomes.
2. Experiment design: select environments, algorithms, baselines, and metrics.
3. Execution: run all training experiments with tracking and versioning.
4. Analysis: compare results, generate plots, and run statistical tests.
5. Paper generation: auto-generate a LaTeX paper from your results and methodology.
You can move between phases freely — for example, going back to experiment design after seeing initial results.
Paper Generation
Once your experiments are complete, kualia can auto-generate a research paper in LaTeX format. The generated paper includes:
- Abstract summarizing the research and key findings
- Introduction with related work from uploaded references
- Methodology section describing environments and algorithms
- Results with auto-generated tables and reward curve figures
- Discussion and conclusion sections
- Bibliography from uploaded reference papers
The generated paper is a starting point — you can download the LaTeX source and edit it further, or continue iterating through the chat interface.
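For orientation, the section list above corresponds to a conventional LaTeX article skeleton along these lines (illustrative only; the actual generated source will differ):

```latex
% Illustrative skeleton of the generated paper's structure.
\documentclass{article}
\usepackage{graphicx}
\begin{document}
\title{Reward Shaping in Sparse Environments}
\maketitle
\begin{abstract}
% summary of the research and key findings
\end{abstract}
\section{Introduction}   % related work from uploaded references
\section{Methodology}    % environments and algorithms
\section{Results}        % auto-generated tables and reward curve figures
\section{Discussion}
\section{Conclusion}
\bibliographystyle{plain}
\bibliography{references} % built from uploaded reference papers
\end{document}
```

Since you receive the full source, any standard LaTeX toolchain can rebuild the PDF after your edits.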
API Reference — Catalog
Browse and search published environments in the kualia catalog.
/api/rlforge/catalog
List published environments. Supports filtering by domain, difficulty, and text search.
| Parameter | Type | Required | Description |
|---|---|---|---|
| domain | string | No | Filter by domain (e.g. "robotics", "finance", "games"). |
| difficulty | string | No | Filter by difficulty: "easy", "medium", "hard". |
| search | string | No | Full-text search across name and description. |
| page | integer | No | Page number for pagination. Default: 1. |
| limit | integer | No | Results per page. Default: 20, max: 100. |
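A client-side sketch of assembling a catalog query from the parameters above, clamping `limit` to the documented maximum of 100 (the helper name and clamping are illustrative, not API behavior):

```python
from urllib.parse import urlencode

def catalog_url(base="https://kualia.ai/api/rlforge/catalog", **filters):
    """Build a catalog query URL, clamping `limit` to the documented max."""
    if "limit" in filters:
        filters["limit"] = min(int(filters["limit"]), 100)
    query = urlencode(sorted(filters.items()))
    return f"{base}?{query}" if query else base
```

Omitted parameters fall back to the server defaults (page 1, 20 results per page).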
// GET /api/rlforge/catalog?domain=robotics&limit=2
{
"environments": [
{
"id": 42,
"slug": "drone-landing-v1",
"name": "Drone Landing",
"description": "2D drone that must land on a moving platform",
"domain": "robotics",
"difficulty": "medium",
"observation_space": "Box(8,)",
"action_space": "Box(2,)",
"created_at": "2025-06-15T10:30:00Z"
}
],
"total": 127,
"page": 1
}

/api/rlforge/catalog/{slug}
Get full details for a specific environment by its slug.
/api/rlforge/templates
List template environments that serve as starting points for generation.
API Reference — Generation
Generate new environments from natural language, papers, or by forking existing ones.
/api/rlforge/generate
Generate a new Gymnasium environment from a natural-language description.
| Parameter | Type | Required | Description |
|---|---|---|---|
| description | string | Yes | Natural-language description of the desired environment. |
| domain | string | No | Category hint: "robotics", "finance", "games", etc. |
| difficulty | string | No | Complexity hint: "easy", "medium", "hard". |
// POST /api/rlforge/generate
// Request
{
"description": "Multi-stock trading with transaction costs and portfolio constraints",
"domain": "finance",
"difficulty": "hard"
}
// Response
{
"id": 57,
"slug": "multi-stock-trading-v1",
"name": "Multi-Stock Trading",
"status": "ready",
"code": "import gymnasium as gym\nimport numpy as np\n...",
"observation_space": "Box(30,)",
"action_space": "Box(5,)"
}

/api/rlforge/generate-from-paper
Generate an environment from an uploaded PDF research paper. Multipart form data.
/api/rlforge/fork/{env_id}
Fork an existing environment and apply modifications.
API Reference — Builder
Interact with the conversational environment builder.
/api/rlforge/builder/{id}/chat
Send an iteration message to modify the environment.
| Parameter | Type | Required | Description |
|---|---|---|---|
| message | string | Yes | The modification request in natural language. |
// POST /api/rlforge/builder/42/chat
// Request
{ "message": "Change the reward to be distance-based: -1 * distance_to_goal" }
// Response
{
"version": 5,
"code": "import gymnasium as gym\n...",
"diff": "@@ -45,3 +45,3 @@\n- reward = 1.0 if done else 0.0\n+ reward = -1.0 * distance_to_goal",
"message": "Updated the reward function to be distance-based..."
}

/api/rlforge/builder/{id}/history
Get the full conversation history and version timeline.
/api/rlforge/builder/{id}/rollback
Roll back to a specific version.
/api/rlforge/builder/{id}/export-zip
Download the current environment as a ZIP archive.
API Reference — Training
Launch and monitor agent training runs using Stable-Baselines3.
/api/rlforge/train/{env_id}
Start a training run. Returns a run ID for tracking.
| Parameter | Type | Required | Description |
|---|---|---|---|
| algorithm | string | No | "PPO", "SAC", or "DQN". Default: "PPO". |
| total_timesteps | integer | No | Total training steps. Default: 50000. |
| learning_rate | float | No | Learning rate. Default: 3e-4. |
| seed | integer | No | Random seed for reproducibility. |
| continue_from | string | No | Run ID to continue training from. |
// POST /api/rlforge/train/42
// Request
{
"algorithm": "SAC",
"total_timesteps": 200000,
"seed": 42
}
// Response
{
"run_id": "run_xyz789",
"status": "queued",
"env_id": 42,
"algorithm": "SAC",
"total_timesteps": 200000
}

/api/rlforge/train/{env_id}/status
Get the current status of the latest training run.
/api/rlforge/train/{env_id}/curve
Get the reward curve data (timesteps, rewards, episode lengths).
/api/rlforge/train/{env_id}/model
Download the trained model as a .zip file.
API Reference — Research
Create and manage research projects with the experiment pipeline.
/api/rlforge/research/projects
Create a new research project.
| Parameter | Type | Required | Description |
|---|---|---|---|
| title | string | Yes | Title of the research project. |
| description | string | No | Brief description of the research goals. |
// POST /api/rlforge/research/projects
// Request
{
"title": "Reward Shaping in Sparse Environments",
"description": "Comparing dense vs sparse reward signals across navigation tasks"
}
// Response
{
"id": "proj_abc123",
"title": "Reward Shaping in Sparse Environments",
"phase": "hypothesis",
"created_at": "2025-07-20T14:00:00Z"
}

/api/rlforge/research/projects
List all research projects for the authenticated user.
/api/rlforge/research/projects/{id}
Get full project details including conversation messages.
/api/rlforge/research/projects/{id}/upload-paper
Upload a reference paper (PDF) to inform the experiment design.