
custom-env

Gymnasium-compatible continuous 2D navigation environment in a 10 m x 10 m arena.

Observation space: Box(low=-10, high=10, shape=(14,), dtype=float32) containing [agent_x, agent_y, agent_vx, agent_vy, goal_relative_x, goal_relative_y, 8 LIDAR ray-cast distances]. Action space: Box(low=-1, high=1, shape=(2,), dtype=float32) representing [force_x, force_y]. Physics uses Euler integration with a velocity damping factor of 0.9.

The environment maintains a competency buffer tracking success/failure over the last 20 episodes to compute a score c ∈ [0, 1]. Obstacle count is N = 5 + floor(15*c), and the placement policy switches at competency thresholds: (1) random uniform when c < 0.33; (2) corridor-blocking when 0.33 ≤ c < 0.66, using k-means clustering on recent agent trajectories to identify high-traffic zones and placing obstacles to minimize passage width; (3) adversarial placement when c ≥ 0.66, using trajectory distribution analysis to maximize expected path length to the goal. Obstacles are static circles with radii 0.3-0.6 m.

Reward: r_t = -0.1*||pos - goal||_2 - 0.01*||action||^2 + 10*success_flag - 5*collision_flag. An episode terminates on goal reach (distance < 0.5 m), on collision, or after 500 steps. reset() randomizes start/goal positions (minimum separation 8 m) and regenerates obstacles via the current placement policy based on c.
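The competency-based adaptation rule above can be sketched in a few lines. This is an illustrative standalone sketch, not the environment's actual internals; the class and method names (`CompetencyTracker`, `record`, `placement_policy`) are hypothetical.

```python
import math
from collections import deque

class CompetencyTracker:
    """Sketch of the competency buffer and obstacle-adaptation rule.

    Names are illustrative, not the environment's real API.
    """

    def __init__(self, window=20):
        # Success/failure flags of the last `window` episodes
        self.buffer = deque(maxlen=window)

    def record(self, success: bool) -> None:
        self.buffer.append(1.0 if success else 0.0)

    @property
    def c(self) -> float:
        # Competency score c in [0, 1]; defined as 0 before any episode completes
        return sum(self.buffer) / len(self.buffer) if self.buffer else 0.0

    def obstacle_count(self) -> int:
        # N = 5 + floor(15 * c): ranges from 5 (novice) to 20 (expert)
        return 5 + math.floor(15 * self.c)

    def placement_policy(self) -> str:
        if self.c < 0.33:
            return "random_uniform"
        elif self.c < 0.66:
            return "corridor_blocking"  # k-means on recent trajectories
        return "adversarial"            # maximize expected path length to goal
```

At reset time, the environment would query `obstacle_count()` and `placement_policy()` to regenerate the obstacle layout.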

Domain

navigation

Difficulty

medium

Observation

Box(shape=(14,))

Action

Box(shape=(2,))

Reward

dense: r_t = -0.1*||pos - goal||_2 - 0.01*||action||^2 + 10*success_flag - 5*collision_flag

Max Steps

500

Version

v1

Tests (0/8)

syntax, import, reset, step, obs_space, action_space, reward_sanity, determinism

Use via API

import kualia

env = kualia.make("custom-env-1774053027")
obs, info = env.reset()
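Assuming `kualia.make` returns a standard Gymnasium-style environment (5-tuple `step` API, as the spec below implies), a full episode rollout can be wrapped in a small helper. The `rollout` function is a generic sketch, not part of the kualia client:

```python
def rollout(env, policy=None):
    """Run one episode and return (total_reward, steps).

    Assumes the Gymnasium API: reset() -> (obs, info) and
    step(action) -> (obs, reward, terminated, truncated, info).
    """
    obs, info = env.reset()
    total_reward, steps = 0.0, 0
    terminated = truncated = False
    while not (terminated or truncated):
        # Random [force_x, force_y] in [-1, 1]^2 unless a policy is supplied
        action = policy(obs) if policy is not None else env.action_space.sample()
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        steps += 1
    return total_reward, steps

# Usage with the kualia client shown above:
# env = kualia.make("custom-env-1774053027")
# print(rollout(env))
```

Because the environment adapts across episodes via the competency buffer, repeated rollouts against the same `env` instance will see the obstacle layout grow harder as the success rate rises.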

Environment Code

===ENV_SPEC===
{
  "name": "adaptive-2d-nav-v0",
  "domain": "navigation",
  "description": "Continuous 2D navigation in a 10m x 10m arena with adaptive obstacle placement based on agent competency. The environment maintains a competency buffer (last 20 episodes) to compute score c ∈ [0,1]. Obstacle count and placement strategy evolve with c: random uniform (c < 0.33), corridor-blocking via k-means on trajectories (0.33 ≤ c < 0.66), or adversarial wall placement (c ≥ 0.66). The agent observes its position, velocity, relative goal position, and 8 LIDAR rangefinder readings.",
  "observation_space": {
    "type": "Box",
    "shape": [14],
    "low": -10.0,
    "high": 10.0,
    "components": [
      "agent_x (0 to 10)",
      "agent_y (0 to 10)",
      "agent_vx",
      "agent_vy",
      "goal_relative_x",
      "goal_relative_y",
      "lidar_ray_0",
      "lidar_ray_1",
      "lidar_ray_2",
      "lidar_ray_3",
      "lidar_ray_4",
      "lidar_ray_5",
      "lidar_ray_6",
      "lidar_ray_7"
    ]
  },
  "action_space": {
    "type": "Box",
    "shape": [2],
    "low": -1.0,
    "high": 1.0,
    "components": [
      "force_x",
      "force_y"
    ]
  },
  "reward_function": {
    "type": "dense",
    "components": {
      "distance_penalty": "-0.1 * L2_distance_to_goal",
      "action_penalty": "-0.01 * ||action||^2",
      "success_bonus": "10.0 if distance < 0.5m",
      "collision_penalty": "-5.0 if collision with obstacle"
    },
    "range": [-10.0, 10.0]
  },
  "episode": {
    "max_steps": 500,
    "termination_conditions": [
      "Goal reached (distance < 0.5m)",
      "Collision with obstacle"
    ],
    "truncation_conditions": [
      "Step limit reached (500 steps)"
    ]
  },
  "parameters": {
    "arena_size": 10.0,
    "max_steps": 500