The Inheritance Problem: Why Sequoia's Taxonomy Exposes Deep Tensions in Continual Learning Research
Continual learning remains fractured. Despite decades of research into algorithms that learn sequentially without forgetting, the field lacks consensus on even basic evaluation protocols. Task-incremental, class-incremental, domain-incremental, online, offline, with replay, without replay: the permutations seem endless. Each subcommunity optimizes for slightly different constraints, making cross-method comparison an exercise in apples-to-oranges equivocation.
Into this fragmented landscape steps Sequoia, a software framework proposed in "Sequoia: A Software Framework to Unify Continual Learning Research" that attempts to impose order through a hierarchical taxonomy of experimental settings. By modeling continual learning scenarios as a tree of assumptions, where parent nodes represent general nonstationary environments and children represent increasingly constrained special cases, Sequoia promises a unified codebase where methods automatically inherit applicability down the hierarchy. Solve the general case, the logic suggests, and you solve all specific instances below it.
Yet this elegant structure reveals a fundamental tension. As the framework makes explicit the logical relationships between settings, it also forces us to confront uncomfortable questions about the direction of research progress. Does biological intelligence actually proceed from general to specific? Can we meaningfully compare performance across disparate branches of the tree? And most critically, should we optimize for worst-case nonstationarity or for structured, average-case exploitation?
The Taxonomy as Logical Infrastructure
Sequoia's core innovation lies in formalizing continual learning settings as sets of assumptions about data distribution and task structure. In this ontology, a setting is not merely a benchmark or dataset, but a rigorous specification of environmental constraints. Does the agent know when tasks switch? Is the data i.i.d. within tasks? Can the agent store raw samples? Each assumption acts as a branching point in a decision tree.
The resulting hierarchy places the most general setting, nonstationary reinforcement learning without task boundaries or replay buffers, at the root. From there, the tree branches into Continual Supervised Learning and Continual Reinforcement Learning domains, then further subdivides into task-incremental learning, class-incremental learning, domain-incremental learning, and other specialized variants. The technical implementation uses object-oriented inheritance, allowing methods designed for parent nodes to automatically function on child nodes, since a child setting satisfies every assumption its parent makes, plus additional constraints of its own.
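The settings-as-inheritance idea can be made concrete with a minimal sketch. The class and attribute names below are hypothetical illustrations of the mechanism, not Sequoia's actual API: each setting class adds assumptions on top of its parent, and a method declares the most general setting it can handle, automatically becoming applicable to every descendant.

```python
# Illustrative sketch of a settings hierarchy; class names and
# attributes are hypothetical, not Sequoia's actual API.

class Setting:
    """Root: nonstationary RL, no task boundaries, no replay."""
    known_task_boundaries = False
    stationary_within_tasks = False

class ContinualSLSetting(Setting):
    """Supervised branch: adds the assumption of i.i.d. data within tasks."""
    stationary_within_tasks = True

class TaskIncrementalSetting(ContinualSLSetting):
    """Child: task identity is additionally known at train and test time."""
    known_task_boundaries = True

class Method:
    # A method declares the most general setting it can handle;
    # issubclass then grants it every descendant for free.
    target_setting = Setting

    def is_applicable(self, setting_cls) -> bool:
        return issubclass(setting_cls, self.target_setting)

class GeneralRLMethod(Method):
    target_setting = Setting             # applicable everywhere

class ReplayMethod(Method):
    target_setting = ContinualSLSetting  # supervised branch only

general, replay = GeneralRLMethod(), ReplayMethod()
print(general.is_applicable(TaskIncrementalSetting))  # True: capability flows down
print(replay.is_applicable(Setting))                  # False: never up to the parent
```

The asymmetry in the last two lines is the whole point of the taxonomy: applicability is inherited downward only, which is exactly the assumption the next section questions.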
This architectural choice has practical merit. Researchers can implement a method once at an appropriate level of generality, and Sequoia handles the boilerplate of adapting it to more constrained scenarios. The framework includes a growing suite of baseline methods and integrates external libraries, creating a common substrate for empirical comparison. For a field plagued by inconsistent baselines and hidden implementation details, this standardization offers genuine utility.
The Directionality Dilemma
However, the inheritance structure assumes that capability flows downward. A method solving the general nonstationary RL setting should, in theory, solve task-incremental learning for free. This assumption warrants scrutiny.
Biological learning systems do not operate this way. Evolution produces specialized circuits through developmental refinement, not through the application of general solutions to specific cases. The mammalian visual cortex exploits structure-specific regularities in natural scenes; it does not implement a general-purpose perception algorithm that happens to work for vision. Similarly, effective continual learning algorithms often rely on inductive biases tailored to specific types of distribution shift. A method designed for gradual domain drift may fail catastrophically in abrupt task-switching scenarios, even if the latter is technically a special case of the former.
The tree structure therefore risks encoding a fallacy of misplaced generality. By privileging the root node as the target of optimization, Sequoia implicitly suggests that the ideal continual learning system is a universal plasticity-stability algorithm capable of handling arbitrary nonstationarity. Yet practical intelligence may require the opposite approach: a repertoire of specialized mechanisms that detect regime changes and deploy appropriate learning rates, architectural modifications, or memory strategies. In this view, the child nodes are not simplified versions of the parent, but distinct ecological niches requiring distinct adaptations.
The Metrics Problem
Beyond directionality, Sequoia exposes the lack of universal evaluation criteria across the tree. The framework provides the infrastructure for running experiments across settings, but it cannot resolve whether a method's performance on task-incremental learning predicts its utility in class-incremental learning, let alone in general nonstationary RL.
Current continual learning metrics focus primarily on average accuracy, forgetting rates, and forward transfer, measured against held-out test sets. Yet these metrics assume stationary evaluation distributions within tasks, an assumption that breaks down in the general nonstationary case. A method might achieve perfect accuracy on class-incremental ImageNet while failing completely on a robotic manipulation task with continuous physical parameter drift. The tree structure makes these incommensurabilities visible, but does not provide a common currency for comparison.
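To make the stationarity assumption concrete, here is a sketch of two of the metrics named above, computed from an accuracy matrix R where R[i][j] is test accuracy on task j after finishing training on task i. The definitions follow common usage in the literature; the variable names and example numbers are ours. Note that the matrix itself only exists if tasks have fixed, held-out test distributions, which is precisely what the general nonstationary case lacks.

```python
# Sketch of standard continual-learning metrics over an accuracy matrix.
# R[i][j]: test accuracy on task j after training on tasks 0..i.

def average_accuracy(R):
    """Mean accuracy over all tasks after the final training stage."""
    T = len(R)
    return sum(R[T - 1]) / T

def average_forgetting(R):
    """For each earlier task, the drop from its best past accuracy to its
    final accuracy; positive values mean the method forgot."""
    T = len(R)
    if T < 2:
        return 0.0
    drops = [max(R[i][j] for i in range(T - 1)) - R[T - 1][j]
             for j in range(T - 1)]
    return sum(drops) / len(drops)

R = [[0.90, 0.10, 0.10],
     [0.70, 0.85, 0.10],
     [0.60, 0.75, 0.80]]
print(average_accuracy(R))    # (0.60 + 0.75 + 0.80) / 3
print(average_forgetting(R))  # ((0.90 - 0.60) + (0.85 - 0.75)) / 2 = 0.20
```

Both numbers presuppose a fixed per-task test set indexed by j; under continuous parameter drift there is no stable j to index, and the matrix, and hence the metric, dissolves.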
This raises the specter of incomparability. If settings lower in the tree are not merely simplified versions but ontologically distinct problem classes, then the inheritance mechanism becomes misleading. Researchers might waste computational resources adapting general methods to specialized settings where simpler, structure-specific approaches would outperform them. The framework elegantly exposes our fragmentation, yet it cannot answer whether a universal plasticity-stability metric even exists across the tree.
Your Take: Worst-Case versus Average-Case Optimization
The deeper issue illuminated by Sequoia concerns the objective function we should optimize. The field currently vacillates between two incompatible goals. On one hand, we seek worst-case robustness: algorithms that maintain performance under arbitrary distribution shifts, adversarial task orderings, and minimal task-boundary information. This aligns with the root-node optimization strategy. On the other hand, we seek structured, average-case exploitation: algorithms that leverage the regularities of realistic task sequences, trading guarantees under arbitrary nonstationarity for strong performance on the distribution shifts that actually occur.