Implementing Hierarchical Program Groups In Supervisor

Alex Johnson

Supervisor is a process management system that lets users monitor and control a number of processes on Unix-like operating systems. This article discusses a proposal to add first-class multi-group support to Supervisor. Under the proposal, users could define hierarchical groups of programs, manage them via supervisorctl with tree views and bulk actions, and use shell autocompletion for group paths and commands, all while maintaining backward compatibility.

Goal

The primary goal is to give Supervisor first-class multi-group support, so that users can organize programs into hierarchical groups rather than a single flat layer. The hierarchy should be easy to navigate and control through supervisorctl, with tree views for visualizing it and bulk actions for managing it. Crucially, these changes must not disrupt existing configurations, so that current users can upgrade without editing their config files.

Deliverables

The project's deliverables encompass a comprehensive set of features and improvements:

  1. Config language extensions (INI): A clear and backward-compatible syntax for defining nested groups.
  2. Parser & model changes: Modifications to represent the hierarchy as a tree structure, including Program, Group, and MultiGroup nodes.
  3. Runtime & RPC changes: Adjustments to ensure hierarchical start/stop/status operations map cleanly to existing process control with minimal disruption.
  4. CLI UX: New supervisorctl subcommands and flags (e.g., --groups) to display the tree and operate on hierarchical paths.
  5. Autocompletion: Shell autocompletion scripts for bash, zsh, and fish, covering group paths and commands.
  6. Migration tooling: Tools and warnings for handling legacy [group:x] configurations.
  7. Docs & manpages: Updated documentation and manpages with examples.
  8. Tests: Unit and integration tests, along with sample configurations.

Requirements

The implementation must adhere to several key requirements to ensure functionality, usability, and compatibility.

1) Config Syntax (INI)

The configuration syntax needs to be both flexible and intuitive, allowing users to define complex hierarchies while maintaining readability and backward compatibility. Here’s a breakdown of the syntax requirements:

  • Keep existing forms: The existing syntax for defining programs and groups should remain valid. This ensures that current configurations continue to work without modification.

    [program:web]
    command=/srv/web
    
    [group:frontend]
    programs=web,assets
    
  • Add hierarchical groups: Introduce a new section type, [multigroup:<name>], to define hierarchical groups. These multi-groups can contain both groups and programs, allowing for nested structures.

    [multigroup:prod]
    groups=frontend,backend,workers
    
    [group:backend]
    programs=api,admin
    
    [group:workers]
    programs=rq_worker_a,rq_worker_b
    
  • Allow inline nested composition: Implement dotted names as an optional shorthand for defining nested groups. This sugar syntax simplifies the creation of nested hierarchies.

    [group:frontend.assets]
    programs=webpack,css-min
    
    [group:frontend.web]
    programs=nginx,app
    

    The dotted form should be equivalent to:

    • Creating (or merging into) a virtual multigroup named frontend.
    • Placing the child groups assets and web inside it.
  • Support path addressing: Enable referencing programs and groups using paths, which are case-sensitive canonical names. This allows for precise targeting of specific processes or groups within the hierarchy.

    • Program path: frontend.web:nginx
    • Group path: frontend.web
    • MultiGroup path: prod
  • Support globs: Allow the use of globs in CLI operations to perform actions on multiple targets simultaneously. This enhances the efficiency of managing large numbers of processes.

    • frontend.* (all direct children groups)
    • prod.** (recursive)
    • **:rq_worker_* (all programs named like that anywhere)
  • Validation: Implement validation checks to ensure the integrity of the configuration.

    • Detect cycles to prevent infinite loops.
    • Disallow duplicate child names within the same parent.
    • Error on ambiguous dotted merges that change a node’s kind (group vs multigroup).
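
The desugaring and validation rules above can be sketched in a few lines of Python. This is a minimal illustration, not Supervisor code: the names ConfigError, desugar_dotted_groups, and find_cycle are hypothetical, and the sketch assumes that child nodes are identified by their full dotted name.

```python
from collections import defaultdict


class ConfigError(ValueError):
    """Raised when the group hierarchy is invalid (hypothetical name)."""


def desugar_dotted_groups(group_names, multigroups=None):
    """Turn dotted [group:a.b] names into parent -> children edges.

    group_names: group names exactly as written in the config.
    multigroups: explicit multigroup name -> list of child names.
    """
    children = defaultdict(list)
    for parent, kids in (multigroups or {}).items():
        for kid in kids:
            if kid in children[parent]:
                raise ConfigError(f"duplicate child {kid!r} under {parent!r}")
            children[parent].append(kid)

    flat_groups = set()
    for name in group_names:
        parts = name.split(".")
        # Every proper prefix of a dotted name is a virtual multigroup.
        for i in range(1, len(parts)):
            parent = ".".join(parts[:i])
            child = ".".join(parts[:i + 1])
            if child not in children[parent]:
                children[parent].append(child)
        flat_groups.add(name)

    # A name must not be both a flat group and a multigroup.
    clash = flat_groups & {p for p, kids in children.items() if kids}
    if clash:
        raise ConfigError(f"node kind conflict: {sorted(clash)}")
    return dict(children)


def find_cycle(children):
    """Return one cycle as a list of names, or None if the graph is acyclic."""
    visiting, done = set(), set()

    def visit(node, stack):
        visiting.add(node)
        stack.append(node)
        for kid in children.get(node, ()):
            if kid in visiting:
                return stack[stack.index(kid):] + [kid]
            if kid not in done:
                found = visit(kid, stack)
                if found:
                    return found
        stack.pop()
        visiting.discard(node)
        done.add(node)
        return None

    for node in list(children):
        if node not in done:
            found = visit(node, [])
            if found:
                return found
    return None
```

With the sample sections shown earlier, desugar_dotted_groups(["frontend.web", "frontend.assets"]) yields a single virtual multigroup frontend with two children, while declaring both [group:frontend] and [group:frontend.web] raises ConfigError because frontend would change kind.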

2) Data Model & Parser

The data model must efficiently represent the hierarchical structure and provide the necessary functions for traversing and manipulating it. Key aspects include:

  • Internal classes: Define classes to represent the different node types in the hierarchy.

    • ProgramNode(name, process_spec, …)
    • GroupNode(name, children=[ProgramNode]) (flat)
    • MultiGroupNode(name, children=[GroupNode|MultiGroupNode|ProgramNode])
  • Build a tree: Construct a tree structure with a single virtual root to represent the entire hierarchy. This simplifies traversal and management.

  • Provide functions: Implement functions for path resolution and hierarchy traversal.

    • resolve_path(path) -> Node|List[Node]
    • iter_descendants(node, depth, predicate)
    • expand_glob(pattern) -> List[Node]
  • Ensure backward compatibility: The model must seamlessly integrate with existing configurations.

    • A legacy [group:x] is a GroupNode directly under the root.
    • Legacy programs= remains valid.
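
One possible shape for the tree and its lookup functions is sketched below. It collapses the three node classes into a single Node with a kind field to stay short, and it assumes that a glob without a colon matches only group and multigroup paths, that * matches exactly one path segment, and that ** matches one or more segments. The index_paths helper and the choice to return path strings from expand_glob are illustrative assumptions, not part of the proposal.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    name: str
    kind: str  # "program", "group", or "multigroup"
    children: List["Node"] = field(default_factory=list)


def index_paths(root: Node) -> Dict[str, Node]:
    """Map each canonical path below the virtual root to its node."""
    paths: Dict[str, Node] = {}

    def walk(node: Node, prefix: str) -> None:
        for child in node.children:
            # Programs hang off their group with ":", groups nest with ".".
            sep = ":" if child.kind == "program" else "."
            path = f"{prefix}{sep}{child.name}" if prefix else child.name
            paths[path] = child
            walk(child, path)

    walk(root, "")
    return paths


def resolve_path(paths: Dict[str, Node], path: str) -> Node:
    """Exact, case-sensitive lookup; raises KeyError for unknown paths."""
    return paths[path]


def expand_glob(paths: Dict[str, Node], pattern: str) -> List[str]:
    """Return the sorted paths matched by a glob such as 'prod.**'."""
    group_part, colon, prog_part = pattern.partition(":")
    pieces = []
    for seg in group_part.split("."):
        if seg == "**":
            pieces.append(r"[^.:]+(?:\.[^.:]+)*")  # one or more segments
        else:
            pieces.append(re.escape(seg).replace(r"\*", "[^.:]*"))
    regex = r"\.".join(pieces)
    if colon:
        regex += ":" + re.escape(prog_part).replace(r"\*", "[^.:]*")
    compiled = re.compile(regex)
    return [p for p in sorted(paths) if compiled.fullmatch(p)]
```

Building the index once after parsing keeps both exact resolution and glob expansion simple: resolution is a dictionary lookup, and a glob is a single regular-expression pass over the known paths.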

3) Control Semantics

The control semantics define how an operation on one node propagates through the hierarchy, so that bulk actions are applied in a consistent order and errors are reported promptly.

  • Operations must propagate depth-first with short-circuit error reporting:

    • start node: Starts all descendant programs not yet running.
    • stop node: Stops all descendants.
    • restart node: Stops all descendants, then starts them, preserving the existing concurrency rules of each process group.
  • Concurrency: Maintain existing concurrency semantics while adding optional parallelism at the group level. This allows users to fine-tune the execution behavior of their processes.

    • Maintain existing startsecs, startretries, and priority semantics.
    • Add an optional parallel=<int> setting on groups and multigroups; the default is serial execution within that node.
  • Partial failures: Return structured results per node and aggregate exit status. This provides detailed feedback on the outcome of operations, even in the case of partial failures.
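
As a rough sketch of these semantics, the function below starts a subtree depth-first, records one result per program, and stops at the first failure. The Node and Result classes, the start_program callback, and the exit-status convention (0 for success, 1 for any failure) are assumptions made for illustration. The sketch covers only the default serial case and treats any node without children as a program.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)  # empty => program


@dataclass
class Result:
    path: str
    ok: bool
    detail: str = ""


def start_tree(node: Node, start_program: Callable[[str], None],
               parent: str = "") -> List[Result]:
    """Start a subtree depth-first, stopping at the first failure."""
    is_program = not node.children
    sep = ":" if is_program else "."
    here = f"{parent}{sep}{node.name}" if parent else node.name

    if is_program:
        try:
            start_program(node.name)
            return [Result(here, True)]
        except Exception as exc:  # report the failure, do not re-raise
            return [Result(here, False, str(exc))]

    results: List[Result] = []
    for child in node.children:
        results.extend(start_tree(child, start_program, here))
        if not results[-1].ok:
            break  # short-circuit: later siblings are never started
    return results


def exit_status(results: List[Result]) -> int:
    """Aggregate per-node results into one process exit status."""
    return 0 if all(r.ok for r in results) else 1
```

Because a failed leaf is always the last entry in the list returned by its parent, each level of the recursion can detect the failure by checking only the final result.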

4) RPC / XML-RPC API

The RPC API needs to be extended to support the new hierarchical structure while maintaining compatibility with existing clients. The new API methods will allow external systems to interact with the Supervisor hierarchy.

  • Add new methods (keeping old ones):

    • getHierarchy() → JSON-like tree (names, types, state).
    • controlByPath(action, paths, recursive=true, parallelism=null) → per-node result list.
    • listPaths(kind=('any'|'program'|'group'|'multigroup')).
  • Keep existing startProcess, startGroup, etc., mapping them internally to paths under the root for compatibility. This ensures that existing clients continue to function as expected.
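
To make the "JSON-like tree" concrete, here is one way the result of the proposed getHierarchy() could be assembled from plain dictionaries, lists, and strings. The field names (name, type, state, children) follow the description above, but the exact layout is an assumption. One detail worth noting: Python's xmlrpc.client refuses to marshal None unless allow_none is enabled, so the sketch omits absent fields rather than sending null, and a default such as parallelism=null would need similar care.

```python
import xmlrpc.client
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    kind: str  # "program", "group", or "multigroup"
    state: Optional[str] = None  # only programs carry a state
    children: List["Node"] = field(default_factory=list)


def to_struct(node: Node) -> dict:
    """Convert a node into XML-RPC friendly types, omitting empty fields."""
    struct = {"name": node.name, "type": node.kind}
    if node.state is not None:
        struct["state"] = node.state
    if node.children:
        struct["children"] = [to_struct(child) for child in node.children]
    return struct


def roundtrip(value):
    """Marshal a value as an XML-RPC response and parse it back."""
    payload = xmlrpc.client.dumps((value,), methodresponse=True)
    params, _method = xmlrpc.client.loads(payload)
    return params[0]
```

Running a structure through roundtrip is a cheap way to confirm in a unit test that whatever getHierarchy returns survives XML-RPC marshalling unchanged.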

5) supervisorctl UX

The supervisorctl command-line interface is the primary tool for interacting with Supervisor. The user experience must be intuitive and efficient, allowing users to easily navigate and manage the hierarchical structure. Enhancements include a new tree view for visualizing the hierarchy and path-based commands for precise control.

  • New tree view:

    • supervisorctl --groups or supervisorctl groups prints the hierarchy:

      prod
      ├─ frontend
      │  ├─ web
      │  │  ├─ nginx                RUNNING pid 1234
      │  │  └─ app                  STARTING
      │  └─ assets
      │     ├─ webpack              STOPPED
      │     └─ css-min              RUNNING pid 5678
      └─ backend
         ├─ api                     RUNNING pid 2222
         └─ admin                   RUNNING pid 3333
      
    • Options:

      • --depth=N (default: full)
      • --state (filter by state)
      • --json (machine-readable)
  • Path-based commands: Allow users to specify targets using paths, enabling precise control over specific processes or groups.

    • supervisorctl start prod.backend
    • supervisorctl stop frontend.web:nginx
    • supervisorctl restart prod.**
    • supervisorctl status frontend.*
  • Command grammar: Update the help and manpage to reflect the new commands and options.

    supervisorctl (start|stop|restart|status) <path|glob>...
    supervisorctl groups [--depth N] [--state STATE] [--json]
    supervisorctl paths [--kind any|program|group|multigroup]
    
  • Output should align in columns and include per-leaf status, providing a clear and consistent display of process information.
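
The tree view with an aligned status column can be produced by a short recursive renderer. The sketch below assumes the hierarchy is available as nested dictionaries with name, state, and children keys; the render_tree name and the status_col parameter are hypothetical.

```python
def render_tree(root, status_col=30):
    """Render a nested dict as a box-drawing tree with aligned states."""
    lines = [root["name"]]

    def walk(children, prefix):
        for i, child in enumerate(children):
            last = i == len(children) - 1
            connector = "└─ " if last else "├─ "
            label = f"{prefix}{connector}{child['name']}"
            state = child.get("state")
            # Pad the label to a fixed width so every state starts
            # in the same column regardless of nesting depth.
            lines.append(f"{label:<{status_col}} {state}" if state else label)
            walk(child.get("children", []), prefix + ("   " if last else "│  "))

    walk(root.get("children", []), "")
    return "\n".join(lines)
```

Padding the combined prefix and name, rather than the name alone, is what keeps the status column aligned across different depths. A --depth option could be added by passing a remaining-depth counter to walk and returning early when it reaches zero.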

6) Shell Autocompletion

Shell autocompletion significantly improves the user experience by reducing typing errors and speeding up command entry. Autocompletion scripts for bash, zsh, and fish will be provided, covering group paths and commands.

  • Provide scripts:

    • contrib/completion/supervisorctl.{bash,zsh,fish}
  • Completion behavior:

    • After verb (start|stop|restart|status), complete paths and globs from listPaths.
    • Support dot-navigation: typing front<TAB> suggests frontend, then frontend. suggests children, frontend.web: suggests program names.
    • Complete the flags of the groups and paths subcommands.

7) Migration & Compatibility

Maintaining backward compatibility is crucial to ensure a smooth transition for existing users. The new features must integrate seamlessly with existing configurations, and migration tools should be provided to assist users in adopting the new hierarchical structure.

  • No breaking changes to existing configs.

  • If dotted group names imply a multigroup, emit an INFO log explaining the desugaring. This provides transparency and helps users understand how their configurations are being interpreted.

  • Provide a supervisorctl migrate-config --dry-run command:

    • Reads the current config and emits canonical multi-group sections, allowing users to preview the changes before applying them.
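
A dry run of this kind could work roughly as follows. The sketch uses Python's standard configparser rather than Supervisor's own parser, handles only dotted group names, and assumes that the canonical form lists children by their short name. The migrate_preview function is hypothetical.

```python
import configparser


def migrate_preview(config_text):
    """Return the multigroup sections implied by dotted group names."""
    # RawConfigParser avoids interpolating % expressions in option values.
    parser = configparser.RawConfigParser()
    parser.read_string(config_text)

    multigroups = {}
    for section in parser.sections():
        kind, _, name = section.partition(":")
        if kind != "group" or "." not in name:
            continue  # legacy flat groups need no migration
        parts = name.split(".")
        for i in range(1, len(parts)):
            parent = ".".join(parts[:i])
            kids = multigroups.setdefault(parent, [])
            if parts[i] not in kids:
                kids.append(parts[i])

    lines = []
    for name, kids in multigroups.items():
        lines.append(f"[multigroup:{name}]")
        lines.append("groups=" + ",".join(kids))
        lines.append("")
    return "\n".join(lines)
```

Because the function only returns text and never writes a file, it matches the preview intent of --dry-run: users can inspect the output before deciding whether to adopt it.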

8) Documentation

Comprehensive documentation is essential for users to understand and utilize the new features effectively. The documentation should include examples and clear explanations of the new syntax and commands.

  • Update README, configuration docs, and manpages:

    • “Hierarchical Groups” section with examples.
    • Path addressing & globs.
    • CLI tree output examples.
    • Autocompletion installation snippets.

9) Testing

Thorough testing is critical to ensure the reliability and stability of the new features. Unit tests and integration tests will be used to verify the functionality of the parser, path resolution, control semantics, and CLI commands. Back-compat tests will ensure that existing configurations continue to work as expected.

  • Unit tests:

    • Parser (dotted vs explicit, cycles, errors).
    • Path resolution & globbing (*, **, suffix :prog).
    • Control semantics (propagation, parallelism, partial failures).
  • Integration tests:

    • Spawn dummy programs; verify start/stop order and outcomes.
    • CLI tree rendering (golden tests).
    • Autocompletion script basic checks.
  • Back-compat tests:

    • Existing [group:x] configs run unchanged.

10) Implementation Notes

These notes provide guidance on the implementation approach, focusing on performance, thread safety, and integration with the existing Supervisor architecture.

  • Keep parsing in the existing config parser; introduce a normalization step that constructs the tree before daemon start. This minimizes disruption to the existing parsing logic.
  • Store a stable path on each node (computed once). This improves performance by avoiding repeated path computations.
  • For performance, cache listPaths and invalidate on config reload. This reduces the overhead of path lookups.
  • Ensure thread-safety in control operations; maintain Supervisor’s event model. This is crucial for the stability and reliability of the system.
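
The caching note can be illustrated with a small wrapper that computes the path list lazily and discards it when the configuration is reloaded. The PathCache class and its method names are hypothetical. The lock is included only because the notes call for thread-safety; how it would fit into Supervisor's event loop is left open.

```python
import threading


class PathCache:
    """Lazily computed path list, invalidated on config reload."""

    def __init__(self, compute):
        self._compute = compute  # callable returning an iterable of paths
        self._lock = threading.Lock()
        self._paths = None

    def list_paths(self):
        with self._lock:
            if self._paths is None:
                # Store an immutable snapshot so callers cannot alter it.
                self._paths = tuple(self._compute())
            return self._paths

    def invalidate(self):
        """Call from the config reload handler."""
        with self._lock:
            self._paths = None
```

Calling invalidate() from the reload path means the next listPaths request rebuilds the list from the new tree, while repeated requests between reloads reuse the stored tuple.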

Example Config & Commands

To illustrate the usage of the new features, consider the following example configuration and command sequences.

Config

[program:nginx]
command=/usr/sbin/nginx -g "daemon off;"

[program:app]
command=/srv/app/bin/start

[program:webpack]
command=/srv/frontend/bin/webpack

[program:css-min]
command=/srv/frontend/bin/css-min

[multigroup:prod]
groups=frontend,backend

[group:frontend.web]
programs=nginx,app

[group:frontend.assets]
programs=webpack,css-min

[group:backend]
programs=api,admin

[program:api]
command=/srv/api/bin/start

[program:admin]
command=/srv/admin/bin/start

CLI

# View hierarchy
supervisorctl groups --depth 2

# Start a subtree
supervisorctl start prod.backend

# Restart everything under frontend, recursively
supervisorctl restart frontend.**

# Stop a single program via path
supervisorctl stop frontend.web:nginx

# Machine-readable inventory
supervisorctl groups --json > hierarchy.json

Acceptance Criteria

The following acceptance criteria must be met to ensure the successful implementation of the hierarchical groups feature:

  • Existing non-hierarchical configs behave identically.
  • supervisorctl groups prints a correct tree for mixed multigroup/group/program configs.
  • Path/glob addressing works for start/stop/restart/status with clear per-node results.
  • Autocompletion suggests valid paths at each dot/colon step and supports globs.
  • RPC exposes the hierarchy and path operations; legacy RPC unchanged.
  • Docs and manpages reflect all new features with examples.

Implementation Steps

Implement the feature end-to-end in clear, manageable steps, each landing as a clean commit:

  1. parser/model + tests,
  2. RPC + tests,
  3. CLI + autocompletion + tests,
  4. docs,
  5. migration tool.

Where reasonable, include small, self-contained code snippets and TODOs for integration points.

Conclusion

Implementing hierarchical groups in Supervisor would be a significant enhancement to the system's capabilities. By providing a structured way to organize and manage processes, the feature would make large deployments easier to operate. The requirements, acceptance criteria, and implementation steps outlined in this article provide a foundation for that work. For further reading on process management and Supervisor, see the official Supervisor documentation.
