Skills-First Architecture

Themis structures every AI agent feature into three layers: skills (domain knowledge), output tools (structured contracts), and workflows (lifecycle orchestration). Skills are the foundation — they carry the judgment, reasoning, and expertise that make agents effective.

The Three Layers

+--------------------------------------------------+
|  LAYER 1: SKILL                                  |
|  Owns: persona, process, judgment,               |
|        domain knowledge                          |
|  Sources: file-based (.claude/skills/,           |
|           lib/skills/)                           |
|           DB-backed (Skill model, 3 scopes)      |
|           prompts (app/prompts/)                 |
+--------------------------------------------------+
                       |
                 agent calls tool
                       |
                       v
+--------------------------------------------------+
|  LAYER 2: OUTPUT TOOL                            |
|  Owns: structured data contract, side effects    |
|  Lives in: app/services/*_tool_builder.rb        |
|  Built with: ClaudeAgentSDK.create_tool()        |
+--------------------------------------------------+
                       |
                 returns result
                       |
                       v
+--------------------------------------------------+
|  LAYER 3: WORKFLOW                               |
|  Owns: lifecycle orchestration only              |
|  Lives in: app/services/workflows/               |
|  Target: under 50 lines                          |
+--------------------------------------------------+

Layer 1: Skills

Skills own all judgment. They tell the agent what to do, how to reason, and when to use its tools. Themis supports three skill sources that work together to give agents comprehensive domain knowledge.

File-Based Skills (Codebase)

Skills checked into the repository as markdown files with a SKILL.md manifest. Two directories serve different purposes:

.claude/skills/ — Themis-internal skills. Architecture guides, coding conventions, review methodology, integration helpers. These stay in the Themis repo and are auto-discovered by the Claude Agent SDK.
lib/skills/ — Portable skills. Copied into target project worktrees during code generation so the agent carries cross-project standards (e.g., code-quality/ with Rails conventions and security checklists).

Each skill is a directory containing a SKILL.md (YAML frontmatter + markdown) and optional supplementary files:

.claude/skills/understanding-themis/
  SKILL.md          # Manifest with name, description, content
  HOTWIRE.md        # Supplementary reference (optional)
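Here is a minimal sketch of reading a SKILL.md in plain Ruby, to make the manifest format concrete. It assumes the frontmatter sits between `---` lines and carries `name` and `description` keys, with the markdown body as the skill content. parse_skill_md is a hypothetical helper. It is not part of Themis or the Claude Agent SDK, which perform their own discovery.

```ruby
require "yaml"

# Hypothetical helper: split a SKILL.md into YAML frontmatter and markdown body.
# Assumes the frontmatter is delimited by "---" lines at the top of the file.
def parse_skill_md(text)
  match = text.match(/\A---\s*\n(.*?)\n---\s*\n(.*)\z/m)
  raise ArgumentError, "SKILL.md is missing YAML frontmatter" unless match

  meta = YAML.safe_load(match[1])
  unless meta.is_a?(Hash) && meta["name"] && meta["description"]
    raise ArgumentError, "frontmatter needs name and description"
  end

  { name: meta["name"], description: meta["description"], body: match[2] }
end

skill = parse_skill_md(<<~MD)
  ---
  name: understanding-themis
  description: Architecture guide for the Themis codebase
  ---
  # Understanding Themis
  Supplementary references such as HOTWIRE.md sit next to this file.
MD

puts skill[:name] # understanding-themis
```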

DB-Backed Skills (Skill Model)

User-created skills stored in the database with Active Storage file attachments. Managed through the web UI or via agent tools during chat. Three scopes control visibility:

Scope	Owned By	Visible To	Use Case
System	Admin	All users, all spaces	Organization-wide standards
Space	Space	All space members	Team-specific knowledge
Personal	User	Owner only	Individual preferences and workflows
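The scope table can be expressed as a small filter. This sketch uses in-memory Structs in place of the real Skill, User, and Space models. It is not the actual Skill.available_for query. It assumes that space skills require membership and that personal skills are visible to their owner in any space.

```ruby
# Hypothetical in-memory stand-ins for the Skill, User, and Space models.
User  = Struct.new(:id, :space_ids)
Space = Struct.new(:id)
Skill = Struct.new(:name, :scope, :space_id, :user_id)

# Mirrors the scope table. Assumes space skills require membership and
# personal skills are visible to their owner in any space.
def visible?(skill, user, space)
  case skill.scope
  when :system   then true
  when :space    then skill.space_id == space.id && user.space_ids.include?(space.id)
  when :personal then skill.user_id == user.id
  else false
  end
end

def available_for(skills, user, space)
  skills.select { |skill| visible?(skill, user, space) }
end

alice = User.new(1, [10])
bob   = User.new(2, [10])
team  = Space.new(10)
skills = [
  Skill.new("rails-conventions", :system, nil, nil),
  Skill.new("team-runbook", :space, 10, nil),
  Skill.new("alice-notes", :personal, nil, 1)
]

p available_for(skills, bob, team).map(&:name)   # ["rails-conventions", "team-runbook"]
p available_for(skills, alice, team).map(&:name) # all three
```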

DB skills are extracted to disk by SkillExtractor before each agent run, cached per scope with atomic directory replacement. The agent discovers them through the same .claude/skills/ directory convention.

SkillExtractor.prepare_for_agent(space:, user:)
  → Queries Skill.available_for(user, space)
  → Extracts to cache dirs (system / space / personal)
  → Returns add_dirs array for SDK options
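The extraction step might look roughly like the following. This sketch assumes the cache layout shown under How Skills Reach the Agent. It also assumes each scope directory holds skills under .claude/skills/<name>/, following the directory convention described above. Each scope is written to a staging directory and then renamed into place, so a reader never sees a partially written tree, although the target is briefly absent during the swap. skill_cache_dirs, extract_scope, and this prepare_for_agent signature are hypothetical.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical cache layout, following the tree under "How Skills Reach the Agent".
def skill_cache_dirs(root, space_id:, user_id:)
  {
    system:   File.join(root, "system"),
    space:    File.join(root, space_id.to_s, "space"),
    personal: File.join(root, space_id.to_s, "personal", user_id.to_s)
  }
end

# Writes one scope into a staging dir, then swaps it in with renames.
# skills: { "skill-name" => "SKILL.md contents" }
def extract_scope(target, skills)
  FileUtils.mkdir_p(File.dirname(target))
  staging = "#{target}.staging-#{Process.pid}"
  old     = "#{target}.old-#{Process.pid}"
  FileUtils.rm_rf([staging, old])

  FileUtils.mkdir_p(staging) # an empty scope still yields a directory
  skills.each do |name, content|
    dir = File.join(staging, ".claude", "skills", name) # assumed convention
    FileUtils.mkdir_p(dir)
    File.write(File.join(dir, "SKILL.md"), content)
  end

  File.rename(target, old) if File.exist?(target)
  File.rename(staging, target)
  FileUtils.rm_rf(old)
  target
end

# Returns the add_dirs array, in system / space / personal order.
def prepare_for_agent(root, space_id:, user_id:, skills_by_scope:)
  skill_cache_dirs(root, space_id: space_id, user_id: user_id).map do |scope, dir|
    extract_scope(dir, skills_by_scope.fetch(scope, {}))
  end
end

Dir.mktmpdir do |root|
  add_dirs = prepare_for_agent(root, space_id: 42, user_id: 7,
                               skills_by_scope: { system: { "code-quality" => "# Code quality\n" } })
  puts add_dirs
end
```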

Agents can also create and update their own personal skills during conversation via SkillToolBuilder tools (create_skill, update_skill, list_my_skills).

Agent Prompts

Static and dynamic prompt files that provide workflow-specific instructions:

Static prompts (.md) — When the skill needs no runtime context. Example: pr_review.md with review process, quality standards, and verdict criteria.
Dynamic prompts (.md.erb) — ERB templates that inject runtime data. Example: base_agent.md.erb renders per-space context like agent identity and available channels.

Prompts live in app/prompts/. Load via PromptLoader.load("name") (static) or PromptLoader.render("name", locals) (dynamic).
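Here is a minimal stand-in for the two loading modes, using Ruby's standard ERB. MiniPromptLoader is hypothetical. The real PromptLoader may resolve paths, cache templates, or configure ERB differently. The sketch only shows that .md files are returned verbatim while .md.erb files are rendered with locals.

```ruby
require "erb"
require "tmpdir"

# Hypothetical stand-in for PromptLoader.
class MiniPromptLoader
  def initialize(dir)
    @dir = dir
  end

  # Static prompt (.md): returned verbatim.
  def load(name)
    File.read(File.join(@dir, "#{name}.md"))
  end

  # Dynamic prompt (.md.erb): rendered with the given locals.
  def render(name, locals = {})
    template = File.read(File.join(@dir, "#{name}.md.erb"))
    ERB.new(template, trim_mode: "-").result_with_hash(locals)
  end
end

Dir.mktmpdir do |dir|
  File.write(File.join(dir, "pr_review.md"), "Review the diff for correctness.\n")
  File.write(File.join(dir, "base_agent.md.erb"),
             'You are <%= agent_name %>. Channels: <%= channels.join(", ") %>')

  loader = MiniPromptLoader.new(dir)
  puts loader.load("pr_review")
  puts loader.render("base_agent", agent_name: "Themis", channels: %w[slack web])
  # You are Themis. Channels: slack, web
end
```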

Prompts define process and judgment but do not describe output format — that responsibility belongs to the output tool schema.

How Skills Reach the Agent

ChatJob / ChannelMentionJob
  │
  ├─ System prompt ← PromptLoader (app/prompts/)
  │
  └─ add_dirs ← SkillExtractor
       ├─ File-based skills (.claude/skills/)
       └─ DB skills (extracted to cache)
              ├─ system/
              ├─ {space_id}/space/
              └─ {space_id}/personal/{user_id}/

The Claude Agent SDK scans add_dirs for SKILL.md files and makes them available to the agent automatically. Skills are togglable per space via the feature_skills setting.

Layer 2: Output Tools

Output tools define the structured contract between agent and system. Instead of asking the agent to produce parseable text (fragile), we give it a tool to call with typed arguments.

Wiring tools to callers is the next step after defining them. Each agent caller (full agent, web/API chat, messaging) opts into a set of tool groups via the Tool Catalog — adding a tool to a new caller is one declarative change, not edits across four files.
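One way to picture the Tool Catalog is as two lookup tables: one maps tool groups to tool names, and the other maps each caller to the groups it opts into. The group names, caller keys, and assignments below are illustrative assumptions. The tool names come from the builder tables in this document.

```ruby
# Hypothetical Tool Catalog: group and caller names are illustrative; tool
# names are taken from the builder tables in this document.
module ToolCatalog
  GROUPS = {
    github:  %w[get_pr_info get_pr_diff get_pr_comments get_ci_status list_pull_requests post_pr_comment],
    memory:  %w[save_memory delete_memory],
    history: %w[search_conversations recall_conversation],
    chat_ux: %w[create_file show_widget show_chart]
  }.freeze

  # Each caller opts into groups declaratively.
  CALLERS = {
    full_agent: %i[github memory history chat_ux],
    web_chat:   %i[github memory history chat_ux],
    messaging:  %i[github memory history]
  }.freeze

  # Flattens a caller's groups into the tool names it may use.
  # Raises KeyError for an unknown caller or group.
  def self.tools_for(caller_name)
    CALLERS.fetch(caller_name).flat_map { |group| GROUPS.fetch(group) }.uniq
  end
end

p ToolCatalog.tools_for(:messaging).size                   # 10
p ToolCatalog.tools_for(:messaging).include?("show_chart") # false
```

In this shape, giving messaging callers chart rendering means appending :chat_ux to one array.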

Tool builders live in app/services/*_tool_builder.rb and use ClaudeAgentSDK.create_tool():

class PRReviewToolBuilder
  def self.build_submit_review_tool(review:, space:)
    ClaudeAgentSDK.create_tool(
      "submit_review",
      "Submit your completed code review.",
      {
        type: "object",
        properties: {
          verdict: { type: "string", enum: %w[APPROVE REQUEST_CHANGES COMMENT] },
          summary: { type: "string", description: "Markdown review summary" },
          comments: {
            type: "array",
            items: {
              type: "object",
              properties: {
                path: { type: "string" },
                line: { type: "integer" },
                body: { type: "string" }
              },
              required: %w[path line body]
            }
          }
        },
        required: %w[verdict summary]
      }
    ) do |args|
      # Side effects: submit to GitHub, update review record
    end
  end
end

The schema is the format specification. The agent sees it in its tool list and knows exactly what to produce. No prompt budget wasted on output format instructions.
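Because the schema is plain data, a handler can also check incoming arguments against it before running side effects. This is a sketch that checks only type, enum, and required keys. It is not full JSON Schema. It assumes arguments arrive as a JSON-decoded hash with string keys. schema_errors is a hypothetical helper.

```ruby
# The submit_review schema from above, as plain data.
SUBMIT_REVIEW_SCHEMA = {
  type: "object",
  properties: {
    verdict: { type: "string", enum: %w[APPROVE REQUEST_CHANGES COMMENT] },
    summary: { type: "string" },
    comments: {
      type: "array",
      items: {
        type: "object",
        properties: { path: { type: "string" }, line: { type: "integer" }, body: { type: "string" } },
        required: %w[path line body]
      }
    }
  },
  required: %w[verdict summary]
}.freeze

TYPE_CHECKS = {
  "string"  => ->(v) { v.is_a?(String) },
  "integer" => ->(v) { v.is_a?(Integer) },
  "array"   => ->(v) { v.is_a?(Array) },
  "object"  => ->(v) { v.is_a?(Hash) }
}.freeze

# Hypothetical helper: returns error strings (empty when the args conform).
# Covers only type, enum, and required; not a full JSON Schema validator.
def schema_errors(schema, value, path = "args")
  return ["#{path} should be #{schema[:type]}"] unless TYPE_CHECKS.fetch(schema[:type]).call(value)

  errors = []
  if schema[:enum] && !schema[:enum].include?(value)
    errors << "#{path} must be one of #{schema[:enum].join(', ')}"
  end

  case schema[:type]
  when "object"
    Array(schema[:required]).each do |key|
      errors << "#{path}.#{key} is required" unless value.key?(key)
    end
    (schema[:properties] || {}).each do |key, sub|
      name = key.to_s
      errors.concat(schema_errors(sub, value[name], "#{path}.#{name}")) if value.key?(name)
    end
  when "array"
    if schema[:items]
      value.each_with_index do |item, i|
        errors.concat(schema_errors(schema[:items], item, "#{path}[#{i}]"))
      end
    end
  end
  errors
end

p schema_errors(SUBMIT_REVIEW_SCHEMA, { "verdict" => "LGTM", "summary" => "ok" })
# ["args.verdict must be one of APPROVE, REQUEST_CHANGES, COMMENT"]
```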

Current Tool Builders

Grouped by purpose. All tools are wired to agent callers via the Tool Catalog.

Workflow output contracts

Builder	Key Tools	Purpose
`PRReviewToolBuilder`	`get_pr_info`, `get_pr_diff`, `get_pr_comments`, `get_ci_status`, `submit_review`	PR review workflow output
`CodeGenerationResultToolBuilder`	`submit_code_generation_result`	PR metadata from code gen
`AutomationToolBuilder`	`skip_message`	Automation skip decisions

Triggers (factory-wired)

Builder	Key Tools	Purpose
`PRReviewTriggerToolBuilder`	`trigger_pr_review`	Enqueue a PR review from chat / mention
`CodeGenerationToolBuilder`	`trigger_code_generation`	Enqueue code generation from chat / mention

Data access

Builder	Key Tools	Purpose
`GithubToolBuilder`	`get_pr_info`, `get_pr_diff`, `get_pr_comments`, `get_ci_status`, `list_pull_requests`, `post_pr_comment`	GitHub direct-API access in chat / mention contexts
`ChatHistoryToolBuilder`	`search_conversations`, `recall_conversation`	On-demand conversation history
`RepoSearchToolBuilder`	`resolve_repo_path`	Browse local git worktrees
`ThemisQueryToolBuilder`	`query_themis_data`	Themis DB queries (editable_by? gate)
`GoogleDriveProxyToolBuilder`	proxied Google Drive read tools	Per-user OAuth-scoped Drive access

Side effects

Builder	Key Tools	Purpose
`SentryToolBuilder`	`update_sentry_issue`	Sentry status + assignment
`MemoryToolBuilder`	`save_memory`, `delete_memory`	Per-user memory store

Chat UX

Builder	Key Tools	Purpose
`AskUserQuestionHook` (PreToolUse)	`AskUserQuestion` (native built-in)	Structured clarifying questions — intercepted, not built as an MCP tool
`FileToolBuilder`	`create_file`	Agent-generated file downloads
`ShowWidgetToolBuilder`	`show_widget`	Sandboxed HTML widgets (D3, Mermaid, SVG)
`ShowChartToolBuilder`	`show_chart`	Structured Chart.js rendering
`ImageGenerationToolBuilder`	`generate_image`	Gemini image generation

Resource management

Builder	Key Tools	Purpose
`SkillToolBuilder`	13 tools — CRUD, file ops, checkout/checkin	Agent-driven skill management
`AutomationChatToolBuilder`	`create_automation`, `update_automation`, `list_my_automations`, `delete_automation`	Agent-driven automation management

Layer 3: Workflows

The workflow is thin glue. It creates records, starts the agent, handles errors, and updates status. It should contain zero judgment and zero parsing.

All workflows inherit from Workflows::BaseWorkflow and implement #execute. The base class provides #run_agent(prompt:, system_prompt:, model:, max_turns:).

module Workflows
  class FeatureWorkflow < BaseWorkflow
    def execute(input:)
      record = FeatureRecord.create!(input: input, status: "running")

      begin
        result = run_agent(
          prompt: build_prompt(input),
          system_prompt: PromptLoader.load("feature_name")
        )
        record.complete!(result)
      rescue => e
        record.fail!(e.message)
        raise
      end
    end

    private

    # Passes runtime context only; judgment stays in the skill/prompt.
    def build_prompt(input)
      "Input:\n\n#{input}"
    end
  end
end

If your workflow is doing regex parsing, JSON extraction, or business logic — something is in the wrong layer.
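The following sketch is a version of the workflow example that runs without Rails or the SDK. It swaps in plain-Ruby stand-ins: an in-memory FeatureRecord, a BaseWorkflow whose run_agent delegates to an injected callable, and a hard-coded prompt. All of these are assumptions for illustration and do not reflect Themis's actual base class. The sketch only traces the lifecycle: the record starts as running, then becomes completed or failed, and on failure the error is re-raised.

```ruby
# In-memory stand-in for the FeatureRecord model (hypothetical).
class FeatureRecord
  attr_reader :input, :status, :result, :error

  def self.create!(input:, status:)
    new(input, status)
  end

  def initialize(input, status)
    @input = input
    @status = status
  end

  def complete!(result)
    @status = "completed"
    @result = result
  end

  def fail!(message)
    @status = "failed"
    @error = message
  end
end

module Workflows
  # Stand-in base class: run_agent delegates to an injected callable instead
  # of the Claude Agent SDK. model/max_turns default to nil here (assumed).
  class BaseWorkflow
    def initialize(agent_runner:)
      @agent_runner = agent_runner
    end

    def run_agent(prompt:, system_prompt:, model: nil, max_turns: nil)
      @agent_runner.call(prompt: prompt, system_prompt: system_prompt,
                         model: model, max_turns: max_turns)
    end
  end

  class FeatureWorkflow < BaseWorkflow
    attr_reader :record

    def execute(input:)
      @record = FeatureRecord.create!(input: input, status: "running")
      result = run_agent(prompt: "Input: #{input}", system_prompt: "placeholder system prompt")
      record.complete!(result)
      record
    rescue => e
      record&.fail!(e.message) # mark the record failed, then let the job see the error
      raise
    end
  end
end

ok = Workflows::FeatureWorkflow.new(agent_runner: ->(**kw) { "done: #{kw[:prompt]}" })
p ok.execute(input: "q3 report").status # "completed"

boom = Workflows::FeatureWorkflow.new(agent_runner: ->(**_kw) { raise "agent timed out" })
begin
  boom.execute(input: "q3 report")
rescue RuntimeError
  p boom.record.status # "failed"
end
```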

Decision Framework

Question	Answer	Layer
Does it involve judgment, reasoning, or domain knowledge?	Move it to the skill	Skill
Does it define a structured data exchange or produce side effects?	Make it an output tool	Output Tool
Does it manage record lifecycle, error recovery, or orchestration?	Keep it in the workflow	Workflow
Are you parsing LLM free-text into structured data?	You’re doing it wrong	Refactor to Output Tool
Are you writing prompt instructions about output format?	The tool schema should handle this	Refactor to Output Tool
Is the workflow over 50 lines?	Something is in the wrong layer	Audit and redistribute

Adding a New Feature

Step 1: Write the skill

Decide where the skill lives based on its purpose:

Agent prompt (app/prompts/) — Workflow-specific instructions. Use .md for static, .md.erb for dynamic context.
Codebase skill (.claude/skills/) — Reusable domain knowledge shared across workflows.
DB skill — User-configurable knowledge managed through the UI.

Focus on: persona, process, judgment criteria, domain knowledge. Do not describe output format.

Step 2: Define the output contract as an SDK tool

Create app/services/feature_tool_builder.rb. The tool schema defines what the agent produces. The handler executes side effects.

class FeatureToolBuilder
  def self.build_submit_tool(record:, space:)
    ClaudeAgentSDK.create_tool(
      "submit_result",
      "Submit your analysis results.",
      { type: "object", properties: { ... }, required: %w[...] }
    ) do |args|
      # Execute side effects, return confirmation
    end
  end
end
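A thin handler follows three steps: validate, perform a side effect, and confirm. The findings argument and the injected save callable are hypothetical. The { content: [...] } return value and the isError flag follow the MCP tool-result convention, which may not match exactly what your SDK version expects.

```ruby
# Hypothetical handler for the submit_result tool: validate, one side effect,
# confirm. The { content: [{ type: "text", text: ... }] } shape follows the MCP
# tool-result convention; check what your SDK version expects.
def build_submit_handler(save:)
  lambda do |args|
    findings = args["findings"] # hypothetical argument name
    unless findings.is_a?(Array) && !findings.empty?
      return { content: [{ type: "text", text: "findings must be a non-empty array" }], isError: true }
    end

    save.call(findings) # the only side effect; injected so tests can fake it
    { content: [{ type: "text", text: "Saved #{findings.size} finding(s)." }] }
  end
end

saved = []
handler = build_submit_handler(save: ->(items) { saved.concat(items) })
p handler.call({ "findings" => ["N+1 query in OrdersController#index"] })[:content].first[:text]
# "Saved 1 finding(s)."
```

Injecting the side effect keeps the handler small and testable without GitHub or database access.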

Step 3: Write the workflow as thin glue

Create app/services/workflows/feature_workflow.rb. It should only create records, build prompts, call run_agent, handle errors, and update status. Target under 50 lines.

Step 4: Wire into the job

Create app/jobs/feature_job.rb. The job builds options (model, MCP servers, tools, skill dirs), instantiates the workflow, and calls execute. Use the Tool Catalog to opt into tool groups instead of hand-rolling mcp_servers / allowed_tools lists.

class FeatureJob < ApplicationJob
  def perform(input)
    # build_options: model, MCP servers, tool groups from the Tool Catalog, skill dirs
    options = build_options(input)
    workflow = Workflows::FeatureWorkflow.new(options: options)
    # The workflow creates and manages the FeatureRecord lifecycle
    workflow.execute(input: input)
  end
end

Anti-Patterns

Anti-pattern	Why it’s wrong	Fix
Parsing JSON from agent free-text	Fragile: breaks on markdown fences, extra text, formatting variations	Define an output tool with typed arguments
Prompt instructions about output format	Wastes token budget on rules the agent may ignore; duplicates the contract	The tool schema is the format
Business logic in the workflow	Couples orchestration to domain logic; makes workflows fat	Move judgment to skills, data contracts to tools
Huge prompts with no ERB	Cannot inject runtime context (user preferences, project settings)	Use `.md.erb` with locals for dynamic sections
Multiple output tools per workflow	Confuses the agent about which tool to call	One primary output tool per workflow
Tool handler with complex business logic	Hard to test, tightly coupled to infrastructure	Keep handlers thin: validate, side effect, confirm
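The first anti-pattern in the table is easy to reproduce, because free text that contains JSON is not itself JSON. This sketch builds a typical fenced reply (the backtick fence is assembled with "`" * 3) and shows JSON.parse rejecting it. The reply text is illustrative.

```ruby
require "json"

# A typical free-text reply: prose, a markdown fence, then the JSON payload.
fence = "`" * 3
reply = "Here is my review:\n#{fence}json\n" \
        "{\"verdict\": \"APPROVE\", \"summary\": \"Looks good\"}\n#{fence}\n"

begin
  JSON.parse(reply)
  puts "parsed"
rescue JSON::ParserError
  # Fails on the leading prose before any JSON is seen.
  puts "JSON::ParserError"
end
```

With an output tool, the handler receives already-decoded arguments, so there are no fences or surrounding prose to strip.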

Workflow Maturity

Maturity tracks how cleanly each workflow separates skills, output tools, and orchestration. Tool wiring is now uniform across all workflows via the Tool Catalog regardless of maturity level.

Workflow	Maturity	Output Contract
PRReview	High	`submit_review` SDK tool handles GitHub submission
Mention	High	Agent uses MCP tools directly (GitHub, Linear comments)
Automation	High	`skip_message` tool for skip decisions; delivery via `AutomationMessageDeliveryService`
CodeGeneration	Medium	`submit_code_generation_result` tool exists; PR metadata partially parsed

Maturity levels:

High — Prompt owns judgment, tools own contracts, workflow is thin lifecycle glue.
Medium — Partially follows the pattern. Some parsing or format instructions remain.
Low — Judgment, parsing, and orchestration tangled in the workflow.