Skills-First Architecture

Themis structures every AI agent feature into three layers: skills (domain knowledge), output tools (structured contracts), and workflows (lifecycle orchestration). Skills are the foundation — they carry the judgment, reasoning, and expertise that make agents effective.

The Three Layers

+------------------------------------------------------+
|  LAYER 1: SKILL                                      |
|  Owns: persona, process, judgment, domain knowledge  |
|  Sources: file-based (.claude/skills/, lib/skills/)  |
|           DB-backed (Skill model, 3 scopes)          |
|           prompts (app/prompts/)                     |
+------------------------------------------------------+
                           |
                    agent calls tool
                           |
                           v
+------------------------------------------------------+
|  LAYER 2: OUTPUT TOOL                                |
|  Owns: structured data contract, side effects        |
|  Lives in: app/services/*_tool_builder.rb            |
|  Built with: ClaudeAgentSDK.create_tool()            |
+------------------------------------------------------+
                           |
                     returns result
                           |
                           v
+------------------------------------------------------+
|  LAYER 3: WORKFLOW                                   |
|  Owns: lifecycle orchestration only                  |
|  Lives in: app/services/workflows/                   |
|  Target: under 50 lines                              |
+------------------------------------------------------+

Layer 1: Skills

Skills own all judgment. They tell the agent what to do, how to reason, and when to use its tools. Themis supports three skill sources that work together to give agents comprehensive domain knowledge.

File-Based Skills (Codebase)

File-based skills are checked into the repository as markdown files, each with a SKILL.md manifest. Two directories serve different purposes:

  • .claude/skills/ — Themis-internal skills. Architecture guides, coding conventions, review methodology, integration helpers. These stay in the Themis repo and are auto-discovered by the Claude Agent SDK.
  • lib/skills/ — Portable skills. Copied into target project worktrees during code generation so the agent carries cross-project standards (e.g., code-quality/ with Rails conventions and security checklists).

Each skill is a directory containing a SKILL.md (YAML frontmatter + markdown) and optional supplementary files:

.claude/skills/understanding-themis/
  SKILL.md          # Manifest with name, description, content
  HOTWIRE.md        # Supplementary reference (optional)
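
To make the manifest format concrete, the sketch below splits a SKILL.md string into its YAML frontmatter and markdown body using only Ruby's standard library. The parse_skill_manifest helper and the sample field values are assumptions for illustration; the Claude Agent SDK performs its own discovery and parsing, which is not shown in this document.

```ruby
require "yaml"

# Hypothetical helper (not part of Themis): splits a SKILL.md string into
# its YAML frontmatter fields and its markdown body.
def parse_skill_manifest(text)
  match = text.match(/\A---\s*\n(.*?)\n---\s*\n?(.*)\z/m)
  raise ArgumentError, "SKILL.md is missing YAML frontmatter" unless match

  frontmatter = YAML.safe_load(match[1]) || {}
  {
    "name" => frontmatter["name"],
    "description" => frontmatter["description"],
    "body" => match[2]
  }
end

# Sample manifest; the description text is made up for this example.
SAMPLE_MANIFEST = <<~MD
  ---
  name: understanding-themis
  description: Architecture guide for the Themis codebase
  ---
  # Understanding Themis

  Start with the three layers.
MD

manifest = parse_skill_manifest(SAMPLE_MANIFEST)
puts manifest["name"]
```

Keeping the frontmatter small (name and description) lets the agent decide whether a skill is relevant before it reads the full body.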

DB-Backed Skills (Skill Model)

DB-backed skills are user-created skills stored in the database with Active Storage file attachments. They are managed through the web UI or via agent tools during chat. Three scopes control visibility:

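| Scope    | Owned By | Visible To            | Use Case                             |
|----------|----------|-----------------------|--------------------------------------|
| System   | Admin    | All users, all spaces | Organization-wide standards          |
| Space    | Space    | All space members     | Team-specific knowledge              |
| Personal | User     | Owner only            | Individual preferences and workflows |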
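
The visibility rules in the table can be sketched as a plain-Ruby filter. This is not the real Skill.available_for query: SkillRecord, its field names, and visible_skills are stand-ins, and the sketch assumes the user is already a member of the given space.

```ruby
# Plain-Ruby stand-in for the Skill model; field names are assumptions.
SkillRecord = Struct.new(:name, :scope, :space_id, :user_id, keyword_init: true)

# Mirrors the table: system skills are visible everywhere, space skills to
# members of that space, personal skills only to their owner.
def visible_skills(skills, user_id:, space_id:)
  skills.select do |skill|
    case skill.scope
    when :system   then true
    when :space    then skill.space_id == space_id
    when :personal then skill.user_id == user_id
    else false
    end
  end
end

skills = [
  SkillRecord.new(name: "org-standards", scope: :system),
  SkillRecord.new(name: "team-runbook", scope: :space, space_id: 1),
  SkillRecord.new(name: "other-team", scope: :space, space_id: 2),
  SkillRecord.new(name: "my-notes", scope: :personal, user_id: 7),
  SkillRecord.new(name: "their-notes", scope: :personal, user_id: 8)
]

names = visible_skills(skills, user_id: 7, space_id: 1).map(&:name)
puts names.inspect
```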

DB skills are extracted to disk by SkillExtractor before each agent run, cached per scope with atomic directory replacement. The agent discovers them through the same .claude/skills/ directory convention.

SkillExtractor.prepare_for_agent(space:, user:)
  → Queries Skill.available_for(user, space)
  → Extracts to cache dirs (system / space / personal)
  → Returns add_dirs array for SDK options
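
One common way to approximate atomic directory replacement is to stage the new contents beside the target and then rename the staged directory into place. The sketch below shows that pattern with Ruby's standard library. replace_directory is a hypothetical helper, not SkillExtractor's actual implementation, and the two-rename swap leaves a brief window in which the target is absent; a strictly atomic swap would need something like a symlink flip.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper: writes files into a staging directory beside the
# target, then renames it into place so readers never see a half-written
# directory.
def replace_directory(target, files)
  parent = File.dirname(target)
  FileUtils.mkdir_p(parent)
  staging = Dir.mktmpdir("staging-", parent)

  files.each do |relative_path, content|
    path = File.join(staging, relative_path)
    FileUtils.mkdir_p(File.dirname(path))
    File.write(path, content)
  end

  if File.exist?(target)
    retired = "#{target}.old-#{Process.pid}"
    File.rename(target, retired) # brief gap: target is absent here
    File.rename(staging, target)
    FileUtils.rm_rf(retired)
  else
    File.rename(staging, target)
  end
  target
end

root = Dir.mktmpdir("skill-cache-")
cache = File.join(root, "system")
replace_directory(cache, { "style/SKILL.md" => "v1" })
replace_directory(cache, { "style/SKILL.md" => "v2", "docs/SKILL.md" => "new" })
puts File.read(File.join(cache, "style", "SKILL.md"))
```

Staging inside the same parent directory keeps both paths on one filesystem, which is what allows the rename to be a cheap metadata operation rather than a copy.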

Agents can also create and update their own personal skills during conversation via SkillToolBuilder tools (such as create_skill, update_skill, and list_my_skills).

Agent Prompts

Static and dynamic prompt files that provide workflow-specific instructions:

  • Static prompts (.md): use when the prompt needs no runtime context. Example: pr_review.md with review process, quality standards, and verdict criteria.
  • Dynamic prompts (.md.erb): ERB templates that inject runtime data. Example: base_agent.md.erb renders per-space context like agent identity and available channels.

Prompts live in app/prompts/. Load via PromptLoader.load("name") (static) or PromptLoader.render("name", locals) (dynamic).
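
The difference between the two calls can be illustrated with a small stand-in loader. PromptLoader's internals are not shown in this document, so SketchPromptLoader, its constructor argument, and the sample prompt text are all assumptions; only the load/render split and the .md / .md.erb naming come from the text above.

```ruby
require "erb"
require "tmpdir"

# Stand-in for PromptLoader; everything here is an assumption about how
# such a loader could work, not the real class.
class SketchPromptLoader
  def initialize(dir)
    @dir = dir
  end

  # Static prompt: read name.md verbatim.
  def load(name)
    File.read(File.join(@dir, "#{name}.md"))
  end

  # Dynamic prompt: render name.md.erb with the given locals.
  def render(name, locals = {})
    template = File.read(File.join(@dir, "#{name}.md.erb"))
    ERB.new(template, trim_mode: "-").result_with_hash(locals)
  end
end

# Write two throwaway prompt files so the example is self-contained.
dir = Dir.mktmpdir("prompts-")
File.write(File.join(dir, "pr_review.md"), "Review the diff carefully.\n")
File.write(File.join(dir, "base_agent.md.erb"),
           "You are <%= agent_name %> in <%= channels.join(', ') %>.\n")

loader = SketchPromptLoader.new(dir)
puts loader.load("pr_review")
puts loader.render("base_agent", { agent_name: "Themis", channels: %w[general dev] })
```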

Prompts define process and judgment but do not describe output format — that responsibility belongs to the output tool schema.

How Skills Reach the Agent

ChatJob / ChannelMentionJob
  │
  ├─ System prompt ← PromptLoader (app/prompts/)
  │
  └─ add_dirs ← SkillExtractor
       ├─ File-based skills (.claude/skills/)
       └─ DB skills (extracted to cache)
              ├─ system/
              ├─ {space_id}/space/
              └─ {space_id}/personal/{user_id}/

The Claude Agent SDK scans add_dirs for SKILL.md files and makes them available to the agent automatically. Skills can be toggled per space via the feature_skills setting.
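
As a rough sketch of how the cache layout above could turn into an add_dirs list, the helper below joins the three DB-skill cache paths and returns nothing when skills are switched off. skill_dirs, cache_root, and skills_enabled (standing in for the space's feature_skills setting) are hypothetical, and file-based skill directories are left out for brevity.

```ruby
# Hypothetical helper: builds the DB-skill portion of add_dirs from the
# cache layout (system/, {space_id}/space/, {space_id}/personal/{user_id}/).
def skill_dirs(cache_root:, space_id:, user_id:, skills_enabled:)
  return [] unless skills_enabled # assumed effect of the feature_skills toggle

  [
    File.join(cache_root, "system"),
    File.join(cache_root, space_id.to_s, "space"),
    File.join(cache_root, space_id.to_s, "personal", user_id.to_s)
  ]
end

dirs = skill_dirs(cache_root: "/tmp/skills", space_id: 42, user_id: 7, skills_enabled: true)
puts dirs
```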

Layer 2: Output Tools

Output tools define the structured contract between agent and system. Instead of asking the agent to produce parseable text (fragile), we give it a tool to call with typed arguments.

Wiring tools to callers is the next step after defining them. Each agent caller (full agent, web/API chat, messaging) opts into a set of tool groups via the Tool Catalog — adding a tool to a new caller is one declarative change, not edits across four files.
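
The idea of a declarative catalog can be sketched with two lookup tables. This is not the real Tool Catalog: the group names, caller names, and the tools_for helper are illustrative assumptions, and only the tool names are taken from this document.

```ruby
# Hypothetical catalog: which tools belong to which group.
TOOL_GROUPS = {
  github: %w[get_pr_info get_pr_diff post_pr_comment],
  memory: %w[save_memory delete_memory],
  chat_ux: %w[create_file show_chart]
}.freeze

# Hypothetical wiring: each caller opts into a list of groups.
CALLER_GROUPS = {
  full_agent: %i[github memory],
  web_chat: %i[github memory chat_ux],
  messaging: %i[memory]
}.freeze

# Resolves a caller to its flat tool list; unknown callers or groups raise
# KeyError instead of silently returning an empty list.
def tools_for(caller_name)
  CALLER_GROUPS.fetch(caller_name).flat_map { |group| TOOL_GROUPS.fetch(group) }
end

puts tools_for(:messaging).inspect
```

Under this layout, giving the messaging caller GitHub access would mean adding :github to one array rather than touching each caller's setup code.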

Tool builders live in app/services/*_tool_builder.rb and use ClaudeAgentSDK.create_tool():

class PRReviewToolBuilder
  def self.build_submit_review_tool(review:, space:)
    ClaudeAgentSDK.create_tool(
      "submit_review",
      "Submit your completed code review.",
      {
        type: "object",
        properties: {
          verdict: { type: "string", enum: %w[APPROVE REQUEST_CHANGES COMMENT] },
          summary: { type: "string", description: "Markdown review summary" },
          comments: {
            type: "array",
            items: {
              type: "object",
              properties: {
                path: { type: "string" },
                line: { type: "integer" },
                body: { type: "string" }
              },
              required: %w[path line body]
            }
          }
        },
        required: %w[verdict summary]
      }
    ) do |args|
      # Side effects: submit to GitHub, update review record
    end
  end
end

The schema is the format specification. The agent sees it in its tool list and knows exactly what to produce. No prompt budget wasted on output format instructions.
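
To show what "the schema is the format specification" buys you, the sketch below checks tool arguments against a trimmed copy of the submit_review schema. schema_errors is a hypothetical helper that covers only required keys, string types, and enums; it is not a full JSON Schema validator, and this document does not say whether the SDK validates arguments before calling the handler.

```ruby
# Trimmed copy of the submit_review schema from the example above.
SUBMIT_REVIEW_SCHEMA = {
  type: "object",
  properties: {
    verdict: { type: "string", enum: %w[APPROVE REQUEST_CHANGES COMMENT] },
    summary: { type: "string" }
  },
  required: %w[verdict summary]
}.freeze

# Hypothetical checker: returns a list of human-readable problems.
def schema_errors(schema, args)
  errors = schema[:required].reject { |key| args.key?(key) }
                            .map { |key| "missing #{key}" }

  schema[:properties].each do |name, rules|
    value = args[name.to_s]
    next if value.nil?

    if rules[:type] == "string" && !value.is_a?(String)
      errors << "#{name} must be a string"
    end
    if rules[:enum] && !rules[:enum].include?(value)
      errors << "#{name} must be one of #{rules[:enum].join(', ')}"
    end
  end
  errors
end

puts schema_errors(SUBMIT_REVIEW_SCHEMA, { "verdict" => "LGTM" }).inspect
```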

Current Tool Builders

Grouped by purpose. All tools are wired to agent callers via the Tool Catalog.

Workflow output contracts

| Builder | Key Tools | Purpose |
|---|---|---|
| PRReviewToolBuilder | get_pr_info, get_pr_diff, get_pr_comments, get_ci_status, submit_review | PR review workflow output |
| CodeGenerationResultToolBuilder | submit_code_generation_result | PR metadata from code gen |
| AutomationToolBuilder | skip_message | Automation skip decisions |

Triggers (factory-wired)

| Builder | Key Tools | Purpose |
|---|---|---|
| PRReviewTriggerToolBuilder | trigger_pr_review | Enqueue a PR review from chat / mention |
| CodeGenerationToolBuilder | trigger_code_generation | Enqueue code generation from chat / mention |

Data access

| Builder | Key Tools | Purpose |
|---|---|---|
| GithubToolBuilder | get_pr_info, get_pr_diff, get_pr_comments, get_ci_status, list_pull_requests, post_pr_comment | GitHub direct-API access in chat / mention contexts |
| ChatHistoryToolBuilder | search_conversations, recall_conversation | On-demand conversation history |
| RepoSearchToolBuilder | resolve_repo_path | Browse local git worktrees |
| ThemisQueryToolBuilder | query_themis_data | Themis DB queries (editable_by? gate) |
| GoogleDriveProxyToolBuilder | proxied Google Drive read tools | Per-user OAuth-scoped Drive access |

Side effects

| Builder | Key Tools | Purpose |
|---|---|---|
| SentryToolBuilder | update_sentry_issue | Sentry status + assignment |
| MemoryToolBuilder | save_memory, delete_memory | Per-user memory store |

Chat UX

| Builder | Key Tools | Purpose |
|---|---|---|
| AskUserQuestionHook (PreToolUse) | AskUserQuestion (native built-in) | Structured clarifying questions; intercepted, not built as an MCP tool |
| FileToolBuilder | create_file | Agent-generated file downloads |
| ShowWidgetToolBuilder | show_widget | Sandboxed HTML widgets (D3, Mermaid, SVG) |
| ShowChartToolBuilder | show_chart | Structured Chart.js rendering |
| ImageGenerationToolBuilder | generate_image | Gemini image generation |

Resource management

| Builder | Key Tools | Purpose |
|---|---|---|
| SkillToolBuilder | 13 tools: CRUD, file ops, checkout/checkin | Agent-driven skill management |
| AutomationChatToolBuilder | create_automation, update_automation, list_my_automations, delete_automation | Agent-driven automation management |

Layer 3: Workflows

The workflow is thin glue. It creates records, starts the agent, handles errors, and updates status. It should contain zero judgment and zero parsing.

All workflows inherit from Workflows::BaseWorkflow and implement #execute. The base class provides #run_agent(prompt:, system_prompt:, model:, max_turns:).

module Workflows
  class FeatureWorkflow < BaseWorkflow
    def execute(input:)
      record = FeatureRecord.create!(input: input, status: "running")

      begin
        result = run_agent(
          prompt: build_prompt(input),
          system_prompt: PromptLoader.load("feature_name")
        )
        record.complete!(result)
      rescue => e
        record.fail!(e.message)
        raise
      end
    end
  end
end

If your workflow is doing regex parsing, JSON extraction, or business logic — something is in the wrong layer.
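
A workflow this thin is easy to exercise without Rails or the SDK. The sketch below replays the same lifecycle (mark running, call the agent, record success or failure, re-raise) against stand-ins. FakeRecord, SketchWorkflow, and the injected agent callable are assumptions made for this example; the real base class exposes run_agent instead.

```ruby
# Stand-in for FeatureRecord with just the lifecycle methods used above.
FakeRecord = Struct.new(:status, :result, :error) do
  def complete!(value)
    self.status = "completed"
    self.result = value
  end

  def fail!(message)
    self.status = "failed"
    self.error = message
  end
end

# Stand-in workflow: lifecycle only, no parsing and no judgment.
class SketchWorkflow
  def initialize(agent:)
    @agent = agent # callable standing in for run_agent
  end

  def execute(record:, prompt:)
    record.status = "running"
    record.complete!(@agent.call(prompt))
  rescue StandardError => e
    record.fail!(e.message)
    raise
  end
end

ok = FakeRecord.new
SketchWorkflow.new(agent: ->(prompt) { "done: #{prompt}" })
              .execute(record: ok, prompt: "review")

failed = FakeRecord.new
begin
  SketchWorkflow.new(agent: ->(_prompt) { raise "agent timeout" })
                .execute(record: failed, prompt: "review")
rescue RuntimeError
  # The workflow records the failure and then re-raises it.
end

puts ok.status
puts failed.status
```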

Decision Framework

| Question | Answer | Layer |
|---|---|---|
| Does it involve judgment, reasoning, or domain knowledge? | Move it to the skill | Skill |
| Does it define a structured data exchange or produce side effects? | Make it an output tool | Output Tool |
| Does it manage record lifecycle, error recovery, or orchestration? | Keep it in the workflow | Workflow |
| Are you parsing LLM free-text into structured data? | You’re doing it wrong | Refactor to Output Tool |
| Are you writing prompt instructions about output format? | The tool schema should handle this | Refactor to Output Tool |
| Is the workflow over 50 lines? | Something is in the wrong layer | Audit and redistribute |

Adding a New Feature

Step 1: Write the skill

Decide where the skill lives based on its purpose:

  • Agent prompt (app/prompts/) — Workflow-specific instructions. Use .md for static, .md.erb for dynamic context.
  • Codebase skill (.claude/skills/) — Reusable domain knowledge shared across workflows.
  • DB skill — User-configurable knowledge managed through the UI.

Focus on: persona, process, judgment criteria, domain knowledge. Do not describe output format.

Step 2: Define the output contract as an SDK tool

Create app/services/feature_tool_builder.rb. The tool schema defines what the agent produces. The handler executes side effects.

class FeatureToolBuilder
  def self.build_submit_tool(record:, space:)
    ClaudeAgentSDK.create_tool(
      "submit_result",
      "Submit your analysis results.",
      { type: "object", properties: { ... }, required: %w[...] }
    ) do |args|
      # Execute side effects, return confirmation
    end
  end
end

Step 3: Write the workflow as thin glue

Create app/services/workflows/feature_workflow.rb. It should only create records, build prompts, call run_agent, handle errors, and update status. Target under 50 lines.

Step 4: Wire into the job

Create app/jobs/feature_job.rb. The job builds options (model, MCP servers, tools, skill dirs), instantiates the workflow, and calls execute. Use the Tool Catalog to opt into tool groups instead of hand-rolling mcp_servers / allowed_tools lists.

class FeatureJob < ApplicationJob
  def perform(record_id)
    record = FeatureRecord.find(record_id)
    options = build_options(record)
    workflow = Workflows::FeatureWorkflow.new(options: options)
    workflow.execute(record: record)
  end
end

Anti-Patterns

| Anti-pattern | Why it’s wrong | Fix |
|---|---|---|
| Parsing JSON from agent free-text | Fragile: breaks on markdown fences, extra text, formatting variations | Define an output tool with typed arguments |
| Prompt instructions about output format | Wastes token budget on rules the agent may ignore; duplicates the contract | The tool schema is the format |
| Business logic in the workflow | Couples orchestration to domain logic; makes workflows fat | Move judgment to skills, data contracts to tools |
| Huge prompts with no ERB | Cannot inject runtime context (user preferences, project settings) | Use .md.erb with locals for dynamic sections |
| Multiple output tools per workflow | Confuses the agent about which tool to call | One primary output tool per workflow |
| Tool handler with complex business logic | Hard to test, tightly coupled to infrastructure | Keep handlers thin: validate, side effect, confirm |
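
The first anti-pattern is easy to reproduce. In the sketch below, a made-up agent reply wraps valid JSON in prose and a markdown fence, and JSON.parse from Ruby's standard library rejects the whole string. The reply text and the tool_args hash are invented for illustration.

```ruby
require "json"

# A typical free-text reply: valid JSON wrapped in prose and a code fence.
fence = "`" * 3
reply = "Here is my review:\n#{fence}json\n{\"verdict\": \"APPROVE\"}\n#{fence}\n"

parsed =
  begin
    JSON.parse(reply)
  rescue JSON::ParserError
    nil # the prose and fence make the string as a whole invalid JSON
  end

# With an output tool, the handler receives arguments that are already
# structured, so there is no extraction step to get wrong.
tool_args = { "verdict" => "APPROVE" }

puts parsed.inspect
puts tool_args["verdict"]
```

Stripping the fence with a regex would fix this one reply, but every new formatting variation needs another patch, which is the fragility the table describes.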

Workflow Maturity

Maturity tracks how cleanly each workflow separates skills, output tools, and orchestration. Tool wiring is now uniform across all workflows via the Tool Catalog regardless of maturity level.

| Workflow | Maturity | Output Contract |
|---|---|---|
| PRReview | High | submit_review SDK tool handles GitHub submission |
| Mention | High | Agent uses MCP tools directly (GitHub, Linear comments) |
| Automation | High | skip_message tool for skip decisions; delivery via AutomationMessageDeliveryService |
| CodeGeneration | Medium | submit_code_generation_result tool exists; PR metadata partially parsed |

Maturity levels:

  • High — Prompt owns judgment, tools own contracts, workflow is thin lifecycle glue.
  • Medium — Partially follows the pattern. Some parsing or format instructions remain.
  • Low — Judgment, parsing, and orchestration tangled in the workflow.