Prompt Tuning

Understanding Agent Responses

Every agent response includes metadata you can inspect to understand how Themis arrived at its answer.

Tool Calls

When Themis uses external services during a conversation, you’ll see tool call indicators in the response. These show which tools were invoked — for example, fetching a PR diff from GitHub, querying data from Metabase, or creating an issue in Linear. This transparency helps you understand what data the agent used.

Model Information

Each response shows which model was used (e.g., Claude Sonnet, Claude Opus). If your space uses the two-tier agent architecture, simple questions may be handled by the lightweight Tier 1 model, while complex ones escalate to the full Tier 2 model automatically.

Reasoning Logs & Stats

Expand the reasoning log on any agent response to see the full trace of what happened:

  • Thinking steps — The agent’s internal reasoning process
  • Tool calls and results — Every tool invocation with inputs and outputs
  • Token usage and cost — How many tokens were consumed and the estimated cost

Use the trace to see why the agent gave a particular answer and to diagnose issues.
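
If you copy the figures from a reasoning log into a script, a short summary can make expensive steps easier to spot. The sketch below is only an illustration: the `trace` structure, its field names (`type`, `tool`, `tokens`, `cost_usd`), and the tool names are hypothetical, not the actual log schema.

```python
from collections import Counter

# Hypothetical trace: field names and tool names are assumptions for
# illustration, not the real reasoning-log schema.
trace = [
    {"type": "thinking", "tokens": 420, "cost_usd": 0.004},
    {"type": "tool_call", "tool": "github.get_pr_diff", "tokens": 3100, "cost_usd": 0.031},
    {"type": "tool_call", "tool": "github.get_pr_diff", "tokens": 3050, "cost_usd": 0.030},
    {"type": "tool_call", "tool": "linear.create_issue", "tokens": 250, "cost_usd": 0.003},
]


def summarize_trace(steps):
    """Total tokens and cost, plus how often each tool was called."""
    tool_counts = Counter(s["tool"] for s in steps if s["type"] == "tool_call")
    return {
        "total_tokens": sum(s["tokens"] for s in steps),
        "total_cost_usd": round(sum(s["cost_usd"] for s in steps), 4),
        "tool_counts": dict(tool_counts),
    }


summary = summarize_trace(trace)
print(summary)
```

In this made-up trace, the same diff is fetched twice, which is the kind of repeated tool call worth addressing in the prompt.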

Continuing from Inbox Entries

Every inbox entry — not just conversations — can be extended into a full chat. Click the Chat button on any entry to start a conversation with the original context loaded:

| Entry Type | What You Can Do |
| --- | --- |
| Automation result | Discuss findings, ask follow-up questions, refine the analysis |
| Code generation | Review the generated code, request changes, iterate on the approach |
| PR review | Discuss review comments, ask for clarification, explore alternatives |
| @Mention response | Continue a conversation started from GitHub or Linear |

The new conversation inherits the full context — you don’t need to re-explain what happened.

Debugging Automations

When an automation isn’t performing well — producing low-quality results, failing frequently, or costing too much — use the debug workflow to improve it.

Spotting Problems

Go to the automation’s detail page and check the Execution History:

  • Frequent failures — The agent is hitting errors or getting stuck in loops
  • High cost — The agent is making too many tool calls or using excessive tokens
  • Skipped runs — The agent is skipping when it shouldn’t be (or vice versa)
  • Reasoning logs — Expand individual runs to see where things go wrong
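
The warning signs above can be checked mechanically if you keep your own record of recent runs. The following is a minimal sketch under stated assumptions: the `runs` records, the status values, and the thresholds are all hypothetical and should be tuned to your automation.

```python
from statistics import mean

# Hypothetical execution records; statuses and field names are assumptions.
runs = [
    {"status": "success", "cost_usd": 0.12},
    {"status": "failed", "cost_usd": 0.40},
    {"status": "skipped", "cost_usd": 0.01},
    {"status": "failed", "cost_usd": 0.35},
    {"status": "success", "cost_usd": 0.10},
]


def triage(runs, max_failure_rate=0.2, max_skip_rate=0.5, max_mean_cost=0.25):
    """Return the warning signs present in an execution history."""
    if not runs:
        return []
    n = len(runs)
    failure_rate = sum(r["status"] == "failed" for r in runs) / n
    skip_rate = sum(r["status"] == "skipped" for r in runs) / n
    mean_cost = mean(r["cost_usd"] for r in runs)

    flags = []
    if failure_rate > max_failure_rate:
        flags.append("frequent failures")
    if mean_cost > max_mean_cost:
        flags.append("high cost")
    if skip_rate > max_skip_rate:
        flags.append("excessive skips")
    return flags


print(triage(runs))
```

A skip rate alone cannot tell you whether a particular skip was correct; for that, expand the reasoning log of the individual run.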

Improving the Prompt

  1. Open a completed or failed automation execution
  2. Click Chat to start a conversation with the execution context
  3. Toggle Debug mode — this loads the full reasoning log and prompt into the conversation
  4. Ask Themis to analyze what went wrong and suggest improvements:
    • “Why did this automation fail? How can I improve the prompt?”
    • “This is costing too much. How can I make the prompt more efficient?”
    • “The output quality is inconsistent. What’s causing that?”
  5. Apply the suggested changes to your automation’s prompt template

This feedback loop is the fastest way to iterate on automation quality. The agent can see exactly what happened during the run — which tools were called, where reasoning went off track, and what the final output looked like — and suggest targeted prompt improvements.

Common Prompt Issues

| Symptom | Likely Cause | Fix |
| --- | --- | --- |
| Agent makes too many tool calls | Prompt is too vague about what data to fetch | Be specific about which tools to use and what to look for |
| Output is inconsistent | Prompt lacks structure expectations | Add explicit output format guidance |
| Agent skips when it shouldn’t | Skip conditions are too broad | Narrow the skip criteria or remove the skip instruction |
| High token cost | Agent is fetching too much data | Limit scope (e.g., “only PRs from the last 24 hours”) |
| Frequent failures | Agent tries unsupported operations | Check reasoning logs for the failing tool call and adjust the prompt |
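
Several of these fixes amount to making the prompt explicit about tools, scope, output format, and skip criteria. One way to keep those parts visible is to assemble the prompt from named pieces. The helper below is a hypothetical sketch, not part of the product: `build_prompt` and its parameters are illustrative names, and the resulting text is only one possible layout.

```python
def build_prompt(task, tools, scope, output_format, skip_when=None):
    """Assemble an automation prompt with explicit tools, scope, and output format."""
    lines = [
        f"Task: {task}",
        "Use only these tools: " + ", ".join(tools),
        f"Scope: {scope}",
        "Output format:",
    ]
    lines += [f"- {item}" for item in output_format]
    # Include a skip instruction only when a narrow condition is given.
    if skip_when:
        lines.append(f"Skip the run only if: {skip_when}")
    return "\n".join(lines)


prompt = build_prompt(
    task="Review newly opened PRs for security issues",
    tools=["GitHub PR diff"],
    scope="only PRs from the last 24 hours",
    output_format=[
        "Summary (2 sentences)",
        "Findings, one per line, with file and severity",
    ],
    skip_when="no PRs were opened in the scope window",
)
print(prompt)
```

Leaving out `skip_when` omits the skip instruction entirely, which corresponds to the "remove the skip instruction" fix in the table.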

Tips for Better Conversations

  • Be specific — “Review this PR for security issues” works better than “Look at this PR”
  • Provide context — Paste links, share files, mention the relevant project or service
  • Iterate — If the first answer isn’t quite right, follow up. Themis remembers the full conversation.
  • Use the right entry point — Starting from an inbox entry (review, automation, code gen) loads context automatically, saving you from re-explaining
  • Check reasoning logs — If an answer seems off, expand the reasoning to understand why and redirect the agent