Prompt Tuning

Understanding Agent Responses

Every agent response includes metadata you can inspect to understand how Themis arrived at its answer.

Tool Calls

When Themis uses external services during a conversation, you’ll see tool call indicators in the response. These show which tools were invoked — for example, fetching a PR diff from GitHub, querying data from Metabase, or creating an issue in Linear. This transparency helps you understand what data the agent used.

Model Information

Each response shows which model was used (e.g., Claude Sonnet, Claude Opus). If your space uses the two-tier agent architecture, simple questions may be handled by the lightweight Tier 1 model, while complex ones escalate to the full Tier 2 model automatically.

Reasoning Logs & Stats

Expand the reasoning log on any agent response to see the full trace of what happened:

Thinking steps — The agent’s internal reasoning process
Tool calls and results — Every tool invocation with inputs and outputs
Token usage and cost — How many tokens were consumed and the estimated cost

Use this trace to understand why the agent gave a particular answer and to diagnose issues.
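If you copy the token counts from a reasoning log into a script, a rough cost estimate is simple arithmetic. The sketch below is illustrative only: the StepUsage record shape and the per-million-token prices are hypothetical placeholders, not Themis's log format or actual pricing.

```python
from dataclasses import dataclass


@dataclass
class StepUsage:
    """Token counts for one step of a reasoning log (hypothetical shape)."""
    label: str
    input_tokens: int
    output_tokens: int


# Placeholder prices in USD per million tokens; substitute your own rates.
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00


def estimate_cost(steps):
    """Return (total_input_tokens, total_output_tokens, estimated_cost_usd)."""
    total_in = sum(s.input_tokens for s in steps)
    total_out = sum(s.output_tokens for s in steps)
    cost = (
        total_in * INPUT_PRICE_PER_MTOK + total_out * OUTPUT_PRICE_PER_MTOK
    ) / 1_000_000
    return total_in, total_out, cost


# Example values typed in by hand, not real log data.
steps = [
    StepUsage("thinking", 1_200, 400),
    StepUsage("tool call: fetch PR diff", 8_000, 150),
    StepUsage("final answer", 9_500, 600),
]
total_in, total_out, cost = estimate_cost(steps)
print(f"{total_in} input + {total_out} output tokens, about ${cost:.4f}")
```

Input tokens usually dominate when tool results are large, so a step like the diff fetch above is often the first place to look when a response costs more than expected.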

Continuing from Inbox Entries

Every inbox entry — not just conversations — can be extended into a full chat. Click the Chat button on any entry to start a conversation with the original context loaded:

Entry Type	What You Can Do
Automation result	Discuss findings, ask follow-up questions, refine the analysis
Code generation	Review the generated code, request changes, iterate on the approach
PR review	Discuss review comments, ask for clarification, explore alternatives
@Mention response	Continue a conversation started from GitHub or Linear

The new conversation inherits the full context — you don’t need to re-explain what happened.

Debugging Automations

When an automation isn’t performing well — producing low-quality results, failing frequently, or costing too much — use the debug workflow to improve it.

Spotting Problems

Go to the automation’s detail page and check the Execution History:

Frequent failures — The agent is hitting errors or getting stuck in loops
High cost — The agent is making too many tool calls or using excessive tokens
Skipped runs — The agent is skipping when it shouldn’t be (or vice versa)
Reasoning logs — Expand individual runs to see where things go wrong
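One way to make these checks concrete is to tally a handful of runs in a short script. The sketch below assumes you have copied the status, cost, and tool-call count of each run from the Execution History into records yourself; the Run fields and the thresholds are hypothetical and do not represent a Themis API or export format.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Run:
    """One row of execution history (hypothetical shape, filled in by hand)."""
    run_id: str
    status: str  # "completed", "failed", or "skipped"
    cost_usd: float
    tool_calls: int


def summarize(runs, cost_limit_usd=0.50, tool_call_limit=20):
    """Compute failure rate, skip rate, mean cost, and the runs to open first."""
    total = len(runs)
    failed = [r for r in runs if r.status == "failed"]
    skipped = [r for r in runs if r.status == "skipped"]
    expensive = [
        r for r in runs
        if r.cost_usd > cost_limit_usd or r.tool_calls > tool_call_limit
    ]
    # Failed runs first, then expensive ones; dict.fromkeys removes duplicates
    # while keeping that order.
    inspect_first = list(dict.fromkeys(r.run_id for r in failed + expensive))
    return {
        "failure_rate": len(failed) / total if total else 0.0,
        "skip_rate": len(skipped) / total if total else 0.0,
        "mean_cost_usd": mean(r.cost_usd for r in runs) if runs else 0.0,
        "inspect_first": inspect_first,
    }


# Example values typed in by hand, not real execution data.
runs = [
    Run("a", "completed", 0.10, 5),
    Run("b", "failed", 0.80, 30),
    Run("c", "skipped", 0.00, 0),
    Run("d", "completed", 0.65, 12),
]
summary = summarize(runs)
print(summary)
```

The inspect_first list is only a starting point: it tells you which runs to expand, and the reasoning log for each run is still where you find out what actually went wrong.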

Improving the Prompt

1. Open a completed or failed automation execution
2. Click Chat to start a conversation with the execution context
3. Toggle Debug mode, which loads the full reasoning log and prompt into the conversation
4. Ask Themis to analyze what went wrong and suggest improvements:
   - “Why did this automation fail? How can I improve the prompt?”
   - “This is costing too much. How can I make the prompt more efficient?”
   - “The output quality is inconsistent. What’s causing that?”
5. Apply the suggested changes to your automation’s prompt template

This feedback loop is the fastest way to iterate on automation quality. The agent can see exactly what happened during the run — which tools were called, where reasoning went off track, and what the final output looked like — and suggest targeted prompt improvements.

Common Prompt Issues

Symptom	Likely Cause	Fix
Agent makes too many tool calls	Prompt is too vague about what data to fetch	Be specific about which tools to use and what to look for
Output is inconsistent	Prompt lacks structure expectations	Add explicit output format guidance
Agent skips when it shouldn’t	Skip conditions are too broad	Narrow the skip criteria or remove the skip instruction
High token cost	Agent is fetching too much data	Limit scope (e.g., “only PRs from the last 24 hours”)
Frequent failures	Agent tries unsupported operations	Check reasoning logs for the failing tool call and adjust the prompt
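As an illustration of the fixes in the table, the sketch below assembles a prompt that limits scope, names the tool to use, states a narrow skip condition, and fixes the output format. The function name, the wording, and the repository name are hypothetical examples; this is not a required Themis template syntax.

```python
def build_prompt(repo: str, hours: int = 24) -> str:
    """Assemble an automation prompt that is explicit about scope, tools, and output."""
    sections = [
        # Limit scope so the agent does not fetch more data than it needs.
        f"Review pull requests in {repo} opened in the last {hours} hours.",
        # Name the tool and the goal to avoid exploratory tool calls.
        "Use only the GitHub integration. Fetch each PR's diff once and "
        "look for security issues.",
        # Keep the skip condition narrow and unambiguous.
        "Skip the run only if no pull requests were opened in that window.",
        # State the output format explicitly so results stay consistent.
        "Output format: one line per PR as "
        "'<PR number>: <severity> - <one-sentence finding>'.",
    ]
    return "\n".join(sections)


# Hypothetical repository name, used only for illustration.
prompt = build_prompt("example-org/example-repo")
print(prompt)
```

Keeping each concern on its own line makes it easier to change one thing at a time, for example tightening only the skip condition, and then compare the next few runs in the Execution History.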

Tips for Better Conversations

Be specific — “Review this PR for security issues” works better than “Look at this PR”
Provide context — Paste links, share files, mention the relevant project or service
Iterate — If the first answer isn’t quite right, follow up. Themis remembers the full conversation.
Use the right entry point — Starting from an inbox entry (review, automation, code gen) loads context automatically, saving you from re-explaining
Check reasoning logs — If an answer seems off, expand the reasoning to understand why and redirect the agent