tenseleyflow/loader / 5c8003b


fix: improve agent tool use reliability

Key changes to make models use tools more consistently:

1. Simplified system prompts (86 lines -> ~25 lines)
- Removed confusing negative examples
- Clear, short positive examples only
- Same section layout for both native and ReAct prompts

2. Added assistant prefilling
- On action-oriented tasks (keyword match), prefill the first assistant turn with '[' to nudge the model into the tool-call format
- Guides the model to start with a tool call instead of chatting

3. Lowered temperature (0.5 -> 0.3)
- Less random sampling, which should improve instruction following

4. Added few-shot examples in message history
- Shows the model a complete tool-use exchange (request, call, result, reply)
- Different examples for bracket vs ReAct format

These changes address the core issue: models ignoring the system
prompt and outputting chatbot text instead of tool calls.

Authored by mfwolffe <wolffemf@dukes.jmu.edu>
SHA: 5c8003bf0dc019c2b1a40fe17b163bc50c3a059e
Parents: 0397de7
Tree: 8e7d957

2 changed files

Status  File                          +    -
M       src/loader/agent/loop.py      45   2
M       src/loader/agent/prompts.py   32   127
src/loader/agent/loop.py (modified)
@@ -82,7 +82,7 @@ class ReasoningConfig:
 class AgentConfig:
     """Configuration for the agent."""
     max_iterations: int = 15  # Reduced from 20
-    temperature: float = 0.5  # Lower = faster, more focused
+    temperature: float = 0.3  # Low for better instruction following
     max_tokens: int = 2048  # Reduced from 4096, most responses are shorter
     force_react: bool = False  # Force ReAct even if model supports native tools
     auto_context: bool = True  # Auto-detect project context on startup
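To make "more deterministic" concrete: standard temperature sampling divides the logits by T before the softmax, so a smaller T concentrates probability on the top-ranked token. The sketch below assumes the local backend applies temperature this way. The logits are made up, and `softmax_with_temperature` is an illustrative helper, not part of loader.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Token probabilities after dividing logits by the sampling temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability; result is unchanged
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for three candidate first tokens, e.g. "[", "Sure", "Here".
logits = [2.0, 1.0, 0.0]
for t in (0.5, 0.3):
    probs = softmax_with_temperature(logits, t)
    print(t, round(probs[0], 2))
# 0.5 0.87
# 0.3 0.96
```

With these logits, moving from 0.5 to 0.3 raises the top token's share from about 87% to about 96%. Whether that token is a tool call still depends on the prompt, which is why the commit pairs the lower temperature with the prompt and prefill changes.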
@@ -234,7 +234,33 @@ class Agent:
 
     def _build_messages(self) -> list[Message]:
         """Build the full message list for the LLM."""
-        return [self._get_system_message()] + self.messages
+        messages = [self._get_system_message()]
+
+        # Add few-shot examples if this is a fresh conversation
+        if len(self.messages) <= 2:  # User message + maybe prefill
+            messages.extend(self._get_few_shot_examples())
+
+        messages.extend(self.messages)
+        return messages
+
+    def _get_few_shot_examples(self) -> list[Message]:
+        """Get few-shot examples demonstrating proper tool use."""
+        if self.use_react:
+            # ReAct format examples
+            return [
+                Message(role=Role.USER, content="Create a file called hello.py that prints hello"),
+                Message(role=Role.ASSISTANT, content='<tool_call>\n{"name": "write", "arguments": {"file_path": "hello.py", "content": "print(\'hello\')"}}\n</tool_call>'),
+                Message(role=Role.TOOL, content="Created hello.py"),
+                Message(role=Role.ASSISTANT, content="Done."),
+            ]
+        else:
+            # Bracket format examples
+            return [
+                Message(role=Role.USER, content="Create a file called hello.py that prints hello"),
+                Message(role=Role.ASSISTANT, content='[write: file_path="hello.py", content="print(\'hello\')"]'),
+                Message(role=Role.TOOL, content="Created hello.py"),
+                Message(role=Role.ASSISTANT, content="Done."),
+            ]
 
     async def _should_plan(self, task: str) -> bool:
         """Ask LLM if this task needs planning."""
@@ -608,6 +634,23 @@ class Agent:
         while iterations < self.config.max_iterations:
             iterations += 1
 
+            # On first iteration, add assistant prefilling to guide tool use
+            if iterations == 1 and len(self.messages) == 1:  # Just the user's message
+                # Check if task looks like it needs immediate action
+                task_lower = task.lower()
+                action_keywords = ['create', 'write', 'make', 'run', 'execute', 'build', 'install', 'delete', 'remove', 'add', 'edit', 'modify', 'update', 'fix']
+                if any(kw in task_lower for kw in action_keywords):
+                    # Prime with partial assistant response - start of tool call
+                    self.messages.append(Message(
+                        role=Role.ASSISTANT,
+                        content="[",
+                    ))
+                    try:
+                        with open("/tmp/loader_debug.log", "a") as f:
+                            f.write(f"[loop] Added assistant prefill '[' for action task\n")
+                    except Exception:
+                        pass
+
             # Check for steering messages from user
             steering_messages = self._drain_steering_queue()
             for steer_msg in steering_messages:
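The two loop.py changes interact. The prefill is appended to `self.messages` before `_build_messages` runs, which is why the few-shot check is `<= 2` ("User message + maybe prefill") rather than `== 1`. The sketch below reproduces that first request outside the `Agent` class. `Role` and `Message` are minimal stand-ins for loader's types. `FEW_SHOT`, `choose_prefill`, `build_messages`, and `first_request` are hypothetical names, and only the bracket-format demo is included.

Two behaviors are deliberate departures from the commit. First, the diff's keyword test is substring containment (`kw in task_lower`), so "add" also fires on "address" and "fix" on "prefix". The sketch uses a whole-word regex instead, which in turn misses inflections such as "creating" or "added". Second, choosing `<tool_call>` as the ReAct prefill is our assumption; the commit always appends "[".

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Role(Enum):  # stand-in for loader's Role
    SYSTEM = "system"
    USER = "user"
    ASSISTANT = "assistant"
    TOOL = "tool"


@dataclass
class Message:  # stand-in for loader's Message
    role: Role
    content: str


# Keyword list copied from the diff.
ACTION_KEYWORDS = ["create", "write", "make", "run", "execute", "build", "install",
                   "delete", "remove", "add", "edit", "modify", "update", "fix"]

# Whole-word match, so "add" does not fire on "address" or "fix" on "prefix".
ACTION_RE = re.compile(r"\b(?:" + "|".join(ACTION_KEYWORDS) + r")\b", re.IGNORECASE)

# Bracket-format demo exchange, as in _get_few_shot_examples.
FEW_SHOT = [
    Message(Role.USER, "Create a file called hello.py that prints hello"),
    Message(Role.ASSISTANT, '[write: file_path="hello.py", content="print(\'hello\')"]'),
    Message(Role.TOOL, "Created hello.py"),
    Message(Role.ASSISTANT, "Done."),
]


def choose_prefill(task: str, use_react: bool) -> Optional[str]:
    """Prefill for the assistant turn, or None when the task does not look action-like."""
    if ACTION_RE.search(task) is None:
        return None
    # Assumption: ReAct replies start with <tool_call>, so "[" would be the wrong cue there.
    return "<tool_call>" if use_react else "["


def build_messages(system: str, history: list[Message]) -> list[Message]:
    """System prompt, then the few-shot demo for fresh conversations, then real history."""
    messages = [Message(Role.SYSTEM, system)]
    if len(history) <= 2:  # user message plus an optional prefill, as in the diff
        messages.extend(FEW_SHOT)
    messages.extend(history)
    return messages


def first_request(system: str, task: str, use_react: bool = False) -> list[Message]:
    """Assemble the first request the way the two loop.py hunks combine."""
    history = [Message(Role.USER, task)]
    prefill = choose_prefill(task, use_react)
    if prefill is not None:
        history.append(Message(Role.ASSISTANT, prefill))
    return build_messages(system, history)


msgs = first_request("You are Loader, an AI coding agent.", "Create a README")
print([m.role.value for m in msgs])
# ['system', 'user', 'assistant', 'tool', 'assistant', 'user', 'assistant']
print(repr(msgs[-1].content))
# '['
```

The few-shot check runs on every call, so once the history grows past two messages (for example, after the first tool result), the demo exchange silently drops out of the prompt.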
src/loader/agent/prompts.py (modified)
@@ -145,161 +145,66 @@ def format_tool_descriptions(tools: list[dict[str, Any]]) -> str:
     return "\n\n".join(lines)
 
 
-SYSTEM_PROMPT = """You are Loader, an AI coding agent running locally on the user's machine.
+SYSTEM_PROMPT = """You are Loader, an AI coding agent.
 
-Current working directory: {cwd}
+Current directory: {cwd}
 
-## CRITICAL INSTRUCTION: USE TOOLS, DO NOT DESCRIBE
+## Tools
+- bash: Run shell commands
+- write: Create files
+- read: Read files
+- edit: Modify files
+- glob: Find files
+- grep: Search in files
 
-You MUST use your tools to complete tasks. NEVER output code blocks for the user to copy.
+## How to Use Tools
+Output a tool call in this format:
+[tool: param="value", param2="value2"]
 
-WRONG (chatbot behavior - DO NOT DO THIS):
-```
-Here's how to create the file:
-```bash
-mkdir -p ~/Project/site
-```
-Save this to index.html:
-```html
-<html><body>Hello</body></html>
-```
-```
-
-CORRECT (agent behavior - ALWAYS DO THIS):
-I'll create the directory and file now.
-[calls bash tool with: mkdir -p ~/Project/site]
-[calls write tool with: file_path=~/Project/site/index.html, content="<html><body>Hello</body></html>"]
-Done. Created ~/Project/site/index.html.
-
-## You Have These Tools - USE THEM
-
-- `bash`: Execute shell commands (mkdir, git, npm, etc.)
-- `write`: Create new files with content
-- `edit`: Modify existing files
-- `read`: Read file contents
-- `glob`: Find files by pattern
-- `grep`: Search file contents
+## Examples
+[bash: command="mkdir project"]
+[write: file_path="hello.py", content="print('hello')"]
+[read: file_path="config.json"]
+[edit: file_path="app.py", old_string="old", new_string="new"]
 
 ## Rules
-
-1. **EXECUTE, don't describe**: USE TOOLS immediately. No explanations first.
-2. **No code blocks EVER**: NEVER show ```. No bash blocks, no html blocks, no code blocks of any kind.
-3. **No narration**: Don't say "I will call the write tool" - JUST CALL IT. No announcing actions.
-4. **One action, then done**: Do one thing. Confirm it worked. Stop or continue. Don't repeat yourself.
-5. **Read before edit**: Always read a file before modifying it
-6. **NO PLACEHOLDERS**: Never use "..." as content. Write COMPLETE content.
-7. **STOP WHEN DONE**: File created? Stop. Don't verify, re-read, or do it again.
-8. **No browser commands**: xdg-open, open, browser commands don't work here.
-9. **Never repeat**: Created a file? Don't create it again. Ran a command? Don't run it again.
-10. **Stay focused**: Complete the user's request. Don't add extra steps or explanations.
-
-## Examples of Correct Behavior
-
-User: "Create a hello.py file that prints hello world"
-You: I'll create that file now.
-[USE write tool: file_path="hello.py", content="print('hello world')"]
-Created hello.py.
-
-User: "Run the tests"
-You: Running tests now.
-[USE bash tool: command="pytest"]
-Tests passed (or: 2 tests failed, here's the output...)
-
-User: "Add a new function to utils.py"
-You: Let me read the file first.
-[USE read tool: file_path="utils.py"]
-Now I'll add the function.
-[USE edit tool: file_path="utils.py", old_string="def existing():", new_string="def new_func():\n    return 42\n\ndef existing():"]
-Added the function to utils.py.
-
-## What NOT To Do
-
-- Do NOT say "I will use the write tool..." - JUST USE IT
-- Do NOT show code blocks (```) - EVER
-- Do NOT narrate: "Now I'll create..." "Next, I'll..." - JUST DO IT
-- Do NOT explain how to do something - DO IT
-- Do NOT show the same content twice (once as preview, once in tool)
-- Do NOT repeat actions you already completed
-
-## CRITICAL: No Redundancy
-
-Do NOT duplicate your work:
-- Show code block → then use tool (WRONG - just use the tool)
-- Describe action → narrate tool → use tool (WRONG - just use the tool)
-- Create file → create same file again (WRONG - do it once)
-
-Each action should happen ONCE. Use tools directly without preamble.
-
-You are an AGENT that EXECUTES tasks, not a chatbot that gives advice.
+1. Use tools immediately - don't explain first
+2. No code blocks (```) - use the write tool instead
+3. No numbered steps - just do the task
+4. Read files before editing them
 """
 
 
-REACT_SYSTEM_PROMPT = """You are Loader, an AI coding agent. You EXECUTE tasks using tools.
-
-Current working directory: {cwd}
+REACT_SYSTEM_PROMPT = """You are Loader, an AI coding agent.
 
-## CRITICAL: YOU MUST USE TOOLS
-
-NEVER show code blocks for users to copy. ALWAYS use tools to execute actions.
-
-WRONG - Do not do this:
-"Here's the command to run: `mkdir project`"
-"Create a file with this content: ```html...```"
-
-CORRECT - Do this instead:
-"Creating the directory now."
-<tool_call>
-{{"name": "bash", "arguments": {{"command": "mkdir project"}}}}
-</tool_call>
+Current directory: {cwd}
 
 ## Tools Available
-
 {tool_descriptions}
 
-## How to Call Tools
-
-Use this exact format:
-
+## How to Use Tools
 <tool_call>
-{{"name": "tool_name", "arguments": {{"arg": "value"}}}}
+{{"name": "tool_name", "arguments": {{"param": "value"}}}}
 </tool_call>
 
-Wait for the result, then continue or finish.
-
-## Rules
-
-1. **USE TOOLS immediately** - No describing, no explaining, just do it
-2. **No code blocks EVER** - Never use ```. No bash blocks, html blocks, nothing
-3. **No narration** - Don't say "I'll call..." - JUST CALL IT
-4. **One action, then done** - Do one thing, confirm, stop or continue
-5. **Read before edit** - Always read files before modifying
-6. **NO PLACEHOLDERS** - Never use "..." as content. Write COMPLETE content.
-7. **STOP WHEN DONE** - File created? Stop. Don't verify or re-create.
-8. **No browser commands** - xdg-open doesn't work here
-9. **Never repeat** - Did something? Don't do it again.
-10. **Stay focused** - Complete the request, nothing more.
-
 ## Examples
-
-User: "Create a test.py file"
-Assistant: Creating the file.
 <tool_call>
-{{"name": "write", "arguments": {{"file_path": "test.py", "content": "def test_example():\n    assert 1 + 1 == 2"}}}}
+{{"name": "bash", "arguments": {{"command": "mkdir project"}}}}
 </tool_call>
 
-User: "List files in src/"
-Assistant: Listing files.
 <tool_call>
-{{"name": "bash", "arguments": {{"command": "ls -la src/"}}}}
+{{"name": "write", "arguments": {{"file_path": "hello.py", "content": "print('hello')"}}}}
 </tool_call>
 
-User: "What's in config.json?"
-Assistant: Reading the file.
 <tool_call>
 {{"name": "read", "arguments": {{"file_path": "config.json"}}}}
 </tool_call>
 
-Remember: You are an AGENT. Execute tasks, don't explain them.
+## Rules
+1. Use tools immediately - don't explain first
+2. No code blocks - use the write tool instead
+3. No numbered steps - just do the task
+4. Read files before editing them
 """
 
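This commit does not include the code that reads tool calls back out of the model's reply, so the following is a hypothetical sketch of parsing the two formats these prompts request. `TEMPLATE` is a trimmed stand-in for REACT_SYSTEM_PROMPT. The doubled braces in that prompt are presumably `str.format` escapes, since it also carries `{cwd}` and `{tool_descriptions}` placeholders, which means they render as the single braces the model is meant to copy. `parse_react_calls` and `parse_bracket_call` are illustrative names, not loader's API. The bracket parser only unescapes `\"` and `\\`.

```python
import json
import re
from typing import Any, Optional

# Trimmed stand-in for REACT_SYSTEM_PROMPT: doubled braces are str.format
# escapes and render as literal single braces.
TEMPLATE = """Current directory: {cwd}

<tool_call>
{{"name": "tool_name", "arguments": {{"param": "value"}}}}
</tool_call>"""

# ReAct format: a JSON object between <tool_call> tags (may span lines).
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)

# Bracket format: [name: key="value", ...] spanning the whole reply.
BRACKET_RE = re.compile(r"^\s*\[(\w+):(.*)\]\s*$", re.DOTALL)

# key="value" pairs; a value may contain backslash-escaped characters.
ARG_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"', re.DOTALL)


def parse_react_calls(text: str) -> list[dict[str, Any]]:
    """Return every JSON tool call wrapped in <tool_call> tags."""
    return [json.loads(body) for body in TOOL_CALL_RE.findall(text)]


def parse_bracket_call(text: str) -> Optional[dict[str, Any]]:
    """Parse one bracket-format call, or return None for plain chat text."""
    match = BRACKET_RE.match(text)
    if match is None:
        return None
    name, body = match.groups()
    # Unescape only \" and \\; other escapes are kept as written.
    args = {key: re.sub(r'\\(["\\])', r"\1", value) for key, value in ARG_RE.findall(body)}
    return {"name": name, "arguments": args}


prompt = TEMPLATE.format(cwd="/home/user/project")
print(parse_react_calls(prompt))
# [{'name': 'tool_name', 'arguments': {'param': 'value'}}]

# With a '[' prefill, the completion may arrive without the opening bracket.
completion = 'write: file_path="hello.py", content="print(\'hello\')"]'
print(parse_bracket_call("[" + completion))
# {'name': 'write', 'arguments': {'file_path': 'hello.py', 'content': "print('hello')"}}
```

Whether the backend returns the prefill together with the generated text or only the continuation depends on the server (an assumption we cannot check from this diff). If it returns only the continuation, the caller has to prepend the prefill before parsing, as in the last example.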