AI Engineer Guide

Anthropic Claude Code Execution Tool via API

Anthropic recently added support for a code execution tool that runs on their servers when you make LLM calls through the API.

It lets the LLM execute Python code in a secure, sandboxed environment with no internet access.

It is mainly helpful for complex calculations or anything that needs to be done deterministically, which the LLM might not be good enough at (yet 😜).

Feature Flag

To use this feature, you need to set the beta header:

"anthropic-beta": "code-execution-2025-05-22"

Quick Example

When making the API request, we need to define the following tool in the tools array

{ 
	"type": "code_execution_20250522",
	"name": "code_execution" 
}
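
If you build the request body in code rather than by hand, the tool definition can live in one constant. This is a minimal sketch: the tool type, tool name, model, and max_tokens are the values used in this article, while build_payload and CODE_EXECUTION_TOOL are hypothetical names chosen for illustration.

```python
import json

# Tool definition copied from the article; the constant name is illustrative.
CODE_EXECUTION_TOOL = {"type": "code_execution_20250522", "name": "code_execution"}


def build_payload(prompt, model="claude-3-5-haiku-latest", max_tokens=4096):
    """Return a Messages API request body with the code execution tool enabled."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
        "tools": [CODE_EXECUTION_TOOL],
    }


payload = build_payload("Calculate the mean of [1, 2, 3]")
print(json.dumps(payload, indent=2))
```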

Request

Here is a simple example:

curl --location 'https://api.anthropic.com/v1/messages' \
--header "x-api-key: $ANTHROPIC_API_KEY" \
--header 'anthropic-version: 2023-06-01' \
--header 'anthropic-beta: code-execution-2025-05-22' \
--header 'content-type: application/json' \
--data '{
    "model": "claude-3-5-haiku-latest",
    "max_tokens": 4096,
    "messages": [
        {
            "role": "user",
            "content": "Calculate the mean and standard deviation of [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]"
        }
    ],
    "tools": [
        {
            "type": "code_execution_20250522",
            "name": "code_execution"
        }
    ]
}'

Make sure to set your ANTHROPIC_API_KEY in the x-api-key header.
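
The same request can be sent from Python using only the standard library. This is a rough sketch rather than the official client: the URL and header values are the ones from the curl example, and build_request and send are hypothetical helper names.

```python
import json
import urllib.request

API_URL = "https://api.anthropic.com/v1/messages"


def build_request(payload, api_key):
    """Build (but do not send) a POST request with the headers from the curl example."""
    headers = {
        "x-api-key": api_key,
        "anthropic-version": "2023-06-01",
        "anthropic-beta": "code-execution-2025-05-22",
        "content-type": "application/json",
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")


def send(request):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

To use it, read the key from the environment and call send(build_request(payload, os.environ["ANTHROPIC_API_KEY"])), where payload is the same JSON body passed to --data in the curl example.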

Response

{
    "id": "msg_01J6NA9KpzkGdqaN9pGA5n8Z",
    "type": "message",
    "role": "assistant",
    "model": "claude-3-5-haiku-20241022",
    "content": [
        {
            "type": "text",
            "text": "I'll help you calculate the mean and standard deviation of the given list [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] using Python's NumPy library, which is great for statistical calculations."
        },
        {
            "type": "server_tool_use",
            "id": "srvtoolu_01Ji4omgudMBX3jutGap7Ceh",
            "name": "code_execution",
            "input": {
                "code": "import numpy as np\n\n# Define the list\ndata = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]\n\n# Calculate mean\nmean = np.mean(data)\n\n# Calculate standard deviation\nstd_dev = np.std(data)\n\nprint(f\"Mean: {mean}\")\nprint(f\"Standard Deviation: {std_dev}\")"
            }
        },
        {
            "type": "code_execution_tool_result",
            "tool_use_id": "srvtoolu_01Ji4omgudMBX3jutGap7Ceh",
            "content": {
                "type": "code_execution_result",
                "stdout": "Mean: 5.5\nStandard Deviation: 2.8722813232690143\n",
                "stderr": "",
                "return_code": 0,
                "content": []
            }
        },
        {
            "type": "text",
            "text": "Let me break down the results:\n- Mean: 5.5 \n  - This is the average of all numbers in the list, calculated by summing all values and dividing by the total count of numbers.\n- Standard Deviation: 2.87 \n  - This measures the amount of variation or dispersion in the dataset. A lower standard deviation indicates that the values tend to be closer to the mean, while a higher standard deviation indicates the values are spread out over a wider range.\n\nIs there anything else you would like to know about these calculations?"
        }
    ],
    "container": {
        "id": "container_011CPmeL2A4TgyKpx2CYV98b",
        "expires_at": "2025-06-03T17:59:49.267116+00:00"
    },
    "stop_reason": "end_turn",
    "stop_sequence": null,
    "usage": {
        "input_tokens": 1707,
        "cache_creation_input_tokens": 0,
        "cache_read_input_tokens": 0,
        "output_tokens": 332,
        "service_tier": "standard",
        "server_tool_use": {
            "web_search_requests": 0
        }
    }
}
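
The response carries a container object with an id and an expires_at timestamp. Here is a small sketch for checking whether that timestamp has passed, assuming it is always an ISO 8601 string with a UTC offset as in the sample above; container_is_expired is a hypothetical helper name.

```python
from datetime import datetime, timezone


def container_is_expired(container, now=None):
    """Return True if the container's expires_at timestamp is in the past."""
    expires_at = datetime.fromisoformat(container["expires_at"])
    now = now or datetime.now(timezone.utc)
    return now >= expires_at


# Container object copied from the sample response.
container = {
    "id": "container_011CPmeL2A4TgyKpx2CYV98b",
    "expires_at": "2025-06-03T17:59:49.267116+00:00",
}
before = datetime(2025, 6, 3, 17, 0, tzinfo=timezone.utc)
after = datetime(2025, 6, 3, 18, 0, tzinfo=timezone.utc)
print(container_is_expired(container, now=before))  # False
print(container_is_expired(container, now=after))   # True
```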

Results

Code execution will return the following things:

Variable     Description
stdout       Output from print statements and successful execution
stderr       Error messages if code execution fails
return_code  0 for success, non-zero for failure
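
To pull these three fields out of a response, you can walk the content array and pick the code_execution_tool_result blocks. This is a minimal sketch based only on the response shape shown in this article; extract_execution_results is a hypothetical helper name.

```python
def extract_execution_results(message):
    """Collect stdout, stderr and return_code from every successful code execution block."""
    results = []
    for block in message.get("content", []):
        if block.get("type") != "code_execution_tool_result":
            continue
        inner = block.get("content", {})
        if inner.get("type") == "code_execution_result":
            results.append(
                {
                    "stdout": inner.get("stdout", ""),
                    "stderr": inner.get("stderr", ""),
                    "return_code": inner.get("return_code"),
                }
            )
    return results


# Trimmed copy of the sample response from this article.
sample = {
    "content": [
        {"type": "text", "text": "I'll help you calculate..."},
        {
            "type": "code_execution_tool_result",
            "tool_use_id": "srvtoolu_01Ji4omgudMBX3jutGap7Ceh",
            "content": {
                "type": "code_execution_result",
                "stdout": "Mean: 5.5\nStandard Deviation: 2.8722813232690143\n",
                "stderr": "",
                "return_code": 0,
                "content": [],
            },
        },
    ]
}

for result in extract_execution_results(sample):
    status = "ok" if result["return_code"] == 0 else "failed"
    print(status, result["stdout"].strip())
```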

Errors

If there is an error using the tool, the result will contain a code_execution_tool_result_error:

{
	"type": "code_execution_tool_result",
	"tool_use_id": "srvtoolu_01VfmxgZ46TiHbmXgy928hQR",
	"content": {
		"type": "code_execution_tool_result_error",
		"error_code": "unavailable"
	}
}

Possible errors include:

Error Code               Description
unavailable              The code execution tool is unavailable
code_execution_exceeded  Execution time exceeded the maximum allowed
container_expired        The container is expired and not available
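
A response handler can check for these error blocks before trusting any output. This is a hedged sketch: the error codes and descriptions are the ones listed above, find_tool_errors and ERROR_DESCRIPTIONS are hypothetical names, and the API may return codes that are not covered here.

```python
# Error codes and descriptions as listed in this article.
ERROR_DESCRIPTIONS = {
    "unavailable": "The code execution tool is unavailable",
    "code_execution_exceeded": "Execution time exceeded the maximum allowed",
    "container_expired": "The container is expired and not available",
}


def find_tool_errors(message):
    """Return (error_code, description) pairs for every failed code execution block."""
    errors = []
    for block in message.get("content", []):
        inner = block.get("content")
        if (
            block.get("type") == "code_execution_tool_result"
            and isinstance(inner, dict)
            and inner.get("type") == "code_execution_tool_result_error"
        ):
            code = inner.get("error_code", "unknown")
            errors.append((code, ERROR_DESCRIPTIONS.get(code, "Unrecognized error code")))
    return errors


# Message wrapping the error block shown in this article.
failed = {
    "content": [
        {
            "type": "code_execution_tool_result",
            "tool_use_id": "srvtoolu_01VfmxgZ46TiHbmXgy928hQR",
            "content": {
                "type": "code_execution_tool_result_error",
                "error_code": "unavailable",
            },
        }
    ]
}

for code, description in find_tool_errors(failed):
    print(f"{code}: {description}")
```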

How does it work?

  1. We define the tool in our request.
  2. Claude determines whether code execution is needed.
  3. If so, it writes the code, runs it, and responds with the result (or the failure).

Supported Models

These models support the code execution tool:

What’s the catch?

You can read more about containers, pre-installed libraries, and file handling in their docs.

Random Thoughts

Happy code execution!
