
The OpenAI API gives you programmatic access to GPT-4 and other OpenAI models, letting you build AI capabilities into your own applications. Whether you're building a chatbot, an automated document processor, a code review tool, or something else entirely, this tutorial walks you through getting started from scratch.
First, create an API key in the OpenAI dashboard. Store the key as an environment variable — never hardcode it in source code:
export OPENAI_API_KEY="sk-..."
Or add it to a .env file and ensure .env is in your .gitignore.
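A missing key otherwise surfaces as an authentication error on your first request. As a minimal sketch (the helper name is ours, not part of the SDK), you can fail fast at startup instead:

```python
import os

def require_api_key(env=os.environ):
    """Return the API key, raising a clear error if it was never
    exported (or loaded from .env) into the environment."""
    key = env.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return key
```

The OpenAI client reads the variable itself; this check just moves the failure to startup, where it's easiest to diagnose.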
Python:
pip install openai
Node.js:
npm install openai
Python:
from openai import OpenAI
client = OpenAI() # reads OPENAI_API_KEY from environment
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Explain what an API is in two sentences."}
    ]
)
print(response.choices[0].message.content)
Node.js:
import OpenAI from 'openai';
const client = new OpenAI(); // reads OPENAI_API_KEY from environment
const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [
    { role: "user", content: "Explain what an API is in two sentences." }
  ]
});
console.log(response.choices[0].message.content);
Run it and you'll receive a response from GPT-4o. That's all it takes to make your first call.
The API uses a chat completions endpoint where you pass a list of messages, each with a role and content.
Roles:
system — instructions that set the model's behaviour (this is the system prompt)
user — the human's messages
assistant — previous AI responses (for multi-turn conversations)

messages = [
    {"role": "system", "content": "You are a helpful assistant for a web hosting company."},
    {"role": "user", "content": "My website is returning a 502 error. What should I check?"},
]
For multi-turn conversations, include the full message history each time you make a request. The API is stateless — it doesn't remember previous calls.
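To make the statelessness concrete, here's a hypothetical helper (not part of the SDK) that rebuilds the full messages list from stored conversation turns before each request:

```python
def build_messages(system, turns):
    """Rebuild the complete message list the API needs on every call.

    `turns` is a list of (user_text, assistant_text) pairs; the last
    pair may have assistant_text=None for the not-yet-answered turn.
    """
    messages = [{"role": "system", "content": system}]
    for user_text, assistant_text in turns:
        messages.append({"role": "user", "content": user_text})
        if assistant_text is not None:
            messages.append({"role": "assistant", "content": assistant_text})
    return messages
```

Store the turns wherever suits your application (database, session, memory) and rebuild the list on every request — the API only sees what you send it.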
OpenAI offers multiple models at different capability and cost tiers:
# Most capable — complex reasoning and analysis
model="gpt-4o"
# Fast and cost-effective — good for most applications
model="gpt-4o-mini"
# Older, lower cost option
model="gpt-3.5-turbo"
Start with gpt-4o-mini for most applications — it handles the majority of tasks well and is significantly cheaper than gpt-4o. Use gpt-4o for tasks requiring complex reasoning, nuanced analysis, or higher accuracy.
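One way to apply this advice in code — a sketch with a made-up flag, not an official pattern — is to centralise model choice so escalating to gpt-4o is an explicit, auditable decision:

```python
def pick_model(needs_complex_reasoning=False):
    """Default to the cheap model; escalate only when the caller
    explicitly flags a task as needing deeper reasoning."""
    return "gpt-4o" if needs_complex_reasoning else "gpt-4o-mini"
```

Centralising the choice also makes it trivial to swap models later without hunting through call sites.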
The system message defines your application's context and behaviour. Invest time writing it well:
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "system",
            "content": """You are a support assistant for an uptime monitoring service.
Help users understand alerts, diagnose downtime causes, and configure monitoring.
Keep responses concise and practical."""
        },
        {"role": "user", "content": "Why is my SSL monitor alerting?"}
    ]
)
A specific system prompt produces more consistent, on-topic responses than a vague one.
For user-facing applications, stream the response so users see text as it's generated:
stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Write a short summary of uptime monitoring."}],
    stream=True
)
for chunk in stream:
    # Some chunks carry no content delta, so guard before printing
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
Streaming dramatically improves perceived responsiveness for longer responses.
Production applications need robust error handling:
from openai import RateLimitError, APIConnectionError, APIStatusError
import time

def call_with_retry(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="gpt-4o-mini",
                messages=messages
            )
        except RateLimitError:
            if attempt < max_retries - 1:
                time.sleep(2 ** attempt)  # exponential backoff
            else:
                raise
        except APIConnectionError:
            if attempt < max_retries - 1:
                time.sleep(1)
            else:
                raise
        except APIStatusError:
            raise  # 4xx errors shouldn't be retried
Implement exponential backoff for rate limit errors. Log all API errors with enough context to diagnose them later.
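A common refinement of the fixed `2 ** attempt` sleep above is to add jitter, so that many clients hitting the same rate limit don't all retry in lockstep. A sketch of the "full jitter" variant:

```python
import random

def backoff_delay(attempt, base=1.0, cap=30.0):
    """Full-jitter backoff: pick a random delay up to the capped
    exponential ceiling, spreading retries across concurrent clients."""
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

Replace the `time.sleep(2 ** attempt)` call with `time.sleep(backoff_delay(attempt))` to use it.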
Each response includes token usage — useful for monitoring costs:
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}]
)
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")
Track token usage per user and per feature in production applications. System prompt tokens count on every request — keep them concise.
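Token counts translate directly into dollars. The rates below are placeholders — verify them against OpenAI's current pricing page before using this in real accounting:

```python
# (input, output) USD per 1M tokens — placeholder figures; check
# OpenAI's pricing page for current rates before relying on them.
PRICE_PER_MTOK = {"gpt-4o-mini": (0.15, 0.60)}

def estimate_cost(model, prompt_tokens, completion_tokens):
    """Estimate the cost of one request from its usage numbers."""
    in_rate, out_rate = PRICE_PER_MTOK[model]
    return (prompt_tokens * in_rate + completion_tokens * out_rate) / 1_000_000
```

Feed it `response.usage.prompt_tokens` and `response.usage.completion_tokens` to log a per-request cost alongside your usage metrics.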
The OpenAI API and the Anthropic API follow a similar pattern — both use a chat completions/messages format with system, user, and assistant roles. The main differences are the model names, some parameter naming conventions, and the underlying model capabilities.
Developers often use both, routing tasks to whichever model performs better for a specific use case.
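One concrete difference matters if you route between the two: Anthropic's Messages API takes the system prompt as a top-level `system` parameter rather than a message with role "system". A small conversion sketch (the helper name is ours):

```python
def to_anthropic_kwargs(openai_messages):
    """Split an OpenAI-style message list into the shape Anthropic's
    Messages API expects: system text top-level, the rest as messages."""
    system_parts = [m["content"] for m in openai_messages if m["role"] == "system"]
    rest = [m for m in openai_messages if m["role"] != "system"]
    kwargs = {"messages": rest}
    if system_parts:
        kwargs["system"] = "\n".join(system_parts)
    return kwargs
```

With a translation layer like this, the rest of your application can build messages in one format and route requests to either provider.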
Once your application is in production, the AI API itself is generally reliable — but your own application has failure modes that have nothing to do with OpenAI: server issues, deployment failures, database errors, or network problems.
When your application goes down, users lose access to whatever AI features you've built. Finding out three hours later via a support email is far worse than getting an immediate alert.
Domain Monitor checks your application's availability every minute from multiple global locations. Add a simple health check endpoint to your application and point Domain Monitor at it:
@app.route('/health')
def health_check():
    return {'status': 'ok'}, 200
Create a free account and set up your first monitor in minutes. You'll know about downtime the moment it happens, not when users tell you. See monitoring AI API endpoints for AI-specific monitoring considerations.
Once you've made your first API call, the typical next steps for a production application are the ones covered above: a well-crafted system prompt, streaming for responsiveness, retry logic with backoff, and token usage tracking.
The OpenAI API is well-documented and the SDK handles most of the complexity. The fundamentals above cover everything you need to build a reliable, cost-effective AI-powered application.