
The ChatGPT API — officially the OpenAI Chat Completions API — lets you embed GPT-4's capabilities into your own applications. This tutorial goes beyond "hello world" and covers the patterns you'll actually use in a real application: conversation management, streaming, error handling, and keeping your application running reliably in production.
For the basic API setup steps, see the OpenAI API tutorial. This guide assumes you have your API key and SDK installed and focuses on building a complete application.
What we'll build: a simple but complete conversational assistant with conversation management, streaming responses, retry logic, and a Flask API ready for deployment.
The ChatGPT API is stateless — it doesn't remember previous messages. You maintain conversation history yourself and pass the full history with each request.
```python
from openai import OpenAI
import time

client = OpenAI()

class ConversationManager:
    def __init__(self, system_prompt: str, model: str = "gpt-4o-mini"):
        self.system_prompt = system_prompt
        self.model = model
        self.messages = []

    def chat(self, user_message: str) -> str:
        self.messages.append({"role": "user", "content": user_message})
        response = client.chat.completions.create(
            model=self.model,
            messages=[
                {"role": "system", "content": self.system_prompt}
            ] + self.messages
        )
        assistant_message = response.choices[0].message.content
        self.messages.append({"role": "assistant", "content": assistant_message})
        return assistant_message

    def reset(self):
        self.messages = []
```
Usage:

```python
assistant = ConversationManager(
    system_prompt="You are a helpful support assistant for a web monitoring service. "
                  "Help users understand uptime monitoring, configure alerts, and "
                  "diagnose website issues. Keep responses concise and practical."
)

print(assistant.chat("What's the difference between a 502 and a 503 error?"))
print(assistant.chat("Which one is more likely to self-resolve?"))
```
The second message uses the context from the first — the model knows "which one" refers to the error types discussed.
Long conversations can exceed the model's context window. Implement a sliding window to prevent this:
```python
class ConversationManager:
    def __init__(self, system_prompt: str, model: str = "gpt-4o-mini", max_messages: int = 20):
        self.system_prompt = system_prompt
        self.model = model
        self.messages = []
        self.max_messages = max_messages

    def chat(self, user_message: str) -> str:
        self.messages.append({"role": "user", "content": user_message})
        # Keep only the most recent messages if over the limit
        if len(self.messages) > self.max_messages:
            self.messages = self.messages[-self.max_messages:]
        response = client.chat.completions.create(
            model=self.model,
            messages=[{"role": "system", "content": self.system_prompt}] + self.messages
        )
        assistant_message = response.choices[0].message.content
        self.messages.append({"role": "assistant", "content": assistant_message})
        return assistant_message
```
For applications where conversation history matters throughout, consider summarising older messages rather than dropping them.
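One way to sketch the summarisation approach, with the model's own output used to condense the oldest turns. The `summarise_and_trim` helper and its `summarise_fn` parameter are illustrative; in production, `summarise_fn` would wrap a `chat.completions` call asking the model to summarise the transcript.

```python
def summarise_and_trim(messages: list, max_messages: int, summarise_fn) -> list:
    """Condense everything older than the most recent max_messages
    into a single summary message that stays in context.

    summarise_fn takes a plain-text transcript and returns a short
    summary string (in production, a call to the model itself).
    """
    if len(messages) <= max_messages:
        return messages
    old, recent = messages[:-max_messages], messages[-max_messages:]
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in old)
    summary = summarise_fn(transcript)
    # The summary takes the place of the dropped messages
    return [{"role": "assistant",
             "content": f"Summary of earlier conversation: {summary}"}] + recent
```

Inside `ConversationManager.chat`, this would replace the simple slice: `self.messages = summarise_and_trim(self.messages, self.max_messages, llm_summarise)`.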
For user-facing features, stream the response so text appears as it's generated:
```python
def chat_streaming(conversation_history: list, system_prompt: str):
    with client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "system", "content": system_prompt}] + conversation_history,
        stream=True
    ) as stream:
        full_response = ""
        for chunk in stream:
            if chunk.choices[0].delta.content:
                content = chunk.choices[0].delta.content
                print(content, end="", flush=True)
                full_response += content
    print()  # newline at end
    return full_response
```
Streaming dramatically improves perceived responsiveness for longer responses — users see content appearing immediately rather than waiting for the full response.
Retry transient errors with exponential backoff:
```python
from openai import RateLimitError, APIConnectionError, APIStatusError

def call_with_retry(messages: list, system_prompt: str, max_retries: int = 3) -> str:
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[{"role": "system", "content": system_prompt}] + messages
            )
            return response.choices[0].message.content
        except RateLimitError:
            if attempt < max_retries - 1:
                wait = 2 ** attempt  # exponential backoff: 1s, 2s, 4s...
                print(f"Rate limited — waiting {wait}s before retry")
                time.sleep(wait)
            else:
                raise
        except APIConnectionError:
            if attempt < max_retries - 1:
                time.sleep(1)
            else:
                raise
        except APIStatusError:
            # 4xx errors (bad request, auth issues) — retrying won't help
            raise
```
Log all errors with context: the model, token counts, user ID, and the error type. This makes diagnosing production issues much faster.
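A minimal sketch of such a log entry using the standard `logging` module; the helper name and field layout here are illustrative, not part of the OpenAI SDK.

```python
import logging

logger = logging.getLogger("chat_app")

def log_api_error(error, *, model, user_id, prompt_tokens=None):
    """Record an API failure with enough context to diagnose it later."""
    msg = (f"OpenAI call failed: {type(error).__name__} "
           f"model={model} user={user_id} prompt_tokens={prompt_tokens}")
    logger.error(msg)
    return msg
```

You would call this in each `except` branch before re-raising, e.g. `log_api_error(e, model="gpt-4o-mini", user_id=session_id)`.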
Wrapping the conversation manager in a Flask API:
```python
from flask import Flask, request, jsonify
import uuid

app = Flask(__name__)
conversations = {}

SYSTEM_PROMPT = """You are a support assistant for a web monitoring service.
Help users understand monitoring setup, diagnose issues, and configure alerts.
Be concise and practical."""

@app.route('/chat', methods=['POST'])
def chat():
    data = request.json
    session_id = data.get('session_id', str(uuid.uuid4()))
    user_message = data['message']
    if session_id not in conversations:
        conversations[session_id] = ConversationManager(SYSTEM_PROMPT)
    response = conversations[session_id].chat(user_message)
    return jsonify({'session_id': session_id, 'response': response})

@app.route('/health')
def health():
    return jsonify({'status': 'ok'}), 200

if __name__ == '__main__':
    app.run(port=5000)
```
The /health endpoint is important — it's what your monitoring tool will check.
Once deployed, your application needs monitoring. The ChatGPT API being reliable doesn't mean your application is reliable — server failures, misconfigured deployments, database issues, and network problems can all take your application down independently of the AI API.
Domain Monitor monitors your application's availability every minute from multiple global locations. Point it at your /health endpoint and you'll get an immediate alert the moment your application stops responding.
Create a free account before you deploy. Configure alerts via email, SMS, or Slack. See monitoring apps built with AI for AI-specific monitoring considerations and how to set up uptime monitoring for a complete setup guide.
Both the OpenAI API (for ChatGPT/GPT-4) and the Anthropic API (for Claude) follow a similar pattern. The code structure — system prompts, message history, streaming, error handling — is nearly identical between the two.
Many production applications use both, routing different types of requests to whichever model performs best for that task. The infrastructure patterns are transferable.
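A minimal sketch of such a router, assuming both the `openai` and `anthropic` SDKs are installed. The routing rule and the model names are illustrative; in practice you would base the rule on your own evaluations of each model's strengths.

```python
def choose_provider(task_type: str) -> str:
    """Pick a provider per task type (the rule here is illustrative)."""
    return "anthropic" if task_type == "long_document_analysis" else "openai"

def route_request(task_type: str, messages: list, system_prompt: str) -> str:
    if choose_provider(task_type) == "anthropic":
        from anthropic import Anthropic
        response = Anthropic().messages.create(
            model="claude-sonnet-4-20250514",  # example model name
            max_tokens=1024,
            system=system_prompt,  # Anthropic takes the system prompt separately
            messages=messages,
        )
        return response.content[0].text
    from openai import OpenAI
    response = OpenAI().chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "system", "content": system_prompt}] + messages,
    )
    return response.choices[0].message.content
```

Note the one structural difference: Anthropic's Messages API takes the system prompt as a separate `system` parameter rather than as the first message in the list.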
With this foundation, you can extend the assistant in several directions.
The patterns for error handling, context management, and monitoring apply regardless of which direction you take it.