
The ChatGPT API — officially the OpenAI Chat Completions API — lets you embed GPT-4's capabilities into your own applications. This tutorial goes beyond "hello world" and covers the patterns you'll actually use in a real application: conversation management, streaming, error handling, and keeping your application running reliably in production.
For the basic API setup steps, see the OpenAI API tutorial. This guide assumes you have your API key and SDK installed and focuses on building a complete application.
We'll build a simple but complete conversational assistant: one that manages conversation history, streams responses, retries transient failures, and runs behind a small Flask API with a health endpoint you can monitor.
The ChatGPT API is stateless — it doesn't remember previous messages. You maintain conversation history yourself and pass the full history with each request.
```python
from openai import OpenAI
import time

client = OpenAI()


class ConversationManager:
    def __init__(self, system_prompt: str, model: str = "gpt-4o-mini"):
        self.system_prompt = system_prompt
        self.model = model
        self.messages = []

    def chat(self, user_message: str) -> str:
        self.messages.append({"role": "user", "content": user_message})
        response = client.chat.completions.create(
            model=self.model,
            messages=[
                {"role": "system", "content": self.system_prompt}
            ] + self.messages
        )
        assistant_message = response.choices[0].message.content
        self.messages.append({"role": "assistant", "content": assistant_message})
        return assistant_message

    def reset(self):
        self.messages = []
```
Usage:

```python
assistant = ConversationManager(
    system_prompt="You are a helpful support assistant for a web monitoring service. "
                  "Help users understand uptime monitoring, configure alerts, and "
                  "diagnose website issues. Keep responses concise and practical."
)

print(assistant.chat("What's the difference between a 502 and a 503 error?"))
print(assistant.chat("Which one is more likely to self-resolve?"))
```
The second message uses the context from the first — the model knows "which one" refers to the error types discussed.
Long conversations can exceed the model's context window. Implement a sliding window to prevent this:
```python
class ConversationManager:
    def __init__(self, system_prompt: str, model: str = "gpt-4o-mini",
                 max_messages: int = 20):
        self.system_prompt = system_prompt
        self.model = model
        self.messages = []
        self.max_messages = max_messages

    def chat(self, user_message: str) -> str:
        self.messages.append({"role": "user", "content": user_message})

        # Keep only the most recent messages if over the limit
        if len(self.messages) > self.max_messages:
            self.messages = self.messages[-self.max_messages:]

        response = client.chat.completions.create(
            model=self.model,
            messages=[{"role": "system", "content": self.system_prompt}] + self.messages
        )
        assistant_message = response.choices[0].message.content
        self.messages.append({"role": "assistant", "content": assistant_message})
        return assistant_message
```
For applications where conversation history matters throughout, consider summarising older messages rather than dropping them.
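A minimal sketch of that summarisation approach, with the summariser passed in as a callable so it can be backed by a real `chat.completions` call or stubbed out in tests (the function names here are illustrative, not part of the SDK):

```python
from typing import Callable

def compact_history(messages: list, max_messages: int,
                    summarize: Callable[[list], str]) -> list:
    """Replace all but the most recent max_messages turns with one summary.

    `summarize` receives the older messages and returns a short text summary;
    in production you might back it with a model call such as
    "Summarise this conversation in 2-3 sentences".
    """
    if len(messages) <= max_messages:
        return messages
    older, recent = messages[:-max_messages], messages[-max_messages:]
    summary = summarize(older)
    # Carry the summary forward as context for the next request
    return [{"role": "assistant",
             "content": f"Summary of earlier conversation: {summary}"}] + recent
```

This slots into the `chat` method in place of the plain slice: instead of dropping older turns, they're condensed into a single message the model can still draw on.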
For user-facing features, stream the response so text appears as it's generated:
```python
def chat_streaming(conversation_history: list, system_prompt: str):
    stream = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "system", "content": system_prompt}] + conversation_history,
        stream=True
    )
    full_response = ""
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
            full_response += delta
    print()  # newline at end
    return full_response
```
Streaming dramatically improves perceived responsiveness for longer responses — users see content appearing immediately rather than waiting for the full response.
Retry transient errors with exponential backoff:
```python
from openai import RateLimitError, APIConnectionError, APIStatusError

def call_with_retry(messages: list, system_prompt: str, max_retries: int = 3) -> str:
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[{"role": "system", "content": system_prompt}] + messages
            )
            return response.choices[0].message.content
        except RateLimitError:
            if attempt < max_retries - 1:
                wait = 2 ** attempt  # exponential backoff: 1s, 2s, 4s...
                print(f"Rate limited — waiting {wait}s before retry")
                time.sleep(wait)
            else:
                raise
        except APIConnectionError:
            if attempt < max_retries - 1:
                time.sleep(1)
            else:
                raise
        except APIStatusError:
            # Remaining status errors (bad request, auth issues) — don't retry
            raise
```
Log all errors with context: the model, token counts, user ID, and the error type. This makes diagnosing production issues much faster.
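A minimal sketch of that kind of structured error logging; the logger name and field names are illustrative assumptions, and returning the formatted line keeps the helper easy to test:

```python
import logging

logger = logging.getLogger("chat_assistant")

def log_api_error(error: Exception, *, model: str, user_id: str,
                  message_count: int) -> str:
    """Log one structured line per API failure and return it."""
    record = (f"openai_error type={type(error).__name__} model={model} "
              f"user={user_id} messages={message_count} detail={error}")
    logger.error(record)
    return record
```

Call it from the `except` branches above before re-raising, so every failure leaves a searchable record even when the request ultimately errors out.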
Wrapping the conversation manager in a Flask API:
```python
from flask import Flask, request, jsonify
import uuid

app = Flask(__name__)
conversations = {}

SYSTEM_PROMPT = """You are a support assistant for a web monitoring service.
Help users understand monitoring setup, diagnose issues, and configure alerts.
Be concise and practical."""

@app.route('/chat', methods=['POST'])
def chat():
    data = request.json
    session_id = data.get('session_id', str(uuid.uuid4()))
    user_message = data['message']
    if session_id not in conversations:
        conversations[session_id] = ConversationManager(SYSTEM_PROMPT)
    response = conversations[session_id].chat(user_message)
    return jsonify({'session_id': session_id, 'response': response})

@app.route('/health')
def health():
    return jsonify({'status': 'ok'}), 200

if __name__ == '__main__':
    app.run(port=5000)
```
The /health endpoint is important — it's what your monitoring tool will check.
Once deployed, your application needs monitoring. The ChatGPT API being reliable doesn't mean your application is reliable — server failures, misconfigured deployments, database issues, and network problems can all take your application down independently of the AI API.
Domain Monitor checks your application's availability every minute from multiple global locations. Point it at your /health endpoint and you'll get an immediate alert the moment your application stops responding.
Create a free account before you deploy. Configure alerts via email, SMS, or Slack. See monitoring apps built with AI for AI-specific monitoring considerations and how to set up uptime monitoring for a complete setup guide.
Both the OpenAI API (for ChatGPT/GPT-4) and the Anthropic API (for Claude) follow a similar pattern. The code structure — system prompts, message history, streaming, error handling — is nearly identical between the two.
Many production applications use both, routing different types of requests to whichever model performs best for that task. The infrastructure patterns are transferable.
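A hedged sketch of what that routing can look like. The task labels, the routing table, and the model names here are examples only; real systems typically route on evals, latency, or cost rather than a static map:

```python
def pick_model(task: str) -> tuple:
    """Route a task label to a (provider, model) pair; the table is illustrative."""
    routes = {
        "quick_answer": ("openai", "gpt-4o-mini"),
        "long_analysis": ("anthropic", "claude-3-5-sonnet-latest"),
    }
    # Fall back to a cheap default for unrecognised tasks
    return routes.get(task, ("openai", "gpt-4o-mini"))
```

Because both SDKs share the same system-prompt/message-history shape, the caller only needs a thin adapter per provider around this decision.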
With this foundation, you can extend the assistant in several directions.
The patterns for error handling, context management, and monitoring apply regardless of which direction you take it.