
ChatGPT API Tutorial: Build Your First AI-Powered Application

The ChatGPT API — officially the OpenAI Chat Completions API — lets you embed GPT-4's capabilities into your own applications. This tutorial goes beyond "hello world" and covers the patterns you'll actually use in a real application: conversation management, streaming, error handling, and keeping your application running reliably in production.

For the basic API setup steps, see the OpenAI API tutorial. This guide assumes you have your API key and SDK installed and focuses on building a complete application.

The Application We're Building

A simple but complete conversational assistant with:

  • Multi-turn conversation memory
  • A configurable system prompt
  • Streaming responses
  • Retry logic for production reliability
  • A health check endpoint for monitoring

Setting Up the Conversation Manager

The ChatGPT API is stateless — it doesn't remember previous messages. You maintain conversation history yourself and pass the full history with each request.

from openai import OpenAI
import time

client = OpenAI()

class ConversationManager:
    def __init__(self, system_prompt: str, model: str = "gpt-4o-mini"):
        self.system_prompt = system_prompt
        self.model = model
        self.messages = []

    def chat(self, user_message: str) -> str:
        self.messages.append({"role": "user", "content": user_message})

        response = client.chat.completions.create(
            model=self.model,
            messages=[
                {"role": "system", "content": self.system_prompt}
            ] + self.messages
        )

        assistant_message = response.choices[0].message.content
        self.messages.append({"role": "assistant", "content": assistant_message})
        return assistant_message

    def reset(self):
        self.messages = []

Usage:

assistant = ConversationManager(
    system_prompt="You are a helpful support assistant for a web monitoring service. "
                  "Help users understand uptime monitoring, configure alerts, and "
                  "diagnose website issues. Keep responses concise and practical."
)

print(assistant.chat("What's the difference between a 502 and a 503 error?"))
print(assistant.chat("Which one is more likely to self-resolve?"))

The second message uses the context from the first — the model knows "which one" refers to the error types discussed.

Managing Context Window Limits

Long conversations can exceed the model's context window. Implement a sliding window to prevent this:

class ConversationManager:
    def __init__(self, system_prompt: str, model: str = "gpt-4o-mini", max_messages: int = 20):
        self.system_prompt = system_prompt
        self.model = model
        self.messages = []
        self.max_messages = max_messages

    def chat(self, user_message: str) -> str:
        self.messages.append({"role": "user", "content": user_message})

        # Keep only the most recent messages if over the limit
        if len(self.messages) > self.max_messages:
            self.messages = self.messages[-self.max_messages:]

        response = client.chat.completions.create(
            model=self.model,
            messages=[{"role": "system", "content": self.system_prompt}] + self.messages
        )

        assistant_message = response.choices[0].message.content
        self.messages.append({"role": "assistant", "content": assistant_message})
        return assistant_message

For applications where conversation history matters throughout, consider summarising older messages rather than dropping them.
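A minimal sketch of that summarisation strategy, reusing the message format from earlier (the `client` is passed in as a parameter so the trimming helper stays dependency-free); `split_history` and `compact_history` are illustrative names, not SDK functions:

```python
def split_history(messages: list, keep_recent: int = 8) -> tuple:
    """Split history into (older, recent); recent messages are kept verbatim."""
    if len(messages) <= keep_recent:
        return [], messages
    return messages[:-keep_recent], messages[-keep_recent:]

def compact_history(client, messages: list, model: str = "gpt-4o-mini",
                    keep_recent: int = 8) -> list:
    """Replace the older part of a conversation with a one-message summary."""
    older, recent = split_history(messages, keep_recent)
    if not older:
        return messages
    summary = client.chat.completions.create(
        model=model,
        messages=[{"role": "system",
                   "content": "Summarise this conversation in a few sentences, "
                              "preserving names, numbers, and decisions."}] + older
    ).choices[0].message.content
    return [{"role": "assistant",
             "content": f"Summary of earlier conversation: {summary}"}] + recent
```

Inside `chat()`, you would call `compact_history(client, self.messages)` in place of the slicing step, at the cost of one extra API call whenever compaction runs.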

Streaming for Better User Experience

For user-facing features, stream the response so text appears as it's generated:

def chat_streaming(conversation_history: list, system_prompt: str) -> str:
    stream = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "system", "content": system_prompt}] + conversation_history,
        stream=True
    )
    full_response = ""
    for chunk in stream:
        # Some chunks arrive without choices or content, so guard both
        if chunk.choices and chunk.choices[0].delta.content:
            content = chunk.choices[0].delta.content
            print(content, end="", flush=True)
            full_response += content
    print()  # newline at end
    return full_response

Streaming dramatically improves perceived responsiveness for longer responses — users see content appearing immediately rather than waiting for the full response.

Production Error Handling

Retry transient errors with exponential backoff:

from openai import RateLimitError, APIConnectionError, APIStatusError

def call_with_retry(messages: list, system_prompt: str, max_retries: int = 3) -> str:
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[{"role": "system", "content": system_prompt}] + messages
            )
            return response.choices[0].message.content

        except RateLimitError:
            if attempt < max_retries - 1:
                wait = 2 ** attempt
                print(f"Rate limited — waiting {wait}s before retry")
                time.sleep(wait)
            else:
                raise

        except APIConnectionError:
            if attempt < max_retries - 1:
                time.sleep(1)
            else:
                raise

        except APIStatusError:
            # Remaining status errors (bad request, auth failure) won't
            # succeed on retry, so surface them immediately
            raise

Log all errors with context: the model, token counts, user ID, and the error type. This makes diagnosing production issues much faster.
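A minimal sketch of that logging using the standard library; the field names (`user_id`, `prompt_tokens`, and so on) are an illustrative schema, not a requirement:

```python
import json
import logging

logger = logging.getLogger("chat_api")

def log_api_error(error: Exception, model: str, user_id: str,
                  prompt_tokens: int = None) -> str:
    """Build and log a structured JSON record describing a failed API call."""
    record = {
        "event": "openai_api_error",
        "error_type": type(error).__name__,
        "error_message": str(error),
        "model": model,
        "user_id": user_id,
        "prompt_tokens": prompt_tokens,
    }
    line = json.dumps(record)
    logger.error(line)
    return line
```

Emitting one JSON object per error keeps the logs machine-parseable, so you can filter by `error_type` or `user_id` in whatever log aggregator you use.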

Building a Flask API Around It

Wrapping the conversation manager in a Flask API:

from flask import Flask, request, jsonify, Response
import uuid

app = Flask(__name__)
conversations = {}

SYSTEM_PROMPT = """You are a support assistant for a web monitoring service.
Help users understand monitoring setup, diagnose issues, and configure alerts.
Be concise and practical."""

@app.route('/chat', methods=['POST'])
def chat():
    data = request.get_json(silent=True) or {}
    session_id = data.get('session_id', str(uuid.uuid4()))
    user_message = data.get('message')
    if not user_message:
        return jsonify({'error': 'message is required'}), 400

    if session_id not in conversations:
        conversations[session_id] = ConversationManager(SYSTEM_PROMPT)

    response = conversations[session_id].chat(user_message)
    return jsonify({'session_id': session_id, 'response': response})

@app.route('/health')
def health():
    return jsonify({'status': 'ok'}), 200

if __name__ == '__main__':
    app.run(port=5000)

The /health endpoint is important — it's what your monitoring tool will check.
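From the monitoring side, a health check is just an HTTP GET with a short timeout. A stdlib-only sketch of what such a check does (the URL is whatever your deployment exposes; `check_health` and `is_healthy` are illustrative names):

```python
from urllib.request import urlopen
from urllib.error import URLError

def is_healthy(status_code: int) -> bool:
    """A health check passes only on HTTP 200."""
    return status_code == 200

def check_health(url: str, timeout: float = 5.0) -> bool:
    """Return True if the endpoint responds with 200 within the timeout."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return is_healthy(response.status)
    except URLError:
        return False
```

Running locally, `check_health("http://localhost:5000/health")` should return True while the Flask app from above is up.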

Deploying and Monitoring

Once deployed, your application needs monitoring. The ChatGPT API being reliable doesn't mean your application is reliable — server failures, misconfigured deployments, database issues, and network problems can all take your application down independently of the AI API.

Domain Monitor monitors your application's availability every minute from multiple global locations. Point it at your /health endpoint and you'll get an immediate alert the moment your application stops responding.

Create a free account before you deploy. Configure alerts via email, SMS, or Slack. See monitoring apps built with AI for AI-specific monitoring considerations and how to set up uptime monitoring for a complete setup guide.

Choosing Between OpenAI and Anthropic APIs

Both the OpenAI API (for ChatGPT/GPT-4) and the Anthropic API (for Claude) follow a similar pattern. The code structure — system prompts, message history, streaming, error handling — is nearly identical between the two.

Many production applications use both, routing different types of requests to whichever model performs best for that task. The infrastructure patterns are transferable.
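To make the similarity concrete: the main structural difference is that Anthropic's Messages API takes the system prompt as a top-level `system` parameter and requires `max_tokens`, while OpenAI puts the system prompt in the messages list. A pure helper that maps one request shape onto the other (`to_anthropic_kwargs` is my name, not from either SDK):

```python
def to_anthropic_kwargs(system_prompt: str, messages: list,
                        model: str, max_tokens: int = 1024) -> dict:
    """Translate an OpenAI-style request into Anthropic Messages API kwargs.

    OpenAI: system prompt is the first entry in `messages`.
    Anthropic: system prompt is a top-level `system` parameter, and
    `max_tokens` is required rather than optional.
    """
    return {
        "model": model,
        "system": system_prompt,
        "max_tokens": max_tokens,
        "messages": messages,  # user/assistant turns use the same shape
    }
```

With the Anthropic SDK installed, the call would then look like `anthropic_client.messages.create(**to_anthropic_kwargs(...))`, which is why a multi-provider router stays thin.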

What to Build Next

With this foundation, you can extend in several directions:

  • Retrieval Augmented Generation (RAG) — Give the model access to your own documents by embedding them and retrieving relevant chunks based on the user's question
  • Function calling — Let the model call functions in your application to look up data, take actions, or query APIs
  • Structured output — Request responses in JSON format for reliable parsing
  • Multi-model routing — Use cheaper models for simple tasks and more expensive models only where quality justifies the cost
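For the structured-output direction, Chat Completions accepts response_format={"type": "json_object"}, which constrains the reply to valid JSON (the prompt must also mention JSON). A hedged sketch; `classify_ticket`, `parse_reply`, and the key names are illustrative, not a fixed schema:

```python
import json

def parse_reply(raw: str, required_keys: set) -> dict:
    """Parse a JSON reply from the model and fail loudly if keys are missing."""
    data = json.loads(raw)
    missing = required_keys - data.keys()
    if missing:
        raise ValueError(f"model reply missing keys: {missing}")
    return data

def classify_ticket(client, text: str) -> dict:
    """Ask the model for a JSON classification of a support message."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system",
             "content": "Classify the support message. Reply in JSON with "
                        'keys "category" and "urgency".'},
            {"role": "user", "content": text},
        ],
    )
    return parse_reply(response.choices[0].message.content,
                       {"category", "urgency"})
```

Validating the keys before using the result means a malformed reply surfaces as a clear error rather than a downstream KeyError.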

The patterns for error handling, context management, and monitoring apply regardless of which direction you take it.
