
ChatGPT API Tutorial: Build Your First AI-Powered Application

The ChatGPT API — officially the OpenAI Chat Completions API — lets you embed GPT-4's capabilities into your own applications. This tutorial goes beyond "hello world" and covers the patterns you'll actually use in a real application: conversation management, streaming, error handling, and keeping your application running reliably in production.

For the basic API setup steps, see the OpenAI API tutorial. This guide assumes you have your API key and SDK installed and focuses on building a complete application.

The Application We're Building

A simple but complete conversational assistant with:

  • Multi-turn conversation memory
  • A configurable system prompt
  • Streaming responses
  • Retry logic for production reliability
  • A health check endpoint for monitoring

Setting Up the Conversation Manager

The ChatGPT API is stateless — it doesn't remember previous messages. You maintain conversation history yourself and pass the full history with each request.

from openai import OpenAI
import time

client = OpenAI()

class ConversationManager:
    def __init__(self, system_prompt: str, model: str = "gpt-4o-mini"):
        self.system_prompt = system_prompt
        self.model = model
        self.messages = []

    def chat(self, user_message: str) -> str:
        self.messages.append({"role": "user", "content": user_message})

        response = client.chat.completions.create(
            model=self.model,
            messages=[
                {"role": "system", "content": self.system_prompt}
            ] + self.messages
        )

        assistant_message = response.choices[0].message.content
        self.messages.append({"role": "assistant", "content": assistant_message})
        return assistant_message

    def reset(self):
        self.messages = []

Usage:

assistant = ConversationManager(
    system_prompt="You are a helpful support assistant for a web monitoring service. "
                  "Help users understand uptime monitoring, configure alerts, and "
                  "diagnose website issues. Keep responses concise and practical."
)

print(assistant.chat("What's the difference between a 502 and a 503 error?"))
print(assistant.chat("Which one is more likely to self-resolve?"))

The second message uses the context from the first — the model knows "which one" refers to the error types discussed.

Managing Context Window Limits

Long conversations can exceed the model's context window. Implement a sliding window to prevent this:

class ConversationManager:
    def __init__(self, system_prompt: str, model: str = "gpt-4o-mini", max_messages: int = 20):
        self.system_prompt = system_prompt
        self.model = model
        self.messages = []
        self.max_messages = max_messages

    def chat(self, user_message: str) -> str:
        self.messages.append({"role": "user", "content": user_message})

        # Keep only the most recent messages if over the limit
        if len(self.messages) > self.max_messages:
            self.messages = self.messages[-self.max_messages:]

        response = client.chat.completions.create(
            model=self.model,
            messages=[{"role": "system", "content": self.system_prompt}] + self.messages
        )

        assistant_message = response.choices[0].message.content
        self.messages.append({"role": "assistant", "content": assistant_message})
        return assistant_message

For applications where conversation history matters throughout, consider summarising older messages rather than dropping them.
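A minimal sketch of that summarisation strategy, reusing the message format from earlier (the `client` is passed in as a parameter so the trimming helper stays dependency-free); `split_history` and `compact_history` are illustrative names, not SDK functions:

```python
def split_history(messages: list, keep_recent: int = 8) -> tuple:
    """Split history into (older, recent); recent messages are kept verbatim."""
    if len(messages) <= keep_recent:
        return [], messages
    return messages[:-keep_recent], messages[-keep_recent:]

def compact_history(client, messages: list, model: str = "gpt-4o-mini",
                    keep_recent: int = 8) -> list:
    """Replace the older part of a conversation with a one-message summary."""
    older, recent = split_history(messages, keep_recent)
    if not older:
        return messages
    summary = client.chat.completions.create(
        model=model,
        messages=[{"role": "system",
                   "content": "Summarise this conversation in a few sentences, "
                              "preserving names, numbers, and decisions."}] + older
    ).choices[0].message.content
    return [{"role": "assistant",
             "content": f"Summary of earlier conversation: {summary}"}] + recent
```

Inside `chat()`, you would call `compact_history(client, self.messages)` in place of the slicing step, at the cost of one extra API call whenever compaction runs.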

Streaming for Better User Experience

For user-facing features, stream the response so text appears as it's generated:

def chat_streaming(conversation_history: list, system_prompt: str) -> str:
    stream = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "system", "content": system_prompt}] + conversation_history,
        stream=True
    )
    full_response = ""
    for chunk in stream:
        # Some chunks arrive without choices or content, so guard both
        if chunk.choices and chunk.choices[0].delta.content:
            content = chunk.choices[0].delta.content
            print(content, end="", flush=True)
            full_response += content
    print()  # newline at end
    return full_response

Streaming dramatically improves perceived responsiveness for longer responses — users see content appearing immediately rather than waiting for the full response.

Production Error Handling

Retry transient errors with exponential backoff:

from openai import RateLimitError, APIConnectionError, APIStatusError

def call_with_retry(messages: list, system_prompt: str, max_retries: int = 3) -> str:
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[{"role": "system", "content": system_prompt}] + messages
            )
            return response.choices[0].message.content

        except RateLimitError:
            if attempt < max_retries - 1:
                wait = 2 ** attempt
                print(f"Rate limited — waiting {wait}s before retry")
                time.sleep(wait)
            else:
                raise

        except APIConnectionError:
            if attempt < max_retries - 1:
                time.sleep(1)
            else:
                raise

        except APIStatusError:
            # Remaining status errors (bad request, auth failure) won't
            # succeed on retry, so surface them immediately
            raise

Log all errors with context: the model, token counts, user ID, and the error type. This makes diagnosing production issues much faster.
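A minimal sketch of that logging using the standard library; the field names (`user_id`, `prompt_tokens`, and so on) are an illustrative schema, not a requirement:

```python
import json
import logging

logger = logging.getLogger("chat_api")

def log_api_error(error: Exception, model: str, user_id: str,
                  prompt_tokens: int = None) -> str:
    """Build and log a structured JSON record describing a failed API call."""
    record = {
        "event": "openai_api_error",
        "error_type": type(error).__name__,
        "error_message": str(error),
        "model": model,
        "user_id": user_id,
        "prompt_tokens": prompt_tokens,
    }
    line = json.dumps(record)
    logger.error(line)
    return line
```

Emitting one JSON object per error keeps the logs machine-parseable, so you can filter by `error_type` or `user_id` in whatever log aggregator you use.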

Building a Flask API Around It

Wrapping the conversation manager in a Flask API:

from flask import Flask, request, jsonify, Response
import uuid

app = Flask(__name__)
conversations = {}

SYSTEM_PROMPT = """You are a support assistant for a web monitoring service.
Help users understand monitoring setup, diagnose issues, and configure alerts.
Be concise and practical."""

@app.route('/chat', methods=['POST'])
def chat():
    data = request.get_json(silent=True) or {}
    session_id = data.get('session_id', str(uuid.uuid4()))
    user_message = data.get('message')
    if not user_message:
        return jsonify({'error': 'message is required'}), 400

    if session_id not in conversations:
        conversations[session_id] = ConversationManager(SYSTEM_PROMPT)

    response = conversations[session_id].chat(user_message)
    return jsonify({'session_id': session_id, 'response': response})

@app.route('/health')
def health():
    return jsonify({'status': 'ok'}), 200

if __name__ == '__main__':
    app.run(port=5000)

The /health endpoint is important — it's what your monitoring tool will check.
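From the monitoring side, a health check is just an HTTP GET with a short timeout. A stdlib-only sketch of what such a check does (the URL is whatever your deployment exposes; `check_health` and `is_healthy` are illustrative names):

```python
from urllib.request import urlopen
from urllib.error import URLError

def is_healthy(status_code: int) -> bool:
    """A health check passes only on HTTP 200."""
    return status_code == 200

def check_health(url: str, timeout: float = 5.0) -> bool:
    """Return True if the endpoint responds with 200 within the timeout."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return is_healthy(response.status)
    except URLError:
        return False
```

Running locally, `check_health("http://localhost:5000/health")` should return True while the Flask app from above is up.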

Deploying and Monitoring

Once deployed, your application needs monitoring. The ChatGPT API being reliable doesn't mean your application is reliable — server failures, misconfigured deployments, database issues, and network problems can all take your application down independently of the AI API.

Domain Monitor monitors your application's availability every minute from multiple global locations. Point it at your /health endpoint and you'll get an immediate alert the moment your application stops responding.

Create a free account before you deploy. Configure alerts via email, SMS, or Slack. See monitoring apps built with AI for AI-specific monitoring considerations and how to set up uptime monitoring for a complete setup guide.

Choosing Between OpenAI and Anthropic APIs

Both the OpenAI API (for ChatGPT/GPT-4) and the Anthropic API (for Claude) follow a similar pattern. The code structure — system prompts, message history, streaming, error handling — is nearly identical between the two.

Many production applications use both, routing different types of requests to whichever model performs best for that task. The infrastructure patterns are transferable.
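To make the similarity concrete: the main structural difference is that Anthropic's Messages API takes the system prompt as a top-level `system` parameter and requires `max_tokens`, while OpenAI puts the system prompt in the messages list. A pure helper that maps one request shape onto the other (`to_anthropic_kwargs` is my name, not from either SDK):

```python
def to_anthropic_kwargs(system_prompt: str, messages: list,
                        model: str, max_tokens: int = 1024) -> dict:
    """Translate an OpenAI-style request into Anthropic Messages API kwargs.

    OpenAI: system prompt is the first entry in `messages`.
    Anthropic: system prompt is a top-level `system` parameter, and
    `max_tokens` is required rather than optional.
    """
    return {
        "model": model,
        "system": system_prompt,
        "max_tokens": max_tokens,
        "messages": messages,  # user/assistant turns use the same shape
    }
```

With the Anthropic SDK installed, the call would then look like `anthropic_client.messages.create(**to_anthropic_kwargs(...))`, which is why a multi-provider router stays thin.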

What to Build Next

With this foundation, you can extend in several directions:

  • Retrieval Augmented Generation (RAG) — Give the model access to your own documents by embedding them and retrieving relevant chunks based on the user's question
  • Function calling — Let the model call functions in your application to look up data, take actions, or query APIs
  • Structured output — Request responses in JSON format for reliable parsing
  • Multi-model routing — Use cheaper models for simple tasks and more expensive models only where quality justifies the cost
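For the structured-output direction, Chat Completions accepts response_format={"type": "json_object"}, which constrains the reply to valid JSON (the prompt must also mention JSON). A hedged sketch; `classify_ticket`, `parse_reply`, and the key names are illustrative, not a fixed schema:

```python
import json

def parse_reply(raw: str, required_keys: set) -> dict:
    """Parse a JSON reply from the model and fail loudly if keys are missing."""
    data = json.loads(raw)
    missing = required_keys - data.keys()
    if missing:
        raise ValueError(f"model reply missing keys: {missing}")
    return data

def classify_ticket(client, text: str) -> dict:
    """Ask the model for a JSON classification of a support message."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system",
             "content": "Classify the support message. Reply in JSON with "
                        'keys "category" and "urgency".'},
            {"role": "user", "content": text},
        ],
    )
    return parse_reply(response.choices[0].message.content,
                       {"category", "urgency"})
```

Validating the keys before using the result means a malformed reply surfaces as a clear error rather than a downstream KeyError.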

The patterns for error handling, context management, and monitoring apply regardless of which direction you take it.
