OpenAI o1 launches on API with structured outputs and vision

OpenAI has announced several updates aimed at developers, including the release of the new OpenAI o1 model, enhancements to the Realtime API, a new fine-tuning method, and additional SDKs. These updates focus on improving model performance, customization options, and cost efficiency for developers working with AI.

OpenAI o1 Model in API

The OpenAI o1 model is a production-ready reasoning model designed for complex multi-step tasks. It is an upgrade from the earlier o1-preview model and offers significant improvements in accuracy, cost-efficiency, and latency. Key features include:

Function calling: Seamless integration with external data and APIs.
Structured Outputs: Reliable adherence to custom JSON schemas.
Developer messages: Customizable tone, style, and behavioral guidance.
Vision capabilities: Image reasoning for applications in science, manufacturing, and coding.
Lower latency: Reduced token usage by 60% compared to o1-preview.
A new reasoning_effort parameter allows developers to control response times.

We're bringing OpenAI o1 to the API. We're rolling out access to developers on usage tier 5 starting today, and rollout will continue over the next few weeks.

o1 supports:
⚙️ Function calling
🗂️ Structured Outputs
👀 Vision
📝 Developer messages
🧠 Reasoning effort pic.twitter.com/Ax8TT0IRke
— OpenAI Developers (@OpenAIDevs) December 17, 2024

The latest version, o1-2024-12-17, has been post-trained based on user feedback and achieves state-of-the-art results across benchmarks like MATH (96.4% pass rate) and LiveCodeBench (76.6%). It is being rolled out incrementally through the API.

Realtime API Updates

The Realtime API now includes:

WebRTC support: Simplifies building real-time voice applications with features like audio encoding, noise suppression, and congestion control.
New GPT-4o models: The GPT-4o-realtime-preview and GPT-4o-mini-realtime-preview snapshots offer improved voice quality and reduced costs. For example, audio token prices have dropped by 60%, while GPT-4o mini provides a cost-efficient option for smaller-scale applications.
Extended session lengths: Sessions can now last up to 30 minutes.
Concurrent out-of-band responses: Enables background tasks like content moderation without interrupting user interactions.

Preference Fine-Tuning

A new fine-tuning method called Preference Fine-Tuning has been introduced. Unlike traditional supervised fine-tuning, this approach uses Direct Preference Optimization (DPO) to train models based on preferred versus non-preferred outputs. It is particularly effective for subjective tasks like creative writing or summarization. Initial results from partners like Rogo AI show improved accuracy for complex queries.

New SDKs for Go and Java

OpenAI has released official SDKs for Go and Java in beta. These SDKs join existing Python, Node.js, and .NET libraries, making it easier for developers using these languages to integrate OpenAI models into their projects.

Company Context

OpenAI continues to expand its offerings for developers by focusing on tools that enhance AI usability across diverse applications. The introduction of OpenAI o1 and related updates reflects the company’s commitment to improving model performance while reducing costs. These developments are poised to support a wide range of use cases, from real-time voice assistants to advanced reasoning tasks in enterprise settings.