Yesterday, at its press event, OpenAI announced the new language model GPT-4o. It can take input from users as text, audio, and images, and even pick up on laughter and emotion, providing a chat experience that feels more like interacting with a real person.
According to the team, GPT-4o is a step toward much more natural human-computer interaction. It accepts any combination of text, audio, and visual input and can generate any combination of text, audio, and visual output. Compared with existing models, it is faster and more accurate at understanding what it sees and hears.
GPT-4o matches GPT-4 Turbo on English text and code, and responds to audio with an average latency of 320 milliseconds, roughly the pace of human turn-taking in conversation. Previously, voice interactions averaged 2.8 seconds of latency with GPT-3.5 and 5.4 seconds with GPT-4.
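For developers, this multimodal capability should surface through OpenAI's standard chat API. Below is a minimal sketch of what a text-plus-image request might look like, assuming the model is exposed under the identifier "gpt-4o" and that image input uses the same image_url content format as earlier vision models; the image URL here is hypothetical, and exact fields and availability may differ at rollout:

```python
# Minimal sketch: a text-plus-image request to GPT-4o through the OpenAI
# chat completions API. Assumes the v1.x `openai` Python SDK, that the model
# is exposed as "gpt-4o", and that image input follows the existing
# vision-style `image_url` content format; the image URL is hypothetical.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "What's happening in this picture? Keep it brief."},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```

Since the article reports that only text and voice are enabled in the product so far, the image portion in particular should be read as illustrative rather than confirmed.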
But what does all of this mean in practice?
GPT-4o enables much more lifelike interaction by analyzing speech and live images: users can start a session simply by opening their phone camera or speaking to it directly.
For example, it can translate in real time, sing “Happy Birthday”, act as a personalized language tutor, analyze the user’s surroundings, and even get human jokes, responding with cheerful emotion and laughter, or catch the sarcasm behind a remark.
Like a real friend, GPT-4o can gush enviously over how cute the user’s dog is and ask for its name. Talking to it feels more like a conversation than a simple question-and-answer session.
GPT-4o is a single new model trained end-to-end across text, vision, and audio. Beyond the user’s primary voice or text input, it automatically factors in facial expressions, laughter, and the surrounding environment to produce more natural and accurate responses. And if the user interrupts it mid-sentence, GPT-4o knows how to handle that too.
Learning math with GPT-4o (Source)
The “o” in GPT-4o stands for “omni”, as in all-encompassing. The team’s aim is a model that can respond to anything, not just text input or questions in a single modality.
For now, GPT-4o is available only to paying users, and apparently only text and voice input have been enabled; the promised real-time image input will take a little longer. OpenAI’s stated goal is to eventually make the model free for everyone.
Paying users get early access to try GPT-4o.
In our own testing, many of the features the team showed off are still rough around the edges: it has trouble following jokes told in Chinese, its replies in real conversation can feel hollow, and responses are slower than demonstrated. We look forward to further updates from the team.
OpenAI chose to unveil the new model just ahead of the Google I/O developer conference, underscoring the fierce rivalry between the two companies. Earlier, rumors emerged that Apple had been in talks with both OpenAI (ChatGPT) and Google (Gemini) about integrating their models into iOS 18.
(Apple rumored to collaborate with OpenAI to integrate ChatGPT into iOS 18)
Further reading
Vitalik: GPT-4 has already passed the Turing test, and that’s worth keeping in mind.
Is GPT-4o close to “Her”? Exploring the potential applications of GPT-4o’s integrated voice and multimodal interaction.