A staged demo video has left developers and Google employees in doubt about the true capabilities of Google’s new Gemini language model.
In the video, titled “Hands-on with Gemini: Interacting with multimodal AI,” Google shows off the AI model’s impressive voice interaction and real-time visual response capabilities.
After the demonstration, however, it turned out that the voice interaction did not exist and the demonstration was not in real time. Instead, Google used still images from the video with specific text prompts to get the results. In the video description Google states: “For the purposes of this demo, latency has been reduced and Gemini outputs have been shortened for brevity.”
According to Bloomberg, Google admits that the actual demonstration involved the use of still images from the video and text prompts, rather than Gemini predicting or responding to changes in real time. You can check out a making-of of the video on Google’s developer blog.
Gemini fake demo faces internal criticism
According to sources from Bloomberg and The Information, Google employees have expressed concern and criticism internally about the demo video. One Google employee stated that the video painted an unrealistic picture of how easy it is to achieve impressive results with Gemini.
The staged demo also became the subject of memes and jokes within the company, with employees sharing images and comments poking fun at the discrepancies between the video and the actual AI system.
Despite the controversy surrounding the demo video, Google insists that all user inputs and outputs shown in the video are real, even though the video suggests a real-time implementation that does not yet exist.
Eli Collins, vice president of products at Google DeepMind, told Bloomberg that the duck-drawing demo is still in the research stage and not yet part of Google’s products.
“It’s a new era for us,” Collins told Bloomberg. “We’re breaking ground from a research perspective. This is V1. It’s just the beginning.”
Google also presented benchmark results in a misleading way. It compared Gemini Ultra’s top score on the well-known language understanding benchmark MMLU, achieved with a more complex prompting method (chain-of-thought with 32 samples, CoT@32), against GPT-4’s score obtained with the standard 5-shot method reported by OpenAI. When both models are evaluated with the same 5-shot prompting on MMLU, Gemini Ultra, Google’s largest model, scores 2.7 percentage points lower than GPT-4.
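The difference between the two prompting methods matters: a 5-shot evaluation takes a single model answer, while a CoT@32-style evaluation samples many chain-of-thought answers and aggregates them, which alone can lift scores. The sketch below illustrates the aggregation idea with a hypothetical stubbed model (`noisy_model` and the specific voting scheme are illustrative assumptions, not Gemini’s actual evaluation code):

```python
import random
from collections import Counter

def five_shot_answer(sample_answer):
    # 5-shot style: a single model call, answer taken as-is.
    return sample_answer()

def cot_at_32_answer(sample_answer, k=32):
    # Simplified CoT@32 sketch: draw k sampled answers and take the
    # majority vote (self-consistency), which smooths out noisy samples.
    votes = Counter(sample_answer() for _ in range(k))
    return votes.most_common(1)[0][0]

# Hypothetical stub model: returns the correct choice "B" only 60% of
# the time, otherwise a random wrong choice.
def noisy_model():
    return "B" if random.random() < 0.6 else random.choice("ACD")

random.seed(0)
print(five_shot_answer(noisy_model))   # a single, possibly wrong, sample
print(cot_at_32_answer(noisy_model))   # majority vote over 32 samples
```

With a model that is right only 60% of the time per sample, the 32-sample majority vote almost always lands on the correct answer, while a single-call evaluation frequently does not. Comparing the two regimes head-to-head therefore conflates model quality with evaluation budget.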
Although Gemini achieved the best overall MMLU score with CoT@32, the way Google presents this result is questionable. Like the staged real-time video, it suggests that Google has tried at all costs to portray Gemini as superior to GPT-4, rather than roughly on par with it, which is probably closer to the truth.