Mystery AI models on LM Arena: Grok 3, Opus 3.5, or o3?

Users on LM Arena recently spotted a new model called Chocolate that significantly outperformed most of the models on the platform. This sparked speculation about where it came from and which frontier model it might be. In fact, there are two standout models: besides Chocolate, another model named Kiwi was noticed about a week earlier. Neither model reveals its creator, which makes identification difficult. Other anonymous models typically give some indication of their origins, but these two seem to be intentionally well protected.

One of the main theories is that these models are Grok 3 and Grok 3 Mini. However, there is no concrete evidence to support this claim. The speculation began after Elon Musk shared examples reportedly generated by Grok 3, and users started comparing them with responses from Chocolate and Kiwi. Another possibility is that one of these models is o3 or Opus 3.5.

people are saying kiwi and chocolate are grok 3?

keeps sayings it's a claude model when i ask it to make a twitter clone and tell me what model it is at the end? pic.twitter.com/GSmmRngpPL
— 🍓🍓🍓 (@iruletheworldmo) February 9, 2025

One unusual behavior observed in one of these models is how it generates SVGs. At one point, it glitched and began responding exclusively in SVG format. Even when asked for plain text, it returned an SVG with the text rendered inside the image. It seemed as though the model had decided this was the best way to communicate. More intriguingly, it felt as if the model was trying to take control of how LM Arena displayed its output, overriding the expected format.

Another distinctive aspect is how it constructs images. When asked to draw a robot, for example, it does not produce the full image at once. Instead, it builds the drawing gradually, adding components step by step. This kind of structured generation, in which it appears to reason about each element before adding it, is rarely seen in other models. Additionally, when prompted to modify a specific part of the image, such as the eye color, it immediately changed only that element and returned an updated SVG with the rest of the drawing left intact.

I have never seen anything like this on @lmarena_ai before. Regardless of which model this is, it is worth testing (chocolate).

In my case, it glitched and started only responding with SVGs, as if it decided on its own that it was a better representation 👀

It was drawing them… https://t.co/HKQet3PiQC pic.twitter.com/4A67HyA2KU
— TestingCatalog News 🗞 (@testingcatalog) February 8, 2025

This kind of precision and reasoning makes the model highly unusual. Given its exceptional coding abilities and sophisticated handling of SVGs, many users suspect it could be Grok 3, o3, or Opus 3.5. Notably, the way it creates SVGs is quite similar to Claude 3.5 Sonnet. Without official confirmation, however, the mystery remains, and it will be interesting to see what these models turn out to be.