The Weekly AI Roundup: OpenAI Releases o3, AI Talks To Dolphins, Kling 2 Live, Google Glasses Return

The Weekly AI Roundup: April 14 – April 20

OpenAI has released GPT-4.1, its new flagship model for complex tasks, alongside GPT-4.1 Mini and GPT-4.1 Nano. All three models are available only via the API. They come with a significantly larger context window of up to 1 million tokens, better instruction following and greater coding efficiency.
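Since these models are API-only, here is a minimal sketch of what a request body might look like, assuming the standard Chat Completions REST shape and the `gpt-4.1` model identifier (the exact identifier and parameters are assumptions based on the announcement, not confirmed details):

```python
import json

# Hypothetical request body for the Chat Completions endpoint;
# the model name "gpt-4.1" is an assumption based on the announcement.
payload = {
    "model": "gpt-4.1",
    "messages": [
        {"role": "system", "content": "You are a careful coding assistant."},
        {"role": "user", "content": "Summarise this repository's build steps."},
    ],
    # The enlarged context window matters for long *inputs*; the
    # output length is still capped explicitly per request.
    "max_tokens": 1024,
}

body = json.dumps(payload)
print(payload["model"])  # gpt-4.1
```

The larger context window mostly changes how much you can stuff into `messages` (whole codebases, long documents) rather than the shape of the request itself.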

In another instance where we have to double-check that the date isn't April 1st, Google has announced that it's going to help humans talk to dolphins. Working in partnership with the Wild Dolphin Project, Google has developed DolphinGemma, an AI model aimed at translating dolphin vocalisations.

Freepik has introduced a new tool called Composition Reference that allows you to generate a visual from either a reference image or a sketch with notes. Creating AI images from images is nothing new, but this tool emphasises the structural elements of the image rather than the style.

Hugging Face, a leading AI development platform, has announced that it has acquired Pollen Robotics, a French startup known for its open-source humanoid robot Reachy 2.

The Verge has reported that OpenAI is developing its own social network designed to compete with platforms like X. The project is said to be in its early stages with an internal prototype that incorporates ChatGPT’s image generation capabilities into a social feed.

Google’s Veo 2 video generation model is now accessible to Gemini Advanced users and Google One AI Premium subscribers. Veo 2 allows users to create high-resolution eight-second videos from text prompts, and it’s been vying for the best generative video spot since its release.

Kling have launched Kling 2.0, significantly enhancing their text to video technology. This update offers improved motion dynamics, prompt adherence and realistic character expressions, allowing for high quality cinematic visuals.

OpenAI has released two new models, o3 and o4-mini, which are now its flagship reasoning models. The real highlight is that these models incorporate tools like web search, Python coding, image analysis and file interpretation into a single workflow, enabling them to handle complex tasks with multimodal inputs and outputs.

OpenAI also launched a new feature for ChatGPT called the Image Library, enabling users to manage all their AI generated images in one place.

In more OpenAI news, reports have emerged that the company is currently negotiating to acquire Windsurf, the AI coding tool that has become a fan favourite and the major competitor to Cursor. OpenAI is reportedly looking to acquire Windsurf for approximately $3 billion.

xAI announced Grok Studio, a rival to the Canvas-style tools available in other models like Gemini and Claude. Studio allows real-time collaboration on documents, code, reports and, of course, any games that you vibe-coded. Grok Studio can execute code in Python, C and JavaScript, and it integrates with Google Drive for file management.

Grok now has a memory that remembers all of your past conversations, allowing it to provide tailored recommendations and advice based on a user’s previous chats. Grok has also added workspaces, which let users organise conversations and files all in one place. Users can return to a workspace to continue an ongoing conversation or piece of research without losing all of the previous context.

Gemini Live has been a big success since its launch, and today Google announced that it is expanding availability to all Android users, regardless of their plan.

In a novel approach to benchmarking AI models, VideoGameBench (VGBench) is a new open-source tool that tests vision-language models like Claude 3.7 and Gemini 2.5 on 1990s video games. Claude 3.7 is the best gamer out of all the LLMs, and you won’t find a cooler claim than that.

Abacus AI has launched an interesting looking new tool called DeepAgent. DeepAgent is designed to combine a range of tools, sources and LLMs together to take on tasks that would usually fall outside of the possibilities for just a single service.

Midjourney has just released a new user interface that makes it look more like a fully fledged editing tool than a mere image generator. With the new layers and smart selection features, users are able to achieve amazing amounts of control over the editing process and make fundamental changes to a scene’s composition in seconds.

Krea have launched an amazing new 3D scene creation tool called Stage. This new tool allows users to create a scene and then fill it with 3D models entirely from text prompts.

A TED video has been published showing a live demo of Google’s prototype AI smart glasses. Powered by the Android XR platform, these glasses feature a miniature heads-up display and leverage Gemini’s multimodal AI capabilities to produce some amazing results.
