The Weekly AI Roundup: Gemini Removes Watermarks, AI Doubling Every 7 Months, OpenAI Text-To-Speech

Weekly AI Roundup March 17-23
The new Gemini Flash Experimental model has been seriously impressing us with its capabilities. Numerous examples show its image generation handling advanced image editing tasks that would typically take hours in a tool like Photoshop. Most controversial is its ability to remove watermarks from images.

The Chinese search company Baidu released two new AI models at the weekend. Ernie X1 is a reasoning model that is said to be on par with DeepSeek, but at only half the price. The foundational model Ernie 4.5 is said to be on par with GPT-4.5, but at an astonishing 1% of the cost.

ReCamMaster is a new video model that will take an input video and then regenerate it from a completely different camera angle or motion.

Perplexity released the second half of their teaser video, a promotion encouraging people to “Ask Perplexity” rather than searching Google for an answer.

The video stars Squid Game’s Lee Jung-jae and has been widely praised.

It feels like Google AI are rolling out important updates every few days at the moment, and today they announced ‘Canvas’ and ‘Audio Overview’. As you’d expect, Canvas adds an output panel in the chat window and creates artefacts based on your requests. Most typically, this will be for generating and editing code.

Audio Overview allows you to add a file and have it converted into an engaging podcast style discussion. Both of these tools are available to try in Gemini right now.

xAI has acquired the AI video generation company Hotshot. We really haven’t heard much from Hotshot since they first made some waves shortly after their launch in 2024, so it’s quite interesting to see them being absorbed into the xAI team.

StabilityAI has released a preview of a research model called ‘Stable Virtual Camera’. This is one of the most intriguing tools that we’ve seen recently. It allows the user to add a single flat image and then generate a 3D video output.

New research suggests that the problem-solving capability of AI is doubling every seven months. The AI evaluation company METR uses ‘length of task’ as one of its key metrics. This means that if AI models can solve a problem today that would take a human 10 minutes, then in seven months’ time they will be capable of solving a task that would take a human 20 minutes.
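To make the doubling claim concrete, here is a minimal sketch of the arithmetic in Python. It assumes a clean exponential trend with a fixed seven-month doubling period. The function name `task_horizon_minutes` and the 10-minute starting point are our own illustrative choices, not part of the research.

```python
def task_horizon_minutes(start_minutes: float,
                         months_elapsed: float,
                         doubling_months: float = 7.0) -> float:
    """Project the human-equivalent task length an AI model can handle.

    Assumes (hypothetically) that the doubling trend holds exactly:
    horizon = start * 2 ** (months_elapsed / doubling_months)
    """
    return start_minutes * 2 ** (months_elapsed / doubling_months)


if __name__ == "__main__":
    # Starting from a 10-minute task today, project a few doubling periods.
    for months in (0, 7, 14, 21, 28):
        minutes = task_horizon_minutes(10, months)
        print(f"after {months:2d} months: {minutes:.0f} minutes")
```

Under these assumptions, a 10-minute task becomes a 20-minute task after seven months and a 40-minute task after fourteen. Whether the trend continues at that pace is, of course, an open question.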

Windsurf has launched Wave 5, significantly enhancing its AI-assisted coding tool with the introduction of the ‘Windsurf Tab’ feature. This update allows the AI to predict entire code sections, navigate through files and insert import statements with just a press of the tab key. The aim for Windsurf is to provide a seamless, intuitive coding environment that assists and predicts to make coding as efficient as possible.

Grok rolled out the imaginatively titled ‘Deeper Research’ option that will dedicate longer thinking time and apply more reasoning to your request. In our test, the Deeper Research tool took almost three minutes of thinking time before completing the request and the results were pretty impressive.

After being taught to dance and fight, the Unitree G1 became the first humanoid robot to perform a side flip.
This amazing display of athletic ability is probably the most impressive single motion that we’ve seen from a robot to date. How long before robots are able to perform feats that far exceed the abilities of humans?

OpenAI has introduced three new audio models through their API, two for speech-to-text and one for text-to-speech. These models allow developers to build AI agents that can speak with natural voice interactions and open up whole new frontiers of possibilities when working with the OpenAI API.

Anthropic has released real-time web search in Claude. Not having the very latest information to hand has been a bit of a barrier for some Claude users, but now the whole Internet is available as a source.

Google’s NotebookLM has a great new Mind Maps feature. This allows users to convert their notes and sources into visual diagrams, aiding in the organisation and comprehension of complex information. This feature should really transform the way users interact with the information produced by NotebookLM.

Speaking at the Tesla ‘All Hands’ event, Elon Musk talked a lot about the company’s humanoid robot Optimus. Elon said “Optimus will be the biggest product of all time by far. Nothing will even be close. I think it’ll be ten times bigger than the next biggest product ever made.”

Perplexity CEO Aravind Srinivas announced that starting next week, their new agentic browser Comet will begin public testing with a small set of users. He also hinted that this intriguing new product might only be weeks from a full release.

Grok now has a dedicated image editing mode. It had already been possible to instruct Grok to make some edits to an image, but now it’s properly built into the interface.
