New Gemma 4 AI models understand images and audio

New Gemma 4 models bring broader multimodal understanding and better efficiency to AI development on personal devices and cloud platforms.

Google DeepMind has released its latest Gemma 4 models, the newest generation of its open-weight large language models. The release includes Gemma 4 31B IT Thinking, Gemma 4 26B A4B IT Thinking, Gemma 4 E4B IT Thinking, and Gemma 4 E2B IT Thinking, which process images alongside text, with some variants also handling video and audio.

Expanded Multimodal Horizons

The Gemma 4 family stands out for its broader multimodal support. All model sizes accept images of variable aspect ratio and resolution, and the E2B and E4B variants add native audio and video processing. This equips Gemma 4 for more complex tasks such as rich audio-visual understanding and agentic tool use, the latter measured on benchmarks like τ2-bench.

Performance and Safety Gains

"Gemma 4 models significantly outperform Gemma 3 and 3n models in improving safety, while keeping unjustified refusals low."

Evaluations show substantial improvements in safety metrics over previous Gemma generations, with minimal policy violations in text-to-text and image-to-text tasks. Combined with maintained or improved performance on datasets such as MMMLU (multilingual Q&A) and AIME 2026 (mathematics), the results underscore a commitment to responsible AI development.

Accessibility and Deployment

The Gemma 4 models are designed for flexible deployment. A JAX library, available on GitHub, supports running and fine-tuning the models on a user's own hardware, including CPUs, GPUs, and TPUs. Gemma 4 is also available on Google Cloud through services such as Vertex AI, Google Kubernetes Engine (GKE), and Google Compute Engine (GCE), giving developers several options for scaling their applications. Integration with Google ADK (Agent Development Kit) also lets developers build fully functional AI agents.

Development and Training

Gemma 4's capabilities are attributed to the quality and diversity of its training data, although specific details of the dataset remain largely undisclosed. The model card highlights the models' extensibility for building autonomous agents that can plan, navigate applications, and complete tasks through native function-calling support.
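
For readers unfamiliar with function calling, the general pattern can be sketched in a few lines of Python. This is a minimal illustration only: the JSON call format, the `get_weather` tool, and the `dispatch` helper are hypothetical and are not taken from the Gemma 4 model card or any Gemma library. The idea is that the model emits a structured request naming a tool and its arguments, and the host application runs the matching function.

```python
import json

# Hypothetical tool: a stand-in for any function an agent may call.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

# Registry mapping tool names to Python callables (illustrative only).
TOOLS = {"get_weather": get_weather}

def dispatch(model_output: str) -> str:
    """Parse a JSON tool call and run the matching registered function.

    The {"name": ..., "arguments": {...}} shape is an assumption made for
    this sketch, not Gemma 4's actual function-calling format.
    """
    call = json.loads(model_output)
    name = call["name"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**call.get("arguments", {}))

# Simulated model output requesting a tool call.
result = dispatch('{"name": "get_weather", "arguments": {"city": "Paris"}}')
print(result)
```

In a real agent loop, the tool's return value would be passed back to the model so it can continue planning or produce a final answer.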

Frequently Asked Questions

Q: What are the new Gemma 4 AI models?

Gemma 4 is Google DeepMind's newest family of open-weight AI models. The models process text and images, and some variants also handle audio and video.

Q: How do the Gemma 4 models work with images, audio, and video?

All Gemma 4 models accept images of varying aspect ratio and resolution. The E2B and E4B variants can also process audio and video natively, which lets them take on more complex tasks.

Q: Are the new Gemma 4 models safer than older ones?

Yes, according to the reported evaluations. The Gemma 4 models produced fewer policy violations than the Gemma 3 and 3n models in testing while keeping unjustified refusals low.

Q: How can developers use the new Gemma 4 models?

Developers can use the Gemma 4 models on their own computers or through Google Cloud services. They can also use tools like the JAX library or Google ADK to build AI agents.
