For the previous three months, Google’s Gemini 3 Professional has held its floor as one of the vital succesful frontier fashions out there. However within the fast-moving world of AI, three months is a lifetime — and opponents haven’t been standing nonetheless.
Earlier at present, Google launched Gemini 3.1 Professional, an replace that brings a key innovation to the corporate’s workhorse energy mannequin: three ranges of adjustable considering that successfully flip it into a light-weight model of Google’s specialised Deep Assume reasoning system.
The discharge marks the primary time Google has issued a “level one” replace to a Gemini mannequin, signaling a shift within the firm’s launch technique from periodic full-version launches to extra frequent incremental upgrades. Extra importantly for enterprise AI groups evaluating their mannequin stack, 3.1 Professional’s new three-tier considering system — low, medium, and excessive — offers builders and IT leaders a single mannequin that may scale its reasoning effort dynamically, from fast responses for routine queries as much as multi-minute deep reasoning periods for complicated issues.
The mannequin is rolling out now in preview throughout the Gemini API through Google AI Studio, Gemini CLI, Google’s agentic growth platform Antigravity, Vertex AI, Gemini Enterprise, Android Studio, the buyer Gemini app, and NotebookLM.
The ‘Deep Assume Mini’ impact: adjustable reasoning on demand
Probably the most consequential characteristic in Gemini 3.1 Professional is just not a single benchmark quantity — it’s the introduction of a three-tier considering degree system that offers customers fine-grained management over how a lot computational effort the mannequin invests in every response.
Gemini 3 Professional provided solely two considering modes: high and low. The brand new 3.1 Professional provides a medium setting (much like the earlier excessive) and, critically, overhauls what “excessive” means. When set to excessive, 3.1 Professional behaves as a “mini model of Gemini Deep Assume” — the corporate’s specialised reasoning mannequin that was up to date simply final week.
The implication for enterprise deployment may very well be important. Moderately than routing requests to completely different specialised fashions based mostly on job complexity — a typical however operationally burdensome sample — organizations can now use a single mannequin endpoint and regulate reasoning depth based mostly on the duty at hand. Routine doc summarization can run on low considering with quick response instances, whereas complicated analytical duties will be elevated to excessive considering for Deep Assume–caliber reasoning.
Benchmark Efficiency: Extra Than Doubling Reasoning Over 3 Professional
Google’s printed benchmarks inform a narrative of dramatic enchancment, notably in areas related to reasoning and agentic functionality.
On ARC-AGI-2, a benchmark that evaluates a mannequin’s means to resolve novel summary reasoning patterns, 3.1 Professional scored 77.1% — greater than double the 31.1% achieved by Gemini 3 Professional and considerably forward of Anthropic’s Sonnet 4.6 (58.3%) and Opus 4.6 (68.8%). This consequence additionally eclipses OpenAI’s GPT-5.2 (52.9%).
The good points lengthen throughout the board. On Humanity’s Final Examination, a rigorous tutorial reasoning benchmark, 3.1 Professional achieved 44.4% with out instruments, up from 37.5% for 3 Professional and forward of each Claude Sonnet 4.6 (33.2%) and Opus 4.6 (40.0%). On GPQA Diamond, a scientific data analysis, 3.1 Professional reached 94.3%, outperforming all listed opponents.
The place the outcomes develop into notably related for enterprise AI groups is within the agentic benchmarks — the evaluations that measure how properly fashions carry out when given instruments and multi-step duties, the form of work that more and more defines manufacturing AI deployments.
On Terminal-Bench 2.0, which evaluates agentic terminal coding, 3.1 Professional scored 68.5% in comparison with 56.9% for its predecessor. On MCP Atlas, a benchmark measuring multi-step workflows utilizing the Mannequin Context Protocol, 3.1 Professional reached 69.2% — a 15-point enchancment over 3 Professional’s 54.1% and almost 10 factors forward of each Claude and GPT-5.2. And on BrowseComp, which exams agentic net search functionality, 3.1 Professional achieved 85.9%, surging previous 3 Professional’s 59.2%.
Why Google selected a ‘0.1’ launch — and what it indicators
The versioning determination is itself noteworthy. Earlier Gemini releases adopted a sample of dated previews — a number of 2.5 previews, for example, earlier than reaching common availability. The selection to designate this replace as 3.1 relatively than one other 3 Professional preview suggests Google views the enhancements as substantial sufficient to warrant a model increment, whereas the “level one” framing units expectations that that is an evolution, not a revolution.
Google’s weblog submit states that 3.1 Professional builds straight on classes from the Gemini Deep Assume sequence, incorporating methods from each earlier and newer variations. The benchmarks strongly recommend that reinforcement studying has performed a central position within the good points, notably on duties like ARC-AGI-2, coding benchmarks, and agentic evaluations — precisely the domains the place RL-based coaching environments can present clear reward indicators.
The mannequin is being launched in preview relatively than as a common availability launch, with Google stating it is going to proceed making developments in areas corresponding to agentic workflows earlier than shifting to full GA.
Aggressive implications in your enterprise AI stack
For IT determination makers evaluating frontier mannequin suppliers, Gemini 3.1 Professional’s launch has to not solely make them rethink which fashions to decide on but additionally learn how to adapt to such a quick tempo of change for their very own services and products.
The query now’s whether or not this launch triggers a response from opponents. Gemini 3 Professional’s authentic launch final November set off a wave of mannequin releases throughout each proprietary and open-weight ecosystems.
With 3.1 Professional reclaiming benchmark management in a number of vital classes, the strain is on Anthropic, OpenAI, and the open-weight group to reply — and within the present AI panorama, that response is probably going measured in weeks, not months.
Availability
Gemini 3.1 Professional is accessible now in preview via the Gemini API in Google AI Studio, Gemini CLI, Google Antigravity, and Android Studio for builders. Enterprise clients can entry it via Vertex AI and Gemini Enterprise. Customers on Google AI Professional and Extremely plans can entry it via the Gemini app and NotebookLM.


