Databricks makes ai_extract, ai_classify, and ai_query generally available, bringing AI into SQL pipelines without code
Databricks completed its native AI SQL function suite in June 2026, promoting ai_extract and ai_classify to general availability on June 11 and ai_query on June 15. Together, the three functions let data engineers and analysts run structured AI tasks directly from SQL queries against any supported model.
What's new
ai_query (GA: June 15) is the general-purpose entry point: it lets users "query any supported AI model directly from SQL or Python," treating AI inference as a first-class SQL operation. The function can call hosted models on Databricks' Foundation Model APIs — including Anthropic's Claude family, OpenAI's GPT series, and Databricks' own Genie models — without leaving a SQL notebook or pipeline.
ai_extract (GA: June 11) is specialized for document parsing. It "extracts structured data from text and documents according to a schema you provide, supporting nested objects, arrays, type validation, citations, and confidence scores." A data engineer can write a SQL function call that takes a contract PDF or customer email and returns a typed JSON object — no Python glue code required.
ai_classify (GA: June 11) handles categorization: it "classifies text content according to custom labels you provide, with support for label descriptions, global instructions, and multi-label classification." Sentiment labeling, support ticket routing, content moderation, and taxonomy assignment can all be implemented as SQL expressions.
Three additional updates complete the June picture:
- Claude Fable 5 became available as a Databricks-hosted model through Foundation Model APIs pay-per-token on June 9, the same day Anthropic launched the model, so workspaces can use it without additional API key configuration.
- Genie Code expanded on June 11 to support OpenAI on Databricks alongside existing providers, and received an auto-approve mode on June 4 that allows tool actions to run without per-step human confirmation.
- Vector Search was renamed to AI Search on June 1. The rebranded service now supports full-text search indexes that require no vectors or embeddings — extending search to teams that do not run embedding pipelines.
Context
Databricks has been building toward SQL-native AI since it introduced Unity Catalog-governed AI models in late 2024. The AI Functions suite (ai_extract, ai_classify, ai_query) represents the practical output of that groundwork: AI capabilities that follow Databricks' permissioning and audit model, run inside the Lakehouse where data already lives, and appear to SQL users the same way as any other function.
This approach contrasts with workflows where data is extracted to a separate system for AI processing. The native SQL path reduces data movement, keeps AI inferences within existing governance boundaries, and makes AI accessible to users who know SQL but not Python or API SDKs.
Why it matters
For enterprise data teams, the AI SQL functions remove a significant adoption barrier. The alternative (writing Python UDFs, managing API clients, handling authentication, and orchestrating inference jobs) has historically required data engineering resources that many analytics teams do not have. By contrast, a single statement such as SELECT ai_classify(description, array('billing', 'technical', 'cancellation')) FROM support_tickets requires no new tooling.
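When the label set lives in configuration rather than in a hand-written query, the ai_classify statement above can be assembled programmatically. The following is a minimal Python sketch, not a Databricks API: build_classify_query is a hypothetical helper, and it assumes only the two-argument ai_classify(content, array(...)) form shown in the example statement. It rejects labels containing quotes or backslashes rather than guessing at SQL escaping rules.

```python
import re

# Plain column/table names only; anything else is rejected rather than quoted.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_classify_query(text_column: str, labels: list[str], table: str) -> str:
    """Build a SELECT that applies ai_classify to one column (hypothetical helper)."""
    for name in (text_column, table):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"not a simple identifier: {name!r}")
    if not labels:
        raise ValueError("at least one label is required")
    for label in labels:
        # Refuse characters that would need escaping inside a SQL string literal.
        if "'" in label or "\\" in label:
            raise ValueError(f"unsupported character in label: {label!r}")
    label_list = ", ".join(f"'{label}'" for label in labels)
    return f"SELECT ai_classify({text_column}, array({label_list})) FROM {table}"


query = build_classify_query(
    "description", ["billing", "technical", "cancellation"], "support_tickets"
)
print(query)
```

The resulting string can be handed to whichever SQL client a team already uses; the helper itself never contacts a model or a workspace.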
The GA of three functions within a single week, plus same-day availability of Claude Fable 5, signals that Databricks is moving quickly to position itself as the platform layer where enterprise AI runs closest to the data. As more organizations accumulate large quantities of unstructured text (contracts, emails, tickets, documents), the value of AI functions that operate on that text without leaving the Lakehouse compounds.
Corroborating sources
- Databricks documentation, release notes for June 2026
https://docs.databricks.com/aws/en/release-notes/product/2026/june
"ai_extract: extracts structured data from text and documents according to a schema you provide, supporting nested objects, arrays, type validation, citations, and confidence scores."