Model Context Protocol (MCP) and Apache Iceberg
The Model Context Protocol (MCP) is an open standard (originally introduced by Anthropic) that defines a uniform interface for connecting AI agents and LLMs to external tools and data sources. MCP servers expose capabilities as “tools” that LLMs can call natively — enabling AI agents to interact with APIs, databases, file systems, and data platforms without custom integration code for each model.
Apache Iceberg + MCP is a powerful combination: an MCP server for Iceberg exposes lakehouse catalog discovery, table schema inspection, and SQL query execution as LLM-callable tools. This gives any MCP-compatible AI agent (Claude, GPT-4o, Gemini, Llama, and others) direct access to governed Iceberg lakehouse data.
Why MCP + Iceberg Matters
The traditional AI-to-data integration pattern requires custom code for every model and every data source. MCP standardizes this:
- Any MCP-compatible LLM can call Iceberg tools without model-specific integration code.
- Tool definitions (schemas, descriptions) guide the LLM on when and how to call each Iceberg capability.
- Governed access: The MCP server authenticates with the Iceberg catalog (via the REST API) and enforces access control before returning data.
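The uniformity is concrete at the wire level: every MCP client invokes a tool with the same JSON-RPC 2.0 message shape, regardless of which model sits behind it. A minimal illustration (shown as a Python dict; the tool name and namespace value are taken from the examples later in this article):

```python
# The wire format of an MCP tool invocation (JSON-RPC 2.0, method "tools/call").
# Any MCP-compatible client sends this same shape; "sales" is an illustrative value.
tool_call_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "list_tables",
        "arguments": {"namespace": "sales"},
    },
}
```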
Core Iceberg MCP Tools
A well-designed Iceberg MCP server exposes tools such as:
`list_namespaces`
Lists all available namespaces (databases/schemas) in the catalog. The LLM uses this to discover what data domains are available.
`list_tables`
Lists all tables within a namespace. Returns table names and optional descriptions.
`describe_table`
Returns the full schema of a table: column names, data types, descriptions, and partition spec. The LLM uses this to understand what data is available and how to query it correctly.
`query_iceberg`
Executes a SQL query against the Iceberg catalog and returns results. This is the primary data access tool.
`get_table_snapshots`
Returns the snapshot history for a table — enabling the LLM to implement time travel queries or understand when data was last updated.
`get_recent_rows`
Returns a sample of recent rows from a table — useful for the LLM to understand data formats and values before writing a complex query.
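The example server in the next section implements `list_tables` and `describe_table`; for the snapshot tool, PyIceberg exposes the needed metadata directly on the table object. A minimal sketch, assuming a catalog named "my_catalog" is configured locally (the function name and signature are illustrative, not part of any API):

```python
import json
from pyiceberg.catalog import load_catalog

# Assumption: "my_catalog" is configured, e.g. in ~/.pyiceberg.yaml.
catalog = load_catalog("my_catalog")

def get_table_snapshots(namespace: str, table_name: str) -> str:
    """Return a table's snapshot history as JSON for the LLM."""
    table = catalog.load_table(f"{namespace}.{table_name}")
    history = [
        {
            "snapshot_id": s.snapshot_id,
            "timestamp_ms": s.timestamp_ms,
            "operation": s.summary.operation.value if s.summary else None,
        }
        for s in table.snapshots()
    ]
    return json.dumps(history)
```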
Example MCP Server (Python / PyIceberg)
```python
from mcp.server import Server
from mcp.types import Tool, TextContent
from pyiceberg.catalog import load_catalog
import json

app = Server("iceberg-mcp")

# Connect to the Iceberg REST catalog with service-account credentials.
catalog = load_catalog("my_catalog", **{
    "type": "rest",
    "uri": "https://my-catalog.example.com",
    "credential": "client-id:client-secret",
})

@app.list_tools()
async def list_tools() -> list[Tool]:
    # Advertise the available tools; the input schemas tell the LLM
    # when and how to call each one.
    return [
        Tool(
            name="list_tables",
            description="List all Iceberg tables in a namespace",
            inputSchema={
                "type": "object",
                "properties": {
                    "namespace": {"type": "string", "description": "Database/namespace name"}
                },
                "required": ["namespace"],
            },
        ),
        Tool(
            name="describe_table",
            description="Get the schema and properties of an Iceberg table",
            inputSchema={
                "type": "object",
                "properties": {
                    "namespace": {"type": "string"},
                    "table": {"type": "string"},
                },
                "required": ["namespace", "table"],
            },
        ),
    ]

@app.call_tool()
async def call_tool(name: str, arguments: dict) -> list[TextContent]:
    if name == "list_tables":
        tables = catalog.list_tables(arguments["namespace"])
        return [TextContent(type="text", text=json.dumps([str(t) for t in tables]))]
    elif name == "describe_table":
        table = catalog.load_table(f"{arguments['namespace']}.{arguments['table']}")
        return [TextContent(type="text", text=str(table.schema()))]
    raise ValueError(f"Unknown tool: {name}")
```
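To make the server runnable end to end, it needs a transport. A minimal sketch using the MCP Python SDK's stdio transport (the entry-point boilerplate is conventional Python, not MCP-specific):

```python
import asyncio
from mcp.server.stdio import stdio_server

async def main():
    # Serve MCP over stdin/stdout so a client (e.g., Claude Desktop) can launch it.
    async with stdio_server() as (read_stream, write_stream):
        await app.run(read_stream, write_stream, app.create_initialization_options())

if __name__ == "__main__":
    asyncio.run(main())
```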
MCP + Dremio: The Agentic Lakehouse Pattern
Dremio’s Agentic Lakehouse is designed to serve as the data backend for AI agent workflows. Dremio’s AI Semantic Layer (virtual datasets, column descriptions, pre-defined metrics) provides the context that MCP tools return to LLMs — making AI-generated queries more accurate and trustworthy.
A Dremio MCP server pattern:
1. The LLM calls `list_namespaces` → Dremio returns the available business domains.
2. The LLM calls `describe_table` with a selected table → Dremio returns the schema plus semantic descriptions.
3. The LLM generates SQL grounded in the semantic context.
4. The LLM calls `query_iceberg` → Dremio executes against Iceberg and returns results.
5. The LLM synthesizes a natural-language answer.
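From the client side, steps 1 and 2 look like the following sketch using the MCP Python SDK, assuming the example server above is saved as iceberg_mcp_server.py and also implements the `list_namespaces` and `query_iceberg` tools described earlier (the "sales"/"orders" names are illustrative):

```python
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Assumption: the example server is saved as iceberg_mcp_server.py.
params = StdioServerParameters(command="python", args=["iceberg_mcp_server.py"])

async def run():
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Step 1: discover what data domains exist.
            namespaces = await session.call_tool("list_namespaces", {})
            # Step 2: fetch the schema + semantic descriptions for a chosen table.
            schema = await session.call_tool(
                "describe_table", {"namespace": "sales", "table": "orders"}
            )
            # Steps 3-5 happen in the agent loop: the LLM writes SQL from the
            # schema, calls query_iceberg, and summarizes the results.

asyncio.run(run())
```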
MCP Clients and Compatibility
MCP is supported by:
- Claude (Anthropic): Native MCP support in Claude Desktop and the Claude API.
- GitHub Copilot / VS Code: MCP extension support.
- LangChain / LangGraph: MCP tool wrapping.
- AutoGen (Microsoft): MCP tool integration.
- Continue.dev: Local AI assistant with MCP support.
The MCP ecosystem is rapidly expanding. Any new model or agent framework that adopts MCP can immediately leverage Iceberg MCP servers — making Iceberg data available to the entire AI tooling ecosystem without per-model integration work.
Security Considerations
Iceberg MCP servers should:
- Authenticate with the catalog using service account credentials, not user credentials.
- Scope catalog access to only the namespaces/tables the AI agent should see.
- Use the catalog’s credential vending to get object-storage-scoped access for data reads.
- Log all tool calls for audit.
- Limit query complexity (timeout, row limits) to prevent resource abuse.
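A sketch of what the last three points can look like inside the tool handler; all names here (ALLOWED_NAMESPACES, MAX_ROWS, the helper functions) are illustrative assumptions, not part of the MCP or PyIceberg APIs:

```python
import logging

audit_log = logging.getLogger("iceberg_mcp.audit")

ALLOWED_NAMESPACES = {"sales", "marketing"}  # scope what the agent may see
MAX_ROWS = 1_000                             # cap result size per query

def authorize_namespace(namespace: str) -> None:
    # Deny access to any namespace outside the agent's scope.
    if namespace not in ALLOWED_NAMESPACES:
        raise PermissionError(f"namespace not permitted: {namespace}")

def enforce_read_only(sql: str) -> None:
    # Reject multi-statement input and anything other than a SELECT.
    stmt = sql.strip().rstrip(";")
    if ";" in stmt or not stmt.lower().startswith("select"):
        raise ValueError("only single SELECT statements are allowed")

def audit_tool_call(name: str, arguments: dict) -> None:
    # Record every tool call before executing it.
    audit_log.info("tool=%s arguments=%s", name, arguments)
```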