Dremio-Specific Engine & Optimizations Last updated: May 29, 2026

Dremio Arrow Flight SQL

Dremio Arrow Flight SQL is a high-performance database connectivity protocol based on Apache Arrow and gRPC. It transfers columnar query results over the network without the row-conversion overhead of legacy drivers.



Arrow Flight SQL is an open source database connectivity protocol built on Apache Arrow Flight and gRPC, and Dremio implements it as a server endpoint. It is designed to replace legacy connectivity interfaces such as Java Database Connectivity (JDBC) and Open Database Connectivity (ODBC) for bulk data retrieval.

Legacy protocols require the query engine to convert its in-memory structures into a row-based format before network transmission, and the client must then deserialize those rows on arrival. Arrow Flight SQL instead transmits columnar data buffers directly, which removes most of the CPU-intensive serialization and deserialization (serde) tax.
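The difference between the two paths can be sketched with the Python standard library alone. This is a toy illustration, not Arrow's actual memory or wire format: the two-column dataset and the function names (`send_row_based`, `send_columnar`, and so on) are hypothetical. The row-based path touches every value twice, once to encode and once to decode, while the columnar path hands over each column buffer as a single block of bytes.

```python
import array
import struct

# Columnar in-memory data: one contiguous typed buffer per column.
ids = array.array("q", range(5))                       # int64 column
prices = array.array("d", [1.5, 2.5, 3.5, 4.5, 5.5])   # float64 column


def send_row_based(ids, prices):
    """Legacy-style path: pivot columns into rows and encode row by row."""
    return b"".join(struct.pack("<qd", i, p) for i, p in zip(ids, prices))


def receive_row_based(payload):
    """Decode every row and pivot the values back into columns."""
    out_ids, out_prices = array.array("q"), array.array("d")
    for i, p in struct.iter_unpack("<qd", payload):
        out_ids.append(i)
        out_prices.append(p)
    return out_ids, out_prices


def send_columnar(ids, prices):
    """Columnar path: ship each column buffer as-is, with no per-row work."""
    return ids.tobytes(), prices.tobytes()


def receive_columnar(id_bytes, price_bytes):
    """Rebuild each column from its raw buffer in one bulk copy."""
    out_ids, out_prices = array.array("q"), array.array("d")
    out_ids.frombytes(id_bytes)
    out_prices.frombytes(price_bytes)
    return out_ids, out_prices


row_result = receive_row_based(send_row_based(ids, prices))
col_result = receive_columnar(*send_columnar(ids, prices))
```

Both paths return the same columns. The per-row loop in the first path is the work that grows with the number of rows transferred.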

The Serialization Tax and the gRPC Framework

In traditional client-server database architectures:

Server Memory (Columnar) ──[Row Conversion]──> Network Wire (Row-Based) ──[Columnar Conversion]──> Client Memory (Columnar)

This double translation wastes significant CPU cycles when transferring millions of rows. Arrow Flight SQL streamlines this process:

Server Memory (Arrow Columnar) ──[Direct gRPC Stream]──> Client Memory (Arrow Columnar)

By streaming Arrow record batches directly over gRPC (an HTTP/2-based RPC framework), the data remains in its native columnar layout from server to client.
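The idea of streaming a result as a sequence of columnar batches can be sketched as follows. This is a simplified stand-in that uses an in-memory byte stream and a made-up length-prefixed framing. It is neither the Arrow IPC format nor gRPC, and `BATCH_ROWS`, `write_batches`, and `read_batches` are hypothetical names chosen for illustration.

```python
import array
import io
import struct

BATCH_ROWS = 4  # hypothetical number of rows per batch


def write_batches(values, stream, batch_rows=BATCH_ROWS):
    """Write an int64 column as a sequence of length-prefixed chunks."""
    for start in range(0, len(values), batch_rows):
        chunk = values[start:start + batch_rows].tobytes()
        stream.write(struct.pack("<I", len(chunk)))  # 4-byte size header
        stream.write(chunk)                          # raw column bytes


def read_batches(stream):
    """Yield one column chunk at a time until the stream is exhausted."""
    while True:
        header = stream.read(4)
        if not header:
            return
        (size,) = struct.unpack("<I", header)
        batch = array.array("q")
        batch.frombytes(stream.read(size))
        yield batch


column = array.array("q", range(10))
wire = io.BytesIO()
write_batches(column, wire)
wire.seek(0)
batches = list(read_batches(wire))
```

Because the reader is a generator, a consumer can process each batch as it arrives instead of waiting for the full result. Each batch stays a contiguous typed buffer throughout.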

Core Features of Dremio’s Implementation

Dremio natively implements an Arrow Flight SQL server interface, so Flight clients can connect to it directly and receive query results as Arrow record batches.

Python Integration Example

Connecting a Python application to Dremio via Arrow Flight SQL lets analysts load the results of queries against Iceberg tables directly into pandas or Polars DataFrames. The following script, which uses the pyarrow Flight client, illustrates connection initialization and data retrieval:

from pyarrow import flight

# Define connection credentials and the Dremio Flight endpoint
host = "sql.dremio.cloud:443"
token = "my_dremio_personal_access_token"  # placeholder; substitute a real token

# Initialize the Flight client and build per-call authorization headers
client = flight.FlightClient(f"grpc+tls://{host}")
options = flight.FlightCallOptions(headers=[(b"authorization", f"Bearer {token}".encode())])

# Define a SQL query against the semantic layer
query = 'SELECT * FROM "Sales Space".analytics.orders'
descriptor = flight.FlightDescriptor.for_command(query.encode("utf-8"))

# Retrieve the Flight info, then stream the result from the first endpoint
info = client.get_flight_info(descriptor, options)
reader = client.do_get(info.endpoints[0].ticket, options)
table = reader.read_all()  # materializes the full result in memory

# Convert the Arrow table to a local pandas DataFrame
df = table.to_pandas()
print(df.head())
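The script above reads only the first endpoint and loads the whole result with `read_all()`. For larger results, one possible variation is to walk every endpoint in the `FlightInfo` and consume record batches one at a time. The helper below is a hedged sketch: `iter_record_batches` is a hypothetical name, and whether a given Dremio deployment returns more than one endpoint is an assumption that should be checked against the actual response.

```python
def iter_record_batches(client, info, options=None):
    """Yield record batches from every endpoint listed in a FlightInfo."""
    for endpoint in info.endpoints:
        reader = client.do_get(endpoint.ticket, options)
        for chunk in reader:
            # Each chunk carries a record batch in .data; skip chunks
            # that hold only metadata (where .data is None).
            if chunk.data is not None:
                yield chunk.data


# Usage with the objects from the script above (requires a live server):
# for batch in iter_record_batches(client, info, options):
#     print(batch.num_rows)
```

Processing batch by batch keeps peak memory closer to the size of a single batch than to the size of the full result.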

