
Connect Dremio Software to Dremio Cloud: Hybrid Federation Across Deployments

Published: at 05:00 AM

Dremio Cloud can connect to Dremio Software (self-managed) instances as a federated data source. This creates a hybrid deployment where Dremio Cloud serves as the primary query interface while accessing datasets managed by Dremio Software instances running in your own data centers or private cloud.

This connector is designed for organizations that have existing Dremio Software deployments and are adopting Dremio Cloud for new workloads, or that need to federate data across a cloud-managed Dremio platform and on-premises Dremio instances.

Why Connect Dremio Software to Dremio Cloud

Hybrid Federation

Your Dremio Software instance manages on-premises data sources — Oracle databases, SQL Server, network-attached file storage, and internal data lakes. Dremio Cloud manages cloud-native sources — S3, BigQuery, Snowflake, and cloud-hosted databases. By connecting Dremio Software to Dremio Cloud, you can write a single SQL query that joins on-premises data (through Dremio Software) with cloud data (through Dremio Cloud).

-- Join on-premises data via Dremio Software with cloud data in Dremio Cloud
SELECT
  cloud.customer_name,
  cloud.cloud_revenue,
  onprem.erp_balance,
  onprem.last_payment_date,
  CASE
    WHEN cloud.cloud_revenue > 100000 AND onprem.erp_balance < 5000 THEN 'Good Standing'
    WHEN onprem.erp_balance > 50000 THEN 'At Risk'
    ELSE 'Standard'
  END AS account_health
FROM analytics.gold.cloud_customers cloud
JOIN "dremio-onprem".onprem.erp_accounts onprem ON cloud.customer_id = onprem.customer_id
ORDER BY cloud.cloud_revenue DESC;

Incremental Cloud Migration

Organizations don’t shut down on-premises data centers overnight. Connecting Dremio Software to Dremio Cloud lets you:

  1. Start using Dremio Cloud for new cloud-native workloads
  2. Continue using Dremio Software for on-premises sources
  3. Federate across both from a single Dremio Cloud interface
  4. Gradually migrate data sources from Software to Cloud as on-premises systems are decommissioned

Consolidated Governance

Users access both on-premises and cloud data through Dremio Cloud’s interface. Dremio Cloud’s governance policies (column masking, row-level filtering) apply to the federated view of data, providing a single governance layer across all data.

Prerequisites

Step-by-Step: Connect Dremio Software to Dremio Cloud

1. Add the Source

Click “+” in the Dremio Cloud console and select Dremio from the source types.

2. Configure Connection

3. Set Authentication

Provide credentials for a Dremio Software user account. Consider creating a dedicated service account with appropriate permissions:
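As a rough sketch, a least-privilege setup might grant the service account read access only to the datasets Dremio Cloud needs. These statements run on the Dremio Software instance, not in Dremio Cloud. The account name svc_dremio_cloud is hypothetical. The sketch assumes an edition of Dremio Software with privilege management enabled and that the two hr datasets are views, so adjust object types and paths to your environment.

```sql
-- Run on the Dremio Software instance.
-- svc_dremio_cloud is a hypothetical service account name.
-- Assumes hr.department_summary and hr.employees are views.
GRANT SELECT ON VIEW hr.department_summary TO USER svc_dremio_cloud;
GRANT SELECT ON VIEW hr.employees TO USER svc_dremio_cloud;
```

Granting per dataset rather than on a whole space keeps the account from reaching data that was never meant to be federated.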

4. User Impersonation

User impersonation allows Dremio Cloud to pass the identity of the requesting user to Dremio Software. When enabled, queries executed through Dremio Cloud run with the permissions of the authenticated user on the Dremio Software side. This preserves your existing Dremio Software access control policies.

Without impersonation, all Cloud queries execute as the service account configured in the connection, which may have broader access than individual users should have.

5. Configure Advanced Settings

Set the Reflection refresh and metadata refresh intervals, add any connection properties you need, and click Save.

Querying Across Deployments

-- Query on-premises data through Dremio Software
SELECT
  department,
  employee_count,
  avg_salary
FROM "dremio-onprem".hr.department_summary;

-- Join on-premises HR data with cloud-native analytics
SELECT
  d.department,
  d.employee_count,
  d.avg_salary,
  c.department_cloud_spend,
  -- NULLIF guards against division by zero for departments with no employees
  ROUND(c.department_cloud_spend / NULLIF(d.employee_count, 0), 2) AS cloud_cost_per_employee
FROM "dremio-onprem".hr.department_summary d
JOIN analytics.gold.cloud_infrastructure_costs c ON d.department = c.department
ORDER BY cloud_cost_per_employee DESC NULLS LAST;

Build a Semantic Layer Across Deployments

CREATE VIEW analytics.gold.enterprise_360 AS
SELECT
  onprem.employee_id,
  onprem.employee_name,
  onprem.department,
  onprem.office_location,
  cloud.cloud_account_id,
  cloud.monthly_cloud_spend,
  CASE
    WHEN cloud.monthly_cloud_spend > 10000 THEN 'Heavy Cloud User'
    WHEN cloud.monthly_cloud_spend > 1000 THEN 'Moderate'
    ELSE 'Light'
  END AS cloud_usage_tier
FROM "dremio-onprem".hr.employees onprem
LEFT JOIN analytics.gold.cloud_accounts cloud ON onprem.employee_id = cloud.owner_id;

In the Catalog, click Edit, then Generate Wiki and Generate Tags.

AI-Powered Analytics Across Deployments

Dremio AI Agent

The AI Agent lets users ask questions spanning both on-premises and cloud data: “Which departments have the highest cloud cost per employee?” or “Show me heavy cloud users in the engineering department.” The Agent reads your semantic layer’s wiki descriptions and generates SQL that joins across both Dremio deployments.
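For illustration only, the second question might translate into something like the query below against the enterprise_360 view defined earlier. The SQL the Agent actually generates will differ, and the literal 'engineering' is an assumption about how the department is stored.

```sql
-- Sketch of SQL for "Show me heavy cloud users in the engineering department".
-- Assumes the analytics.gold.enterprise_360 view from the semantic layer section.
SELECT
  employee_name,
  office_location,
  monthly_cloud_spend
FROM analytics.gold.enterprise_360
WHERE cloud_usage_tier = 'Heavy Cloud User'
  AND LOWER(department) = 'engineering'  -- assumed department value
ORDER BY monthly_cloud_spend DESC;
```

Because the view already joins the two deployments, the generated query only has to filter and sort.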

Dremio MCP Server

Connect Claude or ChatGPT to your federated data:

  1. Create a Native OAuth app in Dremio Cloud
  2. Configure redirect URLs for your AI client
  3. Connect via mcp.dremio.cloud/mcp/{project_id}

A CTO asks Claude “Compare cloud infrastructure costs per department with on-premises headcount” and gets insights spanning both deployment models.

AI SQL Functions

-- Classify departments by cloud optimization potential
SELECT
  department,
  employee_count,
  cloud_cost_per_employee,
  AI_CLASSIFY(
    'Based on cloud spending patterns, classify optimization potential',
    'Department: ' || department || ', Employees: ' || CAST(employee_count AS VARCHAR) || ', Cloud Cost/Employee: $' || CAST(cloud_cost_per_employee AS VARCHAR),
    ARRAY['Well Optimized', 'Room for Improvement', 'Over-Provisioned', 'Needs Audit']
  ) AS optimization_status
FROM (
  SELECT
    d.department,
    d.employee_count,
    ROUND(c.department_cloud_spend / NULLIF(d.employee_count, 0), 2) AS cloud_cost_per_employee
  FROM "dremio-onprem".hr.department_summary d
  JOIN analytics.gold.cloud_infrastructure_costs c ON d.department = c.department
);

Important Considerations

Network Latency

Cross-network queries between Dremio Cloud and on-premises Dremio Software add network latency. Optimize by:

Cloud Egress Costs

Data returned from Dremio Software to Dremio Cloud may incur cloud egress charges if the Software instance runs in a different network or cloud provider. Strategies to minimize egress:
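One general way to reduce the data that crosses the network is to select only the columns you need and filter the on-premises side before joining. The sketch below reuses the tables from the first example. Whether the filter is actually evaluated on the Dremio Software side depends on what the connector pushes down, so treat that as an assumption and confirm it in the query profile.

```sql
-- Narrow the on-premises side before the join so fewer rows and columns
-- need to travel from Dremio Software to Dremio Cloud.
-- Pushdown of the WHERE clause to Dremio Software is assumed, not guaranteed.
SELECT
  c.customer_id,
  c.customer_name,
  c.cloud_revenue,
  o.erp_balance
FROM analytics.gold.cloud_customers c
JOIN (
  SELECT customer_id, erp_balance
  FROM "dremio-onprem".onprem.erp_accounts
  WHERE erp_balance > 50000
) o ON c.customer_id = o.customer_id;
```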

Version Compatibility

Keep Dremio Software at version 24.0 or later for best compatibility with Dremio Cloud. Older versions may have limited feature support through the federation connector.

Security

Monitoring and Troubleshooting

Monitor the health and performance of your hybrid deployment:

Track these metrics to ensure your hybrid architecture delivers consistent performance as usage grows.

Governance Across Deployments

Dremio Cloud’s Fine-Grained Access Control (FGAC) applies governance to the federated view of data:
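As a minimal sketch, column masking and row-level filtering could be attached to the enterprise_360 view defined earlier. The role names hr_admin and engineering_managers, the function names, and the 'Engineering' value are hypothetical. The policy syntax follows Dremio's documented UDF-based policies, but check the exact form supported by your Dremio Cloud edition.

```sql
-- Column masking: only members of hr_admin (hypothetical role) see real names.
CREATE FUNCTION mask_employee_name (name_val VARCHAR)
  RETURNS VARCHAR
  RETURN SELECT CASE WHEN is_member('hr_admin') THEN name_val ELSE 'REDACTED' END;

ALTER VIEW analytics.gold.enterprise_360
  MODIFY COLUMN employee_name SET MASKING POLICY mask_employee_name (employee_name);

-- Row-level filtering: engineering_managers (hypothetical role) see only
-- Engineering rows, while hr_admin sees every row.
CREATE FUNCTION restrict_department (dept VARCHAR)
  RETURNS BOOLEAN
  RETURN SELECT is_member('hr_admin')
    OR (is_member('engineering_managers') AND dept = 'Engineering');

ALTER VIEW analytics.gold.enterprise_360
  ADD ROW ACCESS POLICY restrict_department (department);
```

The policies are defined on the Cloud-side view, so they apply to whatever the view returns, including the columns that originate in Dremio Software.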

Connect BI Tools via Arrow Flight

BI tools connected to Dremio Cloud via Arrow Flight access both cloud and on-premises data through a single connection:

All queries benefit from Reflections, governance, and the semantic layer — regardless of where the source data resides.

VS Code Copilot Integration

Dremio’s VS Code extension with Copilot integration enables developers to query federated data from their IDE. Ask Copilot “Compare cloud costs per department with on-premises headcount” and it generates SQL using your semantic layer that spans both deployments.

Reflections for Hybrid Optimization

Create Reflections on hybrid views to cache cross-deployment query results:

  1. Build views that join Cloud and Software data
  2. Create Reflections on those views
  3. Set refresh intervals based on how frequently the underlying on-premises data changes

After creation, dashboard queries that span both deployments and match a Reflection are served from the Reflection stored in Dremio Cloud, so repeat queries avoid the cross-network round trip until the next refresh.
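The steps above could look roughly like this for the enterprise_360 view defined earlier. The Reflection name is hypothetical, and the choice of dimensions and measures is an assumption about which dashboards you want to accelerate. Depending on your Dremio version, the statement may need ALTER DATASET or ALTER TABLE instead of ALTER VIEW.

```sql
-- Aggregate Reflection on a view that joins Cloud and Software data.
-- enterprise_360_by_dept is a hypothetical Reflection name.
ALTER VIEW analytics.gold.enterprise_360
  CREATE AGGREGATE REFLECTION enterprise_360_by_dept
  USING
    DIMENSIONS (department, cloud_usage_tier)
    MEASURES (monthly_cloud_spend (SUM, COUNT));
```

A query that groups by department or cloud_usage_tier and sums monthly_cloud_spend is then a candidate for acceleration by this Reflection.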

Migration Planning: Software to Cloud

Use the Dremio-to-Dremio connector as a migration bridge:

  1. Phase 1 — Federation: Connect Dremio Software to Dremio Cloud. All existing Software views remain accessible from Cloud.
  2. Phase 2 — Parallel Development: Build new views and Reflections in Dremio Cloud while continuing to maintain Software views.
  3. Phase 3 — Source Migration: Gradually move individual data sources (PostgreSQL, Oracle, S3) from Software connections to Cloud connections. Update views to reference Cloud-native sources.
  4. Phase 4 — Decommission: Once all sources are connected to Cloud, remove the Dremio Software connection.

During the migration, users experience no disruption — they continue querying through Dremio Cloud while the underlying sources are being transitioned.
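As a sketch of Phase 3, a view can be redefined in place so that it reads from a source connected directly to Dremio Cloud, while keeping its name and columns stable for consumers. The source path hr_postgres.public.employees is hypothetical, and the sketch assumes the migrated table exposes the same columns as the original.

```sql
-- Repoint enterprise_360 from the Dremio Software connection to a
-- Cloud-connected source. hr_postgres.public.employees is a hypothetical path.
CREATE OR REPLACE VIEW analytics.gold.enterprise_360 AS
SELECT
  onprem.employee_id,
  onprem.employee_name,
  onprem.department,
  onprem.office_location,
  cloud.cloud_account_id,
  cloud.monthly_cloud_spend,
  CASE
    WHEN cloud.monthly_cloud_spend > 10000 THEN 'Heavy Cloud User'
    WHEN cloud.monthly_cloud_spend > 1000 THEN 'Moderate'
    ELSE 'Light'
  END AS cloud_usage_tier
FROM hr_postgres.public.employees onprem  -- was "dremio-onprem".hr.employees
LEFT JOIN analytics.gold.cloud_accounts cloud ON onprem.employee_id = cloud.owner_id;
```

Dashboards and downstream views that reference analytics.gold.enterprise_360 do not need to change, because only the FROM clause moves.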

Common Deployment Architectures

Hub-and-Spoke Model

Dremio Cloud serves as the central hub, with multiple Dremio Software instances as spokes. Each spoke manages a specific data center or business unit:

Dremio Cloud federates across all spokes, providing a single analytics interface for the entire organization.
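A hub query across spokes might look like the sketch below. The source names "dremio-emea" and "dremio-apac" are hypothetical, and the sketch assumes each spoke exposes an hr.department_summary dataset with the same columns as the one queried earlier.

```sql
-- Combine the same dataset from two hypothetical spokes into one result.
SELECT 'EMEA' AS region, department, employee_count, avg_salary
FROM "dremio-emea".hr.department_summary
UNION ALL
SELECT 'APAC' AS region, department, employee_count, avg_salary
FROM "dremio-apac".hr.department_summary;
```

Wrapping a query like this in a view in Dremio Cloud gives analysts one dataset to query regardless of which spoke holds the rows.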

Staged Migration Model

For organizations migrating to the cloud in waves:

Disaster Recovery Model

Dremio Software serves as a fallback if Cloud connectivity is temporarily unavailable. On-premises critical workloads run against Software; Cloud handles all other analytics. This architecture provides business continuity for mission-critical dashboards and reports.

Performance Best Practices

Maximize hybrid performance with these strategies:

Get Started

Organizations can seamlessly federate across Dremio deployments, enable AI analytics on combined on-premises and cloud data, and migrate incrementally to the cloud — all while maintaining unified governance. The Dremio-to-Dremio connector is the bridge that makes hybrid lakehouse analytics practical.

Whether you’re running a single Dremio Software instance in one data center or managing multiple Software installations across global facilities, Dremio Cloud provides a unified analytical interface. Combine the raw data processing power of on-premises Dremio Software with the AI capabilities, Reflections, and managed infrastructure of Dremio Cloud. The result is a truly hybrid analytics platform that grows with your cloud migration at whatever pace your organization requires. No rip-and-replace, no big-bang migration — just a gradual, governed transition that protects your existing investments.

Try Dremio Cloud free for 30 days and connect your existing Dremio Software instances.