
Comprehensive Hands-on Walk Through of Dremio Cloud Next Gen (Hands-on with Free Trial)

Published at 09:00 AM

On November 13, at the Subsurface Lakehouse Conference in New York City, Dremio announced and released Dremio Next Gen Cloud, the most complete and accessible version of its Lakehouse Platform to date. This release advances Dremio’s mission to make data lakehouses easy, fast, and affordable for organizations of any size.

This tutorial offers a hands-on introduction to Dremio and walks through the new free trial experience. With managed storage and no need to connect your own infrastructure or enter a credit card (until you want to), you can explore the full platform, including new AI features, Autonomous Performance Management, and the integrated lakehouse catalog, right away.

What is Dremio?

Dremio is a Data Lakehouse Platform for the AI Era. Let's explore what this means.

What is a Data Lakehouse?

A data lakehouse is an architecture that uses your data lake (object storage or Hadoop) as the primary data store for flexibility and openness, then adds two layers to operationalize it like a data warehouse:

an open table format, such as Apache Iceberg, that brings warehouse-like capabilities (ACID transactions, schema evolution, and time travel) to the files in your lake
a catalog that tracks those tables so any engine can discover, govern, and query them

Dremio is designed to unify these modular lakehouse components into a seamless experience. Unlike platforms that treat Iceberg as an add-on to proprietary formats, Dremio is built to be natively Iceberg-first—delivering a warehouse-like experience without vendor lock-in.

The Challenges of the Data Lakehouse

While lakehouses offer the benefit of serving as a central source of truth across tools, they come with practical challenges during implementation.

How Dremio’s Platform Supports the Lakehouse

Dremio simplifies many of these challenges with a platform that makes your lakehouse feel like it "just works." It does this through several powerful features:

a wide range of source connectors, so you can federate queries across databases, object storage, and other Iceberg catalogs
an integrated, Apache Polaris-based lakehouse catalog for your Apache Iceberg tables
a semantic layer where you model data logically with SQL views instead of copies
autonomous performance management, with reflections that accelerate queries without manual tuning
an integrated AI agent and SQL AI functions

Dremio aims to provide a familiar and easy SQL interface to all your data.
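
To make that concrete, here is a hypothetical sketch of a federated query: it joins an Iceberg table in the integrated catalog with a table from a connected Postgres source in a single statement (the source, namespace, and table names are all illustrative):

-- "postgres_crm" is assumed to be a connected Postgres source, and
-- "dremio.sales.orders" an Iceberg table in the integrated catalog.
SELECT c.customer_name,
       SUM(o.amount) AS lifetime_value
FROM postgres_crm.public.customers c
JOIN dremio.sales.orders o ON o.customer_id = c.id
GROUP BY c.customer_name;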

Registering For Dremio Trial

To get started with your Dremio Trial, head over to the Getting Started Page and create a new account with your preferred method.

Getting Started Page with Dremio

If you sign up with Google, Microsoft, or GitHub you'll be all set right after authenticating; if you sign up with your email you'll receive an email to confirm your registration.

Confirmation Email

When you create a new Dremio account, it automatically creates a new Organization, which can contain multiple Projects. The organization will be assigned a default name, which you can change later.

On the next screen, you’ll name your first project. This initial project will use Dremio’s managed storage as the default storage for the lakehouse catalog.

If you prefer to use your own data lake as catalog storage, you can create a new project when you’re ready. Currently, only Amazon S3 is supported for custom catalog storage, with additional options coming soon.

Even though S3 is the only supported option for Dremio Catalog storage at the moment, Dremio still allows you to connect to other Iceberg catalogs backed by any cloud storage solution, and to data lakes, using its wide range of source connectors.

Choosing your Dremio Region and Project Name

Now you’ll be on your Dremio Dashboard where you’ll wait a few minutes for your organization to be provisioned.

Provisioning of Dremio Project

Once the environment is provisioned you'll see several options, including a chat box for working with the new integrated Dremio AI Agent, which we will revisit later in this tutorial.

The Dremio environment is now active!

One of the best ways to get started is adding data to Dremio: click "Add Data" and a window will open where you can either:

upload files directly into your catalog, or
connect an external source, such as a database, object storage, or another Iceberg catalog

If you're looking for sample files to upload, Kaggle is always a good place to find datasets to play with.

For this tutorial, though, let's use SQL to create tables in Dremio Catalog, insert records into those tables, and then query them.
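
As a minimal sketch of that flow (the table name is hypothetical, and this assumes the dremio namespace we'll create in the next section):

-- Create a small Iceberg table, insert a couple of rows, then query it.
CREATE TABLE IF NOT EXISTS dremio.demo_orders (
    order_id  INT,
    customer  VARCHAR,
    amount    DOUBLE
);

INSERT INTO dremio.demo_orders VALUES
    (1, 'Alice', 120.50),
    (2, 'Bob', 75.00);

SELECT customer, SUM(amount) AS total_spent
FROM dremio.demo_orders
GROUP BY customer;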

Curating Your Lakehouse

Let's visit the dataset explorer to see how you'll navigate your integrated catalog and other data sources. Click the second icon from the top in the menu on the left of the screen, the one that looks like a table; this will take you to the dataset explorer.

Dremio's Navigation Menu

In the dataset explorer you'll see two sections:

your integrated lakehouse catalog, where your namespaces, tables, and views live
your connected sources, such as external databases, object storage, and other Iceberg catalogs

Click the plus sign next to "namespaces" and add a new namespace called "dremio"; this is necessary to run the SQL scripts I provide without needing to modify them.

Adding a new namespace

Now you'll see the new dremio namespace, and within it we can create new tables and views. You may also notice there is already a sample data namespace, which includes a variety of sample datasets you can experiment with.
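
If you'd rather stay in SQL, the same namespace can most likely be created with the CREATE FOLDER command you'll see again later in this tutorial (treat this as an assumption and verify against the docs):

CREATE FOLDER IF NOT EXISTS dremio;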

The new namespace has been added to Dremio

Running Some SQL

Now head over to the "SQL Runner," a full SQL IDE built right into the Dremio experience, with autocomplete, syntax highlighting, function lookup, the typical IDE shortcuts, and much more. It is accessed by clicking the third menu icon, which looks like a mini terminal window.

The Dremio SQL Runner

Take a moment to try a few of these features as you get oriented in the editor.

There is a lot more to learn about the SQL Runner, but let's go ahead and run some SQL. I've written several SQL scripts you can copy into the SQL Runner and run as is. Choose any of them, copy it in, and run it. Give the code a look over; the comments explain what it is doing.

The SQL for the majority of these examples follows a similar pattern:

create physical Iceberg tables in the dremio namespace and insert sample records into them
layer SQL views on top of those tables to clean, join, and aggregate the data (the supply chain example organizes these into bronze, silver, and gold layers)
query the views to answer analytical questions

This represents a very typical way of using Dremio: you model your datasets not by replicating data but logically, with SQL views. Dremio's autonomous reflections feature observes how these views are queried and dynamically determines which views should be materialized into Dremio's reflection cache, without anyone having to lift a finger, keeping everything performant while staying frugal with storage and compute. A data engineer can also manually trigger the creation of a reflection, and Dremio will assign that reflection a score to help you understand whether it is providing value; we'll revisit this when we go over Dremio's settings UI.
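
For illustration, manually defining an aggregate reflection on the supply chain example's gold view would look roughly like the statement below. This is a sketch based on Dremio's reflection DDL, so check the SQL reference for the exact syntax:

-- Pre-aggregate per-supplier rollups for the gold KPI view.
ALTER TABLE dremio.supply_chain.gold.daily_supplier_delivery_kpis
CREATE AGGREGATE REFLECTION supplier_kpis_agg
USING DIMENSIONS ("supplier_id")
MEASURES ("otif_rate" (SUM, COUNT), "gross_fill_rate" (SUM, COUNT));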

Two of the examples do something a little different.

AI Visualizations

Now that you've added some data to your lakehouse by running some of my example scripts, let me show you some of the benefits of Dremio's new integrated AI Agent.

If you are planning to build long-lived BI dashboards, you should still leverage the integrations most BI tools like Tableau and Power BI have with Dremio. But if you just need to see something visual quickly, or need something to screenshot into a presentation, the AI Agent can be quite helpful.

So let's say I run the supply chain example; I could give the AI Agent a prompt like:

Can you create a bar chart based on the data in the gold layer of my supply chain data

The AI Agent will pull the list of tables your particular user has access to and see if it finds the dataset I loosely referred to. It will then pull the schema and other data from the semantic layer to better understand the dataset before deciding how best to visualize it. But of course my prompt was vague, so the AI Agent decides to confirm some details with me:

I can create that bar chart — quick clarifying questions before I run the query
I inspected your gold layer and found the view dremio.supply_chain.gold.daily_supplier_delivery_kpis with these columns:

supplier_id (string)
receipt_date (date)
lines_received (int)
otif_rate (double)
on_time_rate (double)
in_full_rate (double)
total_received_qty (int)
total_order_qty (int)
gross_fill_rate (decimal)
To produce the bar chart I need to know which metric and timeframe you want visualized. Common options:

Average OTIF per supplier (bar per supplier showing average “otif_rate” across the dataset)
Average gross fill rate per supplier (bar per supplier showing average “gross_fill_rate”)
Total lines received per supplier for a specific date or date range (sum of “lines_received”)
Side-by-side bars per supplier showing two metrics (e.g., OTIF and gross_fill_rate) — I can produce grouped bars if you confirm.
Suggested defaults (if you don’t specify):

Metric: avg(otif_rate)
Timeframe: all available dates
Chart: vertical bar, supplier on x-axis, metric on y-axis, ordered by metric desc
If that works, I will run this SQL against the gold view and build the bar chart:

SELECT "supplier_id",
       AVG("otif_rate") AS "avg_otif_rate",
       AVG("gross_fill_rate") AS "avg_gross_fill_rate"
FROM "first-project"."dremio"."supply_chain"."gold"."daily_supplier_delivery_kpis"
-- optionally add: WHERE "receipt_date" BETWEEN DATE '2025-08-01' AND DATE '2025-08-31'
GROUP BY "supplier_id"
ORDER BY "avg_otif_rate" DESC

Please confirm:

Which metric? (otif_rate, gross_fill_rate, lines_received, total_received_qty, etc.)
Timeframe (all data, a date, or a date range). If you say a month/week/day, I will assume the current year unless you specify otherwise.
Chart type (single bar, grouped bars for two metrics).
After you confirm I will run the query and produce the bar chart for you.

I respond with:

1. Average OTIF per supplier
2. All data
3. Vertical Bar

It then explains its thought process to me, illustrating where it needed more detail. I notice at this point that it is including the project name first-project in the table path, which is not necessary since the project is assumed; the path should start with the namespace dremio. I tell it this and get back my visualization.

Generating a Visualization with the Dremio AI Agent

We started with quite a vague question, but by working with the AI we were able to get a visualization of a useful metric within a few minutes.

AI Functions

Using your data to create visualizations isn't the only cool AI integration in the Dremio arsenal. Dremio has also added a variety of new SQL AI functions that let you do things like turn unstructured data into structured data. Let's see a very simple example you can run right in your SQL Runner, assuming you have a dremio namespace.

-- Create the recipes table with an ARRAY column for ingredients (sample rows)
-- Note: this uses CREATE TABLE AS SELECT to create a physical table with sample data.
CREATE FOLDER IF NOT EXISTS dremio.recipes;
CREATE TABLE IF NOT EXISTS dremio.recipes.recipes AS
SELECT 1 AS "id",
       'Mild Salsa' AS "name",
       ARRAY['tomato','onion','cilantro','jalapeno','lime'] AS "ingredients",
       CURRENT_TIMESTAMP AS "created_at"
UNION ALL
SELECT 2, 'Medium Chili', ARRAY['beef','tomato','onion','chili powder','cumin','jalapeno'], CURRENT_TIMESTAMP
UNION ALL
SELECT 3, 'Spicy Vindaloo', ARRAY['chicken','chili','ginger','garlic','vinegar','habanero'], CURRENT_TIMESTAMP;

-- Create View where AI is used to classify each recipe as Mild, Medium or Spicy
CREATE OR REPLACE VIEW dremio.recipes.recipes_enhanced AS
SELECT id,
       name,
       ingredients,
       AI_CLASSIFY('Identify the Spice Level:' || ARRAY_TO_STRING(ingredients, ','),
                   ARRAY['mild', 'medium', 'spicy']) AS spice_level
FROM   dremio.recipes.recipes;
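
Once the view is in place, selecting from it runs the classification for each row:

SELECT id, name, spice_level FROM dremio.recipes.recipes_enhanced;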

The Dremio AI Functions

With these AI functions you can also pull data from JSON files or folders of images to generate structured datasets. Imagine taking a folder of scanned paper applications and turning them into an Iceberg table with all the right fields by having the AI read the images; that's the kind of use case these functions make possible.

Dremio Jobs Pane

Want to see what queries are coming in, or investigate why a query failed or took longer than expected? The Dremio Jobs pane, the next option on the left menu, lets you see all your jobs and click into them for exhaustive detail on how they were processed.

Dremio Job Pane

Dremio Settings

If you click on the last menu item, the gear, you'll get two options: Project Settings and Organization Settings.

Project Settings

Dremio Project Settings

NOTE: SQL can be sent to Dremio for execution outside of Dremio’s UI using JDBC, ODBC, Apache Arrow Flight and Dremio’s REST API. Refer to docs.dremio.com for documentation on how to leverage these interfaces.

Also in project settings you'll find sections covering the rest of your project-level configuration, including the reflections overview where you can see the reflection scores mentioned earlier.

Organization Settings

Under Organization settings you'll find organization-wide controls, such as managing users, roles, and billing across your projects.

User Settings

At the very bottom left corner there is a button for the individual user's settings. The main uses for this are switching between dark and light mode and creating personal access tokens (PATs) for authenticating external clients.

Granting Access

Once you create new non-admin users in your Dremio org, they'll have no access to anything, so you'll need to grant them precise access to particular projects, namespaces, folders, sources, and so on.

While you can do this for an individual user, it will likely be easier to create roles so you can grant access to groups of users at once. Below is an example of the kind of SQL you might use to grant a new user access to a single namespace, followed by a role-based sketch.

-- Give Permissions to project
GRANT SELECT, VIEW REFLECTION, VIEW JOB HISTORY, USAGE, MONITOR,
       CREATE TABLE, INSERT, UPDATE, DELETE, DROP, ALTER, EXTERNAL QUERY, ALTER REFLECTION, OPERATE
ON PROJECT
TO USER "alphatest2user@alexmerced.com";

-- Give Permissions to Namespace in Catalog
GRANT ALTER, USAGE, SELECT, WRITE, DROP ON FOLDER "dremio" TO USER "alphatest2user@alexmerced.com";

-- Give Permissions to a Folder in the Namespace
GRANT ALTER, USAGE, SELECT, WRITE, DROP ON FOLDER dremio.recipes TO USER "alphatest2user@alexmerced.com";
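
And here is a sketch of the role-based approach (the role name is illustrative): grant privileges to the role once, then add users to it.

-- Create a role, grant it access to the namespace, then add the user to the role
CREATE ROLE analysts;
GRANT ALTER, USAGE, SELECT, WRITE, DROP ON FOLDER "dremio" TO ROLE analysts;
GRANT ROLE analysts TO USER "alphatest2user@alexmerced.com";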

Connecting your Dremio Catalog to Other Engines Like Spark

Now you can connect to the Dremio Platform using JDBC/ODBC/ADBC/Arrow Flight/REST and send SQL for Dremio to execute, which I hope you take full advantage of. Sometimes, though, you are sharing a dataset in your catalog with someone who wants to use their own preferred compute tool. Dremio Catalog, being Apache Polaris based, supports the Apache Iceberg REST catalog spec, meaning pretty much any tool that supports Apache Iceberg can connect to it. Below is an example of how you'd connect from Spark.

Run a local Spark environment using the following command:

docker run -p 8888:8888 -e DREMIO_PAT={YOUR PAT TOKEN} alexmerced/spark35nb:latest

Then use the following code to run Spark against Dremio Catalog (keep in mind the CATALOG_NAME variable should match your project name).

import os
import pyspark
from pyspark.sql import SparkSession
from pyspark.sql import Row

# Dremio Cloud catalog endpoints; only the PAT is read from an environment variable
DREMIO_CATALOG_URI = "https://catalog.dremio.cloud/api/iceberg"
DREMIO_AUTH_URI = "https://login.dremio.cloud/oauth/token"
DREMIO_PAT = os.environ.get('DREMIO_PAT')
CATALOG_NAME = "first-project" # should be project name

if not DREMIO_PAT:
    raise ValueError("Please set the DREMIO_PAT environment variable.")

# Configure Spark session with Iceberg and Dremio catalog settings
conf = (
    pyspark.SparkConf()
        .setAppName('DremioIcebergSparkApp')
        # Required external packages: the Iceberg Spark runtime, Dremio's OAuth2 auth manager,
        # and the FileIO bundle for your cloud storage (iceberg-aws-bundle here; swap in
        # org.apache.iceberg:iceberg-azure-bundle or iceberg-gcp-bundle for other clouds)
        .set('spark.jars.packages', 'org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.9.2,com.dremio.iceberg.authmgr:authmgr-oauth2-runtime:0.0.5,org.apache.iceberg:iceberg-aws-bundle:1.9.2')
        # Enable Iceberg Spark extensions
        .set('spark.sql.extensions', 'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions')
        # Define Dremio catalog configuration using RESTCatalog
        .set('spark.sql.catalog.dremio', 'org.apache.iceberg.spark.SparkCatalog')
        .set('spark.sql.catalog.dremio.catalog-impl', 'org.apache.iceberg.rest.RESTCatalog')
        .set('spark.sql.catalog.dremio.uri', DREMIO_CATALOG_URI)
        .set('spark.sql.catalog.dremio.warehouse', CATALOG_NAME)  # must match your Dremio project name
        .set('spark.sql.catalog.dremio.cache-enabled', 'false')
        .set('spark.sql.catalog.dremio.header.X-Iceberg-Access-Delegation', 'vended-credentials')
        # Configure OAuth2 authentication using PAT
        .set('spark.sql.catalog.dremio.rest.auth.type', 'com.dremio.iceberg.authmgr.oauth2.OAuth2Manager')
        .set('spark.sql.catalog.dremio.rest.auth.oauth2.token-endpoint', DREMIO_AUTH_URI)
        .set('spark.sql.catalog.dremio.rest.auth.oauth2.grant-type', 'token_exchange')
        .set('spark.sql.catalog.dremio.rest.auth.oauth2.client-id', 'dremio')
        .set('spark.sql.catalog.dremio.rest.auth.oauth2.scope', 'dremio.all')
        .set('spark.sql.catalog.dremio.rest.auth.oauth2.token-exchange.subject-token', DREMIO_PAT)
        .set('spark.sql.catalog.dremio.rest.auth.oauth2.token-exchange.subject-token-type', 'urn:ietf:params:oauth:token-type:dremio:personal-access-token')
)

# Initialize Spark session
spark = SparkSession.builder.config(conf=conf).getOrCreate()
print("✅ Spark session connected to Dremio Catalog.")

# Step 1: Create a namespace (schema) in the Dremio catalog
spark.sql("CREATE NAMESPACE IF NOT EXISTS dremio.db")
# spark.sql("CREATE NAMESPACE IF NOT EXISTS dremio.db.test1")
print("✅ Namespaces Created")

# Step 2: Create sample Iceberg tables in the Dremio catalog
spark.sql("""
CREATE TABLE IF NOT EXISTS dremio.db.customers (
    id INT,
    name STRING,
    email STRING
)
USING iceberg
""")

spark.sql("""
CREATE TABLE IF NOT EXISTS dremio.db.orders (
    order_id INT,
    customer_id INT,
    amount DOUBLE
)
USING iceberg
""")

print("✅ Tables Created")

# Step 3: Insert sample data into the tables
customers_data = [
    Row(id=1, name="Alice", email="alice@example.com"),
    Row(id=2, name="Bob", email="bob@example.com")
]

orders_data = [
    Row(order_id=101, customer_id=1, amount=250.50),
    Row(order_id=102, customer_id=2, amount=99.99)
]

print("✅ Dataframes Generated")

customers_df = spark.createDataFrame(customers_data)
orders_df = spark.createDataFrame(orders_data)

customers_df.writeTo("dremio.db.customers").append()
orders_df.writeTo("dremio.db.orders").append()

print("✅ Tables created and sample data inserted.")

Conclusion

Dremio Next Gen Cloud represents a major leap forward in making the data lakehouse experience seamless, powerful, and accessible. Whether you’re just beginning your lakehouse journey or modernizing a complex data environment, Dremio gives you the tools to work faster and smarter—with native Apache Iceberg support, AI-powered features, and a fully integrated catalog.

From federated queries across diverse sources to autonomous performance tuning, Dremio abstracts away the operational headaches so you can focus on delivering insights. And with built-in AI capabilities, you’re not just managing data—you’re unlocking its full potential.

If you haven’t already, sign up for your free trial and start building your lakehouse—no infrastructure or credit card required.

The next generation of analytics is here. Time to explore what’s possible.