AI & Enterprise AI · 19 November 2024 · 7 min read

AI in Data Engineering — Where the Workflow Actually Changes

AI assistance in data engineering is producing real productivity gains in narrow places and overhyped claims in others. A practitioner view of where data engineers should actually adopt AI in 2024.

By Intellectual AI Engineering Practice · Collective byline

Data engineering teams have been working through AI assistance the same way software engineering teams did a year earlier. The pattern is similar — broad capability, narrower productive use, real but measured impact. The data engineering work that AI actually accelerates is more specific than the marketing suggests.

This piece is a practitioner view of where data engineering workflows are changing in 2024, where AI is helping, and where the bottlenecks remain in places AI doesn't reach.

What data engineering actually does

A working data team's responsibilities:

Ingestion — pulling data from source systems into the data platform
Transformation — cleaning, joining, aggregating to produce usable datasets
Modelling — designing schemas that serve analytics, BI, ML workloads
Quality — testing, monitoring, alerting on data issues
Documentation — describing what each dataset is for and how to use it
Performance — keeping pipelines running within latency and cost envelopes
Governance — lineage, access control, privacy
Operations — running the pipelines, responding to incidents

Each of these is potentially affected by AI assistance. The actual impact varies.

Where AI helps

SQL generation for transformations

The most consistent productivity gain. A data engineer writes a description of a transformation; AI produces a candidate SQL query. The engineer reviews, adjusts, and tests it. For routine transformations, the loop is faster than writing from scratch.

The pattern works for well-defined schemas with reasonable conventions. It works less well for sprawling schemas where the AI can't easily identify which tables apply.

dbt model authoring

dbt models follow recognisable patterns. AI generates a strong starting point: the model definition, the schema test cases, the documentation. The engineer reviews and refines.

This is one of the highest-leverage places for AI in data engineering. dbt's structure plays well with AI assistance.

Schema design suggestions

A data engineer designing a new mart asks: "What columns should this table have for this use case?" AI suggests; the engineer evaluates. Faster than starting blank.

The AI's suggestions reflect general patterns; the engineer adapts to the specific context. The conversation is faster than working alone.

Test generation

Asking the AI to suggest tests for a dataset produces useful candidates: not-null tests, uniqueness tests, referential integrity tests, range tests for numerics. The engineer picks the ones that apply.

Data tests that would have been written perfunctorily are written more thoroughly because the candidate set is wider.
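As a rough illustration of the four candidate test types named above, here is a minimal sketch in plain Python. The function names loosely echo common data-test naming, but these are stand-ins written for this example, not any tool's implementation; the `orders` and `customers` rows are hypothetical sample data.

```python
from typing import Any

Row = dict[str, Any]


def not_null(rows: list[Row], column: str) -> list[Row]:
    """Return the rows where `column` is missing or None."""
    return [r for r in rows if r.get(column) is None]


def unique(rows: list[Row], column: str) -> set[Any]:
    """Return the values of `column` that appear more than once."""
    seen: set[Any] = set()
    dupes: set[Any] = set()
    for r in rows:
        value = r.get(column)
        if value in seen:
            dupes.add(value)
        seen.add(value)
    return dupes


def relationships(child: list[Row], child_col: str,
                  parent: list[Row], parent_col: str) -> set[Any]:
    """Return non-null child keys with no matching parent key."""
    parent_keys = {r[parent_col] for r in parent}
    child_keys = {r.get(child_col) for r in child} - {None}
    return child_keys - parent_keys


def accepted_range(rows: list[Row], column: str,
                   min_value: float, max_value: float) -> list[Row]:
    """Return rows whose non-null `column` falls outside [min_value, max_value]."""
    return [
        r for r in rows
        if r.get(column) is not None
        and not (min_value <= r[column] <= max_value)
    ]


# Hypothetical sample data, invented for this sketch.
customers = [{"customer_id": 1}, {"customer_id": 2}]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 25.0},
    {"order_id": 11, "customer_id": 2, "amount": -5.0},
    {"order_id": 11, "customer_id": 3, "amount": 40.0},
    {"order_id": 12, "customer_id": None, "amount": 15.0},
]

null_customers = not_null(orders, "customer_id")
duplicate_order_ids = unique(orders, "order_id")
orphan_customer_ids = relationships(orders, "customer_id", customers, "customer_id")
out_of_range_amounts = accepted_range(orders, "amount", 0, 10_000)
```

Each check returns the offending rows or values rather than a boolean, so the engineer can see why a candidate test failed before deciding whether it applies.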

Documentation

Generating descriptions for tables and columns. AI proposes; the engineer reviews. Documentation that was deferred actually ships.

Lineage extraction

For undocumented pipelines, AI helps reconstruct lineage by reading the SQL and inferring dependencies. The output isn't perfect; it's a starting point for the team to verify.
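To make "reading the SQL and inferring dependencies" concrete, here is a deliberately naive sketch that pulls table references out of a query with regular expressions. It is not how any particular tool extracts lineage: it will be fooled by subqueries in unusual positions, string literals, and constructs such as `extract(year from ...)`, which is one reason the output needs verification. The model name and SQL are hypothetical.

```python
import re

# Identifier that follows FROM or JOIN (optionally schema-qualified).
_TABLE_REF = re.compile(r"\b(?:from|join)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)
# Name of a common table expression: "with name as (" or ", name as (".
_CTE_NAME = re.compile(r"(?:\bwith|,)\s*([A-Za-z_]\w*)\s+as\s*\(", re.IGNORECASE)


def upstream_tables(sql: str) -> set[str]:
    """Return referenced table names, excluding CTEs defined in the same query."""
    ctes = {name.lower() for name in _CTE_NAME.findall(sql)}
    refs = {name.lower() for name in _TABLE_REF.findall(sql)}
    return refs - ctes


# Hypothetical undocumented model, invented for this sketch.
models = {
    "mart_revenue_by_region": """
        with recent as (
            select * from raw.orders where order_date >= '2024-01-01'
        )
        select c.region, sum(r.amount) as revenue
        from recent r
        join analytics.customers c on c.customer_id = r.customer_id
        group by c.region
    """,
}

# Map each model to the tables it appears to read from.
lineage = {name: upstream_tables(sql) for name, sql in models.items()}
```

A real SQL parser would be more robust than regular expressions; the point is only that the inferred graph is a draft the team still has to check.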

Data quality investigation

When an anomaly appears in production data, AI helps investigate by reading recent pipeline runs, identifying suspicious changes, and drafting hypotheses. The engineer follows the leads.

Where AI doesn't help much

Understanding source systems

The source system's behaviour, edge cases, and operational quirks are not in any documentation AI can read. The team has to learn them by interacting with the system or its operators.

Cross-system data reconciliation

Reconciling data across systems with different definitions of the same concept (customer, transaction, account) requires understanding the business meaning. AI surfaces the differences; the team has to decide what to do about them.

Performance optimisation at scale

Tuning queries against specific warehouse engines (Snowflake, BigQuery, Databricks) is engine-specific work. AI offers general suggestions; the engineer applies engine-specific knowledge. The gain is marginal compared to other tasks.

Schema migration planning

Migrating a schema while preserving downstream consumers requires understanding which consumers depend on what. AI helps with the mechanics; the team handles the impact analysis.

Incident response

Production data incidents need fast diagnosis and decision. AI helps brainstorm; the engineer drives. The decision velocity isn't significantly faster.

Stakeholder negotiation

Many data engineering decisions involve stakeholders — business analysts, downstream consumers, source-system owners. The negotiation is interpersonal work; AI doesn't help.

The toolchain integration

A working pattern for AI assistance in data engineering:

In the SQL editor

AI assistance sits inside the SQL editor (Gemini in BigQuery, Snowflake Copilot, dbt Cloud's AI features, or third-party tools). The engineer writes; AI suggests; the engineer adopts or ignores.

In the dbt workflow

AI is integrated with the dbt project structure, generating models, tests, and documentation in dbt's conventions.

In the data catalogue

AI-generated descriptions, tags, lineage in the data catalogue. The catalogue becomes more useful; the manual maintenance burden drops.

In the monitoring system

AI summarisation of recent runs, of pipeline failures, of data quality anomalies. The team gets faster understanding of what's happening.

In the IDE for pipeline code

For pipelines written in Python (Airflow DAGs, dbt Python models, ETL scripts), the same code-assistant patterns that help software engineers help data engineers.

The quality consideration

One concern is particular to data engineering: AI-generated SQL or transformations can look right and produce wrong results. Common failure modes:

A JOIN that looks right but joins on the wrong key
A GROUP BY that omits a dimension and produces aggregated numbers that don't represent what the engineer intended
A WHERE clause that filters too aggressively or not aggressively enough
A unit or format conversion that's nearly but not quite right

These are the data engineering equivalents of "the SQL runs but the answer is wrong." The defence is the same as in conventional data engineering: tests, validation against known totals, peer review.
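The first failure mode, and the "validation against known totals" defence, can be sketched with an in-memory SQLite database. The tables, values, and the `reconciles` helper are hypothetical, chosen so that a join on the wrong key silently inflates a total while a reconciliation check catches it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table orders (order_id integer, customer_id integer, amount real);
    create table payments (payment_id integer, order_id integer, customer_id integer);
    insert into orders values (1, 100, 50.0), (2, 100, 30.0), (3, 200, 20.0);
    insert into payments values (10, 1, 100), (11, 2, 100), (12, 3, 200);
""")

# Trusted control total computed directly from the source table.
known_total = conn.execute("select sum(amount) from orders").fetchone()[0]

# Looks plausible, but joins on customer_id: each order for customer 100
# matches both of that customer's payments, so its amount is counted twice.
wrong_total = conn.execute("""
    select sum(o.amount)
    from orders o
    join payments p on p.customer_id = o.customer_id
""").fetchone()[0]

# Joins on order_id, the key that actually links one payment to one order here.
right_total = conn.execute("""
    select sum(o.amount)
    from orders o
    join payments p on p.order_id = o.order_id
""").fetchone()[0]


def reconciles(candidate: float, expected: float, tolerance: float = 0.01) -> bool:
    """Return True when a candidate total matches the control total within tolerance."""
    return abs(candidate - expected) <= tolerance


wrong_passes = reconciles(wrong_total, known_total)
right_passes = reconciles(right_total, known_total)
```

Both queries run without error; only the comparison against the control total distinguishes them, which is why the check belongs in the pipeline's tests rather than in the reviewer's head.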

Teams that loosen these disciplines because AI is helping ship faster end up with data quality issues that aren't immediately visible. The discipline is more important with AI assistance, not less.

What we keep seeing

Recurring patterns in data engineering teams adopting AI:

Productivity gains concentrate in routine work. Boilerplate dbt models, standard transformations, schema tests — these accelerate consistently. Complex work doesn't accelerate as much.

The data team's specific tooling matters. Generic AI assistance is less valuable than AI assistance integrated with dbt, the warehouse engine, the catalogue. Tool-specific AI compounds with existing investments.

Schema documentation catches up. Catalogue completeness improves materially. Documentation that was deferred actually ships.

Test coverage improves. Coverage that took years to build accumulates in months. Quality posture strengthens.

Production incidents keep a similar shape. AI doesn't reduce incident frequency much, nor does it make incident response much faster. The discipline of testing, monitoring, and runbooks is unchanged.

Senior engineers benefit less per hour than mid-level engineers. They were faster anyway, so the marginal productivity gain is smaller. Mid-level engineers see the largest measured improvement.

What we recommend

For data engineering teams in 2024:

Adopt AI assistance with the data tooling that already serves the team — dbt, warehouse engines, catalogue, monitoring. Generic AI is less useful than integrated AI.
Expect productivity gains in routine work, not in the hard parts. Plan accordingly.
Maintain testing and validation discipline. AI-assisted SQL can look right and be wrong; testing is the safety net.
Use AI for documentation aggressively. The catalogue gets better; the team's institutional knowledge gets captured.
Use AI for test generation aggressively. Coverage improves; quality posture strengthens.
Don't expect AI to handle stakeholder, business-understanding, or interpersonal work. Those remain human.
Measure the productivity changes: lead time on tickets, throughput on dataset delivery, time spent on documentation. The data tells you what's working.
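For the measurement recommendation, one simple way to track lead time on tickets is a median per cohort. The sketch below assumes ticket records with `opened`, `closed`, and `cohort` fields; those field names and every record are invented purely to exercise the function and say nothing about real gains.

```python
from datetime import date
from statistics import median

# Placeholder records, invented for this sketch. Not measured results.
tickets = [
    {"opened": date(2024, 9, 2), "closed": date(2024, 9, 9), "cohort": "baseline"},
    {"opened": date(2024, 9, 3), "closed": date(2024, 9, 8), "cohort": "baseline"},
    {"opened": date(2024, 9, 10), "closed": date(2024, 9, 19), "cohort": "baseline"},
    {"opened": date(2024, 10, 1), "closed": date(2024, 10, 4), "cohort": "pilot"},
    {"opened": date(2024, 10, 2), "closed": date(2024, 10, 6), "cohort": "pilot"},
    {"opened": date(2024, 10, 7), "closed": date(2024, 10, 13), "cohort": "pilot"},
]


def median_lead_time_days(records: list[dict], cohort: str) -> float:
    """Median days from opened to closed for tickets in the given cohort."""
    days = [
        (r["closed"] - r["opened"]).days
        for r in records
        if r["cohort"] == cohort
    ]
    if not days:
        raise ValueError(f"no tickets in cohort {cohort!r}")
    return median(days)


baseline_days = median_lead_time_days(tickets, "baseline")
pilot_days = median_lead_time_days(tickets, "pilot")
```

A median is used rather than a mean so that a single long-running ticket does not dominate the comparison; with real data, sample sizes and ticket mix would also need checking before drawing conclusions.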

AI in data engineering is a real productivity tool with real but bounded impact in 2024. The teams that integrate it deliberately and maintain discipline ship faster and with higher quality. The teams that adopt enthusiastically and let the testing slip ship data quality problems faster. The difference, as in software engineering, is in the discipline around the tools, not in the tools themselves.
