AI in Data Engineering — Where the Workflow Actually Changes
AI assistance in data engineering is producing real productivity gains in narrow places and overhyped claims in others. A practitioner view of where data engineers should actually adopt AI in 2024.
Data engineering teams have been working through AI assistance the same way software engineering teams did a year earlier. The pattern is similar — broad capability, narrower productive use, real but measured impact. The data engineering work that AI actually accelerates is more specific than the marketing suggests.
This piece is a practitioner view of where data engineering workflows are changing in 2024, where AI is helping, and where the bottlenecks remain in places AI doesn't reach.
What data engineering actually does
A working data team's responsibilities:
- Ingestion — pulling data from source systems into the data platform
- Transformation — cleaning, joining, aggregating to produce usable datasets
- Modelling — designing schemas that serve analytics, BI, ML workloads
- Quality — testing, monitoring, alerting on data issues
- Documentation — describing what each dataset is for and how to use it
- Performance — keeping pipelines running within latency and cost envelopes
- Governance — lineage, access control, privacy
- Operations — running the pipelines, responding to incidents
Each of these is potentially affected by AI assistance. The actual impact varies.
Where AI helps
SQL generation for transformations
The most consistent productivity gain. A data engineer writes a description of a transformation; AI produces a candidate SQL query. The engineer reviews, adjusts, and tests it. For routine transformations, this loop is faster than writing from scratch.
The pattern works for well-defined schemas with reasonable conventions. It works less well for sprawling schemas where the AI can't easily identify which tables apply.
dbt model authoring
dbt models follow recognisable patterns. AI generates a strong starting point: the model definition, the schema test cases, the documentation. The engineer reviews and refines.
This is one of the highest-leverage places for AI in data engineering. dbt's structure plays well with AI assistance.
Schema design suggestions
A data engineer designing a new mart asks: "What columns should this table have for this use case?" AI suggests; the engineer evaluates.
The suggestions reflect general patterns, so the engineer adapts them to the specific context. Even with that adjustment, the exchange is faster than starting from a blank page.
Test generation
Asking the AI to suggest tests for a dataset produces useful candidates: not-null tests, uniqueness tests, referential integrity tests, range tests for numerics. The engineer picks the ones that apply.
Data tests that would have been written perfunctorily are written more thoroughly because the candidate set is wider.
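The four candidate test types named above can be sketched in plain Python. This is a minimal illustration, not a replacement for a testing framework: the function names, the `orders` and `customers` rows, and the 0 to 10,000 amount range are all hypothetical, chosen only to show each check failing once.

```python
from collections import Counter
from typing import Any

Row = dict[str, Any]

def null_rows(rows: list[Row], column: str) -> list[Row]:
    """Not-null test: return rows where the column is missing or None."""
    return [r for r in rows if r.get(column) is None]

def duplicate_values(rows: list[Row], column: str) -> list[Any]:
    """Uniqueness test: return non-null values that appear more than once."""
    counts = Counter(r[column] for r in rows if r.get(column) is not None)
    return sorted(value for value, n in counts.items() if n > 1)

def orphan_rows(child: list[Row], child_col: str,
                parent: list[Row], parent_col: str) -> list[Row]:
    """Referential integrity test: child rows whose key has no parent row."""
    parent_keys = {r[parent_col] for r in parent}
    return [r for r in child
            if r.get(child_col) is not None and r[child_col] not in parent_keys]

def out_of_range_rows(rows: list[Row], column: str,
                      low: float, high: float) -> list[Row]:
    """Range test: rows whose non-null value falls outside [low, high]."""
    return [r for r in rows
            if r.get(column) is not None and not (low <= r[column] <= high)]

# Hypothetical sample data with one deliberate failure per test type.
customers = [{"customer_id": 1}, {"customer_id": 2}]
orders = [
    {"order_id": 100, "customer_id": 1, "amount": 25.0},
    {"order_id": 101, "customer_id": 2, "amount": -5.0},    # negative amount
    {"order_id": 101, "customer_id": 3, "amount": 40.0},    # duplicate id, unknown customer
    {"order_id": 103, "customer_id": None, "amount": 10.0}, # missing customer
]

failures = {
    "not_null(customer_id)": null_rows(orders, "customer_id"),
    "unique(order_id)": duplicate_values(orders, "order_id"),
    "relationship(customer_id)": orphan_rows(orders, "customer_id", customers, "customer_id"),
    "range(amount)": out_of_range_rows(orders, "amount", 0, 10_000),
}
for name, bad in failures.items():
    print(name, "FAILED" if bad else "ok", bad)
```

Each check returns the offending rows or values rather than a boolean, so the engineer can judge whether a suggested test fits the dataset or is flagging something legitimate.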
Documentation
Generating descriptions for tables and columns. AI proposes; the engineer reviews. Documentation that was deferred actually ships.
Lineage extraction
For undocumented pipelines, AI helps reconstruct lineage by reading the SQL and inferring dependencies. The output isn't perfect; it's a starting point for the team to verify.
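As a rough illustration of what "reading the SQL and inferring dependencies" means, the sketch below pulls table names out of `FROM` and `JOIN` clauses with regular expressions and subtracts CTE names. It is deliberately naive: a production approach would use a real SQL parser, and this pattern will misread constructs such as `extract(year from order_date)`. The model names and queries are hypothetical.

```python
import re

# Naive patterns: an identifier following FROM or JOIN, and CTE names
# declared as "with name as (" or ", name as (".
TABLE_REF = re.compile(r"\b(?:from|join)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)
CTE_NAME = re.compile(r"(?:\bwith|,)\s*([A-Za-z_]\w*)\s+as\s*\(", re.IGNORECASE)

def upstream_tables(sql: str) -> set[str]:
    """Return the tables a query reads from, excluding its own CTEs."""
    ctes = {name.lower() for name in CTE_NAME.findall(sql)}
    refs = {name.lower() for name in TABLE_REF.findall(sql)}
    return refs - ctes

# Hypothetical undocumented pipeline: model name -> SQL text.
models = {
    "stg_orders": "select * from raw.orders",
    "stg_customers": "select * from raw.customers",
    "fct_revenue": """
        with paid as (
            select * from stg_orders where status = 'paid'
        )
        select c.region, sum(p.amount) as revenue
        from paid p
        join stg_customers c on c.customer_id = p.customer_id
        group by c.region
    """,
}

lineage = {name: sorted(upstream_tables(sql)) for name, sql in models.items()}
for name, parents in lineage.items():
    print(f"{name} <- {parents}")
```

The result is a candidate dependency map. As with AI-reconstructed lineage, the team still has to verify it against how the pipeline actually runs.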
Data quality investigation
When an anomaly appears in production data, AI helps investigate. Reading recent pipeline runs, identifying suspicious changes, drafting hypotheses. The engineer follows the leads.
Where AI doesn't help much
Understanding source systems
The source system's behaviour, edge cases, and operational quirks are not in any documentation AI can read. The team has to learn them by interacting with the system or its operators.
Cross-system data reconciliation
Reconciling data across systems with different definitions of the same concept (customer, transaction, account) requires understanding the business meaning. AI surfaces the differences; the team has to decide what to do about them.
Performance optimisation at scale
Tuning queries against specific warehouse engines (Snowflake, BigQuery, Databricks) is engine-specific work. AI offers general suggestions; the engineer applies engine-specific knowledge. The gain is marginal compared to other tasks.
Schema migration planning
Migrating a schema while preserving downstream consumers requires understanding which consumers depend on what. AI helps with the mechanics; the team handles the impact analysis.
Incident response
Production data incidents need fast diagnosis and fast decisions. AI helps brainstorm; the engineer drives. Decisions don't come significantly faster.
Stakeholder negotiation
Many data engineering decisions involve stakeholders — business analysts, downstream consumers, source-system owners. The negotiation is interpersonal work; AI doesn't help.
The toolchain integration
A working pattern for AI assistance in data engineering:
In the SQL editor
AI assistance sits inside the SQL editor (Gemini in BigQuery, Snowflake Copilot, dbt Cloud's AI features, or third-party tools). The engineer writes; AI suggests; the engineer adopts or ignores.
In the dbt workflow
AI integrated with dbt project structure. Generating models, tests, documentation in dbt's conventions.
In the data catalogue
AI-generated descriptions, tags, lineage in the data catalogue. The catalogue becomes more useful; the manual maintenance burden drops.
In the monitoring system
AI summarisation of recent runs, of pipeline failures, of data quality anomalies. The team gets faster understanding of what's happening.
In the IDE for pipeline code
For pipeline code written in Python (Airflow DAGs, dbt Python models, ETL scripts), data engineers benefit from the same code-assistant patterns that help software engineers.
The quality consideration
A specific concern for data engineering: AI-generated SQL or transformations can look right and produce wrong results. Typical failure modes:
- A JOIN that looks right but joins on the wrong key
- A GROUP BY that omits a dimension and produces aggregated numbers that don't represent what the engineer intended
- A WHERE clause that filters too aggressively or not aggressively enough
- A unit or format conversion that's nearly but not quite right
These are the data engineering equivalents of "the SQL runs but the answer is wrong." The defence is the same as in conventional data engineering: tests, validation against known totals, peer review.
Teams that loosen these disciplines because AI is helping ship faster end up with data quality issues that aren't immediately visible. The discipline is more important with AI assistance, not less.
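The wrong-key JOIN and the "validation against known totals" defence can be shown together in a small sketch using Python's built-in `sqlite3`. The tables, values, and the `reconciles` helper are hypothetical. The point is only that both queries run without error, and a reconciliation check is what separates them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table orders (order_id integer primary key, customer_id integer, amount real);
    create table shipments (shipment_id integer primary key, order_id integer, customer_id integer);
    insert into orders values (1, 10, 100.0), (2, 10, 50.0), (3, 20, 75.0);
    insert into shipments values (1, 1, 10), (2, 2, 10), (3, 3, 20);
""")

# Known total taken straight from the source table, before any joins.
known_total = conn.execute("select sum(amount) from orders").fetchone()[0]

# Looks plausible, but customer_id is not unique per order, so customer 10's
# two orders each match two shipments and the amounts are double-counted.
WRONG_JOIN = """
    select sum(o.amount) from orders o
    join shipments s on s.customer_id = o.customer_id
"""

# Joins on the key that identifies one shipment per order.
RIGHT_JOIN = """
    select sum(o.amount) from orders o
    join shipments s on s.order_id = o.order_id
"""

def reconciles(sql: str, expected: float, tolerance: float = 0.01) -> bool:
    """Return True if the query's single total matches the known total."""
    actual = conn.execute(sql).fetchone()[0]
    return abs(actual - expected) <= tolerance

print("known total:", known_total)
print("wrong join reconciles:", reconciles(WRONG_JOIN, known_total))
print("right join reconciles:", reconciles(RIGHT_JOIN, known_total))
```

A check like this is cheap to run on every candidate query, which matters more when candidates are produced faster than they can be read closely.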
What we keep seeing
Recurring patterns in data engineering teams adopting AI:
Productivity gains concentrate in routine work. Boilerplate dbt models, standard transformations, schema tests — these accelerate consistently. Complex work doesn't accelerate as much.
The data team's specific tooling matters. Generic AI assistance is less valuable than AI assistance integrated with dbt, the warehouse engine, the catalogue. Tool-specific AI compounds with existing investments.
Schema documentation catches up. Catalogue completeness improves materially, because descriptions that were deferred actually get written.
Test coverage improves. Coverage that took years to build accumulates in months. Quality posture strengthens.
Production incidents keep the same shape. AI neither reduces incident frequency much nor makes incident response much faster. The discipline of testing, monitoring, and runbooks is unchanged.
Senior engineers benefit less per hour than mid-level engineers. They were faster anyway, so the marginal productivity gain is smaller. Mid-level engineers see the largest measured improvement.
What we recommend
For data engineering teams in 2024:
- Adopt AI assistance with the data tooling that already serves the team — dbt, warehouse engines, catalogue, monitoring. Generic AI is less useful than integrated AI.
- Expect productivity gains in routine work, not in the hard parts. Plan accordingly.
- Maintain testing and validation discipline. AI-assisted SQL can look right and be wrong; testing is the safety net.
- Use AI for documentation aggressively. The catalogue gets better; the team's institutional knowledge gets captured.
- Use AI for test generation aggressively. Coverage improves; quality posture strengthens.
- Don't expect AI to handle stakeholder, business-understanding, or interpersonal work. Those remain human.
- Measure the productivity changes. Lead time on tickets, throughput on dataset delivery, time on documentation. The data tells you what's working.
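One way to make the lead-time measurement concrete is to compare median ticket lead time with and without AI assistance. The sketch below assumes a ticket export with `opened` and `closed` dates and an `ai_assisted` flag. Those field names are hypothetical, and the six tickets are made-up sample rows for illustration, not measured results.

```python
from datetime import date
from statistics import median

# Made-up sample tickets; a real analysis would load these from the
# team's ticketing system and use far more than three per group.
tickets = [
    {"opened": date(2024, 1, 8),  "closed": date(2024, 1, 12), "ai_assisted": False},
    {"opened": date(2024, 1, 15), "closed": date(2024, 1, 24), "ai_assisted": False},
    {"opened": date(2024, 2, 5),  "closed": date(2024, 2, 11), "ai_assisted": False},
    {"opened": date(2024, 3, 4),  "closed": date(2024, 3, 7),  "ai_assisted": True},
    {"opened": date(2024, 3, 11), "closed": date(2024, 3, 16), "ai_assisted": True},
    {"opened": date(2024, 4, 1),  "closed": date(2024, 4, 5),  "ai_assisted": True},
]

def median_lead_time_days(rows: list[dict], ai_assisted: bool) -> float:
    """Median days from opened to closed for one group of tickets."""
    days = [(t["closed"] - t["opened"]).days
            for t in rows if t["ai_assisted"] is ai_assisted]
    return median(days)

print("median lead time without AI:", median_lead_time_days(tickets, False))
print("median lead time with AI:", median_lead_time_days(tickets, True))
```

The median is used rather than the mean so that a single long-running ticket does not dominate the comparison. Ticket mix still matters: if the AI-assisted group contains mostly routine work, the difference reflects the work as much as the tool.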
AI in data engineering is a real productivity tool with real but bounded impact in 2024. The teams that integrate it deliberately and maintain discipline ship faster and with higher quality. The teams that adopt enthusiastically and let the testing slip ship data quality problems faster. The difference, as in software engineering, is in the discipline around the tools, not in the tools themselves.