#25 - 5 Lessons From Real-World Data Architecture Decisions (Case Study)
When Custom Data Pipelines Start Cracking: A Pragmatic Guide to ETL Modernisation
Read time: 4 minutes.
Hi Data Modernisers,
TL;DR: A financial services company’s custom data pipelines started failing at scale. Instead of chasing the “perfect” ETL tool, they pragmatically chose Databricks, solving 60% of their problems and gaining real momentum. The lesson? Focus on pain points over features, embrace "good enough" solutions, and optimise for your team's actual capabilities, not technical ideals.
The harsh truth about data pipelines? Most teams are running on digital duct tape until they're forced to face reality.
I've witnessed this story unfold many times across continents. Teams start with custom scripts and one-off workflows because they're flexible. But as customer bases grow and data complexity explodes, those same solutions become fragile bottlenecks. Silent failures, week-long onboardings, debugging mazes: sound familiar? What once felt like engineering craftsmanship suddenly feels like technical debt you can't escape.
The financial services company in this week's case study hit that exact tipping point, and their response offers a masterclass in pragmatic decision-making.
Today, we are diving into how innovative teams evaluate ETL platforms without getting lost in feature wars, why choosing the right tool often means ignoring the "perfect" one, and the evaluation framework that led to real momentum.
5 Lessons From Real-World Data Architecture Decisions
Here's what most teams get wrong about data platform selection: they optimise for technical elegance instead of operational reality. After 15 years of watching organisations struggle with this decision, I have learned that the best architecture is not the most sophisticated; it's the one that makes your team productive on day one.
The financial services team's choice between lakehouse, warehouse, and cloud-native approaches reveals what matters.
1. Architecture Philosophy Beats Feature Lists
The financial services team’s decision wasn't about comparing connector counts or transformation speeds. It was about choosing between three fundamentally different platform approaches:
Databricks: “One unified platform for data, analytics, and ML; handle everything from raw data to production models”
Snowflake + dbt: “Best-of-breed combination; world-class warehouse plus industry-standard transformations”
AWS-Native: “Stick with one cloud vendor's ecosystem for seamless integration and cost optimisation”
Before you get lost in feature matrices, understand the platform philosophies:
Databricks = Unified complexity vs. everything-in-one-place convenience
Snowflake + dbt = Clean separation of concerns vs. integration overhead
AWS-native = Vendor lock-in vs. seamless service integration
The team chose Databricks because they prioritised future ML capabilities and wanted to avoid managing integrations between multiple vendors. Your choice should reflect your organisation's tolerance for complexity and its relationships with vendors.
2. Strategic Vision Trumps Current Team Skills
Here's the counterintuitive insight that changed everything: the financial services team had three analytics engineers who were SQL-native, one data scientist who was comfortable with Python, and one platform engineer who was well-versed in AWS.
Snowflake + dbt would have been the obvious choice: SQL transformations, familiar workflows, a gentler learning curve. But they went the other way and chose Databricks.
Why? Because their biggest bottleneck wasn't technical, it was organisational. They needed a platform that could handle both their current analytics workload AND their planned ML initiatives without requiring a second platform decision in 18 months.
The lesson: evaluate platforms against your team's evolution, not just current skills. Sometimes the right choice is the one that forces productive growth.
3. Build for Your Next 50 Customers, Not Your Next 50 Years
This phrase means: scale for predictable, near-term growth instead of hypothetical future scenarios.
Wrong approach: “We need a platform that handles 100TB/day because we might get there someday”
Right approach: “We need a platform that handles 5TB/day reliably, with a clear path to 15TB”
The financial services team was processing 2TB daily with plans to reach 6-8TB within 24 months. Instead of optimising for petabyte scale, they focused on:
Reliable schema evolution as data sources multiplied (see the sketch after this list)
Cost predictability during their growth phase
Team productivity to deliver value faster
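To make that first point concrete, here is a minimal sketch of what reliable schema evolution looks like on a lakehouse platform. It is illustrative rather than the team's actual pipeline: the paths and table names are invented, and it assumes Delta Lake's standard mergeSchema option.

```python
# Minimal sketch: Delta Lake schema evolution via mergeSchema.
# All paths and table names below are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("schema-evolution-sketch").getOrCreate()

# A new source drop arrives with a column the target table has never seen
new_batch = spark.read.json("s3://example-landing/customers/2024-06-01/")

# mergeSchema tells Delta to add unseen columns instead of failing the write,
# so a source that grows extra fields doesn't break the pipeline
(new_batch.write
    .format("delta")
    .mode("append")
    .option("mergeSchema", "true")
    .saveAsTable("bronze.customers"))
```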
All three platforms could handle their scale requirements. The winner was the one that made scaling feel incremental, not revolutionary.
Don't over-engineer for problems you might never have. Choose the platform that makes your next phase of growth feel natural.
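To see why their growth target is "incremental, not revolutionary", here is the back-of-envelope arithmetic on the case study's own figures, taking the upper end of the 6-8TB target:

```python
# Back-of-envelope capacity check using the case study's figures:
# growing from 2 TB/day to 8 TB/day over 24 months.
current_tb, target_tb, months = 2, 8, 24

# Compound monthly growth rate needed to hit the target
monthly_growth = (target_tb / current_tb) ** (1 / months) - 1
print(f"Required compound monthly growth: {monthly_growth:.1%}")  # ~5.9%
```

Roughly 6% a month is a curve any of the three platforms absorbs without re-architecture, which is exactly why the decision hinged on productivity rather than raw scale.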
4. The Hidden Costs of "Best-of-Breed" Architecture
The team seriously considered Snowflake + dbt because it promised best-of-breed excellence: a world-class data warehouse plus the industry-standard transformation layer. On paper, it was superior.
But they dug deeper into the hidden costs:
Integration complexity between platforms
Multiple vendor relationships to manage
Skill specialisation required for each tool
Debugging across boundaries when things go wrong
They realised that "best-of-breed" often means "best-of-headaches" for teams under 10 people. The Databricks approach meant one vendor relationship, unified monitoring, and more straightforward troubleshooting.
The lesson: Factor integration tax into your evaluation. Sometimes, 85% capability on one platform beats 95% capability across three platforms.
5. Make Architecture Decisions Like Business Decisions
The financial services team's final decision came down to a business question: Which platform reduces our time-to-insight for new data sources?
They ran a simple test: onboard a new customer dataset using each approach and measure the end-to-end timeline.
Results:
Databricks: 3 days (schema-on-read flexibility; see the sketch after this list)
Snowflake + dbt: 5 days (schema design + dbt model creation)
AWS Stack: 8 days (Glue job configuration + Redshift optimisation)
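For context on the Databricks number: schema-on-read means you query the raw files directly and let Spark infer the structure, deferring formal schema design until the data is understood. A minimal sketch, with invented paths and names rather than the team's actual pipeline:

```python
# Sketch of schema-on-read: query raw files directly, no upfront table design.
# Paths and names are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("schema-on-read-sketch").getOrCreate()

# Spark infers the schema from the JSON files at read time
raw = spark.read.json("s3://example-landing/new_customer_feed/")
raw.printSchema()  # inspect what actually arrived before modelling anything

# The new feed is queryable immediately, which is what compresses onboarding
raw.createOrReplaceTempView("new_feed")
spark.sql("SELECT COUNT(*) AS row_count FROM new_feed").show()
```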
The Databricks approach won because it directly improved their customer onboarding process, their most significant business constraint.
The lesson: Convert technical decisions into business metrics. Which platform makes your organisation more responsive to opportunities?
Here's what you learned today:
Architecture philosophy beats features - Choose between lakehouse, warehouse + transformation layer, or cloud-native based on your org's risk tolerance.
Team evolution matters more than current skills - Pick the platform that supports where your team needs to go, not just where they are.
Integration tax is real - Factor the hidden costs of "best-of-breed" complexity into your decision.
The right data architecture doesn't just store and process data; it accelerates your organisation's ability to act on insights. Sometimes that means choosing the platform that feels slightly uncomfortable today because it enables the growth you need tomorrow.
PS: If you're enjoying this newsletter, please consider forwarding this edition to a colleague who's wrestling with data modernisation decisions. They'll get practical insights without the vendor noise.
And whenever you are ready, there are 3 ways I can help you:
Free ETL Architecture Audit - 60-minute deep-dive where we map your current data flows and identify precisely where chaos is killing your AI initiatives
Modular Pipeline Migration - Complete rebuild from spaghetti scripts to a dbt + Airflow architecture that your AI systems can depend on
AI-Ready Data Platform - Full implementation of version-controlled, tested, modular ETL with real-time capabilities designed for production AI workloads
That’s it for this week. If you found this helpful, leave a comment to let me know ✊
About the Author
Khurram, founder of BigDataDig and a former Teradata Global Data Consultant, brings over 15 years of deep expertise in data integration and robust ETL processing. He now specialises in helping organisations in the financial services, telecommunications, retail, and government sectors implement AI-ready data solutions. His methodology prioritises pragmatic, value-driven implementations that manage risk while ensuring data is prepared and optimised for AI and advanced analytics.