Tech Blogs Digest 27.10 - 02.11
This week we AI-analysed 6828 posts for you, filtered out the chaff and hand-picked the wheat. Subscribe to stay up to date with future digests!
This week
🏗️ Architecture - React Server Components, hybrid graph-semantic search, and serverless image processing
💾 Databases - Scaling a database to billions of records, and saving $2M in 47 minutes
📊 Data - AVRO in the CDC pipeline, building a knowledge graph, and fixing an AttributeReference memory leak in Apache Spark
🤖 LLMs in production - Optimizing government forms, a log analysis tool, and production-ready healthcare RAG
⚙️ DevOps - Modular universal Dockerfile, automated on-demand dev environments, and Terraform end-to-end tests
🧠 ML - Optimizing an ML inference workload on CPU, building a Netflix-grade recommendation system, and using KDD to save vehicle engines
🛡️ Security - Preventing error message leaks
☕ Java - Designing your own Hibernate
🟨 JavaScript - Barrel file costs, and large list rendering
🏗️ ARCHITECTURE
The article breaks down how migrating to server components, specifically React Server Components, actually affects page-load time and interactivity, and why the promised gains often don’t materialize
The article reveals how Tripadvisor scaled their tagging system from manual labels to a hybrid graph-semantic platform, dramatically improving how travelers discover new interests
A guide that walks through how to build a fully serverless image-processing pipeline on AWS - from uploading to resizing, metadata tracking and notifications, with real-world cost figures and production-ready tips
💾 DATABASES
How I Scaled a Single Database to 4.7 Billion Records Without It Exploding: The Untold Story | 20 min read
The article pulls back the curtain on how one team brought a single PostgreSQL database from 180 GB to 18 TB, handled 4.7 billion records, and slashed query times from 45 seconds to 180 milliseconds, while the business’s valuation hung in the balance
When the database behind a Fortune 500 company hit 100% CPU and lost $43K per minute, an emergency index fix turned a nightmare into a $2M rescue in under an hour
📊 DATA
From JSON to AVRO in the CDC pipeline | 8 min read
Explore how switching from JSON to AVRO in a CDC pipeline helped slash storage by ~30-50%, shrink Kafka topic size from 9 GB/day to under 1 GB, and dramatically boost query performance
Iceberg CDC: Stream a Little Dream of Me | 11 min read
Why streaming Apache Iceberg feels like playing a jazz solo: fixes in v3 and the upcoming v4 overhaul aim to make incremental upserts and deletes both precise and performant
Deep Dive: Building a Knowledge Graph from Scratch | 20 min read
This article follows a data pipeline from scraping to semantics, showing how Neo4j, async scraping, translation caching and graph-schema design turned a messy web of data into a queryable knowledge graph
Taming the Beast: Understanding and Preventing AttributeReference Memory Leaks in Apache Spark | 10 min read
Apache Spark memory leaks can hide in plain sight - millions of lingering AttributeReference objects quietly bloating logical-plan chains and crushing performance
🤖 LLMS IN PRODUCTION
From Pixels to Schemas: How Claude Vision Turns Any Government Form Into a Voice-Accessible Service | 11 min read
Data entry for government services goes from form chaos to conversational ease as scanned PDFs are turned into voice-accessible workflows, with Claude Vision bridging images to actionable schemas
See how a DIY system uses local LLMs to turn chaotic log files into clear, actionable insights - skipping cloud dependencies for greater control
A new architecture blends a vector-search powerhouse with strict citation tracking to make healthcare RAG systems safe, auditable, and production-ready
10 Lessons Learned Building Voice AI Agents | 16 min read
Practical takeaways from building voice-AI agents reveal that infrastructure, clear role separation and live data access matter more than model hype
⚙️ DEVOPS
Building a Universal Container System (So I Never Have to Write Another Custom Dockerfile) | 23 min read
A modular container system replaces bespoke Dockerfiles with one configurable template that cuts setup time from days to minutes
Your guide to Extend Kubernetes Scheduler | 14 min read
Learn how to customise the Kubernetes Scheduler for tighter control over cost, compliance and performance - whether via YAML tweaks, custom Go plugins or multiple scheduling profiles
Database CI/CD with the Oracle Database Operator for Kubernetes, GitHub Actions, and Liquibase — Take 2 | 11 min read
Spin up and tear down containerised databases managed by the Oracle Database Operator for Kubernetes in your CI/CD pipeline, triggered by GitHub Actions, with schema changes automated via Liquibase for full dev-branch lifecycle management
A self-service, on-demand dev environment system built with Kubernetes Operators lets developers spin up and auto-teardown full stacks in minutes - no Git commits required
How I learned Terratest | 13 min read
A real-world journey into Terratest that starts from Go unit testing habits and evolves into scalable infrastructure test modules - packed with patterns, gotchas and reusable strategies
🧠 ML
Who to Nudge, Not Just Whether to Nudge: Turning A/B Tests into Smart Policies with Causal ML | 9 min read
The article reveals how causal ML transforms A/B tests from “does the nudge work?” into “who should we nudge for max effect?”
Optimizing PyTorch Model Inference on AWS Graviton | 9 min read
The article offers practical, hardware-aware tweaks, like leveraging bfloat16 math and optimized kernels, to dramatically accelerate PyTorch model inference on AWS Graviton CPUs
Dive into how a Netflix-style recommender system is built from scratch, exploring item-item similarities, matrix factorization, stacking and production tricks
Using KDD’s full nine-phase data mining process, the article shows how to predict engine failures up to 14 days early - achieving over 91% accuracy and saving over $1 million annually
🛡️ SECURITY
When Error Messages Leak More Than Logs: ORMs, Frameworks, and the Quiet Reconnaissance Problem | 10 min read
Full stack traces and framework internals can leak to unauthenticated clients just by submitting malformed requests - giving attackers a detailed map of your app’s architecture and attack surface
☕ JAVA
Why They Ask: “Could You Design Your Own ORM Framework?” | 12 min read
Dive into what really happens when you’re asked to build an ORM: from lazy loading and identity maps to why frameworks like Hibernate aren’t just “magic”
🟨 JAVASCRIPT
An unconventional build-latency culprit: how replacing “barrel files” made one team’s builds 5x faster
Angular Meets Large Lists | 8 min read
Ditching standard virtual scrolling, this article reveals clever new techniques to render huge lists in Angular with far better performance and responsiveness



That Oracle Database Operator piece with Liquibase integration is a game changer for teams still treating database schemas like fragile snowflakes. The idea of spinning up and tearing down full Oracle instances per dev branch was unthinkable just a few years ago without massive infrastructure costs. Pairing GitHub Actions with automated schema versioning through Liquibase finally brings parity between application code and database deployment workflows. This is the kind of DevOps maturity that can actually shorten release cycles in enterprises stuck on legacy Oracle stacks.