Solving the Negative ROI of Vibe-Coded Infrastructure
The Client: A Venture Firm Facing a Performance Ceiling
Our client is a global powerhouse at the intersection of gaming, Web3, and immersive technology. While widely recognized as a premier venture firm, a critical arm of their operation functions as a high-precision capital engine. They provide User Acquisition (UA) Funding: non-dilutive capital that lets high-growth studios scale their player base without giving up equity.
In this model, the firm acts as a strategic data partner. Their backend must ingest millions of rows of attribution data from Meta, TikTok, Google, and Reddit, then calculate real-time ROAS (Return on Ad Spend) to trigger funding tranches. In a business where revenue shares are tied to performance, maintaining absolute data synchronization is a prerequisite for financial scaling.
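For readers new to the metric, ROAS is simply attributed revenue divided by ad spend. A minimal sketch of how such a figure might gate a funding tranche; the field names and the 1.0 threshold are illustrative assumptions, not the client's actual rules:

```python
def roas(attributed_revenue: float, ad_spend: float) -> float:
    """Return on Ad Spend: revenue the campaign generated per dollar spent."""
    return attributed_revenue / ad_spend if ad_spend else 0.0

def tranche_unlocked(attributed_revenue: float, ad_spend: float, threshold: float = 1.0) -> bool:
    """Release the next tranche only when the campaign earns back at least its spend."""
    return roas(attributed_revenue, ad_spend) >= threshold

print(tranche_unlocked(attributed_revenue=132_500.0, ad_spend=100_000.0))  # True (ROAS = 1.325)
```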
The Context: Moving Beyond the High-Velocity Prototype
The platform’s first iteration was designed for speed and market validation—a successful “vibe-coded” prototype that allowed the firm to quickly test its investment thesis. As the business model proved successful and data volumes grew, the initial architecture reached its natural operational limit:
- File-Based Dependencies: Early data storage in local Parquet files was ideal for rapid prototyping but created synchronization challenges as the team expanded, making a centralized source of truth necessary.
- The Monolithic Logic Challenge: The initial core logic was optimized for a smaller, centralized dataset. As the firm managed millions in User Acquisition capital, this monolithic approach made it difficult to observe granular data transitions in real-time.
- Resource Optimization: The prototype’s “all-at-once” data loading strategy was effective for early-stage datasets but began to strain CPU and memory resources at the firm’s current enterprise scale.
The Intervention: Choosing Determinism for Long-Term Scale
When our developer took over the infrastructure, they were tasked with evolving the platform into a hardened financial instrument. The mission was to transition from agile heuristics to deterministic enterprise engineering.
- The Pivot to ClickHouse: While many teams would continue to handle data cleaning at the application level, our dev made a non-obvious architectural decision: moving deduplication directly into the database storage layer. By implementing ClickHouse with the ReplacingMergeTree engine, they allowed the database to handle the overlapping batches of ad API data automatically. This ensures that regardless of API retry cycles, every transaction is accounted for exactly once (see the ClickHouse sketch after this list).
- Architectural Lean & Modularization: Recognizing that the platform had outgrown its early logic, our dev initiated a radical optimization of the codebase. By stripping away redundant logic and refactoring the backend into a 15-step modular pipeline, they achieved the same business results while reducing infrastructure resource usage to just 10% of its previous footprint. This modularity allows for total auditability of every ROAS calculation (the pipeline sketch below shows the pattern).
- Memorializing Financial Reality: To ensure institutional-grade reporting, our dev also built a PostgreSQL-backed Billing Portal to separate dynamic analytics from historical financial records. While ClickHouse handles high-speed computations, the Postgres system “memorializes” approved spend and fees. This creates an immutable record for invoicing, ensuring that historical data remains fixed even as the underlying formulas and models evolve (a billing sketch also follows below).
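As referenced above, the deduplication lives in the storage engine itself. A minimal sketch of the pattern, assuming the clickhouse-connect Python client and illustrative table and column names (the production schema differs):

```python
import datetime
import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost")

# ReplacingMergeTree collapses rows that share the same ORDER BY key, keeping the
# row with the highest `ingested_at` version, so overlapping API batches
# deduplicate inside the storage layer rather than in application code.
client.command("""
    CREATE TABLE IF NOT EXISTS ad_spend_raw
    (
        network      LowCardinality(String),
        campaign_id  String,
        spend_date   Date,
        spend_usd    Float64,
        ingested_at  DateTime DEFAULT now()
    )
    ENGINE = ReplacingMergeTree(ingested_at)
    ORDER BY (network, campaign_id, spend_date)
""")

# The same batch written twice (e.g. an API retry) does not double-count spend.
row = [("meta", "cmp-1", datetime.date(2024, 5, 1), 1250.0)]
cols = ["network", "campaign_id", "spend_date", "spend_usd"]
client.insert("ad_spend_raw", row, column_names=cols)
client.insert("ad_spend_raw", row, column_names=cols)

# FINAL forces deduplication at read time, before background merges have run.
print(client.query("SELECT count() FROM ad_spend_raw FINAL").result_rows)  # [(1,)]
```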
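The 15-step pipeline itself is proprietary, but the step-based pattern that makes every data transition observable can be sketched. Apart from the AnalyticsDataPipeline name listed in the tech stack, the step names and logic below are illustrative assumptions:

```python
import logging
from dataclasses import dataclass, field
from typing import Callable

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("analytics_pipeline")

@dataclass
class AnalyticsDataPipeline:
    """Runs registered steps in order and logs each transition for auditability."""
    steps: list = field(default_factory=list)

    def step(self, name: str):
        def register(fn: Callable[[dict], dict]) -> Callable[[dict], dict]:
            self.steps.append((name, fn))
            return fn
        return register

    def run(self, ctx: dict) -> dict:
        for name, fn in self.steps:
            log.info("step=%s rows_in=%d", name, len(ctx.get("rows", [])))
            ctx = fn(ctx)
            log.info("step=%s rows_out=%d", name, len(ctx.get("rows", [])))
        return ctx

pipeline = AnalyticsDataPipeline()

@pipeline.step("load_spend")
def load_spend(ctx: dict) -> dict:
    ctx["rows"] = [{"campaign_id": "cmp-1", "spend": 100.0, "revenue": 130.0}]
    return ctx

@pipeline.step("compute_roas")
def compute_roas(ctx: dict) -> dict:
    for row in ctx["rows"]:
        row["roas"] = row["revenue"] / row["spend"] if row["spend"] else 0.0
    return ctx

result = pipeline.run({})
```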
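Memorialization is, at its core, an append-only write into Postgres once a figure is approved. A minimal sketch assuming psycopg2 and hypothetical table and column names:

```python
import psycopg2

# Hypothetical append-only table: rows are inserted when spend and fees are approved
# and never updated, so invoices keep pointing at the same historical numbers.
MEMORIALIZE_SQL = """
    INSERT INTO billing_records (portfolio_company, billing_month, approved_spend_usd, fee_usd)
    VALUES (%s, %s, %s, %s)
"""

def memorialize(conn, company: str, month: str, approved_spend: float, fee: float) -> None:
    """Write the approved figures once; later model changes never rewrite this row."""
    with conn.cursor() as cur:
        cur.execute(MEMORIALIZE_SQL, (company, month, approved_spend, fee))
    conn.commit()

if __name__ == "__main__":
    conn = psycopg2.connect("dbname=billing user=billing_portal")
    memorialize(conn, "studio-alpha", "2024-05", approved_spend=100_000.00, fee=4_500.00)
```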
Technical Highlights
- Strategic Data Chunking: Replaced “all-at-once” loading with chunked processing, ensuring sub-second performance and preventing memory overflows (see the chunked-read sketch after this list).
- Dual-Write Parity Validation: Engineered a temporary dual-write backend to prove the new ClickHouse system matched legacy results to the penny before the official cut-over (a parity-check sketch follows below).
- Hardened Integrations: Refactored the Reddit and Snapchat integrations with month-based chunking for reliable high-volume data ingestion (a month-window sketch follows below).
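A minimal sketch of the chunked-read idea using pyarrow's batch iterator; the file path, column name, and batch size are illustrative assumptions:

```python
import pyarrow.compute as pc
import pyarrow.parquet as pq

def total_spend_in_chunks(path: str, batch_size: int = 100_000) -> float:
    """Stream the file in fixed-size batches so memory use stays bounded."""
    total = 0.0
    parquet_file = pq.ParquetFile(path)
    for batch in parquet_file.iter_batches(batch_size=batch_size, columns=["spend_usd"]):
        total += pc.sum(batch.column(0)).as_py() or 0.0  # column 0 is spend_usd
    return total
```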
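The dual-write parity check boils down to comparing every figure from both backends after rounding to the cent. A sketch with placeholder inputs; the real comparison keys and data readers are the client's own:

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

def to_cents(value) -> Decimal:
    return Decimal(str(value)).quantize(CENT, rounding=ROUND_HALF_UP)

def parity_report(legacy: dict, clickhouse: dict) -> list:
    """One line per key (e.g. campaign and day) where the two systems disagree after rounding."""
    mismatches = []
    for key in sorted(set(legacy) | set(clickhouse)):
        old, new = to_cents(legacy.get(key, 0)), to_cents(clickhouse.get(key, 0))
        if old != new:
            mismatches.append(f"{key}: legacy={old} clickhouse={new}")
    return mismatches

# Cut-over is approved only when the report comes back empty.
assert parity_report({("cmp-1", "2024-05-01"): 1250.004}, {("cmp-1", "2024-05-01"): 1250.0049}) == []
```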
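Month-based chunking simply walks the requested date range one calendar month at a time so each API pull stays small and retryable. A sketch with a placeholder fetch callable; the real Reddit and Snapchat clients differ:

```python
from datetime import date
from typing import Callable, Iterator, List, Tuple

def month_windows(start: date, end: date) -> Iterator[Tuple[date, date]]:
    """Yield (window_start, window_end) pairs, one per calendar month, covering [start, end)."""
    cursor = date(start.year, start.month, 1)
    while cursor < end:
        nxt = date(cursor.year + 1, 1, 1) if cursor.month == 12 else date(cursor.year, cursor.month + 1, 1)
        yield max(cursor, start), min(nxt, end)
        cursor = nxt

def ingest(fetch_month: Callable[[date, date], List[dict]], start: date, end: date) -> List[dict]:
    """Pull one month at a time so a failed window can be retried in isolation."""
    rows: List[dict] = []
    for window_start, window_end in month_windows(start, end):
        rows.extend(fetch_month(window_start, window_end))
    return rows
```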
The Result
By evolving the platform into a hardened, deterministic architecture, our developer transformed the successful prototype into a scalable financial asset. The firm can now onboard new portfolio companies with confidence, backed by an infrastructure engineered for institutional precision.
Core Tech Stack
- Analytics Engine: ClickHouse (utilizing ReplacingMergeTree for automated deduplication)
- Relational Database: PostgreSQL (6-table schema for access control and memorialized billing)
- Logic Layer: Python (Modular AnalyticsDataPipeline refactor)
- Integrations: Meta, Google, TikTok, Snapchat, and Reddit Ads APIs
- Infrastructure: AWS (EC2, Secrets Manager), GitLab CI/CD