Capital One’s Gayathri Balakumar Speaks on Real-Time Data Engineering
Interview with Gayathri Balakumar, who has more than 17 years of experience building large-scale financial data and analytics systems
- Written by Erik Vander Kolk, CEO of Banking Exchange
Gayathri Balakumar is a Lead Data Engineer at Capital One, with more than 17 years of experience building large-scale financial data and analytics systems for the fintech and insurance industries.
1. To start, can you share how you first became involved in large-scale financial data engineering and what drew you to real-time analytics within the banking industry?
My career in large-scale financial data engineering began in data warehousing, where I focused on batch processing for prominent institutions, including Bank of America, Citi, and Capital One. Effective real-time analytics rests on three fundamental data pillars: consistency, accuracy, and resiliency.
Guided by these principles, I led the analytics application modernization initiative at Capital One. The transition was driven by a strategic objective to improve customer support through instantaneous data availability. To achieve this, we migrated to a microservices-based architecture, using publish-subscribe (pub-sub) messaging to build real-time applications and APIs.
These systems are designed to deliver real-time outcomes in scenarios such as fraud detection, credit approval processing, and dynamic credit line increase programs, including No Preset Spending Limit (NPSL) products.
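To make the pub-sub idea concrete, here is a minimal in-process sketch. It is an illustration under stated assumptions, not Capital One's architecture: in production the bus would be a managed broker such as Apache Kafka or a cloud pub/sub service, and the topic name, event fields, and the 5,000 threshold below are hypothetical. What it shows is fan-out: one published transaction reaches a fraud service and a credit-line service independently.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class PubSubBus:
    """Toy in-process publish-subscribe bus (illustrative only, not durable)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        # Fan the same event out to every subscriber of the topic.
        for handler in self._subscribers[topic]:
            handler(event)


fraud_alerts: List[str] = []
spend_totals: Dict[str, float] = defaultdict(float)


def fraud_service(event: dict) -> None:
    # Hypothetical rule: flag unusually large single transactions.
    if event["amount"] > 5_000:
        fraud_alerts.append(event["txn_id"])


def credit_line_service(event: dict) -> None:
    # Keep a running spend total per account for credit-line decisions.
    spend_totals[event["account_id"]] += event["amount"]


bus = PubSubBus()
bus.subscribe("transactions", fraud_service)
bus.subscribe("transactions", credit_line_service)

bus.publish("transactions", {"txn_id": "t1", "account_id": "a1", "amount": 120.0})
bus.publish("transactions", {"txn_id": "t2", "account_id": "a1", "amount": 7_500.0})
```

Because producers only know the topic, new consumers (for example, a credit-approval model) can be added without changing the services that publish transactions.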
2. Financial institutions are increasingly moving away from static credit models toward real-time decisioning. What is driving that shift, and how are modern data pipelines making it possible?
Traditional credit models relied on infrequent, static data points such as monthly bureau reports, which fail to capture the day-to-day financial fluctuations experienced by small business owners and gig workers. Nightly batch ETL processes lag behind events and delay decisions; by replacing them with continuous event streams powered by pub-sub messaging, institutions can now perform real-time underwriting using current behavioral insights.
This transition is also critical for fraud prevention, for example by checking the geolocation of each card swipe. Imagine a credit card swiped in New York and then, a few minutes later, in London. Batch scoring leaves controls a full day behind, whereas real-time pipelines allow every transaction or login to be processed in milliseconds. Modern decision engines evaluate a card swipe on attributes such as location and timing before approving the transaction.
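The New York to London scenario is often called an "impossible travel" check. A minimal sketch, assuming each swipe carries latitude, longitude, and an event timestamp, and assuming a hypothetical plausibility ceiling of 900 km/h: compute the great-circle (haversine) distance from the card's previous swipe and flag the new swipe if the implied speed exceeds the ceiling. The class and field names are illustrative, not a real library API.

```python
import math
from dataclasses import dataclass
from typing import Dict

EARTH_RADIUS_KM = 6371.0
# Hypothetical ceiling, roughly the cruising speed of a commercial airliner.
MAX_PLAUSIBLE_SPEED_KMH = 900.0


@dataclass
class Swipe:
    card_id: str
    lat: float
    lon: float
    ts_seconds: float  # event time in epoch seconds


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


class ImpossibleTravelDetector:
    """Keeps the last swipe per card and flags physically implausible jumps."""

    def __init__(self, max_speed_kmh: float = MAX_PLAUSIBLE_SPEED_KMH) -> None:
        self.max_speed_kmh = max_speed_kmh
        self.last_seen: Dict[str, Swipe] = {}

    def check(self, swipe: Swipe) -> bool:
        """Return True if the swipe should be held for review.

        Assumes swipes for a card arrive in event-time order.
        """
        prev = self.last_seen.get(swipe.card_id)
        self.last_seen[swipe.card_id] = swipe
        if prev is None:
            return False
        distance = haversine_km(prev.lat, prev.lon, swipe.lat, swipe.lon)
        # Clamp elapsed time to one second to avoid division by zero.
        hours = max(swipe.ts_seconds - prev.ts_seconds, 1.0) / 3600.0
        return distance / hours > self.max_speed_kmh


detector = ImpossibleTravelDetector()
nyc = Swipe("card-1", 40.7128, -74.0060, ts_seconds=0.0)
london = Swipe("card-1", 51.5074, -0.1278, ts_seconds=600.0)  # ten minutes later
print(detector.check(nyc))     # False: first swipe seen for this card
print(detector.check(london))  # True: thousands of km in ten minutes
```

Keeping only the last swipe per card makes the check constant-time per event, which is what lets it run inside the authorization path; a real decision engine would combine this signal with many others rather than decline on it alone.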
Similarly, static snapshot models have given way to instant-approval analytic models that draw on a constant stream of data, improving accuracy and shrinking the window in which stolen identities can be monetized.
3. You have worked on systems capable of processing massive volumes of transaction and customer spending data. What are the biggest engineering challenges involved in maintaining both speed and accuracy at that scale?
Data Consistency Across Distributed Systems: The Dual-Write Problem
Ensuring consistency across distributed systems is a significant hurdle in large-scale financial engineering. When a single transaction must update multiple disparate systems—such as ledgers, audit logs, and risk databases—achieving atomic writes is incredibly difficult. Sequential processing introduces too much latency, yet independent writes risk partial failures that leave data in an inconsistent state.
Failure Analysis: Consider a scenario where an account is debited but a fraud model fails to receive the update. Because the model lacks the latest transaction data, it may erroneously approve subsequent fraudulent activity, creating a vulnerability that attackers frequently exploit.
Proposed Engineering Strategies:
- Event Sourcing: By logging every change as an immutable, append-only event rather than modifying state directly, institutions can accurately reconstruct system history and maintain order regardless of individual failures.
- Distributed Consensus Protocols: Algorithms like Raft or Paxos ensure that a majority (quorum) of nodes agree on a value before it is committed, though this strong consistency comes at the cost of increased write latency.
- Saga Patterns: This approach manages distributed transactions as a series of local steps. If a later step fails, compensating transactions undo previous actions, achieving eventual consistency without global locks.
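The event-sourcing and saga strategies can be combined in a short sketch. Under these assumptions (an in-memory list standing in for a durable log, hypothetical account and step names), the ledger is an append-only list of events whose balance is derived by replay, and the saga runs local steps in order and, if a later step fails, runs the compensations for completed steps in reverse. The failure case mirrors the scenario above: the debit succeeds but the fraud model update fails.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Event:
    account_id: str
    kind: str  # "credited" or "debited"
    amount: float


@dataclass
class EventStore:
    """Append-only log: state is derived by replaying events, never edited in place."""
    events: List[Event] = field(default_factory=list)

    def append(self, event: Event) -> None:
        self.events.append(event)

    def balance(self, account_id: str) -> float:
        total = 0.0
        for e in self.events:
            if e.account_id == account_id:
                total += e.amount if e.kind == "credited" else -e.amount
        return total


Step = Tuple[Callable[[], None], Callable[[], None]]  # (action, compensation)


def run_saga(steps: List[Step]) -> bool:
    """Run local steps in order; on failure, compensate completed steps in reverse."""
    completed: List[Callable[[], None]] = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensate)
    return True


store = EventStore()
store.append(Event("acct-1", "credited", 500.0))


def debit() -> None:
    store.append(Event("acct-1", "debited", 200.0))


def undo_debit() -> None:
    # The compensation is itself a new event; history is never rewritten.
    store.append(Event("acct-1", "credited", 200.0))


def notify_fraud_model() -> None:
    raise ConnectionError("fraud model unavailable")  # simulated partial failure


ok = run_saga([(debit, undo_debit), (notify_fraud_model, lambda: None)])
print(ok, store.balance("acct-1"))
```

The compensation appends a new credit event instead of deleting the debit, so the full history stays auditable. Real implementations also need idempotent, retryable compensations, since a compensation can itself fail.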
4. How is real-time transaction analysis changing the way financial institutions evaluate customer spending behavior, payment ability, and credit accessibility?
Current credit scoring methodologies are shifting toward a more fluid paradigm, utilizing substantial volumes of real-time data from various channels such as e-commerce activity, mobile interactions, geolocation logs, and social media patterns. By applying machine learning to these datasets, institutions can identify complex risk profiles that often remain invisible to conventional scoring systems.
Rather than simply verifying income, banks can now integrate continuous streams of data regarding employment changes, transaction history, and spending habits to develop a more precise understanding of a consumer's repayment capacity.
Furthermore, the use of open banking APIs allows financial institutions to aggregate data from external sources, providing a comprehensive perspective on user behavior. When paired with granular merchant and transaction categorization, these diverse data points facilitate highly detailed risk evaluations specifically designed for today's digital-first consumers.
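As a rough illustration of turning a transaction stream into a repayment-capacity signal, the sketch below maintains net cash flow and per-category spending over a sliding window. The 30-day window, the three merchant category codes (MCCs) in the lookup table, and the class names are hypothetical; production systems would use the full ISO 18245 MCC list or a richer categorization model, and data aggregated through open banking APIs would arrive through whatever interface each provider exposes.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Deque, Dict, Optional

# Hypothetical subset of merchant category codes mapped to spending categories.
MCC_CATEGORIES = {
    "5411": "groceries",
    "5812": "dining",
    "5541": "fuel",
}


@dataclass
class Txn:
    ts_days: float               # event time in days, for simplicity
    amount: float                # positive = inflow (e.g. payroll), negative = outflow
    mcc: Optional[str] = None


class RollingCashFlow:
    """Net cash flow and per-category spend over a sliding time window."""

    def __init__(self, window_days: float = 30.0) -> None:
        self.window_days = window_days
        self.txns: Deque[Txn] = deque()
        self.net = 0.0
        self.spend_by_category: Dict[str, float] = defaultdict(float)

    def add(self, txn: Txn) -> None:
        self.txns.append(txn)
        self._apply(txn, sign=1)
        # Evict transactions that fell out of the window (assumes time-ordered input).
        while self.txns and self.txns[0].ts_days <= txn.ts_days - self.window_days:
            self._apply(self.txns.popleft(), sign=-1)

    def _apply(self, txn: Txn, sign: int) -> None:
        self.net += sign * txn.amount
        if txn.amount < 0:
            category = MCC_CATEGORIES.get(txn.mcc or "", "other")
            self.spend_by_category[category] += sign * -txn.amount


cf = RollingCashFlow(window_days=30)
for t in [Txn(0, 3000.0), Txn(5, -120.0, "5411"), Txn(10, -40.0, "5812")]:
    cf.add(t)
cf.add(Txn(31, -60.0, "5411"))  # day-0 payroll deposit leaves the window
print(cf.net, dict(cf.spend_by_category))
```

Updating the aggregates incrementally on each event, rather than recomputing from history, is what keeps such features cheap enough to refresh on every transaction.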
5. Fraud prevention and credit expansion often need to operate simultaneously. How can AI-driven data engineering systems help institutions balance risk mitigation with customer experience?
Many financial institutions have come to realize that their primary challenge is one of decision-making, though it often appears as a data issue. Despite significant investments in accelerating data mobility via APIs, event streaming, and enhanced pipelines, banks have lagged in upgrading the risk-assessment frameworks that process this information. This has led to an accumulation of disconnected tools, fragile data pipelines, and sluggish manual processes, occurring just as fraudulent actors are intensifying their synchronized attacks throughout the customer lifecycle.
Addressing this does not require a total overhaul of core systems. Instead, banks can unlock 70–80% of AI's potential by integrating a centralized, real-time decisioning layer above their existing infrastructure. Prioritizing use cases in credit, AML, fraud, and compliance allows for rapid, low-risk improvements.
AI moves beyond rigid, universal rules by assessing every transaction against a customer's unique behavioral history. By performing risk evaluations directly during the authorization phase, AI can prevent suspicious transfers before they occur without compromising overall approval efficiency. This technology allows models to distinguish between actual fraud and legitimate but unusual activity—such as a major purchase made during travel—transforming what used to be a blind rule into a dynamic learning process.
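One simple way to score a transaction against a customer's own history is a per-customer running baseline. The sketch below uses Welford's online algorithm to track each customer's mean and standard deviation of transaction amounts and asks for step-up verification when an amount sits far above that baseline. It is a plain statistical stand-in for the machine-learning models described above, and the z-score threshold, minimum history, and `traveling` context flag (which loosens the threshold during a known trip) are all hypothetical.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict


@dataclass
class RunningStats:
    """Welford's online algorithm for the mean and variance of a customer's amounts."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        # Sample standard deviation; zero until there are two observations.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


class BehavioralScorer:
    """Scores each authorization against that customer's own history."""

    def __init__(self, z_threshold: float = 4.0, min_history: int = 5) -> None:
        self.z_threshold = z_threshold
        self.min_history = min_history
        self.stats: Dict[str, RunningStats] = defaultdict(RunningStats)

    def authorize(self, customer_id: str, amount: float, traveling: bool = False) -> str:
        s = self.stats[customer_id]
        decision = "approve"
        if s.n >= self.min_history and s.std > 0:
            z = (amount - s.mean) / s.std
            # Hypothetical context signal: tolerate larger deviations while traveling.
            threshold = self.z_threshold * (1.5 if traveling else 1.0)
            if z > threshold:
                decision = "step_up"  # e.g. ask the customer to confirm in-app
        if decision == "approve":
            s.update(amount)  # only approved amounts update the baseline
        return decision


scorer = BehavioralScorer()
for amt in [40, 55, 60, 45, 50]:
    scorer.authorize("c1", amt)
print(scorer.authorize("c1", 90))                  # unusual at home
print(scorer.authorize("c1", 90, traveling=True))  # same amount during a known trip
```

Challenged amounts are deliberately not folded into the baseline, so that suspicious activity does not inflate what counts as normal for the customer; a production model would also weigh merchant, device, and location features.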
6. You have also worked extensively with sensitive financial and personal data. How are data engineering teams approaching security, encryption, and compliance while still enabling faster real-time analytics?
A fundamental architectural transformation is underway. By 2026, security and governance will be integrated directly into data engineering workflows instead of being treated as isolated post-pipeline processes. This shift moves security controls from being final, "bolt-on" layers to becoming core components woven into the initial pipeline architecture.
This "privacy by design" strategy is demonstrated in pipelines that automatically encrypt sensitive data, such as customer IDs, during ingestion or implement automated data retention policies to purge records after set durations. Consequently, data within real-time analytics systems remains protected from exposure, effectively shrinking the attack surface without introducing significant latency.
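A minimal sketch of protecting identifiers at ingestion and enforcing retention, with one substitution stated plainly: instead of reversible encryption it uses keyed tokenization (HMAC-SHA256 from Python's standard library), which replaces the customer ID with a deterministic token so downstream joins still work but the raw ID never enters the analytics store. If the original value must be recoverable, authenticated encryption such as AES-GCM or a token vault would be used instead. The key, the 90-day retention period, and the record shape are hypothetical; a real key would come from a key management service.

```python
import hashlib
import hmac
import time
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Hypothetical key for illustration; in practice it comes from a KMS/HSM, never source code.
TOKENIZATION_KEY = b"replace-with-key-from-a-kms"
RETENTION_SECONDS = 90 * 24 * 3600  # hypothetical 90-day retention policy


def tokenize(customer_id: str, key: bytes = TOKENIZATION_KEY) -> str:
    """Deterministic keyed token: the same ID always maps to the same token."""
    return hmac.new(key, customer_id.encode("utf-8"), hashlib.sha256).hexdigest()


@dataclass
class Record:
    customer_token: str
    amount: float
    ingested_at: float  # epoch seconds


def ingest(customer_id: str, amount: float, now: Optional[float] = None) -> Record:
    """Tokenize at the pipeline boundary so later stages only see the token."""
    ts = time.time() if now is None else now
    return Record(tokenize(customer_id), amount, ts)


def purge_expired(records: Iterable[Record], now: Optional[float] = None) -> List[Record]:
    """Keep only records still inside the retention window."""
    ts = time.time() if now is None else now
    return [r for r in records if ts - r.ingested_at < RETENTION_SECONDS]


recs = [ingest("cust-123", 10.0, now=0.0), ingest("cust-456", 25.0, now=RETENTION_SECONDS)]
kept = purge_expired(recs, now=RETENTION_SECONDS + 1)
print(len(kept), kept[0].customer_token[:12])
```

Because HMAC is a single hash computation per record, this kind of control adds little latency to the ingestion path.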
Compliance strategies have evolved similarly, with a growing number of organizations adopting real-time tools during Extract, Transform, and Load (ETL) processes. By ensuring immediate adherence to data governance regulations, companies can mitigate non-compliance risks proactively rather than identifying breaches through retrospective audits.
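To illustrate a compliance rule enforced during the transform step rather than in a later audit, the sketch below scans free-text fields for payment card numbers, confirms candidates with the Luhn checksum to reduce false positives, masks all but the last four digits, and reports which fields violated the policy. This is one hypothetical rule in the spirit of PCI DSS requirements on protecting stored card numbers; the regular expression, field names, and the choice to mask rather than reject the record are assumptions.

```python
import re
from typing import Dict, List, Tuple

# Candidate card numbers: 13 to 19 digits, optionally separated by single spaces or dashes.
PAN_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Standard Luhn checksum: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def mask_pans(text: str) -> Tuple[str, int]:
    """Replace Luhn-valid card numbers with a mask that keeps the last four digits."""
    hits = 0

    def _mask(match: "re.Match[str]") -> str:
        nonlocal hits
        digits = re.sub(r"[ -]", "", match.group())
        if not luhn_valid(digits):
            return match.group()  # leave non-card digit runs untouched
        hits += 1
        return "*" * (len(digits) - 4) + digits[-4:]

    return PAN_CANDIDATE.sub(_mask, text), hits


def transform(record: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Mask card numbers in every field and report which fields violated policy."""
    cleaned: Dict[str, str] = {}
    violations: List[str] = []
    for field_name, value in record.items():
        masked, hits = mask_pans(value)
        cleaned[field_name] = masked
        if hits:
            violations.append(field_name)
    return cleaned, violations


row = {"note": "customer read card 4111 1111 1111 1111 over the phone", "memo": "order 12345"}
cleaned, violations = transform(row)
print(cleaned["note"], violations)
```

Returning the list of violating fields alongside the cleaned record lets the pipeline both load safe data immediately and raise a compliance event in real time, instead of discovering the exposure in a retrospective audit.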
7. Looking ahead, how do you see AI and real-time financial data infrastructure reshaping consumer credit programs and the broader banking experience over the next five years?
Lending is currently transitioning into an omnipresent, embedded experience driven by AI and alternative data. Real-time signals now allow for immediate, customized offers, integrating borrowing into routine digital activities.
This evolution transforms credit from a discrete product requiring an application into an ambient, always-available service. Fast, personalized decisions powered by AI, combined with embedded finance distribution and real-time payments, make speed the new standard. Consequently, activities like purchase financing or debt refinancing are resolved in seconds rather than days, directly within the platforms or apps where the consumer already interacts.
Success for financial institutions will depend on developing scalable API platforms that monetize access via ecosystem partnerships. By embedding banking services into enterprise software and commerce platforms, leaders can turn transaction data into active revenue streams through tailored financial services and risk-scoring.
The upcoming five years will fundamentally restructure the industry. Leading institutions will be those with real-time data infrastructure robust enough for high-speed AI operations, secure enough to maintain trust, and open enough to meet customers wherever they are. By 2031, banking will likely function as a background system that anticipates credit needs, provides personalized pricing, and ensures fraud protection seamlessly.