Currently: Senior Software Engineer • 8+ years
Kasi Viswanath G
I build production distributed systems, and I’m shifting my focus toward ML infrastructure and GPU performance engineering.
- Event-driven systems with Kafka
- Kubernetes-based production workloads
- Seasonal peak scaling (US sale events)
- Airflow + BigQuery data pipelines
- Performance tuning across SQL + async processing
Technical Foundation
8+ years building production distributed systems in fintech and high-volume e-commerce environments. Strong background in concurrency, event-driven architectures, data systems, and performance optimization.
- Designed and maintained event-driven architectures using Kafka for high-throughput order processing.
- Deployed and operated containerized workloads on Kubernetes with scaling considerations for peak traffic.
- Built resilient, idempotent workflows for correctness under high concurrency and retry scenarios.
- Developed Airflow DAGs for orchestrating production data pipelines.
- Optimized Azure SQL queries and indexing strategies for performance-critical workflows.
- Improved processing throughput via partition tuning, batching, and concurrency adjustments.
- Integrated analytics pipelines using BigQuery for operational insights.
- Strong understanding of system bottlenecks: CPU, I/O, memory, and network constraints.
Current Focus: ML Infrastructure & GPU Performance
Building deeper expertise in low-level systems and GPU compute to transition into ML infrastructure and performance engineering roles. My focus is on understanding compute efficiency from the hardware layer upward — memory hierarchy, parallelism, and bottleneck analysis.
- Strengthening fundamentals in memory management, data layout, cache behavior, and multithreading.
- Studying performance tradeoffs between abstraction and low-level control.
- Exploring lock-free and concurrency-oriented design patterns.
- Learning the CUDA execution model: threads, warps, blocks, and grids.
- Understanding the GPU memory hierarchy: global, shared, and constant memory, plus registers.
- Profiling workloads to distinguish memory-bandwidth-bound from compute-bound bottlenecks.
- Exploring Triton kernel development for high-performance tensor operations.
- Studying distributed training internals (DDP, NCCL, scaling patterns).
- Building toward reproducible benchmarking and profiling-driven optimization workflows.
Experience
- Built and maintained distributed order fulfillment workflows using Kafka and Kubernetes in a high-volume environment.
- Designed scalable event-driven processing pipelines resilient to seasonal peak traffic spikes (US sale events).
- Improved throughput and reduced processing latency via partition tuning, async batching, and concurrency optimization.
- Developed Airflow DAGs powering analytics and operational workflows via BigQuery.
- Optimized Azure SQL queries and indexing strategies for order state transitions and operational reporting.
- Implemented idempotent, retry-safe patterns to ensure correctness and reliability under high concurrency.
- Contributed to backend services powering a digital investment platform used by a six-figure registered user base.
- Built and optimized portfolio aggregation and transaction-related workflows.
- Improved API performance and query efficiency for latency-sensitive user flows.
- Strengthened observability and production monitoring to improve reliability and incident response.
I’m targeting ML infrastructure and performance roles, leveraging my background in distributed systems. Current focus areas: profiling, benchmarking, GPU utilization, and scalable training/inference systems.