Software Engineer · Platform & Systems

Venugopal Bogireddy

I build the platform layers other engineers build on. Six years at Amazon: shared services at 4M TPS, privacy compliance under real regulatory pressure, and an LLM tool I shipped because on-call triage at 2 hours a session was just too painful to ignore. Outside work, I'm currently learning AV fundamentals by building evaluation tooling for autonomous driving simulators.

4M+ TPS Owned
6+ Years at Amazon
$5M+ Business Impact
85% Payload Reduction

Impact by the Numbers

4ms
P99 latency reduction via HTTP→gRPC migration
Amazon Ads · 2024
$4.2M
Annual business impact from latency improvements
Amazon Ads · 2024
$1M
Infrastructure cost eliminated via cache deprecation
Amazon Ads · 2023
85%
Payload reduction — 8.2KB → under 1KB via Protobuf
Amazon Ads · 2024
9ms
Final latency after full migration — from 14ms baseline
Amazon Ads · 2024
4hrs
New team onboarding time — down from 1 week
Vert.x Platform · 2023
~0
False on-call pages per rotation — down from 5–6/cycle
On-call Remediation · 2024
20min
Triage time per session — down from 2 hours via agentic LLM tool
Kiro Agentic Tooling · 2024

Experience

Nov 2022 — Present
Amazon Ads
Current

SDE II — Ad Exchange Platform

  • DAG-based workflow orchestration (Vert.x) — shared platform foundation reused across multiple teams; new team onboarding from 1 week → under 4 hours.
  • Platform-wide gRPC migration at 4M TPS — owned architecture, multi-team API design, phased rollout, zero-downtime cutover; P99 latency −4ms → $4.2M annual impact.
  • Privacy & compliance foundations — PII pseudonymization, GDPR/DMA/CCPA/GPP consent enforcement, cryptographic standardization, legal-grade audit documentation.
  • Agentic on-call tooling (Amazon Kiro) — LLM-powered triage across 160 weekly tickets; repetitive triage reduced from 2 hours → 20 minutes per session.
  • $1M+ infrastructure cost elimination — identified stale cache clusters and deprecated regions during PII migration; cross-org alignment with zero operational impact.
Feb 2020 — Nov 2022
Amazon Web Services

SDE I → SDE II — Security Platforms

  • Zero-downtime Elasticsearch → OpenSearch migration — dual-write architecture, SLA validation, ISM-based sharding strategy; zero disruption to security ops teams.
  • Access control & authorization framework — least-privilege IAM, query-scoped permission propagation, storage-layer access validation for large-scale security investigation platform.
  • Application security review (3TB/day pipeline) — identified and remediated gaps across logging, API gateway hardening, resource monitoring, and access governance.
  • Synthetic data framework — mentored intern from design to production; adopted by 4 teams; data creation time 2 days → 2 hours with TTL and service-boundary constraints.
Jun 2019 — Aug 2019
Amazon Web Services

SDE Intern

  • ETL pipeline (Gaffer, Accumulo, AWS EMR) — large-scale security data processing at 4× the throughput of the legacy approach.
Aug 2018 — Dec 2019
Stony Brook University, NY

M.S. — Computer Science

Jul 2014 — May 2018
IIT Bhubaneswar, India

B.Tech — Computer Science & Engineering

Technical Depth

Languages & Frameworks
Java (Spring) · Python · TypeScript · SQL · Node.js · C/C++ · Vert.x
Platform & API Design
DAG Orchestration · gRPC · REST · Protobuf · OpenRTB · Event-driven Arch · Internal Dev Platforms
AI & Agentic Systems
Amazon Bedrock · LLM API Integration · Agentic Workflow Design · Spec-driven Dev · SOP Constraints · Structured Output Validation
Cloud & Infrastructure
AWS Lambda · EMR · DynamoDB · S3 · SQS / SNS · OpenSearch · CloudWatch · CDK / CloudFormation · Docker · CI/CD
Security & Compliance
GDPR · DMA · CCPA · GPP · PII Pseudonymization · Least-privilege IAM · Zero-trust · Audit Documentation
Reliability & Observability
SLA/SLO Design · On-call Ownership · ELK / OpenSearch · Load Testing · Runbook Development · Zero-downtime Migration

Engineering Stories

The decisions behind the numbers — where things broke, how I navigated them, and what I shipped.

Platform Migration

gRPC Migration Under Scale — Three Compounding Problems

Moving 4M TPS from HTTP to gRPC surfaced three compounding problems: a latency regression at the 1% traffic dial-up, scope creep from a legacy Groovy module, and a deadline slip. I held the line until the root cause was fixed rather than shipping known debt.

14ms → 9ms latency · 85% payload reduction · 2M TPS at milestone
Scope & Negotiation

Preventing Ad-hoc Compliance Debt — Twitch DMA Bypass

While I was mid-execution on DMA standardization, a supply team demanded a source-specific bypass. I pushed for the exact revenue number first, traced the root cause to a signal-merging bug on their side, and turned the $50K/week cost into a bounded, time-limited decision rather than an open-ended emergency.

$50K/week scoped · Zero bypass code shipped · Standardized framework adopted
Competing Priorities

Two Hard Deadlines, One Engineer

A privacy compliance deadline backed by an external customer commitment collided with an auction design migration locked across four teams. I used feature flags to unblock the supply team early, extended the auction timeline by one month with full alignment, and shipped privacy on time.

120M req/day stopped punting · +25% projected revenue · NBA live test: +12% at dial
Ownership & Initiative

On-call Alarm Audit Across Three Teams

Nobody asked me to fix the miscalibrated alarm ownership list. I used agentic tooling to inventory 250+ alarms, delegated identity team analysis to the right owners, validated 17 business-critical alarms, and eliminated 5–6 false pages per rotation cycle.

5–6 false pages → ~0 · 250+ alarms audited · 3 teams aligned
Learning Fast

Building the gRPC Framework on a Tech I'd Never Touched

I was tasked with designing, from scratch, the gRPC integration framework for a new Vert.x service, a technology I had never used. During load testing I diagnosed a subtle asyncStub → futureStub latency issue. The framework was built in 3.5 weeks, and the first team onboarded in under 2 days.

1ms node latency · 3.5 weeks to production · <2 days first onboarding
Mentorship

Coaching Through Negotiation, Not Just Code

I delegated the majority of a Render Endemic Creative feature to a junior developer. When external timeline pressure landed on him, I ran negotiation role-plays, acting as the partner team pushing on the deadline, so he could practice holding his position before doing it for real. He handled the final conversations largely independently.

Feature shipped on schedule · Developer put on promo path · Bugs surfaced by reordering

Personal Projects

Building in public — systems, tools, and experiments outside of work. This section grows as I ship.

av-sim-bench

Automated evaluation framework for autonomous driving simulators — log-replay safety metrics, A* grid-world simulation, and statistical fidelity testing. Key finding: a miscalibrated generator passes every safety rule check but fails all five statistical tests. Safety metrics are necessary, not sufficient.

Python · Pandas · PyArrow · SciPy · NetworkX · Pytest
View Project → GitHub →

Let's talk

Open to the right opportunity.

Targeting Senior SWE roles in platform engineering, AI infrastructure, or distributed systems — places where scale is real and ownership runs deep. If you're building something serious and want someone who can own it end-to-end, I'd like to hear about it.