Distributed Systems / GPU Infrastructure Engineer at Capa Cloud

Wyoming, Michigan, United States

$5,000 / mo

We are looking for a Distributed Systems / GPU Infrastructure Engineer to help architect and scale the core infrastructure behind the CapaCloud decentralized GPU network.

You will work on GPU orchestration, node infrastructure, distributed computing systems, workload scheduling, performance optimization, and platform reliability.

This is a high-impact engineering role for someone passionate about building the next generation of decentralized AI infrastructure.

Key Responsibilities

- Design and build scalable distributed GPU infrastructure
- Develop systems for node orchestration and workload scheduling
- Optimize GPU utilization and compute performance
- Build fault-tolerant infrastructure for decentralized environments
- Improve network reliability, scalability, and uptime
- Develop deployment automation and infrastructure tooling
- Work with AI and blockchain teams to integrate compute systems
- Monitor infrastructure performance and troubleshoot bottlenecks
- Contribute to backend architecture and cloud-native systems
- Implement secure infrastructure best practices

Required Skills & Experience

- Strong experience with distributed systems and backend infrastructure
- Experience with Kubernetes, Docker, and container orchestration
- Strong Linux systems administration knowledge
- Experience with GPU infrastructure and CUDA environments
- Proficiency in Go, Rust, Python, or similar backend languages
- Experience with cloud infrastructure platforms
- Understanding of networking, virtualization, and load balancing
- Experience building scalable APIs and infrastructure services
- Familiarity with monitoring tools and observability stacks
- Strong debugging and performance optimization skills

Nice To Have

- Experience in decentralized infrastructure or Web3
- Experience with AI/ML infrastructure
- Bare-metal infrastructure experience
- Experience with distributed storage systems
- Knowledge of peer-to-peer networking systems
- Open-source contributions

What Success Looks Like

- Reliable decentralized GPU orchestration system
- High-performance compute scheduling infrastructure
- Reduced latency and improved GPU efficiency
- Stable infrastructure scaling across multiple regions
- Strong uptime and system reliability metrics

Employment Type

- Full-time
- Remote
