Senior Java Site Reliability Engineer
Nadeem Ehsan
McLean, Virginia, United States
$100 / hr
Role: Senior Java Site Reliability Engineer
Experience: 16-20 Years
Job Type: Contract
Project: Hybrid
Location: McLean, VA
Industry: Banking / Financial Services

Key Responsibilities
- Support and maintain highly available production platforms across cloud and distributed environments.
- Drive incident management, root cause analysis, problem management, and platform stability initiatives.
- Monitor and maintain uptime of Java applications and microservices.
- Proactively identify and resolve application performance bottlenecks.
- Conduct root cause analysis (RCA) for application outages and incidents.
- Implement resiliency patterns including circuit breakers, retries, and failover mechanisms.
- Lead reliability engineering efforts focused on system availability, performance optimization, and operational excellence.
- Implement and enhance observability solutions including monitoring, logging, alerting, and incident response automation.
- Collaborate with development, infrastructure, and cloud engineering teams to improve deployment reliability and operational efficiency.
- Support infrastructure modernization, cloud transformation, and platform automation initiatives.
- Coordinate disaster recovery testing, resiliency validation, capacity planning, and production readiness reviews.
- Provide technical leadership and mentor offshore/onshore engineering teams.

Required Experience
- 16–20 years of experience in Site Reliability Engineering (SRE), Production Engineering, Platform Engineering, or Application Support.
- Strong experience supporting large-scale enterprise production environments.
- Proven background in incident management, problem management, and operational support.
- Experience working within banking, financial services, fintech, or other highly regulated industries.
- Hands-on experience supporting mission-critical applications with stringent availability and performance requirements.
Required Skills
- Java
- Linux/Unix Administration
- Kubernetes and Container Platforms
- Docker
- Cloud Platforms (AWS, Azure, or GCP)
- CI/CD Tools (Jenkins, GitHub Actions, GitLab CI/CD, ArgoCD)
- Infrastructure as Code (Terraform, Ansible)
- Monitoring & Observability Tools (Splunk, Datadog, Grafana, Prometheus, Moogsoft)
- ServiceNow, JIRA, Confluence
- Python, Bash, or Shell Scripting
- SQL and Database Troubleshooting
- Application Performance Monitoring (APM)
- Production Release Management
- Disaster Recovery and High Availability Architectures

Education
- Bachelor's degree in Computer Science, Information Systems, Engineering, or a related technical discipline.