[Remote] SRE Platform Engineer

100% Remote Full-time Open now

Note: This is a remote position open to candidates in the USA. GE Vernova is seeking a Platform System Reliability Engineer to manage its Amazon EKS (Kubernetes) environment, which supports its global grid software SaaS products. The role involves ensuring the security, scalability, and resilience of the infrastructure while overseeing the full lifecycle of production clusters.

Responsibilities

  • Help design and deploy hardened EKS clusters across multiple AWS regions, ensuring consistent security baselines
  • Build and maintain reusable Terraform and Ansible modules for automated provisioning of cloud infrastructure services, including networking, compute, storage, queuing, and caching
  • Implement "Policy as Code" guardrails and secure Electronic Security Perimeters (ESPs) in alignment with NERC CIP and IEC 62443 standards
  • Standardize runbooks and the operating processes required to run critical infrastructure with the highest reliability
  • Define and enforce Kubernetes resource quotas, limit ranges, and Pod Priority classes to ensure mission-critical services receive prioritized compute resources
  • Manage the ingress strategy and service mesh architecture to enable secure, performant connectivity between distributed microservices
  • Lead platform-level smoke tests, load tests, and disaster recovery exercises to validate that the infrastructure can meet 99.99% uptime targets
  • Partner with application teams to right-size containerized workloads, optimizing for both performance and cloud cost (FinOps)
  • Act as the highest technical escalation point for complex Kubernetes internals, troubleshooting issues such as failed pods, memory leaks, and network partitions
  • Lead root cause analysis (RCA) for platform-level outages, implementing systemic fixes to prevent recurring failures
  • Proactively identify and automate repetitive operational tasks, such as cluster upgrades and OS patching, so that the team spends at least 50% of its time on engineering improvements
  • Institutionalize platform monitoring using Prometheus and Grafana, creating dashboards that surface the "Golden Signals" of cluster health

Skills

  • 5 years of experience operating production-grade Kubernetes clusters at scale
  • Expert-level knowledge of multi-cluster management and performance tuning, plus experience implementing observability tools such as Prometheus/Grafana, Dynatrace, Splunk, and Datadog
  • Deep hands-on experience with AWS core services (EKS, EC2, ALB, S3, RDS, MSK)
  • Proficiency in Terraform, Ansible, and Python or Go for infrastructure automation, and with deployment tools such as ArgoCD or Flux
  • Strong understanding of, and hands-on experience with, cloud networking concepts such as VPCs, routing, and load balancing, and with security configurations such as encryption and certificate management
  • Bachelor's degree in Computer Science or another STEM (Science, Technology, Engineering, and Math) major, with advanced experience
  • 6–8 years in SRE or Platform Engineering roles supporting mission-critical, 24/7 cloud environments
  • Proven track record as a structured incident responder who can handle production-down and break-glass scenarios in mission-critical applications
  • Practical knowledge of NERC CIP, SOC2, ISO 27001, or IEC 62443 compliance standards in a SaaS context
  • AWS Certified DevOps Engineer – Professional, CKA (Certified Kubernetes Administrator), or SRE Practitioner Certification
  • Experience supporting mission-critical systems in energy, utilities, or other high-stakes industrial sectors

Benefits

  • Relocation Assistance Provided: Yes

Company Overview

  • GE Vernova provides energy consulting, gas power, and grid solutions. It was founded in 2024 and is headquartered in Boston, Massachusetts, USA, with a workforce of 10,001+ employees. Its website is https://www.gevernova.com.