HPC Kubernetes Engineering Manager

  • NorthMark Strategies LLC
  • Dallas, TX
  • 7mo ago
  • Full-Time
  • On-site

The Company

NorthMark Compute & Cloud (NMC²) is backed by dedicated leadership and investment, with a clear mission as it operates at the bleeding edge of technology. Its goal is to scale and enhance the high-performance computing (HPC) and cloud infrastructure that supports its clients' research, production, and delivery, enabling breakthroughs that shape the industries of tomorrow. Its engineers build critical infrastructure to eliminate friction in scientific research, simulations, analysis, and decision-making, accelerating discovery and driving faster innovation.
 

The Position

We are seeking a highly skilled Kubernetes Engineering Manager with a focus on HPC to join our Platform Engineering function in Dallas. Kubernetes underpins all facets of our Research platforms and HPC estate here at NMC². As the HPC Kubernetes Engineering Manager, you will take ownership of the strategic roadmap, design and delivery of our Kubernetes platform. In addition, you will focus on continuous optimization and performance enhancement of the platform as Research demands grow. We are looking for a highly experienced technical manager who can lead the significant scale-up of our existing compute platforms and who excels at working on the bleeding edge of technology, pushing the boundaries of HPC compute performance and taking an innovative approach to the complex technical challenges that arise. The HPC Kubernetes Engineering Manager will collaborate closely with the Kubernetes Platform Management team to ensure a smooth transition of new engineering capabilities, with a strong focus on operational excellence in all aspects of design and implementation.
 

Responsibilities:

  • Strong leadership and strategic vision in the design, deployment and scaling of a high-performance Kubernetes platform

  • Proactive stakeholder engagement, ensuring the Kubernetes platform supports broader business outcomes and research demands

  • Confident communication and collaboration to help drive cross-functional engineering initiatives across the Technology and Research organizations

  • Vendor management experience, working closely with our key vendors to provide continuous feedback, influence their roadmaps, and ensure efficient and timely deployment, support and maintenance of critical platforms

  • People leadership, managing and developing engineers and a high-performing team across the UK and US

  • A deep understanding of emerging trends and technologies in the Kubernetes ecosystem, working closely with Architecture and Innovation Teams to appraise and adopt them

  • Ensuring platforms are reliable, highly available and secure, managed with a DevOps mindset and Infrastructure-as-Code toolset    

  • Budget control, capacity forecasting and management 
     

Requirements:

  • Bachelor's Degree or equivalent experience

  • Extensive technical experience with Kubernetes tailored for HPC/ML workloads in a complex distributed environment 

  • Contribute to performance tuning of ML workloads across GPU/CPU clusters, including optimizations for workload scheduling, GPU integration, and resource management for distributed training jobs

  • Experience scaling high-performance Kubernetes platforms across geographic regions

  • Implement and manage multi-tenant compute environments ensuring isolation and performance 

  • Integrate with distributed file systems and high-speed interconnects (e.g., InfiniBand, RoCE) 

  • Ability to collaborate effectively across teams to deliver engineering solutions with a strong emphasis on operational excellence and seamless capability handover 

  • Confident stakeholder management and communication skills, aligned to value-driven outcomes

  • Excellent team leadership and project management skills, promoting a high-performance culture

  • Drive engineering best practices across CI/CD, automation & tooling, configuration management and SRE concepts 

  • A commitment to security by designing and building secure, high-integrity systems 

It is impossible to list every requirement for, or responsibility of, any position.  Similarly, we cannot identify all the skills a position may require since job responsibilities and the Company’s needs may change over time.  Therefore, the above job description is not comprehensive or exhaustive.  The Company reserves the right to adjust, add to or eliminate any aspect of the above description.  The Company also retains the right to require all employees to undertake additional or different job responsibilities when necessary to meet business needs.

Must be legally authorized to work in the United States without the need for employer sponsorship, now or at any time in the future.

Benefits & Perks:

  • Hybrid-Work Schedule: We provide a hybrid working schedule with 3 days a week in the office

  • Company-Paid Lunch Stipend: Lunch is provided via GrubHub

  • Company-Paid Benefits: 100% Employer-Paid Medical in our High Deductible Health Plan, Dental and Vision benefits for employees and their families, 16 weeks of Paid Parental Leave, Employee Assistance Program, Life insurance, Short-Term Disability and Long-Term Disability

  • 401(k): Company will match 100% of your contributions up to 6%

  • Optional Employee-Paid Benefits: Medical insurance in our PPO plan and a variety of other benefits such as Health Savings Accounts (with Company Contribution!), Flexible Spending Accounts, Supplemental Life Insurance, Wellhub and more.

  • Time Off:  25 days of Paid Time Off plus 12 company holidays


EQUAL OPPORTUNITY EMPLOYER

NORTHMARK STRATEGIES LLC IS AN EQUAL EMPLOYMENT OPPORTUNITY EMPLOYER. THE COMPANY'S POLICY IS NOT TO DISCRIMINATE AGAINST ANY APPLICANT OR EMPLOYEE BASED ON RACE, COLOR, RELIGION, NATIONAL ORIGIN, GENDER, AGE, SEXUAL ORIENTATION, GENDER IDENTITY OR EXPRESSION, MARITAL STATUS, MENTAL OR PHYSICAL DISABILITY, GENETIC INFORMATION, OR ANY OTHER BASIS PROTECTED BY APPLICABLE LAW. THE FIRM ALSO PROHIBITS HARASSMENT OF APPLICANTS OR EMPLOYEES BASED ON ANY OF THESE PROTECTED CATEGORIES.