Global exposure and opportunities to work on cross-border projects
High Leadership Visibility & Impact on Business Outcomes

About Our Client
A global leader headquartered in Singapore, renowned for innovative solutions, robust infrastructure, and driving digital transformation.

Job Description
- Serve as the overall lead and point of accountability for end-to-end GPUaaS and data centre operations, including operational reporting.
- Oversee day-to-day platform and facility operations across GPU hardware, networking, environmental systems, security controls, and supporting software.
- Lead and coordinate internal operations teams, vendors, and consultants during routine activities as well as critical incidents.
- Partner with engineering and external stakeholders to deliver platform upgrades and data centre improvement initiatives.
- Develop, review, and refine operational processes to maintain platform stability across compute, power, cooling, and infrastructure components.
- Take charge of major incidents, drive root cause analysis, and ensure clear, timely updates to customers and stakeholders.
- Provide regular updates to management on operational performance, risks, and improvement plans.
- Ensure incidents are triaged and escalated appropriately based on severity, business impact, and SLA/SLO commitments.
- Build, lead, and motivate a strong operations team with a focus on accountability and continuous improvement.
- Set clear performance expectations, coach team members, and support their ongoing professional development.
- Oversee security incident management and uphold security and compliance standards within the GPUaaS environment.
- Stay current with industry security developments and implement safeguards to protect customer workloads and platform integrity.
- Support scheduled maintenance activities and participate in on-call duties when required.

The Successful Applicant
- Bachelor's degree in Computer Science, Information Technology, or a related field.
- At least 8 years of experience in data centre operations, including a minimum of 3 years in a leadership capacity.
- Solid understanding of data centre infrastructure, including servers, networking, storage, and both physical and cybersecurity controls.
- Practical experience with electrical and mechanical systems, facilities management, and preventive maintenance practices.
- Demonstrated ability to lead teams and manage vendors effectively.
- Strong organisational skills, with the ability to adapt to evolving operational demands.
- Hands-on experience with Linux and hypervisor administration in GPU or GPUaaS environments.
- Strong analytical and troubleshooting skills, with a proactive approach to performance optimisation and system reliability.
- Working knowledge of storage technologies, including capacity planning, troubleshooting, and data protection strategies.
- Experience managing GPU infrastructure, including configuration, monitoring, and performance tuning.
- Familiarity with liquid cooling technologies used in high-density GPU environments.
- Understanding of GPU cluster architectures and AI/HPC environments, including collective communication libraries (e.g. NCCL), RDMA, high-performance networking (e.g. InfiniBand), and containerised or orchestrated platforms supporting AI and HPC workloads.

What's on Offer
The client is a growing firm with a tight-knit team. The successful candidate will have the chance to contribute to a high-performing team while having the autonomy to make certain decisions for it.

Contact
Winson Low (Lic No: R22106039 / EA No: 18C9065)
Quote job ref: JN-032026-6959635
Phone number: +65 6416 9865

Job summary
Function: IT
Specialisation: Infrastructure
Area of specialisation: Technology & Telecoms
Location: Singapore
Contract Type: Permanent
Consultant name: Winson Low (Lic No: R22106039 / EA No: 18C9065)
Consultant contact: +65 6416 9865
Job Reference: JN-032026-6959635