Reporting to the UCLA Health DGIT Infrastructure team, the DevOps Engineer/HPC Administrator plays a critical role in the delivery and analysis of quality technical solutions for our integrated data centers (on-premises and cloud). This role is responsible for implementing infrastructure and application build, release, deployment, and configuration activities, as well as pipeline automation, optimization, and management. Additional responsibilities include helping to architect, orchestrate, and automate infrastructure and service deployment; prototyping, building, and executing test plans; performing quality reviews; participating in the delivery of operational support; and troubleshooting issues. This position will participate in the development and oversight of industry-standard DevOps processes. This role will take the lead in maintaining current on-premises HPC clusters while assisting the team in assessing public and private cloud solutions for future HPC infrastructure migrations and deployments.
Bachelor’s degree in computer science, computer engineering, or a related field, or an equivalent combination of education and related experience. Experience with AWS/cloud computing design, provisioning, and tuning. Infrastructure architecture background with experience in virtualization technologies. Knowledge of the systems infrastructure that powers today’s modern and highly available web and mobile applications, with deep domain expertise in one or more areas: compute, networking, storage, high availability, cloud security, application performance. Strong understanding and knowledge of AWS services (EC2, Network, ELB, S3/EBS, DynamoDB, Lambda, API Gateway, IAM, CloudFormation, and other core AWS technologies). Strong understanding of DevOps frameworks such as Puppet, Ansible, or Terraform. Strong Linux experience, including system administration in a remote Linux/Unix environment. Strong understanding of hardware, security, storage, networking, and database capacity. Experience in deploying, managing, and monitoring systems that run on Linux. Experience in application development, requirements gathering, UI design, deployment, and code refinement with programming languages such as C/C++, Java, Perl, Python, Ruby, and bash/csh/ksh. Multiple years of hands-on experience as a systems engineer, or in networking or data center operations and support. Experience delivering infrastructure for distributed, scalable, secure, reliable software systems. Scripting experience in languages such as PHP, Perl, Python, and Ruby. Understanding of networking, load balancing principles, and approaches to scaling out systems. Experience designing, developing, testing, and deploying applications/systems using proven or emerging technologies in a variety of environments. Experience with continuous deployment/continuous delivery using tools such as git, Maven, Bamboo, etc. Strong background in RHEL/CentOS/Ubuntu systems administration. AWS Certified Solutions Architect preferred.
Docker/ECS experience is preferred, along with familiarity with other container technologies such as rkt, Swarm, Mesos, or Kubernetes. A wide degree of creativity and latitude is expected.