Senior AI Infrastructure Management Engineer
6+ years
Headquartered in Sunnyvale, with offices in Dallas & Hyderabad, Fission Labs is a leading software development company, specializing in crafting flexible, agile, and scalable solutions that propel businesses forward. With a comprehensive range of services, including product development, cloud engineering, big data analytics, QA, DevOps consulting, and AI/ML solutions, we empower clients to achieve sustainable digital transformation that aligns seamlessly with their business goals.
Roles and Responsibilities
We are seeking an experienced Senior AI Infrastructure Management Engineer with expertise in Azure and AWS and a strong background in AI/ML deployments. The ideal candidate will have a proven track record of designing, implementing, and maintaining scalable, secure cloud infrastructure on Azure and AWS. Experience with GCP is a plus. The primary responsibility will be to lead DevOps initiatives and ensure compliance with industry standards and regulations.
Linux Expertise:
- Possess in-depth knowledge of Linux operating systems, including CentOS, Ubuntu, and Red Hat, with expertise in shell scripting, package management, and system administration.
- Configure and optimize Linux-based servers for performance, security, and resource utilization, including kernel tuning, file system management, and network configuration.
Cloud Expertise (AWS/Azure):
- Demonstrate hands-on experience with a wide range of AWS and Azure services, including but not limited to EC2, S3, Lambda, RDS, Azure VMs, Azure Blob Storage, and Azure Functions.
- Architect cloud solutions leveraging best practices and services offered by AWS and Azure, optimizing for scalability, reliability, and cost-effectiveness.
- Implement and manage hybrid cloud environments, facilitating seamless integration and interoperability between AWS and Azure services.
Infrastructure as Code (IaC):
- Develop and maintain Infrastructure as Code (IaC) templates using tools such as Terraform or AWS CloudFormation, defining infrastructure components as code for automated provisioning and configuration.
- Establish version control practices for IaC templates, ensuring traceability, auditability, and reproducibility of infrastructure changes.
AI/ML Infrastructure Management:
- Experience setting up the cloud infrastructure stack, including databases and service endpoints, and scaling and optimizing both GPU and CPU resources.
- Should have worked on AIOps/MLOps.
- Should have worked on deploying AI/ML applications using Docker and Kubernetes.
- Should have worked on scaling, high availability, and reliability tasks for AI applications.
- Should have worked on deploying and maintaining GPU clusters for AI/ML training and inference.
Qualifications Required
- Bachelor's degree in Computer Science, Engineering, or related field.
- 6+ years of experience in infrastructure management roles, with a focus on cloud platforms (Azure and AWS preferred).
- Hands-on experience applying DevSecOps principles and best practices.
- Proficiency in scripting languages such as Python, PowerShell, or Bash.
- Excellent communication and collaboration skills.
- Certifications such as AWS Solutions Architect Associate, AWS Cloud Practitioner, Azure DevOps Engineer Expert, Azure Administrator, Certified Kubernetes Administrator (CKA), or other relevant industry certifications are a plus.
Benefits
- Opportunity to work on impactful technical challenges with global reach.
- Ample opportunities for self-development, including online university access and knowledge-sharing sessions.
- Sponsored Tech Talks & Hackathons to foster innovation and learning.
- Generous benefits package, including health insurance, retirement benefits, flexible work hours, and more.
- Supportive work environment with forums to explore passions beyond work.
This role presents a unique opportunity to contribute to the future of impactful business solutions while advancing your career in a collaborative and innovative environment.