IT Careers at Animbus

Build What Powers the Future of IT

At Animbus, we don’t just manage technology—we shape the way businesses stay secure, connected, and ready for what’s next. Our team comes from years of building resilient systems for some of the biggest names in tech, and now we’re building something of our own.

If you are curious, collaborative, and excited to solve real-world challenges with real impact, you will fit right in. Come join a team that’s rewriting how IT works—quietly, confidently, and with purpose.

Progress in Leadership

Remote-First Flexibility

Enjoy the freedom to work from anywhere. Our remote-first culture promotes work-life balance while enabling meaningful impact.

Global Collaboration

Collaborate with professionals across time zones and cultures. Our diverse, international team brings fresh ideas and global perspectives to every initiative.

Work With Cutting-Edge Technology

Join an innovative environment where you’ll leverage the latest technologies. Our solutions empower SaaS platforms and enterprises to move at cloud speed.

Grow With Experts

We invest in your growth. From mentorship and training to access to world-class learning resources, we support your personal and professional development every step of the way.

Impact-Driven Culture

We value initiative, ownership, and a results-oriented mindset—every role here contributes directly to solving real-world challenges for our customers.

Strategic Learning

Using Lean frameworks such as Kaizen, you will have the opportunity to tackle challenges tied to our strategy, or your client's, and share what you learn along the way.

JOB OPENINGS

Site Reliability Engineer (AI Infrastructure)

Summary:

  • Role: Site Reliability Engineer (AI Infrastructure)
  • Experience: 7+ Years
  • Location: Remote
  • Education: Bachelor’s or Master’s degree in Computer Science or related field

WHO WE ARE

Animbus (www.Animbus.ai) powers enterprise AI with managed infrastructure built for mission-critical performance. We enable organizations to move beyond experimentation and confidently scale AI into production through secure, high-performance, fully managed computing environments. Our platform integrates:

  • High-density, GPU-driven NeoCloud infrastructure for training and inference
  • Unified AI workload orchestration across hybrid and multi-cloud ecosystems
  • SRE-led operational excellence with proactive monitoring and automation
  • Enterprise-grade security and compliance
  • Transparent, cost-efficient pricing

At Animbus, we remove the complexity of building and managing AI infrastructure, so enterprises can innovate faster, scale smarter, and focus on the outcomes that matter. This is an exciting opportunity for an experienced infrastructure professional to work on impactful, large-scale technology projects, with strong growth potential and broad exposure to modern cloud and automation technologies.

JOB DESCRIPTION:

You'll be the backbone of our AI infrastructure — building, automating, and maintaining the platforms that enterprise AI runs on.

  • Design and manage high-density GPU infrastructure for AI training and inference
  • Build and operate scalable Kubernetes-based platforms for AI workloads
  • Implement and maintain infrastructure-as-code (Terraform, Ansible, etc.)
  • Develop observability, monitoring, and alerting frameworks (Prometheus, Grafana, ELK, etc.)
  • Improve system reliability through automation, incident management, and root cause analysis
  • Collaborate with AI/ML teams to optimize performance and resource utilization
  • Ensure security, compliance, and governance best practices across environments

QUALIFICATIONS:

  • 7+ years of experience in SRE, DevOps, or Cloud Infrastructure roles
  • Strong hands-on experience with Kubernetes and container orchestration
  • Experience managing GPU-based environments (preferably NVIDIA ecosystem)
  • Deep knowledge of Linux systems, networking, and distributed systems
  • Experience with AWS, Azure, or GCP (multi-cloud exposure preferred)
  • Strong scripting / programming skills (Python, Bash, Go, or similar)
  • Solid understanding of monitoring, logging, and reliability engineering principles

GOOD TO HAVE:

  • Experience with Remote Monitoring & Management (RMM) tools
  • Experience with AI/ML platforms such as Kubeflow, MLflow, Ray, or similar
  • Exposure to MLOps practices and AI lifecycle management
  • Experience with performance tuning of AI workloads
  • Understanding of cost optimization strategies in cloud and GPU environments
  • Relevant cloud certifications (AWS / Azure / GCP)

AI Platform Engineer (GPU & Intelligent Proxy Infrastructure)

Summary:

  • Role: AI Platform Engineer (GPU & Intelligent Proxy Infrastructure)
  • Experience: 7+ Years
  • Location: Remote
  • Education: Bachelor’s or Master’s degree in Computer Science or related field

JOB DESCRIPTION:

You'll be the backbone of our AI infrastructure — building, automating, and maintaining the platforms that enterprise AI runs on.

  • Architect and manage high-density GPU clusters for AI training and inference
  • Design and deploy intelligent proxy layers for secure AI workload routing, API traffic control, and multi-tenant isolation
  • Build and enhance Kubernetes-based AI platforms with advanced networking controls
  • Implement GPU-aware scheduling, workload isolation, and performance optimization
  • Develop Infrastructure-as-Code for compute, networking, and proxy deployments
  • Configure secure ingress/egress controls, API gateways, and service mesh architectures
  • Implement observability across GPU performance, proxy traffic, and distributed systems
  • Collaborate with AI teams to optimize inference latency and training throughput

QUALIFICATIONS:

  • 7+ years of experience in Platform Engineering, DevOps, or Cloud Infrastructure
  • Strong expertise in Kubernetes, container networking, and cluster architecture
  • Hands-on experience managing GPU environments (NVIDIA ecosystem preferred)
  • Experience with proxy technologies (NGINX, Envoy, HAProxy, or similar)
  • Deep understanding of Linux systems, networking (TCP/IP, DNS, TLS), and load balancing
  • Experience with Infrastructure-as-Code (Terraform, Helm, Ansible, etc.)
  • Strong scripting/programming skills (Python, Go, Bash, etc.)
  • Experience with AWS, Azure, or GCP

GOOD TO HAVE:

  • Experience with service mesh technologies (Istio, Linkerd, etc.)
  • Familiarity with AI frameworks (PyTorch, TensorFlow, Hugging Face)
  • Exposure to MLOps pipelines and AI workload orchestration tools
  • Experience with GPU performance tuning and capacity planning
  • Knowledge of Zero Trust architecture and API security best practices
  • Cloud or Kubernetes certifications

Join Us!

If you’re passionate about simplifying work life and driving innovation in the hybrid cloud space, we want to hear from you! Send your profile to hello@animbus.ai and take the first step towards a rewarding career with Animbus. Can’t find a role that matches your skills and experience? Feel free to submit your resume for future consideration.

© Animbus. All Rights Reserved.