Site Reliability Engineer
Akka’s platform for agentic AI systems simplifies building new classes of AI-enabled applications, drawing on our experience of building distributed systems.
Job Description Summary
We are seeking a hands-on Site Reliability Engineer to join our team. You will help architect, implement, and maintain our platform and our pipelines. You’ll partner closely with development, operations, and product teams to maintain and expand our PaaS offering.
Responsibilities
- Develop and extend software to monitor and improve end-to-end platform performance, identify runtime deficiencies, find potential failures, and fix production issues in a fully managed multi-cloud environment.
- Participate in the on-call rotation and incident resolution.
- Build deep, full-stack knowledge of our platforms and applications.
- Simplify and automate deployment processes and runtime operations, and provide non-disruptive releases.
- Help create and maintain an environment that provides security and privacy for our customers' data.
- Maintain application reliability and uptime SLAs throughout the application lifecycle using programmatic self-healing and software automation.
- Travel occasionally to meet with the rest of Akka's technical team.
- Create and implement security policies as code to automate and enforce security controls.
- Create comprehensive documentation for all configurations, processes, and procedures. Provide training and knowledge sharing with other team members.
Qualifications
You
- Are an SRE who understands how to operate modern distributed data systems on Kubernetes so that they are extremely reliable and perform predictably.
- Have strong analytical and problem-solving skills.
- Combine a high degree of autonomy with strong teamwork skills to function in a distributed team environment.
- Have experience with (multiple) cloud service offerings, specifically from an operational perspective (we operate on GCP, AWS, and Azure today).
- Have a passion for automating the complexities of orchestrating and running multi-tenant cloud application services.
- Are accustomed to collaborating with business owners and understanding diverse business requirements.
- Have five or more years of experience in distributed systems architecture and runtime requirements.
- Are a voracious learner, ready to take on new technologies and techniques quickly and constantly.
- Are skillful at interacting and working with people, including within a self-organized, lean, and agile team, to mitigate project risks, manage effort, and ensure quality.
- Are dedicated to best practices such as infrastructure as code, automated testing, code reviews, CI/CD, and GitOps.
- Are biased towards action on tough problems and issues, and focused on your customers' success.
- Are an agent of change, constantly learning and seeking better outcomes.
- Are familiar with many of the supporting technologies we use, including Terraform, Crossplane, FluxCD, GitOps, Helm, Prometheus, Grafana, Actors, Service Mesh frameworks, etc.
- Are experienced with complex and secure networking environments, including encryption keys and TLS.
Ideally, you also...
- Have knowledge of the Akka libraries for distributed systems, including Akka clustering.
- Have supported SaaS/PaaS systems.
- Have excellent written and verbal communication skills in English, at a minimum.
- Have had at least some exposure to policy-as-code and/or admission controllers.
Frequently cited statistics show that women and underrepresented groups apply to jobs only if they meet 100% of the qualifications. Akka encourages you to break that statistic and to apply. No one ever meets 100% of the qualifications. We look forward to your application.
Location
This is a remote position, and the candidate can be located anywhere in the world. We would like some overlap with North American working hours to allow proper cooperation with our current team.
What We Offer
Akka is a welcoming, transparent, and highly distributed company dedicated to creating high-performance distributed systems that bring success to all who use them. With a strong focus on work-life balance, our company offers a fast-paced, collaborative environment mixed with challenging and engaging work. This combination has attracted and retained some of the brightest minds in our technology communities.
Benefits:
- Competitive salary with performance-based incentives.
- Remote-first, flexible work environment.
- Comprehensive health and wellness benefits.
- Opportunities for professional development and continuous learning.
- Collaborative, inclusive, and innovative company culture.
Our Core Values:
- We’re Authentic: We value transparency and genuine communication, without politics or games. We're honest and assume good intentions, cultivating trust and accountability within our organization and in our interactions with others outside of Akka.
- We’re Customer-Focused: We value customer outcomes above all else. By prioritizing our customers' interests, and meeting them where they are today, we help ensure their success. We are dedicated to deeply understanding our customers' needs, anticipating challenges, navigating time constraints, and striving to exceed expectations.
- We’re Nonconventional: We value fearless innovation by challenging the status quo and embracing alternative approaches. Continuous learning and a growth mindset aimed at improving ourselves, our company, and our products drive us to push boundaries and explore new solutions. Guided by a bias for action, we leverage industry and customer insights to inspire fresh ideas, enabling optimal future offerings.
- We’re Persistent: We value excellence through continuous experimentation and courageous problem-solving. We recognize that achieving success often demands approaching challenges with tenacity and taking calculated risks to achieve leading-edge solutions.
Akka is an Equal Opportunity Employer.