Senior Site Reliability Engineer
The Wakapi team is growing! We’re on the lookout for a new Wakaper to join us as our next Senior Site Reliability Engineer. Think you’re the right fit? Let’s make it happen!
The Role:
We are looking for a highly skilled Senior Site Reliability Engineer to join our Platform Engineering team. The ideal candidate will have a strong understanding of DevOps and Service Level Management (SLM) metrics, as well as experience working on event-driven infrastructure projects using tools like Terraform, New Relic, Kubernetes, AWS, and Kafka.
As a representative of Platform Engineering, you will play a critical role working with other engineering teams to ensure our platform infrastructure tooling fulfils their needs and has a positive impact on Developer Experience. You will also help them determine the right settings and thresholds for triggering alerts or automations on their applications.
Responsibilities:
• Scalability and High Availability: Design, implement, and maintain scalable and
highly available systems using load balancing, auto-scaling patterns, canary
releases, and blue-green deployments.
• Monitoring, Logging, and Observability: Develop and maintain monitoring and
logging dashboards using tools like New Relic, Prometheus, Grafana, and Datadog.
Ensure observability through metrics, tracing, log aggregation, and alerting.
• Alerting and Automation: Help teams determine the right settings and thresholds
for triggering alerts or automations on their applications. Understand that each
application has different performance requirements, such as varying acceptable
response times or resource constraints.
• System Performance and Reliability: Monitor, optimize, and ensure system
reliability and performance using tools like New Relic to:
o Apply DORA metrics to measure and improve development and operational
performance.
o Ensure compliance with SLM metrics like SLAs, SLOs, and SLIs by tracking
uptime, response times, and resolution times.
• Resiliency: Implement and advocate for "Chaos" engineering practices to ensure
system resiliency.
• Collaboration: Work with cross-functional teams to enhance platform engineering
practices and gather the right information for metrics analysis.
Requirements:
• Proven experience working with Infrastructure-as-Code tooling, like Terraform,
for infrastructure management.
• Strong understanding of scalability and high availability patterns, including load
balancing, auto-scaling, canary releases, and blue-green deployments.
• Strong understanding of DevOps metrics (like DORA) and their application in
measuring and improving development and operational performance.
• Strong understanding of Service Level Management (SLM) metrics (like SLAs,
SLOs, and SLIs) and their importance in defining, monitoring, and ensuring
the compliance of the services bound to them.
• Experience with monitoring, logging, and observability tools like New Relic,
Prometheus, Grafana, and Datadog.
• Experience working with Kafka and improving the performance of event-driven,
real-time data processing and streaming projects and architectures.
• Familiarity with tooling used for SLM, DevOps, and DORA metrics, like Apache
DevLake, Grafana, and New Relic.
• Experience working with AWS, Azure or GCP for cloud infrastructure management.
• Experience working with CI/CD pipeline tools such as GitHub Actions, Jenkins,
GitLab CI, or similar.
• Analytical skills: ability to analyze and interpret metrics to drive improvements.
• Strong communication skills to effectively collaborate with team members and
stakeholders.
Nice-to-haves:
• Familiarity with Observability-as-Code tooling and practices.
• Familiarity with "Chaos" engineering practices for system resiliency.
C - LS K - 20032025
- Department: Development
- Locations: Mendoza, Other / Remote
- Remote status: Hybrid
The Wakapi Spirit
We offer excellent working conditions, an ever-evolving benefits package and a great working environment that will make you feel Wakapi is your perfect fit.
Enjoy our personalized career plans to achieve your professional goals and remember that we are committed to the growth of our talents through continuous training programs.
About Wakapi
We are passionate about what we do and we strive to be better with each day that goes by.
We want to contribute to the generation of a technological community powered by the conviction that, through small actions, we can help create a better world.