Chaos Engineer

Fulcrum Digital

Pune

Not disclosed

Work from Office

Full Time

Min. 5 years

Job Details

Job Description

Sr Chaos Engineer

Who we are

Fulcrum Digital is an agile, next-generation digital acceleration company that provides digital transformation and technology services from ideation to implementation. These services apply across a variety of industries, including banking & financial services, insurance, retail, higher education, food, healthcare, and manufacturing.

Key Responsibilities:

Chaos Engineering:
· Design and implement chaos engineering experiments to identify weaknesses in systems and applications.
· Develop and execute strategies to improve system resilience and reliability.
· Analyze experiment results, provide actionable insights, and drive remediation efforts.
· Collaborate with development, operations, and infrastructure teams to integrate chaos engineering practices.

Operational Acceptance:
· Develop and maintain comprehensive operational acceptance criteria for new and existing systems.
· Conduct thorough operational acceptance testing, ensuring systems meet all predefined criteria before go-live.
· Work closely with project managers, developers, and QA teams to align operational acceptance processes with project timelines and objectives.
· Document and communicate operational readiness findings, providing recommendations for improvement.

System Resilience and Reliability:
· Implement and manage strategies for continuous improvement of system resilience and reliability.
· Monitor and assess system performance, identifying potential risks and areas for enhancement.
· Lead initiatives to improve disaster recovery and business continuity plans.
· Stay updated with the latest industry trends and best practices in chaos engineering and operational acceptance.

Collaboration and Training:
· Educate and mentor team members on chaos engineering and operational acceptance methodologies.
· Foster a culture of resilience and reliability within the organization.
· Engage with external communities, attending conferences and participating in knowledge-sharing events.

Requirements

· Extensive experience in chaos engineering, operational acceptance testing, and system resilience.
· Strong understanding of cloud platforms (AWS, Azure, GCP) and their resilience features.
· Proficiency in scripting and automation tools (Python, Bash, Terraform, etc.).
· Experience with monitoring and observability tools (Prometheus, Grafana, Splunk, etc.).
· Experience with chaos engineering tools such as Gremlin and Chaos Monkey.
· Excellent analytical and problem-solving skills.
· Strong communication and collaboration skills, with the ability to work effectively in cross-functional teams.
· Certifications in relevant fields (e.g., AWS Certified Solutions Architect, Azure DevOps Engineer) are a plus.

Experience Level

Senior Level

Job role

Work location

Pune, India

Department

IT & Information Security

Role / Category

IT Security

Employment type

Full Time

Shift

Day Shift

Job requirements

Experience

Min. 5 years

About company

Name

Fulcrum Digital
