Lead Software Engineer
Wells Fargo India Solutions Pvt Ltd
Job Description
About this role:
Wells Fargo is seeking a Lead Software Engineer.
In this role, you will:
- Lead complex technology initiatives including those that are companywide with broad impact
- Act as a key participant in developing standards and companywide best practices for engineering complex and large-scale technology solutions for technology engineering disciplines
- Design, code, test, debug, and document for projects and programs
- Review and analyze complex, large-scale technology solutions for tactical and strategic business objectives, enterprise technological environment, and technical challenges that require in-depth evaluation of multiple factors, including intangibles or unprecedented technical factors
- Make decisions in developing standards and companywide best practices for engineering and technology solutions, which requires an understanding of industry best practices and new technologies, and influence and lead the technology team to meet deliverables and drive new initiatives
- Collaborate and consult with key technical experts, senior technology team, and external industry groups to resolve complex technical issues and achieve goals
- Lead projects, teams, or serve as a peer mentor
Required Qualifications:
- 5+ years of Software Engineering experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
Desired Qualifications:
- Experience in Software Engineering, SRE, DevOps, or Platform Engineering.
- Strong proficiency in Python for automation and tooling.
- Hands‑on experience with Grafana, Prometheus, and Splunk in production environments.
- Solid understanding of SLIs, SLOs, dashboards, alerting, and observability best practices.
- Experience applying AI/ML concepts to monitoring, alerting, or operational analytics.
- Strong knowledge of Linux, networking, and distributed systems.
- Experience with Cloud platforms and Kubernetes/OpenShift.
- Proven experience leading incidents, RCAs, and reliability initiatives.
- Experience building custom Prometheus exporters or advanced Grafana dashboards.
- Strong Splunk expertise (search, dashboards, alerts, log pipelines).
- Experience operationalizing ML models for observability (AIOps).
- Familiarity with CI/CD, Terraform, Ansible, and enterprise automation platforms.
- Experience supporting large‑scale, regulated, or globally distributed systems.
- Track record of improving reliability and performance against defined SLOs.
- Track record of reducing alert noise and speeding up detection and recovery of incidents.
- Track record of increasing automation and self‑healing adoption using Python and observability signals.
- Track record of raising observability maturity across platforms and applications.
- Track record of improving MTTD and MTTR through effective use of Grafana, Prometheus, and Splunk.
Job Expectations:
Reliability & Availability Engineering
- Own and improve availability, performance, scalability, and resilience of production systems.
- Define, monitor, and manage SLIs/SLOs and error budgets to guide reliability investments.
- Lead capacity planning, performance testing, failover readiness, and disaster‑recovery design.
Observability & Monitoring (Grafana / Prometheus / Splunk)
- Design and operate a comprehensive observability stack using:
  - Prometheus for metrics collection and alerting
  - Grafana for dashboards, visualization, and SLO tracking
  - Splunk for log aggregation, troubleshooting, and incident forensics
- Build and maintain golden dashboards and actionable alerts aligned to business impact.
- Reduce alert fatigue through signal‑based monitoring and correlation of metrics, logs, and traces.
- Partner with application teams to define instrumentation standards for metrics and logging.
- Use observability data to improve MTTD, MTTR, and reliability outcomes.
Automation & Python Engineering
- Develop Python‑based automation for monitoring, alert remediation, deployments, scaling, and recovery.
- Build self‑healing workflows integrated with Prometheus alerts and Splunk signals.
- Create reusable automation frameworks and internal SRE tooling.
- Embed automation into CI/CD pipelines to improve deployment safety and reliability.
AI/ML‑Driven Reliability (AIOps)
- Apply AI/ML techniques to observability and operations use cases, including:
  - Anomaly detection on Prometheus metrics
  - Log pattern analysis and correlation in Splunk
  - Predictive capacity and trend forecasting
  - Noise reduction and intelligent alerting
- Partner with data and platform teams to operationalize ML models in production.
- Evaluate and integrate AIOps capabilities into the observability ecosystem.
Incident Management & RCA
- Serve as incident commander and senior escalation point for P1/P2 incidents.
- Lead blameless post‑incident reviews (PIRs) backed by Grafana metrics and Splunk evidence.
- Drive corrective and preventive actions to completion.
Platform & Application Partnership
- Collaborate with platform, application, cloud, and SRE teams to embed reliability and observability by design.
- Influence architectural decisions to ensure systems are observable, scalable, and operable.
- Provide SRE guidance during major releases, migrations, and modernization initiatives.
Security, Risk & Compliance
- Ensure observability and automation comply with enterprise security and audit requirements.
- Support resilience validation, failover drills, and business continuity testing.
Technical Leadership
- Mentor and guide SRE and software engineers.
- Define standards for observability, automation, reliability, and incident response.
- Act as the technical authority for complex production and platform issues.
Posting End Date:
21 May 2026
*Job posting may come down early due to volume of applicants.
We Value Equal Opportunity
Wells Fargo is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other legally protected characteristic.
Employees support our focus on building strong customer relationships balanced with a strong risk mitigating and compliance-driven culture which firmly establishes those disciplines as critical to the success of our customers and company. They are accountable for execution of all applicable risk programs (Credit, Market, Financial Crimes, Operational, Regulatory Compliance), which includes effectively following and adhering to applicable Wells Fargo policies and procedures, appropriately fulfilling risk and compliance obligations, timely and effective escalation and remediation of issues, and making sound risk decisions. There is emphasis on proactive monitoring, governance, risk identification and escalation, as well as making sound risk decisions commensurate with the business unit’s risk appetite and all risk and compliance program requirements.
Candidates applying to job openings posted in Canada: Applications for employment are encouraged from all qualified candidates, including women, persons with disabilities, aboriginal peoples and visible minorities. Accommodation for applicants with disabilities is available upon request in connection with the recruitment process.
Applicants with Disabilities
To request a medical accommodation during the application or interview process, visit Disability Inclusion at Wells Fargo.
Drug and Alcohol Policy
Wells Fargo maintains a drug free workplace. Please see our Drug and Alcohol Policy to learn more.
Wells Fargo Recruitment and Hiring Requirements:
a. Third-Party recordings are prohibited unless authorized by Wells Fargo.
b. Wells Fargo requires you to directly represent your own experiences during the recruiting and hiring process.