Principal Cloud Site Reliability Engineer

Oracle Financial Services Software Ltd

Hyderabad

Salary: Not disclosed

Work from Office

Full Time

Min. 6 years

Job Details

Job Description

Site Reliability Developer 4

Job Summary:

As a Principal Cloud Site Reliability Engineer (SRE), you will play a key role in ensuring the reliability, performance, and scalability of modern cloud-based data platforms. You will work closely with development, operations, and security teams to automate processes, monitor system health, and keep critical production workloads highly available. You will apply your technical expertise to design, automate, and maintain large-scale data pipelines and lakehouse infrastructure that support mission-critical data engineering and analytics initiatives.

Key Responsibilities:

  • Design, implement, and maintain scalable, secure cloud infrastructure for large data platforms (data lakes, data warehouses, and lakehouse solutions) on OCI, AWS, Azure, or GCP.
  • Collaborate with Data Engineering teams to build robust, automated ETL/ELT pipelines using tools such as Apache Spark, Databricks, Kafka, or Oracle Cloud Data Integration.
  • Implement site reliability engineering best practices tailored for data systems: SLO/SLI definition, error budgeting, automated monitoring, data integrity validation, and incident response for data workloads.
  • Design and optimize storage solutions for both structured and unstructured data, using object storage and lakehouse table formats such as Delta Lake and Apache Iceberg.
  • Automate infrastructure provisioning and CI/CD deployments for data pipelines and analytic workloads with tools like Terraform, Ansible, or CloudFormation.
  • Instrument and monitor data platform components for performance, availability, resource consumption, and data quality using observability tools (e.g., Grafana, Splunk).
  • Troubleshoot and resolve complex data pipeline or infrastructure issues, conducting root cause analyses and post-incident reviews.
  • Advocate for and implement security, governance, and compliance best practices, including data privacy, encryption, and access controls.
  • Mentor junior team members and promote knowledge sharing around data platform reliability.

Qualifications:

  • Bachelor’s or Master’s degree in Computer Science, Engineering, Data Science, or a related field, or equivalent experience.
  • 6 or more years of experience in cloud engineering, SRE, or DevOps roles, including at least 4 years supporting data engineering initiatives.
  • Practical experience designing and operating large-scale cloud-based data platforms (data lakes, warehouses, or lakehouses).
  • Strong hands-on skills with infrastructure-as-code (e.g., Terraform), automation (Python/Scala), and containerization (Kubernetes, Docker).
  • Familiarity with data processing frameworks (Apache Spark, Databricks, Hadoop), as well as orchestration tools (Airflow, Oozie, or similar).
  • Working knowledge of distributed storage, data formats (Parquet, Avro), and modern analytics platforms.
  • Solid understanding of networking, cloud security, and regulatory compliance for data platforms.
  • Strong analytical, troubleshooting, and communication skills.
  • Preferred certifications: Cloud Architect/Engineer (OCI, AWS, Azure, GCP), Databricks, or relevant data engineering credentials.

Job role

Work location

Hyderabad, Telangana, India

Department

Software Engineering

Role / Category

DevOps

Shift

Day Shift
