Job Description
Design and build production data engineering solutions that deliver data pipeline patterns using the following Google Cloud Platform (GCP) services:
• In-depth understanding of Google's product technology and underlying architectures
• BigQuery – warehouse/data marts – thorough understanding of BigQuery internals to write efficient queries for ELT needs, create views/materialized views, create reusable stored procedures, etc.
• Dataflow (Apache Beam) – reusable Flex templates/data processing frameworks using Java for both batch and streaming needs.
• Experience designing, building, and deploying production-level data pipelines using Kafka; strong experience working with event-driven architecture
• Strong knowledge of the Kafka Connect framework, with experience using several connector types (HTTP/REST proxy, JMS, File, SFTP, JDBC, etc.)
• Experience handling large volumes of streaming messages from Kafka
• Cloud Composer (Apache Airflow) – to build, monitor, and orchestrate pipelines
• Knowledge of Bigtable
• Cloud SQL, Compute Engine, Cloud Function, Cloud Run and App Engine, Cloud Storage
• Experience with open-source distributed storage and processing utilities in the Apache Hadoop family.
• Extensive knowledge of processing various file formats (ORC, Avro, CSV, JSON, XML, etc.)
• Knowledge of or experience with ETL tools such as DataStage or Informatica – ability to understand existing on-premises ETL workflows and redesign them in GCP.
• Experience and expertise with Terraform to deploy GCP resources through CI/CD.
• Knowledge of or experience with connecting to on-premises APIs from Google Cloud.
Powered by JazzHR