About Gradera

Gradera defines a new category of enterprise transformation called Software-Orchestrated Services™, in which software orchestrates human expertise, digital workers, and enterprise systems to deliver governed outcomes at scale. As an AI Native Services firm, we help enterprises redesign how work gets done across operations, product, engineering, customer experience, data, and enterprise workflows, moving them beyond fragmented AI pilots and disconnected automation toward measurable business outcomes.

Overview

We are seeking skilled Data Engineers to join our Data & Digital Twin Foundation team. You will design, build, and maintain data pipelines that power digital twin platforms, real-time operational systems, and AI/ML workloads. Working closely with data architects, simulation engineers, and ML teams, you will transform raw operational data into high-quality, governed datasets that drive intelligent decision-making.

Key Responsibilities

Design, develop, and maintain scalable data pipelines using Databricks, PySpark, and Delta Lake

Build high-performance real-time and batch data ingestion pipelines from diverse operational systems using Kafka

Implement data transformations that serve digital twin platforms and operational analytics

Integrate Kafka event streams with Databricks for real-time operational state updates

Implement data quality checks using Delta Live Tables expectations

Ensure data governance compliance through Unity Catalog (lineage, access control, metadata)

Optimize pipeline performance, reliability, and cost efficiency

Write clean, well-documented, and testable code following engineering best practices

Collaborate with ML engineers to deliver feature-engineered datasets

Participate in code reviews, knowledge sharing, and continuous improvement initiatives

Support production data systems through monitoring, troubleshooting, and incident resolution

Build business data warehouse solutions using Teradata for business intelligence

Our core data platform stack includes:

Data Platform & Lakehouse

Databricks as the single source of truth for all data

Real-time data pipelines built on Kafka for data ingestion

Databricks SQL for analytical queries

Unity Catalog for metadata management and governance

Teradata for data warehousing and business intelligence

Stream & Event Processing

Apache Kafka for real-time event ingestion

Structured Streaming for continuous data processing

Delta Live Tables for declarative, quality-enforced pipelines

Data Quality

Delta Live Tables expectations for data validation

Data profiling and anomaly detection