About Gradera

Gradera is an AI‑Native Services firm pioneering Software‑Orchestrated Services™, a new enterprise transformation model where software orchestrates human expertise, digital workers, and enterprise systems to deliver governed, scalable outcomes. We help enterprises move beyond fragmented AI pilots, disconnected automation, and labor‑led models by redesigning how work gets done across operations, product, engineering, customer experience, data, and core workflows.

Data Scientist

Location: Hyderabad, Telangana

Department: Engineering

Employment Type: Full-Time

Overview

We are seeking a highly analytical and curious Data Scientist to transform complex, real-world data into meaningful insights and scalable machine learning solutions. In this role, you will work across the full data lifecycle—partnering with data engineering and business teams to explore, clean, and understand diverse datasets, and translating those insights into models, experiments, and data-driven recommendations.

You will play a critical role in bridging raw data and business impact, developing a deep understanding of how data is generated, structured, and used. This includes conducting rigorous exploratory analysis, assessing data quality and lineage, and building robust analytical datasets that power advanced modeling and reporting.

This role offers the opportunity to work with large-scale data platforms, cloud infrastructure, and modern machine learning frameworks, while contributing to impactful decision-making through experimentation, analytics, and self-service data tools.

Role & Responsibilities

Collect, clean, and analyze large structured and unstructured datasets from multiple internal and external sources

Conduct thorough exploratory data analysis (EDA) to understand data distributions, relationships, outliers, and missing value patterns

Profile and audit datasets to assess data quality, completeness, consistency, and fitness for modeling

Investigate and document data lineage — understanding where data originates, how it flows, and how it transforms across systems

Identify and resolve data anomalies, inconsistencies, and integrity issues in collaboration with data engineering teams

Develop a deep understanding of the business domain and the underlying data that represents it — including what each field means, how it is captured, and what its limitations are

Translate raw, messy, real-world data into clean, well-understood analytical datasets ready for modeling and reporting

Apply statistical techniques such as correlation analysis, hypothesis testing, variance analysis, and distribution fitting to extract meaningful signals from noise

Build and deploy machine learning models including regression, classification, clustering, NLP, and time-series analysis

Design, evaluate, and analyze A/B experiments and controlled tests using causal inference techniques

Develop data-driven recommendations backed by rigorous statistical reasoning

Write clean, production-ready code in Python or R

Collaborate with data engineers to build reliable data pipelines and feature stores

Deploy and monitor ML models using MLOps best practices on cloud infrastructure

Build dashboards and self-serve analytics tools to support stakeholder decision-making

Data Understanding & Analysis Skills

Strong ability to interrogate unfamiliar datasets and quickly develop a working understanding of their structure, semantics, and quirks

Experience working with messy, incomplete, or poorly documented real-world data

Skilled in identifying hidden patterns, trends, seasonality, and anomalies through visual and statistical exploration

Ability to ask the right questions about data — challenging assumptions, validating sources, and understanding the context in which data was collected

Proficiency in data profiling, descriptive statistics, and summary reporting to communicate the shape and health of a dataset

Experience creating data dictionaries, documentation, and data quality reports to support team-wide data understanding

Comfort working across structured (relational tables), semi-structured (JSON, XML), and unstructured (text, logs, sensor streams) data formats

Technical Skills Required

Proficiency in Python (pandas, NumPy, scikit-learn, PyTorch or TensorFlow) and/or R

Strong SQL skills with hands-on experience in DB2 and SQL Server

Experience with Databricks for large-scale data processing, feature engineering, and model training

Familiarity with cloud platforms: Azure or AWS

Experience with data warehouses and big data platforms (Databricks, Snowflake, or Redshift)

Knowledge of MLOps tools such as MLflow, Kubeflow, or Airflow

Experience with streaming data technologies such as Kafka or Spark

Solid foundation in probability, statistics, linear algebra, and experimental design

Nice to Have

Experience with deep learning, NLP, computer vision, or Bayesian methods

Familiarity with real-time or streaming data pipelines

Open-source contributions or published research