About Gradera
Gradera is an AI-Native Services firm pioneering Software-Orchestrated Services™, a new enterprise transformation model in which software orchestrates human expertise, digital workers, and enterprise systems to deliver governed, scalable outcomes. We help enterprises move beyond fragmented AI pilots, disconnected automation, and labor-led models by redesigning how work gets done across operations, product, engineering, customer experience, data, and core workflows.
Data Scientist
Location: Hyderabad, Telangana
Department: Engineering
Employment Type: Full-Time
Overview
We are seeking a highly analytical and curious Data Scientist to transform complex, real-world data into meaningful insights and scalable machine learning solutions. In this role, you will work across the full data lifecycle: partnering with data engineering and business teams to explore, clean, and understand diverse datasets, then translating what you learn into models, experiments, and data-driven recommendations.
You will play a critical role in bridging raw data and business impact, developing a deep understanding of how data is generated, structured, and used. This includes conducting rigorous exploratory analysis, assessing data quality and lineage, and building robust analytical datasets that power advanced modeling and reporting.
This role offers the opportunity to work with large-scale data platforms, cloud infrastructure, and modern machine learning frameworks, while contributing to impactful decision-making through experimentation, analytics, and self-service data tools.
Role & Responsibilities
Collect, clean, and analyze large structured and unstructured datasets from multiple internal and external sources
Conduct thorough exploratory data analysis (EDA) to understand data distributions, relationships, outliers, and missing value patterns
Profile and audit datasets to assess data quality, completeness, consistency, and fitness for modeling
Investigate and document data lineage: where data originates, how it flows, and how it is transformed across systems
Identify and resolve data anomalies, inconsistencies, and integrity issues in collaboration with data engineering teams
Develop a deep understanding of the business domain and the underlying data that represents it, including what each field means, how it is captured, and what its limitations are
Translate raw, messy, real-world data into clean, well-understood analytical datasets ready for modeling and reporting
Apply statistical techniques such as correlation analysis, hypothesis testing, variance analysis, and distribution fitting to extract meaningful signals from noise
Build and deploy machine learning models for regression, classification, clustering, NLP, and time-series problems
Design, evaluate, and analyze A/B experiments and controlled tests using causal inference techniques
Develop data-driven recommendations backed by rigorous statistical reasoning
Write clean, production-ready code in Python or R
Collaborate with data engineers to build reliable data pipelines and feature stores
Deploy and monitor ML models using MLOps best practices on cloud infrastructure
Build dashboards and self-serve analytics tools to support stakeholder decision-making
Data Understanding & Analysis Skills
Strong ability to interrogate unfamiliar datasets and quickly develop a working understanding of their structure, semantics, and quirks
Experience working with messy, incomplete, or poorly documented real-world data
Skilled in identifying hidden patterns, trends, seasonality, and anomalies through visual and statistical exploration
Ability to ask the right questions about data — challenging assumptions, validating sources, and understanding the context in which data was collected
Proficiency in data profiling, descriptive statistics, and summary reporting to communicate the shape and health of a dataset
Experience creating data dictionaries, documentation, and data quality reports to support team-wide data understanding
Comfort working across structured (relational tables), semi-structured (JSON, XML), and unstructured (text, logs, sensor streams) data formats
Technical Skills Required
Proficiency in Python (pandas, NumPy, scikit-learn, PyTorch or TensorFlow) and/or R
Strong SQL skills with hands-on experience in DB2 and SQL Server
Experience with Databricks for large-scale data processing, feature engineering, and model training
Familiarity with cloud platforms: Azure or AWS
Experience with data warehouses and big data platforms (Databricks, Snowflake, or Redshift)
Knowledge of MLOps tools such as MLflow, Kubeflow, or Airflow
Experience with streaming data technologies such as Kafka or Spark Structured Streaming
Solid foundation in probability, statistics, linear algebra, and experimental design
Nice to Have
Experience with deep learning, NLP, computer vision, or Bayesian methods
Familiarity with real-time or streaming data pipelines
Open-source contributions or published research