We are seeking a Big Data Engineer with 5 years of hands-on experience in designing, developing, and optimizing big data pipelines and solutions. The ideal candidate will have strong expertise in SQL, Python, Apache Spark, Hive, and the Hadoop ecosystem, and will be responsible for building scalable data platforms to support business intelligence, analytics, and machine learning use cases.
Key Responsibilities
Design, develop, and maintain scalable ETL pipelines using Spark, Hive, and Hadoop (see the PySpark sketch after this list).
Write efficient SQL queries for data extraction, transformation, and analysis.
Develop automation scripts and data processing workflows using Python.
Optimize data pipelines for performance, reliability, and scalability.
Work with structured and unstructured data from multiple sources.
Ensure data quality, governance, and security throughout the data lifecycle.
Collaborate with cross-functional teams (Data Scientists, Analysts, and Business stakeholders) to deliver data-driven solutions.
Monitor and troubleshoot production data pipelines.
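
To make the ETL responsibilities above concrete, here is a minimal PySpark sketch: it reads raw JSON events, applies basic quality filters, and writes a partitioned Hive table. The HDFS path, column names, and the analytics.events_clean table are hypothetical placeholders, not specifics of this role.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Spark session with Hive support so the job can write managed Hive tables.
    spark = (
        SparkSession.builder
        .appName("daily_events_etl")  # hypothetical job name
        .enableHiveSupport()
        .getOrCreate()
    )

    # Extract: raw semi-structured events from HDFS (placeholder path).
    raw = spark.read.json("hdfs:///data/raw/events/")

    # Transform: drop incomplete records, derive a partition column, deduplicate.
    clean = (
        raw.filter(F.col("event_id").isNotNull())
           .withColumn("event_date", F.to_date("event_ts"))
           .dropDuplicates(["event_id"])
    )

    # Load: write the cleaned data as a partitioned Hive table (placeholder name).
    (clean.write
          .mode("overwrite")
          .partitionBy("event_date")
          .saveAsTable("analytics.events_clean"))

A production version of this job would typically also tune file formats (Parquet or ORC), partition sizing, and incremental loads rather than a full overwrite.
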
Required Skills & Qualifications
5+ years of experience in Data Engineering / Big Data development.
Strong expertise in SQL and Python for data manipulation, scripting, and automation.
Hands-on experience with Apache Spark (PySpark/Scala) for large-scale data processing.
Solid knowledge of Hive for querying and managing data in Hadoop environments.
Strong working knowledge of the Hadoop ecosystem (HDFS, YARN, MapReduce, etc.).
Experience with data pipeline orchestration tools (Airflow, Oozie, or similar) is a plus (a scheduling sketch follows this list).
Familiarity with cloud platforms (AWS, Azure, or GCP) is preferred.
Excellent problem-solving, debugging, and communication skills.
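
As a rough illustration of the orchestration experience listed above, the following Airflow sketch schedules a daily Spark job via spark-submit. The DAG id, schedule, and script path are illustrative assumptions, not details from this posting.

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    default_args = {
        "owner": "data-engineering",   # hypothetical owning team
        "retries": 2,                  # retry transient failures
        "retry_delay": timedelta(minutes=10),
    }

    with DAG(
        dag_id="daily_events_etl",     # hypothetical pipeline name
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
        default_args=default_args,
    ) as dag:
        # Single task: submit the PySpark ETL script (placeholder path) to YARN.
        run_spark_etl = BashOperator(
            task_id="run_spark_etl",
            bash_command="spark-submit --master yarn /opt/jobs/daily_events_etl.py",
        )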