We are looking for an experienced and highly skilled Hadoop Data Engineer to join our dynamic team. The ideal candidate will have hands-on expertise in developing optimized data pipelines using Python, PySpark, Scala, Spark-SQL, Hive, and other big data technologies. You will be responsible for translating complex business and technical requirements into efficient data pipelines and ensuring high-quality code delivery through collaboration and code reviews.
Roles & Responsibilities:
Data Transformation & Pipeline Development:
- Design and implement optimized data pipelines using PySpark, Python, Scala, and Spark-SQL (see the sketch after this list).
- Build complex data transformation logic and ensure data ingestion from source systems to Data Lakes (Hive, HBase, Parquet).
- Produce unit tests for Spark transformations and helper methods.
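To illustrate the kind of pipeline work this covers, here is a minimal PySpark sketch: it reads a source table, applies a simple aggregation, and writes the result to a Parquet-backed Hive table. The table and column names (raw_db.transactions, curated_db.daily_totals, txn_ts, account_id, amount) are hypothetical placeholders, not part of this role's actual systems.

```python
# Minimal PySpark sketch: read a hypothetical source table, apply a simple
# transformation, and write the result to a partitioned Parquet-backed Hive table.
# Table and column names are illustrative placeholders only.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("daily-totals-pipeline")
    .enableHiveSupport()
    .getOrCreate()
)

# Ingest from the source system's landing table in the data lake.
txns = spark.table("raw_db.transactions")

# Example transformation logic: filter bad records and aggregate per day.
daily_totals = (
    txns.filter(F.col("amount").isNotNull())
        .withColumn("txn_date", F.to_date("txn_ts"))
        .groupBy("txn_date", "account_id")
        .agg(F.sum("amount").alias("total_amount"),
             F.count("*").alias("txn_count"))
)

# Persist to a curated Hive table stored as Parquet, partitioned by date.
(daily_totals.write
    .mode("overwrite")
    .format("parquet")
    .partitionBy("txn_date")
    .saveAsTable("curated_db.daily_totals"))
```

A unit test for a transformation like this would typically build a small in-memory DataFrame with a local SparkSession, run the transformation function, and assert on the collected rows.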
Collaboration & Communication:
- Work closely with Business Analysts to review test results and obtain sign-offs.
- Prepare comprehensive design and operational documentation for future reference.
Code Quality & Review:
- Conduct peer code reviews and act as a gatekeeper for quality checks.
- Ensure quality and efficiency in code delivery through pair programming and collaboration.
Production Deployment:
- Ensure smooth production deployments and perform post-deployment verification.
Technical Expertise:
- Provide hands-on coding and support in a highly collaborative environment.
- Contribute to development, automation, and continuous improvement practices.
System Knowledge:
- Strong understanding of data structures, data manipulation, distributed processing, and application development.
- Exposure to technologies like Kafka, Spark Streaming, and ML is a plus (see the streaming sketch after this list).
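As a rough illustration of the Kafka and Spark Streaming exposure mentioned above, the following sketch consumes a Kafka topic with Spark Structured Streaming and parses JSON payloads. The broker address, topic name, and event schema are assumptions for illustration only, and the job would need the spark-sql-kafka connector package available on its classpath.

```python
# Minimal sketch of consuming a Kafka topic with Spark Structured Streaming.
# Broker address, topic name, and schema are illustrative assumptions only.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

event_schema = StructType([
    StructField("account_id", StringType()),
    StructField("amount", DoubleType()),
])

# Read the raw Kafka stream; the value column arrives as bytes and is parsed from JSON.
events = (
    spark.readStream
         .format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")
         .option("subscribe", "transactions")
         .load()
         .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
         .select("e.*")
)

# Write the parsed stream to the console sink for a quick sanity check.
query = events.writeStream.format("console").outputMode("append").start()
query.awaitTermination()
```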
RDBMS & Database Management:
- Hands-on experience with RDBMS technologies (MariaDB, SQL Server, MySQL, Oracle).
- Knowledge of PL/SQL and stored procedures is an added advantage.
Other Responsibilities:
- Exposure to TWS jobs for scheduling.
- Knowledge and experience in the Hadoop tech stack, Cloudera Distribution, and CI/CD pipelines using Git and Jenkins.
- Experience with Agile methodologies and DevOps practices.
Technical Requirements:
Experience:
- 6-9.5 years of experience in Hadoop, Spark, PySpark, Scala, Hive, Spark-SQL, Python, Impala, CI/CD, and Git.
- Strong understanding of Data Warehousing Methodology and Change Data Capture (CDC); a brief CDC-processing sketch follows this list.
- In-depth knowledge of the Hadoop & Spark ecosystems, with hands-on experience in PySpark and Hadoop technologies.
- Proficiency in working with RDBMS such as MariaDB, SQL Server, MySQL, or Oracle.
- Experience with stored procedures and TWS job scheduling.
- Solid experience with Enterprise Data Architectures and Data Models.
- Background in Core Banking or Finance domains is preferred; experience in the AML (Anti-Money Laundering) domain is a plus.
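As a rough sketch of the CDC understanding called for above, the example below shows one common CDC-processing step in PySpark: reducing a change feed to the latest record per business key before applying it to a warehouse table. The table and column names (staging_db.customer_changes, customer_id, change_ts) are hypothetical.

```python
# Minimal sketch of a common CDC-processing step: keep only the latest change
# record per business key before applying it to a warehouse table.
# Table and column names are illustrative placeholders only.
from pyspark.sql import SparkSession, functions as F, Window

spark = (
    SparkSession.builder
    .appName("cdc-dedup-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

# CDC feed of inserts/updates captured from the source system.
changes = spark.table("staging_db.customer_changes")

# Rank change records per key by change timestamp and keep the most recent one.
latest = Window.partitionBy("customer_id").orderBy(F.col("change_ts").desc())
current_state = (
    changes.withColumn("rn", F.row_number().over(latest))
           .filter(F.col("rn") == 1)
           .drop("rn")
)

# Apply the resulting snapshot to the warehouse table (overwrite shown for
# simplicity; a production pipeline would typically merge/upsert instead).
current_state.write.mode("overwrite").saveAsTable("warehouse_db.customer_current")
```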
Skills & Qualifications:
- Strong hands-on coding skills in Python, PySpark, Scala, and Spark-SQL.
- Proficient in the Hadoop ecosystem (Hive, HBase, etc.).
- Knowledge of CI/CD, Agile, and DevOps methodologies.
- Good understanding of data integration, data pipelines, and distributed data systems.
- Experience with Oracle, PL/SQL, and working with large-scale databases.
- Strong analytical and problem-solving skills, with an ability to troubleshoot complex data issues.