Provide technical vision and create roadmaps aligned with the long-term technology strategy
Build data ingestion pipelines that ingest data from heterogeneous sources such as RDBMS, Hadoop, flat files, REST APIs, and AWS S3
Serve as a key member of the Hadoop Data Ingestion team, which enables the data science community to develop analytical/predictive models; implement ingestion pipelines using Hadoop ecosystem tools such as Flume, Sqoop, Hive, HDFS, PySpark, Trino, and Presto SQL
Work on governance aspects of Data Analytics applications, such as documentation, design reviews, and metadata management
Extensively use DataStage jobs and Teradata/Oracle utility scripts to perform data transformation and loading across multiple FSLDM layers
Review and help streamline the design for big data applications and ensure that the right tools are used for the relevant use cases
Engage users to achieve concurrence on the proposed technology solution. Review solution documents with Functional Business Analysts and the Business Unit for sign-off
Create technical documents (functional/non-functional specifications, design specifications, training manuals) for the solutions. Review interface design specifications created by the development team
Participate in product/tool selection via RFPs and POCs
Provide inputs to help with the detailed estimation of projects and change requests
Execute continuous service improvement and process improvement plans
Requirements
Bachelor's degree in computer science or a similar relevant field
3-7 years of data engineering experience in the banking domain, including implementation of data lakes, data warehouses, data marts, and lakehouses
Experience in data modeling for a bank's large-scale data warehouses and business marts on Hadoop-based databases, Teradata, Oracle, etc.
Expertise in the big data ecosystem, e.g. Cloudera (Hive, Impala, HBase, Ozone, Iceberg), Spark, Presto, and Kafka
Experience with a metadata management tool such as IDMC, Axon, Watson Knowledge Catalog, or Collibra
Expertise in designing frameworks in Java, Scala, and Python, and in building applications and utilities with these languages
Expertise in operationalizing machine learning models, including optimizing feature pipelines, batch/API deployment, model monitoring, and implementing feedback loops
Knowledge of building reports/dashboards with a reporting tool such as Qlik Sense or Power BI
Expertise in integrating applications with DevOps tools
Knowledge of building applications on MPP appliances such as Teradata, Greenplum, or Netezza is mandatory
Domain knowledge of the banking industry, including subject areas such as Customer, Products, CASA, Cards, Loans, Trade, Treasury, General Ledger, Origination, Channels, Limits, Collaterals, and Campaigns