Location: Singapore
Full time | Non-Remote | No Visa Sponsorship
Who We Are Looking For:
---------------------------
We are seeking an experienced Data Engineer (Senior) to build and maintain data infrastructure that converts our research into scalable, production-ready solutions for synthetic tabular data generation. You will also architect and operate our large-scale data curation, scraping, and cleaning pipelines to deliver massive datasets for pretraining and finetuning large language models on tabular and unstructured domains.

This is an individual contributor (IC) role suited to someone who thrives in a fast-paced, early-stage start-up environment. The ideal candidate has experience scaling data and machine learning systems to handle datasets with billions of records and can build and optimize complex data pipelines for enterprise applications. You'll work closely with software, machine learning, and applied research teams to optimize performance and ensure seamless integration of systems, handling data from financial institutions, government agencies, consumer brands, and more.
Key Responsibilities:
-------------------------
# Data Infrastructure and Pipeline Development
Build data ingestion pipelines from enterprise relational databases (e.g. Oracle, SQL Server, PostgreSQL, MySQL, Databricks, Snowflake, BigQuery) and files (e.g. Parquet, CSV) for large-scale synthetic data pipelines.
Design scalable data pipelines for batch processing.
Architect and maintain data warehouses and data lakes (e.g. Delta Lake) optimized for synthetic data training and generation workflows.
Seamlessly transform Pandas-based research code into production-ready pipelines.
Build automated data quality monitoring and validation systems to ensure data integrity throughout the pipeline lifecycle.
Implement comprehensive data lineage tracking and audit capabilities for regulatory compliance and privacy validation.
Design robust error handling mechanisms, with automatic retries and data recovery in case of pipeline failures (see the sketch after this list).
Track performance metrics such as data throughput, latency, and processing times to ensure efficient pipeline operations at scale.
Implement monitoring and alerting (e.g. Prometheus, Grafana) for pipeline health, throughput, and data quality metrics.
Optimize resource allocation and cost efficiency for distributed processing at terabyte-to-petabyte scale.
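To give candidates a concrete flavour of the data-quality and retry work described in the list above, here is a minimal sketch of a validated ingestion step. It assumes Pandas and Pandera; the schema, column names, file path, and backoff policy are illustrative placeholders, not our production code.

```python
# Minimal sketch (illustrative only): validate an ingested batch with Pandera
# and retry transient I/O failures with exponential backoff.
import time

import pandas as pd
import pandera as pa

# Placeholder schema; real schemas are defined per customer dataset.
batch_schema = pa.DataFrameSchema({
    "customer_id": pa.Column(str, nullable=False),
    "amount": pa.Column(float, checks=pa.Check.ge(0)),
})

def ingest_with_retries(path: str, max_attempts: int = 3) -> pd.DataFrame:
    """Load one Parquet batch and validate it, retrying transient failures."""
    for attempt in range(1, max_attempts + 1):
        try:
            df = pd.read_parquet(path)
            return batch_schema.validate(df)  # raises pandera.errors.SchemaError on bad data
        except (IOError, OSError):
            if attempt == max_attempts:
                raise                          # surface the failure to monitoring/alerting
            time.sleep(2 ** attempt)           # simple exponential backoff
```

In practice, retries and alerting would usually be delegated to the orchestrator, and validation results would feed the throughput, latency, and data quality metrics mentioned above.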
# Massive-Scale Data Collection & Ingestion
Design and build distributed web scraping clusters to extract data from millions of pages (see the sketch after this list).
Build LLM-aided data filtering systems that use automated model scoring to evaluate and prioritize high-quality content.
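As a rough illustration of the scraping work referenced in the first item of this list, the sketch below shows a single, minimal Scrapy spider; the seed URL and CSS selectors are hypothetical, and a real cluster would add distributed scheduling, deduplication, and politeness controls.

```python
# Minimal sketch (illustrative only): a single-node Scrapy spider with placeholder
# URL and selectors; the distributed, cluster-scale version builds on this pattern.
import scrapy

class TablePageSpider(scrapy.Spider):
    name = "table_pages"
    start_urls = ["https://example.com/datasets"]  # placeholder seed URL

    def parse(self, response):
        # Emit one record per table row found on the page.
        for row in response.css("table tr"):
            yield {"cells": row.css("td::text").getall()}

        # Follow pagination links so the crawl can fan out across many pages.
        for href in response.css("a.next::attr(href)").getall():
            yield response.follow(href, callback=self.parse)
```

Such a spider can be run locally with `scrapy runspider spider.py`; scoring and filtering of the scraped content with LLM-based quality signals happens downstream.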
# Understanding of ML Concepts and Algorithms
A fair understanding of machine learning concepts, training workflows, and algorithms, including familiarity with tools like PyTorch and Hugging Face.
# Documentation & Reporting
Create clear documentation of data pipelines, workflows, and system architectures to enable smooth handovers and collaboration across teams.
Qualifications
-----------------------
Bachelor's degree in Computer Science, Software Engineering, Data Engineering, or a related field, with a strong foundation in distributed systems and data processing.
Expert proficiency in scaling data pipelines and machine learning systems to handle billions of rows in enterprise environments.
3+ years of experience building scalable data solutions with Python and its ecosystem of libraries, such as:
Data science libraries: Pandas, NumPy, Scikit-learn
Deep learning libraries: PyTorch
Scaling libraries: Spark, Dask, etc.
Orchestration tools: Airflow, Dagster, etc.
Data validation: Pandera, Pydantic, etc.
Expertise in automated data quality frameworks, including rule-based and AI-based automation for format validation, anomaly detection, and statistical validation.
Hands-on experience with web scraping tools (Scrapy, Selenium, Puppeteer).
Experience building ML data pipelines and supporting infrastructure for training and deploying machine learning models at scale (a minimal orchestration sketch follows this list).
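To make the orchestration and ML-pipeline expectations above more concrete, here is a minimal sketch of the kind of DAG this role would own, written against the Airflow 2.x TaskFlow API; the task names, schedule, and paths are illustrative assumptions only.

```python
# Minimal sketch (illustrative only): an Airflow 2.x TaskFlow DAG wiring
# extraction, validation, and loading with per-task retries.
from datetime import datetime

from airflow.decorators import dag, task

@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def synthetic_tabular_ingest():
    @task(retries=3)
    def extract() -> str:
        # Placeholder: pull a batch from a source database and stage it as Parquet.
        return "/tmp/batch.parquet"

    @task
    def validate(path: str) -> str:
        # Placeholder: run schema and statistical checks (e.g. with Pandera) on the batch.
        return path

    @task
    def load(path: str) -> None:
        # Placeholder: write the validated batch into the data lake (e.g. Delta Lake).
        pass

    load(validate(extract()))

synthetic_tabular_ingest()
```

The same structure maps onto Dagster or other orchestrators; the point is dependency-aware scheduling, retries, and observability around each step.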
# Good to Have
Experience with data governance frameworks and compliance requirements (GDPR, CCPA, PDPA) in data processing systems.
Experience with containerization and orchestration using Docker, Kubernetes, and cloud-native deployment strategies.
Strong knowledge of cloud platforms (AWS, GCP, Azure) and their data services (S3, BigQuery, Data Lake Storage, etc.).
Why Join Us:
----------------
This is a unique opportunity for someone looking to actively build and scale systems in a fast-moving start-up. If you've successfully scaled machine learning and data systems to billions of rows and thrive in a dynamic, hands-on environment, this role is for you.
Benefits:
-------------
Flexible time-off arrangements
Flexible work arrangements - work from the office at One North or from home on some days
Equity eligibility: Competitive equity packages, with grant size evaluated based on the candidate's experience, skills, and impact.
How to apply:
-----------------
Does this role sound like a good fit for you?
We see this first: Submit your application
We see this last: If the above does not work, you may email us your CV (PDF format) at jobs@betterdata.ai.
Include the title of the role in your subject line
Indicate your available start - end dates (DDMMYY - DDMMYY)
Send along links or supporting information that best showcase the relevant things you have built and done