AI-Driven Big Data Engineer (PhD Required)

Singapore, S00, SG, Singapore

Job Description

AI-Driven Big Data Engineer
===========================

Employment Type: Full-Time

Location: Remote, Singapore

Level: Entry to Mid Level (PhD Required)

Bridge Cutting-Edge AI Research with Petabyte-Scale Data Systems
----------------------------------------------------------------


Pixalate is an online trust and safety platform that protects businesses, consumers, and children from deceptive, fraudulent, and non-compliant mobile apps, CTV apps, and websites. We're seeking a PhD-level Big Data Engineer to revolutionize how AI transforms massive-scale data operations.


Our impact is real and measurable. Our software has uncovered:

  • Gizmodo: An iCloud Feature Is Enabling a $65 Million Scam
  • Washington Post: Your kids' apps are spying on them
  • ProPublica: Porn, Piracy, Fraud: What Lurks Inside Google's Black Box Ad Empire

About the Role
--------------


Work at the intersection of big data and AI, where you'll develop intelligent, self-healing data systems processing trillions of data points daily. You'll have autonomy to pursue research in distributed ML systems and AI-enhanced data optimization, with your innovations deployed at unprecedented scale within months, not years.


This isn't traditional data engineering - you'll implement agentic AI for autonomous pipeline management, leverage LLMs for data quality assurance, and create ML-optimized architectures that redefine what's possible at petabyte scale.

Key Research Areas & Responsibilities
-------------------------------------

AI-Enhanced Data Infrastructure



  • Design intelligent pipelines with autonomous optimization and self-healing capabilities using agentic AI
  • Implement ML-driven anomaly detection for terabyte-scale datasets

Distributed Machine Learning at Scale



  • Build distributed ML pipelines
  • Develop real-time feature stores for billions of transactions
  • Optimize feature engineering with AutoML and neural architecture search

Required Qualifications
-----------------------

Education & Research



  • PhD in Computer Science, Data Science, or Distributed Systems (exceptional Master's with research experience considered)
  • Published research or expertise in distributed computing, ML infrastructure, or stream processing

Technical Expertise



  • Core Languages: Expert SQL (window functions, CTEs), Python (Pandas, Polars, PyArrow), Scala/Java
  • Big Data Stack: Spark 3.5+, Flink, Kafka, Ray, Dask
  • Storage & Orchestration: Delta Lake, Iceberg, Airflow, Dagster, Temporal
  • Cloud Platforms: GCP (BigQuery, Dataflow, Vertex AI), AWS (EMR, SageMaker), Azure (Databricks)
  • ML Systems: MLflow, Kubeflow, Feature Stores, Vector Databases, scikit-learn + search CV, H2O AutoML, auto-sklearn, GCP Vertex AI AutoML Tables
  • Neural Architecture Search: KerasTuner, AutoKeras, Ray Tune, Optuna, PyTorch Lightning + Hydra

Research Skills



  • Track record with 100TB+ datasets
  • Experience with lakehouse architectures, streaming ML, and graph processing at scale
  • Understanding of distributed systems theory and ML algorithm implementation

Preferred Qualifications
------------------------

  • Experience applying LLMs to data engineering challenges
  • Ability to translate complex AutoML/NAS research into practical production workflows
  • Hands-on project examples of feature engineering automation or NAS experiments
  • Proven success in automating ML pipelines, from raw data to an optimized model architecture
  • Contributions to Apache projects (Spark, Flink, Kafka)
  • Knowledge of privacy-preserving techniques and data mesh architectures

What Makes This Role Unique
---------------------------


You'll work with one of the few truly petabyte-scale production datasets outside of major tech companies, with the freedom to experiment with cutting-edge approaches. Unlike traditional big data roles, you'll apply the latest AI research to fundamental data challenges - from using LLMs to understand data quality issues to implementing agentic systems that autonomously optimize and heal data pipelines.








Job Detail

  • Job Id
    JD1633988
  • Industry
    Not mentioned
  • Total Positions
    1
  • Job Type:
    Full Time
  • Salary:
    Not mentioned
  • Employment Status
    Permanent
  • Job Location
    Singapore, S00, SG, Singapore
  • Education
    Not mentioned