AI Cloud Solution Architect & Engineer

SG, Singapore

Job Description

About the project
-----------------

Join Neurons Lab as an AI Cloud Solution Architect & Engineer: a unique hybrid role combining strategic solution design with hands-on engineering execution. You'll bridge the gap between client requirements and technical implementation, designing AI/ML architectures and then building them yourself using modern cloud infrastructure practices.

Our Focus: We specialize in serving Banking, Financial Services, and Insurance (BFSI) enterprise customers with stringent compliance, security, and regulatory requirements. You'll work on mission-critical AI/ML systems where security architecture, data governance, and regulatory compliance are paramount.


This role is perfect for technical professionals who love both the "what" and the "how": architecting elegant solutions AND rolling up their sleeves to code, deploy, and optimize them. You'll work across multiple AI consulting engagements, from Generative AI workshops to enterprise ML platform development, all while maintaining the highest standards of security and compliance required by financial institutions.

Duration: Part-time, long-term engagement with project-based allocations

Reporting: Reports directly to the Head of Cloud

Objective
=========

Deliver end-to-end AI cloud solutions by combining architectural excellence with hands-on engineering capabilities:

* Architecture & Design: Gather requirements, design cloud architectures, calculate ROI, and create technical proposals for AI/ML solutions
* Engineering Excellence: Build production-grade infrastructure using IaC, develop APIs and prototypes, implement CI/CD pipelines, and manage AI workload operations
* Client Success: Transform business requirements into working solutions that are secure, scalable, cost-effective, and aligned with AWS best practices
* Knowledge Transfer: Create reusable artifacts, comprehensive documentation, and architectural patterns that accelerate future project delivery

KPI
===

Architecture & Pre-Sales:

* Design and document 3+ solution architectures per month with comprehensive diagrams and specifications
* Achieve an 80%+ client acceptance rate on proposed architectures and estimates
* Deliver ROI calculations and cost models within 2 business days of a request

Engineering Delivery:

* Deploy infrastructure through IaC (AWS CDK/Terraform) with zero manual configuration
* Create at least 3 reusable IaC components or architectural patterns per quarter
* Implement CI/CD pipelines for all projects with automated testing and deployment
* Maintain 95%+ uptime for production AI/ML inference endpoints
* Document architecture and implementation details weekly for knowledge sharing

Quality & Best Practices:

* Ensure all solutions pass an AWS Well-Architected Review
* Deliver comprehensive documentation within 1 week of architecture completion
* Create simplified UIs/demos for PoC validation and client presentations

Areas of Responsibility
=======================

Solution Architecture (40%)
---------------------------

Requirements & Design:

* Elicit and document business and technical requirements from clients
* Design end-to-end cloud architectures for AI/ML solutions (training, inference, data pipelines)
* Create architecture diagrams, technical specifications, and implementation roadmaps
* Evaluate technology options and recommend optimal AWS services for specific use cases

Business Analysis:

* Calculate ROI and TCO and prepare cost-benefit analyses for proposed solutions
* Estimate project scope, timelines, team composition, and resource requirements
* Participate in presales activities: technical presentations, demos, and proposal support
* Collaborate with the sales team on SOW creation and customer workshops

Strategic Planning:

* Design for scalability, security, compliance, and cost optimization from day one
* Create reusable architectural patterns and reference architectures
* Stay current with AWS AI/ML services and emerging cloud technologies

Cloud Engineering & AI Infrastructure (60%)
-------------------------------------------

Infrastructure as Code Development:

* Build and maintain cloud infrastructure using AWS CDK (primary) and Terraform
* Develop reusable IaC components and modules for common patterns
* Implement infrastructure for AI/ML workloads: GPU clusters, model serving, data lakes
* Manage compute resources: EC2, ECS, EKS, Lambda, SageMaker compute instances

Application Development:

* Develop Python applications: FastAPI backends, data processing scripts, automation tools
* Create prototype interfaces using Streamlit, React, or similar frameworks
* Build and integrate RESTful APIs for AI model serving and data access
* Implement authentication, authorization, and API security best practices

AI/ML Operations (MLOps):

* Deploy and manage AI/ML model serving infrastructure (SageMaker endpoints, containerized models)
* Build ML pipelines: data ingestion, preprocessing, training automation, model deployment
* Implement model versioning, experiment tracking, and A/B testing frameworks
* Manage GPU resource allocation, training job scheduling, and compute optimization
* Monitor model performance, inference latency, and system health metrics

DevOps & Automation:

* Design and implement CI/CD pipelines using GitHub Actions, GitLab CI, or AWS CodePipeline
* Automate deployment processes with infrastructure testing and validation
* Implement monitoring, logging, and alerting using CloudWatch, Prometheus, and Grafana
* Manage containerization with Docker and orchestration with Kubernetes/ECS

Data Engineering:

* Build data pipelines for AI training and inference using AWS Glue, Step Functions, and Lambda
* Design and implement data lakes using S3, Lake Formation, and data cataloging
* Implement automated and scheduled data synchronization processes
* Optimize data storage and retrieval for ML workloads

Security & Compliance:

* Implement cloud security best practices: IAM, VPC design, encryption, secrets management
* Build enterprise security and compliance strategies for AI/ML workloads
* Ensure solutions meet regulatory requirements where applicable (PCI DSS, GDPR, SOC 2, MAS TRM, etc.)
* Conduct security reviews and implement remediation strategies

Cost & Performance Optimization:

* Optimize cloud spend for compute-intensive AI workloads
* Implement spot instance strategies, auto-scaling, and resource scheduling
* Monitor and optimize GPU utilization, inference latency, and throughput
* Perform cost analysis and implement cost-saving measures

Operations & Support:

* Implement disaster recovery procedures for AI models and training data
* Manage backup strategies and business continuity planning
* Troubleshoot and resolve production issues in AI infrastructure
* Provide technical guidance to project teams during implementation

Skills
======

Cloud Architecture & Design:

* Strong solution architecture skills, with the ability to translate business requirements into technical designs
* Experience with AWS Well-Architected reviews and remediation
* Deep understanding of AWS services, particularly compute, storage, networking, and AI/ML services
* Experience designing scalable, highly available, and fault-tolerant systems
* Ability to create clear architecture diagrams and technical documentation
* Cost modeling and ROI calculation capabilities

Technical Leadership:

* Comfortable leading technical discussions with clients and stakeholders
* Ability to guide engineers and share knowledge effectively
* Strong problem-solving and analytical thinking skills
* Experience with architectural decision-making and trade-off analysis

Programming & Development:

* Advanced Python programming: object-oriented design, async programming, testing
* API development with FastAPI, Flask, or similar frameworks
* Frontend development basics: React, etc. (for prototypes and demos built with AI code generation tools)
* Shell scripting for automation and deployment
* Git version control and collaborative development workflows

Infrastructure as Code:

* AWS CDK (required); CloudFormation experience is valuable
* Terraform (highly preferred) for multi-cloud or hybrid scenarios
* Understanding of IaC best practices: modularity, reusability, testing
* Experience with infrastructure testing and validation frameworks

AI/ML Infrastructure:

* Hands-on experience with AWS SageMaker: training jobs, endpoints, pipelines, notebooks
* Understanding of the ML lifecycle: data preparation, training, deployment, monitoring
* Experience with GPU management and optimization for training/inference
* Knowledge of containerization for ML models (Docker, container registries)
* Familiarity with ML and LLM frameworks: PyTorch, TensorFlow, LangChain, LlamaIndex, etc.

DevOps & Automation:

* CI/CD pipeline design and implementation (GitHub Actions, GitLab CI, AWS CodePipeline)
* Containerization and orchestration: Docker, Kubernetes, Amazon ECS
* Configuration management and deployment automation
* Monitoring and observability: CloudWatch, Prometheus, Grafana, ELK stack

Communication & Collaboration:

* Excellent written and verbal communication in English (advanced level)
* Ability to explain complex technical concepts to non-technical stakeholders
* Comfortable with client-facing presentations and technical demos
* Strong documentation skills with attention to detail
* Collaborative mindset with the ability to work across functional teams

Problem-Solving:

* Advanced task breakdown and estimation abilities
* Debugging and troubleshooting complex distributed systems
* Performance optimization and tuning
* Incident response and root cause analysis

Knowledge
---------

AWS Cloud Platform (Required):

* AWS Certified Solutions Architect - Associate (minimum requirement)
* AWS Certified Solutions Architect - Professional or AWS Certified Machine Learning - Specialty (highly preferred)
* Deep knowledge of core AWS services:
  + Compute: EC2, Lambda, ECS, EKS, SageMaker
  + Storage: S3, EFS, EBS, FSx
  + Networking: VPC, Route 53, CloudFront, API Gateway, Load Balancers
  + AI/ML: SageMaker, Bedrock, Rekognition, Textract, Comprehend, Lex, Polly
  + Data: RDS, DynamoDB, Redshift, Glue, Athena, Kinesis
  + Security: IAM, KMS, Secrets Manager, Security Hub, GuardDuty
  + DevOps: GitHub Actions, CodePipeline, CodeBuild, CodeDeploy, CloudFormation, CDK, Terraform

AI/ML Technologies:

* Understanding of machine learning concepts and the model training/deployment lifecycle
* Familiarity with Generative AI technologies: LLMs, RAG, vector databases, prompt engineering
* Knowledge of ML frameworks and libraries: PyTorch, TensorFlow, scikit-learn, pandas, NumPy
* Experience with MLOps practices and tools
* Understanding of model serving patterns: real-time vs. batch inference

Software Development:

* Modern software development practices: testing, code review, documentation
* API design principles: RESTful, GraphQL, event-driven architectures
* Database design and optimization: SQL and NoSQL
* Authentication and authorization: OAuth, JWT, IAM

DevOps & Infrastructure:

* Linux/UNIX system administration
* Networking fundamentals: TCP/IP, DNS, HTTP/HTTPS, load balancing
* Security best practices for cloud environments
* Disaster recovery and business continuity planning

Industry Knowledge:

* Understanding of cloud consulting delivery models
* Familiarity with agile/scrum methodologies
* Awareness of compliance frameworks: GDPR, HIPAA, SOC 2, ISO 27001
* Knowledge of FinTech or other regulated industries (a plus)

Additional Knowledge (Preferred):

* Azure or GCP certifications and experience
* Multi-cloud architecture patterns
* Serverless architecture patterns
* Data engineering and data lake design
* Cost optimization strategies and FinOps practices

Experience
----------

Cloud Engineering & Architecture:

* 5+ years in cloud engineering, DevOps, or solution architecture roles
* 3+ years of hands-on experience with AWS services and architecture
* Proven track record of designing and implementing cloud solutions from scratch
* Experience with both greenfield projects and cloud migration initiatives

AI/ML Infrastructure:

* 2+ years working with AI/ML workloads on cloud platforms
* Hands-on experience deploying and managing ML models in production
* Experience with GPU-based compute for training or inference
* Understanding of AI/ML infrastructure challenges and optimization techniques

Infrastructure as Code:

* 3+ years building infrastructure using IaC tools (AWS CDK, Terraform, CloudFormation)
* Experience creating reusable IaC modules and components
* Track record of infrastructure automation and standardization

Software Development:

* 4+ years of programming experience in Python (required)
* Experience building APIs with FastAPI, Flask, or similar frameworks
* History of creating prototypes, MVPs, or PoC applications
* Comfortable with full-stack development for demos and prototypes

DevOps & Automation:

* 3+ years implementing CI/CD pipelines and deployment automation
* Experience with containerization (Docker) and orchestration (Kubernetes/ECS)
* Linux/UNIX system administration experience
* Monitoring and observability implementation

Client-Facing Work:

* Experience gathering requirements and translating them into technical solutions
* History of presenting technical architectures to clients and stakeholders
* Participation in presales activities, demos, or technical workshops
* Ability to work directly with customers to solve complex problems

Industry Experience (Preferred):

* Consulting or professional services background
* Experience in regulated industries (FinTech, insurance, banking)
* Experience working with enterprise clients on large-scale implementations
* Startup or fast-paced environment experience


Job Detail

  • Job Id
    JD1642707
  • Industry
    Not mentioned
  • Total Positions
    1
  • Job Type:
    Full Time
  • Salary:
    Not mentioned
  • Employment Status
    Permanent
  • Job Location
    SG, Singapore
  • Education
    Not mentioned