Hi, I'm Shafay Amjad

Lead Data Engineer | GenAI Specialist | ML Platform Architect

Building enterprise-scale data platforms that process billions of events daily. Expertise in AWS/GCP, Airflow, DBT, Spark, and GenAI solutions built with Bedrock, LangChain, and HuggingFace. 13+ certifications and 9+ years delivering measurable business impact at Meta, Amazon, Delivery Hero, and other companies.

def build_data_platform():
    return {
        'role': 'Lead Data Engineer',
        'specialization': ['GenAI', 'ML Engineering'],
        'languages': [
            'Python', 'SQL', 'SPARQL',
            'R', 'Scala'
        ],
        'cloud': ['AWS', 'GCP', 'Serverless'],
        'architecture': [
            'Lakehouse', 'Delta Lake',
            'Data Mesh', 'Event-Driven'
        ],
        'data_stack': [
            'Airflow', 'DBT', 'Databricks',
            'Spark', 'Kafka', 'Iceberg'
        ],
        'modern_tools': [
            'Great Expectations', 'Dagster',
            'Trino', 'dbt Cloud'
        ],
        'genai_ml': [
            'Bedrock', 'SageMaker',
            'PyTorch', 'QuickSight'
        ],
        'visualization': [
            'QuickSight', 'Looker',
            'Streamlit', 'Metabase'
        ],
        'scale': 'billions of events/day',
        'impact': 'measurable revenue growth'
    }

About Me

Lead Data Engineer & GenAI Specialist

I'm a Lead Data Engineer and GenAI specialist based in Berlin, Germany, with 9+ years of experience architecting enterprise-scale data platforms and intelligent systems. At companies including Meta, Amazon, Delivery Hero, and Goldman Sachs, I've built solutions that process billions of events daily and drive measurable revenue growth and operational efficiency.

Data Engineering Excellence: I design and deploy metadata-driven data pipelines using Airflow, DBT, Spark, and Kafka, implementing end-to-end data governance with lineage tracking and automated lifecycle management. My expertise spans both AWS (Lambda, Glue, Redshift, S3, SageMaker, Bedrock) and GCP (BigQuery, Cloud Functions, Dataflow), with infrastructure as code using Terraform and orchestration via Kubernetes.
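As a rough illustration of what "metadata-driven" can mean in practice, the sketch below derives both the run order and the upstream lineage of a step from one declarative list. It is a minimal standard-library example, not the pipelines described above: the step names, the `PIPELINE_METADATA` structure, and both helper functions are hypothetical, and a real deployment would hand the same metadata to an orchestrator such as Airflow.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline metadata: each entry names a step, its kind, and the
# steps it depends on. None of these names come from a real project.
PIPELINE_METADATA = [
    {"name": "load_orders", "kind": "extract", "depends_on": []},
    {"name": "load_customers", "kind": "extract", "depends_on": []},
    {"name": "stg_orders", "kind": "transform", "depends_on": ["load_orders"]},
    {
        "name": "orders_by_customer",
        "kind": "transform",
        "depends_on": ["stg_orders", "load_customers"],
    },
]


def build_run_order(metadata):
    """Return step names in an order that respects every dependency."""
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {step["name"]: set(step["depends_on"]) for step in metadata}
    return list(TopologicalSorter(graph).static_order())


def upstream_lineage(metadata, target):
    """Return every step the target depends on, directly or transitively."""
    deps = {step["name"]: step["depends_on"] for step in metadata}
    seen = set()
    stack = list(deps[target])
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(deps[name])
    return seen


if __name__ == "__main__":
    print(build_run_order(PIPELINE_METADATA))
    print(sorted(upstream_lineage(PIPELINE_METADATA, "orders_by_customer")))
```

Because the dependency list is the single source of truth, adding a step means adding one metadata entry; the run order and the lineage answer update without touching the code.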

GenAI & Machine Learning: Leading the charge in GenAI adoption, I've integrated Amazon Bedrock, LangChain, and HuggingFace to deploy production-ready LLM-powered AI agents, chatbots, and RAG systems. I build production ML pipelines using TensorFlow, PyTorch, and SageMaker, delivering recommendation engines, predictive analytics, and real-time intelligent automation at scale.
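For readers unfamiliar with RAG, the sketch below shows the retrieval step in miniature: rank documents against a question, then place the best match into the prompt. It is an assumption-heavy toy, not the systems described above. The word-count "embedding", the `DOCUMENTS` list, and all function names are hypothetical stand-ins; a production system would use a learned embedding model, a vector store, and an LLM call (for example through Bedrock or LangChain) in place of the final string.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge base; a real system would index far more text.
DOCUMENTS = [
    "Airflow schedules and monitors batch workflows.",
    "Kafka is a distributed log for streaming events.",
    "Terraform provisions cloud infrastructure from code.",
]


def embed(text):
    """Toy embedding: counts of lowercase alphanumeric words."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, documents, k=1):
    """Return the k documents most similar to the question."""
    query = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(query, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(question, documents, k=1):
    """Assemble the text that would be sent to a language model."""
    context = "\n".join(retrieve(question, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    print(build_prompt("What does Kafka do with streaming events?", DOCUMENTS))
```

The point of the design is that the model only sees retrieved context, so answer quality depends as much on the retrieval step as on the model itself.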

9+ Years Experience • 13+ Certifications • 6 Major Companies • Billions of Events Processed

Data Engineering at Scale

Airflow, DBT, Spark, Kafka/Kinesis • Processing billions of events daily • Metadata-driven pipelines with full governance

GenAI & LLM Engineering

Amazon Bedrock, LangChain, HuggingFace • Production RAG systems • AI agents & chatbots for automation

Multi-Cloud Architecture

AWS (Lambda, Glue, Redshift, Bedrock, SageMaker) • GCP (BigQuery, Dataflow, VertexAI) • Terraform IaC

ML/Data Science Production

TensorFlow, PyTorch, SageMaker • Real-time ML pipelines • Recommendation engines • Predictive analytics

Professional Experience

Lead Data Engineer

Orion S.A.

Jan 2024 - Present
  • Architected and deployed a scalable, serverless data platform on AWS, centralizing analytics across manufacturing, operations, and finance
  • Designed metadata-driven pipelines with DBT and Airflow, ensuring end-to-end data governance and lineage tracking
  • Led GenAI adoption by integrating Amazon Bedrock, LangChain, and Streamlit for LLM-powered AI agents and chatbots
  • Enhanced supply chain analytics with optimized streaming/batch pipelines, driving measurable revenue gains
AWS Lambda Redshift Bedrock DBT Terraform Kubernetes LangChain

Senior Data Engineer

Delivery Hero SE

Jan 2022 - Jan 2024
  • Designed and maintained scalable data pipelines across AWS & GCP, orchestrating batch and streaming workflows with Apache Airflow
  • Built secure, production-ready APIs delivering real-time analytics to global teams
  • Led infrastructure modernization with Terraform-based IaC, improving deployment consistency across GCP
  • Developed large-scale analytics solutions with BigQuery and Spark, processing billions of events daily
GCP BigQuery Airflow Terraform Spark DBT Looker

Senior Data Engineer

Meta

2021 - 2022
  • Designed and implemented large-scale data pipelines supporting Meta's product analytics and insights
  • Built high-performance data infrastructure processing petabytes of data across distributed systems
  • Collaborated with cross-functional teams to deliver data solutions for product development and optimization
  • Optimized data workflows and query performance for business-critical metrics and reporting
Python Spark Presto Hive SQL Big Data

Senior Data Engineer

Amazon

Sep 2020 - Jan 2022
  • Led development of predictive modeling pipelines and ML-driven analytics to improve customer experience
  • Owned end-to-end delivery of BI and ML solutions using Redshift, SageMaker, Glue, Lambda, and Athena
  • Designed fully serverless workflows for real-time ingestion, transformation, and reporting
  • Architected multi-stream real-time data systems with Kafka and Kinesis for CX optimization
AWS SageMaker Redshift Kafka Kinesis Lambda

Senior Data Engineer

Goldman Sachs

Jun 2021 - Dec 2021
  • Developed enterprise-scale data pipelines for financial analytics and reporting
  • Implemented secure data processing workflows meeting strict compliance requirements
  • Collaborated with data science teams on ML model deployment and infrastructure
Python SQL Cloud Services Docker

Senior Software Engineer / Data Scientist

NorthBay Solutions LLC

Feb 2019 - Sep 2020
  • Built ETL pipelines and OCR automation for insurance and healthcare data systems
  • Developed domain-specific ML models and recommendation engines for SMEs and enterprise clients
  • Delivered graph-based and Alexa-integrated automation with LLM integration
  • Applied ML libraries (TensorFlow, PyTorch, scikit-learn) in production workflows
Python AWS TensorFlow PyTorch SageMaker

Machine Learning Engineer

NorthBay Solutions

Sep 2017 - Feb 2019
  • Delivered ML and analytics solutions via serverless deployments using AWS Lambda and SageMaker
  • Automated data warehouse processes and built voice-enabled LLM systems using Alexa and AWS
  • Worked with real-time data tools like Spark and Flink, and AWS services including EMR, S3, DynamoDB, and RDS
  • Built scalable models using scikit-learn, TensorFlow, Keras, and other core ML frameworks
Python AWS Lambda SageMaker Spark Flink Alexa

Data Integration GDC Intern

Teradata

Jul 2016 - Sep 2016
  • Wrote and optimized complex SQL queries and ETL pipelines
  • Developed procedures for integration of data warehouses in operative IT environments
  • Organized and maintained technical documentation for supporting solutions
SQL ETL Teradata Data Warehousing

Featured Projects & Achievements

Enterprise-scale data platforms and GenAI solutions delivering measurable business impact

Streaming

Multi-Stream Real-Time Data Platform

Architected multi-stream real-time data systems using Kafka and Kinesis for a leading e-commerce platform, processing billions of events daily to support customer experience optimization, predictive analytics, and dynamic metric dashboards across global markets.

Billions of Events/Day
Real-Time Insights
Kafka Kinesis Lambda DynamoDB Step Functions Python

GCP

Global Large-Scale Analytics Platform

Developed and optimized large-scale analytics solutions on GCP using BigQuery, Spark, and Cloud Functions for a global food delivery platform, processing billions of events daily across multiple international markets, driving millions in revenue through automated decision-making and real-time analytics.

Millions in Revenue
Global Markets
BigQuery Cloud Functions Dataflow Spark Terraform DBT Looker

ML

Production ML Pipelines & Recommendation Engines

Built and deployed production-grade ML pipelines and recommendation engines using SageMaker, TensorFlow, and PyTorch. Automated deep learning workflows for predictive analytics, customer segmentation, and targeted advertising strategies.

Improved CX
Targeted Advertising
SageMaker TensorFlow PyTorch MLflow Kubeflow Python

IaC

Terraform-Based Infrastructure Modernization

Led infrastructure modernization by introducing Terraform-based IaC across AWS and GCP, improving deployment consistency and reproducibility and enabling automated multi-cloud infrastructure management at enterprise scale.

Automated Deployments
Enhanced Security
Terraform AWS GCP Kubernetes Docker GitHub Actions

Enterprise Technology Stack

Cloud Platforms & Infrastructure

AWS: Lambda, Glue, Redshift, S3, SageMaker, Bedrock, ECS, EKS, Step Functions, Athena, QuickSight
GCP: BigQuery, Cloud Functions, Dataflow, VertexAI, Cloud Storage, Pub/Sub
Architecture: Serverless, Microservices, Event-Driven, Lambda Architecture

Data Engineering & ETL

Orchestration: Apache Airflow, Dagster, Prefect, AWS Step Functions
Transformation: DBT (Data Build Tool), dbt Cloud, Apache Spark, PySpark
Streaming: Apache Kafka, Apache Flink, AWS Kinesis, Google Pub/Sub
Data Integration: Fivetran, Workato, Airbyte, Stitch
Lakehouse/Tables: Databricks, Delta Lake, Apache Iceberg, Apache Hudi
Data Warehouses: Snowflake, Redshift, BigQuery, Trino/Presto
Data Quality: Great Expectations, Monte Carlo, Soda, dbt Tests

GenAI & Large Language Models

LLM Platforms: Amazon Bedrock, OpenAI API, Anthropic Claude
Frameworks: LangChain, LlamaIndex, Semantic Kernel
Models: HuggingFace Transformers, GPT-4, Claude, Llama
Techniques: RAG (Retrieval-Augmented Generation), Fine-tuning, Prompt Engineering
Vector Stores: Pinecone, Weaviate, ChromaDB, FAISS

Machine Learning & Data Science

Deep Learning: TensorFlow, PyTorch, Keras, JAX
ML Libraries: scikit-learn, XGBoost, LightGBM, CatBoost
Data Science: pandas, Polars, NumPy, SciPy, Matplotlib, Seaborn
MLOps: MLflow, Kubeflow, SageMaker Pipelines, Weights & Biases
ML Platforms: AWS SageMaker, Databricks ML, VertexAI, AWS QuickSight ML
NLP: NLTK, SpaCy, Transformers, Sentence-BERT

DevOps & Infrastructure as Code

IaC: Terraform, CloudFormation, Pulumi, CDK
Containers: Docker, Kubernetes, ECS, EKS, GKE
CI/CD: GitHub Actions, GitLab CI, CodePipeline, Jenkins
Monitoring: CloudWatch, Datadog, Prometheus, Grafana

BI, Analytics & Automation

BI & Visualization: AWS QuickSight, Looker, Power BI, Tableau, Streamlit, Metabase
Planning & Analytics: Anaplan, SAP, dbt Metrics
iPaaS & Automation: Workato, MuleSoft, Apache NiFi, n8n
Query Engines: Trino, Presto, DuckDB, Apache Drill
Languages: Python, SQL, SPARQL, R, Scala, Bash

Skills & Technologies

Cloud Platforms

AWS
Google Cloud
Lambda
Redshift
BigQuery
SageMaker
Bedrock
Glue

Data Engineering

Airflow
DBT
Spark
Kafka
Kinesis
Fivetran
Snowflake
Looker

Programming Languages

Python
SQL
R
C++

Machine Learning & AI

TensorFlow
PyTorch
scikit-learn
LangChain
HuggingFace
Keras
NLTK
SpaCy

DevOps & Infrastructure

Terraform
Docker
Kubernetes
GitHub Actions
CI/CD

Data Visualization & BI

Streamlit
Looker
QuickSight
Power BI

Certifications

13+ professional certifications across cloud platforms, data engineering, and ML/AI

AWS Solutions Architect • Associate • 2020
AWS Cloud Practitioner • Foundational • 2020
GCP Associate Cloud Engineer • ACG • 2023
GCP Data Engineer • ACG • 2023
HashiCorp Terraform • Associate • 2023
HashiCorp Infrastructure • Automation (ACG) • 2023
Databricks GenAI • Fundamentals • 2023
Databricks Spark • Apache Spark • 2023
Airflow Certified • DAG Auth • 2023
Statistics & Deep Learning • Specialization • 2019
Python & Data Science • Specialization • 2016

GitHub Activity

Open source contributions, stats, and popular repositories

Get In Touch

I'm always open to discussing new opportunities, collaborations, or just having a chat about data engineering and AI.

Location

Berlin, Germany

Let's Build Something Amazing Together

Whether you're looking for a lead data engineer, need consultation on cloud architecture, or want to collaborate on innovative AI projects, I'd love to hear from you.