Phetho Silas Mokgalapa

Senior Data Engineer

ZA

About

Senior Data Engineer with 5+ years of progressive experience architecting and implementing enterprise-scale data solutions. Proven expertise in designing robust data warehouse architectures, building high-performance ETL/ELT pipelines, and leading data transformation initiatives. Specialized in cloud-native data platforms, real-time streaming architectures, and advanced workflow orchestration. Successfully migrated legacy mainframe systems to modern cloud infrastructure.

Skills

Enterprise data warehouse design, Data Vault 2.0 modeling, dimensional modeling, data lake architectures, microservices-based data platforms
Apache Spark, PySpark, Scala, Apache Airflow 3.0, dbt Cloud, dbt Core, custom Python frameworks, Apache Kafka, real-time streaming
Snowflake, AWS RDS, AWS Redshift, AWS S3, AWS Lambda, AWS Glue, Azure Data Factory, Google BigQuery, multi-cloud data strategies
Python, Scala, Java, SQL, Snowflake SQL, PostgreSQL, MSSQL, Hive, R, Bash, Shell scripting
Apache Airflow, Automic Enterprise Scheduler, Azure Data Factory, AWS Step Functions
dbt Cloud, dbt Core, Spark SQL, advanced SQL optimization, data quality frameworks
Docker, Kubernetes, Jenkins, GitLab CI/CD, Infrastructure as Code, Terraform, automated testing frameworks
Query tuning, partition strategies, indexing optimization, cost optimization, monitoring and alerting
Technical mentoring, cross-functional team leadership, stakeholder management, agile methodologies
Cloudera, WhereScape 3D, SAP HANA WebIDE
Machine Learning, LSTM neural networks, predictive analytics, MLOps
PowerBI, Tableau
automated document processing

Experience

Senior Data Engineer

Sanlam

• Led enterprise-wide data migration initiative from legacy DB2 mainframe to modern Cloudera ecosystem
• Architected and implemented Data Vault 2.0 methodology using WhereScape 3D
• Designed and deployed automated Spark-based ETL pipelines
• Built comprehensive data ingestion framework for Cloudera-to-Snowflake migration, enabling real-time analytics capabilities
• Implemented automated scheduling solutions with Automic Enterprise Scheduler
• Created Information Mart layer using SAP HANA WebIDE, serving business users with sub-second query performance
• Technologies: Apache Spark, Python, Scala, Snowflake, Cloudera, Apache Airflow, WhereScape, SAP HANA, Automic

2021-10 - Present

Data Scientist & Systems Developer Trainee

Mindworx Consulting (Pty) Ltd

• Architected and developed enterprise Python REST API for SaaS platform, serving 10,000+ concurrent users
• Built advanced Spark ETL pipeline for invoice analysis
• Developed machine learning models for predictive analytics, achieving 92% accuracy in house price predictions
• Created intelligent RPA system using Python and UiPath for automated document processing
• Led data visualization initiatives using PowerBI and Tableau for NGO crime analysis, impacting policy decisions
• Implemented LSTM neural networks for financial forecasting with 85% prediction accuracy
• Technologies: Python, Apache Spark, Machine Learning, RPA

2020-01 - 2021-12

Education

University of the Free State

BSc in Actuarial Science

2016-01 - 2020-12

University of Limpopo

BSc in Mathematical Science

2013-01 - 2015-12