Follow our Cloud Data Engineer curriculum and boost your career!

Eligible for CPF funding and multi-source financing of up to 100%


The 3P approach

Ready to take off
Full immersion
Ready to perform

Our training centre guides you in identifying the ideal course and helps you maximize your funding opportunities.
We give you all the keys you need to start with confidence.

Enjoy an immersive, intensive programme designed around hands-on workshops and real case studies.
Learn by doing and develop concrete skills directly applicable to your future projects.

At the end of your course, we assess the skills you have acquired, issue a certificate attesting to your expertise, and support you to ensure the success of your professional projects.
You are now ready to excel!

Description of the training

In-depth training on the design, optimization and management of data pipelines, covering essential skills such as data engineering with Python and SQL, massive data processing (Big Data), data integration with tools such as Apache Spark and Kafka, and cloud architecture on platforms such as AWS, Azure or Google Cloud.

Training objectives

At the end of this training, participants will be able to:

  • Master the fundamentals of data engineering: Understand the core principles of data pipelines, including architecture, integration, transformation and data storage.
  • Use powerful tools for massive data processing: Master technologies such as Apache Spark and Apache Kafka for parallel processing and real-time data integration.
  • Optimize data pipeline performance and security: Acquire the skills needed to optimize, secure and monitor data pipelines throughout their life cycle.
  • Manage workflows with orchestration tools: Know how to use tools such as Airflow or Prefect to automate and orchestrate tasks and processes in data pipelines.
  • Design and deploy a complete data pipeline: Be able to build an end-to-end pipeline, from data collection to analysis, including performance optimization and error handling in a production environment.


Who is this training for?

The training is aimed at a wide audience, including:

  • Developers and IT engineers wishing to specialize in data management.
  • Data analysts who want to deepen their skills in managing and processing large data volumes.
  • Junior data scientists wanting to master the data infrastructure needed to prepare their models.
  • Database administrators wishing to expand their skills to complex data systems.
  • Cloud Computing professionals seeking to understand data architectures in the cloud.
  • Recent graduates or career changers interested in the field of Data Engineering.
  • Technical managers or CTOs wishing to better oversee data management projects in their company.

Prerequisites

No specific prerequisites are required.


Training programme

Day 1-2: Introduction to Data Engineering

  • Objective: To understand the basic principles of data pipelines, their architecture and operation.
Introduction to data pipelines
  • Data pipeline principles: Architecture, data flow, integration, transformation and storage.
  • Key concepts: ETL vs ELT, structured and unstructured data management.
  • Introduction to Apache Kafka and Apache Spark for massive data processing.
Basic tools for data engineering
  • Python for data management with Pandas: Manipulation, cleaning and transformation of data.
  • Introduction to SQL: Selection, joins, aggregations, query optimization.
  • Overview of NumPy and Matplotlib for numerical computation and data visualization (a short Pandas and SQL example is sketched below).
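
To make these basics concrete, here is a minimal sketch (not part of the official course material) combining Pandas and SQL: a hypothetical orders table is cleaned with Pandas, then summarised with a SQL aggregation through an in-memory SQLite database. All table and column names are illustrative.

```python
import sqlite3

import pandas as pd

# Hypothetical raw orders with the kinds of issues cleaned in the workshop
raw = pd.DataFrame({
    "order_id": [1, 2, 2, 3, 4],
    "amount": [120.0, None, None, 75.5, 30.0],
    "country": ["FR", "fr", "fr", "DE", None],
})

# Cleaning and transformation with Pandas: drop duplicates, fill gaps, normalize text
clean = (
    raw.drop_duplicates(subset="order_id")
       .assign(
           amount=lambda df: df["amount"].fillna(df["amount"].median()),
           country=lambda df: df["country"].fillna("UNKNOWN").str.upper(),
       )
)

# The same data summarised in SQL: selection, grouping and aggregation
with sqlite3.connect(":memory:") as conn:
    clean.to_sql("orders", conn, index=False)
    summary = pd.read_sql(
        "SELECT country, COUNT(*) AS orders, AVG(amount) AS avg_amount "
        "FROM orders GROUP BY country ORDER BY avg_amount DESC",
        conn,
    )
print(summary)
```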
Day 3-4: Introduction to Apache Spark and Kafka
  • Purpose: Learn to use Apache Spark for parallel processing and massive data.
Apache Spark and its use
  • Installing Spark; RDDs and DataFrames: Their differences and use for data processing.
  • Spark operations: map, filter, reduce, groupBy and performance optimization.
  • Caching and partitioning to speed up the processing of massive data (see the PySpark sketch below).
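
As an illustration of these Spark operations, the following PySpark sketch assumes a local Spark installation and a small hypothetical events DataFrame; it shows filtering, repartitioning by key, caching of a reused intermediate result, a grouped aggregation, and a drop down to the RDD API.

```python
from pyspark.sql import SparkSession, functions as F

# Local session for the exercise; in production this would point at a cluster
spark = SparkSession.builder.appName("spark-basics-demo").getOrCreate()

# Hypothetical click events; a real workshop would read CSV or Parquet from storage
events = spark.createDataFrame(
    [("user1", "click", 3), ("user2", "view", 1), ("user1", "click", 7)],
    ["user_id", "event_type", "duration_s"],
)

# Repartition by key so related rows are processed together, then cache the
# intermediate result because it is reused by several downstream actions
clicks = (
    events.filter(F.col("event_type") == "click")
          .repartition("user_id")
          .cache()
)

# Typical transformations: filter, groupBy, aggregation
per_user = clicks.groupBy("user_id").agg(
    F.count("*").alias("click_count"),
    F.avg("duration_s").alias("avg_duration_s"),
)
per_user.show()

# The RDD API is still available underneath the DataFrame API (map, reduce)
print(clicks.rdd.map(lambda row: row.duration_s).reduce(lambda a, b: a + b))

spark.stop()
```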
Kafka for real-time data stream integration
  • Kafka architecture: Producers, consumers, brokers, topics, partitions.
  • Using Kafka Streams to manage data in real time.
  • Integration of Kafka with Spark for streaming data processing (a streaming sketch follows below).
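
The Kafka and Spark integration can be sketched with Spark Structured Streaming's Kafka source, roughly as below. The broker address, topic name and message schema are placeholders, and running it requires the spark-sql-kafka connector package matching your Spark version.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("kafka-spark-streaming-demo").getOrCreate()

# Expected shape of the JSON messages (hypothetical sensor readings)
schema = StructType([
    StructField("sensor_id", StringType()),
    StructField("temperature", DoubleType()),
])

# Read from a Kafka topic; broker address and topic name are placeholders
raw_stream = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "localhost:9092")
         .option("subscribe", "sensor-readings")
         .load()
)

# Kafka values arrive as bytes: decode, parse the JSON, then aggregate per sensor
readings = raw_stream.select(
    F.from_json(F.col("value").cast("string"), schema).alias("r")
).select("r.*")

avg_temp = readings.groupBy("sensor_id").agg(F.avg("temperature").alias("avg_temp"))

# Write the running aggregation to the console for the exercise
query = avg_temp.writeStream.outputMode("complete").format("console").start()
query.awaitTermination()
```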
Day 5-6: Optimization of data pipelines
  • Objective: Learn to optimize the performance of data pipelines and secure data flows.
Optimizing pipeline performance
  • Resource management, data partitioning and parallelism to improve performance (see the tuning sketch below).
  • Best practices for securing data pipelines: Authentication, encryption and error management.
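
By way of illustration, a possible performance-tuning sketch in PySpark is shown below; the configuration values, paths and key names are hypothetical and would be chosen according to cluster size and data volume.

```python
from pyspark.sql import SparkSession

# Hypothetical tuning values; the right numbers depend on the cluster and the data
spark = (
    SparkSession.builder.appName("pipeline-tuning-demo")
    .config("spark.sql.shuffle.partitions", "64")   # parallelism of shuffle stages
    .config("spark.executor.memory", "4g")          # resources per executor
    .getOrCreate()
)

df = spark.read.parquet("/data/lake/orders")        # placeholder path

# Repartition by the aggregation key so work is spread evenly across executors
balanced = df.repartition(64, "customer_id")

# coalesce() reduces the number of output files without a full shuffle
balanced.coalesce(8).write.mode("overwrite").parquet("/data/lake/orders_by_customer")
```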
Securing and monitoring data pipelines
  • Data integrity monitoring and error management in data pipelines.
  • Use of monitoring tools to ensure robust and efficient pipelines (a minimal logging and retry sketch follows below).
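
Monitoring and error handling can be illustrated with a plain-Python sketch like the one below, in which each pipeline step is wrapped with logging and simple retries. Real pipelines typically rely on dedicated monitoring tools; the step names here are hypothetical.

```python
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("pipeline")

def run_step(step_name, func, retries=3, delay_s=5):
    """Run one pipeline step with logging and simple retries."""
    for attempt in range(1, retries + 1):
        try:
            log.info("starting step '%s' (attempt %d/%d)", step_name, attempt, retries)
            result = func()
            log.info("step '%s' succeeded", step_name)
            return result
        except Exception:
            log.exception("step '%s' failed", step_name)
            if attempt == retries:
                raise  # let the orchestrator mark the run as failed and alert
            time.sleep(delay_s)

# Hypothetical usage: each stage of the pipeline is wrapped in the same way
# run_step("extract", extract_from_source)
# run_step("transform", transform_batch)
# run_step("load", load_to_warehouse)
```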
Day 7: Data Pipeline Orchestration and Management
  • Objective: Learn how to manage workflows with orchestration tools.
Introduction to pipeline orchestration
  • Using tools like Apache Airflow, Luigi or Prefect to orchestrate data pipelines.
  • Automation of workflows and management of dependencies between tasks (see the Airflow sketch below).
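
A minimal Airflow DAG, assuming Airflow 2.4 or later, might look like the sketch below; the DAG id, schedule and task callables are placeholders for the exercise.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Placeholder task callables; the workshop would call real extract/transform/load code
def extract():
    print("extracting raw data")

def transform():
    print("transforming data")

def load():
    print("loading into the warehouse")

with DAG(
    dag_id="daily_etl_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",   # Airflow 2.4+; older versions use schedule_interval
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    # Dependencies: extract must finish before transform, transform before load
    t_extract >> t_transform >> t_load
```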
Error management and data quality
  • Ensure data quality in pipelines: Validation and cleaning of input data.
  • Error management: Capture and handle anomalies in automated pipelines (a data-quality sketch follows below).
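
A minimal data-quality check might look like the following plain-Pandas sketch; the column names and rules are illustrative, and real projects often use dedicated validation frameworks instead.

```python
import pandas as pd

def validate_orders(df: pd.DataFrame) -> list:
    """Return a list of data-quality problems found in an orders batch."""
    problems = []
    required = {"order_id", "amount", "country"}
    missing_cols = required - set(df.columns)
    if missing_cols:
        problems.append(f"missing columns: {sorted(missing_cols)}")
        return problems  # cannot check further without the expected schema
    if df["order_id"].duplicated().any():
        problems.append("duplicate order_id values")
    if df["amount"].isna().any() or (df["amount"] < 0).any():
        problems.append("null or negative amounts")
    return problems

# Hypothetical incoming batch with two deliberate anomalies
batch = pd.DataFrame({"order_id": [1, 1], "amount": [10.0, -5.0], "country": ["FR", "DE"]})
issues = validate_orders(batch)
if issues:
    # In an automated pipeline this would quarantine the batch and raise an alert
    raise ValueError(f"data quality check failed: {issues}")
```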
Day 8: Final Project - Creation of a complete data pipeline
  • Purpose: Deploy an end-to-end data pipeline using Kafka, Spark, and orchestration tools.
Data pipeline design and development
  • Design of a data pipeline by integrating the tools studied: Data collection, transformation and analysis.
Deployment and management of the pipeline in production
  • Pipeline optimization: Performance, error management, and scalability in a production environment.
  • Real-time workflow management with Kafka and massive data processing with Spark (see the deployment sketch below).
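
To give an idea of the deployment step, the sketch below continues the earlier streaming example by writing the Kafka-fed stream to Parquet with a checkpoint location, so the job can restart after a failure; the broker, topic and paths are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pipeline-production-demo").getOrCreate()

# Same Kafka source as the streaming exercise; broker and topic are placeholders
events = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "localhost:9092")
         .option("subscribe", "sensor-readings")
         .load()
         .selectExpr("CAST(value AS STRING) AS payload", "timestamp")
)

# In production the stream is written to durable storage with a checkpoint, so the
# job can be restarted after a failure without losing or duplicating data
query = (
    events.writeStream
          .format("parquet")
          .option("path", "/data/lake/sensor_readings")               # placeholder path
          .option("checkpointLocation", "/data/checkpoints/sensor_readings")
          .trigger(processingTime="1 minute")
          .start()
)
query.awaitTermination()
```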


Strengths of the training

  • Pedagogical and modular approach: Alternating theory and practice for better assimilation of concepts.
  • Cloud Integration: Strong focus on cloud and distributed solutions.
  • Qualified speakers: Specialist trainers with practical experience in the field.
  • Educational tools and materials: Access to online resources, live demonstrations and real-life case studies.
  • Accessibility: Training is open to all, without advanced technical prerequisites.
  • Hands-on implementation: A complete project at the end of the modules to consolidate what you have learned.
  • Preparation for Industry: Focus on standard certifications and tools used in the professional environment.


Pedagogical methods and tools used

  • Live demonstrations using data engineering services.
  • Practical workshops and real case studies in various sectors (industry, trade, health).
  • Feedback from experience: Sharing best practices and common mistakes encountered in business.
  • Simulations and tools: Using simulators for interactive workshops.


Evaluation

  • End-of-training multiple-choice quiz (MCQ) to test understanding of the concepts covered.
  • Practical case studies or group discussions to apply the knowledge gained.
  • Ongoing evaluation during practical sessions.
  • Final project at the end of the modules to consolidate what you have learned.


Normative References

  • Well-Architected Cloud Framework.
  • GDPR (General Data Protection Regulation).
  • ISO 27001, SOC 2 (Service Organization Control).
  • NIST Cybersecurity Framework.

Modalities


Inter-company or remote

Duration: 18 days

Price: €10,000


Intra-enterprise

Duration and program can be customized according to your company's specific needs
