digital@thrayait.com +60162650525, +919043703606

Training Information

Google Cloud

We are pleased to offer a comprehensive suite of training solutions tailored to meet your needs. Our services encompass both online and offline corporate training options, ensuring flexibility and accessibility for your team's professional development.


Course Content

Syllabus:

Google Cloud Platform Training with Real-Time Projects

GCP Basics

GCP Introduction

Why We Need the Cloud

Overview of Google Cloud Platform (GCP)

Key GCP Services and Products

Understanding Cloud Computing and its Benefits

How to Create a Free Tier Account in GCP

GCP Interfaces

Console

Navigating the GCP Console

Configuring the GCP Console for Efficiency

Using the GCP Console for Service Management

Shell

Introduction to Cloud Shell

Command-line Interface (CLI) Basics

Cloud Shell Commands for Service Deployment and Management

SDK

Overview of GCP Software Development Kits (SDKs)

Installing and Configuring SDKs

Writing and Executing GCP SDK Commands

GCP Locations

Regions

Understanding GCP Regions

Selecting Regions for Service Deployment

Impact of Region on Service Performance

Zones

Exploring GCP Zones

Distributing Resources Across Zones

High Availability and Disaster Recovery Considerations

Importance

Significance of Choosing the Right Location

Global vs. Regional Resources

Factors Influencing Location Decisions

GCP IAM & Admin

Identities

Introduction to Identity and Access Management (IAM)

Users, Groups, and Service Accounts

Best Practices for Identity Management

Roles

GCP IAM Roles Overview

Defining Custom Roles

Role-Based Access Control (RBAC) Implementation

Policy

Resource-based Policies

Understanding and Implementing Organization Policies

Auditing and Monitoring Policies

Resource Hierarchy

GCP Resource Hierarchy Structure

Managing Resources in a Hierarchy

Organizational Structure Best Practices

GCP Networking

VPC (Virtual Private Cloud)

Creating and Configuring VPCs

Subnetting and IP Address Management

Load Balancer

Types of Load Balancers in GCP

Configuring and Managing Load Balancers

Firewalls

GCP Firewall Rules

Network Security Best Practices

Compute Options

Google Compute Engine (GCE)

Introduction to GCE and Virtual Machines (VMs)

Creating and Configuring VM Instances

Custom Images and Snapshots

Google Kubernetes Engine (GKE)

Overview of Kubernetes and Container Orchestration

Deploying and Managing Containerized Applications on GKE

Kubernetes Clusters and Node Pools

Google App Engine (GAE)

Understanding the App Engine Platform

Deploying Applications with App Engine

Configuring App Engine Services

Cloud Functions

Serverless Computing with Cloud Functions

Writing and Deploying Serverless Functions

Triggers and Events in Cloud Functions

GCP Data Engineering Services

Google Cloud Storage

Introduction to Cloud Storage

Overview of Cloud Storage as a scalable and durable object storage service.

Understanding buckets and objects in Cloud Storage.

Use cases for Cloud Storage, such as data backup, multimedia storage, and website content delivery.

Cloud Storage Operations

Creating and managing Cloud Storage buckets.

Uploading and downloading objects to and from Cloud Storage.

Setting access controls and permissions for buckets and objects.

Data Transfer and Lifecycle Management

Strategies for efficient data transfer to and from Cloud Storage.

Implementing data lifecycle policies for automatic object deletion or archival.

Utilizing Transfer Service for large-scale data transfers.

Object Versioning

Enabling and managing versioning for Cloud Storage buckets.

Understanding how object versioning works.

Use cases for object versioning in data resilience and recovery.

Integration with Other GCP Services

Integrating Cloud Storage with BigQuery for data analytics.

Using Cloud Storage as a data source for Dataflow and Dataproc.

Exploring options for serving static content on websites.

Best Practices and Security

Implementing best practices for optimizing Cloud Storage performance.

Securing data in Cloud Storage with encryption and access controls.

Monitoring and logging for Cloud Storage operations.

Cloud SQL

Introduction to Cloud SQL

Overview of Cloud SQL as a fully managed relational database service.

Supported database engines and use cases for Cloud SQL.

Creating and Managing Cloud SQL Instances

Creating MySQL or PostgreSQL instances.

Configuring database settings, users, and access controls.

Importing and exporting data in Cloud SQL.

Backups and High Availability

Configuring automated backups and performing manual backups.

Implementing high availability with a standby instance and automatic failover.

Strategies for restoring data from backups.

Scaling and Performance Optimization

Vertical and horizontal scaling options in Cloud SQL.

Performance optimization tips for database queries.

Monitoring and troubleshooting database performance.

Integration with Other GCP Services

Connecting Cloud SQL with App Engine, Compute Engine, and Kubernetes Engine.

Using Cloud SQL as a backend database for applications.

Data synchronization with Cloud Storage and BigQuery.

Security and Compliance

Implementing data encryption in transit and at rest.

Managing database user roles and permissions.

Ensuring compliance with industry standards.

Bigtable

Introduction to Bigtable

Overview of Bigtable as a fully managed NoSQL wide-column store.

Use cases for Bigtable in real-time analytics and IoT applications.

Key Concepts and Data Modeling

Understanding the key concepts of Bigtable: tables, rows, columns, and timestamps.

Designing effective data models for optimal performance.

Operations and Administration

Creating and managing Bigtable instances.

Configuring and monitoring clusters for performance.

Backing up and restoring data in Bigtable.

Integration with Data Processing Services

Integrating Bigtable with Dataflow and Dataproc for data processing.

Using Bigtable through its Apache HBase-compatible client API.

Security Best Practices

Configuring access controls and permissions in Bigtable.

Implementing encryption for data at rest and in transit.

Auditing and monitoring for security compliance.

Advanced Topics

Exploring Bigtable replication for data redundancy.

Optimizing Bigtable performance for specific use cases.

Handling schema evolution and data migration.

BigQuery (SQL development)

Introduction to BigQuery

Overview of BigQuery as a fully managed, serverless data warehouse.

Use cases for BigQuery in business intelligence and analytics.

SQL Queries and Performance Optimization

Writing and optimizing SQL queries in BigQuery.

Understanding query execution plans and best practices.

Partitioning and clustering tables for performance.

Data Integration and Export

Loading data into BigQuery from Cloud Storage, Cloud SQL, and other sources.

Exporting data from BigQuery to various formats.

Real-time data streaming into BigQuery.

Access Controls and Security

Configuring access controls and permissions in BigQuery.

Implementing encryption for data in BigQuery.

Auditing and monitoring for security compliance.

Integration with Other GCP Services

Integrating BigQuery with Dataflow for ETL processes.

Using BigQuery in conjunction with Looker Studio (formerly Data Studio) for visualization.

Building data pipelines with BigQuery and Composer.

Dataproc (PySpark Development)

Introduction to Dataproc

Overview of Dataproc as a fully managed Apache Spark and Hadoop service.

Use cases for Dataproc in data processing and analytics.

Cluster Creation and Configuration

Creating and managing Dataproc clusters.

Configuring cluster properties for performance and scalability.

Preemptible instances and cost optimization.

Running Jobs on DataProc

Submitting and monitoring Spark and Hadoop jobs on Dataproc.

Use of initialization actions and custom scripts.

Job debugging and troubleshooting.

Integration with Storage and BigQuery

Reading and writing data from/to Cloud Storage and BigQuery.

Integrating Dataproc with other storage solutions.

Performance optimization for data access.

Security and Access Controls

Configuring access controls for Dataproc clusters.

Implementing encryption for data at rest and in transit.

Managing security configurations for Dataproc.

Scaling and Automation

Autoscaling Dataproc clusters based on workload.

Using Dataprep or other tools for data preparation before processing.

Automation and scheduling of recurring jobs.

Dataflow (Apache Beam Development)

Introduction to Dataflow

Overview of Dataflow as a fully managed stream and batch processing service.

Use cases for Dataflow in real-time analytics and ETL.

Building Data Pipelines with Apache Beam

Writing Apache Beam pipelines for batch and stream processing.

Transformations and windowing concepts.

Error handling and testing of Dataflow pipelines.

Monitoring and Optimization

Monitoring and troubleshooting Dataflow pipelines.

Optimizing pipeline performance and resource utilization.

Utilizing Dataflow templates for reusable pipelines.

Integration with Other GCP Services

Integrating Dataflow with BigQuery, Pub/Sub, and other GCP services.

Real-time analytics and visualization using Dataflow and BigQuery.

Workflow orchestration with Composer.

Windowing and Watermarking

Understanding windowing concepts for stream processing.

Implementing watermarks for event time processing.

Handling late data and out-of-order events.
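The windowing and watermark topics above can be previewed with a small, library-free sketch. It is not Apache Beam code; it only illustrates the idea using the Python standard library. The window size, the lateness bound, and the sample events are assumptions chosen for illustration.

```python
from collections import defaultdict

WINDOW_SECONDS = 60    # assumed fixed-window size
ALLOWED_LATENESS = 30  # assumed lateness bound, in seconds


def window_start(event_time: int) -> int:
    """Return the start of the fixed window that contains event_time."""
    return event_time - (event_time % WINDOW_SECONDS)


def process(events):
    """Sum (event_time, value) pairs per fixed window.

    The watermark trails the largest event time seen so far by
    ALLOWED_LATENESS seconds. An event whose window ended at or before
    the watermark is treated as late and set aside instead of counted.
    """
    sums = defaultdict(int)
    late = []
    watermark = float("-inf")
    for event_time, value in events:
        start = window_start(event_time)
        if start + WINDOW_SECONDS <= watermark:
            late.append((event_time, value))
        else:
            sums[start] += value
        watermark = max(watermark, event_time - ALLOWED_LATENESS)
    return dict(sums), late


# Events arrive out of order: (event_time_in_seconds, value).
events = [(5, 1), (70, 2), (50, 3), (200, 4), (10, 5)]
sums, late = process(events)
```

Here the event at time 50 arrives out of order but is still counted, because its window had not yet passed the watermark. The event at time 10 arrives after the watermark has moved beyond its window, so it is reported as late.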

Security and Access Controls

Configuring access controls for Dataflow jobs.

Implementing encryption for data in transit and at rest.

Best practices for securing Dataflow pipelines.

Cloud Pub/Sub

Introduction to Pub/Sub

Understanding the role of Pub/Sub in event-driven architectures.

Key Pub/Sub concepts: topics, subscriptions, messages, and acknowledgments.

Creating and Managing Topics and Subscriptions

Using the GCP Console to create Pub/Sub topics and subscriptions.

Configuring message retention policies and acknowledgment settings.

Publishing and Consuming Messages

Writing and deploying code to publish messages to a topic.

Implementing subscribers to consume and process messages from subscriptions.

Error Handling and Retry Policies

Configuring error handling mechanisms.

Implementing retry policies for fault-tolerant message processing.
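As a rough illustration of acknowledgments and retry policies, the sketch below simulates redelivery in memory. It does not use the Pub/Sub client library; the delivery limit, the handler, and the message names are hypothetical and exist only to show the ack, retry, and dead-letter flow.

```python
from collections import deque

MAX_DELIVERY_ATTEMPTS = 3  # assumed limit before dead-lettering


def deliver(messages, handler):
    """Deliver each message until the handler acknowledges it.

    handler(message, attempt) returns True to acknowledge (ack) or
    False to negatively acknowledge (nack). A nacked message is
    requeued until it reaches MAX_DELIVERY_ATTEMPTS, after which it
    is moved to the dead-letter list.
    """
    queue = deque((message, 1) for message in messages)
    acked, dead_letter = [], []
    while queue:
        message, attempt = queue.popleft()
        if handler(message, attempt):
            acked.append(message)
        elif attempt >= MAX_DELIVERY_ATTEMPTS:
            dead_letter.append(message)
        else:
            queue.append((message, attempt + 1))
    return acked, dead_letter


def flaky_handler(message, attempt):
    """Hypothetical subscriber: 'bad' always fails, 'slow' succeeds on attempt 2."""
    if message == "bad":
        return False
    if message == "slow":
        return attempt >= 2
    return True


acked, dead_letter = deliver(["ok", "slow", "bad"], flaky_handler)
```

A message that keeps failing ends up in the dead-letter list rather than blocking the queue, which is the behaviour a retry policy with a delivery limit is meant to provide.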

Integration with Other GCP Services

Connecting Pub/Sub with Cloud Functions for serverless event-driven computing.

Integrating Pub/Sub with Dataflow for real-time stream processing.

Monitoring and Logging

Setting up monitoring and logging for Pub/Sub.

Analyzing metrics and logs to troubleshoot and optimize message processing.

Cloud Composer (DAG Creation)

Introduction to Composer

Overview of Composer as a fully managed workflow orchestration service.

Use cases for Composer in managing and scheduling workflows.

Creating and Managing Workflows

Creating and configuring Composer environments.

Defining and scheduling workflows using Apache Airflow.

Monitoring and managing workflow executions.
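The core idea behind a workflow DAG, that each task runs only after the tasks it depends on, can be sketched with the Python standard library alone. This is not Airflow code; the task names are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import CycleError, TopologicalSorter

# Each key is a task; its value is the set of tasks that must finish first.
# The task names are made up for illustration.
dependencies = {
    "extract": set(),
    "clean": {"extract"},
    "aggregate": {"extract"},
    "load": {"clean", "aggregate"},
}


def execution_order(deps):
    """Return one valid run order in which every task follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


def has_cycle(deps):
    """Return True if the dependencies contain a cycle, so no valid order exists."""
    try:
        execution_order(deps)
    except CycleError:
        return True
    return False


order = execution_order(dependencies)
```

Tasks with no dependency between them, such as "clean" and "aggregate" here, may appear in either order, which is why a scheduler is free to run them in parallel.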

Integration with Data Engineering Services

Orchestrating workflows involving BigQuery, Dataflow, and other services.

Coordinating ETL processes with Composer.

Integrating with external systems and APIs.

Extending and Customizing Composer

Extending Apache Airflow with custom operators and sensors.

Creating and managing Composer plugins.

Versioning and managing workflow code.

Security and Access Controls

Configuring access controls for Composer environments.

Implementing encryption for data and workflow metadata.

Best practices for securing Composer workflows.

Error Handling and Troubleshooting

Handling errors and retries in Composer workflows.

Debugging and troubleshooting failed workflow executions.

Logging and monitoring for Composer workflows.

Data Fusion

Introduction to Data Fusion

Overview of Data Fusion as a fully managed data integration service.

Use cases for Data Fusion in ETL and data migration.

Building Data Integration Pipelines

Creating ETL pipelines using the visual interface.

Configuring data sources, transformations, and sinks.

Using pre-built templates for common integration scenarios.

Integration with GCP and External Services

Integrating Data Fusion with BigQuery, Cloud Storage, and other GCP services.

Connecting to external databases, APIs, and data sources.

Real-time data integration and streaming support.

Versioning and Collaboration

Managing version control for Data Fusion pipelines.

Collaborating with team members on pipeline development.

Best practices for maintaining and updating pipelines.

Security and Access Controls

Configuring access controls for Data Fusion environments and pipelines.

Implementing encryption for data in transit and at rest.

Security considerations for handling sensitive data.

Monitoring and Optimization

Monitoring pipeline executions and job statuses.

Optimizing Data Fusion pipelines for performance.

Utilizing logs and metrics for troubleshooting.

Terraform

Terraform Basics

Installing and configuring Terraform.

Writing Terraform configurations using HashiCorp Configuration Language (HCL).

Initializing and applying Terraform configurations.

Infrastructure Provisioning

Creating and managing infrastructure resources using Terraform.

Terraform state and remote backends.

Importing existing infrastructure into Terraform.

Module and Provider Usage

Organizing Terraform configurations using modules.

Utilizing different providers for various cloud services.

Best practices for reusable and modular Terraform code.

Variables, Outputs, and Functions

Defining and using variables in Terraform.

Outputting values from Terraform configurations.

Terraform Workflow and Best Practices

Terraform workflows: plan, apply, and destroy.

Managing Terraform environments and workspaces.

GCP Data Engineering Projects

Data Analysis in BigQuery Using SQL

ETL Case Study with PySpark on Dataproc

Processing Streaming Data with Pub/Sub and Dataflow

Building Orchestration for Batch Data Loading Using Cloud Composer

What Students Can Expect by the End of the Course

Proficient in SQL Development:

Mastering SQL for querying and manipulating data within Google BigQuery and Cloud SQL.

Writing complex queries and optimizing performance for large-scale datasets.

Understanding schema design and best practices for efficient data storage.

PySpark Development Skills:

Proficiency in using PySpark for large-scale data processing on Google Cloud.

Developing and optimizing Spark jobs for distributed data processing.

Understanding Spark's RDDs, DataFrames, and transformations for data manipulation.

Apache Beam Development Mastery:

Creating data processing pipelines using Apache Beam.

Understanding the concepts of parallel processing and data parallelism.

Implementing transformations and integrating with other GCP services.

DAG Creation with Cloud Composer:

Designing and implementing Directed Acyclic Graphs (DAGs) for orchestrating workflows.

Using Cloud Composer for workflow automation and managing dependencies.

Developing DAGs that integrate various GCP services for end-to-end data processing.

Architecture Planning:

Proficient in architecting end-to-end data solutions on GCP.

Understanding the principles of designing scalable, reliable, and cost-effective data architectures.

Certification Readiness

Prepare for the Google Cloud Professional Data Engineer (PDE) and Associate Cloud Engineer (ACE) certifications through a combination of theoretical knowledge and hands-on experience.

The course gives students practical skills in SQL, PySpark, Apache Beam, DAG creation, and architecture planning, preparing them to tackle real-world data engineering challenges and to obtain GCP certifications.