digital@thrayait.com +60162650525, +919043703606

Training Information

Azure Databricks with PySpark

We are pleased to offer a comprehensive suite of training solutions tailored to meet your needs. Our services encompass both online and offline corporate training options, ensuring flexibility and accessibility for your team's professional development.


Course Content

Azure Databricks (ADB) with PySpark

 

Module 1: Cloud Computing Concepts

What is the "cloud"?

Why use cloud services?

Types of cloud models

Deployment Models

Private cloud deployment model

Public cloud deployment model

Hybrid cloud deployment model

Cloud providers: Microsoft Azure, Amazon Web Services and Google Cloud Platform

Characteristics of cloud computing

On-demand self-service

Broad network access

Multi-tenancy and resource pooling

Rapid elasticity and scalability

Measured service

Cloud Data Warehouse Architecture

Shared Memory architecture

Shared Disk architecture

Shared Nothing architecture

 

Module 2: Core Azure Services

Core Azure Architectural components

Core Azure Services and Products

Azure solutions

Azure management tools

 

Module 3: Security, Privacy, Compliance

Securing network connectivity

Core Azure identity services

Security tools and features

Azure Governance methodologies

Monitoring and reporting

Privacy, compliance, and data protection standards

 

Module 4: Azure Pricing and Support

Azure subscriptions

Planning and managing costs

Azure support options

Azure Service Level Agreements (SLAs)

Service Lifecycle in Azure

 

Module 5: Introduction to Azure Databricks

Introduction to Databricks

Azure Databricks Architecture

Azure Databricks Main Concepts

 

Module 6: Azure Databricks Account Creation

Azure Free Account

Free Subscription for Azure Databricks

Create Databricks Community Edition Account

 

Module 7: Databricks Cluster Types and Notebook Options

Creating and configuring clusters

Creating notebooks

Quick tour of notebook options

 

Module 8: Databricks Utilities and Notebook Parameters

dbutils commands for files and directories

Notebooks and libraries

Databricks Variables

Widget Types

Databricks notebook parameters

 

Module 9: Databricks CLI

Azure Databricks CLI Installation

Databricks CLI - DBFS, Libraries and Jobs

 

Module 10: Databricks Integration with Azure Blob Storage

Reading data from Blob Storage and creating a Blob Storage mount point

 

Module 11: Databricks Integration with Azure Data Lake Storage Gen2

Reading files from Azure Data Lake Storage Gen2

 

Module 12: Databricks Integration with Azure Data Lake Storage Gen1

Reading files from Azure Data Lake Storage Gen1

 

Module 13: Reading and Writing CSV files in Databricks

Read CSV Files

Reading TSV files and pipe-separated CSV files

Reading CSV files with multiple delimiters in Spark 2 and Spark 3

Reading multi-delimiter CSV files with delimiters in different positions

 

Module 14: Reading and Writing Parquet files in Databricks

Read Parquet files from Data Lake Storage Gen2

Reading and Creating Partition files in Spark

 

Module 16: Parsing Complex JSON Files

Reading and Writing JSON Files

Reading, Transforming and Writing Complex JSON files

 

Module 17: Reading and Writing ORC and Avro Files

Reading and Writing ORC and Avro Files

 

Module 19: Databricks Integration with Azure Synapse

Reading and Writing Azure Synapse data from Azure Databricks

 

Module 20: Databricks Integration with Amazon Redshift

Reading and writing Redshift data using Databricks

 

Module 21: Databricks Integration with Snowflake

Reading and Writing data from Snowflake

 

Module 22: Databricks Integration with Azure Cosmos DB SQL API

Reading and writing data in an Azure Cosmos DB account

 

Module 23: Python Introduction

Python Introduction

Installation and setup

Python Data Types for Azure Databricks

 

Module 24: Python Data Types

Deep dive into the string data type in Python for Azure Databricks

Deep dive into Python collections: list and tuple

Deep dive into the set and dict data types in Python
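As a quick illustration of the data types covered in this module, the minimal Python sketch below touches each one. The variable names, the file name and the Spark-style reader options stored in the dict are hypothetical examples, not course material.

```python
# str: immutable text; string methods return new strings
file_name = "sales_2024.csv"  # hypothetical file name
stem, ext = file_name.rsplit(".", 1)  # split once from the right
upper_stem = stem.upper()

# list: ordered and mutable
columns = ["id", "name", "amount"]
columns.append("region")

# tuple: ordered and immutable; supports unpacking
schema_field = ("amount", "double")
field_name, field_type = schema_field

# set: unordered collection of unique items (duplicates collapse)
regions = {"east", "west", "east"}

# dict: key -> value mapping (hypothetical reader-style options)
options = {"header": "true", "sep": ","}
options["inferSchema"] = "true"
```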

 

Module 25: Python Functions and Arguments

Python Functions and Arguments

Lambda Functions
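The sketch below illustrates the argument styles and lambda usage listed above: default arguments, keyword-only arguments, `*args`/`**kwargs`, and lambdas passed as short callbacks. The function names `scale` and `summarize` are hypothetical.

```python
def scale(value, factor=2, *, offset=0):
    """Multiply value by factor, then add offset (offset is keyword-only)."""
    return value * factor + offset


def summarize(*args, **kwargs):
    """Accept any number of positional and keyword arguments."""
    return len(args), sorted(kwargs)


# Lambdas are anonymous single-expression functions, handy as callbacks
records = [("b", 3), ("a", 5), ("c", 1)]
by_count = sorted(records, key=lambda pair: pair[1])  # sort by second element
evens = list(filter(lambda n: n % 2 == 0, range(10)))
scaled = list(map(lambda n: scale(n, offset=1), [1, 2, 3]))
```

The same lambda syntax is what you pass to PySpark RDD methods such as `rdd.map` and `rdd.filter`.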

 

Module 26: Python Modules and Packages

Python Modules and Packages

 

Module 27: Python Flow Control

Python Flow Control

for loops (for-each iteration)

while loops
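A minimal sketch of the flow-control constructs above: a `for` loop with `if`/`elif`/`else` branching, and a `while` loop that stops early with `break`. Both function names are hypothetical.

```python
def classify(values):
    """Label each value as negative, zero or positive using a for loop."""
    labels = []
    for v in values:  # Python's for loop iterates over each item (for-each)
        if v < 0:
            labels.append("negative")
        elif v == 0:
            labels.append("zero")
        else:
            labels.append("positive")
    return labels


def attempts_until(threshold, max_attempts=5):
    """Add 1, 2, 3, ... until the total reaches threshold or attempts run out."""
    attempt = 0
    total = 0
    while attempt < max_attempts:
        attempt += 1
        total += attempt
        if total >= threshold:
            break  # leave the loop early once the goal is met
    return attempt, total
```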

 

Module 28: Python File Handling

Python File Handling
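The following sketch shows basic file handling with `open()` in write, append and read modes, using `with` blocks so files are closed automatically. It works in a temporary directory so it does not touch real files; the function name and file name are hypothetical.

```python
import os
import tempfile


def write_and_read(lines):
    """Write lines to a file, append a footer, then read everything back."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "sample.txt")
        # "w" creates or truncates the file
        with open(path, "w", encoding="utf-8") as f:
            for line in lines:
                f.write(line + "\n")
        # "a" appends to the end of the existing file
        with open(path, "a", encoding="utf-8") as f:
            f.write("footer\n")
        # "r" reads; iterating the file object yields one line at a time
        with open(path, "r", encoding="utf-8") as f:
            return [line.rstrip("\n") for line in f]
```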

 

Module 29: Python Logging Module

Python Logging Module
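A minimal sketch of configuring a named logger with the standard `logging` module. The logger name `etl_job` is hypothetical; the handler check avoids attaching a duplicate handler when the same code (for example a notebook cell) runs more than once.

```python
import logging


def build_logger(name="etl_job", level=logging.INFO):
    """Return a named logger with a single stream handler."""
    logger = logging.getLogger(name)  # same name always returns the same logger
    logger.setLevel(level)
    if not logger.handlers:  # only add a handler the first time
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger


logger = build_logger()
logger.info("Loaded %d rows", 42)  # emitted: INFO is at or above the level
logger.debug("Not shown")  # suppressed: DEBUG is below INFO
```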

 

Module 30: Python Exception Handling

Python Exception Handling
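The sketch below shows the full `try`/`except`/`else`/`finally` structure: one expected error is handled, another is re-raised as a clearer exception with the original kept as its cause, and the `finally` block runs in every case. The names `safe_divide` and `cleanup_log` are hypothetical.

```python
cleanup_log = []


def safe_divide(numerator, denominator):
    """Return numerator / denominator, or None when denominator is zero."""
    try:
        result = numerator / denominator
    except ZeroDivisionError:
        # Expected, recoverable case: signal it with None
        return None
    except TypeError as exc:
        # Re-raise as a clearer error, chaining the original as __cause__
        raise ValueError(f"non-numeric input: {exc}") from exc
    else:
        # Runs only when the try block raised nothing
        return result
    finally:
        # Always runs: after a return, a raise, or normal completion
        cleanup_log.append((numerator, denominator))
```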

 

Module 31: PySpark Introduction

PySpark Introduction

PySpark Components and Features

 

Module 32: Spark Architecture and Internals

Apache Spark Internal architecture

Jobs, stages and tasks

Spark Cluster Architecture Explained

 

Module 33: Spark RDD

Different ways to create RDDs in Databricks

Spark Lazy Evaluation Internals and Word Count Program

RDD Transformations in Databricks and coalesce vs repartition

RDD Transformations and Use Cases
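To make the word count program concrete without a Spark cluster, the plain-Python sketch below mirrors the usual flatMap, map and reduceByKey steps. It is an analogy, not Spark code. The generator expressions are lazy, so nothing is computed until the final loop consumes them, loosely similar to how Spark defers transformations until an action runs. The function name is hypothetical.

```python
from collections import defaultdict


def word_count(lines):
    """Count words across lines, mirroring flatMap -> map -> reduceByKey."""
    # "flatMap": split every line into words and flatten into one stream
    words = (word for line in lines for word in line.split())
    # "map": pair each word with a count of 1
    pairs = ((word, 1) for word in words)
    # "reduceByKey": sum counts per word; this loop forces the lazy pipeline
    counts = defaultdict(int)
    for word, one in pairs:
        counts[word] += one
    return dict(counts)
```

In PySpark the equivalent chain is typically written as `sc.textFile(path).flatMap(lambda line: line.split()).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a + b)`, where `path` stands for your input location.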

 

Module 34: Spark SQL

Spark SQL Introduction

Different ways to create DataFrames

 

Module 35: Spark SQL Internals

Catalyst Optimizer and Spark SQL Execution Plan

Deep dive on SparkSession vs SparkContext

Spark SQL Basics Part-1

 

Module 36: Spark SQL Basics

Spark SQL Basics Part-2

Joins in Spark SQL

 

Module 37: Spark SQL Functions and UDFs

Spark SQL Functions Part-1

Spark SQL Functions Part-2

Spark SQL Functions Part-3

Spark SQL UDFs

Spark SQL Temp tables and Joins

 

Module 38: Databricks Delta and Implementing Dimensions SCD1 and SCD2

Implementing SCD Type 1 with Apache Spark and Databricks Delta

Delta Lake in Azure Databricks

Implementing SCD types with and without Databricks Delta
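The difference between SCD Type 1 and Type 2 can be shown with a small plain-Python sketch over in-memory rows: Type 1 overwrites attributes and keeps no history, while Type 2 expires the current row and appends a new version. On Databricks this is commonly expressed as a Delta Lake `MERGE INTO` statement; the sketch only illustrates the row-level semantics. The column names (`key`, `start_date`, `end_date`, `is_current`) and function names are hypothetical.

```python
from datetime import date


def scd1_upsert(dim, key, attrs):
    """SCD Type 1: overwrite the attributes for key in place (no history)."""
    dim[key] = dict(attrs)
    return dim


def scd2_upsert(dim_rows, key, attrs, as_of):
    """SCD Type 2: expire the current row for key and append a new version."""
    for row in dim_rows:
        if row["key"] == key and row["is_current"]:
            if all(row.get(k) == v for k, v in attrs.items()):
                return dim_rows  # attributes unchanged: nothing to do
            row["is_current"] = False  # close out the old version
            row["end_date"] = as_of
    dim_rows.append(
        {"key": key, **attrs, "start_date": as_of, "end_date": None, "is_current": True}
    )
    return dim_rows
```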

 

Module 39: Databricks Integration with Azure Data Factory

Azure Data Factory Integration with Azure Databricks

 

Module 40: Databricks Streaming

Delta Streaming in Azure Databricks

Data Ingestion with Auto Loader in Azure Databricks

 

Module 41: Azure Databricks Projects

Azure Databricks Project-1

Azure Databricks Project-2

 

Module 42: Databricks Integration with Azure DevOps

Azure Databricks CI/CD Pipelines