uCertify

CSC 520 - Big Data and Its Applications

Name: CSC 520 - Big Data and Its Applications
Brand: uCertify
SKU: 978-1-64459-793-4
Price: 279.99 USD

(FSU-CSC520.AJE1) / ISBN : 978-1-64459-793-4

Lessons

Lab

TestPrep

AI Tutor (Add-on)


This course includes:

Free pre-assessment and first 2 lessons

18+ Interactive Lessons | 94+ Exercises

Accessible on mobile and tablet too

Certificate of completion


Skills You’ll Get

Interactive Lessons

18+ Interactive Lessons | 94+ Exercises | 156+ Quizzes | 207+ Flashcards | 207+ Glossary Terms

Gamified TestPrep

105+ Pre-Assessment Questions | 105+ Post-Assessment Questions

Hands-On Labs

76+ LiveLab | 12+ Video tutorials | 20+ Minutes


Introduction to the World of Big Data

Understanding Big Data
Evolution of Big Data
Failure of Traditional Database in Handling Big Data
3 Vs of Big Data
Sources of Big Data
Different Types of Data
Big Data Infrastructure
Big Data Life Cycle
Big Data Technology
Big Data Applications
Big Data Use Cases

Big Data Storage Concepts

Cluster Computing
Distribution Models
Distributed File System
Relational and Non‐Relational Databases
Scaling Up and Scaling Out Storage

NoSQL Database

Introduction to NoSQL
Why NoSQL
CAP Theorem
ACID
BASE
Schemaless Databases
NoSQL (Not Only SQL)
Migrating from RDBMS to NoSQL

Big Data Processing, Management, and Cloud Computing

Part I: Big Data Processing and Management Conce...essing, Management Concepts, and Cloud Computing
Data Processing
Shared Everything Architecture
Shared‐Nothing Architecture
Batch Processing
Real‐Time Data Processing
Parallel Computing
Distributed Computing
Big Data Virtualization
Part II: Managing and Processing Big Data in Clo...essing, Management Concepts, and Cloud Computing
Introduction
Cloud Computing Types
Cloud Services
Cloud Storage
Cloud Architecture

Driving Big Data with Hadoop Tools and Technologies

Apache Hadoop
Hadoop Storage
Hadoop Computation
Hadoop 2.0
HBase
Apache Cassandra
Sqoop
Flume
Apache Avro
Apache Pig
Apache Mahout
Apache Oozie
Apache Hive
Hive Architecture
Hadoop Distributions

Big Data Analytics

Terminology of Big Data Analytics
Big Data Analytics
Data Analytics Life Cycle
Big Data Analytics Techniques
Semantic Analysis
Visual Analysis
Big Data Business Intelligence
Big Data Real‐Time Analytics Processing
Enterprise Data Warehouse

Big Data Analytics with Machine Learning

Introduction to Machine Learning
Machine Learning Use Cases
Types of Machine Learning

Mining Data Streams and Frequent Itemset

Itemset Mining
Association Rules
Frequent Itemset Generation
Itemset Mining Algorithms
Maximal and Closed Frequent Itemset
Mining Maximal Frequent Itemsets: the GenMax Algorithm
Mining Closed Frequent Itemsets: the Charm Algorithm
CHARM Algorithm Implementation
Data Mining Methods
Prediction
Important Terms Used in Bayesian Network
Density-Based Clustering Algorithm
DBSCAN
Kernel Density Estimation
Mining Data Streams
Time Series Forecasting

Cluster Analysis

Clustering
Distance Measurement Techniques
Hierarchical Clustering
Analysis of Protein Patterns in the Human Cancer‐Associated Liver
Recognition Using Biometrics of Hands
Expectation Maximization Clustering Algorithm
Representative‐Based Clustering
Methods of Determining the Number of Clusters
Optimization Algorithm
Choosing the Number of Clusters
Bayesian Analysis of Mixtures
Fuzzy Clustering
Fuzzy C‐Means Clustering

Big Data Visualization

Big Data Visualization
Conventional Data Visualization Techniques
Tableau
Bar Chart in Tableau
Line Chart
Pie Chart
Bubble Chart
Box Plot
Tableau Use Cases
Installing R and Getting Ready
Data Structures in R
Importing Data from a File
Importing Data from a Delimited Text File
Control Structures in R
Basic Graphs in R

The Python Data Science Stack

Introduction
Python Libraries and Packages
Using Pandas
Data Type Conversion
Aggregation and Grouping
Exporting Data from Pandas
Visualization with Pandas
Summary

Statistical Visualizations

Introduction
Types of Graphs and When to Use Them
Components of a Graph
Seaborn
Which Tool Should Be Used?
Types of Graphs
Pandas DataFrames and Grouped Data
Changing Plot Design: Modifying Graph Components
Exporting Graphs
Summary

Working with Big Data Frameworks

Introduction
Hadoop
Spark
Writing Parquet Files
Handling Unstructured Data
Summary

Diving Deeper with Spark

Introduction
Getting Started with Spark DataFrames
Writing Output from Spark DataFrames
Exploring Spark DataFrames
Data Manipulation with Spark DataFrames
Graphs in Spark
Summary

Handling Missing Values and Correlation Analysis

Introduction
Setting up the Jupyter Notebook
Missing Values
Handling Missing Values in Spark DataFrames
Correlation
Summary

Exploratory Data Analysis

Introduction
Defining a Business Problem
Translating a Business Problem into Measurable Metrics and Exploratory Data Analysis (EDA)
Structured Approach to the Data Science Project Life Cycle
Summary

Reproducibility in Big Data Analysis

Introduction
Reproducibility with Jupyter Notebooks
Gathering Data in a Reproducible Way
Code Practices and Standards
Avoiding Repetition
Summary

Creating a Full Analysis Report

Introduction
Reading Data in Spark from Different Data Sources
SQL Operations on a Spark DataFrame
Generating Statistical Measurements
Summary

Introduction to the World of Big Data

Discussing Big Data Characteristics
Discussing Big Data

Big Data Storage Concepts

Discussing Big Data Storage

NoSQL Database

Discussing the NoSQL Database

Big Data Processing, Management, and Cloud Computing

Implementing the Data Processing Cycle
Discussing Big Data Processing and Management Concepts - Part I
Discussing Big Data Processing and Management Concepts - Part II

Driving Big Data with Hadoop Tools and Technologies

Discussing Components of Hadoop
Discussing Big Data Using Hadoop Tools and Technologies

Big Data Analytics

Discussing Big Data Analytics

Big Data Analytics with Machine Learning

Discussing Machine Learning

Mining Data Streams and Frequent Itemset

Implementing Frequent Itemset Mining Using R
Determining the Support Count and Confidence Count
Implementing the Eclat Algorithm Using R
Implementing Apriori Algorithm Using R

Cluster Analysis

Implementing K-Means Clustering

Big Data Visualization

Creating a Connection in a New Workbook
Creating a Bar Chart
Creating a Line Chart
Creating a Pie Chart
Creating a Bubble Chart
Creating a Box Plot
Assigning Value to a Variable
Using the length(), mean(), and median() Functions
Using the matrix() Function
Using the if-else Statement
Using the for Loop
Using the while Loop

The Python Data Science Stack

Interacting with the Python Shell
Calculating the Square
Grouping a DataFrame
Applying a Function to a Column
Subsetting a DataFrame
Slicing and Subsetting
Reading Data from a CSV File
Viewing the Standard Deviation
Calculating the Median Value
Calculating the Mean Value

Statistical Visualizations

Plotting an Analytical Graph
Creating a Graph
Creating a Graph for a Mathematical Function
Creating a Line Graph Using Seaborn
Creating a Line Graph Using pandas
Creating a Line Graph Using matplotlib
Detecting Outliers
Displaying Histograms
Using a Box Plot
Constructing a Scatterplot
Plotting a Line Graph with Styles and Color
Configuring a Title and Labels for Axis Objects
Designing a Complete Plot
Exporting a Graph to a File on a Disk

Working with Big Data Frameworks

Performing DataFrame Operations in Spark
Accessing Data with Spark
Parsing Text in Spark

Diving Deeper with Spark

Creating a DataFrame Using a CSV File
Creating a DataFrame from an Existing RDD
Specifying the Schema of a DataFrame
Removing a Column from a DataFrame
Renaming a Column in a DataFrame
Adding a Column to a DataFrame
Creating a KDE Plot
Creating a Linear Model Plot
Creating a Bar Chart

Handling Missing Values and Correlation Analysis

Filtering Data
Counting Missing Values
Handling NaN Values
Using the Backward and Forward Filling Methods
Calculating Correlation Coefficient

Exploratory Data Analysis

Generating the Feature Importance of the Target Variable
Identifying the Target Variable
Plotting a Heatmap
Generating a Normal Distribution Plot

Reproducibility in Big Data Analysis

Performing Data Reproducibility
Preprocessing Missing Values with High Reproducibility
Normalizing the Data


