CSC 520 - Big Data and Its Applications

(FSU-CSC520.AJE1) / ISBN: 978-1-64459-793-4
Lessons
Lab
TestPrep
AI Tutor (Add-on)

Skills You’ll Get

1

Introduction to the World of Big Data

  • Understanding Big Data
  • Evolution of Big Data
  • Failure of Traditional Database in Handling Big Data
  • 3 Vs of Big Data
  • Sources of Big Data
  • Different Types of Data
  • Big Data Infrastructure
  • Big Data Life Cycle
  • Big Data Technology
  • Big Data Applications
  • Big Data Use Cases
2

Big Data Storage Concepts

  • Cluster Computing
  • Distribution Models
  • Distributed File System
  • Relational and Non‐Relational Databases
  • Scaling Up and Scaling Out Storage
3

NoSQL Database

  • Introduction to NoSQL
  • Why NoSQL
  • CAP Theorem
  • ACID
  • BASE
  • Schemaless Databases
  • NoSQL (Not Only SQL)
  • Migrating from RDBMS to NoSQL
4

Big Data Processing, Management, and Cloud Computing

  • Part I: Big Data Processing and Management Conce...essing, Management Concepts, and Cloud Computing
  • Data Processing
  • Shared Everything Architecture
  • Shared‐Nothing Architecture
  • Batch Processing
  • Real‐Time Data Processing
  • Parallel Computing
  • Distributed Computing
  • Big Data Virtualization
  • Part II: Managing and Processing Big Data in Clo...essing, Management Concepts, and Cloud Computing
  • Introduction
  • Cloud Computing Types
  • Cloud Services
  • Cloud Storage
  • Cloud Architecture
5

Driving Big Data with Hadoop Tools and Technologies

  • Apache Hadoop
  • Hadoop Storage
  • Hadoop Computation
  • Hadoop 2.0
  • HBASE
  • Apache Cassandra
  • SQOOP
  • Flume
  • Apache Avro
  • Apache Pig
  • Apache Mahout
  • Apache Oozie
  • Apache Hive
  • Hive Architecture
  • Hadoop Distributions
6

Big Data Analytics

  • Terminology of Big Data Analytics
  • Big Data Analytics
  • Data Analytics Life Cycle
  • Big Data Analytics Techniques
  • Semantic Analysis
  • Visual Analysis
  • Big Data Business Intelligence
  • Big Data Real‐Time Analytics Processing
  • Enterprise Data Warehouse
7

Big Data Analytics with Machine Learning

  • Introduction to Machine Learning
  • Machine Learning Use Cases
  • Types of Machine Learning
8

Mining Data Streams and Frequent Itemset

  • Itemset Mining
  • Association Rules
  • Frequent Itemset Generation
  • Itemset Mining Algorithms
  • Maximal and Closed Frequent Itemset
  • Mining Maximal Frequent Itemsets: the GenMax Algorithm
  • Mining Closed Frequent Itemsets: the Charm Algorithm
  • CHARM Algorithm Implementation
  • Data Mining Methods
  • Prediction
  • Important Terms Used in Bayesian Network
  • Density-Based Clustering Algorithm
  • DBSCAN
  • Kernel Density Estimation
  • Mining Data Streams
  • Time Series Forecasting
9

Cluster Analysis

  • Clustering
  • Distance Measurement Techniques
  • Hierarchical Clustering
  • Analysis of Protein Patterns in the Human Cancer‐Associated Liver
  • Recognition Using Biometrics of Hands
  • Expectation Maximization Clustering Algorithm
  • Representative‐Based Clustering
  • Methods of Determining the Number of Clusters
  • Optimization Algorithm
  • Choosing the Number of Clusters
  • Bayesian Analysis of Mixtures
  • Fuzzy Clustering
  • Fuzzy C‐Means Clustering
10

Big Data Visualization

  • Big Data Visualization
  • Conventional Data Visualization Techniques
  • Tableau
  • Bar Chart in Tableau
  • Line Chart
  • Pie Chart
  • Bubble Chart
  • Box Plot
  • Tableau Use Cases
  • Installing R and Getting Ready
  • Data Structures in R
  • Importing Data from a File
  • Importing Data from a Delimited Text File
  • Control Structures in R
  • Basic Graphs in R
11

The Python Data Science Stack

  • Introduction
  • Python Libraries and Packages
  • Using Pandas
  • Data Type Conversion
  • Aggregation and Grouping
  • Exporting Data from Pandas
  • Visualization with Pandas
  • Summary
12

Statistical Visualizations

  • Introduction
  • Types of Graphs and When to Use Them
  • Components of a Graph
  • Seaborn
  • Which Tool Should Be Used?
  • Types of Graphs
  • Pandas DataFrames and Grouped Data
  • Changing Plot Design: Modifying Graph Components
  • Exporting Graphs
  • Summary
13

Working with Big Data Frameworks

  • Introduction
  • Hadoop
  • Spark
  • Writing Parquet Files
  • Handling Unstructured Data
  • Summary
14

Diving Deeper with Spark

  • Introduction
  • Getting Started with Spark DataFrames
  • Writing Output from Spark DataFrames
  • Exploring Spark DataFrames
  • Data Manipulation with Spark DataFrames
  • Graphs in Spark
  • Summary
15

Handling Missing Values and Correlation Analysis

  • Introduction
  • Setting up the Jupyter Notebook
  • Missing Values
  • Handling Missing Values in Spark DataFrames
  • Correlation
  • Summary
16

Exploratory Data Analysis

  • Introduction
  • Defining a Business Problem
  • Translating a Business Problem into Measurable Metrics and Exploratory Data Analysis (EDA)
  • Structured Approach to the Data Science Project Life Cycle
  • Summary
17

Reproducibility in Big Data Analysis

  • Introduction
  • Reproducibility with Jupyter Notebooks
  • Gathering Data in a Reproducible Way
  • Code Practices and Standards
  • Avoiding Repetition
  • Summary
18

Creating a Full Analysis Report

  • Introduction
  • Reading Data in Spark from Different Data Sources
  • SQL Operations on a Spark DataFrame
  • Generating Statistical Measurements
  • Summary

1

Introduction to the World of Big Data

  • Discussing Big Data Characteristics
  • Discussing Big Data
2

Big Data Storage Concepts

  • Discussing Big Data Storage
3

NoSQL Database

  • Discussing the NoSQL Database
4

Big Data Processing, Management, and Cloud Computing

  • Implementing the Data Processing Cycle
  • Discussing Big Data Processing and Management Concepts - Part I
  • Discussing Big Data Processing and Management Concepts - Part II
5

Driving Big Data with Hadoop Tools and Technologies

  • Discussing Components of Hadoop
  • Discussing Big Data Using Hadoop Tools and Technologies
6

Big Data Analytics

  • Discussing Big Data Analytics
7

Big Data Analytics with Machine Learning

  • Discussing Machine Learning
8

Mining Data Streams and Frequent Itemset

  • Implementing Frequent Itemset Mining Using R
  • Determining the Support Count and Confidence Count
  • Implementing the Eclat Algorithm Using R
  • Implementing the Apriori Algorithm Using R
9

Cluster Analysis

  • Implementing K-Means Clustering
10

Big Data Visualization

  • Creating a Connection in a New Workbook
  • Creating a Bar Chart
  • Creating a Line Chart
  • Creating a Pie Chart
  • Creating a Bubble Chart
  • Creating a Box Plot
  • Assigning Value to a Variable
  • Using the length(), mean(), and median() Functions
  • Using the matrix() Function
  • Using the if-else Statement
  • Using the for Loop
  • Using the while Loop
11

The Python Data Science Stack

  • Interacting with the Python Shell
  • Calculating the Square
  • Grouping a DataFrame
  • Applying a Function to a Column
  • Subsetting a DataFrame
  • Slicing and Subsetting
  • Reading Data from a CSV File
  • Viewing the Standard Deviation
  • Calculating the Median Value
  • Calculating the Mean Value
12

Statistical Visualizations

  • Plotting an Analytical Graph
  • Creating a Graph
  • Creating a Graph for a Mathematical Function
  • Creating a Line Graph Using Seaborn
  • Creating a Line Graph Using pandas
  • Creating a Line Graph Using matplotlib
  • Detecting Outliers
  • Displaying Histograms
  • Using a Box Plot
  • Constructing a Scatterplot
  • Plotting a Line Graph with Styles and Color
  • Configuring a Title and Labels for Axis Objects
  • Designing a Complete Plot
  • Exporting a Graph to a File on a Disk
13

Working with Big Data Frameworks

  • Performing DataFrame Operations in Spark
  • Accessing Data with Spark
  • Parsing Text in Spark
14

Diving Deeper with Spark

  • Creating a DataFrame Using a CSV File
  • Creating a DataFrame from an Existing RDD
  • Specifying the Schema of a DataFrame
  • Removing a Column from a DataFrame
  • Renaming a Column in a DataFrame
  • Adding a Column to a DataFrame
  • Creating a KDE Plot
  • Creating a Linear Model Plot
  • Creating a Bar Chart
15

Handling Missing Values and Correlation Analysis

  • Filtering Data
  • Counting Missing Values
  • Handling NaN Values
  • Using the Backward and Forward Filling Methods
  • Calculating Correlation Coefficient
16

Exploratory Data Analysis

  • Generating the Feature Importance of the Target Variable
  • Identifying the Target Variable
  • Plotting a Heatmap
  • Generating a Normal Distribution Plot
17

Reproducibility in Big Data Analysis

  • Performing Data Reproducibility
  • Preprocessing Missing Values with High Reproducibility
  • Normalizing the Data
