Pandas for Everyone: Python Data Analysis

(CSUSB-PYTHON-PANDAS.AP1)

Skills You’ll Get

Lessons

1

Preface

  • Breakdown of the Course
  • How to Read This Course
  • Setup
2

Pandas DataFrame Basics

  • Introduction
  • Load Your First Data Set
  • Look at Columns, Rows, and Cells
  • Grouped and Aggregated Calculations
  • Basic Plot
  • Conclusion
3

Pandas Data Structures Basics

  • Create Your Own Data
  • The Series
  • The DataFrame
  • Making Changes to Series and DataFrames
  • Exporting and Importing Data
  • Conclusion
4

Plotting Basics

  • Why Visualize Data?
  • Matplotlib Basics
  • Statistical Graphics Using matplotlib
  • Seaborn
  • Pandas Plotting Method
  • Conclusion
5

Tidy Data

  • Columns Contain Values, Not Variables
  • Columns Contain Multiple Variables
  • Variables in Both Rows and Columns
  • Conclusion
6

Apply Functions

  • Primer on Functions
  • Apply (Basics)
  • Vectorized Functions
  • Lambda Functions (Anonymous Functions)
  • Conclusion
7

Data Assembly

  • Combine Data Sets
  • Concatenation
  • Observational Units Across Multiple Tables
  • Merge Multiple Data Sets
  • Conclusion
8

Data Normalization

  • Multiple Observational Units in a Table (Normalization)
  • Conclusion
9

Groupby Operations: Split-Apply-Combine

  • Aggregate
  • Transform
  • Filter
  • The pandas.core.groupby.DataFrameGroupBy Object
  • Working With a MultiIndex
  • Conclusion
10

Missing Data

  • What Is a NaN Value?
  • Where Do Missing Values Come From?
  • Working With Missing Data
  • Pandas Built-In NA Missing
  • Conclusion
11

Data Types

  • Data Types
  • Converting Types
  • Categorical Data
  • Conclusion
12

Strings and Text Data

  • Introduction
  • Strings
  • String Methods
  • More String Methods
  • String Formatting (F-Strings)
  • Regular Expressions (RegEx)
  • The regex Library
  • Conclusion
13

Dates and Times

  • Python's datetime Object
  • Converting to datetime
  • Loading Data That Include Dates
  • Extracting Date Components
  • Date Calculations and Timedeltas
  • Datetime Methods
  • Getting Stock Data
  • Subsetting Data Based on Dates
  • Date Ranges
  • Shifting Values
  • Resampling
  • Time Zones
  • Arrow for Better Dates and Times
  • Conclusion
14

Linear Regression (Continuous Outcome Variable)

  • Simple Linear Regression
  • Multiple Regression
  • Models with Categorical Variables
  • One-Hot Encoding in scikit-learn with Transformer Pipelines
  • Conclusion
15

Generalized Linear Models

  • About This Lesson
  • Logistic Regression (Binary Outcome Variable)
  • Poisson Regression (Count Outcome Variable)
  • More Generalized Linear Models
  • Conclusion
16

Survival Analysis

  • Survival Data
  • Kaplan-Meier Curves
  • Cox Proportional Hazard Model
  • Conclusion
17

Model Diagnostics

  • Residuals
  • Comparing Multiple Models
  • k-Fold Cross-Validation
  • Conclusion
18

Regularization

  • Why Regularize?
  • LASSO Regression
  • Ridge Regression
  • Elastic Net
  • Cross-Validation
  • Conclusion
19

Clustering

  • k-Means
  • Hierarchical Clustering
  • Conclusion
20

Life Outside of Pandas

  • The (Scientific) Computing Stack
  • Performance
  • Dask
  • Siuba
  • Ibis
  • Polars
  • PyJanitor
  • Pandera
  • Machine Learning
  • Publishing
  • Dashboards
  • Conclusion
21

It’s Dangerous To Go Alone!

  • Local Meetups
  • Conferences
  • The Carpentries
  • Podcasts
  • Other Resources
  • Conclusion
A

Appendix A: Concept Maps

B

Appendix B: Installation and Setup

  • B.1 Install Python
  • B.2 Install Python Packages
  • B.3 Download Book Data
C

Appendix C: Command Line

  • C.1 Installation
  • C.2 Basics
D

Appendix D: Project Templates

E

Appendix E: Using Python

  • E.1 Command Line and Text Editor
  • E.2 Python and IPython
  • E.3 Jupyter
  • E.4 Integrated Development Environments (IDEs)
F

Appendix F: Working Directories

G

Appendix G: Environments

  • G.1 Conda Environments
  • G.2 Pyenv + Pipenv
H

Appendix H: Install Packages

  • H.1 Updating Packages
I

Appendix I: Importing Libraries

J

Appendix J: Code Style

  • J.1 Line Breaks in Code
K

Appendix K: Containers: Lists, Tuples, and Dictionaries

  • K.1 Lists
  • K.2 Tuples
  • K.3 Dictionaries
L

Appendix L: Slice Values

M

Appendix M: Loops

N

Appendix N: Comprehensions

O

Appendix O: Functions

  • O.1 Default Parameters
  • O.2 Arbitrary Parameters
P

Appendix P: Ranges and Generators

Q

Appendix Q: Multiple Assignment

R

Appendix R: NumPy ndarray

S

Appendix S: Classes

T

Appendix T: SettingWithCopyWarning

  • T.1 Modifying a Subset of Data
  • T.2 Replacing a Value
  • T.3 More Resources
U

Appendix U: Method Chaining

V

Appendix V: Timing Code

W

Appendix W: String Formatting

  • W.1 C-Style
  • W.2 String Formatting: .format() Method
  • W.3 Formatting Numbers
X

Appendix X: Conditionals (if-elif-else)

Y

Appendix Y: New York ACS Logistic Regression Example

Z

Appendix Z: Replicating Results in R

  • Z.1 Linear Regression
  • Z.2 Logistic Regression
  • Z.3 Poisson Regression

Lab

1

Pandas DataFrame Basics

  • Exploring Data Selection and Subsetting Techniques
  • Performing Grouped and Aggregated Calculations Using the .groupby() Method
2

Pandas Data Structures Basics

  • Creating a DataFrame and Making Changes to It
3

Plotting Basics

  • Analyzing Data Distributions and Relationships
  • Creating a Scatter Plot Using Multivariate Data
  • Creating a Density Plot Using Bivariate Data
4

Tidy Data

  • Reshaping and Analyzing a Dataset
  • Using Functions and Methods to Process and Tidy Data
5

Apply Functions

  • Performing Calculations Across DataFrames
  • Vectorizing Functions
6

Data Assembly

  • Performing Concatenation Using the concat() Function
  • Merging Multiple Data Sets Using the .merge() Function
7

Data Normalization

  • Understanding Multiple Observational Units in a Data Set
8

Groupby Operations: Split-Apply-Combine

  • Performing Data Summarization Using Group-by Operations
  • Performing Boolean Subsetting on the Data
  • Performing Operations on Grouped Objects
  • Visualizing Data with MultiIndex Grouping and Trend Analysis
9

Missing Data

  • Loading, Manipulating, and Merging CSV Data
  • Finding and Cleaning Missing Data
10

Data Types

  • Performing Data Type Conversion
11

Strings and Text Data

  • Manipulating Strings and Processing Text
  • Finding and Substituting a Pattern
12

Dates and Times

  • Converting an Object Type into a datetime Type
  • Extracting Date Components from the Data
  • Getting Stock Data and Subsetting It Based on Dates
  • Resampling Dates Using the .resample() Method
13

Linear Regression (Continuous Outcome Variable)

  • Performing Linear Regression
  • Performing Multiple Regression
  • Preprocessing Categorical Data with Scikit-Learn Pipelines
14

Generalized Linear Models

  • Performing Logistic Regression
  • Performing Poisson Regression Using the poisson() Function
15

Survival Analysis

  • Performing Survival Analysis Using the KaplanMeierFitter() Function
16

Model Diagnostics

  • Comparing OLS and GLM Models in Jupyter Notebook
  • Comparing Models Using Cross-Validation
17

Regularization

  • Performing L1 Regularization Using the Lasso() Function
  • Performing L2 Regularization Using the Ridge() Function
  • Evaluating Elastic Net Regularization and Parameter Optimization Using Cross-Validation
18

Clustering

  • Performing k-Means Clustering
  • Using Hierarchical Clustering Algorithms
