Python Libraries for Data Science

Objectives

  • Be able to choose the appropriate Python libraries for data projects
  • Master NumPy, Pandas, Matplotlib, Seaborn and Plotly
  • Become autonomous in data analysis, data cleaning and visualization

Target Audience

  • Data analysts
  • Python developers

Prerequisites

  • Knowledge of Python fundamentals (variables, types, loops, conditions, functions, files)

Teaching Methods

  • Theoretical input: interactive presentations with slides
  • Hands-on practice: individual labs and progressive exercises using real financial datasets
  • Active learning: collaborative problem-solving
  • Balanced theory/practice approach: 30% theory / 70% practice
  • Course materials provided to participants
  • Mid-course quiz (20 questions) and final quiz (30 questions) to validate acquired skills

Target Certification: RS6701 — Manipulating, analyzing and visualizing data using Python Data Science modules — CPF eligible

Detailed Program

Day 1 — The Scientific Python Ecosystem & NumPy

  • Overview of Python Data Science packages
  • Installing scientific libraries: pip, venv, miniconda, mamba, miniforge, WinPython
  • Development environments: IPython, Jupyter Notebook, JupyterLab, Spyder, VS Code
  • Introduction to the NumPy library
  • Advantages of arrays (performance, data representation)
  • Creating arrays with array(), zeros(), ones(), full(), arange(), linspace(), logspace()
  • Matrix multiplication with np.dot and the @ operator
  • Identity matrix with identity() and eye(), diagonal matrix with diag()
  • Random initialization using NumPy's random module
  • Data types and attributes ndim, shape, size, dtype, itemsize, nbytes
  • Indexing, slicing, advanced indexing and broadcasting
  • Transposing and reshaping arrays (transpose(), reshape(), the newaxis index constant)
  • Concatenating and splitting arrays (concatenate(), vstack(), hstack(), split())
  • Aggregation and statistical functions: sum(), min(), max(), median(), percentile(), cumsum(), var(), argmin(), argmax()
  • Boolean masks for extracting information
  • Loading and saving arrays: loadtxt(), save(), load()
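
To give a feel for the Day 1 topics, here is a minimal sketch combining array creation, attributes, slicing, broadcasting, boolean masks, newaxis and the @ operator. The prices and weights are made-up illustrative values, not course data, and all variable names are hypothetical.

```python
import numpy as np

# Made-up closing prices: rows are days, columns are two hypothetical assets
prices = np.array([[100.0, 50.0],
                   [102.0, 49.0],
                   [101.0, 51.0],
                   [105.0, 52.0]])

# Basic attributes of the array
print(prices.ndim, prices.shape, prices.dtype)

# Slicing: simple day-over-day returns, shape (3, 2)
returns = prices[1:] / prices[:-1] - 1

# Broadcasting: the per-asset mean (shape (2,)) is subtracted from every row
centered = returns - returns.mean(axis=0)

# Boolean mask: keep only the days on which the first asset went up
up_days = returns[:, 0] > 0
up_returns = returns[up_days]

# np.newaxis is an indexing constant (not a function) that adds an axis
weights = np.array([0.6, 0.4])
column = weights[:, np.newaxis]  # shape (2, 1)

# Matrix product with the @ operator (equivalent to np.dot here)
portfolio_returns = returns @ weights  # shape (3,)
```

The same mask pattern works for any condition, for example `returns[returns < 0]` to extract all negative returns.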

Day 2 — Data Manipulation with Pandas

  • Introduction to the Pandas library
  • Creating a Series and a DataFrame
  • Accessing row and column labels (index and columns attributes)
  • Importing and exporting data (CSV, Excel…)
  • Data exploration: head(), tail(), info(), describe(), dtypes
  • Explicit (label-based) and implicit (position-based) indexing with loc and iloc
  • Advanced selection: boolean expressions, query() method
  • Concatenating data with concat(), merging and joining with merge() and join()
  • Missing values: isna(), dropna(), fillna(), interpolate()
  • Sorting data: sort_index(), sort_values()
  • Removing data and duplicates: drop(), drop_duplicates()
  • Aggregation functions: sum(), cumsum(), min(), max(), mean(), median(), var(), std(), quantile()
  • Grouping and analysis: groupby(), aggregate(), apply(), filter(), transform()
  • Pivot tables: pivot_table()
  • Window operations and moving averages: rolling(), expanding(), ewm()
  • Multi-indexing: MultiIndex.from_product(), from_tuples(), from_arrays()
  • String processing and regular expressions with Pandas
  • Time series data: to_datetime(), date_range(), asfreq(), resample()
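
As an illustration of the Day 2 workflow, the sketch below cleans a small table, then aggregates and pivots it. The trades are made-up values and the column names are hypothetical; forward-filling the missing price is only one possible strategy, chosen here for brevity.

```python
import numpy as np
import pandas as pd

# Made-up trades with one missing price and one duplicated row
df = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-02", "2024-01-02", "2024-01-03",
                            "2024-01-03", "2024-01-03"]),
    "ticker": ["AAA", "BBB", "AAA", "BBB", "BBB"],
    "price": [10.0, 20.0, np.nan, 22.0, 22.0],
    "quantity": [5, 3, 4, 2, 2],
})

# Remove the exact duplicate, then forward-fill the missing price per ticker
clean = df.drop_duplicates().copy()
clean["price"] = clean.groupby("ticker")["price"].ffill()

# Derived column and grouped aggregation
clean["amount"] = clean["price"] * clean["quantity"]
totals = clean.groupby("ticker")["amount"].sum()

# Label-based selection with loc, position-based selection with iloc
aaa_prices = clean.loc[clean["ticker"] == "AAA", "price"]
first_row = clean.iloc[0]

# Pivot table: one row per date, one column per ticker
pivot = clean.pivot_table(index="date", columns="ticker",
                          values="price", aggfunc="mean")
print(totals)
print(pivot)
```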

Day 3 — Visualization with Matplotlib & Seaborn

  • Introduction to Matplotlib: MATLAB-style vs object-oriented approach
  • Figure and Axes objects
  • Plotting curves with plot(): color, style, width, title, legend
  • Scatter plots with scatter()
  • Error bars with errorbar()
  • Area filling with fill_between()
  • Histograms with hist()
  • Multiple charts with subplots() and 3D plots with mplot3d
  • Pandas plotting through the plot accessor: plot(), plot.bar(), plot.barh(), plot.hist(), plot.box(), plot.scatter(), plot.pie()
  • Introduction to Seaborn: Figure-level API and Axes-level API
  • Relational plots: relplot(), lineplot(), scatterplot()
  • Distributions: displot(), histplot(), jointplot(), pairplot()
  • Categorical data: catplot(), barplot(), countplot(), boxplot(), violinplot()
  • Heatmaps: heatmap()
  • Linear regression models: lmplot()
  • Customization: set_theme(), set_style(), set_context(), despine()
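
The object-oriented Matplotlib approach listed above can be sketched as follows, using Figure and Axes objects with plot(), fill_between(), hist() and subplots(). The data is a randomly generated series, used only for illustration; Seaborn is deliberately left out to keep the example short.

```python
import numpy as np
import matplotlib.pyplot as plt

# Randomly generated series (illustrative only, seeded for reproducibility)
rng = np.random.default_rng(seed=0)
x = np.arange(100)
y = np.cumsum(rng.normal(size=100))

# One Figure holding two Axes side by side
fig, (ax_line, ax_hist) = plt.subplots(1, 2, figsize=(10, 4))

# Left panel: curve with color, style, width, title and legend
ax_line.plot(x, y, color="tab:blue", linestyle="--", linewidth=1.5,
             label="random walk")
ax_line.fill_between(x, y - 1, y + 1, alpha=0.2)  # shaded band around the curve
ax_line.set_title("Line plot with a filled band")
ax_line.legend()

# Right panel: histogram of the step-to-step increments
ax_hist.hist(np.diff(y), bins=15)
ax_hist.set_title("Histogram of increments")

fig.tight_layout()
```

Calling `plt.show()` displays the figure in a script, and `fig.savefig("figure.png")` writes it to disk (the file name is only an example).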

Day 4 — Interactive Visualization with Plotly

  • Introduction to the Plotly library and Kaleido (static image export): exploring Plotly Express
  • Plotting curves with line(): customizing figures with title, width, height, marker, labels
  • Adding information: hover_data, hover_name, text
  • Multiple charts: facet_row, facet_col
  • Style customization: template option and default themes
  • Area charts with area(): adding patterns with pattern_shape
  • Scatter plots with scatter(): using size, size_max, opacity, symbol
  • Color bars: color_continuous_scale, update_layout(), update_coloraxes()
  • Formatting bar charts with bar() and histograms with histogram()
  • 3D charts with scatter_3d() and line_3d()
  • Mapping data with line_map(), scatter_map(), line_geo(), scatter_geo(), and choropleth()