PolarSPARC

Pandas DataFrame Styling


Bhaskar S 06/21/2024


Overview

A Pandas DataFrame is a 2-dimensional, labeled table-like structure, with rows and columns (just like an Excel sheet), that is extensively used for slicing and dicing of data for analysis.

To highlight the data insights, typically Pandas is used in conjunction with Matplotlib for data visualization. However, a Pandas DataFrame has built-in functionality for styling the DataFrame table in order to highlight the data insights.

In this article, we will get our hands dirty in exploring how one can use the styling capability of the Pandas to make the DataFrame table more presentable.


Installation and Setup

Installation and setup will be on a Linux desktop running Ubuntu 22.04 LTS. Note that the stable Python version on Ubuntu 22.04 is 3.10.

For the demonstration, we will create a Python virtual environment using the venv module. In order to do that, we first need to install the package for venv by executing the following command in a terminal window:


$ sudo apt install python3.10-venv


The Python venv module allows one to create a lightweight virtual environments, each with its own directory structure, that are isolated from the system specific directory structure. To create a Python virtual environment, execute the following command(s) in the terminal window:


$ cd $HOME/Downloads

$ python3 -m venv venv


This will create a directory called venv under the current directory. On needs to activate the newly created virtual environment by executing the following command in the terminal window:


$ source venv/bin/activate


On successful virtual environment activation, the prompt will be prefixed with (venv).

Finally, We will install the required Python modules by executing the following command(s) in the terminal window (with venv activated):


$ pip install jupyter matplotlib pandas


This completes the required installation and setup for the hands-on demonstration.


Hands-on Pandas DataFrame Styling

For the demonstration, we will create the following small dataset called insurance-small.csv that will be located in the directory $HOME/Downloads/data:


insurance-small.csv
age,sex,bmi,children,smoker,region,charges
19,female,27.9,0,yes,southwest,16884.924
28,male,33,3,no,southeast,4449.462
33,male,22.705,0,no,northwest,21984.47061
46,female,33.44,1,no,southeast,8240.5896
37,female,27.74,3,no,northwest,7281.5056
60,female,25.84,0,no,northwest,28923.13692
25,male,26.22,0,no,northeast,2721.3208
62,female,26.29,0,yes,southeast,27808.7251
23,male,,0,no,southwest,1826.843
56,female,39.82,0,no,southeast,11090.7178
27,male,42.13,0,yes,southeast,39611.7577
52,female,30.78,1,no,northeast,10797.3362
23,male,23.845,0,no,northeast,2395.17155
30,male,35.3,0,yes,southwest,36837.467
60,female,,0,no,northeast,13228.84695
37,male,28.025,2,no,northwest,6203.90175
23,male,17.385,1,no,northwest,2775.19215
31,male,36.3,2,yes,southwest,38711
22,male,35.6,0,yes,southwest,35585.576
24,female,26.6,0,no,northeast,3046.062
31,female,36.63,2,no,southeast,4949.7587

We will be using Jupyter Notebook for all our hands-on demonstration.

Open a terminal and ensure we are in the directory $HOME/Downloads and then execute the following command to launch the Jupyter Notebook environment:


jupyter notebook


Note that all the following statements *MUST* be executed in the next notebook code cell.

To import the pandas module, execute the following:


import pandas as pd


To initialize the two variables n_rows and url, execute the following:


n_rows = 10

url = './data/insurance-small.csv'


To load our sample dataset into a DataFrame and display the first few rows, execute the following:


df = pd.read_csv(url)

df.head(n_rows)


The following illustration would be the typical output:


Display Basic DataFrame
Figure.1

To format the numbers for the columns bmi and charges to two decimal places, execute the following:


df.head(n_rows).style.format({'bmi': '{:.2f}', 'charges': '{:.2f}'})


The following illustration would be the typical output:


Two Decimal Places
Figure.2

To display borders around the table, change the background color, and the font of the data displayed, execute the following:


df.head(n_rows).style \
  .format({'bmi': '{:.2f}', 'charges': '{:.2f}'}) \
  .set_properties(**{'background-color': 'azure', 'border': '2.0px solid gray', 'color': 'gray', 'font-size': '10pt'})


The following illustration would be the typical output:


Border, Background, Font
Figure.3

To highlight the minimum, maximum, and null values in each of the columns, execute the following:


df.head(n_rows).style \
  .format({'bmi': '{:.2f}', 'charges': '{:.2f}'}) \
  .set_properties(**{'background-color': 'azure', 'border': '2.0px solid gray', 'color': 'gray'}) \
  .highlight_min() \
  .highlight_max() \
  .highlight_null()


The following illustration would be the typical output:


Minimum, Maximum, and Null
Figure.4

The minimum and maximum values share the same color and hence confusing the visualization !!!

To highlight the minimum, the maximum, and the null values with different background colors for each of the columns, execute the following:


df.head(n_rows).style \
  .format({'bmi': '{:.2f}', 'charges': '{:.2f}'}) \
  .set_properties(**{'background-color': 'azure', 'border': '2.0px solid gray', 'color': 'gray'}) \
  .highlight_min(color='lightgreen') \
  .highlight_max(color='salmon') \
  .highlight_null(color='lightgrey')


The following illustration would be the typical output:


Minimum, Maximum, and Null in Colors
Figure.5

To highlight the minimum, the maximum, and the null values with different background and text colors for each of the columns, execute the following:


df.head(n_rows).style \
  .format({'bmi': '{:.2f}', 'charges': '{:.2f}'}) \
  .set_properties(**{'background-color': 'azure', 'border': '2.0px solid gray', 'color': 'gray'}) \
  .highlight_min(color='lightgreen') \
  .highlight_max(props='background-color: salmon; color: white') \
  .highlight_null(props='background-color: lightgrey; color: blue; font-weight: bold')


The following illustration would be the typical output:


Minimum, Maximum, and Null in Colors
Figure.6

Notice that by default the minimum, the maximum, and the null values are determined for ALL the columns !!!

To highlight the minimum, the maximum, and the null values with different background and text colors for specific columns, execute the following:


df.head(n_rows).style \
  .format({'bmi': '{:.2f}', 'charges': '{:.2f}'}) \
  .set_properties(**{'background-color': 'azure', 'border': '2.0px solid gray', 'color': 'gray', 'font-weight': 'bold', 'font-style': 'italic'}) \
  .highlight_min(subset=['bmi', 'charges'], color='lightgreen') \
  .highlight_max(subset=['bmi', 'charges'], props='background-color: salmon; color: white') \
  .highlight_null(subset=['bmi', 'charges'], props='background-color: lightgrey; color: blue; font-weight: bold')


The following illustration would be the typical output:


Minimum, Maximum, and Null for Specific
Figure.7

To display the range of number in different text color gradient for the column age, execute the following:


df.head(n_rows).style \
  .format({'bmi': '{:.2f}', 'charges': '{:.2f}'}) \
  .set_properties(**{'background-color': 'azure', 'border': '2.0px solid gray', 'color': 'gray'}) \
  .highlight_min(subset=['bmi', 'charges'], color='lightgreen') \
  .highlight_max(subset=['bmi', 'charges'], props='background-color: salmon; color: white') \
  .highlight_null(subset=['bmi', 'charges'], props='background-color: lightgrey; color: blue; font-weight: bold') \
  .text_gradient(subset=['age'], cmap='Dark2', vmin=1, vmax=100)


The following illustration would be the typical output:


Age Text Gradient
Figure.8

To display row and column labels with a selected background color and text color, execute the following:


df.head(n_rows).style \
  .format({'bmi': '{:.2f}', 'charges': '{:.2f}'}) \
  .set_properties(**{'background-color': 'azure', 'border': '2.0px solid gray', 'color': 'gray'}) \
  .set_table_styles([{'selector': 'th', 'props': [('background-color', 'gray'), ('color', 'white')]}]) \
  .highlight_min(subset=['bmi', 'charges'], color='lightgreen') \
  .highlight_max(subset=['bmi', 'charges'], props='background-color: salmon; color: white')


The following illustration would be the typical output:


Row and Column Style
Figure.9

Notice that the columns names don't seem to have any kind of separation !!!

To hide the row labels and just display column labels with separation, execute the following:


df.head(n_rows).style \
  .format({'bmi': '{:.2f}', 'charges': '{:.2f}'}) \
  .set_properties(**{'background-color': 'azure', 'border': '2.0px solid gray', 'color': 'gray', 'font-size': '10pt'}) \
  .set_table_styles([{'selector': 'th', 'props': [('background-color', 'gray'), ('color', 'white'), ('border', '2.0px solid white')]}]) \
  .hide() \
  .highlight_min(subset=['bmi', 'charges'], color='lightgreen') \
  .highlight_max(subset=['bmi', 'charges'], props='background-color: salmon; color: white')


The following illustration would be the typical output:


Columns with Separation
Figure.10

To highlight the cells for the specific column smoker that has the value of yes, execute the following:


def is_smoker(col):
  return 'background-color: yellow; color: red; font-weight: bold' if col == 'yes' else 'color: green; font-weight: bold'
    
df.head(n_rows).style \
  .format({'bmi': '{:.2f}', 'charges': '{:.2f}'}) \
  .set_properties(**{'background-color': 'azure', 'border': '2.0px solid gray', 'color': 'gray', 'font-size': '10pt'}) \
  .set_table_styles([{'selector': 'th', 'props': [('background-color', 'gray'), ('color', 'white'), ('border', '2.0px solid white')]}]) \
  .hide() \
  .highlight_min(subset=['bmi', 'charges'], color='lightgreen') \
  .highlight_max(subset=['bmi', 'charges'], props='background-color: salmon; color: white') \
  .map(is_smoker, subset=['smoker'])


The following illustration would be the typical output:


Smoker Column with Yes
Figure.11

To highlight the entire row for the specific column smoker with the value yes, execute the following:


def is_smoker(row):
  n_cols = len(row)
  return n_cols * ['background-color: yellow; color: red; font-style: italic'] if row['smoker'] == 'yes' else n_cols * ['']
    
df.head(n_rows).style \
  .format({'bmi': '{:.2f}', 'charges': '{:.2f}'}) \
  .set_properties(**{'background-color': 'azure', 'border': '2.0px solid gray', 'color': 'gray', 'font-size': '10pt'}) \
  .set_table_styles([{'selector': 'th', 'props': [('background-color', 'gray'), ('color', 'white'), ('border', '2.0px solid white')]}]) \
  .hide() \
  .highlight_min(subset=['bmi', 'charges'], color='lightgreen') \
  .highlight_max(subset=['bmi', 'charges'], props='background-color: salmon; color: white') \
  .apply(is_smoker, axis=1)


The following illustration would be the typical output:


Entire Row for Smoker
Figure.12

This concludes the hands-on demonstration on styling the Pandas DataFrame and note that we have barely scratched the surface !!!


References

Pandas Dataframe Style Documentation



© PolarSPARC