Plotly

Learning Objectives

  • Learn how to import and use Plotly and Plotly Express

  • Understand the differences between Plotly and other visualisation libraries like Matplotlib and Seaborn

  • Create bar plots, scatter plots, and box plots using Plotly

  • Customize plots with titles, axis labels, and colours

  • Visualize geospatial data on scatter plots and choropleth maps

  • Use Pandas data frames to feed data into Plotly visualizations

Overview

There are many plotting libraries for Python, including Matplotlib, Plotly, Bokeh, and Seaborn. Matplotlib is worth learning, as it is commonly used in academic settings for creating report-ready plots. However, some of the other libraries offer conveniences that Matplotlib does not: Seaborn produces attractive statistical plots with little code, and Plotly produces interactive plots that can be zoomed, panned, and inspected on hover.

In this example, we will load the hills data set as before. This is The Database of British and Irish Hills v18, which is freely available under a Creative Commons Attribution 4.0 License at https://www.hills-database.co.uk/downloads.html. The data set contains grid reference information for peaks, hills, and cols in Britain.

import os
import pandas as pd

filename = "DoBIH_v18.csv"
data_folder = "data/"
project_folder = "./"
filepath = os.path.join(project_folder, data_folder, filename)

print(f"My data file is located at: '{filepath}'")
print(f"My data path is valid: {os.path.exists(filepath)}")

df = pd.read_csv(filepath, encoding='utf-8', engine='python')
  • We can use Plotly Express, a high-level interface to Plotly with sensible default values, to get started very quickly.

  • First, let’s reproduce the Matplotlib bar plot example we saw previously.

  • We will need to install Plotly in our virtual environment first, for example with pip install plotly.

import plotly.express as px
hill_count = df["Country"].value_counts()

fig = px.bar(
    hill_count,
    x=hill_count.values,
    y=hill_count.index,
    color=hill_count.index,
    orientation="h",
    title="Number of hills in Great Britain by Country"
    )
fig.show()
  • We can create many types of plots with Plotly.

fig = px.box(df, x="Country", y="Metres", color="Country")
fig.show()
  • Let’s recreate the scatter plot of the (lat, lon) data in Plotly.

fig = px.scatter(
    df, 
    x="Longitude", 
    y="Latitude",
    color="Country",
    hover_data="Metres",
    title="Location of hills in Great Britain"
    )
fig.layout.yaxis.scaleanchor="x"
fig.show()
  • Let’s make the markers smaller and draw them as upward-pointing triangles.

fig = px.scatter(
    df, 
    x="Longitude", 
    y="Latitude",
    color="Country",
    hover_data="Metres",
    title="Location of hills in Great Britain",
    symbol_sequence=["triangle-up"]
    )
fig.update_traces(marker={'size': 2})
fig.layout.yaxis.scaleanchor="x"
fig.show()
  • We can also make the markers semi-transparent with the opacity argument.

fig = px.scatter(
    df, 
    x="Longitude", 
    y="Latitude",
    color="Country",
    hover_data="Metres",
    opacity=0.6,
    title="Location of hills in Great Britain",
    symbol_sequence=["triangle-up"]
    )
fig.update_traces(marker={'size': 2})
fig.layout.yaxis.scaleanchor="x"
fig.show()
  • Let’s colour the points not by country, but by their height. This is a continuous variable, so a continuous colour scale should work well.

fig = px.scatter(
    df, 
    x="Longitude", 
    y="Latitude",
    color="Metres",
    hover_data="Metres",
    title="Location of hills in Great Britain",
    symbol_sequence=["triangle-up"]
    )
fig.update_traces(marker={'size': 3})
fig.layout.yaxis.scaleanchor="x"
fig.show()
  • It is very easy to change the colour scale of the plot.

fig = px.scatter(
    df, 
    x="Longitude", 
    y="Latitude",
    color="Metres",
    color_continuous_scale='Viridis',
    hover_data="Metres",
    title="Location of hills in Great Britain",
    symbol_sequence=["triangle-up"]
    )
fig.update_traces(marker={'size': 3})
fig.layout.yaxis.scaleanchor="x"
fig.show()
  • Let’s filter our data to include only hills of at least 700 metres before plotting.

threshold_height = 700
tall_hills_df = df.loc[df["Metres"] >= threshold_height].sort_values("Metres")

fig = px.scatter(
    tall_hills_df, 
    x="Longitude", 
    y="Latitude",
    color="Metres",
    color_continuous_scale='Inferno',
    hover_data="Metres",
    title=f"Location of hills above {threshold_height} metres in Great Britain",
    symbol_sequence=["triangle-up"]
    )
fig.layout.yaxis.scaleanchor="x"
fig.show()
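  • The filtering step is ordinary Pandas boolean indexing. A small sketch on a toy data frame (hypothetical hill names and heights) shows what the mask and the sort do; sorting in ascending order presumably means the tallest hills are drawn last and so end up on top.

```python
import pandas as pd

# Hypothetical hills, for illustration only
toy_df = pd.DataFrame({
    "Name": ["Hill A", "Hill B", "Hill C", "Hill D"],
    "Metres": [1200.0, 650.0, 700.0, 980.0],
})

threshold_height = 700

# Boolean mask: True for rows at or above the threshold
mask = toy_df["Metres"] >= threshold_height

# Keep matching rows, ordered from lowest to highest
tall = toy_df.loc[mask].sort_values("Metres")

print(tall["Name"].tolist())
```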
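  • This isn’t ideal, however: we have plotted (lat, lon) coordinates on flat x and y axes without considering the map projection. A map-based scatter plot draws the points on a base map and handles the projection for us.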

fig = px.scatter_mapbox(
    tall_hills_df,
    lat="Latitude",
    lon="Longitude",
    hover_name="Metres",
    color="Metres",
    color_continuous_scale='Inferno',
    zoom=5,
    height=700,
    opacity=0.8,
    mapbox_style="open-street-map"
)
fig.update_layout(margin={"r": 0, "t": 0, "l": 0, "b": 0})
fig.show()
  • We can draw the same map for the full data set.

fig = px.scatter_mapbox(
    df,
    lat="Latitude",
    lon="Longitude",
    hover_name="Metres",
    color="Metres",
    color_continuous_scale='Inferno',
    zoom=5,
    height=700,
    opacity=0.8,
    mapbox_style="open-street-map"
)
fig.update_layout(margin={"r": 0, "t": 0, "l": 0, "b": 0})
fig.show()
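  • To get a feel for why the projection matters, consider how much ground a degree of longitude covers compared with a degree of latitude. Assuming a spherical Earth (a simplification), the ratio is the cosine of the latitude, so at British latitudes a degree of longitude is only a little over half as long as a degree of latitude. The helper name below is hypothetical.

```python
import math


def lon_to_lat_ratio(latitude_deg: float) -> float:
    """Length of one degree of longitude relative to one degree of latitude,
    assuming a spherical Earth."""
    return math.cos(math.radians(latitude_deg))


# Roughly the latitude range of Great Britain
for lat in (50, 54, 58):
    print(f"{lat} degrees N: {lon_to_lat_ratio(lat):.2f}")
```

  • A plain scatter plot with equal x and y scales therefore stretches the country horizontally, which is what the map-based plot avoids.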
  • Let’s add some boundary information.

import requests
import numpy as np

counties_geojson_filepath = 'https://github.com/martinjc/UK-GeoJSON/raw/master/json/administrative/gb/lad.json'
county_geojson = requests.get(counties_geojson_filepath).json()
geojson_county_names = set()

for feature in county_geojson["features"]:
    geojson_county_names.add(feature["properties"]["LAD13NM"])

df_county_names = set(df["County"].unique())

print(f"Counties in geojson data set: {len(geojson_county_names)}")
print(f"Counties in hills data set: {len(df_county_names)}")
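  • Comparing the sizes of the two sets only tells us how many names there are. Set operations show which names fail to match. A sketch with made-up region names (note the trailing space in one of them):

```python
# Hypothetical region names, for illustration only
geojson_names = {"Northshire", "Eastshire", "Westshire"}
hills_names = {"Northshire", "Eastshire ", "Southshire"}

# Names present in one set but not the other
only_in_geojson = sorted(geojson_names - hills_names)
only_in_hills = sorted(hills_names - geojson_names)

# Names present in both sets
matched = sorted(geojson_names & hills_names)

print("Only in GeoJSON:", only_in_geojson)
print("Only in hills data:", only_in_hills)
print("Matched:", matched)
```

  • Applied to geojson_county_names and df_county_names, the same differences would list the candidates for mismatched or mislabelled names.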
    
  • There are a few more regions in the GeoJSON file than counties in the hills database. This could be due to name mismatches or incorrect labelling, or because some regions contain no hills.

  • For now, let’s just count the hills whose county name in the database exactly matches a region name (the LAD13NM property) in the GeoJSON file. We will then colour each region by its hill count.

hill_count_data = []

for feature in county_geojson["features"]:
    d = feature["properties"]
    county_name = d["LAD13NM"]

    # Count hills in hill count dataset with the same county name
    d["hill_count"] = len(df.loc[df["County"] == county_name])
    hill_count_data.append(d)

hill_count_df = pd.DataFrame(hill_count_data)
hill_count_df.head(2)
fig = px.choropleth(
    hill_count_df,
    locations="LAD13NM",
    featureidkey="properties.LAD13NM",
    geojson=county_geojson,
    color_continuous_scale='Viridis',
    color="hill_count",
    title="UK boundaries coloured by number of hills",
    height=600,
)
fig.update_geos(fitbounds="locations", visible=False)
fig.update_layout(margin={"r": 0, "l": 0, "b": 0})
fig.update_traces(marker_line_width=0.5, marker_line_color="white")
fig.show()
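  • The per-region counts can also be computed without an explicit loop. The sketch below is an alternative to the loop over the GeoJSON features, not the approach used above: it counts names once with value_counts and looks them up with map. The data frames are toy examples with hypothetical names.

```python
import pandas as pd

# Hypothetical hills and regions, for illustration only
hills = pd.DataFrame(
    {"County": ["Northshire", "Northshire", "Eastshire", "Elsewhere"]}
)
regions = pd.DataFrame({"LAD13NM": ["Northshire", "Eastshire", "Westshire"]})

# Count hills per county name in a single pass
counts = hills["County"].value_counts()

# Look up each region's count; regions with no matching hills get 0
regions["hill_count"] = regions["LAD13NM"].map(counts).fillna(0).astype(int)

print(regions)
```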
  • This is fine, but we can do better. Let’s find the average hill height for the regions.

  • We can get the average height from the main database.

  • We don’t need to loop through the GeoJSON any more, as we have the keys in the new hill_count_df.

mean_hill_heights = df.groupby("County")["Metres"].mean()

# Look up each region's mean height; regions with no matching hills get 0
hill_count_df["Mean Height Metres"] = hill_count_df["LAD13NM"].map(mean_hill_heights).fillna(0)
fig = px.choropleth(
    hill_count_df,
    locations="LAD13NM",
    featureidkey="properties.LAD13NM",
    geojson=county_geojson,
    color_continuous_scale='Viridis',
    color="Mean Height Metres",
    title="UK boundaries coloured by mean hill height in the region",
    height=600,
)
fig.update_geos(fitbounds="locations", visible=False)
fig.update_layout(margin={"r": 0, "l": 0, "b": 0})
fig.update_traces(marker_line_width=0.5, marker_line_color="white")
fig.show()
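  • One design choice worth noting: regions with no matching hills are given a mean height of 0, which the colour scale cannot tell apart from a region whose hills really are very low. A sketch on toy data (hypothetical names and heights) compares this with leaving the value missing.

```python
import pandas as pd

# Hypothetical hills and regions, for illustration only
hills = pd.DataFrame({
    "County": ["Northshire", "Northshire", "Eastshire"],
    "Metres": [900.0, 700.0, 400.0],
})
regions = pd.DataFrame({"LAD13NM": ["Northshire", "Eastshire", "Westshire"]})

# Mean height per county name
mean_heights = hills.groupby("County")["Metres"].mean()

# Option 1: regions without hills get 0
regions["mean_or_zero"] = regions["LAD13NM"].map(mean_heights).fillna(0)

# Option 2: regions without hills stay missing (NaN)
regions["mean_or_nan"] = regions["LAD13NM"].map(mean_heights)

print(regions)
```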
  • Let’s add a Country field to the hill_count_df we just made, in case we want to plot just the hills in a particular country.

code_map = {
    "S": "Scotland",
    "W": "Wales",
    "E": "England",
    "I": "Ireland",
}

# Get the first character of the LAD13CD column and use it as a key in the above map

hill_count_df["Country"] = hill_count_df["LAD13CD"].apply(lambda x: code_map[x[0]])
hill_count_df.head(2)
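  • The lookup code_map[x[0]] raises a KeyError if a code starts with a letter that is not in the map. As a hedged alternative, the sketch below uses the string accessor and map, which leave unknown prefixes as missing values instead. The codes are made up for illustration.

```python
import pandas as pd

code_map = {
    "S": "Scotland",
    "W": "Wales",
    "E": "England",
    "I": "Ireland",
}

# Hypothetical codes; the last one has a prefix that is not in code_map
toy = pd.DataFrame(
    {"LAD13CD": ["S00000001", "W00000002", "E00000003", "X00000004"]}
)

# Take the first character of each code and translate it via the map
toy["Country"] = toy["LAD13CD"].str[0].map(code_map)

print(toy)
```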