How to Build a COVID-19 Outbreak Tracker with Jupyter Notebook


Produced by: AI科技大本營 (ID: rgznai100)

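COVID-19 has broken out worldwide. To show how the outbreak is distributed across the globe, a developer used Jupyter Notebook to draw two charts: a choropleth chart and a scatter map.

The choropleth chart shows how far the outbreak has spread in each country/region: the darker a country/region appears on the map, the more confirmed cases it has. The play button animates the chart, and the slider lets you change the date manually.

In the second chart, the scatter map, the size of each red dot is logarithmically proportional to the number of confirmed cases at that location. This chart has a finer resolution, showing the outbreak at state/province level.

The resulting maps are clear and easy to read. The author's full code follows: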

```python
from datetime import datetime
import re

from IPython.display import display
import numpy as np
import pandas as pd
import plotly.graph_objects as go
from plotly.subplots import make_subplots

pd.options.display.max_columns = 12
```

```python
date_pattern = re.compile(r"\d{1,2}/\d{1,2}/\d{2}")

def reformat_dates(col_name: str) -> str:
    # for columns which are dates, I'd much rather they were in day/month/year format
    try:
        return date_pattern.sub(datetime.strptime(col_name, "%m/%d/%y").strftime("%d/%m/%Y"), col_name, count=1)
    except ValueError:
        return col_name
```
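To see what the column-renaming helper does, it can be run on a handful of column names. The sketch below repeats the helper so it runs on its own; the sample column names are illustrative and only mimic the shape of the source file's header.

```python
from datetime import datetime
import re

date_pattern = re.compile(r"\d{1,2}/\d{1,2}/\d{2}")

def reformat_dates(col_name: str) -> str:
    # date-like names are converted from month/day/year to day/month/year
    try:
        return date_pattern.sub(
            datetime.strptime(col_name, "%m/%d/%y").strftime("%d/%m/%Y"), col_name, count=1
        )
    except ValueError:
        # anything that is not a date is returned unchanged
        return col_name

# illustrative header: two text columns followed by two date columns
sample_columns = ["Province/State", "Country/Region", "1/22/20", "3/15/20"]
converted = [reformat_dates(col) for col in sample_columns]
print(converted)
# ['Province/State', 'Country/Region', '22/01/2020', '15/03/2020']
```

The zero-padded `dd/mm/yyyy` form matters later on, because the date columns are picked out with a regular expression that expects exactly two digits for day and month and four for the year.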

```python
# this github repo contains timeseries data for all coronavirus cases: https://github.com/CSSEGISandData/COVID-19
confirmed_cases_url = (
    "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/"
    "csse_covid_19_time_series/time_series_19-covid-Confirmed.csv"
)
deaths_url = (
    "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/"
    "csse_covid_19_time_series/time_series_19-covid-Deaths.csv"
)
```


Choropleth Chart

```python
renamed_columns_map = {
    "Country/Region": "country",
    "Province/State": "location",
    "Lat": "latitude",
    "Long": "longitude"
}

cols_to_drop = ["location", "latitude", "longitude"]

confirmed_cases_df = (
    pd.read_csv(confirmed_cases_url)
    .rename(columns=renamed_columns_map)
    .rename(columns=reformat_dates)
    .drop(columns=cols_to_drop)
)
deaths_df = (
    pd.read_csv(deaths_url)
    .rename(columns=renamed_columns_map)
    .rename(columns=reformat_dates)
    .drop(columns=cols_to_drop)
)

display(confirmed_cases_df.head())
display(deaths_df.head())
```


```python
# extract out just the relevant geographical data and join it to another .csv which has the country codes.
# The country codes are required for the plotting function to identify countries on the map
geo_data_df = confirmed_cases_df[["country"]].drop_duplicates()
country_codes_df = (
    pd.read_csv(
        "country_code_mapping.csv",
        usecols=["country", "alpha-3_code"],
        index_col="country")
)
geo_data_df = geo_data_df.join(country_codes_df, how="left", on="country").set_index("country")

# my .csv file of country codes and the COVID-19 data source disagree on the names of some countries. This
# dataframe should be empty, otherwise it means I need to edit the country name in the .csv to match
geo_data_df[(pd.isnull(geo_data_df["alpha-3_code"])) & (geo_data_df.index != "Cruise Ship")]
```
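The mismatch check above can be tried on a small scale without the real files. The sketch below uses two made-up tables in place of the COVID-19 country list and `country_code_mapping.csv`; the country spellings are assumptions chosen only to trigger a mismatch.

```python
import pandas as pd

# hypothetical stand-ins for the COVID-19 country list and the country-code .csv
geo_data_df = pd.DataFrame({"country": ["Italy", "Korea, South", "Cruise Ship"]})
country_codes_df = pd.DataFrame(
    {"country": ["Italy", "South Korea"], "alpha-3_code": ["ITA", "KOR"]}
).set_index("country")

# a left join keeps every country from the case data, matched or not
joined = geo_data_df.join(country_codes_df, how="left", on="country").set_index("country")

# rows without a code are names the two sources spell differently;
# "Cruise Ship" is not a country, so it is excluded from the check
unmatched = joined[pd.isnull(joined["alpha-3_code"]) & (joined.index != "Cruise Ship")]
print(unmatched.index.to_list())
# ['Korea, South']
```

Any name that shows up here would need to be edited in the mapping file so that it matches the spelling used by the case data.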


```python
dates_list = (
    deaths_df.filter(regex=r"(\d{2}/\d{2}/\d{4})", axis=1)
    .columns
    .to_list()
)

# create a mapping of date -> dataframe, where each df holds the daily counts of cases and deaths per country
cases_by_date = {}
for date in dates_list:
    confirmed_cases_day_df = (
        confirmed_cases_df
        .filter(like=date, axis=1)
        .rename(columns=lambda col: "confirmed_cases")
    )
    deaths_day_df = deaths_df.filter(like=date, axis=1).rename(columns=lambda col: "deaths")
    cases_df = confirmed_cases_day_df.join(deaths_day_df).set_index(confirmed_cases_df["country"])

    date_df = (
        geo_data_df.join(cases_df)
        .groupby("country")
        .agg({"confirmed_cases": "sum", "deaths": "sum", "alpha-3_code": "first"})
    )
    date_df = date_df[date_df["confirmed_cases"] > 0].reset_index()
    cases_by_date[date] = date_df

# the dataframe for each day looks something like this:
cases_by_date[dates_list[-1]].head()
```
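The idea behind the date-to-dataframe mapping can be sketched on a tiny table. This is a simplified variant under stated assumptions: the figures are invented, only confirmed cases are tracked, and the country-code join is left out.

```python
import pandas as pd

# hypothetical wide-format table: one row per province, one column per date
wide_df = pd.DataFrame({
    "country": ["China", "China", "Italy"],
    "22/01/2020": [444, 1, 0],
    "23/01/2020": [444, 9, 2],
})

# pick out the date columns by their dd/mm/yyyy shape
dates = wide_df.filter(regex=r"\d{2}/\d{2}/\d{4}", axis=1).columns.to_list()

cases_by_date = {}
for date in dates:
    day_df = (
        wide_df[["country", date]]
        .rename(columns={date: "confirmed_cases"})
        .groupby("country", as_index=False)  # collapse provinces into one row per country
        .agg({"confirmed_cases": "sum"})
    )
    # countries with no cases on that day are dropped from that day's frame
    cases_by_date[date] = day_df[day_df["confirmed_cases"] > 0].reset_index(drop=True)

print(cases_by_date["23/01/2020"])
```

Each value in the dictionary is a small per-country table for one day, which is the shape the animation frames need.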


```python
# helper function for when we produce the frames for the map animation
def frame_args(duration):
    return {
        "frame": {"duration": duration},
        "mode": "immediate",
        "fromcurrent": True,
        "transition": {"duration": duration, "easing": "linear"},
    }

fig = make_subplots(rows=2, cols=1, specs=[[{"type": "scattergeo"}], [{"type": "xy"}]], row_heights=[0.8, 0.2])

# set up the geo data, the slider, the play and pause buttons, and the title
fig.layout.geo = {"showcountries": True}
fig.layout.sliders = [{"active": 0, "steps": []}]
fig.layout.updatemenus = [
    {
        "type": "buttons",
        "buttons": [
            {
                "label": "▶",  # play symbol
                "method": "animate",
                "args": [None, frame_args(250)],
            },
            {
                "label": "◼",  # stop symbol
                "method": "animate",
                "args": [[None], frame_args(0)],
            },
        ],
        "showactive": False,
        "direction": "left",
    }
]
fig.layout.title = {"text": "COVID-19 Case Tracker", "x": 0.5}
```

```python
frames = []
steps = []
# set up colourbar tick values, ranging from 1 to the highest num. of confirmed cases for any country thus far
max_country_confirmed_cases = cases_by_date[dates_list[-1]]["confirmed_cases"].max()

# to account for the significant variance in number of cases, we want the scale to be logarithmic...
high_tick = np.log1p(max_country_confirmed_cases)
low_tick = np.log1p(1)
log_tick_values = np.geomspace(low_tick, high_tick, num=6)

# ...however, we want the /labels/ on the scale to be the actual number of cases (i.e. not log(n_cases))
visual_tick_values = np.expm1(log_tick_values).astype(int)
# explicitly set max cbar value, otherwise it might be max - 1 due to a rounding error
visual_tick_values[-1] = max_country_confirmed_cases
visual_tick_values = [f"{val:,}" for val in visual_tick_values]
```
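The colourbar tick calculation can be followed in isolation. The sketch below applies the same steps to a made-up maximum; the value of `max_cases` is an assumption for illustration, not a figure from the data.

```python
import numpy as np

max_cases = 80_000  # hypothetical highest per-country case count

# tick positions live on the log scale used for the map colours
high_tick = np.log1p(max_cases)
low_tick = np.log1p(1)
log_tick_values = np.geomspace(low_tick, high_tick, num=6)

# tick labels are mapped back to plain case counts
visual_tick_values = np.expm1(log_tick_values).astype(int)
visual_tick_values[-1] = max_cases  # pin the top label to the true maximum
labels = [f"{val:,}" for val in visual_tick_values]

for position, label in zip(log_tick_values, labels):
    print(f"tick at {position:.2f} is labelled {label}")
```

The positions are what the colour scale uses, while the labels are what the reader sees, so a country's shade follows the log of its case count but the legend still reads in real numbers.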

```python
# generate line chart data
# list of tuples: [(confirmed_cases, deaths), ...]
cases_deaths_totals = [(df.filter(like="confirmed_cases").astype("uint32").agg("sum").iloc[0],
                        df.filter(like="deaths").astype("uint32").agg("sum").iloc[0])
                       for df in cases_by_date.values()]

confirmed_cases_totals = [daily_total[0] for daily_total in cases_deaths_totals]
deaths_totals = [daily_total[1] for daily_total in cases_deaths_totals]
```

```python
# this loop generates the data for each frame
for i, (date, data) in enumerate(cases_by_date.items(), start=1):
    df = data

    # the z-scale (for calculating the colour for each country) needs to be logarithmic
    df["confirmed_cases_log"] = np.log1p(df["confirmed_cases"])

    df["text"] = (
        date
        + "<br>"
        + df["country"]
        + "<br>Confirmed cases: "
        + df["confirmed_cases"].apply(lambda x: "{:,}".format(x))
        + "<br>Deaths: "
        + df["deaths"].apply(lambda x: "{:,}".format(x))
    )

    # create the choropleth chart
    choro_trace = go.Choropleth(
        **{
            "locations": df["alpha-3_code"],
            "z": df["confirmed_cases_log"],
            "zmax": high_tick,
            "zmin": low_tick,
            "colorscale": "reds",
            "colorbar": {
                "ticks": "outside",
                "ticktext": visual_tick_values,
                "tickmode": "array",
                "tickvals": log_tick_values,
                "title": {"text": "Confirmed Cases"},
                "len": 0.8,
                "y": 1,
                "yanchor": "top"
            },
            "hovertemplate": df["text"],
            "name": "",
            "showlegend": False
        }
    )
    # create the confirmed cases trace
    confirmed_cases_trace = go.Scatter(
        x=dates_list,
        y=confirmed_cases_totals[:i],
        mode="markers" if i == 1 else "lines",
        name="Total Confirmed Cases",
        line={"color": "Red"},
        hovertemplate="%{x}<br>Total confirmed cases: %{y:,}<extra></extra>"
    )
    # create the deaths trace
    deaths_trace = go.Scatter(
        x=dates_list,
        y=deaths_totals[:i],
        mode="markers" if i == 1 else "lines",
        name="Total Deaths",
        line={"color": "Black"},
        hovertemplate="%{x}<br>Total deaths: %{y:,}<extra></extra>"
    )

    if i == 1:
        # the first frame is what the figure initially shows...
        fig.add_trace(choro_trace, row=1, col=1)
        fig.add_traces([confirmed_cases_trace, deaths_trace], rows=[2, 2], cols=[1, 1])
    # ...and all the other frames are appended to the `frames` list and slider
    frames.append(dict(data=[choro_trace, confirmed_cases_trace, deaths_trace], name=date))

    steps.append(
        {"args": [[date], frame_args(0)], "label": date, "method": "animate"}
    )
```

```python
# tidy up the axes and finalise the chart ready for display
fig.update_xaxes(range=[0, len(dates_list)-1], visible=False)
fig.update_yaxes(range=[0, max(confirmed_cases_totals)])
fig.frames = frames
fig.layout.sliders[0].steps = steps
fig.layout.geo.domain = {"x": [0,1], "y": [0.2, 1]}
fig.update_layout(height=650, legend={"x": 0.05, "y": 0.175, "yanchor": "top", "bgcolor": "rgba(0, 0, 0, 0)"})
fig
```


Outbreak Scatter Map

```python
renamed_columns_map = {
    "Country/Region": "country",
    "Province/State": "location",
    "Lat": "latitude",
    "Long": "longitude"
}

# back-fill along each row so that a missing province/state takes the country name
confirmed_cases_df = (
    pd.read_csv(confirmed_cases_url)
    .rename(columns=renamed_columns_map)
    .rename(columns=reformat_dates)
    .bfill(axis=1)
)
deaths_df = (
    pd.read_csv(deaths_url)
    .rename(columns=renamed_columns_map)
    .rename(columns=reformat_dates)
    .bfill(axis=1)
)

display(confirmed_cases_df.head())
display(deaths_df.head())
```
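The row-wise back-fill used when loading the data is easy to miss, so here is a small sketch of its effect. The two rows are made up, and the column order (location before country) is assumed to mirror the source file, where the province/state column comes first.

```python
import pandas as pd

# hypothetical rows: the second one has no province/state
raw_df = pd.DataFrame({
    "location": ["Hubei", None],
    "country": ["China", "Italy"],
    "latitude": [30.97, 43.0],
})

# axis=1 fills a gap from the next column to the right on the same row
filled_df = raw_df.bfill(axis=1)
print(filled_df)
```

With this in place every point on the scatter map has a usable location label, even for countries that are reported as a single row.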


```python
fig = go.Figure()

geo_data_cols = ["country", "location", "latitude", "longitude"]
geo_data_df = confirmed_cases_df[geo_data_cols]
dates_list = (
    confirmed_cases_df.filter(regex=r"(\d{2}/\d{2}/\d{4})", axis=1)
    .columns
    .to_list()
)

# create a mapping of date -> dataframe, where each df holds the daily counts of cases and deaths per country
cases_by_date = {}
for date in dates_list:
    # get a pd.Series of all cases for the current day
    confirmed_cases_day_df = (
        confirmed_cases_df.filter(like=date, axis=1)
        .rename(columns=lambda col: "confirmed_cases")
        .astype("uint32")
    )
    # get a pd.Series of all deaths for the current day
    deaths_day_df = (
        deaths_df.filter(like=date, axis=1)
        .rename(columns=lambda col: "deaths")
        .astype("uint32")
    )
    cases_df = confirmed_cases_day_df.join(deaths_day_df)  # combine the cases and deaths dfs
    cases_df = geo_data_df.join(cases_df)  # add in the geographical data
    cases_df = cases_df[cases_df["confirmed_cases"] > 0]  # get rid of any rows where there were no cases
    cases_by_date[date] = cases_df

# each dataframe looks something like this:
cases_by_date[dates_list[-1]].head()
```


```python
# generate the data for each day
fig.data = []
for date, df in cases_by_date.items():
    df["confirmed_cases_norm"] = np.log1p(df["confirmed_cases"])
    df["text"] = (
        date
        + "<br>"
        + df["country"]
        + "<br>"
        + df["location"]
        + "<br>Confirmed cases: "
        + df["confirmed_cases"].astype(str)
        + "<br>Deaths: "
        + df["deaths"].astype(str)
    )
    fig.add_trace(
        go.Scattergeo(
            name="",
            lat=df["latitude"],
            lon=df["longitude"],
            visible=False,
            hovertemplate=df["text"],
            showlegend=False,
            marker={
                "size": df["confirmed_cases_norm"] * 100,
                "color": "red",
                "opacity": 0.75,
                "sizemode": "area",
            },
        )
    )
```
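To get a feel for how strongly the log transform compresses the marker sizes, the scaling can be tried on a few numbers. The case counts below are invented for illustration; the `log1p` transform and the factor of 100 are taken from the loop above.

```python
import numpy as np

# hypothetical case counts for four locations (illustrative only)
confirmed_cases = np.array([1, 10, 1_000, 100_000])

# same transform as the scatter map: log1p compresses the range,
# then the result is scaled up so the smallest dots stay visible
marker_sizes = np.log1p(confirmed_cases) * 100

ratio_raw = confirmed_cases[-1] / confirmed_cases[0]
ratio_scaled = marker_sizes[-1] / marker_sizes[0]
print(f"largest/smallest case count: {ratio_raw:,.0f}")
print(f"largest/smallest marker size: {ratio_scaled:.1f}")
```

Without the transform, the largest outbreak would dwarf every other dot on the map; with it, small clusters remain visible next to large ones.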

```python
# sort out the nitty gritty of the annotations and slider steps
annotation_text_template = (
    "Worldwide Totals"
    "<br>{date}<br><br>"
    "Confirmed cases: {confirmed_cases:,d}<br>"
    "Deaths: {deaths:,d}<br>"
    "Mortality rate: {mortality_rate:.1%}"
)
annotation_dict = {
    "x": 0.03,
    "y": 0.35,
    "width": 150,
    "height": 110,
    "showarrow": False,
    "text": "",
    "valign": "middle",
    "visible": False,
    "bordercolor": "black",
}
```

```python
steps = []
for i, data in enumerate(fig.data):
    step = {
        "method": "update",
        "args": [
            {"visible": [False] * len(fig.data)},
            {"annotations": [dict(annotation_dict) for _ in range(len(fig.data))]},
        ],
        "label": dates_list[i],
    }

    # toggle the i'th trace and annotation box to visible
    step["args"][0]["visible"][i] = True
    step["args"][1]["annotations"][i]["visible"] = True

    df = cases_by_date[dates_list[i]]
    confirmed_cases = df["confirmed_cases"].sum()
    deaths = df["deaths"].sum()
    mortality_rate = deaths / confirmed_cases
    step["args"][1]["annotations"][i]["text"] = annotation_text_template.format(
        date=dates_list[i],
        confirmed_cases=confirmed_cases,
        deaths=deaths,
        mortality_rate=mortality_rate,
    )

    steps.append(step)
```
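The slider logic boils down to two things per step: switch on exactly one trace, and fill in that day's totals. The sketch below shows this with plain Python structures and no figure. The daily totals are invented, and each step carries a single annotation rather than one per trace, which is a simplification of the loop above.

```python
annotation_text_template = (
    "Worldwide Totals<br>{date}<br><br>"
    "Confirmed cases: {confirmed_cases:,d}<br>"
    "Deaths: {deaths:,d}<br>"
    "Mortality rate: {mortality_rate:.1%}"
)

# hypothetical daily totals: (date, confirmed cases, deaths)
daily_totals = [("22/01/2020", 100, 2), ("23/01/2020", 250, 5), ("24/01/2020", 1000, 30)]

steps = []
for i, (date, confirmed_cases, deaths) in enumerate(daily_totals):
    # only the i'th trace is visible for this slider position
    visible = [False] * len(daily_totals)
    visible[i] = True

    text = annotation_text_template.format(
        date=date,
        confirmed_cases=confirmed_cases,
        deaths=deaths,
        mortality_rate=deaths / confirmed_cases,  # naive ratio of deaths to confirmed cases
    )
    steps.append({
        "method": "update",
        "args": [{"visible": visible}, {"annotations": [{"text": text, "visible": True}]}],
        "label": date,
    })

print(steps[2]["args"][0]["visible"])
print(steps[2]["args"][1]["annotations"][0]["text"])
```

The `.1%` format specifier multiplies the ratio by 100 and appends the percent sign, so the template can be fed the raw fraction directly.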

```python
sliders = [
    {
        "active": 0,
        "currentvalue": {"prefix": "Date: "},
        "steps": steps,
        "len": 0.9,
        "x": 0.05,
    }
]

first_annotation_dict = {**annotation_dict}
first_annotation_dict.update(
    {
        "visible": True,
        "text": annotation_text_template.format(
            date="10/01/2020", confirmed_cases=44, deaths=1, mortality_rate=0.0227
        ),
    }
)
fig.layout.title = {"text": "COVID-19 Case Tracker", "x": 0.5}
fig.update_layout(
    height=650,
    margin={"t": 50, "b": 20, "l": 20, "r": 20},
    annotations=[go.layout.Annotation(**first_annotation_dict)],
    sliders=sliders,
)
fig.data[0].visible = True  # set the first data point visible

fig
```


https://mfreeborn.github.io/blog/2020/03/15/interactive-coronavirus-map-with-jupyter-notebook#Chart-1---A-Choropleth-Chart
