Data & Charts to Answer Questions¶

Have cleaned the data, combined it together and done some exploratory analysis to answer the questions. Here is where I'll collate the answers and output the data ready for sending to the designers.

The questions Ineed to answer are below:

How have absence days due to stress/anxiety changed:
a. For all the services combined?
b. For each of the three services?
How has the number of absence episodes due to stress/anxiety changed:
a. For NBT/SWAS together?
b. For NBT/SWAS separately?
How has the average length of an anxiety episode changed (NBT & SWAS only)?
Which staff groups have the biggest rate of change (NBT & UHBT only)?
What are the top 5 absence reasons (excluding Maternity & other non-sickness absences) for NBT & SWAS?
What proportion of all sickness absences are due to stress & anxiety?

#Imports:
import pandas as pd
import plotly.offline as pyo
pyo.init_notebook_mode()

import nbformat
from nbconvert.preprocessors import ExecutePreprocessor

# Rerun all previous sections of the analysis to make sure that the data is up-to-date:

notebookFiles = ! ls
notebookFiles = notebookFiles[:-1]
notebookFiles

['1. NBT Absence Data.ipynb',
 '2. UHBT Absence Data.ipynb',
 '3. SWAS Absence Data.ipynb',
 '4. Combine Cleaned Data Sources For Stress & Anxiety Absence Data.ipynb',
 '5. Combine Cleaned Data Sources For All Absence Data.ipynb']

for file in notebookFiles:
    with open(file) as f:
        nb = nbformat.read(f, as_version=4)
        ep = ExecutePreprocessor(timeout=600, kernel_name='python3')
        ep.preprocess(nb, {'metadata': {'path': ''}})
        f.close()

CABLE_COLOURS_DICT = {
    'grey' : '#777',
    'green' : '#03AF6C',
    'red' : '#F56E56',
    'blue' : '#569BF5',
}

CABLE_COLOURS_LIST = [
    '#03AF6C',
    '#569BF5',
    '#F56E56',
    '#777',
]

NOGRID = {
    'showgrid' : False,
    'zeroline' : False,
    
}

# Datafiles:

dataLocs = {
    '1a' : r'../CleanData/FinalDataForPublication/Q1a - Stress-Anxiety-Absences-Per-FTE-All-Trusts-2011-2016.csv',
    '1b' : r'../CleanData/FinalDataForPublication/Q1b - Stress-Anxiety-Days-Per-FTE-By-Trust-2010-2016.csv',
    '2a' : r'../CleanData/FinalDataForPublication/Q2a - Stress-Anxiety-Episodes-Per-FTE-All-Trusts-2011-2016.csv',
    '2b' : r'../CleanData/FinalDataForPublication/Q2b - Stress-Anxiety-Episodes-Per-FTE-By-Trust-2010-2016.csv',
    '3' : r'../CleanData/FinalDataForPublication/Q3 - Average-Length-Of-Stress-Anxiety-Episode.csv',
    '4' : r'../CleanData/FinalDataForPublication/Q4 - Stress-Anxiety-Absences-By-Staff-Group.csv',
    '5' : r'../CleanData/FinalDataForPublication/Q5 - Top-5-Absence-Reasons-By-Number-Of-Absence-Days-Per-FTE-By-Year.csv',
    '6' : r'../CleanData/FinalDataForPublication/Q6 - Stress-Related-Absences-As-Proportion-Of-All-Sickness-Absences.csv'
}

Q1a How has the number of absences due to stress/anxiety per FTE changed for all the services combined?¶

Q1aData = pd.read_csv(dataLocs['1a'], index_col=0)
Q1aData

Q1aData.loc[Q1aData.Year >= 2013, 'Days Lost To Absence'].sum()

247935.80860000002

fig = {
    'data' : [{
        'type' : 'scatter',
        'mode' : 'markers+lines',
        'x' : Q1aData.Year,
        'y' : Q1aData.AbsenceDaysPerFTE,
        'marker' : {
            'color' : CABLE_COLOURS_DICT['green'],
        }
    }],
    'layout' : {
        'title' : 'Number of days lost to stress has increased since 2011',
        'annotations' : [{
            'text' : '<i>NBT & UHBT from 2011 onwards; SWAS from 2013 onwards</i>',
            'xref' : 'paper',
            'x' : 0,
            'yref' : 'paper',
            'y' : -0.15,
            'showarrow' : False,
            'xanchor' : 'left'
        }],
        'xaxis' : NOGRID,
        'yaxis' : NOGRID
    }
}

pyo.iplot(fig)

The number of days lost to stress/anxiety absences has risen from 2.3 days per FTE in 2011, to 3.2 days per FTE in 2016. The decrease in 2017 is likely due to incomplete data, as SWAS could only report absences for 3/4 of the financial year, and UHBT are missing a month of data from the same year.

The total number of days lost to stress-related absences since 2013 is 247,936.

fig = {
    'data' : [{
        'type' : 'scatter',
        'mode' : 'markers+lines',
        'x' : Q1aData.Year,
        'y' : Q1aData.pcChangeAbsenceDaysSince2011,
        'marker' : {
            'color' : CABLE_COLOURS_DICT['green'],
        }
    }],
    'layout' : {
        'title' : 'Number of days lost to stress has increased by 41%',
        'annotations' : [{
            'text' : '<i>NBT & UHBT from 2011 onwards; SWAS from 2013 onwards<br>2017 data for SWAS & UHBT adjusted for incomplete data for 2017</i>',
            'xref' : 'paper',
            'x' : 0,
            'yref' : 'paper',
            'y' : -0.1,
            'showarrow' : False,
            'xanchor' : 'left',
            'align' : 'left'
        },
        {
            'text' : 'From 27,700 days in 2011 to 52,600 days in 2017',
            'xref' : 'paper',
            'x' : 0.5,
            'yref' : 'paper',
            'y' : 1.075,
            'showarrow' : False,
        }],
    'yaxis' : {
        'tickformat' : '%'
        },
    'xaxis' : NOGRID
    }
}

fig['layout']['yaxis'].update(NOGRID)

pyo.iplot(fig)

pyo.plot(fig, filename='../Viz/Number of days lost to stress has increased by 41% since 2011.html')

'file:///Users/richardmuir/Data Analysis/Bristol Cable/Stress-NHS/Viz/Number of days lost to stress has increased by 41% since 2011.html'

Between 2011 and 2016, the number of days lost to anxiety/stress absences increased by nearly 40%. The sharp drop in 2017 is likely due to incomplete data - SWAS only reported absences for 3/4 of the 2017/17 financial year, while UHBT are missing a month of data from the same year

Q1b How has the number of absences due to stress/anxiety per FTE changed for each service?¶

Q1bData = pd.read_csv(dataLocs['1b'], index_col=0)
Q1bData

traces = []
for i, trust in enumerate(Q1bData.Trust.unique()):
    if trust == 'NBT':
        traces.append({
            'type' : 'scatter',
            'mode' : 'markers+lines',
            'x' : Q1bData.loc[Q1bData.Trust == trust , 'Year'],
            'y' : Q1bData.loc[Q1bData.Trust == trust , 'AbsenceDaysPerFTE'],
            'name' : trust,
            'marker' : {
                'color' : CABLE_COLOURS_LIST[i]
            }
        })
    else:
        # solid lines
        traces.append({
            'type' : 'scatter',
            'mode' : 'markers+lines',
            'x' : Q1bData.loc[(Q1bData.Trust == trust) & (Q1bData.Year <= 2016) , 'Year'],
            'y' : Q1bData.loc[(Q1bData.Trust == trust)  & (Q1bData.Year <= 2016), 'AbsenceDaysPerFTE'],
            'name' : trust,
            'marker' : {
                'color' : CABLE_COLOURS_LIST[i]
            },
            'legendgroup' : trust
        })
        # dashed lines
        traces.append({
            'type' : 'scatter',
            'mode' : 'markers+lines',
            'x' : Q1bData.loc[(Q1bData.Trust == trust) & (Q1bData.Year >= 2016) , 'Year'],
            'y' : Q1bData.loc[(Q1bData.Trust == trust)  & (Q1bData.Year >= 2016), 'AbsenceDaysPerFTE'],
            'name' : trust,
            'marker' : {
                'color' : CABLE_COLOURS_LIST[i],
            },
            'line' : {
                'dash' : 'dash'
            },
            'showlegend' : False,
            'legendgroup' : trust,
            'hoverinfo' : 'text+x',
            'text' : ['', '{:.0%}'.format(Q1bData.loc[(Q1bData.Trust == trust), 'AbsenceDaysPerFTE'].values[-1])]
        })


fig = {
    'data' : traces,
    'layout' : {
        'title' : 'Number of days lost to stress has increased for all three trusts',
        'xaxis' : NOGRID,
        'yaxis' : NOGRID,
        'annotations' : [{
            'text' : '<i>2017 data for SWAS & UHBT adjusted for incomplete data for 2017</i>',
            'xref' : 'paper',
            'x' : 0,
            'yref' : 'paper',
            'y' : -0.2,
            'showarrow' : False,
            'xanchor' : 'left',
            'align' : 'left'
        }]
    },
    
}


pyo.iplot(fig)

The number of absences lost to stress/anxiety stayed low for UHBT through 2011-2014, before increasing in 2015 & 2016. There has been a clear upwards trend in the number of absences lost to stress/anxiety in NBT, despite a fall in 2015 & 2016. The number of absences lost to stress/anxiety in SWAS has risen dramatically between 2013 and 2016, the drop in 2017 is due to incomplete data. SWAS only reported absences for 3/4 of the 2017/18 financial year, while UHBT are missing a month of data from the same year.

traces = []
for i, trust in enumerate(Q1bData.Trust.unique()):
    if trust == 'NBT':
        traces.append({
            'type' : 'scatter',
            'mode' : 'markers+lines',
            'x' : Q1bData.loc[Q1bData.Trust == trust , 'Year'],
            'y' : Q1bData.loc[Q1bData.Trust == trust , 'pcChangeAbsenceDaysSinceY1'],
            'name' : trust,
            'marker' : {
                'color' : CABLE_COLOURS_LIST[i]
            }
        })
    else:
        # solid lines
        traces.append({
            'type' : 'scatter',
            'mode' : 'markers+lines',
            'x' : Q1bData.loc[(Q1bData.Trust == trust) & (Q1bData.Year <= 2016) , 'Year'],
            'y' : Q1bData.loc[(Q1bData.Trust == trust)  & (Q1bData.Year <= 2016), 'pcChangeAbsenceDaysSinceY1'],
            'name' : trust,
            'marker' : {
                'color' : CABLE_COLOURS_LIST[i]
            },
            'legendgroup' : trust
        })
        # dashed lines
        traces.append({
            'type' : 'scatter',
            'mode' : 'markers+lines',
            'x' : Q1bData.loc[(Q1bData.Trust == trust) & (Q1bData.Year >= 2016) , 'Year'],
            'y' : Q1bData.loc[(Q1bData.Trust == trust)  & (Q1bData.Year >= 2016), 'pcChangeAbsenceDaysSinceY1'],
            'name' : trust,
            'marker' : {
                'color' : CABLE_COLOURS_LIST[i],
            },
            'line' : {
                'dash' : 'dash'
            },
            'showlegend' : False,
            'legendgroup' : trust,
            'hoverinfo' : 'text+x',
            'text' : ['', '{:.0%}'.format(Q1bData.loc[(Q1bData.Trust == trust), 'pcChangeAbsenceDaysSinceY1'].values[-1])]
        })

fig = {
    'data' : traces,
    'layout' : {
        'title' : 'Percentage change in number of days lost to stress has increased for all three trusts',
        'yaxis' : {
            'tickformat' : '%'
        },
        'annotations' : [{
            'text' : '<i>2017 data for SWAS & UHBT adjusted for incomplete data for 2017</i>',
            'xref' : 'paper',
            'x' : 0,
            'yref' : 'paper',
            'y' : -0.1,
            'showarrow' : False,
            'xanchor' : 'left',
            'align' : 'left'
        }],
        'xaxis' : NOGRID
    },
}

fig['layout']['yaxis'].update(NOGRID)
pyo.iplot(fig)

pyo.plot(fig, filename='../Viz/Number of days lost to stress has increased for all three trusts.html')

'file:///Users/richardmuir/Data Analysis/Bristol Cable/Stress-NHS/Viz/Number of days lost to stress has increased for all three trusts.html'

Between 2010 & 2016 the number of days lost to stress/anxiety absences in NBT has increased by 40%. Between 2011 & 2016 the number of days lost to stress/anxiety absences in UHBT has increased by 50%. Between 2013 & 2016 the number of days lost to stress/anxiety absences in SWAS has increased by more than 80%/has almost doubled. Sickness absences are under-reported in the 2017 data for UHBT and SWAS, accounting for at least some of the decrease.

Q2a How has the number of absence episodes due to stress/anxiety changed for NBT/SWAS together?¶

Q2aData = pd.read_csv(dataLocs['2a'], index_col=0)
Q2aData

fig = {
    'data' : [{
        'type' : 'scatter',
        'mode' : 'markers+lines',
        'x' : Q2aData.Year,
        'y' : Q2aData.EpisodesPerFTE,
        'marker' : {
            'color' : CABLE_COLOURS_DICT['green']
        }
    }],
    'layout' : {
        'title' : 'Stress/Anxiety episodes per FTE increased since 2013',
        'annotations' : [{
            'text' : '<i>Only NBT & SWAS. No episode data from UHBT</i>',
            'xref' : 'paper',
            'x' : 0,
            'yref' : 'paper',
            'y' : -0.15,
            'showarrow' : False,
            'xanchor' : 'left',
            'align' : 'left'
        }],
        'xaxis' : {
            'nticks' : 5
        },
        'yaxis' : NOGRID
    }
}

fig['layout']['xaxis'].update(NOGRID)

pyo.iplot(fig)

The number of episodes of stree/anxiety has increased sine 2013, when there was one absence episode per 15 FTE, to 2017, when there was one episode for every 10 FTE. An increase from 0.067 to 0.1 episodes per FTE. We expect the final number for 2017 to rise as more absences are reported for the 2017/18 financial year by SWAS and UHBT.

fig = {
    'data' : [{
        'type' : 'scatter',
        'mode' : 'markers+lines',
        'x' : Q2aData.Year,
        'y' : Q2aData.pcChangeAbsenceEpisodesSince2013,
        'marker' : {
            'color' : CABLE_COLOURS_DICT['blue']
        }
    }],
    'layout' : {
        'title' : 'Stress/Anxiety episodes per FTE increased by 72% since 2013',
        'annotations' : [{
            'text' : '<i>Only NBT & SWAS. No episode data from UHBT</i>',
            'xref' : 'paper',
            'x' : 0,
            'yref' : 'paper',
            'y' : -0.15,
            'showarrow' : False,
            'xanchor' : 'left',
            'align' : 'left'
        }],
        'xaxis' : {
            'nticks' : 4
        },
        'yaxis' : {
            'tickformat' : '%'
        }
    }
}

fig['layout']['yaxis'].update(NOGRID)
fig['layout']['xaxis'].update(NOGRID)

pyo.iplot(fig)

The number of absence episodes per FTE has increased by almost 60% since 2013. We expect the final increase in absences for 2017 to rise as more absences are reported for the 2017/18 financial year by SWAS and UHBT.

Q2b How has the number of absence episodes due to stress/anxiety changed for each of NBT/SWAS ?¶

Q2bData = pd.read_csv(dataLocs['2b'], index_col=0)
Q2bData

traces = []
for i, trust in enumerate(Q2bData.Trust.unique()):
    if trust == 'NBT' : 
        traces.append({
            'type' : 'scatter',
            'mode' : 'markers+lines',
            'x' : Q2bData.loc[Q2bData.Trust == trust , 'Year'],
            'y' : Q2bData.loc[Q2bData.Trust == trust , 'EpisodesPerFTE'],
            'name' : trust,
            'marker' : {
                'color' : CABLE_COLOURS_LIST[i]
            }
        })
    else:
        # solid lines
        traces.append({
            'type' : 'scatter',
            'mode' : 'markers+lines',
            'x' : Q2bData.loc[(Q2bData.Trust == trust) & (Q2bData.Year <= 2016) , 'Year'],
            'y' : Q2bData.loc[(Q2bData.Trust == trust)  & (Q2bData.Year <= 2016), 'EpisodesPerFTE'],
            'name' : trust,
            'marker' : {
                'color' : CABLE_COLOURS_LIST[i]
            },
            'legendgroup' : trust,
        })
        # dashed lines
        traces.append({
            'type' : 'scatter',
            'mode' : 'markers+lines',
            'x' : Q2bData.loc[(Q2bData.Trust == trust) & (Q2bData.Year >= 2016) , 'Year'],
            'y' : Q2bData.loc[(Q2bData.Trust == trust)  & (Q2bData.Year >= 2016), 'EpisodesPerFTE'],
            'name' : trust,
            'marker' : {
                'color' : CABLE_COLOURS_LIST[i],
            },
            'line' : {
                'dash' : 'dash'
            },
            'showlegend' : False,
            'legendgroup' : trust,
            'hoverinfo' : 'text+x',
            'text' : ['', '{:.2f}'.format(Q2bData.loc[(Q2bData.Trust == trust), 'EpisodesPerFTE'].values[-1])]
        })

fig = {
    'data' : traces,
    'layout' : {
        'title' : 'Number of stress/anxiety absence episodes per FTE by trust',
        'annotations' : [{
            'text' : '<i>Only NBT & SWAS. No episode data from UHBT<br>2017 data for SWAS adjusted for incomplete data for 2017</i>',
            'xref' : 'paper',
            'x' : 0,
            'yref' : 'paper',
            'y' : -0.15,
            'showarrow' : False,
            'xanchor' : 'left',
            'align' : 'left',
        }],
        'xaxis' : {
            'nticks' : 8
        },
        'yaxis' : {
            'tickformat' : '.2f'
        }
    }
}

fig['layout']['xaxis'].update(NOGRID)
fig['layout']['yaxis'].update(NOGRID)

pyo.iplot(fig)

For NBT, the number of absence episodes per FTE due to stress/anxiety increased from 0.051 in 2010 to 0.09 in 2016. It changed from approximately 1 episodes for every 20 FTE to 1 for every 11 FTE.

For SWAS, there were 0.054 episodes per FTE in 2013, increasing to 0.14 episodes per FTE in 2016. This is a change from 1 episodes for every 19 FTE to 1 episode for every 7 FTE. The drop in 2017 is due to incomplete data.

traces = []
for i, trust in enumerate(Q2bData.Trust.unique()):
    if trust == 'NBT' : 
        traces.append({
            'type' : 'scatter',
            'mode' : 'markers+lines',
            'x' : Q2bData.loc[Q2bData.Trust == trust , 'Year'],
            'y' : Q2bData.loc[Q2bData.Trust == trust , 'pcChangeAbsenceEpisodesSinceY1'],
            'name' : trust,
            'marker' : {
                'color' : CABLE_COLOURS_LIST[i]
            }
        })
    else:
        # solid lines
        traces.append({
            'type' : 'scatter',
            'mode' : 'markers+lines',
            'x' : Q2bData.loc[(Q2bData.Trust == trust) & (Q2bData.Year <= 2016) , 'Year'],
            'y' : Q2bData.loc[(Q2bData.Trust == trust)  & (Q2bData.Year <= 2016), 'pcChangeAbsenceEpisodesSinceY1'],
            'name' : trust,
            'marker' : {
                'color' : CABLE_COLOURS_LIST[i]
            },
            'legendgroup' : trust,
        })
        # dashed lines
        traces.append({
            'type' : 'scatter',
            'mode' : 'markers+lines',
            'x' : Q2bData.loc[(Q2bData.Trust == trust) & (Q2bData.Year >= 2016) , 'Year'],
            'y' : Q2bData.loc[(Q2bData.Trust == trust)  & (Q2bData.Year >= 2016), 'pcChangeAbsenceEpisodesSinceY1'],
            'name' : trust,
            'marker' : {
                'color' : CABLE_COLOURS_LIST[i],
            },
            'line' : {
                'dash' : 'dash'
            },
            'showlegend' : False,
            'legendgroup' : trust,
            'hoverinfo' : 'text+x',
            'text' : ['', '{:.0%}'.format(Q2bData.loc[(Q2bData.Trust == trust), 'pcChangeAbsenceEpisodesSinceY1'].values[-1])]
        })

fig = {
    'data' : traces,
    'layout' : {
        'title' : 'Stress anxiety absence episodes increased for NBT & SWAS',
        'annotations' : [{
            'text' : '<i>Only NBT & SWAS. No episode data from UHBT<br>2017 data for SWAS adjusted for incomplete data for 2017</i>',
            'xref' : 'paper',
            'x' : 0,
            'yref' : 'paper',
            'y' : -0.15,
            'showarrow' : False,
            'xanchor' : 'left',
            'align' : 'left',
        }],
        'xaxis' : {
            'nticks' : 8
        },
        'yaxis' : {
            'tickformat' : '%'
        }
    }
}

fig['layout']['xaxis'].update(NOGRID)
fig['layout']['yaxis'].update(NOGRID)

pyo.iplot(fig)

For NBT there was a 70% increase in the number of absence episodes due to anxiety/stress. For SWAS the number of episodes increased by more than 150%/almost tripled between 2013 and 2016. The number of episodes per FTE doubled between 2013 and 2014.

Q3 How has the average length of an anxiety episode changed (NBT & SWAS only)?¶

Q3Data = pd.read_csv(dataLocs['3'], index_col=0)
Q3Data

fig = {
    'data' : [{
        'type' : 'scatter',
        'mode' : 'markers+lines',
        'x' : Q3Data.Year,
        'y' : Q3Data.averageLengthOfEpisodeInDays,
        'marker' : {
            'color' : CABLE_COLOURS_DICT['red']
        }
    }],
    'layout' : {
        'title' : 'Average length in days of stress-related absence episodes decreasing',
        'annotations' : [{
            'text' : '<i>Only NBT & SWAS. No episode data from UHBT</i>',
            'xref' : 'paper',
            'x' : 0,
            'yref' : 'paper',
            'y' : -0.15,
            'showarrow' : False,
            'xanchor' : 'left'
        }],
        'xaxis' : {
            'nticks' : 5
        },
        'yaxis' : NOGRID
    }
}

fig['layout']['xaxis'].update(NOGRID)

pyo.iplot(fig)

The length of an absence episode has decreased since 2013.

Q4 Which staff groups have the biggest rate of change (NBT & UHBT only)?¶

Q4Data = pd.read_csv(dataLocs['4'], index_col=0)
Q4Data.head()

traces = []
for sg in Q4Data['Staff Group'].unique():
    dfsg = Q4Data[Q4Data['Staff Group'] == sg]
    traces.append({
        'type' : 'scatter', 
        'mode' : 'markers+lines',
        'x' : dfsg.Year,
        'y' : dfsg['AbsenceDaysPerFTE'],
        'name' : sg
    })
    
    
fig = {
    'data' : traces,
    'layout' : {
        'title' : 'Number of days lost per FTE to stress/anxiety absence by Staff Group',
        'annotations' : [{
            'text' : '<i>Only NBT & UHBT. No staff group data from SWAS</i>',
            'xref' : 'paper',
            'x' : 0,
            'yref' : 'paper',
            'y' : -0.15,
            'showarrow' : False,
            'xanchor' : 'left'
        }],
        'xaxis' : {
            'tickvals' : [2011, 2016],
            'ticktext' : [2011, 2016]
        },
        'yaxis' : NOGRID
    }
}

fig['layout']['xaxis'].update(NOGRID)

pyo.iplot(fig)

Most Staff groups saw an increase in the number of absence days lost.

traces = []
for sg in Q4Data['Staff Group'].unique():
    dfsg = Q4Data[Q4Data['Staff Group'] == sg]
    traces.append({
        'type' : 'scatter', 
        'mode' : 'markers+lines',
        'x' : dfsg.Year,
        'y' : dfsg['pcChangeAbsenceDaysSinceY1'],
        'name' : sg
    })
    
    
fig = {
    'data' : traces,
    'layout' : {
        'title' : 'Percent change in number of stress/anxiety absence episodes by Staff Group',
        'annotations' : [{
            'text' : '<i>Only NBT & UHBT. No staff group data from SWAS</i>',
            'xref' : 'paper',
            'x' : 0,
            'yref' : 'paper',
            'y' : -0.15,
            'showarrow' : False,
            'xanchor' : 'left'
        }],
        'xaxis' : {
            'tickvals' : [2011, 2016],
            'ticktext' : [2011, 2016]
        },
        'yaxis' : {
            'tickformat' : '%'
        }
    }
}

fig['layout']['xaxis'].update(NOGRID)
fig['layout']['yaxis'].update(NOGRID)

pyo.iplot(fig)

Healthcare Scientists had the biggest increase of 122% between 2011 and 2016. Going from 1.7 stress-related absence days per FTE in 2011 to 3.7 in 2016. Administrative & Clerical saw the next biggest increase at 56%. Only Additional Clinical Services saw a decrease in the number of stress-related absence days.

Q5a What are the top 5 absence reasons (excluding Maternity & other non-sickness absences) for NBT & SWAS, and how has the number of days absent per FTE changed?¶

Q5Data = pd.read_csv(dataLocs['5'], index_col=0)
Q5Data.head()

traces = []
for rsn in Q5Data['Absence Reason'].unique():
    tdf = Q5Data[Q5Data['Absence Reason'] == rsn]
    traces.append({
        'type' : 'scatter',
        'mode' : 'markers+lines',
        'x' : tdf.Year,
        'y' : tdf['Absence Days Per FTE'],
        'name' : rsn
    })
    
fig = {
    'data' : traces,
    'layout' : {
        'title' : 'Number of days per FTE lost to absence for top 5 absence reasons each year',
        'annotations' : [{
            'text' : '<i>Only NBT & SWAS. No absence reason data from UHBT</i><br>Absences for maternity have been excluded',
            'xref' : 'paper',
            'x' : 0,
            'yref' : 'paper',
            'y' : -0.15,
            'showarrow' : False,
            'xanchor' : 'left',
            'align' : 'left'
        }],
        'xaxis' : {
            'nticks' : 5
        },
        'yaxis' : NOGRID
    }
}
fig['layout']['xaxis'].update(NOGRID)

pyo.iplot(fig)

Since 2014, anxiety/Stress has been the absence category with the largest number of absence days, accounting for more days per FTE lost than even the catch-all categories of 'Not Specified' and 'Other'. We expect the total absences for 2017 to increase as SWAS finalise their data.

6. What proportion of all sickness absences are due to stress & anxiety?¶

Q6Data = pd.read_csv(dataLocs['6'], index_col=0)
Q6Data.head()

traces = []
for i, trust in enumerate(Q6Data.Trust.unique()):
    tdf = Q6Data[(Q6Data.Trust == trust) & 
                            (Q6Data['Absence Reason'] == 'Anxiety/Stress')]
    traces.append({
        'type' : 'scatter',
        'mode' : 'markers+lines',
        'x' : tdf.Year,
        'y' : tdf.propOfDays,
        'name' : trust,
        'marker' : {
            'color' : CABLE_COLOURS_LIST[i]
        }
    })

fig = {
    'data' : traces,
    'layout' : {
        'title' : 'Days lost to stress/anxiety as a proportion of all days lost to sickness',
        'annotations' : [{
            'text' : '<i>Only NBT & SWAS. No absence reason data from UHBT<br>Absences for maternity & other non-sickness absences have been excluded</i>',
            'xref' : 'paper',
            'x' : 0,
            'yref' : 'paper',
            'y' : -0.2,
            'showarrow' : False,
            'xanchor' : 'left',
            'align' : 'left'
        }],
        'xaxis' : {
            'nticks' : 8
        },
        'yaxis' : {
            'tickformat' : '%'
        }
    }
}

fig['layout']['xaxis'].update(NOGRID)
fig['layout']['yaxis'].update(NOGRID)

pyo.iplot(fig)

For NBT, the proportion of days lost to stress/anxiety peaked at 17% in 2014. For SWAS, the number has risen sharply from 14% in 2013 to 24% in 2016. 1 in 4 sickness days for SWAS staff was due to stress/anxiety. In contrast to the comparison with FTE, these figures are representative as they show the proportion of total absences.

Methodology & Caveats¶

Introduction¶

The data used in this analysis is from 6 different sources. Several assumptions & exclusions had to be made in order to make the data suitable for analysis, as such, some inaccuracies will exist.

The following outlines the methods used to extract & clean the data and the assumptions & exclusions made when combining them.

The data comes from releases under the Freedom of Information Act and open data published by the following organisations: North Bristol Trust (NBT), University Hospitals Bristol Trust (UHBT) and South West Ambulance Service (SWAS).

FOI Data¶

Each of these trusts responded to an FOI request for absence data relating to stress & anxiety. Though the request was the same to each organisation, the level of detail and the format of the response differed for each trust:

North Bristol Trust¶

Provided data from 2010-2017 in calendar years.
Provided data on the absence reason (including absences other than stress/anxiety) and the staff group.
Provided data on the number of episodes for each absence type, as well as the number of days (an episode can consist of multiple days).

University Hospitals Bristol Trust¶

Provided month-level data from April 2011 to December 2017.
Provided the number of absence days due to stress & anxiety split by staff group.

South West Ambulance Service¶

Provided data which lists each absence between April 2013 & 6th January 2018. The start and end dates are shown for each absence, which allows us to calculate the number of days and the number of episodes.
Information about the absence reason and department was also given.

Publicly Available Data¶

In order to account for the changing staffing levels, publicly available information on the number of staff each trust employs was added to the analysis. This information is supplied as Full Time Equivalent (FTE) numbers. Further differences exist in this data:

FTE data is available for the NBT for calendar years 2010-2016 at a staff group level. Also for 2017 at Trust level.

FTE data is available for the UHBT for financial years 2010/11 - 2016/17 at a staff group level. Also until Feb 2018 at Trust level.

FTE data is available for the SWAS for financial years 2013/14 - 2016/17 at a staff group level (data available beforehand, but FOI is only from 2013). Also as of January 2018 at Trust-level.

Data Differences¶

The table below shows how the data differs superficially for each source:

from IPython.display import HTML
HTML("""
<style type="text/css">
.tg  {border-collapse:collapse;border-spacing:0;border-color:#ccc;}
.tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:#ccc;color:#333;background-color:#fff;}
.tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:#ccc;color:#333;background-color:#f0f0f0;}
.tg .tg-lqy6{text-align:right;vertical-align:top}
.tg .tg-i9x5{background-color:#656565;color:#ffffff;vertical-align:top}
.tg .tg-b7b8{background-color:#f9f9f9;vertical-align:top}
.tg .tg-p5oz{background-color:#f9f9f9;text-align:right;vertical-align:top}
.tg .tg-yw4l{vertical-align:top}
</style>
<table class="tg">
  <tr>
    <th class="tg-i9x5">Source</th>
    <th class="tg-i9x5">Dates Available</th>
    <th class="tg-i9x5">Date Aggregation</th>
    <th class="tg-i9x5">Staff Group</th>
    <th class="tg-i9x5">Absence Reason</th>
    <th class="tg-i9x5">Number of days</th>
    <th class="tg-i9x5">Number of Episodes</th>
    <th class="tg-i9x5">FTE </th>
  </tr>
  <tr>
    <td class="tg-b7b8">NBT FOI</td>
    <td class="tg-b7b8">2010-2017</td>
    <td class="tg-b7b8">Calendar Year</td>
    <td class="tg-p5oz">Yes</td>
    <td class="tg-p5oz">Yes</td>
    <td class="tg-p5oz">Yes</td>
    <td class="tg-p5oz">Yes</td>
    <td class="tg-b7b8">N/A</td>
  </tr>
  <tr>
    <td class="tg-yw4l">NBT Accounts</td>
    <td class="tg-yw4l">2010-2016 (2017 from press office)</td>
    <td class="tg-yw4l">Calendar Year</td>
    <td class="tg-lqy6">Yes (not for 2017)</td>
    <td class="tg-lqy6">N/A</td>
    <td class="tg-lqy6">N/A</td>
    <td class="tg-lqy6">N/A</td>
    <td class="tg-yw4l">Yes</td>
  </tr>
  <tr>
    <td class="tg-b7b8">UHBT FOI</td>
    <td class="tg-b7b8">Apr '11 - Dec 17</td>
    <td class="tg-b7b8">Months</td>
    <td class="tg-p5oz">Yes</td>
    <td class="tg-p5oz">No</td>
    <td class="tg-p5oz">Yes</td>
    <td class="tg-p5oz">No</td>
    <td class="tg-b7b8">N/A</td>
  </tr>
  <tr>
    <td class="tg-yw4l">UHBT Accounts</td>
    <td class="tg-yw4l">2010/11 - 2016/17 (2017/18 from press office)</td>
    <td class="tg-yw4l">Financial Year</td>
    <td class="tg-lqy6">Yes</td>
    <td class="tg-lqy6">N/A</td>
    <td class="tg-lqy6">N/A</td>
    <td class="tg-lqy6">N/A</td>
    <td class="tg-yw4l">Yes</td>
  </tr>
  <tr>
    <td class="tg-b7b8">SWAS FOI</td>
    <td class="tg-b7b8">Apr '13 - Dec '17</td>
    <td class="tg-b7b8">None (individual absences provided)</td>
    <td class="tg-p5oz">Department</td>
    <td class="tg-p5oz">Yes</td>
    <td class="tg-p5oz">Yes</td>
    <td class="tg-p5oz">Yes</td>
    <td class="tg-b7b8">N/A</td>
  </tr>
  <tr>
    <td class="tg-yw4l">SWAS Accounts</td>
    <td class="tg-yw4l">2013/14 - 2016/17 (2017/18 from press office)</td>
    <td class="tg-yw4l">Financial Year</td>
    <td class="tg-lqy6">Yes</td>
    <td class="tg-lqy6">N/A</td>
    <td class="tg-lqy6">N/A</td>
    <td class="tg-lqy6">N/A</td>
    <td class="tg-yw4l">Yes</td>
  </tr>
</table>
""")

In addition to the superficial differences, the main other data integrity problem was inconsistent reporting in staff group and department.

For UHBT, the staff groups reported on in the FOI request could not be accurately matched to the staff groups provided in the publicly available FTE data due to inconsistent terminology.

For these cases, staff groups were only matched where there is a positive match.

For SWAS, the departments reported on in the FOI request did not correspond at all to the staff groups provided in the publicly available FTE data. Therefore, the SWAS data was not disaggregated to staff group. Instead only a Trust-level and Absence Reason-level data analysis was provided.

The fundamental differences in the data can be summarised as follows:

Reporting at both calendar year & financial year
Difficulty mapping staff groups in UHBT FOI data to staff groups in SWAS accounts
No accurate mapping for departments in SWAS FOI data to staff groups in SWAS accounts
No absence reason (other than those for Stress & Anxiety) data for UHBT
No episode data for UHBT

Accounting and compensating for the differences¶

Because other absence data was not provided by UHBT, absences due to stress and anxiety as compared to other absence reasons was not reported for this trust, instead only being reported for NBT & SWAS.

In order to provide meaningful insight into the data, a compensation was made to adjust for the discrepancy in reporting for Financial & Calendar years:

For both UHBT & SWAS, the month-level FOI data was mapped directly to the financial years provided in the publically available FTE data.
These financial years were then 'converted' into calendar years, accepting that there would be a loss in accuracy.
- For example, the Financial Year 2011/12 (covering the months from April 2011 to March 2012 inclusive) would be mapped to the 2011 calendar year.
This means that four months in each financial year would be misattributed to the corresponding calendar year.

In order to give the most up-to-date picture of stress-related absences, the Trust-level FTE data is used in place of the staff group data where possible, despite not aligning perfectly.

Absences per FTE for UHBT & SWAS will be understated as there is a comparison of 9 months of absence data with staffing levels for a whole year. In order to provide the most up-to-date figures possible, it was necessary to adjust the FTE data for these trusts:

UHBT supplied data until the end of December 2017. This means that for the 2017/18 financial year there are 3 months of absence data missing.
SWAS supplied data until the 6th January 2018. This means that for the 2017/18 financial year there are ~3 months of absence data missing.
The best option to retain data integrity whilst allowing for comparisons is to reduce the total number of FTE by one quarter in order to bring the FTE down to the same period of data as the absences were provided, though this is imperfect.
In this analysis, we're using FTE as a proxy for the number of days worked in a year, however FTE is actually the measure of staffing levels at a given point in time. By reducing FTE by 1/4, we are intentionally mis-stating the actual number of staff employed, but doing so in a way that allows us to make an comparison with the provided sickness & absence data.

It must be noted that a mistake in this process led to the number of stress-related absences per FTE being overstated for SWAS in the April 2018 print edition of The Bristol Cable. These figures were rectified in the online copy and a correction will be published in the following edition.

Other steps taken to clean & align the data¶

The staff groups did not directly match between and within data supplied by NBT & UHBT. This was easily fixed through text replacement. Stress & Anxiety absences were coded slightly differently. This was easily fixed through text replacement.

There were a significant quantity of Absence Reasons that were very closely aligned. These were recoded in order to reduce the number of categories. There were also slight differences in the way similar absence types were recorded by SWAS & NBT. These differences were manually aligned. All code used to align the absence reasons is available on Gitlab.

The absence data for UHBT was censored if the number of absence in a month/staff group combination was below 5. This was done to protect individual staff members. There were 9 such cases where this happened. For these cases, the average of two days was imputed. This was calculated as (1 + 2 + 3 + 4) / 4 = 2.5, rounded down to 2.

When calculating the top absence reasons we excluded maternity, leave and jury service and others, as we were purely looking at illness related absences.

The full list of exclusions is available on Gitlab.

Quality Control¶

SWAS data was QC'd by creating a pivot table on the source data, then summing the number of days absent and counting the number of records. The absence rates per FTE were then calculated by hand for a few data points.

NBT Data was QC'd by checking the summed absence days after coding against the totals provided in the source data, then calculating the absence rates by hand for a few data points.

UHBT Data was QC'd against the raw data and several data points were checked.

HTML("""<script>
  function code_toggle() {
    if (code_shown){
      $('div.input').hide('500');
      $('#toggleButton').val('Show Code')
    } else {
      $('div.input').show('500');
      $('#toggleButton').val('Hide Code')
    }
    code_shown = !code_shown
  }

  $( document ).ready(function(){
    code_shown=false;
    $('div.input').hide()
  });
</script>
<form action="javascript:code_toggle()"><input type="submit" id="toggleButton" value="Show Code"></form>""")

	Year	Number Of Staff (FTE)	Days Lost To Absence	AbsenceDaysPerFTE	pcChangeAbsenceDaysSince2011
0	2011	12045.3900	27729.4421	2.302079	0.000000
1	2012	11940.0000	31107.0179	2.605278	0.131706
2	2013	16127.0000	40079.1157	2.485218	0.079554
3	2014	16403.0000	48090.5273	2.931813	0.273550
4	2015	17177.0000	53923.7316	3.139299	0.363680
5	2016	16742.0000	53192.3499	3.177180	0.380135
6	2017	16188.9825	52650.0841	3.252217	0.412730

	Trust	Year	Number Of Staff (FTE)	Days Lost To Absence	AbsenceDaysPerFTE	pcChangeAbsenceDaysSinceY1
0	NBT	2010	7260.7000	16494.0000	2.271682	0.000000
1	NBT	2011	7346.3900	21263.0000	2.894347	0.274099
2	NBT	2012	7317.0000	23794.0000	3.251879	0.431485
3	NBT	2013	7263.0000	22119.0000	3.045436	0.340608
4	NBT	2014	7206.0000	26427.0000	3.667361	0.614381
5	NBT	2015	7715.0000	26993.0000	3.498769	0.540167
6	NBT	2016	7126.0000	22672.0000	3.181589	0.400543
7	NBT	2017	7182.0000	25640.0000	3.570036	0.571539
8	SWAS	2013	4073.0000	10309.0000	2.531058	0.000000
9	SWAS	2014	4072.0000	13629.0000	3.347004	0.322373
10	SWAS	2015	4184.0000	17740.0000	4.239962	0.675174
11	SWAS	2016	4180.0000	19307.0000	4.618900	0.824889
12	SWAS	2017	2948.8575	12869.0000	4.364063	0.724205
13	UHBT	2011	4699.0000	6466.4421	1.376132	0.000000
14	UHBT	2012	4623.0000	7313.0179	1.581877	0.149510
15	UHBT	2013	4791.0000	7651.1157	1.596977	0.160483
16	UHBT	2014	5125.0000	8034.5273	1.567713	0.139217
17	UHBT	2015	5278.0000	9190.7316	1.741328	0.265379
18	UHBT	2016	5436.0000	11213.3499	2.062794	0.498980
19	UHBT	2017	6058.1250	14141.0841	2.334234	0.696229

	Year	Number Of Staff (FTE)	No. of Episodes	EpisodesPerFTE	pcChangeAbsenceEpisodesSince2013
0	2013	11336.0000	757.0	0.066778	0.000000
1	2014	11278.0000	1069.0	0.094786	0.419416
2	2015	11899.0000	1132.0	0.095134	0.424623
3	2016	11306.0000	1189.0	0.105165	0.574841
4	2017	10130.8575	1163.0	0.114798	0.719085

	Trust	Year	Number Of Staff (FTE)	No. of Episodes	EpisodesPerFTE	pcChangeAbsenceEpisodesSinceY1
0	NBT	2010	7260.7000	372.0	0.051235	0.000000
1	NBT	2011	7346.3900	453.0	0.061663	0.203538
2	NBT	2012	7317.0000	609.0	0.083231	0.624500
3	NBT	2013	7263.0000	536.0	0.073799	0.440404
4	NBT	2014	7206.0000	613.0	0.085068	0.660358
5	NBT	2015	7715.0000	647.0	0.083863	0.636831
6	NBT	2016	7126.0000	617.0	0.086584	0.689954
7	NBT	2017	7182.0000	665.0	0.092593	0.807223
8	SWAS	2013	4073.0000	221.0	0.054260	0.000000
9	SWAS	2014	4072.0000	456.0	0.111984	1.063855
10	SWAS	2015	4184.0000	485.0	0.115918	1.136349
11	SWAS	2016	4180.0000	572.0	0.136842	1.521981
12	SWAS	2017	2948.8575	498.0	0.168879	2.112416

	Year	Days Lost To Absence	No. of Episodes	averageLengthOfEpisodeInDays
0	2013	32428.0	757.0	42.837517
1	2014	40056.0	1069.0	37.470533
2	2015	44733.0	1132.0	39.516784
3	2016	41979.0	1189.0	35.306140
4	2017	38509.0	1163.0	33.111780

	Staff Group	Year	Number Of Staff (FTE)	Days Lost To Absence	No. of Episodes	AbsenceDaysPerFTE	EpisodesPerFTE	pcChangeAbsenceDaysSinceY1	pcChangeAbsenceEpisodesSinceY1
0	Add Prof Scientific and Technic	2011	1335.03	1336.8308	9.0	1.001349	0.006741	0.000000	0.000000
5	Add Prof Scientific and Technic	2016	1369.00	1909.9852	16.0	1.395168	0.011687	0.393289	0.733664
6	Additional Clinical Services	2011	1155.61	5148.0000	125.0	4.454790	0.108168	0.000000	0.000000
11	Additional Clinical Services	2016	1300.00	5257.0000	175.0	4.043846	0.134615	-0.092248	0.244503
12	Administrative and Clerical	2011	1480.15	4008.0000	99.0	2.707834	0.066885	0.000000	0.000000

	Year	Absence Reason	Number of Staff In Year (FTE)	Days Lost To Absence	Absence Days Per FTE	Rank
170	2013	Not Specified	11336.0	35066.0	3.093331	1
141	2013	Anxiety/Stress	11336.0	32428.0	2.860621	2
168	2013	Musculo-skeletal	11336.0	30695.0	2.707745	3
171	2013	Other	11336.0	30376.0	2.679605	4
162	2013	Injury, Fracture	11336.0	18978.0	1.674135	5

	Year	Trust	Absence Reason	Days Lost To Absence	totalDays	propOfDays
0	2010	NBT	Anxiety/Stress	16494.0	147228.0	0.112030
1	2010	NBT	Other	130734.0	147228.0	0.887970
2	2011	NBT	Anxiety/Stress	21263.0	144610.0	0.147037
3	2011	NBT	Other	123347.0	144610.0	0.852963
4	2012	NBT	Anxiety/Stress	23794.0	151438.0	0.157120

Source	Dates Available	Date Aggregation	Staff Group	Absence Reason	Number of days	Number of Episodes	FTE
NBT FOI	2010-2017	Calendar Year	Yes	Yes	Yes	Yes	N/A
NBT Accounts	2010-2016 (2017 from press office)	Calendar Year	Yes (not for 2017)	N/A	N/A	N/A	Yes
UHBT FOI	Apr '11 - Dec 17	Months	Yes	No	Yes	No	N/A
UHBT Accounts	2010/11 - 2016/17 (2017/18 from press office)	Financial Year	Yes	N/A	N/A	N/A	Yes
SWAS FOI	Apr '13 - Dec '17	None (individual absences provided)	Department	Yes	Yes	Yes	N/A
SWAS Accounts	2013/14 - 2016/17 (2017/18 from press office)	Financial Year	Yes	N/A	N/A	N/A	Yes