Data & Charts to Answer Questions

Have cleaned the data, combined it together and done some exploratory analysis to answer the questions. Here is where I'll collate the answers and output the data ready for sending to the designers.

The questions Ineed to answer are below:

  1. How have absence days due to stress/anxiety changed:
    a. For all the services combined?
    b. For each of the three services?
  2. How has the number of absence episodes due to stress/anxiety changed:
    a. For NBT/SWAS together?
    b. For NBT/SWAS separately?
  3. How has the average length of an anxiety episode changed (NBT & SWAS only)?
  4. Which staff groups have the biggest rate of change (NBT & UHBT only)?
  5. What are the top 5 absence reasons (excluding Maternity & other non-sickness absences) for NBT & SWAS?
  6. What proportion of all sickness absences are due to stress & anxiety?
In [2]:
#Imports:
import pandas as pd
import plotly.offline as pyo
pyo.init_notebook_mode()

import nbformat
from nbconvert.preprocessors import ExecutePreprocessor
In [6]:
# Rerun all previous sections of the analysis to make sure that the data is up-to-date:

notebookFiles = ! ls
notebookFiles = notebookFiles[:-1]
notebookFiles
Out[6]:
['1. NBT Absence Data.ipynb',
 '2. UHBT Absence Data.ipynb',
 '3. SWAS Absence Data.ipynb',
 '4. Combine Cleaned Data Sources For Stress & Anxiety Absence Data.ipynb',
 '5. Combine Cleaned Data Sources For All Absence Data.ipynb']
In [8]:
for file in notebookFiles:
    with open(file) as f:
        nb = nbformat.read(f, as_version=4)
        ep = ExecutePreprocessor(timeout=600, kernel_name='python3')
        ep.preprocess(nb, {'metadata': {'path': ''}})
        f.close()
In [3]:
CABLE_COLOURS_DICT = {
    'grey' : '#777',
    'green' : '#03AF6C',
    'red' : '#F56E56',
    'blue' : '#569BF5',
}

CABLE_COLOURS_LIST = [
    '#03AF6C',
    '#569BF5',
    '#F56E56',
    '#777',
]

NOGRID = {
    'showgrid' : False,
    'zeroline' : False,
    
}
In [4]:
# Datafiles:

dataLocs = {
    '1a' : r'../CleanData/FinalDataForPublication/Q1a - Stress-Anxiety-Absences-Per-FTE-All-Trusts-2011-2016.csv',
    '1b' : r'../CleanData/FinalDataForPublication/Q1b - Stress-Anxiety-Days-Per-FTE-By-Trust-2010-2016.csv',
    '2a' : r'../CleanData/FinalDataForPublication/Q2a - Stress-Anxiety-Episodes-Per-FTE-All-Trusts-2011-2016.csv',
    '2b' : r'../CleanData/FinalDataForPublication/Q2b - Stress-Anxiety-Episodes-Per-FTE-By-Trust-2010-2016.csv',
    '3' : r'../CleanData/FinalDataForPublication/Q3 - Average-Length-Of-Stress-Anxiety-Episode.csv',
    '4' : r'../CleanData/FinalDataForPublication/Q4 - Stress-Anxiety-Absences-By-Staff-Group.csv',
    '5' : r'../CleanData/FinalDataForPublication/Q5 - Top-5-Absence-Reasons-By-Number-Of-Absence-Days-Per-FTE-By-Year.csv',
    '6' : r'../CleanData/FinalDataForPublication/Q6 - Stress-Related-Absences-As-Proportion-Of-All-Sickness-Absences.csv'
}

Q1a How has the number of absences due to stress/anxiety per FTE changed for all the services combined?

In [5]:
Q1aData = pd.read_csv(dataLocs['1a'], index_col=0)
Q1aData
Out[5]:
Year Number Of Staff (FTE) Days Lost To Absence AbsenceDaysPerFTE pcChangeAbsenceDaysSince2011
0 2011 12045.3900 27729.4421 2.302079 0.000000
1 2012 11940.0000 31107.0179 2.605278 0.131706
2 2013 16127.0000 40079.1157 2.485218 0.079554
3 2014 16403.0000 48090.5273 2.931813 0.273550
4 2015 17177.0000 53923.7316 3.139299 0.363680
5 2016 16742.0000 53192.3499 3.177180 0.380135
6 2017 16188.9825 52650.0841 3.252217 0.412730
In [6]:
Q1aData.loc[Q1aData.Year >= 2013, 'Days Lost To Absence'].sum()
Out[6]:
247935.80860000002
In [7]:
fig = {
    'data' : [{
        'type' : 'scatter',
        'mode' : 'markers+lines',
        'x' : Q1aData.Year,
        'y' : Q1aData.AbsenceDaysPerFTE,
        'marker' : {
            'color' : CABLE_COLOURS_DICT['green'],
        }
    }],
    'layout' : {
        'title' : 'Number of days lost to stress has increased since 2011',
        'annotations' : [{
            'text' : '<i>NBT & UHBT from 2011 onwards; SWAS from 2013 onwards</i>',
            'xref' : 'paper',
            'x' : 0,
            'yref' : 'paper',
            'y' : -0.15,
            'showarrow' : False,
            'xanchor' : 'left'
        }],
        'xaxis' : NOGRID,
        'yaxis' : NOGRID
    }
}

pyo.iplot(fig)

The number of days lost to stress/anxiety absences has risen from 2.3 days per FTE in 2011, to 3.2 days per FTE in 2016. The decrease in 2017 is likely due to incomplete data, as SWAS could only report absences for 3/4 of the financial year, and UHBT are missing a month of data from the same year.

The total number of days lost to stress-related absences since 2013 is 247,936.

In [14]:
fig = {
    'data' : [{
        'type' : 'scatter',
        'mode' : 'markers+lines',
        'x' : Q1aData.Year,
        'y' : Q1aData.pcChangeAbsenceDaysSince2011,
        'marker' : {
            'color' : CABLE_COLOURS_DICT['green'],
        }
    }],
    'layout' : {
        'title' : 'Number of days lost to stress has increased by 41%',
        'annotations' : [{
            'text' : '<i>NBT & UHBT from 2011 onwards; SWAS from 2013 onwards<br>2017 data for SWAS & UHBT adjusted for incomplete data for 2017</i>',
            'xref' : 'paper',
            'x' : 0,
            'yref' : 'paper',
            'y' : -0.1,
            'showarrow' : False,
            'xanchor' : 'left',
            'align' : 'left'
        },
        {
            'text' : 'From 27,700 days in 2011 to 52,600 days in 2017',
            'xref' : 'paper',
            'x' : 0.5,
            'yref' : 'paper',
            'y' : 1.075,
            'showarrow' : False,
        }],
    'yaxis' : {
        'tickformat' : '%'
        },
    'xaxis' : NOGRID
    }
}

fig['layout']['yaxis'].update(NOGRID)

pyo.iplot(fig)
In [15]:
pyo.plot(fig, filename='../Viz/Number of days lost to stress has increased by 41% since 2011.html')
Out[15]:
'file:///Users/richardmuir/Data Analysis/Bristol Cable/Stress-NHS/Viz/Number of days lost to stress has increased by 41% since 2011.html'

Between 2011 and 2016, the number of days lost to anxiety/stress absences increased by nearly 40%. The sharp drop in 2017 is likely due to incomplete data - SWAS only reported absences for 3/4 of the 2017/17 financial year, while UHBT are missing a month of data from the same year

Q1b How has the number of absences due to stress/anxiety per FTE changed for each service?

In [17]:
Q1bData = pd.read_csv(dataLocs['1b'], index_col=0)
Q1bData
Out[17]:
Trust Year Number Of Staff (FTE) Days Lost To Absence AbsenceDaysPerFTE pcChangeAbsenceDaysSinceY1
0 NBT 2010 7260.7000 16494.0000 2.271682 0.000000
1 NBT 2011 7346.3900 21263.0000 2.894347 0.274099
2 NBT 2012 7317.0000 23794.0000 3.251879 0.431485
3 NBT 2013 7263.0000 22119.0000 3.045436 0.340608
4 NBT 2014 7206.0000 26427.0000 3.667361 0.614381
5 NBT 2015 7715.0000 26993.0000 3.498769 0.540167
6 NBT 2016 7126.0000 22672.0000 3.181589 0.400543
7 NBT 2017 7182.0000 25640.0000 3.570036 0.571539
8 SWAS 2013 4073.0000 10309.0000 2.531058 0.000000
9 SWAS 2014 4072.0000 13629.0000 3.347004 0.322373
10 SWAS 2015 4184.0000 17740.0000 4.239962 0.675174
11 SWAS 2016 4180.0000 19307.0000 4.618900 0.824889
12 SWAS 2017 2948.8575 12869.0000 4.364063 0.724205
13 UHBT 2011 4699.0000 6466.4421 1.376132 0.000000
14 UHBT 2012 4623.0000 7313.0179 1.581877 0.149510
15 UHBT 2013 4791.0000 7651.1157 1.596977 0.160483
16 UHBT 2014 5125.0000 8034.5273 1.567713 0.139217
17 UHBT 2015 5278.0000 9190.7316 1.741328 0.265379
18 UHBT 2016 5436.0000 11213.3499 2.062794 0.498980
19 UHBT 2017 6058.1250 14141.0841 2.334234 0.696229
In [18]:
traces = []
for i, trust in enumerate(Q1bData.Trust.unique()):
    if trust == 'NBT':
        traces.append({
            'type' : 'scatter',
            'mode' : 'markers+lines',
            'x' : Q1bData.loc[Q1bData.Trust == trust , 'Year'],
            'y' : Q1bData.loc[Q1bData.Trust == trust , 'AbsenceDaysPerFTE'],
            'name' : trust,
            'marker' : {
                'color' : CABLE_COLOURS_LIST[i]
            }
        })
    else:
        # solid lines
        traces.append({
            'type' : 'scatter',
            'mode' : 'markers+lines',
            'x' : Q1bData.loc[(Q1bData.Trust == trust) & (Q1bData.Year <= 2016) , 'Year'],
            'y' : Q1bData.loc[(Q1bData.Trust == trust)  & (Q1bData.Year <= 2016), 'AbsenceDaysPerFTE'],
            'name' : trust,
            'marker' : {
                'color' : CABLE_COLOURS_LIST[i]
            },
            'legendgroup' : trust
        })
        # dashed lines
        traces.append({
            'type' : 'scatter',
            'mode' : 'markers+lines',
            'x' : Q1bData.loc[(Q1bData.Trust == trust) & (Q1bData.Year >= 2016) , 'Year'],
            'y' : Q1bData.loc[(Q1bData.Trust == trust)  & (Q1bData.Year >= 2016), 'AbsenceDaysPerFTE'],
            'name' : trust,
            'marker' : {
                'color' : CABLE_COLOURS_LIST[i],
            },
            'line' : {
                'dash' : 'dash'
            },
            'showlegend' : False,
            'legendgroup' : trust,
            'hoverinfo' : 'text+x',
            'text' : ['', '{:.0%}'.format(Q1bData.loc[(Q1bData.Trust == trust), 'AbsenceDaysPerFTE'].values[-1])]
        })


fig = {
    'data' : traces,
    'layout' : {
        'title' : 'Number of days lost to stress has increased for all three trusts',
        'xaxis' : NOGRID,
        'yaxis' : NOGRID,
        'annotations' : [{
            'text' : '<i>2017 data for SWAS & UHBT adjusted for incomplete data for 2017</i>',
            'xref' : 'paper',
            'x' : 0,
            'yref' : 'paper',
            'y' : -0.2,
            'showarrow' : False,
            'xanchor' : 'left',
            'align' : 'left'
        }]
    },
    
}


pyo.iplot(fig)

The number of absences lost to stress/anxiety stayed low for UHBT through 2011-2014, before increasing in 2015 & 2016. There has been a clear upwards trend in the number of absences lost to stress/anxiety in NBT, despite a fall in 2015 & 2016. The number of absences lost to stress/anxiety in SWAS has risen dramatically between 2013 and 2016, the drop in 2017 is due to incomplete data. SWAS only reported absences for 3/4 of the 2017/18 financial year, while UHBT are missing a month of data from the same year.

In [21]:
traces = []
for i, trust in enumerate(Q1bData.Trust.unique()):
    if trust == 'NBT':
        traces.append({
            'type' : 'scatter',
            'mode' : 'markers+lines',
            'x' : Q1bData.loc[Q1bData.Trust == trust , 'Year'],
            'y' : Q1bData.loc[Q1bData.Trust == trust , 'pcChangeAbsenceDaysSinceY1'],
            'name' : trust,
            'marker' : {
                'color' : CABLE_COLOURS_LIST[i]
            }
        })
    else:
        # solid lines
        traces.append({
            'type' : 'scatter',
            'mode' : 'markers+lines',
            'x' : Q1bData.loc[(Q1bData.Trust == trust) & (Q1bData.Year <= 2016) , 'Year'],
            'y' : Q1bData.loc[(Q1bData.Trust == trust)  & (Q1bData.Year <= 2016), 'pcChangeAbsenceDaysSinceY1'],
            'name' : trust,
            'marker' : {
                'color' : CABLE_COLOURS_LIST[i]
            },
            'legendgroup' : trust
        })
        # dashed lines
        traces.append({
            'type' : 'scatter',
            'mode' : 'markers+lines',
            'x' : Q1bData.loc[(Q1bData.Trust == trust) & (Q1bData.Year >= 2016) , 'Year'],
            'y' : Q1bData.loc[(Q1bData.Trust == trust)  & (Q1bData.Year >= 2016), 'pcChangeAbsenceDaysSinceY1'],
            'name' : trust,
            'marker' : {
                'color' : CABLE_COLOURS_LIST[i],
            },
            'line' : {
                'dash' : 'dash'
            },
            'showlegend' : False,
            'legendgroup' : trust,
            'hoverinfo' : 'text+x',
            'text' : ['', '{:.0%}'.format(Q1bData.loc[(Q1bData.Trust == trust), 'pcChangeAbsenceDaysSinceY1'].values[-1])]
        })

fig = {
    'data' : traces,
    'layout' : {
        'title' : 'Percentage change in number of days lost to stress has increased for all three trusts',
        'yaxis' : {
            'tickformat' : '%'
        },
        'annotations' : [{
            'text' : '<i>2017 data for SWAS & UHBT adjusted for incomplete data for 2017</i>',
            'xref' : 'paper',
            'x' : 0,
            'yref' : 'paper',
            'y' : -0.1,
            'showarrow' : False,
            'xanchor' : 'left',
            'align' : 'left'
        }],
        'xaxis' : NOGRID
    },
}

fig['layout']['yaxis'].update(NOGRID)
pyo.iplot(fig)
In [22]:
pyo.plot(fig, filename='../Viz/Number of days lost to stress has increased for all three trusts.html')
Out[22]:
'file:///Users/richardmuir/Data Analysis/Bristol Cable/Stress-NHS/Viz/Number of days lost to stress has increased for all three trusts.html'

Between 2010 & 2016 the number of days lost to stress/anxiety absences in NBT has increased by 40%. Between 2011 & 2016 the number of days lost to stress/anxiety absences in UHBT has increased by 50%. Between 2013 & 2016 the number of days lost to stress/anxiety absences in SWAS has increased by more than 80%/has almost doubled. Sickness absences are under-reported in the 2017 data for UHBT and SWAS, accounting for at least some of the decrease.

Q2a How has the number of absence episodes due to stress/anxiety changed for NBT/SWAS together?

In [20]:
Q2aData = pd.read_csv(dataLocs['2a'], index_col=0)
Q2aData
Out[20]:
Year Number Of Staff (FTE) No. of Episodes EpisodesPerFTE pcChangeAbsenceEpisodesSince2013
0 2013 11336.0000 757.0 0.066778 0.000000
1 2014 11278.0000 1069.0 0.094786 0.419416
2 2015 11899.0000 1132.0 0.095134 0.424623
3 2016 11306.0000 1189.0 0.105165 0.574841
4 2017 10130.8575 1163.0 0.114798 0.719085
In [21]:
fig = {
    'data' : [{
        'type' : 'scatter',
        'mode' : 'markers+lines',
        'x' : Q2aData.Year,
        'y' : Q2aData.EpisodesPerFTE,
        'marker' : {
            'color' : CABLE_COLOURS_DICT['green']
        }
    }],
    'layout' : {
        'title' : 'Stress/Anxiety episodes per FTE increased since 2013',
        'annotations' : [{
            'text' : '<i>Only NBT & SWAS. No episode data from UHBT</i>',
            'xref' : 'paper',
            'x' : 0,
            'yref' : 'paper',
            'y' : -0.15,
            'showarrow' : False,
            'xanchor' : 'left',
            'align' : 'left'
        }],
        'xaxis' : {
            'nticks' : 5
        },
        'yaxis' : NOGRID
    }
}

fig['layout']['xaxis'].update(NOGRID)

pyo.iplot(fig)

The number of episodes of stree/anxiety has increased sine 2013, when there was one absence episode per 15 FTE, to 2017, when there was one episode for every 10 FTE. An increase from 0.067 to 0.1 episodes per FTE. We expect the final number for 2017 to rise as more absences are reported for the 2017/18 financial year by SWAS and UHBT.

In [22]:
fig = {
    'data' : [{
        'type' : 'scatter',
        'mode' : 'markers+lines',
        'x' : Q2aData.Year,
        'y' : Q2aData.pcChangeAbsenceEpisodesSince2013,
        'marker' : {
            'color' : CABLE_COLOURS_DICT['blue']
        }
    }],
    'layout' : {
        'title' : 'Stress/Anxiety episodes per FTE increased by 72% since 2013',
        'annotations' : [{
            'text' : '<i>Only NBT & SWAS. No episode data from UHBT</i>',
            'xref' : 'paper',
            'x' : 0,
            'yref' : 'paper',
            'y' : -0.15,
            'showarrow' : False,
            'xanchor' : 'left',
            'align' : 'left'
        }],
        'xaxis' : {
            'nticks' : 4
        },
        'yaxis' : {
            'tickformat' : '%'
        }
    }
}

fig['layout']['yaxis'].update(NOGRID)
fig['layout']['xaxis'].update(NOGRID)

pyo.iplot(fig)

The number of absence episodes per FTE has increased by almost 60% since 2013. We expect the final increase in absences for 2017 to rise as more absences are reported for the 2017/18 financial year by SWAS and UHBT.

Q2b How has the number of absence episodes due to stress/anxiety changed for each of NBT/SWAS ?

In [23]:
Q2bData = pd.read_csv(dataLocs['2b'], index_col=0)
Q2bData
Out[23]:
Trust Year Number Of Staff (FTE) No. of Episodes EpisodesPerFTE pcChangeAbsenceEpisodesSinceY1
0 NBT 2010 7260.7000 372.0 0.051235 0.000000
1 NBT 2011 7346.3900 453.0 0.061663 0.203538
2 NBT 2012 7317.0000 609.0 0.083231 0.624500
3 NBT 2013 7263.0000 536.0 0.073799 0.440404
4 NBT 2014 7206.0000 613.0 0.085068 0.660358
5 NBT 2015 7715.0000 647.0 0.083863 0.636831
6 NBT 2016 7126.0000 617.0 0.086584 0.689954
7 NBT 2017 7182.0000 665.0 0.092593 0.807223
8 SWAS 2013 4073.0000 221.0 0.054260 0.000000
9 SWAS 2014 4072.0000 456.0 0.111984 1.063855
10 SWAS 2015 4184.0000 485.0 0.115918 1.136349
11 SWAS 2016 4180.0000 572.0 0.136842 1.521981
12 SWAS 2017 2948.8575 498.0 0.168879 2.112416
In [24]:
traces = []
for i, trust in enumerate(Q2bData.Trust.unique()):
    if trust == 'NBT' : 
        traces.append({
            'type' : 'scatter',
            'mode' : 'markers+lines',
            'x' : Q2bData.loc[Q2bData.Trust == trust , 'Year'],
            'y' : Q2bData.loc[Q2bData.Trust == trust , 'EpisodesPerFTE'],
            'name' : trust,
            'marker' : {
                'color' : CABLE_COLOURS_LIST[i]
            }
        })
    else:
        # solid lines
        traces.append({
            'type' : 'scatter',
            'mode' : 'markers+lines',
            'x' : Q2bData.loc[(Q2bData.Trust == trust) & (Q2bData.Year <= 2016) , 'Year'],
            'y' : Q2bData.loc[(Q2bData.Trust == trust)  & (Q2bData.Year <= 2016), 'EpisodesPerFTE'],
            'name' : trust,
            'marker' : {
                'color' : CABLE_COLOURS_LIST[i]
            },
            'legendgroup' : trust,
        })
        # dashed lines
        traces.append({
            'type' : 'scatter',
            'mode' : 'markers+lines',
            'x' : Q2bData.loc[(Q2bData.Trust == trust) & (Q2bData.Year >= 2016) , 'Year'],
            'y' : Q2bData.loc[(Q2bData.Trust == trust)  & (Q2bData.Year >= 2016), 'EpisodesPerFTE'],
            'name' : trust,
            'marker' : {
                'color' : CABLE_COLOURS_LIST[i],
            },
            'line' : {
                'dash' : 'dash'
            },
            'showlegend' : False,
            'legendgroup' : trust,
            'hoverinfo' : 'text+x',
            'text' : ['', '{:.2f}'.format(Q2bData.loc[(Q2bData.Trust == trust), 'EpisodesPerFTE'].values[-1])]
        })

fig = {
    'data' : traces,
    'layout' : {
        'title' : 'Number of stress/anxiety absence episodes per FTE by trust',
        'annotations' : [{
            'text' : '<i>Only NBT & SWAS. No episode data from UHBT<br>2017 data for SWAS adjusted for incomplete data for 2017</i>',
            'xref' : 'paper',
            'x' : 0,
            'yref' : 'paper',
            'y' : -0.15,
            'showarrow' : False,
            'xanchor' : 'left',
            'align' : 'left',
        }],
        'xaxis' : {
            'nticks' : 8
        },
        'yaxis' : {
            'tickformat' : '.2f'
        }
    }
}

fig['layout']['xaxis'].update(NOGRID)
fig['layout']['yaxis'].update(NOGRID)

pyo.iplot(fig)

For NBT, the number of absence episodes per FTE due to stress/anxiety increased from 0.051 in 2010 to 0.09 in 2016. It changed from approximately 1 episodes for every 20 FTE to 1 for every 11 FTE.

For SWAS, there were 0.054 episodes per FTE in 2013, increasing to 0.14 episodes per FTE in 2016. This is a change from 1 episodes for every 19 FTE to 1 episode for every 7 FTE. The drop in 2017 is due to incomplete data.

In [25]:
traces = []
for i, trust in enumerate(Q2bData.Trust.unique()):
    if trust == 'NBT' : 
        traces.append({
            'type' : 'scatter',
            'mode' : 'markers+lines',
            'x' : Q2bData.loc[Q2bData.Trust == trust , 'Year'],
            'y' : Q2bData.loc[Q2bData.Trust == trust , 'pcChangeAbsenceEpisodesSinceY1'],
            'name' : trust,
            'marker' : {
                'color' : CABLE_COLOURS_LIST[i]
            }
        })
    else:
        # solid lines
        traces.append({
            'type' : 'scatter',
            'mode' : 'markers+lines',
            'x' : Q2bData.loc[(Q2bData.Trust == trust) & (Q2bData.Year <= 2016) , 'Year'],
            'y' : Q2bData.loc[(Q2bData.Trust == trust)  & (Q2bData.Year <= 2016), 'pcChangeAbsenceEpisodesSinceY1'],
            'name' : trust,
            'marker' : {
                'color' : CABLE_COLOURS_LIST[i]
            },
            'legendgroup' : trust,
        })
        # dashed lines
        traces.append({
            'type' : 'scatter',
            'mode' : 'markers+lines',
            'x' : Q2bData.loc[(Q2bData.Trust == trust) & (Q2bData.Year >= 2016) , 'Year'],
            'y' : Q2bData.loc[(Q2bData.Trust == trust)  & (Q2bData.Year >= 2016), 'pcChangeAbsenceEpisodesSinceY1'],
            'name' : trust,
            'marker' : {
                'color' : CABLE_COLOURS_LIST[i],
            },
            'line' : {
                'dash' : 'dash'
            },
            'showlegend' : False,
            'legendgroup' : trust,
            'hoverinfo' : 'text+x',
            'text' : ['', '{:.0%}'.format(Q2bData.loc[(Q2bData.Trust == trust), 'pcChangeAbsenceEpisodesSinceY1'].values[-1])]
        })

fig = {
    'data' : traces,
    'layout' : {
        'title' : 'Stress anxiety absence episodes increased for NBT & SWAS',
        'annotations' : [{
            'text' : '<i>Only NBT & SWAS. No episode data from UHBT<br>2017 data for SWAS adjusted for incomplete data for 2017</i>',
            'xref' : 'paper',
            'x' : 0,
            'yref' : 'paper',
            'y' : -0.15,
            'showarrow' : False,
            'xanchor' : 'left',
            'align' : 'left',
        }],
        'xaxis' : {
            'nticks' : 8
        },
        'yaxis' : {
            'tickformat' : '%'
        }
    }
}

fig['layout']['xaxis'].update(NOGRID)
fig['layout']['yaxis'].update(NOGRID)

pyo.iplot(fig)
    
    
    

For NBT there was a 70% increase in the number of absence episodes due to anxiety/stress. For SWAS the number of episodes increased by more than 150%/almost tripled between 2013 and 2016. The number of episodes per FTE doubled between 2013 and 2014.

Q3 How has the average length of an anxiety episode changed (NBT & SWAS only)?

In [26]:
Q3Data = pd.read_csv(dataLocs['3'], index_col=0)
Q3Data
Out[26]:
Year Days Lost To Absence No. of Episodes averageLengthOfEpisodeInDays
0 2013 32428.0 757.0 42.837517
1 2014 40056.0 1069.0 37.470533
2 2015 44733.0 1132.0 39.516784
3 2016 41979.0 1189.0 35.306140
4 2017 38509.0 1163.0 33.111780
In [27]:
fig = {
    'data' : [{
        'type' : 'scatter',
        'mode' : 'markers+lines',
        'x' : Q3Data.Year,
        'y' : Q3Data.averageLengthOfEpisodeInDays,
        'marker' : {
            'color' : CABLE_COLOURS_DICT['red']
        }
    }],
    'layout' : {
        'title' : 'Average length in days of stress-related absence episodes decreasing',
        'annotations' : [{
            'text' : '<i>Only NBT & SWAS. No episode data from UHBT</i>',
            'xref' : 'paper',
            'x' : 0,
            'yref' : 'paper',
            'y' : -0.15,
            'showarrow' : False,
            'xanchor' : 'left'
        }],
        'xaxis' : {
            'nticks' : 5
        },
        'yaxis' : NOGRID
    }
}

fig['layout']['xaxis'].update(NOGRID)

pyo.iplot(fig)

The length of an absence episode has decreased since 2013.

Q4 Which staff groups have the biggest rate of change (NBT & UHBT only)?

In [28]:
Q4Data = pd.read_csv(dataLocs['4'], index_col=0)
Q4Data.head()
Out[28]:
Staff Group Year Number Of Staff (FTE) Days Lost To Absence No. of Episodes AbsenceDaysPerFTE EpisodesPerFTE pcChangeAbsenceDaysSinceY1 pcChangeAbsenceEpisodesSinceY1
0 Add Prof Scientific and Technic 2011 1335.03 1336.8308 9.0 1.001349 0.006741 0.000000 0.000000
5 Add Prof Scientific and Technic 2016 1369.00 1909.9852 16.0 1.395168 0.011687 0.393289 0.733664
6 Additional Clinical Services 2011 1155.61 5148.0000 125.0 4.454790 0.108168 0.000000 0.000000
11 Additional Clinical Services 2016 1300.00 5257.0000 175.0 4.043846 0.134615 -0.092248 0.244503
12 Administrative and Clerical 2011 1480.15 4008.0000 99.0 2.707834 0.066885 0.000000 0.000000
In [29]:
traces = []
for sg in Q4Data['Staff Group'].unique():
    dfsg = Q4Data[Q4Data['Staff Group'] == sg]
    traces.append({
        'type' : 'scatter', 
        'mode' : 'markers+lines',
        'x' : dfsg.Year,
        'y' : dfsg['AbsenceDaysPerFTE'],
        'name' : sg
    })
    
    
fig = {
    'data' : traces,
    'layout' : {
        'title' : 'Number of days lost per FTE to stress/anxiety absence by Staff Group',
        'annotations' : [{
            'text' : '<i>Only NBT & UHBT. No staff group data from SWAS</i>',
            'xref' : 'paper',
            'x' : 0,
            'yref' : 'paper',
            'y' : -0.15,
            'showarrow' : False,
            'xanchor' : 'left'
        }],
        'xaxis' : {
            'tickvals' : [2011, 2016],
            'ticktext' : [2011, 2016]
        },
        'yaxis' : NOGRID
    }
}

fig['layout']['xaxis'].update(NOGRID)

pyo.iplot(fig)

Most Staff groups saw an increase in the number of absence days lost.

In [30]:
traces = []
for sg in Q4Data['Staff Group'].unique():
    dfsg = Q4Data[Q4Data['Staff Group'] == sg]
    traces.append({
        'type' : 'scatter', 
        'mode' : 'markers+lines',
        'x' : dfsg.Year,
        'y' : dfsg['pcChangeAbsenceDaysSinceY1'],
        'name' : sg
    })
    
    
fig = {
    'data' : traces,
    'layout' : {
        'title' : 'Percent change in number of stress/anxiety absence episodes by Staff Group',
        'annotations' : [{
            'text' : '<i>Only NBT & UHBT. No staff group data from SWAS</i>',
            'xref' : 'paper',
            'x' : 0,
            'yref' : 'paper',
            'y' : -0.15,
            'showarrow' : False,
            'xanchor' : 'left'
        }],
        'xaxis' : {
            'tickvals' : [2011, 2016],
            'ticktext' : [2011, 2016]
        },
        'yaxis' : {
            'tickformat' : '%'
        }
    }
}

fig['layout']['xaxis'].update(NOGRID)
fig['layout']['yaxis'].update(NOGRID)

pyo.iplot(fig)

Healthcare Scientists had the biggest increase of 122% between 2011 and 2016. Going from 1.7 stress-related absence days per FTE in 2011 to 3.7 in 2016. Administrative & Clerical saw the next biggest increase at 56%. Only Additional Clinical Services saw a decrease in the number of stress-related absence days.

Q5a What are the top 5 absence reasons (excluding Maternity & other non-sickness absences) for NBT & SWAS, and how has the number of days absent per FTE changed?

In [31]:
Q5Data = pd.read_csv(dataLocs['5'], index_col=0)
Q5Data.head()
Out[31]:
Year Absence Reason Number of Staff In Year (FTE) Days Lost To Absence Absence Days Per FTE Rank
170 2013 Not Specified 11336.0 35066.0 3.093331 1
141 2013 Anxiety/Stress 11336.0 32428.0 2.860621 2
168 2013 Musculo-skeletal 11336.0 30695.0 2.707745 3
171 2013 Other 11336.0 30376.0 2.679605 4
162 2013 Injury, Fracture 11336.0 18978.0 1.674135 5
In [32]:
traces = []
for rsn in Q5Data['Absence Reason'].unique():
    tdf = Q5Data[Q5Data['Absence Reason'] == rsn]
    traces.append({
        'type' : 'scatter',
        'mode' : 'markers+lines',
        'x' : tdf.Year,
        'y' : tdf['Absence Days Per FTE'],
        'name' : rsn
    })
    
fig = {
    'data' : traces,
    'layout' : {
        'title' : 'Number of days per FTE lost to absence for top 5 absence reasons each year',
        'annotations' : [{
            'text' : '<i>Only NBT & SWAS. No absence reason data from UHBT</i><br>Absences for maternity have been excluded',
            'xref' : 'paper',
            'x' : 0,
            'yref' : 'paper',
            'y' : -0.15,
            'showarrow' : False,
            'xanchor' : 'left',
            'align' : 'left'
        }],
        'xaxis' : {
            'nticks' : 5
        },
        'yaxis' : NOGRID
    }
}
fig['layout']['xaxis'].update(NOGRID)

pyo.iplot(fig)

Since 2014, anxiety/Stress has been the absence category with the largest number of absence days, accounting for more days per FTE lost than even the catch-all categories of 'Not Specified' and 'Other'. We expect the total absences for 2017 to increase as SWAS finalise their data.

6. What proportion of all sickness absences are due to stress & anxiety?

In [33]:
Q6Data = pd.read_csv(dataLocs['6'], index_col=0)
Q6Data.head()
Out[33]:
Year Trust Absence Reason Days Lost To Absence totalDays propOfDays
0 2010 NBT Anxiety/Stress 16494.0 147228.0 0.112030
1 2010 NBT Other 130734.0 147228.0 0.887970
2 2011 NBT Anxiety/Stress 21263.0 144610.0 0.147037
3 2011 NBT Other 123347.0 144610.0 0.852963
4 2012 NBT Anxiety/Stress 23794.0 151438.0 0.157120
In [34]:
traces = []
for i, trust in enumerate(Q6Data.Trust.unique()):
    tdf = Q6Data[(Q6Data.Trust == trust) & 
                            (Q6Data['Absence Reason'] == 'Anxiety/Stress')]
    traces.append({
        'type' : 'scatter',
        'mode' : 'markers+lines',
        'x' : tdf.Year,
        'y' : tdf.propOfDays,
        'name' : trust,
        'marker' : {
            'color' : CABLE_COLOURS_LIST[i]
        }
    })

fig = {
    'data' : traces,
    'layout' : {
        'title' : 'Days lost to stress/anxiety as a proportion of all days lost to sickness',
        'annotations' : [{
            'text' : '<i>Only NBT & SWAS. No absence reason data from UHBT<br>Absences for maternity & other non-sickness absences have been excluded</i>',
            'xref' : 'paper',
            'x' : 0,
            'yref' : 'paper',
            'y' : -0.2,
            'showarrow' : False,
            'xanchor' : 'left',
            'align' : 'left'
        }],
        'xaxis' : {
            'nticks' : 8
        },
        'yaxis' : {
            'tickformat' : '%'
        }
    }
}

fig['layout']['xaxis'].update(NOGRID)
fig['layout']['yaxis'].update(NOGRID)

pyo.iplot(fig)

For NBT, the proportion of days lost to stress/anxiety peaked at 17% in 2014. For SWAS, the number has risen sharply from 14% in 2013 to 24% in 2016. 1 in 4 sickness days for SWAS staff was due to stress/anxiety. In contrast to the comparison with FTE, these figures are representative as they show the proportion of total absences.

Methodology & Caveats

Introduction

The data used in this analysis is from 6 different sources. Several assumptions & exclusions had to be made in order to make the data suitable for analysis, as such, some inaccuracies will exist.

The following outlines the methods used to extract & clean the data and the assumptions & exclusions made when combining them.

The data comes from releases under the Freedom of Information Act and open data published by the following organisations: North Bristol Trust (NBT), University Hospitals Bristol Trust (UHBT) and South West Ambulance Service (SWAS).

FOI Data

Each of these trusts responded to an FOI request for absence data relating to stress & anxiety. Though the request was the same to each organisation, the level of detail and the format of the response differed for each trust:

North Bristol Trust
  • Provided data from 2010-2017 in calendar years.
  • Provided data on the absence reason (including absences other than stress/anxiety) and the staff group.
  • Provided data on the number of episodes for each absence type, as well as the number of days (an episode can consist of multiple days).
University Hospitals Bristol Trust
  • Provided month-level data from April 2011 to December 2017.
  • Provided the number of absence days due to stress & anxiety split by staff group.
South West Ambulance Service
  • Provided data which lists each absence between April 2013 & 6th January 2018. The start and end dates are shown for each absence, which allows us to calculate the number of days and the number of episodes.
  • Information about the absence reason and department was also given.

Publicly Available Data

In order to account for the changing staffing levels, publicly available information on the number of staff each trust employs was added to the analysis. This information is supplied as Full Time Equivalent (FTE) numbers. Further differences exist in this data:

FTE data is available for the NBT for calendar years 2010-2016 at a staff group level. Also for 2017 at Trust level.

FTE data is available for the UHBT for financial years 2010/11 - 2016/17 at a staff group level. Also until Feb 2018 at Trust level.

FTE data is available for the SWAS for financial years 2013/14 - 2016/17 at a staff group level (data available beforehand, but FOI is only from 2013). Also as of January 2018 at Trust-level.

Data Differences

The table below shows how the data differs superficially for each source:

In [35]:
from IPython.display import HTML
HTML("""
<style type="text/css">
.tg  {border-collapse:collapse;border-spacing:0;border-color:#ccc;}
.tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:#ccc;color:#333;background-color:#fff;}
.tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:#ccc;color:#333;background-color:#f0f0f0;}
.tg .tg-lqy6{text-align:right;vertical-align:top}
.tg .tg-i9x5{background-color:#656565;color:#ffffff;vertical-align:top}
.tg .tg-b7b8{background-color:#f9f9f9;vertical-align:top}
.tg .tg-p5oz{background-color:#f9f9f9;text-align:right;vertical-align:top}
.tg .tg-yw4l{vertical-align:top}
</style>
<table class="tg">
  <tr>
    <th class="tg-i9x5">Source</th>
    <th class="tg-i9x5">Dates Available</th>
    <th class="tg-i9x5">Date Aggregation</th>
    <th class="tg-i9x5">Staff Group</th>
    <th class="tg-i9x5">Absence Reason</th>
    <th class="tg-i9x5">Number of days</th>
    <th class="tg-i9x5">Number of Episodes</th>
    <th class="tg-i9x5">FTE </th>
  </tr>
  <tr>
    <td class="tg-b7b8">NBT FOI</td>
    <td class="tg-b7b8">2010-2017</td>
    <td class="tg-b7b8">Calendar Year</td>
    <td class="tg-p5oz">Yes</td>
    <td class="tg-p5oz">Yes</td>
    <td class="tg-p5oz">Yes</td>
    <td class="tg-p5oz">Yes</td>
    <td class="tg-b7b8">N/A</td>
  </tr>
  <tr>
    <td class="tg-yw4l">NBT Accounts</td>
    <td class="tg-yw4l">2010-2016 (2017 from press office)</td>
    <td class="tg-yw4l">Calendar Year</td>
    <td class="tg-lqy6">Yes (not for 2017)</td>
    <td class="tg-lqy6">N/A</td>
    <td class="tg-lqy6">N/A</td>
    <td class="tg-lqy6">N/A</td>
    <td class="tg-yw4l">Yes</td>
  </tr>
  <tr>
    <td class="tg-b7b8">UHBT FOI</td>
    <td class="tg-b7b8">Apr '11 - Dec 17</td>
    <td class="tg-b7b8">Months</td>
    <td class="tg-p5oz">Yes</td>
    <td class="tg-p5oz">No</td>
    <td class="tg-p5oz">Yes</td>
    <td class="tg-p5oz">No</td>
    <td class="tg-b7b8">N/A</td>
  </tr>
  <tr>
    <td class="tg-yw4l">UHBT Accounts</td>
    <td class="tg-yw4l">2010/11 - 2016/17 (2017/18 from press office)</td>
    <td class="tg-yw4l">Financial Year</td>
    <td class="tg-lqy6">Yes</td>
    <td class="tg-lqy6">N/A</td>
    <td class="tg-lqy6">N/A</td>
    <td class="tg-lqy6">N/A</td>
    <td class="tg-yw4l">Yes</td>
  </tr>
  <tr>
    <td class="tg-b7b8">SWAS FOI</td>
    <td class="tg-b7b8">Apr '13 - Dec '17</td>
    <td class="tg-b7b8">None (individual absences provided)</td>
    <td class="tg-p5oz">Department</td>
    <td class="tg-p5oz">Yes</td>
    <td class="tg-p5oz">Yes</td>
    <td class="tg-p5oz">Yes</td>
    <td class="tg-b7b8">N/A</td>
  </tr>
  <tr>
    <td class="tg-yw4l">SWAS Accounts</td>
    <td class="tg-yw4l">2013/14 - 2016/17 (2017/18 from press office)</td>
    <td class="tg-yw4l">Financial Year</td>
    <td class="tg-lqy6">Yes</td>
    <td class="tg-lqy6">N/A</td>
    <td class="tg-lqy6">N/A</td>
    <td class="tg-lqy6">N/A</td>
    <td class="tg-yw4l">Yes</td>
  </tr>
</table>
""")
Out[35]:
Source Dates Available Date Aggregation Staff Group Absence Reason Number of days Number of Episodes FTE
NBT FOI 2010-2017 Calendar Year Yes Yes Yes Yes N/A
NBT Accounts 2010-2016 (2017 from press office) Calendar Year Yes (not for 2017) N/A N/A N/A Yes
UHBT FOI Apr '11 - Dec 17 Months Yes No Yes No N/A
UHBT Accounts 2010/11 - 2016/17 (2017/18 from press office) Financial Year Yes N/A N/A N/A Yes
SWAS FOI Apr '13 - Dec '17 None (individual absences provided) Department Yes Yes Yes N/A
SWAS Accounts 2013/14 - 2016/17 (2017/18 from press office) Financial Year Yes N/A N/A N/A Yes

In addition to the superficial differences, the main other data integrity problem was inconsistent reporting in staff group and department.

For UHBT, the staff groups reported on in the FOI request could not be accurately matched to the staff groups provided in the publicly available FTE data due to inconsistent terminology.

For these cases, staff groups were only matched where there is a positive match.

For SWAS, the departments reported on in the FOI request did not correspond at all to the staff groups provided in the publicly available FTE data. Therefore, the SWAS data was not disaggregated to staff group. Instead only a Trust-level and Absence Reason-level data analysis was provided.

The fundamental differences in the data can be summarised as follows:

  • Reporting at both calendar year & financial year
  • Difficulty mapping staff groups in UHBT FOI data to staff groups in SWAS accounts
  • No accurate mapping for departments in SWAS FOI data to staff groups in SWAS accounts
  • No absence reason (other than those for Stress & Anxiety) data for UHBT
  • No episode data for UHBT

Accounting and compensating for the differences

Because other absence data was not provided by UHBT, absences due to stress and anxiety as compared to other absence reasons was not reported for this trust, instead only being reported for NBT & SWAS.

In order to provide meaningful insight into the data, a compensation was made to adjust for the discrepancy in reporting for Financial & Calendar years:

  • For both UHBT & SWAS, the month-level FOI data was mapped directly to the financial years provided in the publically available FTE data.
  • These financial years were then 'converted' into calendar years, accepting that there would be a loss in accuracy.
    • For example, the Financial Year 2011/12 (covering the months from April 2011 to March 2012 inclusive) would be mapped to the 2011 calendar year.
  • This means that four months in each financial year would be misattributed to the corresponding calendar year.

In order to give the most up-to-date picture of stress-related absences, the Trust-level FTE data is used in place of the staff group data where possible, despite not aligning perfectly.

Absences per FTE for UHBT & SWAS will be understated as there is a comparison of 9 months of absence data with staffing levels for a whole year. In order to provide the most up-to-date figures possible, it was necessary to adjust the FTE data for these trusts:

  • UHBT supplied data until the end of December 2017. This means that for the 2017/18 financial year there are 3 months of absence data missing.

  • SWAS supplied data until the 6th January 2018. This means that for the 2017/18 financial year there are ~3 months of absence data missing.

  • The best option to retain data integrity whilst allowing for comparisons is to reduce the total number of FTE by one quarter in order to bring the FTE down to the same period of data as the absences were provided, though this is imperfect.

  • In this analysis, we're using FTE as a proxy for the number of days worked in a year, however FTE is actually the measure of staffing levels at a given point in time. By reducing FTE by 1/4, we are intentionally mis-stating the actual number of staff employed, but doing so in a way that allows us to make an comparison with the provided sickness & absence data.

It must be noted that a mistake in this process led to the number of stress-related absences per FTE being overstated for SWAS in the April 2018 print edition of The Bristol Cable. These figures were rectified in the online copy and a correction will be published in the following edition.

Other steps taken to clean & align the data

The staff groups did not directly match between and within data supplied by NBT & UHBT. This was easily fixed through text replacement. Stress & Anxiety absences were coded slightly differently. This was easily fixed through text replacement.

There were a significant quantity of Absence Reasons that were very closely aligned. These were recoded in order to reduce the number of categories. There were also slight differences in the way similar absence types were recorded by SWAS & NBT. These differences were manually aligned. All code used to align the absence reasons is available on Gitlab.

The absence data for UHBT was censored if the number of absence in a month/staff group combination was below 5. This was done to protect individual staff members. There were 9 such cases where this happened. For these cases, the average of two days was imputed. This was calculated as (1 + 2 + 3 + 4) / 4 = 2.5, rounded down to 2.

When calculating the top absence reasons we excluded maternity, leave and jury service and others, as we were purely looking at illness related absences.

The full list of exclusions is available on Gitlab.

Quality Control

SWAS data was QC'd by creating a pivot table on the source data, then summing the number of days absent and counting the number of records. The absence rates per FTE were then calculated by hand for a few data points.

NBT Data was QC'd by checking the summed absence days after coding against the totals provided in the source data, then calculating the absence rates by hand for a few data points.

UHBT Data was QC'd against the raw data and several data points were checked.

In [36]:
HTML("""<script>
  function code_toggle() {
    if (code_shown){
      $('div.input').hide('500');
      $('#toggleButton').val('Show Code')
    } else {
      $('div.input').show('500');
      $('#toggleButton').val('Hide Code')
    }
    code_shown = !code_shown
  }

  $( document ).ready(function(){
    code_shown=false;
    $('div.input').hide()
  });
</script>
<form action="javascript:code_toggle()"><input type="submit" id="toggleButton" value="Show Code"></form>""")
Out[36]:
In [ ]: