
The Stable Shift Algorithm
stable-shift-algorithm.Rmd
library(autospc)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(scales)
library(DiagrammeR)Introduction
The autospc package implements the Stable Shift Algorithm for re-establishing control limits in statistical process control (SPC) analysis. This vignette describes the problem the algorithm addresses, sets out some useful terminology, and describes the algorithm.
The problem
A standard approach in SPC analysis for quality improvement goes as follows:
- Establish a baseline:
- gather or extract data over a period of time prior to a planned improvement initiative
- plot this data on a control chart, forming baseline control limits
- Extend the baseline limits into the future
- Add data to the chart as time progresses, without updating the control limits
This is exemplified in the following three charts.
facet_stages(
ed_attendances_monthly %>%
filter(row_number() <= 32L),
split_rows = c("Baseline" = 24L,
"Data added beyond baseline" = 28L,
"Further data added" = 32L),
x = Month_Start,
y = Att_All,
chartType = "C'",
periodMin = 24,
use_caption = FALSE,
x_break = 91.25,
x_date_format = "%Y-%b",
extend_limits_to = as.Date("2018-04-01"),
upper_annotation_sf = 0.825
)
#> Registered S3 methods overwritten by 'ggpp':
#> method from
#> heightDetails.titleGrob ggplot2
#> widthDetails.titleGrob ggplot2
As data is added, the process may shift to a higher or lower level. Indeed, this is often the aim of an improvement initiative. This prompts the question of whether and when the control limits should be re-established to better represent the new process. Whilst various textbooks and online resources offer opinions on this issue, there is no universally accepted approach. The Stable Shift Algorithm (SSA) offers an automated, consistent and rigorous approach to re-establishing control limits.
The Stable Shift Algorithm: Overview
The main idea of the SSA is to only re-establish limits where:
- There is evidence that the process has shifted to a new level
- This shift persists for long enough to compute new control limits
In other words, the SSA re-establishes limits at shift rule breaks, provided that the shift is not “transient” in some sense. Here “transient” means that once we calculate the new control limits, there is not a shift rule break back towards the original process. In the next two sections we make this idea precise and describe how it is operationalised in the SSA.
Some terminology
First, it is useful to introduce some terminology.
Rule-breaking run
A rule-breaking run is a run whose length is greater than or
equal to the threshold for a rule break (runRuleLength),
set to
by default in plot_auto_SPC().
Runs that are subsets of longer runs count here, so for example with
the default runRuleLength = 8, a run of length 10 actually
comprises three rule-breaking runs, commencing at the first, second and
third points in the length 10 run. In this example, the first
rule-breaking run has length 10, the second 9, and the third 8. The run
that commences on the fourth point of the length 10 run is not
rule-breaking, since it is of length 7 only.
Triggering rule break
In the SSA, a rule breaking run can trigger consideration of whether to re-establish limits. Such a run is referred to as a triggering rule break.
Candidate limits
In order to decide whether to re-establish control limits at a
triggering rule break, the SSA requires consideration of the set of
limits that would be established. These are referred to as
candidate limits until they are either rejected or accepted.
Candidate limits are formed from the first periodMin points
starting at the first point of the triggering rule break, and this
period is referred to as the candidate calculation period.
Opposing rule break
If there is a rule break within the candidate calculation period, and
that rule break is in the opposite direction to the triggering rule
break, it is referred to as an opposing rule break. We also
sometimes refer to such a rule break as a reversion, as in
reverting to the original limits. If the rule break only reaches the
runRuleLength threshold after the end of the candidate
calculation period, it is referred to as an overhanging
reversion.
Minimum period length
The SSA requires specification of a minimum number of data points to
be used for calculation of control limits,
.
Whilst those using SPC in practice may not often make such a minimum
explicit, in a way it is always there implicitly - nobody would compute
control limits from two data points would they? In fact, various authors
offer guidance on what such a minimum should be, with values ranging
from 17 to 25 points. In plot_auto_SPC(),
is specified by the periodMin argument, defaulting to
21.
The Stable Shift Algorithm: Details
The steps of the algorithm are as follows:
Algorithm counter initialised to the first data point.
Check there is sufficient data to form at least one set of limits. If a time series has fewer points than , then according to the SSA no control limits can be established. If a time series has length with , then there are sufficient data points to form one period. The SSA establishes limits calculated from the first points, with limits extended over the remainder of the data. However, in this case there are not sufficient data points to re-establish a second set of limits, so the SSA terminates after establishing this first set of limits.
From here on, therefore, we assume that the number of data points in the time series to be analysed is . After the first set of limits are established, the counter is set to the first point after the end of this first calculation period.
Main algorithm loop begins
If there are fewer than data points remaining from the counter, then there are insufficient data points to further re-establish limits, and the algorithm terminates.
If there are no further rule-breaking runs, then the algorithm terminates. If there is at least one rule-breaking run, the algorithm sets the counter to the first data point of the first such run, now labelled the triggering run.
Again check there is sufficient data remaining from the counter. If not, terminate. If so, candidate limits are formed from the first data points beginning at the counter.
-
Decide whether to accept or reject the candidate limits:
If there at least one opposing rule break commencing within the candidate calculation period, then the candidate limits are rejected, the counter is set to the next rule-breaking run (after the triggering rule break) against the prevailing limits, and the algorithm continues from (3).
If the final run of the candidate calculation period could become an opposing rule break with the addition of more data to the time series, then the candidate limits are rejected, the counter is set to the next rule-breaking run (after the triggering rule break) against the prevailing limits, and the algorithm continues from (3).
If neither of the conditions of (i) and (ii) are met, then the candidate limits are accepted, established from the start of the triggering rule break, and extended over the rest of the data. The counter is set to the first point after the end of the new calculation period, and the algorithm continues from (3).
The algorithm is visualised in the flow chart below.
grViz(autospc:::algorithm_flow_chart_string)Using the algorithm log
If you would like to view a log of the path through the flow diagram above for a given timeseries, there are two ways you can do this.
Print log to console
To print a log out to the console, set the verbosity
argument of plot_auto_SPC() to either 1 (for
basic log information) or 2 (for full detail).
plot_auto_SPC(
ed_attendances_monthly,
chartType = "C'",
x = Month_Start,
y = Att_All,
verbosity = 1,
x_break = 365,
x_date_format = "%Y-%b"
)
#>
#> C':
#>
#> Counter at 1, 2015-05-31:
#> - Counter initialised to 1.
#> - Sufficient data to form at least one period.
#>
#> Counter at 22, 2017-03-01:
#> - Main algorithm loop commenced.
#> - Sufficient data to proceed. Moving counter to the next shift
#> rule break, commencing at point 23.
#>
#> Counter at 23, 2017-03-31:
#> - There is a shift rule break commencing here, upwards from
#> the current centre line.
#> - Forming candidate limits.
#> - Candidate limits accepted, limits re-established.
#>
#> Counter at 44, 2019-01-01:
#> - Sufficient data to proceed. Moving counter to the next shift
#> rule break, commencing at point 46.
#>
#> Counter at 46, 2019-03-01:
#> - There is a shift rule break commencing here, upwards from
#> the current centre line.
#> - Forming candidate limits.
#> - Candidate limits accepted, limits re-established.
#>
#> Counter at 67, 2020-12-01:
#> - Sufficient data to proceed. Moving counter to the next shift
#> rule break, commencing at point 71.
#>
#> Counter at 71, 2021-03-31:
#> - There is a shift rule break commencing here, upwards from
#> the current centre line.
#> - Forming candidate limits.
#> - Candidate limits accepted, limits re-established.
#>
#> Counter at 92, 2023-01-01:
#> - Insufficient remaining data for further re-establishment of
#> limits.
Save log to a file
To save the log to a file, provide a file path using the
log_file_path argument of plot_auto_SPC(). If
the file path you provide ends ".rds" the log will be saved
to an rds file, if it ends ".csv" the log will be saved to
a csv file.
The data frame that is saved has one row for each log entry, and contains the following columns:
-
counter: the counter value (row number of the dataframe passed to thedfargument ofplot_auto_SPC()) that the log entry refers to -
x: the value of thexcolumn at this counter value (the column passed asxtoplot_auto_SPC()) -
log_entry: the log entry, recorded in shorthand (see below) -
interpretation: text interpretation of the log entry
Note that full log details are saved, regardless of the value of the
verbosity argument.
Log shorthand
The log is populated as the algorithm executes, using a shorthand
notation described in the following table. This shorthand forms the
content of the log_entry column of the log dataframe
exported to log_file_path.
log_exp_tab <- autospc:::log_explanation_table| algorithm_step | shorthand | explanation |
|---|---|---|
| 1 | 0100 | Initialise algorithm. Counter = 1. |
| 2 | 02xx | Check whether there are sufficient data to form at least one set of limits. |
| x = 00: Yes | ||
| x = 10: No | ||
| 3 | 0300 | Main algorithm loop begins. |
| 4 | 04xxyy | Check whether there are sufficient data to proceed (Yes/No), and find subsequent rule breaks. |
| xx = 00: Yes - next rule break within current run | ||
| xx = 01: Yes - next rule break beyond current run | ||
| xx = 10: No | ||
| yy = position of next rule break. | ||
| 5 | 05xxyy | Check whether there are any subsequent rule breaks. |
| xx = 00: Yes | ||
| xx = 10: No | ||
| yy = 01: Next rule break downwards | ||
| yy = 10: Next rule break upwards | ||
| 6 | 06xxyz | Check whether there are sufficient data to proceed. |
| xx = 00: Yes | ||
| xx =10: No | ||
| Examine candidate limits. | ||
| y = 0: No opposing rule break | ||
| y = 1: At least one opposing rule break. | ||
| z = 0: Final run does not prevent re-establishment of limits | ||
| z = 1: Final run prevents re-establishment of limits. | ||
| 7 | 07xx | Decide whether to re-establish limits. xx = 00: Yes, xx = 10: No. |