Factors affecting turnaround time of SARS-CoV-2 sequencing for inpatient infection prevention and control decision making: analysis of data from the COG-UK HOCI study

Background: Barriers to rapid return of sequencing results can affect the utility of sequence data for infection prevention and control decisions. Aim: To undertake a mixed-methods analysis to identify challenges that sites faced in achieving a rapid turnaround time (TAT) in the COVID-19 Genomics UK Hospital-Onset COVID-19 Infection (COG-UK HOCI) study. Methods: For the quantitative analysis, timepoints relating to different stages of the sequencing process were extracted from both the COG-UK HOCI study dataset and surveys of study sites. Qualitative data relating to the barriers and facilitators to achieving rapid TATs were included from thematic analysis. Findings: The overall TAT, from sample collection to receipt of sequence report by infection control teams, varied between sites (median 5.1 days, range 3.0–29.0 days). Most variation was seen between reporting of a positive COVID-19 polymerase chain reaction (PCR) result to sequence report generation (median 4.0 days, range 2.3–27.0 days). On deeper analysis, most of this variability was accounted for by differences in the delay between the COVID-19 PCR result and arrival of the sample at the sequencing laboratory (median 20.8 h, range 16.0–88.7 h). Qualitative analyses suggest that closer proximity of sequencing laboratories to diagnostic laboratories, increased staff flexibility and regular transport times facilitated a shorter TAT. Conclusion: Integration of pathogen sequencing into diagnostic laboratories may help to improve sequencing TAT to allow sequence data to be of tangible value to infection control practice. Adding a quality control step upstream to increase capacity further down the workflow may also optimize TAT if lower quality samples are removed at an earlier stage.


Introduction
The severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) pandemic has highlighted the utility of large-scale genomic sequencing to influence infection prevention and control (IPC) decisions [1,2]. While the technology to sequence pathogens rapidly using next-generation sequencing has been available for some time, its use has primarily been limited to genomic surveillance and retrospective transmission studies, often performed in large, centralised reference laboratories.
During the pandemic, the COVID-19 Genomics UK (COG-UK) Consortium established a network of sequencing hubs, pioneering a decentralised and distributed model of SARS-CoV-2 sequencing from National Health Service (NHS) hospitals [3]. The COG-UK Hospital-Onset COVID-19 Infection (COG-UK HOCI) study was nested within the COG-UK network, with the aim of assessing the impact of sequencing and its turnaround time (TAT) on several IPC outcomes [4]. The present authors recently reported that the likelihood of SARS-CoV-2 sequencing informing the IPC response to hospital-onset COVID-19 infection was dependent on the return of results within 5 days [5].
The time taken to return a potentially actionable sequence report to the IPC team is dependent on a variety of factors. For this paper, the authors further interrogated the data from the COG-UK HOCI study, alongside additional datapoints, in a post-hoc mixed-methods analysis, with the aim of identifying barriers to the achievement of a rapid sequencing TAT.

Background and design of the COG-UK HOCI study
The COG-UK HOCI study was a prospective non-randomised trial to evaluate the implementation and impact of SARS-CoV-2 sequencing on IPC practice. The study was approved by the National Research Ethics Service Committee – Cambridge South (REC 20/EE/0118) [4]. The study ran from December 2020 to April 2021 across 14 UK acute NHS hospital groups. Each recruiting site was linked to one of 11 sequencing laboratories where genomic sequencing took place.
The COG-UK HOCI study was split into baseline, rapid and longer turnaround phases to evaluate whether rapid sequencing (i.e. within 48 h) could improve IPC decision making in comparison with a longer TAT (5–10 days), akin to using a centralised sequencing laboratory. Possible HOCIs were identified, and the respective samples were sent to the designated sequencing laboratory. A bespoke sequence report tool (SRT) was used to communicate the result to the IPC team for prospective action [6]. The SRT integrates genomic and epidemiological data from HOCIs to provide a one-page report identifying closely matched sequences within the hospital and at ward level, and assigns a probability estimate for nosocomial infection. Samples with genomic coverage <90% could not be used to generate an SRT [5,6].
Parallel independent quantitative and qualitative data collection and analysis were performed with subsequent integration of findings.

Quantitative data extraction and analysis
For each sample, dates and times were extracted from the COG-UK HOCI study dataset for the following timepoints: (i) 'COVID-19 sample [taken] to confirm diagnosis'; (ii) 'COVID-19 result reported'; (iii) 'Sequence report generation'; and (iv) 'Receipt of sequence report by IPC team' (Figure 1a), in addition to patient study identifier, COG-UK ID, study site, and the reason the sequence was not returned within expected timeframes. Genomic coverage was extracted from the Cloud Infrastructure for Microbial Bioinformatics (CLIMB), and matched to each sample by COG-UK ID, where available.
Of the 2170 samples in the extract, only samples from the rapid phase of the COG-UK HOCI study were evaluated, when sites attempted to return an SRT within 48 h of sample collection (N=947, Figure 1b). Missing times for 'Sequence report generation' were estimated based on the 'Receipt of sequence report by IPC team' timepoint. If 'Receipt of sequence report by IPC team' was on the same date as 'Sequence report generation', corresponding missing 'Sequence report generation' times were replaced with either 00:00 if 'Receipt of sequence report by IPC team' was before 02:00 (N=4/429, 0.9%), or 07:00 if 'Receipt of sequence report by IPC team' was after 02:00 (N=288/429, 67.1%). All of the other missing times, including for the other timepoints, were replaced with 12:00 providing the dates were available, with sensitivity analyses undertaken to assess the impact of missing data imputation (Figure S1, see online supplementary material). Samples where duration timepoints were unfeasible were either excluded from the analysis of their respective phase (N=67/947, 7.1%), or corrected where additional data were available from the site survey (N=7/947, 0.7%).
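The imputation rule for missing 'Sequence report generation' times can be illustrated with a minimal Python sketch. This is not the authors' code (the study analysis was run in R); the function name, argument names and the treatment of a receipt time of exactly 02:00 (not specified in the text, and assigned to the 07:00 branch here) are assumptions made for illustration only.

```python
from datetime import date, datetime, time
from typing import Optional


def impute_generation_time(
    gen_date: Optional[date],
    gen_time: Optional[time],
    receipt: Optional[datetime],
) -> Optional[datetime]:
    """Return a datetime for 'Sequence report generation' (hypothetical helper).

    Mirrors the rule described in the text:
    - a recorded time is kept as it is;
    - if the time is missing and the report was received on the same date,
      use 00:00 when receipt was before 02:00, otherwise 07:00
      (a receipt at exactly 02:00 going to 07:00 is an assumption);
    - any other missing time becomes 12:00, provided the date is known.
    """
    if gen_date is None:
        return None  # nothing can be imputed without a date
    if gen_time is not None:
        return datetime.combine(gen_date, gen_time)
    if receipt is not None and receipt.date() == gen_date:
        if receipt.time() < time(2, 0):
            return datetime.combine(gen_date, time(0, 0))
        return datetime.combine(gen_date, time(7, 0))
    return datetime.combine(gen_date, time(12, 0))
```

For example, a report generated on 5 January with no recorded time and received at 09:15 the same day would be assigned a generation time of 07:00, whereas one received the following day would be assigned 12:00.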
Additional data were requested from COG-UK HOCI study sites by e-mail invitation to complete a survey. Information requested included the type of sample received from the diagnostic laboratory (i.e. fresh unextracted or residual nucleic acid), frequency and method of transport between diagnostic and sequencing laboratories, number of sequencing runs per day, sequencing platform used, and availability of additional timepoints during the sequencing process (dates and times for when each sample arrived at the sequencing laboratory, when the sample was put on the sequencer, and when analysis of raw sequence data was commenced to generate a consensus SARS-CoV-2 sequence from each sample) (Figure 1a). Seven of the 11 sequencing laboratories responded, and six agreed to provide additional timepoints where available. The sites that responded, and their median overall TATs, were Sites E (5.7 days), L (3.0 days), J (5.4 days), M (5.0 days), K (6.0 days), I (4.1 days) and A (11.9 days) (Table S1, see online supplementary material). These sequencing laboratories processed 444 of the COG-UK HOCI rapid phase samples (N=444/947, 46.9%), and just over half had an SRT returned (N=240/444, 54.1%) (Figure 1b). Reliable timepoints were available for 'Arrival at sequencing laboratory' and 'Time started on sequencer' for all six laboratories, which allowed deeper analysis of the sequencing phase from the COG-UK HOCI study dataset (Figure 1a). The sequencing laboratory for Sites E, J and K was only able to provide the dates for both of these timepoints, so times were estimated based on their standard practice. Although Site H responded to the survey, it was excluded from the analysis as it did not sequence any samples successfully within the rapid phase of the COG-UK HOCI study. If the duration between timepoints was illogical (i.e. <0 h), the sample was excluded from analysis of that specific segment: PCR result to arrival at sequencing laboratory (N=3/444, 0.7%) and analysis (N=40/240, 16.7%).
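The per-segment exclusion of illogical durations can be sketched as follows. This is an illustrative Python example under stated assumptions, not the study's own code: the function name and the use of `None` to mark an excluded or unavailable segment are hypothetical choices.

```python
from datetime import datetime
from typing import Optional


def segment_hours(start: Optional[datetime], end: Optional[datetime]) -> Optional[float]:
    """Duration of one workflow segment in hours (hypothetical helper).

    Returns None when either timepoint is missing or when the duration
    is negative (<0 h), so that the sample is dropped from the analysis
    of that specific segment only, not from the other segments.
    """
    if start is None or end is None:
        return None
    hours = (end - start).total_seconds() / 3600
    return hours if hours >= 0 else None
```

Applying the check to each segment separately means that a sample with, say, an implausible 'Arrival at sequencing laboratory' time can still contribute to the analysis segment if its later timepoints are consistent.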
Analysis of variance was used to test for significant differences between sites for each of the durations in Figure 1a. Analysis was performed using RStudio (2021.09.1+372 'Ghost Orchid' release).
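For readers unfamiliar with the test, the F statistic of a one-way analysis of variance comparing a duration across sites can be computed as in the sketch below. It is a plain-Python illustration with made-up values, not the R code used in the study; the function name is hypothetical, and the P-value (which requires the F distribution, available for example through `scipy.stats.f_oneway`) is deliberately not computed here.

```python
from statistics import mean
from typing import Sequence, Tuple


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F statistic, between-group df, within-group df).

    Each inner sequence holds the durations (e.g. in hours) for one site.
    """
    k = len(groups)                      # number of sites
    n = sum(len(g) for g in groups)      # total number of samples
    grand_mean = sum(sum(g) for g in groups) / n
    # Variation of the site means around the overall mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual samples around their own site mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Illustrative durations for three hypothetical sites (not study data)
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

A large F statistic indicates that the variation between site means is large relative to the variation within sites, which is the pattern summarised by the small P-values reported for each segment.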

Qualitative design and analyses
Using a purposive subsample of five heterogeneous study sites, 39 diverse professional participants, all directly involved in implementing the COG-UK HOCI study, took part in semi-structured interviews between 23rd December 2020 and 2nd June 2021. Data collection focused on their HOCI experiences. A balance of deductive and inductive thematic analysis was conducted by a team of trained qualitative analysts. In this article, only the findings that relate to the barriers and facilitators to achieving rapid TATs are reported. The main results are presented by integrating the quantitative data on TATs, and qualitative findings where appropriate.
Qualitative analyses illuminated potential reasons for the low rates of meeting expected TATs, highlighting the fragility of the whole rapid SRT pipeline, where problems in any single step had consequences for others: 'you only need one thing to go wrong and it sort of snowballs really'. In this way, TAT was vulnerable to the effects of COVID-19, although automated processes and effective communication could help. Table I details qualitative barriers and facilitators to meeting rapid TATs in each phase. Many rate-limiting factors in the diagnostic phase related to the impacts of COVID-19 at the time of data collection, and reported teething troubles with diagnostic processes. Facilitators to the diagnostic phase focused on the efficient transporting of swabs to the diagnostic laboratory, and automated systems to pick up HOCIs. For the reporting phase, barriers and facilitators related to both the generation and the dissemination of the report. Peer learning across sites generating reports and automated report generation were also notable facilitators.

Detailed breakdown of the sequencing phase
The sequencing phase was broken down using the additional timepoints from the sites that responded to the survey, in order to further analyse sequencing laboratory activity (Figure 3).

COVID-19 result reported to arrival of sample at sequencing laboratory
Large variability was seen between sites for the median duration between 'COVID-19 result reported' and 'Arrival at sequencing laboratory' (median 20.8 h, range 16.0–88.7 h; P<0.0001; Table S2, see online supplementary material). The authors investigated whether the relative location of the diagnostic and sequencing laboratories could explain this variability. Site K was the furthest from its sequencing laboratory (~205 km), and had the longest median time of 88.7 h. Sites I and L had the shortest median times of 16.0 h and 19.0 h, respectively, and their diagnostic and sequencing laboratories were much closer; in addition, the frequency of transport was greater at Site L (Table S3, see online supplementary material). Qualitative analyses suggest that the proximity of laboratories reduced delays, as did regular scheduled transport and dedicated pick-up times (Table I). The ability of sequencing laboratory staff to work flexibly also enabled more rapid TATs.

Pre-sequencing
Pre-sequencing activity was calculated from 'Arrival at sequencing laboratory' through to 'Started on sequencer', and included extraction (if required), PCR, library preparation, and time between each process. All sequencing laboratories surveyed, except that for Site M, received fresh unextracted samples rather than residual nucleic acid. DNA quantification and normalisation was performed as a quality control (QC) step prior to library preparation at Sites L, M, I and A. Sites L, I and A performed DNA quantification on all samples, whilst Site M selected representative samples and their controls for testing. Site L was the only site that reported performing two library preparations per day. Only minimal differences were seen between the estimated library preparation times by sequencing platform (Table S2, see online supplementary material).
Figure 3 caption: Boxplots represent interquartile range (IQR)25, median and IQR75. For 3a, the y axis was broken in order to show outliers using the R package ggbreak [13]. (e) Median durations and number of samples for each stage of process from 'COVID-19 result reported' onwards for the samples processed within the rapid phase from the surveyed sites (N=444/947). As the 'Primary analysis' timepoint was not available for Sites E, J and K, the 'Sequence report generation' timepoint from the COVID-19 Genomics UK Hospital-Onset COVID-19 Infection study dataset was used in lieu, which corresponds to Figure 3c–e (Table S4, see online supplementary material).

On sequencer
The time spent on the sequencer was calculated from the 'Started on sequencer' timepoint through to the 'Primary analysis' timepoint, as the time the sequencing run was stopped was not readily available. The 'Primary analysis' timepoint was not available for Sites E, J and K, so the 'Sequence report generation' timepoint was used in lieu. The median time spent on the sequencer for samples where the SRT was returned was 17.1 h (range 5.7–62.6 h; P<0.0001) (Table S2, see online supplementary material). Site I had the shortest median time on the sequencer, and reported starting to sequence its samples at around 12:00–16:00 so that primary analysis could start the same evening. Site A had the longest median time spent on the sequencer, and reported having too many other samples to process, or having issues with the reporting tools CLIMB and GLUE (N=9/11, 81.8%) [7].

Analysis
The duration of analysis was calculated from 'Primary analysis' (or 'Sequence report generation' for Sites E, J and K as mentioned previously) through to 'Receipt of sequence report by IPC team'. The median duration of analysis across all sites was 4.6 h, with a range of 2.9–135.8 h (P<0.0001; Table S2, see online supplementary material). The sites that reported using DNA quantification as a QC step prior to library preparation had higher percentages of SRT return within 5 days than sites that did not have a QC step (Figure 3e).

Discussion
The COG-UK HOCI study found that returning an SRT within 5 days changed the actions of the IPC teams in approximately 20% of HOCIs [5]. This mixed-methods analysis found that many sites did not manage to return any of their SRTs within 5 days, and identified some of the challenges that sites faced.
As the greatest intersite variability was seen in the time between the diagnostic PCR result and the arrival of the sample at the sequencing laboratory, an obvious factor to optimise TAT would be to reduce the distance and/or increase the transport frequency between diagnostic and sequencing laboratories, as described in the qualitative data. Integrating sequencing into diagnostic laboratories could be an ideal solution, and would also facilitate the transfer of patient level data including current location and prior ward movements from patient administration systems, and provide the geotemporal data required for easy and rapid interpretation of sequence reports. Integrated laboratories have also been reported to increase regional and national processing power for the surveillance of antimicrobial resistance [8]. Where integration is not possible, reducing the distance between laboratories and regular dedicated transport times would allow laboratories to plan their workflows in order to optimise TAT and reduce the likelihood of missing samples.
The second greatest variability was seen in the median duration between the start of primary analysis and receipt of the SRT by the IPC team. As sites reported they were overwhelmed with processing other samples during the COVID-19 pandemic, adding a QC step may increase capacity further down the workflow; however, caution should be applied if cycle threshold (CT) values are used, given that significant variability between laboratories has been reported [9]. Additionally, if sites were able to run samples on the sequencer earlier in the day, the sequencing process could be stopped and its output analysed within the same day, allowing a shorter sequencing and analysis time.
Outside of the COVID-19 response, there have been successful reports of the use of rapid sequencing to influence the IPC response for other pathogens of interest, such as meticillin-resistant Staphylococcus aureus, Clostridioides difficile and vancomycin-resistant enterococci [10,11]. For each pathogen, laboratories would have to consider pathogen-specific factors which could affect TAT, such as the time required for culturing bacteria or the frequency of performing other tests, such as immunoassays. The pressure on the microbiology laboratory workload would also vary at different points of the year, and outside of a pandemic, would likely also have an impact on TAT.
This study was limited by missing and erroneous times within the datasets; however, the authors were able to mitigate this, in part, by correcting, estimating or excluding timepoints. In addition, the potential for volunteer bias within the survey data is recognised, as sites with shorter TATs were more likely to respond and provide additional data. Although the authors were unable to conclude whether genomic coverage was affected by either the sample type received by the sequencing laboratory, or the time between sample collection and extraction, it is well reported that RNA is at risk of degradation if samples are not processed promptly [12].
In conclusion, IPC interventions in response to presumed nosocomial transmission events are often resource intensive in terms of human, financial and operational impact, and thus practice developments which confirm or refute case linkage within a clinically meaningful time scale have the potential to be of great benefit to healthcare services.
These results present evidence supporting the integration of pathogen sequencing into diagnostic laboratories, in order for sequence data to be of tangible value to IPC practice.
Laboratories using rapid sequencing for IPC purposes may be able to utilise these findings to streamline and optimise their own workflows for SARS-CoV-2 and other pathogens. The challenges and optimal TATs of integrated sequencing for IPC use on a larger scale need further analysis for other pathogens, including whether challenges faced by sites would be similar outside of a pandemic if a short TAT is desired.