Lessons learned: use of WGS in real-time investigation of suspected intrahospital SARS-CoV-2 outbreaks

Background: Severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) has been a continuing source of hospital-acquired infection and outbreaks. At Akershus University Hospital in Norway, traditional contact tracing has been combined with whole-genome sequencing (WGS) surveillance in real-time to investigate potential hospital outbreaks.
Aim: To describe the advantages and challenges encountered when using WGS as a real-time tool in hospital outbreak investigation and surveillance during the SARS-CoV-2 pandemic.
Methods: Routine contact tracing in the hospital was performed for all healthcare workers (HCWs) who tested positive for SARS-CoV-2. Viral RNA from all positive patient and HCW samples was sequenced in real-time using nanopore sequencing and the ARTIC Network protocol. Suspected outbreaks involving five or more individuals with viral sequences were described.
Findings: Nine outbreaks were suspected based on contact tracing, and one outbreak was suspected based on WGS results. Of these ten, five outbreaks were confirmed, two outbreaks were supported but could not be confirmed by WGS with high confidence, one outbreak was found to consist of two different lineages, and two outbreaks were refuted.
Conclusions: WGS is a valuable tool in hospital outbreak investigations when combined with traditional contact tracing. Inclusion of WGS data improved outbreak demarcation, identified unknown transmission chains, and highlighted weaknesses in existing infection control measures.

Introduction
During the ongoing coronavirus disease 2019 (COVID-19) pandemic, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has been a continuing source of hospital-acquired infection and outbreaks [1–5]. Frequent viral transmission in the community makes it challenging to separate external introduction of the virus from intrahospital transmission. As reported by the World Health Organization's weekly epidemiological updates on COVID-19, infection rates and circulating virus variants have been changing throughout the pandemic. There have been uncertainties regarding the transmission potential of each variant and the adequacy of different infection control measures [6]. As the pandemic evolved, detailed outbreak investigation and surveillance have been critical to inform and adapt sufficient, but not excessive, infection control measures.
The use of whole-genome sequencing (WGS) in combination with epidemiological data has been shown to provide a more detailed picture of transmission [2,5,7–11], and enable rapid phylogenetic analyses leading to timely and improved infection control measures [12,13]. However, previous studies have generally been retrospective and covered short time periods.
Akershus University Hospital (Ahus), Lørenskog, Norway is a local hospital serving approximately 10% of the Norwegian population. At Ahus, viral genomes from all eligible patients and employees testing positive for SARS-CoV-2 have been sequenced continuously since February 2021. The sequences have been used to produce phylogenetic trees, identifying clusters of closely related viral genomes which may indicate intrahospital transmission. By combining phylogenetic information with epidemiological data, the hospital's infection control staff could adapt infection control measures to match the ongoing situation. The aim of this study was to describe the advantages and challenges encountered when using WGS for real-time outbreak investigation and surveillance during the SARS-CoV-2 pandemic. Ten potential outbreaks were chosen to highlight the lessons learned.

Setting
Ahus provides all the standard specialties for somatic/emergency care hospitals, and specialist health services in mental health care and drug addiction. In 2020, the hospital had approximately 760 somatic/emergency care beds and 10,000 employees.

Contact tracing
During the pandemic, the results of SARS-CoV-2 polymerase chain reaction (PCR) tests from all Norwegian laboratories were recorded in a national registry. In addition, the municipal contact tracing teams performed contact tracing in the community around each positive case. All hospital employees and students who tested positive for SARS-CoV-2 were also obliged to contact the hospital's infection control staff, who initiated intrahospital contact tracing around each case. Close contacts of the infected person were quarantined (i.e. hospitalized patients were isolated under contact precautions and staff were quarantined at home). Contact tracing around a case covered the period from 48 h before the first symptoms or, if the infected person had a positive sample but no symptoms, from 48 h before the positive sample was taken. Close contacts were defined according to national guidelines as: (i) anyone who had been closer than 2 m from the infected person for >15 min; and (ii) anyone who had been in direct physical contact with the infected person or their body fluids without wearing personal protective equipment. Similar contact tracing was performed for all SARS-CoV-2-positive hospital visitors and for infected patients who had not been handled with isolation precautions from when they entered the hospital.
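The tracing rules above can be expressed compactly. The following is a minimal Python sketch, not the hospital's actual tooling; the names `tracing_start`, `Encounter` and `is_close_contact` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Look-back period stated in the text: 48 h before symptoms or positive sample.
TRACING_LOOKBACK = timedelta(hours=48)


def tracing_start(symptom_onset: Optional[datetime], positive_sample: datetime) -> datetime:
    """Start of the tracing window (hypothetical helper).

    Uses symptom onset when known, otherwise the time of the positive sample.
    """
    reference = symptom_onset if symptom_onset is not None else positive_sample
    return reference - TRACING_LOOKBACK


@dataclass
class Encounter:
    """One contact between a case and another person (hypothetical record)."""
    distance_m: float        # closest distance to the infected person
    duration_min: float      # time spent at that distance
    physical_contact: bool   # direct contact with the person or their body fluids
    wore_ppe: bool           # personal protective equipment worn during contact


def is_close_contact(e: Encounter) -> bool:
    """Apply criteria (i) and (ii) of the close-contact definition."""
    proximity = e.distance_m < 2 and e.duration_min > 15
    unprotected_contact = e.physical_contact and not e.wore_ppe
    return proximity or unprotected_contact
```

Either criterion alone is sufficient, so an encounter of exactly 15 min within 2 m does not qualify unless there was also unprotected physical contact.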
All close contacts were logged, allowing the infection control staff a detailed overview of possible transmission routes and hospital outbreaks based on epidemiological data. If SARS-CoV-2 was detected among two or more close contacts, the cases were considered an outbreak with probable direct transmission. Also, if a ward had several new cases <10 days apart, an outbreak was suspected even if close contact between the people involved could not be established. WGS of SARS-CoV-2 was then applied to refute or confirm transmission between the cases.
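The ward-level rule can be sketched as follows. This is an illustrative Python example only: the text says "several" new cases without giving a number, so the `min_cases=2` default is an assumption, and the function name `ward_outbreak_suspected` is hypothetical.

```python
from datetime import date


def ward_outbreak_suspected(case_dates, max_gap_days=10, min_cases=2):
    """Flag a ward when consecutive new cases are fewer than max_gap_days apart.

    min_cases is an assumed threshold (the source only says "several").
    Intended for min_cases >= 2.
    """
    dates = sorted(case_dates)
    run = 1  # length of the current chain of cases spaced < max_gap_days apart
    for previous, current in zip(dates, dates[1:]):
        if (current - previous).days < max_gap_days:
            run += 1
            if run >= min_cases:
                return True
        else:
            run = 1  # gap too long: start a new chain at the current case
    return False


# Two cases five days apart on the same ward raise suspicion.
print(ward_outbreak_suspected([date(2021, 1, 4), date(2021, 1, 9)]))
```

A flag from this rule only marks a suspected outbreak; as described above, WGS was then used to confirm or refute transmission.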
For this study, suspected outbreaks involving five or more individuals were chosen to illustrate different outbreak settings. The index patient was defined as the first acknowledged case in a suspected outbreak.

RNA extraction
Viral RNA was isolated from naso-/oropharyngeal swabs using NucliSENS easyMAG following the manufacturer's protocol for extraction of total nucleic acids from airway samples (bioMérieux, Marcy l'Etoile, France). SARS-CoV-2 was detected using qualitative reverse transcription PCR (RT-PCR) targeting the E gene based on Corman et al.'s method [14]. Cycle threshold (Ct) values were determined for all samples. Samples with high Ct values (>35) or samples that had been stored incorrectly were not sequenced. All positive samples and eluates were stored at -80 °C.

Sequencing
Sequencing was performed routinely twice per week with 48 samples per run. The total time from extraction of nucleic acid to the final analysed sequencing results was approximately 50 h. The selection of samples for sequencing was based on the current outbreak situation, where SARS-CoV-2-positive employees and hospitalized patients were prioritized. The nCoV-2019 sequencing protocol v3 (ARTIC Network) was used for library preparation and sequencing (https://www.protocols.io/view/ncov-2019-sequencing-protocol-v3-locost-bp2l6n26rgqe). The method uses tiled multiplex primers for direct amplification of cDNA. Samples sequenced before 13th October 2021 were amplified with the ARTIC v3 primer set, and the ARTIC v4 primer set was used from 14th October 2021 onwards. The annealing temperature was 63 °C. Cycle numbers were set to 30 for samples with a Ct value <30 and 35 for samples with Ct values between 30 and 35. The libraries were sequenced on a GridION sequencer (Oxford Nanopore Technologies plc, Oxford, UK). Consensus genomes with low coverage (<90%) were discarded.
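The Ct-based sample handling and the coverage filter described above can be summarized in a short Python sketch. The function names are hypothetical, and the coverage definition used here (fraction of consensus positions that are not `N`) is an assumption, as the text does not state how coverage was calculated.

```python
from typing import Optional

MAX_CT_FOR_SEQUENCING = 35   # samples with Ct > 35 were not sequenced
MIN_COVERAGE = 0.90          # consensus genomes below 90% coverage were discarded


def pcr_cycles(ct: float) -> Optional[int]:
    """Return the number of amplification cycles, or None if the sample is not sequenced."""
    if ct > MAX_CT_FOR_SEQUENCING:
        return None
    return 30 if ct < 30 else 35


def coverage(consensus: str) -> float:
    """Fraction of positions with a called base (assumed definition: not 'N')."""
    if not consensus:
        return 0.0
    called = sum(1 for base in consensus.upper() if base != "N")
    return called / len(consensus)


def keep_consensus(consensus: str) -> bool:
    """Apply the 90% coverage threshold to a consensus genome."""
    return coverage(consensus) >= MIN_COVERAGE
```

For example, a sample with Ct 32 would be amplified with 35 cycles, while a sample with Ct 36 would be skipped.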

Bioinformatics
The bioinformatics pipeline for analysis of viral amplicon data sequenced with nanopore technology developed by the ARTIC Network (https://github.com/artic-network/fieldbioinformatics) was used to generate consensus genomes. Pango nomenclature (v4.0.6) was used for lineage assignment [15]. Multiple sequence alignments of consensus genomes were made using MAFFT (aligned to reference sequence MN908947.3), and phylogenetic trees were constructed with FastTree in Geneious Prime (v.2022.1.1; Biomatters). Single nucleotide polymorphisms (SNPs) were visualized using Geneious Prime. Consensus genomes sequenced from the local area (Viken County, Norway) during the period were downloaded from GISAID (https://www.gisaid.org/) and used as the phylogenetic background.
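SNP differences in this study were inspected in Geneious Prime. As a stand-in illustration of what an SNP difference between two aligned consensus genomes means, the Python sketch below counts mismatching positions in two sequences taken from the same multiple sequence alignment. Ignoring positions with gaps or ambiguous bases (such as `N` in dropout regions) is an assumption of this sketch, and `snp_differences` is a hypothetical name.

```python
def snp_differences(seq_a: str, seq_b: str) -> int:
    """Count positions where two aligned sequences carry different unambiguous bases.

    Both sequences must come from the same alignment and have equal length.
    Positions with gaps or ambiguity codes in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    unambiguous = set("ACGT")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in unambiguous and b in unambiguous and a != b
    )


# One true substitution; the N and the gap are not counted.
print(snp_differences("ACGTNCGT", "ACGAACG-"))
```

Skipping uncalled positions means that a dropout region in one sample does not inflate its distance to otherwise identical genomes.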

Ethics
The study was approved by the Regional Committee for Medical and Health Research Ethics (Ref. No. #159268) and the local Data Protection Officer (2020_171). The employees were given written information and the opportunity to refuse to participate in the study. Data were recorded as part of the hospital's routine for outbreak investigations, as authorized by the institutional infection control programme and the Norwegian regulation of infection control in the healthcare service (FOR-2005-06-17-610).

Results
The major outbreaks were identified in 2021. In total, 729 HCWs at Ahus tested positive for SARS-CoV-2 in 2021. Of these, 513 samples were analysed at Ahus, and 429 of these samples were sequenced successfully. Ten suspected outbreaks based on contact tracing and/or WGS results were further described; of these, nine were identified by contact tracing and one was identified by WGS (Table I).
Affected HCWs and patients with sequenced viral genomes are presented in Table S1 (see online supplementary material). A phylogenetic tree with samples from potential outbreaks from January to April 2021 is presented in Figure 1. All viral genomes sequenced from patients (N=13) and HCWs (N=94) during this period, together with background genomes from GISAID (N=97), were included in the tree. A phylogenetic tree with viral genomes from potential outbreaks from September to December 2021 is presented in Figure 2. All samples sequenced at Ahus in the period were included in the tree. From February to August 2021, no potential outbreaks met the study criteria of five or more successfully sequenced samples.
The individual outbreaks are detailed below. (Suspected) Outbreaks A–I were identified through contact tracing, and (suspected) Outbreak J was identified through WGS surveillance (see Table S1, online supplementary material for details).

Outbreak A
January 2021: 11 HCWs and four patients from four different wards were suspected to be involved based on contact tracing. Eight samples were sequenced successfully. All samples were assigned lineage B.1.36.21 and showed no SNP differences. WGS confirmed the outbreak.

Outbreak B
January 2021: This potential outbreak involved five HCWs and three patients from five different wards. Six samples were sequenced successfully. Samples from two patients (P4 and P6; Table S1, see online supplementary material) were assigned lineage B.1.1.333 and showed no SNP differences. The remaining samples were assigned B.1.36.21 (N=4). P5 had one SNP difference from the HCW samples.
P4 and P6 had been on the same ward. P6 had also been on the same ward as the three HCWs. P5 had been exposed to two of the HCWs. WGS unravelled two smaller hospital outbreaks, separating P4 and P6 from HCW6, HCW7, HCW8 and P5.

Outbreak C
January 2021: 16 HCWs and seven patients from five different wards were suspected to be involved. Nineteen samples were sequenced successfully. All samples were assigned lineage B.1.1.333. Three HCWs (HCW11, HCW15 and HCW21) and one patient (P10) had one SNP difference compared with the main cluster haplotype consisting of the remaining samples (N=15). HCW92 was from a ward not suspected to be part of the outbreak, and was therefore not identified by contact tracing. HCW92 had symptoms 9 days after the suspected index sample, and showed no SNP differences from the main cluster haplotype. As several of the patients in the outbreak were transferred to the ward where HCW92 worked, this HCW may have been in contact with one or more patients from the outbreak. If so, this was after the patient(s) were isolated, and with the use of protective equipment. One background sequence from GISAID (community sample) grouped with the outbreak. However, this was collected 1 month after the suspected index sample. WGS revealed one outbreak larger than defined by contact tracing, involving one more ward than suspected.

Outbreak D
The individual viral genomes were not analysed thoroughly at SNP level in real-time due to the large sample volume and limited time resources. All viral genomes clustered together and were considered to be part of the same outbreak. Two samples for P15 were collected over two consecutive days, showing a variation of two SNPs. Two samples (HCW93 and P77), not found by contact tracing, were included in the outbreak based on the WGS results. Contact tracing was performed after WGS, and connections between HCW93 and the outbreak were identified. By comparing data from hospital contact tracing with community contact tracing, contact was also found between the outbreak and community cases. In retrospect, three HCWs (HCW27, HCW30 and HCW31) could be excluded from the outbreak, as they differed by at least five SNPs from all other suspected outbreak sequences. No clades with five or more identical (one or fewer SNP differences) sequences were found.
Why did this outbreak become so extensive? The index patient (P30) was asymptomatic upon admission and had severe immunodeficiency. Some patients were admitted to an intensive care unit, which transfers patients to many different wards, facilitating transmission to many units. Several patients had haematologic cancer, which can disguise symptoms and predispose to long-lasting viral expression [16]. Retrospectively, a connection to a previous outbreak was identified, where the same index patient (P30) had caused a smaller outbreak in August 2021. The patient was readmitted and caused Outbreak D. Later, P30 also infected HCW93 on a different ward 4 weeks after initial sampling.
WGS unravelled a connection between three outbreaks, probably caused by one immunodeficient patient. The patient was contagious for a very long time, but contact precautions were stopped after 2 weeks when there were no COVID-19 symptoms. The outbreak was shown to include community cases. Confirmation by WGS of all transmissions was challenging due to the high number of suspected cases and SNP variations.

Outbreak E
[...] haplotype. HCW58 and P41 were identical and shared two SNPs. Based on WGS analysis, three additional samples (HCW94, P78 and P79) were found to be identical to the remaining samples (N=35). However, they had no known close contact with other infected patients or employees. Due to the high number of cases, hospital data were compared with community contact tracing, identifying possible transmission between outbreak and community cases.
Two wards and several units with employees who worked with diagnostics or facilities management were involved. The outbreak occurred after extensive vaccination of both HCWs and patients, possibly masking some of the common COVID-19 symptoms. Twenty-eight HCWs and 25 patients had received two or more doses of the COVID-19 mRNA vaccine. WGS confirmed one outbreak. However, two employees were excluded, and the outbreak was shown to include community cases.

Outbreak F
November 2021: 17 HCWs and 16 patients from six different wards were suspected to be involved. Thirty-five samples were sequenced successfully and were assigned lineage AY.112. Five HCWs (HCW65, HCW67, HCW71, HCW72 and HCW78) and four patients (P55, P56, P58 and P65) had one SNP difference compared with the main cluster haplotype, whereas HCW75 and P61 had two SNP differences and P67 had four SNP differences. P81 was identical to the remaining samples (N=23) based on WGS analysis, while contact tracing could not establish a connection to the outbreak.
One HCW (HCW77) from a ward that had not previously been involved in the outbreak had a viral sequence identical to that of one of the patients. HCW77 had tended to P58 while the patient had been under contact precautions. WGS confirmed one outbreak, with six wards included. However, one patient (P67) was excluded.

Outbreak G
November 2021: Three HCWs and three patients were suspected to be involved. All six samples were sequenced successfully and showed no SNP differences (lineage AY.127). WGS confirmed one outbreak.

Outbreak H
[...] 617.2, respectively). Based on the WGS results, this outbreak was reduced to a possible transmission between two HCWs, with no patients involved.

Outbreak I
December 2021: Two HCWs and three patients were suspected to be involved. Five samples were sequenced successfully, and all were assigned lineage AY.127. The sequences were identical apart from a drop-out region in P73 and P74, probably due to suboptimal sequencing primers for this lineage. HCW95, who had no known contact with the outbreak, had an identical viral sequence to the main cluster haplotype. In the phylogenetic tree shown in Figure 2, the samples with dropout regions grouped with community samples, and the remaining samples (HCW85, HCW86 and P75) grouped with two HCWs and two community samples. In real-time, this connection was not found. In retrospect, no contact was found between these new samples and the suspected outbreak.
WGS results supported contact tracing results in real-time. However, lineage AY.127 was common in the community at that time, and multiple introductions cannot be ruled out.

Outbreak J
December 2021: Five HCWs and one patient were suspected to be involved based on WGS surveillance. The samples were assigned lineage BA.1.21. Two HCWs (HCW89 and HCW90) shared one SNP difference compared with the main sequence for Outbreak J. The lineage was common in the community at the time, and showed low SNP diversity. There was no known contact between the HCWs and the patient. The outbreak was refuted based on contact tracing and the high prevalence of BA.1.21 in the community.

Discussion
This study shows that combining WGS with traditional contact tracing in a hospital gives a more detailed picture of the outbreak situation. Of the nine suspected outbreaks from contact tracing, five outbreaks were confirmed, two outbreaks were supported but could not be confirmed with high confidence, one outbreak was found to consist of two outbreaks, and one outbreak was refuted based on WGS results. New information was found in three of the five confirmed outbreaks. The suspected outbreak based on WGS (Outbreak J) was refuted and interpreted as originating from different community sources, as no known contact in the hospital was detected during contact tracing. In total, four new possible transmissions were detected, and four transmissions were refuted.
The combination of contact tracing and WGS provided high-resolution outbreak investigations that assisted in outbreak demarcation. In periods with high prevalence, contact tracing can potentially link HCWs with unrelated infection sources. In these cases, WGS can be used to exclude individuals from outbreaks. A positive test for SARS-CoV-2 and known contact are insufficient to prove transmission, as illustrated by Outbreak H, where five individuals had known contact and WGS results revealed four different lineages. In an English study, a suspected outbreak in a paediatric general surgical ward was refuted due to WGS results, excluding the need to change the infection control measures [13].
For infection control staff, it is vital to know whether the infection control measures in place in the hospital are working. Hence, refuting a hospital outbreak may serve to inform decisions on whether or not to keep existing routines or to increase costly control measures to prevent in-hospital transmission. In cases with simultaneous import of infection from multiple sources outside the hospital, different control measures designed to prevent this can be assigned, such as stricter visitor control, repeated testing after admission, and stricter work restrictions for HCWs with infected family members.
WGS surveillance can also discover unknown transmissions or outbreaks that were not detected by contact tracing alone. Outbreaks C and D illustrate how WGS can indicate possible shortcomings of infection control measures in a specific ward by establishing a link between a patient under contact precautions and a HCW tending to this patient. This study also found a potential outbreak (Outbreak J) based on WGS surveillance alone. However, no contact was reported between these individuals. The WGS surveillance detected samples from primary care with viral sequences identical to the main cluster haplotypes. In Outbreak E, the hospital contact tracing found connections to the outbreak, showing that WGS surveillance can identify unknown transmission. Similar results have been described previously [2].
There are some challenges when using WGS to confirm or identify new outbreaks. One is setting a cut-off for the number of SNP differences allowed within an outbreak. Diversity in the SARS-CoV-2 genomes was investigated by identifying SNP differences. Previous studies have reported a cut-off number of SNP differences of up to two SNPs [10–12,17]. In outbreaks involving fewer individuals over a short timeframe, a cut-off of one SNP difference was found to be efficient as long as the lineage showed some SNP variation in the community, as illustrated in Outbreaks A–C and G–I. However, the present study found that SNP differences may accumulate in outbreaks spreading over longer periods and involving many individuals. In these cases, a cut-off at one SNP difference was considered to be too strict, potentially resulting in the false exclusion of samples. A cut-off of two SNP differences was found to be more reasonable in Outbreaks E and F. Suspected Outbreak D contained sequences with higher SNP diversity than Outbreaks E and F. A possible explanation could be that it involved patients with immunodeficiency, which is associated with rapid accumulation of mutations [18–20]. A cut-off was not set in real-time for this outbreak, but the three samples with five SNP differences could have been excluded from the outbreak with high confidence.
An alternative to investigating SNP differences is to examine study-unique variants, as described by Løvestad et al. [2]. However, at the time of their study, only a few SARS-CoV-2 genomes from Norway (N=73) were uploaded in GISAID. Now, the GISAID database contains over 40,000 genomes collected in 2021 from Norway, making study-unique variants an unsuitable method. SNP differences were therefore used when analysing phylogeny.
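The effect of the cut-off can be illustrated with the SNP differences reported for Outbreak F in the Results. The Python sketch below is illustrative only: it lists just the samples that differed from the main cluster haplotype, and the names `OUTBREAK_F_SNPS` and `split_by_cutoff` are hypothetical.

```python
# SNP differences from the main cluster haplotype, as reported for Outbreak F.
# The 23 samples identical to the main haplotype (0 differences) are omitted.
OUTBREAK_F_SNPS = {
    "HCW65": 1, "HCW67": 1, "HCW71": 1, "HCW72": 1, "HCW78": 1,
    "P55": 1, "P56": 1, "P58": 1, "P65": 1,
    "HCW75": 2, "P61": 2,
    "P67": 4,
}


def split_by_cutoff(snp_to_main, cutoff):
    """Split samples into (included, excluded) using a maximum SNP difference."""
    included = sorted(s for s, d in snp_to_main.items() if d <= cutoff)
    excluded = sorted(s for s, d in snp_to_main.items() if d > cutoff)
    return included, excluded


_, excluded_strict = split_by_cutoff(OUTBREAK_F_SNPS, cutoff=1)
_, excluded_relaxed = split_by_cutoff(OUTBREAK_F_SNPS, cutoff=2)
print(excluded_strict)   # ['HCW75', 'P61', 'P67']
print(excluded_relaxed)  # ['P67']
```

With a cut-off of one SNP, HCW75 and P61 would also be excluded, whereas a cut-off of two SNPs excludes only P67, which matches the exclusion reported for Outbreak F.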
Another challenge when using WGS to confirm or identify new outbreaks is low genetic diversity in the dominant viral variant in the community. Low genetic diversity combined with high infection rates makes it demanding to distinguish outbreak cases from sporadic cases. In Outbreak J, the viruses belonged to a variant (Omicron) that had been newly introduced in the community and had low SNP diversity, resulting in a potential outbreak identified by WGS surveillance but refuted based on contact tracing. Limited genetic diversity in outbreak genomes also hampered the reconstruction of individual transmission events in a Swiss outbreak report [21]. WGS analysis alone cannot be trusted to present a complete picture of an outbreak. Optimally, to discover potential outbreaks based on WGS, the outbreak should consist of more than two individuals, appear within a short time period, and be caused by a lineage with some SNP diversity in the community and low/no diversity within the outbreak.
In some cases, WGS indicated similar viral genomes, but contact tracing could not identify any connection. The lack of connection may be due to high prevalence in the community and/or undiscovered contact.
Interestingly, a patient in Outbreak D was found to have caused three outbreaks over 1 month. Outbreak D showcases how WGS can unravel the connection between different outbreaks defined by contact tracing. In Outbreaks C and D, a possible connection was identified between a HCW and a patient who had been under contact precautions. At the time, the staff used surgical face masks in the patient rooms, and respirators only if the patient was treated with a considerable oxygen flow (>6 L/min), if aerosol-generating procedures were performed, or if the HCW needed to stay close to the patient for >15 min. This highlights how WGS can be used to confirm theories of transmission where the strict definition of contact used in contact tracing falls short.
Communication of WGS results in a meaningful way is vital. To facilitate communication between the laboratory and hospital infection control staff, it may help to introduce a range of confidence levels (high probability, some probability or low probability of being a part of an outbreak). In cases with difficulties interpreting WGS results, contact tracing should take precedence. Using the Pango dynamic nomenclature also presents a communication challenge, as identical sequences can be assigned to different lineages if they are classified some time apart.
This study was limited to experience from one hospital in a region of Norway. However, the hospital covers approximately 10% of the Norwegian population and the experience is from 2 years. The analyses did not include samples not taken at the hospital or with high Ct values. Some cases were, therefore, not investigated with WGS. However, the analysed sequences give a representative overview of the different situations encountered during the pandemic.
In conclusion, WGS is a valuable tool in outbreak investigation in hospitals when combined with traditional contact tracing. Inclusion of WGS data improved outbreak demarcation, identified unknown transmission chains, and highlighted weaknesses in existing infection control measures.