Volume 135, P157–162, May 2023


Recognition of hand disinfection by an alcohol-containing gel using two-dimensional imaging in a clinical setting

  • D. Figueroa (corresponding author)
    Graduate School of Engineering Science, Osaka University, Toyonaka, Japan
    Correspondence address: Graduate School of Engineering Science, Room J-205, 1-3 Machikaneyama, Toyonaka, Osaka 560-8531, Japan. Tel.: +81 6-6850-8330.
  • S. Nishio (corresponding author)
    Institute for Open and Transdisciplinary Research Initiatives, Osaka University, Suita, Japan
  • R. Yamazaki
    Institute for Open and Transdisciplinary Research Initiatives, Osaka University, Suita, Japan
  • E. Ohta
    Department of Infection Prevention and Control, Osaka University Hospital, Suita, Japan
  • S. Hamaguchi
    Department of Infection Prevention and Control, Osaka University Hospital, Suita, Japan
    Department of Transformative Analysis for Human Specimen, Graduate School of Medicine, Osaka University, Suita, Japan
    Division of Fostering Required Medical Human Resources, Center for Infectious Disease Education and Research, Osaka University, Suita, Japan
  • M. Utsumi
    Department of Health Science, Graduate School of Medicine, Osaka University, Suita, Japan
Open Access. Published: March 02, 2023. DOI: https://doi.org/10.1016/j.jhin.2023.01.021

      Summary

      Background

      Hand hygiene compliance is important for the prevention of healthcare-associated infections. The conventional method of measuring compliance with hand disinfection guidelines relies on an external observer watching staff, which introduces bias, and observations are only made for a set period of time. An unbiased, non-invasive automated system for assessing hand sanitization actions can provide a better estimate of compliance.

      Aim

      To develop an automated detector that assesses hand hygiene compliance in hospitals without bias from an external observer, can make observations at different times of the day, is as non-invasive as possible by using only one camera, and extracts as much information as possible from two-dimensional video footage.

      Methods

      Video footage was collected and annotated to determine when staff performed hand disinfection with alcohol-based gel from various sources. The frequency response of wrist movement was used to train a support vector machine to identify hand sanitization events.

      Findings

      This system detected sanitization events with an accuracy of 75.18%, a precision of 72.89%, and a recall of 80.91%. These metrics provide an overall estimate of hand sanitization compliance without bias due to the presence of an external observer while collecting data over time.

      Conclusion

      Investigation of these systems is important because they are not constrained by time-limited observations, are non-invasive, and they eliminate observer bias. Although there is room for improvement, the proposed system provides a fair assessment of compliance that the hospital can use as a reference to take appropriate action.


      Introduction

      Proper hand hygiene is an important factor in healthcare facilities. Improper hand disinfection is the most common route of pathogen transmission between patients and a major cause of healthcare-associated infections in hospitals, putting patients and healthcare workers at risk of infection [1,2]. Direct observation by a validated observer has been used as the gold standard for measuring hand hygiene compliance. Even with this method, inaccurate measurements can occur due to observation bias, caused by the presence of an observer influencing staff behaviour; observer bias, a systematic error caused by variations in the method of observation; and selection bias, caused by the systematic selection of certain settings in which to conduct the observation [3]. An ideal indicator is expected to measure hand hygiene performance in an unbiased manner, without affecting the behaviour of the observed individuals, and to capture cleansing even during complex care activities [3]. Several systems have already been developed to achieve this goal. Sayeed Mondol and Stankovic used wrist-wearable inertial sensors along with neural network techniques to detect handwashing [2]. However, asking healthcare workers to wear a sensor might be intrusive and introduce bias. Other approaches assume that medical staff perform hand disinfection at a specific location, such as a sink or alcohol-based gel dispenser. A study by Singh et al. used depth-imaging footage from sensors placed over hand hygiene dispensers and detected when they were used [4]. This achieved high accuracy but restricted disinfection detection to specific dispensers equipped with multiple sensors. Other studies combine both ideas and employ user identification using radio frequency identification tags on medical personnel, detecting which person approaches a dispenser or sink to measure compliance [5]. Most of the existing work on identifying and measuring hand hygiene compliance requires multiple sensors, which can clutter the hospital space, or requires staff to wear a device, leading to bias. Therefore, this project aimed to determine hand hygiene with the smallest possible number of sensors, while not requiring medical workers to wear a specific device.
      As there are many alcohol-based gel dispensers in the hospital, and medical staff may carry their personal gel bottles, placing cameras or detectors at every dispensing point to properly record hand hygiene behaviour would be difficult and expensive. Therefore, we propose the use of image data from a single camera pointing down the hallways to obtain a view showing multiple staff entering and exiting patients' rooms. This yields as much data as possible from a single camera, creates a non-invasive system, avoids cluttering the workplace with sensors, and reduces the likelihood of dust and biofilms accumulating on the equipment. The two-dimensional (2D) images we obtained are prone to missing data caused by occlusions or by the orientation of people relative to the camera, which can lead to false detections and problems in identifying hand disinfection actions, e.g. when people use the sanitizer gel sideways and we can only see one of their arms. Our work proposes a method to overcome these problems to some extent and to identify hand sanitization in multiple people using multiple disinfection sources. The objective of this project is not to have a system that can perfectly detect every disinfection activity, but to obtain a general idea of how many hand hygiene activities occur in the hospital. This can be used as an approximation so that the hospital can decide on actions to increase the compliance rate.

      Methods

      The current study proposed to use footage from a single camera placed near the ceiling of the intensive care unit (ICU) at Osaka University Hospital, to track each person and to detect when a person performs hand hygiene. The ICU was selected because of its high concentration of medical staff. Because patients in this ward are in a critical condition, hand hygiene is especially important there. This study was approved by the ethics committee of the Graduate School of Medicine, Osaka University (approval number: 20475). ICU staff were informed of the recordings and could use an opt-out procedure if they did not want to be recorded. A wide range of view allows more people to be captured in each image, which is desirable, but also makes it difficult to capture fine details such as fingers. Therefore, we focused on wrist motion, which can be identified with pose detection algorithms, and obtained motion patterns to classify whether a healthcare worker was performing hand sanitization. There are several challenges in hand sanitization detection with a 2D camera, such as the use of alcohol-based gel from different sources by the staff, movements while rubbing hands, ambiguous data caused by the position of the person with respect to the camera, which may allow detection of only a limited part of the body, or data loss because a person is too far away from the camera to be detected correctly. We have assumed that there is an inherent bias caused by the introduction of the camera; however, observer bias is avoided because the system always uses the same process. The Hawthorne effect causes short-term changes due to observation, but we assumed that this could be neglected because the camera was always present.

      Data acquisition

      To record the data, the minimal setup included a camera, a solid-state drive (SSD) to store the recordings, an NVIDIA Jetson Nano board to process the raw data from the camera, and a portable Wi-Fi device so that the research team could monitor any problem with the recording or handle any unexpected event. Camera placement was selected to capture a wide range of views from multiple employees working in different rooms (Figure 1).
      Figure 1. (Upper) Example of the range of view from the camera footage. (Lower) The three approximate zones extracted from one frame, which can be processed independently of each other.
      To obtain video footage, our research team programmed the system to record for five-and-a-half hours per day, divided into 30 min segments, over a three-week period. For this study, we used data from the first week of recording, from the perspective shown in Figure 1. On the first day, we had the device set to record for one-and-a-half hours starting at 08:30, 13:00, and 21:00. However, when recording at night, it was difficult to see a person's body and there were fewer people present than during the day, so the recording schedule was changed to 5.5 h starting at 08:30 to have similar lighting conditions and more personnel in the recordings. We obtained 83 videos with a total duration of about 41.5 h.
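      For clarity, the recording schedule described above amounts to splitting the daily 5.5 h window, starting at 08:30, into eleven 30 min segments. The short sketch below illustrates this segmentation only; the printed command and segment handling are assumptions for the example and not the software actually deployed on the Jetson board.

from datetime import datetime, timedelta

# Daily recording window used after the first day: 5.5 h from 08:30,
# divided into 30 min segments (11 segments per day).
START = datetime.strptime("08:30", "%H:%M")
TOTAL = timedelta(hours=5, minutes=30)
SEGMENT = timedelta(minutes=30)

def daily_segments(start=START, total=TOTAL, segment=SEGMENT):
    """Yield (segment_start, segment_end) time pairs covering the recording window."""
    t = start
    while t < start + total:
        yield t.time(), (t + segment).time()
        t += segment

if __name__ == "__main__":
    for begin, end in daily_segments():
        print(f"record 30 min clip from {begin} to {end}")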

      Person detection and pose recognition

      To detect hand hygiene actions, we need to recognize the people present in each frame. The perspective of the images means that people near the camera appear larger than those at the far end of the hall. This was problematic when using existing recognition algorithms, because most of the people far from the camera were not recognized, as less detail is visible in the distant areas. The output of the pose recognition software also contained a considerable amount of noise. For example, reflections on the floor were identified as segments of people, body parts were assigned to medical equipment, and two people were detected as one person when the individuals were too close to each other, making correct detection difficult.
      To overcome these inaccuracies, the video frames were divided into three overlapping zones (Figure 1), and OpenPose was used to obtain groups of keypoints referring to different body parts that the algorithm considers to belong to the same person [6]. Each keypoint contains an identifier, location information in the form of an x and y coordinate pair, and a third value corresponding to confidence. This process provided, for each zone, an output of multiple people that could be associated with the detections in the other zones, giving us more information than running the detector on the original frame.
      The recognition output from each zone was filtered so that valid recognitions included at least the upper half of the body, and the image was then reconstructed at the original size. Each of the recognized keypoints was transformed to the original frame size so that the results of each zone shared the same coordinate space. To assign keypoint groups from different zones to the same person, we compared the distance of each keypoint ID from one zone with the corresponding keypoint from another. If the total Euclidean distance between two groups from different zones was below a defined threshold, it was assumed that both keypoint groups referred to the same person. When keypoint groups from different zones recognized the same keypoint for the same person (e.g. two groups recognizing the left shoulder point), the coordinate value with the higher confidence was kept. If one group had keypoints that were not present in the other group, they were added to provide more information about the person's pose in the final output. This process was effective in obtaining more data from each frame (Figure 2).
      Figure 2. Comparison between processing the whole image and our method of processing smaller zones and reconstructing the result. (Left) Result when using the whole image. (Right) Result of our approach. The image on the right shows more detected people and more detected keypoints than the partial recognitions obtained from the complete frame.
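      To make the zone-splitting and keypoint-fusion step concrete, the sketch below shows one way to merge keypoint groups detected in different zones. It is a minimal illustration: the zone boundaries, the distance threshold, and the data layout (keypoint ID mapped to an x, y, confidence triple) are assumptions for the example, not the exact values or interfaces used in our implementation.

import numpy as np

# Hypothetical overlapping zones given as (x0, y0, x1, y1) crops of the frame;
# the actual boundaries depend on the camera view and are not reported here.
ZONES = [(0, 0, 1920, 500), (0, 350, 1920, 800), (0, 650, 1920, 1080)]
MERGE_THRESHOLD = 50.0  # total Euclidean distance in pixels; illustrative value

def to_frame_coords(keypoints, zone):
    """Shift zone-local keypoints (id -> (x, y, conf)) back to frame coordinates."""
    x0, y0, _, _ = zone
    return {kid: (x + x0, y + y0, c) for kid, (x, y, c) in keypoints.items()}

def group_distance(a, b):
    """Total Euclidean distance over the keypoint IDs shared by two groups."""
    shared = set(a) & set(b)
    if not shared:
        return float("inf")
    return float(sum(np.hypot(a[k][0] - b[k][0], a[k][1] - b[k][1]) for k in shared))

def merge_groups(a, b):
    """Fuse two keypoint groups: keep the higher-confidence coordinate, add missing ones."""
    merged = dict(a)
    for kid, (x, y, c) in b.items():
        if kid not in merged or c > merged[kid][2]:
            merged[kid] = (x, y, c)
    return merged

def fuse_zones(zone_detections):
    """zone_detections: one list of keypoint groups per zone, already mapped to
    frame coordinates with to_frame_coords(); returns one group per person."""
    people = []
    for groups in zone_detections:
        for g in groups:
            for i, p in enumerate(people):
                if group_distance(p, g) < MERGE_THRESHOLD:
                    people[i] = merge_groups(p, g)
                    break
            else:
                people.append(g)
    return people

      In this sketch, groups whose shared keypoints lie within the threshold are treated as the same person, the higher-confidence coordinate is kept for duplicated keypoints, and keypoints seen in only one zone are carried over, mirroring the procedure described above.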

      Tracking

      Next, temporal consistency is required so that the keypoints in one frame are associated with those in the next frame. To assign groups of keypoints that correspond to the same person, an (x, y) pair was determined by taking the average of the keypoints corresponding to the hips and shoulders. This simplified each person to one point in the frame. Kalman filters were used to track each of these points across multiple frames, and each person was assigned an ID number [7]. Most hand sanitization actions were quite fast, with an average duration of 6.99 s, so our tracker did not have to track individuals for many seconds. Depending on the movement of the tracked person in the frame, occlusion, or whether the person was standing too close to another for many seconds, the tracker sometimes made errors.
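      The tracking step can be sketched as a constant-velocity Kalman filter over the averaged hip/shoulder point of each person. The state model, noise values, keypoint IDs, and matching strategy below are illustrative assumptions rather than the exact configuration used in this study.

import numpy as np

class PointTrack:
    """Constant-velocity Kalman filter for one person's body centre (x, y)."""
    _next_id = 0

    def __init__(self, xy, dt=1.0):
        self.id = PointTrack._next_id
        PointTrack._next_id += 1
        self.x = np.array([xy[0], xy[1], 0.0, 0.0])             # state: x, y, vx, vy
        self.P = np.eye(4) * 100.0                               # state covariance
        self.F = np.array([[1, 0, dt, 0], [0, 1, 0, dt],
                           [0, 0, 1, 0], [0, 0, 0, 1]], float)   # motion model
        self.H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], float)   # we observe x, y only
        self.Q = np.eye(4) * 1.0                                 # process noise (illustrative)
        self.R = np.eye(2) * 10.0                                # measurement noise (illustrative)

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, z):
        y = np.asarray(z, float) - self.H @ self.x               # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)                 # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P

def body_centre(keypoints):
    """Average of shoulder and hip keypoints; keypoints: id -> (x, y, conf)."""
    ids = [2, 5, 9, 12]  # hypothetical shoulder/hip IDs for the pose model
    pts = [keypoints[i][:2] for i in ids if i in keypoints]
    return np.mean(pts, axis=0) if pts else None

      In practice, each detection in a new frame would be assigned to the track whose predicted position is nearest, and unmatched detections would start new tracks with fresh ID numbers.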

      Data annotation

      Data were manually annotated to obtain samples of hand disinfection opportunities corresponding exclusively to moments 1 and 4 of the World Health Organization guidelines, i.e. before and after touching a patient, respectively [3]. Among the opportunities for hand disinfection, the examples in which a person disinfected their hands were separated from those in which the person did not. Positive examples were selected by looking for video fragments in which a person performed hand sanitization with alcohol-based gel, either from fixed dispensers located in the ICU or from personal bottles carried by staff. The total number of hand sanitization opportunities observed in the video data was 3767, whereas the number of positive sanitization acts was only 310. At this step, the examples in which medical staff used soap and water were removed, since they were few and the hand movements differed from those seen with gel. The positive examples were passed through our tracking system to assign a tracking number to each person, and both the tracking ID and the frame numbers at which the hand hygiene action began and ended were manually annotated. Some of the data were removed because, in some cases, the ID was not consistent across the sanitization action, as the tracking algorithm could change the ID number if the majority of the body was occluded for more than 15 frames. In all, 237 positive examples were obtained. For the negative examples, an entire 30 min video was randomly selected, the frames of individuals performing hand sanitization were removed, and the remaining frames were used to track people who were not performing hand sanitization. In this case, it was assumed that any tracked person without an annotated tag indicating hand cleansing did not clean their hands. This allowed us to extract a large number of negative examples from just one video, giving us 1196 examples.
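      For illustration, each annotated example can be thought of as a small record linking a tracking ID to the frame range of the action. The field names, file names, and values below are invented for the sketch and do not reflect the annotation tool or data actually used.

from dataclasses import dataclass

@dataclass
class HandHygieneAnnotation:
    video: str        # source video file
    track_id: int     # ID assigned by the tracker
    start_frame: int  # first frame of the hand-rubbing action
    end_frame: int    # last frame of the hand-rubbing action
    positive: bool    # True = sanitization performed, False = negative example

# Example records in the spirit of the annotation described above
# (values are invented for illustration).
annotations = [
    HandHygieneAnnotation("day1_0830.mp4", track_id=12, start_frame=4510,
                          end_frame=4720, positive=True),
    HandHygieneAnnotation("day1_0830.mp4", track_id=7, start_frame=0,
                          end_frame=900, positive=False),
]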

      Data augmentation and classification

      The nature of the 2D data collected imposes limitations on identifying a person and their pose as people move around the ICU while performing actions. Therefore, there were only a few positive examples where a person facing the camera could clearly be identified while rubbing their hands. In most positively annotated examples, the body was obscured or hand disinfection was performed sideways, so that the detector could only obtain information from one wrist.

      Generation of lost data

      To account for lost data, we tried methods based on heuristics derived from the video footage. The first technique was to look at the detection results of the previous frame and, if the missing wrist had been detected there, assign the same coordinates plus a sample from a Gaussian distribution to account for the uncertainty in wrist position. If the previous frame did not contain information about the missing keypoint, a second method was used. This method assumes that both wrists should be fairly close to each other, because we know that the person is performing a sanitization action; in this case, it is most likely that the person is seen sideways, so that only one wrist is detected. Therefore, we used the coordinates of the detected wrist, plus a sample from a Gaussian distribution similar to that used in the first method, for the missing wrist.
      In the case of a sample in which neither wrist was detected, the first method was used. If the previous frame did not contain wrist information, it was assumed that the person was facing away from the camera, so that only the person's back was visible. It was noticed in the videos that most people rubbed their hands together at chest height. Therefore, we used the shoulder points as a reference and assigned these coordinates plus a sample from a Gaussian distribution, similar to the previously mentioned methods.
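      The three imputation heuristics can be sketched as follows. The noise level and the keypoint naming are illustrative assumptions; the sketch only mirrors the decision order described above (previous frame first, then the other wrist, then the shoulder points).

import numpy as np

rng = np.random.default_rng(0)
SIGMA = 5.0  # pixels; illustrative noise level for the imputed position

def jitter(xy):
    """Add Gaussian noise to model uncertainty in the imputed wrist position."""
    return np.asarray(xy, float) + rng.normal(0.0, SIGMA, size=2)

def impute_wrist(side, current, previous):
    """Return an (x, y) estimate for a missing wrist.

    current / previous: dicts mapping keypoint names ('left_wrist',
    'right_wrist', 'left_shoulder', 'right_shoulder') to (x, y) pairs.
    """
    other = "right_wrist" if side == "left_wrist" else "left_wrist"
    # 1) Same wrist seen in the previous frame.
    if previous and side in previous:
        return jitter(previous[side])
    # 2) Other wrist visible in the current frame (person seen sideways).
    if other in current:
        return jitter(current[other])
    # 3) Neither wrist visible (person seen from behind): use shoulder height.
    shoulders = [current[k] for k in ("left_shoulder", "right_shoulder") if k in current]
    if shoulders:
        return jitter(np.mean(shoulders, axis=0))
    return None  # not enough information to impute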

      Training the classifier

      After filling in the missing data based on our heuristics, we determined the coordinates of each wrist keypoint and calculated the distance between them for each frame in each sample, producing a vector with the same number of video frames but containing the wrist distance.
      The distance vector was normalized by dividing each value by its maximum; a sliding window was then applied over the vector, and the fast Fourier transform of each window was computed. The best combination of parameters for our classifier was a window length of 100 frames, with no overlap and two windows per sample, without a smoothing window function. Fixing the window length allowed the classifier to detect hand sanitization actions regardless of their duration.
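      The feature-extraction step can be sketched as shown below, under the assumption that shorter samples are zero-padded to the required 200 frames (the padding behaviour is our own assumption and is not specified above).

import numpy as np

WINDOW = 100    # frames per window
N_WINDOWS = 2   # windows per sample, no overlap

def wrist_distance(frames):
    """frames: iterable of ((xl, yl), (xr, yr)) wrist coordinate pairs."""
    return np.array([np.hypot(l[0] - r[0], l[1] - r[1]) for l, r in frames])

def fft_features(distance):
    """Normalize, cut into non-overlapping windows and take the FFT magnitude."""
    d = distance / np.max(distance)                       # normalize by the maximum value
    needed = WINDOW * N_WINDOWS
    d = np.pad(d, (0, max(0, needed - len(d))))[:needed]  # pad/trim to 200 frames (assumption)
    windows = d.reshape(N_WINDOWS, WINDOW)                # no overlap, no smoothing window
    return np.abs(np.fft.rfft(windows, axis=1)).ravel()   # one feature vector per sample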
      The next step was to train a support vector machine (SVM) [8]. To avoid over-fitting due to the number of negative samples being much larger than the number of positive samples, the negative class was undersampled by randomly selecting the same number of samples as in the positive class (237) from the pool of 1196 negative samples obtained from a random video, as mentioned earlier. In this way, the disinfection and non-disinfection classes received the same amount of training data.
      To refine the SVM, we chose the radial basis function kernel and performed 100 iterations of stratified cross-validation: at each iteration, 80% of the data were randomly chosen for training and the remaining 20% were used for testing.
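      A minimal sketch of the classifier training under the setup described above is given below using scikit-learn. The random seed and any SVC settings other than the radial basis function kernel are assumptions for the example, not the exact configuration used in this study.

import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.metrics import accuracy_score, precision_score, recall_score

def evaluate(X_pos, X_neg, n_iter=100, seed=0):
    """X_pos, X_neg: arrays of FFT feature vectors for positive and negative samples."""
    rng = np.random.default_rng(seed)
    # Undersample the negative class to the number of positive samples (237).
    idx = rng.choice(len(X_neg), size=len(X_pos), replace=False)
    X = np.vstack([X_pos, X_neg[idx]])
    y = np.array([1] * len(X_pos) + [0] * len(X_pos))

    # 100 stratified random splits: 80% training, 20% testing per iteration.
    splitter = StratifiedShuffleSplit(n_splits=n_iter, test_size=0.2, random_state=seed)
    scores = []
    for train, test in splitter.split(X, y):
        clf = SVC(kernel="rbf").fit(X[train], y[train])
        pred = clf.predict(X[test])
        scores.append((accuracy_score(y[test], pred),
                       precision_score(y[test], pred),
                       recall_score(y[test], pred)))
    return np.mean(scores, axis=0)  # mean accuracy, precision, recall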

      Results

      To test the tracker, frames selected from a random video were manually annotated. Two frames per second were extracted from a video of one-and-a-half minutes, giving a total of 180 labelled frames. The tracker was then applied to the same video, processing all frames, and we checked whether the tracking ID assigned to a person remained consistent over time with respect to the annotated frames. For example, if the annotations labelled a person with ID X and the tracker labelled the same person with ID Y, the number of times this correspondence held over the entire duration of the annotated track was counted. The number of correct ID correspondences was then divided by the total length of the track, as determined from the annotated frames, to obtain the tracker's accuracy for each person in the video. The average accuracy was 43.8%, which was considered sufficient for tracking individuals over short periods of time.
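      The per-person tracking accuracy described above can be computed compactly as sketched below; mapping each annotated person to the tracker ID most often assigned to them is our own assumption about how the correspondence was established.

from collections import Counter

def track_accuracy(annotated, predicted):
    """annotated, predicted: dicts mapping frame index -> person ID for one person.

    Returns the fraction of annotated frames on which the tracker kept the
    ID that it most frequently assigned to this person.
    """
    frames = [f for f in annotated if f in predicted]
    if not frames:
        return 0.0
    # Tracker ID most often matched to this annotated person.
    best_id, _ = Counter(predicted[f] for f in frames).most_common(1)[0]
    correct = sum(1 for f in frames if predicted[f] == best_id)
    return correct / len(annotated)  # divide by the full annotated track length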
      Regarding the performance of our SVM, an accuracy of 75.18% was achieved, precision of 72.89%, and recall of 80.91%.

      Discussion

      Our proposed system achieved good overall effectiveness, considering that the data are only 2D images taken from a perspective in which apparent size varies strongly with distance. The technique of dividing the original frames into zones and performing detection in each zone was effective in obtaining more data from this type of perspective, which was one of the desired goals. This enabled recognition of the poses of people who appeared too far from the camera and detection of their hand disinfection actions. However, in some cases, the detections were inconsistent, given the limited information the camera was able to capture, because people at the far end of the hall were represented by fewer pixels and were difficult to recognize even for human annotators. An improvement might be to use a higher-resolution camera to obtain more detailed data. In the future, one camera could probably be used to cover a certain number of rooms, as this recognition method requires minimal equipment compared with other solutions.
      The tracking method using Kalman filters could be refined to minimize data loss. In scenarios such as the current study, it is important to have as much data as possible, and the tracking method lost a significant number of positive samples, which were already in short supply given the number of opportunities for hand disinfection. Rubbing hands with alcohol-based gel was usually quick, taking only a few seconds, so a person's identification number typically remained the same, although it could change with prolonged tracking. When medical staff disinfected their hands with soap and water, the filter had high accuracy because the person did not move around the room while performing this action. Depending on the change in position and on occlusion, the filter performed better or worse, with accuracy varying from 5% to 100% even during tracking lasting several seconds. Other types of filters, such as particle filters, could be used to increase tracking accuracy. As the apparent dimensions of the same person change depending on where the person is in the image, applying filters per zone, as was done to improve person detection, could also improve tracking consistency over longer periods.
      Our video footage had limitations because there were many people in the frames, the scene is seen from only one angle, and there is no depth information. This means that a large amount of data is lost when a person is standing with their back to the camera, and likewise when we can only capture one side of a person standing sideways. To compensate for this loss of information, we had to make assumptions based on the annotated data, such as that both wrists should be close together when performing disinfection procedures. This type of empirical heuristic proved useful for generating samples from Gaussian distributions to fill in missing keypoints. Using the frequency response of a sliding window over the original distance vector gave the best results. An accuracy of 75.18% is reasonable, considering that a large amount of data had to be obtained from heuristics. The results show that the recall is higher than the precision, which could be due to the fact that we had more negative than positive samples to choose from. Although the recognition setting was challenging, we achieved quite good results with minimal equipment. We hope that this can trigger more research of this kind using a minimal setup, to avoid cluttering the space or altering the behaviour of medical personnel with sensors or excessive equipment.
      The proposed method can be used in different facilities by tuning some parameters. The recognition technique using OpenPose over three different zones of the same image allowed us to obtain more data from each image; the zones should therefore be selected carefully [6]. This process is computationally intensive and could be improved by using a different recognition system. To train the system and obtain good results, annotated video data from the camera perspective are also needed. This could be a costly step at the beginning, but it allows the system to detect future sanitization actions. In the future, we plan to test our system in other hospital settings as well as in other facilities, such as nursing homes.
      Feedback on the number of hand disinfections can be provided in a variety of ways. Our system can estimate this metric faster than reports from trained observers. One idea would be to report, on a selected day, the number of hand sanitization actions from the previous week, to shorten the time between observation and feedback. In follow-up studies, we would like to introduce a conversational robot to provide this information and encourage medical staff to perform hand hygiene more often.
      The goal of this project was not to achieve perfect hand sanitization detection with only one camera, but rather to provide a rough estimate that can be used to reinforce hand hygiene in medical facilities. In testing, our SVM performed well in detecting positive hand sanitization actions without the need for an external observer. This avoids the natural bias introduced by an observer, while preventing the normal behaviour of medical personnel from being altered by numerous sensors or the presence of an observer. This unbiased metric can be used to determine the extent to which the hand sanitization policy is applied, and to reinforce and train medical personnel when compliance is too low.

      Conflict of interest statement

      None declared.

      Funding source

      This work was partially supported by JSPS KAKENHI Grant Number 21H03222.

      References

        1. Allegranzi B, Pittet D. Role of hand hygiene in healthcare-associated infection prevention. J Hosp Infect 2009;73:305–315.
        2. Sayeed Mondol MA, Stankovic JA. HAWAD: hand washing detection using wrist wearable inertial sensors. In: 2020 16th International Conference on Distributed Computing in Sensor Systems (DCOSS); 2020. p. 11–18.
        3. World Health Organization. WHO guidelines on hand hygiene in healthcare. Geneva: WHO; 2009.
        4. Singh A, Haque A, Alahi A, Yeung S, Guo M, Glassman JR, et al. Automatic detection of hand hygiene using computer vision technology. J Am Med Inform Assoc 2020;27:1316–1320.
        5. Dandekar H, Deo S, Deo S, Date S, Chougule Y. Internet of Things based user identification and hand sanitization system with non-contact temperature measurement. In: 2021 Fifth International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud); 2021. p. 70–77.
        6. Cao Z, Hidalgo Martinez G, Simon T, Wei S, Sheikh YA. OpenPose: realtime multi-person 2D pose estimation using part affinity fields. IEEE Trans Pattern Anal Mach Intell 2019;43:172–186 (arXiv:1812.08008v2 [cs.CV]).
        7. Kalman RE. A new approach to linear filtering and prediction problems. J Basic Eng 1960;82:35–45.
        8. Cortes C, Vapnik V. Support-vector networks. Mach Learn 1995;20:273–297.