Role of data warehousing in healthcare epidemiology

  • D. Wyllie
    Corresponding author. Address: Public Health England Academic Collaborating Centre, John Radcliffe Hospital, Headley Way, Oxford OX3 9DU, UK. Tel.: +44 (0)1865 220860.
    Public Health England Academic Collaborating Centre, John Radcliffe Hospital, Oxford, UK
  • J. Davies
    Oxford NIHR BRC Informatics Programme, Department of Computer Science, University of Oxford, Oxford, UK
Published: January 28, 2015. DOI:


      Electronic storage of healthcare data, including individual-level risk factors for both infectious and other diseases, is increasing. These data can be integrated at hospital, regional and national levels. Data sources that contain risk-factor and outcome information for a wide range of conditions offer the potential for efficient epidemiological analysis of multiple diseases. Opportunities may also arise for monitoring healthcare processes. Integrating diverse data sources presents epidemiological, practical and ethical challenges. For example, diagnostic criteria, outcome definitions and ascertainment methods may differ across data sources. Data volumes may be very large, requiring sophisticated computing technology. Given the large populations involved, perhaps the most difficult challenge is obtaining informed consent for the development of integrated databases, particularly when their potential is not easy to demonstrate. In this article, we discuss some of the successes and setbacks of recent projects, as well as the potential of data warehousing for antimicrobial resistance monitoring.
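      Because outcome definitions can differ between data sources, results must be mapped to a common definition before they can be pooled. The Python below is a minimal, illustrative sketch of that step for antimicrobial resistance monitoring, not the authors' method. The source names (`lab_a`, `lab_b`), their result labels, the mapping tables and the decision to count "intermediate" as resistant are all hypothetical assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical mapping tables: each source laboratory system reports
# susceptibility with its own labels. Pooling intermediate ("I") with
# resistant is an illustrative assumption; a different choice would
# change the resulting estimate.
SOURCE_MAPPINGS = {
    "lab_a": {"R": "resistant", "I": "resistant", "S": "susceptible"},
    "lab_b": {"RESISTANT": "resistant", "SENSITIVE": "susceptible"},
}


def harmonise(source, result):
    """Map a source-specific label to the common outcome, or None if unmapped."""
    return SOURCE_MAPPINGS.get(source, {}).get(result.strip().upper())


def resistance_counts(records):
    """Return {(organism, antibiotic): (n_resistant, n_tested)}.

    Only results that map to the common definition are counted; unmapped
    labels (e.g. "not done") are excluded rather than guessed.
    """
    counts = defaultdict(lambda: [0, 0])
    for rec in records:
        outcome = harmonise(rec["source"], rec["result"])
        if outcome is None:
            continue
        key = (rec["organism"], rec["antibiotic"])
        counts[key][1] += 1  # tested
        if outcome == "resistant":
            counts[key][0] += 1  # resistant
    return {key: (r, n) for key, (r, n) in counts.items()}


if __name__ == "__main__":
    # Hypothetical example records from two sources with different vocabularies.
    records = [
        {"source": "lab_a", "organism": "E. coli", "antibiotic": "ciprofloxacin", "result": "R"},
        {"source": "lab_a", "organism": "E. coli", "antibiotic": "ciprofloxacin", "result": "S"},
        {"source": "lab_b", "organism": "E. coli", "antibiotic": "ciprofloxacin", "result": "Resistant"},
        {"source": "lab_b", "organism": "E. coli", "antibiotic": "ciprofloxacin", "result": "sensitive"},
        {"source": "lab_b", "organism": "E. coli", "antibiotic": "ciprofloxacin", "result": "not done"},
    ]
    print(resistance_counts(records))  # {('E. coli', 'ciprofloxacin'): (2, 4)}
```

      Keeping the mapping tables separate from the counting logic makes each source's outcome definition an explicit, reviewable decision rather than something buried in the analysis code.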

