vaccines.watch is an interactive web application developed by the Centre for Genomic Pathogen Surveillance (CGPS) that enables monitoring of vaccine target diversity from pathogen genome data.
The application displays data inferred from high-quality genomes in the public sequence archives, which are processed on an ongoing basis by our “always-on” pipeline within Pathogenwatch.
Currently, vaccines.watch displays data on Streptococcus pneumoniae capsular serotypes, which are the targets of all licensed pneumococcal vaccines. It also provides data on the capsule (K) and lipopolysaccharide (LPS) O-antigen (O) types of Klebsiella pneumoniae (and related species), and on the capsule (K) and lipooligosaccharide (LOS) outer core (OC) antigen types of Acinetobacter baumannii.
If you use vaccines.watch in a publication, please cite: XXXX (reference here)
The vaccines.watch application uses genome assemblies generated and processed by our “always-on” pipeline within Pathogenwatch (https://pathogen.watch). The pipeline retrieves from the INSDC databases all paired-end Illumina samples of the relevant species that have both a sampling date from 2010 onwards that can be decoded to at least the year and a sampling location that can be decoded to at least the country level. The platform accepts data with a taxonomy ID recorded in the European Nucleotide Archive (ENA) metadata as 1313 (S. pneumoniae), 573 (K. pneumoniae), 1463165 (K. quasipneumoniae), 244366 (K. variicola), 2026240 (K. quasivariicola), 2489010 (K. africana) or 470 (A. baumannii). Checks are performed to ensure the consistency and integrity of the data, including requirements for two FASTQ files per sample and for sequencing runs to have at least 20x coverage.
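The retrieval and integrity rules above amount to a single eligibility check per sample. The following is a minimal Python sketch of that logic, not the pipeline's actual code: `SampleRecord` and `is_eligible` are hypothetical names, and the sketch assumes the sampling date and location have already been decoded to a year and a country (or None when decoding fails).

```python
from dataclasses import dataclass
from typing import Optional

# NCBI taxonomy IDs accepted by the pipeline, as listed above.
ACCEPTED_TAXIDS = {1313, 573, 1463165, 244366, 2026240, 2489010, 470}


@dataclass
class SampleRecord:
    """Hypothetical, simplified view of one ENA sample; field names are illustrative."""
    tax_id: int
    platform: str                   # e.g. "ILLUMINA"
    layout: str                     # e.g. "PAIRED"
    collection_year: Optional[int]  # None if the date cannot be decoded to a year
    country: Optional[str]          # None if the location cannot be decoded to a country
    n_fastq: int                    # number of FASTQ files attached to the run
    coverage: float                 # estimated depth of coverage (x)


def is_eligible(s: SampleRecord, min_year: int = 2010, min_coverage: float = 20.0) -> bool:
    """Return True if a sample passes the retrieval and integrity checks described above."""
    return (
        s.tax_id in ACCEPTED_TAXIDS
        and s.platform == "ILLUMINA"
        and s.layout == "PAIRED"
        and s.collection_year is not None
        and s.collection_year >= min_year
        and s.country is not None
        and s.n_fastq == 2
        and s.coverage >= min_coverage
    )
```

For example, is_eligible(SampleRecord(1313, "ILLUMINA", "PAIRED", 2015, "Malawi", 2, 45.0)) returns True, whereas the same sample with a collection year of 2008 returns False.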
Sequence reads are assembled using a SPAdes workflow (https://gitlab.com/cgps/ghru/pipelines/assembly), and the quality of the resulting assemblies is assessed. Samples that do not meet our species-specific quality criteria (see publication (include hyperlink here)) are excluded. The Speciator tool within Pathogenwatch is used to verify the species of each assembly, and assemblies whose species identification is inconsistent with the species recorded in the ENA metadata are excluded. For the K. pneumoniae species complex (SC), we accept genomes identified as any of the five species listed above, allowing for inconsistencies with the metadata that arise from known difficulties with phenotypic identification methods.
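The species consistency rule, including the relaxation for the K. pneumoniae SC, can be sketched as below. This is an illustration only: `species_consistent` is a hypothetical helper, and it assumes that both the ENA-recorded species and the Speciator result are available as plain species-level names (the exact naming used by either source may differ, for example at subspecies level).

```python
# Species of the K. pneumoniae species complex (SC) accepted interchangeably.
KPSC_SPECIES = frozenset({
    "Klebsiella pneumoniae",
    "Klebsiella quasipneumoniae",
    "Klebsiella variicola",
    "Klebsiella quasivariicola",
    "Klebsiella africana",
})


def species_consistent(recorded: str, identified: str) -> bool:
    """Compare the ENA-recorded species with the species identified from the assembly.

    Within the K. pneumoniae SC any member is accepted, tolerating phenotypic
    misidentification; for all other species an exact match is required.
    """
    if recorded in KPSC_SPECIES:
        return identified in KPSC_SPECIES
    return recorded == identified
```

Under this sketch a genome recorded as K. pneumoniae but identified as K. variicola is kept, while a genome recorded as S. pneumoniae but identified as a different streptococcal species is excluded.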
Kaptive v3.1.0 (Stanton et al. 2025), implemented in Pathogenwatch, is used to identify the K and O loci (and predicted K and O types) in K. pneumoniae SC genomes, and the K and OC loci (and predicted K and OC types) in A. baumannii genomes (using the database described by Cahill et al. 2026 for the latter). For both pathogens, vaccines.watch only incorporates data from genomes for which Kaptive reports both the K and the O/OC loci as “typeable”. SeroBA v2.0 (Lorenz et al. 2025), also implemented in Pathogenwatch, is used to identify the capsular polysaccharide (cps) locus and predict the resulting serotype in S. pneumoniae genomes. The Pathogenwatch implementation of SeroBA differs from the published method in that it uses reads simulated from the assemblies rather than the raw sequence reads; inconsistencies between the two approaches are rare (0.14%) (see https://cgps.gitbook.io/pathogenwatch/technical-descriptions/typing-methods/seroba).
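The requirement that both loci be typeable acts as a simple row filter on the Kaptive results. In this minimal sketch, each genome's results are assumed to be a dictionary with hypothetical keys `k_confidence` and `o_confidence` holding the string "Typeable" for confident calls; these keys are illustrative and are not Kaptive's output column names.

```python
from typing import Dict, Iterable, List


def keep_typeable(results: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep genomes whose K locus AND O/OC locus are both reported as typeable."""
    return [
        r for r in results
        if r["k_confidence"] == "Typeable" and r["o_confidence"] == "Typeable"
    ]
```

A genome that is typeable at the K locus but not at the O/OC locus (or vice versa) is dropped entirely, rather than contributing data for the one typeable locus.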
vaccines.watch displays the genotypic variants found within each pathogen, defined using community-based typing schemes implemented in Pathogenwatch. This variant typing comprises multi-locus sequence typing (MLST) for S. pneumoniae (PubMLST scheme), the K. pneumoniae SC (BIGSdb-Pasteur scheme) and A. baumannii (Pasteur scheme) (Jolley et al. 2018). A genome is shown with a “Novel” sequence type (ST) in vaccines.watch if its MLST profile is incomplete or has not been defined in the MLST database. In addition to MLST, we use Global Pneumococcal Sequencing Cluster (GPSC) assignments for S. pneumoniae (Gladstone et al. 2019) and “clonal group” assignments from the LIN code nomenclature for the K. pneumoniae SC (Hennart et al. 2022).
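The “Novel” labelling rule can be illustrated with a small lookup. This is a sketch under stated assumptions: `st_label` is a hypothetical helper, an allele that could not be called is represented as None (or is absent), and the ST table passed in stands in for the MLST database, which in practice is far larger.

```python
from typing import Dict, Mapping, Optional, Sequence, Tuple

# The seven loci of the S. pneumoniae MLST scheme.
SPN_LOCI = ("aroE", "gdh", "gki", "recP", "spi", "xpt", "ddl")


def st_label(profile: Mapping[str, Optional[int]],
             loci: Sequence[str],
             st_table: Dict[Tuple[int, ...], int]) -> str:
    """Return "ST<n>" for a complete, defined profile, otherwise "Novel"."""
    alleles = [profile.get(locus) for locus in loci]
    if any(a is None for a in alleles):
        return "Novel"  # incomplete profile: at least one allele missing
    st = st_table.get(tuple(alleles))
    return f"ST{st}" if st is not None else "Novel"  # profile not in the database
```

With a made-up table such as {(1, 1, 1, 1, 1, 1, 1): 100}, a profile of all 1s is labelled "ST100"; the same profile with one allele missing, or with any allele changed, is labelled "Novel".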
Genes and mutations associated with particular antimicrobial classes are identified from the genome assemblies using AMRFinderPlus (database version 2021-12-21.1) (Feldgarden et al. 2021). The curated list of genes and mutations included for each pathogen and antimicrobial combination, compiled through a comprehensive literature review, is provided in the vaccines.watch publication (include hyperlink here). A mechanism is reported in vaccines.watch only when there is a complete match to the corresponding gene. AMR mechanisms are identified for the antimicrobial classes defined in the 2017 WHO priority pathogen list (WHO, 2017).
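The reporting rule combines two filters: the hit must be a complete match, and the gene must appear in the curated list for an antimicrobial class. The sketch below assumes rows from AMRFinderPlus 3.x tabular output with "Gene symbol" and "Method" columns, and it assumes (this is our interpretation, not a documented vaccines.watch rule) that the ALLELE, EXACT and BLAST methods count as complete matches while PARTIAL hits do not. The curated mapping and the helper name are hypothetical, and point mutations are not covered.

```python
from typing import Dict, Iterable, List, Set

# Assumption: AMRFinderPlus "Method" values treated as complete gene matches.
# Partial hits (e.g. PARTIALX, PARTIAL_CONTIG_ENDX) are excluded.
COMPLETE_METHODS = {"ALLELEX", "ALLELEP", "EXACTX", "EXACTP", "BLASTX", "BLASTP"}


def reported_mechanisms(hits: Iterable[Dict[str, str]],
                        curated: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Group complete AMRFinderPlus gene hits by antimicrobial class.

    `curated` maps an antimicrobial class to the gene symbols accepted for it.
    """
    reported: Dict[str, List[str]] = {}
    for hit in hits:
        if hit["Method"] not in COMPLETE_METHODS:
            continue  # skip partial or otherwise incomplete matches
        symbol = hit["Gene symbol"]
        for drug_class, genes in curated.items():
            if symbol in genes:
                reported.setdefault(drug_class, []).append(symbol)
    return reported
```

With a hypothetical curated entry {"carbapenems": {"blaKPC-2", "blaNDM-1"}}, an exact blaKPC-2 hit is reported under carbapenems, a partial blaNDM-1 hit is ignored, and genes outside the curated list are not reported at all.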
More details on the above processes can be found in the vaccines.watch publication (include hyperlink).
As whole genome sequencing is adopted into routine surveillance systems worldwide, shared public genomes will increasingly offer a valuable resource for interrogating geotemporal trends in existing or putative vaccine targets. However, across the different pathogens, currently available public genomes lack broad geographic coverage and have largely been generated for specific research agendas. We therefore advise users to remain alert to these limitations and to consider carefully how the data shown in vaccines.watch are used and interpreted.
Another important limitation of using genomic data for vaccine development and monitoring is that our understanding of how genomic information relates to phenotype remains incomplete: for all three pathogens (and their associated targets) currently included in vaccines.watch, many genomic loci encode structures that are not yet known. However, there are active efforts to improve this knowledge base for each of these pathogens, with regular updates to the nomenclature databases and to the phenotype prediction logic used by the SeroBA and Kaptive tools. For S. pneumoniae in particular, SeroBAnk provides a comprehensive, curated library collating data on the genetic locus and capsular structure of each known serotype (Lorenz et al. 2025).
Genomes within vaccines.watch can be filtered on one or more criteria using the top filter bar, the map or the right-hand panels. The number of genomes represented in vaccines.watch before and after any filters are applied is shown at the top of the page.
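Filtering on several criteria, and reporting the counts before and after, can be sketched as follows. The text above does not say how criteria combine, so this sketch assumes that different criteria combine with a logical AND and that several values selected for one criterion combine with OR; `field_in` and `apply_filters` are hypothetical helpers, not part of vaccines.watch.

```python
from typing import Callable, Collection, Dict, List, Sequence

Genome = Dict[str, str]
Predicate = Callable[[Genome], bool]


def field_in(field: str, allowed: Collection[str]) -> Predicate:
    """Build a filter that keeps genomes whose `field` is one of `allowed` (OR within a criterion)."""
    return lambda g: g.get(field) in allowed


def apply_filters(genomes: Sequence[Genome], filters: Sequence[Predicate]) -> List[Genome]:
    """Keep genomes that satisfy every active filter (AND across criteria).

    With no active filters, all() over an empty sequence is True, so every genome is kept.
    """
    return [g for g in genomes if all(f(g) for f in filters)]
```

The two counts shown at the top of the page would then correspond to len(genomes) and len(apply_filters(genomes, active_filters)).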
By default, the map panel colours countries by the number of genomes remaining after any filters are applied. A country is coloured light grey if the current curated public collection for the relevant pathogen contains no genomes from it, or dark grey if the collection contains one or more genomes from it but none meet the filtering criteria. Users can filter the genomes by individual countries from the map. If one or more filters other than the “Country” filter are applied, users can also choose to colour countries by the proportion of genomes that meet the filtering criteria.
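The colouring rules for a single country can be written as a small decision function. This is a minimal sketch with a hypothetical name and return format; it also assumes that the two grey categories apply in proportion mode as well, which the text above does not state explicitly.

```python
from typing import Optional, Tuple


def country_fill(total: int, matching: int, by_proportion: bool = False) -> Tuple[str, Optional[float]]:
    """Return (category, value) for one country on the map.

    total:    genomes from the country in the curated collection
    matching: of those, genomes meeting the current filters
    """
    if total == 0:
        return ("light grey", None)   # no genomes from this country at all
    if matching == 0:
        return ("dark grey", None)    # genomes exist, but none pass the filters
    value = matching / total if by_proportion else float(matching)
    return ("coloured", value)
```

For a country with 8 genomes of which 2 pass the filters, the count view would colour it by 2 and the proportion view by 0.25.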
The “Target overview” panel shows the most frequent vaccine target variants (up to twenty) among all (or selected) genomes, and allows users to toggle between target types where more than one is available. Users can filter the genomes by individual target variants from this panel, or from the top filter menu if the target variant of interest is not among the twenty displayed.
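Selecting the (up to) twenty most frequent target variants is a standard frequency count. A minimal Python sketch using collections.Counter, with a hypothetical function name; how vaccines.watch itself breaks ties is not stated, whereas Counter.most_common keeps equal counts in the order they were first encountered.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_targets(targets: Iterable[str], n: int = 20) -> List[Tuple[str, int]]:
    """Return up to `n` (target, count) pairs, most frequent first."""
    return Counter(targets).most_common(n)
```

If fewer than twenty distinct variants are present, all of them are returned; any variant beyond the top twenty is only reachable through the filter menu, as described above.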
For S. pneumoniae, the target variants are serotypes predicted from SeroBA's identification of the corresponding cps loci. For this pathogen, the top filter menu also includes a “Vaccine formulation” drop-down, via which users can select an existing or prospective vaccine formulation; this filters the genomes represented in vaccines.watch to those whose serotype is included in the selected formulation.
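Selecting a formulation is equivalent to filtering on a fixed set of serotypes. In this sketch the formulation table is illustrative only: it contains just the seven serotypes of PCV7, not the list of formulations offered in vaccines.watch, and `filter_by_formulation` is a hypothetical helper.

```python
from typing import Dict, FrozenSet, List

# Illustrative formulation table: the serotypes covered by PCV7 only.
FORMULATIONS: Dict[str, FrozenSet[str]] = {
    "PCV7": frozenset({"4", "6B", "9V", "14", "18C", "19F", "23F"}),
}


def filter_by_formulation(genomes: List[Dict[str, str]], name: str) -> List[Dict[str, str]]:
    """Keep genomes whose predicted serotype is included in the chosen formulation."""
    serotypes = FORMULATIONS[name]
    return [g for g in genomes if g["serotype"] in serotypes]
```

Note that serotypes are compared as exact strings, so 19F (in PCV7) and 19A (not in PCV7) are treated as distinct targets.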
For the K. pneumoniae SC, we show data from Kaptive on both the best-matching K/O locus types (genotypes) and the predicted K/O types (phenotypes), using the O serotype nomenclature recently proposed by Whitfield et al. (2025). We include both genotypic and predicted phenotypic types so that users can explore the relationships between them. O loci and O types do not have a one-to-one relationship, because the phenotype predictions use both the O locus and additional genes outside the O locus.
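One way to explore the genotype-to-phenotype relationship, for example in downloaded data, is to collect the set of predicted types observed for each locus; any locus with more than one type is not one-to-one. This is a minimal sketch with a hypothetical helper name, and the locus and type labels used with it should be taken from the data rather than from this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def types_per_locus(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each genotypic locus to the set of predicted types observed with it."""
    mapping: Dict[str, Set[str]] = defaultdict(set)
    for locus, predicted_type in pairs:
        mapping[locus].add(predicted_type)
    return dict(mapping)
```

Loci for which len(types) > 1 are those where genes outside the locus change the predicted phenotype.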
For A. baumannii, as for the K. pneumoniae SC, we show data from Kaptive on both the best-matching K/OC locus types (genotypes) and the predicted K/OC types (phenotypes). As with the O loci/types in the K. pneumoniae SC, the K loci and K types in A. baumannii do not have a one-to-one relationship, because the phenotype predictions use both the K locus and additional genes. For the K loci in the K. pneumoniae SC and the OC loci in A. baumannii, we report the genotypic locus together with the predicted phenotype in parentheses, using an asterisk (*) where the phenotype is unknown.
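The combined locus and phenotype label can be produced by a one-line formatter. The exact position of the asterisk is an assumption here: this sketch places it inside the parentheses in place of the unknown phenotype, and `locus_label` is a hypothetical name.

```python
from typing import Optional


def locus_label(locus: str, phenotype: Optional[str]) -> str:
    """Format a genotypic locus with its predicted phenotype, e.g. "KL2 (K2)".

    Assumption: an unknown phenotype is shown as "*" inside the parentheses.
    """
    return f"{locus} ({phenotype if phenotype else '*'})"
```

Under this assumption, a locus with no known phenotype, such as a hypothetical "KLX", would be displayed as "KLX (*)".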
The “Variant overview” panel shows the most frequent genotypic variants (e.g. STs; up to twenty) among all (or selected) genomes, and allows users to toggle between typing schemes where more than one is available. Users can filter the genomes by individual variants from this panel, or from the top filter menu if the variant of interest is not among the twenty displayed.
The “Target count over time” and “Target proportion over time” panels, between which users can toggle, show the number and proportion of genomes per year belonging to all (or selected) vaccine target types. Users can also switch to the “Variant count over time” and “Variant proportion over time” panels, which show the equivalent data for the genotypic variants.
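The count and proportion panels can be thought of as one yearly tally viewed two ways. This minimal sketch assumes each genome is reduced to a (year, target) pair and that each year's proportions use that year's total number of genomes shown as the denominator; the function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def yearly_counts(records: Iterable[Tuple[int, str]]) -> Dict[int, Counter]:
    """Count genomes per (year, target) from (year, target) pairs."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for year, target in records:
        counts[year][target] += 1
    return dict(counts)


def yearly_proportions(counts: Dict[int, Counter]) -> Dict[int, Dict[str, float]]:
    """Divide each year's counts by that year's total number of genomes."""
    return {
        year: {target: n / sum(c.values()) for target, n in c.items()}
        for year, c in counts.items()
    }
```

The same functions apply unchanged to the variant panels if the target is replaced by, for example, the ST.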
The “Selected targets count over time” and “Selected targets proportion over time” panels, between which users can toggle, show the number and proportion of all (or selected) genomes per year that do (blue) or do not (green) have one of the selected vaccine target types.
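The selected-targets panels reduce each genome to a yes/no question per year. A minimal sketch, assuming (year, target) pairs as input; `selected_split` and its output keys are hypothetical, with "with" corresponding to the blue series and "without" to the green series.

```python
from typing import Collection, Dict, Iterable, Tuple


def selected_split(records: Iterable[Tuple[int, str]],
                   selected: Collection[str]) -> Dict[int, Dict[str, float]]:
    """Per year, count genomes with / without a selected target and their proportions."""
    counts: Dict[int, Dict[str, int]] = {}
    for year, target in records:
        c = counts.setdefault(year, {"with": 0, "without": 0})
        c["with" if target in selected else "without"] += 1
    result: Dict[int, Dict[str, float]] = {}
    for year, c in counts.items():
        total = c["with"] + c["without"]  # always >= 1 for a year that appears
        result[year] = {
            "with": c["with"],
            "without": c["without"],
            "prop_with": c["with"] / total,
            "prop_without": c["without"] / total,
        }
    return result
```

Because the two groups are complementary, prop_with and prop_without always sum to 1 for each year.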
To save or share a visualisation with a particular set of filters, click the “Share link” icon (top right), which copies a URL to your clipboard. A unique URL is generated for each visualisation you save or share.
Users can download the raw data in CSV format using the “Download data” icon (top right). The download covers all genomes shown in the current view and includes each genome's ENA accession number along with all data shown in the vaccines.watch interface.
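As an example of working with a download, the sketch below tabulates one column of the CSV using only the Python standard library. The column name depends on the pathogen and the actual file, so "serotype" in the usage note is a hypothetical placeholder, as is the file name; check the header row of your download.

```python
import csv
from collections import Counter
from typing import Iterable


def tabulate_column(lines: Iterable[str], column: str) -> Counter:
    """Count the values of one column in a downloaded vaccines.watch CSV file.

    `lines` can be an open text file or any iterable of CSV lines.
    """
    reader = csv.DictReader(lines)
    return Counter(row[column] for row in reader)
```

For instance, with open("vaccines_watch_download.csv", newline="") as fh: print(tabulate_column(fh, "serotype").most_common(5)) would print the five most common values of that column in the downloaded view.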
Funding
We are grateful for funding from the National Institute for Health Research (NIHR) and Bill & Melinda Gates Foundation.
We would love to receive feedback on vaccines.watch and would be grateful to hear of any issues you experience with the application. Please contact us at [email protected]
vaccines.watch is developed and maintained by the Centre for Genomic Pathogen Surveillance.