ParaBioSys - Parallel Biological Systems - MGH, HMS, BU
Projects
Signal Transduction
Host-Pathogen Interactions
Protein Networks/RNA Display
Macrophage Activation
Resources
Software Tools
Microarrays
Bacterial Pathogens
Animal Models
Education
Classes
Seminars
Workshops
About Us
PGA Project Summary
Research Centers
Faculty & Staff
Publications
Policies and Procedures
Terms and Conditions
Data Sharing and Usage
Animal Distribution
Quality Control
Citing Us
Search
Search the Entire PGA
 

P. aeruginosa PA14 Genomic Sequencing Project

PA14 Sequencing and Comparative Alignments

Our PGA determined the complete sequence of the PA14 genome sequence collaboratively with the Harvard Partners Center for Genetics and Genomics (HPCGG, Cambridge, MA). The primary rationale for obtaining the PA14 sequence was to facilitate the construction of the non-redundant PA14 transposon mutant library because there is a significant amount of PA14-specific sequence that is not present in the genome of the publicly available sequence of P. aeruginosa strain PAO1. PA14 Genomic DNA was subjected to double-ended shotgun sequencing and the resulting sequence was analyzed using Phred, Phrap, and Consed . The shotgun phase of the sequencing project has been completed, yielding 6.54 Mb of PA14 sequence (10.2-fold coverage), compared to 6.26 Mb for PAO1. The current assembly consists of a single large contig. Only one gap remains, which is thought to be less than 3.5 kb. This sequence has been deposited in GenBank (accession number AABQ00000000) and can be downloaded here. Alignments of PAO1 with the current PA14 assembly are being carried out using a combination of BLAST searches and analysis with MUMmer, a tool for global alignment of genomes. Our preliminary comparisons of the two genomes (Figure 1) have demonstrated that they are extremely similar but have also pinpointed regions of differences, including a 107911bp insertion in PA14 that is absent in PAO1 (P. aeruginosa Pathogenicity Island 1, or PAPI-1). Approximately 96.3% of the DNA sequence found in PAO1 is also found in PA14 and approximately 92.4% of PA14 DNA sequence is found in PAO1.

Our PGA, in collaboration with the Rahme lab at MGH, identified and characterized two novel P. aeruginosa pathogenicity islands (PAPI-1 and PAPI-2) in the genome of PA14, which are absent from the reference strain PAO1 (He et. al, 2004, PNAS 101:2530-2535). The 108 kb PAPI-1 and 11 kb PAPI-2 exhibit highly modular structures and contain many of the 313 PA14-specific ORFs. More than 80% of the PAPI-1 DNA sequence is unique, and 75 out of the 115 predicted ORF products are unrelated to any known proteins or functional domains. PAPI-2 harbors 15 predicted ORFs and 8 of these are absent from strain PAO1. On the other hand, most of the genes within these islands that are homologous to known genes occur in other human and plant bacterial pathogens. Importantly, many PAPI-1 ORFs are also found in several P. aeruginosa cystic fibrosis isolates.

Click thumbnail at right for larger image. Alignments of the PA14 and PAO1
  genomes. Black vertical lines in the PA14 genome represents regions that are
  inverted relative to the PAO1 genome.  Purple vertical bars (highlighted by
  purple arrows) indicate regions of each genome that are absent in the other.
  A magnified view of the largest PA14-specific region (PAPI-1) is shown at the
  top.  Black arrows indicate the positions of transposon insertions within
  ORFs of unknown function that were tested in a murine sepsis model of
  infection. The percent mortality for each tested mutant is shown (under
  conditions for which wild-type PA14 results in 100% mortality).
Figure 1 - Alignments of the PA14 and PAO1 genomes. Black vertical lines in the PA14 genome represents regions that are inverted relative to the PAO1 genome. Purple vertical bars (highlighted by purple arrows) indicate regions of each genome that are absent in the other. A magnified view of the largest PA14-specific region (PAPI-1) is shown at the top. Black arrows indicate the positions of transposon insertions within ORFs of unknown function that were tested in a murine sepsis model of infection. The percent mortality for each tested mutant is shown (under conditions for which wild-type PA14 results in 100% mortality).

We have generated and analyzed 23 mutant strains, including 10 non-polar deletions and 13 non-redundant library mutants, to assess whether the PAPI-1 and PAPI-2 ORFs that encode hypothetical/unknown functions promote P. aeruginosa pathogenesis. This work showed that both islands carry genes that allow PA14 to thrive on evolutionary diverse hosts, including plants (Arabidopsis) and mammals (mouse). 20 out of the 23 ORFs mutated, encode functions necessary for plant or mammalian virulence, and 12 are required for "wild-type" virulence in both hosts (Figure 1 shows the locations and phenotypes of the mutants obtained from the non-redundant library). Although the majority of these genes encode products of unknown function, their presence in P. aeruginosa clinical isolates, including those from CF patients, suggests that they encode conserved functions important for fitness and survival.

PA14 Genome Annotation and Proteomic Analysis

Characterization of novel pathogenicity factors such as those described above should provide insights into broad host pathogenic and defense mechanisms. Importantly, completion of the PA14 genome sequence annotation, currently in progress, might also lead to the identification of additional PAPI blocks and novel virulence genes in the remaining PA14-specific ORFS that have not yet been studied. The annotation of PA14 was initiated with an automated ORF prediction phase utilizing both BLAST and Glimmer software tools and this list of putative ORFs is currently being manually edited as a collaborative effort involving researchers from MGH and Harvard Medical School (Boston MA). A draft version of the PA14 annotation will be posted on this site when it is completed.

To facilitate the manual annotation effort, we have initiated a proteomic analysis of PA14 in collaboration with the MS Proteomics Center at the Harvard Partners Center for Genetics and Genomics (HPCGG, Cambridge, MA) and with the laboratory of George Church at Harvard Medical School (Boston, MA). Briefly, PA14 are grown in liquid media, harvested in mid-log phase, and sonicated in the presence of Urea and detergents. The complex protein mixture is reduced and alkylated and subjected to SDS-PAGE to separate it into several molecular weight fractions. Tryptic digests are performed in-gel, and the resulting peptides are analyzed by mass spectrometry. In a pilot analysis, we have positively identified proteins that have previously been experimentally verified in PAO1 (or other P. aeruginosa strains), proteins corresponding to ORFs annotated in PAO1 as hypothetical an not experimentally validated, as well as hypothetical ORFs in PA14 that do not appear in PAO1. An example of the latter case is shown in Figure 2. We are currently increasing the scale of the proteomic analysis to sample the entire PA14 proteome.

Validation of a "PA14-Specific" ORF by
  Proteomics.  In PAO1, the gene in between lecB and amiR is PA3362 (top), a
  gene annotated as a hypothetical protein.  In PA14, a different predicted
  ORF, 6327, lies between these two genes (bottom).  ORF 6327 does NOT have a
  homolog in PAO1, and is 53% identical (76% similar) to a conserved
  hypothetical protein found in Pseudomonas putita.  Preliminary proteomic
  analysis of PA14 has identified a peptide mapping to this ORF (but not to any
  other predicted ORF in PA14).
Figure 2 - Validation of a "PA14-Specific" ORF by Proteomics. In PAO1, the gene in between lecB and amiR is PA3362 (top), a gene annotated as a hypothetical protein. In PA14, a different predicted ORF, 6327, lies between these two genes (bottom). ORF 6327 does NOT have a homolog in PAO1, and is 53% identical (76% similar) to a conserved hypothetical protein found in Pseudomonas putita. Preliminary proteomic analysis of PA14 has identified a peptide mapping to this ORF (but not to any other predicted ORF in PA14).

Genotyping of P. aeruginosa Strains Using Microarrays

Regions of PA14 absent in PAO1 but conserved in other clinical isolates of P. aeruginosa are of particular interest to our group. To determine, on a genome-wide level, whether "PA14-specific" sequences (i.e. - absent in PAO1) are conserved in other strains, we are developing a P. aeruginosa spotted oligo microarray in collaboration with the MGH Microarray Core. By designing probes for sequences present in PA14 and absent in PAO1, or vice versa, a large number of unsequenced P. aeruginosa strains can be analyzed by labeling their genomic DNA and hybridizing to the array to determine which sequences appear to be conserved in virulent strains.

The initial version of this array will focus primarily on sequences that are present in PA14 but absent in PAO1, present in PAO1 but absent in PA14, as well as additional sequences found in other P. aeruginosa strains but absent in at least one of the sequenced strains (suggestions for additional P. aeruginosa sequences to be included in later versions of the microarray are welcomed). Controls include sequences that are found in both PA14 and PAO1 and likely to be present in all P. aeruginosa strains (e.g. - essential housekeeping genes) as well as sequences likely to be absent in all P. aeruginosa strains. The next generation of this array will be a full genome chip, including probes for all PA14/PAO1 sequences as well as sequences from other strains. This chip will be made available to the research community and can be utilized for genotyping of strains as described above, for traditional transcriptional profiling studies, as well as being used in conjunction with the PA14 mutant library for high-throughput screens of pooled mutants utilizing a methodology called TraSH (Transposon Site Hybridization).

A pilot microarray of 95 oligos (and one blank) spotted in quadruplicate has been produced to validate its utility for genomic typing. Included in the microarray are 14 sequences present in both PA14 and PAO1, 36 sequences found in PA14, 32 sequences found in PAO1, and 13 (P. aeruginosa) sequences found in neither strain. PA14 and PAO1 genomic DNA was fragmented with a restriction enzyme (Hae III), labeled with cy3 or cy5, and hybridized to the array (Figure 3). Since the two labeled samples yielded different levels of overall hybridization, each data set was normalized (by dividing each hybridization intensity by the average hybridization intensity for the total sample) and an arbitrary cut-off was determined: a normalized intensity of >0.5 resulted in the sequence being assigned as present, and a normalized intensity of <0.5 was used to assign sequences as absent. Using these criteria, 94% of the probes were correctly assigned as present or absent in each of the two strains. We are currently examining the specific probes that have led to incorrect assignments in either or both of these strains and to manipulate the hybridization conditions to improve accuracy with the two sequenced strains. The first generation (non-pilot) array containing 465 probes (PA14-specific, PAO1-specific, or control sequences) is scheduled for production and an initial panel of 22 strains (including lab strains, environmental isolates, and clinical isolates from a variety of infection sites in the human body) will be analyzed. Finally, a second generation array that comprises a (near-)complete set of all P. aeruginosa sequences will be designed for genomic typing and transcriptional profiling experiments as well as additional sequences custom designed for our PA14 mutant library to allow for analysis of pooled mutants utilizing TraSH.

Click thumbnail at right for larger image. Pilot Genomic DNA Hybridizations.  The
  pilot <i>P. aeruginosa</i> spotted oligo array was hybridized with labeled
  genomic DNA from PAO1 (left) or PA14 (right) to assess the ability to utilize
  this array for genomic typing of unknown strains.  95 probe sequences were
  designed to hybridize to regions present in both strains, in one but not the
  other strain, or in neither strain (but present in another
  <i>P. aeruginosa</i> strain).  The last (96th) data point in each graph
  corresponds to a blank with no spotted oligo.  Turquoise bars below each
  graph indicate probes corresponding to sequences present in the strain.
  Since the average intensities for the two genomic DNA samples differred, an
  arbitrary cut-off to assign sequences as present or absent was not based on
  absolute intensities, but rather normalized values in the following manner:
  each intensity was divided by the average intensity for the entire sample to
  adjust for overall intensity differences between samples, and any probe with
  an adjusted intensity greater than an 0.5 was defined as present in the
  genome (data not shown).  Using this criterion, 94% of the probes were
  correctly assigned for PAO1 (6 probes in regions absent in PAO1 were
  incorrectly assigned as present) and 94% of the probes were correctly
  assigned for PA14 (3 probes in regions present in PA14 were incorrectly
  assigned as absent, and 3 probes in regions absent in PA14 were incorrectly
  assigned as present).  Incorrect assignments based on this arbitrary cut-off
  are indicated by white circles.
Figure 3 - Pilot Genomic DNA Hybridizations. The pilot P. aeruginosa spotted oligo array was hybridized with labeled genomic DNA from PAO1 (left) or PA14 (right) to assess the ability to utilize this array for genomic typing of unknown strains. 95 probe sequences were designed to hybridize to regions present in both strains, in one but not the other strain, or in neither strain (but present in another P. aeruginosa strain). The last (96th) data point in each graph corresponds to a blank with no spotted oligo. Turquoise bars below each graph indicate probes corresponding to sequences present in the strain. Since the average intensities for the two genomic DNA samples differred, an arbitrary cut-off to assign sequences as present or absent was not based on absolute intensities, but rather normalized values in the following manner: each intensity was divided by the average intensity for the entire sample to adjust for overall intensity differences between samples, and any probe with an adjusted intensity greater than an 0.5 was defined as present in the genome (data not shown). Using this criterion, 94% of the probes were correctly assigned for PAO1 (6 probes in regions absent in PAO1 were incorrectly assigned as present) and 94% of the probes were correctly assigned for PA14 (3 probes in regions present in PA14 were incorrectly assigned as absent, and 3 probes in regions absent in PA14 were incorrectly assigned as present). Incorrect assignments based on this arbitrary cut-off are indicated by white circles.

  Back to Host-Pathogen Interactions

NHLBI Home Page | NHLBI-PGA Home Page | Webmaster