PatternQuery is an interactive, user-friendly, and platform-independent web service enabling the user to effectively define, extract, and analyze structural patterns or biomolecular complexes using the PatternQuery language. Such analysis is particularly useful in the structural and functional assignment of uncharacterized or newly determined proteins, and it is also a key step in the rational design and engineering of novel functional sites and in comparative protein structural analyses.

PatternQuery is currently available in 3 modes:

Welcome to the PatternQuery submission page. Use the different tabs to run the PQ service for the detection of structural patterns across the whole Protein Data Bank or in your own data set of molecular structures.

Query Explanation

PatternQuery relies on an internal chemical language. The image gives an example of a query composed in the PatternQuery language. It identifies residues containing pyran rings (typically sugars and sugar derivatives), and retrieves each such residue together with all residues within 3Å of it. To learn more about how the language works, you can start with the language principles or this guide describing how to build a query.

Sehnal, D., Pravda, L., Ionescu, C.-M., Svobodová Vařeková, R. and Koča, J. (2015) PatternQuery: web application for fast detection of biomacromolecular structural patterns in the entire Protein Data Bank. Nucleic Acids Res., 43, W383–W388.

Testosterone Binding Site

Testosterone is a steroid sex hormone found in a variety of vertebrates. It activates an androgen receptor (NR3C4) upon binding in either its pure form or its derivative dihydrotestosterone and is primarily responsible for the development of male primary sexual characteristics [1].

Residues("TES")        # Residues annotated as TES ...
  .AmbientResidues(4)  # ... and all residues within 4Å of the particular TES occurrence.

The Protein Data Bank contains 26 instances of residues annotated as testosterone (TES) originating from 26 PDB entries (as of Dec 23, 2014). Testosterone derivatives with different annotations (BDT, DHT, FFA, TH2) were not queried here. All the structures are complete and correct. The immediate surroundings of the TES residues are rich in positively charged (Arg, His) and polar (Thr, Gln, Tyr) residues.
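The idea behind an ambient-residue selection can be illustrated outside PatternQuery with a few lines of Python. This is only a minimal sketch under stated assumptions: it is not how PQ is implemented, the function name ambient_residues, the residue labels, and the coordinates are all hypothetical, and "within 4Å" is interpreted here as a distance of at most 4 Å between any ligand atom and any atom of the residue.

```python
from math import dist  # Euclidean distance, available since Python 3.8


def ambient_residues(ligand_atoms, residues, cutoff=4.0):
    """Return labels of residues with at least one atom within `cutoff` Å
    of any ligand atom (assumed interpretation of "within 4Å")."""
    hits = []
    for label, atoms in residues.items():
        if any(dist(a, b) <= cutoff for a in atoms for b in ligand_atoms):
            hits.append(label)
    return hits


# Hypothetical coordinates in Å, not taken from any PDB entry.
ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
residues = {
    "ARG 1": [(3.0, 2.0, 0.0), (6.0, 2.0, 0.0)],  # one atom 2.5 Å from the ligand
    "THR 2": [(0.0, 3.9, 0.0)],                   # 3.9 Å from the ligand
    "LEU 3": [(9.0, 9.0, 9.0)],                   # far away
}

print(ambient_residues(ligand, residues))
```

The brute-force comparison of all atom pairs is chosen for readability; a search over the whole Protein Data Bank would need a spatial index instead.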

Zinc Fingers Cys2His2 Zinc Finger Transcription Factor

Cys2His2 zinc fingers (C2H2-ZFs) are the largest family of DNA-binding proteins in metazoans. They provide a stable, versatile and conserved framework for double-stranded DNA recognition [1].

RegularMotifs(".{2}C.{2,4}C.{3}[F|Y].{5}[AILFPGV].{2}H.{3,5}H")  # Detect specified primary sequence motif ...
  .ConnectedAtoms(1)                                             # ... and atoms connected to it.

Each zinc finger spans ~30 amino acid residues and folds into a simple ββα-motif around a tetrahedrally coordinated zinc ion. The motif can be identified by a regular expression X2-C-X2,4-C-X12-H-X3,5-H1,2, where X represents any amino acid. The X12 region usually has the form: X3-[F|Y]-X5-Ψ-X1,2, where Ψ denotes a hydrophobic residue [2]. The zinc ion is coordinated by two cysteine residues and two histidines.
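The sequence condition can be tried on a plain one-letter amino acid string with Python's standard re module. This is a sketch only: it tests the sequence pattern and nothing else (no structure, no zinc coordination), the helper name find_zinc_finger_motifs is hypothetical, and the input is an illustrative fragment shaped like a classical C2H2 finger. The pattern is copied verbatim from the query; note that inside a character class the | is a literal character, so [F|Y] behaves like [FY] on amino acid sequences.

```python
import re

# Pattern copied verbatim from the RegularMotifs query.
ZF_MOTIF = re.compile(r".{2}C.{2,4}C.{3}[F|Y].{5}[AILFPGV].{2}H.{3,5}H")


def find_zinc_finger_motifs(sequence):
    """Return (start, end, matched_text) for each non-overlapping match."""
    return [(m.start(), m.end(), m.group()) for m in ZF_MOTIF.finditer(sequence)]


# Illustrative test input (assumption: a fragment resembling a C2H2 finger).
sequence = "RPYACPVESCDRRFSRSDELTRHIRIHTGQKP"

for start, end, text in find_zinc_finger_motifs(sequence):
    print(start, end, text)
```

Because finditer reports non-overlapping matches only, tandem fingers that share residues at their boundaries would need a lookahead-based search.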

PQ was employed in the discovery and extraction of all structural patterns which satisfy the sequence condition. In total, 354 patterns representing zinc fingers were successfully identified in 233 distinct PDB entries. The majority of patterns come from Homo sapiens and other mammals.

LecB Sugar Binding Sites

Pseudomonas aeruginosa is an opportunistic pathogen associated with a number of chronic infections. This pathogen forms a biofilm shield enabling it to survive both the response of the host immune system and antibiotic treatment. One of the cornerstones of biofilm formation is the presence of carbohydrate-binding proteins (lectins) on the outer cell membrane: LecA (PA-IL) and LecB (PA-IIL). Their inhibition is considered to be a promising approach for anti-pseudomonadal treatment [1].

Near(4, Atoms("Ca"), Atoms("Ca"))  # Pairs of Ca atoms closer than 4Å ...
  .ConnectedResidues(1)            # ... with residues connected to any one of them ...
  .Filter(lambda l:                # ... the residues must contain a sugar ring (pyran, furan) ...
      l.Count(Or(Rings(5 * ["C"] + ["O"]), Rings(4 * ["C"] + ["O"]))) > 0)
  .Filter(lambda l: l.Count(Atoms("P")) == 0)  # ... and no P atoms (to exclude nucleotides).

We employed PQ to discover sugar binding sites in the Protein Data Bank with a geometry similar to that of the tetrameric PA-IIL protein. The carbohydrate-binding domain is calcium dependent, with two calcium ions stabilizing the binding site and contributing to sugar binding [2]. Therefore, only structures containing calcium ions were queried. The query searched for two calcium ions at most 4 Å apart, together with all residues interacting directly with either ion. Furthermore, only molecular patterns containing a furan or pyran ring were kept. Finally, any pattern containing a phosphorus atom was filtered out in order to exclude nucleotides.
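The two stages of this query, a distance search followed by filtering, can be mimicked in plain Python. This is a rough sketch under stated assumptions rather than PQ's implementation: ring detection is not reproduced (each pattern simply carries a precomputed sugar_rings count), and the names close_pairs and keep_pattern, the pattern records, and the coordinates are hypothetical.

```python
from itertools import combinations
from math import dist


def close_pairs(atoms, cutoff=4.0):
    """Return index pairs (i, j), i < j, of atoms at most `cutoff` Å apart."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(atoms), 2)
        if dist(a, b) <= cutoff
    ]


def keep_pattern(pattern):
    """Keep patterns with at least one sugar ring and no phosphorus atom."""
    return pattern["sugar_rings"] > 0 and "P" not in pattern["elements"]


# Hypothetical calcium positions in Å: only the first two are close together.
calcium = [(0.0, 0.0, 0.0), (3.7, 0.0, 0.0), (20.0, 0.0, 0.0)]
print(close_pairs(calcium))

# Hypothetical candidate patterns with precomputed ring counts.
patterns = [
    {"id": "a", "sugar_rings": 1, "elements": {"C", "O", "N", "Ca"}},
    {"id": "b", "sugar_rings": 1, "elements": {"C", "O", "N", "P", "Ca"}},
    {"id": "c", "sugar_rings": 0, "elements": {"C", "O", "N", "Ca"}},
]
print([p["id"] for p in patterns if keep_pattern(p)])
```

Pattern "b" is dropped because it contains phosphorus, and pattern "c" because it has no sugar ring, mirroring the two Filter steps of the query.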

The query returned 87 distinct patterns originating from 36 PDB entries from 7 different organisms. The majority of them originated from P. aeruginosa; however, binding sites from other pathogens such as R. solanacearum, B. cenocepacia or C. violaceum were also identified. In 83 patterns, the sugar-binding site is composed of 3x Asp, 2x Asn, 1x Glu and 1x Gly residues, in agreement with literature reports of the binding site for 24 PDB entries belonging to 3 pathogens.

