PaperBLAST

PaperBLAST – Find papers about a protein or its homologs

PaperBLAST Hits for tr|Q9HTU3|Q9HTU3_PSEAE Small-conductance mechanosensitive channel OS=Pseudomonas aeruginosa (strain ATCC 15692 / DSM 22644 / CIP 104116 / JCM 14847 / LMG 12228 / 1C / PRS 101 / PAO1) OX=208964 GN=PA5251 PE=3 SV=1 (192 a.a., MEDLQVLTQT...)

Other sequence analysis tools:

Find functional residues: SitesBLAST

Search for conserved domains

Find the best match in UniProt

Compare to protein structures

Predict transmembrane helices: Phobius

Predict protein localization: PSORTb

Find homologs in fast.genomics

Found 10 similar proteins in the literature:

Q9HTU3 Small-conductance mechanosensitive channel from Pseudomonas aeruginosa (strain ATCC 15692 / DSM 22644 / CIP 104116 / JCM 14847 / LMG 12228 / 1C / PRS 101 / PAO1)
PA5251 hypothetical protein from Pseudomonas aeruginosa PAO1
100% identity, 100% coverage

Proteome-wide identification of druggable targets and inhibitors for multidrug-resistant Pseudomonas aeruginosa using an integrative subtractive proteomics and virtual screening approach
Vemula, Heliyon 2025
- “...4498 Q9I395 171 G3XD78 1253 Q9HTT8 2335 Q9HT37 3417 Q9HYM6 4499 Q9I396 172 G3XDA1 1254 Q9HTU3 2336 Q9HT39 3418 Q9HYM7 4500 Q9I397 173 O33877 1255 Q9HTU5 2337 Q9HT40 3419 Q9HYM9 4501 Q9I399 175 O68282 1256 Q9HTU6 2338 Q9HT41 3420 Q9HYN0 4502 Q9I3A0 1257 Q9HTU8 2339 Q9HT42...”
Additive Effects of Quorum Sensing Anti-Activators on Pseudomonas aeruginosa Virulence Traits and Transcriptome
Asfahl, Frontiers in microbiology 2017
- “...PA5194 hypothetical protein 1.7 NC NC NC PA5250 conserved hypothetical protein 1.7 NC NC NC PA5251 hypothetical protein 1.7 NC NC NC PA5320 coaC Phosphopantothenoylcysteine synthase/(R)-4-phospho-N-pantothenoylcysteine decarboxylase ( coaB ; coaBCI ; dfp ) 1.4 NC NC 1.3 PA5361 phoR two-component sensor PhoR 1.6 NC NC...”
Two families of mechanosensitive channel proteins
Pivetti, Microbiology and molecular biology reviews : MMBR 2003
- “...Conserved hypothetical protein PA1775 Hypothetical protein PA5251 CmpX Hypothetical protein RP047 Hypothetical protein Conserved hypothetical protein SMa1582...”

HZ99_16630 mechanosensitive ion channel family protein from Pseudomonas fluorescens
69% identity, 96% coverage

Transcriptomic analysis of the response of Pseudomonas fluorescens to epigallocatechin gallate by RNA-seq
Liu, PloS one 2017
- “...1.03E-05 porin OprD HZ99_18605 7.28 5.74E-03 poly-beta-1,6-N-acetyl-D-glucosamine N-deacetylase PgaB HZ99_20290 7.11 9.43E-03 glycosyl transferase WcaA HZ99_16630 6.70 2.58E-02 small-conductance mechanosensitive channel MscS HZ99_08015 5.00 1.44E-03 lipoprotein component MlaA HZ99_12515 -2.53 1.87E-02 rare lipoprotein A RlpA HZ99_17495 -3.02 1.84E-03 glycosyl transferase WcaA HZ99_04930 -6.50 4.74E-02 mechanosensitive ion...”

PP0197 conserved hypothetical protein from Pseudomonas putida KT2440
69% identity, 93% coverage

UEG Week 2024 Poster Presentations
United European gastroenterology journal 2024
UEG Week 2023 Poster Presentations
United European gastroenterology journal 2023

MscS / b2924 small conductance mechanosensitive channel MscS from Escherichia coli K-12 substr. MG1655 (see 58 papers)
mscS / P0C0S1 small conductance mechanosensitive channel MscS from Escherichia coli (strain K12) (see 81 papers)
MSCS_ECOLI / P0C0S1 Small-conductance mechanosensitive channel from Escherichia coli (strain K12) (see 13 papers)
TC 1.A.23.2.1 / P0C0S1 Major MscS channel protein, YggB. Seven residues, mostly hydrophobic, in the first and second transmembrane helices are lipid-sensing residues from Escherichia coli (see 15 papers)
mscS small-conductance mechanosensitive channel from Escherichia coli K12 (see 15 papers)
P0C0S2 Small-conductance mechanosensitive channel from Escherichia coli O157:H7
NP_417399 small conductance mechanosensitive channel MscS from Escherichia coli str. K-12 substr. MG1655
b2924 mechanosensitive channel from Escherichia coli str. K-12 substr. MG1655
29% identity, 59% coverage

function: Mechanosensitive channel that participates in the regulation of osmotic pressure changes within the cell, opening in response to stretch forces in the membrane lipid bilayer, without the need for other proteins. Contributes to normal resistance to hypoosmotic shock. Forms an ion channel of 1.0 nanosiemens conductance with a slight preference for anions. The channel is sensitive to voltage; as the membrane is depolarized, less tension is required to open the channel and vice versa. The channel is characterized by short bursts of activity that last for a few seconds.
subunit: Homoheptamer.
substrates: ions
tcdb comment: X-ray structures are available (Lai et al. 2013). The cytoplasmic cage domain senses macromolecular crowding (Rowe et al. 2014). A gating mechanism has been proposed (Malcolm et al. 2015). The thermodynamics of K+ leak have been studied (Koprowski et al. 2015). In the MscS crystal structure (PDB 2OAU), a narrow, hydrophobic opening is visible, and a vapor lock, created by hydrophobic seals consisting of L105 and L109, is the barrier to water and ions (Rasmussen et al. 2015). The voltage dependence of inactivation occurs independently of the positive charges of R46, R54, and R74 (Nomura et al. 2016). The closed-to-open transition may involve rotation and tilt of the pore-lining helices (Edwards et al. 2005).
Charged pore-lining residues are required for normal channel kinetics in the eukaryotic mechanosensitive ion channel MSL1
Schlegel, Channels (Austin, Tex.) 2020
- “...available databases with accession numbers as follows: Escherichia coli MscS ( Ec MscS), UniProt ID P0C0S2; Arabidopsis thaliana MSL1 (MSL1), At4g00290; Arabidopsis thaliana MSL8 (MSL8), At2g17010; Arabidopsis thaliana MSL10 (MSL10), At5g12080; Corynebacterium glutamicum MscCG, RefSeq WP_011014245.1; Chlamydomonas reinhardtii MSC1, GenBank ID AB288852.1; Silicibacter pomeroyi MscSP, UniProt...”
MscS inactivation: an exception rather than the rule. An extremophilic MscS reveals diversity within the family
Vásquez, Biophysical journal 2013
- “...characterized. ecMscS from E. coli accession number (AN) P0C0S2 (15,16 ); MscSP from S. pomeroyi AN Q5LMR6 (4 ); MSC1 from C. reinhardtii AN...”
State-specific morphological deformations of the lipid bilayer explain mechanosensitive gating of MscS ion channels.
Park, eLife 2023
- GeneRIF: State-specific morphological deformations of the lipid bilayer explain mechanosensitive gating of MscS ion channels.
Interaction between mechanosensitive channels embedded in lipid membrane.
Zhu, Journal of the mechanical behavior of biomedical materials 2020 (PubMed)
- GeneRIF: Interaction between mechanosensitive channels embedded in lipid membrane.
Voltage-Dependent Inactivation of MscS Occurs Independently of the Positively Charged Residues in the Transmembrane Domain.
Nomura, BioMed research international 2016
- GeneRIF: inactivation process of wild-type MscS was strongly affected by voltage. The wild-type MscS inactivated at +60 to +80 mV but not at -60 to +40 mV.
The mechanosensitive channel of small conductance (MscS) functions as a Jack-in-the box.
Malcolm, Biochimica et biophysica acta 2015
- GeneRIF: Data indicate that mechanosensitive channel of small conductance (MscS) protein opens in response to a relief of intrinsic lipid bilayer pressure.
Unidirectional incorporation of a bacterial mechanosensitive channel into liposomal membranes.
Nomura, FASEB journal : official publication of the Federation of American Societies for Experimental Biology 2015 (PubMed)
- GeneRIF: Data show that trifluoroethanol (TFE) affects mechanosensitive channel of small conductance (MscS) channel gating kinetics in spheroplasts and liposomes.
Mutations in a Conserved Domain of E. coli MscS to the Most Conserved Superfamily Residue Leads to Kinetic Changes.
Malcolm, PloS one 2015
- GeneRIF: Mutations in a conserved domain of E. coli MscS to the most conserved superfamily residue leads to kinetic changes.
The role of lipids in mechanosensation.
Pliotas, Nature structural & molecular biology 2015
- GeneRIF: Molecular dynamics and biophysical analyses show that the volume of the pockets and thus the number of lipid acyl chains within them decreases upon channel opening.
Identification of bacterial factors involved in type 1 fimbria expression using an Escherichia coli K12 proteome chip
Chen, Molecular & cellular proteomics : MCP 2014
- “...NC_000913 NP_414786 NP_414592 NP_415912 NP_417164 NP_415785 NP_416249 NP_417399 NP_417573 NP_415468 NP_418508 NP_415858 NP_417776 YjcZ YdaV TrpL YebN Spr YleB...”
Biodistribution of ⁸⁹Zr-DFO-labeled avian pathogenic Escherichia coli outer membrane vesicles by PET imaging in chickens
Li, Poultry science 2023
- “...ACFD Function unknown Cell inner membrane 356 P77804 YDGA Function unknown Cell inner membrane 357 P0C0S1 MSCS Cell wall/membrane/envelope biogenesis Cell inner membrane 358 P0AB98 ATP6 Energy production and conversion Cell inner membrane 359 P0ADA3 NLPD Cell wall/membrane/envelope biogenesis Cell inner membrane 360 P23894 HTPX Posttranslational...”
Proteomic analyses revealed the antibacterial mechanism of Aronia melanocarpa isolated anthocyanins against Escherichia coli O157: H7
Deng, Current research in food science 2022
- “...and heat shock response B1X6E2 mscL Large-conductance mechanosensitive channel Up B1XDP7 groL 60kDa chaperonin Down P0C0S1 mscS Small-conductance mechanosensitive channel Down B7NE05 hslO 33kDa chaperonin Down P0AEB5 ynaI Low conductance mechanosensitive channel YnaI Down P0ACH0 hslR Heat shock protein 15 Down B7NQY6 ibpB Small heat shock...”
Interdependence of a mechanosensitive anion channel and glutamate receptors in distal wound signaling
Moe-Lange, Science advances 2021
- “...and their UniProt or NCBI accession numbers are as follows: MscS ( E. coli , P0C0S1), MscK ( E. coli , P77338), MscM ( E. coli , P39285), MSL1 ( A. thaliana , Q8VZL4), MSL2 ( A. thaliana , Q56X46), MSL3 ( A. thaliana , Q8L7W1),...”
Contribution of mechanosensitive channels to osmoadaptation and ectoine excretion in Halomonas elongata
Vandrich, Extremophiles : life under extreme conditions 2020
- “...channels. (a) We downloaded the entry for each mechanosensitive channel of Escherichia coli (mscS-related: UniProt P0C0S1 mscS, P77338 mscK, P39285 mscM, POAEB5 ynaI, P75783 ybiO, P0AAT4 mscM; mscL-related: P0A742 mscL) (Berrier et al. 1996 ) and of Corynebacterium glutamicum (Cgl0879 mscL UniProt:Q8NS07; Cgl1270 yggB UniProt:P42531; KIQ_000100...”
United in diversity: mechanosensitive ion channels in plants
Hamilton, Annual review of plant biology 2015
- “...Information Resource (TAIR) accession numbers, or Phytozome accession numbers are as follows: Escherichia coli MscS (P0C0S1), YbdG (P0AAT4); Synechocystis sp. PCC6803 bCNGa (M1ME31); Helicobacter pylori MscS (E1Q2W1); Corynebacterium glutamicum MscCG (P42531); Thermoanaerobacter tengcongensis MscS (Q8R6L9); Toxoplasma gondii (B6KM08); Plasmodium falciparum (Q8IIS3); Dictyostelium discoideum (Q54ZV3); Schizosaccharomyces pombe...”
Bioinformatic analyses of integral membrane transport proteins encoded within the genome of the planctomycetes species, Rhodopirellula baltica.
Paparoditis, Biochimica et biophysica acta 2014
- “...1.A.23.1.1 P77338 11 nonselective nonselective Q7URB6 12 1.A.23.1.3 P39285 13 nonselective nonselective Q7UKC6 12 1.A.23.2.1 P0C0S1 4 nonselective nonselective Q7UW85 4 1.A.23.2.1 P0C0S1 4 nonselective nonselective Q7USJ8 4 1.A.23.2.1 P0C0S1 4 nonselective nonselective Q7UF95 3 1.A.23.2.1 P0C0S1 4 nonselective nonselective Q7USJ6 3 1.A.23.2.1 P0C0S1 4 nonselective...”
The Escherichia coli proteome: past, present, and future prospects
Han, Microbiology and molecular biology reviews : MMBR 2006
- “...protein 5.85/39,938.08 8.62/64,460.71 MscS (YggB) P0C0S1 Small-conductance mechanosensitive channel 7.9/30,896.02 MsrA P0A744 Peptide methionine sulfoxide...”
Transcriptomic analysis of carboxylic acid challenge in Escherichia coli: beyond membrane damage
Royce, PloS one 2014
- “...stress b3405 ompR 0.013 0.163 2.7 OMP TR b3515 gadW 0.026 0.219 2.6 AR2 TR b2924 mscS 0.001 0.028 2.4 IMP mechanosensitive (MS) channel; non-specific transporter b0850 ybjC 0.003 0.011 2.4 marA/SoxS induced b3024 ygiW 0.004 0.014 2.4 IMP transporter, SR b0775 (c0855) bioB 0.018 0.193...”
Global transcriptomic analysis of an engineered Escherichia coli strain lacking the phosphoenolpyruvate: carbohydrate phosphotransferase system during shikimic acid production in rich culture medium
Cortés-Tolalpa, Microbial cell factories 2014
- “...narU b1469 NarU MFS nitrate/nitrite antiporter Transport 62.1767 ompC b2215 OMP C Transport 37.5373 yggB b2924 Mechano sensitive channel MscS Transport 22.3863 a Retrieved from EcoCyc database, b Biological functions were assigned according to the EcoCyc database for E. coli strain MG1655 and broadly grouped on...”
A PhoQ/P-regulated small RNA regulates sensitivity of Escherichia coli to antimicrobial peptides
Moon, Molecular microbiology 2009
- “...formate-lyase subunit ygdQ b2832 15.0 9.2 3.0 4.3 putative transport protein mscS ( yggB ) b2924 2.3 2.1 1.8 1.4 Subunit of mechanosensitive channel pitA b3493 2.7 2.7 1.5 1.5 low-affinity phosphate transport eptB b3546 9.7 8.7 1.8 2.8 phosphoethanolamine transferase yibK b3606 2.8 2.7 1.7...”
DNA microarray analyses of the long-term adaptive response of Escherichia coli to acetate and propionate
Polen, Applied and environmental microbiology 2003
- “...b2869 b2869 1 Putative transcriptional regulator 0.41 0.49* 1.25 b2924 yggB 1 Component of MscS 2.15* 1.56* 1.59* b2973 b2974 b2973 b2974 1 1 ORF, hypothetical...”

WP_000389819 small-conductance mechanosensitive channel MscS from Escherichia coli
29% identity, 59% coverage

A novel mechanosensitive channel controls osmoregulation, differentiation, and infectivity in Trypanosoma cruzi
Dave, eLife 2021
- “...brucei (Tb427.10.9030) and Leishmania major (LmjF.36.5770), with two bacterial MscS channels from Escherichia coli (WP_000389819) and Helicobacter pylori (WP_000343449.1). The positions of the transmembrane domains TM1, TM2, and TM3 are underlined, the positions of the putative gate residues are indicated by red arrows, and conserved...”
- “...brucei (Tb427.10.9030) and Leishmania major (LmjF.36.5770), with two bacterial MscS channels from Escherichia coli (WP_000389819) and Helicobacter pylori (WP_000343449.1). The positions of the transmembrane domains TM1, TM2, and TM3 are underlined, the position of the putative gate residues is indicated by red arrows, and conserved...”

TDE2295 mechanosensitive ion channel family protein from Treponema denticola ATCC 35405
23% identity, 59% coverage

Transcriptional profiles of Treponema denticola in response to environmental conditions
McHardy, PloS one 2010
- “...respond via conformational changes, the two mechano-sensitive transporter homologs of T. denticola , TDE2323 and TDE2295, were not induced at the 1 hour time point tested in this study. Homologues to osmo-regulated periplasmic glucans that certain gram-negative species employ as an additional layer of protection [16]...”

CC3612 conserved hypothetical protein from Caulobacter crescentus CB15
31% identity, 48% coverage

Two families of mechanosensitive channel proteins
Pivetti, Microbiology and molecular biology reviews : MMBR 2003
- “...membrane protein Cj1007c Conserved hypothetical protein CC3612 Conserved hypothetical protein CC3000 AefA protein (fragment) Hypothetical 30.6-kDa protein...”

NP_230134, VC0480 conserved hypothetical protein from Vibrio cholerae O1 biovar eltor str. N16961
27% identity, 55% coverage

The mechanoelectrical response of the cytoplasmic membrane of Vibrio cholerae
Rowe, The Journal of general physiology 2013
- “...identity) and a likely orthologue of E. coli MscS (available at RefSeq under accession no. NP_230134 ; 287 aa, 48% overall identity with an 84% conserved TM3- domain region). A more distant MscS orthologue is coded by RefSeq accession number NP_232581 (291 aa, 21% identity). An...”
Transcriptional profiling of Vibrio cholerae recovered directly from patient specimens during early and late stages of human infection
Larocque, Infection and immunity 2005
- “...4.9 105 3.2 104 Hypothetical proteins (conserved) VC0480 VC0641 VC0762 VC1317 VC1723 VC2479 VC2706 VC2720 VCA0769 Conserved Conserved Conserved Conserved...”
Two families of mechanosensitive channel proteins
Pivetti, Microbiology and molecular biology reviews : MMBR 2003
- “...protein TP0822 Conserved hypothetical protein VC0480 Conserved hypothetical protein VC1751 Conserved hypothetical protein VC0265 Hypothetical protein...”

7onjA Mechanosensitive channel mscs solubilized with lmng in open conformation (see paper)
28% identity, 63% coverage

Ligand: lauryl maltose neopentyl glycol (7onjA)

Igni_0056 MscS Mechanosensitive ion channel from Ignicoccus hospitalis KIN4/I
35% identity, 32% coverage

A Complex Endomembrane System in the Archaeon Ignicoccus hospitalis Tapped by Nanoarchaeum equitans
Heimerl, Frontiers in microbiology 2017
- “...cell is also supported by comparative proteomic analysis correlating a higher expression of mechanosensitive channels (Igni_0056 and Igni_0235) in I. hospitalis with an increasing number of attached N. equitans cells (Giannone et al., 2015 ). A direct cytoplasmic connection between the two organisms was previously proposed...”

For advice on how to use these tools together, see Interactive tools for functional annotation of bacterial genomes.

Statistics

The PaperBLAST database links 798,070 different protein sequences to 1,261,478 scientific articles. Searches against EuropePMC were last performed on May 12 2025.

How It Works

PaperBLAST builds a database of protein sequences that are linked to scientific articles. These links come from automated text searches against the articles in EuropePMC and from manually-curated information from GeneRIF, UniProtKB/Swiss-Prot, BRENDA, CAZy (as made available by dbCAN), BioLiP, CharProtDB, MetaCyc, EcoCyc, TCDB, REBASE, the Fitness Browser, and a subset of the European Nucleotide Archive with the /experiment tag. Given this database and a protein sequence query, PaperBLAST uses protein-protein BLAST to find similar sequences with E < 0.001.
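The E-value cutoff described above can be illustrated with a minimal sketch. It assumes the BLAST hits are available as tab-separated text in the default 12-column tabular layout of BLAST+ (`-outfmt 6`); the page does not say which output format PaperBLAST itself parses, and the function name `filter_hits` and the example rows are hypothetical.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

def filter_hits(tabular_text, max_evalue=1e-3):
    """Keep hits with E-value strictly below max_evalue, best bit score first."""
    reader = csv.DictReader(io.StringIO(tabular_text),
                            fieldnames=FIELDS, delimiter="\t")
    hits = []
    for row in reader:
        evalue = float(row["evalue"])
        if evalue < max_evalue:
            hits.append({
                "subject": row["sseqid"],
                "identity": float(row["pident"]),
                "evalue": evalue,
                "bitscore": float(row["bitscore"]),
            })
    return sorted(hits, key=lambda h: h["bitscore"], reverse=True)

# Made-up rows for illustration only; these are not real PaperBLAST results.
example = (
    "query\thitA\t69.0\t185\t57\t0\t1\t185\t1\t185\t1e-90\t270\n"
    "query\thitB\t29.0\t113\t80\t0\t60\t172\t100\t212\t2e-08\t50.1\n"
    "query\thitC\t25.0\t40\t30\t0\t10\t49\t5\t44\t0.5\t20.0\n"
)
kept = filter_hits(example)
```

Here `hitC` is dropped because its E-value (0.5) is not below 0.001, while the other two rows are retained.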

To build the database, we query EuropePMC with locus tags, with RefSeq protein identifiers, and with UniProt accessions. We obtain the locus tags from RefSeq or from MicrobesOnline. We use queries of the form "locus_tag AND genus_name" to try to ensure that the paper is actually discussing that gene. Because EuropePMC indexes most recent biomedical papers, even if they are not open access, some of the links may be to papers that you cannot read or that our computers cannot read. We query each of these identifiers that appears in the open access part of EuropePMC, as well as every locus tag that appears in the 500 most-referenced genomes, so that a gene may appear in the PaperBLAST results even though none of the papers that mention it are open access. We also incorporate text-mined links from EuropePMC that link open access articles to UniProt or RefSeq identifiers. (This yields some additional links because EuropePMC uses different heuristics for their text mining than we do.)
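As a rough illustration of the "locus_tag AND genus_name" query form, the sketch below builds a search string and a URL for the public Europe PMC REST search endpoint. The quoting of each term, the helper names `build_query` and `build_url`, and the choice of this endpoint are assumptions made for the example; the page does not state which EuropePMC interface PaperBLAST actually uses.

```python
from urllib.parse import urlencode

# Public Europe PMC REST search endpoint (assumed here for illustration).
EUROPEPMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"

def build_query(locus_tag, organism):
    """Combine a locus tag with the genus (first word of the organism name)."""
    genus = organism.split()[0]
    # Quoting each term is an assumption, intended to force exact matches.
    return f'"{locus_tag}" AND "{genus}"'

def build_url(locus_tag, organism, page_size=25):
    """Return a search URL; no network request is made here."""
    params = {
        "query": build_query(locus_tag, organism),
        "format": "json",
        "pageSize": page_size,
    }
    return f"{EUROPEPMC_SEARCH}?{urlencode(params)}"

url = build_url("PA5251", "Pseudomonas aeruginosa PAO1")
```

Requiring the genus name alongside the locus tag is what filters out papers where the same string appears in an unrelated context, such as a catalog number or a gene from another organism.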

For every article that mentions a locus tag, a RefSeq protein identifier, or a UniProt accession, we try to select one or two snippets of text that refer to the protein. If we cannot get access to the full text, we try to select a snippet from the abstract, but unfortunately, unique identifiers such as locus tags are rarely provided in abstracts.
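One simple way to pick snippets like these is to take a window of text around each whole-word occurrence of the identifier. The sketch below is only an illustration of that idea, not PaperBLAST's actual heuristic; the function name `select_snippets`, the window size, and the rule for skipping overlapping windows are all assumptions.

```python
import re

def select_snippets(text, identifier, context=60, max_snippets=2):
    """Return up to max_snippets windows of text around the identifier."""
    # Require non-identifier characters on both sides, so that "PA5251"
    # does not match inside "PA52510".
    pattern = re.compile(
        r"(?<![A-Za-z0-9_])" + re.escape(identifier) + r"(?![A-Za-z0-9_])"
    )
    snippets = []
    last_end = -1
    for match in pattern.finditer(text):
        start = max(0, match.start() - context)
        end = min(len(text), match.end() + context)
        if start < last_end:
            # This window overlaps the previous snippet; skip it.
            continue
        snippets.append("..." + text[start:end].strip() + "...")
        last_end = end
        if len(snippets) == max_snippets:
            break
    return snippets
```

Applied to an abstract rather than the full text, the same function would usually return an empty list, since locus tags rarely appear there.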

PaperBLAST also incorporates manually-curated protein functions:

Proteins from NCBI's RefSeq are included if a GeneRIF entry links the gene to an article in PubMed®. GeneRIF also provides a short summary of the article's claim about the protein, which is shown instead of a snippet.
Proteins from Swiss-Prot (the curated part of UniProt) are included if the curators identified experimental evidence for the protein's function (evidence code ECO:0000269). For these proteins, the fields of the Swiss-Prot entry that describe the protein's function are shown (with bold headings).
Proteins from BRENDA, a curated database of enzymes, are included if they are linked to a paper in PubMed and their full sequence is known.
Every protein from the non-redundant subset of BioLiP, a database of ligand-binding sites and catalytic residues in protein structures, is included. Since BioLiP itself does not include descriptions of the proteins, those are taken from the Protein Data Bank. Descriptions from PDB rely on the original submitter of the structure and cannot be updated by others, so they may be less reliable. (For SitesBLAST and Sites on a Tree, we use a larger subset of BioLiP so that every ligand is represented among a group of structures with similar sequences, but for PaperBLAST, we use the non-redundant set provided by BioLiP.)
Every protein from EcoCyc, a curated database of the proteins in Escherichia coli K-12, is included, regardless of whether they are characterized or not.
Proteins from the MetaCyc metabolic pathway database are included if they are linked to a paper in PubMed and their full sequence is known.
Proteins from the Transport Classification Database (TCDB) are included if they have known substrate(s), have reference(s), and are not described as uncharacterized or putative. (Some of the references are not visible on the PaperBLAST web site.)
Every protein from CharProtDB, a database of experimentally characterized protein annotations, is included.
Proteins from the CAZy database of carbohydrate-active enzymes are included if they are associated with an Enzyme Classification number. Even though CAZy does not provide links from individual protein sequences to papers, these should all be experimentally-characterized proteins.
Proteins from the REBASE database of restriction enzymes are included if they have known specificity.
Every protein with an evidence-based reannotation (based on mutant phenotypes) in the Fitness Browser is included.
Sequence-specific transcription factors (including sigma factors and DNA-binding response regulators) with experimentally-determined DNA binding sites from the PRODORIC database of gene regulation in prokaryotes are included.
Putative transcription factors from RegPrecise are included if they have manually-curated predictions for their binding sites. These predictions are based on conserved putative regulatory sites across genomes that contain similar transcription factors, so PaperBLAST clusters the TFs at 70% identity and retains just one member of each cluster.
Coding sequence (CDS) features from the European Nucleotide Archive (ENA) are included if the /experiment tag is set (implying that there is experimental evidence for the annotation), the nucleotide entry links to paper(s) in PubMed, and the nucleotide entry is from the STD data class (implying that these are targeted annotated sequences, not from shotgun sequencing). Also, to filter out genes whose transcription or translation was detected, but whose function was not studied, nucleotide entries or papers with more than 25 such proteins are excluded. Descriptions from ENA rely on the original submitter of the sequence and cannot be updated by others, so they may be less reliable.
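The ENA inclusion rule in the last item above can be sketched as a two-pass filter. This is one possible reading of the rule, not the actual PaperBLAST code: the `CdsFeature` record and its field names are hypothetical, and the handling of the 25-protein limit (drop every feature from an oversized entry, drop oversized papers from a feature's list, and keep the feature only if a paper remains) is an assumption.

```python
from collections import Counter
from dataclasses import dataclass

MAX_PROTEINS = 25  # limit stated in the text above

@dataclass(frozen=True)
class CdsFeature:
    """Hypothetical record for one CDS feature of an ENA nucleotide entry."""
    protein_id: str
    entry_id: str
    data_class: str           # e.g. "STD"
    has_experiment_tag: bool
    pubmed_ids: tuple         # PubMed IDs linked from the nucleotide entry

def select_features(features, max_proteins=MAX_PROTEINS):
    """Return (protein_id, papers) pairs that pass the inclusion rules."""
    # Pass 1: /experiment tag set, at least one paper, STD data class.
    candidates = [
        f for f in features
        if f.has_experiment_tag and f.pubmed_ids and f.data_class == "STD"
    ]
    # Count candidate proteins per nucleotide entry and per paper.
    per_entry = Counter(f.entry_id for f in candidates)
    per_paper = Counter(p for f in candidates for p in set(f.pubmed_ids))
    # Pass 2: exclude entries and papers with too many such proteins.
    kept = []
    for f in candidates:
        if per_entry[f.entry_id] > max_proteins:
            continue
        papers = tuple(p for p in f.pubmed_ids if per_paper[p] <= max_proteins)
        if papers:
            kept.append((f.protein_id, papers))
    return kept
```

Counting only the candidates from the first pass means the limit applies to proteins that already carry experimental evidence, which matches the stated goal of excluding large-scale detection studies rather than large genomes as such.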

Except for GeneRIF and ENA, the curated entries include a short curated description of the protein's function. For entries from BioLiP, the protein's function may not be known beyond binding to the ligand. Many of these entries also link to articles in PubMed.

For more information see the PaperBLAST paper (mSystems 2017) or the code. You can download PaperBLAST's database here.

Changes to PaperBLAST since the paper was written:

November 2023: incorporated PRODORIC and RegPrecise. Many PRODORIC entries were not linked to a protein sequence (no UniProt identifier), so we added this information.
February 2023: BioLiP changed their download format. PaperBLAST now includes their non-redundant subset. SitesBLAST and Sites on a Tree use a larger non-redundant subset that ensures that every ligand is represented within each cluster. This should ensure that every binding site is represented.
June 2022: incorporated some coding sequences from ENA with the /experiment tag.
March 2022: incorporated BioLiP.
April 2020: incorporated TCDB.
April 2019: EuropePMC now returns table entries in their search results. This has expanded PaperBLAST's database, but most of the new entries are of low relevance, and the resulting snippets are often just lists of locus tags with annotations.
February 2018: the alignment page reports the conservation of the hit's functional sites (if available from Swiss-Prot or UniProt).
January 2018: incorporated BRENDA.
December 2017: incorporated MetaCyc, CharProtDB, CAZy, REBASE, and the reannotations from the Fitness Browser.
September 2017: EuropePMC no longer returns some table entries in their search results. This has shrunk PaperBLAST's database, but has also reduced the number of low-relevance hits.

Many of these changes are described in Interactive tools for functional annotation of bacterial genomes.

Secrets

PaperBLAST cannot provide snippets for many of the papers that are published in non-open-access journals. This limitation applies even if the paper is marked as "free" on the publisher's web site and is available in PubMed Central or EuropePMC. If a journal that you publish in is marked as "secret," please consider publishing elsewhere.

Omissions from the PaperBLAST Database

Many important articles are missing from PaperBLAST, either because the article's full text is not in EuropePMC (as for many older articles), or because the paper does not mention a protein identifier such as a locus tag, or because of PaperBLAST's heuristics. If you notice an article that characterizes a protein's function but is missing from PaperBLAST, please notify the curators at UniProt or add an entry to GeneRIF. Entries in either of these databases will eventually be incorporated into PaperBLAST. Note that to add an entry to UniProt, you will need to find the UniProt identifier for the protein. If the protein is not already in UniProt, you can ask them to create an entry. To add an entry to GeneRIF, you will need an NCBI Gene identifier, but unfortunately many prokaryotic proteins in RefSeq do not have corresponding Gene identifiers.

References

PaperBLAST: Text-mining papers for information about homologs.
M. N. Price and A. P. Arkin (2017). mSystems, 10.1128/mSystems.00039-17.

Europe PMC in 2017.
M. Levchenko et al (2017). Nucleic Acids Research, 10.1093/nar/gkx1005.

Gene indexing: characterization and analysis of NLM's GeneRIFs.
J. A. Mitchell et al (2003). AMIA Annu Symp Proc 2003:460-464.

UniProt: the universal protein knowledgebase.
The UniProt Consortium (2016). Nucleic Acids Research, 10.1093/nar/gkw1099.

BRENDA in 2017: new perspectives and new tools in BRENDA.
S. Placzek et al (2017). Nucleic Acids Research, 10.1093/nar/gkw952.

The EcoCyc database: reflecting new knowledge about Escherichia coli K-12.
I. M. Keseler et al (2016). Nucleic Acids Research, 10.1093/nar/gkw1003.

The MetaCyc database of metabolic pathways and enzymes.
R. Caspi et al (2018). Nucleic Acids Research, 10.1093/nar/gkx935.

CharProtDB: a database of experimentally characterized protein annotations.
R. Madupu et al (2012). Nucleic Acids Research, 10.1093/nar/gkr1133.

The carbohydrate-active enzymes database (CAZy) in 2013.
V. Lombard et al (2014). Nucleic Acids Research, 10.1093/nar/gkt1178.

The Transporter Classification Database (TCDB): recent advances.
M. H. Saier, Jr. et al (2016). Nucleic Acids Research, 10.1093/nar/gkv1103.

REBASE - a database for DNA restriction and modification: enzymes, genes and genomes.
R. J. Roberts et al (2015). Nucleic Acids Research, 10.1093/nar/gku1046.

Deep annotation of protein function across diverse bacteria from mutant phenotypes.
M. N. Price et al (2016). bioRxiv, 10.1101/072470.

by Morgan Price, Arkin group
Lawrence Berkeley National Laboratory