PaperBLAST

PaperBLAST – Find papers about a protein or its homologs


PaperBLAST Hits for MCAODC_10750 (73 a.a., MTTQSSPVIT...)

Other sequence analysis tools:

Find functional residues: SitesBLAST

Search for conserved domains

Find the best match in UniProt

Compare to protein structures

Predict transmembrane helices: Phobius

Predict protein localization: PSORTb

Find homologs in fast.genomics


Found 19 similar proteins in the literature:

SEN2806 glucarate dehydratase-related protein from Salmonella enterica subsp. enterica serovar Enteritidis str. P125109
97% identity, 16% coverage

Whole genome sequencing provides insights into the genetic determinants of invasiveness in Salmonella Dublin
Mohammed, Epidemiology and infection 2016
- “...SEN2783), the gene encoding a probable glucarate dehydratase 2 (SEN2806), the gene encoding the outer membrane usher protein LpfC (SEN3461) and the gene 2434 M....”
- “...phosphotransferase system permease SEN0784 SEN2182* SEN2783 SEN2806 SEN3461 SEN3672 Enteritidis (PT4) Dublin (Irish isolates) Gallinarum (287/91) Cholerasuis...”
Genomic Comparison of the Closely Related Salmonella enterica Serovars Enteritidis and Dublin
Betancor, The open microbiology journal 2012
- “...I restriction modification system protein (SEN4290), and the gene encoding a probable glucarate dehydratase 2 (SEN2806 or ygcY ). The other two genes that complete this list are mglA (SEN2182) and shdA (SEN2493), which are pseudogenes in S. Typhi CT18 and Ty2 as well as in...”
- “...ATCAACCGGTTTGTCATTCG Reverse TACCGTCCCAGTCGCCGTTG Reverse2 SEN2783 GTGAGGTATATCAACAAAAAAGACCA Forward TCCAGAGGCAATCCAGGA Forward2 TGTGCAGGCGCCGTTG Forward3 ACGGACGGGGAGCCAGG Reverse CAACCTCTTTGCGTGTATCAACC Reverse2 SEN2806 GTGCTGGTAGGCGATATTAAG Forward CTTCCCGGACGCGCGTAT Forward2 AACCTGCATTTCAGTCACTACAG Reverse SEN3461 TTTGGCACGGCTGGCGACAT Forward GAATGCCCTGCTGGTGGATT Forward2 CGTGCCGGGAACTATAACAG Forward3 AGCACCGACCCGCCCAACA Reverse GCCGCGCAAACCGTAGTTCA Reverse2 SEN3672 GGCCTGGTCACGTCTGTAAC Forward CTCTCTTTTGTCTTCGGTATCC Forward2 TATGACGGTTTGATGACAATGG Reverse SEN4290 AACGCTTGAGGATTTAATAGAA Forward CTGATTCAGTACCGTCAGTG Reverse Table...”

SL1344_2942 enolase C-terminal domain-like protein from Salmonella enterica subsp. enterica serovar Typhimurium str. SL1344
96% identity, 16% coverage

speG Is Required for Intracellular Replication of Salmonella in Various Human Cells and Affects Its Polyamine Metabolism and Global Transcriptomes
Fang, Frontiers in microbiology 2017
- “...which are involved in the periplasmic nitrate reductase system; ygcX, ygcZ, garL, garR , and SL1344_2942, which are associated with glucarate metabolism; SL1344_3736 and SL1344_4467, which are related to the phosphotransferase system; cyoA, cyoB , and cyoC , which encode cytochrome-related proteins; and sdhA, sdhB, sdhC...”
- “...garL SL1344_3222 5-Keto-4-deoxy-D-glucarate aldolase 2.819 garR SL1344_3221 2-Hydroxy-3-oxopropionate reductase 2.760 ygcX SL1344_2941 Glucarate dehydratase 2.037 SL1344_2942 SL1344_2942 Glucarate dehydratase 1.257 Genes of Phosphotransferase System SL1344_3736 SL1344_3736 Putative PTS system protein 1.673 SL1344_4467 SL1344_4467 PTS transport system, IIB component 1.621 Genes of Cytochromes cyoA SL1344_0437 Cytochrome o...”

Z4104 No description from Escherichia coli O157:H7 EDL933
100% identity, 100% coverage

Clonal and antigenic analysis of serogroup A Neisseria meningitidis with particular reference to epidemiological features of epidemic meningitis in the People's Republic of China
Wang, Infection and immunity 1992
- “...Z4069, Z4070, Z4071, Z4073, Z4075, Z4079, Z4081, Z4097, Z4104, Z4109, Z4115, Z4736, Z4737, Z4738, Z4739, Z4740, Z4744, Z4745, Z4747, Z4748, Z4749, Z4750 Z4752,...”

S2995 putative glucarate dehydratase from Shigella flexneri 2a str. 2457T
97% identity, 16% coverage

Analysis of the type 1 pilin gene cluster fim in Salmonella: its distinct evolutionary histories in the 5' and 3' regions
Boyd, Journal of bacteriology 1999
- “...S. bongori a fim1 fim2 fim3 fim4 fim5 fim6 S3333 S4194 S2985 S2993 S2995 S3057 S2978 S2979 S3015 S3027 S3013 S3014 S2980 S2983 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1...”
- “...isolates of subspecies I. One isolate of subspecies VI (S2995) gave a PCR product approximately 1,500 bp larger than that expected, whereas isolate S3057 gave a...”

GudX / b2788 glucarate dehydratase-related protein from Escherichia coli K-12 substr. MG1655 (see 2 papers)
GUDX_ECOLI / Q46915 Glucarate dehydratase-related protein; GDH-RP; GlucDRP; EC 4.2.1.- from Escherichia coli (strain K12) (see paper)
gudX / RF|NP_417268 glucarate dehydratase-related protein from Escherichia coli K12
b2788 predicted glucarate dehydratase from Escherichia coli str. K-12 substr. MG1655
96% identity, 16% coverage

function: Does not seem to have an in-vivo activity on glucarate or idarate. Its real substrate is unknown
cofactor: a divalent metal cation
A common regulator for the operons encoding the enzymes involved in D-galactarate, D-glucarate, and D-glycerate utilization in Escherichia coli
Monterrubio, Journal of bacteriology 2000
- “...permease (b2789), a nonfunctional D-glucarate dehydratase-related protein (b2788), and the functional D-glucarate dehydratase (b2787). The other two units are...”

c3352 Glucarate dehydratase related protein from Escherichia coli CFT073
96% identity, 16% coverage

Latency, Anti-Bacterial Resistance Pattern, and Bacterial Infection-Related Glomerulonephritis
John, Clinical journal of the American Society of Nephrology : CJASN 2021 (secret)

OA04_36650 enolase C-terminal domain-like protein from Pectobacterium versatile
84% identity, 15% coverage

The PhoPQ Two-Component System Is the Major Regulator of Cell Surface Properties, Stress Responses and Plant-Derived Substrate Utilisation During Development of Pectobacterium versatile-Host Plant Pathosystems
Kravchenko, Frontiers in microbiology 2020
- “...13.8 raiA OA04_34460 2.18 Stationary phase translation inhibitor and ribosome stability factor CCGTTTTTTTTATGGTTAG 7.3 gudX OA04_36650 4.2 Glucarate dehydratase ACTTTTTACTGAGGTTGGT 7.5 metF OA04_43310 3.58 5,10-methylenetetrahydrofolate reductase a Wild type vs. phoP mutant ratio. For operons, the value for the first gene is shown. Supplementary Table 3...”

3n6hB / A6VQF6 Crystal structure of mandelate racemase/muconate lactonizing protein from actinobacillus succinogenes 130z complexed with magnesium/sulfate
74% identity, 16% coverage

Ligand: magnesium ion (3n6hB)

STM2960 d-glucarate dehydratase from Salmonella typhimurium LT2
SL1344_2941, STM14_3568 glucarate dehydratase from Salmonella enterica subsp. enterica serovar Typhimurium str. 14028S
65% identity, 16% coverage

Hydrogen-stimulated carbon acquisition and conservation in Salmonella enterica serovar Typhimurium
Lamichhane-Khadka, Journal of bacteriology 2011
- “...Carbon transport and metabolism Carbohydrates STM2962 STM4077/78 STM2960 STM3557 STM2190 STM3884 STM1830/31 STM4325 STM0685 STM4074 STM4075/76 Genea VOL. 193,...”
Salmonella serovar identification using PCR-based detection of gene presence and absence
Arrach, Journal of clinical microbiology 2008
- “...STM2767, STM2816, STM2914, STM2917, STM2922, STM2941, STM2960, STM3024, STM3026, STM3028, STM3036, STM3082, STM3120, STM3253, STM3254, STM3256, STM3257,...”
Genome-Wide Identification and Expression Analysis of SOS Response Genes in Salmonella enterica Serovar Typhimurium
Mérida-Floriano, Cells 2021
- “...8.27 STM14_3214 -- 7.70 1 6 12.33 STM14_5094 lexA 6.86 2 6, 27 14.48, 7.94 STM14_3568 gudD 5.43 1 0 16.97 STM14_3405 yqaB 5.32 1 12 16.26 STM14_1439 dinI Gifsy-3 4.56 1 19 5.26 STM14_2752 yejK 4.54 1 102 20.89 STM14_2422 umuC ** 3.91 1 --...”
speG Is Required for Intracellular Replication of Salmonella in Various Human Cells and Affects Its Polyamine Metabolism and Global Transcriptomes
Fang, Frontiers in microbiology 2017
- “...SL1344_2943 Glucarate transporter 3.248 garL SL1344_3222 5-Keto-4-deoxy-D-glucarate aldolase 2.819 garR SL1344_3221 2-Hydroxy-3-oxopropionate reductase 2.760 ygcX SL1344_2941 Glucarate dehydratase 2.037 SL1344_2942 SL1344_2942 Glucarate dehydratase 1.257 Genes of Phosphotransferase System SL1344_3736 SL1344_3736 Putative PTS system protein 1.673 SL1344_4467 SL1344_4467 PTS transport system, IIB component 1.621 Genes of Cytochromes...”

Z4102 putative glucarate dehydratase from Escherichia coli O157:H7 EDL933
64% identity, 16% coverage

Clonal and antigenic analysis of serogroup A Neisseria meningitidis with particular reference to epidemiological features of epidemic meningitis in the People's Republic of China
Wang, Infection and immunity 1992
- “...Z3771, Z3786, Z3787 Z3905, Z3909 B503, Z3917, Z3921, Z4102, Z4735 Z3911, Z3912, Z3913, Z3914, Z3915, Z3916, Z3920, Z3922, Z3923, Z3924, Z3925, Z3926, Z3927...”

YgcX / b2787 D-glucarate dehydratase (EC 4.2.1.40) from Escherichia coli K-12 substr. MG1655 (see 6 papers)
gudD / P0AES2 D-glucarate dehydratase (EC 4.2.1.40) from Escherichia coli (strain K12) (see 5 papers)
GUDD_ECOLI / P0AES2 Glucarate dehydratase; GDH; GlucD; D-glucarate dehydratase; EC 4.2.1.40 from Escherichia coli (strain K12) (see 3 papers)
gudD / RF|NP_417267 glucarate dehydratase; EC 4.2.1.40 from Escherichia coli K12 (see 8 papers)
b2787 (D)-glucarate dehydratase 1 from Escherichia coli str. K-12 substr. MG1655
NP_417267 D-glucarate dehydratase from Escherichia coli str. K-12 substr. MG1655
64% identity, 16% coverage

function: Catalyzes the dehydration of glucarate or L-idarate to 5-keto-4-deoxy-D-glucarate (5-kdGluc) (PubMed:9772162). Also catalyzes the epimerization of D-glucarate and L-idarate (PubMed:11513584).
catalytic activity: D-glucarate = 5-dehydro-4-deoxy-D-glucarate + H2O (RHEA:14573)
cofactor: Mg(2+)
subunit: Homodimer.
A common regulator for the operons encoding the enzymes involved in D-galactarate, D-glucarate, and D-glycerate utilization in Escherichia coli
Monterrubio, Journal of bacteriology 2000
- “...protein (b2788), and the functional D-glucarate dehydratase (b2787). The other two units are located at min 70 and are divergently transcribed,...”
Evolution of enzymatic activities in the enolase superfamily: characterization of the (D)-glucarate/galactarate catabolic pathway in Escherichia coli.
Hubbard, Biochemistry 1998 (PubMed)
- GeneRIF: N-terminus verified by Edman degradation on mature peptide

1ec9D / P0AES2 E. Coli glucarate dehydratase bound to xylarohydroxamate (see paper)
65% identity, 16% coverage

Ligands: magnesium ion; xylarohydroxamate (1ec9D)

RSc1079 PROBABLE GLUCARATE DEHYDRATASE PROTEIN from Ralstonia solanacearum GMI1000
60% identity, 15% coverage

Changes in DNA methylation contribute to rapid adaptation in bacterial plant pathogen evolution
Gopalan-Nair, PLoS biology 2024
- “...6mA 6A 6A 6A 6A 6A 6A 6A 6mA 6A 6A 6mA 6A RSc1078 / RSc1079 GTAAAC upstream /gudD1 Transcription regulator / D-Glucarate dehydratase 1134729 6mA 6mA 6mA 6mA 6mA 6mA 6mA 6mA 6mA 6mA 6mA 6mA 6mA 6mA 6mA 6A 6mA 6mA 6mA 6mA 6mA...”
- “...2.62 1.94 1.80 3.29 2.96 -0.14 -0.66 0.40 0.05 0.09 0.13 0.28 -0.11 0.26 0.81 RSc1079 0.05 -0.21 -1.31 0.15 -0.98 0.22 -0.79 0.23 -0.41 0.38 -0.21 -0.39 0.40 -1.07 -1.29 -0.13 0.30 -0.01 -0.91 0.62 0.30 -0.09 -0.52 -0.13 -0.21 0.37 -0.08 0.06 0.21 -0.17...”

BCAL1043 glucarate dehydratase from Burkholderia cenocepacia J2315
K562_RS13470 glucarate dehydratase from Burkholderia cenocepacia
60% identity, 15% coverage

Elucidation of the mechanism behind the potentiating activity of baicalin against Burkholderia cenocepacia biofilms
Slachmuylders, PloS one 2018
- “...chain - 1.7 BCAL2622 ( ppa ) Polyphosphate kinase - -1.5 Glucarate/galactarate metabolism to 2-oxo-glutarate BCAL1043 ( gudD ) Glucarate dehydratase 2.6 1.5 BCAM2511 ( garD ) Putative galactarate dehydratase 2.3 1.6 BCAM2512 5-dehydro-4-deoxyglucarate dehydratase 2.2 2.9 BCAM2514* Putative fatty aldehyde dehydrogenase 2.0 1.6 Quorum sensing...”
Comparative transcriptomic analysis of the Burkholderia cepacia tyrosine kinase bceF mutant reveals a role in tolerance to stress, biofilm formation, and virulence
Ferreira, Applied and environmental microbiology 2013
- “...ion transport and metabolism BCAL0040 BCAL0475 BCAL0665 BCAL1043 BCAL1047 BCAL1728 BCAL2112 BCAL2458 BCAL2782 BCAL3049 BCAL3094 BCAM2626 1.4 1.2 1.5 1.3 1.2...”
The mechanism of action of auranofin analogs in B. cenocepacia revealed by chemogenomic profiling
Maydaniuk, Microbiology spectrum 2024
- “...family protein), K562_RS12100 (acyl-CoA dehydrogenase), K562_RS01045 (Raf kinase inhibitor-like protein), K562_RS06455 (putative PHA depolymerase protein), K562_RS13470 ( gudD , glucarate dehydratase), K562_RS16220 (DUF3025 domain-containing protein), K562_RS18550 (hypothetical protein), and K562_RS28510 (hypothetical protein). Fitness values for each strain were calculated as the log 2 (reads in experimental...”

gudD / Q6FFQ2 D-glucarate dehydratase subunit (EC 4.2.1.40) from Acinetobacter baylyi (strain ATCC 33305 / BD413 / ADP1) (see paper)
GUDD_ACIAD / Q6FFQ2 Glucarate dehydratase; GDH; GlucD; EC 4.2.1.40 from Acinetobacter baylyi (strain ATCC 33305 / BD413 / ADP1) (see paper)
Q6FFQ2 glucarate dehydratase (EC 4.2.1.40) from Acinetobacter baylyi (see paper)
ACIAD0128 D-glucarate dehydratase from Acinetobacter sp. ADP1
58% identity, 16% coverage

function: Catalyzes the dehydration of glucarate to 5-keto-4-deoxy-D-glucarate (5-kdGluc).
catalytic activity: D-glucarate = 5-dehydro-4-deoxy-D-glucarate + H2O (RHEA:14573)
cofactor: Mg(2+)
L-Hydroxyproline and d-Proline Catabolism in Sinorhizobium meliloti
Chen, Journal of bacteriology 2016
- “...using the A. baylyi enzymes D-glucarate dehydratase (ACIAD0128) and D-5-keto-4-deoxyglucarate dehydratase (ACIAD0130), which were overexpressed from E. coli as...”
- “...Alain Perret for clones carrying the A. baylyi proteins ACIAD0128 and ACIAD0130 used for synthesis of -KGSA, and Seiya Watanabe for clones carrying the P....”
New insights into the alternative D-glucarate degradation pathway
Aghaie, The Journal of biological chemistry 2008
- “...on D-glucarate Gene ID ACIAD0127 ACIAD0128 ACIAD0130 ACIAD0131 ACIAD0244 ACIAD2275 ACIAD2417 ACIAD2876 Function D-Glucarate/D-Galactarate D-Glucarate...”
- “...are co-located on the genome (ACIAD0127, ACIAD0128, ACIAD0130, and ACIAD0131) and are respectively annotated as D-glucarate/D-galactarate permease, D-glucarate...”

3p0wB / B2UIZ1 Crystal structure of d-glucarate dehydratase from ralstonia solanacearum complexed with mg and d-glucarate
63% identity, 16% coverage

Ligands: magnesium ion; d-glucarate (3p0wB)

gudD / P42206 D-glucarate dehydratase subunit (EC 4.2.1.40) from Pseudomonas putida (see paper)
GUDD_PSEPU / P42206 Glucarate dehydratase; GDH; GlucD; EC 4.2.1.40 from Pseudomonas putida (Arthrobacter siderocapsulatus) (see paper)
68% identity, 14% coverage

function: Catalyzes the dehydration of glucarate to 5-keto-4-deoxy-D-glucarate (5-kdGluc)
catalytic activity: D-glucarate = 5-dehydro-4-deoxy-D-glucarate + H2O (RHEA:14573)
cofactor: Mg(2+)
subunit: Homotetramer.

3nxlC / Q39KL8 Crystal structure of glucarate dehydratase from burkholderia cepacia complexed with magnesium
59% identity, 16% coverage

Ligands: magnesium ion; carbonate ion (3nxlC)

3nfuA / Q1QUN0 Crystal structure of probable glucarate dehydratase from chromohalobacter salexigens dsm 3043 complexed with magnesium
45% identity, 15% coverage

Ligand: magnesium ion (3nfuA)

New Search

For advice on how to use these tools together, see Interactive tools for functional annotation of bacterial genomes.

Statistics

The PaperBLAST database links 798,070 different protein sequences to 1,261,478 scientific articles. Searches against EuropePMC were last performed on May 12, 2025.

How It Works

PaperBLAST builds a database of protein sequences that are linked to scientific articles. These links come from automated text searches against the articles in EuropePMC and from manually curated information from GeneRIF, UniProtKB/Swiss-Prot, BRENDA, CAZy (as made available by dbCAN), BioLiP, CharProtDB, MetaCyc, EcoCyc, TCDB, REBASE, the Fitness Browser, and a subset of the European Nucleotide Archive with the /experiment tag. Given this database and a protein sequence query, PaperBLAST uses protein-protein BLAST to find similar sequences with E < 0.001.
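To make the last step concrete, here is a minimal Python sketch of filtering BLAST hits at the E < 0.001 cutoff and computing identity and coverage figures like those in the hit list above. It is not PaperBLAST's own code. It assumes BLAST+ tabular output requested with `-outfmt "6 qseqid sseqid pident qstart qend sstart send evalue bitscore qlen slen"`, and it assumes that coverage means the percentage of the hit's length that is aligned (for example, a 73-residue alignment against a hypothetical 446-residue hit covers about 16% of the hit). The names `parse_hits` and `Hit` and the example rows are hypothetical.

```python
from dataclasses import dataclass

# Assumed column order, matching a BLAST+ run with
# -outfmt "6 qseqid sseqid pident qstart qend sstart send evalue bitscore qlen slen"
COLUMNS = ["qseqid", "sseqid", "pident", "qstart", "qend",
           "sstart", "send", "evalue", "bitscore", "qlen", "slen"]

E_VALUE_CUTOFF = 1e-3  # keep hits with E < 0.001


@dataclass
class Hit:
    subject: str
    identity: float  # percent identity reported by BLAST
    coverage: float  # percent of the hit's length that is aligned (assumed definition)
    evalue: float


def parse_hits(lines, cutoff=E_VALUE_CUTOFF):
    """Parse tabular BLAST lines, keep hits with E < cutoff, best first."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        row = dict(zip(COLUMNS, line.rstrip("\n").split("\t")))
        evalue = float(row["evalue"])
        if evalue >= cutoff:
            continue  # the cutoff is strict: E must be below 0.001
        # abs() guards against reversed subject coordinates
        aligned = abs(int(row["send"]) - int(row["sstart"])) + 1
        coverage = 100.0 * aligned / int(row["slen"])
        hits.append(Hit(row["sseqid"], float(row["pident"]), coverage, evalue))
    # Lowest E-value (most significant) hits first
    return sorted(hits, key=lambda h: h.evalue)


# Hypothetical rows: hitA passes the cutoff, hitB (E = 0.5) does not
example = [
    "query\thitA\t97.3\t1\t73\t10\t82\t1e-40\t150\t73\t446",
    "query\thitB\t30.0\t5\t40\t100\t135\t0.5\t25\t73\t300",
]
for h in parse_hits(example):
    print(f"{h.subject}: {h.identity:.0f}% identity, {h.coverage:.0f}% coverage")
# prints: hitA: 97% identity, 16% coverage
```

Requesting `slen` in the output format avoids a second lookup of each hit's length; if coverage were instead defined over the query, `qlen` would replace `slen` in the division.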

To build the database, we query EuropePMC with locus tags, with RefSeq protein identifiers, and with UniProt accessions. We obtain the locus tags from RefSeq or from MicrobesOnline. We use queries of the form "locus_tag AND genus_name" to try to ensure that the paper is actually discussing that gene. Because EuropePMC indexes most recent biomedical papers, even if they are not open access, some of the links may be to papers that you cannot read or that our computers cannot read. We query each of these identifiers that appears in the open-access part of EuropePMC, as well as every locus tag that appears in the 500 most-referenced genomes, so that a gene may appear in the PaperBLAST results even though none of the papers that mention it are open access. We also incorporate text-mined links from EuropePMC that link open-access articles to UniProt or RefSeq identifiers. (This yields some additional links because EuropePMC's text-mining heuristics differ from ours.)
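The "locus_tag AND genus_name" query form can be sketched in Python as follows. This is an illustration, not PaperBLAST's implementation: the helper names (`genus_of`, `locus_tag_query`, `search_url`) and the page size of 100 are hypothetical. It targets Europe PMC's public REST search endpoint, which accepts `query`, `format`, and `pageSize` parameters; the sketch only builds the URL and does not fetch it.

```python
from urllib.parse import urlencode

# Europe PMC's public REST search endpoint
SEARCH_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def genus_of(organism: str) -> str:
    """Take the first word of a binomial name, e.g. 'Salmonella enterica' -> 'Salmonella'."""
    return organism.split()[0]


def locus_tag_query(locus_tag: str, organism: str) -> str:
    """Build a query of the form "locus_tag AND genus_name"."""
    return f"{locus_tag} AND {genus_of(organism)}"


def search_url(query: str, page_size: int = 100) -> str:
    """Return a Europe PMC search URL that requests JSON results."""
    params = {"query": query, "format": "json", "pageSize": page_size}
    return f"{SEARCH_URL}?{urlencode(params)}"


url = search_url(locus_tag_query("SEN2806", "Salmonella enterica"))
print(url)
# prints: https://www.ebi.ac.uk/europepmc/webservices/rest/search?query=SEN2806+AND+Salmonella&format=json&pageSize=100
```

Adding the genus narrows the search so that a short tag such as SEN2806 is less likely to match an unrelated paper that happens to use the same string (compare the Z4104 hit above, which matched a Neisseria isolate list). Fetching the URL, for example with `urllib.request.urlopen`, would return the search results as JSON.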

For every article that mentions a locus tag, a RefSeq protein identifier, or a UniProt accession, we try to select one or two snippets of text that refer to the protein. If we cannot get access to the full text, we try to select a snippet from the abstract, but unfortunately, unique identifiers such as locus tags are rarely provided in abstracts.
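A minimal sketch of snippet selection, assuming a fixed window of characters on each side of a whole-word match of the identifier. PaperBLAST's actual heuristics are not described here, so the window size, the rule of skipping overlapping windows, and the name `select_snippets` are all hypothetical. The `...` markers mimic the snippets shown in the hit list.

```python
import re


def select_snippets(text, identifier, max_snippets=2, width=80):
    """Return up to max_snippets text windows around whole-word matches of identifier."""
    # Lookarounds stop "SEN2806" from matching inside "SEN28060" or "XSEN2806"
    pattern = re.compile(
        r"(?<![A-Za-z0-9_])" + re.escape(identifier) + r"(?![A-Za-z0-9_])"
    )
    snippets = []
    last_end = -1
    for match in pattern.finditer(text):
        start = max(0, match.start() - width)
        if start < last_end:
            continue  # this window would overlap the previous snippet
        end = min(len(text), match.end() + width)
        snippets.append("..." + text[start:end].strip() + "...")
        last_end = end
        if len(snippets) == max_snippets:
            break  # "one or two snippets" per article
    return snippets


print(select_snippets("aaa SEN2806 bbb", "SEN2806", width=4))
# prints: ['...aaa SEN2806 bbb...']
```

Because the window is measured in characters rather than sentences, a match inside a table yields a run of neighboring cells, which is consistent with the list-like snippets seen for table entries above.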

PaperBLAST also incorporates manually curated protein functions:

Proteins from NCBI's RefSeq are included if a GeneRIF entry links the gene to an article in PubMed®. GeneRIF also provides a short summary of the article's claim about the protein, which is shown instead of a snippet.
Proteins from Swiss-Prot (the curated part of UniProt) are included if the curators identified experimental evidence for the protein's function (evidence code ECO:0000269). For these proteins, the fields of the Swiss-Prot entry that describe the protein's function are shown (with bold headings).
Proteins from BRENDA, a curated database of enzymes, are included if they are linked to a paper in PubMed and their full sequence is known.
Every protein from the non-redundant subset of BioLiP, a database of ligand-binding sites and catalytic residues in protein structures, is included. Since BioLiP itself does not include descriptions of the proteins, those are taken from the Protein Data Bank. Descriptions from PDB rely on the original submitter of the structure and cannot be updated by others, so they may be less reliable. (For SitesBLAST and Sites on a Tree, we use a larger subset of BioLiP so that every ligand is represented among a group of structures with similar sequences, but for PaperBLAST, we use the non-redundant set provided by BioLiP.)
Every protein from EcoCyc, a curated database of the proteins in Escherichia coli K-12, is included, whether or not it has been characterized.
Proteins from the MetaCyc metabolic pathway database are included if they are linked to a paper in PubMed and their full sequence is known.
Proteins from the Transport Classification Database (TCDB) are included if they have known substrate(s), have reference(s), and are not described as uncharacterized or putative. (Some of the references are not visible on the PaperBLAST web site.)
Every protein from CharProtDB, a database of experimentally characterized protein annotations, is included.
Proteins from the CAZy database of carbohydrate-active enzymes are included if they are associated with an Enzyme Classification number. Even though CAZy does not provide links from individual protein sequences to papers, these should all be experimentally characterized proteins.
Proteins from the REBASE database of restriction enzymes are included if they have known specificity.
Every protein with an evidence-based reannotation (based on mutant phenotypes) in the Fitness Browser is included.
Sequence-specific transcription factors (including sigma factors and DNA-binding response regulators) from the PRODORIC database of gene regulation in prokaryotes are included if they have experimentally determined DNA binding sites.
Putative transcription factors from RegPrecise are included if they have manually curated predictions for their binding sites. These predictions are based on conserved putative regulatory sites across genomes that contain similar transcription factors, so PaperBLAST clusters the TFs at 70% identity and retains just one member of each cluster.
Coding sequence (CDS) features from the European Nucleotide Archive (ENA) are included if the /experiment tag is set (implying that there is experimental evidence for the annotation), the nucleotide entry links to paper(s) in PubMed, and the nucleotide entry is from the STD data class (implying that these are targeted annotated sequences, not from shotgun sequencing). Also, to filter out genes whose transcription or translation was detected, but whose function was not studied, nucleotide entries or papers with more than 25 such proteins are excluded. Descriptions from ENA rely on the original submitter of the sequence and cannot be updated by others, so they may be less reliable.
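The ENA inclusion rules in the last item combine per-feature tests with a cap of 25 proteins per nucleotide entry or per paper. The Python sketch below shows one way to apply them. The record layout (`CdsFeature`), the function name `filter_ena`, and the choice to drop only the over-cap paper links (keeping a feature if any of its papers survives) are assumptions, not PaperBLAST's documented behavior.

```python
from collections import Counter
from dataclasses import dataclass, field, replace

MAX_PROTEINS = 25  # entries or papers with more than 25 such proteins are excluded


@dataclass
class CdsFeature:
    protein_id: str
    entry: str            # accession of the nucleotide entry
    data_class: str       # ENA data class, e.g. "STD" or "WGS"
    has_experiment: bool  # True if the /experiment qualifier is set
    pubmed_ids: list[str] = field(default_factory=list)


def filter_ena(features, cap=MAX_PROTEINS):
    """Apply the per-feature rules, then the per-entry and per-paper caps."""
    # Per-feature rules: experimental evidence, a PubMed link, and the STD class
    candidates = [f for f in features
                  if f.has_experiment and f.pubmed_ids and f.data_class == "STD"]
    # Count candidate proteins per nucleotide entry and per paper
    per_entry = Counter(f.entry for f in candidates)
    per_paper = Counter(p for f in candidates for p in set(f.pubmed_ids))
    kept = []
    for f in candidates:
        if per_entry[f.entry] > cap:
            continue  # the entry lists too many proteins
        papers = [p for p in f.pubmed_ids if per_paper[p] <= cap]
        if papers:  # assumption: keep the feature if any paper link survives
            kept.append(replace(f, pubmed_ids=papers))
    return kept
```

Counting only features that already pass the per-feature rules means the cap targets large experimental surveys (for example, proteomics papers that detect many proteins) rather than entries that happen to contain many unrelated CDS features.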

Except for GeneRIF and ENA, the curated entries include a short curated description of the protein's function. For entries from BioLiP, the protein's function may not be known beyond binding to the ligand. Many of these entries also link to articles in PubMed.

For more information, see the PaperBLAST paper (mSystems 2017) or the code. PaperBLAST's database is also available for download.

Changes to PaperBLAST since the paper was written:

November 2023: incorporated PRODORIC and RegPrecise. Many PRODORIC entries were not linked to a protein sequence (no UniProt identifier), so we added this information.
February 2023: BioLiP changed their download format. PaperBLAST now includes their non-redundant subset. SitesBLAST and Sites on a Tree use a larger non-redundant subset that ensures that every ligand is represented within each cluster. This should ensure that every binding site is represented.
June 2022: incorporated some coding sequences from ENA with the /experiment tag.
March 2022: incorporated BioLiP.
April 2020: incorporated TCDB.
April 2019: EuropePMC now returns table entries in their search results. This has expanded PaperBLAST's database, but most of the new entries are of low relevance, and the resulting snippets are often just lists of locus tags with annotations.
February 2018: the alignment page reports the conservation of the hit's functional sites (if available from Swiss-Prot or UniProt).
January 2018: incorporated BRENDA.
December 2017: incorporated MetaCyc, CharProtDB, CAZy, REBASE, and the reannotations from the Fitness Browser.
September 2017: EuropePMC no longer returns some table entries in their search results. This has shrunk PaperBLAST's database, but has also reduced the number of low-relevance hits.

Many of these changes are described in Interactive tools for functional annotation of bacterial genomes.

Secrets

PaperBLAST cannot provide snippets for many of the papers that are published in non-open-access journals. This limitation applies even if the paper is marked as "free" on the publisher's web site and is available in PubMed Central or EuropePMC. If a journal that you publish in is marked as "secret," please consider publishing elsewhere.

Omissions from the PaperBLAST Database

Many important articles are missing from PaperBLAST, either because the article's full text is not in EuropePMC (as for many older articles), or because the paper does not mention a protein identifier such as a locus tag, or because of PaperBLAST's heuristics. If you notice an article that characterizes a protein's function but is missing from PaperBLAST, please notify the curators at UniProt or add an entry to GeneRIF. Entries in either of these databases will eventually be incorporated into PaperBLAST. Note that to add an entry to UniProt, you will need to find the UniProt identifier for the protein. If the protein is not already in UniProt, you can ask them to create an entry. To add an entry to GeneRIF, you will need an NCBI Gene identifier, but unfortunately many prokaryotic proteins in RefSeq do not have corresponding Gene identifiers.

References

PaperBLAST: Text-mining papers for information about homologs.
M. N. Price and A. P. Arkin (2017). mSystems, 10.1128/mSystems.00039-17.

Europe PMC in 2017.
M. Levchenko et al (2017). Nucleic Acids Research, 10.1093/nar/gkx1005.

Gene indexing: characterization and analysis of NLM's GeneRIFs.
J. A. Mitchell et al (2003). AMIA Annu Symp Proc 2003:460-464.

UniProt: the universal protein knowledgebase.
The UniProt Consortium (2016). Nucleic Acids Research, 10.1093/nar/gkw1099.

BRENDA in 2017: new perspectives and new tools in BRENDA.
S. Placzek et al (2017). Nucleic Acids Research, 10.1093/nar/gkw952.

The EcoCyc database: reflecting new knowledge about Escherichia coli K-12.
I. M. Keseler et al (2016). Nucleic Acids Research, 10.1093/nar/gkw1003.

The MetaCyc database of metabolic pathways and enzymes.
R. Caspi et al (2018). Nucleic Acids Research, 10.1093/nar/gkx935.

CharProtDB: a database of experimentally characterized protein annotations.
R. Madupu et al (2012). Nucleic Acids Research, 10.1093/nar/gkr1133.

The carbohydrate-active enzymes database (CAZy) in 2013.
V. Lombard et al (2014). Nucleic Acids Research, 10.1093/nar/gkt1178.

The Transporter Classification Database (TCDB): recent advances.
M. H. Saier, Jr. et al (2016). Nucleic Acids Research, 10.1093/nar/gkv1103.

REBASE - a database for DNA restriction and modification: enzymes, genes and genomes.
R. J. Roberts et al (2015). Nucleic Acids Research, 10.1093/nar/gku1046.

Deep annotation of protein function across diverse bacteria from mutant phenotypes.
M. N. Price et al (2016). bioRxiv, 10.1101/072470.

by Morgan Price, Arkin group
Lawrence Berkeley National Laboratory