PaperBLAST – Find papers about a protein or its homologs

 

PaperBLAST

PaperBLAST Hits for Q02287 T-protein (Enterobacter agglomerans) (373 a.a., MVAELTALRD...)

Other sequence analysis tools:

Find functional residues: SitesBLAST

Search for conserved domains

Find the best match in UniProt

Compare to protein structures

Predict transmembrane helices: Phobius

Predict protein localization: PSORTb

Find homologs in fast.genomics

Found 69 similar proteins in the literature:

Q02287 T-protein from Enterobacter agglomerans
100% identity, 100% coverage

UTI89_C2933 bifunctional chorismate mutase/prephenate dehydratase from Escherichia coli UTI89
88% identity, 100% coverage

A0A140N544 T-protein from Escherichia coli (strain B / BL21-DE3)
88% identity, 100% coverage

ECs3463 chorismate mutase-T / prephenate dehydrogenase from Escherichia coli O157:H7 str. Sakai
88% identity, 100% coverage

TyrA / b2600 fused chorismate mutase/prephenate dehydrogenase (EC 5.4.99.5; EC 1.3.1.12) from Escherichia coli K-12 substr. MG1655 (see 4 papers)
tyrA / P07023 fused chorismate mutase/prephenate dehydrogenase (EC 5.4.99.5; EC 1.3.1.12) from Escherichia coli (strain K12) (see 12 papers)
P07023 T-protein from Escherichia coli (strain K12)
b2600 fused chorismate mutase T/prephenate dehydrogenase from Escherichia coli str. K-12 substr. MG1655
88% identity, 100% coverage

YP_0399 T-protein [includes: chorismate mutase and prephenate dehydrogenase] from Yersinia pestis biovar Medievalis str. 91001
86% identity, 100% coverage

ETAE_2836 bifunctional chorismate mutase/prephenate dehydrogenase from Edwardsiella tarda EIB202
78% identity, 100% coverage

VP0547 chorismate mutase/prephenate dehydrogenase from Vibrio parahaemolyticus RIMD 2210633
64% identity, 99% coverage

P43902 prephenate dehydrogenase (EC 1.3.1.12) from Haemophilus influenzae (see paper)
HI1290 chorismate mutase / prephenate dehydrogenase (tyrA) from Haemophilus influenzae Rd KW20
59% identity, 97% coverage

SO1362 chorismate mutase/prephenate dehydrogenase from Shewanella oneidensis MR-1
58% identity, 98% coverage

2pv7B / P43902 Crystal structure of chorismate mutase / prephenate dehydrogenase (tyra) (1574749) from haemophilus influenzae rd at 2.00 a resolution (see paper)
58% identity, 75% coverage

FTN_0055 prephenate dehydrogenase from Francisella tularensis subsp. novicida U112
39% identity, 72% coverage

Npun_R1269 prephenate dehydrogenase from Nostoc punctiforme
36% identity, 95% coverage

COO91_00780 bifunctional chorismate mutase/prephenate dehydrogenase from Nostoc flagelliforme CCNUN1
34% identity, 95% coverage

all0418 chorismate mutase/prephenate dehydrogenase from Nostoc sp. PCC 7120
43% identity, 66% coverage

FTL_0048 prephenate dehydrogenase. from Francisella tularensis subsp. holarctica
42% identity, 57% coverage

J9XQS6 prephenate dehydrogenase (EC 1.3.1.12) from uncultured bacterium (see paper)
41% identity, 67% coverage

D3S601 Prephenate dehydrogenase from Methanocaldococcus sp. (strain FS406-22)
30% identity, 57% coverage

MMP1514 Prephenate dehydrogenase from Methanococcus maripaludis S2
30% identity, 59% coverage

MM1275 Prephenate dehydrogenase from Methanosarcina mazei Goe1
30% identity, 54% coverage

A0A101IGG2 prephenate dehydrogenase (NADP+) (EC 1.3.1.13) from Methanothrix harundinacea (see paper)
30% identity, 67% coverage

DVU0464 prephenate and/or arogenate dehydrogenase from Desulfovibrio vulgaris Hildenborough JW710
DVU0464 prephenate dehydrogenase from Desulfovibrio vulgaris Hildenborough
29% identity, 69% coverage

CPI83_19940 prephenate dehydrogenase dimerization domain-containing protein from Rhodococcus sp. H-CA8f
29% identity, 71% coverage

A8AAX2 prephenate dehydrogenase (NADP+) (EC 1.3.1.13) from Ignicoccus hospitalis (see 2 papers)
27% identity, 68% coverage

O30012 prephenate dehydrogenase (EC 1.3.1.12); prephenate dehydratase (EC 4.2.1.51); chorismate mutase (EC 5.4.99.5) from Archaeoglobus fulgidus (see paper)
AF0227 chorismate mutase/prephenate dehydratase (pheA) from Archaeoglobus fulgidus DSM 4304
27% identity, 41% coverage

Ddes_0334 Prephenate dehydrogenase from Desulfovibrio desulfuricans subsp. desulfuricans str. ATCC 27774
29% identity, 61% coverage

PFLU_1770 prephenate dehydrogenase dimerization domain-containing protein from Pseudomonas [fluorescens] SBW25
27% identity, 58% coverage

plu3562 No description from Photorhabdus luminescens subsp. laumondii TTO1
28% identity, 68% coverage

Dde_3485 Prephenate dehydrogenase from Desulfovibrio desulfuricans G20
Dde_3485 prephenate dehydrogenase/arogenate dehydrogenase family protein from Oleidesulfovibrio alaskensis G20
29% identity, 61% coverage

PAPC_STRPR / P72540 4-amino-4-deoxyprephenate dehydrogenase; EC 1.3.1.121 from Streptomyces pristinaespiralis (see paper)
32% identity, 50% coverage

cmlC / F2RB78 4-amino-4-deoxyprephenate dehydrogenase (EC 1.3.1.121) from Streptomyces venezuelae (strain ATCC 10712 / CBS 650.69 / DSM 40230 / JCM 4526 / NBRC 13096 / PD 04745) (see 2 papers)
CMLC_STRVP / F2RB78 4-amino-4-deoxyprephenate dehydrogenase; EC 1.3.1.121 from Streptomyces venezuelae (strain ATCC 10712 / CBS 650.69 / DSM 40230 / JCM 4526 / NBRC 13096 / PD 04745) (see paper)
32% identity, 48% coverage

papC / BAD21141.1 4-amino-4-deoxyprephenate dehydrogenase from Streptomyces venezuelae (see paper)
32% identity, 48% coverage

H16_A0792 prephenate dehydratase, Chorismate mutase from Ralstonia eutropha H16
H16_A0792 prephenate dehydratase from Cupriavidus necator H16
57% identity, 12% coverage

TyrAAT1 / Q944B6 arogenate dehydrogenase (EC 1.3.1.78) from Arabidopsis thaliana (see paper)
TYRA1_ARATH / Q944B6 Arogenate dehydrogenase 1, chloroplastic; TYRATC; TyrAAT1; EC 1.3.1.78 from Arabidopsis thaliana (Mouse-ear cress) (see 3 papers)
Q944B6 arogenate dehydrogenase (NADP+) (EC 1.3.1.78) from Arabidopsis thaliana (see 2 papers)
AT5G34930 arogenate dehydrogenase from Arabidopsis thaliana
27% identity, 24% coverage

NP_001331736 arogenate dehydrogenase from Arabidopsis thaliana
27% identity, 23% coverage

Ga0059261_2298 prephenate and/or arogenate dehydrogenase (EC 1.3.1.13) from Sphingomonas koreensis DSMZ 15582
32% identity, 41% coverage

E1R5M5 arogenate dehydrogenase [NAD(P)+] (EC 1.3.1.79) from Sediminispirochaeta smaragdinae (see paper)
28% identity, 42% coverage

SCO2019 chorismate mutase from Streptomyces coelicolor A3(2)
38% identity, 21% coverage

ACIAD2222 bifunctional protein [Includes: putative prephenate or cyclohexadienyl dehydrogenase; 3-phosphoshikimate 1-carboxyvinyltransferase (5-enolpyruvylshikimate-3-phosphate synthase) (EPSP synthase) (EPSPS) (AroA)] from Acinetobacter sp. ADP1
26% identity, 27% coverage

Q74NC4 prephenate dehydratase (EC 4.2.1.51) from Nanoarchaeum equitans (see paper)
NEQ192 NEQ192 from Nanoarchaeum equitans Kin4-M
23% identity, 39% coverage

LOC100284089 arogenate dehydrogenase from Zea mays
29% identity, 39% coverage

M271_36305 chorismate mutase from Streptomyces rapamycinicus NRRL 5491
37% identity, 21% coverage

ELZ14_13330 isochorismate lyase from Pseudomonas brassicacearum
40% identity, 20% coverage

SXYL_01513 prephenate dehydrogenase from Staphylococcus xylosus
22% identity, 56% coverage

PFLU_1772 chorismate mutase from Pseudomonas [fluorescens] SBW25
27% identity, 23% coverage

Afu2g10450 prephenate dehydrogenase from Aspergillus fumigatus Af293
21% identity, 53% coverage

BT3933 prephenate dehydrogenase (EC 1.3.1.13) from Bacteroides thetaiotaomicron VPI-5482
24% identity, 44% coverage

O67085 Bifunctional chorismate mutase/prephenate dehydratase from Aquifex aeolicus (strain VF5)
47% identity, 14% coverage

Echvi_0125 prephenate dehydrogenase from Echinicola vietnamensis DSM 17526
23% identity, 48% coverage

CE140_03015 isochorismate lyase from Pseudomonas thivervalensis
42% identity, 17% coverage

D820_RS07095, SMU_531 chorismate mutase from Streptococcus mutans ATCC 25175
37% identity, 21% coverage

TK0259 prephenate dehydrogenase from Thermococcus kodakaraensis KOD1
24% identity, 45% coverage

O25931 Prephenate dehydrogenase (TyrA) from Helicobacter pylori (strain ATCC 700392 / 26695)
HP1380 prephenate dehydrogenase (tyrA) from Helicobacter pylori 26695
20% identity, 56% coverage

Q5Z9H5 Os06g0708832 protein from Oryza sativa subsp. japonica
23% identity, 58% coverage

Q0PBJ3 Bifunctional chorismate mutase/prephenate dehydratase from Campylobacter jejuni subsp. jejuni serotype O:2 (strain ATCC 700819 / NCTC 11168)
34% identity, 25% coverage

AroDH-1 / B4FY98 arogenate dehydrogenase 1 (EC 1.3.1.43) from Zea mays (see paper)
25% identity, 58% coverage

DET0461 chorismate mutase/prephenate dehydratase from Dehalococcoides ethenogenes 195
36% identity, 20% coverage

Ssal_00456 chorismate mutase from Streptococcus salivarius 57.I
38% identity, 20% coverage

CHMU_METJA / Q57696 Chorismate mutase; CM; Monofunctional chorismate mutase AroQ(f); EC 5.4.99.5 from Methanocaldococcus jannaschii (strain ATCC 43067 / DSM 2661 / JAL-1 / JCM 10045 / NBRC 100440) (Methanococcus jannaschii) (see paper)
39% identity, 16% coverage

SMc03858 PUTATIVE CHORISMATE MUTASE PROTEIN from Sinorhizobium meliloti 1021
33% identity, 21% coverage

Ddes_1346 chorismate mutase related enzyme from Desulfovibrio desulfuricans subsp. desulfuricans str. ATCC 27774
36% identity, 20% coverage

B488_11240 prephenate/arogenate dehydrogenase family protein from Liberibacter crescens BT-1
23% identity, 59% coverage

Ddes_0336 chorismate mutase from Desulfovibrio desulfuricans subsp. desulfuricans str. ATCC 27774
40% identity, 19% coverage

Fisuc_2558 Chorismate mutase from Fibrobacter succinogenes subsp. succinogenes S85
32% identity, 22% coverage

GSU2608 chorismate mutase/prephenate dehydratase from Geobacter sulfurreducens PCA
39% identity, 22% coverage

ZMO0563 chorismate mutase from Zymomonas mobilis subsp. mobilis ZM4
37% identity, 21% coverage

plu3564 No description from Photorhabdus luminescens subsp. laumondii TTO1
26% identity, 23% coverage

PGN_1053 putative phospho-2-dehydro-3-deoxyheptonate aldolase/chorismate mutase from Porphyromonas gingivalis ATCC 33277
PG0885 phospho-2-dehydro-3-deoxyheptonate aldolase/chorismate mutase from Porphyromonas gingivalis W83
30% identity, 21% coverage

str1594 hypothetical protein from Streptococcus thermophilus CNRZ1066
36% identity, 20% coverage

For advice on how to use these tools together, see Interactive tools for functional annotation of bacterial genomes.

Statistics

The PaperBLAST database links 793,807 different protein sequences to 1,259,118 scientific articles. Searches against EuropePMC were last performed on March 13, 2025.

How It Works

PaperBLAST builds a database of protein sequences that are linked to scientific articles. These links come from automated text searches against the articles in EuropePMC and from manually-curated information from GeneRIF, UniProtKB/Swiss-Prot, BRENDA, CAZy (as made available by dbCAN), BioLiP, CharProtDB, MetaCyc, EcoCyc, TCDB, REBASE, the Fitness Browser, and a subset of the European Nucleotide Archive with the /experiment tag. Given this database and a protein sequence query, PaperBLAST uses protein-protein BLAST to find similar sequences with E < 0.001.
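The final filtering step can be sketched as follows. This minimal sketch assumes the BLAST search writes standard tabular output (`-outfmt 6`, whose columns are query id, subject id, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, E-value, and bit score). The names `Hit` and `filter_hits`, and the definition of coverage as the aligned query span divided by the query length, are illustrative assumptions rather than PaperBLAST's own code.

```python
from dataclasses import dataclass

E_VALUE_CUTOFF = 1e-3  # report only hits with E < 0.001


@dataclass
class Hit:
    subject: str
    identity: float  # percent identity, as reported by BLAST
    coverage: float  # percent of the query spanned by the alignment (assumed definition)
    evalue: float


def filter_hits(tabular_lines, query_length, cutoff=E_VALUE_CUTOFF):
    """Parse BLAST tabular (-outfmt 6) lines and keep hits with E < cutoff."""
    hits = []
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        qstart, qend = int(fields[6]), int(fields[7])
        evalue = float(fields[10])
        if evalue >= cutoff:
            continue  # not significant enough to report
        coverage = 100.0 * (qend - qstart + 1) / query_length
        hits.append(Hit(fields[1], float(fields[2]), coverage, evalue))
    hits.sort(key=lambda h: h.evalue)  # most significant hits first
    return hits
```

Sorting by E-value is one reasonable ranking choice; the hit list above may be ordered by other criteria.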

To build the database, we query EuropePMC with locus tags, with RefSeq protein identifiers, and with UniProt accessions. We obtain the locus tags from RefSeq or from MicrobesOnline. We use queries of the form "locus_tag AND genus_name" to try to ensure that the paper is actually discussing that gene. Because EuropePMC indexes most recent biomedical papers, even if they are not open access, some of the links may be to papers that you cannot read or that our computers cannot read. We query each of these identifiers that appears in the open access part of EuropePMC, as well as every locus tag that appears in the 500 most-referenced genomes, so that a gene may appear in the PaperBLAST results even though none of the papers that mention it are open access. We also incorporate text-mined links from EuropePMC that link open access articles to UniProt or RefSeq identifiers. (This yields some additional links because EuropePMC's text mining uses different heuristics than ours.)
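The query construction described above can be sketched as follows, assuming Europe PMC's REST search endpoint (`https://www.ebi.ac.uk/europepmc/webservices/rest/search`) with its `query`, `format`, and `pageSize` parameters. The helper names `locus_tag_query` and `search_url` are hypothetical, and the sketch only builds URLs; it does not reproduce PaperBLAST's actual crawler, paging, or rate limiting.

```python
from urllib.parse import urlencode

# Europe PMC REST search endpoint (PaperBLAST's own crawler may use other options)
EUROPEPMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def locus_tag_query(locus_tag, organism):
    """Pair a locus tag with its genus: "locus_tag AND genus_name"."""
    genus = organism.split()[0]  # "Escherichia coli K-12" -> "Escherichia"
    return f"{locus_tag} AND {genus}"


def search_url(query, page_size=100):
    """Build a Europe PMC search URL that requests JSON results."""
    params = {"query": query, "format": "json", "pageSize": page_size}
    return f"{EUROPEPMC_SEARCH}?{urlencode(params)}"
```

For example, `search_url(locus_tag_query("b2600", "Escherichia coli K-12"))` encodes the query `b2600 AND Escherichia`. This sketch assumes that RefSeq protein identifiers and UniProt accessions, being less ambiguous than locus tags, could be passed to `search_url` without a genus term.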

For every article that mentions a locus tag, a RefSeq protein identifier, or a UniProt accession, we try to select one or two snippets of text that refer to the protein. If we cannot get access to the full text, we try to select a snippet from the abstract, but unfortunately, unique identifiers such as locus tags are rarely provided in abstracts.
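One simple way to pick snippets is to split the text into sentences and keep the first one or two that mention the identifier, using the abstract only when the full text is unavailable. The regular-expression sentence splitter and the function names `split_sentences`, `mentions`, and `select_snippets` are assumptions for illustration; PaperBLAST's actual heuristics are described in its paper and code.

```python
import re


def split_sentences(text):
    # Naive splitter: breaks after ".", "!" or "?" followed by whitespace,
    # so abbreviations such as "E. coli" will be split incorrectly.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def mentions(sentence, identifier):
    # Match the identifier as a whole token, so "ABC_123" does not match "ABC_1234".
    pattern = rf"(?<!\w){re.escape(identifier)}(?!\w)"
    return re.search(pattern, sentence) is not None


def select_snippets(identifier, full_text=None, abstract="", max_snippets=2):
    """Return up to max_snippets sentences that mention the identifier."""
    source = full_text if full_text else abstract  # fall back to the abstract
    matches = [s for s in split_sentences(source) if mentions(s, identifier)]
    return matches[:max_snippets]
```

Because locus tags rarely appear in abstracts, the fallback branch will usually return an empty list.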

PaperBLAST also incorporates manually curated protein functions from the curated resources listed above.

Except for GeneRIF and ENA, the curated entries include a short curated description of the protein's function. For entries from BioLiP, the protein's function may not be known beyond binding to the ligand. Many of these entries also link to articles in PubMed.

For more information, see the PaperBLAST paper (mSystems 2017) or the code. PaperBLAST's database is also available for download.

PaperBLAST has changed in several ways since the paper was written. Many of these changes are described in Interactive tools for functional annotation of bacterial genomes.

Secrets

PaperBLAST cannot provide snippets for many of the papers that are published in non-open-access journals. This limitation applies even if the paper is marked as "free" on the publisher's web site and is available in PubMed Central or EuropePMC. If a journal that you publish in is marked as "secret," please consider publishing elsewhere.

Omissions from the PaperBLAST Database

Many important articles are missing from PaperBLAST, either because the article's full text is not in EuropePMC (as for many older articles), or because the paper does not mention a protein identifier such as a locus tag, or because of PaperBLAST's heuristics. If you notice an article that characterizes a protein's function but is missing from PaperBLAST, please notify the curators at UniProt or add an entry to GeneRIF. Entries in either of these databases will eventually be incorporated into PaperBLAST. Note that to add an entry to UniProt, you will need to find the UniProt identifier for the protein. If the protein is not already in UniProt, you can ask them to create an entry. To add an entry to GeneRIF, you will need an NCBI Gene identifier, but unfortunately many prokaryotic proteins in RefSeq do not have corresponding Gene identifiers.

References

PaperBLAST: Text-mining papers for information about homologs.
M. N. Price and A. P. Arkin (2017). mSystems, 10.1128/mSystems.00039-17.

Europe PMC in 2017.
M. Levchenko et al (2017). Nucleic Acids Research, 10.1093/nar/gkx1005.

Gene indexing: characterization and analysis of NLM's GeneRIFs.
J. A. Mitchell et al (2003). AMIA Annu Symp Proc 2003:460-464.

UniProt: the universal protein knowledgebase.
The UniProt Consortium (2016). Nucleic Acids Research, 10.1093/nar/gkw1099.

BRENDA in 2017: new perspectives and new tools in BRENDA.
S. Placzek et al (2017). Nucleic Acids Research, 10.1093/nar/gkw952.

The EcoCyc database: reflecting new knowledge about Escherichia coli K-12.
I. M. Keseler et al (2016). Nucleic Acids Research, 10.1093/nar/gkw1003.

The MetaCyc database of metabolic pathways and enzymes.
R. Caspi et al (2018). Nucleic Acids Research, 10.1093/nar/gkx935.

CharProtDB: a database of experimentally characterized protein annotations.
R. Madupu et al (2012). Nucleic Acids Research, 10.1093/nar/gkr1133.

The carbohydrate-active enzymes database (CAZy) in 2013.
V. Lombard et al (2014). Nucleic Acids Research, 10.1093/nar/gkt1178.

The Transporter Classification Database (TCDB): recent advances.
M. H. Saier, Jr. et al (2016). Nucleic Acids Research, 10.1093/nar/gkv1103.

REBASE - a database for DNA restriction and modification: enzymes, genes and genomes.
R. J. Roberts et al (2015). Nucleic Acids Research, 10.1093/nar/gku1046.

Deep annotation of protein function across diverse bacteria from mutant phenotypes.
M. N. Price et al (2016). bioRxiv, 10.1101/072470.

by Morgan Price, Arkin group
Lawrence Berkeley National Laboratory