PaperBLAST – Find papers about a protein or its homologs

 

PaperBLAST Hits for BRENDA::P29976 3-deoxy-7-phosphoheptulonate synthase (EC 2.5.1.54) (Arabidopsis thaliana) (525 a.a., MALSNASSLS...)

Other sequence analysis tools:

Find functional residues: SitesBLAST

Search for conserved domains

Find the best match in UniProt

Compare to protein structures

Predict transmembrane helices: Phobius

Predict protein localization: PSORTb

Find homologs in fast.genomics

Found 87 similar proteins in the literature:

P29976 3-deoxy-7-phosphoheptulonate synthase (EC 2.5.1.54) from Arabidopsis thaliana (see paper)
AT4G39980 DHS1 (3-DEOXY-D-ARABINO-HEPTULOSONATE 7-PHOSPHATE SYNTHASE 1); 3-deoxy-7-phosphoheptulonate synthase from Arabidopsis thaliana
100% identity, 100% coverage

O24046 Phospho-2-dehydro-3-deoxyheptonate aldolase from Morinda citrifolia
78% identity, 98% coverage

I1JGU8 Phospho-2-dehydro-3-deoxyheptonate aldolase from Glycine max
77% identity, 98% coverage

LOC123205490 phospho-2-dehydro-3-deoxyheptonate aldolase 1, chloroplastic-like from Mangifera indica
78% identity, 96% coverage

A0A0D5ZBC4 3-deoxy-7-phosphoheptulonate synthase (EC 2.5.1.54) from Gossypium hirsutum (see paper)
75% identity, 97% coverage

M1BC24 Phospho-2-dehydro-3-deoxyheptonate aldolase from Solanum tuberosum
80% identity, 95% coverage

F6H0X2 Phospho-2-dehydro-3-deoxyheptonate aldolase from Vitis vinifera
75% identity, 95% coverage

O24051 Phospho-2-dehydro-3-deoxyheptonate aldolase from Morinda citrifolia
78% identity, 93% coverage

Q9SK84 Phospho-2-dehydro-3-deoxyheptonate aldolase from Arabidopsis thaliana
AT1G22410 2-dehydro-3-deoxyphosphoheptonate aldolase, putative / 3-deoxy-D-arabino-heptulosonate 7-phosphate synthase, putative / DAHP synthetase, putative from Arabidopsis thaliana
77% identity, 99% coverage

Q75LR2 Phospho-2-dehydro-3-deoxyheptonate aldolase 1, chloroplastic from Oryza sativa subsp. japonica
74% identity, 94% coverage

Q0D4J5 Phospho-2-dehydro-3-deoxyheptonate aldolase from Oryza sativa subsp. japonica
80% identity, 88% coverage

AROG1_PETHY / A0A067XH53 Phospho-2-dehydro-3-deoxyheptonate aldolase 1, chloroplastic; 3-deoxy-D-arabino-heptulosonate 7-phosphate synthase 1; DAHP synthase 1; PhDAHP1; Phospho-2-keto-3-deoxyheptonate aldolase 1; EC 2.5.1.54 from Petunia hybrida (Petunia) (see 2 papers)
75% identity, 98% coverage

LOC107867463 phospho-2-dehydro-3-deoxyheptonate aldolase 1, chloroplastic from Capsicum annuum
79% identity, 90% coverage

O22407 Phospho-2-dehydro-3-deoxyheptonate aldolase from Petroselinum crispum
77% identity, 88% coverage

Q75W16 Phospho-2-dehydro-3-deoxyheptonate aldolase 2, chloroplastic from Oryza sativa subsp. japonica
80% identity, 88% coverage

P21357 Phospho-2-dehydro-3-deoxyheptonate aldolase 1, chloroplastic from Solanum tuberosum
84% identity, 84% coverage

AT4G33510 DHS2 (3-deoxy-d-arabino-heptulosonate 7-phosphate synthase); 3-deoxy-7-phosphoheptulonate synthase from Arabidopsis thaliana
Q00218 Phospho-2-dehydro-3-deoxyheptonate aldolase 2, chloroplastic from Arabidopsis thaliana
79% identity, 92% coverage

Sb01g033590 No description from Sorghum bicolor
85% identity, 81% coverage

A0A3Q7H097 Phospho-2-dehydro-3-deoxyheptonate aldolase from Solanum lycopersicum
78% identity, 89% coverage

Sb02g039660 No description from Sorghum bicolor
77% identity, 89% coverage

F2D2N1 Phospho-2-dehydro-3-deoxyheptonate aldolase from Hordeum vulgare subsp. vulgare
84% identity, 85% coverage

GRMZM2G365160 uncharacterized protein LOC100501272 from Zea mays
76% identity, 90% coverage

AROG2_PETHY / A0A067XGX8 Phospho-2-dehydro-3-deoxyheptonate aldolase 2, chloroplastic; 3-deoxy-D-arabino-heptulosonate 7-phosphate synthase 2; DAHP synthase 2; PhDAHP2; Phospho-2-keto-3-deoxyheptonate aldolase 2; EC 2.5.1.54 from Petunia hybrida (Petunia) (see paper)
78% identity, 92% coverage

D0VBC1 Phospho-2-dehydro-3-deoxyheptonate aldolase from Vitis vinifera
80% identity, 88% coverage

A0A0A0L679 Phospho-2-dehydro-3-deoxyheptonate aldolase from Cucumis sativus
79% identity, 86% coverage

GRMZM2G396212 uncharacterized protein LOC100274492 from Zea mays
74% identity, 86% coverage

PA2843 probable aldolase from Pseudomonas aeruginosa PAO1
62% identity, 85% coverage

5uxmA / Q9I000 Type ii dah7ps from pseudomonas aeruginosa with trp bound (see paper)
62% identity, 85% coverage

B7FRJ9 Phospho-2-dehydro-3-deoxyheptonate aldolase from Phaeodactylum tricornutum (strain CCAP 1055/1)
59% identity, 86% coverage

PflSS101_1729 class II 3-deoxy-7-phosphoheptulonate synthase from Pseudomonas lactis
60% identity, 84% coverage

PP1866 phospho-2-dehydro-3-deoxyheptonate aldolase, class II from Pseudomonas putida KT2440
61% identity, 85% coverage

XP_024399688 phospho-2-dehydro-3-deoxyheptonate aldolase 2, chloroplastic-like isoform X3 from Physcomitrium patens
67% identity, 78% coverage

Q4K8T7 Phospho-2-dehydro-3-deoxyheptonate aldolase from Pseudomonas fluorescens (strain ATCC BAA-477 / NRRL B-23932 / Pf-5)
60% identity, 84% coverage

M217_RS0108615 class II 3-deoxy-7-phosphoheptulonate synthase from Pseudomonas chlororaphis HT66
60% identity, 84% coverage

CC2300 phospho-2-dehydro-3-deoxyheptonate aldolase, class II from Caulobacter crescentus CB15
57% identity, 86% coverage

BMEI0971 PHOSPHO-2-DEHYDRO-3-DEOXYHEPTONATE ALDOLASE from Brucella melitensis 16M
61% identity, 85% coverage

BruAb1_1018 Dhs, phospho-2-dehydro-3-deoxyheptonate aldolase, class II from Brucella abortus biovar 1 str. 9-941
61% identity, 85% coverage

jhp0122 PHOSPHO-2-DEHYDRO-3-DEOXYHEPTONATE ALDOLASE from Helicobacter pylori J99
Q9ZMU5 Phospho-2-dehydro-3-deoxyheptonate aldolase from Helicobacter pylori (strain J99 / ATCC 700824)
55% identity, 85% coverage

HP0134 3-deoxy-D-arabino-heptulosonate 7-phosphate synthase (dhs1) from Helicobacter pylori 26695
O24947 Phospho-2-dehydro-3-deoxyheptonate aldolase from Helicobacter pylori (strain ATCC 700392 / 26695)
55% identity, 84% coverage

F4JIZ3 Phospho-2-dehydro-3-deoxyheptonate aldolase from Arabidopsis thaliana
76% identity, 62% coverage

Cj0716 putative phospho-2-dehydro-3-deoxyheptonate aldolase from Campylobacter jejuni subsp. jejuni NCTC 11168
54% identity, 85% coverage

D6A8C0 Phospho-2-dehydro-3-deoxyheptonate aldolase from Streptomyces viridosporus (strain ATCC 14672 / DSM 40746 / JCM 4963 / KCTC 9882 / NRRL B-12104 / FH 1290)
54% identity, 84% coverage

CF54_24340 class II 3-deoxy-7-phosphoheptulonate synthase from Streptomyces sp. Tu 6176
54% identity, 84% coverage

P80574 Phospho-2-dehydro-3-deoxyheptonate aldolase from Streptomyces coelicolor (strain ATCC BAA-471 / A3(2) / M145)
SCO2115 2-dehydro-3-deoxyphosphoheptonate aldolase from Streptomyces coelicolor A3(2)
54% identity, 84% coverage

WP_031081129 class II 3-deoxy-7-phosphoheptulonate synthase from Streptomyces sp. NRRL WC-3549
54% identity, 84% coverage

DT87_05590 class II 3-deoxy-7-phosphoheptulonate synthase from Streptomyces sp. NTK 937
53% identity, 84% coverage

Q6YH16 Phospho-2-dehydro-3-deoxyheptonate aldolase (Fragment) from Vitis vinifera
86% identity, 49% coverage

B488_07360, B488_RS03555 3-deoxy-7-phosphoheptulonate synthase class II from Liberibacter crescens BT-1
49% identity, 85% coverage

SACE_2874 phospho-2-dehydro-3-deoxyheptonate aldolase from Saccharopolyspora erythraea NRRL 2338
52% identity, 83% coverage

WP_118914924 class II 3-deoxy-7-phosphoheptulonate synthase from Dermacoccus abyssi
50% identity, 83% coverage

ZMO0187 3-deoxy-7-phosphoheptulonate synthase from Zymomonas mobilis subsp. mobilis ZM4
48% identity, 83% coverage

SCO3210 2-dehydro-3-deoxyheptonate aldolase from Streptomyces coelicolor A3(2)
46% identity, 92% coverage

SACE_1708 phospho-2-dehydro-3-deoxyheptonate aldolase from Saccharopolyspora erythraea NRRL 2338
49% identity, 82% coverage

O68903 3-deoxy-7-phosphoheptulonate synthase (EC 2.5.1.54) from Actinosynnema pretiosum subsp. auranticum (see paper)
49% identity, 82% coverage

PAAG_03237 phospho-2-dehydro-3-deoxyheptonate aldolase from Paracoccidioides lutzii Pb01
48% identity, 84% coverage

MMAR_1854 3-deoxy-D-arabino-heptulosonate 7-phosphate synthase AroG_1 from Mycobacterium marinum M
49% identity, 82% coverage

MMAR_3222 3-deoxy-D-arabino-heptulosonate 7-phosphate synthase AroG from Mycobacterium marinum M
49% identity, 82% coverage

AROG_MYCTU / O53512 Phospho-2-dehydro-3-deoxyheptonate aldolase AroG; 3-deoxy-D-arabino-heptulosonate 7-phosphate synthase; DAHP synthase; Phospho-2-keto-3-deoxyheptonate aldolase; EC 2.5.1.54 from Mycobacterium tuberculosis (strain ATCC 25618 / H37Rv) (see 2 papers)
O53512 3-deoxy-7-phosphoheptulonate synthase (EC 2.5.1.54) from Mycobacterium tuberculosis (see 3 papers)
Rv2178c Probable 3-deoxy-D-arabino-heptulosonate 7-phosphate synthase AroG (DAHP synthetase, phenylalanine-repressible) from Mycobacterium tuberculosis H37Rv
NP_216694 phospho-2-dehydro-3-deoxyheptonate aldolase AroG from Mycobacterium tuberculosis H37Rv
48% identity, 82% coverage

MUL_3533 3-deoxy-D-arabino-heptulosonate 7-phosphate synthase AroG_1 from Mycobacterium ulcerans Agy99
49% identity, 82% coverage

3nv8B / O53512 The structure of 3-deoxy-d-arabino-heptulosonate 7-phosphate synthase in complex with phosphoenol pyruvate and manganese (thesit-free) (see paper)
48% identity, 82% coverage

B1MP18 Phospho-2-dehydro-3-deoxyheptonate aldolase from Mycobacteroides abscessus (strain ATCC 19977 / DSM 44196 / CCUG 20993 / CIP 104536 / JCM 13569 / NCTC 13031 / TMC 1543 / L948)
MAB_1987 Probable 3-deoxy-D-arabino-heptulosonate 7-phosphate synthase AroG from Mycobacterium abscessus ATCC 19977
48% identity, 82% coverage

MSMEG_4244 3-deoxy-7-phosphoheptulonate synthase from Mycobacterium smegmatis str. MC2 155
48% identity, 82% coverage

PADG_03114 3-deoxy-7-phosphoheptulonate synthase from Paracoccidioides brasiliensis Pb18
47% identity, 84% coverage

5hudD / Q8NNL5 Non-covalent complex of and dahp synthase and chorismate mutase from corynebacterium glutamicum with bound transition state analog (see paper)
48% identity, 82% coverage

NCgl2098 class II 3-deoxy-7-phosphoheptulonate synthase from Corynebacterium glutamicum ATCC 13032
cg2391 phospho-2-dehydro-3-deoxyheptonate aldolase from Corynebacterium glutamicum ATCC 13032
48% identity, 82% coverage

DIP1616 class II 3-deoxy-7-phosphoheptulonate synthase from Corynebacterium diphtheriae NCTC 13129
49% identity, 82% coverage

Pc18g02920 uncharacterized protein from Penicillium rubens
47% identity, 84% coverage

MicB006_3510 class II 3-deoxy-7-phosphoheptulonate synthase from Micromonospora sp. B006
47% identity, 83% coverage

DKG71_31600 3-deoxy-7-phosphoheptulonate synthase class II from Streptomyces sp. NEAU-S7GS2
44% identity, 81% coverage

Sare_1254 3-deoxy-7-phosphoheptulonate synthase from Salinispora arenicola CNS205
40% identity, 88% coverage

N0CZ35 Phospho-2-dehydro-3-deoxyheptonate aldolase from Streptomyces microflavus DSM 40593
38% identity, 81% coverage

MicB006_2892 3-deoxy-7-phosphoheptulonate synthase from Micromonospora sp. B006
36% identity, 80% coverage

DT87_23865 3-deoxy-7-phosphoheptulonate synthase from Streptomyces sp. NTK 937
37% identity, 81% coverage

G3XCJ9 3-deoxy-7-phosphoheptulonate synthase (EC 2.5.1.54) from Pseudomonas aeruginosa (see paper)
PA4212 phenazine biosynthesis protein PhzC from Pseudomonas aeruginosa PAO1
PA1901 phenazine biosynthesis protein PhzC from Pseudomonas aeruginosa PAO1
38% identity, 80% coverage

PA14_39945 phenazine biosynthesis protein PhzC from Pseudomonas aeruginosa UCBPP-PA14
38% identity, 80% coverage

PA14_09460 phenazine biosynthesis protein PhzC from Pseudomonas aeruginosa UCBPP-PA14
38% identity, 80% coverage

6bmcA / G3XCJ9 The structure of a dimeric type ii dah7ps associated with pyocyanin biosynthesis in pseudomonas aeruginosa (see paper)
38% identity, 80% coverage

SSHG_05330 3-deoxy-7-phosphoheptulonate synthase from Streptomyces albidoflavus
36% identity, 86% coverage

B1MFJ4 Phospho-2-dehydro-3-deoxyheptonate aldolase from Mycobacteroides abscessus (strain ATCC 19977 / DSM 44196 / CCUG 20993 / CIP 104536 / JCM 13569 / NCTC 13031 / TMC 1543 / L948)
MAB_0295 Putative phenazine biosynthesis protein PhzC from Mycobacterium abscessus ATCC 19977
36% identity, 81% coverage

lpg0063 phospho-2-dehydro-3-deoxyheptonate aldolase from Legionella pneumophila subsp. pneumophila str. Philadelphia 1
34% identity, 81% coverage

DT87_29890 3-deoxy-7-phosphoheptulonate synthase from Streptomyces sp. NTK 937
37% identity, 81% coverage

M217_RS0112890 3-deoxy-7-phosphoheptulonate synthase from Pseudomonas chlororaphis HT66
34% identity, 80% coverage

Pchl3084_4953 3-deoxy-7-phosphoheptulonate synthase from Pseudomonas chlororaphis subsp. aureofaciens 30-84
34% identity, 80% coverage

CXP47_RS25520 3-deoxy-7-phosphoheptulonate synthase from Pseudomonas chlororaphis
34% identity, 80% coverage

WP_028443630 3-deoxy-7-phosphoheptulonate synthase from Streptomyces sp. SID4912
37% identity, 81% coverage

EY04_RS25725 3-deoxy-7-phosphoheptulonate synthase from Pseudomonas chlororaphis
34% identity, 80% coverage

StrepF001_25935 3-deoxy-7-phosphoheptulonate synthase from Streptomyces sp. F001
37% identity, 80% coverage

For advice on how to use these tools together, see Interactive tools for functional annotation of bacterial genomes.

Statistics

The PaperBLAST database links 789,361 different protein sequences to 1,256,019 scientific articles. Searches against EuropePMC were last performed on January 10 2025.

How It Works

PaperBLAST builds a database of protein sequences that are linked to scientific articles. These links come from automated text searches against the articles in EuropePMC and from manually-curated information from GeneRIF, UniProtKB/Swiss-Prot, BRENDA, CAZy (as made available by dbCAN), BioLiP, CharProtDB, MetaCyc, EcoCyc, TCDB, REBASE, the Fitness Browser, and a subset of the European Nucleotide Archive with the /experiment tag. Given this database and a protein sequence query, PaperBLAST uses protein-protein BLAST to find similar sequences with E < 0.001.
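As a rough illustration of the E < 0.001 cutoff, the Python sketch below filters protein BLAST hits given in the standard 12-column tabular format (-outfmt 6). It is not PaperBLAST's own code: the names Hit and filter_hits are hypothetical, and computing coverage as the aligned fraction of the query is an assumption about how the coverage figures are derived.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    subject: str     # identifier of the matched database sequence
    identity: float  # percent identity over the aligned region
    coverage: float  # fraction of the query covered (assumed definition)
    evalue: float


def filter_hits(tabular: str, query_len: int, max_evalue: float = 1e-3) -> List[Hit]:
    """Keep hits whose E-value is below max_evalue.

    Expects the default BLAST tabular columns:
    qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore
    """
    hits = []
    for line in tabular.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        qstart, qend = int(fields[6]), int(fields[7])
        evalue = float(fields[10])
        if evalue < max_evalue:
            coverage = (qend - qstart + 1) / query_len
            hits.append(Hit(fields[1], float(fields[2]), coverage, evalue))
    return hits


# Made-up example rows for a 525-residue query (not real BLAST output).
example = "\n".join([
    "query\tsubjA\t78.0\t515\t100\t3\t5\t519\t1\t510\t1e-150\t800",
    "query\tsubjB\t30.0\t50\t35\t0\t1\t50\t1\t50\t0.5\t20",
])
kept = filter_hits(example, query_len=525)
```

Here only subjA passes the cutoff; subjB has an E-value of 0.5 and is dropped.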

To build the database, we query EuropePMC with locus tags, with RefSeq protein identifiers, and with UniProt accessions. We obtain the locus tags from RefSeq or from MicrobesOnline. We use queries of the form "locus_tag AND genus_name" to try to ensure that the paper is actually discussing that gene. Because EuropePMC indexes most recent biomedical papers, even if they are not open access, some of the links may be to papers that you cannot read or that our computers cannot read. We query each of these identifiers that appears in the open access part of EuropePMC, as well as every locus tag that appears in the 500 most-referenced genomes, so that a gene may appear in the PaperBLAST results even though none of the papers that mention it are open access. We also incorporate text-mined links from EuropePMC that link open access articles to UniProt or RefSeq identifiers. (This yields some additional links because EuropePMC uses different heuristics for their text mining than we do.)
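The "locus_tag AND genus_name" query form described above can be sketched in Python as follows. This is only an illustration: the function name build_query is hypothetical, and taking the genus to be the first word of the organism name is an assumption that will not hold for every organism string.

```python
def build_query(locus_tag: str, organism: str) -> str:
    """Build a text-search query of the form 'locus_tag AND genus_name'.

    Assumes the genus is the first whitespace-separated word of the
    organism name, e.g. 'Pseudomonas' in 'Pseudomonas aeruginosa PAO1'.
    """
    parts = organism.split()
    if not locus_tag.strip() or not parts:
        raise ValueError("both a locus tag and an organism name are required")
    return f"{locus_tag.strip()} AND {parts[0]}"


query = build_query("PA2843", "Pseudomonas aeruginosa PAO1")
```

Requiring the genus alongside the locus tag reduces false matches when the same short string appears in unrelated contexts.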

For every article that mentions a locus tag, a RefSeq protein identifier, or a UniProt accession, we try to select one or two snippets of text that refer to the protein. If we cannot get access to the full text, we try to select a snippet from the abstract, but unfortunately, unique identifiers such as locus tags are rarely provided in abstracts.
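One simple way to pick snippets like this is to take a window of words around each mention of the identifier. The Python sketch below does that; it is a guess at the general idea rather than PaperBLAST's actual heuristic, and the function name select_snippets, the window size, and the word-boundary rule are all assumptions.

```python
import re
from typing import List


def select_snippets(text: str, identifier: str,
                    window: int = 10, max_snippets: int = 2) -> List[str]:
    """Return up to max_snippets word windows that mention identifier."""
    words = text.split()
    # Match the identifier only when it is not part of a longer token,
    # so that 'PA2843' does not match inside 'PA28430'.
    pattern = re.compile(
        r"(?<![A-Za-z0-9_])" + re.escape(identifier) + r"(?![A-Za-z0-9_])"
    )
    snippets = []
    last_end = -1  # index of the last word already used in a snippet
    for i, word in enumerate(words):
        if i <= last_end:
            continue  # this mention is already inside the previous snippet
        if pattern.search(word):
            start = max(0, i - window)
            end = min(len(words), i + window + 1)
            snippets.append(" ".join(words[start:end]))
            last_end = end - 1
            if len(snippets) == max_snippets:
                break
    return snippets


# Made-up sentence for illustration only.
text = "We deleted PA2843 in strain PAO1. Unrelated gene PA28430 was unaffected."
found = select_snippets(text, "PA2843", window=2)
```

The same function could be applied to an abstract when the full text is unavailable, although, as noted above, identifiers such as locus tags rarely appear there.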

PaperBLAST also incorporates manually-curated protein functions from the curated sources listed above.

Except for GeneRIF and ENA, the curated entries include a short curated description of the protein's function. For entries from BioLiP, the protein's function may not be known beyond binding to the ligand. Many of these entries also link to articles in PubMed.

For more information see the PaperBLAST paper (mSystems 2017) or the code. You can download PaperBLAST's database here.

Changes to PaperBLAST since the paper was written:

Many of these changes are described in Interactive tools for functional annotation of bacterial genomes.

Secrets

PaperBLAST cannot provide snippets for many of the papers that are published in non-open-access journals. This limitation applies even if the paper is marked as "free" on the publisher's web site and is available in PubMed Central or EuropePMC. If a journal that you publish in is marked as "secret," please consider publishing elsewhere.

Omissions from the PaperBLAST Database

Many important articles are missing from PaperBLAST, either because the article's full text is not in EuropePMC (as for many older articles), or because the paper does not mention a protein identifier such as a locus tag, or because of PaperBLAST's heuristics. If you notice an article that characterizes a protein's function but is missing from PaperBLAST, please notify the curators at UniProt or add an entry to GeneRIF. Entries in either of these databases will eventually be incorporated into PaperBLAST. Note that to add an entry to UniProt, you will need to find the UniProt identifier for the protein. If the protein is not already in UniProt, you can ask them to create an entry. To add an entry to GeneRIF, you will need an NCBI Gene identifier, but unfortunately many prokaryotic proteins in RefSeq do not have corresponding Gene identifiers.

References

PaperBLAST: Text-mining papers for information about homologs.
M. N. Price and A. P. Arkin (2017). mSystems, 10.1128/mSystems.00039-17.

Europe PMC in 2017.
M. Levchenko et al (2017). Nucleic Acids Research, 10.1093/nar/gkx1005.

Gene indexing: characterization and analysis of NLM's GeneRIFs.
J. A. Mitchell et al (2003). AMIA Annu Symp Proc 2003:460-464.

UniProt: the universal protein knowledgebase.
The UniProt Consortium (2016). Nucleic Acids Research, 10.1093/nar/gkw1099.

BRENDA in 2017: new perspectives and new tools in BRENDA.
S. Placzek et al (2017). Nucleic Acids Research, 10.1093/nar/gkw952.

The EcoCyc database: reflecting new knowledge about Escherichia coli K-12.
I. M. Keseler et al (2016). Nucleic Acids Research, 10.1093/nar/gkw1003.

The MetaCyc database of metabolic pathways and enzymes.
R. Caspi et al (2018). Nucleic Acids Research, 10.1093/nar/gkx935.

CharProtDB: a database of experimentally characterized protein annotations.
R. Madupu et al (2012). Nucleic Acids Research, 10.1093/nar/gkr1133.

The carbohydrate-active enzymes database (CAZy) in 2013.
V. Lombard et al (2014). Nucleic Acids Research, 10.1093/nar/gkt1178.

The Transporter Classification Database (TCDB): recent advances.
M. H. Saier, Jr. et al (2016). Nucleic Acids Research, 10.1093/nar/gkv1103.

REBASE - a database for DNA restriction and modification: enzymes, genes and genomes.
R. J. Roberts et al (2015). Nucleic Acids Research, 10.1093/nar/gku1046.

Deep annotation of protein function across diverse bacteria from mutant phenotypes.
M. N. Price et al (2016). bioRxiv, 10.1101/072470.

by Morgan Price, Arkin group
Lawrence Berkeley National Laboratory