PaperBLAST – Find papers about a protein or its homologs

PaperBLAST Hits for Pf6N2E2_2394 (75 a.a., MTSVFDREDI...)

Other sequence analysis tools:

Find functional residues: SitesBLAST

Search for conserved domains

Find the best match in UniProt

Compare to protein structures

Predict transmembrane helices: Phobius

Predict protein localization: PSORTb

Find homologs in fast.genomics

Found 82 similar proteins in the literature:

AL066_04890 MbtH family protein from Pseudomonas nunensis
92% identity, 100% coverage

PFL_4178 MbtH-like protein-related protein from Pseudomonas fluorescens Pf-5
89% identity, 99% coverage

Avin_25630 MbtH-like protein from Azotobacter vinelandii AvOP
89% identity, 89% coverage

PP3808, PP_3808 conserved hypothetical protein from Pseudomonas putida KT2440
85% identity, 96% coverage

jpw_15990 MbtH family protein from Pseudomonas asiatica
85% identity, 96% coverage

JNO42_04010 MbtH family protein from Pseudomonas putida
83% identity, 96% coverage

PA14_33510 putative MbtH-like protein from Pseudomonas aeruginosa UCBPP-PA14
NP_251102 hypothetical protein from Pseudomonas aeruginosa PAO1
Q9I169 MbtH-like domain-containing protein from Pseudomonas aeruginosa (strain ATCC 15692 / DSM 22644 / CIP 104116 / JCM 14847 / LMG 12228 / 1C / PRS 101 / PAO1)
PA2412 hypothetical protein from Pseudomonas aeruginosa PAO1
78% identity, 96% coverage

Hsero_2339 MbtH family protein from Herbaspirillum seropedicae SmR1
69% identity, 93% coverage

HW44_RS01565 MbtH family protein from Nitrosococcus oceani
64% identity, 78% coverage

BCAL1689 MbtH-like protein from Burkholderia cenocepacia J2315
59% identity, 89% coverage

MXAN_3118 MbtH-like domain protein from Myxococcus xanthus DK 1622
MXAN_RS15115, WP_011553168 MbtH family protein from Myxococcus xanthus DZ2
69% identity, 85% coverage

AA671_12395 MbtH family protein from Delftia tsuruhatensis
61% identity, 78% coverage

Atu3678 putative siderophore biosynthesis protein from Agrobacterium tumefaciens str. C58 (Cereon)
58% identity, 91% coverage

Avin_50380 MbtH-like protein from Azotobacter vinelandii AvOP
59% identity, 84% coverage

Achr_39010 MbtH family protein from Azotobacter chroococcum NCIMB 8003
54% identity, 87% coverage

A8CF84 KtzJ from Kutzneria sp. (strain 744)
56% identity, 84% coverage

BTH_I2426 mbtH-like protein from Burkholderia thailandensis E264
65% identity, 63% coverage

XALc_1065 putative mbth-like protein from Xanthomonas albilineans
52% identity, 100% coverage

BPSL1786 conserved hypothetical protein from Burkholderia pseudomallei K96243
65% identity, 65% coverage

W7IRY5 MbtH protein from Actinokineospora spheciospongiae
59% identity, 84% coverage

AORI_1481 MbtH family protein from Amycolatopsis keratiniphila
57% identity, 84% coverage

AWZ11_RS05060 MbtH family protein from Streptomyces europaeiscabiei
56% identity, 81% coverage

YP_640626 MbtH-like protein from Mycobacterium sp. MCS
53% identity, 88% coverage

Q93N85 MbtH-like domain-containing protein from Streptomyces lavendulae
47% identity, 97% coverage

Q333U6 MbtH homologue from Micromonospora sp. ML1
56% identity, 84% coverage

STRAU_RS01625 MbtH family protein from Streptomyces aurantiacus
52% identity, 88% coverage

SCLAV_p1293 MbtH family protein from Streptomyces clavuligerus
58% identity, 83% coverage

Q9F8V3 CouY from Streptomyces rishiriensis
45% identity, 92% coverage

ADK37_26530 MbtH family protein from Streptomyces resistomycificus
52% identity, 88% coverage

RHA1_ro04717 conserved hypothetical protein, MbtH family from Rhodococcus sp. RHA1
47% identity, 89% coverage

WP_109379533 MbtH family protein from Streptomyces sp. NWU339
52% identity, 88% coverage

Q0X0B7 MbtH family protein from Streptomyces lasalocidi
51% identity, 89% coverage

Q70AZ5 MbtH-like short polypeptide from Actinoplanes teichomyceticus
45% identity, 92% coverage

MSMEG_4508 hypothetical protein from Mycobacterium smegmatis str. MC2 155
47% identity, 88% coverage

Q939Y8 MbtH-like domain-containing protein from Amycolatopsis balhimycina
52% identity, 84% coverage

SCAB_3331 MbtH-like protein from Streptomyces scabiei 87.22
46% identity, 92% coverage

Q3L893 Conserved domain protein from Mycolicibacterium smegmatis (strain ATCC 700084 / mc(2)155)
MSMEG_0399 hypothetical protein from Mycobacterium smegmatis str. MC2 155
46% identity, 89% coverage

XNR_3456 MbtH family protein from Streptomyces albidoflavus
42% identity, 87% coverage

DMB42_RS42820 MbtH family protein from Nonomuraea sp. WAC 01424
51% identity, 84% coverage

BCG_2391c putative protein mbtH from Mycobacterium bovis BCG str. Pasteur 1173P2
P9WIP4 Protein MbtH from Mycobacterium tuberculosis (strain CDC 1551 / Oshkosh)
45% identity, 92% coverage

SLUN_RS38480 MbtH family protein from Streptomyces lunaelactis
48% identity, 88% coverage

B2HHJ4 Conserved hypothetical MbtH-like protein from Mycobacterium marinum (strain ATCC BAA-535 / M)
48% identity, 83% coverage

SCO0489 hypothetical protein from Streptomyces coelicolor A3(2)
SLIV_35495 MbtH family protein from Streptomyces lividans TK24
48% identity, 89% coverage

mbtH / P9WIP5 putative conserved protein from Mycobacterium tuberculosis (strain ATCC 25618 / H37Rv) (see 10 papers)
NP_216893, Rv2377c PUTATIVE CONSERVED PROTEIN MBTH from Mycobacterium tuberculosis H37Rv
45% identity, 92% coverage

SCAB_85461 hypothetical protein from Streptomyces scabiei 87.22
44% identity, 91% coverage

B1MAR0 MbtH family protein from Mycobacteroides abscessus (strain ATCC 19977 / DSM 44196 / CCUG 20993 / CIP 104536 / JCM 13569 / NCTC 13031 / TMC 1543 / L948)
48% identity, 88% coverage

BASU_2822 MbtH family protein from Bacillus velezensis UCMB5113
51% identity, 87% coverage

MAB_4100c MbtH-like protein from Mycobacterium abscessus ATCC 19977
46% identity, 89% coverage

ACM01_RS10820 thaxtomin biosynthesis NRPS accessory protein TxtH from Streptomyces viridochromogenes
46% identity, 87% coverage

CGL27_RS10110 MbtH family protein from Streptomyces sp. 11-1-2
55% identity, 71% coverage

CGL27_RS02360 MbtH family protein from Streptomyces sp. 11-1-2
44% identity, 96% coverage

SACE_2692 MbtH-like protein from Saccharopolyspora erythraea NRRL 2338
46% identity, 87% coverage

MAH_2060, MAP4_2610, OCQ_31530 MbtH family protein from Mycobacterium avium subsp. paratuberculosis MAP4
49% identity, 83% coverage

RSp0640 MbtH family protein from Ralstonia pseudosolanacearum GMI1000
51% identity, 73% coverage

BV309_02880 MbtH family protein from Streptomyces clavuligerus
49% identity, 83% coverage

MAP1872c MbtH_2 from Mycobacterium avium subsp. paratuberculosis str. k10
MAH_2755 MbtH family protein from Mycobacterium avium subsp. hominissuis TH135
41% identity, 95% coverage

BPSL1726 conserved hypothetical protein from Burkholderia pseudomallei K96243
42% identity, 89% coverage

Q73XY9 MbtH_3 from Mycolicibacterium paratuberculosis (strain ATCC BAA-968 / K-10)
MAH_1634, MAH_RS08065, MAP_RS11035 MbtH family protein from Mycobacterium avium subsp. hominissuis TH135
41% identity, 84% coverage

CP977_04015 MbtH family protein from Streptomyces cinereoruber
43% identity, 87% coverage

B7HR49 MbtH-like protein from Bacillus cereus (strain AH187)
40% identity, 83% coverage

BSU31959 hypothetical protein from Bacillus subtilis subsp. subtilis str. 168
44% identity, 85% coverage

BPSS1267 putative MbtH-like protein from Burkholderia pseudomallei K96243
46% identity, 88% coverage

SCO3218 small conserved hypothetical protein from Streptomyces coelicolor A3(2)
Q7BRC2 Cda-orfX from Streptomyces coelicolor
43% identity, 89% coverage

BCE2403 mbtH-like protein from Bacillus cereus ATCC 10987
40% identity, 83% coverage

ACSP50_3047 MbtH family protein from Actinoplanes sp. SE50/110
40% identity, 84% coverage

GBAA2373 mbtH-like protein from Bacillus anthracis str. 'Ames Ancestor'
40% identity, 83% coverage

Q7N7D2 L-alanine-L-anticapsin ligase (EC 6.3.2.49) from Photorhabdus laumondii subsp. laumondii (see paper)
54% identity, 10% coverage

MSMEG_0016 hypothetical protein from Mycobacterium smegmatis str. MC2 155
39% identity, 83% coverage

BC2309 Antibiotic/siderophore biosynthesis protein from Bacillus cereus ATCC 14579
39% identity, 83% coverage

JQN84_14135 AMP-binding protein from Micromonospora humidisoli
52% identity, 4% coverage

ltxB / Q5V8A7 (-)-indolactam synthase from Lyngbya majuscula (see 4 papers)
44% identity, 14% coverage

MEG1_RS04960 phototemtide A NRPS accessory protein PttA from Photorhabdus temperata subsp. temperata Meg1
38% identity, 85% coverage

XIS1_1050018 MbtH family NRPS accessory protein from Xenorhabdus innexi
40% identity, 76% coverage

jk1783 hypothetical protein from Corynebacterium jeikeium K411
34% identity, 79% coverage

TERTU_4066 MbtH-like protein from Teredinibacter turnerae T7901
37% identity, 61% coverage

slgN1 / CBA11570.1 non-ribosomal peptide synthetase from Streptomyces lydicus (see paper)
37% identity, 11% coverage

4gr5C / D1GLU5 Crystal structure of slgn1deltaasub in complex with ampcpp (see paper)
38% identity, 12% coverage

Z0726 No description from Escherichia coli O157:H7 EDL933
29% identity, 93% coverage

c0672 hypothetical protein from Escherichia coli CFT073
27% identity, 93% coverage

UTI89_C0587 hypothetical protein from Escherichia coli UTI89
27% identity, 93% coverage

t2281 conserved hypothetical protein from Salmonella enterica subsp. enterica serovar Typhi Ty2
28% identity, 91% coverage

STM0587 putative cytoplasmic protein from Salmonella typhimurium LT2
28% identity, 91% coverage

For advice on how to use these tools together, see Interactive tools for functional annotation of bacterial genomes.

Statistics

The PaperBLAST database links 793,807 different protein sequences to 1,259,118 scientific articles. Searches against EuropePMC were last performed on March 13, 2025.

How It Works

PaperBLAST builds a database of protein sequences that are linked to scientific articles. These links come from automated text searches against the articles in EuropePMC and from manually-curated information from GeneRIF, UniProtKB/Swiss-Prot, BRENDA, CAZy (as made available by dbCAN), BioLiP, CharProtDB, MetaCyc, EcoCyc, TCDB, REBASE, the Fitness Browser, and a subset of the European Nucleotide Archive with the /experiment tag. Given this database and a protein sequence query, PaperBLAST uses protein-protein BLAST to find similar sequences with E < 0.001.
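The E-value cutoff described above can be illustrated with a minimal sketch. It assumes BLAST hits in the standard 12-column tabular format (BLAST+ `-outfmt 6`), where column 11 is the E-value. The function name `filter_hits` and the sample rows are hypothetical and are not taken from PaperBLAST's code.

```python
def filter_hits(tabular_lines, max_evalue=1e-3):
    """Keep hits whose E-value is strictly below max_evalue.

    Assumes the standard 12-column BLAST tabular layout:
    qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore
    """
    hits = []
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        evalue = float(fields[10])
        if evalue < max_evalue:
            hits.append({
                "subject": fields[1],
                "identity": float(fields[2]),
                "evalue": evalue,
            })
    return hits


# Made-up rows for illustration only (not real PaperBLAST results).
sample = [
    "query\tsubjectA\t90.0\t70\t7\t0\t1\t70\t1\t70\t2e-30\t110",
    "query\tsubjectB\t30.0\t40\t28\t0\t5\t44\t9\t48\t0.5\t20.1",
]
for hit in filter_hits(sample):
    print(hit["subject"], hit["identity"], hit["evalue"])
```

Only `subjectA` passes the filter here, because 0.5 is not below the 0.001 threshold.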

To build the database, we query EuropePMC with locus tags, with RefSeq protein identifiers, and with UniProt accessions. We obtain the locus tags from RefSeq or from MicrobesOnline. We use queries of the form "locus_tag AND genus_name" to try to ensure that the paper is actually discussing that gene. Because EuropePMC indexes most recent biomedical papers, even if they are not open access, some of the links may be to papers that you cannot read or that our computers cannot read. We query each of these identifiers that appears in the open access part of EuropePMC, as well as every locus tag that appears in the 500 most-referenced genomes, so that a gene may appear in the PaperBLAST results even though none of the papers that mention it are open access. We also incorporate text-mined links from EuropePMC that link open access articles to UniProt or RefSeq identifiers. (This yields some additional links because EuropePMC uses different heuristics for their text mining than we do.)
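As a rough illustration of the "locus_tag AND genus_name" query form, the sketch below builds such a string from a locus tag and an organism name. The helper names `genus_of` and `build_query` are hypothetical, and taking the first word of the organism name as the genus is a simplifying assumption. How PaperBLAST actually derives the genus or submits the query is not shown here.

```python
def genus_of(organism_name):
    """Assume the genus is the first word of the organism name."""
    return organism_name.split()[0]


def build_query(locus_tag, organism_name):
    """Build a query of the form 'locus_tag AND genus_name'."""
    return f"{locus_tag} AND {genus_of(organism_name)}"


print(build_query("PP_3808", "Pseudomonas putida KT2440"))
# prints: PP_3808 AND Pseudomonas
```

Pairing the locus tag with the genus reduces false matches when the same string appears in papers about unrelated topics.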

For every article that mentions a locus tag, a RefSeq protein identifier, or a UniProt accession, we try to select one or two snippets of text that refer to the protein. If we cannot get access to the full text, we try to select a snippet from the abstract, but unfortunately, unique identifiers such as locus tags are rarely provided in abstracts.
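One simple way to pick snippets is to take a window of text around each whole-word match of the identifier. The sketch below does only that. The function `find_snippets`, the window size, and the example sentence are assumptions made for illustration, and PaperBLAST's own snippet heuristics may differ.

```python
import re


def find_snippets(text, identifier, context=60, max_snippets=2):
    """Return up to max_snippets text windows around whole-word matches."""
    # The lookarounds stop "PA2412" from matching inside "PA24120".
    pattern = re.compile(r"(?<!\w)" + re.escape(identifier) + r"(?!\w)")
    snippets = []
    for match in pattern.finditer(text):
        start = max(0, match.start() - context)
        end = min(len(text), match.end() + context)
        snippets.append(text[start:end].strip())
        if len(snippets) == max_snippets:
            break
    return snippets


# Invented example text, not from a real article.
text = ("We deleted PA2412 in strain PAO1. PA24120 is unrelated. "
        "Later, PA2412 was complemented.")
for snippet in find_snippets(text, "PA2412", context=10):
    print(snippet)
```

Capping the result at two snippets mirrors the "one or two snippets" described above.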

PaperBLAST also incorporates manually-curated protein functions:

Except for GeneRIF and ENA, the curated entries include a short curated description of the protein's function. For entries from BioLiP, the protein's function may not be known beyond binding to the ligand. Many of these entries also link to articles in PubMed.

For more information see the PaperBLAST paper (mSystems 2017) or the code. You can download PaperBLAST's database here.

Many of the changes to PaperBLAST since the paper was written are described in Interactive tools for functional annotation of bacterial genomes.

Secrets

PaperBLAST cannot provide snippets for many of the papers that are published in non-open-access journals. This limitation applies even if the paper is marked as "free" on the publisher's web site and is available in PubMed Central or EuropePMC. If a journal that you publish in is marked as "secret," please consider publishing elsewhere.

Omissions from the PaperBLAST Database

Many important articles are missing from PaperBLAST, either because the article's full text is not in EuropePMC (as for many older articles), or because the paper does not mention a protein identifier such as a locus tag, or because of PaperBLAST's heuristics. If you notice an article that characterizes a protein's function but is missing from PaperBLAST, please notify the curators at UniProt or add an entry to GeneRIF. Entries in either of these databases will eventually be incorporated into PaperBLAST. Note that to add an entry to UniProt, you will need to find the UniProt identifier for the protein. If the protein is not already in UniProt, you can ask them to create an entry. To add an entry to GeneRIF, you will need an NCBI Gene identifier, but unfortunately many prokaryotic proteins in RefSeq do not have corresponding Gene identifiers.

References

PaperBLAST: Text-mining papers for information about homologs.
M. N. Price and A. P. Arkin (2017). mSystems, 10.1128/mSystems.00039-17.

Europe PMC in 2017.
M. Levchenko et al (2017). Nucleic Acids Research, 10.1093/nar/gkx1005.

Gene indexing: characterization and analysis of NLM's GeneRIFs.
J. A. Mitchell et al (2003). AMIA Annu Symp Proc 2003:460-464.

UniProt: the universal protein knowledgebase.
The UniProt Consortium (2016). Nucleic Acids Research, 10.1093/nar/gkw1099.

BRENDA in 2017: new perspectives and new tools in BRENDA.
S. Placzek et al (2017). Nucleic Acids Research, 10.1093/nar/gkw952.

The EcoCyc database: reflecting new knowledge about Escherichia coli K-12.
I. M. Keseler et al (2016). Nucleic Acids Research, 10.1093/nar/gkw1003.

The MetaCyc database of metabolic pathways and enzymes.
R. Caspi et al (2018). Nucleic Acids Research, 10.1093/nar/gkx935.

CharProtDB: a database of experimentally characterized protein annotations.
R. Madupu et al (2012). Nucleic Acids Research, 10.1093/nar/gkr1133.

The carbohydrate-active enzymes database (CAZy) in 2013.
V. Lombard et al (2014). Nucleic Acids Research, 10.1093/nar/gkt1178.

The Transporter Classification Database (TCDB): recent advances.
M. H. Saier, Jr. et al (2016). Nucleic Acids Research, 10.1093/nar/gkv1103.

REBASE - a database for DNA restriction and modification: enzymes, genes and genomes.
R. J. Roberts et al (2015). Nucleic Acids Research, 10.1093/nar/gku1046.

Deep annotation of protein function across diverse bacteria from mutant phenotypes.
M. N. Price et al (2016). bioRxiv, 10.1101/072470.

by Morgan Price, Arkin group
Lawrence Berkeley National Laboratory