PaperBLAST – Find papers about a protein or its homologs

 

PaperBLAST

PaperBLAST Hits for Shew_2785 (85 a.a., MALLIDDSCI...)

Other sequence analysis tools:

Find functional residues: SitesBLAST

Search for conserved domains

Find the best match in UniProt

Compare to protein structures

Predict transmembrane helices: Phobius

Predict protein localization: PSORTb

Find homologs in fast.genomics

Found 70 similar proteins in the literature:

swp_3663 4Fe-4S ferredoxin, iron-sulfur binding from Shewanella piezotolerans WP3
82% identity, 99% coverage

NP_249053 4Fe-4S ferredoxin from Pseudomonas aeruginosa PAO1
PA0362 ferredoxin (4Fe-4S) from Pseudomonas aeruginosa PAO1
71% identity, 96% coverage

FXO12_19935 YfhL family 4Fe-4S dicluster ferredoxin from Pseudomonas sp. J380
67% identity, 96% coverage

CH_091751 ferredoxin from Pseudomonas aeruginosa (see paper)
70% identity, 95% coverage

2fgoA / Q9I6D2 Structure of the 2[4fe-4s] ferredoxin from pseudomonas aeruginosa (see paper)
70% identity, 95% coverage

PSPTO_0416 ferredoxin from Pseudomonas syringae pv. tomato str. DC3000
66% identity, 96% coverage

FER_ALLVD / P00208 Ferredoxin; 2[4Fe-4S] ferredoxin from Allochromatium vinosum (strain ATCC 17899 / DSM 180 / NBRC 103801 / NCIMB 10441 / D) (Chromatium vinosum) (see 4 papers)
fdx / GI|1518927 ferredoxin (see 3 papers)
65% identity, 96% coverage

NGO1859 putative ferredoxin from Neisseria gonorrhoeae FA 1090
67% identity, 93% coverage

MU9_1665 YfhL family 4Fe-4S dicluster ferredoxin from Morganella morganii subsp. morganii KT
69% identity, 92% coverage

3eunA / P00208 Crystal structure of the 2[4fe-4s] c57a ferredoxin variant from allochromatium vinosum (see paper)
63% identity, 95% coverage

HI0527 ferredoxin (fdx-2) from Haemophilus influenzae Rd KW20
64% identity, 92% coverage

YfhL / b2562 putative 4Fe-4S cluster-containing protein YfhL from Escherichia coli K-12 substr. MG1655 (see 3 papers)
YFHL_ECOLI / P52102 Ferredoxin YfhL; EcFd from Escherichia coli (strain K12) (see paper)
b2562 predicted 4Fe-4S cluster-containing protein from Escherichia coli str. K-12 substr. MG1655
64% identity, 92% coverage

2zvsA / P52102 Crystal structure of the 2[4fe-4s] ferredoxin from escherichia coli (see paper)
63% identity, 92% coverage

Mmc1_0249 4Fe-4S ferredoxin, iron-sulfur binding domain protein from Magnetococcus sp. MC-1
58% identity, 95% coverage

A1S_2297 putative 4Fe-4S ferredoxin from Acinetobacter baumannii ATCC 17978
64% identity, 75% coverage

jhp0262 Ferredoxin from Helicobacter pylori J99
53% identity, 96% coverage

FDX2_SORC5 / A9FH21 Ferredoxin Fdx2 from Sorangium cellulosum (strain So ce56) (Polyangium cellulosum (strain So ce56)) (see paper)
56% identity, 74% coverage

HP0277 ferredoxin from Helicobacter pylori 26695
HPG27_256 ferrodoxin from Helicobacter pylori G27
HPG27_RS01390 YfhL family 4Fe-4S dicluster ferredoxin from Helicobacter pylori G27
53% identity, 96% coverage

fdx / CAA12251.1 ferredoxin from Thauera aromatica (see paper)
53% identity, 95% coverage

1rgvA / O88151 Crystal structure of the ferredoxin from thauera aromatica (see paper)
52% identity, 94% coverage

RPA0662 ferredoxin from Rhodopseudomonas palustris CGA009
49% identity, 93% coverage

SMGWSS_137 ferredoxin (4Fe-4S) from Candidatus Sulcia muelleri GWSS
48% identity, 94% coverage

Q8KCZ6 Ferredoxin-1 from Chlorobaculum tepidum (strain ATCC 49652 / DSM 12025 / NBRC 103806 / TLS)
56% identity, 69% coverage

FE46_RS08120 4Fe-4S dicluster domain-containing protein from Flavobacterium psychrophilum
36% identity, 68% coverage

DMIN_01330 4Fe-4S binding domain protein from Candidatus Sulcia muelleri DMIN
57% identity, 68% coverage

all2512 transcriptional regulator from Nostoc sp. PCC 7120
47% identity, 10% coverage

BMF77_02997 transcriptional regulator from Dolichospermum sp. UHCC 0315A
40% identity, 12% coverage

Npun_R0334 4Fe-4S ferredoxin iron-sulfur binding domain-containing protein from Nostoc punctiforme
46% identity, 9% coverage

RPA4631 ferredoxin 2[4Fe-4S], fdxN from Rhodopseudomonas palustris CGA009
54% identity, 64% coverage

Avin_10510 4Fe-4S ferredoxin, iron-sulfur binding domain protein from Azotobacter vinelandii AvOP
54% identity, 66% coverage

ZMO1818 4Fe-4S ferredoxin iron-sulfur binding domain protein from Zymomonas mobilis subsp. mobilis ZM4
47% identity, 62% coverage

CRC_01763 helix-turn-helix domain-containing protein from Cylindrospermopsis raciborskii CS-505
43% identity, 10% coverage

WP_026790610 4Fe-4S binding protein from Pleomorphomonas oryzae DSM 16300
45% identity, 75% coverage

bsr1739 ferredoxin from Bradyrhizobium japonicum USDA 110
49% identity, 62% coverage

cce_1898 transcriptional regulator from Cyanothece sp. ATCC 51142
38% identity, 12% coverage

bsr1760 ferredoxin-like protein from Bradyrhizobium japonicum USDA 110
44% identity, 64% coverage

msl8750 ferredoxin 2[4Fe-4S], fdxN from Mesorhizobium loti MAFF303099
47% identity, 67% coverage

fdxN / AAC33373.1 FdxN from Rippkaea orientalis PCC 8801 (see paper)
36% identity, 63% coverage

Cj0333c ferredoxin from Campylobacter jejuni subsp. jejuni NCTC 11168
48% identity, 61% coverage

HH0646 ferredoxin from Helicobacter hepaticus ATCC 51449
46% identity, 62% coverage

SWOL_RS10890 4Fe-4S binding protein from Syntrophomonas wolfei subsp. wolfei str. Goettingen G311
47% identity, 67% coverage

Mmc1_1207 4Fe-4S ferredoxin, iron-sulfur binding domain protein from Magnetococcus sp. MC-1
40% identity, 61% coverage

MSR1_18600 4Fe-4S dicluster domain-containing protein from Magnetospirillum gryphiswaldense MSR-1
50% identity, 67% coverage

FER_GOTA9 / P00198 4Fe-4S ferredoxin FdxA; Ferredoxin from Gottschalkia acidurici (strain ATCC 7906 / DSM 604 / BCRC 14475 / CIP 104303 / KCTC 5404 / NCIMB 10678 / 9a) (Clostridium acidurici) (see 2 papers)
44% identity, 67% coverage

SMUL_0303 DUF362 domain-containing protein from Sulfurospirillum multivorans DSM 12446
43% identity, 61% coverage

PDB|1FDN ferredoxin from Clostridium acidurici (see 5 papers)
43% identity, 66% coverage

1fcaA / P00198 Structure of the ferredoxin from clostridium acidurici: model at 1.8 angstroms resolution (see paper)
43% identity, 66% coverage

UCYN_05600 helix-turn-helix domain-containing protein from Candidatus Atelocyanobacterium thalassa isolate ALOHA
42% identity, 9% coverage

Aazo_1357 4Fe-4S binding protein from 'Nostoc azollae' 0708
35% identity, 73% coverage

X276_26075 DUF362 domain-containing protein from Clostridium beijerinckii NRRL B-598
Cbei_0118 4Fe-4S ferredoxin iron-sulfur binding domain-containing protein from Clostridium beijerincki NCIMB 8052
41% identity, 66% coverage

BT2414, BT_2414 ferredoxin from Bacteroides thetaiotaomicron VPI-5482
46% identity, 72% coverage

SYN_03059 ferridoxin from Syntrophus aciditrophicus SB
43% identity, 72% coverage

CLSPOx_00425 DUF362 domain-containing protein from Clostridium sporogenes
39% identity, 66% coverage

CLSA_c01660 DUF362 domain-containing protein from Clostridium saccharobutylicum DSM 13864
39% identity, 66% coverage

GI|144806 ferredoxin from Clostridium pasteurianum (see 4 papers)
P00195 Ferredoxin from Clostridium pasteurianum
41% identity, 66% coverage

BT_RS12205 DUF362 domain-containing protein from Bacteroides thetaiotaomicron VPI-5482
46% identity, 72% coverage

HM1_2505 DUF362 domain-containing protein from Heliomicrobium modesticaldum Ice1
HM1_2505 4fe-4S ferredoxin, iron-sulfur binding domain protein from Heliobacterium modesticaldum Ice1
39% identity, 73% coverage

CH_016396 ferredoxin from Clostridium butyricum (see 2 papers)
40% identity, 65% coverage

PIN17_RS06895, PIOMA14_I_1049, PIOMA14_RS05315 4Fe-4S binding protein from Prevotella intermedia
47% identity, 69% coverage

PG1421 ferredoxin, 4Fe-4S from Porphyromonas gingivalis W83
HMPREF1322_RS09700, PG_RS10480 DUF362 domain-containing protein from Porphyromonas gingivalis W83
43% identity, 72% coverage

1clfA / P00195 Clostridium pasteurianum ferredoxin (see paper)
40% identity, 65% coverage

BMF77_01997 DUF362 domain-containing protein from Dolichospermum sp. UHCC 0315A
34% identity, 72% coverage

Cbs_1773, X276_18165 [FeFe] hydrogenase, group A from Clostridium beijerinckii ATCC 35702
Cbei_1773 hydrogenase, Fe-only from Clostridium beijerincki NCIMB 8052
42% identity, 9% coverage

Csac_0737 4Fe-4S ferredoxin, iron-sulfur binding domain protein from Caldicellulosiruptor saccharolyticus DSM 8903
43% identity, 72% coverage

CAC0303 Ferredoxin from Clostridium acetobutylicum ATCC 824
CA_C0303 DUF362 domain-containing protein from Clostridium acetobutylicum ATCC 824
39% identity, 66% coverage

B2M23_RS02975 DUF362 domain-containing protein from Eubacterium limosum
44% identity, 73% coverage

8zqdA / A0A1I9RYV3 Anaerobically isolated active [fefe]-hydrogenase cba5h
42% identity, 9% coverage

D3S191 CoB--CoM heterodisulfide reductase iron-sulfur subunit A from Ferroglobus placidus (strain DSM 10642 / AEDII12DO)
36% identity, 8% coverage

1durA / P00193 Replacement for 1fdx 2(4fe4s) ferredoxin from (now) peptostreptococcus asaccharolyticus
43% identity, 72% coverage

swp_2142 4Fe-4S ferredoxin, iron-sulfur binding from Shewanella piezotolerans WP3
34% identity, 33% coverage

For advice on how to use these tools together, see Interactive tools for functional annotation of bacterial genomes.

Statistics

The PaperBLAST database links 793,807 different protein sequences to 1,259,118 scientific articles. Searches against EuropePMC were last performed on March 13, 2025.

How It Works

PaperBLAST builds a database of protein sequences that are linked to scientific articles. These links come from automated text searches against the articles in EuropePMC and from manually-curated information from GeneRIF, UniProtKB/Swiss-Prot, BRENDA, CAZy (as made available by dbCAN), BioLiP, CharProtDB, MetaCyc, EcoCyc, TCDB, REBASE, the Fitness Browser, and a subset of the European Nucleotide Archive with the /experiment tag. Given this database and a protein sequence query, PaperBLAST uses protein-protein BLAST to find similar sequences with E < 0.001.
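The E-value cutoff described above can be illustrated with a minimal sketch. It assumes the BLAST hits are available in BLAST+ tabular format (`-outfmt 6`, default 12 columns); the text does not say which output format PaperBLAST itself parses, and the names `Hit`, `parse_tabular_hits`, `significant_hits`, and the rows in `EXAMPLE_OUTPUT` are invented for illustration.

```python
from typing import NamedTuple

E_VALUE_CUTOFF = 1e-3  # "E < 0.001", as stated in the text


class Hit(NamedTuple):
    subject_id: str
    percent_identity: float
    e_value: float


def parse_tabular_hits(blast_output: str) -> list[Hit]:
    """Parse BLAST+ tabular output (-outfmt 6, default columns).

    Default columns: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore
    """
    hits = []
    for line in blast_output.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        hits.append(Hit(fields[1], float(fields[2]), float(fields[10])))
    return hits


def significant_hits(hits: list[Hit], cutoff: float = E_VALUE_CUTOFF) -> list[Hit]:
    """Keep only hits whose E-value is strictly below the cutoff."""
    return [h for h in hits if h.e_value < cutoff]


# Invented rows for illustration only; these are not real PaperBLAST results.
EXAMPLE_OUTPUT = "\n".join([
    "query\thitA\t82.0\t84\t15\t0\t1\t84\t1\t84\t1e-45\t150",
    "query\thitB\t40.0\t55\t33\t0\t3\t57\t10\t64\t2e-6\t45.1",
    "query\thitC\t30.0\t20\t14\t0\t5\t24\t40\t59\t0.5\t22.3",
])

if __name__ == "__main__":
    for hit in significant_hits(parse_tabular_hits(EXAMPLE_OUTPUT)):
        print(hit.subject_id, hit.percent_identity, hit.e_value)
```

The comparison is strict, so a hit with an E-value of exactly 0.001 would be dropped, matching the "E < 0.001" wording.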

To build the database, we query EuropePMC with locus tags, with RefSeq protein identifiers, and with UniProt accessions. We obtain the locus tags from RefSeq or from MicrobesOnline. We use queries of the form "locus_tag AND genus_name" to try to ensure that the paper is actually discussing that gene. Because EuropePMC indexes most recent biomedical papers, even if they are not open access, some of the links may be to papers that you cannot read or that our computers cannot read. We query each of these identifiers that appears in the open access part of EuropePMC, as well as every locus tag that appears in the 500 most-referenced genomes, so that a gene may appear in the PaperBLAST results even though none of the papers that mention it are open access. We also incorporate text-mined links from EuropePMC that link open access articles to UniProt or RefSeq identifiers. (This yields some additional links because EuropePMC uses different heuristics for their text mining than we do.)
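As a rough sketch of the "locus_tag AND genus_name" query form, the snippet below builds such a query and a search URL for the public Europe PMC REST search endpoint. Whether PaperBLAST uses this endpoint, and how it derives the genus name, is not stated in the text; the helper names and the handling of the "Candidatus" prefix are assumptions made for illustration. No network request is made.

```python
from urllib.parse import urlencode

# Public Europe PMC REST search endpoint (its use here is an assumption).
EUROPE_PMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def genus_of(organism: str) -> str:
    """Return the genus from an organism name (hypothetical heuristic)."""
    words = organism.replace("'", "").split()
    if words and words[0] == "Candidatus":
        words = words[1:]  # assumed: skip the provisional-status prefix
    return words[0]


def build_query(locus_tag: str, organism: str) -> str:
    """Build a query of the form 'locus_tag AND genus_name'."""
    return f"{locus_tag} AND {genus_of(organism)}"


def build_search_url(locus_tag: str, organism: str) -> str:
    """Build a search URL for the query without sending it."""
    params = {"query": build_query(locus_tag, organism), "format": "json"}
    return f"{EUROPE_PMC_SEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_query("PA0362", "Pseudomonas aeruginosa PAO1"))
    print(build_search_url("PA0362", "Pseudomonas aeruginosa PAO1"))
```

Requiring the genus name alongside the locus tag narrows the results to papers that are likely to discuss that organism, which is the motivation given in the text.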

For every article that mentions a locus tag, a RefSeq protein identifier, or a UniProt accession, we try to select one or two snippets of text that refer to the protein. If we cannot get access to the full text, we try to select a snippet from the abstract, but unfortunately, unique identifiers such as locus tags are rarely provided in abstracts.
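Snippet selection might look something like the sketch below, which returns up to two windows of text around whole-word mentions of an identifier. The window size, the word-boundary rule, and the function name `find_snippets` are assumptions for illustration; the actual PaperBLAST heuristics are not described here.

```python
import re


def find_snippets(text: str, identifier: str,
                  max_snippets: int = 2, context_chars: int = 80) -> list[str]:
    """Return up to max_snippets text windows around mentions of identifier.

    Mentions must not be embedded in a longer identifier, so searching
    for PA0362 does not match PA03621.
    """
    pattern = re.compile(
        r"(?<![A-Za-z0-9_])" + re.escape(identifier) + r"(?![A-Za-z0-9_])"
    )
    snippets = []
    last_end = -1
    for match in pattern.finditer(text):
        start = max(0, match.start() - context_chars)
        end = min(len(text), match.end() + context_chars)
        if start < last_end:
            continue  # this mention overlaps the previous snippet
        snippets.append(text[start:end].strip())
        last_end = end
        if len(snippets) == max_snippets:
            break
    return snippets


if __name__ == "__main__":
    abstract = "We show that PA0362 encodes a ferredoxin."
    print(find_snippets(abstract, "PA0362", context_chars=20))
```

The same function can be applied to an abstract when the full text is unavailable, although it will return an empty list whenever the abstract omits the identifier.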

PaperBLAST also incorporates manually-curated protein functions. Except for GeneRIF and ENA, the curated entries include a short curated description of the protein's function. For entries from BioLiP, the protein's function may not be known beyond binding to the ligand. Many of these entries also link to articles in PubMed.

For more information see the PaperBLAST paper (mSystems 2017) or the code. You can download PaperBLAST's database here.

PaperBLAST has changed since the paper was written. Many of these changes are described in Interactive tools for functional annotation of bacterial genomes.

Secrets

PaperBLAST cannot provide snippets for many of the papers that are published in non-open-access journals. This limitation applies even if the paper is marked as "free" on the publisher's web site and is available in PubMed Central or EuropePMC. If a journal that you publish in is marked as "secret," please consider publishing elsewhere.

Omissions from the PaperBLAST Database

Many important articles are missing from PaperBLAST, either because the article's full text is not in EuropePMC (as for many older articles), or because the paper does not mention a protein identifier such as a locus tag, or because of PaperBLAST's heuristics. If you notice an article that characterizes a protein's function but is missing from PaperBLAST, please notify the curators at UniProt or add an entry to GeneRIF. Entries in either of these databases will eventually be incorporated into PaperBLAST. Note that to add an entry to UniProt, you will need to find the UniProt identifier for the protein. If the protein is not already in UniProt, you can ask them to create an entry. To add an entry to GeneRIF, you will need an NCBI Gene identifier, but unfortunately many prokaryotic proteins in RefSeq do not have corresponding Gene identifiers.

References

PaperBLAST: Text-mining papers for information about homologs.
M. N. Price and A. P. Arkin (2017). mSystems, 10.1128/mSystems.00039-17.

Europe PMC in 2017.
M. Levchenko et al (2017). Nucleic Acids Research, 10.1093/nar/gkx1005.

Gene indexing: characterization and analysis of NLM's GeneRIFs.
J. A. Mitchell et al (2003). AMIA Annu Symp Proc 2003:460-464.

UniProt: the universal protein knowledgebase.
The UniProt Consortium (2016). Nucleic Acids Research, 10.1093/nar/gkw1099.

BRENDA in 2017: new perspectives and new tools in BRENDA.
S. Placzek et al (2017). Nucleic Acids Research, 10.1093/nar/gkw952.

The EcoCyc database: reflecting new knowledge about Escherichia coli K-12.
I. M. Keseler et al (2016). Nucleic Acids Research, 10.1093/nar/gkw1003.

The MetaCyc database of metabolic pathways and enzymes.
R. Caspi et al (2018). Nucleic Acids Research, 10.1093/nar/gkx935.

CharProtDB: a database of experimentally characterized protein annotations.
R. Madupu et al (2012). Nucleic Acids Research, 10.1093/nar/gkr1133.

The carbohydrate-active enzymes database (CAZy) in 2013.
V. Lombard et al (2014). Nucleic Acids Research, 10.1093/nar/gkt1178.

The Transporter Classification Database (TCDB): recent advances.
M. H. Saier, Jr. et al (2016). Nucleic Acids Research, 10.1093/nar/gkv1103.

REBASE - a database for DNA restriction and modification: enzymes, genes and genomes.
R. J. Roberts et al (2015). Nucleic Acids Research, 10.1093/nar/gku1046.

Deep annotation of protein function across diverse bacteria from mutant phenotypes.
M. N. Price et al (2016). bioRxiv, 10.1101/072470.

by Morgan Price, Arkin group
Lawrence Berkeley National Laboratory