PaperBLAST – Find papers about a protein or its homologs


PaperBLAST

PaperBLAST Hits for CA265_RS20885 (81 a.a., MNLTEQVEQA...)

Other sequence analysis tools:

Find functional residues: SitesBLAST

Search for conserved domains

Find the best match in UniProt

Compare to protein structures

Predict transmembrane helices: Phobius

Predict protein localization: PSORTb

Find homologs in fast.genomics


Found 98 similar proteins in the literature:

Npun_R0364 Rieske (2Fe-2S) domain-containing protein from Nostoc punctiforme
50% identity, 26% coverage

HWX41_RS22890 NifU family protein from Bacillus paramycoides
51% identity, 88% coverage

2z51A / Q93W20 Crystal structure of arabidopsis cnfu involved in iron-sulfur cluster biosynthesis (see paper)
52% identity, 46% coverage

NIFU2_ARATH / Q93W20 NifU-like protein 2, chloroplastic; AtCNfu2; AtCnfU-V from Arabidopsis thaliana (Mouse-ear cress) (see 4 papers)
NP_568715 NIFU-like protein 2 from Arabidopsis thaliana
AT5G49940 NFU2 (NIFU-LIKE PROTEIN 2); structural molecule from Arabidopsis thaliana
52% identity, 30% coverage

Q816B6 NifU protein from Bacillus cereus (strain ATCC 14579 / DSM 31 / CCUG 7414 / JCM 2152 / NBRC 15305 / NCIMB 9373 / NCTC 2599 / NRRL B-3711)
B7HUU3 NifU domain protein from Bacillus cereus (strain AH187)
BC4952 NifU protein from Bacillus cereus ATCC 14579
49% identity, 88% coverage

Tery_3360 nitrogen-fixing NifU-like from Trichodesmium erythraeum IMS101
45% identity, 42% coverage

SYNPCC7002_A1413 NifU like protein from Synechococcus sp. PCC 7002
52% identity, 85% coverage

BSU32220 putative iron-sulfur scaffold protein from Bacillus subtilis subsp. subtilis str. 168
51% identity, 55% coverage

alr0692 similar to NifU protein from Nostoc sp. PCC 7120
43% identity, 47% coverage

SACOL0939 NifU domain protein from Staphylococcus aureus subsp. aureus COL
44% identity, 86% coverage

USA300HOU_0897 possible NifU family protein from Staphylococcus aureus subsp. aureus USA300_TCH1516
SA0797 hypothetical protein from Staphylococcus aureus subsp. aureus N315
SAV0936 nitrogen fixation protein NifU from Staphylococcus aureus subsp. aureus Mu50
SAUSA300_0839 hypothetical protein from Staphylococcus aureus subsp. aureus USA300_FPR3757
SAR0898 conserved hypothetical protein from Staphylococcus aureus subsp. aureus MRSA252
44% identity, 88% coverage

NIFU3_ARATH / Q84RQ7 NifU-like protein 3, chloroplastic; AtCNfu3; AtCnfU-IVa from Arabidopsis thaliana (Mouse-ear cress) (see paper)
AT4G25910 NFU3; structural molecule from Arabidopsis thaliana
NP_567735 NFU domain protein 3 from Arabidopsis thaliana
52% identity, 31% coverage

LMOf2365_2371 NifU family protein from Listeria monocytogenes str. 4b F2365
46% identity, 86% coverage

lmo2397 similar to NifU protein from Listeria monocytogenes EGD-e
46% identity, 86% coverage

WP_060381373 NifU family protein from Flavobacterium covae
45% identity, 91% coverage

SERP0522 NifU domain protein from Staphylococcus epidermidis RP62A
SE0630 nitrogen fixation protein NifU from Staphylococcus epidermidis ATCC 12228
42% identity, 88% coverage

ssl2667 NifU family protein from Synechocystis sp. PCC 6803
49% identity, 89% coverage

DN052_04095 NifU family protein from Acidithiobacillus ferrooxidans
42% identity, 24% coverage

Ava_4600 Nitrogen-fixing NifU-like from Anabaena variabilis ATCC 29413
42% identity, 47% coverage

XP_001470367 conserved hypothetical protein from Leishmania infantum JPCM5
52% identity, 20% coverage

I0YUZ0 HIRA-interacting protein 5 from Coccomyxa subellipsoidea (strain C-169)
43% identity, 35% coverage

E1ZCG7 Scaffold protein Nfu/NifU N-terminal domain-containing protein from Chlorella variabilis
44% identity, 32% coverage

MLD56_21125 NifU family protein from Paenibacillus peoriae
40% identity, 90% coverage

CD630_08500 NifU family protein from Clostridioides difficile 630
41% identity, 90% coverage

GPNADHDJ_02081 NfuA family Fe-S biogenesis protein from Stenotrophomonas maltophilia
39% identity, 36% coverage

Sb01g037130 No description from Sorghum bicolor
48% identity, 26% coverage

MSMEG_2718 iron-sulfur cluster-binding protein, Rieske family protein, putative from Mycobacterium smegmatis str. MC2 155
37% identity, 25% coverage

AMET1_0462 NifU family protein from Methanonatronarchaeum thermophilum
44% identity, 79% coverage

DET1632 NifU-like protein from Dehalococcoides ethenogenes 195
47% identity, 89% coverage

NP_501917 NFU1 iron-sulfur cluster scaffold homolog, mitochondrial from Caenorhabditis elegans
40% identity, 33% coverage

XP_007510766 nitrogen-fixing NifU domain protein from Bathycoccus prasinos
48% identity, 20% coverage

A0A2K3D340 Scaffold protein Nfu/NifU N-terminal domain-containing protein from Chlamydomonas reinhardtii
41% identity, 27% coverage

A4RUX0 Scaffold protein Nfu/NifU N-terminal domain-containing protein from Ostreococcus lucimarinus (strain CCE9901)
43% identity, 33% coverage

SPBC1709.19c NifU-like protein from Schizosaccharomyces pombe
40% identity, 29% coverage

PXO_01524 protein GntY from Xanthomonas oryzae pv. oryzae PXO99A
38% identity, 36% coverage

Francci3_4477 HesB/YadR/YfhF from Frankia sp. CcI3
40% identity, 36% coverage

ZP_01463912 NifU domain protein from Stigmatella aurantiaca DW4/3-1
40% identity, 33% coverage

Q9ZCQ2 Scaffold protein Nfu/NifU N-terminal domain-containing protein from Rickettsia prowazekii (strain Madrid E)
RP667 unknown from Rickettsia prowazekii str. Madrid E
42% identity, 38% coverage

A0A2K3D318 Scaffold protein Nfu/NifU N-terminal domain-containing protein from Chlamydomonas reinhardtii
42% identity, 22% coverage

Q8Z223 Fe/S biogenesis protein NfuA from Salmonella typhi
42% identity, 37% coverage

STM3511 putative Thioredoxin-like proteins and domain from Salmonella typhimurium LT2
Q8ZLI7 Fe/S biogenesis protein NfuA from Salmonella typhimurium (strain LT2 / SGSC1412 / ATCC 700720)
42% identity, 37% coverage

Q01C69 NIF system FeS cluster assembly, NifU-like scaffold, N-terminal from Ostreococcus tauri
43% identity, 27% coverage

K8F1V7 Scaffold protein Nfu/NifU N-terminal domain-containing protein from Bathycoccus prasinos
41% identity, 22% coverage

D3ZA85 NFU1 iron-sulfur cluster scaffold homolog, mitochondrial from Rattus norvegicus
42% identity, 26% coverage

MMAR_1868 hypothetical protein from Mycobacterium marinum M
33% identity, 23% coverage

XP_063141883 NFU1 iron-sulfur cluster scaffold homolog, mitochondrial isoform X1 from Rattus norvegicus
41% identity, 27% coverage

NIFU1_ARATH / Q93W77 NifU-like protein 1, chloroplastic; AtCNfu1; AtCnfU-IVb from Arabidopsis thaliana (Mouse-ear cress) (see 2 papers)
AT4G01940 NFU1; structural molecule from Arabidopsis thaliana
NP_567219 NFU domain protein 1 from Arabidopsis thaliana
48% identity, 27% coverage

Q9QZ23 NFU1 iron-sulfur cluster scaffold homolog, mitochondrial from Mus musculus
41% identity, 27% coverage

Q9C8J2 NifU-like protein 5, mitochondrial from Arabidopsis thaliana
AT1G51390 NFU5; ATP binding / structural molecule from Arabidopsis thaliana
43% identity, 24% coverage

TGME49_212930 NifU family domain-containing protein from Toxoplasma gondii ME49
34% identity, 22% coverage

NFU1_HUMAN / Q9UMS0 NFU1 iron-sulfur cluster scaffold homolog, mitochondrial; HIRA-interacting protein 5 from Homo sapiens (Human) (see 12 papers)
NP_001002755 NFU1 iron-sulfur cluster scaffold homolog, mitochondrial isoform 2 from Homo sapiens
40% identity, 26% coverage

NIFU4_ARATH / Q9LIG6 NifU-like protein 4, mitochondrial; AtNfu-III; AtNfu4 from Arabidopsis thaliana (Mouse-ear cress) (see paper)
AT3G20970 NFU4; structural molecule from Arabidopsis thaliana
43% identity, 23% coverage

FRAAL6802 Hypothetical protein in nifB-nifU intergenic region (ORF2) from Frankia alni ACN14a
38% identity, 38% coverage

XP_845796 HIRA-interacting protein 5, putative from Trypanosoma brucei brucei TREU927
46% identity, 21% coverage

CG32857 uncharacterized protein from Drosophila melanogaster
CG32500 uncharacterized protein from Drosophila melanogaster
CG33502 uncharacterized protein from Drosophila melanogaster
Q8SY96 NFU1 iron-sulfur cluster scaffold homolog, mitochondrial from Drosophila melanogaster
39% identity, 24% coverage

C9J8Q1 NFU1 iron-sulfur cluster scaffold (Fragment) from Homo sapiens
41% identity, 66% coverage

YPO0127 conserved hypothetical protein from Yersinia pestis CO92
YPTB3773 hypothetical protein from Yersinia pseudotuberculosis IP 32953
39% identity, 37% coverage

YhgI / b3414 iron-sulfur cluster carrier protein NfuA from Escherichia coli K-12 substr. MG1655 (see 15 papers)
nfuA / P63020 iron-sulfur cluster carrier protein NfuA from Escherichia coli (strain K12) (see 13 papers)
NFUA_ECOLI / P63020 Fe/S biogenesis protein NfuA from Escherichia coli (strain K12) (see 4 papers)
nfuA / GB|AAC76439.1 Fe/S-biogenesis protein NfuA from Escherichia coli K12 (see 7 papers)
b3414 predicted gluconate transport associated protein from Escherichia coli str. K-12 substr. MG1655
NP_417873 iron-sulfur cluster carrier protein NfuA from Escherichia coli str. K-12 substr. MG1655
Z4769 orf, hypothetical protein from Escherichia coli O157:H7 EDL933
ECs4256 hypothetical protein from Escherichia coli O157:H7 str. Sakai
B21_RS17040 Fe-S biogenesis protein NfuA from Escherichia coli BL21(DE3)
39% identity, 37% coverage

RSP_2214 Nitrogen-fixing NifU from Rhodobacter sphaeroides 2.4.1
33% identity, 40% coverage

CC0062, CC_0062 NifU-like domain protein from Caulobacter crescentus CB15
38% identity, 30% coverage

CCNA_00060 mitochondrial-type Fe-S cluster assembly protein NFU from Caulobacter crescentus NA1000
38% identity, 35% coverage

RLV_2767 NifU family protein from Rhizobium leguminosarum bv. viciae
35% identity, 36% coverage

C1DLW0 Fe/S biogenesis protein NfuA from Azotobacter vinelandii (strain DJ / ATCC BAA-1303)
37% identity, 36% coverage

NFU1_YEAST / P32860 NifU-like protein, mitochondrial from Saccharomyces cerevisiae (strain ATCC 204508 / S288c) (Baker's yeast) (see 3 papers)
NP_012884 Nfu1p from Saccharomyces cerevisiae S288C
NP_012884, YKL040C Nfu1p from Saccharomyces cerevisiae
37% identity, 31% coverage

RL0400 putative nifU iron-sulphur cluster scaffold protein from Rhizobium leguminosarum bv. viciae 3841
35% identity, 36% coverage

C1EHF7 Scaffold protein Nfu/NifU N-terminal domain-containing protein from Micromonas commoda (strain RCC299 / NOUM17 / CCMP2709)
38% identity, 23% coverage

AFUA_1G04680 NifU-related protein from Aspergillus fumigatus Af293
38% identity, 24% coverage

A0QUN2 Rieske domain-containing protein from Mycolicibacterium smegmatis (strain ATCC 700084 / mc(2)155)
MSMEG_2268 hypothetical protein from Mycobacterium smegmatis str. MC2 155
38% identity, 24% coverage

Atu0351 hypothetical protein from Agrobacterium tumefaciens str. C58 (Cereon)
36% identity, 37% coverage

PADG_03852 uncharacterized protein from Paracoccidioides brasiliensis Pb18
38% identity, 24% coverage

BAB1_0139 Nitrogen-fixing NifU, C-terminal from Brucella melitensis biovar Abortus 2308
35% identity, 37% coverage

NE1445 Nitrogen-fixing protein NifU from Nitrosomonas europaea ATCC 19718
37% identity, 39% coverage

PST_1350 Fe-S cluster assembly protein NifU from Pseudomonas stutzeri A1501
37% identity, 22% coverage

PGUG_02696 uncharacterized protein from Meyerozyma guilliermondii ATCC 6260
39% identity, 31% coverage

A4YC18 Fe/S biogenesis protein NfuA from Shewanella putrefaciens (strain CN-32 / ATCC BAA-453)
36% identity, 34% coverage

Q9I2P8 Fe/S biogenesis protein NfuA from Pseudomonas aeruginosa (strain ATCC 15692 / DSM 22644 / CIP 104116 / JCM 14847 / LMG 12228 / 1C / PRS 101 / PAO1)
NP_250538 Fe/S biogenesis protein NfuA from Pseudomonas aeruginosa PAO1
PA1847 hypothetical protein from Pseudomonas aeruginosa PAO1
32% identity, 38% coverage

XP_638146 NIF system FeS cluster assembly domain-containing protein from Dictyostelium discoideum AX4
37% identity, 22% coverage

VF_2461 putative DNA uptake protein from Vibrio fischeri ES114
40% identity, 31% coverage

SG2325 hypothetical protein from Sodalis glossinidius str. 'morsitans'
37% identity, 37% coverage

NHN26_05060 iron-sulfur cluster assembly scaffold protein from Rhodovulum tesquicola
37% identity, 29% coverage

rrnAC3083 unknown from Haloarcula marismortui ATCC 43049
33% identity, 57% coverage

nifU / AAC33371.1 NifU from Rippkaea orientalis PCC 8801 (see paper)
41% identity, 22% coverage

CRC_02888 Fe-S cluster assembly protein NifU from Cylindrospermopsis raciborskii CS-505
42% identity, 24% coverage

bll0800 bll0800 from Bradyrhizobium japonicum USDA 110
32% identity, 36% coverage

NIFU_AZOVI / P05340 Nitrogen fixation protein NifU from Azotobacter vinelandii (see 2 papers)
C1DH18 Nitrogen fixation protein NifU from Azotobacter vinelandii (strain DJ / ATCC BAA-1303)
Avin_01620 Nitrogen fixation Fe-S cluster scaffold protein from Azotobacter vinelandii AvOP
33% identity, 24% coverage

BJ6T_08050 NifU family protein from Bradyrhizobium japonicum USDA 6
32% identity, 36% coverage

AM1146 hypothetical protein from Anaplasma marginale str. St. Maries
31% identity, 38% coverage

A5F4R9 Fe/S biogenesis protein NfuA from Vibrio cholerae serotype O1 (strain ATCC 39541 / Classical Ogawa 395 / O395)
37% identity, 32% coverage

VC2720 conserved hypothetical protein from Vibrio cholerae O1 biovar eltor str. N16961
37% identity, 32% coverage

Q92GV4 Scaffold protein Nfu/NifU N-terminal domain-containing protein from Rickettsia conorii (strain ATCC VR-613 / Malish 7)
41% identity, 30% coverage

Achr_1480 Fe-S cluster assembly protein NifU from Azotobacter chroococcum NCIMB 8003
32% identity, 24% coverage

pE3SP1_p070 NifU family protein from Polaromonas sp. E3S
42% identity, 34% coverage

RAYM_01100 NifU family protein from Riemerella anatipestifer RA-YM
39% identity, 22% coverage

JHW33_RS22810 Fe-S cluster assembly protein NifU from Rahnella aceris
32% identity, 25% coverage

TA19885 Nifu-like protein, putative from Theileria annulata
33% identity, 42% coverage

all1456 nitrogen fixation protein from Nostoc sp. PCC 7120
42% identity, 18% coverage

Ava_3915 Fe-S cluster assembly protein NifU from Anabaena variabilis ATCC 29413
42% identity, 18% coverage

ECH_0202 NifU domain protein from Ehrlichia chaffeensis str. Arkansas
34% identity, 33% coverage


For advice on how to use these tools together, see Interactive tools for functional annotation of bacterial genomes.

Statistics

The PaperBLAST database links 793,807 different protein sequences to 1,259,118 scientific articles. Searches against EuropePMC were last performed on March 13, 2025.

How It Works

PaperBLAST builds a database of protein sequences that are linked to scientific articles. These links come from automated text searches against the articles in EuropePMC and from manually-curated information from GeneRIF, UniProtKB/Swiss-Prot, BRENDA, CAZy (as made available by dbCAN), BioLiP, CharProtDB, MetaCyc, EcoCyc, TCDB, REBASE, the Fitness Browser, and a subset of the European Nucleotide Archive with the /experiment tag. Given this database and a protein sequence query, PaperBLAST uses protein-protein BLAST to find similar sequences with E < 0.001.
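As a rough illustration of the E < 0.001 cutoff, the sketch below filters BLAST+ tabular output (-outfmt 6 with its default 12 columns) and keeps only hits below the threshold. It is not PaperBLAST's own code. The column layout is the standard BLAST+ default, and the example rows and subject names (hitA, hitB) are made up.

```python
# Sketch only: filter BLAST+ tabular hits (-outfmt 6, default columns) by E-value.
# This is not PaperBLAST's implementation; the example rows are made up.
from typing import Iterable, List, Tuple

E_VALUE_CUTOFF = 1e-3  # the page states that hits must have E < 0.001


def filter_hits(lines: Iterable[str], cutoff: float = E_VALUE_CUTOFF) -> List[Tuple[str, float, float]]:
    """Return (subject id, percent identity, E-value) for hits with E-value below cutoff."""
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        # Default -outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
        # qstart qend sstart send evalue bitscore
        subject, pident, evalue = fields[1], float(fields[2]), float(fields[10])
        if evalue < cutoff:
            kept.append((subject, pident, evalue))
    return kept


if __name__ == "__main__":
    rows = [
        "query\thitA\t52.0\t40\t19\t0\t5\t44\t3\t42\t2e-08\t50.1",
        "query\thitB\t31.0\t29\t20\t0\t10\t38\t7\t35\t0.5\t22.3",
    ]
    print(filter_hits(rows))  # only hitA passes the cutoff
```

The strict "<" comparison mirrors the wording "E < 0.001"; a hit with exactly E = 0.001 would be dropped.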

To build the database, we query EuropePMC with locus tags, with RefSeq protein identifiers, and with UniProt accessions. We obtain the locus tags from RefSeq or from MicrobesOnline. We use queries of the form "locus_tag AND genus_name" to try to ensure that the paper is actually discussing that gene. Because EuropePMC indexes most recent biomedical papers, even if they are not open access, some of the links may be to papers that you cannot read or that our computers cannot read. We query every such identifier that appears in the open access part of EuropePMC, as well as every locus tag in the 500 most-referenced genomes; because of the latter, a gene may appear in the PaperBLAST results even if none of the papers that mention it are open access. We also incorporate text-mined links from EuropePMC that link open access articles to UniProt or RefSeq identifiers. (This yields some additional links because EuropePMC uses different text-mining heuristics than we do.)
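A minimal sketch of how a query of the form "locus_tag AND genus_name" could be sent to EuropePMC, assuming the public Europe PMC REST search endpoint with its query and format parameters. The helper names are hypothetical, and the sketch only builds the URL without fetching it. The example uses SACOL0939 from Staphylococcus aureus, one of the hits listed above.

```python
# Sketch only: build (but do not send) a Europe PMC search URL for one gene.
# The endpoint is Europe PMC's public REST search service; the helper names are hypothetical.
from urllib.parse import urlencode

EUROPEPMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_query(locus_tag: str, genus: str) -> str:
    """Pair the locus tag with the genus name, following the "locus_tag AND genus_name" form."""
    return f"{locus_tag} AND {genus}"


def build_search_url(locus_tag: str, genus: str) -> str:
    """Return a GET URL that asks Europe PMC for JSON results."""
    params = {"query": build_query(locus_tag, genus), "format": "json"}
    return EUROPEPMC_SEARCH + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_search_url("SACOL0939", "Staphylococcus"))
```

Adding the genus narrows the search. A short locus tag can collide with unrelated strings in other papers, and requiring the genus name makes it more likely that a hit actually discusses that gene.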

For every article that mentions a locus tag, a RefSeq protein identifier, or a UniProt accession, we try to select one or two snippets of text that refer to the protein. If we cannot get access to the full text, we try to select a snippet from the abstract, but unfortunately, unique identifiers such as locus tags are rarely provided in abstracts.
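To make the snippet step concrete, here is a hypothetical sketch that picks up to two sentences mentioning an identifier and falls back to the abstract when no full text is available. PaperBLAST's real selection heuristics are not described on this page. The naive sentence splitter and the placeholder identifier ABC_0001 are assumptions for illustration only.

```python
# Hypothetical sketch: pick up to two sentences that mention an identifier,
# falling back to the abstract when no full text is available.
import re
from typing import List, Optional


def select_snippets(text: str, identifier: str, max_snippets: int = 2) -> List[str]:
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    # Require the identifier to be a whole token, so ABC_0001 does not match ABC_00012.
    pattern = re.compile(r"(?<!\w)" + re.escape(identifier) + r"(?!\w)")
    matches = [s.strip() for s in sentences if pattern.search(s)]
    return matches[:max_snippets]


def snippets_for_article(identifier: str, full_text: Optional[str], abstract: str) -> List[str]:
    # Use the full text when we have it; otherwise try the abstract.
    source = full_text if full_text else abstract
    return select_snippets(source, identifier)


if __name__ == "__main__":
    print(snippets_for_article("ABC_0001", None, "We studied ABC_0001 in detail."))
```

The fallback usually returns an empty list, which matches the observation that locus tags are rarely given in abstracts.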

PaperBLAST also incorporates manually-curated protein functions from the resources listed above.

Except for GeneRIF and ENA, the curated entries include a short curated description of the protein's function. For entries from BioLiP, the protein's function may not be known beyond binding to the ligand. Many of these entries also link to articles in PubMed.

For more information, see the PaperBLAST paper (mSystems 2017) or the code. You can also download PaperBLAST's database.

Many of the changes to PaperBLAST since the paper was written are described in Interactive tools for functional annotation of bacterial genomes.

Secrets

PaperBLAST cannot provide snippets for many of the papers that are published in non-open-access journals. This limitation applies even if the paper is marked as "free" on the publisher's web site and is available in PubMed Central or EuropePMC. If a journal that you publish in is marked as "secret," please consider publishing elsewhere.

Omissions from the PaperBLAST Database

Many important articles are missing from PaperBLAST, either because the article's full text is not in EuropePMC (as for many older articles), because the paper does not mention a protein identifier such as a locus tag, or because of PaperBLAST's heuristics. If you notice an article that characterizes a protein's function but is missing from PaperBLAST, please notify the curators at UniProt or add an entry to GeneRIF. Entries in either of these databases will eventually be incorporated into PaperBLAST. Note that to add an entry to UniProt, you will need to find the UniProt identifier for the protein. If the protein is not already in UniProt, you can ask them to create an entry. To add an entry to GeneRIF, you will need an NCBI Gene identifier, but unfortunately many prokaryotic proteins in RefSeq do not have corresponding Gene identifiers.
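Before contacting UniProt, it can help to check that an identifier at least has the shape of a UniProt accession rather than a RefSeq identifier or locus tag. The sketch below assumes the accession pattern published in UniProt's documentation. A match says nothing about whether the entry actually exists, and the function name is hypothetical.

```python
# Sketch: check whether a string has the shape of a UniProt accession.
# The pattern is the accession format documented by UniProt; a match does not
# prove that the entry exists.
import re

UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def looks_like_uniprot_accession(candidate: str) -> bool:
    return UNIPROT_ACCESSION.fullmatch(candidate.strip()) is not None


if __name__ == "__main__":
    for s in ["P63020", "A0A2K3D340", "NP_567219", "SACOL0939"]:
        print(s, looks_like_uniprot_accession(s))
```

Here P63020 (NfuA from E. coli) and A0A2K3D340 (from Chlamydomonas reinhardtii) pass. NP_567219, a RefSeq protein identifier, fails, as does the locus tag SACOL0939.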

References

PaperBLAST: Text-mining papers for information about homologs.
M. N. Price and A. P. Arkin (2017). mSystems, 10.1128/mSystems.00039-17.

Europe PMC in 2017.
M. Levchenko et al. (2017). Nucleic Acids Research, 10.1093/nar/gkx1005.

Gene indexing: characterization and analysis of NLM's GeneRIFs.
J. A. Mitchell et al. (2003). AMIA Annu Symp Proc 2003:460-464.

UniProt: the universal protein knowledgebase.
The UniProt Consortium (2016). Nucleic Acids Research, 10.1093/nar/gkw1099.

BRENDA in 2017: new perspectives and new tools in BRENDA.
S. Placzek et al. (2017). Nucleic Acids Research, 10.1093/nar/gkw952.

The EcoCyc database: reflecting new knowledge about Escherichia coli K-12.
I. M. Keseler et al. (2016). Nucleic Acids Research, 10.1093/nar/gkw1003.

The MetaCyc database of metabolic pathways and enzymes.
R. Caspi et al. (2018). Nucleic Acids Research, 10.1093/nar/gkx935.

CharProtDB: a database of experimentally characterized protein annotations.
R. Madupu et al. (2012). Nucleic Acids Research, 10.1093/nar/gkr1133.

The carbohydrate-active enzymes database (CAZy) in 2013.
V. Lombard et al. (2014). Nucleic Acids Research, 10.1093/nar/gkt1178.

The Transporter Classification Database (TCDB): recent advances.
M. H. Saier, Jr. et al. (2016). Nucleic Acids Research, 10.1093/nar/gkv1103.

REBASE - a database for DNA restriction and modification: enzymes, genes and genomes.
R. J. Roberts et al. (2015). Nucleic Acids Research, 10.1093/nar/gku1046.

Deep annotation of protein function across diverse bacteria from mutant phenotypes.
M. N. Price et al. (2016). bioRxiv, 10.1101/072470.

by Morgan Price, Arkin group
Lawrence Berkeley National Laboratory