PaperBLAST – Find papers about a protein or its homologs

 


PaperBLAST Hits for ABID97_RS24730 (78 a.a., MSKKHIEDCV...)

Other sequence analysis tools:

Find functional residues: SitesBLAST

Search for conserved domains

Find the best match in UniProt

Compare to protein structures

Predict transmembrane helices: Phobius

Predict protein localization: PSORTb

Find homologs in fast.genomics


Found 45 similar proteins in the literature:

BCAL3335 DNA-binding protein from Burkholderia cenocepacia J2315
64% identity, 99% coverage

azo2893 putative fis-like DNA-binding protein from Azoarcus sp. BH72
51% identity, 99% coverage

LHK_03207 Fis from Laribacter hongkongensis HLHK9
52% identity, 91% coverage

PMI3622 DNA-binding protein Fis from Proteus mirabilis HI4320
43% identity, 78% coverage

A0J47_RS00510 DNA-binding transcriptional regulator Fis from Photobacterium damselae subsp. damselae
42% identity, 78% coverage

A1S_2186 DNA-binding protein from Acinetobacter baumannii ATCC 17978
ABUW_1533 DNA-binding transcriptional regulator Fis from Acinetobacter baumannii
48% identity, 82% coverage

WP_012883002 DNA-binding transcriptional regulator Fis from Dickeya dianthicola
42% identity, 78% coverage

Fis / b3261 DNA-binding transcriptional dual regulator Fis from Escherichia coli K-12 substr. MG1655 (see 8 papers)
FIS_ECOLI / P0A6R3 DNA-binding protein Fis; Factor-for-inversion stimulation protein; Hin recombinational enhancer-binding protein from Escherichia coli (strain K12) (see 4 papers)
Fis / P0A6R3 Transcription factor Fis (activator/repressor) from Escherichia coli K12 MG1655 (see 25 papers)
Fis / FIS_SALTI Transcription factor Fis (repressor) from Salmonella enterica
5ds9B / P0A6R3 Crystal structure of fis bound to 27bp DNA f1-8a (aaattagtttgaattttgagctaattt) (see paper)
fis / GB|AAN44763.1 DNA-binding protein fis from Shigella sonnei Ss046 (see 13 papers)
STM3385 site-specific DNA inversion stimulation factor from Salmonella typhimurium LT2
B5R1C8 DNA-binding protein Fis from Salmonella enteritidis PT4 (strain P125109)
NP_417727 DNA-binding transcriptional dual regulator Fis from Escherichia coli str. K-12 substr. MG1655
NP_462295 site-specific DNA inversion stimulation factor from Salmonella enterica subsp. enterica serovar Typhimurium str. LT2
AP_003801 global DNA-binding transcriptional dual regulator from Escherichia coli W3110
b3261 DNA-binding protein Fis from Escherichia coli str. K-12 substr. MG1655
t3300 Fis DNA-binding protein from Salmonella enterica subsp. enterica serovar Typhi Ty2
KP1_4989 DNA-binding protein from Klebsiella pneumoniae NTUH-K2044
ECs4133 site-specific DNA inversion stimulation factor from Escherichia coli O157:H7 str. Sakai
BN49_RS03455, ECL_04646, KP1_RS23255, SENTW_3516, STM14_4083, T_RS16745 DNA-binding transcriptional regulator Fis from Klebsiella pneumoniae subsp. pneumoniae NTUH-K2044
KPHS_48020 DNA-binding protein Fis from Klebsiella pneumoniae subsp. pneumoniae HS11286
42% identity, 78% coverage

VC0290 factor-for-inversion stimulation protein from Vibrio cholerae O1 biovar eltor str. N16961
VP2885 factor-for-inversion stimulation protein from Vibrio parahaemolyticus RIMD 2210633
42% identity, 78% coverage

ECA0255 DNA-binding transcriptional regulator Fis from Pectobacterium atrosepticum SCRI1043
42% identity, 78% coverage

YPTB3577 DNA-binding protein Fis from Yersinia pseudotuberculosis IP 32953
YPK_0452 Fis family transcriptional regulator from Yersinia pseudotuberculosis YPIII
42% identity, 78% coverage

SO0393, SO_0393 DNA-binding protein Fis from Shewanella oneidensis MR-1
K3G22_17135 DNA-binding transcriptional regulator Fis from Shewanella putrefaciens
47% identity, 65% coverage

HD0449 DNA-binding protein from Haemophilus ducreyi 35000HP
42% identity, 74% coverage

APL_0190 DNA-binding protein Fis from Actinobacillus pleuropneumoniae L20
42% identity, 74% coverage

PA14_64190 DNA-binding protein Fis from Pseudomonas aeruginosa UCBPP-PA14
NP_253540 Fis family transcriptional regulator from Pseudomonas aeruginosa PAO1
PA4853 DNA-binding protein Fis from Pseudomonas aeruginosa PAO1
47% identity, 67% coverage

PP4821 DNA-binding protein Fis from Pseudomonas putida KT2440
46% identity, 68% coverage

HI0980 Hin recombinational enhancer binding protein (fis) from Haemophilus influenzae Rd KW20
39% identity, 77% coverage

PMCN03_0067, PmCQ2_004565, WP_005723276 DNA-binding transcriptional regulator Fis from Pasteurella multocida
41% identity, 77% coverage

XC_0520 DNA-binding protein from Xanthomonas campestris pv. campestris str. 8004
46% identity, 82% coverage

AKJ12_RS16695 DNA-binding transcriptional regulator Fis from Xanthomonas arboricola pv. juglandis
46% identity, 82% coverage

BU400 factor-for-inversion stimulation protein from Buchnera aphidicola str. APS (Acyrthosiphon pisum)
41% identity, 76% coverage

BUE60_07695 DNA-binding transcriptional regulator Fis from Pseudomonas syringae pv. actinidiae ICMP 19099
46% identity, 65% coverage

PmVP161_0076 DNA-binding transcriptional regulator Fis from Pasteurella multocida
41% identity, 77% coverage

ACP86_06670 DNA-binding transcriptional regulator Fis from Marinobacter sp. CP1
44% identity, 69% coverage

XF_RS13495 DNA-binding transcriptional regulator Fis from Xylella fastidiosa 9a5c
44% identity, 83% coverage

A1B9J9 DNA-binding transcriptional regulator NtrC from Paracoccus denitrificans (strain Pd 1222)
Pden_4129 two component, sigma54 specific, transcriptional regulator, Fis family from Paracoccus denitrificans PD1222
47% identity, 13% coverage

SPO2087 nitrogen regulation protein NtrC from Silicibacter pomeroyi DSS-3
SPO2087 sigma-54-dependent transcriptional regulator from Ruegeria pomeroyi DSS-3
47% identity, 13% coverage

lpp0606 hypothetical protein from Legionella pneumophila str. Paris
lpg0542 DNA binding protein Fis from Legionella pneumophila subsp. pneumophila str. Philadelphia 1
41% identity, 71% coverage

Atu1446 two component response regulator from Agrobacterium tumefaciens str. C58 (Cereon)
AGRO_4553, ATU_RS07125 nitrogen regulation protein NR(I) from Agrobacterium sp. ATCC 31749
45% identity, 14% coverage

Bd0039 site-specific DNA inversion stimulation factor from Bdellovibrio bacteriovorus HD100
40% identity, 72% coverage

RSP_2838 nitrogen metabolism transcriptional regulator, NtrC from Rhodobacter sphaeroides 2.4.1
43% identity, 15% coverage

ntrC / CAA86065.1 Nitrogen assimilation regulatory protein from Azospirillum brasilense (see paper)
P45671 DNA-binding transcriptional regulator NtrC from Azospirillum brasilense
48% identity, 13% coverage

AZOLI_1343 nitrogen regulation protein NR(I) from Azospirillum lipoferum 4B
43% identity, 14% coverage

A6A40_05215 nitrogen regulation protein NR(I) from Azospirillum humicireducens
43% identity, 14% coverage

AZC_3086 nitrogen assimilation regulatory protein ntrC from Azorhizobium caulinodans ORS 571
48% identity, 13% coverage

blr4488 two-component response regulator from Bradyrhizobium japonicum USDA 110
44% identity, 13% coverage

P10576 DNA-binding transcriptional regulator NtrC from Bradyrhizobium sp. (strain RP501 Parasponia)
44% identity, 13% coverage

BMEI0866 NITROGEN ASSIMILATION REGULATORY PROTEIN from Brucella melitensis 16M
43% identity, 14% coverage

CCNA_01815 nitrogen assimilation regulatory protein from Caulobacter crescentus NA1000
CC1741 nitrogen regulation protein NR(I) from Caulobacter crescentus CB15
44% identity, 13% coverage

RL2257 two component response regulator nitrogen regulation protein NR(I) from Rhizobium leguminosarum bv. viciae 3841
43% identity, 14% coverage

SAMCFNEI73_Ch1589 nitrogen regulation protein NR(I) from Sinorhizobium americanum
42% identity, 14% coverage

P10577 DNA-binding transcriptional regulator NtrC from Rhizobium meliloti (strain 1021)
SMc01043 NITROGEN ASSIMILATION REGULATORY PROTEIN from Sinorhizobium meliloti 1021
42% identity, 14% coverage

lpg1370 Hypothetical protein from Legionella pneumophila subsp. pneumophila str. Philadelphia 1
lpp1324 hypothetical protein from Legionella pneumophila str. Paris
35% identity, 70% coverage

lpp1707 hypothetical protein from Legionella pneumophila str. Paris
lpg1743 Fis transcriptional activator from Legionella pneumophila subsp. pneumophila str. Philadelphia 1
51% identity, 45% coverage

Adeh_1992 two component, sigma54 specific, transcriptional regulator, Fis family from Anaeromyxobacter dehalogenans 2CP-C
52% identity, 9% coverage


For advice on how to use these tools together, see Interactive tools for functional annotation of bacterial genomes.

Statistics

The PaperBLAST database links 793,807 different protein sequences to 1,259,118 scientific articles. Searches against EuropePMC were last performed on March 13 2025.

How It Works

PaperBLAST builds a database of protein sequences that are linked to scientific articles. These links come from automated text searches against the articles in EuropePMC and from manually-curated information from GeneRIF, UniProtKB/Swiss-Prot, BRENDA, CAZy (as made available by dbCAN), BioLiP, CharProtDB, MetaCyc, EcoCyc, TCDB, REBASE, the Fitness Browser, and a subset of the European Nucleotide Archive with the /experiment tag. Given this database and a protein sequence query, PaperBLAST uses protein-protein BLAST to find similar sequences with E < 0.001.
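The E-value cutoff described above can be illustrated with a minimal sketch. It assumes the BLAST hits are available in BLAST's default 12-column tabular layout (-outfmt 6); the names Hit, parse_tabular_hits and filter_hits, and the two sample rows, are hypothetical and are not taken from PaperBLAST's code.

```python
from typing import NamedTuple

E_VALUE_CUTOFF = 0.001  # threshold stated in the text: E < 0.001


class Hit(NamedTuple):
    subject_id: str
    percent_identity: float
    evalue: float


def parse_tabular_hits(lines):
    """Parse rows in BLAST's default tabular layout (-outfmt 6).

    Column order: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore
    """
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        hits.append(Hit(fields[1], float(fields[2]), float(fields[10])))
    return hits


def filter_hits(hits, cutoff=E_VALUE_CUTOFF):
    """Keep only hits whose E-value is strictly below the cutoff."""
    return [h for h in hits if h.evalue < cutoff]


# Made-up rows for illustration only (not real PaperBLAST output).
sample = [
    "query\tsubjA\t64.0\t77\t28\t0\t1\t77\t1\t77\t1e-30\t110",
    "query\tsubjB\t30.0\t40\t28\t0\t1\t40\t5\t44\t0.5\t25",
]
kept = filter_hits(parse_tabular_hits(sample))
print([h.subject_id for h in kept])
```

With these sample rows, only the first hit passes, because the second has an E-value of 0.5.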

To build the database, we query EuropePMC with locus tags, with RefSeq protein identifiers, and with UniProt accessions. We obtain the locus tags from RefSeq or from MicrobesOnline. We use queries of the form "locus_tag AND genus_name" to try to ensure that the paper is actually discussing that gene. Because EuropePMC indexes most recent biomedical papers, even if they are not open access, some of the links may be to papers that you cannot read or that our computers cannot read. We query each of these identifiers that appears in the open access part of EuropePMC, as well as every locus tag that appears in the 500 most-referenced genomes, so that a gene may appear in the PaperBLAST results even though none of the papers that mention it are open access. We also incorporate text-mined links from EuropePMC that link open access articles to UniProt or RefSeq identifiers. (This yields some additional links because EuropePMC uses different heuristics for its text mining than we do.)
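As a rough illustration of the "locus_tag AND genus_name" query form, the sketch below builds such a query string and a search URL for the Europe PMC REST search service. The helper names build_query and build_search_url are hypothetical, and how PaperBLAST actually submits its queries is an assumption here, not something the text specifies.

```python
from urllib.parse import urlencode

# Europe PMC REST search endpoint; using it this way is an assumption,
# not a description of PaperBLAST's own pipeline.
EUROPEPMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_query(locus_tag, genus_name):
    """Combine a locus tag with its genus, as in 'locus_tag AND genus_name'."""
    return f"{locus_tag} AND {genus_name}"


def build_search_url(locus_tag, genus_name):
    """Return a search URL that requests JSON results for the combined query."""
    params = {"query": build_query(locus_tag, genus_name), "format": "json"}
    return f"{EUROPEPMC_SEARCH}?{urlencode(params)}"


print(build_query("BCAL3335", "Burkholderia"))
print(build_search_url("BCAL3335", "Burkholderia"))
```

Requiring the genus name alongside the locus tag reduces false matches when the same string appears in an unrelated context.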

For every article that mentions a locus tag, a RefSeq protein identifier, or a UniProt accession, we try to select one or two snippets of text that refer to the protein. If we cannot get access to the full text, we try to select a snippet from the abstract, but unfortunately, unique identifiers such as locus tags are rarely provided in abstracts.
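One simple way to pick snippets is sketched below: split the text into sentences and keep the first one or two that mention the identifier as a whole token. This is only an illustrative heuristic under that assumption; select_snippets is a hypothetical name and PaperBLAST's real selection rules may differ.

```python
import re


def select_snippets(text, identifier, max_snippets=2):
    """Return up to max_snippets sentences that mention the identifier."""
    # Naive sentence split on whitespace that follows ., ! or ?
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # Match the identifier only as a whole token, so that BCAL3335
    # does not match inside BCAL33350.
    pattern = re.compile(
        r"(?<![A-Za-z0-9_])" + re.escape(identifier) + r"(?![A-Za-z0-9_])"
    )
    return [s for s in sentences if pattern.search(s)][:max_snippets]


# Made-up text for illustration only.
example_text = (
    "We studied Fis-like regulators. The gene BCAL3335 was deleted. "
    "BCAL33350 is unrelated. Loss of BCAL3335 reduced growth. "
    "BCAL3335 was then complemented."
)
for snippet in select_snippets(example_text, "BCAL3335"):
    print(snippet)
```

If only an abstract is available, the same function can be applied to the abstract text; it returns an empty list when the identifier is absent.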

PaperBLAST also incorporates manually-curated protein functions from the curated databases named above.

Except for GeneRIF and ENA, the curated entries include a short curated description of the protein's function. For entries from BioLiP, the protein's function may not be known beyond binding to the ligand. Many of these entries also link to articles in PubMed.

For more information see the PaperBLAST paper (mSystems 2017) or the code. You can download PaperBLAST's database here.

PaperBLAST has changed since the paper was written. Many of these changes are described in Interactive tools for functional annotation of bacterial genomes.

Secrets

PaperBLAST cannot provide snippets for many of the papers that are published in non-open-access journals. This limitation applies even if the paper is marked as "free" on the publisher's web site and is available in PubMed Central or EuropePMC. If a journal that you publish in is marked as "secret," please consider publishing elsewhere.

Omissions from the PaperBLAST Database

Many important articles are missing from PaperBLAST, either because the article's full text is not in EuropePMC (as for many older articles), or because the paper does not mention a protein identifier such as a locus tag, or because of PaperBLAST's heuristics. If you notice an article that characterizes a protein's function but is missing from PaperBLAST, please notify the curators at UniProt or add an entry to GeneRIF. Entries in either of these databases will eventually be incorporated into PaperBLAST. Note that to add an entry to UniProt, you will need to find the UniProt identifier for the protein. If the protein is not already in UniProt, you can ask them to create an entry. To add an entry to GeneRIF, you will need an NCBI Gene identifier, but unfortunately many prokaryotic proteins in RefSeq do not have corresponding Gene identifiers.

References

PaperBLAST: Text-mining papers for information about homologs.
M. N. Price and A. P. Arkin (2017). mSystems, 10.1128/mSystems.00039-17.

Europe PMC in 2017.
M. Levchenko et al (2017). Nucleic Acids Research, 10.1093/nar/gkx1005.

Gene indexing: characterization and analysis of NLM's GeneRIFs.
J. A. Mitchell et al (2003). AMIA Annu Symp Proc 2003:460-464.

UniProt: the universal protein knowledgebase.
The UniProt Consortium (2016). Nucleic Acids Research, 10.1093/nar/gkw1099.

BRENDA in 2017: new perspectives and new tools in BRENDA.
S. Placzek et al (2017). Nucleic Acids Research, 10.1093/nar/gkw952.

The EcoCyc database: reflecting new knowledge about Escherichia coli K-12.
I. M. Keseler et al (2016). Nucleic Acids Research, 10.1093/nar/gkw1003.

The MetaCyc database of metabolic pathways and enzymes.
R. Caspi et al (2018). Nucleic Acids Research, 10.1093/nar/gkx935.

CharProtDB: a database of experimentally characterized protein annotations.
R. Madupu et al (2012). Nucleic Acids Research, 10.1093/nar/gkr1133.

The carbohydrate-active enzymes database (CAZy) in 2013.
V. Lombard et al (2014). Nucleic Acids Research, 10.1093/nar/gkt1178.

The Transporter Classification Database (TCDB): recent advances.
M. H. Saier, Jr. et al (2016). Nucleic Acids Research, 10.1093/nar/gkv1103.

REBASE - a database for DNA restriction and modification: enzymes, genes and genomes.
R. J. Roberts et al (2015). Nucleic Acids Research, 10.1093/nar/gku1046.

Deep annotation of protein function across diverse bacteria from mutant phenotypes.
M. N. Price et al (2016). bioRxiv, 10.1101/072470.

by Morgan Price, Arkin group
Lawrence Berkeley National Laboratory