PaperBLAST – Find papers about a protein or its homologs

 

PaperBLAST

PaperBLAST Hits for 86 a.a. (AAGSKALGSA...)

Other sequence analysis tools:

Find functional residues: SitesBLAST

Search for conserved domains

Find the best match in UniProt

Compare to protein structures

Predict transmembrane helices: Phobius

Predict protein localization: PSORTb

Find homologs in fast.genomics


Found 103 similar proteins in the literature:

SRBP1_MOUSE / Q9WTN3 Sterol regulatory element-binding protein 1; SREBP-1; Sterol regulatory element-binding transcription factor 1 from Mus musculus (Mouse) (see 17 papers)
100% identity, 8% coverage

NP_001300908 sterol regulatory element-binding protein 1 isoform b from Mus musculus
100% identity, 8% coverage

SRBP1_RAT / P56720 Sterol regulatory element-binding protein 1; SREBP-1; Adipocyte determination- and differentiation-dependent factor 1; ADD1; Sterol regulatory element-binding transcription factor 1 from Rattus norvegicus (Rat) (see 2 papers)
NP_001263636 sterol regulatory element-binding protein 1 isoform 1 precursor from Rattus norvegicus
99% identity, 8% coverage

F7E4A8 Sterol regulatory element-binding protein 1 from Macaca mulatta
94% identity, 7% coverage

Q60416 Sterol regulatory element-binding protein 1 from Cricetulus griseus
97% identity, 8% coverage

SRBP1_HUMAN / P36956 Sterol regulatory element-binding protein 1; SREBP-1; Class D basic helix-loop-helix protein 1; bHLHd1; Sterol regulatory element-binding transcription factor 1 from Homo sapiens (Human) (see 12 papers)
NP_004167 sterol regulatory element-binding protein 1 isoform 2 from Homo sapiens
93% identity, 7% coverage

LOC108635517 LOW QUALITY PROTEIN: sterol regulatory element-binding protein 1-like from Capra hircus
94% identity, 10% coverage

NP_001272684 sterol regulatory element-binding protein 1 from Capra hircus
94% identity, 7% coverage

NP_001106773 sterol regulatory element-binding protein 1 from Bos taurus
93% identity, 7% coverage

NP_999322 sterol regulatory element-binding protein 1 from Sus scrofa
O97676 Sterol regulatory element-binding protein 1 from Sus scrofa
91% identity, 7% coverage

1am9C / P36956 Human srebp-1a bound to ldl receptor promoter (see paper)
95% identity, 88% coverage

NP_989457 sterol regulatory element-binding protein 1 from Gallus gallus
83% identity, 8% coverage

XP_015149594 sterol regulatory element-binding protein 1 isoform X2 from Gallus gallus
83% identity, 7% coverage

NP_001098599 sterol regulatory element-binding protein 1 from Danio rerio
81% identity, 7% coverage

SRBP2_DANRE / A3KNA7 Sterol regulatory element-binding protein 2; SREBP-2; Sterol regulatory element-binding transcription factor 2 from Danio rerio (Zebrafish) (Brachydanio rerio) (see paper)
66% identity, 7% coverage

SRBP2_CRIGR / Q60429 Sterol regulatory element-binding protein 2; SREBP-2; Sterol regulatory element-binding transcription factor 2 from Cricetulus griseus (Chinese hamster) (Cricetulus barabensis griseus) (see 2 papers)
70% identity, 6% coverage

XP_021502167 sterol regulatory element-binding protein 2 isoform X1 from Meriones unguiculatus
70% identity, 6% coverage

SRBP2_MOUSE / Q3U1N2 Sterol regulatory element-binding protein 2; SREBP-2; Sterol regulatory element-binding transcription factor 2 from Mus musculus (Mouse) (see 10 papers)
NP_150087 sterol regulatory element-binding protein 2 isoform 1 from Mus musculus
70% identity, 6% coverage

NP_001028866 sterol regulatory element-binding protein 2 from Rattus norvegicus
70% identity, 6% coverage

SRBP2_HUMAN / Q12772 Sterol regulatory element-binding protein 2; SREBP-2; Class D basic helix-loop-helix protein 2; bHLHd2; Sterol regulatory element-binding transcription factor 2 from Homo sapiens (Human) (see 12 papers)
NP_004590 sterol regulatory element-binding protein 2 from Homo sapiens
70% identity, 6% coverage

XP_015144523 sterol regulatory element-binding protein 2 from Gallus gallus
69% identity, 6% coverage

XP_974195 sterol regulatory element-binding protein 1 from Tribolium castaneum
71% identity, 7% coverage

NP_001262064 sterol regulatory element binding protein, isoform D from Drosophila melanogaster
75% identity, 5% coverage

Smp_000530 zinc finger transcription factor gli2 from Schistosoma mansoni
55% identity, 3% coverage

EGR_00781 Sterol regulatory element-binding protein from Echinococcus granulosus
54% identity, 8% coverage

PADG_03295, XP_010758341 uncharacterized protein from Paracoccidioides brasiliensis Pb18
34% identity, 8% coverage

SRBPH_CAEEL / Q9XX00 Sterol regulatory element binding protein sbp-1; SREBP sbp-1 from Caenorhabditis elegans (see 10 papers)
NP_499472 Processed sterol regulatory element binding protein sbp-1 from Caenorhabditis elegans
54% identity, 6% coverage

Q5AVL9 BHLH transcription factor (Eurofung) from Emericella nidulans (strain FGSC A4 / ATCC 38163 / CBS 112.46 / NRRL 194 / M139)
ANIA_07661 hypothetical protein from Aspergillus nidulans FGSC A4
39% identity, 8% coverage

PAAG_03792, XP_002794199 hypothetical protein from Paracoccidioides lutzii Pb01
33% identity, 8% coverage

SRE2_SCHPO / O43019 Putative transcription factor sre2; Sterol regulatory element-binding protein 2 from Schizosaccharomyces pombe (strain 972 / ATCC 24843) (Fission yeast) (see 2 papers)
NP_595229, SPBC354.05c membrane-tethered transcription factor (predicted) (PMID 11790253) from Schizosaccharomyces pombe
45% identity, 8% coverage

Pc20g05880 uncharacterized protein from Penicillium rubens
39% identity, 6% coverage

SRBA_ASPFU / Q4WIN1 Transcription regulator srbA precursor; Sterol regulatory element-binding protein A from Aspergillus fumigatus (strain ATCC MYA-4609 / CBS 101355 / FGSC A1100 / Af293) (Neosartorya fumigata) (see 16 papers)
AFUA_2G01260, Afu2g01260, XP_749262 HLH transcription factor, putative from Aspergillus fumigatus Af293
40% identity, 7% coverage

ATEG_08156 uncharacterized protein from Aspergillus terreus NIH2624
36% identity, 8% coverage

An03g05170 uncharacterized protein from Aspergillus niger
35% identity, 8% coverage

FGSG_02814 hypothetical protein from Fusarium graminearum PH-1
39% identity, 21% coverage

SREBP_SCHPO / Q9UUD1 Sterol regulatory element-binding protein 1 from Schizosaccharomyces pombe (strain 972 / ATCC 24843) (Fission yeast) (see 9 papers)
sre1 / RF|NP_595694.1 sterol regulatory element binding protein, transcription factor Sre1 from Schizosaccharomyces pombe (see 4 papers)
NP_595694 sterol regulatory element binding protein Sre1 from Schizosaccharomyces pombe
NP_595694 transcription factor Sre1 from Schizosaccharomyces pombe
40% identity, 7% coverage

NCU04731 HLH transcription factor from Neurospora crassa OR74A
38% identity, 6% coverage

BCIN_01g05780 hypothetical protein from Botrytis cinerea B05.10
36% identity, 8% coverage

MGG_11534 uncharacterized protein from Pyricularia oryzae 70-15
38% identity, 7% coverage

CPH2_CANAL / Q59RL7 Transcription factor CPH2; Candida pseudohyphal regulator 2 from Candida albicans (strain SC5314 / ATCC MYA-2876) (Yeast) (see 11 papers)
CPH2 Myc-bHLH family transcriptional activator of hyphal growth from Candida albicans (see 4 papers)
XP_712305 Cph2p from Candida albicans SC5314
35% identity, 8% coverage

VDAG_01557 sterol regulatory element binding protein Sre1 from Verticillium dahliae VdLs.17
38% identity, 7% coverage

7f2fB / P33122 The complex of DNA with the C-terminal domain of tye7 from saccharomyces cerevisiae.
42% identity, 73% coverage

CPAR2_603440 uncharacterized protein from Candida parapsilosis
35% identity, 9% coverage

XP_015148452 microphthalmia-associated transcription factor isoform X1 from Gallus gallus
37% identity, 16% coverage

XP_027323079 microphthalmia-associated transcription factor isoform X4 from Anas platyrhynchos
37% identity, 17% coverage

NP_570998 melanocyte inducing transcription factor a isoform 1 from Danio rerio
Q9PWC2 Melanocyte inducing transcription factor a isoform 1 from Danio rerio
39% identity, 17% coverage

XP_013011968 microphthalmia-associated transcription factor isoform X1 from Cavia porcellus
37% identity, 16% coverage

MITF_HUMAN / O75030 Microphthalmia-associated transcription factor; Class E basic helix-loop-helix protein 32; bHLHe32 from Homo sapiens (Human) (see 20 papers)
37% identity, 16% coverage

NP_001033090 microphthalmia-associated transcription factor from Sus scrofa
37% identity, 20% coverage

AO090011000215, XP_001825886 uncharacterized protein from Aspergillus oryzae RIB40
43% identity, 20% coverage

XP_006196543 microphthalmia-associated transcription factor isoform X1 from Vicugna pacos
37% identity, 16% coverage

HLH30_CAEEL / H2KZZ2 Helix-loop-helix protein 30 from Caenorhabditis elegans (see 9 papers)
41% identity, 13% coverage

NP_937802 microphthalmia-associated transcription factor isoform 1 from Homo sapiens
37% identity, 16% coverage

XP_005169362 transcription factor E3b isoform X1 from Danio rerio
41% identity, 13% coverage

NP_500461 Helix-loop-helix protein 30 from Caenorhabditis elegans
41% identity, 14% coverage

XP_005222683 microphthalmia-associated transcription factor isoform X3 from Bos taurus
XP_006055927 microphthalmia-associated transcription factor isoform X3 from Bubalus bubalis
37% identity, 16% coverage

NP_001269071 transcription factor E3 isoform 2 from Homo sapiens
42% identity, 13% coverage

XP_006527641 transcription factor E3 isoform X1 from Mus musculus
42% identity, 13% coverage

D3ZAW6 Transcription factor binding to IGHM enhancer 3 from Rattus norvegicus
38% identity, 15% coverage

NP_001093747 microphthalmia-associated transcription factor from Xenopus tropicalis
37% identity, 17% coverage

NP_766060 transcription factor E3 isoform a from Mus musculus
42% identity, 10% coverage

TFE3_HUMAN / P19532 Transcription factor E3; Class E basic helix-loop-helix protein 33; bHLHe33 from Homo sapiens (Human) (see 17 papers)
NP_006512 transcription factor E3 isoform 1 from Homo sapiens
42% identity, 10% coverage

TFE3_MOUSE / Q64092 Transcription factor E3; mTFE3 from Mus musculus (Mouse) (see 7 papers)
42% identity, 10% coverage

Q3UKG7 Transcription factor EB from Mus musculus
NP_035679 transcription factor EB isoform a from Mus musculus
36% identity, 14% coverage

An14g02540 uncharacterized protein from Aspergillus niger
34% identity, 26% coverage

NP_001165646 melanocyte inducing transcription factor L homeolog from Xenopus laevis
36% identity, 17% coverage

NP_996888 upstream stimulatory factor 1 isoform 2 from Homo sapiens
40% identity, 25% coverage

NP_032627 microphthalmia-associated transcription factor isoform 2 from Mus musculus
37% identity, 20% coverage

NP_001020878 transcription factor EB from Rattus norvegicus
36% identity, 14% coverage

7d8tA / O66738,O75030 Mitf bhlhlz complex with m-box DNA (see paper)
40% identity, 30% coverage

MITF_RAT / O88368 Microphthalmia-associated transcription factor from Rattus norvegicus (Rat) (see paper)
37% identity, 16% coverage

NP_001161299 transcription factor EB isoform 2 from Homo sapiens
36% identity, 16% coverage

TFEB_HUMAN / P19484 Transcription factor EB; Class E basic helix-loop-helix protein 35; bHLHe35 from Homo sapiens (Human) (see 27 papers)
NP_001258873 transcription factor EB isoform 1 from Homo sapiens
36% identity, 16% coverage

MITF_MOUSE / Q08874 Microphthalmia-associated transcription factor from Mus musculus (Mouse) (see 7 papers)
37% identity, 16% coverage

XP_005172942 transcription factor EB isoform X2 from Danio rerio
36% identity, 14% coverage

FGSG_09308 hypothetical protein from Fusarium graminearum PH-1
32% identity, 15% coverage

TFEB_MOUSE / Q9R210 Transcription factor EB from Mus musculus (Mouse) (see 9 papers)
NP_001155194 transcription factor EB isoform b from Mus musculus
36% identity, 16% coverage

NP_001330723 basic helix-loop-helix (bHLH) DNA-binding superfamily protein from Arabidopsis thaliana
41% identity, 10% coverage

BIM1_ARATH / Q9LEZ3 Transcription factor BIM1; BES1-interacting Myc-like protein 1; Basic helix-loop-helix protein 46; AtbHLH46; bHLH 46; Transcription factor EN 126; bHLH transcription factor bHLH046 from Arabidopsis thaliana (Mouse-ear cress) (see 3 papers)
AT5G08130 BIM1; DNA binding / protein binding / transcription factor from Arabidopsis thaliana
41% identity, 10% coverage

NP_571922 melanocyte inducing transcription factor b from Danio rerio
36% identity, 14% coverage

NCU03077 hypothetical protein from Neurospora crassa OR74A
33% identity, 17% coverage

NP_001033808 mitf, isoform A from Drosophila melanogaster
36% identity, 9% coverage

8ia3E / Q15853 Crystal structure of human usf2 bhlhlz domain in complex with DNA (see paper)
36% identity, 53% coverage

SRBB_ASPFU / Q4W9W8 Transcription factor srbB; Sterol regulatory element-binding protein B from Aspergillus fumigatus (strain ATCC MYA-4609 / CBS 101355 / FGSC A1100 / Af293) (Neosartorya fumigata) (see paper)
AFUA_4G03460, Afu4g03460 HLH DNA binding domain protein, putative from Aspergillus fumigatus Af293
41% identity, 20% coverage

NP_571923 transcription factor E3a from Danio rerio
40% identity, 11% coverage

Q15853 Upstream stimulatory factor 2 from Homo sapiens
NP_003358 upstream stimulatory factor 2 isoform 1 from Homo sapiens
37% identity, 19% coverage

NP_001001162 upstream stimulatory factor 2 from Bos taurus
37% identity, 19% coverage

XP_019112415 transcription factor EB isoform X1 from Larimichthys crocea
31% identity, 13% coverage

USF2_MOUSE / Q64705 Upstream stimulatory factor 2; Major late transcription factor 2; Upstream transcription factor 2 from Mus musculus (Mouse) (see paper)
NP_112401 upstream stimulatory factor 2 from Rattus norvegicus
NP_035810 upstream stimulatory factor 2 isoform 1 from Mus musculus
37% identity, 19% coverage

LOC110114469 LOW QUALITY PROTEIN: transcription factor BIM2 from Dendrobium catenatum
42% identity, 15% coverage

Pc12g14660 uncharacterized protein from Penicillium rubens
42% identity, 22% coverage

XP_975837 microphthalmia-associated transcription factor isoform X1 from Tribolium castaneum
33% identity, 13% coverage

USF1_HUMAN / P22415 Upstream stimulatory factor 1; Class B basic helix-loop-helix protein 11; bHLHb11; Major late transcription factor 1 from Homo sapiens (Human) (see 2 papers)
38% identity, 23% coverage

XP_006250328 upstream stimulatory factor 1 isoform X1 from Rattus norvegicus
40% identity, 21% coverage

NP_033506 upstream stimulatory factor 1 isoform 1 from Mus musculus
NP_001292606 upstream stimulatory factor 1 isoform 1 from Mus musculus
40% identity, 21% coverage

Afu1g17060 HLH DNA binding domain protein, putative from Aspergillus fumigatus Af293
32% identity, 25% coverage

BIM2_ARATH / Q9CAA4 Transcription factor BIM2; BES1-interacting Myc-like protein 2; Basic helix-loop-helix protein 102; AtbHLH102; bHLH 102; Transcription factor EN 125; bHLH transcription factor bHLH102 from Arabidopsis thaliana (Mouse-ear cress) (see 2 papers)
AT1G69010 BIM2 (BES1-interacting Myc-like protein 2); DNA binding / transcription factor from Arabidopsis thaliana
35% identity, 18% coverage

TYE7_CANAL / Q5AL36 Carbohydrate metabolism regulator TYE7 from Candida albicans (strain SC5314 / ATCC MYA-2876) (Yeast) (see 8 papers)
TYE7 transcription factor with bHLH from Candida albicans (see paper)
XP_722152 Tye7p from Candida albicans SC5314
33% identity, 29% coverage

CCM_04014 membrane-tethered transcription factor (predicted) from Cordyceps militaris CM01
32% identity, 12% coverage

Q6MYV5 Possible bhlh transcription factor from Aspergillus fumigatus
32% identity, 17% coverage

XP_012252483 microphthalmia-associated transcription factor isoform X1 from Athalia rosae
32% identity, 10% coverage

An08g04000 uncharacterized protein from Aspergillus niger
33% identity, 18% coverage

XP_011304746 transcription factor EC isoform X2 from Fopius arisanus
33% identity, 12% coverage


For advice on how to use these tools together, see Interactive tools for functional annotation of bacterial genomes.

Statistics

The PaperBLAST database links 798,070 different protein sequences to 1,261,478 scientific articles. Searches against EuropePMC were last performed on May 12, 2025.

How It Works

PaperBLAST builds a database of protein sequences that are linked to scientific articles. These links come from automated text searches against the articles in EuropePMC and from manually-curated information from GeneRIF, UniProtKB/Swiss-Prot, BRENDA, CAZy (as made available by dbCAN), BioLiP, CharProtDB, MetaCyc, EcoCyc, TCDB, REBASE, the Fitness Browser, and a subset of the European Nucleotide Archive with the /experiment tag. Given this database and a protein sequence query, PaperBLAST uses protein-protein BLAST to find similar sequences with E < 0.001.
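The E-value cutoff is the only numeric threshold given here. The sketch below illustrates it by filtering BLAST+ tabular output (`-outfmt 6`, default 12 columns) and keeping hits with E < 0.001. It assumes the protein-protein BLAST search has already been run. It is not PaperBLAST's own code, and the subject IDs `hitA`, `hitB` and `hitC` in the sample are made up.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

E_CUTOFF = 0.001  # threshold stated in the text: keep hits with E < 0.001


def filter_hits(tabular_text, cutoff=E_CUTOFF):
    """Parse -outfmt 6 text and return hits with E < cutoff, lowest E first."""
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, row))
        hit["pident"] = float(hit["pident"])
        hit["evalue"] = float(hit["evalue"])
        if hit["evalue"] < cutoff:
            hits.append(hit)
    hits.sort(key=lambda h: h["evalue"])
    return hits


# Made-up example rows (hypothetical subject IDs hitA, hitB, hitC).
SAMPLE = (
    "query\thitA\t100.0\t86\t0\t0\t1\t86\t320\t405\t1e-50\t170\n"
    "query\thitB\t37.2\t78\t45\t2\t5\t80\t200\t277\t2e-05\t45.1\n"
    "query\thitC\t30.0\t40\t28\t0\t10\t49\t1\t40\t0.5\t22.0\n"
)

hits = filter_hits(SAMPLE)
for h in hits:
    print(h["sseqid"], h["pident"], h["evalue"])
# prints:
# hitA 100.0 1e-50
# hitB 37.2 2e-05
```

The row for `hitC` (E = 0.5) is dropped. Sorting by E-value is a choice made for this sketch only.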

To build the database, we query EuropePMC with locus tags, with RefSeq protein identifiers, and with UniProt accessions. We obtain the locus tags from RefSeq or from MicrobesOnline. We use queries of the form "locus_tag AND genus_name" to try to ensure that the paper is actually discussing that gene. Because EuropePMC indexes most recent biomedical papers, even if they are not open access, some of the links may be to papers that you cannot read or that our computers cannot read. We query each of these identifiers that appears in the open access part of EuropePMC, as well as every locus tag that appears in the 500 most-referenced genomes, so that a gene may appear in the PaperBLAST results even though none of the papers that mention it are open access. We also incorporate text-mined links from EuropePMC that link open access articles to UniProt or RefSeq identifiers. (This yields some additional links because EuropePMC uses different heuristics for their text mining than we do.)
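The query form "locus_tag AND genus_name" can be illustrated with a small helper that builds such queries and a Europe PMC REST search URL. This is a hedged sketch. The helper names `build_query` and `search_url` are hypothetical, sending RefSeq or UniProt accessions without a genus is an assumption, and the URL is only constructed, not fetched. The identifiers come from the hit list on this page.

```python
from urllib.parse import urlencode

# Europe PMC REST search endpoint; fetching is left out of this sketch.
SEARCH_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_query(identifier, genus=None):
    """Return 'identifier AND genus' when a genus is given, else the identifier.

    Pairing locus tags with a genus follows the query form described in the
    text; searching RefSeq or UniProt accessions alone is an assumption."""
    if genus:
        return f"{identifier} AND {genus}"
    return identifier


def search_url(query, page_size=100):
    """Build (but do not fetch) a Europe PMC search URL returning JSON."""
    params = {"query": query, "format": "json", "pageSize": page_size}
    return SEARCH_URL + "?" + urlencode(params)


# Identifiers taken from the hit list above.
q1 = build_query("AFUA_2G01260", "Aspergillus")  # locus tag + genus
q2 = build_query("Q4WIN1")                       # UniProt accession
print(q1)              # AFUA_2G01260 AND Aspergillus
print(search_url(q1))
```

Adding the genus name narrows a short, possibly ambiguous locus tag to papers that also mention the organism, which is the stated purpose of this query form.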

For every article that mentions a locus tag, a RefSeq protein identifier, or a UniProt accession, we try to select one or two snippets of text that refer to the protein. If we cannot get access to the full text, we try to select a snippet from the abstract, but unfortunately, unique identifiers such as locus tags are rarely provided in abstracts.
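The exact snippet heuristics are not described here. As a rough illustration only, the sketch below selects up to two sentences that mention an identifier, prefers the full text, and falls back to the abstract. It uses a naive regex sentence splitter chosen for this example, not necessarily what PaperBLAST does. The function `select_snippets`, the locus tag `XYZ_01234`, and the sample text are all hypothetical.

```python
import re

# Naive sentence splitter: break after ".", "!" or "?" followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def select_snippets(identifier, full_text=None, abstract="", max_snippets=2):
    """Return up to max_snippets sentences that mention identifier.

    The full text is searched when available; otherwise the abstract is used,
    which often yields nothing because abstracts rarely contain locus tags."""
    text = full_text if full_text else abstract
    # Match the identifier as a whole token, not inside a longer word.
    mention = re.compile(r"(?<!\w)" + re.escape(identifier) + r"(?!\w)")
    snippets = []
    for sentence in SENTENCE_END.split(text):
        if mention.search(sentence):
            snippets.append(sentence.strip())
            if len(snippets) == max_snippets:
                break
    return snippets


# Made-up text; XYZ_01234 is a hypothetical locus tag.
paper = ("Strains are listed in Table 1. The XYZ_01234 mutant was built by "
         "recombination. Growth was measured in triplicate. Expression of "
         "XYZ_01234 was measured by qPCR. XYZ_01234 appears a third time.")
print(select_snippets("XYZ_01234", full_text=paper))
print(select_snippets("XYZ_01234", abstract="No locus tags here."))  # []
```

The second call shows the abstract-only case mentioned in the text. When the identifier does not appear in the abstract, no snippet can be selected.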

PaperBLAST also incorporates the manually-curated protein functions from the curated databases listed above.

Except for GeneRIF and ENA, the curated entries include a short curated description of the protein's function. For entries from BioLiP, the protein's function may not be known beyond binding to the ligand. Many of these entries also link to articles in PubMed.

For more information, see the PaperBLAST paper (mSystems 2017) or the code. PaperBLAST's database is also available for download.

PaperBLAST has changed since the paper was written. Many of these changes are described in Interactive tools for functional annotation of bacterial genomes.

Secrets

PaperBLAST cannot provide snippets for many of the papers that are published in non-open-access journals. This limitation applies even if the paper is marked as "free" on the publisher's web site and is available in PubMed Central or EuropePMC. If a journal that you publish in is marked as "secret," please consider publishing elsewhere.

Omissions from the PaperBLAST Database

Many important articles are missing from PaperBLAST, either because the article's full text is not in EuropePMC (as for many older articles), or because the paper does not mention a protein identifier such as a locus tag, or because of PaperBLAST's heuristics. If you notice an article that characterizes a protein's function but is missing from PaperBLAST, please notify the curators at UniProt or add an entry to GeneRIF. Entries in either of these databases will eventually be incorporated into PaperBLAST. Note that to add an entry to UniProt, you will need to find the UniProt identifier for the protein. If the protein is not already in UniProt, you can ask them to create an entry. To add an entry to GeneRIF, you will need an NCBI Gene identifier, but unfortunately many prokaryotic proteins in RefSeq do not have corresponding Gene identifiers.

References

PaperBLAST: Text-mining papers for information about homologs.
M. N. Price and A. P. Arkin (2017). mSystems, 10.1128/mSystems.00039-17.

Europe PMC in 2017.
M. Levchenko et al (2017). Nucleic Acids Research, 10.1093/nar/gkx1005.

Gene indexing: characterization and analysis of NLM's GeneRIFs.
J. A. Mitchell et al (2003). AMIA Annu Symp Proc 2003:460-464.

UniProt: the universal protein knowledgebase.
The UniProt Consortium (2016). Nucleic Acids Research, 10.1093/nar/gkw1099.

BRENDA in 2017: new perspectives and new tools in BRENDA.
S. Placzek et al (2017). Nucleic Acids Research, 10.1093/nar/gkw952.

The EcoCyc database: reflecting new knowledge about Escherichia coli K-12.
I. M. Keeseler et al (2016). Nucleic Acids Research, 10.1093/nar/gkw1003.

The MetaCyc database of metabolic pathways and enzymes.
R. Caspi et al (2018). Nucleic Acids Research, 10.1093/nar/gkx935.

CharProtDB: a database of experimentally characterized protein annotations.
R. Madupu et al (2012). Nucleic Acids Research, 10.1093/nar/gkr1133.

The carbohydrate-active enzymes database (CAZy) in 2013.
V. Lombard et al (2014). Nucleic Acids Research, 10.1093/nar/gkt1178.

The Transporter Classification Database (TCDB): recent advances.
M. H. Saier, Jr. et al (2016). Nucleic Acids Research, 10.1093/nar/gkv1103.

REBASE - a database for DNA restriction and modification: enzymes, genes and genomes.
R. J. Roberts et al (2015). Nucleic Acids Research, 10.1093/nar/gku1046.

Deep annotation of protein function across diverse bacteria from mutant phenotypes.
M. N. Price et al (2016). bioRxiv, 10.1101/072470.

by Morgan Price, Arkin group
Lawrence Berkeley National Laboratory