PaperBLAST

PaperBLAST – Find papers about a protein or its homologs

PaperBLAST Hits for 74 a.a. (MNRKQRSIPL...)

Other sequence analysis tools:

Find functional residues: SitesBLAST

Search for conserved domains

Find the best match in UniProt

Compare to protein structures

Predict transmembrane helices: Phobius

Predict protein localization: PSORTb

Find homologs in fast.genomics

Found 32 similar proteins in the literature:

GCN4_YEAST / P03069 General control transcription factor GCN4; Amino acid biosynthesis regulatory protein; General control protein GCN4 from Saccharomyces cerevisiae (strain ATCC 204508 / S288c) (Baker's yeast) (see 25 papers)
NP_010907 amino acid starvation-responsive transcription factor GCN4 from Saccharomyces cerevisiae S288C
NP_010907, YEL009C Gcn4p from Saccharomyces cerevisiae
99% identity, 26% coverage

function: Master transcriptional regulator that mediates the response to amino acid starvation (PubMed:11390663, PubMed:29628310). Binds variations of the DNA sequence 5'-ATGA[CG]TCAT-3' in canonical nucleosome-depleted 5'-positioned promoters, and also within coding sequences and 3' non-coding regions (PubMed:11390663, PubMed:1473154, PubMed:1939099, PubMed:2204805, PubMed:2277632, PubMed:29628310, PubMed:3530496, PubMed:3532321, PubMed:3678204, PubMed:7664107). During nutrient starvation (low or poor amino acid, carbon or purine sources), it activates genes required for amino acid biosynthesis and transport, autophagy, cofactor biosynthesis and transport, mitochondrial transport, and additional downstream transcription factors (PubMed:10733573, PubMed:11390663, PubMed:1939099, PubMed:29628310, PubMed:7862116, PubMed:8336737). Activates transcription by recruiting multiple coactivators, including the mediator complex, the SAGA complex, and the SWI/SNF complex, to enable assembly of the pre-initiation complex at core promoters (PubMed:10549298, PubMed:19940160, PubMed:9488488).
subunit: Homodimer (PubMed:1473154, PubMed:3678204). Each subunit binds overlapping and non-identical half-sites that flank the central CG base-pair in the pseudo-palindromic motif 5'-ATGA[CG]TCAT-3' (PubMed:1473154, PubMed:2204805, PubMed:3678204, PubMed:7664107). Interacts with the mediator tail; the interaction with GAL11/MED15 is direct (PubMed:10549298, PubMed:19940160, PubMed:9488488). Interacts with the SAGA histone acetyltransferase complex (PubMed:10549298, PubMed:19940160, PubMed:9488488). Interacts with the SWI/SNF chromatin remodeling complex (PubMed:10549298, PubMed:19940160).
disruption phenotype: Abolishes recruitment of the mediator complex to the upstream activating sequence (UAS) of amino-acid starvation responsive genes (PubMed:19940160). Decreases RNA level of genes involved in amino acid biosynthesis and cofactor biosynthesis during amino acid starvation or methyl methanesulfonate stress (PubMed:11390663, PubMed:29628310, PubMed:8336737). Growth dependent on amino acid supplementation (PubMed:10733573). Sensitive to amino acid starvation (PubMed:10549298). Sensitive to purine starvation (PubMed:8336737). Decreases cellular glycogen levels during glucose starvation (PubMed:10733573).
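The consensus binding site quoted in the annotation above, 5'-ATGA[CG]TCAT-3', can be read directly as a regular expression. The following is a minimal sketch, not part of PaperBLAST or UniProt: the function name `find_gcn4_sites` and the example sequence are hypothetical, and it reports only exact consensus matches, whereas the annotation says Gcn4 binds "variations" of this sequence.

```python
import re

# Consensus taken from the annotation above: 5'-ATGA[CG]TCAT-3'.
# The lookahead lets overlapping occurrences be reported as well.
GCN4_MOTIF = re.compile(r"(?=(ATGA[CG]TCAT))")


def find_gcn4_sites(seq):
    """Return (0-based start, matched site) for each exact consensus match."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in GCN4_MOTIF.finditer(seq)]


# Hypothetical test sequence containing both central-base variants.
example = "ttgcATGACTCATccaaATGAGTCATgg"
print(find_gcn4_sites(example))
```

Because the reverse complement of ATGACTCAT is ATGAGTCAT, the pattern matches the same sites on either strand, so scanning one strand is enough for exact matches.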
Mechanism of actin filament branch formation by Arp2/3 complex revealed by a high-resolution cryo-EM structure of the branch junction
Chou, Proceedings of the National Academy of Sciences of the United States of America 2022
- “...Arp2/3 complex ( 24 ); recombinant Saccharomyces cerevisiae GCN4 leucine zipper (residues 249–281) (UniProt ID: P03069) fused to bovine N-WASP VCA motif (residues 401–505) (UniProt ID: Q95107); and recombinant mouse capping protein CapZ 1/2 heterodimer ( 42 ). Assembly of Specimens with Short Branches. The Ca...”
Quantifying Coexistence Concentrations in Multi-Component Phase-Separating Systems Using Analytical HPLC.
Bremer, Biomolecules 2022
- “...The variant of Gcn4 spans the central activation domain (residues 101–141) from S. cerevisiae (UniProt: P03069) connected by a short (GS)4-linker to the DNA-binding domain of Gcn4 (residues 222–281). 3.2. Phase Separation Assay Phase separation of A1-LCD and FUS-PLD, respectively, was induced by adding...”
Competitive inhibition of the classical complement pathway using exogenous single-chain C1q recognition proteins.
Vadászi, The Journal of biological chemistry 2022
- “...linkers. Dimers and trimers were created using the GCN4 leucine-zipper coding fragment (Leu253–Arg281; UniProt ID: P03069 ) and the α-helical neck region of SpD (Asp222–Phe253; UniProt ID: P50404 ), respectively ( Fig.1 , A and B ) ( 40 , 41 ). DNA sequences coding dimerizing...”
Moonlighting Proteins in the Fuzzy Logic of Cellular Metabolism
Liu, Molecules (Basel, Switzerland) 2020
- “..., 84 ] General control protein GCN4 Saccharomyces cerevisiae Transcription factor Nucleus Ribonuclease Cytoplasm 32.38% P03069 [ 85 , 86 , 87 ] Bifunctional ligase/repressor BirA Escherichia coli Biotin synthetase, biotin[acetylCoA-carboxylase] ligase Cytoplasm Biotin operon repressor, activity depends on cellular concentration of biotin Bound to DNA...”
Interrogation of kinase genetic interactions provides a global view of PAK1-mediated signal transduction pathways
Kim, The Journal of biological chemistry 2020 (secret)
How IGF-II Binds to the Human Type 1 Insulin-like Growth Factor Receptor
Xu, Structure (London, England : 1993) 2020
- “...Genome Reference Consortium UniProt: P08069 Gene sequence of Saccharomyces cerevisiae GCN4 Saccharomyces Genome Database UniProt: P03069 CryoEM structure of IGF-I-bound holo IGF-1R ( Li et al., 2019 ) PDB: 6PYH Crystal structure of GCN4 leucine zipper ( O'Shea et al., 1991 ) PDB: 2ZTA Crystal structure of apo...”
- “...residues 1-905 of IGF-1R (UniProt entry P08069-1), a 33-residue GCN4 zipper sequence RMKQLEDKVEELLSKNYHLENEVARLKKLVGER (UniProt entry P03069), a three-serine spacer and the c-myc tag sequence EQKLISEEDLN) was cloned into the Hind III / Xba1 sites (Genscript; Piscataway, New Jersey) of the pEE14 mammalian expression vector (Lonza; Basel,...”
Artificial intelligence-based multi-objective optimization protocol for protein structure refinement.
Wang, Bioinformatics (Oxford, England) 2020
MIP diversity from Trichoderma: Structural considerations and transcriptional modulation during mycoparasitic association with Fusarium solani olive trees
Ben, PloS one 2018
- “...1 (0,9%) 1 (0,9%) 1 (0,8%) 0 GAL4 P04386 2 (1,9%) 0 0 0 GCN4 P03069 20 (14,4%) 12 (10,5%) 16 (13%) 9 (8,1%) GCR1 P07261 9 (8,7%) 12 (10,5%) 18 (14,6%) 21 (18,9%) LEU3 P08638 1 (0,9%) 1 (0,9%) 0 0 MCM1 P11746 1 (0,9%)...”
Differential stability of Gcn4p controls its cell-specific activity in differentiated yeast colonies.
Váchová, mBio 2024
- GeneRIF: Differential stability of Gcn4p controls its cell-specific activity in differentiated yeast colonies.
Stochastic scanning events on the GCN4 mRNA 5' untranslated region generate cell-to-cell heterogeneity in the yeast nutritional stress response.
Meng, Nucleic acids research 2023
- GeneRIF: Stochastic scanning events on the GCN4 mRNA 5' untranslated region generate cell-to-cell heterogeneity in the yeast nutritional stress response.
Gcn4 impacts metabolic fluxes to promote yeast chronological lifespan.
Gulias, PloS one 2023
- GeneRIF: Gcn4 impacts metabolic fluxes to promote yeast chronological lifespan.
Multiomics of GCN4-Dependent Replicative Lifespan Extension Models Reveals Gcn4 as a Regulator of Protein Turnover in Yeast.
Mariner, International journal of molecular sciences 2023
- GeneRIF: Multiomics of GCN4-Dependent Replicative Lifespan Extension Models Reveals Gcn4 as a Regulator of Protein Turnover in Yeast.
Origin of translational control by eIF2α phosphorylation: insights from genome-wide translational profiling studies in fission yeast.
Asano, Current genetics 2021
- GeneRIF: Origin of translational control by eIF2alpha phosphorylation: insights from genome-wide translational profiling studies in fission yeast.
Mediator subunit Med15 dictates the conserved "fuzzy" binding mechanism of yeast transcription activators Gal4 and Gcn4.
Tuttle, Nature communications 2021
- GeneRIF: Mediator subunit Med15 dictates the conserved "fuzzy" binding mechanism of yeast transcription activators Gal4 and Gcn4.
The molecular aetiology of tRNA synthetase depletion: induction of a GCN4 amino acid starvation response despite homeostatic maintenance of charged tRNA levels.
McFarland, Nucleic acids research 2020
- GeneRIF: ln4p depletion reduces this sequestration capacity, allowing uncharged tRNAGln to interact with Gcn2 kinase. The study sheds new light on mutant aaRS disease aetiologies, and explains how aaRS sequestration of uncharged tRNAs can prevent GCN4 activation under non-starvation conditions.
Genome-scale reconstruction of Gcn4/ATF4 networks driving a growth program.
Srinivasan, PLoS genetics 2020
- GeneRIF: Genome-scale reconstruction of Gcn4/ATF4 networks driving a growth program.
The bZIP transcription factor BIP1 of the rice blast fungus is essential for infection and regulates a specific set of appressorium genes
Lambou, PLoS pathogens 2024
- “...A . nidulans (AnBIP1, ANIA_00825) and P . nodorum (SNOG_11592). S . cerevisiae Gcn4 (ScGCN4, YEL009C) and S . cerevisiae Yap1 (ScYAP1, YML007W) TFs were added for comparison. bZIP domains were extracted from protein sequences and aligned using Clustal omega. 100% identical amino acids are highlighted...”
Inferring Gene Regulatory Networks from RNA-seq Data Using Kernel Classification
Al-Aamri, Biology 2023
- “...example for the transcription factor (TF) gene YNL068C. Another example network for the TF gene YEL009C is shown in Figure 8 b. These networks represent part of a bigger transcription network showing the transcription factor genes and their target genes. Some of the network connections with...”
- “...gene and its target genes. ( a ) TF: YNL068C ; ( b ) TF: YEL009C . The shared connections that are predicted by RNA-seq and microarray, and validated by Yeastract are represented in the graph by the solid lines. The parallel lines denote new potential...”
A simplified and easy-to-use HIP HOP assay provides insights into chalcone antifungal mechanisms of action
Prescott, FEBS letters 2022
- “...ion channels GCN3 HOP Pramoxin, dyclonine YKR026c Y35097 Blocks ion channels GCN4 HOP Pramoxin, dyclonine YEL009c Y30249 TOR signalling KOG1 HIP Caffeine, rapamycin YHR186c Y22880 TOR signalling TOR1 HOP Caffeine, rapamycin YJR066w Y36864 ER stress IRE1 HOP Tunicamycin, E1210/gepinacin YHR079c Y31907 ER stress HAC1 HOP Tunicamycin,...”
Protein functional module identification method combining topological features and gene expression data
Zhao, BMC genomics 2021
- “...YMR061W, YNL317W, YPR107C 5.92e-18 YBR081C, YBR198C, YDR145W, YDR167W, transcription factor RNA polymerase YDR176W, YDR216W, YDR448W, YEL009C, TFIID complex II transcriptional YER148W, YGL112C, YGR274C, YML015C, preinitiation YML098W, YMR236W complex assembly 1.86e-18 YCR035C, YDL111C, YDR280W, YGR095C, exosome polyadenylation YGR158C, YGR195W, YHR069C, YNL189W, (RNase complex) -dependent YNL232W, YOL021C, YOR001W,...”
Widespread Cumulative Influence of Small Effect Size Mutations on Yeast Quantitative Traits
Hua, Cell systems 2018
- “...by inspecting the YFP distribution of the raw data. These genes are: GAL4 (YPL248C), GCN4 (YEL009C), GAL80 (YML051W), GAL1 (YBR020W), SNF3 (YDL194W), STI1 (YOR027W), REG1 (YDR028C), GAL3 (YDR009W), SNF2 (YOR290C), HSC82 (YMR186W) . This is important when calculating the explained heritability for top N genes (see...”
Nuclear Magnetic Resonance Structures of GCN4p Are Largely Conserved When Ion Pairs Are Disrupted at Acidic pH but Show a Relaxation of the Coiled Coil Superhelix
Kaplan, Biochemistry 2017
- “...used to prepare a synthetic gene corresponding to residues Met250–Glu280 of GCN4 (NCBI accession number NP_010907), optimized for codon usage in Escherichia coli . 25 The cDNA fragment was ligated into the Bam HI/ Eco RI sites of plasmid pHisTrx2, a derivative of pET-32a (Novagen), that...”
Identification of Genes in Saccharomyces cerevisiae that Are Haploinsufficient for Overcoming Amino Acid Starvation
Bae, G3 (Bethesda, Md.) 2017
- “...been omitted. Table 1 SMM-sensitive heterozygous deletion mutants ORF ID Gene SGD Description Phenotypes Transcription YEL009C GCN4 bZIP transcriptional activator of amino acid biosynthetic genes; activator responds to amino acid starvation M + , E, C, v YPR104C FHL1 Regulator of ribosomal protein (RP) transcription; has...”

1ysaC / P03069 The gcn4 basic region leucine zipper binds DNA as a dimer of uninterrupted alpha helices: crystal structure of the protein-DNA complex (see paper)
98% identity, 76% coverage

Ligand: dna (1ysaC)

CTRG_02060 hypothetical protein from Candida tropicalis MYA-3404
63% identity, 22% coverage

Non-albicans Candida Species: Immune Response, Evasion Mechanisms, and New Plant-Derived Alternative Therapies
Gómez-Gaviria, Journal of fungi (Basel, Switzerland) 2022
- “...], it was found that C. albicans Gcn4 is closely related to C. tropicalis ORF CTRG_02060 , C. parapsilosis , and N. glabrata GCN4 , suggesting that most likely these NAC species also carry out the evasion mechanism by adapting to amino acid starvation. Biotin restriction...”
Transcriptional Control of Drug Resistance, Virulence and Immune System Evasion in Pathogenic Fungi: A Cross-Species Comparison
Pais, Frontiers in cellular and infection microbiology 2016
- “...proteins found in other CTG clade species, including a C. tropicalis protein encoded by ORF CTRG_02060 and a C. parapsilosis Gcn4 protein. Additionally, phylome analysis also revealed the C. glabrata Gcn4 regulator as being closely related. Moreover, A. fumigatus harbors a regulator with a related function:...”
- “...C. albicans ) N.A. N.A. Gcn4 ( C. parapsilosis ) CpcA ( A. fumigatus ) CTRG_02060 ; Gcn4 ( C. glabrata ) Gliotoxin production MtfA ( A. fumigatus ) N.A. N.A. N.A. N.A. Melanin production Mbs1 ( C. neoformans ) N.A. Mbp1 ( C. albicans )...”

CPAR2_806570 uncharacterized protein from Candida parapsilosis
64% identity, 20% coverage

Alternative sulphur metabolism in the fungal pathogen Candida parapsilosis
Lombardi, Nature communications 2024
- “...to osmotic stress response and oxidative stress in disruptions of the MAPK kinase PBS2 ( CPAR2_806570 ) 40 (Fig. 1B ). Deletion of the transcriptional regulators SFU1 ( CPAR2_700810 ) and TUP1 ( CPAR2_109520 ) led to increased sensitivity to copper, as previously observed in C....”

GCN4 coordinator of morphogenesis and amino acid starvation response from Candida albicans (see 6 papers)
58% identity, 22% coverage

CharProtDB CGD description: Transcriptional activator of general amino acid control response; required for Efg1p-dependent pseudohyphal filament induction by amino acid starvation but not by serum; upregulated in the presence of human whole blood or PMN cells

1llmC / P03069,P08046 Crystal structure of a zif23-gcn4 chimera bound to DNA (see paper)
62% identity, 68% coverage

Ligands: dna; zinc ion (1llmC)

CPCA_ASPNG / Q00096 Cross-pathway control protein A from Aspergillus niger (see paper)
62% identity, 22% coverage

function: Master transcriptional regulator that mediates the response to amino acid starvation (By similarity). Binds variations of the DNA sequence 5'-ATGA[CG]TCAT-3' (By similarity).
subunit: Homodimer.

An01g07900 leucine zipper cpcA-Aspergillus niger [putative frameshift] from Aspergillus niger
62% identity, 21% coverage

Comprehensive phenotypic analysis of multiple gene deletions of α-glucan synthase and Crh-transglycosylase gene families in Aspergillus niger highlighting the versatility of the fungal cell wall
Ost, Cell surface (Amsterdam, Netherlands) 2025
- “...2019 . Table 2 1 cpcA : Basic leucine zipper transcription factor of cross-pathway control (An01g07900); 2 rlmA : Conserved MADS box transcription factor of CWI pathway (An02g12210); 3 galX : Putative zinc-binding transcription factor of galactose catabolism via oxidoreductive pathway (An16g01640); 4 rhaR : Putative...”
Trancriptional landscape of Aspergillus niger at breaking of conidial dormancy revealed by RNA-sequencing
Novodvorska, BMC genomics 2013
- “...An17g00860 translation initiation factor ( A. fumigatus cpcC ) no change 2 45.91 60.85 1.32 An01g07900 cpcA , transcription factor 3.55 18.59 123.24 6.62 An01g08850 transcription factor ( A. nidulans cpcB ) 3.86 23.52 530.23 22.54 An11g06180 transcription factor ( A. nidulans prnA ) 2.59 20.6...”
- “...at breaking of dormancy. The signal from CpcC is transmitted to the transcription factor CpcA (An01g07900) (homologue of S. cerevisiae Gcn4p), a global regulator in A. niger induced by amino acid starvation. Our data showed that transcript levels from cpcA increased during the early stage of...”
Genome-wide expression analysis upon constitutive activation of the HacA bZIP transcription factor in Aspergillus niger reveals a coordinated cellular response to counteract ER stress
Carvalho, BMC genomics 2012
- “...]. In our transcriptomic profiles, a gcn2 homologue (An17g00860) is not differentially expressed, whereas cpcA (An01g07900) shows 2 fold higher expression in comparison with the wild-type strain. According to our results the activation of cpcA is likely to occur in a Gcn2p-independent way and it is tempting...”
Transcriptomic comparison of Aspergillus niger growing on two different sugars reveals coordinated regulation of the secretory pathway
Jørgensen, BMC genomics 2009
- “...(UPR and ER associated degradation): An08g01480 TRL1 (YJL087c) tRNA ligase 0.7 1.110 -4 3.710 -3 An01g07900 cpcA GCN4 (YEL009c) bZIP transcription factor 0.8 2.610 -3 3.310 -2 An11g11250 * protein kinase inhibitor p58 ( Rattus norvegicus ) 1.6 4.710 -5 2.110 -3 An01g08980 ORM1 (YGR038w) conserved...”
- “...apparently regulates transcription of many UPR induced genes [ 34 ]. In our study, CpcA (An01g07900), a homolog of Gcn4p, was down-regulated on maltose, much in contradiction with its putative function as positive regulator of transcription of several UPR target genes in the secretory pathway. The...”

3i5cB / P03069,Q9HXT9 Crystal structure of a fusion protein containing the leucine zipper of gcn4 and the ggdef domain of wspr from pseudomonas aeruginosa (see paper)
97% identity, 15% coverage

Ligand: 9,9'-[(2r,3r,3as,5s,7ar,9r,10r,10as,12s,14ar)-3,5,10,12-tetrahydroxy-5,12-dioxidooctahydro-2h,7h-difuro[3,2-d:3',2'-j][1,3,7,9,2,8]tetraoxadiphosphacyclododecine-2,9-diyl]bis(2-amino-1,9-dihydro-6h-purin-6-one) (3i5cB)

ATEG_03131 cross-pathway control protein A from Aspergillus terreus NIH2624
60% identity, 21% coverage

Phytotoxin production in Aspergillus terreus is regulated by independent environmental signals
Gressler, eLife 2015
- “...; Hynes, 1975 ; Davis et al., 2005 ), the cross pathway control regulator CpcA (ATEG_03131) ( Hoffmann et al., 2001 ; Krappmann et al., 2004 ), the stress response bZIP transcription factor AtfA (ATEG_04664) ( Balazs et al., 2010 ; Lara-Rojas et al., 2011 ),...”
- “...for downstream experiments are denoted by X. ( D ) Deletion of the cpcA locus (ATEG_03131) in SBUG844 akuB. ( E ) Deletion of the rhbA locus (ATEG_09480) in SBUG844 akuB. ( F ) Deletion of the areA locus (ATEG_07264) in SBUG844 akuB and complementation with...”

3crpB / P03069 A heterospecific leucine zipper tetramer (see paper)
82% identity, 45% coverage

Ligand: peptide (3crpB)

AO090009000459 uncharacterized protein from Aspergillus oryzae RIB40
57% identity, 21% coverage

Survey of protein-DNA interactions in Aspergillus oryzae on a genomic scale
Wang, Nucleic acids research 2015
- “...S3). This core sequence was identified in the binding site of the transcriptional activator CpcA (AO090009000459) in A. oryzae and AP-1 binding site of GCN4 in yeast. Genes targeted by UPR_motif 11 were enriched in the molecular function of ribosomal subunit export from the nucleus for...”
Survey of the transcriptome of Aspergillus oryzae via massively parallel mRNA sequencing
Wang, Nucleic acids research 2010
- “...), which was also confirmed in our study under ER stress conditions (GCN4 homologs gene AO090009000459 in A. oryzae , Figure 7 A and Supplementary Table S11 ) and in Trichoderma reesei under ER stress conditions (GCN4 homologs gene CPCI) ( 43 ). Repression under secretion...”

cpcA CPCA from Emericella nidulans (see 2 papers)
59% identity, 18% coverage

CharProtDB Description: Putative transcription factor of the c-Jun-like transcriptional activator family, involved in cross-pathway control of amino acid biosynthesis in response to amino acid starvation; functional homolog of Saccharomyces cerevisiae Gcn4p; Source:AspGD

AFUA_4G12470, Afu4g12470, XP_751584 bZIP transcription factor CpcA from Aspergillus fumigatus Af293
58% identity, 21% coverage

The sulfur-related metabolic status of Aspergillus fumigatus during infection reveals cytosolic serine hydroxymethyltransferase as a promising antifungal target
Alharthi, Virulence 2025
- “...transcription factor (HapX) Proven [ 29 ] AFUA_2G07680 L-ornithine N5-oxygenase (SidA) Proven [ 30 ] AFUA_4G12470 bZIP transcription factor (CpcA) Proven [ 31 ] AFUA_6G08790 C6 transcription factor (PrnA) Unknown AFUA_5G06060 SCF-complex subunit (SkpA) Suspected [ 32 , 33 ] AFUA_8G04340 cystathionine gamma-lyase (MecB) Proven [...”
Mycoparasitic transcription factor 1 (BcMTF1) participates in the Botrytis cinerea response against Trichoderma atroviride
Olivares-Yañez, iScience 2025
- “...an overall aa identity of 49.1% with the bZIP TF CpcA of A. fumigatus (gene ID AFUA_4G12470) and 50.0% with the GCN4 protein of S. cerevisiae. In Aspergillus niger , CpcA has been primarily associated with cross-pathway (control) regulation, hence the name. This TF responds to aa starvation...”
Analysis of putative quadruplex-forming sequences in fungal genomes: novel antifungal targets?
Warner, Microbial genomics 2021
- “...TTTATGA GGG C GGG Afu2g12630 aspf13 Allergen Asp f 13 GG AT GG T GGGGG Afu4g12470 cpcA Transcriptional activator of the cross-pathway control system of amino acid biosynthesis GG T GG C GGGGG Afu6g09660 gliP Nonribosomal peptide synthetase gliP GGG AT GG CC GGGG AT GG...”
Nitrogen, Iron and Zinc Acquisition: Key Nutrients to Aspergillus fumigatus Virulence
Perez-Cuesta, Journal of fungi (Basel, Switzerland) 2021
- “...cpcC Decreased resistance to starvation Normal [ 29 ] Transcriptional activator of the CPC system Afu4g12470 cpcA Decreased competitive fitness Decreased [ 30 ] Imidazol-glycerol-phosphate dehydratase Afu6g04700 hisB Histidine auxotrophy Decreased [ 31 ] Homoaconitase Afu5g08890 lysF Lysine auxotrophy Decreased [ 32 ] Leucine transcriptional activator...”
Genome-Wide Association Analysis for Triazole Resistance in Aspergillus fumigatus
Fan, Pathogens (Basel, Switzerland) 2021
- “...transcription factor, putative 1.732.22 NA [ 39 ] AFUA_8G07360 1.901.92 NA [ 39 ] cpcA (AFUA_4G12470) BZIP transcription factor NA >1.50~5.50 [ 38 ] AFUA_1G16460 BZIP transcription factor (LziP), putative 1.752.12 NA [ 39 ] AFUA_7G03910 C2H2 zinc finger protein 2.502.86 NA [ 39 ] ace1...”
The sino-nasal warzone: transcriptomic and genomic studies on sino-nasal aspergillosis in dogs
Valdes, NPJ biofilms and microbiomes 2020
- “...Hypoxia SrbB (Afu4g03460) c.397G>A p.Ala133Thr 0 24 changes in response to hypoxia Amino-acid metabolism CpcA (Afu4g12470) c.439T>C p.Ser147Pro 0 4 Amino-acid homeostasis changed Light sensing LreB (Afu4g12690) c.58C>T p.Gln20* 0 11 Light-induced morphogenesis changed c.1123C>T p.Gln375* 0 6 Hyphal morphology Gin4 (Afu6g02300) c.3808T>C p.*1270Gln 0 6...”
Aspergillus fumigatus adhesion factors in dormant conidia revealed through comparative phenotypic and transcriptomic analyses
Takahashi-Nakaguchi, Cellular microbiology 2018
- “...hypothetical protein 5.0 AFUA_4G06370 conserved hypothetical protein 18.5 AFUA_4G09600 GPI anchored protein, putative * 62.4 AFUA_4G12470 bZIP transcription factor CpcA 2.6 AFUA_4G14530 glutathione S-transferase Ure2-like, putative 9.7 AFUA_5G00590 hypothetical protein 1.1 AFUA_5G02320 conserved hypothetical protein 3.7 AFUA_6G03210 conidiation-specific protein (Con10), putative * 21.3 AFUA_6G03350 GNAT family...”
- “...in conidial formation or cell wall localization in conidia (e.g., AFUA_4G02805, an Asp haemolysin-like protein; AFUA_4G12470, the bZIP transcription factor CpcA; AFUA_3G07160, a putative class V chitinase; AFUA_8G07060, a putative hydrophobin; AFUA_2G17580, the scytalone dehydratase Arp1, which is involved in conidial pigment biosynthesis; AFUA_6G03210, homologous to...”
Aspergillus fumigatus virulence through the lens of transcription factors
Bultman, Medical mycology 2017
- “...CrzA Afu3g11250 Afu2g05830 Afu2g12330 Afu6g01970 Afu4g12470 Afu1g06900 Conidial formation, cell wall architecture Gluconeogenesis, iron acquisition...”

5apwB / P03069 Sequence matkdd inserted between gcn4 adaptors - structure t6 (see paper)
93% identity, 39% coverage

Ligand: calcium ion (5apwB)

NCU04050, XP_957665 cross-pathway control protein 1 from Neurospora crassa OR74A
53% identity, 21% coverage

Transcriptomic and genetic analysis reveals a Zn2Cys6 transcription factor specifically required for conidiation in submerged cultures of Thermothelomyces thermophilus
Drescher, mBio 2025
- “...these 28 genes have a homolog in N. crassa . One gene, a homolog of NCU04050 (MYCTH_2315566), which encodes the Cross Pathway Control ( cpc-1 ) transcriptional regulator in N. crassa, showed elevated expression levels in the res1 mutant relative to the WT strain, particularly at...”
The nutrient-sensing GCN2 signaling pathway is essential for circadian clock function by regulating histone acetylation under amino acid starvation
Liu, eLife 2023
- “...Strain, strain background ( Neurospora crassa ) ras-1 bd ;cpc-1 KO Fungal Genetics Stock Center NCU04050 Strain, strain background ( Neurospora crassa ) ras-1 bd ;gcn-5 KO Fungal Genetics Stock Center NCU10847 Strain, strain background ( Neurospora crassa ) ras-1 bd ;ada-2 KO Fungal Genetics Stock...”
- “...the ras-1 bd background ( Belden et al., 2007 ). cpc-3 KO (NCU01187), cpc-1 KO (NCU04050), gcn-5 KO (NCU10847), ada-2 KO (NCU04459), and hda-1 KO (NCU01525) strains were obtained from the Fungal Genetic Stock Center (FGSC) and were crossed with a ras-1 bd strain to create...”
DNA affinity purification sequencing and transcriptional profiling reveal new aspects of nitrogen regulation in a filamentous fungus
Huberman, Proceedings of the National Academy of Sciences of the United States of America 2021 (secret)
Cross-pathway control gene CPC1/GCN4 coordinates with histone acetyltransferase GCN5 to regulate catalase-3 expression under oxidative stress in Neurospora crassa.
Qi, Free radical biology & medicine 2018 (PubMed)
- GeneRIF: These results disclosed a distinctive function of CPC1/GCN4 in the regulatory pathway of cat-3 transcription, which is mediated by GCN5-dependent acetylation.
Functional Profiling of Transcription Factor Genes in Neurospora crassa
Carrillo, G3 (Bethesda, Md.) 2017
- “...pp-1 , asl-1 / NCU01345 , ts (formerly asl-2 ; NCU01459 ), and cpc-1 / NCU04050 . We analyzed the 242 viable mutants for an array of growth and developmental phenotypes, beginning with the linear growth rate on minimal medium ( Colot et al. 2006 ;...”
A fungal transcription factor essential for starch degradation affects integration of carbon and nitrogen metabolism
Xiong, PLoS genetics 2017
- “...annotated to be hypothetical, and the other three were vib-1 (NCU03725), nit-2 (NCU09068), and cpc-1 (NCU04050) ( Fig 3E ). VIB-1 ( v egetative i ncompatibility b lock-1) is required for extracellular protease secretion in response to both carbon and nitrogen starvation [ 27 ] and...”
Genome-wide analysis of the endoplasmic reticulum stress response during lignocellulase production in Neurospora crassa
Fan, Biotechnology for biofuels 2015
- “...conditions of cellulase synthesis, including the well-characterized UPR regulator HAC-1 (NCU01856) as well as CPC-1 (NCU04050). We found N. crassa HAC-1 to act as an important factor for lignocellulase secretion while not mediating the RESS feed-back loop (profiling data mentioned above are presented in Additional file...”
The stringency of start codon selection in the filamentous fungus Neurospora crassa
Wei, The Journal of biological chemistry 2013
- “...protein); NCU01813 (high affinity glucose transporter); NCU04050 (cross-pathway control protein 1, cpc-1); NCU06882 (RING-5); NCU09104 (hypothetical protein)....”
- “...Supplemental Fig. S2. In all but one case, NCU04050, no out-of-frame AUG codon is located between the conserved in-frame near-cognate and conventional...”

2bniC / P03069 Pli mutant e20c l16g y17h, antiparallel (see paper)
70% identity, 45% coverage

Ligand: peptide (2bniC)

P87090 Cross-pathway control protein 1 from Cryphonectria parasitica
43% identity, 29% coverage

Transcriptome Analysis of Plenodomus tracheiphilus Infecting Rough Lemon (Citrus jambhiri Lush.) Indicates a Multifaceted Strategy during Host Pathogenesis.
Sicilia, Biology 2022
- “...( Leptosphaeria biglobosa ), SequenceID: KAH9877693.1 78% 0.0 2.660 4528.0 Cross-pathway control protein 1 SwissProt: P87090 ( Alternaria alternata ), Sequence ID: KAH8629237.1 67% 5 10 148 2.079 18021.0 Catechol 1,2-dioxygenase SwissProt: P86029 ( Alternaria panax ), Sequence ID: KAG9186190.1 94% 0.0 1.602 12526.0 Cell wall...”

UV8b_06218 uncharacterized protein from Ustilaginoidea virens
45% identity, 13% coverage

UvHOS3-mediated histone deacetylation is essential for virulence and negatively regulates ustilaginoidin biosynthesis in Ustilaginoidea virens
Wang, Molecular plant pathology 2024
- “...protein), UV8b_03424 (cytochrome P450), UV8b_08192 (rapamycin binding protein FKBP12), five transcription factors UV8b_00896, UV8b_01263, UV8b_03320, UV8b_06218 and UV8b_06275 and four downregulated genes encoding transcription factors UV8b_03203 (sporulation resulting in formation of a cellular spore), UV8b_04413 (carbon response regulator CreA), UV8b_04482 (nitrogen response regulator AreA) and UV8b_04588...”

XP_001906068 uncharacterized protein from Podospora anserina S mat+
48% identity, 12% coverage

Translation Initiation from Conserved Non-AUG Codons Provides Additional Layers of Regulation and Coding Capacity
Ivanov, mBio 2017
- “.... We note that some automated annotations of CPC1 homologs include this N-terminal extension (e.g., XP_001906068 , EGR46729 , and EKJ70155 ), but annotations do not resolve where initiation occurs. The presence of this feature in both Sordariomycetes and Eurotiomycetes suggests that it was present in...”

1favA / P03069,P03377 The structure of an HIV-1 specific cell entry inhibitor in complex with the HIV-1 gp41 trimeric core (see paper)
71% identity, 36% coverage

Ligand: peptide (1favA)

FFUJ_04122 probable cross-pathway control protein from Fusarium fujikuroi IMI 58289
48% identity, 18% coverage

Genome-Wide Identification and Functional Analysis of the bZIP Transcription Factor Family in Rice Bakanae Disease Pathogen, Fusarium fujikuroi
Zhao, International journal of molecular sciences 2022
- “...35397921 chrom02 2838516..2839721 383 1152 FfbZIP10 XP_023426709.1 FFUJ_04132 35397613 chrom02 3893261..3894237 277 834 FfbZIP11 XP_023426718.1 FFUJ_04122 35397603 chrom02 3926173..3927310 333 1002 FfbZIP12 XP_023427258.1 FFUJ_02097 35395580 chrom03 34849..35802 317 954 FfbZIP13 XP_023427633.1 FFUJ_02504 35395986 chrom03 1197966..1198797 259 780 FfbZIP14 XP_023427873.1 FFUJ_02765 35396247 chrom03 2066960..2068584 460 1383 FfbZIP15...”

FGSG_09286 hypothetical protein from Fusarium graminearum PH-1
52% identity, 23% coverage

Control of Fusarium Head Blight of Wheat with Bacillus velezensis E2 and Potential Mechanisms of Action
Ma, Journal of fungi (Basel, Switzerland) 2024
- “...CGCCAAAAGTGTTCTCGCC FGSG_10858 ALG-11 NC_026476.1 F: CCGACCCGAGAAGAACCATC R: CTCAGCCAGTCCAGAACCTC FGSG_06885 TCB1 NC_026477.1 F: TCAAGGGCGAGGATGGAC R: GGCAGGTCGGAACAGAAGTC FGSG_09286 cpc -1 NC_026477.1 F: GCCTTTTCCTCACCTGCTGT R: CCGACTTGCGACGGTTCA FGSG_04400 rhoA NC_026475.1 F: GGCGATGGTGCTTGTGGTAA R: GAGGGAGTCGGGAGAGTCAA Note: Accession number was from the NCBI database. jof-10-00390-t004_Table 4 Table 4 Effect of B. velezensis...”
A Phenome-Wide Association Study of the Effects of Fusarium graminearum Transcription Factors on Fusarium Graminearum Virus 1 Infection
Yu, Frontiers in microbiology 2021
- “...( Figure 3B ). Among TF deletion mutants in Group 2, FgV1-infected mutants including GzbZIP015 (FGSG_09286), GzC2H024 (FGSG_04083), and GzZC033 (FGSG_13652) produced fluffy but low density of aerial mycelia ( Figure 1 ) and also accumulated D-RNAs during FgV1 replication ( Figure 3C , right panel)....”
- “...0.4 5.34 2.5* Virus response FGSG_00324 32.6 2.05 1.3 1.79 1.0 MD, os Group 2 FGSG_09286 46.3 1.20 0.2 5.09 2.4 Virus response FGSG_03881 41.7 1.28 0.3 3.62 1.7* Virus response FGSG_04083 47.3 1.33 0.1 0.78 0.2 Virus response FGSG_08617 42.2 0.86 0.3 2.32 0.4 Virus...”
RNA-Seq Revealed Differences in Transcriptomes between 3ADON and 15ADON Populations of Fusarium graminearum In Vitro and In Planta
Puri, PloS one 2016
- “...60 ] using deletion mutation, five genes (FGSG_00764 and FGSG_01298 with C2H2 zinc finger domain, FGSG_09286 and FGSG_10142 with bZIP domain, and FGSG_09871 with bromo domain) up-regulated in the 3ADON population were involved in either virulence or DON biosynthesis or both. Thus, the 3ADON population may...”

1uo4B / P03069 Structure based engineering of internal molecular surfaces of four helix bundles (see paper)
74% identity, 42% coverage

Ligand: iodobenzene (1uo4B)

MYCTH_2315566 uncharacterized protein from Thermothelomyces thermophilus ATCC 42464
44% identity, 26% coverage

Transcriptomic and genetic analysis reveals a Zn2Cys6 transcription factor specifically required for conidiation in submerged cultures of Thermothelomyces thermophilus
Drescher, mBio 2025
- “...28 genes have a homolog in N. crassa . One gene, a homolog of NCU04050 (MYCTH_2315566), which encodes the Cross Pathway Control ( cpc-1 ) transcriptional regulator in N. crassa, showed elevated expression levels in the res1 mutant relative to the WT strain, particularly at the...”

MAC_02758 General control protein from Metarhizium acridum
50% identity, 10% coverage

Genome sequencing and comparative transcriptomics of the model entomopathogenic fungi Metarhizium anisopliae and M. acridum
Gao, PLoS genetics 2011
- “...in fungi; however, our transcriptome data shows that a putative bZIP transcription factor (MAA_02048 or MAC_02758) is highly expressed by each Metarhizium species coincident with up-regulation of protein kinase A (see below). The physiological role(s) of MAA_02048 are currently under investigation. Comparative transcriptome analysis Insect bioassays...”

MAA_02048 Basic-leucine zipper (bZIP) transcription factor from Metarhizium robertsii ARSEF 23
46% identity, 12% coverage

Basic leucine zipper (bZIP) domain transcription factor MBZ1 regulates cell wall integrity, spore adherence, and virulence in Metarhizium robertsii
Huang, The Journal of biological chemistry 2015
- “...1.00 0.21 0.25 0.095a 1.78 0.94 Subtilisin protease genes MAA_02048 MAA_05675 MAA_10246 MAA_10260 1.00 0.18 2.44 0.25 0 0 6.58 0.65b 26.13 2.36b 0.33 0.04b...”
Genome sequencing and comparative transcriptomics of the model entomopathogenic fungi Metarhizium anisopliae and M. acridum
Gao, PLoS genetics 2011
- “...been characterized in fungi; however, our transcriptome data shows that a putative bZIP transcription factor (MAA_02048 or MAC_02758) is highly expressed by each Metarhizium species coincident with up-regulation of protein kinase A (see below). The physiological role(s) of MAA_02048 are currently under investigation. Comparative transcriptome analysis...”

FVEG_03822 hypothetical protein from Fusarium verticillioides 7600
48% identity, 13% coverage

Careful with That Axe, Gene, Genome Perturbation after a PEG-Mediated Protoplast Transformation in Fusarium verticillioides
Scala, Toxins 2017
- “...results. Among the genes affected by genomic variations, we analyzed the relative expression of FVEG_03821, FVEG_03822, FVEG_13121, FVEG_13122, FVEG_13123, FVEG_07317 and FVEG_07318 ( Figure 5 ). Results indicated a profound alteration of gene expression in Fv_ lds1 D and Fv_ lds1 T strains at two and...”
- “...the WT strain. DIP1 (DIP-variation) produced a differential expression in both affected genes FVEG_03821 and FVEG_03822 (respectively, p < 0.05 and p < 0.001). Specifically, in Fv_ lds1 D strain, FVEG_03821 and FVEG_03822 expression is higher with respect to Fv10027_t1 ( p -value < 0.05). In...”

1unyB / P03069 Structure based engineering of internal molecular surfaces of four helix bundles (see paper)
73% identity, 41% coverage

Ligand: peptide (1unyB)

XP_019021532 uncharacterized protein from Saitoella complicata NRRL Y-17804
47% identity, 17% coverage

Gcn2 eIF2α kinase mediates combinatorial translational regulation through nucleotide motifs and uORFs in target mRNAs
Chikashige, Nucleic acids research 2020
- “...OLL24616 (CpcA); from Taphrinomycotina incertae sedis, Saitoella , XP_019023598 (Gcn2), XP_019024465 (Hri), XP_019027573 (5MP) and XP_019021532 (CpcA); and from Coprinopsis representing the subphylum Agaricomycotina class Agaricomycetes, XP_001828226 (Gcn2), XP_001830176 (5MP) and Cpc1 ( 57 ). (B-D) uORFs found in Hri ( B ), Gcn5 ( C...”

PTRG_00426, XP_001930759 cross-pathway control protein 1 from Pyrenophora tritici-repentis
50% identity, 19% coverage

The cross-pathway control system regulates production of the secondary metabolite toxin, sirodesmin PL, in the ascomycete, Leptosphaeria maculans
Elliott, BMC microbiology 2011
- “...hypothetical protein PTT_10495 P. teres f. teres 0-1 EFQ92415.1 4e -72 cross-pathway control protein 1 PTRG_00426 XP_001930759 1e- 70 P. tritici-repentis GTA9; dsp3; GU332624 209 bp upstream Zn(II)2Cys6-DNA binding predicted protein [ Aspergillus terreus NIH2624] XP_001209939 4e -38 hypothetical protein AN5274.2 XP_662878 4e- 34 A. nidulans...”
- “...protein PTT_10495 P. teres f. teres 0-1 EFQ92415.1 4e -72 cross-pathway control protein 1 PTRG_00426 XP_001930759 1e- 70 P. tritici-repentis GTA9; dsp3; GU332624 209 bp upstream Zn(II)2Cys6-DNA binding predicted protein [ Aspergillus terreus NIH2624] XP_001209939 4e -38 hypothetical protein AN5274.2 XP_662878 4e- 34 A. nidulans These...”

For advice on how to use these tools together, see Interactive tools for functional annotation of bacterial genomes.

Statistics

The PaperBLAST database links 798,070 different protein sequences to 1,261,478 scientific articles. Searches against EuropePMC were last performed on May 12 2025.

How It Works

PaperBLAST builds a database of protein sequences that are linked to scientific articles. These links come from automated text searches against the articles in EuropePMC and from manually-curated information from GeneRIF, UniProtKB/Swiss-Prot, BRENDA, CAZy (as made available by dbCAN), BioLiP, CharProtDB, MetaCyc, EcoCyc, TCDB, REBASE, the Fitness Browser, and a subset of the European Nucleotide Archive with the /experiment tag. Given this database and a protein sequence query, PaperBLAST uses protein-protein BLAST to find similar sequences with E < 0.001.
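
As a rough illustration of the E-value cutoff described above, the sketch below filters tabular BLAST results and keeps only hits with E < 0.001. It assumes the standard 12-column tabular layout produced by NCBI BLAST+ with `-outfmt 6`. The function name `filter_hits` and the sample rows are hypothetical and are not taken from PaperBLAST's own code.

```python
from typing import List, NamedTuple

E_VALUE_CUTOFF = 1e-3  # threshold stated in the text (E < 0.001)


class Hit(NamedTuple):
    subject: str
    percent_identity: float
    evalue: float


def filter_hits(tabular: str, cutoff: float = E_VALUE_CUTOFF) -> List[Hit]:
    """Parse BLAST+ '-outfmt 6' lines and keep hits with E below the cutoff."""
    hits = []
    for line in tabular.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        # Columns: qseqid sseqid pident length mismatch gapopen
        #          qstart qend sstart send evalue bitscore
        hit = Hit(
            subject=fields[1],
            percent_identity=float(fields[2]),
            evalue=float(fields[10]),
        )
        if hit.evalue < cutoff:
            hits.append(hit)
    return hits


# Made-up rows, for illustration only
SAMPLE = "\n".join([
    "query\thitA\t99.0\t70\t1\t0\t1\t70\t212\t281\t2e-40\t140",
    "query\thitB\t35.0\t40\t26\t0\t5\t44\t10\t49\t0.5\t28.1",
])
kept = filter_hits(SAMPLE)
```

Here `hitB` is discarded because its E-value (0.5) is above the cutoff, while `hitA` is retained.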

To build the database, we query EuropePMC with locus tags, with RefSeq protein identifiers, and with UniProt accessions. We obtain the locus tags from RefSeq or from MicrobesOnline. We use queries of the form "locus_tag AND genus_name" to try to ensure that the paper is actually discussing that gene. Because EuropePMC indexes most recent biomedical papers, even if they are not open access, some of the links may be to papers that you cannot read or that our computers cannot read. We query each of these identifiers that appears in the open access part of EuropePMC, as well as every locus tag that appears in the 500 most-referenced genomes, so that a gene may appear in the PaperBLAST results even though none of the papers that mention it are open access. We also incorporate text-mined links from EuropePMC that link open access articles to UniProt or RefSeq identifiers. (This yields some additional links because EuropePMC uses different heuristics for their text mining than we do.)
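
The query form "locus_tag AND genus_name" can be sketched as follows. This is a minimal example that only builds the query string and a search URL; it does not contact any server. The endpoint and the `query`, `format`, and `pageSize` parameters are assumed from the public Europe PMC REST service and should be checked against its documentation. The helper names are hypothetical, and whether PaperBLAST quotes or escapes the terms is not stated in the text.

```python
from urllib.parse import urlencode

# Assumed Europe PMC REST search endpoint; verify against the official docs.
SEARCH_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_query(locus_tag: str, genus: str) -> str:
    """Combine a locus tag with its genus name, as described in the text."""
    return f"{locus_tag} AND {genus}"


def build_search_url(locus_tag: str, genus: str, page_size: int = 25) -> str:
    """Return a search URL for the combined query (no request is sent)."""
    params = {
        "query": build_query(locus_tag, genus),
        "format": "json",
        "pageSize": page_size,
    }
    return f"{SEARCH_URL}?{urlencode(params)}"


url = build_search_url("FGSG_09286", "Fusarium")
```

Requiring the genus name alongside the locus tag narrows the results to papers that are likely to discuss that gene, rather than an unrelated use of the same string.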

For every article that mentions a locus tag, a RefSeq protein identifier, or a UniProt accession, we try to select one or two snippets of text that refer to the protein. If we cannot get access to the full text, we try to select a snippet from the abstract, but unfortunately, unique identifiers such as locus tags are rarely provided in abstracts.
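
One simple way to pick snippets is to take a window of words around each mention of the identifier and stop after two non-overlapping windows. The sketch below does this. The window size, the whole-token matching rule, and the function name are assumptions made for illustration; PaperBLAST's actual heuristics may differ.

```python
import re
from typing import List


def select_snippets(
    text: str,
    identifier: str,
    context_words: int = 10,
    max_snippets: int = 2,
) -> List[str]:
    """Return up to max_snippets word windows centred on the identifier."""
    words = text.split()
    # Match the identifier as a whole token so that FVEG_03822 does not
    # match inside FVEG_038220; trailing punctuation is still allowed.
    pattern = re.compile(r"(?<!\w)" + re.escape(identifier) + r"(?!\w)")
    snippets = []
    next_allowed = 0  # first word index not covered by the previous snippet
    for i, word in enumerate(words):
        if i < next_allowed or not pattern.search(word):
            continue
        start = max(0, i - context_words)
        end = min(len(words), i + context_words + 1)
        snippet = " ".join(words[start:end])
        if start > 0:
            snippet = "..." + snippet  # mark truncation on the left
        if end < len(words):
            snippet += "..."  # mark truncation on the right
        snippets.append(snippet)
        if len(snippets) == max_snippets:
            break
        next_allowed = end
    return snippets
```

For example, `select_snippets(full_text, "FVEG_03822")` would return at most two excerpts, each marked with "..." wherever the surrounding text was cut.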

PaperBLAST also incorporates manually-curated protein functions:

Proteins from NCBI's RefSeq are included if a GeneRIF entry links the gene to an article in PubMed®. GeneRIF also provides a short summary of the article's claim about the protein, which is shown instead of a snippet.
Proteins from Swiss-Prot (the curated part of UniProt) are included if the curators identified experimental evidence for the protein's function (evidence code ECO:0000269). For these proteins, the fields of the Swiss-Prot entry that describe the protein's function are shown (with bold headings).
Proteins from BRENDA, a curated database of enzymes, are included if they are linked to a paper in PubMed and their full sequence is known.
Every protein from the non-redundant subset of BioLiP, a database of ligand-binding sites and catalytic residues in protein structures, is included. Since BioLiP itself does not include descriptions of the proteins, those are taken from the Protein Data Bank. Descriptions from PDB rely on the original submitter of the structure and cannot be updated by others, so they may be less reliable. (For SitesBLAST and Sites on a Tree, we use a larger subset of BioLiP so that every ligand is represented among a group of structures with similar sequences, but for PaperBLAST, we use the non-redundant set provided by BioLiP.)
Every protein from EcoCyc, a curated database of the proteins in Escherichia coli K-12, is included, regardless of whether they are characterized or not.
Proteins from the MetaCyc metabolic pathway database are included if they are linked to a paper in PubMed and their full sequence is known.
Proteins from the Transport Classification Database (TCDB) are included if they have known substrate(s), have reference(s), and are not described as uncharacterized or putative. (Some of the references are not visible on the PaperBLAST web site.)
Every protein from CharProtDB, a database of experimentally characterized protein annotations, is included.
Proteins from the CAZy database of carbohydrate-active enzymes are included if they are associated with an Enzyme Classification number. Even though CAZy does not provide links from individual protein sequences to papers, these should all be experimentally-characterized proteins.
Proteins from the REBASE database of restriction enzymes are included if they have known specificity.
Every protein with an evidence-based reannotation (based on mutant phenotypes) in the Fitness Browser is included.
Sequence-specific transcription factors (including sigma factors and DNA-binding response regulators) from the PRODORIC database of gene regulation in prokaryotes are included if they have experimentally-determined DNA binding sites.
Putative transcription factors from RegPrecise are included if they have manually-curated predictions for their binding sites. These predictions are based on conserved putative regulatory sites across genomes that contain similar transcription factors, so PaperBLAST clusters the TFs at 70% identity and retains just one member of each cluster.
Coding sequence (CDS) features from the European Nucleotide Archive (ENA) are included if the /experiment tag is set (implying that there is experimental evidence for the annotation), the nucleotide entry links to paper(s) in PubMed, and the nucleotide entry is from the STD data class (implying that these are targeted annotated sequences, not from shotgun sequencing). Also, to filter out genes whose transcription or translation was detected, but whose function was not studied, nucleotide entries or papers with more than 25 such proteins are excluded. Descriptions from ENA rely on the original submitter of the sequence and cannot be updated by others, so they may be less reliable.
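
The ENA inclusion rules in the last item above can be written as a small filter. The sketch below is one reading of those rules, not PaperBLAST's implementation. The `CdsFeature` record and its field names are hypothetical. It also assumes that a feature is dropped entirely if its nucleotide entry or any of its linked papers has more than 25 qualifying proteins, which the text does not spell out.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple

MAX_PROTEINS_PER_SOURCE = 25  # limit stated in the text


@dataclass(frozen=True)
class CdsFeature:
    protein_id: str
    entry_id: str               # nucleotide entry that contains the CDS
    data_class: str             # ENA data class, e.g. "STD"
    has_experiment_tag: bool    # True if the /experiment tag is set
    pubmed_ids: Tuple[str, ...]  # papers linked from the nucleotide entry


def select_ena_features(features: List[CdsFeature]) -> List[CdsFeature]:
    """Apply the per-feature rules, then drop over-populated entries/papers."""
    candidates = [
        f for f in features
        if f.has_experiment_tag and f.pubmed_ids and f.data_class == "STD"
    ]
    # Count qualifying proteins per nucleotide entry and per paper.
    per_entry = Counter(f.entry_id for f in candidates)
    per_paper = Counter(p for f in candidates for p in set(f.pubmed_ids))
    return [
        f for f in candidates
        if per_entry[f.entry_id] <= MAX_PROTEINS_PER_SOURCE
        and all(per_paper[p] <= MAX_PROTEINS_PER_SOURCE for p in f.pubmed_ids)
    ]
```

The counting step runs only over features that already passed the per-feature checks, which matches the phrase "more than 25 such proteins" in the text.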

Except for GeneRIF and ENA, the curated entries include a short curated description of the protein's function. For entries from BioLiP, the protein's function may not be known beyond binding to the ligand. Many of these entries also link to articles in PubMed.

For more information see the PaperBLAST paper (mSystems 2017) or the code. You can download PaperBLAST's database here.

Changes to PaperBLAST since the paper was written:

November 2023: incorporated PRODORIC and RegPrecise. Many PRODORIC entries were not linked to a protein sequence (no UniProt identifier), so we added this information.
February 2023: BioLiP changed their download format. PaperBLAST now includes their non-redundant subset. SitesBLAST and Sites on a Tree use a larger non-redundant subset that ensures that every ligand is represented within each cluster. This should ensure that every binding site is represented.
June 2022: incorporated some coding sequences from ENA with the /experiment tag.
March 2022: incorporated BioLiP.
April 2020: incorporated TCDB.
April 2019: EuropePMC now returns table entries in their search results. This has expanded PaperBLAST's database, but most of the new entries are of low relevance, and the resulting snippets are often just lists of locus tags with annotations.
February 2018: the alignment page reports the conservation of the hit's functional sites (if available from Swiss-Prot or UniProt).
January 2018: incorporated BRENDA.
December 2017: incorporated MetaCyc, CharProtDB, CAZy, REBASE, and the reannotations from the Fitness Browser.
September 2017: EuropePMC no longer returns some table entries in their search results. This has shrunk PaperBLAST's database, but has also reduced the number of low-relevance hits.

Many of these changes are described in Interactive tools for functional annotation of bacterial genomes.

Secrets

PaperBLAST cannot provide snippets for many of the papers that are published in non-open-access journals. This limitation applies even if the paper is marked as "free" on the publisher's web site and is available in PubMed Central or EuropePMC. If a journal that you publish in is marked as "secret," please consider publishing elsewhere.

Omissions from the PaperBLAST Database

Many important articles are missing from PaperBLAST, either because the article's full text is not in EuropePMC (as for many older articles), or because the paper does not mention a protein identifier such as a locus tag, or because of PaperBLAST's heuristics. If you notice an article that characterizes a protein's function but is missing from PaperBLAST, please notify the curators at UniProt or add an entry to GeneRIF. Entries in either of these databases will eventually be incorporated into PaperBLAST. Note that to add an entry to UniProt, you will need to find the UniProt identifier for the protein. If the protein is not already in UniProt, you can ask them to create an entry. To add an entry to GeneRIF, you will need an NCBI Gene identifier, but unfortunately many prokaryotic proteins in RefSeq do not have corresponding Gene identifiers.

References

PaperBLAST: Text-mining papers for information about homologs.
M. N. Price and A. P. Arkin (2017). mSystems, 10.1128/mSystems.00039-17.

Europe PMC in 2017.
M. Levchenko et al (2017). Nucleic Acids Research, 10.1093/nar/gkx1005.

Gene indexing: characterization and analysis of NLM's GeneRIFs.
J. A. Mitchell et al (2003). AMIA Annu Symp Proc 2003:460-464.

UniProt: the universal protein knowledgebase.
The UniProt Consortium (2016). Nucleic Acids Research, 10.1093/nar/gkw1099.

BRENDA in 2017: new perspectives and new tools in BRENDA.
S. Placzek et al (2017). Nucleic Acids Research, 10.1093/nar/gkw952.

The EcoCyc database: reflecting new knowledge about Escherichia coli K-12.
I. M. Keseler et al (2016). Nucleic Acids Research, 10.1093/nar/gkw1003.

The MetaCyc database of metabolic pathways and enzymes.
R. Caspi et al (2018). Nucleic Acids Research, 10.1093/nar/gkx935.

CharProtDB: a database of experimentally characterized protein annotations.
R. Madupu et al (2012). Nucleic Acids Research, 10.1093/nar/gkr1133.

The carbohydrate-active enzymes database (CAZy) in 2013.
V. Lombard et al (2014). Nucleic Acids Research, 10.1093/nar/gkt1178.

The Transporter Classification Database (TCDB): recent advances.
M. H. Saier, Jr. et al (2016). Nucleic Acids Research, 10.1093/nar/gkv1103.

REBASE - a database for DNA restriction and modification: enzymes, genes and genomes.
R. J. Roberts et al (2015). Nucleic Acids Research, 10.1093/nar/gku1046.

Deep annotation of protein function across diverse bacteria from mutant phenotypes.
M. N. Price et al (2016). bioRxiv, 10.1101/072470.

by Morgan Price, Arkin group
Lawrence Berkeley National Laboratory