PaperBLAST
PaperBLAST Hits for 59 a.a. (RGHRFTKENV...)
>59 a.a. (RGHRFTKENV...)
RGHRFTKENVRILESWFAKNIENPYLDTKGLENLMKNTSLSRIQIKNWVSNRRRKEKTI
Found 25 similar proteins in the literature:
MTAL2_YEAST / P0CY08 Mating-type protein ALPHA2; MATalpha2 protein; Alpha-2 repressor from Saccharomyces cerevisiae (strain ATCC 204508 / S288c) (Baker's yeast) (see 10 papers)
NP_009866, YCL067C Hmlalpha2p from Saccharomyces cerevisiae
NP_009868 homeodomain mating type protein alpha2 from Saccharomyces cerevisiae S288C
YCR039C Homeobox-domain protein that, with Mcm1p, represses a-specific genes in haploids; acts with A1p to repress transcription of haploid-specific genes in diploids; one of two genes encoded by the MATalpha mating type cassette from Saccharomyces cerevisiae
100% identity, 28% coverage
- function: Mating type proteins are sequence specific DNA-binding proteins that act as master switches in yeast differentiation by controlling gene expression in a cell type-specific fashion. Transcriptional corepressor that binds cooperatively with MCM1 to a 31- basepair DNA sequence termed the a-specific gene (asg) operator, to repress the transcription of a-cell-specific genes. Additionally, in a/alpha diploid cells, binds cooperatively with the A1 protein to a 21- basepair DNA sequence termed the haploid-specific gene (hsg) operator, to repress transcription of haploid-specific genes and of MATALPHA1.
subunit: Binds DNA with a high specificity as a heterotetramer consisting of an ALPHA2 dimer and an MCM1 dimer. Also binds DNA with a high specificity as a heterodimer of A1 and ALPHA2 in a/alpha diploid cells. Interacts with the general transcription repressor complex SSN6/TUP1.
- Laboratory Evolution of a Saccharomyces cerevisiae × S. eubayanus Hybrid Under Simulated Lager-Brewing Conditions
Gorter, Frontiers in genetics 2019 - “...Sc :: Se (YKL057C-YKR end ) IMS0549 + IMS0550 + IMS0551 + Sc (YCL end -YCL067C), Sc (YCR039C-YCR end ) IMS0552 + Sc :: Se (YHL end -YHL023C) 228 IMS0604 + + Sc (YKL032C-YKL054C), Sc :: Se (YLR305C-YLR end ) SeBET2 G550A IMS0605 + + Sc...”
- “...(YKL032C-YKL054C) occurred in 13 strains, Sc (YDR261C-YDR211W) occurred in two strains, and Sc (YCL end -YCL067C) and Sc (YCR039C-YCR end ) occurred together in one strain. The internal recombinations Sc (YKL032C-YKL054C) and Sc (YDR261C-YDR211W) both resulted in loss of the sequence between the recombination sites. The...”
- Genome sequence of the highly weak-acid-tolerant Zygosaccharomyces bailii IST302, amenable to genetic manipulations and physiological studies
Palma, FEMS yeast research 2017 - “.../ YCR096C) Silenced copy of a1/a2 at HMR - - ZYRO0C18348g HMLALPHA1 / 2 (YCL066W/ YCL067C) Silenced copy of ALPHA1/2 at HML - - ZYRO0F15818g MATALPHA1 (YCR040W) Transcriptional co-activator involved in regulation of mating type specific gene expression ZBIST_5098 BN860_00122g_l ZYRO0F15840g MATALPHA2 (YCR039C) Transcriptional repressor of...”
- Three distinct mechanisms of long-distance modulation of gene expression in yeast
Du, PLoS genetics 2017 - “...kud MET3pr and S . cer MET3pr integrated at three different locations: ECM18 (profile 1), YCL067C (profile 2), and TDH3 (profile 3). The data were normalized to the ECM18 GFP intensity. Note that the two promoters show similar activation kinetics and steady state levels. C) PCR...”
- Sex-determination system in the diploid yeast Zygosaccharomyces sapae
Solieri, G3 (Bethesda, Md.) 2014 - “...orthologous MAT2 annotated in Z. rouxii (ZrMAT2; GenBank: XP_0024978881) and S. cerevisiae genomes (ScMAT2; GenBank: NP_009866). The S. cerevisiae DNA binding homeodomain of MAT2 (Pfam PF00046) consisting in three three-helix globular domains that contact major groove bases and the DNA backbone are indicated by horizontal black...”
- NLStradamus: a simple Hidden Markov Model for nuclear localization signal prediction
Nguyen, BMC bioinformatics 2009 - “...YBL105C PKC1 810 813 15643058 YBR098W MMS4 244 263 14642571 YCL017C NFS1 312 316 11110795 YCL067C HMLALPHA2 1 13 1976249 YCL067C HMLALPHA2 141 159 1976249 YCR039C MATALPHA2 2 13 8757785 YCR039C MATALPHA2 141 159 8757785 YDR034C LYS14 190 250 10975256 YEL032W MCM3 766 772 16093348 YEL061C...”
- Variation in gene duplicates with low synonymous divergence in Saccharomyces cerevisiae relative to Caenorhabditis elegans
Katju, Genome biology 2009 - “...YHR056C 0.0000 Partial 3 YCL065W YCR041W - 0.0019 Chimeric III/III 2,509 YCL066W YCR040W 0.0000 Complete YCL067C YCR039C 0.0000 Complete YCL068C YCR038C 0.0058 Chimeric 4 YNL033W YNL019C 0.0000 0.0077 Complete XIV/XIV 4,247 YNL034W YNL018C 0.0450 Complete 5 YAR073W/75W YHR216W 0.1074 0.0087 Complete I/VIII 7,445 YAR071W YHR215W 0.0069...”
- Ionizing radiation and restriction enzymes induce microhomology-mediated illegitimate recombination in Saccharomyces cerevisiae
Chan, Nucleic acids research 2007 - “...genomic location, locus and gene of the target sites are as follows: KC15, Chr. III, YCL067C, HMLALPHA2 ; KC16, Chr. IX, YIL137C, RBF108 ; KC17, Chr. XI, YKL197C, PEX1 ; KC18, Chr. II, 21590, intergenic region; KC19, Chr. XIV, YNL298W, CLA4 ; KC20, Chr. II, YBL104C,...”
- Global chromatin structure of 45,000 base pairs of chromosome III in a- and alpha-cell yeast and during mating-type switching
Ercan, Molecular and cellular biology 2004 - “...chromosome III Locus description YCL069W ARS301 YCL068C YCL067C YCL066W YCL065W ARS302 ARS303 ARS320 YCL064C YCL063W YCL061C YCL059C YCL058W-A YCL058C YCL057C-A...”
- DNA binding by the MATα2 transcription factor controls its access to alternative ubiquitin-modification pathways.
Hickey, Molecular biology of the cell 2018 - GeneRIF: MATalpha2 (alpha2) mutants with impaired DNA binding become inaccessible to the Slx5/Slx8 pathway but are still rapidly degraded through efficient shunting to the Doa10 pathway.
- STUbL-mediated degradation of the transcription factor MATα2 requires degradation elements that coincide with corepressor binding sites.
Hickey, Molecular biology of the cell 2015 - GeneRIF: Authors propose that competitive binding to MATalpha2 by the ubiquitylation machinery and alpha2 cofactors is balanced so that alpha2 can function in transcription repression yet be short lived enough to allow cell-type switching.
- The short-lived Matalpha2 transcriptional repressor is protected from degradation in vivo by interactions with its corepressors Tup1 and Ssn6.
Laney, Molecular and cellular biology 2006 - GeneRIF: Matalpha2 corepressors Tup1 and Ssn6 modify the in vivo degradation rate of Matalpha2.
- Repression of the yeast HO gene by the MATalpha2 and MATa1 homeodomain proteins.
Mathias, Nucleic acids research 2004 - GeneRIF: Analysis of MATalpha2 binding sites on the HO promotor that mediate repression of the HO gene.
- A general strategy to construct small molecule biosensors in eukaryotes
Feng, eLife 2015 - “...Clones containing an N-terminal degron were similarly cloned fusing residues 167 of Mat2 (UniProt ID P0CY08) to the 5- end of G-DIG-V. Plasmids were transformed into yeast using the Gietz method( Gietz and Schiestl, 2007 ), with transformants being plated on synthetic complete media lacking uracil...”
- Ancestral Sequence Reconstruction as a Tool to Detect and Study De Novo Gene Emergence
Vakirlis, Genome biology and evolution 2024 - “...For the remaining 1,076 ORFs, different ASR methodological variations gave at least partly conflicting estimates (YCR039C, YJL077W-B, and YOR202W were removed from the analysis due to missing sequences in at least one of the species, which led to failure of some ASR tools). Two examples of...”
- Unlocking the genome of the non-sourdough Kazachstania humilis MAW1: insights into inhibitory factors and phenotypic properties
Mielecki, Microbial cell factories 2024 - “...locus genes and their orthologues from Saccharomyces cerevisiae S288C, being MATALPHA1 (locus tag: YCR040W), MATALPHA2 (YCR039C), HMRA1 (YCR097W), and HMRA2 (YCR096C), from Kazachstania naganishii CBS8797, being MATALPHA1 (KNAG_0C00150), MATALPHA2 (KNAG_0C00160), and MATA1 (KNAG_0C00795), and Kazachstania africana CBS 2517, being MATALPHA1 (KAFR_0D00710), MATALPHA2 (KAFR_0D00720), and MATA1 (KAFR_0G00180)....”
- Laboratory Evolution of a Saccharomyces cerevisiae × S. eubayanus Hybrid Under Simulated Lager-Brewing Conditions
Gorter, Frontiers in genetics 2019 - “...a previously-observed circularization of chromosome III by a recombination between the HMLALPHA2 (YCL067C) and MATALPHA2 (YCR039C) loci, leading to loss of both chromosome extremities (Newlon et al., 1991 ). Figure 2 Total number of occurrences of whole-chromosome (A) and segmental (B) aneuploidy for each chromosome of...”
- Genome sequence of the highly weak-acid-tolerant Zygosaccharomyces bailii IST302, amenable to genetic manipulations and physiological studies
Palma, FEMS yeast research 2017 - “...Transcriptional co-activator involved in regulation of mating type specific gene expression ZBIST_5098 BN860_00122g_l ZYRO0F15840g MATALPHA2 (YCR039C) Transcriptional repressor of a-specific genes in haploids - - - MATA1 Homeodomain protein involved in transcriptional regulation of mating type specific genes - - - MATA2 Protein of unknown function;...”
- Genomes of Ashbya fungi isolated from insects reveal four mating-type loci, numerous translocations, lack of transposons, and distinct gene duplications
Dietrich, G3 (Bethesda, Md.) 2013 - “...found in the reference strain ATCC10895. These genes are orthologs of MAT1 (YCR040W) and MAT2 (YCR039C) genes of S. cerevisiae and map at the right subtelomeric region of chromosome VI, which harbors in the reference strain the originally overlooked fourth MATa copy ( Figure 5 )....”
- NLStradamus: a simple Hidden Markov Model for nuclear localization signal prediction
Nguyen, BMC bioinformatics 2009 - “...YCL017C NFS1 312 316 11110795 YCL067C HMLALPHA2 1 13 1976249 YCL067C HMLALPHA2 141 159 1976249 YCR039C MATALPHA2 2 13 8757785 YCR039C MATALPHA2 141 159 8757785 YDR034C LYS14 190 250 10975256 YEL032W MCM3 766 772 16093348 YEL061C CIN8 994 1000 11694576 YGL103W RPL28 24 30 2104804 YGL103W...”
- Variation in gene duplicates with low synonymous divergence in Saccharomyces cerevisiae relative to Caenorhabditis elegans
Katju, Genome biology 2009 - “...0.0000 Partial 3 YCL065W YCR041W - 0.0019 Chimeric III/III 2,509 YCL066W YCR040W 0.0000 Complete YCL067C YCR039C 0.0000 Complete YCL068C YCR038C 0.0058 Chimeric 4 YNL033W YNL019C 0.0000 0.0077 Complete XIV/XIV 4,247 YNL034W YNL018C 0.0450 Complete 5 YAR073W/75W YHR216W 0.1074 0.0087 Complete I/VIII 7,445 YAR071W YHR215W 0.0069 Complete...”
- Population genomics of the wild yeast Saccharomyces paradoxus: Quantifying the life cycle
Tsai, Proceedings of the National Academy of Sciences of the United States of America 2008 - “...significant homology to HML and includes 800 bp to the left of YCR039C and 190 bp to the right of YCR040W. 0.7 Far East 0.6 0.5 0.4 0.3 0.2 0.1 0 0 50 100 150...”
MATA2_YEASX / P0CY12 Putative mating-type protein A2; MATa2 protein from Saccharomyces cerevisiae (Baker's yeast) (see paper)
YCR096C Hmra2p from Saccharomyces cerevisiae
100% identity, 50% coverage
- function: Probably not a functional protein. Cells lacking A2 show no obvious alterations in mating, sporulation and cell growth.
- Unlocking the genome of the non-sourdough Kazachstania humilis MAW1: insights into inhibitory factors and phenotypic properties
Mielecki, Microbial cell factories 2024 - “...from Saccharomyces cerevisiae S288C, being MATALPHA1 (locus tag: YCR040W), MATALPHA2 (YCR039C), HMRA1 (YCR097W), and HMRA2 (YCR096C), from Kazachstania naganishii CBS8797, being MATALPHA1 (KNAG_0C00150), MATALPHA2 (KNAG_0C00160), and MATA1 (KNAG_0C00795), and Kazachstania africana CBS 2517, being MATALPHA1 (KAFR_0D00710), MATALPHA2 (KAFR_0D00720), and MATA1 (KAFR_0G00180). Additionally, the homologues of genes...”
- Genome sequence of the highly weak-acid-tolerant Zygosaccharomyces bailii IST302, amenable to genetic manipulations and physiological studies
Palma, FEMS yeast research 2017 - “...MFA1/2 (YDR461W / YNL145W) Mating pheromone a-factor ZBIST_2952 - - HMRA1 / 2 (YCR097W / YCR096C) Silenced copy of a1/a2 at HMR - - ZYRO0C18348g HMLALPHA1 / 2 (YCL066W/ YCL067C) Silenced copy of ALPHA1/2 at HML - - ZYRO0F15818g MATALPHA1 (YCR040W) Transcriptional co-activator involved in regulation...”
- A DNA microarray-based approach to elucidate the effects of the immunosuppressant SR31747A on gene expression in Saccharomyces cerevisiae
Cinato, Gene expression 2002 - “...SST2 (YLR452C) ALPHA1 (YCR040C) KAR4 (YCL055W) A2 (YCR096C) SAG1 (YJR004C) RNA processing STP4 (YDL048C) TAD3 (YLR316C) SEN2 (YLR105C) PRP19 (YLL036C) Cell...”
- Yeast Upf proteins required for RNA surveillance affect global expression of the yeast transcriptome
Lelivelt, Molecular and cellular biology 1999 - “...YOL165C YMR320W YKR012C YMR065W YMR316C-b YLL067c YPL144W YCR096c YFL057C YNL270C YKL071W YIR031C YER180C YFL020C YER076c YER187w YER188w YFL061W YLL060c...”
1akhB / P0CY08 Mat a1/alpha2/DNA ternary complex (see paper)
100% identity, 76% coverage
1mnmC / P0CY08 Yeast matalpha2/mcm1/DNA ternary transcription complex crystal structure (see paper)
100% identity, 75% coverage
MTAL2_CANGA / Q86Z42 Mating-type-like protein ALPHA2; MTL1alpha2 protein from Candida glabrata (strain ATCC 2001 / BCRC 20586 / JCM 3761 / NBRC 0622 / NRRL Y-65 / CBS 138) (Yeast) (Nakaseomyces glabratus) (see paper)
63% identity, 32% coverage
- function: Mating type proteins are sequence specific DNA-binding proteins that act as master switches in yeast differentiation by controlling gene expression in a cell type-specific fashion
KAFR_0D00720 homeodomain mating type protein alpha2 from Kazachstania africana CBS 2517
61% identity, 29% coverage
- Unlocking the genome of the non-sourdough Kazachstania humilis MAW1: insights into inhibitory factors and phenotypic properties
Mielecki, Microbial cell factories 2024 - “...(KNAG_0C00150), MATALPHA2 (KNAG_0C00160), and MATA1 (KNAG_0C00795), and Kazachstania africana CBS 2517, being MATALPHA1 (KAFR_0D00710), MATALPHA2 (KAFR_0D00720), and MATA1 (KAFR_0G00180). Additionally, the homologues of genes flanking MATA1, MATALPHA1, or MATALPHA2 in K. humilis YMX004033 and K. humilis MAW1 assemblies, as well as all main genes connected with...”
VDAG_07897 uncharacterized protein from Verticillium dahliae VdLs.17
44% identity, 7% coverage
- Transcription factors containing both C2H2 and homeobox domains play different roles in Verticillium dahliae
Tang, mSphere 2024 - “...factors in the genome of VdLs.17 are VDAG_00465, VDAG_02532, VDAG_02889, VDAG_04660, VDAG_04837, VDAG_04891, and VDAG_07897, named VdChtf1 to VdChtf7, respectively (Fig. S1A and B). To further investigate their functions, we knocked them out using Agrobacterium-mediated genetic transformation (23) and confirmed the knockout by...”
TRIATDRAFT_288678 uncharacterized protein from Trichoderma atroviride
43% identity, 6% coverage
- A comprehensive transcription factor and DNA-binding motif resource for the construction of gene regulatory networks in Botrytis cinerea and Trichoderma atroviride
Olivares-Yañez, Computational and structural biotechnology journal 2021 - “...& TF domain Fungi TRIATDRAFT_31689; 1323 Zinc Cluster and fungal specific TF domain Bcin12g03330; 3295 TRIATDRAFT_288678; 4128 Bcin13g00670; 1934 Zn_Cluster & TF domain Fungi TRIATDRAFT_42504; 1305 Zn_Cluster domain TF Bcin13g05200; 3033 TRIATDRAFT_163506; 4099 Bcin02g08650; 1892 Bcskn7 Response regulator TF TRIATDRAFT_22050; 1301 Zn_Cluster & TF domain Fungi...”
- “...1300 C2H2 domain TF Bcin11g02190; 3048 TRIATDRAFT_295974; 3917 Bcin12g01230; 1801 Zn_Cluster & TF domain Fungi TRIATDRAFT_288678; 1293 Homeobox domain and Zinc finger C2H2-type To characterize the topology of the reference GRNs, their properties were analyzed. Both reference networks comprised a single connected component, indicating that there...”
MTAL2_CANAL / Q9UW22 Mating-type-like protein ALPHA2; MTLalpha2 protein from Candida albicans (strain SC5314 / ATCC MYA-2876) (Yeast) (see 3 papers)
MTLALPHA2 / GB|AAD51408.1 mating-type-like protein ALPHA2 (MTLalpha2 protein) from Candida albicans (see 6 papers)
46% identity, 31% coverage
- function: Mating type proteins are sequence specific DNA-binding proteins that act as master switches in yeast differentiation by controlling gene expression in a cell type-specific fashion. Transcriptional corepressor that acts in conjunction with A1 to repress transcription both of homozygote-specific genes and of genes necessary for the white-opaque switch, a prerequisite for mating.
subunit: Forms a heterodimer with A1.
- CharProtDB CGD description: A1p and Alpha2p together repress white-opaque switching and mating (an opaque-specific process); homeodomain; gene of MTLalpha (Mating Type Like) locus; a/alpha mating type may increase virulence, provides competitive advantage
VDAG_04837 uncharacterized protein from Verticillium dahliae VdLs.17
40% identity, 7% coverage
- Transcription factors containing both C2H2 and homeobox domains play different roles in Verticillium dahliae
Tang, mSphere 2024 - “...2-homeobox transcription factors in the genome of VdLs.17 are VDAG_00465, VDAG_02532, VDAG_02889, VDAG_04660, VDAG_04837, VDAG_04891, and VDAG_07897, named VdChtf1 to VdChtf7, respectively (Fig. S1A and B). To further investigate their functions, we knocked them out using Agrobacterium-mediated genetic transformation (23) and confirmed...”
FGSG_07909 hypothetical protein from Fusarium graminearum PH-1
48% identity, 7% coverage
- A comparative hidden Markov model analysis pipeline identifies proteins characteristic of cereal-infecting fungi
Sperschneider, BMC genomics 2013 - “...PF00168) Endocytosis, exocytosis, synaptotagmin-1 (100%/54%) 52.1 No 41% Metazoan, 29% Viridiplantae, 22% Ascomycota, 8% others FGSG_07909 0.62 Homeobox KN domain (2.3e-15, PF05920) Homeobox domain (99.7%/11%) 84.6 Yes 56% Metazoan, 29% Viridiplantae, 12% Ascomycota, 3% others FGSG_07846 0.61 FMO-like (5.8e-16, PF00734) Monooxygenase (100%/75%) 62.6 No 37% Bacteria,...”
- “...domain (3.5e-05, PF04082) Centromere DNA-binding protein complex cbf3 (98.2%/72%) 62.6 No 94% Ascomycota, 6% Basidiomycota FGSG_07909 0.48 Homeobox KN domain (2.3e-15, PF05920) Homeobox domain (99.7%/11%) 84.6 Yes 56% Metazoan, 29% Viridiplantae, 12% Ascomycota, 3% others For each protein, its Pfam annotation, Phyre2 structure prediction, molecular weight...”
G2WY50 quercetin 2,3-dioxygenase (EC 1.13.11.24) from Verticillium dahliae (see paper)
VDAG_02532 cupin domain-containing protein from Verticillium dahliae VdLs.17
37% identity, 5% coverage
- Transcription factors containing both C2H2 and homeobox domains play different roles in Verticillium dahliae
Tang, mSphere 2024 - “...C2H2-homeobox transcription factors in the genome of VdLs.17 are VDAG_00465, VDAG_02532, VDAG_02889, VDAG_04660, VDAG_04837, VDAG_04891, and VDAG_07897, named VdChtf1 to VdChtf7, respectively (Fig. S1A and B). To further investigate their functions, we knocked them out using Agrobacterium-mediated genetic transformation (23...”
FGSG_09043 hypothetical protein from Fusarium graminearum PH-1
41% identity, 11% coverage
- Genomic clustering and co-regulation of transcriptional networks in the pathogenic fungus Fusarium graminearum
Lawler, BMC systems biology 2013 - “...FG6 FGSG_02814 HLH 4 FG1, FG2.11, FG6, FG6 FGSG_05567 HLH 4 FG1, FG2.11, FG6, FG6 FGSG_09043 Homeobox/zf-C2H2 1 FG2.11 FGSG_06359 HSF_DNA-bind 1 FG6 FGSG_13911 Myb_DNA-binding 1 FG2.01 FGSG_01298 zf-C2H2 1 FG6 FGSG_01341 zf-C2H2 2 FG1, FG2.10 FGSG_01350 zf-C2H2 2 FG1, FG2.10 FGSG_02743 zf-C2H2 2 FG1, FG2.10...”
TRIATDRAFT_161626 uncharacterized protein from Trichoderma atroviride
41% identity, 5% coverage
VDAG_00465 uncharacterized protein from Verticillium dahliae VdLs.17
39% identity, 5% coverage
- Transcription factors containing both C2H2 and homeobox domains play different roles in Verticillium dahliae
Tang, mSphere 2024 - “...seven C2H2-homeobox transcription factors in the genome of VdLs.17 are VDAG_00465, VDAG_02532, VDAG_02889, VDAG_04660, VDAG_04837, VDAG_04891, and VDAG_07897, named VdChtf1 to VdChtf7, respectively (Fig. S1A and B). To further investigate their functions, we knocked them out using Agrobacterium-mediated genetic transformation (...”
SS1G_03098 hypothetical protein from Sclerotinia sclerotiorum 1980 UF-70
42% identity, 4% coverage
- Changes in the Sclerotinia sclerotiorum transcriptome during infection of Brassica napus
Seifbarghi, BMC genomics 2017 - “...- SS1G_06124 mads-box mef2 type transcription factor (SRF type) - - - - - 3.5 SS1G_03098 homeobox transcription factor - - - - - 2.2 SS1G_03835 homeobox C2H2 transcription factor - - - - - 3.1 SS1G_06987 yippee zinc-binding protein - - - - - 2.4...”
- A cupin domain-containing protein with a quercetinase activity (VdQase) regulates Verticillium dahliae's pathogenicity and contributes to counteracting host defenses
El, Frontiers in plant science 2015 - “...FG09047.1 unnamed protein product hypothetical protein FG09043.1 conserved hypothetical protein conserved hypothetical protein hypothetical protein SS1G_03098 homeobox C2H2 transcription factor, putative homeobox and C2H2 transcription factor hypothetical protein CHGG_04773 hypothetical protein SNOG_06363 hypothetical protein BC1G_06341 homeobox C2H2 transcription factor, putative Pc06g01320, similar to copper homeostasis Pc22g06630,...”
- “...FG09047.1 hypothetical protein SNOG_10151 hypothetical protein CHGG_04773 hypothetical protein MGG_01730 hypothetical protein BC1G_06341 hypothetical protein SS1G_03098 Gibberella zeae PH-1 Gibberella zeae PH-1 Gibberella zeae PH-1 Neurospora crassa OR74A Pyrenophora tritici-repentis Pt-1C-BFP Gibberella zeae PH-1 Phaeosphaeria nodorum SN15 Chaetomium globosum CBS 148.51 Magnaporthe grisea 70-15 Botryotinia fuckeliana...”
BC1G_06341 Bchox2 from Botrytis cinerea B05.10
40% identity, 4% coverage
- A cupin domain-containing protein with a quercetinase activity (VdQase) regulates Verticillium dahliae's pathogenicity and contributes to counteracting host defenses
El, Frontiers in plant science 2015 - “...factor, putative homeobox and C2H2 transcription factor hypothetical protein CHGG_04773 hypothetical protein SNOG_06363 hypothetical protein BC1G_06341 homeobox C2H2 transcription factor, putative Pc06g01320, similar to copper homeostasis Pc22g06630, DNA binding domain hypothetical protein AN9328.2 hypothetical protein FG07909.1 homeobox C2H2 transcription factor, putative AhpA, protection against organic peroxides...”
- “...protein hypothetical protein FG09047.1 hypothetical protein SNOG_10151 hypothetical protein CHGG_04773 hypothetical protein MGG_01730 hypothetical protein BC1G_06341 hypothetical protein SS1G_03098 Gibberella zeae PH-1 Gibberella zeae PH-1 Gibberella zeae PH-1 Neurospora crassa OR74A Pyrenophora tritici-repentis Pt-1C-BFP Gibberella zeae PH-1 Phaeosphaeria nodorum SN15 Chaetomium globosum CBS 148.51 Magnaporthe grisea...”
Pc06g01320 uncharacterized protein from Penicillium rubens
39% identity, 7% coverage
AFUA_1G15550 homeobox and C2H2 transcription factor, putative from Aspergillus fumigatus Af293
39% identity, 6% coverage
- Transcriptome analysis of cyclic AMP-dependent protein kinase A-regulated genes reveals the production of the novel natural compound fumipyrrole by Aspergillus fumigatus
Macheleidt, Molecular microbiology 2015 - “...transcription factor (Azf1) 1.64 AFUA_2G10850 C6 finger domain protein 1.67 AFUA_4G01010 C6 transcription factor 1.69 AFUA_1G15550 Homeobox and C2H2 transcription factor 1.76 AFUA_2G12330 Zn cluster transcription factor AcuM 1.81 AFUA_6G01840 C6 transcription factor 1.82 AFUA_2G03020 MYB DNA-binding domain protein 1.83 AFUA_4G11480 C2H2 finger domain protein 2.02...”
- Regulation of sulphur assimilation is essential for virulence and affects iron homeostasis of the human-pathogenic mould Aspergillus fumigatus
Amich, PLoS pathogens 2013 - “...2.169 8,6E-08 AFUA_8G01090 thioredoxin, putative 2.241 0,00037 AFUA_1G02270 ARS binding protein Abp2, putative 2.696 2,0E-07 AFUA_1G15550 homeobox and C2H2 transcription factor, putative 2.697 1,7E-10 Resistance proteins and transporters (p-value=0.00036) AFUA_2G15130 ABC multidrug transporter, putative 1.569 0,00294 AFUA_1G10370 MFS multidrug transporter, putative 1.620 0,00121 AFUA_3G08530 MFS drug...”
NCU05257 homeobox and C2H2 transcription factor from Neurospora crassa OR74A
35% identity, 4% coverage
- Regulatory functions of homeobox domain transcription factors in fungi
Calvo, Applied and environmental microbiology 2024 (secret)
- mus-52 disruption and metabolic regulation in Neurospora crassa: Transcriptional responses to extracellular phosphate availability
Martins, PloS one 2018 - “...factor protein (NCU02142) coding gene, which was also down-regulated in the high-Pi condition. Although the NCU05257 gene displayed a fold change value (log2 = 1.47) lower than our expression threshold, it was nevertheless considered in our analysis. Conversely, four genes were up-regulated in low-Pi condition, highlighting...”
- “...2489 and mutant FGSC 9568 strains. Among these, seven genes (NCU00038; NCU001386; NCU02142; NCU02499; NCU03643; NCU05257, and NCU08507) were concomitantly modulated in both phosphate availability conditions, with NCU00038 as the only single DEG up-regulated in both Pi conditions ( Fig 3 ). Most of the TFs...”
- Discovering functions of unannotated genes from a transcriptome survey of wild fungal isolates
Ellison, mBio 2014 - “...this strategy in the context of an unannotated transcription factor, we focused on the gene NCU05257, which encodes a predicted zinc finger and homeobox DNA-binding protein and which fell into an expression cluster containing 58 other genes in our analysis of expression among wild N. crassa strains...”
- “...enriched for genes annotated in amino acid metabolism; see Data Set S1 and Table S2), and NCU05257 was previously reported to be a putative target of the N. crassa amino acid biosynthesis regulator CPC1 (23). To test the regulatory impact of NCU05257 directly, we used transcriptome...”
VDAG_04891 uncharacterized protein from Verticillium dahliae VdLs.17
38% identity, 7% coverage
- Transcription factors containing both C2H2 and homeobox domains play different roles in Verticillium dahliae
Tang, mSphere 2024 - “...-homeobox transcription factors in the genome of VdLs.17 are VDAG_00465, VDAG_02532, VDAG_02889, VDAG_04660, VDAG_04837, VDAG_04891, and VDAG_07897, named VdChtf1 to VdChtf7, respectively (Fig. S1A and B). To further investigate their functions, we knocked them out using Agrobacterium-mediated genetic transformation (23) and confirmed the...”
VDAG_02889 uncharacterized protein from Verticillium dahliae VdLs.17
36% identity, 6% coverage
- Transcription factors containing both C2H2 and homeobox domains play different roles in Verticillium dahliae
Tang, mSphere 2024 - “...2H2-homeobox transcription factors in the genome of VdLs.17 are VDAG_00465, VDAG_02532, VDAG_02889, VDAG_04660, VDAG_04837, VDAG_04891, and VDAG_07897, named VdChtf1 to VdChtf7, respectively (Fig. S1A and B). To further investigate their functions, we knocked them out using Agrobacterium-mediated genetic transformation (23)...”
FOXG_07428 hypothetical protein from Fusarium oxysporum f. sp. lycopersici 4287
38% identity, 6% coverage
XP_001455625 uncharacterized protein from Paramecium tetraurelia
36% identity, 11% coverage
- Homeodomain proteins: an update
Bürglin, Chromosoma 2016 - “...Saccoglossus kowalevskii (acorn worm; hemichordate); Ce: Caenorhabditis elegans ; Pt: Paramecium tetraurelia (sequence accession number: XP_001455625). (PDF 3.03 MB) Sup. Fig. S2 Multiple sequence alignment of fungal MAT2 proteins. Default color code from SeaView (Gouy et al. 2010 ). Species abbreviations: Scer: Saccharomyces cerevisiae ; Vpol:...”
XP_644890 homeodomain containing protein from Dictyostelium discoideum AX4
40% identity, 8% coverage
For advice on how to use these tools together, see Interactive tools for functional annotation of bacterial genomes.
The PaperBLAST database links 798,070 different protein sequences to 1,261,478 scientific articles. Searches against EuropePMC were last performed on May 12 2025.
PaperBLAST builds a database of protein sequences that are linked
to scientific articles. These links come from automated text searches
against the articles in EuropePMC and from manually-curated
information from GeneRIF, UniProtKB/Swiss-Prot, BRENDA, CAZy (as made
available by dbCAN), BioLiP, CharProtDB, MetaCyc, EcoCyc, TCDB,
REBASE, the Fitness Browser, and a subset of the European Nucleotide
Archive with the /experiment tag.
Given this database and a protein sequence query, PaperBLAST uses
protein-protein BLAST to find similar sequences with an E-value
below 0.001 (E < 0.001).
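The cutoff above, and the identity and coverage figures shown for each hit, can be sketched in a few lines of Python. This is only an illustration: the `Hit` record and its field names are hypothetical, and defining coverage as the fraction of the hit protein spanned by the alignment is an assumption that happens to match the numbers on this page, not a statement of how PaperBLAST computes it.

```python
from dataclasses import dataclass

E_VALUE_CUTOFF = 0.001  # threshold stated in the text above


@dataclass
class Hit:
    """One BLAST alignment (hypothetical record; field names are assumptions)."""
    subject_id: str
    evalue: float
    identical: int      # identical positions in the alignment
    align_len: int      # alignment length, including gaps
    subject_start: int  # 1-based, inclusive
    subject_end: int    # 1-based, inclusive
    subject_len: int    # full length of the hit protein


def keep_hit(hit: Hit) -> bool:
    """Keep only hits strictly below the E-value cutoff."""
    return hit.evalue < E_VALUE_CUTOFF


def percent_identity(hit: Hit) -> float:
    """Identical positions as a percentage of the alignment length."""
    return 100.0 * hit.identical / hit.align_len


def percent_coverage(hit: Hit) -> float:
    """Aligned span as a percentage of the hit protein (assumed definition)."""
    aligned = hit.subject_end - hit.subject_start + 1
    return 100.0 * aligned / hit.subject_len
```

Under this assumed definition, a short query that aligns end to end against a much longer protein gives high identity but low coverage, which is the pattern seen for most hits listed above.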
To build the database, we query EuropePMC with locus tags, with RefSeq protein
identifiers, and with UniProt
accessions. We obtain the locus tags from RefSeq or from MicrobesOnline. We use
queries of the form "locus_tag AND genus_name" to try to ensure that
the paper is actually discussing that gene. Because EuropePMC indexes
most recent biomedical papers, even if they are not open access, some
of the links may be to papers that you cannot read or that our
computers cannot read. We query each of these identifiers that
appears in the open access part of EuropePMC, as well as every locus
tag that appears in the 500 most-referenced genomes, so that a gene
may appear in the PaperBLAST results even though none of the papers
that mention it are open access. We also incorporate text-mined links
from EuropePMC that link open access articles to UniProt or RefSeq
identifiers. (This yields some additional links because EuropePMC
uses different heuristics for their text mining than we do.)
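As a rough sketch of the "locus_tag AND genus_name" query form described above, the helper below builds the search string for one identifier. The function names are hypothetical, taking the genus as the first word of the organism name is an assumption, and the text does not say how RefSeq or UniProt identifiers are phrased, so those are passed through unchanged here.

```python
from typing import Optional


def genus_of(organism: str) -> str:
    """Take the first word of an organism name as the genus (an assumption)."""
    return organism.split()[0]


def build_query(identifier: str, organism: Optional[str] = None) -> str:
    """Build a text-search query for one identifier.

    Locus tags are paired with the genus name to reduce spurious matches.
    Identifiers given without an organism are returned as-is.
    """
    if organism:
        return f"{identifier} AND {genus_of(organism)}"
    return identifier


# Example: a locus tag from this page paired with its organism.
query = build_query("YCR039C", "Saccharomyces cerevisiae S288C")
```

Pairing the tag with the genus is what keeps a short, generic-looking locus tag from matching unrelated papers that happen to contain the same string.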
For every article that mentions a locus tag, a RefSeq protein
identifier, or a UniProt accession, we try to select one or two
snippets of text that refer to the protein. If we cannot get access to
the full text, we try to select a snippet from the abstract, but
unfortunately, unique identifiers such as locus tags are rarely
provided in abstracts.
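The snippet selection described above could look roughly like the following. This is a minimal sketch, not PaperBLAST's actual code: the function names, the 150-character context window, and the rule of skipping overlapping mentions are all assumptions made for illustration.

```python
import re
from typing import List, Optional


def select_snippets(text: str, identifier: str,
                    max_snippets: int = 2, context: int = 150) -> List[str]:
    """Return up to max_snippets windows of text around mentions of identifier."""
    snippets: List[str] = []
    last_end = -1
    for match in re.finditer(re.escape(identifier), text):
        if match.start() < last_end:
            continue  # this mention is already inside the previous snippet
        start = max(0, match.start() - context)
        end = min(len(text), match.end() + context)
        snippet = text[start:end].strip()
        # Mark truncation at either side with an ellipsis.
        if start > 0:
            snippet = "..." + snippet
        if end < len(text):
            snippet = snippet + "..."
        snippets.append(snippet)
        last_end = end
        if len(snippets) == max_snippets:
            break
    return snippets


def snippets_for_article(identifier: str, full_text: Optional[str],
                         abstract: Optional[str]) -> List[str]:
    """Prefer the full text; fall back to the abstract when it is unavailable."""
    source = full_text if full_text is not None else abstract
    return select_snippets(source or "", identifier)
```

When only the abstract is available and it does not contain the identifier, the result is an empty list, which corresponds to an article that is listed without a snippet.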
PaperBLAST also incorporates manually-curated protein functions:
- Proteins from NCBI's RefSeq are included if a GeneRIF entry links
the gene to an article in PubMed®. GeneRIF also provides a short
summary of the article's claim about the protein, which is shown
instead of a snippet.
- Proteins from Swiss-Prot (the curated part of UniProt)
are included if the curators
identified experimental evidence for the protein's function (evidence
code ECO:0000269). For these proteins, the fields of the Swiss-Prot entry that
describe the protein's function are shown (with bold headings).
- Proteins from BRENDA,
a curated database of enzymes, are included if they are linked to a paper in PubMed
and their full sequence is known.
- Every protein from the non-redundant subset of BioLiP, a database
of ligand-binding sites and catalytic residues in protein structures,
is included. Since BioLiP itself does not include descriptions of the
proteins, those are taken from the Protein Data Bank.
Descriptions from PDB rely on the original submitter of the
structure and cannot be updated by others, so they may be less reliable.
(For SitesBLAST and Sites on a Tree, we use a larger subset of BioLiP so that every
ligand is represented among a group of structures with similar sequences, but for
PaperBLAST, we use the non-redundant set provided by BioLiP.)
- Every protein from EcoCyc, a curated
database of the proteins in Escherichia coli K-12, is included, regardless
of whether it is characterized or not.
- Proteins from the MetaCyc metabolic pathway database
are included if they are linked to a paper in PubMed and their full sequence is known.
- Proteins from the Transport Classification Database (TCDB)
are included if they have known substrate(s), have reference(s),
and are not described as uncharacterized or putative.
(Some of the references are not visible on the PaperBLAST web site.)
- Every protein from CharProtDB,
a database of experimentally characterized protein annotations, is included.
- Proteins from the CAZy database of carbohydrate-active enzymes
are included if they are associated with an Enzyme Commission (EC) number.
Even though CAZy does not provide links from individual protein sequences to papers,
these should all be experimentally-characterized proteins.
- Proteins from the REBASE database
of restriction enzymes are included if they have known specificity.
- Every protein with an evidence-based reannotation (based on mutant phenotypes)
in the Fitness Browser is included.
- Sequence-specific transcription factors (including sigma factors and DNA-binding response regulators)
with experimentally-determined DNA binding sites are included from the
PRODORIC database of gene regulation in prokaryotes.
- Putative transcription factors from RegPrecise
are included if they have manually-curated predictions for their binding sites. These predictions are based on
conserved putative regulatory sites across genomes that contain similar transcription factors,
so PaperBLAST clusters the TFs at 70% identity and retains just one member of each cluster.
- Coding sequence (CDS) features from the
European Nucleotide Archive (ENA)
are included if the /experiment tag is set (implying that there is experimental evidence for the annotation),
the nucleotide entry links to paper(s) in PubMed,
and the nucleotide entry is from the STD data class
(implying that these are targeted annotated sequences, not from shotgun sequencing).
Also, to filter out genes whose transcription or translation was detected, but whose function
was not studied, nucleotide entries or papers with more than 25 such proteins are excluded.
Descriptions from ENA rely on the original submitter of the
sequence and cannot be updated by others, so they may be less reliable.
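The ENA inclusion rules in the list above can be sketched as a two-pass filter. This is a minimal sketch under stated assumptions, not PaperBLAST's code: the `CdsFeature` record and its field names are hypothetical, and the way an over-threshold paper is handled (dropping that paper's link, and dropping the protein only if no links remain) is one possible reading of the rule.

```python
from collections import Counter
from dataclasses import dataclass, field

MAX_PROTEINS = 25  # threshold stated in the text


@dataclass
class CdsFeature:
    """Hypothetical record for one CDS feature; field names are illustrative."""
    protein_id: str
    entry_id: str         # nucleotide entry that the CDS belongs to
    data_class: str       # ENA data class, e.g. "STD"
    has_experiment: bool  # True if the /experiment tag is set
    pubmed_ids: list = field(default_factory=list)


def filter_ena_cds(features):
    """Return (protein_id, retained PubMed ids) for features that pass the rules."""
    # Pass 1: per-feature criteria (experiment tag, PubMed link, STD data class).
    candidates = [
        f for f in features
        if f.has_experiment and f.pubmed_ids and f.data_class == "STD"
    ]
    # Count candidate proteins per nucleotide entry and per paper.
    per_entry = Counter(f.entry_id for f in candidates)
    per_paper = Counter(p for f in candidates for p in set(f.pubmed_ids))
    # Pass 2: exclude entries and papers with more than MAX_PROTEINS proteins.
    kept = []
    for f in candidates:
        if per_entry[f.entry_id] > MAX_PROTEINS:
            continue
        papers = [p for p in f.pubmed_ids if per_paper[p] <= MAX_PROTEINS]
        if papers:  # assumption: keep the protein if any paper link survives
            kept.append((f.protein_id, papers))
    return kept
```

Counting is done after the per-feature criteria so that the threshold of 25 applies only to proteins that would otherwise be included.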
Except for GeneRIF and ENA,
the curated entries include a short curated
description of the protein's function.
For entries from BioLiP, the protein's function may not be known beyond binding to the ligand.
Many of these entries also link to articles in PubMed.
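The RegPrecise entry in the list above mentions clustering transcription factors at 70% identity and keeping one member per cluster. The sketch below shows one common way to do this, greedy incremental clustering. The page does not say which clustering tool or identity definition PaperBLAST uses, so both the greedy scheme and the toy `fraction_identical` function (which compares positions without aligning) are assumptions; a real pipeline would compute identity from sequence alignments.

```python
def fraction_identical(a, b):
    """Toy identity: matching positions divided by the longer length.

    Assumption for illustration only; no alignment is performed.
    """
    if not a or not b:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def greedy_cluster(seqs, threshold=0.7, identity=fraction_identical):
    """Cluster sequences greedily; return {representative id: [member ids]}.

    seqs maps an identifier to its sequence. Sequences are visited from
    longest to shortest, and each one joins the first existing cluster whose
    representative it matches at or above the threshold.
    """
    clusters = {}
    for sid in sorted(seqs, key=lambda s: (-len(seqs[s]), s)):
        for rep in clusters:
            if identity(seqs[sid], seqs[rep]) >= threshold:
                clusters[rep].append(sid)
                break
        else:
            # No representative was similar enough: start a new cluster.
            clusters[sid] = [sid]
    return clusters


if __name__ == "__main__":
    tfs = {"tf1": "MKTAYIAKQR", "tf2": "MKTAYIAKQA", "tf3": "GGGGGGGGGG"}
    # The keys of the result are the retained representatives.
    print(sorted(greedy_cluster(tfs)))
```

Keeping only the dictionary keys gives one retained member per cluster, which avoids reporting many near-identical transcription factors whose predicted sites derive from the same comparative analysis.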
For more information see the
PaperBLAST paper (mSystems 2017)
or the code.
You can download PaperBLAST's database here.
Changes to PaperBLAST since the paper was written:
- November 2023: incorporated PRODORIC and RegPrecise. Many PRODORIC entries were not linked to a protein sequence (no UniProt identifier), so we added this information.
- February 2023: BioLiP changed their download format. PaperBLAST now includes their non-redundant subset. SitesBLAST and Sites on a Tree use a larger non-redundant subset that ensures that every ligand is represented within each cluster. This should ensure that every binding site is represented.
- June 2022: incorporated some coding sequences from ENA with the /experiment tag.
- March 2022: incorporated BioLiP.
- April 2020: incorporated TCDB.
- April 2019: EuropePMC now returns table entries in their search results. This has expanded PaperBLAST's database, but most of the new entries are of low relevance, and the resulting snippets are often just lists of locus tags with annotations.
- February 2018: the alignment page reports the conservation of the hit's functional sites (if available from Swiss-Prot or UniProt).
- January 2018: incorporated BRENDA.
- December 2017: incorporated MetaCyc, CharProtDB, CAZy, REBASE, and the reannotations from the Fitness Browser.
- September 2017: EuropePMC no longer returns some table entries in their search results. This has shrunk PaperBLAST's database, but has also reduced the number of low-relevance hits.
Many of these changes are described in Interactive tools for functional annotation of bacterial genomes.
PaperBLAST cannot provide snippets for many of the papers that are
published in non-open-access journals. This limitation applies even if
the paper is marked as "free" on the publisher's web site and is
available in PubMed Central or EuropePMC. If a journal that you publish
in is marked as "secret," please consider publishing elsewhere.
Many important articles are missing from PaperBLAST, either because
the article's full text is not in EuropePMC (as for many older
articles), or because the paper does not mention a protein identifier such as a locus tag, or because of PaperBLAST's heuristics. If you notice an
article that characterizes a protein's function but is missing from
PaperBLAST, please notify the curators at UniProt
or add an entry to GeneRIF.
Entries in either of these databases will eventually be incorporated
into PaperBLAST. Note that to add an entry to UniProt, you will need
to find the UniProt identifier for the protein. If the protein is not
already in UniProt, you can ask them to create an entry. To add an
entry to GeneRIF, you will need an NCBI Gene identifier, but
unfortunately many prokaryotic proteins in RefSeq do not have
corresponding Gene identifiers.
References
PaperBLAST: Text-mining papers for information about homologs.
M. N. Price and A. P. Arkin (2017). mSystems, 10.1128/mSystems.00039-17.
Europe PMC in 2017.
M. Levchenko et al (2017). Nucleic Acids Research, 10.1093/nar/gkx1005.
Gene indexing: characterization and analysis of NLM's GeneRIFs.
J. A. Mitchell et al (2003). AMIA Annu Symp Proc 2003:460-464.
UniProt: the universal protein knowledgebase.
The UniProt Consortium (2016). Nucleic Acids Research, 10.1093/nar/gkw1099.
BRENDA in 2017: new perspectives and new tools in BRENDA.
S. Placzek et al (2017). Nucleic Acids Research, 10.1093/nar/gkw952.
The EcoCyc database: reflecting new knowledge about Escherichia coli K-12.
I. M. Keseler et al (2016). Nucleic Acids Research, 10.1093/nar/gkw1003.
The MetaCyc database of metabolic pathways and enzymes.
R. Caspi et al (2018). Nucleic Acids Research, 10.1093/nar/gkx935.
CharProtDB: a database of experimentally characterized protein annotations.
R. Madupu et al (2012). Nucleic Acids Research, 10.1093/nar/gkr1133.
The carbohydrate-active enzymes database (CAZy) in 2013.
V. Lombard et al (2014). Nucleic Acids Research, 10.1093/nar/gkt1178.
The Transporter Classification Database (TCDB): recent advances.
M. H. Saier, Jr. et al (2016). Nucleic Acids Research, 10.1093/nar/gkv1103.
REBASE - a database for DNA restriction and modification: enzymes, genes and genomes.
R. J. Roberts et al (2015). Nucleic Acids Research, 10.1093/nar/gku1046.
Deep annotation of protein function across diverse bacteria from mutant phenotypes.
M. N. Price et al (2016). bioRxiv, 10.1101/072470.
by Morgan Price,
Arkin group
Lawrence Berkeley National Laboratory