PaperBLAST
PaperBLAST Hits for MMJJ_RS07505 (57 a.a., MKLTKFETAR...)
>MMJJ_RS07505
MKLTKFETARLIGARSLQISDGAPLAIESEKTSSLDLADDEVKQGKLPLCVKKQAKN
Found 20 similar proteins in the literature:
MMP1327 DNA-directed RNA polymerase, subunit K from Methanococcus maripaludis S2
100% identity, 100% coverage
AF1131 DNA-directed RNA polymerase, subunit K (rpoK) from Archaeoglobus fulgidus DSM 4304
46% identity, 72% coverage
- Transcription in archaea
Kyrpides, Proceedings of the National Academy of Sciences of the United States of America 1999 - “...MTH1048 MTH42 MTH40 MTH1317 MTH264 MJ1148 AF2282 AF1885 AF1131 AF1130 AF0207 AF1117 AF0056 AF1235 MJ0507 MJ0782 AF0373 AF1299 MTH1627 MTH885 14 (10) 12 (11)...”
RPO6_SACSH / B8YB61 DNA-directed RNA polymerase subunit Rpo6; DNA-directed RNA polymerase subunit K; EC 2.7.7.6 from Saccharolobus shibatae (strain ATCC 51178 / DSM 5389 / JCM 8931 / NBRC 15437 / B12) (Sulfolobus shibatae) (see 3 papers)
B8YB61 DNA-directed RNA polymerase (subunit 5/13) (EC 2.7.7.6) from Saccharolobus shibatae (see 2 papers)
42% identity, 59% coverage
- function: DNA-dependent RNA polymerase (RNAP) catalyzes the transcription of DNA into RNA using the four ribonucleoside triphosphates as substrates.
catalytic activity: RNA(n) + a ribonucleoside 5'-triphosphate = RNA(n+1) + diphosphate (RHEA:21248)
subunit: Part of the 13-subunit RNA polymerase complex.
RPO6_SACS2 / Q97ZJ9 DNA-directed RNA polymerase subunit Rpo6; DNA-directed RNA polymerase subunit K; EC 2.7.7.6 from Saccharolobus solfataricus (strain ATCC 35092 / DSM 1617 / JCM 11322 / P2) (Sulfolobus solfataricus) (see paper)
Q97ZJ9 DNA-directed RNA polymerase (subunit 5/13) (EC 2.7.7.6) from Saccharolobus solfataricus (see paper)
SSO6768 DNA-directed RNA polymerase subunit K (rpoK) from Sulfolobus solfataricus P2
39% identity, 59% coverage
- function: DNA-dependent RNA polymerase (RNAP) catalyzes the transcription of DNA into RNA using the four ribonucleoside triphosphates as substrates.
catalytic activity: RNA(n) + a ribonucleoside 5'-triphosphate = RNA(n+1) + diphosphate (RHEA:21248)
subunit: Part of the 13-subunit RNA polymerase complex.
- The complete genome of the crenarchaeon Sulfolobus solfataricus P2
She, Proceedings of the National Academy of Sciences of the United States of America 2001 - “...(Sso0751; ortholog of M. thermoautotrophicum rpoF) (68), rpoK (Sso6768) and rpoP (Sso5865; ortholog of M. thermoautotrophicum rpoP) (68). Apart from rpoG, rpoK,...”
MM_1759 DNA-directed RNA polymerase subunit K from Methanosarcina mazei Goe1
44% identity, 61% coverage
RPB6B_ARATH / Q9SJ96 DNA-directed RNA polymerases II and V subunit 6B from Arabidopsis thaliana (Mouse-ear cress) (see paper)
AT2G04630 NRPB6B; DNA binding / DNA-directed RNA polymerase from Arabidopsis thaliana
42% identity, 35% coverage
- function: DNA-dependent RNA polymerase catalyzes the transcription of DNA into RNA using the four ribonucleoside triphosphates as substrates. Component of RNA polymerase II which synthesizes mRNA precursors and many functional non-coding RNAs. Pol II is the central component of the basal RNA polymerase II transcription machinery. It is composed of mobile elements that move relative to each other. Component of RNA polymerase V which mediates RNA-directed DNA methylation-dependent (RdDM) transcriptional gene silencing (TGS) of endogenous repeated sequences, including transposable elements.
subunit: Component of the RNA polymerase II and V complexes.
- Several Isoforms for Each Subunit Shared by RNA Polymerases are Differentially Expressed in the Cultivated Olive Tree (Olea europaea L.)
Fernández-Parras, Frontiers in molecular biosciences 2021 - “...RNA pols used as queries were NRPA/D5, At3g22320; NRPE5, At3g57080; NRPE5-Like, At2g41340; NRPA/E6a, At5g51940; NRPA/E6b, At2g04630; NRPA/E8a, At1g54250; NRPA/E8b, At3g59600; NRPA/E10, At1g11475; NRPB10-like, At1g61700; NRPA/E12a, At5g41010; NRPB12-like, At1g53690. The identified sequences of the common subunits of Arabidopsis RNA pols were subsequently employed as queries to recover...”
- “...as queries: NRPA/D5, At3g22320 ; NRPE5, At3g57080 ; NRPE5-Like, At2g41340 ; NRPA/E6a, At5g51940 ; NRPA/E6b, At2g04630; NRPA/E8a , At1g54250; NRPA/E8b , At3g59600; NRPA/E10 , At1g11475; NRPB10-like , At1g61700 ; NRPA/E12a , At5g41010 ; NRPB12-like , At1g53690. FIGURE 1 Schematic phylogenetic diagram of NRP5 genes. NRP5 sequences...”
- Knockdown NRPC2, 3, 8, NRPABC1 and NRPABC2 Affects RNAPIII Activity and Disrupts Seed Development in Arabidopsis
Zhao, International journal of molecular sciences 2021 - “...AT2G29540 ABC27 P20434 HsNRPABC1 P19388 AtNRPABC1 AT3G22320 ABC23 (-like) AAA34989 HsNRPABC2 P41584 AtNRPABC2-1 AT5G51940 AtNRPABC2-2 AT2G04630 ABC14.5 CAA37383 HsNRPABC3 P52434 AtNRPABC3-1 AT1G54250 AtNRPABC3-2 AT3G59600 ABC10 AAA64417 HsNRPABC4 P53803 AtNRPABC4 AT5G41010 ABC10 P22139 HsNRPABC5 P52436 AtNRPABC5 AT1G11475...”
- Subunit compositions of the RNA-silencing enzymes Pol IV and Pol V reveal their origins as specialized forms of RNA polymerase II
Ream, Molecular cell 2009 - “...also thank Biology 4024 students who helped clone cDNAs: Silvano Ciani and Colin Clune ( At2g04630 ), Andrew Pazandak and Kariline Bringe ( At1g54250 and At3gl6980 ); Caitlin Ramsey and Colin Orr ( At5g59180 ), Wan Shi and Soon Goo Lee ( At1g11475 ), and Lily...”
- The ASH1 HOMOLOG 2 (ASHH2) histone H3 methyltransferase is required for ovule and anther development in Arabidopsis
Grini, PloS one 2009 - “...n.t. 0.83 1.78 At2g03740 LEA-domain - stamens 1.42 2.68 At2g03850 LEA-domain - stamens 1.43 2.69 At2g04630 RBP6 Devaux et al., Mol Biol Cell 18: 1293-1301 (2007) n.t. 1.24 2.37 At2g07690 CDC46 homologue - shoot apex, carpels 0.93 1.91 At2g15400 RPB36B Larkin & Guilfoyle, J Biol Chem...”
PAB7132 DNA-directed RNA polymerase, subunit K from Pyrococcus abyssi GE5
44% identity, 93% coverage
- Archaeal phylogeny based on proteins of the transcription and translation machineries: tackling the Methanopyrus kandleri paradox
Brochier, Genome biology 2004 - “...rpoA" (PAB0425), rpoB (PAB0423), rpoD (PAB2410), rpoE' (PAB1105), rpoE" (PAB7428), rpoF (PAB0732), rpoH (PAB7151), rpoK (PAB7132), rpoL (PAB2316), rpoM/TFS (PAB1464), rpoN (PAB7131), rpoP (PAB3072), NusA (PAB0426), NusG (PAB2352), TPB (PAB1726), TFB (PAB1912), TFE (PAB0950), TFIIH (PAB2385), TIP49 (PAB2107). BLAST searches were performed at the National Center...”
RPD6A_ARATH / Q9FJ98 DNA-directed RNA polymerases II, IV and V subunit 6A; RNA polymerase Rpb6 from Arabidopsis thaliana (Mouse-ear cress) (see 2 papers)
AT5G51940 NRPB6A; DNA binding / DNA-directed RNA polymerase from Arabidopsis thaliana
42% identity, 35% coverage
- function: DNA-dependent RNA polymerase catalyzes the transcription of DNA into RNA using the four ribonucleoside triphosphates as substrates. Component of RNA polymerase II which synthesizes mRNA precursors and many functional non-coding RNAs. Pol II is the central component of the basal RNA polymerase II transcription machinery. It is composed of mobile elements that move relative to each other. Component of RNA polymerases IV and V which mediate short-interfering RNAs (siRNA) accumulation and subsequent RNA-directed DNA methylation-dependent (RdDM) transcriptional gene silencing (TGS) of endogenous repeated sequences, including transposable elements.
subunit: Component of the RNA polymerase II, IV and V complexes. Interacts with NRPD1.
- DNA-dependent RNA polymerases in plants
Yang, The Plant cell 2023 - “.../ 5 At NRP(A/B/C/D)5 At3g22320 NRPE5 At3g57080 Sc ABC27 / / 6 At NRP(A/B/C/D/E)6a a At5g51940 Sc ABC23 / / 7 At NRPA7 At1g75670 NRPC7 At1g06790 NRPB7 At5g59180 NRPD7 At3g22900 NRPE7 At4g14600 Sc A43 C25 RPB7 / / 8 At NRP(A/B/C/D/E)8a a At1g54250 Sc RPABC3 /...”
- Several Isoforms for Each Subunit Shared by RNA Polymerases are Differentially Expressed in the Cultivated Olive Tree (Olea europaea L.)
Fernández-Parras, Frontiers in molecular biosciences 2021 - “...Arabidopsis thaliana RNA pols used as queries were NRPA/D5, At3g22320; NRPE5, At3g57080; NRPE5-Like, At2g41340; NRPA/E6a, At5g51940; NRPA/E6b, At2g04630; NRPA/E8a, At1g54250; NRPA/E8b, At3g59600; NRPA/E10, At1g11475; NRPB10-like, At1g61700; NRPA/E12a, At5g41010; NRPB12-like, At1g53690. The identified sequences of the common subunits of Arabidopsis RNA pols were subsequently employed as queries...”
- “...of RNA polymerases as queries: NRPA/D5, At3g22320 ; NRPE5, At3g57080 ; NRPE5-Like, At2g41340 ; NRPA/E6a, At5g51940 ; NRPA/E6b, At2g04630; NRPA/E8a , At1g54250; NRPA/E8b , At3g59600; NRPA/E10 , At1g11475; NRPB10-like , At1g61700 ; NRPA/E12a , At5g41010 ; NRPB12-like , At1g53690. FIGURE 1 Schematic phylogenetic diagram of NRP5...”
- Knockdown NRPC2, 3, 8, NRPABC1 and NRPABC2 Affects RNAPIII Activity and Disrupts Seed Development in Arabidopsis
Zhao, International journal of molecular sciences 2021 - “...NP_057056 AtNRPAC2 AT2G29540 ABC27 P20434 HsNRPABC1 P19388 AtNRPABC1 AT3G22320 ABC23 (-like) AAA34989 HsNRPABC2 P41584 AtNRPABC2-1 AT5G51940 AtNRPABC2-2 AT2G04630 ABC14.5 CAA37383 HsNRPABC3 P52434 AtNRPABC3-1 AT1G54250 AtNRPABC3-2 AT3G59600 ABC10 AAA64417 HsNRPABC4 P53803 AtNRPABC4 AT5G41010 ABC10 P22139 HsNRPABC5 P52436 AtNRPABC5 AT1G11475...”
- Early Responses to Severe Drought Stress in the Arabidopsis thaliana Cell Suspension Culture Proteome
Alqurashi, Proteomes 2018 - “...proteins of interest after the drought stress Unknown Protein Score Accession Annotation Protein A 0.5949 AT5G51940 Non-catalytic subunit of nuclear DNA-dependent RNA polymerases 0.5940 AT1G11240 Ribosomal RNA-processing protein 0.5939 AT3G56510 RNA-binding (RRM/RBD/RNP motifs) family protein 0.5907 AT2G44860 * Ribosomal protein L24e family protein 0.5886 AT2G45520 Coiled-coil...”
- SHH1, a homeodomain protein required for DNA methylation, as well as RDR2, RDM4, and chromatin remodeling factors, associate with RNA polymerase IV
Law, PLoS genetics 2011 - “...At3g22320 60 14 44.9 1380.2 117 NRPE5B NRPD5B/E5B At2g41340 17 8 35.8 367.7 31 NRPB6A/D6A/E6A At5g51940 4 2 17.4 131.0 11 NRPD7 NRPD7A At3g22900 10 3 19.5 271.0 23 NRPE7 NRPD7B/E7B At4g14660 2 2 14.6 53.0 4 NRPB8B/D8B/E8B At3g59600 0 0 0 0.0 0 NRPB9A/E9A NRPB9A/D9A/E9A...”
- Subunit compositions of the RNA-silencing enzymes Pol IV and Pol V reveal their origins as specialized forms of RNA polymerase II
Ream, Molecular cell 2009 - “...Shi and Soon Goo Lee ( At1g11475 ), and Lily Momper and Charu Agrawal ( At5g51940 ). Pikaard lab research is supported by National Institutes of Health (NIH) grant GM077590. Any opinions expressed in this paper are those of the authors and do not necessarily reflect...”
cgd7_4770 DNA-directed RNA polymerase subunit from Cryptosporidium parvum Iowa II
Q5CXV6 DNA-directed RNA polymerase subunit from Cryptosporidium parvum (strain Iowa II)
43% identity, 43% coverage
RPO6_SULAC / P39463 DNA-directed RNA polymerase subunit Rpo6; DNA-directed RNA polymerase subunit K; EC 2.7.7.6 from Sulfolobus acidocaldarius (strain ATCC 33909 / DSM 639 / JCM 8929 / NBRC 15157 / NCIMB 11770) (see 2 papers)
41% identity, 63% coverage
- function: DNA-dependent RNA polymerase (RNAP) catalyzes the transcription of DNA into RNA using the four ribonucleoside triphosphates as substrates.
function: Reconstitution experiments show this subunit is required for basic activity.
catalytic activity: RNA(n) + a ribonucleoside 5'-triphosphate = RNA(n+1) + diphosphate (RHEA:21248)
subunit: Part of the 13-subunit RNA polymerase complex.
I1LDU4 Uncharacterized protein from Glycine max
40% identity, 35% coverage
- Subcellular Proteomics to Understand Promotive Effect of Plant-Derived Smoke Solution on Soybean Root
Murashita, Proteomes 2021 - “...5 B). In the nuclear proteins, Root hair initiation protein (I1L3V3) and RNA polymerases II (I1LDU4) significantly increase by plant-derived smoke solution compared to control ( Table 1 ). On the other hand, four late embryogenesis-abundant (LEA) proteins (Q9XET0, I1L957, I1M3M9, and I1LE41), importin (I1MDN4), histone...”
- “...Matched Peptides Ratio Increased 1 I1L3V3 Root hair initiation protein root hairless 3 100 2 I1LDU4 RNA polymerases II 2 100 3 C6TGY7 Proliferating cell nuclear antigen 3 29.341 4 I1LHP2 Tyrosyl-tRNA synthetase/Nucleotidylyl transferase 2 5.495 5 I1JJS2 WD repeats region domain-containing protein 3 5.458 6...”
EHI_088230 DNA-directed RNA polymerases I, II, and III 23 kDa, putative from Entamoeba histolytica HM-1:IMSS
C4M6S1 DNA-directed RNA polymerases I, II, and III 23 kDa, putative from Entamoeba histolytica (strain ATCC 30459 / HM-1:IMSS / ABRM)
39% identity, 45% coverage
B0EM77 DNA-directed RNA polymerases I, II, and III subunit RPABC2, putative from Entamoeba dispar (strain ATCC PRA-260 / SAW760)
EDI_320930 DNA-directed RNA polymerases I, II, and III subunit RPABC2, putative from Entamoeba dispar SAW760
EDI_088050 DNA-directed RNA polymerases I, II, and III subunit RPABC2, putative from Entamoeba dispar SAW760
39% identity, 45% coverage
Q24320 DNA-directed RNA polymerases I, II, and III subunit RPABC2 from Drosophila melanogaster
40% identity, 39% coverage
rpb6 / GI|3130047 DNA-directed RNA polymerase I, II and III subunit Rpb6 from Schizosaccharomyces pombe (see 2 papers)
rpb6 / AAA52084.1 RNA polymerase small common phosphorylated subunit from Schizosaccharomyces pombe (see paper)
SPCC1020.04c DNA-directed RNA polymerase I, II and III subunit Rpb6 from Schizosaccharomyces pombe
44% identity, 36% coverage
GRMZM2G086904 DNA-directed RNA polymerases I, II, and III 14.4 kDa polypeptide from Zea mays
40% identity, 36% coverage
- RNA polymerase common subunit ZmRPABC5b is transcriptionally activated by Opaque2 and essential for endosperm development in maize
Chen, Nucleic acids research 2023 - “...yeast two-hybrid (Y2H) assay. We established that DEK701 can interact with the common subunit ZmRPABC2a1 (GRMZM2G086904) and ZmRPABC2a2 (GRMZM2G013600) (Figure 7A and Supplemental Figure S9 ). Luciferase complementation image (LCI) and bimolecular fluorescence complementation (BiFC) assays confirmed the interaction of these proteins, as the co-infiltration of...”
- Functional diversification of maize RNA polymerase IV and V subtypes via alternative catalytic subunits
Haag, Cell reports 2014 - “...47% NRPB5a GRMZM2G099183 38% NRPB5b GRMZM2G469969 44% 48% NRP(D/E)5 Rpb6 GRMZM2G013600 * * * NRP(B/D/E)6a GRMZM2G086904 * * * NRP(B/D/E)6b Rpb7 GRMZM2G179346 23% NRPB7 GRMZM2G040702 26% 42% NRP(D/E)7 Rpb8 GRMZM2G034326 61% 78% 77% NRP(B/D/E)8 GRMZM2G347789 NRPB8-like Rpb9 GRMZM2G046061 37% NRPB9a GRMZM2G023028 37% NRPB9b GRMZM5G898768 18% 33%...”
- “...8% NRP(D/E)4 Rpb5 GRMZM2G476009 NRPB5a GRMZM2G099183 NRPB5b GRMZM2G469969 36% 23% NRP(D/)E5 Rpb6 GRMZM2G013600 * NRP(B/D/E)6a GRMZM2G086904 * NRP(B/D/E)6b Rpb7 GRMZM2G179346 NRPB7 GRMZM2G040702 7% NRP(D/E)7 Rpb8 GRMZM2G034326 52% 26% NRP(B/D/E)8 GRMZM2G347789 NRPB8-like Rpb9 GRMZM2G046061 NRPB9a GRMZM2G023028 NRPB9b GRMZM5G898768 12% NRP(D/E)9 Rpb10 NP_001152395 * 42% NRP(B/D/E)10a GRMZM5G803992 35%...”
AFUA_1G05160 DNA-directed RNA polymerase I, II, and III subunit Rpb6 from Aspergillus fumigatus Af293
44% identity, 32% coverage
- Conservation of nucleosome positions in duplicated and orthologous gene pairs
Nishida, TheScientificWorldJournal 2012 - “...AFUA_1G17010 0.030714781 YDL247W AFUA_2G10910 0.033905332 YBR204C AFUA_1G03540 0.038361835 YJR010W AFUA_3G06530 0.038908707 YMR210W AFUA_6G04640 0.046477426 YPR187W AFUA_1G05160 0.048502572 YDR037W AFUA_6G07640 0.052027721 YIL134W AFUA_6G05170 0.053118026 YLR286C AFUA_5G03760 0.062116003 YKL179C AFUA_1G14240 0.062203351 YDR109C AFUA_4G04680 0.063717983 YOL157C AFUA_3G07380 0.064624439 YIR038C AFUA_2G17300 0.068756542 YDL102W AFUA_2G16600 0.088010728 YOR389W AFUA_2G01940 0.09375584 YBR244W AFUA_3G12270...”
MA0599 DNA-directed RNA polymerase, subunit K from Methanosarcina acetivorans C2A
41% identity, 88% coverage
RPO26 DNA-directed RNA polymerases I from Candida albicans (see paper)
42% identity, 32% coverage
- CharProtDB CGD description: Putative protein of unknown function; heterozygous null mutant exhibits resistance to parnafungin in the C. albicans fitness test; predicted ORF in Assemblies 19, 20 and 21
RPAB2_YEAST / P20435 DNA-directed RNA polymerases I, II, and III subunit RPABC2; RNA polymerases I, II, and III subunit ABC2; ABC23; DNA-directed RNA polymerases I, II, and III 23 kDa polypeptide from Saccharomyces cerevisiae (strain ATCC 204508 / S288c) (Baker's yeast) (see 11 papers)
YPR187W RNA polymerase subunit ABC23, common to RNA polymerases I, II, and III; part of central core; similar to bacterial omega subunit from Saccharomyces cerevisiae
NP_015513 DNA-directed RNA polymerase core subunit RPO26 from Saccharomyces cerevisiae S288C
40% identity, 33% coverage
- function: DNA-dependent RNA polymerases catalyze the transcription of DNA into RNA using the four ribonucleoside triphosphates as substrates. Common component of RNA polymerases I, II and III which synthesize ribosomal RNA precursors, mRNA precursors and many functional non-coding RNAs, and small RNAs, such as 5S rRNA and tRNAs, respectively. Pol II is the central component of the basal RNA polymerase II transcription machinery. RNA polymerases are composed of mobile elements that move relative to each other. In Pol II, RPB6 is part of the clamp element and together with parts of RPB1 and RPB2 forms a pocket to which the RPB4-RPB7 subcomplex binds.
subunit: Component of the RNA polymerase I (Pol I), RNA polymerase II (Pol II) and RNA polymerase III (Pol III) complexes. Component of the RNA polymerase I (Pol I) complex consisting of 14 subunits: RPA135, RPA190, RPC40, RPA14, RPB5, RPO26, RPA43, RPB8, RPA12, RPB10, RPC19, RPC10, RPA49 and RPA34. The complex is composed of a horseshoe-shaped core containing ten subunits (RPA135, RPA190, RPB5, RPO26, RPB8, RPB10, RPC10, RPA12, RPC19 and RPC40) where RPA135 and RPA190 form the DNA-binding cleft. Outside of the core, RPA14 and RPA43 form the stalk that mediates interactions with transcription initiation factors and newly synthesized RNA. Component of the RNA polymerase II (Pol II) complex consisting of 12 subunits: RPO21, RPB2, RPB3, RPB4, RPB5, RPO26, RPB7, RPB8, RPB9, RPB10 and RPC10. Component of the RNA polymerase III (Pol III) complex consisting of 17 subunits.
- A series of pyrimidine-based antifungals with anti-mold activity disrupt ER function in Aspergillus fumigatus
Kelty, Microbiology spectrum 2024 - “...in the presence of 1 a Gene ID Gene name Growth b YPR181C SEC23 + YPR187W RPO26 + YML085C TUB1 + YPL143W RPL33A + YPL142C N/A c + YPL237W SUI3 + YPL235W RVB2 + YHR068W DYS1 ++ YHR143W-A RPC10 ++ YKL154W SRP102 ++ YKL108W SLD2 ++...”
- Detection of dynamic protein complexes through Markov Clustering based on Elephant Herd Optimization Approach
Rani, Scientific reports 2019 - “...4.1E-15 Enrichment Score: 6.2E-15 NIL 4 RNA polymerase I subunit YNL248C, YJR063W, YJL148W, YOR340C, YPR010C, YPR187W, YBR154C, YOR224C, YNL113W YNL248C, YJR063W, YJL148W, YOR340C, YPR010C, YPR187W, YBR154C, YNL113W YIL7095W Ribosome biogenesis (GO:0042254) P-Value : 7.1E-10 Enrichment Score: 2.8E1 DNA-directed RNA polymerase activity (GO:0003899) P-Value: 2.3E-14 Enrichment Score:...”
- Identifying protein complex by integrating characteristic of core-attachment into dynamic PPI network
Shen, PloS one 2017 - “...on DIP data. ID Core component Attached component hfun hF Tb p-value 11 ypr110c ypr190c ypr187w ypr032w ypr010c ypl235w ypl160w yor341w yor224c yor210w yor207c yor151c yor119c yor116c ynr003c ynl308c ynl248c ynl229c ynl113w ymr285c ylr453c ykl144c yjr063w yjl130c yjl011c yil128w yil035c yhr020w ygr229c ygr094w ygl016w ybr249c ybr245c...”
- “...yfr019w yfr008w yer022w ydr448w ydr443c ydl005c ybr253w ybr193c ygl025c GO:0016455 0.683 3.94e-32 62 yor116c ypr190c ypr187w ypr110c yor207c ynr003c ynl113w ykl144c yjl011c yfr037c ydr045c ybr154c ykr025w ynl151c ydl150w GO:0003899 0.583 1.94e-31 65 ygr104c ypr168w ypr070w yor174w yol135c ynr010w ynl236w ylr071c yhr058c yhr041c yer148w yer022w ydr308c ydl005c...”
- Predicting the functions of a protein from its ability to associate with other molecules
Taha, BMC bioinformatics 2016 - “...GO:0051640 (organelle localization); GO:0051641 (cellular localization); GO:0000746 (conjugation); GO:0051704 (multi-organism process); GO:0007018 (microtubule-based movement); GO:0022403 YPR187W GO:0006351 (transcription, DNA-templated); GO:0006360 (transcription from RNA polymerase I promoter); GO:0006366 (transcription from RNA polymerase II promoter); GO:0006383 (transcription from RNA polymerase III promoter); GO:0042797 (tRNA transcription from RNA polymerase...”
- Chemical genomic screening of a Saccharomyces cerevisiae genomewide mutant collection reveals genes required for defense against four antimicrobial peptides derived from proteins found in human saliva
Lis, Antimicrobial agents and chemotherapy 2013 - “...YLR376C YDR495C YOR089C YFR009W YIL064W YOR005C YOR270C YPR187W YJR033C YNL248C YNL262W YKL143W YOR233W YPR133W-A YJL130C YBL097W YGR185C YDR310C YPR094W...”
- Protein complex detection via weighted ensemble clustering based on Bayesian nonnegative matrix factorization
Ou-Yang, PloS one 2013 - “...by EC-BNMF can correctly classify these three complexes. Furthermore, four proteins (YOC224C, YOR210W, YBR154C and YPR187W) common to RNA polymerase I, II and III are correctly classified. Two other proteins (YNL113W and YPR110C) that are shared by RNA polymerase I and III are also correctly classified....”
- Ribosome biogenesis in the yeast Saccharomyces cerevisiae
Woolford, Genetics 2013 - “...Essential? YOR341W YPR010C Yes Yes YPR110C YNL113W YBR154C YPR187W YOR224C YOR210W YHR143W-A YJR063W YOR340C YDR156W YNL248C YJL148W Yes Yes Yes Yes Yes Yes Yes...”
- Conservation of nucleosome positions in duplicated and orthologous gene pairs
Nishida, TheScientificWorldJournal 2012 - “...0.969622155 0.984283791 866418 chr16 YPR165W 0.959239007 0.967322323 875364 + chr16 YPR176C 0.986542093 0.977514778 892074 chr16 YPR187W 0.964480751 0.98804279 911253 + chr16 YPR203W 0.968955474 0.985644074 943876 + Table 4 Spearman's rank correlation coefficients between nucleosome position profiles in the promoters of 347 orthologous gene pairs between Aspergillus...”
- “...YIR038C AFUA_1G17010 0.030714781 YDL247W AFUA_2G10910 0.033905332 YBR204C AFUA_1G03540 0.038361835 YJR010W AFUA_3G06530 0.038908707 YMR210W AFUA_6G04640 0.046477426 YPR187W AFUA_1G05160 0.048502572 YDR037W AFUA_6G07640 0.052027721 YIL134W AFUA_6G05170 0.053118026 YLR286C AFUA_5G03760 0.062116003 YKL179C AFUA_1G14240 0.062203351 YDR109C AFUA_4G04680 0.063717983 YOL157C AFUA_3G07380 0.064624439 YIR038C AFUA_2G17300 0.068756542 YDL102W AFUA_2G16600 0.088010728 YOR389W AFUA_2G01940 0.09375584 YBR244W...”
- Two Routes to Genetic Suppression of RNA Trimethylguanosine Cap Deficiency via C-Terminal Truncation of U1 snRNP Subunit Snp1 or Overexpression of RNA Polymerase Subunit Rpo26.
Qiu, G3 (Bethesda, Md.) 2015 - GeneRIF: The global role of Rpo26 in transcription by all nuclear RNA polymerases from its particular ability to act as a dosage suppressor of the cold sensitivity of tgs1Delta cells.
- Diversification of function by different isoforms of conventionally shared RNA polymerase subunits
Devaux, Molecular biology of the cell 2007 - “...the Saccharomyces cerevisiae Rpb5p (NP_009712) or Rpb6p (NP_015513) were performed. Several sequences with low expectation values from closely related organisms...”
- “...Q100, and R136 of the S. cerevisiae protein NP_015513. Molecular Biology of the Cell Complex-specific Isoforms of Shared Subunits Figure 3. Subnuclear...”
- Transient-State Kinetic Analysis of the RNA Polymerase II Nucleotide Incorporation Mechanism
Carter, Biochemistry 2023 - “...Rpa190, P10964; Rpa135, P22138; Rpa49, Q01080; Rpa43, P46669; Rpc40, P07703; Rpa34, P47006; Rpb5, P20434; Rpb6, P20435; Rpc19, P28000; Rpb8, P20436; Rpa12, P32529; Rpb10, P22139; Rpb12, P40422. The authors declare no competing financial interest. ABBREVIATIONS Pol II RNA polymerase II Pol I RNA polymerase I Pol I...”
- Ribosomal RNA Transcription Machineries in Intestinal Protozoan Parasites: A Bioinformatic Analysis
Lagunas-Rangel, Acta parasitologica 2022 - “...229 201 7e26 5e16 32 26 cgd2_980 Q5CTZ0 205 7e62 45 POLR2F P61218 127 Rpb6 P20435 155 GL50803_15955 E2RTN0 104 4e26 52 cgd7_4770 Q5CXV6 129 4e36 72 POLR2H P52434 150 Rpb8 P20436 146 GL50803_15144 E2RU32 150 1e07 27 cgd1_2260 Q5CSK9 144 3e22 35 POLR2K P53803 58...”
- “...215 EHI_142090 C4LW54 204 1e47 39 EDI_338140 B0EA74 204 6e48 39 POLR2F P61218 127 Rpb6 P20435 155 EHI_088230 C4M6S1 122 2e32 73 EDI_088050 EDI_320930 B0EM77 122 6e32 72 POLR2H P52434 150 Rpb8 P20436 146 EHI_038570 C4LZP1 143 6e15 31 EDI_259550 B0EB74 143 3e15 31 POLR2K P53803...”
- Conserved Trigger Loop Histidine of RNA Polymerase II Functions as a Positional Catalyst Primarily through Steric Effects
Palo, Biochemistry 2021 - “...P04050 RPB1_YEAST RPB2 P08518 RPB2_YEAST RPB3 P16370 RPB3_YEAST RPB4 P20433 RPB4_YEAST RPB5 P20434 RPAB1_YEAST RPB6 P20435 RPAB2_YEAST RPB7 P34087 RPB7_YEAST RPB8 P20436 RPAB3_YEAST RPB9 P27999 RPB9_YEAST RPB10 P22139 RPAB5_YEAST RPB11 P38902 RPB11_YEAST RPB12 P40422 RPAB4_YEAST Abbreviations: EC elongation complex NAC nucleotide addition cycle NTP nucleoside triphosphate...”
- Cryo-EM structures of human RNA polymerase I
Misiaszek, Nature structural & molecular biology 2021 - “...(UniProt: P50106 , Pfam: PF08203 ), A43 (UniProt: P46669 , Pfam: PF17875 ), ABC23 (UniProt: P20435 , Pfam: PF01192 ), A135 (UniProt: P22138, Pfam: PF00562 ), and RPB4 (UniProt: P20433 , Pfam: PF00562 ). The Pfam family for A14 (Pfam: PF08203 ) has been improved to...”
- A Novel Assay for RNA Polymerase I Transcription Elongation Sheds Light on the Evolutionary Divergence of Eukaryotic RNA Polymerases
Scull, Biochemistry 2019 - “...Rpa190, P10964; Rpa135, P22138; Rpa49, Q01080; Rpa43, P46669; Rpc40, P07703; Rpa34, P47006; Rpb5, P20434; Rpb6, P20435; Rpc19, P28000; Rpb8, P20436; Rpa12, P32529; Rpb10, P22139; Rpb12, P40422. Transcription initiation factors for RNA Pol I: Rrn6, P32786; Rrn7, P40992; Rrn11, Q04712; TBP, P13393; Rrn3, P36070. The authors declare...”
- Global multiple protein-protein interaction network alignment by combining pairwise network alignments
Dohrmann, BMC bioinformatics 2015 - “...POLR2e P20434 RPB5 0.48 match match P24928 POLR2a P04050 RPO21 0.51 match match P61218 POLR2F P20435 RPO26 0.6 mismatch match P30876 POLR2B P08518 RPB2 0.6 match match O15514 POLR2d Q9VEA5 Rpb4 0.73 match match P24928 POLR2a P04052 RpII215 0.77 match match P61218 POLR2F Q24320 RpII18 0.77...”
- Analysis on multi-domain cooperation for predicting protein-protein interactions.
Wang, BMC bioinformatics 2007 - “...the information of TAP-MS, there are total 13 different proteins (P04050, P08518, P16370, P20433, P20434, P20435, P34087, P20436, P27999, P22139, P38902, P40422, P07273) in this complex as its subunits (Figure 8(a) ). Then, in the second step, according to Pfam and protein sequences, all possible domains...”
For advice on how to use these tools together, see "Interactive tools for functional annotation of bacterial genomes".
The PaperBLAST database links 793,807 different protein sequences to 1,259,118 scientific articles. Searches against EuropePMC were last performed on March 13, 2025.
PaperBLAST builds a database of protein sequences that are linked
to scientific articles. These links come from automated text searches
against the articles in EuropePMC and from manually-curated
information from GeneRIF, UniProtKB/Swiss-Prot, BRENDA, CAZy (as made
available by dbCAN), BioLiP, CharProtDB, MetaCyc, EcoCyc, TCDB, REBASE,
the Fitness Browser, and a subset of the European Nucleotide Archive
with the /experiment tag.
Given this database and a protein sequence query,
PaperBLAST uses protein-protein BLAST
to find similar sequences with E < 0.001.
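The E < 0.001 cutoff can be illustrated with a small filter over BLAST tabular output. This is a minimal sketch, not PaperBLAST's own code: it assumes the hits are in the standard 12-column BLAST+ tabular format (-outfmt 6), and the names Hit and filter_hits are hypothetical.

```python
from typing import Iterable, List, NamedTuple

E_VALUE_CUTOFF = 0.001  # threshold stated in the text: keep hits with E < 0.001


class Hit(NamedTuple):
    subject: str      # identifier of the matched database protein
    identity: float   # percent identity over the aligned region
    evalue: float     # BLAST expectation value


def filter_hits(lines: Iterable[str], cutoff: float = E_VALUE_CUTOFF) -> List[Hit]:
    """Parse BLAST+ tabular rows and keep those with an E-value below the cutoff.

    Assumed column order (the -outfmt 6 default): qseqid, sseqid, pident,
    length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        evalue = float(fields[10])
        if evalue < cutoff:
            hits.append(Hit(fields[1], float(fields[2]), evalue))
    return hits
```

With rows read from a BLAST output file, filter_hits(open(path)) would return only the hits that pass the cutoff, in the order BLAST reported them.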
To build the database, we query EuropePMC with locus tags, with RefSeq protein
identifiers, and with UniProt
accessions. We obtain the locus tags from RefSeq or from MicrobesOnline. We use
queries of the form "locus_tag AND genus_name" to try to ensure that
the paper is actually discussing that gene. Because EuropePMC indexes
most recent biomedical papers, even if they are not open access, some
of the links may be to papers that you cannot read or that our
computers cannot read. We query each of these identifiers that
appears in the open access part of EuropePMC, as well as every locus
tag that appears in the 500 most-referenced genomes, so that a gene
may appear in the PaperBLAST results even though none of the papers
that mention it are open access. We also incorporate text-mined links
from EuropePMC that link open access articles to UniProt or RefSeq
identifiers. (This yields some additional links because EuropePMC
uses different heuristics for their text mining than we do.)
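The query form described above can be sketched as follows. This is an illustration only: the function name build_query is hypothetical, and taking the first word of the organism name as the genus is an assumption, not a documented detail of PaperBLAST.

```python
def build_query(locus_tag: str, organism: str) -> str:
    """Build a text-search query of the form "locus_tag AND genus_name".

    Assumption: the genus is the first whitespace-separated word of the
    organism name, e.g. "Methanococcus" in "Methanococcus maripaludis S2".
    """
    words = organism.split()
    if not words:
        raise ValueError("organism name is empty")
    return f"{locus_tag} AND {words[0]}"


# Example using a locus tag and organism from the results above.
query = build_query("MMP1327", "Methanococcus maripaludis S2")
print(query)  # MMP1327 AND Methanococcus
```

Requiring the genus name alongside the locus tag reduces false matches when the same string appears in papers about unrelated organisms.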
For every article that mentions a locus tag, a RefSeq protein
identifier, or a UniProt accession, we try to select one or two
snippets of text that refer to the protein. If we cannot get access to
the full text, we try to select a snippet from the abstract, but
unfortunately, unique identifiers such as locus tags are rarely
provided in abstracts.
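One way to picture snippet selection is a window of words around each mention of the identifier. The sketch below is not PaperBLAST's actual method: the function name select_snippets, the window width, and the rule for skipping overlapping windows are all assumptions made for illustration.

```python
import re
from typing import List


def select_snippets(text: str, identifier: str,
                    width: int = 10, max_snippets: int = 2) -> List[str]:
    """Return up to max_snippets word windows around mentions of identifier.

    Each window holds the matching word plus up to `width` words on each
    side. A mention is skipped if its window would overlap the previous
    snippet.
    """
    words = text.split()
    pattern = re.compile(r"\b" + re.escape(identifier) + r"\b", re.IGNORECASE)
    snippets: List[str] = []
    last_end = -1
    for i, word in enumerate(words):
        if not pattern.search(word):
            continue
        start = max(0, i - width)
        end = min(len(words), i + width + 1)
        if start < last_end:
            continue  # window overlaps the previous snippet
        snippets.append("..." + " ".join(words[start:end]) + "...")
        last_end = end
        if len(snippets) == max_snippets:
            break
    return snippets
```

If the identifier never appears in the text, which the paragraph above notes is common for abstracts, the function returns an empty list.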
PaperBLAST also incorporates manually-curated protein functions:
- Proteins from NCBI's RefSeq are included if a GeneRIF entry links
the gene to an article in PubMed®.
GeneRIF also provides a short summary of the article's claim about the
protein, which is shown instead of a snippet.
- Proteins from Swiss-Prot (the curated part of UniProt)
are included if the curators
identified experimental evidence for the protein's function (evidence
code ECO:0000269). For these proteins, the fields of the Swiss-Prot entry that
describe the protein's function are shown (with bold headings).
- Proteins from BRENDA,
a curated database of enzymes, are included if they are linked to a paper in PubMed
and their full sequence is known.
- Every protein from the non-redundant subset of BioLiP, a database
of ligand-binding sites and catalytic residues in protein structures, is included. Since BioLiP itself
does not include descriptions of the proteins, those are taken from the Protein Data Bank.
Descriptions from PDB rely on the original submitter of the
structure and cannot be updated by others, so they may be less reliable.
(For SitesBLAST and Sites on a Tree, we use a larger subset of BioLiP so that every
ligand is represented among a group of structures with similar sequences, but for
PaperBLAST, we use the non-redundant set provided by BioLiP.)
- Every protein from EcoCyc, a curated
database of the proteins in Escherichia coli K-12, is included, regardless
of whether it is characterized.
- Proteins from the MetaCyc metabolic pathway database
are included if they are linked to a paper in PubMed and their full sequence is known.
- Proteins from the Transport Classification Database (TCDB)
are included if they have known substrate(s), have reference(s),
and are not described as uncharacterized or putative.
(Some of the references are not visible on the PaperBLAST web site.)
- Every protein from CharProtDB,
a database of experimentally characterized protein annotations, is included.
- Proteins from the CAZy database of carbohydrate-active enzymes
are included if they are associated with an Enzyme Classification number.
Even though CAZy does not provide links from individual protein sequences to papers,
these should all be experimentally-characterized proteins.
- Proteins from the REBASE database
of restriction enzymes are included if they have known specificity.
- Every protein with an evidence-based reannotation (based on mutant phenotypes)
in the Fitness Browser is included.
- Sequence-specific transcription factors (including sigma factors and DNA-binding response regulators)
with experimentally-determined DNA binding sites from the
PRODORIC database of gene regulation in prokaryotes.
- Putative transcription factors from RegPrecise
are included if they have manually-curated predictions for their binding sites. These predictions are based on
conserved putative regulatory sites across genomes that contain similar transcription factors,
so PaperBLAST clusters the TFs at 70% identity and retains just one member of each cluster.
- Coding sequence (CDS) features from the
European Nucleotide Archive (ENA)
are included if the /experiment tag is set (implying that there is experimental evidence for the annotation),
the nucleotide entry links to paper(s) in PubMed,
and the nucleotide entry is from the STD data class
(implying that these are targeted annotated sequences, not from shotgun sequencing).
Also, to filter out genes whose transcription or translation was detected, but whose function
was not studied, nucleotide entries or papers with more than 25 such proteins are excluded.
Descriptions from ENA rely on the original submitter of the
sequence and cannot be updated by others, so they may be less reliable.
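The ENA inclusion rules above can be sketched in code. This is only an illustration under stated assumptions, not PaperBLAST's actual implementation: the CdsFeature record, its field names, and the select_ena_cds function are hypothetical, and the sketch assumes that the 25-protein limit is counted over the features that already passed the per-feature criteria, separately per nucleotide entry and per paper.

```python
from collections import Counter
from dataclasses import dataclass, replace
from typing import List, Tuple

MAX_PROTEINS = 25  # threshold stated in the text above


@dataclass(frozen=True)
class CdsFeature:
    """Hypothetical record for one CDS feature (not PaperBLAST's schema)."""
    protein_id: str
    entry_id: str                     # accession of the nucleotide entry
    data_class: str                   # ENA data class, e.g. "STD"
    has_experiment: bool              # True if the /experiment tag is set
    pubmed_ids: Tuple[str, ...] = ()  # papers linked from the nucleotide entry


def select_ena_cds(features: List[CdsFeature],
                   max_proteins: int = MAX_PROTEINS) -> List[CdsFeature]:
    # Step 1: per-feature criteria (experiment tag, PubMed link, STD class).
    candidates = [f for f in features
                  if f.has_experiment and f.pubmed_ids and f.data_class == "STD"]

    # Step 2: count how many candidate proteins each entry and paper carries.
    per_entry = Counter(f.entry_id for f in candidates)
    per_paper = Counter(p for f in candidates for p in set(f.pubmed_ids))

    # Step 3: drop entries and papers above the limit (assumed interpretation:
    # a feature survives if its entry is small enough and at least one of its
    # papers is small enough; only the surviving papers are kept).
    kept = []
    for f in candidates:
        if per_entry[f.entry_id] > max_proteins:
            continue
        papers = tuple(p for p in f.pubmed_ids if per_paper[p] <= max_proteins)
        if papers:
            kept.append(replace(f, pubmed_ids=papers))
    return kept
```

In this sketch, an entry with 26 qualifying proteins is dropped entirely, while one with exactly 25 is kept, which matches the "more than 25" wording.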
Except for GeneRIF and ENA,
the curated entries include a short curated
description of the protein's function.
For entries from BioLiP, the protein's function may not be known beyond binding to the ligand.
Many of these entries also link to articles in PubMed.
For more information see the
PaperBLAST paper (mSystems 2017)
or the code.
You can download PaperBLAST's database here.
Changes to PaperBLAST since the paper was written:
- November 2023: incorporated PRODORIC and RegPrecise. Many PRODORIC entries were not linked to a protein sequence (no UniProt identifier), so we added this information.
- February 2023: BioLiP changed their download format. PaperBLAST now includes their non-redundant subset. SitesBLAST and Sites on a Tree use a larger non-redundant subset that ensures that every ligand is represented within each cluster. This should ensure that every binding site is represented.
- June 2022: incorporated some coding sequences from ENA with the /experiment tag.
- March 2022: incorporated BioLiP.
- April 2020: incorporated TCDB.
- April 2019: EuropePMC now returns table entries in their search results. This has expanded PaperBLAST's database, but most of the new entries are of low relevance, and the resulting snippets are often just lists of locus tags with annotations.
- February 2018: the alignment page reports the conservation of the hit's functional sites (if available from Swiss-Prot or UniProt).
- January 2018: incorporated BRENDA.
- December 2017: incorporated MetaCyc, CharProtDB, CAZy, REBASE, and the reannotations from the Fitness Browser.
- September 2017: EuropePMC no longer returns some table entries in their search results. This has shrunk PaperBLAST's database, but has also reduced the number of low-relevance hits.
Many of these changes are described in Interactive tools for functional annotation of bacterial genomes.
PaperBLAST cannot provide snippets for many of the papers that are
published in non-open-access journals. This limitation applies even if
the paper is marked as "free" on the publisher's web site and is
available in PubMed Central or EuropePMC. If a journal that you publish
in is marked as "secret," please consider publishing elsewhere.
Many important articles are missing from PaperBLAST, either because
the article's full text is not in EuropePMC (as for many older
articles), or because the paper does not mention a protein identifier such as a locus tag, or because of PaperBLAST's heuristics. If you notice an
article that characterizes a protein's function but is missing from
PaperBLAST, please notify the curators at UniProt
or add an entry to GeneRIF.
Entries in either of these databases will eventually be incorporated
into PaperBLAST. Note that to add an entry to UniProt, you will need
to find the UniProt identifier for the protein. If the protein is not
already in UniProt, you can ask them to create an entry. To add an
entry to GeneRIF, you will need an NCBI Gene identifier, but
unfortunately many prokaryotic proteins in RefSeq do not have
corresponding Gene identifiers.
References
PaperBLAST: Text-mining papers for information about homologs.
M. N. Price and A. P. Arkin (2017). mSystems, 10.1128/mSystems.00039-17.
Europe PMC in 2017.
M. Levchenko et al (2017). Nucleic Acids Research, 10.1093/nar/gkx1005.
Gene indexing: characterization and analysis of NLM's GeneRIFs.
J. A. Mitchell et al (2003). AMIA Annu Symp Proc 2003:460-464.
UniProt: the universal protein knowledgebase.
The UniProt Consortium (2016). Nucleic Acids Research, 10.1093/nar/gkw1099.
BRENDA in 2017: new perspectives and new tools in BRENDA.
S. Placzek et al (2017). Nucleic Acids Research, 10.1093/nar/gkw952.
The EcoCyc database: reflecting new knowledge about Escherichia coli K-12.
I. M. Keseler et al (2016). Nucleic Acids Research, 10.1093/nar/gkw1003.
The MetaCyc database of metabolic pathways and enzymes.
R. Caspi et al (2018). Nucleic Acids Research, 10.1093/nar/gkx935.
CharProtDB: a database of experimentally characterized protein annotations.
R. Madupu et al (2012). Nucleic Acids Research, 10.1093/nar/gkr1133.
The carbohydrate-active enzymes database (CAZy) in 2013.
V. Lombard et al (2014). Nucleic Acids Research, 10.1093/nar/gkt1178.
The Transporter Classification Database (TCDB): recent advances.
M. H. Saier, Jr. et al (2016). Nucleic Acids Research, 10.1093/nar/gkv1103.
REBASE - a database for DNA restriction and modification: enzymes, genes and genomes.
R. J. Roberts et al (2015). Nucleic Acids Research, 10.1093/nar/gku1046.
Deep annotation of protein function across diverse bacteria from mutant phenotypes.
M. N. Price et al (2016). bioRxiv, 10.1101/072470.
by Morgan Price,
Arkin group
Lawrence Berkeley National Laboratory