PaperBLAST
PaperBLAST Hits for 96 a.a. (MLTGKQKRFL...)
>96 a.a. (MLTGKQKRFL...)
MLTGKQKRFLRSKAHHLTPIFQVGKGGVNDNMIKQIAEALEARELIKVSVLQNCEEDKND
VAEALVKGSRSQLVQTIGNTIVLYKESKENKQIELP
Found 30 similar proteins in the literature:
SAV1595 hypothetical protein from Staphylococcus aureus subsp. aureus Mu50
SA1423 hypothetical protein from Staphylococcus aureus subsp. aureus N315
68% identity, 100% coverage
SAPIG1660 ribosome assembly RNA-binding protein YhbY from Staphylococcus aureus subsp. aureus ST398
67% identity, 100% coverage
- High-Throughput Mutagenesis Reveals a Role for Antimicrobial Resistance- and Virulence-Associated Mobile Genetic Elements in Staphylococcus aureus Host Adaptation
Ba, Microbiology spectrum 2023 - “...3 quinol oxidase, subunit I C Human/pig 1108284-1110272 SAPIG1661 aroE Shikimate 5-dehydrogenase E Human 1742176-1742982 SAPIG1660 yhbY Ribosome assembly RNA-binding protein YbhY J Human/pig 1741882-1742172 SAPIG1790 aroF 3-Deoxy-7-phosphoheptulonate synthase E Human 1891461-1892552 SAPIG1303 glpD Aerobic glycerol-3-phosphate dehydrogenase C Human/pig 1369044-1370717 SAPIG2013 rutB Peroxyureidoacrylate/ureidoacrylate amidohydrolase RutB Q...”
- “...S5), confirming that gene ctaB was indeed conditionally essential for blood survival. Gene ybhY ( SAPIG1660 ), annotated to encode ribosome assembly RNA-binding protein YbhY, was also deleted in both ST398 MRSA strains. Although the gene was identified as conditionally essential for bacterial growth in both...”
SAR1672 conserved hypothetical protein from Staphylococcus aureus subsp. aureus MRSA252
67% identity, 100% coverage
- Contribution of Extracellular Membrane Vesicles To the Secretome of Staphylococcus aureus
Uppu, mBio 2023 - “...change in relative protein intensities. Five proteins (open black circles; BfmB, GcvPB, SAR0357, SAR2788, and SAR1672) were significantly depleted but not packaged in MVs. (E) The 20 MRSA252 proteins with a 10-fold depletion following ultracentrifugation of the culture supernatant to pellet MVs. 10.1128/mbio.03571-22.1 TABLES1 The 667...”
- “...were <10-fold. SAR2788 is an extracellular protein, whereas the other four (BfmB, GcvPB, SAR0357, and SAR1672) are cytoplasmic proteins. Five proteins were significantly enriched in the culture supernatant after ultracentrifugation (green and black dots in the top right panel of Fig.2D ). Two of these (cytoplasmic...”
SAOUHSC_01698 hypothetical protein from Staphylococcus aureus subsp. aureus NCTC 8325
66% identity, 100% coverage
- Proteomic and Metabolomic Analyses of a Tea-Tree Oil-Selected Staphylococcus aureus Small Colony Variant
Torres, Antibiotics (Basel, Switzerland) 2019 - “...synthesis 1.3 SAOUHSC_01661 tRNA A22 N-methylase uncharacterized 1.6 SAOUHSC_01679 MiaB tRNA A37 methylthiotransferase uncharacterized 1.8 SAOUHSC_01698 YbhY RNA-binding protein uncharacterized 1.3 SAOUHSC_01716 PrtC family collagenase-like protease uncharacterized 2.7 SAOUHSC_01735 tRNA A37 threonylcarbamoyladenosine dehydratase uncharacterized 1.6 SAOUHSC_01810 MaeB malate dehydrogenase carbohydrate metabolism 2.0 SAOUHSC_01858 phenylalanyl-tRNA synthetase subunit...”
lmo1489 similar to unknown proteins from Listeria monocytogenes EGD-e
LMRG_00942 conserved hypothetical protein from Listeria monocytogenes 10403S
61% identity, 100% coverage
- Listeria monocytogenes σA Is Sufficient to Survive Gallbladder Bile Exposure
Boonmee, Frontiers in microbiology 2019 - “...2.05 0.01 LMRG_00706 lmo1257 Hypothetical protein 2.53 0.00 LMRG_00826 lmo1375 Peptidase M20 2.07 0.00 LMRG_00942 lmo1489 LMRG_00945-LMRG_00939 RNA binding protein 2.22 0.00 LMRG_01131 lmo1983 ilvD ilv-leu Dihydroxy-acid dehydratase 2.62 0.01 LMRG_01132 lmo1984 ilvB Acetolactate synthase large subunit 3.12 0.00 LMRG_01134 lmo1986 ilvC Ketol-acid reductoisomerase 2.91 0.00...”
UC7_RS16265 ribosome assembly RNA-binding protein YhbY from Enterococcus caccae ATCC BAA-1240
61% identity, 87% coverage
C4N14_06085 ribosome assembly RNA-binding protein YhbY from Fusobacterium nucleatum subsp. nucleatum ATCC 23726
45% identity, 95% coverage
- A global survey of small RNA interactors identifies KhpA and KhpB as major RNA-binding proteins in Fusobacterium nucleatum
Zhu, Nucleic acids research 2024 - “...III, RNase J, RNase R, PNPase), the ATP-dependent RNA helicase DEAD, and three predicted RBPs (C4N14_06085, C4N14_04780, C4N14_02375). In addition, we identified translation initiation factors (2/75), RNA polymerase subunits (2/75), a DNA-binding protein (1/75), and a cold shock family protein (1/75). Proteins of unknown function (14/75)...”
HSISS4_01531 ribosome assembly RNA-binding protein YhbY from Streptococcus salivarius
50% identity, 88% coverage
SP_1748 hypothetical protein from Streptococcus pneumoniae TIGR4
SPD_1558 conserved hypothetical protein TIGR00253 from Streptococcus pneumoniae D39
49% identity, 89% coverage
- Adding context to the pneumococcal core genes using bioinformatic analysis of the intergenic pangenome of Streptococcus pneumoniae
Nielsen, Frontiers in bioinformatics 2023 - “...csIGR1 is flanked by SPD_1558 and SPD_1559. In TIGR4, csIGR2 is flanked by the genes SP_1748 and SP_1749. FIGURE 5 Unrooted phylogenetic tree of the 84 S. pneumoniae strains used in this study. The tree is based on SNPs in the core genes. The shading of...”
- A 3'UTR-derived small RNA represses pneumolysin synthesis and facilitates pneumococcal brain invasion
Shen, Communications biology 2024 - “...activities by pairing with an RBS-embedded intergenic region of the ply operon. The RNA-binding protein SPD_1558 facilitates the pairing. Importantly, PlyT inhibition of Ply synthesis is stronger in anaerobic culture and leads to lower Ply abundance. Deletion of plyT decreases the number of pneumococci in the...”
- “...(RBS)-embedded intergenic region (IGR) of the ply operon under the facilitation of the RNA-binding protein SPD_1558. Remarkably, higher levels of the plyT transcript and its inhibition of Ply synthesis are found in anaerobically- than in statically-grown D39 cells, thus resulting in lower Ply protein levels in...”
- The Small Molecule DAM Inhibitor, Pyrimidinedione, Disrupts Streptococcus pneumoniae Biofilm Growth In Vitro
Yadav, PloS one 2015 - “...protein -1.8 (0.03) SPD_1380 conserved hypothetical protein -1.2 (0.007) SPD_1400 conserved hypothetical protein -1.5 (0.02) SPD_1558 conserved hypothetical protein -1.6 (0.01) SPD_1566 conserved hypothetical protein -2.3 (0.01) SPD_1588 conserved hypothetical protein -1.6 (0.01) SPD_1595 conserved hypothetical protein -1.9 (0.004) SPD_1662 conserved hypothetical protein -1.6 (0.01) SPD_1716...”
BIF_01751 ribosome assembly RNA-binding protein YhbY from Bifidobacterium animalis subsp. lactis BB-12
46% identity, 88% coverage
- Updated Genome Sequence for the Probiotic Bacterium Bifidobacterium animalis subsp. lactis BB-12
Jensen, Microbiology resource announcements 2021 - “...1435753 CA Intergenic(+46/+18) BIF_00308 / BIF_00385 1442201 +G Intergenic(+56/+22) BIF_00264 / BIF_01752 1444091 1bp35bp Intergenic(135/+100) BIF_01751 / BIF_01116 1444149 +C Intergenic(193/+42) BIF_01751 / BIF_01116 1444153 +C Intergenic(197/+38) BIF_01751 / BIF_01116 1459986 14bp30bp Intergenic(+38/+3) BIF_00179 / BIF_00879 1466361 + TTGCGTTCCC Intergenic(140/+28) BIF_01803 / BIF_00130 1466364 +C Intergenic(143/+25)...”
A4XJT0 CRM domain-containing protein from Caldicellulosiruptor saccharolyticus (strain ATCC 43494 / DSM 8903 / Tp8T 6331)
40% identity, 99% coverage
E2P69_RS07135 ribosome assembly RNA-binding protein YhbY from Xanthomonas perforans
41% identity, 95% coverage
P71376 RNA-binding protein HI_1333 from Haemophilus influenzae (strain ATCC 51907 / DSM 11121 / KW20 / Rd)
HI1333 conserved hypothetical protein from Haemophilus influenzae Rd KW20
42% identity, 96% coverage
- Functional annotation of conserved hypothetical proteins from Haemophilus influenzae Rd KW20
Shahbaaz, PloS one 2013 - “...HP HI1327 950255 P44163 Prokaryotic membrane lipoprotein lipid attachment site profile 208. HP HI1333 949671 P71376 RNA-binding, CRM domain 209. HP HI1338 950260 P44164 phosphohistidine phosphatase SixA 210. HP HI1339 950818 P71378 Late embryogenesis abundant protein 211. HP HI1340 950814 P44165 Outer membrane efflux porinTdeA 212....”
- Analysis of electric moments of RNA-binding proteins: implications for mechanism and prediction
Ahmad, BMC structural biology 2011 - “...conformational changes by complex formation are investigated. Basic binding mechanism of a putative RNA-binding protein (HI1333 from Haemophilus influenza) is suggested as a potential application of this study. Results We found that similar to DNA-binding proteins (DBPs), RNA-binding proteins (RBPs) also show significantly higher values of...”
- “...computed the electric moments of a hypothetical protein from PDB (PDB ID 1JO0 ) i.e. HI1333, which is a hypothetical protein from Haemophilus influenzae and it has been marked as candidate of being an RNA-binding protein [ 22 ]. We computed the electric moments of this...”
- Identification of biofilm proteins in non-typeable Haemophilus Influenzae
Gallaher, BMC microbiology 2006 - “...Uncharacterized BCR - y y HI1168 68057915 all 2926 S Uncharacterized BCR - y y HI1333 1574791 All 1534 J Predicted RNA-binding protein containing KH domain, possibly ribosomal protein - N N HI1349 1574811 All 783 L Starvation-inducible DNA-binding protein Dps Y Y HI1427 68058243 4...”
- Structure of HI1333 (YhbY), a putative RNA-binding protein from Haemophilus influenzae
Willis, Proteins 2002 (PubMed)- “...Structure of HI1333 (YhbY), a Putative RNA-Binding Protein From Haemophilus influenzae Mark A. Willis,1 Wojciech Krajewski,1 Vani Rao Chalamasetty,2 Prasad...”
- “...We have determined the crystal structure of HI1333 (YhbY) from Haemophilus influenzae, a protein annotated as hypothetical in sequence databases. We...”
- Genetic analysis of a pyocin-resistant lipooligosaccharide (LOS) mutant of Haemophilus ducreyi: restoration of full-length LOS restores pyocin sensitivity
Filiatrault, Journal of bacteriology 2001 - “...85% similarity) (14). The next ORFs were the lgtF, HI1333, and a portion of Downloaded from http://jb.asm.org/ on February 11, 2017 by University of California,...”
- Pasteurella multocida gene expression in response to iron limitation
Paustian, Infection and immunity 2001 - “...P Unknown Tetrathionate reductase subunit B Hypothetical HI1333 protein ATP-binding protein, ABC transporter Hypothetical E. coli protein Hypothetical HI0883...”
- Construction and characterization of Haemophilus ducreyi lipooligosaccharide (LOS) mutants defective in expression of heptosyltransferase III and beta1,4-glucosyltransferase: identification of LOS glycoforms containing lactosamine repeats
Filiatrault, Infection and immunity 2000 - “...82% identical and 91% similar to a hypothetical protein (HI1333) of H. influenzae, with no known function (16). Seventy-six base pairs downstream of ORF3 is...”
- “...(lgtF); ORF3 has homology to hypothetical protein HI1333 of H. influenzae; ORF 4 shares homology with a phosphatidylglycerophosphate phosphatase B (pgpB)...”
SO1195 conserved hypothetical protein TIGR00253 from Shewanella oneidensis MR-1
41% identity, 96% coverage
YhbY / b3180 ribosome assembly factor YhbY from Escherichia coli K-12 substr. MG1655 (see 9 papers)
YHBY_ECOLI / P0AGK4 RNA-binding protein YhbY from Escherichia coli (strain K12) (see 2 papers)
yhbY RNA-binding protein YhbY from Escherichia coli K12 (see paper)
EDL933_RS21645 ribosome assembly RNA-binding protein YhbY from Escherichia coli O157:H7 str. EDL933
b3180 predicted RNA-binding protein from Escherichia coli str. K-12 substr. MG1655
41% identity, 98% coverage
- Three Novel Antisense Overlapping Genes in E. coli O157:H7 EDL933
Graf, Microbiology spectrum 2023 - “...specified in this case. oloz4542 (303nt) overlaps in antisense to the annotated gene yhbY (294nt, EDL933_RS21645), which encodes the ribosome assembly RNA-binding protein YhbY ( Fig.1B ). The 2 genes share 232nt of their sequences. The reading frame of oloz4542 is -2 in respect to its...”
- Biodistribution of 89Zr-DFO-labeled avian pathogenic Escherichia coli outer membrane vesicles by PET imaging in chickens
Li, Poultry science 2023 - “...108 P29680 DCUP Coenzyme transport and metabolism Cytoplasm 109 P75728 UBIF Function unknown Cytoplasm 110 P0AGK4 YHBY Translation, ribosomal structure and biogenesis Cytoplasm 111 P0A6Z3 HTPG Posttranslational modification, protein turnover, chaperones Cytoplasm 112 P0AA16 OMPR Function unknown Cytoplasm 113 P0AGJ9 SYY Translation, ribosomal structure and biogenesis...”
- Effects of Kasugamycin on the Translatome of Escherichia coli
Lange, PloS one 2017 - “...T G AGGA CA b1238 tdk 0,34 34 34 3 no GCCT G T GG b3180 yhbY 0,36 31 31 7 yes TAAG C A AA b1779 gapA 0,37 36 36 5 no GCT GG T GG b2765 sscR 0,37 24 24 4 no T GTA...”
- Analysis of promoter targets for Escherichia coli transcription elongation factor GreA in vivo and in vitro
Stepanova, Journal of bacteriology 2007 - “...S15 Ribosome-binding factor A Conserved protein yhbYc b3180 1.6 Predicted RNA-binding protein containing KH domain; possibly a ribosomal protein yhbEc b3184...”
NJ56_03835 ribosome assembly RNA-binding protein YhbY from Yersinia ruckeri
41% identity, 98% coverage
- Genome Sequence of the Fish Pathogen Yersinia ruckeri SC09 Provides Insights into Niche Adaptation and Pathogenic Mechanism
Liu, International journal of molecular sciences 2016 - “...respectively encoded by invC , invI , spaO , orgB , and orgA (NJ56_03820, NJ56_03825, NJ56_03835, NJ56_03765, and NJ56_03770) in SC09-Ysa, of which the ATPase could unfold some effector proteins in vitro , and therefore may play an equivalent role in vivo [ 41 ]. The...”
T_RS16300 ribosome assembly RNA-binding protein YhbY from Salmonella enterica subsp. enterica serovar Typhi str. Ty2
40% identity, 98% coverage
t3215 conserved hypothetical protein from Salmonella enterica subsp. enterica serovar Typhi Ty2
40% identity, 86% coverage
Q9K026 CRM domain-containing protein from Neisseria meningitidis serogroup B (strain ATCC BAA-335 / MC58)
40% identity, 83% coverage
FTN_0552 RNA-binding protein from Francisella tularensis subsp. novicida U112
37% identity, 93% coverage
- Molecular complexity orchestrates modulation of phagosome biogenesis and escape to the cytosol of macrophages by Francisella tularensis
Asare, Environmental microbiology 2010 - “...septum formation inhibitor 4 # tnfn1_pw060420p02q170 FTN_0331 minC septum formation inhibitor 4 # Transcription/Translation tnfn1_pw060328p06q196 FTN_0552 yhbY RNA-binding protein 5 # tnfn1_pw060510p03q150 FTN_0949 rplI 50S ribosomal protein L9 2 tnfn1_pw060328p06q170 FTN_1099 transcriptional regulator, LysR family 7 tnfn1_pw060419p03q165 FTN_1300 transcriptional regulator, LysR family 2 tnfn1_pw060328p02q148 FTN_1393 transcriptional...”
- Molecular bases of proliferation of Francisella tularensis in arthropod vectors
Asare, Environmental microbiology 2010 - “...septum formation inhibitor 4 # tnfn1_pw060420p02q170 FTN_0331 minC septum formation inhibitor 4 # Transcription/Translation tnfn1_pw060328p06q196 FTN_0552 yhbY RNA-binding protein 5 # tnfn1_pw060510p03q150 FTN_0949 rplI 50S ribosomal protein L9 2 tnfn1_pw060328p06q170 FTN_1099 transcriptional regulator, LysR family 7 tnfn1_pw060419p03q165 FTN_1300 transcriptional regulator, LysR family 2 tnfn1_pw060328p02q148 FTN_1393 transcriptional...”
P95453 Probable RNA-binding protein PA4753 from Pseudomonas aeruginosa (strain ATCC 15692 / DSM 22644 / CIP 104116 / JCM 14847 / LMG 12228 / 1C / PRS 101 / PAO1)
PA14_62880 putative RNA-binding protein from Pseudomonas aeruginosa UCBPP-PA14
PA4753 hypothetical protein from Pseudomonas aeruginosa PAO1
35% identity, 88% coverage
- Proteome-wide identification of druggable targets and inhibitors for multidrug-resistant <i>Pseudomonas aeruginosa</i> using an integrative subtractive proteomics and virtual screening approach
Vemula, Heliyon 2025 - “...153 Q9I6V9 1235 Q9HTK5 2317 P58040 3399 Q9HYK1 4481 Q9I373 154 Q9I700 1236 Q9HTL0 2318 P95453 3400 Q9HYK2 4482 Q9I374 155 Q9Z4J7 1237 Q9HTM0 2319 Q01609 3401 Q9HYK3 4483 Q9I375 156 Q9ZN70 1238 Q9HTM1 2320 Q03268 3402 Q9HYK4 4484 Q9I376 157 G3XCV2 1239 Q9HTN0 2321 Q03381...”
- Genomewide identification of genetic determinants of antimicrobial drug resistance in Pseudomonas aeruginosa
Dötsch, Antimicrobial agents and chemotherapy 2009 - “...PA14_57690 PA14_57880 PA14_57910 PA14_62560 PA14_62770 PA14_62880 PA14_64190 PA14_68670 PA14_69810 PA14_70980 PA14_15750 PA0425 PA0426 PA0572 PA0595 PA0764...”
- Contact Lens Wear Alters Transcriptional Responses to Pseudomonas aeruginosa in Both the Corneal Epithelium and the Bacteria
Kumar, Investigative ophthalmology & visual science 2025 - “...9.40E-04 flp Type IVb pilin Flp 230.93 1.21 7.29E-03 PA2971 Hypothetical protein 201.96 1.21 9.35E-03 PA4753 Hypothetical protein 209.75 1.21 8.54E-03 rplU 50S ribosomal protein L21 419.64 1.25 5.87E-04 efp Elongation factor P 340.12 1.31 4.27E-04 rpmB 50S ribosomal protein L28 546.41 1.32 4.48E-05 PA4638 PA4638...”
- NrtR Regulates the Type III Secretion System Through cAMP/Vfr Pathway in Pseudomonas aeruginosa
Jin, Frontiers in microbiology 2019 - “...2006 PA0020::Tn PA14 with PA0020 inserted with Tn Liberati et al., 2006 PA4753::Tn PA14 with PA4753 inserted with Tn Liberati et al., 2006 exsA :: PAK with exsA disrupted by insertion of cassette ; Sp r , Sm r Li et al., 2013 nadD2 PAK with...”
- “...T3SS ( Li et al., 2013 ). Among them, PA0020, PA3202, PA4336, PA4630, PA4916, and PA4753 encoded products are annotated as hypothetical proteins 1 ( Winsor et al., 2016 ) with unknown biological functions. To confirm their relationships with T3SS as well as exclude strain specific...”
- Loss of RNA Chaperone Hfq Unveils a Toxic Pathway in Pseudomonas aeruginosa
Hill, Journal of bacteriology 2019 (secret)
- Genetic determinants involved in the susceptibility of Pseudomonas aeruginosa to beta-lactam antibiotics
Alvarez-Ortega, Antimicrobial agents and chemotherapy 2010 - “...PA4007 PA4069 PA4088 PA4269 PA4393 PA4745 PA4753 PA5130 PA5174 PA5288 PA5366 Wild-type strain Putative 2-OH-lauroyltransferase Noncatalytic dihydroorotase-like...”
- Genomewide identification of genetic determinants of antimicrobial drug resistance in Pseudomonas aeruginosa
Dötsch, Antimicrobial agents and chemotherapy 2009 - “...PA3800 PA3976 PA4005 PA4269 PA4441 PA4456 PA4459e PA4727 PA4745e PA4753 PA4853e PA5198 PA5288 PA5375 --e Gene name Growthb mexA mexB 1.07 1.03 1.05 0.94 1.35...”
- A putative RNA-binding protein has a role in virulence in Ralstonia solanacearum GMI1000
Franks, Molecular plant pathology 2008 - “...response to root exudates. One such gene, PA4753, encodes a putative RNA-binding protein. Homologues of PA4753 occur in other rhizosphere-associated bacteria,...”
- “...examined the role of Rsc1524, a homologue of PA4753 of P. aeruginosa, found in the root-infecting phytopathogen Ralstonia solanacearum GMI1000 (Boucher et al.,...”
- A putative RNA-binding protein has a role in virulence in Ralstonia solanacearum GMI1000
FRANKS, Molecular plant pathology 2008
- Transcriptome profiling of bacterial responses to root exudates identifies genes involved in microbe-plant interactions
Mark, Proceedings of the National Academy of Sciences of the United States of America 2005 - “...of these mutants (with insertions in PA1269, PA4582, PA4753, PA3022, and PA4352) exhibited growth characteristics indistinguishable from the wild type in LB and...”
- “...however, three mutants (with insertions in PA1269, PA4582, and PA4753) had reduced ability to compete with the wild type in the rhizosphere of both varieties...”
AT2G21350 RNA binding from Arabidopsis thaliana
37% identity, 39% coverage
- Regulation of co-translational mRNA decay by PAP and DXO1 in Arabidopsis
Carpentier, BMC plant biology 2025 - “...For these two candidate targets, PAP treatment significantly affects mRNA stability in both organs. For At2g21350 transcript, mRNA half-life varied from 20.8min to 141.5min (t.test p-value<0.05) and from 24.1min to 188.6min (t.test p-value<0.05) after PAP treatment in shoot and root respectively. At1g66900 transcript followed the same...”
- A nuclear-encoded chloroplast protein harboring a single CRM domain plays an important role in the Arabidopsis growth and stress response
Lee, BMC plant biology 2014 - “...[ 13 ]. Among the 16 Arabidopsis CRM domain-containing protein genes, two genes (At4g39040 and At2g21350) encode the smallest proteins harboring a single CRM domain [ 13 ]. However, the role of single CRM domain-containing proteins has not been demonstrated in plants. Here, we determined the...”
- “...3, and subfamily 4. Among the 16 CRM domain-containing protein genes, two genes (At4g39040 and At2g21350) encode proteins harboring a single CRM domain and are classified into subfamily group 4 [ 13 ]. We thus named At4g39040 as CFM4. The CFM4 protein contains a highly conserved...”
- Mining the soluble chloroplast proteome by affinity chromatography
Bayer, Proteomics 2011 - “...Oxidoreductase family protein O + AT2G17240 Unknown protein C + AT2G17340 Pantothenate kinase-related O + AT2G21350 RNA binding C + AT2G23390 Acyl-CoA N-acyltransferase M + AT2G25870 Haloacid dehalogenase-like family protein M + AT2G31890 b) ATRAP; putative RNA binding domain C + AT2G44760 Unknown protein C +...”
MMP0155 conserved hypothetical protein from Methanococcus maripaludis S2
31% identity, 66% coverage
- The conserved ribonuclease aCPSF1 triggers genome-wide transcription termination of Archaea via a 3'-end cleavage mode
Yue, Nucleic acids research 2020 - “...standard deviations are shown. ( D ) 3RACE assays the TRT transcripts of MMP0901 and MMP0155 in the strains numbered as in panel (B). ( E ) A proposed mechanism of aCPSF1 cleavage triggering the archaeal transcription termination exposes a homolog of the eukaryotic RNAP II...”
- “...also eliminated the TRTs of convergent ( MMP0901 , type I) and co-directional TUs ( MMP0155 , type II) (Figure 6D ). This demonstrated that the aCPSF1 orthologs from various archaeal phyla play a same role in transcription termination. DISCUSSION Transcription termination mechanisms remain largely unknown...”
BP1079 conserved hypothetical protein from Bordetella pertussis Tohama I
32% identity, 42% coverage
AT4G39040 RNA binding from Arabidopsis thaliana
35% identity, 29% coverage
- Roles of Organellar RNA-Binding Proteins in Plant Growth, Development, and Abiotic Stress Responses
Lee, International journal of molecular sciences 2020 - “...C/M Splicing of group II intron ( ndhB ) Stunted growth [ 57 ] CFM4 At4g39040 C 16S and 23S rRNA processing Retarded growth [ 24 ] mCSF1 At4g31010 M Splicing of multiple mitochondrial introns Embryo lethal Retarded growth [ 60 ] CFM9 At3g27550 M Splicing...”
- “...Splicing of multiple mitochondrial introns Sensitive to salt, drought, or ABA [ 61 ] CFM4 At4g39040 C 16S and 23S rRNA processing Sensitive to salt or cold stress [ 24 ] A. thaliana PPR family ABO5 At1g51965 M Splicing of nad2 intron3 Sensitive to ABA [...”
- Organellar Gene Expression and Acclimation of Plants to Environmental Stress
Leister, Frontiers in plant science 2017 - “...in first exon Pale green inner leaves when grown at 4C Wang et al., 2016 AT4G39040 cfm4 CRM FAMILY MEMBER SUBFAMILY 4, CFM4 Chloroplast ABRC cfm4-1 : SALK_076439, T-DNA is inserted in third exon; SALK_126978, T-DNA is inserted in third exon Retarded seed germination and growth...”
- A nuclear-encoded chloroplast protein harboring a single CRM domain plays an important role in the Arabidopsis growth and stress response
Lee, BMC plant biology 2014 - “...developmental and stress response roles of a nuclear-encoded chloroplast protein harboring a single CRM domain (At4g39040), designated CFM4, in Arabidopsis thaliana . Results Analysis of CFM4-GFP fusion proteins revealed that CFM4 is localized to chloroplasts. The loss-of-function T-DNA insertion mutants for CFM4 ( cfm4 ) displayed...”
- “...proteins, respectively [ 13 ]. Among the 16 Arabidopsis CRM domain-containing protein genes, two genes (At4g39040 and At2g21350) encode the smallest proteins harboring a single CRM domain [ 13 ]. However, the role of single CRM domain-containing proteins has not been demonstrated in plants. Here, we...”
- Reactive oxygen species and transcript analysis upon excess light treatment in wild-type Arabidopsis thaliana vs a photosensitive mutant lacking zeaxanthin and lutein
Alboresi, BMC plant biology 2011 - “...AT2G46550 expressed protein -1,23 249472_at AT5G39210 expressed protein -1,23 252136_at AT3G50770 calmodulin-related protein -1,21 252922_at AT4G39040 expressed protein -1,20 267591_at AT2G39705 expressed protein -1,20 257856_at AT3G12930 expressed protein -1,20 263264_at AT2G38810 histone H2A -1,19 249929_at AT5G22340 expressed protein -1,18 266329_at AT2G01590 expressed protein -1,18 248762_at AT5G47455...”
NP_849521 RNA-binding CRS1 / YhbY (CRM) domain protein from Arabidopsis thaliana
35% identity, 31% coverage
PFLU_5262 YhbY family RNA-binding protein from Pseudomonas [fluorescens] SBW25
35% identity, 90% coverage
C0J56_04495 YhbY family RNA-binding protein from Pseudomonas fluorescens
34% identity, 90% coverage
RSc1524 HYPOTHETICAL PROTEIN from Ralstonia solanacearum GMI1000
29% identity, 57% coverage
- A putative RNA-binding protein has a role in virulence in Ralstonia solanacearum GMI1000
Franks, Molecular plant pathology 2008 - “...study, we have tested the role of this homologue, Rsc1524, in the virulence of R. solanacearum GMI1000. Disruption of Rsc1524 resulted in a decrease in...”
- “...specific extracellular plant wall-degrading enzymes. Expression of Rsc1524 was influenced by different plant root exudates and root exudate components, which...”
- A putative RNA-binding protein has a role in virulence in Ralstonia solanacearum GMI1000
FRANKS, Molecular plant pathology 2008
CAF1P_MAIZE / Q84N49 CRS2-associated factor 1, chloroplastic; Chloroplastic group IIA intron splicing facilitator CRS2-associated factor 1 from Zea mays (Maize) (see 2 papers)
40% identity, 7% coverage
For advice on how to use these tools together, see
Interactive tools for functional annotation of bacterial genomes.
The PaperBLAST database links 798,070 different protein sequences to 1,261,478 scientific articles. Searches against EuropePMC were last performed on May 12 2025.
PaperBLAST builds a database of protein sequences that are linked
to scientific articles. These links come from automated text searches
against the articles in EuropePMC and from manually-curated information
from GeneRIF, UniProtKB/Swiss-Prot, BRENDA, CAZy (as made available by
dbCAN), BioLiP, CharProtDB, MetaCyc, EcoCyc, TCDB, REBASE, the Fitness
Browser, and a subset of the European Nucleotide Archive with the
/experiment tag.
Given this database and a protein sequence query,
PaperBLAST uses protein-protein BLAST
to find similar sequences with E < 0.001.
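The E-value cutoff described above can be illustrated with a minimal sketch. It assumes the hits are available as BLAST+ tabular output (`-outfmt 6`), where the E-value is the 11th column; the function name `filter_hits` is hypothetical, and this is not PaperBLAST's own code.

```python
# Hypothetical sketch: keep only BLAST hits whose E-value is below a cutoff.
# Assumes default BLAST+ tabular output (-outfmt 6) with these columns:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
E_VALUE_CUTOFF = 0.001  # threshold stated in the text above


def filter_hits(tabular_text, cutoff=E_VALUE_CUTOFF):
    """Return (subject_id, percent_identity, evalue) for hits with E < cutoff."""
    hits = []
    for line in tabular_text.splitlines():
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        subject = fields[1]
        identity = float(fields[2])
        evalue = float(fields[10])
        if evalue < cutoff:
            hits.append((subject, identity, evalue))
    return hits
```

The comparison is strict, matching the "E < 0.001" wording; a hit with an E-value of exactly 0.001 would be dropped in this sketch.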
To build the database, we query EuropePMC with locus tags, with RefSeq protein
identifiers, and with UniProt
accessions. We obtain the locus tags from RefSeq or from MicrobesOnline. We use
queries of the form "locus_tag AND genus_name" to try to ensure that
the paper is actually discussing that gene. Because EuropePMC indexes
most recent biomedical papers, even if they are not open access, some
of the links may be to papers that you cannot read or that our
computers cannot read. We query each of these identifiers that
appears in the open access part of EuropePMC, as well as every locus
tag that appears in the 500 most-referenced genomes, so that a gene
may appear in the PaperBLAST results even though none of the papers
that mention it are open access. We also incorporate text-mined links
from EuropePMC that link open access articles to UniProt or RefSeq
identifiers. (This yields some additional links because EuropePMC
uses different heuristics for their text mining than we do.)
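As a rough illustration of the query form described above, the sketch below builds a "locus_tag AND genus_name" string and a search URL. The endpoint and the `format` and `pageSize` parameters are those of the public Europe PMC REST search service; whether PaperBLAST's pipeline uses this endpoint, and the helper names `build_query` and `build_search_url`, are assumptions.

```python
from urllib.parse import urlencode

# Assumption: the public Europe PMC REST search endpoint. PaperBLAST's own
# pipeline may query EuropePMC differently.
EUROPEPMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_query(locus_tag, genus_name):
    """Combine a locus tag with its genus so that hits are more likely to
    discuss the gene rather than an unrelated use of the same string."""
    return f"{locus_tag} AND {genus_name}"


def build_search_url(locus_tag, genus_name, page_size=25):
    """Return a search URL for the query; no network request is made here."""
    params = {
        "query": build_query(locus_tag, genus_name),
        "format": "json",
        "pageSize": page_size,
    }
    return EUROPEPMC_SEARCH + "?" + urlencode(params)
```

For example, `build_query("SPD_1558", "Streptococcus")` gives the string `SPD_1558 AND Streptococcus`, which pairs a locus tag from the results above with its genus.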
For every article that mentions a locus tag, a RefSeq protein
identifier, or a UniProt accession, we try to select one or two
snippets of text that refer to the protein. If we cannot get access to
the full text, we try to select a snippet from the abstract, but
unfortunately, unique identifiers such as locus tags are rarely
provided in abstracts.
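The snippet selection described above might look roughly like the following sketch. The window size, the word-based splitting, and the function name `select_snippets` are all assumptions made for illustration; the source does not say how PaperBLAST chooses its snippets.

```python
import re


def select_snippets(text, identifier, max_snippets=2, context_words=10):
    """Return up to max_snippets windows of words around mentions of identifier.

    Hypothetical heuristic: take context_words on each side of a mention and
    skip mentions that fall inside a window that was already emitted.
    """
    # Require that the identifier is not embedded in a longer identifier,
    # so that "SPD_155" does not match "SPD_1558".
    pattern = re.compile(
        r"(?<![A-Za-z0-9_])" + re.escape(identifier) + r"(?![A-Za-z0-9_])"
    )
    words = text.split()
    snippets = []
    last_end = -1  # index of the last word covered by a previous snippet
    for i, word in enumerate(words):
        if i <= last_end or not pattern.search(word):
            continue
        start = max(0, i - context_words)
        end = min(len(words), i + context_words + 1)
        snippets.append("..." + " ".join(words[start:end]) + "...")
        last_end = end - 1
        if len(snippets) == max_snippets:
            break
    return snippets
```

The same function could be applied to an abstract when the full text is unavailable; as the text notes, it would often return nothing because abstracts rarely contain locus tags.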
PaperBLAST also incorporates manually-curated protein functions:
- Proteins from NCBI's RefSeq are included if a GeneRIF entry links
the gene to an article in PubMed®.
GeneRIF also provides a short summary of the article's claim about the
protein, which is shown instead of a snippet.
- Proteins from Swiss-Prot (the curated part of UniProt)
are included if the curators
identified experimental evidence for the protein's function (evidence
code ECO:0000269). For these proteins, the fields of the Swiss-Prot entry that
describe the protein's function are shown (with bold headings).
- Proteins from BRENDA,
a curated database of enzymes, are included if they are linked to a paper in PubMed
and their full sequence is known.
- Every protein from the non-redundant subset of BioLiP, a database
of ligand-binding sites and catalytic residues in protein structures, is included. Since BioLiP itself
does not include descriptions of the proteins, those are taken from the
Protein Data Bank.
Descriptions from PDB rely on the original submitter of the
structure and cannot be updated by others, so they may be less reliable.
(For SitesBLAST and Sites on a Tree, we use a larger subset of BioLiP so that every
ligand is represented among a group of structures with similar sequences, but for
PaperBLAST, we use the non-redundant set provided by BioLiP.)
- Every protein from EcoCyc, a curated
database of the proteins in Escherichia coli K-12, is included, regardless
of whether it is characterized.
- Proteins from the MetaCyc metabolic pathway database
are included if they are linked to a paper in PubMed and their full sequence is known.
- Proteins from the Transport Classification Database (TCDB)
are included if they have known substrate(s), have reference(s),
and are not described as uncharacterized or putative.
(Some of the references are not visible on the PaperBLAST web site.)
- Every protein from CharProtDB,
a database of experimentally characterized protein annotations, is included.
- Proteins from the CAZy database of carbohydrate-active enzymes
are included if they are associated with an Enzyme Commission (EC) number.
Even though CAZy does not provide links from individual protein sequences to papers,
these should all be experimentally characterized proteins.
- Proteins from the REBASE database
of restriction enzymes are included if they have known specificity.
- Every protein with an evidence-based reannotation (based on mutant phenotypes)
in the Fitness Browser is included.
- Sequence-specific transcription factors (including sigma factors and DNA-binding response regulators)
with experimentally-determined DNA binding sites from the
PRODORIC database of gene regulation in prokaryotes are included.
- Putative transcription factors from RegPrecise
are included if they have manually-curated predictions for their binding sites. These predictions are based on
conserved putative regulatory sites across genomes that contain similar transcription factors,
so PaperBLAST clusters the TFs at 70% identity and retains just one member of each cluster.
- Coding sequence (CDS) features from the
European Nucleotide Archive (ENA)
are included if the /experiment tag is set (implying that there is experimental evidence for the annotation),
the nucleotide entry links to paper(s) in PubMed,
and the nucleotide entry is from the STD data class
(implying that these are targeted annotated sequences, not from shotgun sequencing).
Also, to filter out genes whose transcription or translation was detected, but whose function
was not studied, nucleotide entries or papers with more than 25 such proteins are excluded.
Descriptions from ENA rely on the original submitter of the
sequence and cannot be updated by others, so they may be less reliable.
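The ENA inclusion rules above can be sketched in a few lines of Python. This is only an illustration, not PaperBLAST's actual code: the `CdsFeature` record and its field names are hypothetical, and the sketch assumes that a protein is dropped whenever its nucleotide entry or any of its linked papers has more than 25 candidate proteins.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class CdsFeature:
    """Hypothetical, simplified view of one ENA coding-sequence feature."""
    protein_id: str
    entry_id: str            # accession of the parent nucleotide entry
    data_class: str          # e.g. "STD"
    has_experiment: bool     # True if the /experiment tag is set
    pubmed_ids: list = field(default_factory=list)  # papers linked from the entry


MAX_PROTEINS = 25  # entries or papers with more than this many proteins are dropped


def filter_ena(features):
    """Return the features that pass the inclusion rules described above."""
    # Step 1: per-feature criteria (experiment tag, PubMed link, STD data class).
    candidates = [
        f for f in features
        if f.has_experiment and f.pubmed_ids and f.data_class == "STD"
    ]
    # Step 2: count candidate proteins per nucleotide entry and per paper.
    per_entry = Counter(f.entry_id for f in candidates)
    per_paper = Counter(p for f in candidates for p in set(f.pubmed_ids))
    # Step 3 (assumption): drop a protein if its entry or any linked paper is too large.
    return [
        f for f in candidates
        if per_entry[f.entry_id] <= MAX_PROTEINS
        and all(per_paper[p] <= MAX_PROTEINS for p in f.pubmed_ids)
    ]
```

Calling `filter_ena` on a list of `CdsFeature` records returns the retained records in their original order. The size cutoff is applied after the per-feature criteria, so only proteins that already carry experimental evidence count toward the limit of 25.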
Except for GeneRIF and ENA,
the curated entries include a short curated
description of the protein's function.
For entries from BioLiP, the protein's function may not be known beyond binding to the ligand.
Many of these entries also link to articles in PubMed.
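The 70% identity clustering mentioned above for RegPrecise can be illustrated with a greedy scheme that keeps one representative per cluster. This is a sketch under stated assumptions, not the tool PaperBLAST actually uses: `fraction_identical` is a toy stand-in that compares equal-length sequences position by position, whereas a real pipeline would align the sequences first, and the function names are hypothetical.

```python
def fraction_identical(a, b):
    """Toy identity measure for two pre-aligned, equal-length sequences.

    Assumption: a real pipeline would compute identity from an alignment;
    this stand-in only compares positions directly.
    """
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def pick_representatives(seqs, identity=fraction_identical, threshold=0.70):
    """Greedy clustering: keep a sequence only if its identity to every
    representative kept so far is below the threshold."""
    representatives = {}
    for name, seq in seqs.items():
        if all(identity(seq, rep) < threshold for rep in representatives.values()):
            representatives[name] = seq
    return representatives
```

With this scheme the result depends on input order, because the first sequence seen in each cluster becomes its representative.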
For more information see the
PaperBLAST paper (mSystems 2017)
or the code.
You can download PaperBLAST's database here.
Changes to PaperBLAST since the paper was written:
- November 2023: incorporated PRODORIC and RegPrecise. Many PRODORIC entries were not linked to a protein sequence (no UniProt identifier), so we added this information.
- February 2023: BioLiP changed their download format. PaperBLAST now includes their non-redundant subset. SitesBLAST and Sites on a Tree use a larger non-redundant subset that ensures that every ligand is represented within each cluster. This should ensure that every binding site is represented.
- June 2022: incorporated some coding sequences from ENA with the /experiment tag.
- March 2022: incorporated BioLiP.
- April 2020: incorporated TCDB.
- April 2019: EuropePMC now returns table entries in their search results. This has expanded PaperBLAST's database, but most of the new entries are of low relevance, and the resulting snippets are often just lists of locus tags with annotations.
- February 2018: the alignment page reports the conservation of the hit's functional sites (if available from Swiss-Prot or UniProt).
- January 2018: incorporated BRENDA.
- December 2017: incorporated MetaCyc, CharProtDB, CAZy, REBASE, and the reannotations from the Fitness Browser.
- September 2017: EuropePMC no longer returns some table entries in their search results. This has shrunk PaperBLAST's database, but has also reduced the number of low-relevance hits.
Many of these changes are described in Interactive tools for functional annotation of bacterial genomes.
PaperBLAST cannot provide snippets for many of the papers that are
published in non-open-access journals. This limitation applies even if
the paper is marked as "free" on the publisher's web site and is
available in PubMed Central or EuropePMC. If a journal that you publish
in is marked as "secret," please consider publishing elsewhere.
Many important articles are missing from PaperBLAST, either because
the article's full text is not in EuropePMC (as for many older
articles), or because the paper does not mention a protein identifier such as a locus tag, or because of PaperBLAST's heuristics. If you notice an
article that characterizes a protein's function but is missing from
PaperBLAST, please notify the curators at UniProt
or add an entry to GeneRIF.
Entries in either of these databases will eventually be incorporated
into PaperBLAST. Note that to add an entry to UniProt, you will need
to find the UniProt identifier for the protein. If the protein is not
already in UniProt, you can ask them to create an entry. To add an
entry to GeneRIF, you will need an NCBI Gene identifier, but
unfortunately many prokaryotic proteins in RefSeq do not have
corresponding Gene identifiers.
References
PaperBLAST: Text-mining papers for information about homologs.
M. N. Price and A. P. Arkin (2017). mSystems, 10.1128/mSystems.00039-17.
Europe PMC in 2017.
M. Levchenko et al (2017). Nucleic Acids Research, 10.1093/nar/gkx1005.
Gene indexing: characterization and analysis of NLM's GeneRIFs.
J. A. Mitchell et al (2003). AMIA Annu Symp Proc 2003:460-464.
UniProt: the universal protein knowledgebase.
The UniProt Consortium (2016). Nucleic Acids Research, 10.1093/nar/gkw1099.
BRENDA in 2017: new perspectives and new tools in BRENDA.
S. Placzek et al (2017). Nucleic Acids Research, 10.1093/nar/gkw952.
The EcoCyc database: reflecting new knowledge about Escherichia coli K-12.
I. M. Keseler et al (2016). Nucleic Acids Research, 10.1093/nar/gkw1003.
The MetaCyc database of metabolic pathways and enzymes.
R. Caspi et al (2018). Nucleic Acids Research, 10.1093/nar/gkx935.
CharProtDB: a database of experimentally characterized protein annotations.
R. Madupu et al (2012). Nucleic Acids Research, 10.1093/nar/gkr1133.
The carbohydrate-active enzymes database (CAZy) in 2013.
V. Lombard et al (2014). Nucleic Acids Research, 10.1093/nar/gkt1178.
The Transporter Classification Database (TCDB): recent advances.
M. H. Saier, Jr. et al (2016). Nucleic Acids Research, 10.1093/nar/gkv1103.
REBASE - a database for DNA restriction and modification: enzymes, genes and genomes.
R. J. Roberts et al (2015). Nucleic Acids Research, 10.1093/nar/gku1046.
Deep annotation of protein function across diverse bacteria from mutant phenotypes.
M. N. Price et al (2016). bioRxiv, 10.1101/072470.
by Morgan Price,
Arkin group
Lawrence Berkeley National Laboratory