PaperBLAST
PaperBLAST Hits for Pf6N2E2_291 (81 a.a., MYKVVLFNDD...)
>Pf6N2E2_291
MYKVVLFNDDYTPMDFVVEVLEVFFNLNRELATKVMLAVHTEGRAVCGVFTRDIAETKAM
QVNQYARESQHPLLCEIEKDG
Found 35 similar proteins in the literature:
PA2621 ATP-dependent Clp protease adaptor protein ClpS from Pseudomonas aeruginosa PAO1
PA14_30210 ATP-dependent Clp protease adaptor protein clpS from Pseudomonas aeruginosa UCBPP-PA14
86% identity, 66% coverage
- The emergence of cefiderocol resistance in Pseudomonas aeruginosa from a heteroresistant isolate during prolonged therapy
Teran, Antimicrobial agents and chemotherapy 2024 (secret)
- Modulation of Type III Secretion System in Pseudomonas aeruginosa: Involvement of the PA4857 Gene Product
Zhu, Frontiers in microbiology 2016 - “...name or number Insertion site Protein description Max fold b PA0716/PA0717 788912 Hypothetical protein 3.4 PA2621 ( clpS ) 2964732 ATP-dependent Clp protease adaptor 7.2 PA3284 3676853 Hypothetical protein 6.0 PA0265 ( gabD ) 300415 Succinate-semialdehyde dehydrogenase -6.5 PA1056 ( shaC ) 1146026 Proton transport -4.0...”
- Genes required for and effects of alginate overproduction induced by growth of Pseudomonas aeruginosa on Pseudomonas isolation agar supplemented with ammonium metavanadate
Damron, Journal of bacteriology 2013 - “...glycosyltransferase Putative glycosyltransferase PA0666 PA0667 PA1726 PA2621 PA4001 PA5124 anmK bglX clpS sltB1 ntrB Putative chaperone Putative...”
- Role of intracellular proteases in the antibiotic resistance, motility, and biofilm formation of Pseudomonas aeruginosa
Fernández, Antimicrobial agents and chemotherapy 2012 - “...PA1803 (lon)b PA2620 (clpA) PAMr_nr_mas_06_2:F9 PA2621 (clpS) PAMr_nr_mas_11_1:C10 PA3326 PAMr_nr_mas_11_1:G12 PAMr_nr_mas_04_1:G10 PA3535 PA4576 Product of...”
- “...Three PA14 transposon mutants, namely, pfpI (PA0355), clpS (PA2621), and clpP (PA1801) mutants, like the lon mutant (17), displayed a strongly impaired ability...”
- Genome-wide identification of Pseudomonas aeruginosa virulence-related genes using a Caenorhabditis elegans infection model
Feinbaum, PLoS pathogens 2012 - “...(PA2620) is the second gene of a two gene operon; it is preceded by clpS (PA2621), encoding a ClpAP adaptor protein that has been shown to bind to the N-terminus of ClpA and inhibit ClpAP degradation of some substrates while enhancing the degradation of others [97]...”
- Genetic determinants involved in the susceptibility of Pseudomonas aeruginosa to beta-lactam antibiotics
Alvarez-Ortega, Antimicrobial agents and chemotherapy 2010 - “...PA14_23420 PA14_23430 PA14_43090 PA14_06490 PA1553 PA2023 PA2487 PA2621 PA2797 PA3141 PA3145 PA3247 PA3259 PA3520 PA3589 PA3620 PA3667 PA3704 PA3721 PA4109...”
- Genomewide identification of genetic determinants of antimicrobial drug resistance in Pseudomonas aeruginosa
Dötsch, Antimicrobial agents and chemotherapy 2009 - “...PA14_57570 PA14_58260 PA14_60860 PA14_66600 PA14_68610 PA14_18070 PA14_30210 PA0090 PA0958 PA1095 PA1348 PA1549 PA2023 PA2693 PA3224 PA3702 PA3831 PA4109...”
PMI0689 ATP-dependent Clp protease adaptor protein from Proteus mirabilis HI4320
70% identity, 76% coverage
SO2627, SO_2627 conserved hypothetical protein from Shewanella oneidensis MR-1
62% identity, 77% coverage
lpg0817 Hypothetical protein from Legionella pneumophila subsp. pneumophila str. Philadelphia 1
68% identity, 71% coverage
NP_459921 putative cytoplasmic protein from Salmonella enterica subsp. enterica serovar Typhimurium str. LT2
63% identity, 75% coverage
ClpS / b0881 specificity factor for ClpA-ClpP chaperone-protease complex from Escherichia coli K-12 substr. MG1655 (see 24 papers)
CLPS_ECOLI / P0A8Q6 ATP-dependent Clp protease adapter protein ClpS from Escherichia coli (strain K12) (see 3 papers)
clpS ATP-dependent Clp protease adaptor protein ClpS from Escherichia coli K12 (see 8 papers)
NP_415402 specificity factor for ClpA-ClpP chaperone-protease complex from Escherichia coli str. K-12 substr. MG1655
b0881 ATP-dependent Clp protease adaptor protein ClpS from Escherichia coli str. K-12 substr. MG1655
c1018 Protein yljA from Escherichia coli CFT073
Z1118 orf, hypothetical protein from Escherichia coli O157:H7 EDL933
ECs0967 hypothetical protein from Escherichia coli O157:H7 str. Sakai
63% identity, 75% coverage
- function: Involved in the modulation of the specificity of the ClpAP-mediated ATP-dependent protein degradation.
subunit: Binds to the N-terminal domain of the chaperone ClpA.
- AAA+ protease-adaptor structures reveal altered conformations and ring specialization.
Kim, Nature structural & molecular biology 2022 - GeneRIF: AAA+ protease-adaptor structures reveal altered conformations and ring specialization.
- A single ClpS monomer is sufficient to direct the activity of the ClpA hexamer.
De, The Journal of biological chemistry 2010 - GeneRIF: one ClpS monomer is sufficient to direct the activity of the ClpA hexamer
- ClpS is the recognition component for Escherichia coli substrates of the N-end rule degradation pathway.
Schmidt, Molecular microbiology 2009 (PubMed) - GeneRIF: ClpS is an integral and essential component of the N-end rule pathway.
- Structural basis of N-end rule substrate recognition in Escherichia coli by the ClpAP adaptor protein ClpS.
Schuenemann, EMBO reports 2009 - GeneRIF: The data suggest that ClpS has been optimized for the binding and delivery of N-degrons containing an N-terminal Phe or Leu.
- Distinct structural elements of the adaptor ClpS are required for regulating degradation by ClpAP.
Hou, Nature structural & molecular biology 2008 (PubMed) - GeneRIF: ClpS functions, at least in part, as an allosteric effector of ClpAP, broadening understanding of how AAA+ adaptors control substrate selection.
- ClpS modulates but is not essential for bacterial N-end rule degradation.
Wang, Genes & development 2007 - GeneRIF: ClpAP recognizes N-end rule substrates directly, whereas ClpS modulates this degradation pathway.
- ClpS is an essential component of the N-end rule pathway in Escherichia coli.
Erbse, Nature 2006 (PubMed) - GeneRIF: the ClpAP-specific adaptor, ClpS, is essential for degradation of N-end rule substrates by ClpAP
- GeneRIF: ClpS selectively binds N-terminal destabilizing residues and targets them for degradation by the ClpAP complex resulting in carefully regulated proteolysis
- Crystallographic investigation of peptide binding sites in the N-domain of the ClpA chaperone.
Xia, Journal of structural biology (PubMed) - GeneRIF: Between the halves of the clpA pseudo-dimer is a large flexible acidic loop that becomes better ordered upon binding of the small adaptor protein, ClpS
- The Escherichia coli proteome: past, present, and future prospects
Han, Microbiology and molecular biology reviews : MMBR 2006 - “...proteolytic subunit 5.52/23,186.65 5.60/24,224 (5-6) ClpS P0A8Q6 ATP-dependent Clp protease adaptor protein 4.94/12,179.06 5.40/10,645 (4.5-5.5) ClpX P0A6H1...”
- Elucidation of the antibacterial mechanism of the Curvularia haloperoxidase system by DNA microarray profiling
Hansen, Applied and environmental microbiology 2004 - “...b2000 b4367 b0475 b1684 b1683 b1682 b1681 b1680 b1679 b0006 b0389 b0848 b0881 3.3 5.2 4.7 3.3 4.8 2.3 1.0 1.4 0.2 3.1 1.3 2.0 4.4 4.0 3.8 4.0 4.3 4.2 2.2 2.9...”
- DNA microarray-mediated transcriptional profiling of the Escherichia coli response to hydrogen peroxide
Zheng, Journal of bacteriology 2001 - “...b3924 b2414 b1683 b2365 b0848 b2012 b2366 b4062 b3917 b1682 b1020 b0475 b0881 b1164 18 18 17 16 16 15 15 14 13 13 12 12 12 11 11 11 Functionb Stress response...”
- “...4.0 3.4 3.0 1.5 3.5 yaaA yaiA ybjM yljA b0006 b0389 b0848 b0881 110 130 51 250 18 56 15.0 11.0 4.2 16 1.1 4.6 a Levels in cells during exponential growth in LB...”
- Genome-wide transcriptional profiling of the Escherichia coli responses to superoxide stress and sodium salicylate
Pomposiello, Journal of bacteriology 2001 - “...Description NaSal-activated b4014 b1276 b1241 b2252 b0864 b0485 b0881 b1112 b1164 b1165 b1200 b1450 b1452 b1643 b1795 b2174 b2266 b2672 b3004 b3024 b3238 b3242...”
- “...b3506 b3284 b3908 b2703 aceB acnA adhE ais artP b0485 b0881 b1112 b1164 b1165 b1200 b1450 b1452 b1643 b1795 b2174 b2266 b2672 b3004 b3024 b3238 b3242 cfa cyaA...”
- Genome-wide expression profiling in Escherichia coli K-12
Richmond, Nucleic acids research 1999 - “...10.5 1.0 b0966 yccV 10.3 34.3 b0879 ybjZ 10.2 5.2 b0881 yljA 10.2 6.5 b0400 phoR 10.1 34.5 Lysine decarboxylase 1 Cell division protein Mg2+ transport, system I...”
- Microbial analyses of ancient ice core sections from Greenland and Antarctica
Knowlton, Biology 2013 - “...e c3738 KC146577 Lactobacillus helveticus Fi 98 f c833 KC146573 Lactobacillus helveticus Fi 98 e c1018 KC146574 Lactobacillus helveticus Fi 98 f GI855 KC206493 Penicillium chrysognum As 98 d GI858 KC206480 Rhodotorula mucilacinosa Ba 98 d c1826 KC146566 Uncultured bacterium Fi 98 e c3135 KC146552 Uncultured...”
- Small non-coding RNAs in Caulobacter crescentus
Landt, Molecular microbiology 2008 (secret)
- Small RNA-binding protein RapZ mediates cell envelope precursor sensing and signaling in Escherichia coli
Khan, The EMBO journal 2020 - “...counteract activation of QseE/QseF by RapZ Galactosidase activities of strains Z197 ( wildtype ) and Z1118 ( glmY glmZ ), which carry the chromosomal glmYlacZ fusion, were determined during growth. Strains Z197 ( wildtype ) and Z225 ( rapZ ) were transformed with the following plasmids...”
- Chromosomal instability in enterohaemorrhagic Escherichia coli O157:H7: impact on adherence, tellurite resistance and colony phenotype
Bielaszewska, Molecular microbiology 2011 - “...OI 43 and OI 48 ( Table S2 ) demonstrated deletions of 2.9 kb (ORFs Z1118 and clpA ) and 145.9 kb (ORFs Z1399 up to ycdU ) of the core chromosome respectively. Analysis of a 3711 bp amplicon connecting ORFs Z1398 and Z1650, which spans...”
- Gene expression induced in Escherichia coli O157:H7 upon exposure to model apple juice
Bergholz, Applied and environmental microbiology 2009 - “...0.76 Regulatory functions ECs0504 ECs0507 ECs0755 ECs0902 ECs0967 ECs1199 ECs1250 ECs1489 ECs1557 ECs1682 ECs1880 ECs2445 ECs2706 ECs2783 ECs2784 ECs2988...”
- “...1.76 1.32 1.60 Viral functions ECs0278 ECs0507 ECs0902 ECs0967 ECs1110 ECs1758 ECs3058 ECs3503 ECs3911 ECs4588 ECs4968 ECs4969 ECs4970 ECs4977 ECs4982 O157 ybaY...”
3o2bC / P0A8Q6 E. Coli clps in complex with a phe n-end rule peptide (see paper)
63% identity, 75% coverage
VC1143 conserved hypothetical protein from Vibrio cholerae O1 biovar eltor str. N16961
63% identity, 75% coverage
AOLE_08275 ATP-dependent Clp protease adapter ClpS from Acinetobacter oleivorans DR1
63% identity, 57% coverage
RSc2465 CONSERVED HYPOTHETICAL PROTEIN from Ralstonia solanacearum GMI1000
58% identity, 73% coverage
- Transcriptomes of Ralstonia solanacearum during Root Colonization of Solanum commersonii
Puigvert, Frontiers in plant science 2017 - “...RSp0814 2.44631 mqo malate:quinone oxidoreductase RSUY_RS11960 RSUY_24410 RSc2358 1.7429 ppc phosphoenolpyruvate carboxylase Proteases RSUY_RS12475 RSUY_25460 RSc2465 2.359347 clpS ATP-dependent Clp protease adaptor ClpS RSUY_RS18550 RSUY_38040 RSp0603 2.211049 serine protease RSUY_RS14120 RSUY_28870 RSc0388 1.98903 zinc protease Lipid metabolism RSUY_RS17295 RSUY_35410 2.478807 Acyl-CoA synthetase RSUY_RS01975 RSUY_04090 RSc3052 2.40887...”
BCAL2731 ATP-dependent Clp protease adaptor protein ClpS from Burkholderia cenocepacia J2315
61% identity, 76% coverage
- NtrC-dependent control of exopolysaccharide synthesis and motility in Burkholderia cenocepacia H111
Liu, PloS one 2017 - “...I35_0765 BCAL3108 Urease accessory protein ureF -7.1 I35_0766 BCAL3107 Urease accessory protein ureE -7.8 I35_2591 BCAL2731 ATP-dependent Clp protease adaptor protein clpS -2.6 I35_2821 BCAL0849 Putative lipoprotein 3.1 I35_3125 BCAL0540 ATP-dependent protease domain protein -4.1 I35_4673 BCAM0775 Glutathione S-transferase -5.3 I35_5615 BCAM1744 Extracellular protease precursor -2.3...”
- Response of Burkholderia cenocepacia H111 to micro-oxia
Pessi, PloS one 2013 - “...BCAL1919 ClpB protein 4.3 20.9 CCE49077 BCAL2730 ATP-dependent protease ATP-binding subunit ClpA 3.9 7.2 CCE49078 BCAL2731 ATP-dependent Clp protease adaptor protein ClpS 2.5 11.0 CCE49625 BCAL2780 Thioredoxin domain-containing protein EC-YbbN 1.7 10.6 CCE52629 BCAL3146 Heat shock protein 60 family chaperone GroEL 2.7 6.7 CCE51225 BCAL3269 Chaperone...”
PD0664 conserved hypothetical protein from Xylella fastidiosa Temecula1
58% identity, 74% coverage
RSUY_25460, RSUY_RS12475 ATP-dependent Clp protease adapter ClpS from Ralstonia solanacearum
57% identity, 73% coverage
- Transcriptomes of Ralstonia solanacearum during Root Colonization of Solanum commersonii
Puigvert, Frontiers in plant science 2017 - “...RSUY_39990 RSp0814 2.44631 mqo malate:quinone oxidoreductase RSUY_RS11960 RSUY_24410 RSc2358 1.7429 ppc phosphoenolpyruvate carboxylase Proteases RSUY_RS12475 RSUY_25460 RSc2465 2.359347 clpS ATP-dependent Clp protease adaptor ClpS RSUY_RS18550 RSUY_38040 RSp0603 2.211049 serine protease RSUY_RS14120 RSUY_28870 RSc0388 1.98903 zinc protease Lipid metabolism RSUY_RS17295 RSUY_35410 2.478807 Acyl-CoA synthetase RSUY_RS01975 RSUY_04090 RSc3052...”
- “...RSUY_RS19480 RSUY_39990 RSp0814 2.44631 mqo malate:quinone oxidoreductase RSUY_RS11960 RSUY_24410 RSc2358 1.7429 ppc phosphoenolpyruvate carboxylase Proteases RSUY_RS12475 RSUY_25460 RSc2465 2.359347 clpS ATP-dependent Clp protease adaptor ClpS RSUY_RS18550 RSUY_38040 RSp0603 2.211049 serine protease RSUY_RS14120 RSUY_28870 RSc0388 1.98903 zinc protease Lipid metabolism RSUY_RS17295 RSUY_35410 2.478807 Acyl-CoA synthetase RSUY_RS01975 RSUY_04090...”
CCNA_02552 ATP-dependent Clp protease adaptor protein ClpS from Caulobacter crescentus NA1000
60% identity, 73% coverage
- Environmental Conditions Modulate the Transcriptomic Response of Both Caulobacter crescentus Morphotypes to Cu Stress
Maertens, Microorganisms 2021 - “...were linked to genes encoding proteases and chaperones such as lon (CCNA_02037), clpX (CCNA_02039), clpS (CCNA_02552), and the gene encoding the Hsp20-family protein CCNA_03706 ( Figure 10 and Supplementary Table S6 ). For all of these genes, at least one TSS was previously detected (i.e., Zhou...”
- Two Outer Membrane Proteins Contribute to Caulobacter crescentus Cellular Fitness by Preventing Intracellular S-Layer Protein Accumulation
Overton, Applied and environmental microbiology 2016 - “...CCNA_02553 CCNA_03195 CCNA_02860 CCNA_00152 CCNA_03153 CCNA_00693 CCNA_02552 CCNA_03105 DnaK, chaperone protein DnaJ, chaperone protein Small heat shock protein...”
- Transposon Mutagenesis Paired with Deep Sequencing of Caulobacter crescentus under Uranium Stress Reveals Genes Essential for Detoxification and Stress Tolerance
Yung, Journal of bacteriology 2015 - “...CCNA_01379 CCNA_03625 CCNA_03498 CCNA_01521 CCNA_00290 CCNA_02552 CCNA_02553 CCNA_02140 CCNA_01067 CCNA_01061 CCNA_01622 LexA-like transcriptional repressor...”
- Global transcriptional response of Caulobacter crescentus to iron availability
da, BMC genomics 2013 - “...GroES 2.27 CC_0878 CCNA_00922 ClpB protein 2.71 CC_2258 CCNA_02341 Small heat shock protein 5.50 CC_2467 CCNA_02552 ATP-dependent Clp protease adaptor protein ClpS 2.42 CC_2468 CCNA_02553 ATP-dependent clp protease ATP-binding subunit ClpA 2.63 CC_2509 CCNA_02594 Endopeptidase htpX 8.09 CC_2510 b CCNA_02595 Hypothetical protein 8.94 CC_3098 b CCNA_03195...”
CC_2467 conserved hypothetical protein from Caulobacter crescentus CB15
60% identity, 67% coverage
- Global transcriptional response of Caulobacter crescentus to iron availability
da, BMC genomics 2013 - “...Co-chaperonin GroES 2.27 CC_0878 CCNA_00922 ClpB protein 2.71 CC_2258 CCNA_02341 Small heat shock protein 5.50 CC_2467 CCNA_02552 ATP-dependent Clp protease adaptor protein ClpS 2.42 CC_2468 CCNA_02553 ATP-dependent clp protease ATP-binding subunit ClpA 2.63 CC_2509 CCNA_02594 Endopeptidase htpX 8.09 CC_2510 b CCNA_02595 Hypothetical protein 8.94 CC_3098 b...”
3gq1A / Q9A5I0 The structure of the caulobacter crescentus clps protease adaptor protein in complex with a wlfvqrdske decapeptide (see paper)
60% identity, 94% coverage
Atu1363 ATP-dependent Clp protease adaptor protein ClpS from Agrobacterium tumefaciens str. C58 (Cereon)
57% identity, 68% coverage
SMc02110 CONSERVED HYPOTHETICAL PROTEIN from Sinorhizobium meliloti 1021
58% identity, 68% coverage
RL2212 putative ATP-dependent CLP protease adaptor protein from Rhizobium leguminosarum bv. viciae 3841
56% identity, 72% coverage
SL003B_1826 ATP-dependent Clp protease adapter ClpS from Polymorphum gilvum SL003B-26A1
57% identity, 72% coverage
- The genome sequence of Polymorphum gilvum SL003B-26A1(T) reveals its genetic basis for crude oil degradation and adaptation to the saline soil
Nie, PloS one 2012 - “...heat response proteases such as ATP-dependent metalloprotease FtsH (SL003B_0653, SL003B_0928), ATP-dependent Clp protease (SL003B_1811, SL003B_1812, SL003B_1826, SL003B_1827, SL003B_2063, and SL003B_2064), and ATP-dependent protease HslVU (SL003B_4321 and SL003B_4322). The strain SL003B-26A1 T also contains cold shock genes encoding Csp [63] (SL003B_1226, SL003B_1984, SL003B_3547, SL003B_3721, and SL003B_4222) for...”
RSP_0686 ATP-dependent Clp protease adaptor protein clpS from Rhodobacter sphaeroides 2.4.1
58% identity, 75% coverage
- Convergence of the transcriptional responses to heat shock and singlet oxygen stresses
Dufour, PLoS genetics 2012 - “...Phosphorus compounds RSP_0782 Protein synthesis/fate Amino acid biosynthesis RSP_0398 Degradation of proteins, peptides, and glycopeptides RSP_0686, RSP_1490 Protein folding and stabilization RSP_1219 tRNA and rRNA base modification RSP_2971 Unknown function Unknown function RSP_0151, RSP_0152, RSP_0269, RSP_0423, RSP_0557, RSP_0799, RSP_0896, RSP_1591, RSP_1956, RSP_1985, RSP_2225, RSP_2268, RSP_3075, RSP_3076,...”
OFBG_01607 ATP-dependent Clp protease adapter ClpS from Oxalobacter formigenes OXCC13
54% identity, 77% coverage
WP_027864722 ATP-dependent Clp protease adapter ClpS from Massilia varians
53% identity, 79% coverage
NGO0409 hypothetical protein from Neisseria gonorrhoeae FA 1090
55% identity, 78% coverage
LIMLP_09010 ATP-dependent Clp protease adapter ClpS from Leptospira interrogans serovar Manilae
Q72RD1 ATP-dependent Clp protease adapter protein ClpS from Leptospira interrogans serogroup Icterohaemorrhagiae serovar copenhageni (strain Fiocruz L1-130)
LIC11815 conserved hypothetical protein from Leptospira interrogans serovar Copenhageni str. Fiocruz L1-130
45% identity, 67% coverage
- Leptospira interrogans biofilm transcriptome highlights adaption to starvation and general stress while maintaining virulence
Davignon, NPJ biofilms and microbiomes 2024 - “...cognate complexes, clpA (LIMLP_09005, FC 2.8) and clpX (LIMLP_06920, 1.7), and its cofactor-encoding clpS gene (LIMLP_09010, FC 3.6). A gene encoding an ATP-dependent Lon protease (LIMLP_14705, FC 1.7), a gene having a Lon substrate binding domain (LIMLP_07500, FC 2.8) but lacking the AAA ATPase domain as...”
- Insights to the Assembly of a Functionally Active Leptospiral ClpP1P2 Protease Complex along with Its ATPase Chaperone ClpX
Dhara, ACS omega 2019 - “...(LIC11814) 2206601-2208841 Q72RD2 2241 4 bp overlap ( clpA & clpS ) clpS (LIC11815) 2208838-2209173 Q72RD1 336 150158 ( clpS & clpP2 ) clpP2 (LIC11951) 2359332-2359925 Q72R01 594 68240 ( clpP2 & clpB ) clpB (LIC12017) 2428166-2430748 Q72QU2 2583 not applicable Molecular Characterization of Core Catalytic...”
- Insights to the Assembly of a Functionally Active Leptospiral ClpP1P2 Protease Complex along with Its ATPase Chaperone ClpX
Dhara, ACS omega 2019 - “...( LIC11601 ) genes, whereas the adaptor proteins are encoded by clpS ( LIC11356 and LIC11815 ) ( Figure 1 and Table 1 ). It has been previously reported that the genes encoding caseinolytic proteases are highly conserved in both saprophytic and pathogenic strains of Leptospira...”
- “...the existence of two paralogs of the clpS gene were also predicted, where one ( LIC11815 ) of the clpS genes lies adjacent to clpA ( LIC11814 ), whereas the other ( LIC11356 ) clpS is located distant apart on the chromosome ( Figure 1 and...”
B2I23_RS00190 ATP-dependent Clp protease adapter ClpS from Candidatus Liberibacter asiaticus
48% identity, 57% coverage
TDE2123 conserved hypothetical protein from Treponema denticola ATCC 35405
49% identity, 77% coverage
- Transcriptional profiles of Treponema denticola in response to environmental conditions
McHardy, PloS one 2010 - “...GGGACAGGCAAAGAGCATAA GGGCCTTGATCTGGGTAACT RT-PCR TDE1382 TAGTAAAAAGCCGCCGAAAC TACCTGCCCTCCCTAATGTG RT-PCR TDE1663 TCGATCAGTTTACCGCACA CTTCATCCTTTTGTGAATCCAG RT-PCR TDE1795 CATATTCAAGACCGCGTGAT AGAAAAACATCCCGGTTTCC RT-PCR TDE2123 CAAGCCCAAAAGGGGACTAT ATAAGGACGGCCACAACAAA RT-PCR TDE2300 ATACGGTTGGCTTGGTGTTC TCCGCAGGAGAACCTAAAAA RT-PCR TDE2327 CCCGCAAATACAAGGAAGAA CTTTTCGAGTTCGGGGATTT RT-PCR TDE2480 CCAGCTTTGCCGATTATGTT ATGAGGAGATTGACGCAAGG RT-PCR TDE2592 AGGCGATCAAAACACAGGAA CAACATAAGACCGCATCGTG RT-PCR TDE2699 GGAAGAAACCTGCACATCGT GGGATTTTGCGTCGATAAGA RT-PCR TDE0626 AAAGACCGTAAAAGGCGAAGT Operon analysis TDE0627 TGAGTCTGCGGTGAAAGATG AATCATTGAAACGGCTTCGT...”
- “...Operon analysis TDE1173 CTCCAACGTTTACCGCTGAT Operon analysis TDE1174 GGGATAAATGCATCAAGCAA GATAAGTTCTCCGCCTGCTG Operon analysis TDE1175 GAAGATGCTCTTTCGGCAAC Operon analysis TDE2123 CAAGCCCAAAAGGGGACTAT Operon analysis TDE2124 CCCTTGAGCTTGAAGACGAC GCAAGGCTGTTTCTTCAAGG Operon analysis TDE2125 AGCAAAGCCCAGCTTATGAA Operon analysis TDE2479 CAAGAAAGCCGTCAAGCAAT Operon analysis TDE2480 GATACGGCCTTCCCCATAAT GATCGGTTTCGTCCACAACT Operon analysis TDE2481 TTCTCTCCCCTTGCCTTTTT Operon analysis 670 Flank 1 CGGCAAAACCTTGTTGGATA CGTTGCGGGCTAGCTAAAAGCGGCGTAAAAATGC...”
Cj1107 hypothetical protein Cj1107 from Campylobacter jejuni subsp. jejuni NCTC 11168
42% identity, 83% coverage
mll2736 ATP-dependent Clp protease adaptor protein ClpS from Mesorhizobium loti MAFF303099
42% identity, 79% coverage
CAETHG_RS02630 ATP-dependent Clp protease adaptor ClpS from Clostridium autoethanogenum DSM 10061
49% identity, 81% coverage
4yjxC / Q8UD95 The structure of agrobacterium tumefaciens clps2 bound to l-phenylalaninamide (see paper)
42% identity, 95% coverage
- Ligand: phenylalanine amide (4yjxC)
MXAN_6025 ATP-dependent Clp protease adaptor protein ClpS from Myxococcus xanthus DK 1622
44% identity, 75% coverage
jhp0028 putative from Helicobacter pylori J99
41% identity, 81% coverage
- Genetic signatures for Helicobacter pylori strains of West African origin
Bullock, PloS one 2017 - “...% amino acid identity a Annotation or predicted function Protein Length (amino acids) b HP0032 JHP0028 80.0 ATP-dependent Clp protease adaptor protein 91 HP0159 JHP0147 88.2 LPS 1,2 glycosyltransferase 372 HP0160 JHP0148 88.7 HcpD penicillin-binding protein 306 HP0379 JHP1002 76.8 Alpha 1,3 fucosyltransferase 425 HP0492 JHP0444...”
- “...the same region of the chromosome (HP0032, HP0033, HP1051 and HP1053 in reference strain 26695; JHP0028, JHP0029, JHP0374 and HP0372 in reference strain J99), and are likely co-transcribed. HP0032 and HP033 are predicted to have related functions. HP0032 encodes an ATP-dependent Clp protease and HP0033 encodes...”
- Genome sequence analysis of Helicobacter pylori strains associated with gastric ulceration and gastric cancer
McClain, BMC genomics 2009 - “...Description % aa identity (J99) a % aa identity (non-J99) b % unique sites c jhp0028 HP0032 Hypothetical 68 91 24 jhp0080 HP0087 d Hypothetical 89 96 8 jhp0173 HP0185 d Hypothetical 88 93 7 jhp0395 HP1029 d Hypothetical 88 95 7 a The sequences of...”
HP0032 conserved hypothetical protein from Helicobacter pylori 26695
33% identity, 87% coverage
- Genetic signatures for Helicobacter pylori strains of West African origin
Bullock, PloS one 2017 - “...Mean % amino acid identity a Annotation or predicted function Protein Length (amino acids) b HP0032 JHP0028 80.0 ATP-dependent Clp protease adaptor protein 91 HP0159 JHP0147 88.2 LPS 1,2 glycosyltransferase 372 HP0160 JHP0148 88.7 HcpD penicillin-binding protein 306 HP0379 JHP1002 76.8 Alpha 1,3 fucosyltransferase 425 HP0492...”
- “...Mean % amino acid identity, intra-hspWAfrica Annotation or predicted function Protein length (amino acids) a HP0032 73.3 88.3 99.0 ATP-dependent Clp protease adaptor ClpS 91 HP0033 89.9 96.1 98.9 ATP-dependent Clp protease 741 HP0257 88.3 94.3 93.9 Predicted coding region 219 HP0384 87.9 94.0 97.7 SPOR...”
- Genome-wide survey of mutual homologous recombination in a highly sexual bacterial species
Yahara, Genome biology and evolution 2012 - “...5 504 A HP1448 rnpA Ribonuclease P, protein component Transcription 0.065 0.010 5 486 A HP0032 clpS Hypothetical protein Other categories 0.066 0.011 3 276 A HP0320 tatA Sec-independent protein translocase protein Translocation 0.041 0.013 3 240 A HP0799 mogA Molybdenum cofactor biosynthesis protein Biosynthesis of...”
- Genome sequence analysis of Helicobacter pylori strains associated with gastric ulceration and gastric cancer
McClain, BMC genomics 2009 - “...% aa identity (J99) a % aa identity (non-J99) b % unique sites c jhp0028 HP0032 Hypothetical 68 91 24 jhp0080 HP0087 d Hypothetical 89 96 8 jhp0173 HP0185 d Hypothetical 88 93 7 jhp0395 HP1029 d Hypothetical 88 95 7 a The sequences of the...”
HPB8_1592 ATP-dependent Clp protease adaptor ClpS from Helicobacter pylori B8
34% identity, 87% coverage
- Sequencing, annotation, and comparative genome analysis of the gerbil-adapted Helicobacter pylori strain B8
Farnbacher, BMC genomics 2010 - “...typical genes related to DNA modification e.g. DNA methylases (HPB8_1059, HPB8_1100, HPB8_1101, HPB8_1103, HPB8_1538, and HPB8_1592) and restriction endonucleases (HPB8_1060, HPB8_1119, HPB8_1120, HPB8_1121, and HPB8_1706) are present in the genome of strain B8 (Additional file 1 , Table S8). Furthermore, we found two genes coding for...”
For advice on how to use these tools together, see
Interactive tools for functional annotation of bacterial genomes.
The PaperBLAST database links 793,807 different protein sequences to 1,259,118 scientific articles. Searches against EuropePMC were last performed on March 13 2025.
PaperBLAST builds a database of protein sequences that are linked
to scientific articles. These links come from automated text searches
against the articles in EuropePMC
and from manually-curated information from GeneRIF, UniProtKB/Swiss-Prot,
BRENDA,
CAZy (as made available by dbCAN),
BioLiP,
CharProtDB,
MetaCyc,
EcoCyc,
TCDB,
REBASE,
the Fitness Browser,
and a subset of the European Nucleotide Archive with the /experiment tag.
Given this database and a protein sequence query,
PaperBLAST uses protein-protein BLAST
to find similar sequences with E < 0.001.
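As a rough illustration of the E-value cutoff, the sketch below filters BLAST hits in tab-separated form. It assumes the common 12-column tabular layout with the E-value in the 11th column; the function name filter_hits and the example rows are hypothetical and are not taken from PaperBLAST's code or output.

```python
# Hypothetical sketch: keep only BLAST hits below an E-value cutoff.
# Assumes tab-separated rows in the common 12-column tabular layout:
# query, subject, %identity, length, mismatches, gap opens,
# qstart, qend, sstart, send, evalue, bitscore.

E_CUTOFF = 0.001  # threshold stated in the text


def filter_hits(lines, cutoff=E_CUTOFF):
    """Return (subject, percent_identity, evalue) for hits with E < cutoff."""
    kept = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        subject = fields[1]
        identity = float(fields[2])
        evalue = float(fields[10])
        if evalue < cutoff:
            kept.append((subject, identity, evalue))
    return kept


# Made-up example rows for illustration only.
example_rows = [
    "query1\thitA\t86.0\t54\t7\t0\t1\t54\t20\t73\t2e-28\t101",
    "query1\thitB\t31.0\t40\t25\t1\t5\t44\t3\t41\t0.5\t25.0",
]
hits = filter_hits(example_rows)
```

Here only hitA survives, because the E-value of hitB (0.5) is above the cutoff.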
To build the database, we query EuropePMC with locus tags, with RefSeq protein
identifiers, and with UniProt
accessions. We obtain the locus tags from RefSeq or from MicrobesOnline. We use
queries of the form "locus_tag AND genus_name" to try to ensure that
the paper is actually discussing that gene. Because EuropePMC indexes
most recent biomedical papers, even if they are not open access, some
of the links may be to papers that you cannot read or that our
computers cannot read. We query each of these identifiers that
appears in the open access part of EuropePMC, as well as every locus
tag that appears in the 500 most-referenced genomes, so that a gene
may appear in the PaperBLAST results even though none of the papers
that mention it are open access. We also incorporate text-mined links
from EuropePMC that link open access articles to UniProt or RefSeq
identifiers. (This yields some additional links because EuropePMC
uses different heuristics for their text mining than we do.)
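The query form "locus_tag AND genus_name" can be sketched as follows. This is a minimal example, not PaperBLAST's own code: the helper names are hypothetical, and the use of the Europe PMC REST search endpoint with the query, format, and pageSize parameters is an assumption about how such a search might be issued. The sketch only builds the URL and does not contact the service.

```python
from urllib.parse import urlencode

# Europe PMC REST search endpoint (assumed here for illustration).
SEARCH_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_query(locus_tag, genus_name):
    """Pair the identifier with the genus, in the form "locus_tag AND genus_name"."""
    return f"{locus_tag} AND {genus_name}"


def build_search_url(locus_tag, genus_name, page_size=25):
    """Build a search URL; the parameter choices are illustrative assumptions."""
    params = {
        "query": build_query(locus_tag, genus_name),
        "format": "json",
        "pageSize": page_size,
    }
    return f"{SEARCH_URL}?{urlencode(params)}"


# Example with a locus tag and genus that appear in the results above.
url = build_search_url("PA2621", "Pseudomonas")
```

Requiring the genus name alongside the identifier is what reduces false matches when the same string occurs in an unrelated context, such as a strain number or a sequence accession.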
For every article that mentions a locus tag, a RefSeq protein
identifier, or a UniProt accession, we try to select one or two
snippets of text that refer to the protein. If we cannot get access to
the full text, we try to select a snippet from the abstract, but
unfortunately, unique identifiers such as locus tags are rarely
provided in abstracts.
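One plausible way to pick snippets is to take a window of text around each whole-word occurrence of the identifier and keep at most two non-overlapping windows. The sketch below does this; the function name, the 80-character window, and the overlap rule are assumptions made for illustration and may differ from what PaperBLAST actually does.

```python
import re


def select_snippets(text, identifier, width=80, max_snippets=2):
    """Return up to max_snippets text windows around whole-word matches."""
    # Lookarounds stop "PA2621" from matching inside "PA26210".
    pattern = re.compile(
        r"(?<![A-Za-z0-9_])" + re.escape(identifier) + r"(?![A-Za-z0-9_])"
    )
    snippets = []
    last_end = -1
    for match in pattern.finditer(text):
        start = max(0, match.start() - width)
        if start < last_end:
            continue  # window would overlap the previous snippet
        end = min(len(text), match.end() + width)
        prefix = "..." if start > 0 else ""
        suffix = "..." if end < len(text) else ""
        snippets.append(prefix + text[start:end] + suffix)
        last_end = end
        if len(snippets) == max_snippets:
            break
    return snippets


# Made-up article text for illustration only.
example_text = (
    "The clpS gene (PA2621) encodes an adaptor. "
    + "x" * 300
    + " PA2621 mutants were impaired. PA26210 is unrelated."
)
example_snippets = select_snippets(example_text, "PA2621")
```

The same function could be applied to an abstract when the full text is unavailable, in which case it would often return an empty list, for the reason given above.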
PaperBLAST also incorporates manually-curated protein functions:
- Proteins from NCBI's RefSeq are included if a
GeneRIF
entry links the gene to an article in
PubMed®.
GeneRIF also provides a short summary of the article's claim about the
protein, which is shown instead of a snippet.
- Proteins from Swiss-Prot (the curated part of UniProt)
are included if the curators
identified experimental evidence for the protein's function (evidence
code ECO:0000269). For these proteins, the fields of the Swiss-Prot entry that
describe the protein's function are shown (with bold headings).
- Proteins from BRENDA,
a curated database of enzymes, are included if they are linked to a paper in PubMed
and their full sequence is known.
- Every protein from the non-redundant subset of
BioLiP,
a database
of ligand-binding sites and catalytic residues in protein structures, is included. Since BioLiP itself
does not include descriptions of the proteins, those are taken from the
Protein Data Bank.
Descriptions from PDB rely on the original submitter of the
structure and cannot be updated by others, so they may be less reliable.
(For SitesBLAST and Sites on a Tree, we use a larger subset of BioLiP so that every
ligand is represented among a group of structures with similar sequences, but for
PaperBLAST, we use the non-redundant set provided by BioLiP.)
- Every protein from EcoCyc, a curated
database of the proteins in Escherichia coli K-12, is included, regardless
of whether they are characterized or not.
- Proteins from the MetaCyc metabolic pathway database
are included if they are linked to a paper in PubMed and their full sequence is known.
- Proteins from the Transport Classification Database (TCDB)
are included if they have known substrate(s), have reference(s),
and are not described as uncharacterized or putative.
(Some of the references are not visible on the PaperBLAST web site.)
- Every protein from CharProtDB,
a database of experimentally characterized protein annotations, is included.
- Proteins from the CAZy database of carbohydrate-active enzymes
are included if they are associated with an Enzyme Classification number.
Even though CAZy does not provide links from individual protein sequences to papers,
these should all be experimentally-characterized proteins.
- Proteins from the REBASE database
of restriction enzymes are included if they have known specificity.
- Every protein with an evidence-based reannotation (based on mutant phenotypes)
in the Fitness Browser is included.
- Sequence-specific transcription factors (including sigma factors and DNA-binding response regulators)
with experimentally-determined DNA binding sites from the
PRODORIC database of gene regulation in prokaryotes are included.
- Putative transcription factors from RegPrecise
that have manually-curated predictions for their binding sites are included. These predictions are based on
conserved putative regulatory sites across genomes that contain similar transcription factors,
so PaperBLAST clusters the TFs at 70% identity and retains just one member of each cluster.
- Coding sequence (CDS) features from the
European Nucleotide Archive (ENA)
are included if the /experiment tag is set (implying that there is experimental evidence for the annotation),
the nucleotide entry links to paper(s) in PubMed,
and the nucleotide entry is from the STD data class
(implying that these are targeted annotated sequences, not from shotgun sequencing).
Also, to filter out genes whose transcription or translation was detected, but whose function
was not studied, nucleotide entries or papers with more than 25 such proteins are excluded.
Descriptions from ENA rely on the original submitter of the
sequence and cannot be updated by others, so they may be less reliable.
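The ENA inclusion rules in the list above can be read as a two-stage filter, sketched below. The CdsFeature record and its field names are hypothetical. The handling of the 25-protein limit is one interpretation of the text: features from over-full entries are dropped, over-full papers are dropped, and a feature is kept only if at least one paper remains.

```python
from collections import Counter
from dataclasses import dataclass, field

MAX_PROTEINS = 25  # entries or papers with more proteins than this are excluded


@dataclass
class CdsFeature:
    """Hypothetical record for one CDS feature from an ENA nucleotide entry."""
    protein_id: str
    entry_id: str
    has_experiment_tag: bool
    data_class: str
    pubmed_ids: list = field(default_factory=list)


def select_features(features, max_proteins=MAX_PROTEINS):
    """Return (protein_id, papers) pairs that pass the stated rules."""
    # Stage 1: /experiment tag set, STD data class, at least one PubMed link.
    candidates = [
        f for f in features
        if f.has_experiment_tag and f.data_class == "STD" and f.pubmed_ids
    ]
    # Stage 2: count candidate proteins per entry and per paper.
    per_entry = Counter(f.entry_id for f in candidates)
    per_paper = Counter(p for f in candidates for p in set(f.pubmed_ids))
    kept = []
    for f in candidates:
        if per_entry[f.entry_id] > max_proteins:
            continue  # entry looks like a bulk annotation
        papers = [p for p in f.pubmed_ids if per_paper[p] <= max_proteins]
        if papers:
            kept.append((f.protein_id, papers))
    return kept


# Made-up records for illustration only.
example_features = [
    CdsFeature("P1", "E1", True, "STD", ["111"]),
    CdsFeature("P2", "E1", False, "STD", ["111"]),  # no /experiment tag
    CdsFeature("P3", "E2", True, "WGS", ["222"]),   # not the STD data class
    CdsFeature("P4", "E3", True, "STD", []),        # no PubMed link
] + [CdsFeature(f"B{i}", "E4", True, "STD", ["333"]) for i in range(26)]
selected = select_features(example_features)
```

In this example only P1 is kept; the 26 features of entry E4 are excluded because both the entry and its single paper exceed the limit.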
Except for GeneRIF and ENA,
the curated entries include a short curated
description of the protein's function.
For entries from BioLiP, the protein's function may not be known beyond binding to the ligand.
Many of these entries also link to articles in PubMed.
For more information see the
PaperBLAST paper (mSystems 2017)
or the code.
You can download PaperBLAST's database here.
Changes to PaperBLAST since the paper was written:
- November 2023: incorporated PRODORIC and RegPrecise. Many PRODORIC entries were not linked to a protein sequence (no UniProt identifier), so we added this information.
- February 2023: BioLiP changed their download format. PaperBLAST now includes their non-redundant subset. SitesBLAST and Sites on a Tree use a larger non-redundant subset that ensures that every ligand is represented within each cluster. This should ensure that every binding site is represented.
- June 2022: incorporated some coding sequences from ENA with the /experiment tag.
- March 2022: incorporated BioLiP.
- April 2020: incorporated TCDB.
- April 2019: EuropePMC now returns table entries in their search results. This has expanded PaperBLAST's database, but most of the new entries are of low relevance, and the resulting snippets are often just lists of locus tags with annotations.
- February 2018: the alignment page reports the conservation of the hit's functional sites (if available from Swiss-Prot or UniProt).
- January 2018: incorporated BRENDA.
- December 2017: incorporated MetaCyc, CharProtDB, CAZy, REBASE, and the reannotations from the Fitness Browser.
- September 2017: EuropePMC no longer returns some table entries in their search results. This has shrunk PaperBLAST's database, but has also reduced the number of low-relevance hits.
Many of these changes are described in Interactive tools for functional annotation of bacterial genomes.
PaperBLAST cannot provide snippets for many of the papers that are
published in non-open-access journals. This limitation applies even if
the paper is marked as "free" on the publisher's web site and is
available in PubMed Central or EuropePMC. If a journal that you publish
in is marked as "secret," please consider publishing elsewhere.
Many important articles are missing from PaperBLAST, either because
the article's full text is not in EuropePMC (as for many older
articles), or because the paper does not mention a protein identifier such as a locus tag, or because of PaperBLAST's heuristics. If you notice an
article that characterizes a protein's function but is missing from
PaperBLAST, please notify the curators at UniProt
or add an entry to GeneRIF.
Entries in either of these databases will eventually be incorporated
into PaperBLAST. Note that to add an entry to UniProt, you will need
to find the UniProt identifier for the protein. If the protein is not
already in UniProt, you can ask them to create an entry. To add an
entry to GeneRIF, you will need an NCBI Gene identifier, but
unfortunately many prokaryotic proteins in RefSeq do not have
corresponding Gene identifiers.
References
PaperBLAST: Text-mining papers for information about homologs.
M. N. Price and A. P. Arkin (2017). mSystems, 10.1128/mSystems.00039-17.
Europe PMC in 2017.
M. Levchenko et al (2017). Nucleic Acids Research, 10.1093/nar/gkx1005.
Gene indexing: characterization and analysis of NLM's GeneRIFs.
J. A. Mitchell et al (2003). AMIA Annu Symp Proc 2003:460-464.
UniProt: the universal protein knowledgebase.
The UniProt Consortium (2016). Nucleic Acids Research, 10.1093/nar/gkw1099.
BRENDA in 2017: new perspectives and new tools in BRENDA.
S. Placzek et al (2017). Nucleic Acids Research, 10.1093/nar/gkw952.
The EcoCyc database: reflecting new knowledge about Escherichia coli K-12.
I. M. Keseler et al (2016). Nucleic Acids Research, 10.1093/nar/gkw1003.
The MetaCyc database of metabolic pathways and enzymes.
R. Caspi et al (2018). Nucleic Acids Research, 10.1093/nar/gkx935.
CharProtDB: a database of experimentally characterized protein annotations.
R. Madupu et al (2012). Nucleic Acids Research, 10.1093/nar/gkr1133.
The carbohydrate-active enzymes database (CAZy) in 2013.
V. Lombard et al (2014). Nucleic Acids Research, 10.1093/nar/gkt1178.
The Transporter Classification Database (TCDB): recent advances.
M. H. Saier, Jr. et al (2016). Nucleic Acids Research, 10.1093/nar/gkv1103.
REBASE - a database for DNA restriction and modification: enzymes, genes and genomes.
R. J. Roberts et al (2015). Nucleic Acids Research, 10.1093/nar/gku1046.
Deep annotation of protein function across diverse bacteria from mutant phenotypes.
M. N. Price et al (2016). bioRxiv, 10.1101/072470.
by Morgan Price,
Arkin group
Lawrence Berkeley National Laboratory