PaperBLAST

PaperBLAST – Find papers about a protein or its homologs

PaperBLAST Hits for Pf6N2E2_291 (81 a.a., MYKVVLFNDD...)

Other sequence analysis tools:

Find functional residues: SitesBLAST

Search for conserved domains

Find the best match in UniProt

Compare to protein structures

Predict transmembrane helices: Phobius

Predict protein localization: PSORTb

Find homologs in fast.genomics

Found 35 similar proteins in the literature:

PA2621 ATP-dependent Clp protease adaptor protein ClpS from Pseudomonas aeruginosa PAO1
PA14_30210 ATP-dependent Clp protease adaptor protein clpS from Pseudomonas aeruginosa UCBPP-PA14
86% identity, 66% coverage

The emergence of cefiderocol resistance in Pseudomonas aeruginosa from a heteroresistant isolate during prolonged therapy
Teran, Antimicrobial agents and chemotherapy 2024 (secret)
Modulation of Type III Secretion System in Pseudomonas aeruginosa: Involvement of the PA4857 Gene Product
Zhu, Frontiers in microbiology 2016
- “...name or number Insertion site Protein description Max fold b PA0716/PA0717 788912 Hypothetical protein 3.4 PA2621 ( clpS ) 2964732 ATP-dependent Clp protease adaptor 7.2 PA3284 3676853 Hypothetical protein 6.0 PA0265 ( gabD ) 300415 Succinate-semialdehyde dehydrogenase -6.5 PA1056 ( shaC ) 1146026 Proton transport -4.0...”
Genes required for and effects of alginate overproduction induced by growth of Pseudomonas aeruginosa on Pseudomonas isolation agar supplemented with ammonium metavanadate
Damron, Journal of bacteriology 2013
- “...glycosyltransferase Putative glycosyltransferase PA0666 PA0667 PA1726 PA2621 PA4001 PA5124 anmK bglX clpS sltB1 ntrB Putative chaperone Putative...”
Role of intracellular proteases in the antibiotic resistance, motility, and biofilm formation of Pseudomonas aeruginosa
Fernández, Antimicrobial agents and chemotherapy 2012
- “...PA1803 (lon)b PA2620 (clpA) PAMr_nr_mas_06_2:F9 PA2621 (clpS) PAMr_nr_mas_11_1:C10 PA3326 PAMr_nr_mas_11_1:G12 PAMr_nr_mas_04_1:G10 PA3535 PA4576 Product of...”
- “...Three PA14 transposon mutants, namely, pfpI (PA0355), clpS (PA2621), and clpP (PA1801) mutants, like the lon mutant (17), displayed a strongly impaired ability...”
Genome-wide identification of Pseudomonas aeruginosa virulence-related genes using a Caenorhabditis elegans infection model
Feinbaum, PLoS pathogens 2012
- “...(PA2620) is the second gene of a two gene operon; it is preceded by clpS (PA2621), encoding a ClpAP adaptor protein that has been shown to bind to the N-terminus of ClpA and inhibit ClpAP degradation of some substrates while enhancing the degradation of others [97]...”
Genetic determinants involved in the susceptibility of Pseudomonas aeruginosa to beta-lactam antibiotics
Alvarez-Ortega, Antimicrobial agents and chemotherapy 2010
- “...PA14_23420 PA14_23430 PA14_43090 PA14_06490 PA1553 PA2023 PA2487 PA2621 PA2797 PA3141 PA3145 PA3247 PA3259 PA3520 PA3589 PA3620 PA3667 PA3704 PA3721 PA4109...”
Genomewide identification of genetic determinants of antimicrobial drug resistance in Pseudomonas aeruginosa
Dötsch, Antimicrobial agents and chemotherapy 2009
- “...PA14_57570 PA14_58260 PA14_60860 PA14_66600 PA14_68610 PA14_18070 PA14_30210 PA0090 PA0958 PA1095 PA1348 PA1549 PA2023 PA2693 PA3224 PA3702 PA3831 PA4109...”

PMI0689 ATP-dependent Clp protease adaptor protein from Proteus mirabilis HI4320
70% identity, 76% coverage

MrpJ Directly Regulates Proteus mirabilis Virulence Factors, Including Fimbriae and Type VI Secretion, during Urinary Tract Infection
Debnath, Infection and immunity 2018
- “...PMI0390; pheA (355) PMI0390; pheA (487) PMI0493 (96) NA PMI0689; clpS (398) PMI0720 (410) NA PMI0749; tssA, or vipA (364) PMI0749; tssA, or vipA (563) PMI0750;...”

SO2627, SO_2627 conserved hypothetical protein from Shewanella oneidensis MR-1
62% identity, 77% coverage

Validating annotations for uncharacterized proteins in Shewanella oneidensis
Louie, Omics : a journal of integrative biology 2008
- “...SO_1851 SO_1963 SO_2042 SO_2043 SO_2593 SO_2603 SO_2614 SO_2627 SO_3014 SO_3015 SO_3367 SO_3436 SO_3542 SO_3578 SO_3667 SO_3668 SO_3957 SO_4227 SO_4398 SO_4413...”
Global profiling of Shewanella oneidensis MR-1: expression of hypothetical genes and improved functional annotations
Kolker, Proceedings of the National Academy of Sciences of the United States of America 2005
- “...SO0506 SO0887 SO1523 SO1597 SO1789 SO1963 SO2593 SO2614 SO2627 SO3340 SO3436 SO4413 SO4680 SO4719 Kolker et al. Upgraded annotation Homoserine kinase, type II...”

lpg0817 Hypothetical protein from Legionella pneumophila subsp. pneumophila str. Philadelphia 1
68% identity, 71% coverage

New Global Insights on the Regulation of the Biphasic Life Cycle and Virulence Via ClpP-Dependent Proteolysis in Legionella pneumophila
Ge, Molecular & cellular proteomics : MCP 2022
- “...Calc (pI) RP ratio a TP ratio b WT abundance c Log2 (RP/TP) Description ClpS lpg0817 12.7 5.45 1.19 3.27 1.90 ATP-dependent Clp protease adapter ClpS FlhF lpg1784 42.7 7.18 5.90 2.15 2.76 Flagellar GTP-binding protein FlhF MavQ lpg2975 100.7 6.61 2.11 3.13 Uncharacterized protein RavE...”

NP_459921 putative cytoplasmic protein from Salmonella enterica subsp. enterica serovar Typhimurium str. LT2
63% identity, 75% coverage

The expanded specificity and physiological role of a widespread N-degron recognin.
Gao, Proceedings of the National Academy of Sciences of the United States of America 2019
- GeneRIF: Study reports that Salmonella enterica ClpS binds, and ClpSAP degrades, proteins still harboring the N-terminal methionine. ClpS recognizes a type of degron in intact proteins based on the identity of the fourth amino acid from the N terminus, showing a strong preference for large hydrophobic amino acids. Study uncovered natural ClpS substrates, including SpoT, the essential synthase/hydrolase of the alarmone (p)ppGpp.
Sequestration from Protease Adaptor Confers Differential Stability to Protease Substrate.
Yeom, Molecular cell 2017
- GeneRIF: The Salmonella adaptor ClpS binds to the N terminus of the regulatory protein PhoP, resulting in PhoP degradation by ClpAP. the PhoP-activated protein MgtC protects PhoP from degradation by outcompeting ClpS for binding to PhoP.

ClpS / b0881 specificity factor for ClpA-ClpP chaperone-protease complex from Escherichia coli K-12 substr. MG1655 (see 24 papers)
CLPS_ECOLI / P0A8Q6 ATP-dependent Clp protease adapter protein ClpS from Escherichia coli (strain K12) (see 3 papers)
clpS ATP-dependent Clp protease adaptor protein ClpS from Escherichia coli K12 (see 8 papers)
NP_415402 specificity factor for ClpA-ClpP chaperone-protease complex from Escherichia coli str. K-12 substr. MG1655
b0881 ATP-dependent Clp protease adaptor protein ClpS from Escherichia coli str. K-12 substr. MG1655
c1018 Protein yljA from Escherichia coli CFT073
Z1118 orf, hypothetical protein from Escherichia coli O157:H7 EDL933
ECs0967 hypothetical protein from Escherichia coli O157:H7 str. Sakai
63% identity, 75% coverage

function: Involved in the modulation of the specificity of the ClpAP-mediated ATP-dependent protein degradation.
subunit: Binds to the N-terminal domain of the chaperone ClpA.
AAA+ protease-adaptor structures reveal altered conformations and ring specialization.
Kim, Nature structural & molecular biology 2022
- GeneRIF: AAA+ protease-adaptor structures reveal altered conformations and ring specialization.
A single ClpS monomer is sufficient to direct the activity of the ClpA hexamer.
De, The Journal of biological chemistry 2010
- GeneRIF: one ClpS monomer is sufficient to direct the activity of the ClpA hexamer
ClpS is the recognition component for Escherichia coli substrates of the N-end rule degradation pathway.
Schmidt, Molecular microbiology 2009 (PubMed)
- GeneRIF: ClpS is an integral and essential component of the N-end rule pathway.
Structural basis of N-end rule substrate recognition in Escherichia coli by the ClpAP adaptor protein ClpS.
Schuenemann, EMBO reports 2009
- GeneRIF: The data suggest that ClpS has been optimized for the binding and delivery of N-degrons containing an N-terminal Phe or Leu.
Distinct structural elements of the adaptor ClpS are required for regulating degradation by ClpAP.
Hou, Nature structural & molecular biology 2008 (PubMed)
- GeneRIF: ClpS functions, at least in part, as an allosteric effector of ClpAP, broadening understanding of how AAA+ adaptors control substrate selection.
ClpS modulates but is not essential for bacterial N-end rule degradation.
Wang, Genes & development 2007
- GeneRIF: ClpAP recognizes N-end rule substrates directly, whereas ClpS modulates this degradation pathway.
ClpS is an essential component of the N-end rule pathway in Escherichia coli.
Erbse, Nature 2006 (PubMed)
- GeneRIF: the ClpAP-specific adaptor, ClpS, is essential for degradation of N-end rule substrates by ClpAP
- GeneRIF: ClpS selectively binds N-terminal destabilizing residues and targets them for degradation by the ClpAP complex resulting in carefully regulated proteolysis
Crystallographic investigation of peptide binding sites in the N-domain of the ClpA chaperone.
Xia, Journal of structural biology (PubMed)
- GeneRIF: Between the halves of the clpA pseudo-dimer is a large flexible acidic loop that becomes better ordered upon binding of the small adaptor protein, ClpS
The Escherichia coli proteome: past, present, and future prospects
Han, Microbiology and molecular biology reviews : MMBR 2006
- “...proteolytic subunit 5.52/23,186.65 5.60/24,224 (5-6) ClpS P0A8Q6 ATP-dependent Clp protease adaptor protein 4.94/12,179.06 5.40/10,645 (4.5-5.5) ClpX P0A6H1...”
Elucidation of the antibacterial mechanism of the Curvularia haloperoxidase system by DNA microarray profiling
Hansen, Applied and environmental microbiology 2004
- “...b2000 b4367 b0475 b1684 b1683 b1682 b1681 b1680 b1679 b0006 b0389 b0848 b0881 3.3 5.2 4.7 3.3 4.8 2.3 1.0 1.4 0.2 3.1 1.3 2.0 4.4 4.0 3.8 4.0 4.3 4.2 2.2 2.9...”
DNA microarray-mediated transcriptional profiling of the Escherichia coli response to hydrogen peroxide
Zheng, Journal of bacteriology 2001
- “...b3924 b2414 b1683 b2365 b0848 b2012 b2366 b4062 b3917 b1682 b1020 b0475 b0881 b1164 18 18 17 16 16 15 15 14 13 13 12 12 12 11 11 11 Functionb Stress response...”
- “...4.0 3.4 3.0 1.5 3.5 yaaA yaiA ybjM yljA b0006 b0389 b0848 b0881 110 130 51 250 18 56 15.0 11.0 4.2 16 1.1 4.6 a Levels in cells during exponential growth in LB...”
Genome-wide transcriptional profiling of the Escherichia coli responses to superoxide stress and sodium salicylate
Pomposiello, Journal of bacteriology 2001
- “...Description NaSal-activated b4014 b1276 b1241 b2252 b0864 b0485 b0881 b1112 b1164 b1165 b1200 b1450 b1452 b1643 b1795 b2174 b2266 b2672 b3004 b3024 b3238 b3242...”
- “...b3506 b3284 b3908 b2703 aceB acnA adhE ais artP b0485 b0881 b1112 b1164 b1165 b1200 b1450 b1452 b1643 b1795 b2174 b2266 b2672 b3004 b3024 b3238 b3242 cfa cyaA...”
Genome-wide expression profiling in Escherichia coli K-12
Richmond, Nucleic acids research 1999
- “...10.5 1.0 b0966 yccV 10.3 34.3 b0879 ybjZ 10.2 5.2 b0881 yljA 10.2 6.5 b0400 phoR 10.1 34.5 Lysine decarboxylase 1 Cell division protein Mg2+ transport, system I...”
Microbial analyses of ancient ice core sections from Greenland and Antarctica
Knowlton, Biology 2013
- “...e c3738 KC146577 Lactobacillus helveticus Fi 98 f c833 KC146573 Lactobacillus helveticus Fi 98 e c1018 KC146574 Lactobacillus helveticus Fi 98 f GI855 KC206493 Penicillium chrysognum As 98 d GI858 KC206480 Rhodotorula mucilacinosa Ba 98 d c1826 KC146566 Uncultured bacterium Fi 98 e c3135 KC146552 Uncultured...”
Small non-coding RNAs in Caulobacter crescentus
Landt, Molecular microbiology 2008 (secret)
Small RNA-binding protein RapZ mediates cell envelope precursor sensing and signaling in Escherichia coli
Khan, The EMBO journal 2020
- “...counteract activation of QseE/QseF by RapZ Galactosidase activities of strains Z197 ( wildtype ) and Z1118 ( glmY glmZ ), which carry the chromosomal glmYlacZ fusion, were determined during growth. Strains Z197 ( wildtype ) and Z225 ( rapZ ) were transformed with the following plasmids...”
Chromosomal instability in enterohaemorrhagic Escherichia coli O157:H7: impact on adherence, tellurite resistance and colony phenotype
Bielaszewska, Molecular microbiology 2011
- “...OI 43 and OI 48 ( Table S2 ) demonstrated deletions of 2.9 kb (ORFs Z1118 and clpA ) and 145.9 kb (ORFs Z1399 up to ycdU ) of the core chromosome respectively. Analysis of a 3711 bp amplicon connecting ORFs Z1398 and Z1650, which spans...”
Gene expression induced in Escherichia coli O157:H7 upon exposure to model apple juice
Bergholz, Applied and environmental microbiology 2009
- “...0.76 Regulatory functions ECs0504 ECs0507 ECs0755 ECs0902 ECs0967 ECs1199 ECs1250 ECs1489 ECs1557 ECs1682 ECs1880 ECs2445 ECs2706 ECs2783 ECs2784 ECs2988...”
- “...1.76 1.32 1.60 Viral functions ECs0278 ECs0507 ECs0902 ECs0967 ECs1110 ECs1758 ECs3058 ECs3503 ECs3911 ECs4588 ECs4968 ECs4969 ECs4970 ECs4977 ECs4982 O157 ybaY...”

3o2bC / P0A8Q6 E. coli ClpS in complex with a Phe N-end rule peptide (see paper)
63% identity, 75% coverage

Ligand: peptide (3o2bC)

VC1143 conserved hypothetical protein from Vibrio cholerae O1 biovar eltor str. N16961
63% identity, 75% coverage

A simple mechanism for integration of quorum sensing and cAMP signalling in Vibrio cholerae
Walker, eLife 2023
- “...or bottom strand respectively. ( b ) Sequence of the intergenic region between VC1142 and VC1143 . The LuxO target site is shown in red. Start codons for the divergent genes VC1142 and VC1143 are in green. Transcription start sites identified by Papenfort et al., 2015...”
- “...of these known LuxO targets, and an additional binding site was identified between VC1142 and VC1143 . These divergent genes encode cold shock-like protein CspD, and the Clp protease adaptor protein, ClpS, respectively. Note that the LuxO binding signal at this locus is small, compared to...”
A simple mechanism for integration of quorum sensing and cAMP signalling in V. cholerae
Walker, 2023

AOLE_08275 ATP-dependent Clp protease adapter ClpS from Acinetobacter oleivorans DR1
63% identity, 57% coverage

Plasmid-encoded tetracycline efflux pump protein alters bacterial stress responses and ecological fitness of Acinetobacter oleivorans
Hong, PloS one 2014
- “...AOLE_18605 atpI ATP synthase I chain family protein 5.33 2.75 AOLE_15995 putative ATPase 4.80 1.87 AOLE_08275 clpS ATP-dependent Clp protease adaptor protein 2.51 1.07 AOLE_18600 atpB F0F1 ATP synthase subunit A 1.70 1.09 Hexadecane degradation-related genes AOLE_06655 putA NAD-dependent aldehyde dehydrogenase 2.79 7.41 AOLE_10550 alkB alkane...”

RSc2465 CONSERVED HYPOTHETICAL PROTEIN from Ralstonia solanacearum GMI1000
58% identity, 73% coverage

Transcriptomes of Ralstonia solanacearum during Root Colonization of Solanum commersonii
Puigvert, Frontiers in plant science 2017
- “...RSp0814 2.44631 mqo malate:quinone oxidoreductase RSUY_RS11960 RSUY_24410 RSc2358 1.7429 ppc phosphoenolpyruvate carboxylase Proteases RSUY_RS12475 RSUY_25460 RSc2465 2.359347 clpS ATP-dependent Clp protease adaptor ClpS RSUY_RS18550 RSUY_38040 RSp0603 2.211049 serine protease RSUY_RS14120 RSUY_28870 RSc0388 1.98903 zinc protease Lipid metabolism RSUY_RS17295 RSUY_35410 2.478807 Acyl-CoA synthetase RSUY_RS01975 RSUY_04090 RSc3052 2.40887...”

BCAL2731 ATP-dependent Clp protease adaptor protein ClpS from Burkholderia cenocepacia J2315
61% identity, 76% coverage

NtrC-dependent control of exopolysaccharide synthesis and motility in Burkholderia cenocepacia H111
Liu, PloS one 2017
- “...I35_0765 BCAL3108 Urease accessory protein ureF -7.1 I35_0766 BCAL3107 Urease accessory protein ureE -7.8 I35_2591 BCAL2731 ATP-dependent Clp protease adaptor protein clpS -2.6 I35_2821 BCAL0849 Putative lipoprotein 3.1 I35_3125 BCAL0540 ATP-dependent protease domain protein -4.1 I35_4673 BCAM0775 Glutathione S-transferase -5.3 I35_5615 BCAM1744 Extracellular protease precursor -2.3...”
Response of Burkholderia cenocepacia H111 to micro-oxia
Pessi, PloS one 2013
- “...BCAL1919 ClpB protein 4.3 20.9 CCE49077 BCAL2730 ATP-dependent protease ATP-binding subunit ClpA 3.9 7.2 CCE49078 BCAL2731 ATP-dependent Clp protease adaptor protein ClpS 2.5 11.0 CCE49625 BCAL2780 Thioredoxin domain-containing protein EC-YbbN 1.7 10.6 CCE52629 BCAL3146 Heat shock protein 60 family chaperone GroEL 2.7 6.7 CCE51225 BCAL3269 Chaperone...”

PD0664 conserved hypothetical protein from Xylella fastidiosa Temecula1
58% identity, 74% coverage

Characterization of regulatory pathways in Xylella fastidiosa: genes and phenotypes controlled by algU
Shi, Applied and environmental microbiology 2007
- “...gened Macromolecule metabolism Protein metabolism/degradation clpS ORF PD0664 Description M/W ratioa,b Expression in mutantc 0.448 Lower 0.485 Lower PD0665 clpB...”

RSUY_25460, RSUY_RS12475 ATP-dependent Clp protease adapter ClpS from Ralstonia solanacearum
57% identity, 73% coverage

Transcriptomes of Ralstonia solanacearum during Root Colonization of Solanum commersonii
Puigvert, Frontiers in plant science 2017
- “...RSUY_39990 RSp0814 2.44631 mqo malate:quinone oxidoreductase RSUY_RS11960 RSUY_24410 RSc2358 1.7429 ppc phosphoenolpyruvate carboxylase Proteases RSUY_RS12475 RSUY_25460 RSc2465 2.359347 clpS ATP-dependent Clp protease adaptor ClpS RSUY_RS18550 RSUY_38040 RSp0603 2.211049 serine protease RSUY_RS14120 RSUY_28870 RSc0388 1.98903 zinc protease Lipid metabolism RSUY_RS17295 RSUY_35410 2.478807 Acyl-CoA synthetase RSUY_RS01975 RSUY_04090 RSc3052...”
- “...RSUY_RS19480 RSUY_39990 RSp0814 2.44631 mqo malate:quinone oxidoreductase RSUY_RS11960 RSUY_24410 RSc2358 1.7429 ppc phosphoenolpyruvate carboxylase Proteases RSUY_RS12475 RSUY_25460 RSc2465 2.359347 clpS ATP-dependent Clp protease adaptor ClpS RSUY_RS18550 RSUY_38040 RSp0603 2.211049 serine protease RSUY_RS14120 RSUY_28870 RSc0388 1.98903 zinc protease Lipid metabolism RSUY_RS17295 RSUY_35410 2.478807 Acyl-CoA synthetase RSUY_RS01975 RSUY_04090...”

CCNA_02552 ATP-dependent Clp protease adaptor protein ClpS from Caulobacter crescentus NA1000
60% identity, 73% coverage

Environmental Conditions Modulate the Transcriptomic Response of Both Caulobacter crescentus Morphotypes to Cu Stress
Maertens, Microorganisms 2021
- “...were linked to genes encoding proteases and chaperones such as lon (CCNA_02037), clpX (CCNA_02039), clpS (CCNA_02552), and the gene encoding the Hsp20-family protein CCNA_03706 ( Figure 10 and Supplementary Table S6 ). For all of these genes, at least one TSS was previously detected (i.e., Zhou...”
Two Outer Membrane Proteins Contribute to Caulobacter crescentus Cellular Fitness by Preventing Intracellular S-Layer Protein Accumulation
Overton, Applied and environmental microbiology 2016
- “...CCNA_02553 CCNA_03195 CCNA_02860 CCNA_00152 CCNA_03153 CCNA_00693 CCNA_02552 CCNA_03105 DnaK, chaperone protein DnaJ, chaperone protein Small heat shock protein...”
Transposon Mutagenesis Paired with Deep Sequencing of Caulobacter crescentus under Uranium Stress Reveals Genes Essential for Detoxification and Stress Tolerance
Yung, Journal of bacteriology 2015
- “...CCNA_01379 CCNA_03625 CCNA_03498 CCNA_01521 CCNA_00290 CCNA_02552 CCNA_02553 CCNA_02140 CCNA_01067 CCNA_01061 CCNA_01622 LexA-like transcriptional repressor...”
Global transcriptional response of Caulobacter crescentus to iron availability
da, BMC genomics 2013
- “...GroES 2.27 CC_0878 CCNA_00922 ClpB protein 2.71 CC_2258 CCNA_02341 Small heat shock protein 5.50 CC_2467 CCNA_02552 ATP-dependent Clp protease adaptor protein ClpS 2.42 CC_2468 CCNA_02553 ATP-dependent clp protease ATP-binding subunit ClpA 2.63 CC_2509 CCNA_02594 Endopeptidase htpX 8.09 CC_2510 b CCNA_02595 Hypothetical protein 8.94 CC_3098 b CCNA_03195...”

CC_2467 conserved hypothetical protein from Caulobacter crescentus CB15
60% identity, 67% coverage

Global transcriptional response of Caulobacter crescentus to iron availability
da, BMC genomics 2013
- “...Co-chaperonin GroES 2.27 CC_0878 CCNA_00922 ClpB protein 2.71 CC_2258 CCNA_02341 Small heat shock protein 5.50 CC_2467 CCNA_02552 ATP-dependent Clp protease adaptor protein ClpS 2.42 CC_2468 CCNA_02553 ATP-dependent clp protease ATP-binding subunit ClpA 2.63 CC_2509 CCNA_02594 Endopeptidase htpX 8.09 CC_2510 b CCNA_02595 Hypothetical protein 8.94 CC_3098 b...”

3gq1A / Q9A5I0 The structure of the Caulobacter crescentus ClpS protease adaptor protein in complex with a WLFVQRDSKE decapeptide (see paper)
60% identity, 94% coverage

Ligand: peptide (3gq1A)

Atu1363 ATP-dependent Clp protease adaptor protein ClpS from Agrobacterium tumefaciens str. C58 (Cereon)
57% identity, 68% coverage

Transcriptome architecture of the three main lineages of agrobacteria
Waldburger, mSystems 2023
- “...cell division ( 43 ). All members of three ortholog clusters (represented by atu1164 , atu1363 , and atu3742 of A. fabrum C58) have multiple TSSs ( Fig. 5a ). Two of these genes have known essential cellular functions with atu1363 encoding proteolytic complex member clpAS1...”

SMc02110 CONSERVED HYPOTHETICAL PROTEIN from Sinorhizobium meliloti 1021
58% identity, 68% coverage

Dual RpoH sigma factors and transcriptional plasticity in a symbiotic bacterium
Barnett, Journal of bacteriology 2012
- “...SMc01224 SMc01256 SMc01441 SMc01440 SMc01465 SMc01905 SMc02110 SMc02109 SMc02380 SMc02720 SMc02882 SMc02886 SMc02885 SMc03152 SMc03801 SMc04403 Gene set...”

RL2212 putative ATP-dependent CLP protease adaptor protein from Rhizobium leguminosarum bv. viciae 3841
56% identity, 72% coverage

Factors governing attachment of Rhizobium leguminosarum to legume roots at acid, neutral, and alkaline pHs
Parsons, mSystems 2024
- “...family transcriptional regulator pH 7.0 RL0141 cycM Membrane-bound cytochrome c CycM pH 7.0 and 7.5 RL2212 clpS ATP-dependent Clp protease adaptor protein ClpS pH 7.0 and 7.5 RL2588 tyrS Tyrosine-tRNA ligase TyrS. Catalyzes attachment of tyrosine to tRNA pH 7.0 RL2637 recA DNA repair and SOS...”
Factors governing attachment of Rhizobium leguminosarum to legume roots
Parsons, 2022
Antioxidant ability of glutaredoxins and their role in symbiotic nitrogen fixation in Rhizobium leguminosarum bv. viciae 3841
Zou, Applied and environmental microbiology 2021 (secret)

SL003B_1826 ATP-dependent Clp protease adapter ClpS from Polymorphum gilvum SL003B-26A1
57% identity, 72% coverage

The genome sequence of Polymorphum gilvum SL003B-26A1(T) reveals its genetic basis for crude oil degradation and adaptation to the saline soil
Nie, PloS one 2012
- “...heat response proteases such as ATP-dependent metalloprotease FtsH (SL003B_0653, SL003B_0928), ATP-dependent Clp protease (SL003B_1811, SL003B_1812, SL003B_1826, SL003B_1827, SL003B_2063, and SL003B_2064), and ATP-dependent protease HslVU (SL003B_4321 and SL003B_4322). The strain SL003B-26A1 T also contains cold shock genes encoding Csp [63] (SL003B_1226, SL003B_1984, SL003B_3547, SL003B_3721, and SL003B_4222) for...”

RSP_0686 ATP-dependent Clp protease adaptor protein clpS from Rhodobacter sphaeroides 2.4.1
58% identity, 75% coverage

Convergence of the transcriptional responses to heat shock and singlet oxygen stresses
Dufour, PLoS genetics 2012
- “...Phosphorus compounds RSP_0782 Protein synthesis/fate Amino acid biosynthesis RSP_0398 Degradation of proteins, peptides, and glycopeptides RSP_0686, RSP_1490 Protein folding and stabilization RSP_1219 tRNA and rRNA base modification RSP_2971 Unknown function Unknown function RSP_0151, RSP_0152, RSP_0269, RSP_0423, RSP_0557, RSP_0799, RSP_0896, RSP_1591, RSP_1956, RSP_1985, RSP_2225, RSP_2268, RSP_3075, RSP_3076,...”

OFBG_01607 ATP-dependent Clp protease adapter ClpS from Oxalobacter formigenes OXCC13
54% identity, 77% coverage

The genetic composition of Oxalobacter formigenes and its relationship to colonization and calcium oxalate stone disease
Knight, Urolithiasis 2013
- “...ATP-dependent ClpP protease and its proteolytic subunit. In addition there are three ORFs (OFBG_01106, OFBG_01606, OFBG_01607), that may encode for the ClpB, ClpA and ClpS proteases, respectively. Activation of the starvation-signaling stringent response has been shown to mediate antibiotic tolerance in Pseudomanas aeruginosa [ 48 ],...”

WP_027864722 ATP-dependent Clp protease adapter ClpS from Massilia varians
53% identity, 79% coverage

Genome insight and description of antibiotic producing Massilia antibiotica sp. nov., isolated from oil-contaminated soil
Dahal, Scientific reports 2021
- “...protease HtpX (WP_166861379, WP_166859769), DJ-1/PfpI/YhbO family deglycase/protease (WP_166859796, WP_166864477, WP_166859797), ATP-dependent Clp protease adapter ClpS (WP_027864722), FtsH protease activity modulator HflK (WP_166865455), and protease modulator HflC (WP_166865458). Presence of these various proteases encoding genes indicate the industrial and medical significance of strain TW-1 T . Bacterial...”

NGO0409 hypothetical protein from Neisseria gonorrhoeae FA 1090
55% identity, 78% coverage

Characterization of the dsDNA prophage sequences in the genome of Neisseria gonorrhoeae and visualization of productive bacteriophage
Piekarowicz, BMC microbiology 2007
- “...homology to DNA sequence of Ngo1 and Ngo2. Two regions without homology include: (i) NGO0475 NGO0409 responsible for the maintenance of lysogenic state (encoding a repressor) and (ii) NGO0482-NGO0487, genes responsible for DNA replication. There is also lack of the Ngo1 DNA region encoding PemK-PemI proteins....”

LIMLP_09010 ATP-dependent Clp protease adapter ClpS from Leptospira interrogans serovar Manilae
Q72RD1 ATP-dependent Clp protease adapter protein ClpS from Leptospira interrogans serogroup Icterohaemorrhagiae serovar copenhageni (strain Fiocruz L1-130)
LIC11815 conserved hypothetical protein from Leptospira interrogans serovar Copenhageni str. Fiocruz L1-130
45% identity, 67% coverage

Leptospira interrogans biofilm transcriptome highlights adaption to starvation and general stress while maintaining virulence
Davignon, NPJ biofilms and microbiomes 2024
- “...cognate complexes, clpA (LIMLP_09005, FC 2.8) and clpX (LIMLP_06920, 1.7), and its cofactor-encoding clpS gene (LIMLP_09010, FC 3.6). A gene encoding an ATP-dependent Lon protease (LIMLP_14705, FC 1.7), a gene having a Lon substrate binding domain (LIMLP_07500, FC 2.8) but lacking the AAA ATPase domain as...”
Insights to the Assembly of a Functionally Active Leptospiral ClpP1P2 Protease Complex along with Its ATPase Chaperone ClpX
Dhara, ACS omega 2019
- “...(LIC11814) 2206601–2208841 Q72RD2 2241 4 bp overlap ( clpA & clpS ) clpS (LIC11815) 2208838–2209173 Q72RD1 336 150158 ( clpS & clpP2 ) clpP2 (LIC11951) 2359332–2359925 Q72R01 594 68240 ( clpP2 & clpB ) clpB (LIC12017) 2428166–2430748 Q72QU2 2583 not applicable Molecular Characterization of Core Catalytic...”
Insights to the Assembly of a Functionally Active Leptospiral ClpP1P2 Protease Complex along with Its ATPase Chaperone ClpX
Dhara, ACS omega 2019
- “...( LIC11601 ) genes, whereas the adaptor proteins are encoded by clpS ( LIC11356 and LIC11815 ) ( Figure 1 and Table 1 ). It has been previously reported that the genes encoding caseinolytic proteases are highly conserved in both saprophytic and pathogenic strains of Leptospira...”
- “...the existence of two paralogs of the clpS gene were also predicted, where one ( LIC11815 ) of the clpS genes lies adjacent to clpA ( LIC11814 ), whereas the other ( LIC11356 ) clpS is located distant apart on the chromosome ( Figure 1 and...”

B2I23_RS00190 ATP-dependent Clp protease adapter ClpS from Candidatus Liberibacter asiaticus
48% identity, 57% coverage

The Genome of "Candidatus Liberibacter asiaticus" Is Highly Transcribed When Infecting the Gut of Diaphorina citri
Darolt, Frontiers in microbiology 2021
- “...for components of the protease Clp family were expressed in Las III: clpA (B2I23_RS00185), clpS (B2I23_RS00190), endopeptidase La (B2I23_RS00735), clpP (B2I23_RS00745), hslU (B2I23_RS01895), hslV (B2I23_RS01900), lon peptidase (B2I23_RS02370), and clpB (B2I23_RS03845) ( Supplementary Table 3 ). The other genes mapped with a representative number of reads...”

TDE2123 conserved hypothetical protein from Treponema denticola ATCC 35405
49% identity, 77% coverage

Transcriptional profiles of Treponema denticola in response to environmental conditions
McHardy, PloS one 2010
- “...GGGACAGGCAAAGAGCATAA GGGCCTTGATCTGGGTAACT RT-PCR TDE1382 TAGTAAAAAGCCGCCGAAAC TACCTGCCCTCCCTAATGTG RT-PCR TDE1663 TCGATCAGTTTACCGCACA CTTCATCCTTTTGTGAATCCAG RT-PCR TDE1795 CATATTCAAGACCGCGTGAT AGAAAAACATCCCGGTTTCC RT-PCR TDE2123 CAAGCCCAAAAGGGGACTAT ATAAGGACGGCCACAACAAA RT-PCR TDE2300 ATACGGTTGGCTTGGTGTTC TCCGCAGGAGAACCTAAAAA RT-PCR TDE2327 CCCGCAAATACAAGGAAGAA CTTTTCGAGTTCGGGGATTT RT-PCR TDE2480 CCAGCTTTGCCGATTATGTT ATGAGGAGATTGACGCAAGG RT-PCR TDE2592 AGGCGATCAAAACACAGGAA CAACATAAGACCGCATCGTG RT-PCR TDE2699 GGAAGAAACCTGCACATCGT GGGATTTTGCGTCGATAAGA RT-PCR TDE0626 AAAGACCGTAAAAGGCGAAGT Operon analysis TDE0627 TGAGTCTGCGGTGAAAGATG AATCATTGAAACGGCTTCGT...”
- “...Operon analysis TDE1173 CTCCAACGTTTACCGCTGAT Operon analysis TDE1174 GGGATAAATGCATCAAGCAA GATAAGTTCTCCGCCTGCTG Operon analysis TDE1175 GAAGATGCTCTTTCGGCAAC Operon analysis TDE2123 CAAGCCCAAAAGGGGACTAT Operon analysis TDE2124 CCCTTGAGCTTGAAGACGAC GCAAGGCTGTTTCTTCAAGG Operon analysis TDE2125 AGCAAAGCCCAGCTTATGAA Operon analysis TDE2479 CAAGAAAGCCGTCAAGCAAT Operon analysis TDE2480 GATACGGCCTTCCCCATAAT GATCGGTTTCGTCCACAACT Operon analysis TDE2481 TTCTCTCCCCTTGCCTTTTT Operon analysis 670 Flank 1 CGGCAAAACCTTGTTGGATA CGTTGCGGGCTAGCTAAAAGCGGCGTAAAAATGC...”

Cj1107 hypothetical protein Cj1107 from Campylobacter jejuni subsp. jejuni NCTC 11168
42% identity, 83% coverage

Survival of Campylobacter jejuni 11168H in Acanthamoebae castellanii Provides Mechanistic Insight into Host Pathogen Interactions
Nasher, Microorganisms 2022
- “...1.919746 1.64 10 5 Cj0528c flgB Flagellar basal body rod protein 1.882962 1.65 10 23 Cj1107 clpS ATP-dependent Clp protease adapter protein 1.869786 1.03 10 6 Cj0429c Uncharacterized protein 1.863965 2.77 10 12 Cj1464 flgM Flagellar biosynthesis protein 1.844621 7.16 10 11 Cj0580c hemN Heme chaperone...”
Survival of Campylobacter jejuni in Acanthamoebae castellanii provides mechanistic insight into host pathogen interactions
Nasher, 2022
The acid adaptive tolerance response in Campylobacter jejuni induces a global response, as suggested by proteomics and microarrays
Varsaki, Microbial biotechnology 2015
- “...involved in response to stress were upregulated. Among them, cj0954c (coding a putative DnaJ-like protein), cj1107 ( clpS ) and cj1108 ( clpA ). The stress response protein ClpA is a member of a family of molecular chaperones called the Clp ATPases (HSP100 proteins) which promote...”

mll2736 ATP-dependent Clp protease adaptor protein ClpS from Mesorhizobium loti MAFF303099
42% identity, 79% coverage

A large scale analysis of protein-protein interactions in the nitrogen-fixing bacterium Mesorhizobium loti
Shimoda, DNA research : an international journal for rapid publication of reports on genes and genomes 2008
- “...identify component of protein complexes that had not been assigned by gene annotation. For example, Mll2736 (hypothetical protein) which contains a ClpS core domain (IPR003769) interacts with two distinct Clp proteases (Mll0663 and Mll2335). Mlr3346 (hypothetical protein) contains a phosphonate metabolism PhnJ domain (IPR010306) and interacts...”

CAETHG_RS02630 ATP-dependent Clp protease adaptor ClpS from Clostridium autoethanogenum DSM 10061
49% identity, 81% coverage

Absolute Proteome Quantification in the Gas-Fermenting Acetogen Clostridium autoethanogenum
Valgepea, mSystems 2022
- “...reductoisomerase 7 1868 19627 19042 CAETHG_RS00590 IlvB3 Acetolactate synthase, large subunit 2 131 71 10426 CAETHG_RS02630 Adh3 NADPH-dependent butanol dehydrogenase 2 184 82 20.1 CAETHG_RS07815 AcsD Corrinoid iron-sulfur protein part 2 7 63344 50941 70340 CAETHG_RS07830 MetF Methylene-THF reductase 5 25416 23316 28036 CAETHG_RS07840 FolD Methylene-THF...”

4yjxC / Q8UD95 The structure of Agrobacterium tumefaciens ClpS2 bound to L-phenylalaninamide (see paper)
42% identity, 95% coverage

Ligand: phenylalanine amide (4yjxC)

MXAN_6025 ATP-dependent Clp protease adaptor protein ClpS from Myxococcus xanthus DK 1622
44% identity, 75% coverage

Modular Lipoprotein Toxins Transferred by Outer Membrane Exchange Target Discrete Cell Entry Pathways
Vassallo, mBio 2021
- “...Fig.2A and Fig.S1 ). The eighth screening round identified SitA6-resistant mutants in the ClpA protease (MXAN_6025) and the upstream gene for the ClpS protease adaptor ( Fig.2A and C and Fig.S1 ), which conferred full resistance. However, since these were well-characterized cytoplasmic proteins that would not...”

jhp0028 putative from Helicobacter pylori J99
41% identity, 81% coverage

Genetic signatures for Helicobacter pylori strains of West African origin
Bullock, PloS one 2017
- “...% amino acid identity a Annotation or predicted function Protein Length (amino acids) b HP0032 JHP0028 80.0 ATP-dependent Clp protease adaptor protein 91 HP0159 JHP0147 88.2 LPS 1,2 glycosyltransferase 372 HP0160 JHP0148 88.7 HcpD penicillin-binding protein 306 HP0379 JHP1002 76.8 Alpha 1,3 fucosyltransferase 425 HP0492 JHP0444...”
- “...the same region of the chromosome (HP0032, HP0033, HP1051 and HP1053 in reference strain 26695; JHP0028, JHP0029, JHP0374 and HP0372 in reference strain J99), and are likely co-transcribed. HP0032 and HP033 are predicted to have related functions. HP0032 encodes an ATP-dependent Clp protease and HP0033 encodes...”
Genome sequence analysis of Helicobacter pylori strains associated with gastric ulceration and gastric cancer
McClain, BMC genomics 2009
- “...Description % aa identity (J99) a % aa identity (non-J99) b % unique sites c jhp0028 HP0032 Hypothetical 68 91 24 jhp0080 HP0087 d Hypothetical 89 96 8 jhp0173 HP0185 d Hypothetical 88 93 7 jhp0395 HP1029 d Hypothetical 88 95 7 a The sequences of...”

HP0032 conserved hypothetical protein from Helicobacter pylori 26695
33% identity, 87% coverage

Genetic signatures for Helicobacter pylori strains of West African origin
Bullock, PloS one 2017
- “...Mean % amino acid identity a Annotation or predicted function Protein Length (amino acids) b HP0032 JHP0028 80.0 ATP-dependent Clp protease adaptor protein 91 HP0159 JHP0147 88.2 LPS 1,2 glycosyltransferase 372 HP0160 JHP0148 88.7 HcpD penicillin-binding protein 306 HP0379 JHP1002 76.8 Alpha 1,3 fucosyltransferase 425 HP0492...”
- “...Mean % amino acid identity, intra-hspWAfrica Annotation or predicted function Protein length (amino acids) a HP0032 73.3 88.3 99.0 ATP-dependent Clp protease adaptor ClpS 91 HP0033 89.9 96.1 98.9 ATP-dependent Clp protease 741 HP0257 88.3 94.3 93.9 Predicted coding region 219 HP0384 87.9 94.0 97.7 SPOR...”
Genome-wide survey of mutual homologous recombination in a highly sexual bacterial species
Yahara, Genome biology and evolution 2012
- “...5 504 A HP1448 rnpA Ribonuclease P, protein component Transcription 0.065 0.010 5 486 A HP0032 clpS Hypothetical protein Other categories 0.066 0.011 3 276 A HP0320 tatA Sec-independent protein translocase protein Translocation 0.041 0.013 3 240 A HP0799 mogA Molybdenum cofactor biosynthesis protein Biosynthesis of...”
Genome sequence analysis of Helicobacter pylori strains associated with gastric ulceration and gastric cancer
McClain, BMC genomics 2009
- “...% aa identity (J99) a % aa identity (non-J99) b % unique sites c jhp0028 HP0032 Hypothetical 68 91 24 jhp0080 HP0087 d Hypothetical 89 96 8 jhp0173 HP0185 d Hypothetical 88 93 7 jhp0395 HP1029 d Hypothetical 88 95 7 a The sequences of the...”

HPB8_1592 ATP-dependent Clp protease adaptor ClpS from Helicobacter pylori B8
34% identity, 87% coverage

Sequencing, annotation, and comparative genome analysis of the gerbil-adapted Helicobacter pylori strain B8
Farnbacher, BMC genomics 2010
- “...typical genes related to DNA modification e.g. DNA methylases (HPB8_1059, HPB8_1100, HPB8_1101, HPB8_1103, HPB8_1538, and HPB8_1592) and restriction endonucleases (HPB8_1060, HPB8_1119, HPB8_1120, HPB8_1121, and HPB8_1706) are present in the genome of strain B8 (Additional file 1 , Table S8). Furthermore, we found two genes coding for...”

For advice on how to use these tools together, see Interactive tools for functional annotation of bacterial genomes.

Statistics

The PaperBLAST database links 793,807 different protein sequences to 1,259,118 scientific articles. Searches against EuropePMC were last performed on March 13, 2025.

How It Works

PaperBLAST builds a database of protein sequences that are linked to scientific articles. These links come from automated text searches against the articles in EuropePMC and from manually-curated information from GeneRIF, UniProtKB/Swiss-Prot, BRENDA, CAZy (as made available by dbCAN), BioLiP, CharProtDB, MetaCyc, EcoCyc, TCDB, REBASE, the Fitness Browser, and a subset of the European Nucleotide Archive with the /experiment tag. Given this database and a protein sequence query, PaperBLAST uses protein-protein BLAST to find similar sequences with E < 0.001.
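As a rough illustration of the E < 0.001 cutoff described above, the following sketch filters BLAST+ tabular output (`-outfmt 6`, default column order) by E-value. The function name `filter_hits`, the sort by bit score, and the assumption that hits arrive as tabular text are choices made for this example and are not taken from PaperBLAST's code.

```python
import csv
import io

# Column names of BLAST+ tabular output (-outfmt 6) in its default order.
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]

E_VALUE_CUTOFF = 0.001  # threshold stated in the text (E < 0.001)


def filter_hits(tabular_text, max_evalue=E_VALUE_CUTOFF):
    """Return hits with E-value strictly below max_evalue, best bit score first.

    Hypothetical helper for illustration; not PaperBLAST's implementation.
    """
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    hits = []
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(BLAST_COLUMNS, row))
        hit["pident"] = float(hit["pident"])
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        if hit["evalue"] < max_evalue:
            hits.append(hit)
    # Sorting by bit score is a presentation choice for this sketch.
    hits.sort(key=lambda h: h["bitscore"], reverse=True)
    return hits
```

Calling `filter_hits` on the text of a tabular BLAST report returns a list of dictionaries, one per retained hit, keyed by the column names above.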

To build the database, we query EuropePMC with locus tags, with RefSeq protein identifiers, and with UniProt accessions. We obtain the locus tags from RefSeq or from MicrobesOnline. We use queries of the form "locus_tag AND genus_name" to try to ensure that the paper is actually discussing that gene. Because EuropePMC indexes most recent biomedical papers, even if they are not open access, some of the links may be to papers that you cannot read or that our computers cannot read. We query each of these identifiers that appears in the open access part of EuropePMC, as well as every locus tag that appears in the 500 most-referenced genomes, so that a gene may appear in the PaperBLAST results even though none of the papers that mention it are open access. We also incorporate text-mined links from EuropePMC that link open access articles to UniProt or RefSeq identifiers. (This yields some additional links because EuropePMC uses different heuristics for their text mining than we do.)
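The query form "locus_tag AND genus_name" can be sketched as follows. This example only builds the query string and a search URL; it makes no network request. The use of Europe PMC's public REST search endpoint with the `query`, `format`, and `pageSize` parameters is an assumption of this sketch, as are the function names; the text above does not say how PaperBLAST submits its queries.

```python
from urllib.parse import urlencode

# Europe PMC's public REST search endpoint. Whether PaperBLAST calls this
# exact endpoint is an assumption of this sketch.
SEARCH_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_query(locus_tag, genus_name):
    """Combine a locus tag and a genus name in the form 'locus_tag AND genus_name'."""
    return f"{locus_tag} AND {genus_name}"


def build_search_url(locus_tag, genus_name, page_size=25):
    """Return a search URL for the combined query (hypothetical helper)."""
    params = {
        "query": build_query(locus_tag, genus_name),
        "format": "json",
        "pageSize": page_size,
    }
    return f"{SEARCH_URL}?{urlencode(params)}"
```

For example, `build_search_url("PA2621", "Pseudomonas")` builds a URL for the locus tag of the first hit on this page. Requiring the genus name alongside the locus tag is what reduces false matches when a short identifier happens to occur in unrelated papers.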

For every article that mentions a locus tag, a RefSeq protein identifier, or a UniProt accession, we try to select one or two snippets of text that refer to the protein. If we cannot get access to the full text, we try to select a snippet from the abstract, but unfortunately, unique identifiers such as locus tags are rarely provided in abstracts.
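One simple way to pick snippets is to take a window of words around each mention of the identifier. The sketch below does this; the window size, the whole-token matching rule, and the function name `select_snippets` are assumptions for illustration, and PaperBLAST's actual snippet heuristics may differ.

```python
import re


def select_snippets(text, identifier, max_snippets=2, window=25):
    """Return up to max_snippets word windows around mentions of identifier.

    Hypothetical helper: a mention counts only if the identifier is not
    embedded in a longer alphanumeric token (so 'cj110' does not match 'cj1107').
    """
    words = text.split()
    pattern = re.compile(
        r"(?<![A-Za-z0-9_])" + re.escape(identifier) + r"(?![A-Za-z0-9_])",
        re.IGNORECASE,
    )
    snippets = []
    last_end = -1  # index of the last word already covered by a snippet
    for i, word in enumerate(words):
        if not pattern.search(word):
            continue
        if i <= last_end:
            continue  # this mention is already inside the previous snippet
        start = max(0, i - window)
        end = min(len(words), i + window + 1)
        snippet = " ".join(words[start:end])
        # Mark truncation at either end with an ellipsis.
        prefix = "..." if start > 0 else ""
        suffix = "..." if end < len(words) else ""
        snippets.append(prefix + snippet + suffix)
        last_end = end - 1
        if len(snippets) == max_snippets:
            break
    return snippets
```

Skipping mentions that fall inside an earlier window keeps the one or two snippets from overlapping, which matters for table-like text where the same locus tag is repeated in adjacent cells.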

PaperBLAST also incorporates manually-curated protein functions:

Proteins from NCBI's RefSeq are included if a GeneRIF entry links the gene to an article in PubMed®. GeneRIF also provides a short summary of the article's claim about the protein, which is shown instead of a snippet.
Proteins from Swiss-Prot (the curated part of UniProt) are included if the curators identified experimental evidence for the protein's function (evidence code ECO:0000269). For these proteins, the fields of the Swiss-Prot entry that describe the protein's function are shown (with bold headings).
Proteins from BRENDA, a curated database of enzymes, are included if they are linked to a paper in PubMed and their full sequence is known.
Every protein from the non-redundant subset of BioLiP, a database of ligand-binding sites and catalytic residues in protein structures, is included. Since BioLiP itself does not include descriptions of the proteins, those are taken from the Protein Data Bank. Descriptions from PDB rely on the original submitter of the structure and cannot be updated by others, so they may be less reliable. (For SitesBLAST and Sites on a Tree, we use a larger subset of BioLiP so that every ligand is represented among a group of structures with similar sequences, but for PaperBLAST, we use the non-redundant set provided by BioLiP.)
Every protein from EcoCyc, a curated database of the proteins in Escherichia coli K-12, is included, regardless of whether it has been characterized.
Proteins from the MetaCyc metabolic pathway database are included if they are linked to a paper in PubMed and their full sequence is known.
Proteins from the Transport Classification Database (TCDB) are included if they have known substrate(s), have reference(s), and are not described as uncharacterized or putative. (Some of the references are not visible on the PaperBLAST web site.)
Every protein from CharProtDB, a database of experimentally characterized protein annotations, is included.
Proteins from the CAZy database of carbohydrate-active enzymes are included if they are associated with an Enzyme Classification number. Even though CAZy does not provide links from individual protein sequences to papers, these should all be experimentally-characterized proteins.
Proteins from the REBASE database of restriction enzymes are included if they have known specificity.
Every protein with an evidence-based reannotation (based on mutant phenotypes) in the Fitness Browser is included.
Sequence-specific transcription factors (including sigma factors and DNA-binding response regulators) from the PRODORIC database of gene regulation in prokaryotes are included if they have experimentally-determined DNA binding sites.
Putative transcription factors from RegPrecise are included if they have manually-curated predictions for their binding sites. These predictions are based on conserved putative regulatory sites across genomes that contain similar transcription factors, so PaperBLAST clusters the TFs at 70% identity and retains just one member of each cluster.
Coding sequence (CDS) features from the European Nucleotide Archive (ENA) are included if the /experiment tag is set (implying that there is experimental evidence for the annotation), the nucleotide entry links to paper(s) in PubMed, and the nucleotide entry is from the STD data class (implying that these are targeted annotated sequences, not from shotgun sequencing). Also, to filter out genes whose transcription or translation was detected, but whose function was not studied, nucleotide entries or papers with more than 25 such proteins are excluded. Descriptions from ENA rely on the original submitter of the sequence and cannot be updated by others, so they may be less reliable.
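The ENA filter in the last item above (excluding nucleotide entries or papers with more than 25 such proteins) could look roughly like this. The record layout (`protein_id`, `entry_id`, `pubmed_ids`) and the function name are hypothetical, and dropping a CDS once all of its papers have been excluded is an assumption of this sketch, made because inclusion requires a link to a paper.

```python
from collections import Counter

MAX_PROTEINS = 25  # limit stated in the text


def filter_ena_cds(records, max_proteins=MAX_PROTEINS):
    """Drop CDS features from over-large nucleotide entries or papers.

    records: list of dicts with keys 'protein_id', 'entry_id', 'pubmed_ids'
    (a hypothetical layout for CDS features that carry the /experiment tag).
    """
    # Count qualifying proteins per nucleotide entry and per paper.
    per_entry = Counter(r["entry_id"] for r in records)
    per_paper = Counter(pmid for r in records for pmid in set(r["pubmed_ids"]))

    kept = []
    for r in records:
        if per_entry[r["entry_id"]] > max_proteins:
            continue  # entry annotates too many proteins to be a focused study
        papers = [p for p in r["pubmed_ids"] if per_paper[p] <= max_proteins]
        if not papers:
            continue  # assumption: a CDS with no remaining paper is dropped
        kept.append({**r, "pubmed_ids": papers})
    return kept
```

The idea behind the threshold, as the text explains, is that an entry or paper reporting dozens of proteins most likely detected their expression rather than studying each one's function.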

Except for GeneRIF and ENA, the curated entries include a short curated description of the protein's function. For entries from BioLiP, the protein's function may not be known beyond binding to the ligand. Many of these entries also link to articles in PubMed.

For more information see the PaperBLAST paper (mSystems 2017) or the code. You can download PaperBLAST's database here.

Changes to PaperBLAST since the paper was written:

November 2023: incorporated PRODORIC and RegPrecise. Many PRODORIC entries were not linked to a protein sequence (no UniProt identifier), so we added this information.
February 2023: BioLiP changed their download format. PaperBLAST now includes their non-redundant subset. SitesBLAST and Sites on a Tree use a larger non-redundant subset that ensures that every ligand is represented within each cluster. This should ensure that every binding site is represented.
June 2022: incorporated some coding sequences from ENA with the /experiment tag.
March 2022: incorporated BioLiP.
April 2020: incorporated TCDB.
April 2019: EuropePMC now returns table entries in their search results. This has expanded PaperBLAST's database, but most of the new entries are of low relevance, and the resulting snippets are often just lists of locus tags with annotations.
February 2018: the alignment page reports the conservation of the hit's functional sites (if available from Swiss-Prot or UniProt).
January 2018: incorporated BRENDA.
December 2017: incorporated MetaCyc, CharProtDB, CAZy, REBASE, and the reannotations from the Fitness Browser.
September 2017: EuropePMC no longer returns some table entries in their search results. This has shrunk PaperBLAST's database, but has also reduced the number of low-relevance hits.

Many of these changes are described in Interactive tools for functional annotation of bacterial genomes.

Secrets

PaperBLAST cannot provide snippets for many of the papers that are published in non-open-access journals. This limitation applies even if the paper is marked as "free" on the publisher's web site and is available in PubMed Central or EuropePMC. If a journal that you publish in is marked as "secret," please consider publishing elsewhere.

Omissions from the PaperBLAST Database

Many important articles are missing from PaperBLAST, either because the article's full text is not in EuropePMC (as for many older articles), or because the paper does not mention a protein identifier such as a locus tag, or because of PaperBLAST's heuristics. If you notice an article that characterizes a protein's function but is missing from PaperBLAST, please notify the curators at UniProt or add an entry to GeneRIF. Entries in either of these databases will eventually be incorporated into PaperBLAST. Note that to add an entry to UniProt, you will need to find the UniProt identifier for the protein. If the protein is not already in UniProt, you can ask them to create an entry. To add an entry to GeneRIF, you will need an NCBI Gene identifier, but unfortunately many prokaryotic proteins in RefSeq do not have corresponding Gene identifiers.

References

PaperBLAST: Text-mining papers for information about homologs.
M. N. Price and A. P. Arkin (2017). mSystems, 10.1128/mSystems.00039-17.

Europe PMC in 2017.
M. Levchenko et al (2017). Nucleic Acids Research, 10.1093/nar/gkx1005.

Gene indexing: characterization and analysis of NLM's GeneRIFs.
J. A. Mitchell et al (2003). AMIA Annu Symp Proc 2003:460-464.

UniProt: the universal protein knowledgebase.
The UniProt Consortium (2016). Nucleic Acids Research, 10.1093/nar/gkw1099.

BRENDA in 2017: new perspectives and new tools in BRENDA.
S. Placzek et al (2017). Nucleic Acids Research, 10.1093/nar/gkw952.

The EcoCyc database: reflecting new knowledge about Escherichia coli K-12.
I. M. Keseler et al (2016). Nucleic Acids Research, 10.1093/nar/gkw1003.

The MetaCyc database of metabolic pathways and enzymes.
R. Caspi et al (2018). Nucleic Acids Research, 10.1093/nar/gkx935.

CharProtDB: a database of experimentally characterized protein annotations.
R. Madupu et al (2012). Nucleic Acids Research, 10.1093/nar/gkr1133.

The carbohydrate-active enzymes database (CAZy) in 2013.
V. Lombard et al (2014). Nucleic Acids Research, 10.1093/nar/gkt1178.

The Transporter Classification Database (TCDB): recent advances.
M. H. Saier, Jr. et al (2016). Nucleic Acids Research, 10.1093/nar/gkv1103.

REBASE - a database for DNA restriction and modification: enzymes, genes and genomes.
R. J. Roberts et al (2015). Nucleic Acids Research, 10.1093/nar/gku1046.

Deep annotation of protein function across diverse bacteria from mutant phenotypes.
M. N. Price et al (2016). bioRxiv, 10.1101/072470.

by Morgan Price, Arkin group
Lawrence Berkeley National Laboratory