PaperBLAST

PaperBLAST – Find papers about a protein or its homologs

PaperBLAST Hits for SM_b21330 (69 a.a., MDWNRVEGNW...)

Other sequence analysis tools:

Find functional residues: SitesBLAST

Search for conserved domains

Find the best match in UniProt

Compare to protein structures

Predict transmembrane helices: Phobius

Predict protein localization: PSORTb

Find homologs in fast.genomics

Found 19 similar proteins in the literature:

SM2011_RS11135 CsbD family protein from Sinorhizobium meliloti 2011
100% identity, 100% coverage

Identification of the Important Genes of Bradyrhizobium diazoefficiens 113-2 Involved in Soybean Nodule Development and Senescence
Yuan, Frontiers in microbiology 2021
- “...SM2011_RS28455 Nodule senescence MCHK_RS23965 113-2GL008064 blr1516 SM2011_RS15300 Early stages of nodule development 113-2GL008106 bsl1473 MAFF_RS37185 SM2011_RS11135 Nitrogen fixation 113-2GL008132 113-2GL003149 blr1448,blr6053 MAFF_RS04190 MCHK_RS11590 SM2011_RS14035 Nodule senescence 113-2GL008135 bsl1446 MAFF_RS26365 MCHK_RS00670 SM2011_RS17430 Early stages of nodule development 113-2GL008535 blr1091 MAFF_RS15660 MCHK_RS23865 Nodule senescence Our previous study has...”

RL2307 hypothetical protein from Rhizobium leguminosarum bv. viciae 3841
91% identity, 97% coverage

The Use of Transposon Insertion Sequencing to Interrogate the Core Functional Genome of the Legume Symbiont Rhizobium leguminosarum
Perry, Frontiers in microbiology 2016
- “...protein RL1618A TGI 17 1 9.82 NE 17 0.65 2.18 GD 0.0.1 Conserved hypothetical protein RL2307 TGI 4 0.75 5.67 NE 4 0.75 10 GD 0.0.1 Conserved hypothetical protein RL4065 TGI 3 0.67 1 NE 3 0 0 NE 0.0.1 Conserved hypothetical protein RL4716 TGI 18...”
Mutation of praR in Rhizobium leguminosarum enhances root biofilms, improving nodulation competitiveness by increased expression of attachment proteins
Frederix, Molecular microbiology 2014
- “...transporter protein RL3074 +2.7 0.015478 rapC Putative autoaggregation protein RL3375 +2.6 0.000292 Conserved hypothetical protein RL2307 +2.6 0.009037 Conserved hypothetical protein RL4195 +2.5 0.004917 Putative transmembrane protein RL1338 +2.5 0.008283 pmtA Putative phosphatidylethanolamine N -methyltransferase RL3959 +2.4 3.71E-05 Conserved hypothetical protein RL4614 +2.4 0.033157 rpoH2 Putative...”

NGR_RS22185 CsbD family protein from Sinorhizobium fredii NGR234
87% identity, 99% coverage

RpuS/R Is a Novel Two-Component Signal Transduction System That Regulates the Expression of the Pyruvate Symporter MctP in Sinorhizobium fredii NGR234
Ramos, Frontiers in microbiology 2022
- “...TRAP transporter solute-binding subunit Transport pNGR234b NGR_RS20160 4.67 Carbohydrate ABC transporter substrate-binding protein Transport Chromosome NGR_RS22185 4.75 CsbD family protein Stress Chromosome NGR_RS07515 4.89 Sugar ABC transporter substrate-binding protein Transport pNGR234b NGR_RS05310 4.91 Hypothethycal membrane protein Unknown pNGR234b NGR_RS32245 5.60 Hypothetical protein Unknown pNGR234b NGR_RS27720 6.26...”

MAFF_RS37185 CsbD family protein from Mesorhizobium japonicum MAFF 303099
88% identity, 97% coverage

Identification of the Important Genes of Bradyrhizobium diazoefficiens 113-2 Involved in Soybean Nodule Development and Senescence
Yuan, Frontiers in microbiology 2021
- “..., SM2011_RS28455 Nodule senescence MCHK_RS23965 113-2GL008064 blr1516 SM2011_RS15300 Early stages of nodule development 113-2GL008106 bsl1473 MAFF_RS37185 SM2011_RS11135 Nitrogen fixation 113-2GL008132 113-2GL003149 blr1448,blr6053 MAFF_RS04190 MCHK_RS11590 SM2011_RS14035 Nodule senescence 113-2GL008135 bsl1446 MAFF_RS26365 MCHK_RS00670 SM2011_RS17430 Early stages of nodule development 113-2GL008535 blr1091 MAFF_RS15660 MCHK_RS23865 Nodule senescence Our previous study...”

bsl1473 bsl1473 from Bradyrhizobium japonicum USDA 110
71% identity, 90% coverage

Identification of the Important Genes of Bradyrhizobium diazoefficiens 113-2 Involved in Soybean Nodule Development and Senescence
Yuan, Frontiers in microbiology 2021
- “...MCHK_RS33030 , SM2011_RS28455 Nodule senescence MCHK_RS23965 113-2GL008064 blr1516 SM2011_RS15300 Early stages of nodule development 113-2GL008106 bsl1473 MAFF_RS37185 SM2011_RS11135 Nitrogen fixation 113-2GL008132 113-2GL003149 blr1448,blr6053 MAFF_RS04190 MCHK_RS11590 SM2011_RS14035 Nodule senescence 113-2GL008135 bsl1446 MAFF_RS26365 MCHK_RS00670 SM2011_RS17430 Early stages of nodule development 113-2GL008535 blr1091 MAFF_RS15660 MCHK_RS23865 Nodule senescence Our previous...”

BN69_2599 CsbD family protein from Methylocystis sp. SC2
J7QV15 CsbD family protein from Methylocystis sp. (strain SC2)
60% identity, 94% coverage

Methylocystis sp. Strain SC2 Acclimatizes to Increasing NH₄⁺ Levels by a Precise Rebalancing of Enzymes and Osmolyte Composition
Guo, mSystems 2022
- “...protein 1.20E+04 1.17E+04 1.72E+04 9.07E+04 4.12E+04 0.035 0.521 2.920 1.783 0.883 0.651 0.028 0.114 J7QV15 BN69_2599 CsbD family protein 9.87E+05 1.19E+06 1.43E+06 3.12E+06 1.00E+07 0.274 0.531 1.659 3.342 0.183 0.009 0.010 0.000 J7QPM2 hdeA Probable acid stress chaperone HdeA 2.42E+07 3.26E+07 4.32E+07 5.79E+07 6.73E+07 0.428 0.834...”
Response of Methylocystis sp. Strain SC2 to Salt Stress: Physiology, Global Transcriptome, and Amino Acid Profiles
Han, Applied and environmental microbiology 2017
- “...response of strain SC2. The expression level of csbD (BN69_2599) was most strongly increased among all the 301 stress-responsive SC2 genes, with a log2 fold...”
- “...a stress response regulon may be located around gene BN69_2599. Interestingly, the SC2 genome also harbors a gene (BN69_1446) that encodes a homolog of the CsbD...”

KPN_04433 hypothetical protein from Klebsiella pneumoniae subsp. pneumoniae MGH 78578
57% identity, 80% coverage

Capsule deletion via a λ-Red knockout system perturbs biofilm formation and fimbriae expression in Klebsiella pneumoniae MGH 78578
Huang, BMC research notes 2014
- “...5.40 KPN_03160 Hypothetical 11.17 13.82 0.0028 6.30 KPN_04221 Periplasmic repressor CpxP 10.08 12.49 0.0044 5.32 KPN_04433 Putative stress-response protein 9.46 11.81 0.0083 5.09 KPN_04512 a N-glycosyl-transferase PgaC 6.40 9.74 0.0003 10.13 KPN_04513 a Putative polysaccharide deacetylase 6.38 9.79 0.0006 10.67 KPN_04514 a Outer membrane protein PgaA...”

KQQSB11_50044 CsbD family protein from Klebsiella quasipneumoniae subsp. quasipneumoniae
57% identity, 100% coverage

Identification of Klebsiella pneumoniae, Klebsiella quasipneumoniae, Klebsiella variicola and Related Phylogroups by MALDI-TOF Mass Spectrometry
Rodrigues, Frontiers in microbiology 2018
- “...of the SB11-Kp2 isolate (Figure 1 ). However, sequence analysis of YjbJ protein (locus tag KQQSB11_50044) revealed 100% identity with the other Kp2 strains and a theoretical molecular mass of 8274 Da (4137 in the double charged ion form). Furthermore, this peak was only present at...”

YjbJ / b4045 putative stress response protein YjbJ from Escherichia coli K-12 substr. MG1655 (see 9 papers)
yjbJ / RF|NP_418469 UPF0337 protein yjbJ from Escherichia coli K12 (see paper)
EDL933_5382, EDL933_RS26675 CsbD family protein from Escherichia coli O157:H7 str. EDL933
NP_418469 putative stress response protein YjbJ from Escherichia coli str. K-12 substr. MG1655
P68206 UPF0337 protein YjbJ from Escherichia coli (strain K12)
b4045 predicted stress response protein from Escherichia coli str. K-12 substr. MG1655
ECs5028 hypothetical protein from Escherichia coli O157:H7 str. Sakai
58% identity, 100% coverage

Transcriptomic and proteomic analysis of the virulence inducing effect of ciprofloxacin on enterohemorrhagic Escherichia coli
Kijewski, PloS one 2024
- “...FliN --- -1.7 -1.3 EDL933_RS20840 EDL933_4247 qseC Sensory histidine kinase QseC --- -1.8 1.6 EDL933_RS26675 EDL933_5382 yjbJ UPF0337 protein YjbJ --- 2.5 13.2 EDL933_RS28250 EDL933_5698 tsr Methyl-accepting chemotaxis protein I (serine chemoreceptor protein) --- -1.4 -1.1 Motility related DEGs and proteins shown as fold changes, between...”
- “...protein FliN --- -1.7 -1.3 EDL933_RS20840 EDL933_4247 qseC Sensory histidine kinase QseC --- -1.8 1.6 EDL933_RS26675 EDL933_5382 yjbJ UPF0337 protein YjbJ --- 2.5 13.2 EDL933_RS28250 EDL933_5698 tsr Methyl-accepting chemotaxis protein I (serine chemoreceptor protein) --- -1.4 -1.1 Motility related DEGs and proteins shown as fold changes,...”
Comparing the predicted and observed properties of proteins encoded in the genome of Escherichia coli K-12.
Link, Electrophoresis 1997 (PubMed)
- GeneRIF: N-terminus verified by Edman degradation on complete protein
Identification of specific protein amino acid substitutions of extended-spectrum β-lactamase (ESBL)-producing Escherichia coli ST131: a proteomics approach using mass spectrometry
Nakamura, Scientific reports 2019
- “...is unknown. The m/z 8351 peak was identified as UPF0337 protein YjbJ (Uniprot accession no. P68206) belonging to the UPF0337 (CsbD) family. The domain of YjbJ protein was CsbD, and its function is unknown. The m/z 8448 peak was identified as uncharacterized protein YnfD (Uniprot accession...”
Top-Down LESA Mass Spectrometry Protein Analysis of Gram-Positive and Gram-Negative Bacteria
Kocurek, Journal of the American Society for Mass Spectrometry 2017
- “...7701.88 -1.5 YahO P75694 91 -signal peptide 1189.5904 +7 8320.08 -1.3 UPF0337 protein YjbJ a P68206 80 1254.2614 +7 8772.78 -2.2 YdfK P76154 31 1494.9477 +7 10,457.58 +1.02 Da YbgS P0AAV6 45 -signal peptide; putative disulfide 7176 and deamidation Escherichia coli K-12 923.0049 +10 9219.98 -2.0...”
Tracing the phylogenetic history of the Crl regulon through the Bacteria and Archaea genomes
Santos-Zavaleta, BMC genomics 2019
- “...11 ] yhjR b3555 yhjR + MSI [ 10 ] bacterial cellulose biosynthetic process yiaG b4045 yiaG + MSI [ 10 ] regulation of transcription yjbJ b4329 yjbJ FliZ () + MSI [ 10 ] yjiG b1044 yjiH G-iadA + MSI [ 10 ] ymdA b1138...”
Depletion of the non-coding regulatory 6S RNA in E. coli causes a surprising reduction in the expression of the translation machinery
Neusser, BMC genomics 2010
- “...with H-NS b4401 arcA 1.55 response regulator in two-component regulatory system with ArcB or CpxA b4045 yjbJ 1.53 predicted stress response protein b2869 ygeV 1.53 predicted transcriptional regulator b3410 yhgG 1.50 transcriptional regulator 1 Meaningful genes were selected by the following criteria: known or predicted function...”
Global analysis of extracytoplasmic stress signaling in Escherichia coli
Bury-Moné, PLoS genetics 2009
- “...length of O-antigen -; 2.4 yfdC b2347 Predicted inner membrane protein 2.0 yjbJ F H b4045 Predicted stress response protein, belongs to the S regulon 2.0 galU F b1236 GalU: Subunit of glucose-1-phosphate uridylyltransferase 2.0 wza F wzb F wzc F H wcaA F H wcaB...”
The HU regulon is composed of genes responding to anaerobiosis, acid stress, high osmolarity and SOS induction
Oberto, PloS one 2009
- “...1.04 1.19 1 1.89 1.3 1.31 1 1.64 0.92 0.97 c isocitrate dehydrogenase kinase/phosphatase yjbJ b4045 yjbJ 1 0.74 1.1 1.67 1 1.94 1.32 1.73 1 1.52 1.06 0.47 a, b hypothetical protein ytfK b4217 ytfK 1 0.36 0.81 0.93 1 0.62 0.66 0.36 1 1.45...”
YcfR (BhsA) influences Escherichia coli biofilm formation through stress response and surface hydrophobicity
Zhang, Journal of bacteriology 2007
- “...yjbJ yjdN ygaM ymgE b0806 b0456 b1050 b4045 b4107 b2672 b1195 Hypothetical protein Hypothetical protein Hypothetical protein Highly abundant nonessential...”
Genome-wide analysis of the general stress response network in Escherichia coli: sigmaS-dependent genes, promoters, and sigma factor selectivity
Weber, Journal of bacteriology 2005
- “...OD 4 b2732 b2886 b2922 b3003 b3024 b3362 b3448 b3524 b4045 b4126 b4127 b4178 b4247 b4263 b4310 b1165 b1449 b1678 b1758 b1957 b1953 b2137 b3097 b3098 b3099 b3102...”
SigmaS-dependent gene expression at the onset of stationary phase in Escherichia coli: function of sigmaS-dependent genes and identification of their promoter sequences
Lacour, Journal of bacteriology 2004
- “...(regulatory) protein (b3555) Hypothetical protein (b4045) Hypothetical (membrane) protein (b1582) Hypothetical (periplasmic) protein (b3097) Hypothetical...”
Adaptation to famine: a family of stationary-phase genes revealed by microarray analysis
Tani, Proceedings of the National Academy of Sciences of the United States of America 2002
- “...ORF Fold osmC osmY b1482 b4376 5.9 8.9 poxB yjbJ b0871 b4045 8.0 1.7 Association with Lrp revealed by this study (47 genes) adhE b1241 2.9 frdA b4514 aldB b3588...”
Gene expression induced in Escherichia coli O157:H7 upon exposure to model apple juice
Bergholz, Applied and environmental microbiology 2009
- “...ECs4610 ECs4642 ECs4836 ECs4958 ECs4959 ECs4981 ECs5013 ECs5028 ECs5042 ECs5043 Exponential phase/ stationary phasec Log2 expression ratio Category and ECs no.a...”

STY4436 conserved hypothetical protein from Salmonella enterica subsp. enterica serovar Typhi str. CT18
STM4240 putative cytoplasmic protein from Salmonella typhimurium LT2
57% identity, 99% coverage

Transcriptomic study of Salmonella enterica subspecies enterica serovar Typhi biofilm
Chin, BMC genomics 2017
- “...0.001837 STY1854 Hypothetical protein 4.01921 5.00E-05 0.001837 cspE Cold shock-like protein CspE 4.36039 5.00E-05 0.001837 STY4436 Hypothetical protein 4.50574 5.00E-05 0.001837 STY0893 Biofilm formation regulatory protein bssr 5.12497 5.00E-05 0.001837 rmf Ribosome modulation factor 6.30581 5.00E-05 0.001837 The results were validated using Real-Time PCR. Please refer...”
Mapping the Regulatory Network for Salmonella enterica Serovar Typhimurium Invasion
Smith, mBio 2016
- “...SL1770 STM1841 SprB STM14_3799 SL3112 STM3138 SprB STM14_4215 pckA SL3467 STM3500 SprB STM14_5097 yjbJ SL4176 STM4240 SprB NA STnc520 STnc520 STnc520 SprB STM14_1174 SL0973 STM1034 HilA STM14_1176 SL0975 STM1036 HilA STM14_1177 SL0976 STM1037 HilA a The genes listed are direct regulatory targets of SPI-1-associated TFs that...”
The Rcs phosphorelay system is specific to enteric pathogens/commensals and activates ydeI, a gene important for persistent Salmonella infection of mice
Erickson, Molecular microbiology 2006
- “...STM3433 STM4064 STM4561b STM4239b STM4222 STM2983 STM4336b STM4240 STM3269 STM1285b STM1491 STM3443 STM1515 STM1492 STM3363 STM2311 STM2795 STM1589 STM1284...”

STM14_5097 CsbD family protein from Salmonella enterica subsp. enterica serovar Typhimurium str. 14028S
57% identity, 100% coverage

Proteome remodelling by the stress sigma factor RpoS/σ^S in Salmonella: identification of small proteins and evidence for post-transcriptional regulation
Lago, Scientific reports 2017
- “...likely correspond to long 5′ UnTranslated Regions (UTR) of the σS-dependent genes STM14_0421, STM14_1558, STM14_5097, and STM14_1275, respectively (Supplementary Fig. S4 and Table S1). This hypothesis is consistent with the non-canonical start codons and lack of ribosome binding sites for the putative ORFs STM14_0419,...”
- “...identity with E. coli YibT, DNA polymerase III-theta IPR009052 69 Yes yibT 14 Enterobacteriaceae STM14_5097 CsbD like IPR008462, pdb1RYK 70 Yes yjbJ 7, 1114 Bacteria, Archaea and Eukaryota STM14_5292 DUF1107 IPR009491 68 ytfK 7, 1214 - Proteobacteria STM14_5469 65% identity with E. coli YjjZ,...”
Mapping the Regulatory Network for Salmonella enterica Serovar Typhimurium Invasion
Smith, mBio 2016
- “...STM0341 SprB STM14_2227 SL1770 STM1841 SprB STM14_3799 SL3112 STM3138 SprB STM14_4215 pckA SL3467 STM3500 SprB STM14_5097 yjbJ SL4176 STM4240 SprB NA STnc520 STnc520 STnc520 SprB STM14_1174 SL0973 STM1034 HilA STM14_1176 SL0975 STM1036 HilA STM14_1177 SL0976 STM1037 HilA a The genes listed are direct regulatory targets of...”

OA04_05950 CsbD family protein from Pectobacterium versatile
52% identity, 100% coverage

The PhoPQ Two-Component System Is the Major Regulator of Cell Surface Properties, Stress Responses and Plant-Derived Substrate Utilisation During Development of Pectobacterium versatile-Host Plant Pathosystems
Kravchenko, Frontiers in microbiology 2020
- “...cbl OA04_28870 2.71 Transcriptional regulator CysB-like protein CCATTTTGCTATGATTTAT 10.8 Miscellaneous scrK OA04_04060 0.45 Fructokinase yjbJ OA04_05950 2.0 Osmotic stress-induced protein| RpoS regulon ygdBppdD OA04_10430-OA04_10440 3.58 Putative pilins ftp OA04_17930 2.29 Periplasmic FAD:protein FMN transferase TTTTTTTCCTTTCATTTGT 10.3 slyB OA04_18750 2.46 Outer membrane lipoprotein CTGTTTATACGCAATTTAA 9.9 ychH OA04_21530...”

PMI0360 general stress response protein from Proteus mirabilis HI4320
50% identity, 94% coverage

Transcriptome of swarming Proteus mirabilis
Pearson, Infection and immunity 2010
- “...PMI3178 PMI2256 PMI1471 PMI1473 PMI1171 PMI1472 PMI1616 PMI0360 PMI1730 PMI2268 PMI0047 PMI2036 PMI1737 PMI3401 PMI2890 PMI1509 PMI3701 PMI0952 PMI3204 PMI3115...”

BCAM0504 CsbD-like protein from Burkholderia cenocepacia J2315
49% identity, 88% coverage

Burkholderia cenocepacia differential gene expression during host-pathogen interactions and adaptation to the host environment
O'Grady, Frontiers in cellular and infection microbiology 2011
- “...Outer membrane efflux protein 187.90 BCAM0485 LacI family regulatory protein 4.99 BCAM0487 Conserved hypothetical 1.53 BCAM0504 CsbD-like protein 2.24 BCAM0505 Putative membrane-attached protein 1.67 BCAM0507 CsbD-like protein 2.40 BCAM0521 Putative IstB-like ATP-binding protein 2.85 BCAM0522 Putative integrase 1.76 BCAM0589 Conserved hypothetical protein 1.68 BCAM0622 Two-component regulatory...”

BP1738 conserved hypothetical protein from Bordetella pertussis Tohama I
45% identity, 93% coverage

Lonidamine, a Novel Modulator for the BvgAS System of Bordetella Species
Ota, Microbiology and immunology 2025 (no snippet)

PA14_62680 hypothetical protein from Pseudomonas aeruginosa UCBPP-PA14
Q9HV61 UPF0337 protein PA4738 from Pseudomonas aeruginosa (strain ATCC 15692 / DSM 22644 / CIP 104116 / JCM 14847 / LMG 12228 / 1C / PRS 101 / PAO1)
PA4738 hypothetical protein from Pseudomonas aeruginosa PAO1
48% identity, 93% coverage

Quorum quenching quandary: resistance to antivirulence compounds
Maeda, The ISME journal 2012
- “...PA14_10380 PA14_58350 PA4677 PA4738 PA4739 PA5482 PA14_61870 PA14_62680 PA14_62690 PA14_72370 mexB oprM by previous studies lasR rhlR coxB coxA coIII napA...”
Top-Down LESA Mass Spectrometry Protein Analysis of Gram-Positive and Gram-Negative Bacteria
Kocurek, Journal of the American Society for Mass Spectrometry 2017
- “...51 Incubation: 48 h, 37 C Sampled fresh 951.8583 +8 7606.81 0.0 UPF0337 protein PA4738 Q9HV61 58 956.3376 +6 5731.98 0.1 PA0039 Q9I793 24 Incubation: 24 h, 37 C Sampled fresh -signal peptide, 442 disulfide 958.5127 +16 15,320.09 3.5 PA5178 Q9HU11 27 Incubation: 48 h, 37...”
Predicting Pseudomonas aeruginosa drug resistance using artificial intelligence and clinical MALDI-TOF mass spectra
Nguyen, mSystems 2024
- “...isolates. Three reviewed proteins, namely, protein RegB, major cold shock protein CspA, and UPF0337 protein PA4738, were identified within the feature bin 7,568-7,636 Da. Using the same approach, we investigated the most important spectral ranges in amikacin and ciprofloxacin. In ciprofloxacin-resistant isolates, a significantly increased signal...”
- “...work ( 13 , 18 , 29 ). The proteins RegB, CspA, and UPF0337 protein PA4738 were identified within the most contributing bin of our best performing model for predicting ceftazidime/avibactam resistance. RegB is known to facilitate production of exotoxin A, a potent virulence factor in...”
Quantitative proteomics reveals unique responses to antimicrobial treatments in clinical Pseudomonas aeruginosa isolates
Goodyear, mSystems 2023
- “...PA3787 PA3787 Hypothetical, unknown 3.30 2.98 3.55 PA4571 PA4571 Electron transfer activity 2.61 3.51 2.94 PA4738 PA4738 Hypothetical, unknown 3.68 2.37 3.38 PA4739 PA4739 Hypothetical, unknown 6.04 4.09 5.60 PA5313 GabT2 Polyamine catabolic process 3.20 2.87 3.00 Decreased PA0284 PA0284 Hypothetical, unknown 4.20 4.11 5.15 PA0619...”
A gene network-driven approach to infer novel pathogenicity-associated genes: application to Pseudomonas aeruginosa PAO1
De, mSystems 2023
- “...which is within pathway implicated in biofilm formation and long-term infection ( 106 ), and PA4738 and PA5482, which are involved in protection against osmotic stress ( 107 ). As pathogens encounter various stress factors during infection including osmotic stress that can interfere with cell envelope...”
A VirB4 ATPase of the mobile accessory genome orchestrates core genome-encoded features of physiology, metabolism, and virulence of Pseudomonas aeruginosa TBCF10839
Wiehlmann, Frontiers in cellular and infection microbiology 2023
- “...probable toxin transporter 8.1 PA4190 Probable FAD-dependent monooxygenase 6.5 PA4209 phzM, probable phenazine-specific methyltransferase 37.6 PA4738 Conserved hypothetical protein 14.8 PA4739 Conserved hypothetical protein 21.4 PA4778 cueR, negative regulator of H2-T6SS dependent copper binding, regulator of surfing motility, CueR 7.0 PA4828 Conserved hypothetical protein 10.3 PA4876...”
Top-Down LESA Mass Spectrometry Protein Analysis of Gram-Positive and Gram-Negative Bacteria
Kocurek, Journal of the American Society for Mass Spectrometry 2017
- “...P05384 51 Incubation: 48 h, 37 C Sampled fresh 951.8583 +8 7606.81 0.0 UPF0337 protein PA4738 Q9HV61 58 956.3376 +6 5731.98 0.1 PA0039 Q9I793 24 Incubation: 24 h, 37 C Sampled fresh -signal peptide, 442 disulfide 958.5127 +16 15,320.09 3.5 PA5178 Q9HU11 27 Incubation: 48 h,...”
- “...Most notably, however, the UPF0337 family of stress response proteins, represented in P. aeruginosa by PA4738, was observed both in E. coli (YjbJ) and in S. aureus (SAOUHSC_00845) as well as in all three streptococci. In addition to PA2146, P. aeruginosa yielded multiple proteins whose existence...”
Physiological and transcriptional responses to osmotic stress of two Pseudomonas syringae strains that differ in epiphytic fitness and osmotolerance
Freeman, Journal of bacteriology 2013
- “...PSPTO_1596, and the putative hydrophilin-encoding, osmoinduced PAO1 gene PA4738 (39) exhibit 40, 36, and 55% amino acid identity to the Escherichia coli...”
A non-classical LysR-type transcriptional regulator PA2206 is required for an effective oxidative stress response in Pseudomonas aeruginosa
Reen, PloS one 2013
- “...component, subunit 0.50 PA3451 hypothetical protein 0.50 PA3788 hypothetical protein 0.40 PA4141 hypothetical protein 0.50 PA4738 conserved hypothetical protein 0.44 PA4739 conserved hypothetical protein 0.48 PA5085 probable transcriptional regulator 0.31 PA5481 hypothetical protein 0.39 PA5482 hypothetical protein 0.37 Genes that exhibited a 2-fold or greater alteration...”
Quorum quenching quandary: resistance to antivirulence compounds
Maeda, The ISME journal 2012
- “...PA14_28600 PA14_24860 PA14_13390 PA14_10380 PA14_58350 PA4677 PA4738 PA4739 PA5482 PA14_61870 PA14_62680 PA14_62690 PA14_72370 mexB oprM by previous studies...”

XAC4007 conserved hypothetical protein from Xanthomonas axonopodis pv. citri str. 306
E2P69_RS15800 CsbD family protein from Xanthomonas perforans
45% identity, 84% coverage

Participation of two general stress response proteins from Xanthomonas citri subsp. citri in environmental stress adaptation and virulence
Barcarolo, FEMS microbiology ecology 2020 (PubMed)
- “...occur. In this work, two Xcc genes, XAC0100 and XAC4007, predicted in silico to be involved in general stress response, were studied under salt, osmotic,...”
- “...and during plant-pathogen interaction. Expression of XAC0100 and XAC4007 genes was induced under these stress conditions. Disruption of both genes in Xcc caused...”
Transcriptome profiling of type VI secretion system core gene tssM mutant of Xanthomonas perforans highlights regulators controlling diverse functions ranging from virulence to metabolism
Ramamoorthy, Microbiology spectrum 2024
- “...were upregulated in the mutant strain. The mRNA level of a general stress protein CsbD (E2P69_RS15800) was significantly increased by 2.2-fold in the mutant strain at 8 hours. The transcript abundance of two genes (E2P69_RS10375 and E2P69_RS01815) coding for OsmC family protein and OmpW family protein...”

BCAM0507 CsbD-like protein from Burkholderia cenocepacia J2315
52% identity, 81% coverage

NtrC-dependent control of exopolysaccharide synthesis and motility in Burkholderia cenocepacia H111
Liu, PloS one 2017
- “...BCAM0193 Hypothetical protein 6.0 I35_4193 BCAM0194 Hypothetical protein 5.9 I35_4195 BCAM0196 Hypothetical protein 4.7 I35_4401 BCAM0507 Uncharacterized protein conserved in bacteria -3.5 I35_4471 BCAM0576 Hypothetical protein -3.4 I35_4651 BCAM0752 Hydrolase-related protein -3.6 I35_4652 BCAM0753 Hypothetical protein -3.9 I35_4669 BCAM0770 Hypothetical protein -7.4 I35_4766 BCAM0853 Transposase and...”
Burkholderia cenocepacia differential gene expression during host-pathogen interactions and adaptation to the host environment
O'Grady, Frontiers in cellular and infection microbiology 2011
- “...protein 4.99 BCAM0487 Conserved hypothetical 1.53 BCAM0504 CsbD-like protein 2.24 BCAM0505 Putative membrane-attached protein 1.67 BCAM0507 CsbD-like protein 2.40 BCAM0521 Putative IstB-like ATP-binding protein 2.85 BCAM0522 Putative integrase 1.76 BCAM0589 Conserved hypothetical protein 1.68 BCAM0622 Two-component regulatory system, sensor kinase 1.58 BCAM0623 Two-component regulatory system, response...”

XAC0100 conserved hypothetical protein from Xanthomonas axonopodis pv. citri str. 306
45% identity, 80% coverage

Participation of two general stress response proteins from Xanthomonas citri subsp. citri in environmental stress adaptation and virulence
Barcarolo, FEMS microbiology ecology 2020 (PubMed)
- “...conditions may occur. In this work, two Xcc genes, XAC0100 and XAC4007, predicted in silico to be involved in general stress response, were studied under salt,...”
- “...stress, and during plant-pathogen interaction. Expression of XAC0100 and XAC4007 genes was induced under these stress conditions. Disruption of both genes...”

For advice on how to use these tools together, see Interactive tools for functional annotation of bacterial genomes.

Statistics

The PaperBLAST database links 793,807 different protein sequences to 1,259,118 scientific articles. Searches against EuropePMC were last performed on March 13, 2025.

How It Works

PaperBLAST builds a database of protein sequences that are linked to scientific articles. These links come from automated text searches against the articles in EuropePMC and from manually-curated information from GeneRIF, UniProtKB/Swiss-Prot, BRENDA, CAZy (as made available by dbCAN), BioLiP, CharProtDB, MetaCyc, EcoCyc, TCDB, REBASE, the Fitness Browser, and a subset of the European Nucleotide Archive with the /experiment tag. Given this database and a protein sequence query, PaperBLAST uses protein-protein BLAST to find similar sequences with E < 0.001.
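The E-value cutoff described above can be illustrated with a minimal Python sketch. It assumes the BLAST results are available in the standard tab-separated format (`-outfmt 6` with default columns); the source does not say which output format PaperBLAST parses. The names `Hit` and `parse_hits` are hypothetical, and the coverage calculation (aligned query span divided by query length) is an assumption about how the "coverage" figure might be derived.

```python
from typing import NamedTuple

E_VALUE_CUTOFF = 1e-3  # the text states that hits are kept when E < 0.001


class Hit(NamedTuple):
    subject_id: str
    percent_identity: float
    query_coverage: float  # fraction of the query spanned by the alignment (assumed definition)
    e_value: float


def parse_hits(tabular_text: str, query_length: int) -> list[Hit]:
    """Parse BLAST tabular output (default -outfmt 6 columns) and keep hits with E < cutoff."""
    hits = []
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        # Default columns: qseqid sseqid pident length mismatch gapopen
        #                  qstart qend sstart send evalue bitscore
        subject_id = fields[1]
        percent_identity = float(fields[2])
        q_start, q_end = int(fields[6]), int(fields[7])
        e_value = float(fields[10])
        if e_value < E_VALUE_CUTOFF:
            coverage = (q_end - q_start + 1) / query_length
            hits.append(Hit(subject_id, percent_identity, coverage, e_value))
    return hits
```

For a 69-residue query such as the one on this page, `parse_hits(blast_output, 69)` would return only the rows whose E-value falls below the cutoff.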

To build the database, we query EuropePMC with locus tags, with RefSeq protein identifiers, and with UniProt accessions. We obtain the locus tags from RefSeq or from MicrobesOnline. We use queries of the form "locus_tag AND genus_name" to try to ensure that the paper is actually discussing that gene. Because EuropePMC indexes most recent biomedical papers, even if they are not open access, some of the links may be to papers that you cannot read or that our computers cannot read. We query each of these identifiers that appears in the open access part of EuropePMC, as well as every locus tag that appears in the 500 most-referenced genomes, so that a gene may appear in the PaperBLAST results even though none of the papers that mention it are open access. We also incorporate text-mined links from EuropePMC that link open access articles to UniProt or RefSeq identifiers. (This yields some additional links because EuropePMC uses different heuristics for their text mining than we do.)
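As a rough illustration of the "locus_tag AND genus_name" query form, the sketch below builds such a query string and a search URL for it. The function names are hypothetical, and the use of the Europe PMC REST search endpoint is an assumption made for illustration; the text does not say how PaperBLAST actually submits its queries.

```python
from urllib.parse import urlencode

# Europe PMC's public REST search endpoint (used here only for illustration).
EUROPE_PMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_query(identifier: str, genus_name: str) -> str:
    """Combine an identifier with the genus name so that hits are more likely
    to discuss the gene itself rather than an unrelated string match."""
    return f"{identifier} AND {genus_name}"


def build_search_url(identifier: str, genus_name: str) -> str:
    """Return a URL-encoded search URL for the combined query (hypothetical helper)."""
    params = {"query": build_query(identifier, genus_name), "format": "json"}
    return f"{EUROPE_PMC_SEARCH}?{urlencode(params)}"
```

For example, `build_query("RL2307", "Rhizobium")` yields the string `RL2307 AND Rhizobium`.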

For every article that mentions a locus tag, a RefSeq protein identifier, or a UniProt accession, we try to select one or two snippets of text that refer to the protein. If we cannot get access to the full text, we try to select a snippet from the abstract, but unfortunately, unique identifiers such as locus tags are rarely provided in abstracts.
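The snippet selection described above might look something like the following sketch. It is not PaperBLAST's actual code: the function name, the fixed character window, and the rule for skipping overlapping mentions are all assumptions chosen to keep the example short.

```python
import re


def select_snippets(text: str, identifier: str,
                    max_snippets: int = 2, context_chars: int = 80) -> list[str]:
    """Return up to max_snippets excerpts of text surrounding whole-word
    mentions of identifier (hypothetical helper, assumed windowing rule)."""
    # Require non-word characters on both sides so that, for example,
    # "RL2307" does not match inside "RL23070".
    pattern = re.compile(r"(?<!\w)" + re.escape(identifier) + r"(?!\w)")
    snippets = []
    last_end = -1
    for match in pattern.finditer(text):
        start = max(0, match.start() - context_chars)
        end = min(len(text), match.end() + context_chars)
        if start < last_end:
            continue  # this mention is already covered by the previous snippet
        snippets.append("..." + text[start:end].strip() + "...")
        last_end = end
        if len(snippets) == max_snippets:
            break
    return snippets
```

When only an abstract is available, the same function could be applied to the abstract text; it returns an empty list if the identifier never appears, which matches the observation that abstracts rarely contain locus tags.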

PaperBLAST also incorporates manually-curated protein functions:

Proteins from NCBI's RefSeq are included if a GeneRIF entry links the gene to an article in PubMed®. GeneRIF also provides a short summary of the article's claim about the protein, which is shown instead of a snippet.
Proteins from Swiss-Prot (the curated part of UniProt) are included if the curators identified experimental evidence for the protein's function (evidence code ECO:0000269). For these proteins, the fields of the Swiss-Prot entry that describe the protein's function are shown (with bold headings).
Proteins from BRENDA, a curated database of enzymes, are included if they are linked to a paper in PubMed and their full sequence is known.
Every protein from the non-redundant subset of BioLiP, a database of ligand-binding sites and catalytic residues in protein structures, is included. Since BioLiP itself does not include descriptions of the proteins, those are taken from the Protein Data Bank. Descriptions from PDB rely on the original submitter of the structure and cannot be updated by others, so they may be less reliable. (For SitesBLAST and Sites on a Tree, we use a larger subset of BioLiP so that every ligand is represented among a group of structures with similar sequences, but for PaperBLAST, we use the non-redundant set provided by BioLiP.)
Every protein from EcoCyc, a curated database of the proteins in Escherichia coli K-12, is included, regardless of whether it is characterized.
Proteins from the MetaCyc metabolic pathway database are included if they are linked to a paper in PubMed and their full sequence is known.
Proteins from the Transport Classification Database (TCDB) are included if they have known substrate(s), have reference(s), and are not described as uncharacterized or putative. (Some of the references are not visible on the PaperBLAST web site.)
Every protein from CharProtDB, a database of experimentally characterized protein annotations, is included.
Proteins from the CAZy database of carbohydrate-active enzymes are included if they are associated with an Enzyme Classification number. Even though CAZy does not provide links from individual protein sequences to papers, these should all be experimentally-characterized proteins.
Proteins from the REBASE database of restriction enzymes are included if they have known specificity.
Every protein with an evidence-based reannotation (based on mutant phenotypes) in the Fitness Browser is included.
Sequence-specific transcription factors (including sigma factors and DNA-binding response regulators) from the PRODORIC database of gene regulation in prokaryotes are included if they have experimentally-determined DNA binding sites.
Putative transcription factors from RegPrecise are included if they have manually-curated predictions for their binding sites. These predictions are based on conserved putative regulatory sites across genomes that contain similar transcription factors, so PaperBLAST clusters the TFs at 70% identity and retains just one member of each cluster.
Coding sequence (CDS) features from the European Nucleotide Archive (ENA) are included if the /experiment tag is set (implying that there is experimental evidence for the annotation), the nucleotide entry links to paper(s) in PubMed, and the nucleotide entry is from the STD data class (implying that these are targeted annotated sequences, not from shotgun sequencing). Also, to filter out genes whose transcription or translation was detected, but whose function was not studied, nucleotide entries or papers with more than 25 such proteins are excluded. Descriptions from ENA rely on the original submitter of the sequence and cannot be updated by others, so they may be less reliable.
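
The ENA inclusion rule above combines several conditions, so a small sketch may help. The Python below is only an illustration of the rule as described, not PaperBLAST's actual code: the CdsFeature record and its field names are hypothetical, and it assumes that a protein is dropped when its nucleotide entry exceeds the limit or when none of its linked papers remains under the limit.

```python
from collections import Counter
from dataclasses import dataclass, field

MAX_PROTEINS = 25  # entries or papers with more than this many proteins are excluded


@dataclass
class CdsFeature:
    """Hypothetical record for one CDS feature; field names are illustrative."""
    protein_id: str
    entry_id: str          # nucleotide entry the CDS belongs to
    data_class: str        # ENA data class, e.g. "STD"
    has_experiment: bool   # True if the /experiment tag is set
    pubmed_ids: list = field(default_factory=list)


def filter_ena_cds(features):
    """Return the CDS features that pass the inclusion rule sketched above."""
    # Basic criteria: /experiment tag, at least one PubMed link, STD data class.
    candidates = [
        f for f in features
        if f.has_experiment and f.pubmed_ids and f.data_class == "STD"
    ]
    # Count how many candidate proteins each entry and each paper accounts for.
    per_entry = Counter(f.entry_id for f in candidates)
    per_paper = Counter(p for f in candidates for p in set(f.pubmed_ids))

    kept = []
    for f in candidates:
        if per_entry[f.entry_id] > MAX_PROTEINS:
            continue  # entry looks like a large-scale detection study
        # Assumption: keep the protein if any linked paper is under the limit.
        if any(per_paper[p] <= MAX_PROTEINS for p in f.pubmed_ids):
            kept.append(f)
    return kept
```

Counting is done only over features that already meet the basic criteria, which matches the phrase "more than 25 such proteins"; whether the real pipeline counts in exactly this way is an assumption.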

Except for GeneRIF and ENA, the curated entries include a short curated description of the protein's function. For entries from BioLiP, the protein's function may not be known beyond binding to the ligand. Many of these entries also link to articles in PubMed.

For more information see the PaperBLAST paper (mSystems 2017) or the code. You can download PaperBLAST's database here.

Changes to PaperBLAST since the paper was written:

November 2023: incorporated PRODORIC and RegPrecise. Many PRODORIC entries were not linked to a protein sequence (no UniProt identifier), so we added this information.
February 2023: BioLiP changed their download format. PaperBLAST now includes their non-redundant subset. SitesBLAST and Sites on a Tree use a larger non-redundant subset in which every ligand is represented within each cluster, which should ensure that every binding site is represented.
June 2022: incorporated some coding sequences from ENA with the /experiment tag.
March 2022: incorporated BioLiP.
April 2020: incorporated TCDB.
April 2019: EuropePMC now returns table entries in their search results. This has expanded PaperBLAST's database, but most of the new entries are of low relevance, and the resulting snippets are often just lists of locus tags with annotations.
February 2018: the alignment page reports the conservation of the hit's functional sites (if available from Swiss-Prot or UniProt).
January 2018: incorporated BRENDA.
December 2017: incorporated MetaCyc, CharProtDB, CAZy, REBASE, and the reannotations from the Fitness Browser.
September 2017: EuropePMC no longer returns some table entries in their search results. This has shrunk PaperBLAST's database, but has also reduced the number of low-relevance hits.

Many of these changes are described in Interactive tools for functional annotation of bacterial genomes.

Secrets

PaperBLAST cannot provide snippets for many of the papers that are published in non-open-access journals. This limitation applies even if the paper is marked as "free" on the publisher's web site and is available in PubMed Central or EuropePMC. If a journal that you publish in is marked as "secret," please consider publishing elsewhere.

Omissions from the PaperBLAST Database

Many important articles are missing from PaperBLAST, either because the article's full text is not in EuropePMC (as for many older articles), or because the paper does not mention a protein identifier such as a locus tag, or because of PaperBLAST's heuristics. If you notice an article that characterizes a protein's function but is missing from PaperBLAST, please notify the curators at UniProt or add an entry to GeneRIF. Entries in either of these databases will eventually be incorporated into PaperBLAST. Note that to add an entry to UniProt, you will need to find the UniProt identifier for the protein. If the protein is not already in UniProt, you can ask them to create an entry. To add an entry to GeneRIF, you will need an NCBI Gene identifier, but unfortunately many prokaryotic proteins in RefSeq do not have corresponding Gene identifiers.

References

PaperBLAST: Text-mining papers for information about homologs.
M. N. Price and A. P. Arkin (2017). mSystems, 10.1128/mSystems.00039-17.

Europe PMC in 2017.
M. Levchenko et al (2017). Nucleic Acids Research, 10.1093/nar/gkx1005.

Gene indexing: characterization and analysis of NLM's GeneRIFs.
J. A. Mitchell et al (2003). AMIA Annu Symp Proc 2003:460-464.

UniProt: the universal protein knowledgebase.
The UniProt Consortium (2016). Nucleic Acids Research, 10.1093/nar/gkw1099.

BRENDA in 2017: new perspectives and new tools in BRENDA.
S. Placzek et al (2017). Nucleic Acids Research, 10.1093/nar/gkw952.

The EcoCyc database: reflecting new knowledge about Escherichia coli K-12.
I. M. Keseler et al (2016). Nucleic Acids Research, 10.1093/nar/gkw1003.

The MetaCyc database of metabolic pathways and enzymes.
R. Caspi et al (2018). Nucleic Acids Research, 10.1093/nar/gkx935.

CharProtDB: a database of experimentally characterized protein annotations.
R. Madupu et al (2012). Nucleic Acids Research, 10.1093/nar/gkr1133.

The carbohydrate-active enzymes database (CAZy) in 2013.
V. Lombard et al (2014). Nucleic Acids Research, 10.1093/nar/gkt1178.

The Transporter Classification Database (TCDB): recent advances.
M. H. Saier, Jr. et al (2016). Nucleic Acids Research, 10.1093/nar/gkv1103.

REBASE - a database for DNA restriction and modification: enzymes, genes and genomes.
R. J. Roberts et al (2015). Nucleic Acids Research, 10.1093/nar/gku1046.

Deep annotation of protein function across diverse bacteria from mutant phenotypes.
M. N. Price et al (2016). bioRxiv, 10.1101/072470.

by Morgan Price, Arkin group
Lawrence Berkeley National Laboratory