PaperBLAST
PaperBLAST Hits for Q81GN2 UPF0234 protein BC_1159 (Bacillus cereus (strain ATCC 14579 / DSM 31 / CCUG 7414 / JCM 2152 / NBRC 15305 / NCIMB 9373 / NCTC 2599 / NRRL B-3711)) (163 a.a., MAKDSSFDIV...)
>Q81GN2 UPF0234 protein BC_1159 (Bacillus cereus (strain ATCC 14579 / DSM 31 / CCUG 7414 / JCM 2152 / NBRC 15305 / NCIMB 9373 / NCTC 2599 / NRRL B-3711))
MAKDSSFDIVSKVELPEVTNAINIALKEIQNRYDFKGSKSDIKLEKEVLVLTSDDEFKLE
QVKDVLISKLVKRNVPIKNLDYGKVEAATGNTVRQRATLQQGIDKDNAKKINNIIKEMKL
KVKTQVQDDQVRVTAKSRDDLQAVIAAVRSADLPIDVQFINYR
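As a quick sanity check on the query record above, the following minimal Python sketch parses the FASTA text and confirms that the sequence length matches the 163 a.a. stated in the header. The helper name `parse_fasta` is illustrative only and is not part of PaperBLAST.

```python
def parse_fasta(text):
    """Return (header, sequence) for a single-record FASTA string."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA header line starting with '>'")
    # Header without the leading '>', sequence lines joined into one string.
    return lines[0][1:], "".join(lines[1:])


# Query record copied from the listing above.
QUERY_FASTA = """\
>Q81GN2 UPF0234 protein BC_1159 (Bacillus cereus (strain ATCC 14579 / DSM 31 / CCUG 7414 / JCM 2152 / NBRC 15305 / NCIMB 9373 / NCTC 2599 / NRRL B-3711))
MAKDSSFDIVSKVELPEVTNAINIALKEIQNRYDFKGSKSDIKLEKEVLVLTSDDEFKLE
QVKDVLISKLVKRNVPIKNLDYGKVEAATGNTVRQRATLQQGIDKDNAKKINNIIKEMKL
KVKTQVQDDQVRVTAKSRDDLQAVIAAVRSADLPIDVQFINYR
"""

header, sequence = parse_fasta(QUERY_FASTA)
accession = header.split()[0]  # first whitespace-delimited token of the header
print(accession, len(sequence))
```

The sketch assumes a single-record FASTA input; a multi-record file would need a loop over header lines.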
Found 41 similar proteins in the literature:
Q81GN2 UPF0234 protein BC_1159 from Bacillus cereus (strain ATCC 14579 / DSM 31 / CCUG 7414 / JCM 2152 / NBRC 15305 / NCIMB 9373 / NCTC 2599 / NRRL B-3711)
100% identity, 100% coverage
EFD32_0973 YajQ family cyclic di-GMP-binding protein from Enterococcus faecalis D32
59% identity, 99% coverage
- Novel cyclic di-GMP effectors of the YajQ protein family control bacterial virulence
An, PLoS pathogens 2014 - “...(BCK_02545), Bordetella pertussis (BP1193_10170), Bacillus subtilis (YitK), Clostridium cellulovorans (Clocel_3875), C. jejuni (BN867_03480), Enterococcus faecalis (EFD32_0973), Haemophilus influenzae (R2846_1298), Legionella pneumophila (LPE509_01999), Mycobacterium tuberculosis (MT0592), Pseudomonas aeruginosa (PA4395), Pseudomonas syringae (PSPPH_4093), Stenotrophomonas maltophilia (Smlt4090), Vibrio cholerae (VC_1508), Yersinia pestis (YPC_3455). Sequence alignment of proteins from the...”
Clocel_3875 YajQ family cyclic di-GMP-binding protein from Clostridium cellulovorans 743B
51% identity, 100% coverage
- Novel cyclic di-GMP effectors of the YajQ protein family control bacterial virulence
An, PLoS pathogens 2014 - “...(XC_3703), Escherichia coli (YajQ), Bacillus cereus (BCK_02545), Bordetella pertussis (BP1193_10170), Bacillus subtilis (YitK), Clostridium cellulovorans (Clocel_3875), C. jejuni (BN867_03480), Enterococcus faecalis (EFD32_0973), Haemophilus influenzae (R2846_1298), Legionella pneumophila (LPE509_01999), Mycobacterium tuberculosis (MT0592), Pseudomonas aeruginosa (PA4395), Pseudomonas syringae (PSPPH_4093), Stenotrophomonas maltophilia (Smlt4090), Vibrio cholerae (VC_1508), Yersinia pestis (YPC_3455)....”
BCAL2769 hypothetical protein from Burkholderia cenocepacia J2315
49% identity, 97% coverage
- High confidence prediction of essential genes in Burkholderia cenocepacia
Juhas, PloS one 2012 - “...ubiB , and valS , as well as the so far uncharacterized genes BCAL1882 , BCAL2769 , BCAL3142 and BCAL3369 has been confirmed experimentally in B. cenocepacia . Conclusions/Significance We report on the identification of essential genes using a novel bioinformatics strategy and provide bioinformatics and...”
- “...infB , H111 gyrB , H111 uniB , H111 valS , H111 BCAL1882 , H111 BCAL2769 , H111 BCAL3142 and H111 BCAL3369 grew on LB plates supplemented with rhamnose but not with glucose as expected for mutants with essential genes under the control of rhamnose promoter....”
all4662 hypothetical protein from Nostoc sp. PCC 7120
44% identity, 100% coverage
- β-N-Methylamino-L-Alanine (BMAA) Causes Severe Stress in Nostoc sp. PCC 7120 Cells under Diazotrophic Conditions: A Proteomic Study
Koksharova, Toxins 2021 - “...ABC transporter alr3938 ABC transporter iron binding protein high-affinity iron ion transport Regulatory proteins Signaling all4662 cyclic-di-GMP-binding protein all0089 Uncharacterized conserved protein YggE, contains kinase-interacting SIMPL domain all0129 two-component system, OmpR family, response regulator RpaA Stress response all1541 peroxiredoxin 2 family protein/glutaredoxin Transcription all5263...”
- “...Secondary metabolites alr0599 1-deoxy-xylulose 5-phosphate synthase EC:2.2.1.7 Hypothetical proteins alr4505 all1411 asl4547 alr2889 asr3294 all4662 alr4505 all1411 asr1156 alr4505 alr4504 all1411 asl4547 all3826...”
- The First Proteomics Study of Nostoc sp. PCC 7120 Exposed to Cyanotoxin BMAA under Nitrogen Starvation
Koksharova, Toxins 2020 - “...of NtcA ( alr0608 , all2319 , all1454 , alr0599 , alr4380 , alr0140 , all4662 , all5263 , alr1524 , alr5275 , alr4566 , alr4505 , all1411 , asl4547 , alr2889 , asr3294 , all4662 ) ( Table 2 , Supplementary Table S2 ). 2.3....”
- “...42 AhpC/TSA family protein alr4404 This family includes peroxiredoxin proteins 0.39 0.035 43 cyclic-di-GMP-binding protein all4662 signaling and cellular processes 0.73 0.05 44 EC:3.4.24. | ftsH cell division protease all4936 cell division protease FtsH 1.89 0.015 Translation (4 proteins) 45 GTP-binding protein LepA all2508 elongation factor...”
P9WFK9 UPF0234 protein Rv0566c from Mycobacterium tuberculosis (strain ATCC 25618 / H37Rv)
MT0592 hypothetical protein from Mycobacterium tuberculosis CDC1551
Rv0566c hypothetical protein from Mycobacterium tuberculosis H37Rv
48% identity, 98% coverage
- The unfoldase ClpC1 of Mycobacterium tuberculosis regulates the expression of a distinct subset of proteins having intrinsically disordered termini
Lunge, The Journal of biological chemistry 2020 (secret)
- Novel cyclic di-GMP effectors of the YajQ protein family control bacterial virulence
An, PLoS pathogens 2014 - “...(Clocel_3875), C. jejuni (BN867_03480), Enterococcus faecalis (EFD32_0973), Haemophilus influenzae (R2846_1298), Legionella pneumophila (LPE509_01999), Mycobacterium tuberculosis (MT0592), Pseudomonas aeruginosa (PA4395), Pseudomonas syringae (PSPPH_4093), Stenotrophomonas maltophilia (Smlt4090), Vibrio cholerae (VC_1508), Yersinia pestis (YPC_3455). Sequence alignment of proteins from the YajQ family. Consensus residues are listed below the alignment....”
- Mycobacterium tuberculosis Small RNA MTS1338 Confers Pathogenic Properties to Non-Pathogenic Mycobacterium smegmatis
Bychenko, Microorganisms 2021 - “...MSMEG_3580 Rv0129c Lipid metabolism A0QR29 3,3 Porin MspA MSMEG_0965 no A0QRM0 3 UPF0234 protein MSMEG_1165 Rv0566c Conserved hypotheticals A0R4A7 3 DUF732 domain-containing protein MSMEG_5766 no A0QV51 2,7 Methylmalonate-semialdehyde dehydrogenase MSMEG_2449 no A0R061 2,3 HesB/YadR/YfhF family protein MSMEG_4272 Rv2204c Conserved hypotheticals...”
- Integrative proteomic and glycoproteomic profiling of Mycobacterium tuberculosis culture filtrate
Tucci, PloS one 2020 - “...Rv3273 Rv3273 S735 NO 10 This work Conserved hypotheticals Rv0311 Rv0311 S10 NO This work Rv0566c Rv0566c T52, S53 & T55 NO This work Rv1352 Rv1352 T23 SP(Sec/SPI) 1 This work Rv1466 Rv1466 S5 NO This work Rv2166c Rv2166c S39 NO This work Rv2558 Rv2558 T82...”
- Computational Identification of the Proteins Associated With Quorum Sensing and Biofilm Formation in Mycobacterium tuberculosis
Hegde, Frontiers in microbiology 2019 - “...Rv0199, Rv0359, Rv1258c, Rv2136c, Rv3312A, SapM, SecA2, SugA, SugB, SugC Conserved hypotheticals 14 Rv0021c, Rv0038, Rv0566c, Rv0574c, Rv1176c, Rv1354c, Rv1357c, Rv1991c, Rv1996, Rv2216, Rv2298, Rv2300c, Rv3237c, Rv3519 Information pathways 9 Fmt, HelY, Hns, Mfd, MutT3, NrdH, RplM, SigB, TypA Intermediary metabolism and respiration 29 AckA, AroK,...”
- “...E secA2 , lpqY , sugA , sugB , sugC 0 0 1 1 1 Rv0566c , pks16 , Rv3519 , ltp3 0 0 0 1 0 Rv0195 , ceoB 0 0 1 0 0 Rv1176c , dacB2 0 1 0 0 0 Rv0199 0 0...”
- Investigating function roles of hypothetical proteins encoded by the Mycobacterium tuberculosis H37Rv genome
Yang, BMC genomics 2019 - “...this study No. ORF Accession number Annotated gene Description Widen value % Supporting number 1 Rv0566c NP_215080.1 yajQ Protein YajQ 94.15 3 2 Rv0190 NP_214704.1 rcnR Transcriptional repressor RcnR 92.89 4 3 Rv0587 NP_215101.1 yciC Membrane protein YciC 91.12 3 4 Rv2377c NP_216893.1 ybdZ Enterobactin biosynthesis...”
- Using a Label Free Quantitative Proteomics Approach to Identify Changes in Protein Abundance in Multidrug-Resistant Mycobacterium tuberculosis
Phong, Indian journal of microbiology 2015 - “...536 Rv3133c Transcriptional regulatory protein devR 9 2.6 Rv0566c UPF0234 protein 10 2.2 Rv1156 Conserved protein 10 2.6 Rv1770 Conserved protein 10 3.0 Rv0577...”
- Descriptive proteomic analysis shows protein variability between closely related clinical isolates of Mycobacterium tuberculosis
Mehaffy, Proteomics 2010 - “...levels within the S75 strains but differ significantly from CDC1551. Mpt83, Rv0398c, Wag31, Rv0854, Isr2, Rv0566c, AcpP, Rv2111c, and Cfp2 presented higher levels in CDC1551. From these, Mpt83, and Cfp2 were at least two fold more abundant in CDC1551, while Rv2111c was four times more abundant...”
- “...21 , 23 25 ] 57116727 Rv0379 secE2 0.79 0.65 0.78 3 4 60.56 15607706 Rv0566c -- 1.49 0.94 1.12 10 4 28.83 [ 21 , 24 , 25 ] 15607776 Rv0636 hadB 0.74 1.46 1.03 7 1 8.45 [ 17 , 25 ] 15607780 Rv0640...”
- Biomarkers of lung-related diseases: current knowledge by proteomic approaches
Lau, Journal of cellular physiology 2009 - “...from the M. tuberculosis H37Rv and CDC1551 strains. Eleven proteins, Rv0652, Rv1636, Rv2818c, Rv3369, Rv3865, Rv0566c, MT3304, Rv3160, Rv3874, Rv0560c, and Rv3648c, were identified by MALDI-TOF-MS or LC-ESI-MS. All these proteins were cloned and expressed in E. coli and affinity purified. By using three of these...”
MAP4063c hypothetical protein from Mycobacterium avium subsp. paratuberculosis str. k10
46% identity, 98% coverage
OFBG_01674 YajQ family cyclic di-GMP-binding protein from Oxalobacter formigenes OXCC13
46% identity, 97% coverage
- Proteome Dynamics of the Specialist Oxalate Degrader Oxalobacter formigenes
Ellis, Journal of proteomics & bioinformatics 2016 - “...exposure to stress, three universal stress proteins encoded by genes at loci OFBG_00781, OFBG_00128 and OFBG_01674, which have been shown to provide general stress endurance activity, an Abi family protein (OFBG_00600), which is involved in resisting bacteriophage infection, an esterase (OFBG_01018) with predicted beta-lactamase activity, and...”
Q9F2U7 UPF0234 protein SCO4614 from Streptomyces coelicolor (strain ATCC BAA-471 / A3(2) / M145)
45% identity, 98% coverage
Bphy_0527 putative nucleotide-binding protein from Burkholderia phymatum STM815
49% identity, 97% coverage
Bmul_0741 YajQ family cyclic di-GMP-binding protein from Burkholderia multivorans ATCC 17616
49% identity, 97% coverage
MSMEG_1165 hypothetical protein from Mycobacterium smegmatis str. MC2 155
A0QRM0 UPF0234 protein MSMEG_1165/MSMEI_1134 from Mycolicibacterium smegmatis (strain ATCC 700084 / mc(2)155)
MSMEG_1165 YajQ family cyclic di-GMP-binding protein from Mycolicibacterium smegmatis MC2 155
45% identity, 98% coverage
- Mycobacterium tuberculosis Small RNA MTS1338 Confers Pathogenic Properties to Non-Pathogenic Mycobacterium smegmatis
Bychenko, Microorganisms 2021 - “...FbpC MSMEG_3580 Rv0129c Lipid metabolism A0QR29 3,3 Porin MspA MSMEG_0965 no A0QRM0 3 UPF0234 protein MSMEG_1165 Rv0566c Conserved hypotheticals A0R4A7 3 DUF732 domain-containing protein MSMEG_5766 no A0QV51 2,7 Methylmalonate-semialdehyde dehydrogenase MSMEG_2449 no A0R061 2,3 HesB/YadR/YfhF family protein MSMEG_4272 Rv2204c Conserved hypotheticals...”
- Identifying nucleic acid-associated proteins in Mycobacterium smegmatis by mass spectrometry-based proteomics
Kriel, BMC molecular and cell biology 2020 - “...proteins (Additional file 2 : Table S2), of which 12 (MSMEG_0067, MSMEG_0243, MSMEG_0754, MSMEG_0824, MSMEG_0948, MSMEG_1165, MSMEG_1342, MSMEG_1680, MSMEG_2782, MSMEG_3020, MSMEG_3595, MSMEG_4306) had no identifying GO terms and are not known to be associated with nucleic acids or nucleic acid-associated proteins. Gene ontology enrichment analysis demonstrated...”
HI1034 conserved hypothetical protein from Haemophilus influenzae Rd KW20
P44096 UPF0234 protein HI_1034 from Haemophilus influenzae (strain ATCC 51907 / DSM 11121 / KW20 / Rd)
44% identity, 97% coverage
- A c-di-GMP binding effector STM0435 modulates flagellar motility and pathogenicity in Salmonella
Dai, Virulence 2024 - “...sandwich fold, suggesting nucleotide binding activity [ 34 ]. The structure of YajQ family members HI1034 from H. influenzae and XC_3703 from Xanthomonas campestris have been solved. STM0435 is the first YajQ family protein in Salmonella to be structurally analysed [ 35 , 36 ]. After...”
- Crystal structure of the YajQ-family protein XC_3703 from Xanthomonas campestris pv. campestris
Zhao, Acta crystallographica. Section F, Structural biology communications 2016 - “...solved by molecular replacement using another YajQ-family protein (HI1034; PDB entry 1in0; Teplyakov et al., 2003) from Haemophilus influenzae as the search...”
- “...bind RNA in the same manner as classical RRM proteins. HI1034 from H. influenzae was the first protein structure to be solved from the YajQ family (Teplyakov et...”
- Identification of biofilm proteins in non-typeable Haemophilus Influenzae
Gallaher, BMC microbiology 2006 - “...protector protein - n n HI0847 1573861 all 3085 S Uncharacterized BCR - n y HI1034 1574067 1 3 4 1666 S Uncharacterized BCR - y y HI1168 68057915 all 2926 S Uncharacterized BCR - y y HI1333 1574791 All 1534 J Predicted RNA-binding protein containing...”
- Identification and functional analysis of 'hypothetical' genes expressed in Haemophilus influenzae
Kolker, Nucleic acids research 2004 - “...a model of the native tetrameric structure of the HI1034 protein was shown to dock well with DNA, suggesting a DNA- or RNA-binding function (40). Although these...”
- “...1j8b 1j7h nanRATEK ruvCAB, mutT recR, dnaX tdcB HI0828 HI1034 P44887 P44096 98 163 2350 1666 1mwq 1in0 ispA, yciA, slt, bolA apbA, serB Sugar...”
- Structural and nucleotide-binding properties of YajQ and YnaF, two Escherichia coli proteins of unknown function
Saveanu, Protein science : a publication of the Protein Society 2002 - “...and Salmonella dublin. The Haemophilus influenzae yajQ gene (HI1034) is flanked by serB, encoding a phosphoserine phosphatase, and by corA, encoding a metal...”
1in0B / P44096 Yajq protein (hi1034) (see paper)
44% identity, 97% coverage
- Ligand: mercury (ii) ion (1in0B)
R2846_1298 YajQ family cyclic di-GMP-binding protein from Haemophilus influenzae R2846
43% identity, 97% coverage
- Novel cyclic di-GMP effectors of the YajQ protein family control bacterial virulence
An, PLoS pathogens 2014 - “...(BP1193_10170), Bacillus subtilis (YitK), Clostridium cellulovorans (Clocel_3875), C. jejuni (BN867_03480), Enterococcus faecalis (EFD32_0973), Haemophilus influenzae (R2846_1298), Legionella pneumophila (LPE509_01999), Mycobacterium tuberculosis (MT0592), Pseudomonas aeruginosa (PA4395), Pseudomonas syringae (PSPPH_4093), Stenotrophomonas maltophilia (Smlt4090), Vibrio cholerae (VC_1508), Yersinia pestis (YPC_3455). Sequence alignment of proteins from the YajQ family. Consensus...”
PSPPH_4093 hypothetical protein from Pseudomonas syringae pv. phaseolicola 1448A
46% identity, 97% coverage
- Novel cyclic di-GMP effectors of the YajQ protein family control bacterial virulence
An, PLoS pathogens 2014 - “...(EFD32_0973), Haemophilus influenzae (R2846_1298), Legionella pneumophila (LPE509_01999), Mycobacterium tuberculosis (MT0592), Pseudomonas aeruginosa (PA4395), Pseudomonas syringae (PSPPH_4093), Stenotrophomonas maltophilia (Smlt4090), Vibrio cholerae (VC_1508), Yersinia pestis (YPC_3455). Sequence alignment of proteins from the YajQ family. Consensus residues are listed below the alignment. The sequence logo illustrates the conservation...”
AL538_RS00950 YajQ family cyclic di-GMP-binding protein from Vibrio harveyi
42% identity, 97% coverage
VY92_RS09790 YajQ family cyclic di-GMP-binding protein from Avibacterium paragallinarum
44% identity, 97% coverage
- The Transcriptomic and Bioinformatic Characterizations of Iron Acquisition and Heme Utilization in Avibacterium paragallinarum in Response to Iron-Starvation
Huo, Frontiers in microbiology 2021 - “...the primers is listed in Supplementary Table 1 . The qPCR programs for VY92_RS06600, VY92_RS09125, VY92_RS09790, VY92_RS03730, and VY92_RS03735 were as follows: 1 cycle of 95 °C for 3 min; 35 cycles of 95 °C for 30 s, 60 °C for 30 s, and 72 °C for 40 s. The...”
- “...condition. Here, three upregulated genes (VY92_RS03730, VY92_RS03735, and VY92_RS00335) and four downregulated genes (VY92_RS06600, VY92_RS09125, VY92_RS09790, and VY92_RS01655) were selected as representative DEGs for qPCR. Among them, VY92_RS03730, VY92_RS03735, and VY92_RS00335 were annotated as the heme utilization protein HutZ, heme utilization cytosolic carrier protein HutX, and...”
PA4395 hypothetical protein from Pseudomonas aeruginosa PAO1
Q9HW11 UPF0234 protein PA4395 from Pseudomonas aeruginosa (strain ATCC 15692 / DSM 22644 / CIP 104116 / JCM 14847 / LMG 12228 / 1C / PRS 101 / PAO1)
45% identity, 97% coverage
- Adsorption of extracellular proteases and pyocyanin produced by Pseudomonas aeruginosa using a macroporous magnesium oxide-templated carbon decreases cytotoxicity
Hirakawa, Current research in microbial sciences 2022 - “...Putrescine-binding protein (SpuD) 61.373 3.67.E+06 5.48.E+06 Q9I5W4 Metalloprotease (ImpA) 61.116 1.17.E+07 9.64.E+06 Q9HW11 UPF0234 protein (PA4395) 59.911 3.33.E+07 1.58.E+06 Q9I457 Glutathione peroxidase (PA1287) 59.694 1.08.E+08 1.14.E+08 Q9HV46 Transcription elongation factor (GreA) 58.009 1.26.E+07 4.30.E+06 *1 : Proteins from a supernatant of PAO1 grown without MgOC150 2...”
- Cysteamine Inhibits Glycine Utilisation and Disrupts Virulence in Pseudomonas aeruginosa
Fraser-Pitt, Frontiers in cellular and infection microbiology 2021 - “...RNA metabolic processes NP_251543.1 outer membrane lipoprotein PA2853 oprI Outer membrane protein NP_253085.1 nucleotide-binding protein PA4395 yajQ Unknown NP_253132.1 bifunctional sulfate adenylyltransferase subunit 1/adenylylsulfate kinase PA4442 cysNC hydrogen sulfide biosynthesis NP_250493.1 ATP-dependent protease ATP-binding subunit ClpX PA1802 clpX Protein folding NP_251342.1 chemotaxis transducer PA2652 Cellular response...”
- Transcriptomic Analysis Reveals the Dependency of Pseudomonas aeruginosa Genes for Double-Stranded RNA Bacteriophage phiYY Infection Cycle
Zhong, iScience 2020 - “...and the initiation of L segment transcription might require a host protein(s), such as YajQ (PA4395), which results in delayed transcription of the L segment. Bacterial Transcriptomic Profile The bacterial gene expression level was measured by FPKM (Reads Per Kilobase Per Million Read), and genes with log2 fold...”
- A Cyclic di-GMP-binding Adaptor Protein Interacts with Histidine Kinase to Regulate Two-component Signaling
Xu, The Journal of biological chemistry 2016 - “...effectors include FleQ, PelD, FimX, BrlR, PA4395, and seven PilZ-domain-containing proteins. The transcriptional regulator FleQ controls flagellar biosynthesis...”
- “...IV pili biosynthesis, respectively (23-27). The c-di-GMP-binding PA4395, a member of the YajQ family proteins, interacts with a transcriptional regulator for...”
- Crystal structure of the YajQ-family protein XC_3703 from Xanthomonas campestris pv. campestris
Zhao, Acta crystallographica. Section F, Structural biology communications 2016 - “...(An et al., 2014). Besides XC_3703, its homologues PA4395 from Pseudomonas aeruginosa and Smlt4090 from Stenotrophomonas maltophilia are also reported to be...”
- Dissection of the cis-2-decenoic acid signaling network in Pseudomonas aeruginosa using microarray technique
Rahmani-Badi, Frontiers in microbiology 2015 - “...PA3070, PA3143, PA3216, PA3219, PA3229, PA3273, PA3402, PA3762, PA3772, PA3793, PA3869, PA3962, PA4149, PA4202, PA4298-PA4299, PA4395, PA4399, PA4404-PA4405, PA4510, PA4562,PA4884, PA4962, PA5237, PA5463, PA5492, PA5536 * The detailed information is provided in Tables S2, S3 . Figure 2 Functional classifications of DE genes in P. aeruginosa...”
- Novel cyclic di-GMP effectors of the YajQ protein family control bacterial virulence
An, PLoS pathogens 2014 - “...affinity for cyclic di-GMP binding was only seen in YajQ family proteins from P. aeruginosa (PA4395) and S. maltophilia (Smlt4090) ( Table S1 ). YajQ family proteins from E. coli , Clostridium sp. and B. cereus by contrast exhibited a greater affinity for ATP and/or GTP...”
- “...The finding that XC_3703 can influence the virulence of Xcc prompted us to test whether PA4395 and Smlt4090 have a role in the virulence of P. aeruginosa and S. maltophilia respectively. Previous studies have established a correlation between the ability of strains of P. aeruginosa and...”
- Proteome-wide identification of druggable targets and inhibitors for multidrug-resistant Pseudomonas aeruginosa using an integrative subtractive proteomics and virtual screening approach
Vemula, Heliyon 2025 - “...706 Q9HXB1 1788 Q9I0H2 2870 Q9HW10 3952 Q9I125 5034 Q9I6S3 707 Q9HXC2 1789 Q9I0J3 2871 Q9HW11 3953 Q9I126 5035 Q9I6S6 708 Q9HXC4 1790 Q9I0L1 2872 Q9HW13 3954 Q9I127 5036 Q9I6S7 709 Q9HXD6 1791 Q9I0L3 2873 Q9HW14 3955 Q9I128 5037 Q9I6T0 710 Q9HXE4 1792 Q9I0L4 2874 Q9HW15...”
YPC_3455 YajQ family cyclic di-GMP-binding protein from Yersinia pestis biovar Medievalis str. Harbin 35
YPO3170 nucleotide-binding protein from Yersinia pestis CO92
43% identity, 97% coverage
PflSS101_4316 YajQ family cyclic di-GMP-binding protein from Pseudomonas lactis
47% identity, 97% coverage
- Lipopeptide biosynthesis in Pseudomonas fluorescens is regulated by the protease complex ClpAP
Song, BMC microbiology 2015 - “...hypothetical protein 1.2 1.2 PflSS101_4298 tolB Tol-Pal system beta propeller repeat protein TolB 1.33 1.29 PflSS101_4316 PF04461 family protein 1.21 1.55 PflSS101_4394 thrC threonine synthase 1.29 1.43 PflSS101_4600 cbrB two-component response regulator CbrB 1.25 1.5 PflSS101_4631 dapB dihydrodipicolinate reductase 1.5 1.55 PflSS101_4632 dnaJ chaperone protein DnaJ...”
YajQ / b0426 nucleotide binding protein YajQ from Escherichia coli K-12 substr. MG1655 (see 6 papers)
YAJQ_ECOLI / P0A8E7 UPF0234 protein YajQ from Escherichia coli (strain K12) (see paper)
yajQ / MB|P0A8E7 UPF0234 protein yajQ from Escherichia coli K12 (see 6 papers)
NP_414960 nucleotide binding protein YajQ from Escherichia coli str. K-12 substr. MG1655
c0537 nucleotide-binding protein from Escherichia coli CFT073
44% identity, 97% coverage
- function: Binds nucleotides, may bind tRNA
subunit: Monomer.
- Structural and nucleotide-binding properties of YajQ and YnaF, two Escherichia coli proteins of unknown function.
Saveanu, Protein science : a publication of the Protein Society 2002 - GeneRIF: N-terminus verified by Edman degradation on mature peptide
- TopDIA: A Software Tool for Top-Down Data-Independent Acquisition Proteomics
Basharat, Journal of proteome research 2025 - “...charge: 14, retention time: [29.46, 35.32] minutes) and P2 from UPF0234 protein YajQ (UniProt ID: P0A8E7, mass: 3,393.63 Da, m / z : 849.407, charge: 4, retention time: [32.27, 33.68] minutes). The intensity of P1 is about 4.5 times higher than P2 and only 3 fragment...”
- The Escherichia coli proteome: past, present, and future prospects
Han, Microbiology and molecular biology reviews : MMBR 2006 - “...YajD YajG YajO YajQ YbbL P0AAQ2 P0ADA5 P77735 P0A8E7 P77279 YbbN YbdQ P77395 YbeZ YbfF P0A9K3 P75736 PhoH-like protein Esterase 6.24/40,654.52 5.86/28,437.26...”
- Identification of in vivo-induced antigens including an RTX family exoprotein required for uropathogenic Escherichia coli virulence
Vigil, Infection and immunity 2011 - “...c4487 c2179 c3068 c2085 c4682 c2051 c0875 c4901 c0537 c2661 Function of gene product Hypothetical dipeptide ABC transporter, permease subunit Glycine betaine...”
PP1352 conserved hypothetical protein from Pseudomonas putida KT2440
44% identity, 97% coverage
AMK58_18090 YajQ family cyclic di-GMP-binding protein from Azospirillum brasilense
42% identity, 98% coverage
8k5qA / Q8ZRC9 Crystal structure of yajq stm0435 with c-di-gmp (see paper)
Q8ZRC9 UPF0234 protein YajQ from Salmonella typhimurium (strain LT2 / SGSC1412 / ATCC 700720)
STM0435 nucleotide-binding protein from Salmonella typhimurium LT2
43% identity, 97% coverage
- Ligand: 9,9'-[(2r,3r,3as,5s,7ar,9r,10r,10as,12s,14ar)-3,5,10,12-tetrahydroxy-5,12-dioxidooctahydro-2h,7h-difuro[3,2-d:3',2'-j][1,3,7,9,2,8]tetraoxadiphosphacyclododecine-2,9-diyl]bis(2-amino-1,9-dihydro-6h-purin-6-one) (8k5qA)
- N-dodecanoyl-homoserine lactone influences the levels of thiol and proteins related to oxidation-reduction process in Salmonella
de, PloS one 2018 - “...to stress -7.772 1.761 ND ND 8.250 1.368 ND ND ND ND UPF0234 protein YajQ Q8ZRC9 yajQ Response to stress -7.420 2.808 8.948 0.625 ND ND ND ND ND ND Universal stress protein G P67093 uspG Response to stress ND ND 9.407 2.079 0.013 0.039 1.175...”
- The mcpC mutant of Salmonella enteritidis exhibits attenuation and confers both immunogenicity and protective efficacy in mice
Zhang, Frontiers in microbiology 2025 - “...hypothesis is supported by many studies, such as the finding that the c-di-GMP binding effector STM0435 regulates flagella synthesis, controls biofilm formation, and affects Salmonella virulence ( Dai et al., 2024 ). In addition, the knockout of YeiE reduces the expression of flagella-related genes such as...”
- “...Y. Song N. Jia H. Ma Z. . ( 2024 ). A c-di-GMP binding effector STM0435 modulates flagellar motility and pathogenicity in Salmonella . Virulence 15 : 2331265 . doi: 10.1080/21505594.2024.2331265 , PMID: 38532247 Datsenko K. A. Wanner B. L. ( 2000 ). One-step inactivation of...”
- A c-di-GMP binding effector STM0435 modulates flagellar motility and pathogenicity in Salmonella
Dai, Virulence 2024 - “...10978029 38532247 10.1080/21505594.2024.2331265 2331265 Version of Record Research Article Research Article A c-di-GMP binding effector STM0435 modulates flagellar motility and pathogenicity in Salmonella Y. DAI ET AL. VIRULENCE Dai Yuanji a * Liu Ruirui a * Yue Yingying a b Song Nannan a b Jia Haihong...”
- “...pathogenicity and immune escape of Salmonella . We identified the conserved and unknown function protein STM0435 as a new flagellar regulator. The stm0435 strain exhibited higher pathogenicity in both cellular and animal infection experiments than the wild-type Salmonella . Proteomic and transcriptomic analyses demonstrated dramatic increases...”
- Comparative proteomic analysis of Salmonella enterica serovar Typhimurium ppGpp-deficient mutant to identify a novel virulence protein required for intracellular survival in macrophages
Haneda, BMC microbiology 2010 - “...EC 062 STM1091 sopB 0.2 0.036 ND 064 STM4319 phoN 0.1 0.014 0.54 0.22 108 STM0435 yajQ 0.5 0.038 0.12 0.05 c 108-2 STM1440 sodC1 0.5 0.038 ND 153 STM3318 yhbN 0.6 0.047 0.28 0.12 c 154 STM4405 ytfJ 0.2 0.049 0.30 0.02 c 184 STM3348...”
- Mass spectrometry-based quantitative proteomic analysis of Salmonella enterica serovar Enteritidis protein expression upon exposure to hydrogen peroxide
Kim, BMC microbiology 2010 - “...41% STM0316 Aminoacyl-histidine dipeptidase pepD 52.69 5.17 15% STM0432 Phosphonoacetaldehyde hydrolase phnX 28.57 5.58 41% STM0435 Nucleotide-binding protein yajQ 18.31 5.6 52% STM0447 Trigger factor tig 48.02 4.84 23% STM0488 Adenylate kinase adk 23.49 5.53 51% STM0536 Peptidyl-prolyl cis-trans isomerase B ppiB 18.13 5.52 45% STM0608...”
- “...tsf 21 4% STM0316 Aminoacyl-histidine dipeptidase pepD 9 1% STM0432 Phosphonoacetaldehyde hydrolase phnX 31 3% STM0435 Nucleotide-binding protein yajQ 0% STM0447 Trigger factor tig 11 2% STM0488 Adenylate kinase adk 0% STM0536 Peptidyl-prolyl cis-trans isomerase B ppiB 0% STM0608 Chain T, crystal structure of Ahpc ahpC...”
plu3881 No description from Photorhabdus luminescens subsp. laumondii TTO1
42% identity, 97% coverage
XAC3671 conserved hypothetical protein from Xanthomonas axonopodis pv. citri str. 306
40% identity, 97% coverage
Q8Z8W2 UPF0234 protein YajQ from Salmonella typhi
43% identity, 97% coverage
XC_3703 hypothetical protein from Xanthomonas campestris pv. campestris str. 8004
Q4UQD0 UPF0234 protein XC_3703 from Xanthomonas campestris pv. campestris (strain 8004)
WP_011038715 YajQ family cyclic di-GMP-binding protein from Xanthomonas campestris pv. raphani
39% identity, 97% coverage
- A c-di-GMP binding effector STM0435 modulates flagellar motility and pathogenicity in Salmonella
Dai, Virulence 2024 - “...including QseB/C, H-NS, cAMP-CAP, RcsB, CsrA, and c-di-GMP [ 29–32 ]. The YajQ family protein XC_3703 was identified as a novel c-di-GMP effector in the plant pathogen Xanthomonas campestris [ 16 ]. Considering that STM0435 belongs to the YajQ family and is evolutionarily conserved in bacteria,...”
- “...activity [ 34 ]. The structure of YajQ family members HI1034 from H. influenzae and XC_3703 from Xanthomonas campestris have been solved. STM0435 is the first YajQ family protein in Salmonella to be structurally analysed [ 35 , 36 ]. After performing sequence alignment of stm0435...”
- Functional diversity of c-di-GMP receptors in prokaryotic and eukaryotic systems
Khan, Cell communication and signaling : CCS 2023 - “...115 ] CLP X. campestris - 3.5 μM Regulated bacterial virulence gene expression [ 116 ] XC_3703 X. campestris pv. campestris - 2 μM Activated virulence-related genes [ 117 ] PXO_00049 PXO_02374 PXO_02715 X. oryzae pv. oryzae PilZ domain 0.139 μM for PXO_00049 0.102 μM for PXO_02374 Regulated virulence [...”
- “...Biol. 2010;396:646–62. 117. Zhao Z, Wu Z, Zhang J. Crystal structure of the YajQ-family protein XC_3703 from Xanthomonas campestris pv. campestris . Acta Crystallogr F Struct Biol Commun. 2016;72:720–5. 118. Yang F, Tian F, Chen H, Hutchins W, Yang CH, He C. The Xanthomonas oryzae pv....”
- Mechanistic insights into host adaptation, virulence and epidemiology of the phytopathogen Xanthomonas
An, FEMS microbiology reviews 2020 - “...identified in Xcc (An et al. 2013b ; Zhao et al. 2016 ). This protein (XC_3703) acts to influence the transcription of genes that contribute to virulence in plants and biofilm formation. The available evidence suggests that XC_3703 exerts its action through protein-protein interactions with the...”
- “..., Wu Z , Zhang J et al . Crystal structure of the YajQ-family protein XC_3703 from Xanthomonas campestris pv. campestris . Acta Crystallogr Sect F Struct Biol Commun . 2016 ; 72 : 720 5 . 27599864 Zhou L , Huang T-W , Wang J-Y...”
- A YajQ-LysR-like, cyclic di-GMP-dependent system regulating biosynthesis of an antifungal antibiotic in a crop-protecting bacterium, Lysobacter enzymogenes
Han, Molecular plant pathology 2020 - “...OH11. Results CdgL is a c-di-GMP-binding protein that affects gene expression in Lysobacter The YajQ (Xc_3703) protein from X. campestris has recently been described as a c-di-GMP receptor that affects the virulence of this plant pathogen (An et al. , 2014 ). Because Lysobacter is phylogenetically...”
- Evaluating Eucalyptus leaf colonization by Brasilonema octagenarum (Cyanobacteria, Scytonemataceae) using in planta experiments and genomics
Alvarenga, PeerJ 2020 - “...+ + + + + Q2LK92 BcPIC5/BcFKBP12 rapamycin sensitivity + + + + + Q4UQD0 XC_3703 cyclic di-GMP effector + + + + Q4UTV7 XC_2466 aspartate alpha-decarboxylase + + + + + Q4UUL4 XC_2203 nucleotide diphosphate kinase + + + + + Q58PW8 HsvA hrp-associated systemic...”
- Identification of c-di-GMP Signaling Components in Xanthomonas oryzae and Their Orthologs in Xanthomonads Involved in Regulation of Bacterial Virulence Expression
Yang, Frontiers in microbiology 2019 - “...and effectors has been recently identified in Xanthomonads and other bacteria. One example is YajQ (XC_3703), a PNPase from Xcc, and XOC_4190 from Xoc, which was well-conserved in Xanthomonas spp (An et al., 2014 ; Zhao et al., 2016 ). YajQ showed 96% identity with PXO_03091,...”
- “...Zhao Z. Wu Z. Zhang J. ( 2016 ). Crystal structure of the YajQ-family protein XC_3703 from Xanthomonas campestris pv. campestris . Acta Crystallogr. F Struct. Biol. Commun . 72 , 720 725 . 10.1107/S2053230X16013017 27599864 Zheng D. Constantinidou C. Hobman J. L. Minchin S. D....”
- Crystal structure of the YajQ-family protein XC_3703 from Xanthomonas campestris pv. campestris
Zhao, Acta crystallographica. Section F, Structural biology communications 2016 - “...YajQ-family protein XC_3703 from Xanthomonas campestris pv. campestris Zhixin Zhao,* Zhen Wu and Jun Zhang Received 20 June 2016 Accepted 11 August 2016 Edited...”
- “...Terwilliger, Los Alamos National Laboratory, USA Keywords: XC_3703; YajQ; c-di-GMP; receptor. PDB reference: XC_3703, 5b7w Supporting information: this article...”
- Bacterial pathogenesis of plants: future challenges from a microbial perspective: Challenges in Bacterial Molecular Plant Pathology
Pfeilmeier, Molecular plant pathology 2016 - “...to expand, with the discovery of new cdG-binding targets, such as the YajQ family protein XC_3703 in Xcc (An et al., 2014b) and the type III injectisome ATPase HrcN in P. syringae (Trampari et al., 2015). Plant-pathogenic bacteria display evolved cell-cell signalling...”
- Evaluating Eucalyptus leaf colonization by Brasilonema octagenarum (Cyanobacteria, Scytonemataceae) using in planta experiments and genomics
Alvarenga, PeerJ 2020 - “...undefined + + + + + Q2LK92 BcPIC5/BcFKBP12 rapamycin sensitivity + + + + + Q4UQD0 XC_3703 cyclic di-GMP effector + + + + Q4UTV7 XC_2466 aspartate alpha-decarboxylase + + + + + Q4UUL4 XC_2203 nucleotide diphosphate kinase + + + + + Q58PW8 HsvA hrp-associated...”
- “...presented similarity with proteins involved in regulatory cascades for virulence (D4HUY4, D4HX24, Q4UQD0, Q58PW8 and Q5H3K9) (Oh, Kim & Beer, 2005; Subramoni et al., 2012; An et al., 2014; Ancona, Li & Zhao, 2014; Santander et al.,...”
- Diversity of Cyclic Di-GMP-Binding Proteins and Mechanisms
Chou, Journal of bacteriology 2016 - “...Q9HUT5 H7C7G9 P22260 G3XCV0 Q9KQ66 A0R6A5 P75905, P69432 P05055 Q9KU59 Q4UQD0 Q9UJV9 Q9Y3Q4 Structurea Kd (M) NAc NA NA 3IAO 4Q20 3IWZ NA NA NA NA 11.7 10 2.4...”
- Crystal structure of the YajQ-family protein XC_3703 from Xanthomonas campestris pv. campestris.
Zhao, Acta crystallographica. Section F, Structural biology communications 2016 - GeneRIF: Crystal structure of the YajQ-family protein XC_3703 from Xanthomonas campestris pv. Campestris has been reported.
SYNW1816 conserved hypothetical protein from Synechococcus sp. WH 8102
41% identity, 99% coverage
PXO_03091 protein YajQ from Xanthomonas oryzae pv. oryzae PXO99A
38% identity, 92% coverage
XOO0711 hypothetical protein from Xanthomonas oryzae pv. oryzae KACC10331
38% identity, 97% coverage
VAS14_01951 YajQ family cyclic di-GMP-binding protein from Photobacterium angustum S14
41% identity, 97% coverage
Smlt4090 hypothetical protein from Stenotrophomonas maltophilia K279a
39% identity, 97% coverage
- Crystal structure of the YajQ-family protein XC_3703 from Xanthomonas campestris pv. campestris
Zhao, Acta crystallographica. Section F, Structural biology communications 2016 - “...its homologues PA4395 from Pseudomonas aeruginosa and Smlt4090 from Stenotrophomonas maltophilia are also reported to be potential c-di-GMP receptors from the...”
- Novel cyclic di-GMP effectors of the YajQ protein family control bacterial virulence
An, PLoS pathogens 2014 - “...binding was only seen in YajQ family proteins from P. aeruginosa (PA4395) and S. maltophilia (Smlt4090) ( Table S1 ). YajQ family proteins from E. coli , Clostridium sp. and B. cereus by contrast exhibited a greater affinity for ATP and/or GTP than for any cyclic...”
- “...that XC_3703 can influence the virulence of Xcc prompted us to test whether PA4395 and Smlt4090 have a role in the virulence of P. aeruginosa and S. maltophilia respectively. Previous studies have established a correlation between the ability of strains of P. aeruginosa and S. maltophilia...”
BVG93_01845 YajQ family cyclic di-GMP-binding protein from Serratia marcescens
40% identity, 97% coverage
- Microbial Reduction of Fumonisin B1 by the New Isolate Serratia marcescens 329-2
Keawmanee, Toxins 2021 - “...glgS 3.89 A0A1Q5WEW3 Antibiotic biosynthesis monooxygenase A8A12_06980 3.87 A0A6I4GZS8 Hydrolase GMA22_24835 3.80 A0A221FKL4 UPF0234 protein BVG93_01845 BVG93_01845 3.75 A0A6H3S2C0 ATP-dependent protease subunit HslV hslV 3.71 A0A6M5I193 MBL fold metallo-hydrolase HMI62_20840 3.64 A0A6N0DB57 Protein deglycase HchA hchA 3.62 A0A2V4G7I4 Amino acid ABC transporter substrate-binding protein glnH 3.62...”
ACIAD3137 conserved hypothetical protein from Acinetobacter sp. ADP1
42% identity, 97% coverage
SynWH7803_1823 hypothetical protein from Synechococcus sp. WH 7803
39% identity, 96% coverage
SAR11_0692 hypothetical protein from Candidatus Pelagibacter ubique HTCC1062
41% identity, 97% coverage
LPE509_01999 YajQ family cyclic di-GMP-binding protein from Legionella pneumophila subsp. pneumophila LPE509
36% identity, 97% coverage
- Novel cyclic di-GMP effectors of the YajQ protein family control bacterial virulence
An, PLoS pathogens 2014 - “...(YitK), Clostridium cellulovorans (Clocel_3875), C. jejuni (BN867_03480), Enterococcus faecalis (EFD32_0973), Haemophilus influenzae (R2846_1298), Legionella pneumophila (LPE509_01999), Mycobacterium tuberculosis (MT0592), Pseudomonas aeruginosa (PA4395), Pseudomonas syringae (PSPPH_4093), Stenotrophomonas maltophilia (Smlt4090), Vibrio cholerae (VC_1508), Yersinia pestis (YPC_3455). Sequence alignment of proteins from the YajQ family. Consensus residues are listed...”
VC_1508 YajQ family cyclic di-GMP-binding protein from Vibrio cholerae O1 biovar El Tor str. N16961
38% identity, 97% coverage
- Novel cyclic di-GMP effectors of the YajQ protein family control bacterial virulence
An, PLoS pathogens 2014 - “...(LPE509_01999), Mycobacterium tuberculosis (MT0592), Pseudomonas aeruginosa (PA4395), Pseudomonas syringae (PSPPH_4093), Stenotrophomonas maltophilia (Smlt4090), Vibrio cholerae (VC_1508), Yersinia pestis (YPC_3455). Sequence alignment of proteins from the YajQ family. Consensus residues are listed below the alignment. The sequence logo illustrates the conservation between sequences. Where the height of...”
A0A3G4V556 UPF0234 protein ECB94_00660 from Vibrio mediterranei
37% identity, 97% coverage
For advice on how to use these tools together, see
Interactive tools for functional annotation of bacterial genomes.
The PaperBLAST database links 798,070 different protein sequences to 1,261,478 scientific articles. Searches against EuropePMC were last performed on May 12 2025.
PaperBLAST builds a database of protein sequences that are linked
to scientific articles. These links come from automated text searches
against the articles in EuropePMC
and from manually-curated information from GeneRIF, UniProtKB/Swiss-Prot,
BRENDA,
CAZy (as made available by dbCAN),
BioLiP,
CharProtDB,
MetaCyc,
EcoCyc,
TCDB,
REBASE,
the Fitness Browser,
and a subset of the European Nucleotide Archive with the /experiment tag.
Given this database and a protein sequence query,
PaperBLAST uses protein-protein BLAST
to find similar sequences with E < 0.001.
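The E-value cutoff can be illustrated with a minimal sketch that filters BLAST tabular output (the standard 12-column `-outfmt 6` layout, where column 11 is the E-value). This is not PaperBLAST's own code: the `Hit` type, the `parse_hits` helper, and the two example rows (including their scores) are made up for illustration.

```python
from typing import Iterable, List, NamedTuple

E_VALUE_CUTOFF = 0.001  # threshold stated in the text above


class Hit(NamedTuple):
    subject: str
    percent_identity: float
    evalue: float


def parse_hits(lines: Iterable[str], cutoff: float = E_VALUE_CUTOFF) -> List[Hit]:
    """Keep hits from BLAST tabular output whose E-value is below the cutoff."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        # outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
        #                   qstart qend sstart send evalue bitscore
        subject, pident, evalue = fields[1], float(fields[2]), float(fields[10])
        if evalue < cutoff:
            hits.append(Hit(subject, pident, evalue))
    return hits


# Made-up rows for illustration only; the numbers are not real BLAST results.
example = [
    "Q81GN2\tEFD32_0973\t59.0\t162\t66\t0\t1\t162\t1\t162\t1e-60\t190",
    "Q81GN2\thypothetical_1\t25.0\t40\t30\t0\t5\t44\t10\t49\t0.5\t20.1",
]
kept = parse_hits(example)
```

Only the first row passes the cutoff; the second row, with an E-value of 0.5, is discarded.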
To build the database, we query EuropePMC with locus tags, with RefSeq protein
identifiers, and with UniProt
accessions. We obtain the locus tags from RefSeq or from MicrobesOnline. We use
queries of the form "locus_tag AND genus_name" to try to ensure that
the paper is actually discussing that gene. Because EuropePMC indexes
most recent biomedical papers, even if they are not open access, some
of the links may be to papers that you cannot read or that our
computers cannot read. We query each of these identifiers that
appears in the open access part of EuropePMC, as well as every locus
tag that appears in the 500 most-referenced genomes, so that a gene
may appear in the PaperBLAST results even though none of the papers
that mention it are open access. We also incorporate text-mined links
from EuropePMC that link open access articles to UniProt or RefSeq
identifiers. (This yields some additional links because EuropePMC
uses different heuristics for their text mining than we do.)
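The "locus_tag AND genus_name" query form can be sketched as follows. This is only an illustration under stated assumptions: `build_query` and `search_url` are hypothetical helpers, skipping a leading "Candidatus" is our own guess at how a genus name might be extracted, and the URL targets Europe PMC's public REST search endpoint without claiming that PaperBLAST issues its queries this way.

```python
from urllib.parse import urlencode

# Europe PMC's public REST search endpoint. Which parameters PaperBLAST
# actually sends is not stated in the text, so treat this as an assumption.
EUROPEPMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_query(locus_tag: str, organism: str) -> str:
    """Build a query of the form 'locus_tag AND genus_name'."""
    words = organism.split()
    # Assumption: for provisional names, the genus follows "Candidatus".
    if words and words[0] == "Candidatus" and len(words) > 1:
        words = words[1:]
    genus = words[0]
    return f"{locus_tag} AND {genus}"


def search_url(locus_tag: str, organism: str) -> str:
    """Return a search URL for the query (no network request is made here)."""
    params = {"query": build_query(locus_tag, organism), "format": "json"}
    return f"{EUROPEPMC_SEARCH}?{urlencode(params)}"


query = build_query("XC_3703", "Xanthomonas campestris pv. campestris")
url = search_url("XC_3703", "Xanthomonas campestris pv. campestris")
```

Requiring the genus name alongside the locus tag reduces false matches when a tag-like string appears in an unrelated paper.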
For every article that mentions a locus tag, a RefSeq protein
identifier, or a UniProt accession, we try to select one or two
snippets of text that refer to the protein. If we cannot get access to
the full text, we try to select a snippet from the abstract, but
unfortunately, unique identifiers such as locus tags are rarely
provided in abstracts.
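The text does not say how snippets are chosen, so the following is only one plausible heuristic, not PaperBLAST's method: take a window of words around each whole-token occurrence of the identifier and keep at most two non-overlapping windows. The function name, the window size, and the limit of two are assumptions.

```python
import re
from typing import List


def select_snippets(text: str, identifier: str,
                    window: int = 10, max_snippets: int = 2) -> List[str]:
    """Return up to max_snippets word windows centred on the identifier."""
    words = text.split()
    # Match the identifier only as a whole token, so XC_3703 does not
    # match inside XC_37031.
    pattern = re.compile(
        r"(?<![A-Za-z0-9_])" + re.escape(identifier) + r"(?![A-Za-z0-9_])"
    )
    snippets = []
    last_end = -1  # index of the last word already covered by a snippet
    for i, word in enumerate(words):
        if i <= last_end or not pattern.search(word):
            continue
        start = max(0, i - window)
        end = min(len(words), i + window + 1)
        snippets.append("..." + " ".join(words[start:end]) + "...")
        last_end = end - 1
        if len(snippets) == max_snippets:
            break
    return snippets


sentence = "The YajQ family protein XC_3703 in Xcc binds cyclic di-GMP."
found = select_snippets(sentence, "XC_3703", window=3)
```

An abstract that never names the locus tag yields an empty list here, which mirrors the limitation noted above.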
PaperBLAST also incorporates manually-curated protein functions:
- Proteins from NCBI's RefSeq are included if a
GeneRIF
entry links the gene to an article in
PubMed®.
GeneRIF also provides a short summary of the article's claim about the
protein, which is shown instead of a snippet.
- Proteins from Swiss-Prot (the curated part of UniProt)
are included if the curators
identified experimental evidence for the protein's function (evidence
code ECO:0000269). For these proteins, the fields of the Swiss-Prot entry that
describe the protein's function are shown (with bold headings).
- Proteins from BRENDA,
a curated database of enzymes, are included if they are linked to a paper in PubMed
and their full sequence is known.
- Every protein from the non-redundant subset of
BioLiP,
a database
of ligand-binding sites and catalytic residues in protein structures, is included. Since BioLiP itself
does not include descriptions of the proteins, those are taken from the
Protein Data Bank.
Descriptions from PDB rely on the original submitter of the
structure and cannot be updated by others, so they may be less reliable.
(For SitesBLAST and Sites on a Tree, we use a larger subset of BioLiP so that every
ligand is represented among a group of structures with similar sequences, but for
PaperBLAST, we use the non-redundant set provided by BioLiP.)
- Every protein from EcoCyc, a curated
database of the proteins in Escherichia coli K-12, is included, regardless
of whether they are characterized or not.
- Proteins from the MetaCyc metabolic pathway database
are included if they are linked to a paper in PubMed and their full sequence is known.
- Proteins from the Transporter Classification Database (TCDB)
are included if they have known substrate(s), have reference(s),
and are not described as uncharacterized or putative.
(Some of the references are not visible on the PaperBLAST web site.)
- Every protein from CharProtDB,
a database of experimentally characterized protein annotations, is included.
- Proteins from the CAZy database of carbohydrate-active enzymes
are included if they are associated with an Enzyme Classification number.
Even though CAZy does not provide links from individual protein sequences to papers,
these should all be experimentally-characterized proteins.
- Proteins from the REBASE database
of restriction enzymes are included if they have known specificity.
- Every protein with an evidence-based reannotation (based on mutant phenotypes)
in the Fitness Browser is included.
- Sequence-specific transcription factors (including sigma factors and DNA-binding response regulators)
with experimentally-determined DNA binding sites from the
PRODORIC database of gene regulation in prokaryotes are included.
- Putative transcription factors from RegPrecise are included
if they have manually-curated predictions for their binding sites. These predictions are based on
conserved putative regulatory sites across genomes that contain similar transcription factors,
so PaperBLAST clusters the TFs at 70% identity and retains just one member of each cluster.
- Coding sequence (CDS) features from the
European Nucleotide Archive (ENA)
are included if the /experiment tag is set (implying that there is experimental evidence for the annotation),
the nucleotide entry links to paper(s) in PubMed,
and the nucleotide entry is from the STD data class
(implying that these are targeted annotated sequences, not from shotgun sequencing).
Also, to filter out genes whose transcription or translation was detected, but whose function
was not studied, nucleotide entries or papers with more than 25 such proteins are excluded.
Descriptions from ENA rely on the original submitter of the
sequence and cannot be updated by others, so they may be less reliable.
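The ENA inclusion rules in the last item above can be sketched as a filter. This is an illustration, not PaperBLAST's code: the `CdsFeature` record and its field names are hypothetical, and the handling of over-represented papers (a feature is dropped only when every paper it links to exceeds the limit) is our reading of the rule, which the text does not spell out.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple

MAX_PROTEINS = 25  # limit stated in the text


@dataclass(frozen=True)
class CdsFeature:
    entry_id: str                 # nucleotide entry the CDS belongs to
    protein_id: str
    has_experiment_tag: bool      # /experiment qualifier is set
    pubmed_ids: Tuple[str, ...]   # papers linked from the nucleotide entry
    data_class: str               # ENA data class, e.g. "STD"


def filter_ena_features(features: List[CdsFeature]) -> List[CdsFeature]:
    """Apply the three inclusion criteria, then the more-than-25 exclusion."""
    candidates = [
        f for f in features
        if f.has_experiment_tag and f.pubmed_ids and f.data_class == "STD"
    ]
    per_entry = Counter(f.entry_id for f in candidates)
    per_paper = Counter(p for f in candidates for p in set(f.pubmed_ids))
    kept = []
    for f in candidates:
        if per_entry[f.entry_id] > MAX_PROTEINS:
            continue  # entry annotates too many proteins
        # Assumption: drop the feature only if all of its papers are over the limit.
        if all(per_paper[p] > MAX_PROTEINS for p in f.pubmed_ids):
            continue
        kept.append(f)
    return kept
```

The count-based exclusion targets large-scale studies that detect expression of many genes without studying the function of any one of them.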
Except for GeneRIF and ENA,
the curated entries include a short curated
description of the protein's function.
For entries from BioLiP, the protein's function may not be known beyond binding to the ligand.
Many of these entries also link to articles in PubMed.
For more information see the
PaperBLAST paper (mSystems 2017)
or the code.
You can download PaperBLAST's database here.
Changes to PaperBLAST since the paper was written:
- November 2023: incorporated PRODORIC and RegPrecise. Many PRODORIC entries were not linked to a protein sequence (no UniProt identifier), so we added this information.
- February 2023: BioLiP changed their download format. PaperBLAST now includes their non-redundant subset. SitesBLAST and Sites on a Tree use a larger non-redundant subset that ensures that every ligand is represented within each cluster. This should ensure that every binding site is represented.
- June 2022: incorporated some coding sequences from ENA with the /experiment tag.
- March 2022: incorporated BioLiP.
- April 2020: incorporated TCDB.
- April 2019: EuropePMC now returns table entries in their search results. This has expanded PaperBLAST's database, but most of the new entries are of low relevance, and the resulting snippets are often just lists of locus tags with annotations.
- February 2018: the alignment page reports the conservation of the hit's functional sites (if available from Swiss-Prot or UniProt).
- January 2018: incorporated BRENDA.
- December 2017: incorporated MetaCyc, CharProtDB, CAZy, REBASE, and the reannotations from the Fitness Browser.
- September 2017: EuropePMC no longer returns some table entries in their search results. This has shrunk PaperBLAST's database, but has also reduced the number of low-relevance hits.
Many of these changes are described in Interactive tools for functional annotation of bacterial genomes.
PaperBLAST cannot provide snippets for many of the papers that are
published in non-open-access journals. This limitation applies even if
the paper is marked as "free" on the publisher's web site and is
available in PubMed Central or EuropePMC. If a journal that you publish
in is marked as "secret," please consider publishing elsewhere.
Many important articles are missing from PaperBLAST, either because
the article's full text is not in EuropePMC (as for many older
articles), or because the paper does not mention a protein identifier such as a locus tag, or because of PaperBLAST's heuristics. If you notice an
article that characterizes a protein's function but is missing from
PaperBLAST, please notify the curators at UniProt
or add an entry to GeneRIF.
Entries in either of these databases will eventually be incorporated
into PaperBLAST. Note that to add an entry to UniProt, you will need
to find the UniProt identifier for the protein. If the protein is not
already in UniProt, you can ask them to create an entry. To add an
entry to GeneRIF, you will need an NCBI Gene identifier, but
unfortunately many prokaryotic proteins in RefSeq do not have
corresponding Gene identifiers.
References
PaperBLAST: Text-mining papers for information about homologs.
M. N. Price and A. P. Arkin (2017). mSystems, 10.1128/mSystems.00039-17.
Europe PMC in 2017.
M. Levchenko et al (2017). Nucleic Acids Research, 10.1093/nar/gkx1005.
Gene indexing: characterization and analysis of NLM's GeneRIFs.
J. A. Mitchell et al (2003). AMIA Annu Symp Proc 2003:460-464.
UniProt: the universal protein knowledgebase.
The UniProt Consortium (2016). Nucleic Acids Research, 10.1093/nar/gkw1099.
BRENDA in 2017: new perspectives and new tools in BRENDA.
S. Placzek et al (2017). Nucleic Acids Research, 10.1093/nar/gkw952.
The EcoCyc database: reflecting new knowledge about Escherichia coli K-12.
I. M. Keseler et al (2016). Nucleic Acids Research, 10.1093/nar/gkw1003.
The MetaCyc database of metabolic pathways and enzymes.
R. Caspi et al (2018). Nucleic Acids Research, 10.1093/nar/gkx935.
CharProtDB: a database of experimentally characterized protein annotations.
R. Madupu et al (2012). Nucleic Acids Research, 10.1093/nar/gkr1133.
The carbohydrate-active enzymes database (CAZy) in 2013.
V. Lombard et al (2014). Nucleic Acids Research, 10.1093/nar/gkt1178.
The Transporter Classification Database (TCDB): recent advances.
M. H. Saier, Jr. et al (2016). Nucleic Acids Research, 10.1093/nar/gkv1103.
REBASE - a database for DNA restriction and modification: enzymes, genes and genomes.
R. J. Roberts et al (2015). Nucleic Acids Research, 10.1093/nar/gku1046.
Deep annotation of protein function across diverse bacteria from mutant phenotypes.
M. N. Price et al (2016). bioRxiv, 10.1101/072470.
by Morgan Price,
Arkin group
Lawrence Berkeley National Laboratory