PaperBLAST
PaperBLAST Hits for VIMSS757130 Hypothetical protein (289 a.a., MYSNHSIDQA...)
>VIMSS757130 Hypothetical protein
MYSNHSIDQAAPYSGCSGIGLRLEHIDDILKEQPTVDYFEVLADNYMKQSGVQFKQLLKI
AEFYPVSLHSVGLSIATSSEPDYQYLRQIKDLAHCLNSKLISDHLCWTHANQFFTHELIP
FPYTEETLSFIIEKTNRVQEYLNQPIMYENVSRYVTYRQNTLSEAEFLNELSTATGCGIL
LDINNLYVNWYNHGDDPDKYLHSMNTKNVWQMHLGGFSKQEGYLLDSHSDKVYQDVWQLY
EKAQNLFINTPTVIEWDNDLPDFSLLYEEMCKAKTIIKCVAERSMNHYA
Found 51 similar proteins in the literature:
lpg0667 Hypothetical protein from Legionella pneumophila subsp. pneumophila str. Philadelphia 1
100% identity, 100% coverage
LPP_RS03635 DUF692 domain-containing protein from Legionella pneumophila str. Paris
97% identity, 100% coverage
LPL_RS03550 DUF692 domain-containing protein from Legionella pneumophila str. Lens
95% identity, 100% coverage
LPA_RS03760, LPC_RS03800 DUF692 domain-containing protein from Legionella pneumophila 2300/99 Alcoy
96% identity, 100% coverage
- The Legionella pneumophila GIG operon responds to gold and copper in planktonic and biofilm cultures
Jwanoswki, PloS one 2017 - “...813482 (-) LPA_RS03775 816859 (+) NC_014125.1 LPA_RS03755 814258 (-) LPA_RS11585 2646878 (-) LPA_RS03795 822332 (-) LPA_RS03760 815120 (-) LPA_RS11590 2647738 (-) LPA_RS03800 823152 (-) LPA_RS03765 815380 (-) LPA_RS11595 2648134 (-) LPA_RS03785 819919 (+) We surveyed the genomes of four additional L . pneumophila strains for the...”
- “...827443 (-) LPC_RS03815 830819 (+) NC_009494.1 LPC_RS03795 828003 (-) LPC_RS11785 2673120 (-) LPC_RS03835 836292 (-) LPC_RS03800 829080 (-) LPC_RS11790 2673980 (-) LPC_RS03840 837112 (-) LPC_RS03805 829340 (-) LPC_RS11795 2674376 (-) LPC_RS03825 833879 (+) Lens LPL_RS03540 790101 (-) LPL_RS03565 793477 (+) NC_006369.1 LPL_RS03545 790877 (-) LPL_RS11000 2494966...”
lpg2254 Hypothetical protein from Legionella pneumophila subsp. pneumophila str. Philadelphia 1
40% identity, 92% coverage
LPP_RS11160 DUF692 domain-containing protein from Legionella pneumophila str. Paris
40% identity, 93% coverage
LPA_RS11590, LPC_RS11790 DUF692 domain-containing protein from Legionella pneumophila str. Corby
39% identity, 96% coverage
- The Legionella pneumophila GIG operon responds to gold and copper in planktonic and biofilm cultures
Jwanoswki, PloS one 2017 - “...816859 (+) NC_014125.1 LPA_RS03755 814258 (-) LPA_RS11585 2646878 (-) LPA_RS03795 822332 (-) LPA_RS03760 815120 (-) LPA_RS11590 2647738 (-) LPA_RS03800 823152 (-) LPA_RS03765 815380 (-) LPA_RS11595 2648134 (-) LPA_RS03785 819919 (+) We surveyed the genomes of four additional L . pneumophila strains for the presence of the...”
- “...830819 (+) NC_009494.1 LPC_RS03795 828003 (-) LPC_RS11785 2673120 (-) LPC_RS03835 836292 (-) LPC_RS03800 829080 (-) LPC_RS11790 2673980 (-) LPC_RS03840 837112 (-) LPC_RS03805 829340 (-) LPC_RS11795 2674376 (-) LPC_RS03825 833879 (+) Lens LPL_RS03540 790101 (-) LPL_RS03565 793477 (+) NC_006369.1 LPL_RS03545 790877 (-) LPL_RS11000 2494966 (-) LPL_RS03585 798950...”
LPL_RS11005 DUF692 domain-containing protein from Legionella pneumophila str. Lens
39% identity, 95% coverage
HWN72_21935 DUF692 domain-containing protein from Novosphingobium sp. HR1a
37% identity, 92% coverage
- LuxR402 of Novosphingobium sp. HR1a regulates the correct configuration of cell envelopes
Segura, Frontiers in microbiology 2023 - “...luxR family (l uxR402 ) HWN72_23475 2.38 TonB-dependent receptor HWN72_17540 2.37 LLM class flavin-dependent oxidoreductase HWN72_21935 2.17 DUF692 domain-containing protein HWN72_18160 2.10 Nuclear transport factor 2 family protein HWN72_21235 2.04 tRNA-Ser HWN72_18065 2.01 Efflux RND transporter periplasmic adaptor subunit Gene HWN72_14810 encodes a hypothetical protein but...”
CC2906 conserved hypothetical protein from Caulobacter crescentus CB15
CCNA_03000 hypothetical protein from Caulobacter crescentus NA1000
40% identity, 84% coverage
- Extracytoplasmic function (ECF) sigma factor σF is involved in Caulobacter crescentus response to heavy metal stress
Kohler, BMC microbiology 2012 - “...a transcriptional response to environmental stresses still needs to be characterized. The observation that genes CC2906, CC3255 and CC3257, previously found to be dependent on σF [ 16 ], are induced following C. crescentus exposure to chromate, dichromate and cadmium [ 12 ] suggested to us...”
- “...only six genes down-regulated in sigF mutant cells relative to the parental strain (CC2748, CC2905, CC2906, CC3255, CC3256 and CC3257) (Table 1 ). Interestingly, close inspection of probes corresponding to the upstream region from CC2906 and CC3255 suggested that these regions are also down-regulated in sigF...”
- A Caulobacter crescentus extracytoplasmic function sigma factor mediating the response to oxidative stress in stationary phase
Alvarez-Martinez, Journal of bacteriology 2006 - “...University of California, Berkeley CC1039 CC1775 CC1777 CC1839 CC2906 CC3255 CC3257 CC3572 RTPCRb Gene product a 1843 1844 ALVAREZ-MARTINEZ ET AL. TABLE 4....”
- Environmental Conditions Modulate the Transcriptomic Response of Both Caulobacter crescentus Morphotypes to Cu Stress
Maertens, Microorganisms 2021 - “...determinants that could mask its role. The SigF-regulated CCNA_02999-03001 cluster codes for uncharacterized proteins, but CCNA_03000 and CCNA_03001 share 44.5% and 38.5% protein similarity with CCNA_03364 and CCNA_003363, respectively. Finally, the SigF-regulated CCNA_02834-02833 cluster is homologous to the MrsPQ system, which is involved in the protection...”
- Extracytoplasmic function (ECF) sigma factor σF is involved in Caulobacter crescentus response to heavy metal stress
Kohler, BMC microbiology 2012 - “...chromosome of NA1000, an open reading frame (CCNA_03001) was proposed to be located between genes CCNA_03000 (corresponding to CC2906) and CCNA_03002 (corresponding to CC2908). Nevertheless, CCNA_03001 appears to be co-transcribed with CCNA_03000 and CCNA_03002. In addition, we could observe co-occurrence of CCNA_03001 with other σF-dependent...”
- “...nucleotide sequence between CC2906 and CC2908 in CB15 strain is identical to the region between CCNA_03000 and CCNA_03002 of NA1000 strain, we conclude that CC2907 was incorrectly annotated in the genome of CB15 strain and this gene is the first one of the operon CC2907-CC2906-CC2905 (Figure...”
Y3000_CAUVN / A0A0H3CC29 UPF0276 protein CCNA_03000 from Caulobacter vibrioides (strain NA1000 / CB15N) (Caulobacter crescentus) (see paper)
40% identity, 84% coverage
- disruption phenotype: Not essential even when cells grown in the presence of its inducers dichromate or cadmium.
lpg2107 Hypothetical protein from Legionella pneumophila subsp. pneumophila str. Philadelphia 1
37% identity, 89% coverage
- The Legionella pneumophila GIG operon responds to gold and copper in planktonic and biofilm cultures
Jwanoswki, PloS one 2017 - “...N-terminal DUF2063 domain with putative role in DNA binding and transcriptional regulation Rmet_4684 (ABF11546.1) 278 lpg2107 (YP_096120.1) 284 7.00E-72 98 40 DUF692 family of uncharacterized bacterial proteins; possibly involved in methanobactin synthesis Rmet_4685 (ABF11547.1) 94 lpg2108 (YP_096121.1) 97 9.00E-06 81 33 DUF2282 family of putative integral...”
- “...to function in DNA binding and transcriptional regulation [ 27 ]. The third GIG gene, lpg2107 , encodes a 284-aa protein assigned to the DUF692 family of uncharacterized bacterial proteins. Other members of this family are key enzymes in the biosynthesis of methanobactins, secreted copper-binding and...”
SMb20529 CONSERVED HYPOTHETICAL PROTEIN from Sinorhizobium meliloti 1021
37% identity, 87% coverage
ELZ14_06450 DUF692 domain-containing protein from Pseudomonas brassicacearum
36% identity, 93% coverage
BPSL1691 conserved hypothetical protein from Burkholderia pseudomallei K96243
36% identity, 86% coverage
Y3364_CAUVN / A0A0H3CEP9 UPF0276 protein CCNA_03364 from Caulobacter vibrioides (strain NA1000 / CB15N) (Caulobacter crescentus) (see paper)
CC3255 conserved hypothetical protein from Caulobacter crescentus CB15
CCNA_03364 hypothetical protein from Caulobacter crescentus NA1000
34% identity, 90% coverage
- disruption phenotype: Not essential even when cells grown in the presence of its inducers dichromate or cadmium.
- An Extracytoplasmic Function Sigma/Anti-Sigma Factor System Regulates Hypochlorous Acid Resistance and Impacts Expression of the Type IV Secretion System in Brucella melitensis
Li, Journal of bacteriology 2021 - “...ECF16 prototype SigF in C. crescentus is expressed at basal levels and regulates target gene CC3255 expression, even in the absence of an exogenous inducer ( 37 ). Maintaining basal level expression of an ECF can be advantageous by ensuring bacteria a rapid response upon exposure...”
- Extracytoplasmic function (ECF) sigma factor σF is involved in Caulobacter crescentus response to heavy metal stress
Kohler, BMC microbiology 2012 - “...transcriptional response to environmental stresses still needs to be characterized. The observation that genes CC2906, CC3255 and CC3257, previously found to be dependent on σF [ 16 ], are induced following C. crescentus exposure to chromate, dichromate and cadmium [ 12 ] suggested to us that...”
- “...σF in the C. crescentus response to chromium and cadmium stresses, we monitored expression of CC3255, previously identified as a σF-dependent gene, as well as CC3252, which is co-transcribed with sigF (CC3253), by quantitative RT-PCR. This analysis showed that CC3255 is significantly induced in parental...”
- A Caulobacter crescentus extracytoplasmic function sigma factor mediating the response to oxidative stress in stationary phase
Alvarez-Martinez, Journal of bacteriology 2006 - “...sigF (reverse) CC1777 (sodA) (forward) CC1777 (reverse) CC3255 (forward) CC3255 (reverse) CC1039 (msrA) (forward) CC1039 (reverse) These primers were also used...”
- “...were also performed to confirm differential expression of CC3255, which encodes a conserved hypothetical protein, between the two strains. This is the first...”
- Whole-genome transcriptional analysis of heavy metal stresses in Caulobacter crescentus
Hu, Journal of bacteriology 2005 - “...it is a dose response. Four genes (CC3254, CC3255, CC3256, and CC3257) consecutively located on the chromosome were commonly upregulated under cadmium,...”
- Environmental Conditions Modulate the Transcriptomic Response of Both Caulobacter crescentus Morphotypes to Cu Stress
Maertens, Microorganisms 2021 - “...codes for uncharacterized proteins, but CCNA_03000 and CCNA_03001 share 44.5% and 38.5% protein similarity with CCNA_03364 and CCNA_003363, respectively. Finally, the SigF-regulated CCNA_02834-02833 cluster is homologous to the MrsPQ system, which is involved in the protection of proteins from oxidative stress by repairing oxidized periplasmic proteins...”
PGA1_c21010 component of stress sensing system, with PGA1_c21020 (DUF2063) from Phaeobacter inhibens DSM 17395
35% identity, 89% coverage
- mutant phenotype: PFam PF05114.9 (DUF692). Conserved cofitness and pleiotropic phenotypes
PP0992 conserved hypothetical protein from Pseudomonas putida KT2440
38% identity, 91% coverage
MIM_c31300 DUF692 domain-containing protein from Advenella mimigardefordensis DPN7
35% identity, 88% coverage
- Proteomic analysis of organic sulfur compound utilisation in Advenella mimigardefordensis strain DPN7T
Meinert, PloS one 2017 - “...locus tags: MIM_c31270, putative membrane protein; MIM_c31280, putative membrane protein, DoxX family; MIM_c31290, hypothetical protein; MIM_c31300, hypothetical protein; MIM_c31310, hypothetical protein; MIM_c31320, methylisocitrate lyase; MIM_c31330, putative AcnD-accessory protein PrpF; MIM_c31340, Fe/S-dependent 2-methylisocitrate dehydratase; MIM_c31350, 2-methylcitrate synthase; MIM_c31360, transcriptional regulator, XRE family ; MIM_c31370, transcriptional regulator, LysR...”
LPC_RS03840 DUF692 domain-containing protein from Legionella pneumophila str. Corby
35% identity, 90% coverage
LPA_RS03800 DUF692 domain-containing protein from Legionella pneumophila 2300/99 Alcoy
35% identity, 90% coverage
LPL_RS03590 DUF692 domain-containing protein from Legionella pneumophila str. Lens
35% identity, 90% coverage
LPP_RS03675 DUF692 domain-containing protein from Legionella pneumophila str. Paris
35% identity, 90% coverage
lpg0676 Hypothetical protein from Legionella pneumophila subsp. pneumophila str. Philadelphia 1
36% identity, 89% coverage
- The Legionella pneumophila GIG operon responds to gold and copper in planktonic and biofilm cultures
Jwanoswki, PloS one 2017 - “...GIG genes, again arranged in identical order and strand orientation. The H4 operon ( lpg0671- lpg0676 ) contains homologs of all four GIG genes, but in a different order and on different strands. The first H4 gene ( lpg0671 ) shares a DoxX domain with lpg2105...”
- “...lpg2253 2557140 (-) lpg0675 725484 (-) lpg2107 2354586 (-) lpg0667 718248 (-) lpg2254 2558033 (-) lpg0676 726280 (-) lpg2108 2354894 (-) lpg0669 718508 (-) lpg2255 2558333 (-) lpg0673 723047 (+) Paris LPP_RS03625 801134 (-) LPP_RS03650 804510 (+) NC_006368.2 LPP_RS03630 801910 (-) LPP_RS11155 2543886 (-) LPP_RS03670 809982...”
BP2925 conserved hypothetical protein from Bordetella pertussis Tohama I
33% identity, 90% coverage
- Differentially expressed genes in Bordetella pertussis strains belonging to a lineage which recently spread globally
de, PloS one 2014 - “...Un 1.9 3.0 * 1.3 2.1 BP2486 exported protein Cm 4.4 * 4.0 * 1.6 BP2925 conserved hypothetical protein C 1.2 4.0 * 1.6 1.9 BP2926 conserved hypothetical protein Un 1.5 3.4 * 1.5 1.7 BP3095 modB molybdate-binding periplasmic protein precursor P 1.1 3.4 * 1.4...”
- Genome-wide gene expression analysis of Bordetella pertussis isolates associated with a resurgence in pertussis: elucidation of factors involved in the increased fitness of epidemic strains
King, PloS one 2013 - “...0,01021466 0,0421139 BP2315 autotransporter vag8 1,701057776 0,000587588 0,0066942 BP2924 putative exported protein 1,23341571 0,001204634 0,0108068 BP2925 conserved hypothetical protein 1,26736157 1,52744E05 0,0006696 BP2927 putative integral membrane protein 1,393854682 5,92749E07 8,467E05 BP3783 pertussis toxin subunit 1 precursor ptxA 1,173210514 0,008153995 0,0362248 BP3785 pertussis toxin subunit 4 precursor...”
Q1LE80 UPF0276 protein Rmet_4684 from Cupriavidus metallidurans (strain ATCC 43123 / DSM 2839 / NBRC 102507 / CH34)
Rmet_4684 DUF692 domain-containing protein from Cupriavidus metallidurans CH34
34% identity, 91% coverage
HMPREF0012_00560 DUF692 domain-containing protein from Acinetobacter calcoaceticus RUH2202
32% identity, 90% coverage
PP_2398 conserved hypothetical protein from Pseudomonas putida KT2440
31% identity, 91% coverage
NTHI1443 hypothetical protein from Haemophilus influenzae 86-028NP
HI1600 H. influenzae predicted coding region HI1600 from Haemophilus influenzae Rd KW20
32% identity, 80% coverage
- A multi-iron enzyme installs copper-binding oxazolone/thioamide pairs on a nontypeable Haemophilus influenzae virulence factor
Manley, Proceedings of the National Academy of Sciences of the United States of America 2024 - “...other genes of unknown function ( Fig. 1 A ). Notably, downstream of NTHI1441 are NTHI1443 and NTHI1444 , which are predicted to encode an MNIO and an RRE-containing DUF2063-like protein, respectively ( SI Appendix , Fig. S1 ). This observation led us to hypothesize that...”
- “...Fig. 1 B , R1 to R3). These cysteines are potential sites of PTM by NTHI1443. Fig. 1. The operon structure and amino acid sequence of NTHI1441. ( A ) The NTHI1441 operon is composed of four genes: NTHI1440 , predicted to encode a DoxX protein;...”
- Discovery and Contribution of Nontypeable Haemophilus influenzae NTHI1441 to Human Respiratory Epithelial Cell Invasion
Ahearn, Infection and immunity 2019 - “...four gene operon 235 including the ORFs NTHI1440, NTHI1443, and NTHI1444 (Figure 3A) (34). Endpoint 236 reverse transcriptase-PCR with RNA isolated from strain...”
- “...transcript that includes the ORFs NTHI1440, NTHI1441, 247 NTHI1443, and NTHI1444. The nucleotide sequences of the NTHI1441 operon is 248 present and conserved...”
- Structure-based function analysis of putative conserved proteins with isomerase activity from Haemophilus influenzae
Shahbaaz, 3 Biotech 2015 - “...HP HI1388.1 11. NC_000907.1 950784 NP_439587.1 Q57152 HP HI1436 12. NC_000907.1 950455 NP_439742.1 P44268 HP HI1600 13. NC_000907.1 950796 NP_439799.1 P52606 HP HI1657 Materials and methods Sequence retrieval Extensive analysis of H. influenzae genome shows 1,657 proteins which are encoded by its genome ( http://www.ncbi.nlm.nih.gov/genome/?term=haemophilus+influenzae )....”
- Functional annotation of conserved hypothetical proteins from Haemophilus influenzae Rd KW20
Shahbaaz, PloS one 2013 - “...HP HI1581 950440 P44262 Glyoxalase/Bleomycin resistance protein/Dihydroxybiphenyldioxygenase 268. HP HI1598 950454 P45267 adenylatecyclase 269. HP HI1600 950455 P44268 Xylose isomerase-like, TIM barrel domain 270. HP HI1602 950457 P44270 TQO small subunit DoxD family protein (subunit of the terminal quinol oxidase) 271. HP HI1605 950458 P44272 SH3...”
P44268 UPF0276 protein HI_1600 from Haemophilus influenzae (strain ATCC 51907 / DSM 11121 / KW20 / Rd)
32% identity, 83% coverage
- Structure-based function analysis of putative conserved proteins with isomerase activity from Haemophilus influenzae
Shahbaaz, 3 Biotech 2015 - “...NP_439541.1 O86237 HP HI1388.1 11. NC_000907.1 950784 NP_439587.1 Q57152 HP HI1436 12. NC_000907.1 950455 NP_439742.1 P44268 HP HI1600 13. NC_000907.1 950796 NP_439799.1 P52606 HP HI1657 Materials and methods Sequence retrieval Extensive analysis of H. influenzae genome shows 1,657 proteins which are encoded by its genome (...”
- “...tools. The predicted models of P44506, P44641, P46494, P44827, Q57151, P44094, P45104, P71373, P44160, Q57152, P44268, P52606 show significant validation score on SAVES server. The outcomes of structural analysis for each protein are described here, separately. HP P44506 HP P44506 is localized in the cytoplasm and...”
- Functional annotation of conserved hypothetical proteins from Haemophilus influenzae Rd KW20
Shahbaaz, PloS one 2013 - “...950440 P44262 Glyoxalase/Bleomycin resistance protein/Dihydroxybiphenyldioxygenase 268. HP HI1598 950454 P45267 adenylatecyclase 269. HP HI1600 950455 P44268 Xylose isomerase-like, TIM barrel domain 270. HP HI1602 950457 P44270 TQO small subunit DoxD family protein (subunit of the terminal quinol oxidase) 271. HP HI1605 950458 P44272 SH3 domain-containing protein...”
B158DRAFT_1333 component of chlorite stress sensing system with B158DRAFT_1334 (DUF2063) from Kangiella aquimarina DSM 16071
30% identity, 90% coverage
- mutant phenotype: PFam PF05114.9 (DUF692). conserved specific phenotype and conserved cofitness
NMH_2154 DUF692 domain-containing protein from Neisseria meningitidis H44/76
32% identity, 89% coverage
- Deep sequencing whole transcriptome exploration of the σE regulon in Neisseria meningitidis
Huis, PloS one 2011 - “...doxX family protein, E operon NMH_02153 5042 60516 12.0 <0.0001 Hypothetical protein in E operon NMH_2154 4807 63164 13.1 <0.0001 Hypothetical protein in E operon NMH_2155 3071 40218 13.1 <0.0001 Hypothetical protein in E operon NMH_2156 E 1587 9997 6.3 <0.0001 RNA polymerase E , E...”
- “...mseR Fold change p -value Description NMH_0763 nqrE 30 121 4.0 0.020 NADH: ubiquinone oxidoreductase NMH_2154 157 19 0.1 <0.001 Hypothetical protein in E operon NMH_2156 E 26 141 5.4 0.001 RNA polymerase E , E operon NMH_2475 17 461 27.1 <0.001 mechanosensitive ion channel family...”
NMB2142 hypothetical protein from Neisseria meningitidis MC58
32% identity, 89% coverage
NGO1946 hypothetical protein from Neisseria gonorrhoeae FA 1090
32% identity, 89% coverage
- Dual species transcriptomics reveals conserved metabolic and immunologic processes in interactions between human neutrophils and Neisseria gonorrhoeae
Potter, PLoS pathogens 2024 - “...factor Ecf (NGO1944). ecf , msrAB , and members of the ecf operon (NGO1944, NGO1945, NGO1946, and NGO1948) were upregulated in Gc exposed to PMNs in both Gc strains ( Figs 2 and S6 ) [ 34 ]. We speculated that an ecf mutant may have...”
- The structure of the first representative of Pfam family PF09836 reveals a two-domain organization and suggests involvement in transcriptional regulation
Das, Acta crystallographica. Section F, Structural biology and crystallization communications 2010 - “...NGO1943 (unknown function), NGO1944 (Pfam PF04542, domain 2 of 70 ECF RNA polymerase sigma factors), NGO1946 (unknown function DUF692; PF05114), NGO1947 (putative periplasmic protein of unknown function; Gunesekere et al. , 2006 ), NGO1948 (DoxX; PF07681, similar to DoxD, the small subunit of the terminal quinol...”
- “...function DUF452; PF04301). Functional studies with NGO1944 based on DNA microarrays suggest that NGO1944, NGO1945, NGO1946, NGO1947 and NGO1948 may be cotranscribed and involved in the regulation of msrAB , a methionine sulfoxide reductase (Gunesekere et al. , 2006 ). The genomic context of other DUF2063...”
- Identification of a novel anti-sigmaE factor in Neisseria meningitidis
Hopman, BMC microbiology 2010 - “...NGO1943) of N. gonorrhoeae is identical to that of meningococci (NMB2140-NMB2145), and four genes, NGO1946, NGO1947, NGO1948 belonging to the rpoE operon, and NGO2059, encoding MsrA/MrsB, were also upregulated, along with σE (NGO1944) itself, in a gonococcal strain overexpressing rpoE [ 24 ]. We demonstrated...”
- Ecf, an alternative sigma factor from Neisseria gonorrhoeae, controls expression of msrAB, which encodes methionine sulfoxide reductase
Gunesekere, Journal of bacteriology 2006 - “...RACE Northern blotting; amplification of msrAB probe qRT-PCR of NGO1946 qRT-PCR of NGO1948, 5 RACE qRT-PCR of NGO1946 qRT-PCR of msrAB qRT-PCR of msrAB qRT-PCR...”
- “...result of overexpression of Ecf ORF IDa NGO1944d NGO1946 NGO1947 NGO1948 NGO2059 Fold changeb Gene name ecf msrAB Proposed function Microarrayc qRT-PCR 2.2 1.8...”
NMA0228 hypothetical protein NMA0228 from Neisseria meningitidis Z2491
32% identity, 89% coverage
LF41_2296 DUF692 domain-containing protein from Lysobacter dokdonensis DS-58
29% identity, 91% coverage
- Genome sequence of Lysobacter dokdonensis DS-58(T), a gliding bacterium isolated from soil in Dokdo, Korea
Kwak, Standards in genomic sciences 2015 - “...permease (LF41_2293); 7, ribonuclease T (LF41_2294); 8, hypothetical protein (LF41_2295); 9, DUF692 domain containing protein (LF41_2296); 10, hypothetical protein (LF41_2297); 11, phosphate transport system regulatory protein (LF41_2298); 12, phosphate transport ATP-binding protein (LF41_2299); 13, phosphate transport system permease protein (LF41_2300); 14, phosphate transport system permease protein...”
Sama_1305 component of chlorite stress sensing system with Sama_1304 from Shewanella amazonensis SB2B
29% identity, 90% coverage
- mutant phenotype: PFam PF05114.9 (DUF692). conserved specific phenotype, and Sama_1304 seems to replace DUF2063
HS_1138 hypothetical protein from Haemophilus somnus 129PT
30% identity, 71% coverage
PA4106 hypothetical protein from Pseudomonas aeruginosa PAO1
Q9HWS2 UPF0276 protein PA4106 from Pseudomonas aeruginosa (strain ATCC 15692 / DSM 22644 / CIP 104116 / JCM 14847 / LMG 12228 / 1C / PRS 101 / PAO1)
30% identity, 88% coverage
- Protein interactions in human pathogens revealed through deep learning
Humphreys, Nature microbiology 2024 - “...and protein of unknown function YcaR (P0AAZ7) from E. coli . c , Uncharacterized protein PA4106 ( Q9HWS2 ) and a putative transcriptional factor PA4105 ( Q9HWS3 ) from P. aeruginosa . d , lpg2881 and lpg0371 from L. pneumophila , a pair that is tested...”
- Multimodal cadmium resistance and its regulatory networking in Pseudomonas aeruginosa strain CD3
Chatterjee, Scientific reports 2024 - “...of four protein clusters was created using the STRING database. BfmR, BfmS, PA4103, PA4104, PA4105, PA4106, and PA4107 (EFhP) proteins formed the first yellow cluster, which is involved in biofilm development and maturation. CopR, CopS, IrlR, PtrA, ParR, PcoA, PA1437, PA1438, PA4886, PA2807, PA2523 (CzcR), and...”
- Genome-Wide Mapping Reveals Complex Regulatory Activities of BfmR in Pseudomonas aeruginosa
Fan, Microorganisms 2021 - “...only a handful of BfmR targets (e.g., bfmRS operon, pa4103 - pa4104 operon, pa4107 - pa4106 - pa4105 operon, rhlR , and phdA ) have been identified so far [ 11 , 18 ]. In this study, we used high-throughput methods to identify BfmR targets. We...”
- “...BfmR regulon and includes known target genes such as pa4103 , pa4104 , pa4105 , pa4106 , and pa4107 (Data set 2). Functional analyses indicate that those BfmR-targeted genes are involved in different biological processes including oxidative phosphorylation (e.g., cyoABCDE operon), metabolism (e.g., phhABC operons), antibiotic...”
- Within-Host Evolution of the Dutch High-Prevalent Pseudomonas aeruginosa Clone ST406 during Chronic Colonization of a Patient with Cystic Fibrosis
van, PloS one 2016 - “...is a hypothetical gene encoding a polypeptide with similarities to the DoxX superfamily, PA4105 and PA4106 are hypothetical genes encoding DUF 2063 and DUF 692 superfamily protein, respectively. EfhP is the gene originally named PA4107. Arrows indicate direction of transcription. Besides differences in colony morphology and...”
- A Pseudomonas aeruginosa EF-hand protein, EfhP (PA4107), modulates stress responses and virulence at high calcium concentration
Sarkisova, PloS one 2014 - “...in P. aeruginosa PAO1, suggests that it is operonic with two genes encoding hypothetical proteins PA4106 and PA4105. These proteins contain the DUF692 and DUF2063 domains, respectively, with PA4106 having structural similarity to sugar isomerases, and PA4105 having structural similarity to a predicted transcriptional regulator from...”
- A novel signal transduction pathway that modulates rhl quorum sensing and bacterial virulence in Pseudomonas aeruginosa
Cao, PLoS pathogens 2014 - “...Interestingly, these 7 genes, including PA4100 , bfmR , PA4103 , PA4104 , PA4105 , PA4106 , and PA4107 , are located at or near the bfmRS ( PA4101 - PA4102 ) loci ( Figure S2A , Table S2 in Text S1 ). These microarray-based expression...”
- “...and PA4104 are organized into PA4103 operon ( PA4103 - PA4104 ) while PA4105 , PA4106 and PA4107 are organized into PA4107 operon ( PA4107 - PA4106 - PA4105 ) ( www.pseudomonas.com ). Among these genes, PA4100 encodes a dehydrogenase of unknown function, and bfmR encodes...”
- Screening for quorum-sensing inhibitors (QSI) by use of a novel genetic system, the QSI selector
Rasmussen, Journal of bacteriology 2005 - “...PA3889 PA3890 PA3898 PA3920 PA3923 PA3957 PA4078 PA4086 PA4106 PA4129 PA4130 PA4133 PA4134 PA4138 PA4141 PA4142 PA4143 PA4171 PA4172 PA4175 PA4199 PA4204 PA4205...”
- Proteome-wide identification of druggable targets and inhibitors for multidrug-resistant Pseudomonas aeruginosa using an integrative subtractive proteomics and virtual screening approach
Vemula, Heliyon 2025 - “...4113 Q9I1P8 868 Q9I2U2 1950 Q9I352 3032 Q9HWR9 4114 Q9I1Q0 869 Q9I2U8 1951 Q9I357 3033 Q9HWS2 4115 Q9I1Q1 870 Q9I2W4 1952 Q9I363 3034 Q9HWS3 4116 Q9I1Q2 871 Q9I2W7 1953 Q9I387 3035 Q9HWS4 4117 Q9I1Q3 872 Q9I2X0 1954 Q9I388 3036 Q9HWS5 4118 Q9I1Q4 873 Q9I315 1955 Q9I389...”
- Protein interactions in human pathogens revealed through deep learning
Humphreys, Nature microbiology 2024 - “...of unknown function YcaR (P0AAZ7) from E. coli . c , Uncharacterized protein PA4106 ( Q9HWS2 ) and a putative transcriptional factor PA4105 ( Q9HWS3 ) from P. aeruginosa . d , lpg2881 and lpg0371 from L. pneumophila , a pair that is tested positive by...”
- “...which no function has been assigned 33 , 34 . P. aeruginosa PA4105PA4106 ( Q9HWS3 Q9HWS2 ) are uncharacterized proteins with no clear homologues of known functions based on primary sequence comparisons, but a FoldSeek v.8 search 35 revealed structural similarity between these proteins and TglI...”
PA14_21580 hypothetical protein from Pseudomonas aeruginosa UCBPP-PA14
28% identity, 93% coverage
Q9HYW0 UPF0276 protein PA3283 from Pseudomonas aeruginosa (strain ATCC 15692 / DSM 22644 / CIP 104116 / JCM 14847 / LMG 12228 / 1C / PRS 101 / PAO1)
PA3283 hypothetical protein from Pseudomonas aeruginosa PAO1
28% identity, 93% coverage
- Proteome-wide identification of druggable targets and inhibitors for multidrug-resistant Pseudomonas aeruginosa using an integrative subtractive proteomics and virtual screening approach
Vemula, Heliyon 2025 - “...1312 Q9HUL2 2394 Q9HTD3 3476 Q9HYV9 4558 Q9I3J4 231 Q59647 1313 Q9HUL6 2395 Q9HTD4 3477 Q9HYW0 4559 Q9I3J6 232 Q59650 1314 Q9HUL7 2396 Q9HTD5 3478 Q9HYW1 4560 Q9I3K0 233 Q9HT22 1315 Q9HUL8 2397 Q9HTD6 3479 Q9HYW2 4561 Q9I3K3 234 Q9HT84 1316 Q9HUM1 2398 Q9HTD8 3480 Q9HYW3...”
- Exposure of Pseudomonas aeruginosa to Cinnamaldehyde Selects Multidrug Resistant Mutants
Tetard, Antibiotics (Basel, Switzerland) 2022 - “...Ribose operon repressor RbsR Unknown functions PA0841 A 304 V A 304 V Hypothetical protein PA3283 N 142 S N 142 S N 142 S Hypothetical protein * STOP codon. Mutations predicted as being deleterious to protein function by PROVEAN (score < 2.5) are indicated in...”
- ZnuA and zinc homeostasis in Pseudomonas aeruginosa
Pederick, Scientific reports 2015 - “...ABC permease 8.0 PA2915 metallo -lactamase 5.2 PA2916 lysine transporter (LysE) 4.2 PA3282 hypothetical 4.1 PA3283 hypothetical 5.0 PA3284 hypothetical 5.0 PA3600 50S ribosomal protein L36 89.2 0.0013 PA3601 50S ribosomal protein L31 109.0 PA4063 Zn 2+ periplasmic binding protein 45.1 0.0011 PA4064 ABC transporter nucleotide...”
- Function of the Pseudomonas aeruginosa NrdR Transcription Factor: Global Transcriptomic Analysis and Its Role on Ribonucleotide Reductase Gene Expression
Crespo, PloS one 2015 - “...2.41 and 1.96). The highest repression under this condition was found in several hypothetical proteins (PA3283, PA3281, PA0565 with log 2 -fold changes of -4.57, -3.73, -3.06) and also several genes involved in antibiotic resistance, such as the entire mexEF-oprN operon (log 2 -fold change from...”
- Cystic fibrosis sputum supports growth and cues key aspects of Pseudomonas aeruginosa physiology
Palmer, Journal of bacteriology 2005 - “...PA2451 PA2452 PA2807 PA2862 PA2911 PA2912 PA3281 PA3282 PA3283 PA3284 PA3407 PA3444 PA3526 PA3598 PA3600 PA3601 PA3662 PA3749 PA3757 PA3758 PA3759 PA3789 PA3790...”
- Screening for quorum-sensing inhibitors (QSI) by use of a novel genetic system, the QSI selector
Rasmussen, Journal of bacteriology 2005 - “...PA1556 PA1557 PA1673 PA1715 PA1718 PA1746 PA2285 PA2310 PA3283 PA3284 PA3441 PA3442 PA3444 PA3445 PA3570 PA3720 PA3935 PA3938 PA4067 PA4191 PA4195 PA4710 PA5027...”
- Identification, timing, and signal specificity of Pseudomonas aeruginosa quorum-controlled genes: a transcriptome analysis
Schuster, Journal of bacteriology 2003 - “...PA3038 PA3174 PA3205 PA3233 PA3234 PA3235 PA3281 PA3282 PA3283 PA3284 PA3364 PA3365 PA3575 PA3790 PA4359 PA4371 PA4442 PA4443 PA4691 PA4692 PA4770 PA5168 lasI...”
SCAB_20761 putative myo-inositol catabolism protein from Streptomyces scabiei 87.22
26% identity, 58% coverage
SCAB_RS09810 DUF692 domain-containing protein from Streptomyces scabiei 87.22
26% identity, 59% coverage
3bwwA / Q0I408 Crystal structure of a DUF692 family protein (HS_1138) from Haemophilus somnus 129PT at 2.20 Å resolution
29% identity, 93% coverage
- Ligand: Fe(III) ion (3bwwA)
SCO6045 hypothetical protein from Streptomyces coelicolor A3(2)
27% identity, 60% coverage
MAB_3022c hypothetical protein from Mycobacterium abscessus ATCC 19977
29% identity, 68% coverage
8hi7B / A0A8T8BZJ9 Crystal structure of a holoenzyme TglHI with two Fe irons for Pseudomonas syringae peptidyl (S) 2-mercaptoglycine biosynthesis (see paper)
24% identity, 87% coverage
- Ligand: Fe(III) ion (8hi7B)
7fc0E / A0A1I4IFL0 Reconstitution of MbnABC complex from Rugamonas rubra ATCC-43154 (GroupIII) (see paper)
22% identity, 80% coverage
- Ligands: peptide; Fe(III) ion (7fc0E)
7dz9B / E3BK14 MbnABC complex (see paper)
23% identity, 80% coverage
- Ligands: peptide; Fe(III) ion (7dz9B)
IW22_14845 DUF692 family multinuclear iron-containing protein from Chryseobacterium sp. JM1
22% identity, 43% coverage
PMI30_00628 DUF692 family multinuclear iron-containing protein from Pseudomonas sp. GM50
21% identity, 21% coverage
For advice on how to use these tools together, see
Interactive tools for functional annotation of bacterial genomes.
The PaperBLAST database links 798,070 different protein sequences to 1,261,478 scientific articles. Searches against EuropePMC were last performed on May 12 2025.
PaperBLAST builds a database of protein sequences that are linked
to scientific articles. These links come from automated text searches
against the articles in EuropePMC
and from manually-curated information from GeneRIF, UniProtKB/Swiss-Prot,
BRENDA, CAZy (as made available by dbCAN), BioLiP, CharProtDB, MetaCyc,
EcoCyc, TCDB, REBASE, the Fitness Browser,
and a subset of the European Nucleotide Archive with the /experiment tag.
Given this database and a protein sequence query,
PaperBLAST uses protein-protein BLAST
to find similar sequences with E < 0.001.
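As a rough illustration of the E < 0.001 cutoff, the sketch below filters BLAST hits that have been saved in BLAST+ tabular format (-outfmt 6, whose default columns put the subject identifier, percent identity, and E-value in fields 2, 3, and 11). The names Hit and parse_significant_hits are hypothetical, and this is not taken from PaperBLAST's own code.

```python
from typing import NamedTuple

E_VALUE_CUTOFF = 1e-3  # the "E < 0.001" threshold described above


class Hit(NamedTuple):
    subject: str
    percent_identity: float
    evalue: float


def parse_significant_hits(tabular_text: str,
                           cutoff: float = E_VALUE_CUTOFF) -> list[Hit]:
    """Keep hits whose E-value is below the cutoff.

    Assumes the default 12-column BLAST+ tabular layout:
    qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore
    """
    hits = []
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        hit = Hit(subject=fields[1],
                  percent_identity=float(fields[2]),
                  evalue=float(fields[10]))
        if hit.evalue < cutoff:
            hits.append(hit)
    return hits
```

The cutoff is a parameter so that a stricter or looser threshold can be tried without changing the parser.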
To build the database, we query EuropePMC with locus tags, with RefSeq protein
identifiers, and with UniProt
accessions. We obtain the locus tags from RefSeq or from MicrobesOnline. We use
queries of the form "locus_tag AND genus_name" to try to ensure that
the paper is actually discussing that gene. Because EuropePMC indexes
most recent biomedical papers, even if they are not open access, some
of the links may be to papers that you cannot read or that our
computers cannot read. We query each of these identifiers that
appears in the open access part of EuropePMC, as well as every locus
tag that appears in the 500 most-referenced genomes, so that a gene
may appear in the PaperBLAST results even though none of the papers
that mention it are open access. We also incorporate text-mined links
from EuropePMC that link open access articles to UniProt or RefSeq
identifiers. (This yields some additional links because EuropePMC
uses different heuristics for their text mining than we do.)
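The "locus_tag AND genus_name" query form can be sketched as follows. This only builds the query string and a search URL. It assumes the public Europe PMC REST search endpoint and its query, format, and pageSize parameters. The function names are hypothetical, and PaperBLAST's actual code may construct its queries differently.

```python
from urllib.parse import urlencode

# Assumed endpoint: the public Europe PMC REST search service.
EUROPEPMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_query(locus_tag: str, genus_name: str) -> str:
    """Combine a locus tag with a genus name, as in 'locus_tag AND genus_name'."""
    return f"{locus_tag} AND {genus_name}"


def build_search_url(locus_tag: str, genus_name: str, page_size: int = 25) -> str:
    """Return a search URL for the query; no network request is made here."""
    params = {
        "query": build_query(locus_tag, genus_name),
        "format": "json",
        "pageSize": page_size,
    }
    return f"{EUROPEPMC_SEARCH}?{urlencode(params)}"


# Example with a locus tag from the hit list on this page.
print(build_search_url("lpg0667", "Legionella"))
```

Requiring the genus name alongside the locus tag reduces false matches for short or ambiguous identifiers, which is the purpose stated above.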
For every article that mentions a locus tag, a RefSeq protein
identifier, or a UniProt accession, we try to select one or two
snippets of text that refer to the protein. If we cannot get access to
the full text, we try to select a snippet from the abstract, but
unfortunately, unique identifiers such as locus tags are rarely
provided in abstracts.
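A minimal sketch of snippet selection, assuming a snippet is simply a window of text around each mention of the identifier. The window size, the whole-word matching, and the name select_snippets are illustrative assumptions and not PaperBLAST's actual heuristics.

```python
import re


def select_snippets(text: str, identifier: str,
                    max_snippets: int = 2, context_chars: int = 100) -> list[str]:
    """Return up to max_snippets windows of text around mentions of identifier."""
    # Match the identifier only as a whole token, so that PA3283 does not
    # match inside a longer identifier such as PA32830.
    pattern = re.compile(r"(?<!\w)" + re.escape(identifier) + r"(?!\w)")
    snippets = []
    last_end = -1
    for match in pattern.finditer(text):
        start = max(0, match.start() - context_chars)
        end = min(len(text), match.end() + context_chars)
        if start < last_end:
            continue  # this mention overlaps the previous snippet's window
        snippets.append("..." + text[start:end].strip() + "...")
        last_end = end
        if len(snippets) == max_snippets:
            break
    return snippets
```

The same function could be applied to the abstract when the full text is unavailable. It returns an empty list when the identifier is not mentioned, which is the common case for abstracts.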
PaperBLAST also incorporates manually-curated protein functions:
- Proteins from NCBI's RefSeq are included if a
GeneRIF
entry links the gene to an article in
PubMed®.
GeneRIF also provides a short summary of the article's claim about the
protein, which is shown instead of a snippet.
- Proteins from Swiss-Prot (the curated part of UniProt)
are included if the curators
identified experimental evidence for the protein's function (evidence
code ECO:0000269). For these proteins, the fields of the Swiss-Prot entry that
describe the protein's function are shown (with bold headings).
- Proteins from BRENDA,
a curated database of enzymes, are included if they are linked to a paper in PubMed
and their full sequence is known.
- Every protein from the non-redundant subset of
BioLiP,
a database
of ligand-binding sites and catalytic residues in protein structures, is included. Since BioLiP itself
does not include descriptions of the proteins, those are taken from the
Protein Data Bank.
Descriptions from PDB rely on the original submitter of the
structure and cannot be updated by others, so they may be less reliable.
(For SitesBLAST and Sites on a Tree, we use a larger subset of BioLiP so that every
ligand is represented among a group of structures with similar sequences, but for
PaperBLAST, we use the non-redundant set provided by BioLiP.)
- Every protein from EcoCyc, a curated
database of the proteins in Escherichia coli K-12, is included, regardless
of whether it is characterized or not.
- Proteins from the MetaCyc metabolic pathway database
are included if they are linked to a paper in PubMed and their full sequence is known.
- Proteins from the Transport Classification Database (TCDB)
are included if they have known substrate(s), have reference(s),
and are not described as uncharacterized or putative.
(Some of the references are not visible on the PaperBLAST web site.)
- Every protein from CharProtDB,
a database of experimentally characterized protein annotations, is included.
- Proteins from the CAZy database of carbohydrate-active enzymes
are included if they are associated with an Enzyme Classification number.
Even though CAZy does not provide links from individual protein sequences to papers,
these should all be experimentally-characterized proteins.
- Proteins from the REBASE database
of restriction enzymes are included if they have known specificity.
- Every protein with an evidence-based reannotation (based on mutant phenotypes)
in the Fitness Browser is included.
- Sequence-specific transcription factors (including sigma factors and DNA-binding response regulators)
with experimentally-determined DNA binding sites from the
PRODORIC database of gene regulation in prokaryotes are included.
- Putative transcription factors from RegPrecise
are included if they have manually-curated predictions for their binding sites. These predictions are based on
conserved putative regulatory sites across genomes that contain similar transcription factors,
so PaperBLAST clusters the TFs at 70% identity and retains just one member of each cluster.
- Coding sequence (CDS) features from the
European Nucleotide Archive (ENA)
are included if the /experiment tag is set (implying that there is experimental evidence for the annotation),
the nucleotide entry links to paper(s) in PubMed,
and the nucleotide entry is from the STD data class
(implying that these are targeted annotated sequences, not from shotgun sequencing).
Also, to filter out genes whose transcription or translation was detected, but whose function
was not studied, nucleotide entries or papers with more than 25 such proteins are excluded.
Descriptions from ENA rely on the original submitter of the
sequence and cannot be updated by others, so they may be less reliable.
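The 25-protein rule for ENA entries can be sketched as below. This is one possible reading of the rule: a nucleotide entry with more than 25 qualifying proteins is dropped entirely, a paper linked to more than 25 qualifying proteins is dropped, and a protein is kept only if at least one paper remains. The CdsRecord type and the function name are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass

MAX_PROTEINS = 25  # threshold stated in the text above


@dataclass(frozen=True)
class CdsRecord:
    protein_id: str
    entry_id: str                 # ENA nucleotide entry
    pubmed_ids: tuple[str, ...]   # papers linked from that entry


def filter_cds(records: list[CdsRecord],
               max_proteins: int = MAX_PROTEINS) -> list[CdsRecord]:
    """Drop proteins from entries or papers that cover too many proteins."""
    per_entry = Counter(r.entry_id for r in records)
    per_paper = Counter(p for r in records for p in set(r.pubmed_ids))
    kept = []
    for r in records:
        if per_entry[r.entry_id] > max_proteins:
            continue  # likely a survey-style entry, not a targeted study
        papers = tuple(p for p in r.pubmed_ids
                       if per_paper[p] <= max_proteins)
        if papers:  # assumption: keep only if some paper survives the filter
            kept.append(CdsRecord(r.protein_id, r.entry_id, papers))
    return kept
```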
Except for GeneRIF and ENA,
the curated entries include a short curated
description of the protein's function.
For entries from BioLiP, the protein's function may not be known beyond binding to the ligand.
Many of these entries also link to articles in PubMed.
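The RegPrecise entry in the list above mentions clustering transcription factors at 70% identity and keeping one member per cluster. The sketch below shows one common way to do this, a greedy representative-based clustering. The source does not say which clustering method PaperBLAST uses, so this is a stand-in. The identity function here is a toy that compares ungapped positions, whereas a real pipeline would use aligned sequences. All names are hypothetical.

```python
from typing import Callable


def ungapped_identity(a: str, b: str) -> float:
    """Toy identity: matching positions divided by the longer length (no alignment)."""
    if not a or not b:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def pick_representatives(
    sequences: dict[str, str],
    threshold: float = 0.70,
    identity: Callable[[str, str], float] = ungapped_identity,
) -> list[str]:
    """Greedily choose one representative per cluster.

    Sequences are visited from longest to shortest (ties broken by name).
    A sequence becomes a new representative only if it is below the
    identity threshold to every representative chosen so far.
    """
    representatives: list[str] = []
    for name in sorted(sequences, key=lambda n: (-len(sequences[n]), n)):
        seq = sequences[name]
        if not any(identity(seq, sequences[rep]) >= threshold
                   for rep in representatives):
            representatives.append(name)
    return representatives
```

With three made-up sequences, two of which differ at a single position out of ten, the function keeps one of the near-identical pair plus the unrelated sequence.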
For more information see the
PaperBLAST paper (mSystems 2017)
or the code.
You can download PaperBLAST's database here.
Changes to PaperBLAST since the paper was written:
- November 2023: incorporated PRODORIC and RegPrecise. Many PRODORIC entries were not linked to a protein sequence (no UniProt identifier), so we added this information.
- February 2023: BioLiP changed their download format. PaperBLAST now includes their non-redundant subset. SitesBLAST and Sites on a Tree use a larger non-redundant subset that ensures that every ligand is represented within each cluster. This should ensure that every binding site is represented.
- June 2022: incorporated some coding sequences from ENA with the /experiment tag.
- March 2022: incorporated BioLiP.
- April 2020: incorporated TCDB.
- April 2019: EuropePMC now returns table entries in their search results. This has expanded PaperBLAST's database, but most of the new entries are of low relevance, and the resulting snippets are often just lists of locus tags with annotations.
- February 2018: the alignment page reports the conservation of the hit's functional sites (if available from Swiss-Prot or UniProt).
- January 2018: incorporated BRENDA.
- December 2017: incorporated MetaCyc, CharProtDB, CAZy, REBASE, and the reannotations from the Fitness Browser.
- September 2017: EuropePMC no longer returns some table entries in their search results. This has shrunk PaperBLAST's database, but has also reduced the number of low-relevance hits.
Many of these changes are described in Interactive tools for functional annotation of bacterial genomes.
PaperBLAST cannot provide snippets for many of the papers that are
published in non-open-access journals. This limitation applies even if
the paper is marked as "free" on the publisher's web site and is
available in PubMed Central or EuropePMC. If a journal that you publish
in is marked as "secret," please consider publishing elsewhere.
Many important articles are missing from PaperBLAST, either because
the article's full text is not in EuropePMC (as for many older
articles), or because the paper does not mention a protein identifier such as a locus tag, or because of PaperBLAST's heuristics. If you notice an
article that characterizes a protein's function but is missing from
PaperBLAST, please notify the curators at UniProt
or add an entry to GeneRIF.
Entries in either of these databases will eventually be incorporated
into PaperBLAST. Note that to add an entry to UniProt, you will need
to find the UniProt identifier for the protein. If the protein is not
already in UniProt, you can ask them to create an entry. To add an
entry to GeneRIF, you will need an NCBI Gene identifier, but
unfortunately many prokaryotic proteins in RefSeq do not have
corresponding Gene identifiers.
References
PaperBLAST: Text-mining papers for information about homologs.
M. N. Price and A. P. Arkin (2017). mSystems, 10.1128/mSystems.00039-17.
Europe PMC in 2017.
M. Levchenko et al (2017). Nucleic Acids Research, 10.1093/nar/gkx1005.
Gene indexing: characterization and analysis of NLM's GeneRIFs.
J. A. Mitchell et al (2003). AMIA Annu Symp Proc 2003:460-464.
UniProt: the universal protein knowledgebase.
The UniProt Consortium (2016). Nucleic Acids Research, 10.1093/nar/gkw1099.
BRENDA in 2017: new perspectives and new tools in BRENDA.
S. Placzek et al (2017). Nucleic Acids Research, 10.1093/nar/gkw952.
The EcoCyc database: reflecting new knowledge about Escherichia coli K-12.
I. M. Keseler et al (2016). Nucleic Acids Research, 10.1093/nar/gkw1003.
The MetaCyc database of metabolic pathways and enzymes.
R. Caspi et al (2018). Nucleic Acids Research, 10.1093/nar/gkx935.
CharProtDB: a database of experimentally characterized protein annotations.
R. Madupu et al (2012). Nucleic Acids Research, 10.1093/nar/gkr1133.
The carbohydrate-active enzymes database (CAZy) in 2013.
V. Lombard et al (2014). Nucleic Acids Research, 10.1093/nar/gkt1178.
The Transporter Classification Database (TCDB): recent advances.
M. H. Saier, Jr. et al (2016). Nucleic Acids Research, 10.1093/nar/gkv1103.
REBASE - a database for DNA restriction and modification: enzymes, genes and genomes.
R. J. Roberts et al (2015). Nucleic Acids Research, 10.1093/nar/gku1046.
Deep annotation of protein function across diverse bacteria from mutant phenotypes.
M. N. Price et al (2016). bioRxiv, 10.1101/072470.
by Morgan Price, Arkin group, Lawrence Berkeley National Laboratory