PaperBLAST
PaperBLAST Hits for O34628 Uncharacterized protein YvlB (Bacillus subtilis (strain 168)) (365 a.a., MKQEKERILK...)
>O34628 Uncharacterized protein YvlB (Bacillus subtilis (strain 168))
MKQEKERILKLVEEGKLTAQEALTLIEKLDSDYKEKEEKITALSVHVHDEEEPFTTAKKE
SGKPSLGAKLFDWIDSAVKKVKEVDLDLNFGHAYDVQHIFQFKDTDFSSVELQIANGSVN
IVPWEDDDIRAECQAKVYRADSQDAARHAFLQHIECEIKGNKFFIRTEKKTMKTNVTLYI
PQKEYDKIRVKLFNGPVRGEHLHVKEFSAKTTNGVLSFSYLTAEKAIAETANGQIKLASH
SCGTIEAETINGLIDLRGKSESIDVQSFNGNIAINVTESDCRSIYAKTTTGNVELAIPDD
LAVKAELKSNLGTLSHELMDVEMLKEKNDTIQKEMMFTSNQAHDQNITVFSESLTGAIKL
KYSQR
Found 13 similar proteins in the literature:
O34628 Uncharacterized protein YvlB from Bacillus subtilis (strain 168)
100% identity, 100% coverage
OG1RF_11464 daptomycin-sensing surface protein LiaX from Enterococcus faecalis OG1RF
32% identity, 45% coverage
LIAX_ENTFA / Q834B6 Putative adhesin domain-containing protein LiaX; Daptomycin-sensing surface protein LiaX from Enterococcus faecalis (strain ATCC 700802 / V583) (see paper)
EF1753 conserved hypothetical protein from Enterococcus faecalis V583
32% identity, 45% coverage
- function: Involved in cell membrane remodeling, perhaps acting by negative modulation of the liaFSR and liaXYZ gene clusters, thereby regulating content and localization of anionic phospholipids (PubMed:31818937). Binds to the antibiotic daptomycin (DAP) and to cationic antimicrobial peptides, such as human LL-37, perhaps functioning as a sensor that activates the cell envelope stress response (PubMed:31818937).
disruption phenotype: Deletion in the OG1RF strain causes anionic phospholipids to be redistributed away from the division septum; during exponential growth, there are increases in phosphatidylglycerol (PG), decreases in cardiolipin, with no change in lysyl-PG (PubMed:31818937). Confers resistance to the antibiotic daptomycin (DAP) in the OG1RF strain (PubMed:31818937). Resistant to killing of the OG1RF strain by human cathelicidin LL-37 (PubMed:31818937). Deletion in the OG1RF strain increases virulence in nematode C.elegans infection model (PubMed:31818937).
- Genes Contributing to the Unique Biology and Intrinsic Antibiotic Resistance of Enterococcus faecalis
Gilmore, mBio 2020 - “...0.000 (>0.10) 0.000 (>0.10) 0.000 (>0.10) 0.490 (>0.10) 0.214 (>0.10) 0.065 (>0.10) 0.289 (>0.10) SAW_01705 EF1753 LiaX Important 0.053 0.138 (>0.10) 0.000 (>0.10) 0.096 (>0.10) 0.008 (>0.10) 0.000 (>0.10) 0.000 (>0.10) 0.000 (>0.10) 0.101 (>0.10) 0.000 (>0.10) 0.000 (>0.10) a Light shading indicates response regulators of...”
- “...EF0397, EF0990, EF1043, EF1146, EF2923, and EF3061 (all Fitness Critical) and EF1195, EF1316, EF1724, EF1752, EF1753, EF2606, and EF3086 (Fitness Important) ( Data Set S1 ). Most encode unknown functions, whereas EF0394, EF0397, EF1043, EF1752, EF1753, and EF3061 appear to be extracellular or associated with the...”
- Chlorhexidine Induces VanA-Type Vancomycin Resistance Genes in Enterococci
Bhardwaj, Antimicrobial agents and chemotherapy 2016 - “...vancomycin response regulator EF1533 EF2698 EF2697 EF2477 EF1006 EF1753 EF1752 EF1751 EF0026 EF3027 EF0932 10.5 9.8d 35.8 11.9 15.9 69.0 32.0 31.1 52.6 19.9...”
- Mutations associated with reduced surotomycin susceptibility in Clostridium difficile and Enterococcus species
Adams, Antimicrobial agents and chemotherapy 2015 - “...mutations in the yvlB homologue in E. faecalis, EF1753, are present in daptomycin-resistant E. faecalis strains (22, 35). Finally, L. plantarum ftsH mutant and...”
- Enterococcus faecalis Glycolipids Modulate Lipoprotein-Content of the Bacterial Cell Membrane and Host Immune Response
Theilacker, PloS one 2015 - “...protein family lipoprotein EF1416 1.201 Glucose-6-phosphate isomerase cytoplasmic EF2697 1.159 Conserved domain protein cell membrane EF1753 1.141 Uncharacterized protein cytoplasmic EF1045 1.111 6-phosphofructokinase cytoplasmic EF0761 1.097 Amino acid ABC transporter. amino acid binding permease protein cell membrane EF2496 1.085 Lipoprotein lipoprotein EF0685 1.082 Foldase protein PrsA...”
- The cell wall-targeting antibiotic stimulon of Enterococcus faecalis
Abranches, PloS one 2014 - “...7.46 38.96 13.64 21.61 EF1587 nudix family phosphohydrolase 2.82 2.61 22.39 11.86 6.7 15.45 19.3 EF1753 hypothetical protein 4.40 3.95 25.31 18.40 3.74 10.2 11.63 18.16 EF1814 EmbR/QcaA drug resistance transporter 6.01 8.05 7.86 14.35 14.55 EF2784 hypothetical protein 2.66 17.35 12.3 4.95 4.93 EF2892 hypothetical...”
- “...of a DNA-binding domain. Initially, we attempted to generate deletions in EF0026, EF0708, EF0797, EF1533, EF1753 and EF3245. However, we were unable to confirm mutations of EF0708 and EF1258, suggesting that these genes may perform an essential role for cell viability, or that we did not...”
- Adaptation of Enterococcus faecalis to daptomycin reveals an ordered progression to resistance
Miller, Antimicrobial agents and chemotherapy 2013 - “...observed a frameshift mutation at position Ile-390 in YvlB (EF1753) of a DAP-resistant E. faecalis variant (10). As none of the adaptive mutations in LiaFSR...”
- “...et al. (10) observed mutations in cls, yvlB (denoted EF1753), and drmA (denoted EF1797). In E. faecium, Munita et al. (50) demonstrated the importance of...”
- Genetic basis for daptomycin resistance in enterococci
Palmer, Antimicrobial agents and chemotherapy 2011 - “...653/1446 cardiolipin synthetase EF0782, rpoN 296/1314 EF1753, conserved 1168/1602 hypothetical protein EF1753, conserved 1168/1602 hypothetical protein EF1797,...”
- “...interest (EF0224 rpsE, EF0243 brnQ, EF0631, EF0782 rpoN, EF1753, EF1797, EF2698) using each of the strains (control, DAP-A, DAP-B, DAP-C) as templates....”
- Characterizing vancomycin-resistant Enterococcus strains with various mechanisms of daptomycin resistance developed in an in vitro pharmacokinetic/pharmacodynamic model
Steed, Antimicrobial agents and chemotherapy 2011 - “...Two gene clusters (EF2694 to EF2701 and EF1751 to EF1753) associated with cell membrane proteins and the phage shock protein C were found to be upregulated in...”
lmo2487 similar to B. subtilis YvlB protein from Listeria monocytogenes EGD-e
Q8Y4F7 Lmo2487 protein from Listeria monocytogenes serovar 1/2a (strain ATCC BAA-679 / EGD-e)
26% identity, 86% coverage
- Proteomic dataset of Listeria monocytogenes exposed to sublethal concentrations of free and nanoencapsulated nisin
Pinilla, Data in brief 2022 - “...rRNA at the A site Q48762 lmo0234 lmo0234 Nis / LNis Hypothetical protein; RNAse Q8Y4F7 lmo2487 lmo2487 Nis / LNis Annotation not available Q48754 lmo1388 tcsA Nis / LNis CD4+ T-cell stimulating antigen P66352 lmo2607 rpsK Nis / LNis 30S ribosomal protein S11; located on the...”
- Listeria monocytogenes genes supporting growth under standard laboratory cultivation conditions and during macrophage infection
Fischer, Genome research 2022 - “...( ackA , ltaS , oppD , oppF , sipZ , sod , smpB , lmo2487 ) ( Bonnemain et al. 2004 ; Gravesen et al. 2004 ; Archambaud et al. 2006 ; Gueriri et al. 2008 ; Webb et al. 2009 ; Mraheil et al....”
- Listeria monocytogenes σA Is Sufficient to Survive Gallbladder Bile Exposure
Boonmee, Frontiers in microbiology 2019 - “...lmo2210 Hypothetical protein 18.59 0.00 LMRG_01630 lmo2202 fabH LMRG_01630-LMRG_01631 3-oxoacyl-[acyl-carrier-protein] synthase, KASIII 2.05 0.00 LMRG_01761 lmo2487 Hypothetical protein 2.14 0.00 LMRG_01976 lmo2720 Acyl-coenzyme A synthetases/AMP-(fatty) acid ligases, YtcI homolog 4.32 0.00 LMRG_02011 lmo0911 Hypothetical protein 2.43 0.00 LMRG_02071 lmo0972 dltC dltABCD , LMRG_02074 D-alanyl carrier protein...”
- PadR-type repressors controlling production of a non-canonical FtsW/RodA homologue and other trans-membrane proteins
Hauf, Scientific reports 2019 - “...5 lmo0602 hypothetical protein, N-acetyltransferase domain 12.79.1 0.0011 lmo0954 LiaI phage shock protein 3.81.1 0.0024 lmo2487 DUF4097 containing hypothetical protein 3.10.7 0.0072 lmo0955 LiaH phage shock protein 2.60.6 0.0027 lmo1637 putative ABC transporter, permease protein 2.50.2 0.0003 lmo0047 putative lipoprotein 2.40.4 0.0029 lmo1636 putative ABC transporter,...”
- Protein level identification of the Listeria monocytogenes sigma H, sigma L, and sigma C regulons
Mujahid, BMC microbiology 2013 - “...consensus promoter, suggesting direct transcriptional regulation by H . In addition, the coding gene for Lmo2487, one of these 15 proteins, is in an operon with lmo2485, which was previously reported to be positively regulated by H , even though no upstream H consensus promoter was...”
- “...conductance mechanosensitive channel protein mscL Cellular processes Adaptations to atypical conditions tttcac atcgcagttagatgttt tatact SigmaA Lmo2487 1.65 hypothetical protein lmo2487 Hypothetical proteins Conserved N/A N/A Lmo2614 2.05 50S ribosomal protein L30 rpmD Protein synthesis Ribosomal proteins: synthesis and modification ttgatt actacccctaacccgtg tataat SigmaA Lmo2621 1.63 50S...”
- Assessing the contributions of the LiaS histidine kinase to the innate resistance of Listeria monocytogenes to nisin, cephalosporins, and disinfectants
Collins, Applied and environmental microbiology 2012 - “...noted previously that the expression of liaS, lmo2229, and lmo2487 is reduced in the nisin-resistant LO28lisK mutant (8), yet the expression of these three loci...”
- “...deleting liaS on the expression of liaS, lmo2229, and lmo2487 during logarithmic growth in both the LO28 and lisK backgrounds. Although the expression of liaS...”
- Listeria monocytogenes grown at 7° C shows reduced acid survival and an altered transcriptional response to acid shock compared to L. monocytogenes grown at 37° C
Ivy, Applied and environmental microbiology 2012 - “...lmo2293 lmo2295 lmo2296 lmo2362 lmo2363 lmo2408 lmo2409 lmo2484 lmo2487 lmo2625 (rplP) lmo2630 (rplW) lmo2632 (rplC) lmo2633 (rpsJ) 5 min 15 min 5 min 15...”
- TelA contributes to the innate resistance of Listeria monocytogenes to nisin and other cell wall-acting antibiotics
Collins, Antimicrobial agents and chemotherapy 2010 - “...site. f Primers are 5 to 3. b c (lmo2487) are all upregulated in spontaneously nisin-resistant L. monocytogenes strains (16). Here the screening of a mariner...”
LMRG_01761 hypothetical protein from Listeria monocytogenes 10403S
26% identity, 86% coverage
WP_002330768 daptomycin-sensing surface protein LiaX from Enterococcus faecium
27% identity, 51% coverage
M7W_985 daptomycin-sensing surface protein LiaX from Enterococcus faecium ATCC 8459 = NRRL B-2354
26% identity, 51% coverage
LGG_00914 hypothetical protein from Lactobacillus rhamnosus GG
LGG_00914 daptomycin-sensing surface protein LiaX from Lacticaseibacillus rhamnosus GG
24% identity, 45% coverage
- Proteomics and transcriptomics characterization of bile stress response in probiotic Lactobacillus rhamnosus GG
Koskenniemi, Molecular & cellular proteomics : MCP 2011 - “...matchedd LGG_00031 LGG_00238 LGG_02913 LGG_00534 LGG_00914 LGG_01820 LGG_00984 LGG_02239 LGG_01295 LGG_01367 LGG_01478 LGG_01652 LGG_01820 LGG_02239 LGG_01821...”
- “...No. of peptides matchedd LGG_00252 LGG_00740 LGG_00914 LGG_00914 LGG_00914 LGG_01820 LGG_00934 LGG_00936 LGG_01062 LGG_01181 LGG_01416 LGG_01181 LGG_01416...”
- Variability of Genetic Characters Associated with Probiotic Functions in Lacticaseibacillus Species
Rossi, Microorganisms 2022 - “...acid (LTA) synthase LGG_00830, a polysaccharide biosynthesis transport protein LGG_00851, the LiaX daptomycin-sensing surface protein LGG_00914, a PspC domain-containing protein that in Staphylococcus mutans mediates biofilm formation in vivo [ 19 ], a toxin immunity protein LGG_01002, a lipopolysaccharide assembly protein LGG_01366, a fibronectin binding protein...”
- “...in L. rhamnosus GG Anti-inflammatory protein LGG_02734 b Bile salt hydrolase LGG_00501 * Biofilm formation LGG_00914; LGG_01827 Cell wall anchored proteins with LPXTG domain LGG_00434 b,f ; LGG_00578 c,d,e ; LGG_00584 Fibrinogen binding LGG_01590 c,d,e,f ; LGG_02282 b Fibronectin binding LGG_0005 a,b ; LGG_01450 Fucose utilization...”
LSA0512 Hypothetical protein from Lactobacillus sakei subsp. sakei 23K
23% identity, 50% coverage
- Global transcriptome response in Lactobacillus sakei during growth on ribose
McLeod, BMC microbiology 2011 - “...lsa0418 Hypothetical protein -0.8 LSA0464 lsa0464 Hypothetical protein -0.6 LSA0470 lsa0470 Hypothetical protein 0.9 0.7 LSA0512 lsa0512 Hypothetical protein -0.6 LSA0515 lsa0515 Hypothetical integral membrane protein -0.5 LSA0536 lsa0536 Hypothetical protein 0.7 LSA0716 lsa0716 Hypothetical protein 0.6 LSA0752 lsa0752 Hypothetical protein 0.5 0.6 LSA0757 lsa0757 Hypothetical...”
llmg_2164 hypothetical protein from Lactococcus lactis subsp. cremoris MG1363
LACR_2166 hypothetical protein from Lactococcus lactis subsp. cremoris SK11
24% identity, 84% coverage
- Cell wall homeostasis in lactic acid bacteria: threats and defences
Martínez, FEMS microbiology reviews 2020 - “...(S. Kulakauskas, unpublished). Also, special attention should be paid to the lactococcal yth -operon (or llmg_2164 llmg_2163 ). Notably, this operon is among the highest up-regulated after treatment with Lcn972 and protects cells from its antimicrobial activity (Martínez et al. 2007; Roces et al. 2009...”
- Transcriptional response of Lactococcus lactis during bacterial emulsification
Tarazanova, PloS one 2019 - “...llmg_1115 XpaC-like protein 4.3 3.6e-7 S. Function unknown llmg_2163 K llmg_2163 hypothetical protein 18.4 1.4e-13 llmg_2164 llmg_2164 hypothetical protein 18.4 2.3e-11 llmg_1659 llmg_1659 hypothetical protein 11.3 4.6e-14 llmg_1572 mycA hypothetical protein 5.7 1.5e-8 llmg_0590 llmg_0590 hypothetical protein 4.9 2.1e-3 llmg_1263 llmg_1263 hypothetical protein 4.3 2.1e-6 llmg_1029...”
- Refining the Pneumococcal Competence Regulon by RNA Sequencing
Slager, Journal of bacteriology 2019 - “...homolog CesR ( 15 , 95 ) as follows: llmg_0165, llmg_0169, llmg_1115, llmg_1155, llmg_1650, and llmg_2164. Cappable-seq ( 60 ) was used to identify L. lactis TSSs (S. B. van der Meulen and O. P. Kuipers, unpublished data). Importantly, we did not use the standard 0-order...”
- Early Transcriptome Response of Lactococcus lactis to Environmental Stresses Reveals Differentially Expressed Small Regulatory RNAs and tRNAs
van, Frontiers in microbiology 2017 - “...5 min. Among the highest upregulated genes are those of an operon ( llmg_2163 - llmg_2164 ) specifying a putative stress-responsive transcriptional regulator with a PspC domain (Llmg_2163). Both genes are ~10-fold upregulated; it was also induced upon overproduction in L. lactis of the membrane protein...”
- “...after exposure to the bacteriocin Lcn972 (Martínez et al., 2007). A deletion mutant of llmg_2164 was shown to be very sensitive to NaCl (Roces et al., 2009). In E. coli, the psp operon is induced after application of various types of stresses including...”
- Stress Physiology of Lactic Acid Bacteria
Papadimitriou, Microbiology and molecular biology reviews : MMBR 2016 - “...Lc. lactis against Lcn972, while a mutant lacking llmg_2164 suffered from a low resistance to high temperature and salinity (443). The llmg_1155/spxB gene...”
- Efficient overproduction of membrane proteins in Lactococcus lactis requires the cell envelope stress sensor/regulator couple CesSR
Pinto, PloS one 2011 - “...transporter ATP-binding and permease protein 1.83 3.7*10 5 llmg_1918 Putative membrane protein 2.55 2.9*10 4 llmg_2164 Putative uncharacterized protein 12.87 3.9*10 10 llmg_2163 Putative stress-responsive transcriptional regulator 11.88 2.5*10 9 llmg_2420 Putative uncharacterized protein 1.29 9.7*10 3 llmg_2477 Lysine-specific permease 2.91 6.3*10 6 A full-genome DNA...”
- “...Figure 3 ). Similar data was obtained using a lacZ fusion to the promoter of llmg_2164 , another CesSR regulon member (data not shown). 10.1371/journal.pone.0021873.g003 Figure 3 The CesSR response is proportional to, and follows in time, the level of production of BcaP-GFP-H6. The level of...”
- Proteomic analyses to reveal the protective role of glutathione in resistance of Lactococcus lactis to osmotic stress
Zhang, Applied and environmental microbiology 2010 - “...clpE LACR_1027 LACR_0578 8.6 1.7 3.2 NC Hypothetical protein LACR_2166 RNA-binding protein 2.4 1.7 NC LACR_2311 NC 1.7 NC LACR_2166 LACR_2289 3.9 NC NC 2.7 5.6...”
lmo2486 lmo2486 from Listeria monocytogenes EGD-e
Q8Y4F8 Lmo2486 protein from Listeria monocytogenes serovar 1/2a (strain ATCC BAA-679 / EGD-e)
LMRG_01762 hypothetical protein from Listeria monocytogenes 10403S
23% identity, 53% coverage
- Proteomic Analysis of Listeria monocytogenes FBUNT During Biofilm Formation at 10°C in Response to Lactocin AL705
Melian, Frontiers in microbiology 2021 - “...1,4-dihydroxy-2-naphthoate octaprenyltransferase mo1677/menA Q8Y6K7 Precorrin-3 methylase lmo1197/cbiF Q8Y7S4 Uncharacterized Hypothetical protein lmo2209 Q8Y567 Hypothetical protein lmo2486 Q8Y4F8 Hypothetical protein lmo1466 Q8Y746 Hypothetical protein lmo2843 Q8Y3J2 Hypothetical protein lmo0391 Q8Y9X6 Hypothetical protein lmo0111 Q8YAK7 Hypothetical protein lmo2502 Q8Y4E4 Hypothetical protein lmo0584 Q8Y9E6 Hypothetical protein lmo1452 P53434 Interaction...”
- Mutant and Recombinant Phages Selected from In Vitro Coevolution Conditions Overcome Phage-Resistant Listeria monocytogenes
Peters, Applied and environmental microbiology 2020 (secret)
- Exploring Listeria monocytogenes Transcriptomes in Correlation with Divergence of Lineages and Virulence as Measured in Galleria mellonella
Lee, Applied and environmental microbiology 2019 (secret)
- Interference of components of the phosphoenolpyruvate phosphotransferase system with the central virulence gene regulator PrfA of Listeria monocytogenes
Mertins, Journal of bacteriology 2007 - “...protein; lmo2485, similar to B. subtilis YvlC protein; lmo2486, unknown; lmo2487, similar to B. subtilis YvlB protein; lmo1001, similar to B. subtilis protein...”
LMOf2365_2459 PspC domain protein from Listeria monocytogenes str. 4b F2365
20% identity, 65% coverage
LMZ02_00515 DUF4097 family beta strand repeat-containing protein from Paenibacillus macerans
26% identity, 47% coverage
- Acarbose glycosylation by AcbE for the production of acarstatins with enhanced α-amylase inhibitory activity
Zhang, Synthetic and systems biotechnology 2024 - “...using affinity chromatography ( Fig. 2 a). Based on biochemical analyses of homologous AcbD and LMZ02_00515 proteins, acarbose ( 1 ) and various oligosaccharides, including maltotriose, maltotetraose, maltopentaose, maltooligosaccharides, and -cyclodextrin, were selected as candidate substrates for AcbE*. The optimized reactions used 10M AcbE, 5mM acarbose,...”
For advice on how to use these tools together, see
Interactive tools for functional annotation of bacterial genomes.
The PaperBLAST database links 793,807 different protein sequences to 1,259,118 scientific articles. Searches against EuropePMC were last performed on March 13 2025.
PaperBLAST builds a database of protein sequences that are linked
to scientific articles. These links come from automated text searches
against the articles in EuropePMC
and from manually-curated information from GeneRIF, UniProtKB/Swiss-Prot,
BRENDA,
CAZy (as made available by dbCAN),
BioLiP,
CharProtDB,
MetaCyc,
EcoCyc,
TCDB,
REBASE,
the Fitness Browser,
and a subset of the European Nucleotide Archive with the /experiment tag.
Given this database and a protein sequence query,
PaperBLAST uses protein-protein BLAST
to find similar sequences with E < 0.001.
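The E-value cutoff described above can be sketched as a filter over BLAST tabular output. This is only an illustration, assuming the 12-column tabular format that NCBI BLAST+ writes with `-outfmt 6`. The function name `filter_hits`, the sample rows, and the coverage definition (aligned query span divided by query length) are assumptions made for this example, not PaperBLAST's actual code.

```python
from typing import Iterable, Iterator, NamedTuple

E_CUTOFF = 1e-3  # threshold stated in the text: E < 0.001


class Hit(NamedTuple):
    subject: str
    identity: float  # percent identity as reported by BLAST
    coverage: float  # percent of the query spanned by the alignment (assumed definition)
    evalue: float


def filter_hits(lines: Iterable[str], query_len: int) -> Iterator[Hit]:
    """Yield hits with E < E_CUTOFF from 12-column BLAST tabular rows."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        f = line.split("\t")
        pident = float(f[2])
        qstart, qend = int(f[6]), int(f[7])
        evalue = float(f[10])
        if evalue < E_CUTOFF:
            coverage = 100.0 * (qend - qstart + 1) / query_len
            yield Hit(f[1], pident, coverage, evalue)


# Hypothetical rows for a 365-residue query; all values are invented for illustration.
rows = [
    "query\thitA\t32.0\t170\t100\t6\t200\t364\t300\t465\t2e-15\t75.0",
    "query\thitB\t25.0\t40\t30\t0\t10\t49\t5\t44\t0.5\t20.0",
]
hits = list(filter_hits(rows, query_len=365))
```

In this sketch only `hitA` passes the cutoff; `hitB` is dropped because its E-value of 0.5 is above 0.001.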
To build the database, we query EuropePMC with locus tags, with RefSeq protein
identifiers, and with UniProt
accessions. We obtain the locus tags from RefSeq or from MicrobesOnline. We use
queries of the form "locus_tag AND genus_name" to try to ensure that
the paper is actually discussing that gene. Because EuropePMC indexes
most recent biomedical papers, even if they are not open access, some
of the links may be to papers that you cannot read or that our
computers cannot read. We query each of these identifiers that
appears in the open access part of EuropePMC, as well as every locus
tag that appears in the 500 most-referenced genomes, so that a gene
may appear in the PaperBLAST results even though none of the papers
that mention it are open access. We also incorporate text-mined links
from EuropePMC that link open access articles to UniProt or RefSeq
identifiers. (This yields some additional links because EuropePMC
uses different heuristics for their text mining than we do.)
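The query form described above, "locus_tag AND genus_name", can be sketched as a small helper that builds a search URL. This is a hedged illustration rather than PaperBLAST's own code: the endpoint URL and the `format` parameter are assumptions about the EuropePMC REST search service, and `build_query` and `build_url` are hypothetical names. No network request is made.

```python
from urllib.parse import urlencode

# Assumed endpoint of the EuropePMC REST search service.
EUROPEPMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_query(locus_tag: str, genus: str) -> str:
    """Return a query of the form 'locus_tag AND genus_name'."""
    return f"{locus_tag} AND {genus}"


def build_url(locus_tag: str, genus: str) -> str:
    """Return a search URL for one identifier; nothing is fetched here."""
    params = {"query": build_query(locus_tag, genus), "format": "json"}
    return f"{EUROPEPMC_SEARCH}?{urlencode(params)}"


url = build_url("lmo2487", "Listeria")
```

Requiring the genus name alongside the locus tag is what makes it more likely that a match refers to the gene rather than to an unrelated string that happens to look like the identifier.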
For every article that mentions a locus tag, a RefSeq protein
identifier, or a UniProt accession, we try to select one or two
snippets of text that refer to the protein. If we cannot get access to
the full text, we try to select a snippet from the abstract, but
unfortunately, unique identifiers such as locus tags are rarely
provided in abstracts.
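Snippet selection of the kind described above can be sketched as taking a window of text around each mention of an identifier. The window width, the limit of two snippets, the overlap rule, and the name `select_snippets` are all assumptions for this example. PaperBLAST's real heuristics are not specified here and may differ.

```python
import re
from typing import List


def select_snippets(text: str, identifier: str,
                    width: int = 60, max_snippets: int = 2) -> List[str]:
    """Return up to max_snippets windows of text around mentions of identifier."""
    pattern = re.compile(r"\b" + re.escape(identifier) + r"\b")
    snippets: List[str] = []
    last_end = -1
    for m in pattern.finditer(text):
        start = max(0, m.start() - width)
        end = min(len(text), m.end() + width)
        if start < last_end:
            continue  # this mention overlaps the previous snippet
        prefix = "..." if start > 0 else ""
        suffix = "..." if end < len(text) else ""
        snippets.append(prefix + text[start:end] + suffix)
        last_end = end
        if len(snippets) == max_snippets:
            break
    return snippets


# Text taken from one of the snippets shown in the results above.
text = ("mutations in the yvlB homologue in E. faecalis, EF1753, are present "
        "in daptomycin-resistant E. faecalis strains")
snippets = select_snippets(text, "EF1753", width=20)
```

The word-boundary match keeps an identifier such as EF1753 from matching inside a longer token. The same function applied to an abstract that never mentions the identifier returns an empty list, which mirrors the limitation noted above.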
PaperBLAST also incorporates manually-curated protein functions:
- Proteins from NCBI's RefSeq are included if a
GeneRIF
entry links the gene to an article in
PubMed®.
GeneRIF also provides a short summary of the article's claim about the
protein, which is shown instead of a snippet.
- Proteins from Swiss-Prot (the curated part of UniProt)
are included if the curators
identified experimental evidence for the protein's function (evidence
code ECO:0000269). For these proteins, the fields of the Swiss-Prot entry that
describe the protein's function are shown (with bold headings).
- Proteins from BRENDA,
a curated database of enzymes, are included if they are linked to a paper in PubMed
and their full sequence is known.
- Every protein from the non-redundant subset of
BioLiP,
a database
of ligand-binding sites and catalytic residues in protein structures, is included. Since BioLiP itself
does not include descriptions of the proteins, those are taken from the
Protein Data Bank.
Descriptions from PDB rely on the original submitter of the
structure and cannot be updated by others, so they may be less reliable.
(For SitesBLAST and Sites on a Tree, we use a larger subset of BioLiP so that every
ligand is represented among a group of structures with similar sequences, but for
PaperBLAST, we use the non-redundant set provided by BioLiP.)
- Every protein from EcoCyc, a curated
database of the proteins in Escherichia coli K-12, is included, regardless
of whether they are characterized or not.
- Proteins from the MetaCyc metabolic pathway database
are included if they are linked to a paper in PubMed and their full sequence is known.
- Proteins from the Transport Classification Database (TCDB)
are included if they have known substrate(s), have reference(s),
and are not described as uncharacterized or putative.
(Some of the references are not visible on the PaperBLAST web site.)
- Every protein from CharProtDB,
a database of experimentally characterized protein annotations, is included.
- Proteins from the CAZy database of carbohydrate-active enzymes
are included if they are associated with an Enzyme Classification number.
Even though CAZy does not provide links from individual protein sequences to papers,
these should all be experimentally-characterized proteins.
- Proteins from the REBASE database
of restriction enzymes are included if they have known specificity.
- Every protein with an evidence-based reannotation (based on mutant phenotypes)
in the Fitness Browser is included.
- Sequence-specific transcription factors (including sigma factors and DNA-binding response regulators)
with experimentally-determined DNA binding sites from the
PRODORIC database of gene regulation in prokaryotes.
- Putative transcription factors from RegPrecise
that have manually-curated predictions for their binding sites. These predictions are based on
conserved putative regulatory sites across genomes that contain similar transcription factors,
so PaperBLAST clusters the TFs at 70% identity and retains just one member of each cluster.
- Coding sequence (CDS) features from the
European Nucleotide Archive (ENA)
are included if the /experiment tag is set (implying that there is experimental evidence for the annotation),
the nucleotide entry links to paper(s) in PubMed,
and the nucleotide entry is from the STD data class
(implying that these are targeted annotated sequences, not from shotgun sequencing).
Also, to filter out genes whose transcription or translation was detected, but whose function
was not studied, nucleotide entries or papers with more than 25 such proteins are excluded.
Descriptions from ENA rely on the original submitter of the
sequence and cannot be updated by others, so they may be less reliable.
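The ENA size filter in the last item above (excluding nucleotide entries or papers with more than 25 such proteins) can be sketched as follows. The record layout of `(protein_id, entry_id, pubmed_id)` tuples and the name `filter_ena` are assumptions made for illustration. The sketch counts distinct proteins per entry and per paper, which is one plausible reading of the rule.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

MAX_PROTEINS = 25  # entries or papers with more than 25 such proteins are excluded

Record = Tuple[str, str, str]  # (protein_id, entry_id, pubmed_id), an assumed layout


def filter_ena(records: Iterable[Record]) -> List[Record]:
    """Keep records whose nucleotide entry and paper each link to at most MAX_PROTEINS proteins."""
    records = list(records)
    by_entry = defaultdict(set)
    by_paper = defaultdict(set)
    for protein, entry, paper in records:
        by_entry[entry].add(protein)
        by_paper[paper].add(protein)
    return [
        (protein, entry, paper)
        for protein, entry, paper in records
        if len(by_entry[entry]) <= MAX_PROTEINS
        and len(by_paper[paper]) <= MAX_PROTEINS
    ]


# Hypothetical records: a targeted entry with 2 proteins and a large entry with 30.
records = [("p1", "E1", "100"), ("p2", "E1", "100")]
records += [(f"g{i}", "E2", "200") for i in range(30)]
kept = filter_ena(records)
```

Here the two proteins from the small entry are kept and all 30 proteins from the large entry are dropped, on the reasoning given above that large sets usually reflect detection of expression rather than a study of function.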
Except for GeneRIF and ENA,
the curated entries include a short curated
description of the protein's function.
For entries from BioLiP, the protein's function may not be known beyond binding to the ligand.
Many of these entries also link to articles in PubMed.
For more information see the
PaperBLAST paper (mSystems 2017)
or the code.
You can download PaperBLAST's database here.
Changes to PaperBLAST since the paper was written:
- November 2023: incorporated PRODORIC and RegPrecise. Many PRODORIC entries were not linked to a protein sequence (no UniProt identifier), so we added this information.
- February 2023: BioLiP changed their download format. PaperBLAST now includes their non-redundant subset. SitesBLAST and Sites on a Tree use a larger non-redundant subset that ensures that every ligand is represented within each cluster. This should ensure that every binding site is represented.
- June 2022: incorporated some coding sequences from ENA with the /experiment tag.
- March 2022: incorporated BioLiP.
- April 2020: incorporated TCDB.
- April 2019: EuropePMC now returns table entries in their search results. This has expanded PaperBLAST's database, but most of the new entries are of low relevance, and the resulting snippets are often just lists of locus tags with annotations.
- February 2018: the alignment page reports the conservation of the hit's functional sites (if available from Swiss-Prot or UniProt).
- January 2018: incorporated BRENDA.
- December 2017: incorporated MetaCyc, CharProtDB, CAZy, REBASE, and the reannotations from the Fitness Browser.
- September 2017: EuropePMC no longer returns some table entries in their search results. This has shrunk PaperBLAST's database, but has also reduced the number of low-relevance hits.
Many of these changes are described in Interactive tools for functional annotation of bacterial genomes.
PaperBLAST cannot provide snippets for many of the papers that are
published in non-open-access journals. This limitation applies even if
the paper is marked as "free" on the publisher's web site and is
available in PubMed Central or EuropePMC. If a journal that you publish
in is marked as "secret," please consider publishing elsewhere.
Many important articles are missing from PaperBLAST, either because
the article's full text is not in EuropePMC (as for many older
articles), or because the paper does not mention a protein identifier such as a locus tag, or because of PaperBLAST's heuristics. If you notice an
article that characterizes a protein's function but is missing from
PaperBLAST, please notify the curators at UniProt
or add an entry to GeneRIF.
Entries in either of these databases will eventually be incorporated
into PaperBLAST. Note that to add an entry to UniProt, you will need
to find the UniProt identifier for the protein. If the protein is not
already in UniProt, you can ask them to create an entry. To add an
entry to GeneRIF, you will need an NCBI Gene identifier, but
unfortunately many prokaryotic proteins in RefSeq do not have
corresponding Gene identifiers.
References
PaperBLAST: Text-mining papers for information about homologs.
M. N. Price and A. P. Arkin (2017). mSystems, 10.1128/mSystems.00039-17.
Europe PMC in 2017.
M. Levchenko et al (2017). Nucleic Acids Research, 10.1093/nar/gkx1005.
Gene indexing: characterization and analysis of NLM's GeneRIFs.
J. A. Mitchell et al (2003). AMIA Annu Symp Proc 2003:460-464.
UniProt: the universal protein knowledgebase.
The UniProt Consortium (2016). Nucleic Acids Research, 10.1093/nar/gkw1099.
BRENDA in 2017: new perspectives and new tools in BRENDA.
S. Placzek et al (2017). Nucleic Acids Research, 10.1093/nar/gkw952.
The EcoCyc database: reflecting new knowledge about Escherichia coli K-12.
I. M. Keseler et al (2016). Nucleic Acids Research, 10.1093/nar/gkw1003.
The MetaCyc database of metabolic pathways and enzymes.
R. Caspi et al (2018). Nucleic Acids Research, 10.1093/nar/gkx935.
CharProtDB: a database of experimentally characterized protein annotations.
R. Madupu et al (2012). Nucleic Acids Research, 10.1093/nar/gkr1133.
The carbohydrate-active enzymes database (CAZy) in 2013.
V. Lombard et al (2014). Nucleic Acids Research, 10.1093/nar/gkt1178.
The Transporter Classification Database (TCDB): recent advances.
M. H. Saier, Jr. et al (2016). Nucleic Acids Research, 10.1093/nar/gkv1103.
REBASE - a database for DNA restriction and modification: enzymes, genes and genomes.
R. J. Roberts et al (2015). Nucleic Acids Research, 10.1093/nar/gku1046.
Deep annotation of protein function across diverse bacteria from mutant phenotypes.
M. N. Price et al (2016). bioRxiv, 10.1101/072470.
by Morgan Price,
Arkin group
Lawrence Berkeley National Laboratory