PaperBLAST

PaperBLAST – Find papers about a protein or its homologs

PaperBLAST Hits for 85 a.a. (MLSFLVSLVV...)

Other sequence analysis tools:

Find functional residues: SitesBLAST

Search for conserved domains

Find the best match in UniProt

Compare to protein structures

Predict transmembrane helices: Phobius

Predict protein localization: PSORTb

Find homologs in fast.genomics

Found 24 similar proteins in the literature:

lp_0926 integral membrane protein from Lactobacillus plantarum WCFS1
56% identity, 99% coverage

Two homologous Agr-like quorum-sensing systems cooperatively control adherence, cell morphology, and cell viability properties in Lactobacillus plantarum WCFS1
Fujii, Journal of bacteriology 2008
- “...Downregulated genes lp_0023 lp_0111 lp_0525 lp_0683 lp_0885 lp_0926 lp_0927 lp_0930 lp_1703 lp_2658 lp_3045 lp_3082 lp_3084 lp_3085 lp_3087 lp_3128 lp_3267...”
- “...(from lp_1197 to lp_1205), membrane protein-encoding genes (lp_0926, lp_3575, and lp_3577), Agr-LIKE QUORUM-SENSING SYSTEMS IN L. PLANTARUM VOL. 190, 2008 7661...”
An agr-like two-component regulatory system in Lactobacillus plantarum is involved in production of a novel cyclic peptide and regulation of adherence
Sturme, Journal of bacteriology 2005
- “...as genes encoding integral membrane proteins (e.g., lp_0926, lp_3575, and lp_3577). Cluster 3 encompassed constitutively up-regulated genes, with the highest...”
- “...ORFa Cluster 2 lp_0525 lp_0526 lp_0683 lp_0684 lp_0925 lp_0926 lp_0927 lp_0928 lp_0929 lp_0930 lp_0931 sacK1 pts1BCA sacA sacR agl2 treA pts4ABC pts7C galT...”

SAR0392 putative membrane protein from Staphylococcus aureus subsp. aureus MRSA252
46% identity, 99% coverage

The Staphylococcus aureus response to unsaturated long chain free fatty acids: survival mechanisms and virulence implications
Kenny, PloS one 2009
- “...hypothetical protein 2.05 4.99E-02 SAR0305 putative membrane protein 3.89 6.02E-03 SAR0390 putative lipoprotein 3.97 1.68E-03 SAR0392 putative membrane protein 2.54 1.20E-02 SAR0405 hypothetical protein 2.76 1.07E-02 SAR0444 putative lipoprotein 2.31 2.16E-03 SAR0498 yabJ putative regulatory protein 3.65 1.07E-03 SAR0499 spoVG stage V sporulation protein G 2.83...”

SAOUHSC_00358 hypothetical protein from Staphylococcus aureus subsp. aureus NCTC 8325
SA0360 hypothetical protein from Staphylococcus aureus subsp. aureus N315
SAV0374 hypothetical protein from Staphylococcus aureus subsp. aureus Mu50
SAUSA300_0374 hypothetical protein from Staphylococcus aureus subsp. aureus USA300_FPR3757
NWMN_0366 hypothetical protein from Staphylococcus aureus subsp. aureus str. Newman
USA300HOU_0397 hypothetical membrane protein from Staphylococcus aureus subsp. aureus USA300_TCH1516
46% identity, 98% coverage

Lysogenization of Staphylococcus aureus RN450 by phages ϕ11 and ϕ80α leads to the activation of the SigB regulon
Fernández, Scientific reports 2018
- “...SAOUHSC_00257 0.48 0.28 SAOUHSC_00291 2.32 2.18 Up SAOUHSC_00317 2.81 3.78 Up SAOUHSC_00356 3.96 11.49 Up SAOUHSC_00358 4.67 12.22 Up SAOUHSC_00401 0.33 0.16 SAOUHSC_00619 5.66 21.10 Up SAOUHSC_00624 2.72 7.23 Up SAOUHSC_00625 mnhA2 2.10 4.29 Up SAOUHSC_00626 mnhB2 2.14 3.70 Up SAOUHSC_00627 mnhC2 2.13 3.97 Up SAOUHSC_00628...”
Comparative genomic analysis of European and Middle Eastern community-associated methicillin-resistant Staphylococcus aureus (CC80:ST80-IV) isolates by high-density microarray
Goering, Clinical microbiology and infection : the official publication of the European Society of Clinical Microbiology and Infectious Diseases 2009
- “...SA0331 Hypothetical protein + + + USA300 USA400 SA0359 Hypothetical protein + + USA300 USA400 SA0360 Conserved hypothetical protein + + USA300 USA400 SA0397 Conserved hypothetical protein + + + + + USA300 USA400 SA0406 Hypothetical protein + + + + + USA300+ USA400 sdrD SA0520...”
Exploring the transcriptome of Staphylococcus aureus in its natural niche
Chaves-Moreno, Scientific reports 2016
- “...vivo compared to in vivo (up to 11,970rpm). Evidently, also genes encoding various membrane proteins (SAV0374, SAV0574, SAV1030 and SAV1359) were extremely differently expressed under in vivo versus in vitro 24 conditions ( Supplementary Dataset S2 and Fig. 3D ). A further protein where the encoding...”
The msaABCR Operon Regulates the Response to Oxidative Stress in Staphylococcus aureus
Pandey, Journal of bacteriology 2019 (secret)
Transcriptional Response of Staphylococcus aureus to Sunlight in Oxic and Anoxic Conditions
McClary, Frontiers in microbiology 2018
- “...complex 0.25 2.5 NWMN_0163 Conserved hypothetical protein 0.24 9.9 NWMN_1371 Conserved hypothetical protein 0.24 7.2 NWMN_0366 Conserved hypothetical protein 0.24 6.4 NWMN_2392 Conserved hypothetical protein 0.24 12.6 NWMN_2282 Conserved hypothetical protein 0.23 5.0 NWMN_1477 Conserved hypothetical protein 0.23 10.1 clfA Clumping factor A 0.22 1.8 hutG...”
Pre-epidemic evolution of the MRSA USA300 clade and a molecular key for classification
Bianco, Frontiers in cellular and infection microbiology 2023
- “...AdhR, PchA, HisG SNPs recN (1638449) , leuS (1888283) USA300HOU_0191 (202764) , intergenic (265666) , USA300HOU_0397 (424978) , argS (670365) , intergenic (835434) , USA300HOU_0795 (850349) vwb (876702) , USA300HOU_0938 (982790) , oppD1 (990291) , ebh (1488257) , rluB (1611873) , comGA (1657533) , alaS (1723795)...”

DV527_RS10290 GlsB/YeaQ/YmgE family stress response membrane protein from Staphylococcus saprophyticus
47% identity, 98% coverage

Transcriptome Analysis of Halotolerant Staphylococcus saprophyticus Isolated from Korean Fermented Shrimp
Jo, Foods (Basel, Switzerland) 2022
- “...stress protein 1.24 3.09 1.86 DV527_RS03625 Asp23/Gls24 family envelope stress response protein 1.55 2.87 1.33 DV527_RS10290 GlsB/YeaQ/YmgE family stress response membrane protein 2.21 2.69 0.49 DV527_RS11920 General stress protein 0.52 1.54 1.02 DV527_RS11570 50S ribosomal protein L25/general stress protein Ctc 0.31 0.96 1.27 DV527_RS06680 Asp23/Gls24 family...”

DMB76_011110 GlsB/YeaQ/YmgE family stress response membrane protein from Staphylococcus saccharolyticus
48% identity, 74% coverage

Biofilm formation and inflammatory potential of Staphylococcus saccharolyticus: A possible cause of orthopedic implant-associated infections
Afshar, Frontiers in microbiology 2022
- “...DMB76_000275 hypothetical protein 8801.7 2.8 DMB76_011400 type I toxin-antitoxin system Fst family toxin 29.8 2.7 DMB76_011110 GlsB/YeaQ/YmgE family stress response membrane protein 8802.0 2.6 DMB76_000660 ABC transporter ATP-binding protein 33.8 2.6 DMB76_008440 gallidermin family lantibiotic 105.0 2.4 DMB76_010640 septum formation initiator family protein 742.6 2.3 DMB76_001580...”

LSA0166 Hypothetical Integral membrane protein from Lactobacillus sakei subsp. sakei 23K
47% identity, 75% coverage

Global transcriptome response in Lactobacillus sakei during growth on ribose
McLeod, BMC microbiology 2011
- “...precursor -0.5 LSA0106 lsa0106 Hypothetical cell surface protein precursor 0.5 LSA0160 lsa0160 Hypothetical protein -0.7 LSA0166 lsa0166 Hypothetical Integral membrane protein -1.2 LSA0190 lsa0190 Hypothetical integral membrane protein -0.7 -0.6 LSA0191 lsa0191 Hypothetical integral membrane protein -0.6 -0.6 LSA0199 lsa0199 Hypothetical protein 1.1 1.0 1.1 LSA0208...”

BC1000 hypothetical Membrane Spanning Protein from Bacillus cereus ATCC 14579
39% identity, 96% coverage

Identification of a conserved 5'-dRP lyase activity in bacterial DNA repair ligase D and its potential role in base excision repair
de, Nucleic acids research 2016
- “...those with the chromosomal-encoded neo gene between wild type (wt) ykoU and ykoT genes (strain BC1000) or between ykoUE184A and ykoT genes (strain BC1001) (Supplementary Table S1). GP1502 DNA was used to transform BC1000 strain to render the BC1002 strain. Plasmid-borne ykoUE184A neo ykoT operon was...”
Correction: SecDF as Part of the Sec-Translocase Facilitates Efficient Secretion of Bacillus cereus Toxins and Cell Wall-Associated Proteins
, PloS one 2014
- “...Catalase 13.31 4.2E-06 BC0998 General stress protein 17M 11.41 2.1E-08 BC0999 hypothetical protein 12.27 2.8E-07 BC1000 hypothetical Membrane Spanning Protein 12.54 6.7E-06 BC1002 Anti-sigma B factor antagonist 5.36 2.5E-06 BC1003 Anti-sigma B factor 8.97 1.4E-06 BC1004 RNA polymerase sigma-B factor 7.84 1.8E-06 BC1010 hypothetical protein 10.61...”
SecDF as part of the Sec-translocase facilitates efficient secretion of Bacillus cereus toxins and cell wall-associated proteins
Vörös, PloS one 2014
- “...Catalase 13.31 4.2E-06 BC0998 General stress protein 17M 11.41 2.1E-08 BC0999 hypothetical protein 12.27 2.8E-07 BC1000 hypothetical Membrane Spanning Protein 12.54 6.7E-06 BC1002 Anti-sigma B factor antagonist 5.36 2.5E-06 BC1003 Anti-sigma B factor 8.97 1.4E-06 BC1004 RNA polymerase sigma-B factor 7.84 1.8E-06 BC1010 hypothetical protein 10.61...”
SpoIVA and SipL are Clostridium difficile spore morphogenetic proteins
Putnam, Journal of bacteriology 2013
- “...at 4C. Samples were dialyzed against 2 M guanidine HCl in BC1000 (20 mM Tris [pH 7.4], 0.2 mM EDTA, 20% [vol/vol] glycerol, 1 M KCl) plus 0.1% (vol/vol) NP-40...”
- “...0.1% (vol/vol) NP-40 for 1.5 h at 4C, and again against BC1000 plus 0.1% (vol/vol) NP-40 for 1.5 h at 4C and centrifuged at 15,000 rpm for 30 min at 4C....”
Bacillus cereus cell response upon exposure to acid environment: toward the identification of potential biomarkers
Desriac, Frontiers in microbiology 2013
- “...BC0995 Hypothetical protein BC0996 Hypothetical protein BC0998 yflT General stress protein BC0999 csbD Hypothetical protein BC1000 Hypothetical protein BC1001 Hypothetical protein BC1002 rsbV Anti- B factor antagonist BC1003 rsbW Anti- B factor BC1004 sigB RNA polymerase sigma factor B BC1005 orf4 Putative bacterioferritin BC1006 rsbY PP2C-type...”
- “...rsbV, rsbW and sigB, orf4 , and rsbY , as well as the two genes BC1000 and BC1009 are up-regulated (1.5 fold in both conditions). In the same way, BC0862 and BC0998 genes are over-expressed: the first one encoding the YflT protein is known to be...”
Identification of the sigmaB regulon of Bacillus cereus and conservation of sigmaB-regulated genes in low-GC-content gram-positive bacteria
van, Journal of bacteriology 2007
- “...Berkeley bc0862 bc0863 bc0995 bc0996 bc0998 bc0999d bc1000 bc1001 bc1002 bc1003 bc1004 Experimentally defined and/or predicted promoter sequencec 4388 VAN...”
- “...has a role in hyperosmotic and cold stress (10); and bc1000, which is homologous to the GlsB protein of Enterococcus faecalis, where it has a role in resistance...”

EF0081 membrane protein, putative from Enterococcus faecalis V583
49% identity, 91% coverage

Functional studies of E. faecalis RNase J2 and its role in virulence and fitness
Gao, PloS one 2017
- “...hypothetical membrane protein -4.10 0.002 ef0079 gls24 protein -4.71 0.004 ef0080 glsB protein -5.68 0.012 ef0081 conserved hypothetical protein -5.06 0.013 ef0082 transporter, putative -6.34 0.020 ef0083 hypothetical protein -11.95 0.042 ef0103 transcriptional regulator, putative -3.73 0.030 ef0104 arginine deiminase -3.66 0.020 ef0105 ornithine carbamoyltransferase, catabolic...”
Intra- and interspecies genomic transfer of the Enterococcus faecalis pathogenicity island
Laverde, PloS one 2011
- “...EF0078 + + + + EF0079 + + + + EF0080 + + + + EF0081 + + + + EF0082 + + + + 6a + - - - EF0083 + + + + EF0084 + - - - 6b + - - - EF0087...”

LBA0872 hypothetical protein from Lactobacillus acidophilus NCFM
44% identity, 98% coverage

Deletion of Lipoteichoic Acid Synthase Impacts Expression of Genes Encoding Cell Surface Proteins in Lactobacillus acidophilus
Selle, Frontiers in microbiology 2017
- “...LBA1220 Pyridine mercuric reductase -2.2 to -4.9 LBA1801 Hypothetical protein -1.1 to -2.2 Upregulated genes LBA0872 Hypothetical protein 1.1 to 1.7 LBA0873 Hypothetical protein 1.3 to 1.8 LBA1045 Glutamine ABC transporter ATP-binding protein 1.1 to 2.0 LBA1140 Lysin 1.2 to 1.5 LBA1184 Hypothetical protein 1.1 to...”
Transcriptional and functional analysis of oxalyl-coenzyme A (CoA) decarboxylase and formyl-CoA transferase genes from Lactobacillus acidophilus
Azcarate-Peril, Applied and environmental microbiology 2006
- “...genome contains genes encoding two bile salt hydrolases, LBA0872 (bsh1) and LBA1078 (bsh2). Therefore, we designed RT-QPCR primers for bsh1 and bsh2 and...”
Microarray analysis of a two-component regulatory system involved in acid resistance and proteolytic activity in Lactobacillus acidophilus
Azcarate-Peril, Applied and environmental microbiology 2005
- “...only [S], [R] LBA0555 myosin-crossreactive antigen LBA0872 putative membrane protein LBA1119 putative inner membrane protein LBA1869 beta-phosphoglucomutase...”

SPy1768 conserved hypothetical protein from Streptococcus pyogenes M1 GAS
47% identity, 85% coverage

Global Analysis and Comparison of the Transcriptomes and Proteomes of Group A Streptococcus Biofilms
Freiberg, mSystems 2016
- “...1.31 Thioredoxin O Spy1734 6.49 6.97 5.64 3.59 4.07 2.74 3.54 4.02 2.69 Streptopain inhibitor Spy1768 ahpC 2.26 2.41 2.62 2.04 2.19 2.4 0.94 1.1 1.31 Peroxiredoxin reductase [NAD(P)H] V Downregulated Spy0249 oppA 2.23 2.55 2.79 1.96 2.28 2.52 0.77 1.09 1.33 Oligopeptide-binding protein E Spy1076...”

EF2708 membrane protein, putative from Enterococcus faecalis V583
42% identity, 92% coverage

Isolation of VanB-type Enterococcus faecalis strains from nosocomial infections: first report of the isolation and identification of the pheromone-responsive plasmids pMG2200, Encoding VanB-type vancomycin resistance and a Bac41-type bacteriocin, and pMG2201, encoding erythromycin resistance and cytolysin (Hly/Bac)
Zheng, Antimicrobial agents and chemotherapy 2009
- “...45 69 78 CCW 71425 71183 243/80 EF2708 72 87 Enterococcus faecalis Enterococcus faecalis Enterococcus faecalis V583 Exiguobacterium sibiricum Enterococcus...”

SP_0279 hypothetical protein from Streptococcus pneumoniae TIGR4
48% identity, 74% coverage

Genome-wide identification of Streptococcus pneumoniae genes essential for bacterial replication during experimental meningitis
Molzen, Infection and immunity 2011
- “...Hypothetical SP_0029 SP_0067 SP_0098 SP_0099 SP_0198 SP_0276 SP_0279 SP_0552 SP_0649 SP_0748 SP_0822 SP_1025 SP_1059 SP_1465 SP_1635 SP_1931 SP_1995 SP_2098...”

LLKF_0277 hypothetical protein from Lactococcus lactis subsp. lactis KF147
50% identity, 68% coverage

Lactococcus lactis metabolism and gene expression during growth on plant tissues
Golomb, Journal of bacteriology 2015
- “...upregulated were both copies of the ymgGHIJ genes (llkf_0277 to -0280 and llkf_2281 to -2284), which were previously correlated with oxidative stress tolerance...”

AWJ25_RS06350 GlsB/YeaQ/YmgE family stress response membrane protein from Enterococcus faecium
49% identity, 62% coverage

Gene Duplications in the Genomes of Staphylococci and Enterococci
Sanchez-Herrero, Frontiers in molecular biosciences 2020
- “...are also duplicated in other E. faecium strains. Group Locus Tag 1 Description Percentage 103 AWJ25_RS06350 GlsB/YeaQ/YmgE family stress response membrane protein 99.25% 104 AWJ25_RS07455 LysM peptidoglycan binding domain containing protein 99.25% 109 AWJ25_RS09645 PTS lactose/cellobiose transporter subunit IIA 95.49% 110 AWJ25_RS09650 PTS sugar transporter subunit...”

BAS2692 conserved hypothetical protein from Bacillus anthracis str. Sterne
AW20_5555 GlsB/YeaQ/YmgE family stress response membrane protein from Bacillus anthracis str. Sterne
48% identity, 91% coverage

Beyond the spore, the exosporium sugar anthrose impacts vegetative Bacillus anthracis gene regulation in cis and trans
Norris, Scientific reports 2023
- “...membrane protein YwiC 1.01 AW20_4184 Hypothetical protein 1.03 AW20_983 BAS1679 Uncharacterized protein YndH 1.06 AW20_5555 BAS2692 Integral membrane protein; YeaQ/YmgE family 1.06 AW20_728 BAS1929 Phosphoglycerate mutase family 3 1.06 AW20_1125 BAS1547 Chemotaxis protein methyltransferase CheR 1.11 AW20_1033 BAS1630 Transcriptional regulator, GntR family 1.12 AW20_4348 BAS3861 UPF0358...”

SPy1265 conserved hypothetical protein from Streptococcus pyogenes M1 GAS
M6_Spy0965 Integral membrane protein from Streptococcus pyogenes MGAS10394
36% identity, 91% coverage

Global Analysis and Comparison of the Transcriptomes and Proteomes of Group A Streptococcus Biofilms
Freiberg, mSystems 2016
- “...DNA transport R Spy1168 2.75 2.3 2.87 1.86 1.41 1.98 1.56 1.11 1.68 Phage protein Spy1265 1.06 0.95 1.65 1.63 1.51 2.21 1.23 1.11 1.81 Ribose operon repressor K Spy1282 msrA 2.97 3.2 2.78 1.85 2.07 1.66 1.45 1.68 1.26 Peptide methionine sulfoxide reductase MsrA/MsrB O...”
Comparative growth, cross stress resistance, transcriptomics of Streptococcus pyogenes cultured under low shear modeled microgravity and normal gravity
Kalpana, Saudi journal of biological sciences 2016
- “...M6_Spy0900 1.52 Hypothetical protein M6_Spy0903 1.52 Phage transcriptional repressor M6_Spy0935 1.58 30S ribosomal protein S20 M6_Spy0965 2.41 Integral membrane protein M6_Spy0984 1.56 Sla M6_Spy0994 1.57 Phage endopeptidase M6_Spy1007 1.53 ATP-dependent Clp protease proteolytic subunit M6_Spy1106 1.56 Transcriptional regulator, Cro/CI family M6_Spy1117 1.50 Hypothetical protein M6_Spy1194 1.52...”

L67002 HYPOTHETICAL PROTEIN from Lactococcus lactis subsp. lactis Il1403
49% identity, 67% coverage

Genotype-phenotype matching analysis of 38 Lactococcus lactis strains using random forest methods
Bayjanov, BMC microbiology 2013
- “...therefore the ability to metabolize arginine. A cluster of 4 genes (L65637, L66209, L66407 and L67002 in strain IL1403, and their orthologs) was identified to be relevant to arginine metabolism (Figure 4 A). All 4 proteins are annotated as hypothetical proteins in strain IL1403 and two...”
- “...two encoded proteins, llmg_1257 and llmg_1259, are in the same COGs with proteins L66209 and L67002 of strain IL1403. The protein L67002 belongs to a family of membrane proteins of which some are glycosyltransferase-associated proteins. Probably, at least two of these proteins, L66209 and L67002, and...”

LSEI_2880 Predicted membrane protein from Lactobacillus casei ATCC 334
54% identity, 66% coverage

New Genes Involved in Mild Stress Response Identified by Transposon Mutagenesis in Lactobacillus paracasei
Palud, Frontiers in microbiology 2018
- “...E LSEI_2739 Zn-dependent hydrolase H LSEI_2787 NADPH:quinone reductase related Zn-dependent oxidoreductase H LSEI_2806 HP O LSEI_2880 Membrane protein H (P) LSEI_2884 Esterase/lipase H LSEI_A15 HP H LSEI_r1832 23S ribosomal RNA E (P) LSEI_t0720 tRNA H Total 77 41 18 11 15 Genes and putative promoters in...”

llmg_1257 hypothetical protein from Lactococcus lactis subsp. cremoris MG1363
LLKF_2284 hypothetical protein from Lactococcus lactis subsp. lactis KF147
49% identity, 67% coverage

Genotype-phenotype matching analysis of 38 Lactococcus lactis strains using random forest methods
Bayjanov, BMC microbiology 2013
- “...also identified to be related to arginine metabolism (Figure 4 B), and two encoded proteins, llmg_1257 and llmg_1259, are in the same COGs with proteins L66209 and L67002 of strain IL1403. The protein L67002 belongs to a family of membrane proteins of which some are glycosyltransferase-associated...”
- “...proteins. Probably, at least two of these proteins, L66209 and L67002, and their MG1363 orthologs, llmg_1257 and llmg_1259, should be re-annotated as transport proteins or maybe more specifically arginine transport proteins. However, experimental validation is necessary. Figure 4 Genes related to arginine metabolism. A ) Two...”
Molecular description and industrial potential of Tn6098 conjugative transfer conferring alpha-galactoside metabolism in Lactococcus lactis
Machielsen, Applied and environmental microbiology 2011
- “...LLKF_2278 LLKF_2279 LLKF_2280 LLKF_2281 LLKF_2282 LLKF_2283 LLKF_2284 LLKF_2285 LLKF_2286 LLKF_2287 LLKF_t0054 2326054 2327241 2329454 2330622 2332669 2332809...”

UC7_RS15700 GlsB/YeaQ/YmgE family stress response membrane protein from Enterococcus caccae ATCC BAA-1240
47% identity, 82% coverage

Apigenin Impacts the Growth of the Gut Microbiota and Alters the Gene Expression of Enterococcus
Wang, Molecules (Basel, Switzerland) 2017
- “...protection protein 2.0 Ribosomal protection UC7_RS14785 hypothetical protein 1.9 Unknown UC7_RS11535 hypothetical protein 1.9 Unknown UC7_RS15700 general stress protein GlsB 1.9 Stress response UC7_RS14775 hypothetical protein 1.8 Unknown UC7_RS15645 hypothetical protein 1.8 Unknown UC7_RS16575 hypothetical protein 1.8 Unknown UC7_RS16225 WxL domain surface protein 1.7 Surface protein...”

SP_1801 hypothetical protein from Streptococcus pneumoniae TIGR4
53% identity, 51% coverage

The Streptococcus pneumoniae transcriptome in patient cerebrospinal fluid identifies novel virulence factors required for meningitis
Wall, 2024

LLKF_2085 hypothetical protein from Lactococcus lactis subsp. lactis KF147
46% identity, 67% coverage

Strain-Dependent Transcriptome Signatures for Robustness in Lactococcus lactis
Dijkstra, PloS one 2016
- “...rarA ArsR family transcriptional regulator positive 0.7 LLKF_0447 yeaA beta-lactamase superfamily Zn-dependent hydrolase positive 6.0 LLKF_2085 ytgB hypothetical protein positive 17.7 LLKF_1563 bglH beta-glucosidase/ 6-phospho-beta-glucosidase positive 0.4 LLKF_1820 yrbB transglycosylase positive 26.3 LLKF_2083 hypothetical protein positive 15.2 LLKF_2084 ytgA hypothetical protein positive 14.1 LLKF_1723 excisionase positive...”

M5005_Spy_0976 integral membrane protein from Streptococcus pyogenes MGAS5005
44% identity, 56% coverage

A genome-wide analysis of small regulatory RNAs in the human pathogen group A Streptococcus
Perez, PloS one 2009
- “...to cdd and the 16S rRNA methyltransferase M SR961800 961800 962000 200 < Adjacent to M5005_Spy_0976 and pcrA M SR969000 969000 969100 100 ? Adjacent to cfa and a histidine-binding protein M SR1016300 1016300 1016500 200 > Prophage-encoded M SR1018400 1018400 1018500 100 > Prophage-encoded M...”

lp_3577 integral membrane protein from Lactobacillus plantarum WCFS1
50% identity, 48% coverage

Expression of heterologous sigma factors enables functional screening of metagenomic and heterologous genomic libraries
Gaida, Nature communications 2015
- “...found genes encoding transporters ( araP , lp_3563 and lp_3565), two membrane proteins (lp_3575 and lp_3577), proteins associated with energy metabolism ( lox and pox4 ), as well as two proteins (catalase ( kat )) and a heat-shock protein ( clpL )) involved in stress response....”
Two homologous Agr-like quorum-sensing systems cooperatively control adherence, cell morphology, and cell viability properties in Lactobacillus plantarum WCFS1
Fujii, Journal of bacteriology 2008
- “...lp_3082 lp_3084 lp_3085 lp_3087 lp_3128 lp_3267 lp_3420 lp_3575 lp_3577 lp_3578 lp_3579 lp_3580 lp_3582 lp_3583 lp_3586 a Gene cps2A cps2B galE2 cps2E cps2F...”
- “...to lp_1205), membrane protein-encoding genes (lp_0926, lp_3575, and lp_3577), Agr-LIKE QUORUM-SENSING SYSTEMS IN L. PLANTARUM VOL. 190, 2008 7661 TABLE 4. Genes...”
Identification of prebiotic fructooligosaccharide metabolism in Lactobacillus plantarum WCFS1 through microarrays
Saulnier, Applied and environmental microbiology 2007
- “...lp_2113 lp_3243 lp_3250 lp_3318 lp_3433 lp_3489 lp_3577 Unknown Oxidoreductase Unknown Unknown Unknown Oxidoreductase Unknown Unknown Unknown Unknown Unknown...”
An agr-like two-component regulatory system in Lactobacillus plantarum is involved in production of a novel cyclic peptide and regulation of adherence
Sturme, Journal of bacteriology 2005
- “...integral membrane proteins (e.g., lp_0926, lp_3575, and lp_3577). Cluster 3 encompassed constitutively up-regulated genes, with the highest effect in early...”
- “...asp2 hpaG lp_2658 lp_2743 lp_2744 lp_3045 lp_3047 lp_3575 lp_3577 lp_3578 lp_3580 kat lamA lp_3581 lp_3581a lamC lamD lp_3582 lp_3583 lp_3586 lamB clpL lox...”

For advice on how to use these tools together, see Interactive tools for functional annotation of bacterial genomes.

Statistics

The PaperBLAST database links 798,070 different protein sequences to 1,261,478 scientific articles. Searches against EuropePMC were last performed on May 12, 2025.

How It Works

PaperBLAST builds a database of protein sequences that are linked to scientific articles. These links come from automated text searches against the articles in EuropePMC and from manually-curated information from GeneRIF, UniProtKB/Swiss-Prot, BRENDA, CAZy (as made available by dbCAN), BioLiP, CharProtDB, MetaCyc, EcoCyc, TCDB, REBASE, the Fitness Browser, PRODORIC, RegPrecise, and a subset of the European Nucleotide Archive with the /experiment tag. Given this database and a protein sequence query, PaperBLAST uses protein-protein BLAST to find similar sequences with E < 0.001.
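
As an illustration of the E < 0.001 cutoff, the Python sketch below filters BLAST hits and computes the identity and coverage values shown next to each hit above. It assumes the hits come from BLAST+ tabular output (-outfmt 6 with the default 12 columns) and that "coverage" means the percentage of the query covered by the alignment; the function name parse_hits and the example hit IDs are hypothetical, and PaperBLAST's own code may compute these values differently.

```python
from dataclasses import dataclass

# Column order of BLAST+ tabular output (-outfmt 6, default fields).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

@dataclass
class Hit:
    subject: str
    identity: float   # percent identity over the aligned region
    coverage: float   # percent of the query covered by the alignment (assumed definition)
    evalue: float

def parse_hits(lines, query_length, max_evalue=1e-3):
    """Keep hits with E < max_evalue and report identity and query coverage."""
    hits = []
    for line in lines:
        if not line.strip():
            continue
        row = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
        evalue = float(row["evalue"])
        if evalue >= max_evalue:  # strict cutoff: E must be below 0.001
            continue
        aligned = int(row["qend"]) - int(row["qstart"]) + 1
        hits.append(Hit(subject=row["sseqid"],
                        identity=float(row["pident"]),
                        coverage=100.0 * aligned / query_length,
                        evalue=evalue))
    # Most significant (lowest E-value) hits first
    return sorted(hits, key=lambda h: h.evalue)
```

For an 85-residue query, parse_hits(open("hits.tsv"), query_length=85) would return only the hits below the cutoff, best first (hits.tsv is a hypothetical file name).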

To build the database, we query EuropePMC with locus tags, with RefSeq protein identifiers, and with UniProt accessions. We obtain the locus tags from RefSeq or from MicrobesOnline. We use queries of the form "locus_tag AND genus_name" to try to ensure that the paper is actually discussing that gene. Because EuropePMC indexes most recent biomedical papers, even if they are not open access, some of the links may be to papers that you cannot read or that our computers cannot read. We query every such identifier that appears in the open access part of EuropePMC, as well as every locus tag that appears in the 500 most-referenced genomes, so a gene may appear in the PaperBLAST results even if none of the papers that mention it are open access. We also incorporate text-mined links from EuropePMC that connect open access articles to UniProt or RefSeq identifiers. (This yields some additional links because EuropePMC's text-mining heuristics differ from ours.)
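
A minimal sketch of how a "locus_tag AND genus_name" query could be assembled, assuming the genus is simply the first word of the organism name. The Europe PMC REST search URL and its query/format parameters are assumptions to check against the Europe PMC web-service documentation; the helper names are hypothetical, and the sketch only builds the URL rather than sending the request.

```python
from urllib.parse import urlencode

# Assumed Europe PMC REST search endpoint (verify against the official docs).
EUROPEPMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"

def locus_tag_query(locus_tag, organism):
    """Build a 'locus_tag AND genus_name' query; genus = first word of organism."""
    genus = organism.split()[0]
    return f"{locus_tag} AND {genus}"

def search_url(query):
    """Encode the query as a GET URL that asks for JSON results."""
    params = {"query": query, "format": "json"}
    return EUROPEPMC_SEARCH + "?" + urlencode(params)
```

For example, locus_tag_query("lp_0926", "Lactobacillus plantarum WCFS1") gives "lp_0926 AND Lactobacillus". Requiring the genus name filters out papers where the same string happens to be an identifier in an unrelated organism.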

For every article that mentions a locus tag, a RefSeq protein identifier, or a UniProt accession, we try to select one or two snippets of text that refer to the protein. If we cannot get access to the full text, we try to select a snippet from the abstract, but unfortunately, unique identifiers such as locus tags are rarely provided in abstracts.
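
Snippet selection could look roughly like the following sketch, which assumes a simple heuristic: find whole-word mentions of the identifier and keep a fixed-width window of text around each of the first two non-overlapping mentions. The function select_snippets and its width parameter are hypothetical; PaperBLAST's actual heuristics may differ.

```python
import re

def select_snippets(text, identifier, width=80, max_snippets=2):
    """Return up to max_snippets text windows around whole-word mentions of identifier."""
    # Lookarounds stop "lp_0926" from matching inside "lp_09261".
    pattern = re.compile(r"(?<!\w)" + re.escape(identifier) + r"(?!\w)")
    snippets = []
    last_end = -1
    for match in pattern.finditer(text):
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
        if start < last_end:
            continue  # window overlaps the previous snippet; skip this mention
        prefix = "..." if start > 0 else ""
        suffix = "..." if end < len(text) else ""
        snippets.append(prefix + text[start:end].strip() + suffix)
        last_end = end
        if len(snippets) == max_snippets:
            break
    return snippets
```

Because locus tags are rarely in abstracts, running this on an abstract will often return an empty list, which matches the limitation described above.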

PaperBLAST also incorporates manually-curated protein functions:

Proteins from NCBI's RefSeq are included if a GeneRIF entry links the gene to an article in PubMed. GeneRIF also provides a short summary of the article's claim about the protein, which is shown instead of a snippet.
Proteins from Swiss-Prot (the curated part of UniProt) are included if the curators identified experimental evidence for the protein's function (evidence code ECO:0000269). For these proteins, the fields of the Swiss-Prot entry that describe the protein's function are shown (with bold headings).
Proteins from BRENDA, a curated database of enzymes, are included if they are linked to a paper in PubMed and their full sequence is known.
Every protein from the non-redundant subset of BioLiP, a database of ligand-binding sites and catalytic residues in protein structures, is included. Since BioLiP itself does not include descriptions of the proteins, those are taken from the Protein Data Bank. Descriptions from PDB rely on the original submitter of the structure and cannot be updated by others, so they may be less reliable. (For SitesBLAST and Sites on a Tree, we use a larger subset of BioLiP so that every ligand is represented among a group of structures with similar sequences, but for PaperBLAST, we use the non-redundant set provided by BioLiP.)
Every protein from EcoCyc, a curated database of the proteins in Escherichia coli K-12, is included, whether or not it has been characterized.
Proteins from the MetaCyc metabolic pathway database are included if they are linked to a paper in PubMed and their full sequence is known.
Proteins from the Transport Classification Database (TCDB) are included if they have known substrate(s), have reference(s), and are not described as uncharacterized or putative. (Some of the references are not visible on the PaperBLAST web site.)
Every protein from CharProtDB, a database of experimentally characterized protein annotations, is included.
Proteins from the CAZy database of carbohydrate-active enzymes are included if they are associated with an Enzyme Classification number. Even though CAZy does not provide links from individual protein sequences to papers, these should all be experimentally-characterized proteins.
Proteins from the REBASE database of restriction enzymes are included if they have known specificity.
Every protein with an evidence-based reannotation (based on mutant phenotypes) in the Fitness Browser is included.
Sequence-specific transcription factors (including sigma factors and DNA-binding response regulators) from the PRODORIC database of gene regulation in prokaryotes are included if they have experimentally-determined DNA binding sites.
Putative transcription factors from RegPrecise are included if they have manually-curated predictions for their binding sites. These predictions are based on conserved putative regulatory sites across genomes that contain similar transcription factors, so PaperBLAST clusters the TFs at 70% identity and retains just one member of each cluster.
Coding sequence (CDS) features from the European Nucleotide Archive (ENA) are included if the /experiment tag is set (implying that there is experimental evidence for the annotation), the nucleotide entry links to paper(s) in PubMed, and the nucleotide entry is from the STD data class (implying that these are targeted annotated sequences, not from shotgun sequencing). Also, to filter out genes whose transcription or translation was detected, but whose function was not studied, nucleotide entries or papers with more than 25 such proteins are excluded. Descriptions from ENA rely on the original submitter of the sequence and cannot be updated by others, so they may be less reliable.
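
The ENA inclusion rules in the last item above can be expressed as a filter. This sketch assumes each CDS has already been parsed into the hypothetical EnaCds record, and it interprets the 25-protein limit as dropping any nucleotide entry with more than 25 qualifying CDS features and ignoring any paper linked to more than 25 of them; the real pipeline may apply the limit differently.

```python
from collections import Counter
from dataclasses import dataclass, field

MAX_PROTEINS = 25  # threshold stated in the text

@dataclass
class EnaCds:
    protein_id: str
    entry_id: str         # nucleotide entry the CDS belongs to
    data_class: str       # ENA data class, e.g. "STD" or "WGS"
    has_experiment: bool  # True if the /experiment qualifier is set
    pubmed_ids: list = field(default_factory=list)

def select_ena_cds(records):
    """Return (protein_id, kept_pubmed_ids) pairs that pass the inclusion rules."""
    # Basic criteria: /experiment set, linked to PubMed, STD data class.
    candidates = [r for r in records
                  if r.has_experiment and r.pubmed_ids and r.data_class == "STD"]
    # Count qualifying CDS features per nucleotide entry and per paper.
    per_entry = Counter(r.entry_id for r in candidates)
    per_paper = Counter(p for r in candidates for p in set(r.pubmed_ids))
    kept = []
    for r in candidates:
        if per_entry[r.entry_id] > MAX_PROTEINS:
            continue  # entry looks like a bulk detection study
        papers = [p for p in r.pubmed_ids if per_paper[p] <= MAX_PROTEINS]
        if papers:
            kept.append((r.protein_id, papers))
    return kept
```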

Except for GeneRIF and ENA, the curated entries include a short curated description of the protein's function. For entries from BioLiP, the protein's function may not be known beyond binding to the ligand. Many of these entries also link to articles in PubMed.
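
The RegPrecise item above says that transcription factors are clustered at 70% identity and one member of each cluster is kept. One common way to do this is greedy representative clustering, sketched below. The method is an assumption, since the text does not name it, and the identity function is a toy that requires pre-aligned, equal-length sequences; a real pipeline would compute identity from pairwise alignments.

```python
def identity(a, b):
    """Fraction of identical positions in two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    # Ignore columns where both sequences have a gap.
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" or y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y and x != "-" for x, y in pairs) / len(pairs)

def greedy_cluster(seqs, threshold=0.70):
    """Each sequence joins the first representative it matches at >= threshold;
    otherwise it becomes a new representative. Returns {representative: members}."""
    representatives = []
    clusters = {}
    for name, seq in seqs:
        for rep_name, rep_seq in representatives:
            if identity(seq, rep_seq) >= threshold:
                clusters[rep_name].append(name)
                break
        else:
            representatives.append((name, seq))
            clusters[name] = [name]
    return clusters
```

The keys of the returned dictionary are the retained transcription factors. Greedy clustering depends on input order, so a pipeline would usually sort the input first (for example, by length or annotation quality).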

For more information see the PaperBLAST paper (mSystems 2017) or the code. You can download PaperBLAST's database here.

Changes to PaperBLAST since the paper was written:

November 2023: incorporated PRODORIC and RegPrecise. Many PRODORIC entries were not linked to a protein sequence (no UniProt identifier), so we added this information.
February 2023: BioLiP changed their download format. PaperBLAST now includes their non-redundant subset. SitesBLAST and Sites on a Tree use a larger non-redundant subset that ensures that every ligand is represented within each cluster. This should ensure that every binding site is represented.
June 2022: incorporated some coding sequences from ENA with the /experiment tag.
March 2022: incorporated BioLiP.
April 2020: incorporated TCDB.
April 2019: EuropePMC now returns table entries in their search results. This has expanded PaperBLAST's database, but most of the new entries are of low relevance, and the resulting snippets are often just lists of locus tags with annotations.
February 2018: the alignment page reports the conservation of the hit's functional sites (if available from Swiss-Prot or UniProt).
January 2018: incorporated BRENDA.
December 2017: incorporated MetaCyc, CharProtDB, CAZy, REBASE, and the reannotations from the Fitness Browser.
September 2017: EuropePMC no longer returns some table entries in their search results. This has shrunk PaperBLAST's database, but has also reduced the number of low-relevance hits.

Many of these changes are described in Interactive tools for functional annotation of bacterial genomes.

Secrets

PaperBLAST cannot provide snippets for many of the papers that are published in non-open-access journals. This limitation applies even if the paper is marked as "free" on the publisher's web site and is available in PubMed Central or EuropePMC. If a journal that you publish in is marked as "secret," please consider publishing elsewhere.

Omissions from the PaperBLAST Database

Many important articles are missing from PaperBLAST, either because the article's full text is not in EuropePMC (as for many older articles), or because the paper does not mention a protein identifier such as a locus tag, or because of PaperBLAST's heuristics. If you notice an article that characterizes a protein's function but is missing from PaperBLAST, please notify the curators at UniProt or add an entry to GeneRIF. Entries in either of these databases will eventually be incorporated into PaperBLAST. Note that to add an entry to UniProt, you will need to find the UniProt identifier for the protein. If the protein is not already in UniProt, you can ask them to create an entry. To add an entry to GeneRIF, you will need an NCBI Gene identifier, but unfortunately many prokaryotic proteins in RefSeq do not have corresponding Gene identifiers.

References

PaperBLAST: Text-mining papers for information about homologs.
M. N. Price and A. P. Arkin (2017). mSystems, 10.1128/mSystems.00039-17.

Europe PMC in 2017.
M. Levchenko et al. (2017). Nucleic Acids Research, 10.1093/nar/gkx1005.

Gene indexing: characterization and analysis of NLM's GeneRIFs.
J. A. Mitchell et al. (2003). AMIA Annu Symp Proc 2003:460-464.

UniProt: the universal protein knowledgebase.
The UniProt Consortium (2016). Nucleic Acids Research, 10.1093/nar/gkw1099.

BRENDA in 2017: new perspectives and new tools in BRENDA.
S. Placzek et al. (2017). Nucleic Acids Research, 10.1093/nar/gkw952.

The EcoCyc database: reflecting new knowledge about Escherichia coli K-12.
I. M. Keeseler et al. (2016). Nucleic Acids Research, 10.1093/nar/gkw1003.

The MetaCyc database of metabolic pathways and enzymes.
R. Caspi et al. (2018). Nucleic Acids Research, 10.1093/nar/gkx935.

CharProtDB: a database of experimentally characterized protein annotations.
R. Madupu et al. (2012). Nucleic Acids Research, 10.1093/nar/gkr1133.

The carbohydrate-active enzymes database (CAZy) in 2013.
V. Lombard et al. (2014). Nucleic Acids Research, 10.1093/nar/gkt1178.

The Transporter Classification Database (TCDB): recent advances.
M. H. Saier, Jr. et al. (2016). Nucleic Acids Research, 10.1093/nar/gkv1103.

REBASE - a database for DNA restriction and modification: enzymes, genes and genomes.
R. J. Roberts et al. (2015). Nucleic Acids Research, 10.1093/nar/gku1046.

Deep annotation of protein function across diverse bacteria from mutant phenotypes.
M. N. Price et al. (2016). bioRxiv, 10.1101/072470.

by Morgan Price, Arkin group
Lawrence Berkeley National Laboratory