PaperBLAST
PaperBLAST Hits for 85 a.a. (MLSFLVSLVV...)
>85 a.a. (MLSFLVSLVV...)
MLSFLVSLVVAIVIGLIGSAIVGNRLPGGIFGSMIAGLIGAWIGHGLLGTWGPSLAGFAI
FPAIIGAAIFVFLLGLIFRGLRKEA
Found 24 similar proteins in the literature:
lp_0926 integral membrane protein from Lactobacillus plantarum WCFS1
56% identity, 99% coverage
- Two homologous Agr-like quorum-sensing systems cooperatively control adherence, cell morphology, and cell viability properties in Lactobacillus plantarum WCFS1
Fujii, Journal of bacteriology 2008 - “...Downregulated genes lp_0023 lp_0111 lp_0525 lp_0683 lp_0885 lp_0926 lp_0927 lp_0930 lp_1703 lp_2658 lp_3045 lp_3082 lp_3084 lp_3085 lp_3087 lp_3128 lp_3267...”
- “...(from lp_1197 to lp_1205), membrane protein-encoding genes (lp_0926, lp_3575, and lp_3577), Agr-LIKE QUORUM-SENSING SYSTEMS IN L. PLANTARUM VOL. 190, 2008 7661...”
- An agr-like two-component regulatory system in Lactobacillus plantarum is involved in production of a novel cyclic peptide and regulation of adherence
Sturme, Journal of bacteriology 2005 - “...as genes encoding integral membrane proteins (e.g., lp_0926, lp_3575, and lp_3577). Cluster 3 encompassed constitutively up-regulated genes, with the highest...”
- “...ORFa Cluster 2 lp_0525 lp_0526 lp_0683 lp_0684 lp_0925 lp_0926 lp_0927 lp_0928 lp_0929 lp_0930 lp_0931 sacK1 pts1BCA sacA sacR agl2 treA pts4ABC pts7C galT...”
SAR0392 putative membrane protein from Staphylococcus aureus subsp. aureus MRSA252
46% identity, 99% coverage
SAOUHSC_00358 hypothetical protein from Staphylococcus aureus subsp. aureus NCTC 8325
SA0360 hypothetical protein from Staphylococcus aureus subsp. aureus N315
SAV0374 hypothetical protein from Staphylococcus aureus subsp. aureus Mu50
SAUSA300_0374 hypothetical protein from Staphylococcus aureus subsp. aureus USA300_FPR3757
NWMN_0366 hypothetical protein from Staphylococcus aureus subsp. aureus str. Newman
USA300HOU_0397 hypothetical membrane protein from Staphylococcus aureus subsp. aureus USA300_TCH1516
46% identity, 98% coverage
- Lysogenization of Staphylococcus aureus RN450 by phages ϕ11 and ϕ80α leads to the activation of the SigB regulon
Fernández, Scientific reports 2018 - “...SAOUHSC_00257 0.48 0.28 SAOUHSC_00291 2.32 2.18 Up SAOUHSC_00317 2.81 3.78 Up SAOUHSC_00356 3.96 11.49 Up SAOUHSC_00358 4.67 12.22 Up SAOUHSC_00401 0.33 0.16 SAOUHSC_00619 5.66 21.10 Up SAOUHSC_00624 2.72 7.23 Up SAOUHSC_00625 mnhA2 2.10 4.29 Up SAOUHSC_00626 mnhB2 2.14 3.70 Up SAOUHSC_00627 mnhC2 2.13 3.97 Up SAOUHSC_00628...”
- Comparative genomic analysis of European and Middle Eastern community-associated methicillin-resistant Staphylococcus aureus (CC80:ST80-IV) isolates by high-density microarray
Goering, Clinical microbiology and infection : the official publication of the European Society of Clinical Microbiology and Infectious Diseases 2009 - “...SA0331 Hypothetical protein + + + USA300 USA400 SA0359 Hypothetical protein + + USA300 USA400 SA0360 Conserved hypothetical protein + + USA300 USA400 SA0397 Conserved hypothetical protein + + + + + USA300 USA400 SA0406 Hypothetical protein + + + + + USA300+ USA400 sdrD SA0520...”
- Exploring the transcriptome of Staphylococcus aureus in its natural niche
Chaves-Moreno, Scientific reports 2016 - “...vivo compared to in vivo (up to 11,970rpm). Evidently, also genes encoding various membrane proteins (SAV0374, SAV0574, SAV1030 and SAV1359) were extremely differently expressed under in vivo versus in vitro 24 conditions ( Supplementary Dataset S2 and Fig. 3D ). A further protein where the encoding...”
- The msaABCR Operon Regulates the Response to Oxidative Stress in Staphylococcus aureus
Pandey, Journal of bacteriology 2019 (secret)
- Transcriptional Response of Staphylococcus aureus to Sunlight in Oxic and Anoxic Conditions
McClary, Frontiers in microbiology 2018 - “...complex 0.25 2.5 NWMN_0163 Conserved hypothetical protein 0.24 9.9 NWMN_1371 Conserved hypothetical protein 0.24 7.2 NWMN_0366 Conserved hypothetical protein 0.24 6.4 NWMN_2392 Conserved hypothetical protein 0.24 12.6 NWMN_2282 Conserved hypothetical protein 0.23 5.0 NWMN_1477 Conserved hypothetical protein 0.23 10.1 clfA Clumping factor A 0.22 1.8 hutG...”
- Pre-epidemic evolution of the MRSA USA300 clade and a molecular key for classification
Bianco, Frontiers in cellular and infection microbiology 2023 - “...AdhR, PchA, HisG SNPs recN (1638449) , leuS (1888283) USA300HOU_0191 (202764) , intergenic (265666) , USA300HOU_0397 (424978) , argS (670365) , intergenic (835434) , USA300HOU_0795 (850349) vwb (876702) , USA300HOU_0938 (982790) , oppD1 (990291) , ebh (1488257) , rluB (1611873) , comGA (1657533) , alaS (1723795)...”
DV527_RS10290 GlsB/YeaQ/YmgE family stress response membrane protein from Staphylococcus saprophyticus
47% identity, 98% coverage
DMB76_011110 GlsB/YeaQ/YmgE family stress response membrane protein from Staphylococcus saccharolyticus
48% identity, 74% coverage
LSA0166 Hypothetical Integral membrane protein from Lactobacillus sakei subsp. sakei 23K
47% identity, 75% coverage
- Global transcriptome response in Lactobacillus sakei during growth on ribose
McLeod, BMC microbiology 2011 - “...precursor -0.5 LSA0106 lsa0106 Hypothetical cell surface protein precursor 0.5 LSA0160 lsa0160 Hypothetical protein -0.7 LSA0166 lsa0166 Hypothetical Integral membrane protein -1.2 LSA0190 lsa0190 Hypothetical integral membrane protein -0.7 -0.6 LSA0191 lsa0191 Hypothetical integral membrane protein -0.6 -0.6 LSA0199 lsa0199 Hypothetical protein 1.1 1.0 1.1 LSA0208...”
BC1000 hypothetical Membrane Spanning Protein from Bacillus cereus ATCC 14579
39% identity, 96% coverage
- Identification of a conserved 5'-dRP lyase activity in bacterial DNA repair ligase D and its potential role in base excision repair
de, Nucleic acids research 2016 - “...those with the chromosomal-encoded neo gene between wild type (wt) ykoU and ykoT genes (strain BC1000) or between ykoUE184A and ykoT genes (strain BC1001) (Supplementary Table S1). GP1502 DNA was used to transform BC1000 strain to render the BC1002 strain. Plasmid-borne ykoUE184A neo ykoT operon was...”
- Correction: SecDF as Part of the Sec-Translocase Facilitates Efficient Secretion of Bacillus cereus Toxins and Cell Wall-Associated Proteins
, PloS one 2014 - “...Catalase 13.31 4.2E-06 BC0998 General stress protein 17M 11.41 2.1E-08 BC0999 hypothetical protein 12.27 2.8E-07 BC1000 hypothetical Membrane Spanning Protein 12.54 6.7E-06 BC1002 Anti-sigma B factor antagonist 5.36 2.5E-06 BC1003 Anti-sigma B factor 8.97 1.4E-06 BC1004 RNA polymerase sigma-B factor 7.84 1.8E-06 BC1010 hypothetical protein 10.61...”
- SecDF as part of the Sec-translocase facilitates efficient secretion of Bacillus cereus toxins and cell wall-associated proteins
Vörös, PloS one 2014 - “...Catalase 13.31 4.2E-06 BC0998 General stress protein 17M 11.41 2.1E-08 BC0999 hypothetical protein 12.27 2.8E-07 BC1000 hypothetical Membrane Spanning Protein 12.54 6.7E-06 BC1002 Anti-sigma B factor antagonist 5.36 2.5E-06 BC1003 Anti-sigma B factor 8.97 1.4E-06 BC1004 RNA polymerase sigma-B factor 7.84 1.8E-06 BC1010 hypothetical protein 10.61...”
- SpoIVA and SipL are Clostridium difficile spore morphogenetic proteins
Putnam, Journal of bacteriology 2013 - “...at 4C. Samples were dialyzed against 2 M guanidine HCl in BC1000 (20 mM Tris [pH 7.4], 0.2 mM EDTA, 20% [vol/vol] glycerol, 1 M KCl) plus 0.1% (vol/vol) NP-40...”
- “...0.1% (vol/vol) NP-40 for 1.5 h at 4C, and again against BC1000 plus 0.1% (vol/vol) NP-40 for 1.5 h at 4C and centrifuged at 15,000 rpm for 30 min at 4C....”
- Bacillus cereus cell response upon exposure to acid environment: toward the identification of potential biomarkers
Desriac, Frontiers in microbiology 2013 - “...BC0995 Hypothetical protein BC0996 Hypothetical protein BC0998 yflT General stress protein BC0999 csbD Hypothetical protein BC1000 Hypothetical protein BC1001 Hypothetical protein BC1002 rsbV Anti- B factor antagonist BC1003 rsbW Anti- B factor BC1004 sigB RNA polymerase sigma factor B BC1005 orf4 Putative bacterioferritin BC1006 rsbY PP2C-type...”
- “...rsbV, rsbW and sigB, orf4 , and rsbY , as well as the two genes BC1000 and BC1009 are up-regulated (1.5 fold in both conditions). In the same way, BC0862 and BC0998 genes are over-expressed: the first one encoding the YflT protein is known to be...”
- Identification of the sigmaB regulon of Bacillus cereus and conservation of sigmaB-regulated genes in low-GC-content gram-positive bacteria
van, Journal of bacteriology 2007 - “...Berkeley bc0862 bc0863 bc0995 bc0996 bc0998 bc0999d bc1000 bc1001 bc1002 bc1003 bc1004 Experimentally defined and/or predicted promoter sequencec 4388 VAN...”
- “...has a role in hyperosmotic and cold stress (10); and bc1000, which is homologous to the GlsB protein of Enterococcus faecalis, where it has a role in resistance...”
EF0081 membrane protein, putative from Enterococcus faecalis V583
49% identity, 91% coverage
LBA0872 hypothetical protein from Lactobacillus acidophilus NCFM
44% identity, 98% coverage
SPy1768 conserved hypothetical protein from Streptococcus pyogenes M1 GAS
47% identity, 85% coverage
EF2708 membran protein, putative from Enterococcus faecalis V583
42% identity, 92% coverage
SP_0279 hypothetical protein from Streptococcus pneumoniae TIGR4
48% identity, 74% coverage
LLKF_0277 hypothetical protein from Lactococcus lactis subsp. lactis KF147
50% identity, 68% coverage
AWJ25_RS06350 GlsB/YeaQ/YmgE family stress response membrane protein from Enterococcus faecium
49% identity, 62% coverage
- Gene Duplications in the Genomes of Staphylococci and Enterococci
Sanchez-Herrero, Frontiers in molecular biosciences 2020 - “...are also duplicated in other E. faecium strains. Group Locus Tag 1 Description Percentage 103 AWJ25_RS06350 GlsB/YeaQ/YmgE family stress response membrane protein 99.25% 104 AWJ25_RS07455 LysM peptidoglycan binding domain containing protein 99.25% 109 AWJ25_RS09645 PTS lactose/cellobiose transporter subunit IIA 95.49% 110 AWJ25_RS09650 PTS sugar transporter subunit...”
BAS2692 conserved hypothetical protein from Bacillus anthracis str. Sterne
AW20_5555 GlsB/YeaQ/YmgE family stress response membrane protein from Bacillus anthracis str. Sterne
48% identity, 91% coverage
SPy1265 conserved hypothetical protein from Streptococcus pyogenes M1 GAS
M6_Spy0965 Integral membrane protein from Streptococcus pyogenes MGAS10394
36% identity, 91% coverage
L67002 HYPOTHETICAL PROTEIN from Lactococcus lactis subsp. lactis Il1403
49% identity, 67% coverage
- Genotype-phenotype matching analysis of 38 Lactococcus lactis strains using random forest methods
Bayjanov, BMC microbiology 2013 - “...therefore the ability to metabolize arginine. A cluster of 4 genes (L65637, L66209, L66407 and L67002 in strain IL1403, and their orthologs) was identified to be relevant to arginine metabolism (Figure 4 A). All 4 proteins are annotated as hypothetical proteins in strain IL1403 and two...”
- “...two encoded proteins, llmg_1257 and llmg_1259, are in the same COGs with proteins L66209 and L67002 of strain IL1403. The protein L67002 belongs to a family of membrane proteins of which some are glycosyltransferase-associated proteins. Probably, at least two of these proteins, L66209 and L67002, and...”
LSEI_2880 Predicted membrane protein from Lactobacillus casei ATCC 334
54% identity, 66% coverage
llmg_1257 hypothetical protein from Lactococcus lactis subsp. cremoris MG1363
LLKF_2284 hypothetical protein from Lactococcus lactis subsp. lactis KF147
49% identity, 67% coverage
- Genotype-phenotype matching analysis of 38 Lactococcus lactis strains using random forest methods
Bayjanov, BMC microbiology 2013 - “...also identified to be related to arginine metabolism (Figure 4 B), and two encoded proteins, llmg_1257 and llmg_1259, are in the same COGs with proteins L66209 and L67002 of strain IL1403. The protein L67002 belongs to a family of membrane proteins of which some are glycosyltransferase-associated...”
- “...proteins. Probably, at least two of these proteins, L66209 and L67002, and their MG1363 orthologs, llmg_1257 and llmg_1259, should be re-annotated as transport proteins or maybe more specifically arginine transport proteins. However, experimental validation is necessary. Figure 4 Genes related to arginine metabolism. A ) Two...”
- Molecular description and industrial potential of Tn6098 conjugative transfer conferring alpha-galactoside metabolism in Lactococcus lactis
Machielsen, Applied and environmental microbiology 2011 - “...LLKF_2278 LLKF_2279 LLKF_2280 LLKF_2281 LLKF_2282 LLKF_2283 LLKF_2284 LLKF_2285 LLKF_2286 LLKF_2287 LLKF_t0054 2326054 2327241 2329454 2330622 2332669 2332809...”
UC7_RS15700 GlsB/YeaQ/YmgE family stress response membrane protein from Enterococcus caccae ATCC BAA-1240
47% identity, 82% coverage
- Apigenin Impacts the Growth of the Gut Microbiota and Alters the Gene Expression of Enterococcus
Wang, Molecules (Basel, Switzerland) 2017 - “...protection protein 2.0 Ribosomal protection UC7_RS14785 hypothetical protein 1.9 Unknown UC7_RS11535 hypothetical protein 1.9 Unknown UC7_RS15700 general stress protein GlsB 1.9 Stress response UC7_RS14775 hypothetical protein 1.8 Unknown UC7_RS15645 hypothetical protein 1.8 Unknown UC7_RS16575 hypothetical protein 1.8 Unknown UC7_RS16225 WxL domain surface protein 1.7 Surface protein...”
SP_1801 hypothetical protein from Streptococcus pneumoniae TIGR4
53% identity, 51% coverage
LLKF_2085 hypothetical protein from Lactococcus lactis subsp. lactis KF147
46% identity, 67% coverage
- Strain-Dependent Transcriptome Signatures for Robustness in Lactococcus lactis
Dijkstra, PloS one 2016 - “...rarA ArsR family transcriptional regulator positive 0.7 LLKF_0447 yeaA beta-lactamase superfamily Zn-dependent hydrolase positive 6.0 LLKF_2085 ytgB hypothetical protein positive 17.7 LLKF_1563 bglH beta-glucosidase/ 6-phospho-beta-glucosidase positive 0.4 LLKF_1820 yrbB transglycosylase positive 26.3 LLKF_2083 hypothetical protein positive 15.2 LLKF_2084 ytgA hypothetical protein positive 14.1 LLKF_1723 excisionase positive...”
M5005_Spy_0976 integral membrane protein from Streptococcus pyogenes MGAS5005
44% identity, 56% coverage
lp_3577 integral membrane protein from Lactobacillus plantarum WCFS1
50% identity, 48% coverage
- Expression of heterologous sigma factors enables functional screening of metagenomic and heterologous genomic libraries
Gaida, Nature communications 2015 - “...found genes encoding transporters ( araP , lp_3563 and lp_3565), two membrane proteins (lp_3575 and lp_3577), proteins associated with energy metabolism ( lox and pox4 ), as well as two proteins (catalase ( kat )) and a heat-shock protein ( clpL )) involved in stress response....”
- Two homologous Agr-like quorum-sensing systems cooperatively control adherence, cell morphology, and cell viability properties in Lactobacillus plantarum WCFS1
Fujii, Journal of bacteriology 2008 - “...lp_3082 lp_3084 lp_3085 lp_3087 lp_3128 lp_3267 lp_3420 lp_3575 lp_3577 lp_3578 lp_3579 lp_3580 lp_3582 lp_3583 lp_3586 a Gene cps2A cps2B galE2 cps2E cps2F...”
- “...to lp_1205), membrane protein-encoding genes (lp_0926, lp_3575, and lp_3577), Agr-LIKE QUORUM-SENSING SYSTEMS IN L. PLANTARUM VOL. 190, 2008 7661 TABLE 4. Genes...”
- Identification of prebiotic fructooligosaccharide metabolism in Lactobacillus plantarum WCFS1 through microarrays
Saulnier, Applied and environmental microbiology 2007 - “...lp_2113 lp_3243 lp_3250 lp_3318 lp_3433 lp_3489 lp_3577 Unknown Oxidoreductase Unknown Unknown Unknown Oxidoreductase Unknown Unknown Unknown Unknown Unknown...”
- An agr-like two-component regulatory system in Lactobacillus plantarum is involved in production of a novel cyclic peptide and regulation of adherence
Sturme, Journal of bacteriology 2005 - “...integral membrane proteins (e.g., lp_0926, lp_3575, and lp_3577). Cluster 3 encompassed constitutively up-regulated genes, with the highest effect in early...”
- “...asp2 hpaG lp_2658 lp_2743 lp_2744 lp_3045 lp_3047 lp_3575 lp_3577 lp_3578 lp_3580 kat lamA lp_3581 lp_3581a lamC lamD lp_3582 lp_3583 lp_3586 lamB clpL lox...”
For advice on how to use these tools together, see
Interactive tools for functional annotation of bacterial genomes.
The PaperBLAST database links 798,070 different protein sequences to 1,261,478 scientific articles. Searches against EuropePMC were last performed on May 12 2025.
PaperBLAST builds a database of protein sequences that are linked
to scientific articles. These links come from automated text searches
against the articles in EuropePMC
and from manually-curated information from GeneRIF, UniProtKB/Swiss-Prot,
BRENDA,
CAZy (as made available by dbCAN),
BioLiP,
CharProtDB,
MetaCyc,
EcoCyc,
TCDB,
REBASE,
the Fitness Browser,
and a subset of the European Nucleotide Archive with the /experiment tag.
Given this database and a protein sequence query,
PaperBLAST uses protein-protein BLAST
to find similar sequences with E < 0.001.
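As a rough illustration of the E < 0.001 cutoff, the sketch below filters hits from BLAST's standard 12-column tabular output and reports identity and query coverage. It is not PaperBLAST's own code: the names `Hit` and `parse_hits`, the example rows, and the definition of coverage as the aligned fraction of the query are all assumptions made for this example.

```python
from typing import List, NamedTuple

E_VALUE_CUTOFF = 0.001  # threshold stated in the text above


class Hit(NamedTuple):
    subject: str
    identity: float        # percent identity reported by BLAST
    query_coverage: float  # percent of the query covered (assumed definition)
    evalue: float


def parse_hits(tabular: str, query_length: int) -> List[Hit]:
    """Keep hits with E < 0.001 from 12-column tabular BLAST output.

    Assumed columns: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    """
    hits = []
    for line in tabular.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        qstart, qend = int(fields[6]), int(fields[7])
        evalue = float(fields[10])
        if evalue >= E_VALUE_CUTOFF:
            continue  # not significant enough to report
        coverage = 100.0 * (qend - qstart + 1) / query_length
        hits.append(Hit(fields[1], float(fields[2]), coverage, evalue))
    return hits


# Hypothetical rows for an 85 a.a. query; only the first passes the cutoff.
example = "\n".join([
    "query\thypA\t56.0\t84\t37\t0\t1\t84\t3\t86\t1e-25\t95.0",
    "query\thypB\t30.0\t40\t28\t0\t10\t49\t5\t44\t0.5\t20.0",
])
for hit in parse_hits(example, query_length=85):
    print(f"{hit.subject}: {hit.identity:.0f}% identity, "
          f"{hit.query_coverage:.0f}% coverage")
```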
To build the database, we query EuropePMC with locus tags, with RefSeq protein
identifiers, and with UniProt
accessions. We obtain the locus tags from RefSeq or from MicrobesOnline. We use
queries of the form "locus_tag AND genus_name" to try to ensure that
the paper is actually discussing that gene. Because EuropePMC indexes
most recent biomedical papers, even if they are not open access, some
of the links may be to papers that you cannot read or that our
computers cannot read. We query each of these identifiers that
appears in the open access part of EuropePMC, as well as every locus
tag that appears in the 500 most-referenced genomes, so that a gene
may appear in the PaperBLAST results even though none of the papers
that mention it are open access. We also incorporate text-mined links
from EuropePMC that link open access articles to UniProt or RefSeq
identifiers. (This yields some additional links because EuropePMC
uses different heuristics for their text mining than we do.)
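A query of the form "locus_tag AND genus_name" could be assembled roughly as follows. This is a sketch only: the helper names are hypothetical, quoting each term is an assumption, and the endpoint and parameter names (`query`, `format`, `pageSize`) should be checked against the current Europe PMC REST documentation before use. The code only builds the URL and does not contact the service.

```python
from urllib.parse import urlencode

# Assumed Europe PMC REST search endpoint; verify against the official docs.
EUROPE_PMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_query(locus_tag: str, genus: str) -> str:
    """Combine a locus tag with a genus name, as described in the text.

    Quoting each term is an assumption of this sketch.
    """
    return f'"{locus_tag}" AND "{genus}"'


def build_search_url(locus_tag: str, genus: str, page_size: int = 25) -> str:
    """Return a search URL for the query (no network request is made)."""
    params = {
        "query": build_query(locus_tag, genus),
        "format": "json",
        "pageSize": page_size,
    }
    return f"{EUROPE_PMC_SEARCH}?{urlencode(params)}"


print(build_search_url("lp_0926", "Lactobacillus"))
```

Requiring the genus name alongside the locus tag reduces false matches when the same string means something else, as with BC1000, which appears above both as a Bacillus cereus gene and as a buffer name in a Clostridium difficile paper.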
For every article that mentions a locus tag, a RefSeq protein
identifier, or a UniProt accession, we try to select one or two
snippets of text that refer to the protein. If we cannot get access to
the full text, we try to select a snippet from the abstract, but
unfortunately, unique identifiers such as locus tags are rarely
provided in abstracts.
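The snippet selection described above might look something like the following. PaperBLAST's actual heuristics are not given here, so this is only a minimal sketch: the function name `select_snippets`, the fixed window of surrounding words, and the rule for skipping overlapping matches are all assumptions.

```python
import re
from typing import List


def select_snippets(text: str, identifier: str,
                    max_snippets: int = 2, context_words: int = 10) -> List[str]:
    """Return up to max_snippets word windows around mentions of identifier."""
    # Match the identifier only as a whole token, so that "lp_092"
    # does not match inside "lp_0926".
    pattern = re.compile(
        r"(?<![A-Za-z0-9_])" + re.escape(identifier) + r"(?![A-Za-z0-9_])"
    )
    words = text.split()
    snippets: List[str] = []
    last_end = -1  # index of the last word already shown in a snippet
    for i, word in enumerate(words):
        if not pattern.search(word) or i <= last_end:
            continue
        start = max(0, i - context_words)
        end = min(len(words), i + context_words + 1)
        snippets.append("..." + " ".join(words[start:end]) + "...")
        last_end = end - 1
        if len(snippets) == max_snippets:
            break
    return snippets


text = ("Cluster 2 contained genes encoding integral membrane proteins "
        "(e.g., lp_0926, lp_3575, and lp_3577). Cluster 3 encompassed "
        "constitutively up-regulated genes.")
for snippet in select_snippets(text, "lp_0926", context_words=3):
    print(snippet)
```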
PaperBLAST also incorporates manually-curated protein functions:
- Proteins from NCBI's RefSeq are included if a
GeneRIF
entry links the gene to an article in
PubMed®.
GeneRIF also provides a short summary of the article's claim about the
protein, which is shown instead of a snippet.
- Proteins from Swiss-Prot (the curated part of UniProt)
are included if the curators
identified experimental evidence for the protein's function (evidence
code ECO:0000269). For these proteins, the fields of the Swiss-Prot entry that
describe the protein's function are shown (with bold headings).
- Proteins from BRENDA,
a curated database of enzymes, are included if they are linked to a paper in PubMed
and their full sequence is known.
- Every protein from the non-redundant subset of
BioLiP,
a database
of ligand-binding sites and catalytic residues in protein structures, is included. Since BioLiP itself
does not include descriptions of the proteins, those are taken from the
Protein Data Bank.
Descriptions from PDB rely on the original submitter of the
structure and cannot be updated by others, so they may be less reliable.
(For SitesBLAST and Sites on a Tree, we use a larger subset of BioLiP so that every
ligand is represented among a group of structures with similar sequences, but for
PaperBLAST, we use the non-redundant set provided by BioLiP.)
- Every protein from EcoCyc, a curated
database of the proteins in Escherichia coli K-12, is included, regardless
of whether they are characterized or not.
- Proteins from the MetaCyc metabolic pathway database
are included if they are linked to a paper in PubMed and their full sequence is known.
- Proteins from the Transporter Classification Database (TCDB)
are included if they have known substrate(s), have reference(s),
and are not described as uncharacterized or putative.
(Some of the references are not visible on the PaperBLAST web site.)
- Every protein from CharProtDB,
a database of experimentally characterized protein annotations, is included.
- Proteins from the CAZy database of carbohydrate-active enzymes
are included if they are associated with an Enzyme Classification number.
Even though CAZy does not provide links from individual protein sequences to papers,
these should all be experimentally-characterized proteins.
- Proteins from the REBASE database
of restriction enzymes are included if they have known specificity.
- Every protein with an evidence-based reannotation (based on mutant phenotypes)
in the Fitness Browser is included.
- Sequence-specific transcription factors (including sigma factors and DNA-binding response regulators)
with experimentally-determined DNA binding sites from the
PRODORIC database of gene regulation in prokaryotes are included.
- Putative transcription factors from RegPrecise
that have manually-curated predictions for their binding sites are included. These predictions are based on
conserved putative regulatory sites across genomes that contain similar transcription factors,
so PaperBLAST clusters the TFs at 70% identity and retains just one member of each cluster.
- Coding sequence (CDS) features from the
European Nucleotide Archive (ENA)
are included if the /experiment tag is set (implying that there is experimental evidence for the annotation),
the nucleotide entry links to paper(s) in PubMed,
and the nucleotide entry is from the STD data class
(implying that these are targeted annotated sequences, not from shotgun sequencing).
Also, to filter out genes whose transcription or translation was detected, but whose function
was not studied, nucleotide entries or papers with more than 25 such proteins are excluded.
Descriptions from ENA rely on the original submitter of the
sequence and cannot be updated by others, so they may be less reliable.
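The ENA rule in the last item above, which excludes nucleotide entries or papers with more than 25 such proteins, can be read in more than one way. The sketch below shows one possible reading: drop every feature from an over-large entry, and drop links to over-large papers. The record type `CdsFeature` and the function `filter_features` are hypothetical names for this example, not part of PaperBLAST.

```python
from collections import Counter
from typing import List, NamedTuple, Tuple

MAX_PROTEINS = 25  # limit stated in the text above


class CdsFeature(NamedTuple):
    protein_id: str
    entry_id: str                 # nucleotide entry the CDS comes from
    pubmed_ids: Tuple[str, ...]   # papers linked from that entry


def filter_features(features: List[CdsFeature]) -> List[CdsFeature]:
    """Drop features from over-large entries and links to over-large papers."""
    per_entry = Counter(f.entry_id for f in features)
    per_paper = Counter(p for f in features for p in set(f.pubmed_ids))
    kept = []
    for f in features:
        if per_entry[f.entry_id] > MAX_PROTEINS:
            continue  # entry likely reports detection, not function
        papers = tuple(p for p in f.pubmed_ids if per_paper[p] <= MAX_PROTEINS)
        if not papers:
            continue  # no remaining paper links this protein
        kept.append(f._replace(pubmed_ids=papers))
    return kept


# Hypothetical data: entry E1 has 26 proteins, so all of them are excluded.
features = [CdsFeature(f"p{i}", "E1", ("P1",)) for i in range(26)]
features.append(CdsFeature("q1", "E2", ("P2",)))
features.append(CdsFeature("q2", "E3", ("P1", "P3")))
print([f.protein_id for f in filter_features(features)])
```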
Except for GeneRIF and ENA,
the curated entries include a short curated
description of the protein's function.
For entries from BioLiP, the protein's function may not be known beyond binding to the ligand.
Many of these entries also link to articles in PubMed.
For more information see the
PaperBLAST paper (mSystems 2017)
or the code.
You can download PaperBLAST's database here.
Changes to PaperBLAST since the paper was written:
- November 2023: incorporated PRODORIC and RegPrecise. Many PRODORIC entries were not linked to a protein sequence (no UniProt identifier), so we added this information.
- February 2023: BioLiP changed their download format. PaperBLAST now includes their non-redundant subset. SitesBLAST and Sites on a Tree use a larger non-redundant subset that ensures that every ligand is represented within each cluster. This should ensure that every binding site is represented.
- June 2022: incorporated some coding sequences from ENA with the /experiment tag.
- March 2022: incorporated BioLiP.
- April 2020: incorporated TCDB.
- April 2019: EuropePMC now returns table entries in their search results. This has expanded PaperBLAST's database, but most of the new entries are of low relevance, and the resulting snippets are often just lists of locus tags with annotations.
- February 2018: the alignment page reports the conservation of the hit's functional sites (if available from Swiss-Prot or UniProt).
- January 2018: incorporated BRENDA.
- December 2017: incorporated MetaCyc, CharProtDB, CAZy, REBASE, and the reannotations from the Fitness Browser.
- September 2017: EuropePMC no longer returns some table entries in their search results. This has shrunk PaperBLAST's database, but has also reduced the number of low-relevance hits.
Many of these changes are described in Interactive tools for functional annotation of bacterial genomes.
PaperBLAST cannot provide snippets for many of the papers that are
published in non-open-access journals. This limitation applies even if
the paper is marked as "free" on the publisher's web site and is
available in PubMed Central or EuropePMC. If a journal that you publish
in is marked as "secret," please consider publishing elsewhere.
Many important articles are missing from PaperBLAST, either because
the article's full text is not in EuropePMC (as for many older
articles), or because the paper does not mention a protein identifier such as a locus tag, or because of PaperBLAST's heuristics. If you notice an
article that characterizes a protein's function but is missing from
PaperBLAST, please notify the curators at UniProt
or add an entry to GeneRIF.
Entries in either of these databases will eventually be incorporated
into PaperBLAST. Note that to add an entry to UniProt, you will need
to find the UniProt identifier for the protein. If the protein is not
already in UniProt, you can ask them to create an entry. To add an
entry to GeneRIF, you will need an NCBI Gene identifier, but
unfortunately many prokaryotic proteins in RefSeq do not have
corresponding Gene identifiers.
References
PaperBLAST: Text-mining papers for information about homologs.
M. N. Price and A. P. Arkin (2017). mSystems, 10.1128/mSystems.00039-17.
Europe PMC in 2017.
M. Levchenko et al (2017). Nucleic Acids Research, 10.1093/nar/gkx1005.
Gene indexing: characterization and analysis of NLM's GeneRIFs.
J. A. Mitchell et al (2003). AMIA Annu Symp Proc 2003:460-464.
UniProt: the universal protein knowledgebase.
The UniProt Consortium (2016). Nucleic Acids Research, 10.1093/nar/gkw1099.
BRENDA in 2017: new perspectives and new tools in BRENDA.
S. Placzek et al (2017). Nucleic Acids Research, 10.1093/nar/gkw952.
The EcoCyc database: reflecting new knowledge about Escherichia coli K-12.
I. M. Keseler et al (2016). Nucleic Acids Research, 10.1093/nar/gkw1003.
The MetaCyc database of metabolic pathways and enzymes.
R. Caspi et al (2018). Nucleic Acids Research, 10.1093/nar/gkx935.
CharProtDB: a database of experimentally characterized protein annotations.
R. Madupu et al (2012). Nucleic Acids Research, 10.1093/nar/gkr1133.
The carbohydrate-active enzymes database (CAZy) in 2013.
V. Lombard et al (2014). Nucleic Acids Research, 10.1093/nar/gkt1178.
The Transporter Classification Database (TCDB): recent advances.
M. H. Saier, Jr. et al (2016). Nucleic Acids Research, 10.1093/nar/gkv1103.
REBASE - a database for DNA restriction and modification: enzymes, genes and genomes.
R. J. Roberts et al (2015). Nucleic Acids Research, 10.1093/nar/gku1046.
Deep annotation of protein function across diverse bacteria from mutant phenotypes.
M. N. Price et al (2016). bioRxiv, 10.1101/072470.
by Morgan Price,
Arkin group
Lawrence Berkeley National Laboratory