PaperBLAST
PaperBLAST Hits for Shewana3_0683 (83 a.a., MECSLIEQIL...)
>Shewana3_0683
MECSLIEQILRDALALDEVHASSDGSHYKVIAVGECFDGMSRVKQQQAIYAPLMSYIASG
ELHALTIKTFTPTQWKREKIFNS
Found 39 similar proteins in the literature:
IbaG / b3190 acid stress protein IbaG from Escherichia coli K-12 substr. MG1655 (see 6 papers)
IBAG_ECOLI / P0A9W6 Acid stress protein IbaG from Escherichia coli (strain K12) (see paper)
b3190 orf, hypothetical protein from Escherichia coli str. K-12 substr. MG1655
NP_417657 acid stress protein IbaG from Escherichia coli str. K-12 substr. MG1655
57% identity, 99% coverage
- function: Involved in cell resistance against acid stress.
disruption phenotype: Deletion mutants grow faster and have higher viabilities in rich media, but have lower viabilities than the wild type in the late stationary phase.
- The Gene Expression Profile of Uropathogenic Escherichia coli in Women with Uncomplicated Urinary Tract Infections Is Recapitulated in the Mouse Model
Frick-Cheng, mBio 2020 - “...protein GspL 3.3 UTI89_C3377 hpt Hypoxanthine phosphoribosyltransferase 2.1 b0125 ibaG Acid stress protein IbaG 2.2 b3190 lysP Lysine:H(+) symporter 2.1 b2156 opgC Protein required for succinyl modification of osmoregulated periplasmic glucans 2.6 b1047 ribE 6,7-Dimethyl-8-ribityllumazine synthase 2.1 b0415 rpmE 50S ribosomal subunit protein L31 2.7 b3936...”
- Remaining flexible in old alliances: functional plasticity in constrained mutualisms
Wernegreen, DNA and cell biology 2009 - “...Bbp528 yqeI b2847 Bpen401 Bfl390 WGLp467 - - - yrbA b3190 Bpen046 Bfl045 WGLp328 BU385 BUsg372 Bbp348 zur b4046 Bpen026 Bfl026 - - - - Gene 443 regulation genes...”
- Combined, functional genomic-biochemical approach to intermediary metabolism: interaction of acivicin, a glutamine amidotransferase inhibitor, with Escherichia coli K-12
Smulski, Journal of bacteriology 2001 - “...b3021 b3022 b3024 b3029 b3068 b3097 b3098 b3099 b3160 b3190 b3203 b3263 b3292 b3293 b3399 b3400 b3401 b3446 b3448 b3472 b3494 b3515 b3516 b3522 b3548 b3555...”
- Characterization of the BolA homolog IbaG: a new gene involved in acid resistance.
Guinote, Journal of microbiology and biotechnology 2012 (PubMed) - GeneRIF: YrbA, renamed as ibaG, is not an essential gene and is involved in acid resistance.
ECs4069 hypothetical protein from Escherichia coli O157:H7 str. Sakai
57% identity, 93% coverage
SL1344_3280 BolA family iron metabolism protein IbaG from Salmonella enterica subsp. enterica serovar Typhimurium str. SL1344
57% identity, 99% coverage
PMI3660 morphoprotein from Proteus mirabilis HI4320
58% identity, 94% coverage
M892_12930 BolA family iron metabolism protein IbaG from Vibrio campbellii ATCC BAA-1116
57% identity, 94% coverage
YPTB3514 putative BolA/YrbA family protein from Yersinia pseudotuberculosis IP 32953
54% identity, 95% coverage
YPO3570 BolA-like protein from Yersinia pestis CO92
53% identity, 95% coverage
WP_000376481 BolA family iron metabolism protein IbaG from Vibrio cholerae O1 str. 2011EL-1137
57% identity, 94% coverage
VP2659 BolA/YrbA family protein from Vibrio parahaemolyticus RIMD 2210633
53% identity, 99% coverage
- BolA-like protein (IbaG) promotes biofilm formation and pathogenicity of Vibrio parahaemolyticus
Wang, Frontiers in microbiology 2024 - “...is yet to be assessed. In this study, we performed the first systematic analysis of vp2659 (255-bp encoding protein IbaG) in the genome of V. parahaemolyticus SH112. We successfully constructed the ibaG (255-bp) mutant and obtained its revertant C ibaG and overexpression WT ( ibaG )...”
- “...colonization and mouse virulence, demonstrating for the first time that the transcriptional regulator IbaG ( vp2659 ) acts as an important virulence factor. Our data suggested that IbaG may affect the virulence of V. parahaemolyticus by participating in numerous cellular metabolic processes, such as motility and...”
P45026 Uncharacterized protein HI_1082 from Haemophilus influenzae (strain ATCC 51907 / DSM 11121 / KW20 / Rd)
HI1082 conserved hypothetical protein from Haemophilus influenzae Rd KW20
54% identity, 96% coverage
HD0254 conserved hypothetical protein from Haemophilus ducreyi 35000HP
49% identity, 95% coverage
Bfl045 conserved hypothetical protein from Candidatus Blochmannia floridanus
36% identity, 90% coverage
BU385 hypothetical protein from Buchnera aphidicola str. APS (Acyrthosiphon pisum)
40% identity, 76% coverage
bolA / CAB45536.1 BolA protein from Pseudomonas fluorescens (see paper)
42% identity, 61% coverage
lpg0846 hypothetical BolA like protein from Legionella pneumophila subsp. pneumophila str. Philadelphia 1
37% identity, 88% coverage
- Transcriptomic changes of Legionella pneumophila in water
Li, BMC genomics 2015 - “...orthologous groups. According to the microarray data, six of the selected genes ( lpg0586 , lpg0846 , lpg1206 , lpg1659 , lpg2316 ( bdhA ) and lpg2524 ) were significantly up-regulated in water at two or all three time points, while the remaining four genes (...”
- “...qPCR ATTCCCATCGCCATTTAGAG 25_QR lpg0025 qPCR CAACCCGAGAGGTAACTAATAC 586_QF lpg0586 qPCR GTGGCGTTCCAGTTTGT 586_QR lpg0586 qPCR CTGTCCAGGCAGCATAAC 846_QF lpg0846 qPCR GGTAGAAGGCGATGGTTATC 846_QR lpg0846 qPCR GCCTTCCGGTGGTAATAAA 890_QF lpg0890 qPCR CCTTCCAATCCCATGCTAAAG 890_QR lpg0890 qPCR GTCAAATCCGAGTTCAAGAGG 1206_QF lpg1206 qPCR GCGTCATGAGGATTCTATTCG 1206_QR lpg1206 qPCR GGCCTGTAAATCGTATCAGAC 1284_QF lpg1284 qPCR GTTTATCTCAGAGCGGCAAG 1284_QR lpg1284 qPCR GACATCCTCCAAAGGCTTATC...”
C0J56_08545 BolA family protein from Pseudomonas fluorescens
42% identity, 61% coverage
ssr3122 hypothetical protein from Synechocystis sp. PCC 6803
47% identity, 60% coverage
- Deep Proteogenomics of a Photosynthetic Cyanobacterium
Spät, Journal of proteome research 2023 - “...0.01). By focusing on the subset of uncharacterized proteins, we observed that Slr1419, Slr1846, and Ssr3122 were significantly decreased in abundance during resuscitation compared to chlorosis ( Figure S2f ). Instead, Sll1735 (increased), Sll1783, Sll7086, Slr5111, and Slr5127 (decreased) were connected to the low CO 2...”
- Structural Determinants and Their Role in Cyanobacterial Morphogenesis
Springstein, Life (Basel, Switzerland) 2020 - “...Synpcc7942_0299 (WP_011243525.1) All0086 (WP_010994263.1) Cell elongation MreD N/A Synpcc7942_0298 (ABB56330.1) All0085 (BAB77609.1) Cell elongation BolA Ssr3122 (WP_010871705.1) Synpcc7942_1146 (ABB57176.1) Asr0798 (WP_010994972.1) Cell elongation CikA Slr1969 (WP_010872820.1) Synpcc7942_0644 (WP_011243194.1) All1688 (WP_010995857.1) Circadian rhythm PBP1 Sll0002 (WP_010873436.1) Synpcc7942_2000 (WP_011378270.1) Alr5101 (WP_010999227.1) Cell wall synthesis PBP2 Slr1710 (WP_010871874.1) Synpcc7942_0785...”
- Proteomic analysis reveals resistance mechanism against biofuel hexane in Synechocystis sp. PCC 6803
Liu, Biotechnology for biofuels 2012 - “...Slr1846, Slr1847, Slr2101, Ssl0242, Ssl0352, Ssl0467, Ssl0832, Ssl1690, Ssl1707, Ssl1972, Ssl2717, Ssl3364, Ssr1528, Ssr1853, Ssr2554, Ssr3122, Ssr3304, Ssr3402 Hypothetical proteins *Proteins with1.5 fold change and p - value less than 0.05. **Hypothetical proteins listed with gene ID only, full information in Additional file 1 : Table...”
PA0857 morphogene protein BolA from Pseudomonas aeruginosa PAO1
42% identity, 64% coverage
Bd1328 BolA-like protein from Bdellovibrio bacteriovorus HD100
39% identity, 67% coverage
- DivIVA Controls Progeny Morphology and Diverse ParA Proteins Regulate Cell Division or Gliding Motility in Bdellovibrio bacteriovorus
Milner, Frontiers in microbiology 2020 - “...to each other (12% sequence identity). Adjacent to parA1 is a bolA -like gene ( bd1328 ). BolA is a transcription factor involved in the regulation of penicillin-binding proteins PBP5 and PBP6, and of MreB ( Guinote et al., 2011 ; Singh and Montgomery, 2014 )....”
- “...encoded within operons ( Supplementary Figure S7 ). Co-expression of parA1 , bd1327 , and bd1328 (encoding a BolA homolog) was observed at the 3 h time point, whilst parA2 was found to be co-transcribed with bd2329 at 3 h post-infection. Co-transcription of parA3, bd3905 (...”
PP0963 toluene-tolerance protein from Pseudomonas putida KT2440
33% identity, 90% coverage
PP1757 bolA protein from Pseudomonas putida KT2440
41% identity, 63% coverage
- New transposon tools tailored for metabolic engineering of gram-negative microbial cell factories
Martínez-García, Frontiers in bioengineering and biotechnology 2014 - “...4,444,716 PP3941 Isochorismatase superfamily hydrolase KT-G7 ME-I 4,792,981 PP4221 Non-ribosomal peptide synthetase KT-G8 ME-I 1,959,862 PP1757 + bolA , BolA family protein KT-S1 ME-I 4,463,594 PP3956 + Hypothetical protein KT-S2 ME-I 3,552,337 PP3136 NA Intergenic region (PP3136 and PP3137) KT-S3 ME-I 4,671,241 PP4132 Hypothetical protein KT-S4...”
PA4451 hypothetical protein from Pseudomonas aeruginosa PAO1
31% identity, 90% coverage
- Full Transcriptomic Response of Pseudomonas aeruginosa to an Inulin-Derived Fructooligosaccharide
Rubio-Gómez, Frontiers in microbiology 2020 - “...protein 1.2 0.000 1.1 0.000 PA4432 rpsI 30S ribosomal protein S9 0.7 0.012 0.8 0.000 PA4451 yrbA Uncharacterized protein 0.7 0.000 0.9 0.000 PA4462 rpoN RNA polymerase sigma-54 factor 0.9 0.000 0.6 0.000 PA4520 Probable chemotaxis transducer 0.4 0.009 0.5 0.003 PA4541 lepA Large extracellular protease...”
- “...0.000 PA4421 yabC Uncharacterized protein 0.9 0.000 PA4432 rpsI 30S ribosomal protein S9 0.6 0.005 PA4451 yrbA Uncharacterized protein 0.9 0.000 PA4462 rpoN RNA polymerase sigma-54 factor 0.9 0.000 PA4475 Uncharacterized protein 0.7 0.001 PA4520 Probable chemotaxis transducer 0.6 0.002 PA4525 pilA Type 4 fimbrial precursor...”
JUK32_RS25875 BolA family protein from Halomicronema sp. CCY15110
42% identity, 62% coverage
Rta_08200 BolA family protein from Ramlibacter tataouinensis TTB310
28% identity, 90% coverage
BPSL3142 BolA-like protein from Burkholderia pseudomallei K96243
32% identity, 88% coverage
ABA1_00155 BolA family protein from Acinetobacter baumannii
34% identity, 75% coverage
C0J56_05010 BolA family protein from Pseudomonas fluorescens
31% identity, 90% coverage
Synpcc7942_1146 conserved hypothetical protein from Synechococcus elongatus PCC 7942
35% identity, 65% coverage
- Structural Determinants and Their Role in Cyanobacterial Morphogenesis
Springstein, Life (Basel, Switzerland) 2020 - “...All0086 (WP_010994263.1) Cell elongation MreD N/A Synpcc7942_0298 (ABB56330.1) All0085 (BAB77609.1) Cell elongation BolA Ssr3122 (WP_010871705.1) Synpcc7942_1146 (ABB57176.1) Asr0798 (WP_010994972.1) Cell elongation CikA Slr1969 (WP_010872820.1) Synpcc7942_0644 (WP_011243194.1) All1688 (WP_010995857.1) Circadian rhythm PBP1 Sll0002 (WP_010873436.1) Synpcc7942_2000 (WP_011378270.1) Alr5101 (WP_010999227.1) Cell wall synthesis PBP2 Slr1710 (WP_010871874.1) Synpcc7942_0785 (ABB56817.1) Alr4579...”
Saro_2520 BolA-like protein from Novosphingobium aromaticivorans DSM 12444
38% identity, 88% coverage
PSPTO_4442 toluene tolerance protein, putative from Pseudomonas syringae pv. tomato str. DC3000
31% identity, 90% coverage
VT47_19750 BolA family protein from Pseudomonas syringae pv. syringae
31% identity, 90% coverage
VF_0724 transcriptional regulator BolA from Aliivibrio fischeri ES114
VF_0724 regulator of penicillin binding proteins and beta lactamase transcription (morphogene) from Vibrio fischeri ES114
40% identity, 68% coverage
RSP_2952 BolA-like protein from Rhodobacter sphaeroides 2.4.1
34% identity, 87% coverage
- Convergence of the transcriptional responses to heat shock and singlet oxygen stresses
Dufour, PLoS genetics 2012 - “...RSP_1684, RSP_1743, RSP_1852, RSP_2121, RSP_2125, RSP_2214, RSP_2219, RSP_2387, RSP_2638, RSP_2640, RSP_2641, RSP_2739, RSP_2763, RSP_2764, RSP_2816, RSP_2952, RSP_2953, RSP_3067, RSP_3068, RSP_3378, RSP_3426, RSP_3552, RSP_3597, RSP_3598, RSP_3634, RSP_3809, RSP_3810, RSP_4244, RSP_4245, RSP_4248, RSP_4305 RpoHII regulon (99 genes) Energy metabolism Biosynthesis and degradation of polysaccharides RSP_0482 Electron transport RSP_0108,...”
- Identification, functional studies, and genomic comparisons of new members of the NnrR regulon in Rhodobacter sphaeroides
Hartsock, Journal of bacteriology 2010 - “...site is located upstream of a gene designated bolA (gene RSP_2952 in 2.4.1) (47). While this gene is conserved in strain 2.4.3, the NnrR binding site is not...”
NGO1657 hypothetical protein from Neisseria gonorrhoeae FA 1090
33% identity, 60% coverage
- Identification of Novel Immunogenic Proteins of Neisseria gonorrhoeae by Phage Display
Connor, PloS one 2016 - “...known antigens. One clone carried parts of two genes (NGO1656 and NGO1657): the last 92 bp of NGO1657, a non-coding part of 48 bp and 131 bp of NGO1656. Both genes were chosen to further characterise the immunogenic character of their encoded proteins. The eight potential immunogenic proteins...”
- “...6 NGO0777, 7 NGO0916, 8 NGO1043, 9 NGO1500 (- control), 10 NGO1634, 11 NGO1656, 12 NGO1657, 13 NGO1796, 14 NGO1852. B SDS-PAGE (15%) of 0.5 g each of produced proteins: M Spectra Multicolor Low Range Protein Ladder (Thermo Scientific, 26628), 1 NGO0326, 2 NGO0777, 3 NGO1043,...”
- Identification of the iron-responsive genes of Neisseria gonorrhoeae by microarray analysis in defined medium
Ducey, Journal of bacteriology 2005 - “...NGO1686 NGO1686 NGO1189 NGO0959 NGO1318 NGO1652 NGO1559 NGO1657 a Abbreviations: NA, not applicable; CHP, conserved hypothetical protein. of NGO0173, which...”
BAB1_0856 ATP/GTP-binding site motif A (P-loop):BolA-like protein from Brucella melitensis biovar Abortus 2308
35% identity, 87% coverage
3tr3B / Q83DW0 Structure of a bola protein homologue from coxiella burnetii (see paper)
31% identity, 71% coverage
- Ligand: cobalt (ii) ion (3tr3B)
NMB0344 BolA/YrbA family protein from Neisseria meningitidis MC58
33% identity, 60% coverage
- Identification of Novel Immunogenic Proteins of Neisseria gonorrhoeae by Phage Display
Connor, PloS one 2016 - “...Unknown, non-cytoplasmic Yes [ 48 , 53 ] NGO1657* stress-induced morphogen BolA insert NGO1556 10.02 NMB0344 Unknown, non-cytoplasmic No - NGO1796* ribosome recycling factor 1 295 20.64 NMB0187 Cytoplasm No - NGO1852* 50S ribosomal protein L7/L12 3 136289 12.56 NMB0131 Cytoplasmic membrane/ Periplasm Yes [ 25...”
- Transcriptional profiling of Neisseria meningitidis interacting with human epithelial cells in a long-term in vitro colonization model
Hey, Infection and immunity 2013 - “...of a Kanr cassette in the targeted genes NMB0342, NMB0344, NMB0345, NMB0347, and NMB0348 (see Fig. S3 in the supplemental material). The PCR products were...”
- “...meningitidis MC58 and its isogenic mutants (indicated as NMB0342, NMB0344, NMB0345, NMB0347, and NMB0348, for the genes being disrupted) for 4 h, 24 h, and 96...”
HVO_2899 hypothetical protein from Haloferax volcanii DS2
32% identity, 83% coverage
asr0798 hypothetical protein from Nostoc sp. PCC 7120
31% identity, 81% coverage
- β-N-Methylamino-L-Alanine (BMAA) Causes Severe Stress in Nostoc sp. PCC 7120 Cells under Diazotrophic Conditions: A Proteomic Study
Koksharova, Toxins 2021 - “...STRING. The protein network is represented with the following 10 protein partners: arl0045 is ferredoxin; asr0798 is a hypothetical protein; alr0799 is monothiol glutaredoxin; all3791 is Ribonuclease D; gshB is Glutathione synthetase (all3859); alr3798 is Glutathione S-transferase; alr2204 and all0737 are Thioredoxin reductases; all4873 is Glutaredoxin-3;...”
- Structural Determinants and Their Role in Cyanobacterial Morphogenesis
Springstein, Life (Basel, Switzerland) 2020 - “...Cell elongation MreD N/A Synpcc7942_0298 (ABB56330.1) All0085 (BAB77609.1) Cell elongation BolA Ssr3122 (WP_010871705.1) Synpcc7942_1146 (ABB57176.1) Asr0798 (WP_010994972.1) Cell elongation CikA Slr1969 (WP_010872820.1) Synpcc7942_0644 (WP_011243194.1) All1688 (WP_010995857.1) Circadian rhythm PBP1 Sll0002 (WP_010873436.1) Synpcc7942_2000 (WP_011378270.1) Alr5101 (WP_010999227.1) Cell wall synthesis PBP2 Slr1710 (WP_010871874.1) Synpcc7942_0785 (ABB56817.1) Alr4579 (WP_010998711.1) Cell...”
For advice on how to use these tools together, see
Interactive tools for functional annotation of bacterial genomes.
The PaperBLAST database links 793,807 different protein sequences to 1,259,118 scientific articles. Searches against EuropePMC were last performed on March 13 2025.
PaperBLAST builds a database of protein sequences that are linked
to scientific articles. These links come from automated text searches
against the articles in EuropePMC
and from manually-curated information from GeneRIF, UniProtKB/Swiss-Prot,
BRENDA,
CAZy (as made available by dbCAN),
BioLiP,
CharProtDB,
MetaCyc,
EcoCyc,
TCDB,
REBASE,
the Fitness Browser,
and a subset of the European Nucleotide Archive with the /experiment tag.
Given this database and a protein sequence query,
PaperBLAST uses protein-protein BLAST
to find similar sequences with E < 0.001.
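As a rough illustration of this search step, the sketch below assembles a command line for NCBI BLAST+ `blastp` using the E-value cutoff stated above. The query file name, the database name `paperblast_db`, and the tabular output format are assumptions made for illustration; PaperBLAST's own invocation may differ.

```python
import shlex

E_VALUE_CUTOFF = 0.001  # threshold stated in the text (E < 0.001)

def build_blastp_command(query_fasta, database, evalue=E_VALUE_CUTOFF):
    """Return a blastp argument list; file and database names are hypothetical."""
    return [
        "blastp",
        "-query", query_fasta,
        "-db", database,
        "-evalue", str(evalue),  # blastp reports hits at or below this E-value
        "-outfmt", "6",          # tabular output, chosen here only for easy parsing
    ]

cmd = build_blastp_command("query.faa", "paperblast_db")
print(shlex.join(cmd))
```

The command is only built and printed, not executed, so the sketch runs without a BLAST+ installation.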
To build the database, we query EuropePMC with locus tags, with RefSeq protein
identifiers, and with UniProt
accessions. We obtain the locus tags from RefSeq or from MicrobesOnline. We use
queries of the form "locus_tag AND genus_name" to try to ensure that
the paper is actually discussing that gene. Because EuropePMC indexes
most recent biomedical papers, even if they are not open access, some
of the links may be to papers that you cannot read or that our
computers cannot read. We query each of these identifiers that
appears in the open access part of EuropePMC, as well as every locus
tag that appears in the 500 most-referenced genomes, so that a gene
may appear in the PaperBLAST results even though none of the papers
that mention it are open access. We also incorporate text-mined links
from EuropePMC that link open access articles to UniProt or RefSeq
identifiers. (This yields some additional links because EuropePMC
uses different heuristics for their text mining than we do.)
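The query form described above can be sketched as follows. This is a minimal illustration that only builds the query string and a search URL. The use of the Europe PMC REST search endpoint with a `format=json` parameter is an assumption about how such a query could be issued, not a description of PaperBLAST's own code, and the function names are hypothetical.

```python
from urllib.parse import urlencode

# Europe PMC REST search endpoint (assumed here; PaperBLAST's own client may differ).
EUROPEPMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"

def build_query(locus_tag, genus_name):
    """Combine identifier and genus in the form "locus_tag AND genus_name"."""
    return f"{locus_tag} AND {genus_name}"

def build_search_url(locus_tag, genus_name):
    """Return a search URL for the combined query (no request is sent)."""
    params = {"query": build_query(locus_tag, genus_name), "format": "json"}
    return f"{EUROPEPMC_SEARCH}?{urlencode(params)}"

print(build_query("VP2659", "Vibrio"))
print(build_search_url("VP2659", "Vibrio"))
```

Requiring the genus name alongside the locus tag reduces false matches when a short identifier happens to appear in an unrelated paper.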
For every article that mentions a locus tag, a RefSeq protein
identifier, or a UniProt accession, we try to select one or two
snippets of text that refer to the protein. If we cannot get access to
the full text, we try to select a snippet from the abstract, but
unfortunately, unique identifiers such as locus tags are rarely
provided in abstracts.
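One simple way to pick such snippets is to take a window of text around each whole-word match of the identifier. The sketch below does this under stated assumptions: the window width, the limit of two snippets, and the rule of skipping overlapping windows are illustrative choices, not PaperBLAST's actual heuristics.

```python
import re

def select_snippets(text, identifier, width=80, max_snippets=2):
    """Return up to max_snippets windows of text around whole-word matches."""
    # Reject matches embedded in a longer identifier (e.g. bd1328 inside bd13280).
    pattern = re.compile(
        r"(?<![A-Za-z0-9_])" + re.escape(identifier) + r"(?![A-Za-z0-9_])"
    )
    snippets = []
    last_end = -1
    for match in pattern.finditer(text):
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
        if start < last_end:  # skip windows that overlap the previous snippet
            continue
        prefix = "..." if start > 0 else ""
        suffix = "..." if end < len(text) else ""
        snippets.append(prefix + text[start:end] + suffix)
        last_end = end
        if len(snippets) == max_snippets:
            break
    return snippets

example = ("Adjacent to parA1 is a bolA-like gene (bd1328). "
           "Co-expression of bd1327 and bd1328 was observed.")
for snippet in select_snippets(example, "bd1328", width=20):
    print(snippet)
```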
PaperBLAST also incorporates manually-curated protein functions:
- Proteins from NCBI's RefSeq are included if a
GeneRIF
entry links the gene to an article in
PubMed®.
GeneRIF also provides a short summary of the article's claim about the
protein, which is shown instead of a snippet.
- Proteins from Swiss-Prot (the curated part of UniProt)
are included if the curators
identified experimental evidence for the protein's function (evidence
code ECO:0000269). For these proteins, the fields of the Swiss-Prot entry that
describe the protein's function are shown (with bold headings).
- Proteins from BRENDA,
a curated database of enzymes, are included if they are linked to a paper in PubMed
and their full sequence is known.
- Every protein from the non-redundant subset of
BioLiP,
a database
of ligand-binding sites and catalytic residues in protein structures, is included. Since BioLiP itself
does not include descriptions of the proteins, those are taken from the
Protein Data Bank.
Descriptions from PDB rely on the original submitter of the
structure and cannot be updated by others, so they may be less reliable.
(For SitesBLAST and Sites on a Tree, we use a larger subset of BioLiP so that every
ligand is represented among a group of structures with similar sequences, but for
PaperBLAST, we use the non-redundant set provided by BioLiP.)
- Every protein from EcoCyc, a curated
database of the proteins in Escherichia coli K-12, is included, regardless
of whether it is characterized.
- Proteins from the MetaCyc metabolic pathway database
are included if they are linked to a paper in PubMed and their full sequence is known.
- Proteins from the Transporter Classification Database (TCDB)
are included if they have known substrate(s), have reference(s),
and are not described as uncharacterized or putative.
(Some of the references are not visible on the PaperBLAST web site.)
- Every protein from CharProtDB,
a database of experimentally characterized protein annotations, is included.
- Proteins from the CAZy database of carbohydrate-active enzymes
are included if they are associated with an Enzyme Classification number.
Even though CAZy does not provide links from individual protein sequences to papers,
these should all be experimentally-characterized proteins.
- Proteins from the REBASE database
of restriction enzymes are included if they have known specificity.
- Every protein with an evidence-based reannotation (based on mutant phenotypes)
in the Fitness Browser is included.
- Sequence-specific transcription factors (including sigma factors and DNA-binding response regulators)
with experimentally-determined DNA binding sites from the
PRODORIC database of gene regulation in prokaryotes are included.
- Putative transcription factors from RegPrecise
are included if they have manually-curated predictions for their binding sites. These predictions are based on
conserved putative regulatory sites across genomes that contain similar transcription factors,
so PaperBLAST clusters the TFs at 70% identity and retains just one member of each cluster.
- Coding sequence (CDS) features from the
European Nucleotide Archive (ENA)
are included if the /experiment tag is set (implying that there is experimental evidence for the annotation),
the nucleotide entry links to paper(s) in PubMed,
and the nucleotide entry is from the STD data class
(implying that these are targeted annotated sequences, not from shotgun sequencing).
Also, to filter out genes whose transcription or translation was detected, but whose function
was not studied, nucleotide entries or papers with more than 25 such proteins are excluded.
Descriptions from ENA rely on the original submitter of the
sequence and cannot be updated by others, so they may be less reliable.
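The size filter described for ENA entries can be sketched as below. The record layout (tuples of protein identifier, nucleotide entry, and PubMed identifier) and the function name are hypothetical. Only the threshold of 25 comes from the text.

```python
from collections import Counter

MAX_PROTEINS = 25  # entries or papers with more than this many proteins are excluded

def filter_ena_proteins(records):
    """Filter (protein_id, nucleotide_entry, pubmed_id) tuples (hypothetical layout)."""
    records = list(records)
    # Count each distinct protein once per nucleotide entry and once per paper.
    per_entry = Counter(
        entry for _, entry in {(prot, entry) for prot, entry, _ in records}
    )
    per_paper = Counter(
        pmid for _, pmid in {(prot, pmid) for prot, _, pmid in records}
    )
    return [
        (prot, entry, pmid)
        for prot, entry, pmid in records
        if per_entry[entry] <= MAX_PROTEINS and per_paper[pmid] <= MAX_PROTEINS
    ]

# 26 proteins from one entry and paper are dropped; the single-protein entry is kept.
records = [(f"P{i}", "E1", "111") for i in range(26)] + [("Q1", "E2", "222")]
print(filter_ena_proteins(records))
```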
Except for GeneRIF and ENA,
the curated entries include a short curated
description of the protein's function.
For entries from BioLiP, the protein's function may not be known beyond binding to the ligand.
Many of these entries also link to articles in PubMed.
For more information see the
PaperBLAST paper (mSystems 2017)
or the code.
You can download PaperBLAST's database here.
Changes to PaperBLAST since the paper was written:
- November 2023: incorporated PRODORIC and RegPrecise. Many PRODORIC entries were not linked to a protein sequence (no UniProt identifier), so we added this information.
- February 2023: BioLiP changed their download format. PaperBLAST now includes their non-redundant subset. SitesBLAST and Sites on a Tree use a larger non-redundant subset that ensures that every ligand is represented within each cluster. This should ensure that every binding site is represented.
- June 2022: incorporated some coding sequences from ENA with the /experiment tag.
- March 2022: incorporated BioLiP.
- April 2020: incorporated TCDB.
- April 2019: EuropePMC now returns table entries in their search results. This has expanded PaperBLAST's database, but most of the new entries are of low relevance, and the resulting snippets are often just lists of locus tags with annotations.
- February 2018: the alignment page reports the conservation of the hit's functional sites (if available from Swiss-Prot or UniProt).
- January 2018: incorporated BRENDA.
- December 2017: incorporated MetaCyc, CharProtDB, CAZy, REBASE, and the reannotations from the Fitness Browser.
- September 2017: EuropePMC no longer returns some table entries in their search results. This has shrunk PaperBLAST's database, but has also reduced the number of low-relevance hits.
Many of these changes are described in Interactive tools for functional annotation of bacterial genomes.
PaperBLAST cannot provide snippets for many of the papers that are
published in non-open-access journals. This limitation applies even if
the paper is marked as "free" on the publisher's web site and is
available in PubMed Central or EuropePMC. If a journal that you publish
in is marked as "secret," please consider publishing elsewhere.
Many important articles are missing from PaperBLAST, either because
the article's full text is not in EuropePMC (as for many older
articles), or because the paper does not mention a protein identifier such as a locus tag, or because of PaperBLAST's heuristics. If you notice an
article that characterizes a protein's function but is missing from
PaperBLAST, please notify the curators at UniProt
or add an entry to GeneRIF.
Entries in either of these databases will eventually be incorporated
into PaperBLAST. Note that to add an entry to UniProt, you will need
to find the UniProt identifier for the protein. If the protein is not
already in UniProt, you can ask them to create an entry. To add an
entry to GeneRIF, you will need an NCBI Gene identifier, but
unfortunately many prokaryotic proteins in RefSeq do not have
corresponding Gene identifiers.
References
PaperBLAST: Text-mining papers for information about homologs.
M. N. Price and A. P. Arkin (2017). mSystems, 10.1128/mSystems.00039-17.
Europe PMC in 2017.
M. Levchenko et al (2017). Nucleic Acids Research, 10.1093/nar/gkx1005.
Gene indexing: characterization and analysis of NLM's GeneRIFs.
J. A. Mitchell et al (2003). AMIA Annu Symp Proc 2003:460-464.
UniProt: the universal protein knowledgebase.
The UniProt Consortium (2016). Nucleic Acids Research, 10.1093/nar/gkw1099.
BRENDA in 2017: new perspectives and new tools in BRENDA.
S. Placzek et al (2017). Nucleic Acids Research, 10.1093/nar/gkw952.
The EcoCyc database: reflecting new knowledge about Escherichia coli K-12.
I. M. Keseler et al (2016). Nucleic Acids Research, 10.1093/nar/gkw1003.
The MetaCyc database of metabolic pathways and enzymes.
R. Caspi et al (2018). Nucleic Acids Research, 10.1093/nar/gkx935.
CharProtDB: a database of experimentally characterized protein annotations.
R. Madupu et al (2012). Nucleic Acids Research, 10.1093/nar/gkr1133.
The carbohydrate-active enzymes database (CAZy) in 2013.
V. Lombard et al (2014). Nucleic Acids Research, 10.1093/nar/gkt1178.
The Transporter Classification Database (TCDB): recent advances.
M. H. Saier, Jr. et al (2016). Nucleic Acids Research, 10.1093/nar/gkv1103.
REBASE - a database for DNA restriction and modification: enzymes, genes and genomes.
R. J. Roberts et al (2015). Nucleic Acids Research, 10.1093/nar/gku1046.
Deep annotation of protein function across diverse bacteria from mutant phenotypes.
M. N. Price et al (2016). bioRxiv, 10.1101/072470.
by Morgan Price,
Arkin group
Lawrence Berkeley National Laboratory