PaperBLAST
PaperBLAST Hits for OKFHMN_12955 (82 a.a., MRIEICIAKE...)
>OKFHMN_12955
MRIEICIAKEKMTKMPNGAVDALKEELTRRISKRYDDVEVIVKATSNDGLSVTRTADKDS
AKTFVQETLKDTWESSDEWFVH
Found 18 similar proteins in the literature:
Z3916 No description from Escherichia coli O157:H7 EDL933
98% identity, 100% coverage
- Clonal and antigenic analysis of serogroup A Neisseria meningitidis with particular reference to epidemiological features of epidemic meningitis in the People's Republic of China
Wang, Infection and immunity 1992 - “...Z3921, Z4102, Z4735 Z3911, Z3912, Z3913, Z3914, Z3915, Z3916, Z3920, Z3922, Z3923, Z3924, Z3925, Z3926, Z3927 Israel (6) 1987-1989 USSR (10) 1969-1971 E 1977...”
SF1879 orf, conserved hypothetical protein from Shigella flexneri 2a str. 301
96% identity, 100% coverage
Z3305 unknown protein encoded within prophage CP-933V from Escherichia coli O157:H7 EDL933
ECs2939 hypothetical protein from Escherichia coli O157:H7 str. Sakai
95% identity, 100% coverage
c3144 DNA-damage-inducible protein I from Escherichia coli CFT073
99% identity, 83% coverage
Z2152 No description from Escherichia coli O157:H7 EDL933
82% identity, 62% coverage
Z2083 unknown protein encoded within CP-933O from Escherichia coli O157:H7 EDL933
82% identity, 64% coverage
ECs2153 putative damage-inducible protein from Escherichia coli O157:H7 str. Sakai
82% identity, 87% coverage
ROD_25751 putative prophage damage-inducible protein from Citrobacter rodentium ICC168
55% identity, 98% coverage
KP1_2061 DNA-damage-inducible protein I from Klebsiella pneumoniae NTUH-K2044
46% identity, 80% coverage
STM2621 Gifsy-1 prophage from Salmonella typhimurium LT2
STM1019 Gifsy-2 prophage from Salmonella typhimurium LT2
STM14_1156, STMUK_0985 DinI family protein from Salmonella enterica subsp. enterica serovar Typhimurium str. 14028S
42% identity, 98% coverage
- Toxicogenomic analysis incorporating operon-transcriptional coupling and toxicant concentration-expression response: analysis of MX-treated Salmonella
Ward, BMC bioinformatics 2007 - “...ybfE 0.80643 0.1 0.9 0.9 recG 0.91485 1 0.9 1 STM0925 0.97301 1 0.9 1 STM2621 No Data ssb No Data yjiW 1.8 1.4 1.9 *The p -value was determined by a Cyber-T t-test of control to the high concentration of MX. Functional analyses Functional and...”
- The pKO2 linear plasmid prophage of Klebsiella oxytoca
Casjens, Journal of bacteriology 2004 - “...STM1019 protein (identical to the phage Gifsy-1 gene STM2621 and prophage Sti1 STY1032 proteins); (iv) plasmid TP110-encoded ImpC protein; (v) phage PY54 gene...”
- “...(58, 62), and Fels-2, Gifsy-1, and Gifsy-2 (genes STM2731, STM2621, and STM1019, respectively) by the LexA repressor (11); control of the Mu Pmom promoter by...”
- Toxicogenomic analysis incorporating operon-transcriptional coupling and toxicant concentration-expression response: analysis of MX-treated Salmonella
Ward, BMC bioinformatics 2007 - “...dinP 0.00003 3.7 4.8 4.8 sulA 0.00004 4.4 7.6 12.6 umuD 0.00004 5.2 8.3 14.3 STM1019 0.00005 4.8 8.4 8.9 dinI 0.00016 8.7 14 16.9 yebG 0.00019 5.5 8.8 13.5 lexA 0.00035 2.9 3.5 4 STM1309 0.00102 -1.3 1.9 2.5 polB 0.00103 2.3 3.1 2.8 yigN...”
- The pKO2 linear plasmid prophage of Klebsiella oxytoca
Casjens, Journal of bacteriology 2004 - “...Orf6 protein [50]); (iii) the phage Gifsy-2 gene STM1019 protein (identical to the phage Gifsy-1 gene STM2621 and prophage Sti1 STY1032 proteins); (iv) plasmid...”
- “...and Fels-2, Gifsy-1, and Gifsy-2 (genes STM2731, STM2621, and STM1019, respectively) by the LexA repressor (11); control of the Mu Pmom promoter by the OxyR...”
- Genome-Wide Identification and Expression Analysis of SOS Response Genes in Salmonella enterica Serovar Typhimurium
Mérida-Floriano, Cells 2021 - “...identified prophage genes known to be regulated by LexA, such as the dinI homologues STM14_3210, STM14_1156 and STM14_1439 encoded by Gifsy-1, Gifsy-2 and Gifsy-3, respectively [ 46 ]. Other prophage-borne candidates identified in our analysis were STM14_3214 and STM14_1432, which encode phage replication proteins of Gifsy-1...”
- “...1 44 8.71 STM14_3002 cysP 15.47 1 65 18.16 STM14_4846 nlhH 15.11 1 13 7.63 STM14_1156 dinI Gifsy-2 14.46 1 19 3.92 STM14_0926 uvrB 14.13 1 73 5.66 STM14_4752 uvrD 13.87 1 105 11.48 STM14_0369 dinP *** 11.30 1 15 6.80 STM14_5112 uvrA 11.16 1 80...”
- Transcriptional Profiling of a Cross-Protective Salmonella enterica serovar Typhimurium UK-1 dam Mutant Identifies a Set of Genes More Transcriptionally Active Compared to Wild-Type, and Stably Transcribed across Biologically Relevant Microenvironments
Miller, Pathogens (Basel, Switzerland) 2014 - “...to gain insight into probable function of identified hypothetical proteins, a BLAST analysis was performed. STMUK_0985 had 100% nucleotide sequence identity with a gene encoding a DNA damage inducible protein whereas its neighbor STMUK_0986 had 100% identity with a gene encoding a Gifsy-2 bacteriophage protein. STMUK_1849...”
- “...5.78 NP Putative ribonucleoprotein-related protein Virulence STMUK_1011 16.94 3.11 13.26 13.73 Attachment/invasion protein Hypothetical Proteins STMUK_0985 2.54 4.97 2.98 1.04 Hypothetical protein STMUK_0986 4.54 3.45 10.26 1.83 Hypothetical protein STMUK_1493 2.50 4.11 3.12 2.788 Putative outer membrane protein STMUK_1849 5.81 13.81 4.77 28.65 Hypothetical protein STMUK_2239...”
STY1032 putative damage-inducible protein from Salmonella enterica subsp. enterica serovar Typhi str. CT18
t1908 putative damage-inducible protein from Salmonella enterica subsp. enterica serovar Typhi Ty2
42% identity, 98% coverage
- The pKO2 linear plasmid prophage of Klebsiella oxytoca
Casjens, Journal of bacteriology 2004 - “...to the phage Gifsy-1 gene STM2621 and prophage Sti1 STY1032 proteins); (iv) plasmid TP110-encoded ImpC protein; (v) phage PY54 gene 56 protein; (vi) E. coli...”
- A non-redundant microarray of genes for two related bacteria
Porwollik, Nucleic acids research 2003 - “...its peak only after 30 min, exemplified by STY1032 and STY1033. Overall, this peroxide treatment experiment was consistent with previous data and, in addition,...”
- Staphylococcus aureus Isolated From Retail Meat and Meat Products in China: Incidence, Antibiotic Resistance and Genetic Diversity
Wu, Frontiers in microbiology 2018 - “...t8619 (1), t286 (1), t13819 (1), t693 (1), t591 (1), t899 (1), t17635 * (1), t1908 (1), t9632 (1), t701 (1) ST573 (1) t345 (1) ST1920 (1) t286 (1) ST4446 * (2) t127 (2) ST4477 * (1) t1491 (1) ST4473 * (1) t127 (1) ST4455 *...”
STM14_1439, STM14_3210, STMMW_26401, STMUK_2656 DinI family protein from Salmonella enterica subsp. enterica serovar Typhimurium str. D23580
42% identity, 98% coverage
- Genome-Wide Identification and Expression Analysis of SOS Response Genes in Salmonella enterica Serovar Typhimurium
Mérida-Floriano, Cells 2021 - “...genes known to be regulated by LexA, such as the dinI homologues STM14_3210, STM14_1156 and STM14_1439 encoded by Gifsy-1, Gifsy-2 and Gifsy-3, respectively [ 46 ]. Other prophage-borne candidates identified in our analysis were STM14_3214 and STM14_1432, which encode phage replication proteins of Gifsy-1 and Gifsy-3,...”
- “...27 14.48, 7.94 STM14_3568 gudD 5.43 1 0 16.97 STM14_3405 yqaB 5.32 1 12 16.26 STM14_1439 dinI Gifsy-3 4.56 1 19 5.26 STM14_2752 yejK 4.54 1 102 20.89 STM14_2422 umuC ** 3.91 1 -- 4.38 STM14_2753 yejL 3.74 1 57 20.89 STM14_2648 thiM 3.62 1 177...”
- “...also identified prophage genes known to be regulated by LexA, such as the dinI homologues STM14_3210, STM14_1156 and STM14_1439 encoded by Gifsy-1, Gifsy-2 and Gifsy-3, respectively [ 46 ]. Other prophage-borne candidates identified in our analysis were STM14_3214 and STM14_1432, which encode phage replication proteins of...”
- “...12.97, 7.72, 8.82 STM14_3417 recA 35.71 1 64 5.92 STM14_1215 sulA 26.70 1 21 1.49 STM14_3210 dinI Gifsy-1 23.20 1 19 4.55 STM14_1331 dinI 22.82 2 19, 42 4.00, 17.60 STM14_2423 umuD ** 20.10 1 15 4.38 STM14_4775 yigN 19.46 1 44 8.71 STM14_3002 cysP 15.47...”
- Characterization of the Prophage Repertoire of African Salmonella Typhimurium ST313 Reveals High Levels of Spontaneous Induction of Novel Phage BTP1
Owen, Frontiers in microbiology 2017 - “...a single nucleotide substitution (CT position 2790162) was introduced into the promoter controlling dinI ( STMMW_26401 )- gfoA(STMMW_26391) transcription (P dinI - gfoA ) of D23580 by a protocol based on the principle described by Blank et al. (2011) . Due to the MDR phenotype of...”
- Transcriptional Profiling of a Cross-Protective Salmonella enterica serovar Typhimurium UK-1 dam Mutant Identifies a Set of Genes More Transcriptionally Active Compared to Wild-Type, and Stably Transcribed across Biologically Relevant Microenvironments
Miller, Pathogens (Basel, Switzerland) 2014 - “...STMUK_2239, STMUK_2240, rtcB and yeeA did not reveal any similarities with genes of known functions. STMUK_2656 did not have a clear similarity with any one gene as it had very high identity with multiple genes of unrelated function. pathogens-03-00417-t001_Table 1 Table 1 Gene Set A. Functional...”
- “...15.57 28.49 1.27 59.26 Putative inner membrane protein STMUK_2655 4.77 4.67 3.39 1.76 Hypothetical protein STMUK_2656 5.04 12.62 12.65 26.71 Hypothetical protein rtcB 2.61 2.35 1.41 NP Putative cytoplasmic protein yeeA 4.79 3.61 3.40 3.42 Putative inner membrane protein Plasmid ccdA 7.39 4.61 NP NP Toxin...”
BHE81_21170 DNA damage-inducible protein I from Klebsiella sp. AqSCr
44% identity, 99% coverage
plu1815 DNA damage-inducible protein I from Photorhabdus luminescens subsp. laumondii TTO1
48% identity, 98% coverage
DinI / b1061 DNA damage-inducible protein I from Escherichia coli K-12 substr. MG1655 (see 11 papers)
dinI DNA-damage-inducible protein I from Escherichia coli W3110 (see 6 papers)
c1328 DNA damage-inducible protein I from Escherichia coli CFT073
NP_415579 DNA damage-inducible protein I from Escherichia coli str. K-12 substr. MG1655
b1061 DNA damage-inducible protein I from Escherichia coli str. K-12 substr. MG1655
SF1067 damage-inducible protein I from Shigella flexneri 2a str. 301
S1145 damage-inducible protein I from Shigella flexneri 2a str. 2457T
Z1698 putative damage induced protein I from Escherichia coli O157:H7 EDL933
EDL933_1637, EDL933_RS07940 DNA damage-inducible protein I from Escherichia coli O157:H7 str. EDL933
45% identity, 99% coverage
- Fortieth Annual Meeting February 17-21, 1996 Baltimore Convention Center Baltimore, Maryland : Tuesday Symposia and Posters, Part V
Biophysical journal 1996
- Two modes of binding of DinI to RecA filament provide a new insight into the regulation of SOS response by DinI protein.
Galkin, Journal of molecular biology 2011 - GeneRIF: DinI either binds to the C-terminus of a RecA protomer or DinI resides deeply in the groove of the RecA filament, with its negatively charged C-terminal helix proximal to the L2 loop of RecA.
- The DinI and RecX proteins are competing modulators of RecA function.
Lusetti, The Journal of biological chemistry 2004 (PubMed) - GeneRIF: DinI and RecX are competing modulators of RecA function
- The Mutant βE202K Sliding Clamp Protein Impairs DNA Polymerase III Replication Activity
Homiski, Journal of bacteriology 2021 (secret)
- NfiS, a species-specific regulatory noncoding RNA of Pseudomonas stutzeri, enhances oxidative stress tolerance in Escherichia coli
Hu, AMB Express 2019 - “...4.69 Transcription termination factor 1.18E03 29.28 b3096 mzrA 2.50 Modulator of EnvZ/OmpR regulon 1.22E03 29.21 b1061 dinI 1.93 DNA damage-inducible protein I 1.23E03 29.20 b3060 ttdR 1.06 Transcriptional activator of ttdABT 1.28E03 29.11 b1125 potB 1.05 Spermidine/putrescine ABC transporter permease 1.53E03 28.76 b0920 elyC 2.64 Envelope...”
- In vitro transcription profiling of the σS subunit of bacterial RNA polymerase: re-definition of the σS regulon and identification of σS-specific promoter sequence elements
Maciag, Nucleic acids research 2011 - “...inhibitor of cell division b0958 2.04 LexA ( 81 ) dinI AP endonuclease, SOS response b1061 2.18 LexA ( 82 ); upregulated in an rpoS mutant derivative of OH157:H7 EDL 933 ( 19 ) Multifunctional operons cvpA Colicin V production; in cvpA-purF-ubiX operon b2313 1.85 PurR...”
- The HU regulon is composed of genes responding to anaerobiosis, acid stress, high osmolarity and SOS induction
Oberto, PloS one 2009 - “...11.39 LexA repressed suppressor of lon , inhibits cell division and ftsZ ring formation dinI b1061 dinI 1 1.09 0.72 10.61 1 1.52 1.07 7.34 1 1.25 1.09 14.51 LexA repressed damage-inducible protein I xisE b1141 ymfH-xisE-intE 1 2.45 0.74 34.01 1 2.34 1.41 7.31 1...”
- Genome-wide transcriptional responses of Escherichia coli K-12 to continuous osmotic and heat stresses
Gunasekera, Journal of bacteriology 2008 - “...(b1183 and b1184) (mutagenic repair pathway), dinI (b1061) and dinD (b3645) (encode damage-inducible proteins), and ykfG (b0247) (encodes a Downloaded from...”
- A semi-supervised method for predicting transcription factor-gene interactions in Escherichia coli
Ernst, PLoS computational biology 2008 - “...typhimurium [67] PhoB b4068, yjcH 1 NagC b2677, proV 1 FhlA b1924, fliD 1 LexA b1061, dinI 1 Yes Yes Gel shift assay and site-directed mutagenesis [68] ; ChIP-chip evidence [12] OxyR b4367, fhuF 1 DNaseI footprinting evidence [69] SoxS b2530, iscS 1 GadE b3506, slp...”
- Analysis of global gene expression and double-strand-break formation in DNA adenine methyltransferase- and mismatch repair-deficient Escherichia coli
Robbins-Manke, Journal of bacteriology 2005 - “...sulA (b0958) yebG (b1848) ruvA (b1861) ruvB (b1860) dinI (b1061) uvrA (b4058) uvrB (b0779) chol ydjQ (b1741) dam mutS/WT VOL. 187, 2005 GENE EXPRESSION AND DNA...”
- The pKO2 linear plasmid prophage of Klebsiella oxytoca
Casjens, Journal of bacteriology 2004 - “...protein; (ix) E. coli K-12 dinI protein (gene b1061; identical proteins are encoded by the non-prophage-associated E. coli EDL933 gene Z1698 and Shigella...”
- “...proteins are 85 and 83% identical to the b1061 protein, respectively); (x) Serratia marcescens dinI protein; (xi) Gifsy-2 gene STM1056 protein; and (xi) Stm6...”
- The pKO2 linear plasmid prophage of Klebsiella oxytoca
Casjens, Journal of bacteriology 2004 - “...E. coli EDL933 gene Z1698 and Shigella flexneri 301 gene SF1067; S. enterica LT2 gene STM1162 and CT18 gene STY1200 proteins are 85 and 83% identical to the...”
- Addendum
Open forum infectious diseases 2019
- A distinct regulatory sequence is essential for the expression of a subset of nle genes in attaching and effacing Escherichia coli
García-Angulo, Journal of bacteriology 2012 - “...ECs2715 (espFu)* Other function Z4595 (mdh)* Z1698 (dinI)* ECs4109 (mdh)* E2348C_3507 (mdh)* Unknown function Z1485** ECs1230** E2348C_1441 E2348C_1263*...”
- The pKO2 linear plasmid prophage of Klebsiella oxytoca
Casjens, Journal of bacteriology 2004 - “...are encoded by the non-prophage-associated E. coli EDL933 gene Z1698 and Shigella flexneri 301 gene SF1067; S. enterica LT2 gene STM1162 and CT18 gene STY1200...”
- Transcriptomic and proteomic analysis of the virulence inducing effect of ciprofloxacin on enterohemorrhagic Escherichia coli
Kijewski, PloS one 2024 - “...inhibitor SulA 4.6 -1.4 -1.3 EDL933_RS06545 EDL933_1330 yccM 4Fe-4S binding protein 5.7 --- --- EDL933_RS07940 EDL933_1637 dinI DNA-damage-inducible protein I 4.2 10.1 12.0 EDL933_RS09140 EDL933_1877 umuD Protein UmuD 3.9 --- --- EDL933_RS09145 EDL933_1878 umuC DNA polymerase V subunit UmuC 2.5 --- --- EDL933_RS13800 EDL933_2821 yebG DNA...”
- “...division inhibitor SulA 4.6 -1.4 -1.3 EDL933_RS06545 EDL933_1330 yccM 4Fe-4S binding protein 5.7 --- --- EDL933_RS07940 EDL933_1637 dinI DNA-damage-inducible protein I 4.2 10.1 12.0 EDL933_RS09140 EDL933_1877 umuD Protein UmuD 3.9 --- --- EDL933_RS09145 EDL933_1878 umuC DNA polymerase V subunit UmuC 2.5 --- --- EDL933_RS13800 EDL933_2821 yebG...”
GW13_PRO1056, STM14_1331 DNA damage-inducible protein I from Salmonella enterica subsp. enterica serovar Typhimurium str. 14028S
STM1162 DNA damage-inducible protein I, inhibits UmuD processing from Salmonella typhimurium LT2
42% identity, 100% coverage
- Genome-Wide Identification and Expression Analysis of SOS Response Genes in Salmonella enterica Serovar Typhimurium
Mérida-Floriano, Cells 2021 - “...64 5.92 STM14_1215 sulA 26.70 1 21 1.49 STM14_3210 dinI Gifsy-1 23.20 1 19 4.55 STM14_1331 dinI 22.82 2 19, 42 4.00, 17.60 STM14_2423 umuD ** 20.10 1 15 4.38 STM14_4775 yigN 19.46 1 44 8.71 STM14_3002 cysP 15.47 1 65 18.16 STM14_4846 nlhH 15.11 1...”
- Genotoxic, Metabolic, and Oxidative Stresses Regulate the RNA Repair Operon of Salmonella enterica Serovar Typhimurium
Kurasz, Journal of bacteriology 2018 - “...STM14_0117 STM14_0369 STM14_0926 STM14_0953 STM14_1215 STM14_1331 STM14_1589 STM14_1605 STM14_2287 STM14_2303 STM14_2304 STM14_2422 STM14_2423 STM14_2551...”
- DNA phosphorothioate modifications influence the global transcriptional response and protect DNA from double-stranded breaks
Gan, Scientific reports 2014 - “...GW13_PRO0297 Protein YebF 3.39 0.86 0.66 GW13_PRO0298 FIG004088: inner membrane protein YebE 3.23 0.82 1.08 GW13_PRO1056 DNA-damage-inducible protein I 4.06 0.35 0.70 GW13_PRO3190 DNA-damage-inducible protein F 2.41 0.46 0.64 GW13_PRO0041 Cell division inhibitor SulA 3.33 0.15 0.03 GW13_PRO1798 Regulatory protein RecX 3.55 2.33 2.41 GW13_PRO1799 RecA...”
- The pKO2 linear plasmid prophage of Klebsiella oxytoca
Casjens, Journal of bacteriology 2004 - “...dinI (6) and S. enterica serovar Typhimurium LT2 gene STM1162 (70). Many of the homologs are phage encoded; for example, at least six of the eight homologues...”
- “...Shigella flexneri 301 gene SF1067; S. enterica LT2 gene STM1162 and CT18 gene STY1200 proteins are 85 and 83% identical to the b1061 protein, respectively); (x)...”
STY1200 damage-inducible protein from Salmonella enterica subsp. enterica serovar Typhi str. CT18
40% identity, 100% coverage
YPO1586 DNA-damage-inducible protein I from Yersinia pestis CO92
YPTB2483 DNA-damage-inducible protein I from Yersinia pseudotuberculosis IP 32953
39% identity, 98% coverage
For advice on how to use these tools together, see
Interactive tools for functional annotation of bacterial genomes.
The PaperBLAST database links 793,807 different protein sequences to 1,259,118 scientific articles. Searches against EuropePMC were last performed on March 13 2025.
PaperBLAST builds a database of protein sequences that are linked
to scientific articles. These links come from automated text searches
against the articles in EuropePMC
and from manually-curated information from GeneRIF, UniProtKB/Swiss-Prot,
BRENDA,
CAZy (as made available by dbCAN),
BioLiP,
CharProtDB,
MetaCyc,
EcoCyc,
TCDB,
REBASE,
the Fitness Browser,
and a subset of the European Nucleotide Archive with the /experiment tag.
Given this database and a protein sequence query,
PaperBLAST uses protein-protein BLAST
to find similar sequences with E < 0.001.
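The E-value cutoff described above can be illustrated with a minimal sketch. It assumes the BLAST results are available in the standard BLAST+ tabular format (`-outfmt 6`, default column order); whether PaperBLAST itself parses this format is not stated here, and the function name `significant_hits` and the example rows are hypothetical.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def significant_hits(tabular_text, max_evalue=1e-3):
    """Return the rows whose E-value is strictly below max_evalue."""
    reader = csv.DictReader(io.StringIO(tabular_text),
                            fieldnames=COLUMNS, delimiter="\t")
    return [row for row in reader if float(row["evalue"]) < max_evalue]


# Made-up rows for illustration only; not real PaperBLAST output.
example = (
    "query1\thitA\t98.0\t82\t2\t0\t1\t82\t1\t82\t1e-50\t160\n"
    "query1\thitB\t30.0\t40\t28\t0\t5\t44\t10\t49\t0.5\t25.0\n"
)

kept = significant_hits(example)
print([row["sseqid"] for row in kept])
```

Only the first made-up row passes, because its E-value is below 0.001.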
To build the database, we query EuropePMC with locus tags, with RefSeq protein
identifiers, and with UniProt
accessions. We obtain the locus tags from RefSeq or from MicrobesOnline. We use
queries of the form "locus_tag AND genus_name" to try to ensure that
the paper is actually discussing that gene. Because EuropePMC indexes
most recent biomedical papers, even if they are not open access, some
of the links may be to papers that you cannot read or that our
computers cannot read. We query each of these identifiers that
appears in the open access part of EuropePMC, as well as every locus
tag that appears in the 500 most-referenced genomes, so that a gene
may appear in the PaperBLAST results even though none of the papers
that mention it are open access. We also incorporate text-mined links
from EuropePMC that link open access articles to UniProt or RefSeq
identifiers. (This yields some additional links because EuropePMC
uses different heuristics for their text mining than we do.)
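As a rough illustration of the query form "locus_tag AND genus_name" described above, the sketch below builds query strings from locus tags and organism names taken from this page. Taking the genus as the first word of the organism name is an assumption, and the helper names `genus_of` and `build_query` are hypothetical; how the queries are actually submitted to EuropePMC is not shown.

```python
def genus_of(organism):
    """Assume the genus is the first word of the organism name."""
    return organism.split()[0]


def build_query(locus_tag, organism):
    """Combine a locus tag with its genus, as in 'locus_tag AND genus_name'."""
    return f"{locus_tag} AND {genus_of(organism)}"


# Locus tags and organism names as they appear in the hit list on this page.
genes = [
    ("STM1019", "Salmonella typhimurium LT2"),
    ("b1061", "Escherichia coli K-12 substr. MG1655"),
]

queries = [build_query(tag, organism) for tag, organism in genes]
for query in queries:
    print(query)
```

Requiring the genus name alongside the locus tag is what is meant to reduce matches on papers that use the same string for something unrelated.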
For every article that mentions a locus tag, a RefSeq protein
identifier, or a UniProt accession, we try to select one or two
snippets of text that refer to the protein. If we cannot get access to
the full text, we try to select a snippet from the abstract, but
unfortunately, unique identifiers such as locus tags are rarely
provided in abstracts.
PaperBLAST also incorporates manually-curated protein functions:
- Proteins from NCBI's RefSeq are included if a
GeneRIF
entry links the gene to an article in
PubMed®.
GeneRIF also provides a short summary of the article's claim about the
protein, which is shown instead of a snippet.
- Proteins from Swiss-Prot (the curated part of UniProt)
are included if the curators
identified experimental evidence for the protein's function (evidence
code ECO:0000269). For these proteins, the fields of the Swiss-Prot entry that
describe the protein's function are shown (with bold headings).
- Proteins from BRENDA,
a curated database of enzymes, are included if they are linked to a paper in PubMed
and their full sequence is known.
- Every protein from the non-redundant subset of
BioLiP,
a database
of ligand-binding sites and catalytic residues in protein structures, is included. Since BioLiP itself
does not include descriptions of the proteins, those are taken from the
Protein Data Bank.
Descriptions from PDB rely on the original submitter of the
structure and cannot be updated by others, so they may be less reliable.
(For SitesBLAST and Sites on a Tree, we use a larger subset of BioLiP so that every
ligand is represented among a group of structures with similar sequences, but for
PaperBLAST, we use the non-redundant set provided by BioLiP.)
- Every protein from EcoCyc, a curated
database of the proteins in Escherichia coli K-12, is included, regardless
of whether it is characterized or not.
- Proteins from the MetaCyc metabolic pathway database
are included if they are linked to a paper in PubMed and their full sequence is known.
- Proteins from the Transporter Classification Database (TCDB)
are included if they have known substrate(s), have reference(s),
and are not described as uncharacterized or putative.
(Some of the references are not visible on the PaperBLAST web site.)
- Every protein from CharProtDB,
a database of experimentally characterized protein annotations, is included.
- Proteins from the CAZy database of carbohydrate-active enzymes
are included if they are associated with an Enzyme Classification number.
Even though CAZy does not provide links from individual protein sequences to papers,
these should all be experimentally-characterized proteins.
- Proteins from the REBASE database
of restriction enzymes are included if they have known specificity.
- Every protein with an evidence-based reannotation (based on mutant phenotypes)
in the Fitness Browser is included.
- Sequence-specific transcription factors (including sigma factors and DNA-binding response regulators)
with experimentally-determined DNA binding sites from the
PRODORIC database of gene regulation in prokaryotes are included.
- Putative transcription factors from RegPrecise
that have manually-curated predictions for their binding sites are included. These predictions are based on
conserved putative regulatory sites across genomes that contain similar transcription factors,
so PaperBLAST clusters the TFs at 70% identity and retains just one member of each cluster.
- Coding sequence (CDS) features from the
European Nucleotide Archive (ENA)
are included if the /experiment tag is set (implying that there is experimental evidence for the annotation),
the nucleotide entry links to paper(s) in PubMed,
and the nucleotide entry is from the STD data class
(implying that these are targeted annotated sequences, not from shotgun sequencing).
Also, to filter out genes whose transcription or translation was detected, but whose function
was not studied, nucleotide entries or papers with more than 25 such proteins are excluded.
Descriptions from ENA rely on the original submitter of the
sequence and cannot be updated by others, so they may be less reliable.
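The ENA rule above (drop nucleotide entries or papers linked to more than 25 such proteins) can be sketched as follows. This is one possible reading of the rule, not PaperBLAST's actual code: the record layout (`protein`, `entry`, `papers`), the function name `filter_cds`, and the assumption that each record is a distinct protein are all hypothetical.

```python
from collections import Counter

MAX_PROTEINS = 25  # threshold stated in the text above


def filter_cds(records, limit=MAX_PROTEINS):
    """Drop CDS records from over-large nucleotide entries or papers.

    Each record is a dict with keys 'protein', 'entry' (nucleotide entry id)
    and 'papers' (list of PubMed ids). Each record counts as one protein.
    """
    per_entry = Counter(r["entry"] for r in records)
    per_paper = Counter()
    for r in records:
        for pmid in set(r["papers"]):
            per_paper[pmid] += 1

    kept = []
    for r in records:
        if per_entry[r["entry"]] > limit:
            continue  # the nucleotide entry annotates too many proteins
        papers = [p for p in r["papers"] if per_paper[p] <= limit]
        if papers:  # keep only if at least one acceptable paper remains
            kept.append({**r, "papers": papers})
    return kept


# Made-up data: one entry with 26 proteins, one entry with a single protein.
records = [{"protein": f"big_{i}", "entry": "E1", "papers": ["P1"]}
           for i in range(26)]
records.append({"protein": "small_1", "entry": "E2", "papers": ["P2"]})

kept = filter_cds(records)
print([r["protein"] for r in kept])
```

In this made-up example only the protein from the small entry survives, which matches the stated intent of filtering out large-scale detection studies in which function was not studied.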
Except for GeneRIF and ENA,
the curated entries include a short curated
description of the protein's function.
For entries from BioLiP, the protein's function may not be known beyond binding to the ligand.
Many of these entries also link to articles in PubMed.
For more information see the
PaperBLAST paper (mSystems 2017)
or the code.
You can download PaperBLAST's database here.
Changes to PaperBLAST since the paper was written:
- November 2023: incorporated PRODORIC and RegPrecise. Many PRODORIC entries were not linked to a protein sequence (no UniProt identifier), so we added this information.
- February 2023: BioLiP changed their download format. PaperBLAST now includes their non-redundant subset. SitesBLAST and Sites on a Tree use a larger non-redundant subset that ensures that every ligand is represented within each cluster. This should ensure that every binding site is represented.
- June 2022: incorporated some coding sequences from ENA with the /experiment tag.
- March 2022: incorporated BioLiP.
- April 2020: incorporated TCDB.
- April 2019: EuropePMC now returns table entries in their search results. This has expanded PaperBLAST's database, but most of the new entries are of low relevance, and the resulting snippets are often just lists of locus tags with annotations.
- February 2018: the alignment page reports the conservation of the hit's functional sites (if available from Swiss-Prot or UniProt).
- January 2018: incorporated BRENDA.
- December 2017: incorporated MetaCyc, CharProtDB, CAZy, REBASE, and the reannotations from the Fitness Browser.
- September 2017: EuropePMC no longer returns some table entries in their search results. This has shrunk PaperBLAST's database, but has also reduced the number of low-relevance hits.
Many of these changes are described in Interactive tools for functional annotation of bacterial genomes.
PaperBLAST cannot provide snippets for many of the papers that are
published in non-open-access journals. This limitation applies even if
the paper is marked as "free" on the publisher's web site and is
available in PubmedCentral or EuropePMC. If a journal that you publish
in is marked as "secret," please consider publishing elsewhere.
Many important articles are missing from PaperBLAST, either because
the article's full text is not in EuropePMC (as for many older
articles), or because the paper does not mention a protein identifier such as a locus tag, or because of PaperBLAST's heuristics. If you notice an
article that characterizes a protein's function but is missing from
PaperBLAST, please notify the curators at UniProt
or add an entry to GeneRIF.
Entries in either of these databases will eventually be incorporated
into PaperBLAST. Note that to add an entry to UniProt, you will need
to find the UniProt identifier for the protein. If the protein is not
already in UniProt, you can ask them to create an entry. To add an
entry to GeneRIF, you will need an NCBI Gene identifier, but
unfortunately many prokaryotic proteins in RefSeq do not have
corresponding Gene identifiers.
References
PaperBLAST: Text-mining papers for information about homologs.
M. N. Price and A. P. Arkin (2017). mSystems, 10.1128/mSystems.00039-17.
Europe PMC in 2017.
M. Levchenko et al (2017). Nucleic Acids Research, 10.1093/nar/gkx1005.
Gene indexing: characterization and analysis of NLM's GeneRIFs.
J. A. Mitchell et al (2003). AMIA Annu Symp Proc 2003:460-464.
UniProt: the universal protein knowledgebase.
The UniProt Consortium (2016). Nucleic Acids Research, 10.1093/nar/gkw1099.
BRENDA in 2017: new perspectives and new tools in BRENDA.
S. Placzek et al (2017). Nucleic Acids Research, 10.1093/nar/gkw952.
The EcoCyc database: reflecting new knowledge about Escherichia coli K-12.
I. M. Keseler et al (2016). Nucleic Acids Research, 10.1093/nar/gkw1003.
The MetaCyc database of metabolic pathways and enzymes.
R. Caspi et al (2018). Nucleic Acids Research, 10.1093/nar/gkx935.
CharProtDB: a database of experimentally characterized protein annotations.
R. Madupu et al (2012). Nucleic Acids Research, 10.1093/nar/gkr1133.
The carbohydrate-active enzymes database (CAZy) in 2013.
V. Lombard et al (2014). Nucleic Acids Research, 10.1093/nar/gkt1178.
The Transporter Classification Database (TCDB): recent advances.
M. H. Saier, Jr. et al (2016). Nucleic Acids Research, 10.1093/nar/gkv1103.
REBASE - a database for DNA restriction and modification: enzymes, genes and genomes.
R. J. Roberts et al (2015). Nucleic Acids Research, 10.1093/nar/gku1046.
Deep annotation of protein function across diverse bacteria from mutant phenotypes.
M. N. Price et al (2016). bioRxiv, 10.1101/072470.
by Morgan Price,
Arkin group
Lawrence Berkeley National Laboratory