PaperBLAST
PaperBLAST Hits for ydfZ (67 a.a., MTTYDRNRNA...)
>ydfZ
MTTYDRNRNAITTGSRVMVSGTGHTGKILSIDTEGLTAEQIRRGKTVVVEGCEEKLAPLD
LIRLGMN
Found 4 similar proteins in the literature:
YdfZ / b1541 putative selenoprotein YdfZ from Escherichia coli K-12 substr. MG1655 (see paper)
Z2156 orf, hypothetical protein from Escherichia coli O157:H7 EDL933
NP_416059 putative selenoprotein YdfZ from Escherichia coli str. K-12 substr. MG1655
P64463 Putative selenoprotein YdfZ from Escherichia coli (strain K12)
P64466 Putative selenoprotein YdfZ from Shigella flexneri
b1541 hypothetical protein from Escherichia coli str. K-12 substr. MG1655
c1967 Hypothetical protein ydfZ from Escherichia coli CFT073
ECs2150 hypothetical protein from Escherichia coli O157:H7 str. Sakai
100% identity, 100% coverage
- Pathogen invasion-dependent tissue reservoirs and plasmid-encoded antibiotic degradation boost plasmid spread in the gut
Bakkeren, eLife 2021 - “...P2 TAG7 Z2295 WITS21-cat on P2; invG ssaV Sm, Cm This study E. coli pESBL Z2156 pESBL cured None This study E. coli pESBL P2 cat T305 pESBL cured; cat on P2 Cm This study 14028S SPI-1 Sm R T2429 invG::aphT Sm, Kan This study *...”
- Comparison of strand-specific transcriptomes of enterohemorrhagic Escherichia coli O157:H7 EDL933 (EHEC) under eleven different environmental conditions including radish sprouts and cattle feces
Landstorfer, BMC genomics 2014 - “...(3) 0.5 (245) 3.9 (743) 0.3 (52) 5.6 (5816) 1.3 (45) 4.8 (4) 2.0 (8) Z2156 hypothetical protein spinach 1 (17) 1.1 (4) 4.7 (0) 4.7 (0) 0.6 (10) 0.9 (8) 4.7 (0) 4.4 (221) 5.5 (448) 1.2 (23) 4.7 (0) Z3271 hypothetical protein spinach 1...”
- Direct detection of potential selenium delivery proteins by using an Escherichia coli strain unable to incorporate selenium from selenite into proteins.
Lacourciere, Proceedings of the National Academy of Sciences of the United States of America 2002 - GeneRIF: N-terminus verified by Edman degradation on mature peptide
- Comparative NanoUPLC-MSE analysis between magainin I-susceptible and -resistant Escherichia coli strains
Cardoso, Scientific reports 2017 - “...Uncharacterized protein P76268 KDGR_ECOLI Downregulated 403.70 0.79 0.30 Transcriptional regulator kdgR 3002.9 Genetic Information Processing P64463 YDFZ_ECOLI Downregulated 12233.5 0.77 0.05 Putative selenoprotein ydfZ 7276.0 Metabolism P77454 GLSA1_ECOLI Downregulated 485.06 0.74 0.17 Glutaminase 1 3290.3 Metabolism P0AAS7 YBCJ_ECOLI Downregulated 854.17 0.60 0.37 Uncharacterized protein 7390.0 Uncharacterized...”
- Genomic and proteomic characterization of two strains of Shigella flexneri 2 isolated from infants' stool samples in Argentina
Torrez, BMC genomics 2022 - “...Pyridoxine/pyridoxal/pyridoxamine kinase OS= Shigella flexneri OX=623 GN= pdxK PE=3 SV=1 26.86 25 30.9 5.34 H P64466 ydfZ Putative selenoprotein YdfZ OS= Shigella flexneri OX=623 GN= ydfZ PE=3 SV=1 26.87 8 7.3 8.21 C/E P0ADZ6 rpsO 30S ribosomal protein S15 OS= Shigella flexneri OX=623 GN= rpsO PE=3...”
- The transcription regulator and c-di-GMP phosphodiesterase PdeL represses motility in Escherichia coli
Yilmaz, Journal of bacteriology 2020 (secret)
- Genome-scale analysis of Escherichia coli FNR reveals complex features of transcription factor binding
Myers, PLoS genetics 2013 - “...[41] [41] 1,279,003 narGHJI b1224 Nitrate Reductase 1 41.5 + + [41] [130] 1,627,208 ydfZ b1541 Unknown Function 2 41.5 + + [29] [18] , [19] 1,837,412 ynjE b1757 Molybdopterin Synthase Sulfurtransferase 1 41.5 + + None [18] 3,491,947 nirBDC- cysG b3365 Nitrite Reductase ( nirBDC...”
- 18th Congress of the European Hematology Association, Stockholm, Sweden, June 13–16, 2013
, Haematologica 2013
- The HU regulon is composed of genes responding to anaerobiosis, acid stress, high osmolarity and SOS induction
Oberto, PloS one 2009 - “...2.48 1 1.22 1.56 1.17 FA, FAec formate dehydrogenase-N, nitrate-inducible, cytochrome B556(Fdn) gamma subunit ydfZ b1541 ydfZ 1 0.17 0.96 0.39 1 3.97 3.66 1.14 1 1.22 1.23 1.39 FA hypothetical protein ynfE b1587 ynfEFGH-dmsD 1 0.18 1.73 0.26 1 2.05 2.19 0.09 1 1.66 1.19...”
- Global gene expression profiling of the asymptomatic bacteriuria Escherichia coli strain 83972 in the human urinary tract
Roos, Infection and immunity 2006 - “...c1686 b1476 b1226 b1796 b4209 b1223 ECs5443 b1797 Z2001 b1541 Z0893 b1227 b1225 b4013 b2941 b2552 c4141 b1475 b0873 b3240 b3556 b2732 b3437 b3242 b0872 c3046...”
- Sustainable Practices and Microbial Quality of Cattle Offal in Slaughterhouses
Cândido, Veterinary sciences 2025 (no snippet)
- Global transcriptional response of Escherichia coli O157:H7 to growth transitions in glucose minimal medium
Bergholz, BMC microbiology 2007 - “...ECs2027 ydcI putative transcriptional regulator LYSR-type -2.41 2 ECs2078 fdnG formate dehydrogenase-N, nitrate-inducible 2.30 6 ECs2150 ydfZ orf, hypothetical protein 4.70 4 ECs2293 ynfE putative oxidoreductase, major subunit 2.83 6 ECs2457 ydjY orf, hypothetical protein 2.03 6 ECs2463 ynjE putative thiosulfate sulfur transferase 3.47 1 ECs2614...”
- “...ydeI 3.74 2 ECs2614 yecH -2.94 6 ECs2146 ydeJ 2.05 2 ECs2668 yedE -2.15 1 ECs2150 ydfZ -5.10 4 ECs2669 yedF -2.78 1 ECs2281 O157 2.30 2 ECs2670 yedK 2.55 2 ECs2292 - 3.22 2 ECs2693 - 2.62 2 ECs2295 ynfG -2.81 6 ECs2785 erfK 2.37...”
t1430 conserved hypothetical protein from Salmonella enterica subsp. enterica serovar Typhi Ty2
Q7CQJ6 Cytoplasmic protein from Salmonella typhimurium (strain LT2 / SGSC1412 / ATCC 700720)
STM1509 putative cytoplasmic protein from Salmonella typhimurium LT2
79% identity, 100% coverage
- The European Union summary report on antimicrobial resistance in zoonotic and indicator bacteria from humans, animals and food in 2021-2022
European, EFSA journal. European Food Safety Authority 2024 - “...CC5 (1), CC398 spa types t011 (16), t034 (85), t899 (26), t1255 (1), t1422 (3), t1430 (3), t1580 (1), t2011 (2), t5452 (2), t10204 (2). Bovine meat (AT, 2021): CC45 spa type t095 (7), CC398 spa types t011 (4), t034 (2), CC121 spa type t898 (1),...”
- “...(1). Bovine meat (DE, 2021): CC1 spa types t174 (1), t559 (1), CC9 spa type t1430 (1), CC97 spa types t359 (1), CC130 spa type t843 mecC positive (1), CC9/CC398 spa type t899 (1), CC398 spa types t011 (4), t034 (4), t1451 (1). Pig meat (AT,...”
- The European Union Summary Report on Antimicrobial Resistance in zoonotic and indicator bacteria from humans, animals and food in 2020/2021
European, EFSA journal. European Food Safety Authority 2023 - “...5 2020 0 Sheep meat (1) 1 t1346 8 2021 0 Bovine meat (1) 1 t1430 9 * 2021 0 Bovine meat (1), pig meat (3) 4 t2112 97 * 0 Bovine meat (1) 1 t15010 97 2020 0 Sheep meat (1) 1 * The CC...”
- “...t011 (4), t034 (4), t359 (1), t559 (1), t843 mec C positive (1), t899 (1), t1430 (1), t1451 (1). Pig meat (FI, 2021): spa types: t728 St45 (1 isolate), t034 ST398 (14), t899 ST398 (1), t2741 ST398 (9), t4677 ST398 (1). Pig meat (DE, 2021) spa...”
- The European Union Summary Report on Antimicrobial Resistance in zoonotic and indicator bacteria from humans, animals and food in 2019-2020
European, EFSA journal. European Food Safety Authority 2022 - “...t011 (4 isolates), t034 (4 isolates) [4] In 2016, spatypes: t034 (3 isolates), t153 (1), t1430 (3), t2123 (2). PVL status of the t153 isolate was not reported. In 2018, spatypes: t034 CC398 (1 isolate), t1430 (1), t571 CC398 (1), t13177 (1). Belgium provided data on...”
- The European Union Summary Report on Antimicrobial Resistance in zoonotic and indicator bacteria from humans, animals and food in 2018/2019
European, EFSA journal. European Food Safety Authority 2021 - “...).5: spa types: t011 (2 isolates), t034 (1).6: spa types: t034 CC 398 (1 isolate), t1430 (1), t571 CC 398 (1), t13177 (1).7: spa type: t011 (1 isolate). In 2018, molecular typing data were reported for only 8 of 345 MRSA isolates recovered from meat, with...”
- “...types associated with the livestockassociated lineages CC398 ( spa types t034 and t571) and CC9 (t1430 and t13177) from the monitoring of broiler meat in 2018. MRSA belonging to CC9 represent a further LAMRSA lineage which is disseminated worldwide, although particularly prevalent among various species of...”
- ESKAPE Bacteria and Extended-Spectrum-β-Lactamase-Producing Escherichia coli Isolated from Wastewater and Process Water from German Poultry Slaughterhouses
Savin, Applied and environmental microbiology 2020 - “...Five of them were livestock associated and belonged to clonal complex 9 (CC9; spa types t1430 and t13177) and CC398 ( spa types t8588, t011, and t034), whereas one isolate from S1 (4.0%) was assigned to the health care-associated spa type t045 of CC5. It was...”
- “...Of the MRSA strains from slaughterhouse S2, 75.8% ( n =25) belonged to spa type t1430, whereas 24.2% were assigned to spa types t034 and t13177 (12.1% each, n =4 each). Vancomycin-resistant enterococci. The vancomycin-resistant E. faecium isolate was allocated to ST1249 and carried the vanA...”
- The European Union Summary Report on Antimicrobial Resistance in zoonotic and indicator bacteria from humans, animals and food in 2017/2018
European, EFSA journal. European Food Safety Authority 2020 - “...types associated with the LA lineages CC398 ( spa types t034 and t571) and CC9 (t1430 and t13177) from broiler meat. MRSA belonging to CC9 represent a further LAMRSA lineage which is disseminated worldwide, although particularly prevalent among various species of livestock in Asia (Cuny etal.,...”
- “...spa types: t011 (2 isolates), t034 (1). 4. spa types: t034 CC 398 (1 isolate), t1430 (1), t571 CC 398 (1), t13177 (1). 5. spa types: t011 (1). *: spa types not reported. 6.1.2 Monitoring of MRSA in animals Monitoring of MRSA in healthy foodproducing animals...”
- Prevalence and Characteristics of Antimicrobial-Resistant Staphylococcus aureus and Methicillin-Resistant Staphylococcus aureus from Retail Meat in Korea
Kim, Food science of animal resources 2020 - “...and were from countries. Two strains revealed as ST9 showed different spa types (t1939 and t1430), as well as different resistance patterns. spa typing is a DNA sequencing method for the mutation of the X-region of the Protein-A gene, and is particularly useful for distinguishing strains...”
- The European Union summary report on antimicrobial resistance in zoonotic and indicator bacteria from humans, animals and food in 2016
European, EFSA journal. European Food Safety Authority 2018 - “...2016 ). Switzerland reported two livestockassociated MRSA isolates; spa types t2123 (associated with CC398) and t1430 (associated with ST9/CC9, another LAMRSA clonal lineage). spa type t153 was also reported in broiler meat by Switzerland; t153 is a spa type that has been observed in S.aureus isolates...”
- “...active Single 458 204 (44.5%) * ARM: Atretail monitoring. a spa types: t034 (3 isolates), t1430 (3), t2123 (2), t153 (1). PantonValentine leukocidin (PVL) status of the t153 isolate was not reported. b spa types: t011 (3 isolates), t1190 (1). PVL status of the t1190 isolate...”
- N-dodecanoyl-homoserine lactone influences the levels of thiol and proteins related to oxidation-reduction process in Salmonella
de, PloS one 2018 - “...ycfF Unclassified ND ND ND ND ND ND 7.127 1.107 ND ND Putative cytoplasmic protein Q7CQJ6 ydfZ Unclassified -1.164 1.244 0.884 2.134 0.262 2.329 -0.047 0.221 -0.048 0.063 Putative cytoplasmic protein Q7CQB7 yecF Unclassified -9.069 0.706 ND ND ND ND ND ND 6.772 1.342 UPF0265 protein...”
- High resolution clustering of Salmonella enterica serovar Montevideo strains using a next-generation sequencing approach
Allard, BMC genomics 2012 - “...b,c,d,e,f triphosphoribosyl-dephospho-CoA synthase citF STM0621 SEEM020_04249 C/T V/A 602 b,c,d,e,f citrate lyase alpha chain ydfZ STM1509 SEEM020_04749 G/A P 174 b putative selenium-binding protein YdfZ STM1546 SEEM020_04939 C/T L 1473 d,e,f 1) putative multidrug efflux protein, 2) hypothetical protein SeSA_A1664 SEEM020_05139 C/T L 667 a1 LysR...”
YPO1649 conserved hypothetical protein from Yersinia pestis CO92
39% identity, 93% coverage
YPTB2420 hypothetical protein from Yersinia pseudotuberculosis IP 32953
39% identity, 99% coverage
For advice on how to use these tools together, see
Interactive tools for functional annotation of bacterial genomes.
The PaperBLAST database links 793,807 different protein sequences to 1,259,118 scientific articles. Searches against EuropePMC were last performed on March 13 2025.
PaperBLAST builds a database of protein sequences that are linked
to scientific articles. These links come from automated text searches
against the articles in EuropePMC
and from manually-curated information from GeneRIF, UniProtKB/Swiss-Prot,
BRENDA,
CAZy (as made available by dbCAN),
BioLiP,
CharProtDB,
MetaCyc,
EcoCyc,
TCDB,
REBASE,
the Fitness Browser,
and a subset of the European Nucleotide Archive with the /experiment tag.
Given this database and a protein sequence query,
PaperBLAST uses protein-protein BLAST
to find similar sequences with E < 0.001.
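As a rough illustration of the E < 0.001 cutoff, the sketch below filters BLASTp tabular output by E-value. It assumes the standard 12-column tabular layout (`-outfmt 6`), in which the subject identifier, percent identity, and E-value are columns 2, 3, and 11. The names `Hit` and `filter_hits` and the sample rows are hypothetical; this is not PaperBLAST's own code.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    subject: str
    percent_identity: float
    evalue: float


def filter_hits(tabular: str, max_evalue: float = 1e-3) -> list[Hit]:
    """Keep hits whose E-value is strictly below max_evalue."""
    hits = []
    for line in tabular.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        hit = Hit(fields[1], float(fields[2]), float(fields[10]))
        if hit.evalue < max_evalue:
            hits.append(hit)
    return hits


# Made-up rows, for illustration only.
example = "\n".join([
    "ydfZ\tsubjA\t100.0\t67\t0\t0\t1\t67\t1\t67\t1e-40\t140",
    "ydfZ\tsubjB\t30.0\t40\t28\t0\t5\t44\t3\t42\t0.5\t25.0",
])
strong_hits = filter_hits(example)
```

Here only `subjA` would pass the cutoff, because the second row has an E-value of 0.5.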
To build the database, we query EuropePMC with locus tags, with RefSeq protein
identifiers, and with UniProt
accessions. We obtain the locus tags from RefSeq or from MicrobesOnline. We use
queries of the form "locus_tag AND genus_name" to try to ensure that
the paper is actually discussing that gene. Because EuropePMC indexes
most recent biomedical papers, even if they are not open access, some
of the links may be to papers that you cannot read or that our
computers cannot read. We query each of these identifiers that
appears in the open access part of EuropePMC, as well as every locus
tag that appears in the 500 most-referenced genomes, so that a gene
may appear in the PaperBLAST results even though none of the papers
that mention it are open access. We also incorporate text-mined links
from EuropePMC that link open access articles to UniProt or RefSeq
identifiers. (This yields some additional links because EuropePMC
uses different heuristics for their text mining than we do.)
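The "locus_tag AND genus_name" query form can be sketched as follows. The endpoint URL and the `query`, `format`, and `pageSize` parameters are assumptions based on the public Europe PMC REST search service, and the function names are hypothetical. The code only builds the request and does not send it; PaperBLAST's actual pipeline may differ.

```python
from urllib.parse import urlencode

# Assumption: the public Europe PMC REST search endpoint.
EUROPEPMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_query(identifier: str, genus: str) -> str:
    """Pair an identifier with a genus name, as in 'locus_tag AND genus_name'."""
    return f"{identifier} AND {genus}"


def build_search_url(identifier: str, genus: str, page_size: int = 25) -> str:
    """Return a search URL for the query (nothing is fetched here)."""
    params = {
        "query": build_query(identifier, genus),
        "format": "json",
        "pageSize": page_size,
    }
    return f"{EUROPEPMC_SEARCH}?{urlencode(params)}"


url = build_search_url("b1541", "Escherichia")
```

Requiring the genus name alongside the identifier is a heuristic: it reduces, but does not eliminate, matches where the same short string means something unrelated.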
For every article that mentions a locus tag, a RefSeq protein
identifier, or a UniProt accession, we try to select one or two
snippets of text that refer to the protein. If we cannot get access to
the full text, we try to select a snippet from the abstract, but
unfortunately, unique identifiers such as locus tags are rarely
provided in abstracts.
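One simple way to pick snippets is to take a fixed-width window of text around each whole-word occurrence of the identifier and keep the first one or two. The sketch below does exactly that. The window width, the non-overlap rule, and the name `select_snippets` are assumptions for illustration, not PaperBLAST's actual heuristics.

```python
import re


def select_snippets(text: str, identifier: str,
                    width: int = 80, max_snippets: int = 2) -> list[str]:
    """Return up to max_snippets windows of text around whole-word matches."""
    # Lookarounds stop "b1541" from matching inside "b15410".
    pattern = re.compile(r"(?<!\w)" + re.escape(identifier) + r"(?!\w)")
    snippets = []
    last_end = -1
    for match in pattern.finditer(text):
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
        if start < last_end:
            continue  # this window would overlap the previous snippet
        snippets.append("..." + text[start:end].strip() + "...")
        last_end = end
        if len(snippets) == max_snippets:
            break
    return snippets


article = (
    "The gene b1541 (ydfZ) is induced anaerobically. "
    + "x " * 200
    + "Later, b1541 was deleted. b15410 is unrelated."
)
snippets = select_snippets(article, "b1541", width=20)
```

Matching on whole words matters for short identifiers, which otherwise appear inside longer tokens.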
PaperBLAST also incorporates manually-curated protein functions:
- Proteins from NCBI's RefSeq are included if a
GeneRIF
entry links the gene to an article in
PubMed®.
GeneRIF also provides a short summary of the article's claim about the
protein, which is shown instead of a snippet.
- Proteins from Swiss-Prot (the curated part of UniProt)
are included if the curators
identified experimental evidence for the protein's function (evidence
code ECO:0000269). For these proteins, the fields of the Swiss-Prot entry that
describe the protein's function are shown (with bold headings).
- Proteins from BRENDA,
a curated database of enzymes, are included if they are linked to a paper in PubMed
and their full sequence is known.
- Every protein from the non-redundant subset of
BioLiP,
a database
of ligand-binding sites and catalytic residues in protein structures, is included. Since BioLiP itself
does not include descriptions of the proteins, those are taken from the
Protein Data Bank.
Descriptions from PDB rely on the original submitter of the
structure and cannot be updated by others, so they may be less reliable.
(For SitesBLAST and Sites on a Tree, we use a larger subset of BioLiP so that every
ligand is represented among a group of structures with similar sequences, but for
PaperBLAST, we use the non-redundant set provided by BioLiP.)
- Every protein from EcoCyc, a curated
database of the proteins in Escherichia coli K-12, is included, regardless
of whether it is characterized or not.
- Proteins from the MetaCyc metabolic pathway database
are included if they are linked to a paper in PubMed and their full sequence is known.
- Proteins from the Transport Classification Database (TCDB)
are included if they have known substrate(s), have reference(s),
and are not described as uncharacterized or putative.
(Some of the references are not visible on the PaperBLAST web site.)
- Every protein from CharProtDB,
a database of experimentally characterized protein annotations, is included.
- Proteins from the CAZy database of carbohydrate-active enzymes
are included if they are associated with an Enzyme Classification number.
Even though CAZy does not provide links from individual protein sequences to papers,
these should all be experimentally-characterized proteins.
- Proteins from the REBASE database
of restriction enzymes are included if they have known specificity.
- Every protein with an evidence-based reannotation (based on mutant phenotypes)
in the Fitness Browser is included.
- Sequence-specific transcription factors (including sigma factors and DNA-binding response regulators)
with experimentally-determined DNA binding sites are included from the
PRODORIC database of gene regulation in prokaryotes.
- Putative transcription factors from RegPrecise
are included if they have manually-curated predictions for their binding sites. These predictions are based on
conserved putative regulatory sites across genomes that contain similar transcription factors,
so PaperBLAST clusters the TFs at 70% identity and retains just one member of each cluster.
- Coding sequence (CDS) features from the
European Nucleotide Archive (ENA)
are included if the /experiment tag is set (implying that there is experimental evidence for the annotation),
the nucleotide entry links to paper(s) in PubMed,
and the nucleotide entry is from the STD data class
(implying that these are targeted annotated sequences, not from shotgun sequencing).
Also, to filter out genes whose transcription or translation was detected, but whose function
was not studied, nucleotide entries or papers with more than 25 such proteins are excluded.
Descriptions from ENA rely on the original submitter of the
sequence and cannot be updated by others, so they may be less reliable.
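The ENA inclusion rules in the last item above can be sketched as a two-pass filter. All names here (`CdsFeature`, `select_ena_features`) are hypothetical. For simplicity the sketch attaches PubMed identifiers to each CDS rather than to the nucleotide entry, and it assumes a feature is dropped if its entry, or any one of its papers, exceeds the limit of 25 proteins. The source does not specify that last detail.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class CdsFeature:
    protein_id: str
    entry_id: str         # accession of the nucleotide entry
    data_class: str       # e.g. "STD"
    has_experiment: bool  # True if the /experiment tag is set
    pubmed_ids: list[str] = field(default_factory=list)


def select_ena_features(features: list[CdsFeature],
                        max_per_group: int = 25) -> list[CdsFeature]:
    """Apply the /experiment, STD, PubMed, and at-most-25-proteins rules."""
    # Pass 1: per-feature criteria.
    candidates = [
        f for f in features
        if f.has_experiment and f.data_class == "STD" and f.pubmed_ids
    ]
    # Pass 2: drop entries or papers with more than max_per_group proteins.
    per_entry = Counter(f.entry_id for f in candidates)
    per_paper = Counter(p for f in candidates for p in f.pubmed_ids)
    return [
        f for f in candidates
        if per_entry[f.entry_id] <= max_per_group
        and all(per_paper[p] <= max_per_group for p in f.pubmed_ids)
    ]
```

Counting happens only after the per-feature criteria are applied, so the limit is on "such proteins" (those that already qualify) rather than on every CDS in the entry.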
Except for GeneRIF and ENA,
the curated entries include a short curated
description of the protein's function.
For entries from BioLiP, the protein's function may not be known beyond binding to the ligand.
Many of these entries also link to articles in PubMed.
For more information see the
PaperBLAST paper (mSystems 2017)
or the code.
You can download PaperBLAST's database here.
Changes to PaperBLAST since the paper was written:
- November 2023: incorporated PRODORIC and RegPrecise. Many PRODORIC entries were not linked to a protein sequence (no UniProt identifier), so we added this information.
- February 2023: BioLiP changed their download format. PaperBLAST now includes their non-redundant subset. SitesBLAST and Sites on a Tree use a larger non-redundant subset that ensures that every ligand is represented within each cluster. This should ensure that every binding site is represented.
- June 2022: incorporated some coding sequences from ENA with the /experiment tag.
- March 2022: incorporated BioLiP.
- April 2020: incorporated TCDB.
- April 2019: EuropePMC now returns table entries in their search results. This has expanded PaperBLAST's database, but most of the new entries are of low relevance, and the resulting snippets are often just lists of locus tags with annotations.
- February 2018: the alignment page reports the conservation of the hit's functional sites (if available from Swiss-Prot or UniProt).
- January 2018: incorporated BRENDA.
- December 2017: incorporated MetaCyc, CharProtDB, CAZy, REBASE, and the reannotations from the Fitness Browser.
- September 2017: EuropePMC no longer returns some table entries in their search results. This has shrunk PaperBLAST's database, but has also reduced the number of low-relevance hits.
Many of these changes are described in Interactive tools for functional annotation of bacterial genomes.
PaperBLAST cannot provide snippets for many of the papers that are
published in non-open-access journals. This limitation applies even if
the paper is marked as "free" on the publisher's web site and is
available in PubMed Central or EuropePMC. If a journal that you publish
in is marked as "secret," please consider publishing elsewhere.
Many important articles are missing from PaperBLAST, either because
the article's full text is not in EuropePMC (as for many older
articles), or because the paper does not mention a protein identifier such as a locus tag, or because of PaperBLAST's heuristics. If you notice an
article that characterizes a protein's function but is missing from
PaperBLAST, please notify the curators at UniProt
or add an entry to GeneRIF.
Entries in either of these databases will eventually be incorporated
into PaperBLAST. Note that to add an entry to UniProt, you will need
to find the UniProt identifier for the protein. If the protein is not
already in UniProt, you can ask them to create an entry. To add an
entry to GeneRIF, you will need an NCBI Gene identifier, but
unfortunately many prokaryotic proteins in RefSeq do not have
corresponding Gene identifiers.
References
PaperBLAST: Text-mining papers for information about homologs.
M. N. Price and A. P. Arkin (2017). mSystems, 10.1128/mSystems.00039-17.
Europe PMC in 2017.
M. Levchenko et al (2017). Nucleic Acids Research, 10.1093/nar/gkx1005.
Gene indexing: characterization and analysis of NLM's GeneRIFs.
J. A. Mitchell et al (2003). AMIA Annu Symp Proc 2003:460-464.
UniProt: the universal protein knowledgebase.
The UniProt Consortium (2016). Nucleic Acids Research, 10.1093/nar/gkw1099.
BRENDA in 2017: new perspectives and new tools in BRENDA.
S. Placzek et al (2017). Nucleic Acids Research, 10.1093/nar/gkw952.
The EcoCyc database: reflecting new knowledge about Escherichia coli K-12.
I. M. Keseler et al (2016). Nucleic Acids Research, 10.1093/nar/gkw1003.
The MetaCyc database of metabolic pathways and enzymes.
R. Caspi et al (2018). Nucleic Acids Research, 10.1093/nar/gkx935.
CharProtDB: a database of experimentally characterized protein annotations.
R. Madupu et al (2012). Nucleic Acids Research, 10.1093/nar/gkr1133.
The carbohydrate-active enzymes database (CAZy) in 2013.
V. Lombard et al (2014). Nucleic Acids Research, 10.1093/nar/gkt1178.
The Transporter Classification Database (TCDB): recent advances.
M. H. Saier, Jr. et al (2016). Nucleic Acids Research, 10.1093/nar/gkv1103.
REBASE - a database for DNA restriction and modification: enzymes, genes and genomes.
R. J. Roberts et al (2015). Nucleic Acids Research, 10.1093/nar/gku1046.
Deep annotation of protein function across diverse bacteria from mutant phenotypes.
M. N. Price et al (2016). bioRxiv, 10.1101/072470.
by Morgan Price,
Arkin group
Lawrence Berkeley National Laboratory