PaperBLAST
PaperBLAST Hits for MCAODC_10750 (73 a.a., MTTQSSPVIT...)
>MCAODC_10750
MTTQSSPVITDMKVIPVAGYDSMLLNIGGAHNAYFTRNIVVLTDNAGHTGIGEAPGGEVI
YQTLVDAIPMVLG
Found 19 similar proteins in the literature:
SEN2806 glucarate dehydratase-related protein from Salmonella enterica subsp. enterica serovar Enteritidis str. P125109
97% identity, 16% coverage
- Whole genome sequencing provides insights into the genetic determinants of invasiveness in Salmonella Dublin
Mohammed, Epidemiology and infection 2016 - “...SEN2783), the gene encoding a probable glucarate dehydratase 2 (SEN2806), the gene encoding the outer membrane usher protein LpfC (SEN3461) and the gene 2434 M....”
- “...phosphotransferase system permease SEN0784 SEN2182* SEN2783 SEN2806 SEN3461 SEN3672 Enteritidis (PT4) Dublin (Irish isolates) Gallinarum (287/91) Cholerasuis...”
- Genomic Comparison of the Closely Related Salmonella enterica Serovars Enteritidis and Dublin
Betancor, The open microbiology journal 2012 - “...I restriction modification system protein (SEN4290), and the gene encoding a probable glucarate dehydratase 2 (SEN2806 or ygcY ). The other two genes that complete this list are mglA (SEN2182) and shdA (SEN2493), which are pseudogenes in S. Typhi CT18 and Ty2 as well as in...”
- “...ATCAACCGGTTTGTCATTCG Reverse TACCGTCCCAGTCGCCGTTG Reverse2 SEN2783 GTGAGGTATATCAACAAAAAAGACCA Forward TCCAGAGGCAATCCAGGA Forward2 TGTGCAGGCGCCGTTG Forward3 ACGGACGGGGAGCCAGG Reverse CAACCTCTTTGCGTGTATCAACC Reverse2 SEN2806 GTGCTGGTAGGCGATATTAAG Forward CTTCCCGGACGCGCGTAT Forward2 AACCTGCATTTCAGTCACTACAG Reverse SEN3461 TTTGGCACGGCTGGCGACAT Forward GAATGCCCTGCTGGTGGATT Forward2 CGTGCCGGGAACTATAACAG Forward3 AGCACCGACCCGCCCAACA Reverse GCCGCGCAAACCGTAGTTCA Reverse2 SEN3672 GGCCTGGTCACGTCTGTAAC Forward CTCTCTTTTGTCTTCGGTATCC Forward2 TATGACGGTTTGATGACAATGG Reverse SEN4290 AACGCTTGAGGATTTAATAGAA Forward CTGATTCAGTACCGTCAGTG Reverse Table...”
SL1344_2942 enolase C-terminal domain-like protein from Salmonella enterica subsp. enterica serovar Typhimurium str. SL1344
96% identity, 16% coverage
- speG Is Required for Intracellular Replication of Salmonella in Various Human Cells and Affects Its Polyamine Metabolism and Global Transcriptomes
Fang, Frontiers in microbiology 2017 - “...which are involved in the periplasmic nitrate reductase system; ygcX, ygcZ, garL, garR , and SL1344_2942, which are associated with glucarate metabolism; SL1344_3736 and SL1344_4467, which are related to the phosphotransferase system; cyoA, cyoB , and cyoC , which encode cytochrome-related proteins; and sdhA, sdhB, sdhC...”
- “...garL SL1344_3222 5-Keto-4-deoxy-D-glucarate aldolase 2.819 garR SL1344_3221 2-Hydroxy-3-oxopropionate reductase 2.760 ygcX SL1344_2941 Glucarate dehydratase 2.037 SL1344_2942 SL1344_2942 Glucarate dehydratase 1.257 Genes of Phosphotransferase System SL1344_3736 SL1344_3736 Putative PTS system protein 1.673 SL1344_4467 SL1344_4467 PTS transport system, IIB component 1.621 Genes of Cytochromes cyoA SL1344_0437 Cytochrome o...”
Z4104 No description from Escherichia coli O157:H7 EDL933
100% identity, 100% coverage
- Clonal and antigenic analysis of serogroup A Neisseria meningitidis with particular reference to epidemiological features of epidemic meningitis in the People's Republic of China
Wang, Infection and immunity 1992 - “...Z4069, Z4070, Z4071, Z4073, Z4075, Z4079, Z4081, Z4097, Z4104, Z4109, Z4115, Z4736, Z4737, Z4738, Z4739, Z4740, Z4744, Z4745, Z4747, Z4748, Z4749, Z4750 Z4752,...”
S2995 putative glucarate dehydratase from Shigella flexneri 2a str. 2457T
97% identity, 16% coverage
GudX / b2788 glucarate dehydratase-related protein from Escherichia coli K-12 substr. MG1655 (see 2 papers)
GUDX_ECOLI / Q46915 Glucarate dehydratase-related protein; GDH-RP; GlucDRP; EC 4.2.1.- from Escherichia coli (strain K12) (see paper)
gudX / RF|NP_417268 glucarate dehydratase-related protein from Escherichia coli K12
b2788 predicted glucarate dehydratase from Escherichia coli str. K-12 substr. MG1655
96% identity, 16% coverage
c3352 Glucarate dehydratase related protein from Escherichia coli CFT073
96% identity, 16% coverage
OA04_36650 enolase C-terminal domain-like protein from Pectobacterium versatile
84% identity, 15% coverage
3n6hB / A6VQF6 Crystal structure of mandelate racemase/muconate lactonizing protein from actinobacillus succinogenes 130z complexed with magnesium/sulfate
74% identity, 16% coverage
- Ligand: magnesium ion (3n6hB)
STM2960 d-glucarate dehydratase from Salmonella typhimurium LT2
SL1344_2941, STM14_3568 glucarate dehydratase from Salmonella enterica subsp. enterica serovar Typhimurium str. 14028S
65% identity, 16% coverage
- Hydrogen-stimulated carbon acquisition and conservation in Salmonella enterica serovar Typhimurium
Lamichhane-Khadka, Journal of bacteriology 2011 - “...Carbon transport and metabolism Carbohydrates STM2962 STM4077/78 STM2960 STM3557 STM2190 STM3884 STM1830/31 STM4325 STM0685 STM4074 STM4075/76 Genea VOL. 193,...”
- Salmonella serovar identification using PCR-based detection of gene presence and absence
Arrach, Journal of clinical microbiology 2008 - “...STM2767, STM2816, STM2914, STM2917, STM2922, STM2941, STM2960, STM3024, STM3026, STM3028, STM3036, STM3082, STM3120, STM3253, STM3254, STM3256, STM3257,...”
- Genome-Wide Identification and Expression Analysis of SOS Response Genes in Salmonella enterica Serovar Typhimurium
Mérida-Floriano, Cells 2021 - “...8.27 STM14_3214 -- 7.70 1 6 12.33 STM14_5094 lexA 6.86 2 6, 27 14.48, 7.94 STM14_3568 gudD 5.43 1 0 16.97 STM14_3405 yqaB 5.32 1 12 16.26 STM14_1439 dinI Gifsy-3 4.56 1 19 5.26 STM14_2752 yejK 4.54 1 102 20.89 STM14_2422 umuC ** 3.91 1 --...”
- speG Is Required for Intracellular Replication of Salmonella in Various Human Cells and Affects Its Polyamine Metabolism and Global Transcriptomes
Fang, Frontiers in microbiology 2017 - “...SL1344_2943 Glucarate transporter 3.248 garL SL1344_3222 5-Keto-4-deoxy-D-glucarate aldolase 2.819 garR SL1344_3221 2-Hydroxy-3-oxopropionate reductase 2.760 ygcX SL1344_2941 Glucarate dehydratase 2.037 SL1344_2942 SL1344_2942 Glucarate dehydratase 1.257 Genes of Phosphotransferase System SL1344_3736 SL1344_3736 Putative PTS system protein 1.673 SL1344_4467 SL1344_4467 PTS transport system, IIB component 1.621 Genes of Cytochromes...”
Z4102 putative glucarate dehydratase from Escherichia coli O157:H7 EDL933
64% identity, 16% coverage
- Clonal and antigenic analysis of serogroup A Neisseria meningitidis with particular reference to epidemiological features of epidemic meningitis in the People's Republic of China
Wang, Infection and immunity 1992 - “...Z3771, Z3786, Z3787 Z3905, Z3909 B503, Z3917, Z3921, Z4102, Z4735 Z3911, Z3912, Z3913, Z3914, Z3915, Z3916, Z3920, Z3922, Z3923, Z3924, Z3925, Z3926, Z3927...”
YgcX / b2787 D-glucarate dehydratase (EC 4.2.1.40) from Escherichia coli K-12 substr. MG1655 (see 6 papers)
gudD / P0AES2 D-glucarate dehydratase (EC 4.2.1.40) from Escherichia coli (strain K12) (see 5 papers)
GUDD_ECOLI / P0AES2 Glucarate dehydratase; GDH; GlucD; D-glucarate dehydratase; EC 4.2.1.40 from Escherichia coli (strain K12) (see 3 papers)
gudD / RF|NP_417267 glucarate dehydratase; EC 4.2.1.40 from Escherichia coli K12 (see 8 papers)
b2787 (D)-glucarate dehydratase 1 from Escherichia coli str. K-12 substr. MG1655
NP_417267 D-glucarate dehydratase from Escherichia coli str. K-12 substr. MG1655
64% identity, 16% coverage
1ec9D / P0AES2 E. Coli glucarate dehydratase bound to xylarohydroxamate (see paper)
65% identity, 16% coverage
- Ligands: magnesium ion; xylarohydroxamate (1ec9D)
RSc1079 PROBABLE GLUCARATE DEHYDRATASE PROTEIN from Ralstonia solanacearum GMI1000
60% identity, 15% coverage
- Changes in DNA methylation contribute to rapid adaptation in bacterial plant pathogen evolution
Gopalan-Nair, PLoS biology 2024 - “...6mA 6A 6A 6A 6A 6A 6A 6A 6mA 6A 6A 6mA 6A RSc1078 / RSc1079 GTAAAC upstream /gudD1 Transcription regulator / D-Glucarate dehydratase 1134729 6mA 6mA 6mA 6mA 6mA 6mA 6mA 6mA 6mA 6mA 6mA 6mA 6mA 6mA 6mA 6A 6mA 6mA 6mA 6mA 6mA...”
- “...2.62 1.94 1.80 3.29 2.96 -0.14 -0.66 0.40 0.05 0.09 0.13 0.28 -0.11 0.26 0.81 RSc1079 0.05 -0.21 -1.31 0.15 -0.98 0.22 -0.79 0.23 -0.41 0.38 -0.21 -0.39 0.40 -1.07 -1.29 -0.13 0.30 -0.01 -0.91 0.62 0.30 -0.09 -0.52 -0.13 -0.21 0.37 -0.08 0.06 0.21 -0.17...”
BCAL1043 glucarate dehydratase from Burkholderia cenocepacia J2315
K562_RS13470 glucarate dehydratase from Burkholderia cenocepacia
60% identity, 15% coverage
- Elucidation of the mechanism behind the potentiating activity of baicalin against Burkholderia cenocepacia biofilms
Slachmuylders, PloS one 2018 - “...chain - 1.7 BCAL2622 ( ppa ) Polyphosphate kinase - -1.5 Glucarate/galactarate metabolism to 2-oxo-glutarate BCAL1043 ( gudD ) Glucarate dehydratase 2.6 1.5 BCAM2511 ( garD ) Putative galactarate dehydratase 2.3 1.6 BCAM2512 5-dehydro-4-deoxyglucarate dehydratase 2.2 2.9 BCAM2514* Putative fatty aldehyde dehydrogenase 2.0 1.6 Quorum sensing...”
- Comparative transcriptomic analysis of the Burkholderia cepacia tyrosine kinase bceF mutant reveals a role in tolerance to stress, biofilm formation, and virulence
Ferreira, Applied and environmental microbiology 2013 - “...ion transport and metabolism BCAL0040 BCAL0475 BCAL0665 BCAL1043 BCAL1047 BCAL1728 BCAL2112 BCAL2458 BCAL2782 BCAL3049 BCAL3094 BCAM2626 1.4 1.2 1.5 1.3 1.2...”
- The mechanism of action of auranofin analogs in B. cenocepacia revealed by chemogenomic profiling
Maydaniuk, Microbiology spectrum 2024 - “...family protein), K562_RS12100 (acyl-CoA dehydrogenase), K562_RS01045 (Raf kinase inhibitor-like protein), K562_RS06455 (putative PHA depolymerase protein), K562_RS13470 ( gudD , glucarate dehydratase), K562_RS16220 (DUF3025 domain-containing protein), K562_RS18550 (hypothetical protein), and K562_RS28510 (hypothetical protein). Fitness values for each strain were calculated as the log 2 (reads in experimental...”
gudD / Q6FFQ2 D-glucarate dehydratase subunit (EC 4.2.1.40) from Acinetobacter baylyi (strain ATCC 33305 / BD413 / ADP1) (see paper)
GUDD_ACIAD / Q6FFQ2 Glucarate dehydratase; GDH; GlucD; EC 4.2.1.40 from Acinetobacter baylyi (strain ATCC 33305 / BD413 / ADP1) (see paper)
Q6FFQ2 glucarate dehydratase (EC 4.2.1.40) from Acinetobacter baylyi (see paper)
ACIAD0128 D-glucarate dehydratase from Acinetobacter sp. ADP1
58% identity, 16% coverage
- function: Catalyzes the dehydration of glucarate to 5-keto-4-deoxy-D-glucarate (5-kdGluc).
catalytic activity: D-glucarate = 5-dehydro-4-deoxy-D-glucarate + H2O (RHEA:14573)
cofactor: Mg(2+)
- L-Hydroxyproline and d-Proline Catabolism in Sinorhizobium meliloti
Chen, Journal of bacteriology 2016 - “...using the A. baylyi enzymes D-glucarate dehydratase (ACIAD0128) and D-5-keto-4-deoxyglucarate dehydratase (ACIAD0130), which were overexpressed from E. coli as...”
- “...Alain Perret for clones carrying the A. baylyi proteins ACIAD0128 and ACIAD0130 used for synthesis of -KGSA, and Seiya Watanabe for clones carrying the P....”
- New insights into the alternative D-glucarate degradation pathway
Aghaie, The Journal of biological chemistry 2008 - “...on D-glucarate Gene ID ACIAD0127 ACIAD0128 ACIAD0130 ACIAD0131 ACIAD0244 ACIAD2275 ACIAD2417 ACIAD2876 Function D-Glucarate/D-Galactarate D-Glucarate...”
- “...are co-located on the genome (ACIAD0127, ACIAD0128, ACIAD0130, and ACIAD0131) and are respectively annotated as D-glucarate/D-galactarate permease, D-glucarate...”
3p0wB / B2UIZ1 Crystal structure of d-glucarate dehydratase from ralstonia solanacearum complexed with mg and d-glucarate
63% identity, 16% coverage
- Ligands: magnesium ion; d-glucarate (3p0wB)
gudD / P42206 D-glucarate dehydratase subunit (EC 4.2.1.40) from Pseudomonas putida (see paper)
GUDD_PSEPU / P42206 Glucarate dehydratase; GDH; GlucD; EC 4.2.1.40 from Pseudomonas putida (Arthrobacter siderocapsulatus) (see paper)
68% identity, 14% coverage
- function: Catalyzes the dehydration of glucarate to 5-keto-4-deoxy-D-glucarate (5-kdGluc).
catalytic activity: D-glucarate = 5-dehydro-4-deoxy-D-glucarate + H2O (RHEA:14573)
cofactor: Mg(2+)
subunit: Homotetramer.
3nxlC / Q39KL8 Crystal structure of glucarate dehydratase from burkholderia cepacia complexed with magnesium
59% identity, 16% coverage
- Ligands: magnesium ion; carbonate ion (3nxlC)
3nfuA / Q1QUN0 Crystal structure of probable glucarate dehydratase from chromohalobacter salexigens dsm 3043 complexed with magnesium
45% identity, 15% coverage
- Ligand: magnesium ion (3nfuA)
For advice on how to use these tools together, see
Interactive tools for functional annotation of bacterial genomes.
The PaperBLAST database links 798,070 different protein sequences to 1,261,478 scientific articles. Searches against EuropePMC were last performed on May 12 2025.
PaperBLAST builds a database of protein sequences that are linked
to scientific articles. These links come from automated text searches
against the articles in EuropePMC
and from manually-curated information from GeneRIF, UniProtKB/Swiss-Prot,
BRENDA,
CAZy (as made available by dbCAN),
BioLiP,
CharProtDB,
MetaCyc,
EcoCyc,
TCDB,
REBASE,
the Fitness Browser,
and a subset of the European Nucleotide Archive with the /experiment tag.
Given this database and a protein sequence query,
PaperBLAST uses protein-protein BLAST
to find similar sequences with E < 0.001.
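The E-value cutoff and the identity and coverage figures shown for each hit can be illustrated with a minimal sketch. It assumes the BLAST hits have already been parsed into records. The `Hit` class, its field names, and the definition of coverage as the share of the hit spanned by the alignment are illustrative assumptions, not PaperBLAST's actual code.

```python
from dataclasses import dataclass

E_VALUE_CUTOFF = 0.001  # threshold stated in the text (E < 0.001)


@dataclass
class Hit:
    """Hypothetical record for one BLAST alignment (field names are assumptions)."""
    subject_id: str
    percent_identity: float
    evalue: float
    subject_start: int   # 1-based, inclusive
    subject_end: int     # 1-based, inclusive
    subject_length: int

    def coverage(self) -> float:
        """Percent of the subject sequence spanned by the alignment (assumed definition)."""
        aligned = self.subject_end - self.subject_start + 1
        return 100.0 * aligned / self.subject_length


def keep_significant(hits):
    """Return hits with E below the cutoff, lowest E-value first."""
    significant = [h for h in hits if h.evalue < E_VALUE_CUTOFF]
    return sorted(significant, key=lambda h: h.evalue)
```

Under this assumed definition, a short query aligned end to end against a much longer protein yields high identity but low coverage, which is the pattern seen in most of the hits listed above.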
To build the database, we query EuropePMC with locus tags, with RefSeq protein
identifiers, and with UniProt
accessions. We obtain the locus tags from RefSeq or from MicrobesOnline. We use
queries of the form "locus_tag AND genus_name" to try to ensure that
the paper is actually discussing that gene. Because EuropePMC indexes
most recent biomedical papers, even if they are not open access, some
of the links may be to papers that you cannot read or that our
computers cannot read. We query each of these identifiers that
appears in the open access part of EuropePMC, as well as every locus
tag that appears in the 500 most-referenced genomes, so that a gene
may appear in the PaperBLAST results even though none of the papers
that mention it are open access. We also incorporate text-mined links
from EuropePMC that link open access articles to UniProt or RefSeq
identifiers. (This yields some additional links because EuropePMC
uses different heuristics for their text mining than we do.)
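The "locus_tag AND genus_name" query form described above could be assembled roughly as follows. This is a sketch only: the quoting of each term, the helper names, and the use of Europe PMC's public REST search endpoint are assumptions, since the text does not say how PaperBLAST submits its queries.

```python
from urllib.parse import urlencode

# Assumption: Europe PMC's public REST search endpoint. The source text does
# not state which interface PaperBLAST actually uses.
SEARCH_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_query(locus_tag: str, organism: str) -> str:
    """Build a query of the form "locus_tag AND genus_name".

    The genus is taken to be the first word of the organism name;
    quoting each term is an illustrative choice.
    """
    words = organism.split()
    if not words:
        raise ValueError("organism name is empty")
    return f'"{locus_tag}" AND "{words[0]}"'


def search_url(locus_tag: str, organism: str) -> str:
    """Return a URL-encoded search URL for the query (no request is sent)."""
    params = {"query": build_query(locus_tag, organism), "format": "json"}
    return f"{SEARCH_URL}?{urlencode(params)}"
```

Requiring the genus name alongside the locus tag is what reduces false matches from identifiers that happen to coincide with unrelated strings, although it cannot eliminate them.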
For every article that mentions a locus tag, a RefSeq protein
identifier, or a UniProt accession, we try to select one or two
snippets of text that refer to the protein. If we cannot get access to
the full text, we try to select a snippet from the abstract, but
unfortunately, unique identifiers such as locus tags are rarely
provided in abstracts.
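One simple way to pick "one or two snippets" that refer to a protein is to take a window of text around each whole-word occurrence of its identifier. The sketch below does exactly that. The function name, the window size, and the rule for skipping overlapping windows are assumptions made for illustration, and PaperBLAST's real heuristics may differ.

```python
import re


def select_snippets(text, identifier, max_snippets=2, context=60):
    """Return up to max_snippets text windows around whole-word matches of identifier."""
    # Require that the identifier is not embedded in a longer word,
    # so that "SEN2806" does not match inside "SEN28061".
    pattern = re.compile(r"(?<!\w)" + re.escape(identifier) + r"(?!\w)")
    snippets = []
    last_end = -1
    for match in pattern.finditer(text):
        start = max(0, match.start() - context)
        if start < last_end:
            # This occurrence overlaps the previous snippet; skip it.
            continue
        end = min(len(text), match.end() + context)
        snippets.append("..." + text[start:end].strip() + "...")
        last_end = end
        if len(snippets) == max_snippets:
            break
    return snippets
```

When only an abstract is available, the same function would simply return an empty list for most identifiers, which matches the observation that locus tags rarely appear in abstracts.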
PaperBLAST also incorporates manually-curated protein functions:
- Proteins from NCBI's RefSeq are included if a
GeneRIF
entry links the gene to an article in
PubMed®.
GeneRIF also provides a short summary of the article's claim about the
protein, which is shown instead of a snippet.
- Proteins from Swiss-Prot (the curated part of UniProt)
are included if the curators
identified experimental evidence for the protein's function (evidence
code ECO:0000269). For these proteins, the fields of the Swiss-Prot entry that
describe the protein's function are shown (with bold headings).
- Proteins from BRENDA,
a curated database of enzymes, are included if they are linked to a paper in PubMed
and their full sequence is known.
- Every protein from the non-redundant subset of
BioLiP,
a database
of ligand-binding sites and catalytic residues in protein structures, is included. Since BioLiP itself
does not include descriptions of the proteins, those are taken from the
Protein Data Bank.
Descriptions from PDB rely on the original submitter of the
structure and cannot be updated by others, so they may be less reliable.
(For SitesBLAST and Sites on a Tree, we use a larger subset of BioLiP so that every
ligand is represented among a group of structures with similar sequences, but for
PaperBLAST, we use the non-redundant set provided by BioLiP.)
- Every protein from EcoCyc, a curated
database of the proteins in Escherichia coli K-12, is included, regardless
of whether it is characterized.
- Proteins from the MetaCyc metabolic pathway database
are included if they are linked to a paper in PubMed and their full sequence is known.
- Proteins from the Transport Classification Database (TCDB)
are included if they have known substrate(s), have reference(s),
and are not described as uncharacterized or putative.
(Some of the references are not visible on the PaperBLAST web site.)
- Every protein from CharProtDB,
a database of experimentally characterized protein annotations, is included.
- Proteins from the CAZy database of carbohydrate-active enzymes
are included if they are associated with an Enzyme Classification number.
Even though CAZy does not provide links from individual protein sequences to papers,
these should all be experimentally-characterized proteins.
- Proteins from the REBASE database
of restriction enzymes are included if they have known specificity.
- Every protein with an evidence-based reannotation (based on mutant phenotypes)
in the Fitness Browser is included.
- Sequence-specific transcription factors (including sigma factors and DNA-binding response regulators)
with experimentally-determined DNA binding sites from the
PRODORIC database of gene regulation in prokaryotes are included.
- Putative transcription factors from RegPrecise
that have manually-curated predictions for their binding sites are included. These predictions are based on
conserved putative regulatory sites across genomes that contain similar transcription factors,
so PaperBLAST clusters the TFs at 70% identity and retains just one member of each cluster.
- Coding sequence (CDS) features from the
European Nucleotide Archive (ENA)
are included if the /experiment tag is set (implying that there is experimental evidence for the annotation),
the nucleotide entry links to paper(s) in PubMed,
and the nucleotide entry is from the STD data class
(implying that these are targeted annotated sequences, not from shotgun sequencing).
Also, to filter out genes whose transcription or translation was detected, but whose function
was not studied, nucleotide entries or papers with more than 25 such proteins are excluded.
Descriptions from ENA rely on the original submitter of the
sequence and cannot be updated by others, so they may be less reliable.
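The ENA inclusion rules in the last item of the list above amount to a multi-step filter, sketched here. The `CdsFeature` record and its field names are hypothetical, and the handling of the 25-protein limit (drop whole entries over the limit, drop over-limit papers from a feature's links, and keep a feature only if a paper remains) is one possible reading of the text rather than PaperBLAST's confirmed behavior.

```python
from collections import Counter
from dataclasses import dataclass, field

MAX_PROTEINS = 25  # entries or papers with more proteins than this are excluded


@dataclass
class CdsFeature:
    """Hypothetical record for one ENA coding sequence feature."""
    protein_id: str
    entry_id: str          # accession of the nucleotide entry
    data_class: str        # e.g. "STD"
    has_experiment: bool   # True if the /experiment tag is set
    pubmed_ids: list = field(default_factory=list)


def filter_ena(features):
    """Apply the inclusion rules described in the text to a list of features."""
    # Step 1: /experiment tag set, linked to PubMed, and from the STD data class.
    candidates = [
        f for f in features
        if f.has_experiment and f.pubmed_ids and f.data_class == "STD"
    ]
    # Step 2: count candidate proteins per nucleotide entry and per paper.
    per_entry = Counter(f.entry_id for f in candidates)
    per_paper = Counter(p for f in candidates for p in set(f.pubmed_ids))
    # Step 3: drop over-limit entries and papers (one reading of the rule).
    kept = []
    for f in candidates:
        if per_entry[f.entry_id] > MAX_PROTEINS:
            continue
        papers = [p for p in f.pubmed_ids if per_paper[p] <= MAX_PROTEINS]
        if papers:
            kept.append(CdsFeature(f.protein_id, f.entry_id, f.data_class, True, papers))
    return kept
```

The counting step is what separates targeted studies from large surveys: an entry or paper that contributes dozens of tagged proteins most likely reports detection of expression rather than a study of each protein's function.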
Except for GeneRIF and ENA,
the curated entries include a short curated
description of the protein's function.
For entries from BioLiP, the protein's function may not be known beyond binding to the ligand.
Many of these entries also link to articles in PubMed.
For more information see the
PaperBLAST paper (mSystems 2017)
or the code.
You can download PaperBLAST's database here.
Changes to PaperBLAST since the paper was written:
- November 2023: incorporated PRODORIC and RegPrecise. Many PRODORIC entries were not linked to a protein sequence (no UniProt identifier), so we added this information.
- February 2023: BioLiP changed their download format. PaperBLAST now includes their non-redundant subset. SitesBLAST and Sites on a Tree use a larger non-redundant subset that ensures that every ligand is represented within each cluster. This should ensure that every binding site is represented.
- June 2022: incorporated some coding sequences from ENA with the /experiment tag.
- March 2022: incorporated BioLiP.
- April 2020: incorporated TCDB.
- April 2019: EuropePMC now returns table entries in their search results. This has expanded PaperBLAST's database, but most of the new entries are of low relevance, and the resulting snippets are often just lists of locus tags with annotations.
- February 2018: the alignment page reports the conservation of the hit's functional sites (if available from Swiss-Prot or UniProt).
- January 2018: incorporated BRENDA.
- December 2017: incorporated MetaCyc, CharProtDB, CAZy, REBASE, and the reannotations from the Fitness Browser.
- September 2017: EuropePMC no longer returns some table entries in their search results. This has shrunk PaperBLAST's database, but has also reduced the number of low-relevance hits.
Many of these changes are described in Interactive tools for functional annotation of bacterial genomes.
PaperBLAST cannot provide snippets for many of the papers that are
published in non-open-access journals. This limitation applies even if
the paper is marked as "free" on the publisher's web site and is
available in PubMed Central or EuropePMC. If a journal that you publish
in is marked as "secret," please consider publishing elsewhere.
Many important articles are missing from PaperBLAST, either because
the article's full text is not in EuropePMC (as for many older
articles), or because the paper does not mention a protein identifier such as a locus tag, or because of PaperBLAST's heuristics. If you notice an
article that characterizes a protein's function but is missing from
PaperBLAST, please notify the curators at UniProt
or add an entry to GeneRIF.
Entries in either of these databases will eventually be incorporated
into PaperBLAST. Note that to add an entry to UniProt, you will need
to find the UniProt identifier for the protein. If the protein is not
already in UniProt, you can ask them to create an entry. To add an
entry to GeneRIF, you will need an NCBI Gene identifier, but
unfortunately many prokaryotic proteins in RefSeq do not have
corresponding Gene identifiers.
References
PaperBLAST: Text-mining papers for information about homologs.
M. N. Price and A. P. Arkin (2017). mSystems, 10.1128/mSystems.00039-17.
Europe PMC in 2017.
M. Levchenko et al (2017). Nucleic Acids Research, 10.1093/nar/gkx1005.
Gene indexing: characterization and analysis of NLM's GeneRIFs.
J. A. Mitchell et al (2003). AMIA Annu Symp Proc 2003:460-464.
UniProt: the universal protein knowledgebase.
The UniProt Consortium (2016). Nucleic Acids Research, 10.1093/nar/gkw1099.
BRENDA in 2017: new perspectives and new tools in BRENDA.
S. Placzek et al (2017). Nucleic Acids Research, 10.1093/nar/gkw952.
The EcoCyc database: reflecting new knowledge about Escherichia coli K-12.
I. M. Keseler et al (2016). Nucleic Acids Research, 10.1093/nar/gkw1003.
The MetaCyc database of metabolic pathways and enzymes.
R. Caspi et al (2018). Nucleic Acids Research, 10.1093/nar/gkx935.
CharProtDB: a database of experimentally characterized protein annotations.
R. Madupu et al (2012). Nucleic Acids Research, 10.1093/nar/gkr1133.
The carbohydrate-active enzymes database (CAZy) in 2013.
V. Lombard et al (2014). Nucleic Acids Research, 10.1093/nar/gkt1178.
The Transporter Classification Database (TCDB): recent advances.
M. H. Saier, Jr. et al (2016). Nucleic Acids Research, 10.1093/nar/gkv1103.
REBASE - a database for DNA restriction and modification: enzymes, genes and genomes.
R. J. Roberts et al (2015). Nucleic Acids Research, 10.1093/nar/gku1046.
Deep annotation of protein function across diverse bacteria from mutant phenotypes.
M. N. Price et al (2016). bioRxiv, 10.1101/072470.
by Morgan Price,
Arkin group
Lawrence Berkeley National Laboratory