PaperBLAST
PaperBLAST Hits for SM_b21330 (69 a.a., MDWNRVEGNW...)
>SM_b21330
MDWNRVEGNWKQVKGKVKEQWGKLTDDDLDQISGSREQLEGKIQERYGIEKDRVRRDIDD
WYGRQTWNW
Found 19 similar proteins in the literature:
SM2011_RS11135 CsbD family protein from Sinorhizobium meliloti 2011
100% identity, 100% coverage
RL2307 hypothetical protein from Rhizobium leguminosarum bv. viciae 3841
91% identity, 97% coverage
NGR_RS22185 CsbD family protein from Sinorhizobium fredii NGR234
87% identity, 99% coverage
MAFF_RS37185 CsbD family protein from Mesorhizobium japonicum MAFF 303099
88% identity, 97% coverage
bsl1473 bsl1473 from Bradyrhizobium japonicum USDA 110
71% identity, 90% coverage
BN69_2599 CsbD family protein from Methylocystis sp. SC2
J7QV15 CsbD family protein from Methylocystis sp. (strain SC2)
60% identity, 94% coverage
KPN_04433 hypothetical protein from Klebsiella pneumoniae subsp. pneumoniae MGH 78578
57% identity, 80% coverage
KQQSB11_50044 CsbD family protein from Klebsiella quasipneumoniae subsp. quasipneumoniae
57% identity, 100% coverage
YjbJ / b4045 putative stress response protein YjbJ from Escherichia coli K-12 substr. MG1655 (see 9 papers)
yjbJ / RF|NP_418469 UPF0337 protein yjbJ from Escherichia coli K12 (see paper)
EDL933_5382, EDL933_RS26675 CsbD family protein from Escherichia coli O157:H7 str. EDL933
NP_418469 putative stress response protein YjbJ from Escherichia coli str. K-12 substr. MG1655
P68206 UPF0337 protein YjbJ from Escherichia coli (strain K12)
b4045 predicted stress response protein from Escherichia coli str. K-12 substr. MG1655
ECs5028 hypothetical protein from Escherichia coli O157:H7 str. Sakai
58% identity, 100% coverage
- Transcriptomic and proteomic analysis of the virulence inducing effect of ciprofloxacin on enterohemorrhagic Escherichia coli
Kijewski, PloS one 2024 - “...FliN --- -1.7 -1.3 EDL933_RS20840 EDL933_4247 qseC Sensory histidine kinase QseC --- -1.8 1.6 EDL933_RS26675 EDL933_5382 yjbJ UPF0337 protein YjbJ --- 2.5 13.2 EDL933_RS28250 EDL933_5698 tsr Methyl-accepting chemotaxis protein I (serine chemoreceptor protein) --- -1.4 -1.1 Motility related DEGs and proteins shown as fold changes, between...”
- Comparing the predicted and observed properties of proteins encoded in the genome of Escherichia coli K-12.
Link, Electrophoresis 1997 (PubMed)- GeneRIF: N-terminus verified by Edman degradation on complete protein
- Identification of specific protein amino acid substitutions of extended-spectrum β-lactamase (ESBL)-producing Escherichia coli ST131: a proteomics approach using mass spectrometry
Nakamura, Scientific reports 2019 - “...is unknown. The m/z 8351 peak was identified as UPF0337 protein YjbJ (Uniprot accession no. P68206) belonging to the UPF0337 (CsbD) family. The domain of YjbJ protein was CsbD, and its function is unknown. The m/z 8448 peak was identified as uncharacterized protein YnfD (Uniprot accession...”
- Top-Down LESA Mass Spectrometry Protein Analysis of Gram-Positive and Gram-Negative Bacteria
Kocurek, Journal of the American Society for Mass Spectrometry 2017 - “...7701.88 -1.5 YahO P75694 91 -signal peptide 1189.5904 +7 8320.08 -1.3 UPF0337 protein YjbJ a P68206 80 1254.2614 +7 8772.78 -2.2 YdfK P76154 31 1494.9477 +7 10,457.58 +1.02 Da YbgS P0AAV6 45 -signal peptide; putative disulfide 7176 and deamidation Escherichia coli K-12 923.0049 +10 9219.98 -2.0...”
- Tracing the phylogenetic history of the Crl regulon through the Bacteria and Archaea genomes
Santos-Zavaleta, BMC genomics 2019 - “...11 ] yhjR b3555 yhjR + MSI [ 10 ] bacterial cellulose biosynthetic process yiaG b4045 yiaG + MSI [ 10 ] regulation of transcription yjbJ b4329 yjbJ FliZ () + MSI [ 10 ] yjiG b1044 yjiH G-iadA + MSI [ 10 ] ymdA b1138...”
- Depletion of the non-coding regulatory 6S RNA in E. coli causes a surprising reduction in the expression of the translation machinery
Neusser, BMC genomics 2010 - “...with H-NS b4401 arcA 1.55 response regulator in two-component regulatory system with ArcB or CpxA b4045 yjbJ 1.53 predicted stress response protein b2869 ygeV 1.53 predicted transcriptional regulator b3410 yhgG 1.50 transcriptional regulator 1 Meaningful genes were selected by the following criteria: known or predicted function...”
- Global analysis of extracytoplasmic stress signaling in Escherichia coli
Bury-Moné, PLoS genetics 2009 - “...length of O-antigen -; 2.4 yfdC b2347 Predicted inner membrane protein 2.0 yjbJ F H b4045 Predicted stress response protein, belongs to the σS regulon 2.0 galU F b1236 GalU: Subunit of glucose-1-phosphate uridylyltransferase 2.0 wza F wzb F wzc F H wcaA F H wcaB...”
- The HU regulon is composed of genes responding to anaerobiosis, acid stress, high osmolarity and SOS induction
Oberto, PloS one 2009 - “...1.04 1.19 1 1.89 1.3 1.31 1 1.64 0.92 0.97 c isocitrate dehydrogenase kinase/phosphatase yjbJ b4045 yjbJ 1 0.74 1.1 1.67 1 1.94 1.32 1.73 1 1.52 1.06 0.47 a, b hypothetical protein ytfK b4217 ytfK 1 0.36 0.81 0.93 1 0.62 0.66 0.36 1 1.45...”
- YcfR (BhsA) influences Escherichia coli biofilm formation through stress response and surface hydrophobicity
Zhang, Journal of bacteriology 2007 - “...yjbJ yjdN ygaM ymgE b0806 b0456 b1050 b4045 b4107 b2672 b1195 Hypothetical protein Hypothetical protein Hypothetical protein Highly abundant nonessential...”
- Genome-wide analysis of the general stress response network in Escherichia coli: sigmaS-dependent genes, promoters, and sigma factor selectivity
Weber, Journal of bacteriology 2005 - “...OD 4 b2732 b2886 b2922 b3003 b3024 b3362 b3448 b3524 b4045 b4126 b4127 b4178 b4247 b4263 b4310 b1165 b1449 b1678 b1758 b1957 b1953 b2137 b3097 b3098 b3099 b3102...”
- SigmaS-dependent gene expression at the onset of stationary phase in Escherichia coli: function of sigmaS-dependent genes and identification of their promoter sequences
Lacour, Journal of bacteriology 2004 - “...(regulatory) protein (b3555) Hypothetical protein (b4045) Hypothetical (membrane) protein (b1582) Hypothetical (periplasmic) protein (b3097) Hypothetical...”
- Adaptation to famine: a family of stationary-phase genes revealed by microarray analysis
Tani, Proceedings of the National Academy of Sciences of the United States of America 2002 - “...ORF Fold osmC osmY b1482 b4376 5.9 8.9 poxB yjbJ b0871 b4045 8.0 1.7 Association with Lrp revealed by this study (47 genes) adhE b1241 2.9 frdA b4514 aldB b3588...”
- Gene expression induced in Escherichia coli O157:H7 upon exposure to model apple juice
Bergholz, Applied and environmental microbiology 2009 - “...ECs4610 ECs4642 ECs4836 ECs4958 ECs4959 ECs4981 ECs5013 ECs5028 ECs5042 ECs5043 Exponential phase/ stationary phasec Log2 expression ratio Category and ECs no.a...”
STY4436 conserved hypothetical protein from Salmonella enterica subsp. enterica serovar Typhi str. CT18
STM4240 putative cytoplasmic protein from Salmonella typhimurium LT2
57% identity, 99% coverage
STM14_5097 CsbD family protein from Salmonella enterica subsp. enterica serovar Typhimurium str. 14028S
57% identity, 100% coverage
- Proteome remodelling by the stress sigma factor RpoS/σS in Salmonella: identification of small proteins and evidence for post-transcriptional regulation
Lago, Scientific reports 2017 - “...likely correspond to long 5′ UnTranslated Regions (UTR) of the σS-dependent genes STM14_0421, STM14_1558, STM14_5097, and STM14_1275, respectively (Supplementary Fig. S4 and Table S1). This hypothesis is consistent with the non-canonical start codons and lack of ribosome binding sites for the putative ORFs STM14_0419,...”
- “...identity with E. coli YibT, DNA polymerase III-theta IPR009052 69 Yes yibT 14 Enterobacteriaceae STM14_5097 CsbD like IPR008462, pdb1RYK 70 Yes yjbJ 7, 1114 Bacteria, Archaea and Eukaryota STM14_5292 DUF1107 IPR009491 68 ytfK 7, 1214 - Proteobacteria STM14_5469 65% identity with E. coli YjjZ,...”
- Mapping the Regulatory Network for Salmonella enterica Serovar Typhimurium Invasion
Smith, mBio 2016 - “...STM0341 SprB STM14_2227 SL1770 STM1841 SprB STM14_3799 SL3112 STM3138 SprB STM14_4215 pckA SL3467 STM3500 SprB STM14_5097 yjbJ SL4176 STM4240 SprB NA STnc520 STnc520 STnc520 SprB STM14_1174 SL0973 STM1034 HilA STM14_1176 SL0975 STM1036 HilA STM14_1177 SL0976 STM1037 HilA a The genes listed are direct regulatory targets of...”
OA04_05950 CsbD family protein from Pectobacterium versatile
52% identity, 100% coverage
PMI0360 general stress response protein from Proteus mirabilis HI4320
50% identity, 94% coverage
BCAM0504 CsbD-like protein from Burkholderia cenocepacia J2315
49% identity, 88% coverage
BP1738 conserved hypothetical protein from Bordetella pertussis Tohama I
45% identity, 93% coverage
PA14_62680 hypothetical protein from Pseudomonas aeruginosa UCBPP-PA14
Q9HV61 UPF0337 protein PA4738 from Pseudomonas aeruginosa (strain ATCC 15692 / DSM 22644 / CIP 104116 / JCM 14847 / LMG 12228 / 1C / PRS 101 / PAO1)
PA4738 hypothetical protein from Pseudomonas aeruginosa PAO1
48% identity, 93% coverage
- Quorum quenching quandary: resistance to antivirulence compounds
Maeda, The ISME journal 2012 - “...PA14_10380 PA14_58350 PA4677 PA4738 PA4739 PA5482 PA14_61870 PA14_62680 PA14_62690 PA14_72370 mexB oprM by previous studies lasR rhlR coxB coxA coIII napA...”
- Predicting Pseudomonas aeruginosa drug resistance using artificial intelligence and clinical MALDI-TOF mass spectra
Nguyen, mSystems 2024 - “...isolates. Three reviewed proteins, namely, protein RegB, major cold shock protein CspA, and UPF0337 protein PA4738, were identified within the feature bin 7,568–7,636 Da. Using the same approach, we investigated the most important spectral ranges in amikacin and ciprofloxacin. In ciprofloxacin-resistant isolates, a significantly increased signal...”
- “...work ( 13 , 18 , 29 ). The proteins RegB, CspA, and UPF0337 protein PA4738 were identified within the most contributing bin of our best performing model for predicting ceftazidime/avibactam resistance. RegB is known to facilitate production of exotoxin A, a potent virulence factor in...”
- Quantitative proteomics reveals unique responses to antimicrobial treatments in clinical Pseudomonas aeruginosa isolates
Goodyear, mSystems 2023 - “...PA3787 PA3787 Hypothetical, unknown 3.30 2.98 3.55 PA4571 PA4571 Electron transfer activity 2.61 3.51 2.94 PA4738 PA4738 Hypothetical, unknown 3.68 2.37 3.38 PA4739 PA4739 Hypothetical, unknown 6.04 4.09 5.60 PA5313 GabT2 Polyamine catabolic process 3.20 2.87 3.00 Decreased PA0284 PA0284 Hypothetical, unknown 4.20 4.11 5.15 PA0619...”
- A gene network-driven approach to infer novel pathogenicity-associated genes: application to Pseudomonas aeruginosa PAO1
De, mSystems 2023 - “...which is within pathway implicated in biofilm formation and long-term infection ( 106 ), and PA4738 and PA5482, which are involved in protection against osmotic stress ( 107 ). As pathogens encounter various stress factors during infection including osmotic stress that can interfere with cell envelope...”
- A VirB4 ATPase of the mobile accessory genome orchestrates core genome-encoded features of physiology, metabolism, and virulence of Pseudomonas aeruginosa TBCF10839
Wiehlmann, Frontiers in cellular and infection microbiology 2023 - “...probable toxin transporter 8.1 PA4190 Probable FAD-dependent monooxygenase 6.5 PA4209 phzM, probable phenazine-specific methyltransferase 37.6 PA4738 Conserved hypothetical protein 14.8 PA4739 Conserved hypothetical protein 21.4 PA4778 cueR, negative regulator of H2-T6SS dependent copper binding, regulator of surfing motility, CueR 7.0 PA4828 Conserved hypothetical protein 10.3 PA4876...”
- Top-Down LESA Mass Spectrometry Protein Analysis of Gram-Positive and Gram-Negative Bacteria
Kocurek, Journal of the American Society for Mass Spectrometry 2017 - “...P05384 51 Incubation: 48 h, 37 °C Sampled fresh 951.8583 +8 7606.81 0.0 UPF0337 protein PA4738 Q9HV61 58 956.3376 +6 5731.98 0.1 PA0039 Q9I793 24 Incubation: 24 h, 37 °C Sampled fresh -signal peptide, 442 disulfide 958.5127 +16 15,320.09 3.5 PA5178 Q9HU11 27 Incubation: 48 h,...”
- “...Most notably, however, the UPF0337 family of stress response proteins, represented in P. aeruginosa by PA4738, was observed both in E. coli (YjbJ) and in S. aureus (SAOUHSC_00845) as well as in all three streptococci. In addition to PA2146, P. aeruginosa yielded multiple proteins whose existence...”
- Physiological and transcriptional responses to osmotic stress of two Pseudomonas syringae strains that differ in epiphytic fitness and osmotolerance
Freeman, Journal of bacteriology 2013 - “...PSPTO_1596, and the putative hydrophilin-encoding, osmoinduced PAO1 gene PA4738 (39) exhibit 40, 36, and 55% amino acid identity to the Escherichia coli...”
- A non-classical LysR-type transcriptional regulator PA2206 is required for an effective oxidative stress response in Pseudomonas aeruginosa
Reen, PloS one 2013 - “...component, subunit 0.50 PA3451 hypothetical protein 0.50 PA3788 hypothetical protein 0.40 PA4141 hypothetical protein 0.50 PA4738 conserved hypothetical protein 0.44 PA4739 conserved hypothetical protein 0.48 PA5085 probable transcriptional regulator 0.31 PA5481 hypothetical protein 0.39 PA5482 hypothetical protein 0.37 Genes that exhibited a 2-fold or greater alteration...”
XAC4007 conserved hypothetical protein from Xanthomonas axonopodis pv. citri str. 306
E2P69_RS15800 CsbD family protein from Xanthomonas perforans
45% identity, 84% coverage
BCAM0507 CsbD-like protein from Burkholderia cenocepacia J2315
52% identity, 81% coverage
XAC0100 conserved hypothetical protein from Xanthomonas axonopodis pv. citri str. 306
45% identity, 80% coverage
For advice on how to use these tools together, see
Interactive tools for functional annotation of bacterial genomes.
The PaperBLAST database links 793,807 different protein sequences to 1,259,118 scientific articles. Searches against EuropePMC were last performed on March 13 2025.
PaperBLAST builds a database of protein sequences that are linked
to scientific articles. These links come from automated text searches
against the articles in EuropePMC and from manually-curated
information from GeneRIF, UniProtKB/Swiss-Prot, BRENDA, CAZy (as made
available by dbCAN), BioLiP, CharProtDB, MetaCyc, EcoCyc, TCDB, REBASE,
the Fitness Browser, and a subset of the European Nucleotide Archive
with the /experiment tag. Given this database and a protein sequence
query, PaperBLAST uses protein-protein BLAST to find similar sequences
with E < 0.001.
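The E-value cutoff described above can be illustrated with a minimal sketch. It assumes the hits are available in BLAST's standard 12-column tabular output; the `Hit` type, the `parse_hits` function, and the way coverage is computed (aligned query span divided by query length) are assumptions made for illustration, not PaperBLAST's own code.

```python
from typing import NamedTuple

E_VALUE_CUTOFF = 1e-3  # "E < 0.001", as stated in the text


class Hit(NamedTuple):
    subject: str     # identifier of the matching database protein
    identity: float  # percent identity reported by BLAST
    coverage: float  # percent of the query covered (assumed definition)
    evalue: float


def parse_hits(tabular: str, query_length: int) -> list[Hit]:
    """Keep hits with E below the cutoff from 12-column tabular BLAST output.

    Columns: qseqid sseqid pident length mismatch gapopen
             qstart qend sstart send evalue bitscore
    """
    hits = []
    for line in tabular.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        qstart, qend = int(fields[6]), int(fields[7])
        evalue = float(fields[10])
        if evalue >= E_VALUE_CUTOFF:
            continue  # not significant enough to report
        coverage = 100.0 * (qend - qstart + 1) / query_length
        hits.append(Hit(fields[1], float(fields[2]), coverage, evalue))
    return hits
```

For a 69-residue query such as the one on this page, an alignment spanning query positions 2 to 68 would be reported as roughly 97% coverage under this definition.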
To build the database, we query EuropePMC with locus tags, with RefSeq protein
identifiers, and with UniProt
accessions. We obtain the locus tags from RefSeq or from MicrobesOnline. We use
queries of the form "locus_tag AND genus_name" to try to ensure that
the paper is actually discussing that gene. Because EuropePMC indexes
most recent biomedical papers, even if they are not open access, some
of the links may be to papers that you cannot read or that our
computers cannot read. We query each of these identifiers that
appears in the open access part of EuropePMC, as well as every locus
tag that appears in the 500 most-referenced genomes, so that a gene
may appear in the PaperBLAST results even though none of the papers
that mention it are open access. We also incorporate text-mined links
from EuropePMC that link open access articles to UniProt or RefSeq
identifiers. (This yields some additional links because EuropePMC
uses different heuristics for their text mining than we do.)
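As a rough illustration of the "locus_tag AND genus_name" query form mentioned above, the following sketch builds such a query string from a locus tag and an organism name. The function names are hypothetical, and taking the first word of the organism name as the genus is a simplifying assumption.

```python
def genus_of(organism: str) -> str:
    """Return the first word of an organism name (assumed to be the genus)."""
    words = organism.split()
    if not words:
        raise ValueError("organism name is empty")
    return words[0]


def build_query(locus_tag: str, organism: str) -> str:
    """Build a text-search query of the form 'locus_tag AND genus_name'."""
    return f"{locus_tag} AND {genus_of(organism)}"
```

For example, `build_query("PA4738", "Pseudomonas aeruginosa PAO1")` gives `PA4738 AND Pseudomonas`. Requiring the genus name is what helps avoid matching an unrelated identifier that happens to look the same.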
For every article that mentions a locus tag, a RefSeq protein
identifier, or a UniProt accession, we try to select one or two
snippets of text that refer to the protein. If we cannot get access to
the full text, we try to select a snippet from the abstract, but
unfortunately, unique identifiers such as locus tags are rarely
provided in abstracts.
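The snippet selection described above might be sketched as follows. This is only one plausible heuristic, assuming a fixed window of words around each mention; the function name, the window width, and the rule for skipping overlapping mentions are all assumptions and do not reflect PaperBLAST's actual heuristics.

```python
import re


def select_snippets(text: str, identifier: str,
                    width: int = 12, max_snippets: int = 2) -> list[str]:
    """Return up to max_snippets windows of words around an identifier."""
    words = text.split()
    # Match the identifier as a whole token, so PA4738 does not match PA47380.
    pattern = re.compile(r"\b" + re.escape(identifier) + r"\b")
    snippets = []
    last_end = -1  # index of the last word already used in a snippet
    for i, word in enumerate(words):
        if not pattern.search(word):
            continue
        if i <= last_end:
            continue  # this mention is already inside the previous snippet
        start = max(0, i - width)
        end = min(len(words), i + width + 1)
        snippets.append("..." + " ".join(words[start:end]) + "...")
        last_end = end - 1
        if len(snippets) == max_snippets:
            break
    return snippets
```

When the full text is unavailable, the same function could be applied to the abstract, although, as noted above, it will often return nothing there.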
PaperBLAST also incorporates manually-curated protein functions:
- Proteins from NCBI's RefSeq are included if a GeneRIF entry links
the gene to an article in PubMed®. GeneRIF also provides a short
summary of the article's claim about the protein, which is shown
instead of a snippet.
- Proteins from Swiss-Prot (the curated part of UniProt)
are included if the curators
identified experimental evidence for the protein's function (evidence
code ECO:0000269). For these proteins, the fields of the Swiss-Prot entry that
describe the protein's function are shown (with bold headings).
- Proteins from BRENDA,
a curated database of enzymes, are included if they are linked to a paper in PubMed
and their full sequence is known.
- Every protein from the non-redundant subset of BioLiP, a database
of ligand-binding sites and catalytic residues in protein structures,
is included. Since BioLiP itself does not include descriptions of the
proteins, those are taken from the Protein Data Bank. Descriptions
from PDB rely on the original submitter of the structure and cannot be
updated by others, so they may be less reliable. (For SitesBLAST and
Sites on a Tree, we use a larger subset of BioLiP so that every ligand
is represented among a group of structures with similar sequences, but
for PaperBLAST, we use the non-redundant set provided by BioLiP.)
- Every protein from EcoCyc, a curated
database of the proteins in Escherichia coli K-12, is included, regardless
of whether it is characterized.
- Proteins from the MetaCyc metabolic pathway database
are included if they are linked to a paper in PubMed and their full sequence is known.
- Proteins from the Transport Classification Database (TCDB)
are included if they have known substrate(s), have reference(s),
and are not described as uncharacterized or putative.
(Some of the references are not visible on the PaperBLAST web site.)
- Every protein from CharProtDB,
a database of experimentally characterized protein annotations, is included.
- Proteins from the CAZy database of carbohydrate-active enzymes
are included if they are associated with an Enzyme Classification number.
Even though CAZy does not provide links from individual protein sequences to papers,
these should all be experimentally-characterized proteins.
- Proteins from the REBASE database
of restriction enzymes are included if they have known specificity.
- Every protein with an evidence-based reannotation (based on mutant phenotypes)
in the Fitness Browser is included.
- Sequence-specific transcription factors (including sigma factors and DNA-binding response regulators)
with experimentally-determined DNA binding sites from the
PRODORIC database of gene regulation in prokaryotes are included.
- Putative transcription factors from RegPrecise
that have manually-curated predictions for their binding sites are included. These predictions are based on
conserved putative regulatory sites across genomes that contain similar transcription factors,
so PaperBLAST clusters the TFs at 70% identity and retains just one member of each cluster.
- Coding sequence (CDS) features from the
European Nucleotide Archive (ENA)
are included if the /experiment tag is set (implying that there is experimental evidence for the annotation),
the nucleotide entry links to paper(s) in PubMed,
and the nucleotide entry is from the STD data class
(implying that these are targeted annotated sequences, not from shotgun sequencing).
Also, to filter out genes whose transcription or translation was detected, but whose function
was not studied, nucleotide entries or papers with more than 25 such proteins are excluded.
Descriptions from ENA rely on the original submitter of the
sequence and cannot be updated by others, so they may be less reliable.
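The ENA rule in the list above, which excludes nucleotide entries or papers with more than 25 such proteins, can be sketched as a simple counting filter. The `CdsLink` record and the choice to drop every link that touches an over-large entry or paper are assumptions for illustration; the original pipeline may apply the threshold differently.

```python
from collections import defaultdict
from typing import NamedTuple

MAX_PROTEINS = 25  # threshold stated in the text


class CdsLink(NamedTuple):
    protein_id: str  # CDS feature carrying the /experiment tag
    entry_id: str    # nucleotide entry that contains the CDS
    pubmed_id: str   # paper linked from the nucleotide entry


def filter_links(links: list[CdsLink]) -> list[CdsLink]:
    """Drop links whose entry or paper has more than MAX_PROTEINS proteins."""
    per_entry: dict[str, set[str]] = defaultdict(set)
    per_paper: dict[str, set[str]] = defaultdict(set)
    for link in links:
        per_entry[link.entry_id].add(link.protein_id)
        per_paper[link.pubmed_id].add(link.protein_id)
    return [
        link for link in links
        if len(per_entry[link.entry_id]) <= MAX_PROTEINS
        and len(per_paper[link.pubmed_id]) <= MAX_PROTEINS
    ]
```

The intent is that a genome-scale submission or a proteomics paper listing hundreds of detected proteins is filtered out, while a targeted study of a few genes is kept.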
Except for GeneRIF and ENA,
the curated entries include a short curated
description of the protein's function.
For entries from BioLiP, the protein's function may not be known beyond binding to the ligand.
Many of these entries also link to articles in PubMed.
For more information see the
PaperBLAST paper (mSystems 2017)
or the code.
You can download PaperBLAST's database here.
Changes to PaperBLAST since the paper was written:
- November 2023: incorporated PRODORIC and RegPrecise. Many PRODORIC entries were not linked to a protein sequence (no UniProt identifier), so we added this information.
- February 2023: BioLiP changed their download format. PaperBLAST now includes their non-redundant subset. SitesBLAST and Sites on a Tree use a larger non-redundant subset that ensures that every ligand is represented within each cluster. This should ensure that every binding site is represented.
- June 2022: incorporated some coding sequences from ENA with the /experiment tag.
- March 2022: incorporated BioLiP.
- April 2020: incorporated TCDB.
- April 2019: EuropePMC now returns table entries in their search results. This has expanded PaperBLAST's database, but most of the new entries are of low relevance, and the resulting snippets are often just lists of locus tags with annotations.
- February 2018: the alignment page reports the conservation of the hit's functional sites (if available from Swiss-Prot or UniProt).
- January 2018: incorporated BRENDA.
- December 2017: incorporated MetaCyc, CharProtDB, CAZy, REBASE, and the reannotations from the Fitness Browser.
- September 2017: EuropePMC no longer returns some table entries in their search results. This has shrunk PaperBLAST's database, but has also reduced the number of low-relevance hits.
Many of these changes are described in Interactive tools for functional annotation of bacterial genomes.
PaperBLAST cannot provide snippets for many of the papers that are
published in non-open-access journals. This limitation applies even if
the paper is marked as "free" on the publisher's web site and is
available in PubmedCentral or EuropePMC. If a journal that you publish
in is marked as "secret," please consider publishing elsewhere.
Many important articles are missing from PaperBLAST, either because
the article's full text is not in EuropePMC (as for many older
articles), or because the paper does not mention a protein identifier such as a locus tag, or because of PaperBLAST's heuristics. If you notice an
article that characterizes a protein's function but is missing from
PaperBLAST, please notify the curators at UniProt
or add an entry to GeneRIF.
Entries in either of these databases will eventually be incorporated
into PaperBLAST. Note that to add an entry to UniProt, you will need
to find the UniProt identifier for the protein. If the protein is not
already in UniProt, you can ask them to create an entry. To add an
entry to GeneRIF, you will need an NCBI Gene identifier, but
unfortunately many prokaryotic proteins in RefSeq do not have
corresponding Gene identifiers.
References
PaperBLAST: Text-mining papers for information about homologs.
M. N. Price and A. P. Arkin (2017). mSystems, 10.1128/mSystems.00039-17.
Europe PMC in 2017.
M. Levchenko et al (2017). Nucleic Acids Research, 10.1093/nar/gkx1005.
Gene indexing: characterization and analysis of NLM's GeneRIFs.
J. A. Mitchell et al (2003). AMIA Annu Symp Proc 2003:460-464.
UniProt: the universal protein knowledgebase.
The UniProt Consortium (2016). Nucleic Acids Research, 10.1093/nar/gkw1099.
BRENDA in 2017: new perspectives and new tools in BRENDA.
S. Placzek et al (2017). Nucleic Acids Research, 10.1093/nar/gkw952.
The EcoCyc database: reflecting new knowledge about Escherichia coli K-12.
I. M. Keseler et al (2016). Nucleic Acids Research, 10.1093/nar/gkw1003.
The MetaCyc database of metabolic pathways and enzymes.
R. Caspi et al (2018). Nucleic Acids Research, 10.1093/nar/gkx935.
CharProtDB: a database of experimentally characterized protein annotations.
R. Madupu et al (2012). Nucleic Acids Research, 10.1093/nar/gkr1133.
The carbohydrate-active enzymes database (CAZy) in 2013.
V. Lombard et al (2014). Nucleic Acids Research, 10.1093/nar/gkt1178.
The Transporter Classification Database (TCDB): recent advances.
M. H. Saier, Jr. et al (2016). Nucleic Acids Research, 10.1093/nar/gkv1103.
REBASE - a database for DNA restriction and modification: enzymes, genes and genomes.
R. J. Roberts et al (2015). Nucleic Acids Research, 10.1093/nar/gku1046.
Deep annotation of protein function across diverse bacteria from mutant phenotypes.
M. N. Price et al (2016). bioRxiv, 10.1101/072470.
by Morgan Price,
Arkin group
Lawrence Berkeley National Laboratory