PaperBLAST
PaperBLAST Hits for sp|Q9I507|Y951_PSEAE UPF0761 membrane protein PA0951 OS=Pseudomonas aeruginosa (strain ATCC 15692 / DSM 22644 / CIP 104116 / JCM 14847 / LMG 12228 / 1C / PRS 101 / PAO1) OX=208964 GN=PA0951 PE=3 SV=1 (411 a.a., MREHFNDGVE...)
>sp|Q9I507|Y951_PSEAE UPF0761 membrane protein PA0951 OS=Pseudomonas aeruginosa (strain ATCC 15692 / DSM 22644 / CIP 104116 / JCM 14847 / LMG 12228 / 1C / PRS 101 / PAO1) OX=208964 GN=PA0951 PE=3 SV=1
MREHFNDGVEFARFLAHRFVTDKAPNSAAALTYTTLFAVVPMMTVMFSMLSLIPAFHGMG
ESIQTFIFRNFVPSAGEAVETYLKSFTTQARHLTWVGVVFLAVTAFTMLVTIEKAFNEIW
RVRQPRRGVGRFLLYWAILSLGPLLLGAGFAVTTYITSLSLLHGPDALPGAETLLGLMPL
AFSVAAFTLLYSAVPNARVPVRHALMGGVFTAVLFEAAKTLFGLYVSLFPGYQLIYGAFA
TVPIFLLWIYLSWMIVLFGAVLVCNLSSSRLWRRRSLPKLIVLLGVLRVFLQRQQLGQSL
RLTHLHRAGWLLPEDEWEELLDFLEKEQFVCRAGGGEWVLCRDLGAYSLHRLLNRCPWPM
PSRERMPASLDEAWYPPFQQAMERLQVEQEALFGESLAHWLADGTSGAKVT
Found 31 similar proteins in the literature:
ABUW_0456 YihY family inner membrane protein from Acinetobacter baumannii
37% identity, 87% coverage
VF_0100 ribonuclease BN from Vibrio fischeri ES114
VF_0100 virulence factor BrkB family protein from Aliivibrio fischeri ES114
43% identity, 66% coverage
- Comparative genomics-based investigation of resequencing targets in Vibrio fischeri: focus on point miscalls and artefactual expansions
Mandel, BMC genomics 2008 - “...behind the other data sources. As one example, we point to the case of yihY (VF_0100, ortholog of E. coli locus tag b3886). Previously annotated as encoding the ribonuclease BN [ 45 ], this annotation has been propagated through numerous sources, including most of the Vibrionaceae...”
- “...sources described above, as well as the literature described, and captured this update by calling VF_0100 as yihY with a product of "predicted inner membrane protein". In fact, V. fischeri , like most sequenced Vibrio spp., does not contain an rbn ortholog, and therefore having any...”
SO4401 ribonuclease BN from Shewanella oneidensis MR-1
46% identity, 61% coverage
VC2742 ribonuclease BN from Vibrio cholerae O1 biovar eltor str. N16961
44% identity, 66% coverage
ETAE_3490 ribonuclease BN from Edwardsiella tarda EIB202
42% identity, 64% coverage
BCAL1463 putative tRNA processing exoribonuclease from Burkholderia cenocepacia J2315
42% identity, 60% coverage
- Response of Burkholderia cenocepacia H111 to micro-oxia
Pessi, PloS one 2013 - “...CCE51463 BCAL1232 Hypothetical protein I35_5360 nd M only CCE51401 BCAL1294 VgrG protein 13.8 2.6 CCE51631 BCAL1463 Ribonuclease BN TM nd 21.7 CCE53456 BCAL1664 Hypothetical protein I35_7395 nd 16.7 CCE53455 BCAL1665 SpoVR-like protein nd 16.4 CCE50747 BCAL1830 Dioxygenase,2-nitropropane dioxygenase-like 19.1 1.8 CCE50719 BCAL1857 Hypothetical protein I35_4602 TM...”
YPO0028 ribonuclease BN from Yersinia pestis CO92
39% identity, 65% coverage
YP_0029 ribonuclease BN from Yersinia pestis biovar Medievalis str. 91001
39% identity, 65% coverage
PM1616 Rbn from Pasteurella multocida subsp. multocida str. Pm70
39% identity, 66% coverage
- Pathogenomic analysis and characterization of Pasteurella multocida strains recovered from human infections
Smallman, Microbiology spectrum 2024 - “...b Pm1612 Cat multocida / gallicida A L3 Pm1613 Cat multocida / gallicida A L3 Pm1616 Cat multocida / gallicida No capsule locus L3 Pm1617 Cat multocida / gallicida No capsule locus L3 Pm1618 Cat septica A L1 Pm1620 Cat multocida / gallicida A L3 Pm1621...”
- “...and Pm1621), no capsule locus was identified ( Fig. 2 ). While the genomes of Pm1616, Pm1617, P1591, and NCTC 11620 were incomplete, in all of these isolate genomes, the two genes typically flanking the capsule locus in P. multocida , grxD and DUF441, were immediately...”
Rbn / b3886 PF03631 family membrane protein YihY from Escherichia coli K-12 substr. MG1655 (see 2 papers)
YIHY_ECOLI / P0A8K8 UPF0761 membrane protein YihY from Escherichia coli (strain K12) (see paper)
b3886 ribonuclease BN from Escherichia coli str. K-12 substr. MG1655
39% identity, 63% coverage
- Bioinformatic analyses of integral membrane transport proteins encoded within the genome of the planctomycetes species, Rhodopirellula baltica.
Paparoditis, Biochimica et biophysica acta 2014 - “...9.B.105.1.1 Q58AJ7 10 cations Pb 2+ Q7UIY3 6 9.B.126 Putative Lipid Exporter (YhjD) Family 9.B.126.2.1 P0A8K8 6 lipids lipids Q7USG3 6 9.B.126.2.1 P0A8K8 6 lipids lipids Q7UQD0 5 9.B.128 O-antigen Polymerase, WzyE (WzyE) Family 9.B.128.2.1 D7DZ35 12 lipids lipids Q7UVS2 13 Table 2 Integral membrane MFS...”
- Systematic discovery of uncharacterized transcription factors in Escherichia coli K-12 MG1655
Gao, Nucleic acids research 2018 - “...N/A 328 1.0 N/A N/A YafC b0208 LysR 304 0.96 N/A Supplementary Figure S9 YihY b3886 N/A 290 0.91 N/A N/A YieP b3755 GntR 230 0.38 N/A Supplementary Figure S9 YddM b1477 Xre 94 0.34 N/A Supplementary Figure S9 YiaG b3555 Xre 96 0.31 N/A N/A...”
- Comparative genomics-based investigation of resequencing targets in Vibrio fischeri: focus on point miscalls and artefactual expansions
Mandel, BMC genomics 2008 - “...example, we point to the case of yihY (VF_0100, ortholog of E. coli locus tag b3886). Previously annotated as encoding the ribonuclease BN [ 45 ], this annotation has been propagated through numerous sources, including most of the Vibrionaceae genomes. A subsequent report identified the E....”
- “...b2268) as the gene that encodes RNase BN, and the most recent genome annotation for b3886 has been updated as yihY , "predicted inner membrane protein" [ 46 ]. We compared data from the sources described above, as well as the literature described, and captured this...”
HI0276 ribonuclease BN (rbn) from Haemophilus influenzae Rd KW20
38% identity, 63% coverage
- lpt6, a gene required for addition of phosphoethanolamine to inner-core lipopolysaccharide of Neisseria meningitidis and Haemophilus influenzae
Wright, Journal of bacteriology 2004 - “...AAC galE (NMB0064) galE (NMB0064) lpt6 (HI0275) HI0276 Random Random Random Transposon RC325 Transposon RC325 Transposon RC325 Transposon RC325 mtr (NMA0409)...”
- “...between the genes HI0274, encoding a glutamyl-tRNA synthetase, and HI0276, encoding an RNase (Fig. 8A). To test for the prevalence of lpt6 in the H. influenzae...”
- Identification of regions of the chromosome of Neisseria meningitidis and Neisseria gonorrhoeae which are specific to the pathogenic Neisseria species
Perrin, Infection and immunity 1999 - “...E. coli; 5, CvaB, plasmid ColV, E. coli; 6, HI0276, H. influenzae; 7, YkvJ, Bacillus subtilis; 8, HI1190, H. influenzae; 9, HI1189, H. influenzae; 10, YcfO, 11,...”
- Identification and characterization of the Escherichia coli rbn gene encoding the tRNA processing enzyme RNase BN
Callahan, Journal of bacteriology 1996 - “...revealed a striking similarity between RNase BN and HI0276, an ORF in Haemophilus influenzae, as previously noted by Fleischmann et al. (8). HI0276 encodes...”
- “...found in the two proteins. These findings suggest that HI0276 encodes the H. influenzae homolog of RNase BN. RNase BN also shares 23.5% identity with the...”
ESA_04062 virulence factor BrkB family protein from Cronobacter sakazakii ATCC BAA-894
38% identity, 64% coverage
- Transcriptomic Analyses to Unravel Cronobacter sakazakii Resistance Pathways
Liu, Foods (Basel, Switzerland) 2024 - “...being upregulated and two being downregulated. Specifically, the b0877 gene showed a 0.10-fold upregulation, while ESA_04062 exhibited a 0.76-fold increase, indicating an increase in the expression of virulence factor VirK and BrkV family proteins. In contrast to the earlier strains, the PA0086 gene exhibited a 1.25-fold...”
NGO0127 putative tRNA processing exoribonuclease BN from Neisseria gonorrhoeae FA 1090
Q5FAA1 UPF0761 membrane protein NGO_0127 from Neisseria gonorrhoeae (strain ATCC 700825 / FA 1090)
36% identity, 91% coverage
NMA0700 putative ribonuclease BN from Neisseria meningitidis Z2491
35% identity, 91% coverage
PG0958 ribonuclease BN, putative from Porphyromonas gingivalis W83
30% identity, 63% coverage
- VimA-dependent modulation of the secretome in Porphyromonas gingivalis
Osbourne, Molecular oral microbiology 2012 - “...0 7 D-lysine 5,6-aminomutase alpha subunit PG0955 Energy metabolism 57 kDa 0 6 Butyryl-CoA dehydrogenase PG0958 Fatty acid metabolism 42 kDa 0 7 Alanine racemase; N-acetylymuramoylalanyl-D-glutamate-2,6,-diaminopimelate-D-alanine-D-alanine ligase PG0976 Cell envelope 92 kDa 0 2 Conserved hypothetical protein PG0981 Unknown 107 kDa 0 9 Ribonucleotide reductase alpha...”
Q7UQD0 Probable ribonuclease BN from Rhodopirellula baltica (strain DSM 10527 / NCIMB 13988 / SH1)
48% identity, 15% coverage
SERP1421 ribonuclease BN, putative from Staphylococcus epidermidis RP62A
28% identity, 62% coverage
Cj1212c putative ribonuclease BN from Campylobacter jejuni subsp. jejuni NCTC 11168
CJJ81176_1225 YihY family protein from Campylobacter jejuni subsp. jejuni 81-176
29% identity, 57% coverage
Npun_R2514 putative ribonuclease BN from Nostoc punctiforme
25% identity, 58% coverage
LBDG_22000 YihY/virulence factor BrkB family protein from Leptolyngbya boryana dg5
28% identity, 58% coverage
- The GGDEF protein Dgc2 suppresses both motility and biofilm formation in the filamentous cyanobacterium Leptolyngbya boryana
Toida, Microbiology spectrum 2023 - “...in which the kanamycin resistance gene was inserted between open reading frames (ORFs) LBDG_21990 and LBDG_22000 did not show any phenotypic changes. After about 1.5 y of continuous passaging of the E22m1-dg5 strain on BG-11 agar plates, we found that a subpopulation of cells became motile...”
- “...derived from pBR322 to pIL1195. For generating Ptrc::yhjH strain, we inserted them into LBDG_21990 and LBDG_22000 region. We ligated a 1,500-bp USR including LBDG_21990, the kanamycin resistance gene cassette from pYFC10, lacIq and Ptrc from pTrc99A, yhjH from E. coli JM109, and a 1,500-bp DSR including...”
PA2751 hypothetical protein from Pseudomonas aeruginosa PAO1
26% identity, 60% coverage
AKJ12_RS16605 YihY/virulence factor BrkB family protein from Xanthomonas arboricola pv. juglandis
31% identity, 50% coverage
- Proteome Analysis of Walnut Bacterial Blight Disease
H, International journal of molecular sciences 2020 - “...three asparaginases (AKJ12_RS06635, AKJ12_RS18355, and AKJ12_RS13110); four peptidyl-prolyl isomerases (AKJ12_RS10980, AKJ12_RS11150, AKJ12_RS06285, and AKJ12_RS11525); beta-glucosidase (AKJ12_RS16605); cell wall degradation proteins such as two polygalacturonases (AKJ12_RS07840 and AKJ12_RS01955), a cellulase (AKJ12_RS14810), two serine proteases (AKJ12_RS11910 and AKJ12_RS21510); proteases/peptidases (33); two esterases (AKJ12_RS20070 and AKJ12_RS13810); endoglucanase (AKJ12_RS08070); two...”
BCAM1016 putative ribonuclease from Burkholderia cenocepacia J2315
24% identity, 59% coverage
brkB / AAA51647.1 BrkB from Bordetella pertussis (see 2 papers)
28% identity, 64% coverage
lmo1706 similar to transport proteins from Listeria monocytogenes EGD-e
24% identity, 67% coverage
BSU07900 putative integral inner membrane protein with ribonuclease fold from Bacillus subtilis subsp. subtilis str. 168
30% identity, 25% coverage
ACSP50_3383 YihY/virulence factor BrkB family protein from Actinoplanes sp. SE50/110
23% identity, 62% coverage
BC0452 Ribonuclease BN from Bacillus cereus ATCC 14579
23% identity, 62% coverage
SYNW2185 similar to serum resistance locus BrkB from Synechococcus sp. WH 8102
25% identity, 61% coverage
SACOL1941 ribonuclease BN, putative from Staphylococcus aureus subsp. aureus COL
26% identity, 61% coverage
SA1699 hypothetical protein from Staphylococcus aureus subsp. aureus N315
26% identity, 61% coverage
- Site-specific mutation of Staphylococcus aureus VraS reveals a crucial role for the VraR-VraS sensor in the emergence of glycopeptide resistance
Galbusera, Antimicrobial agents and chemotherapy 2011 - “...Primer 3 Bam SA1699 staEco-Pst GGGGTACCGGATCCATGAACTATGTTGAACGTTATATTGAACAG CGGGATCCGTTCATCGATAAATCACCTCTACG CGGGATCCCAGCAACTTTTTGCGGCAAGTATGA...”
- “...from the vraR-SA1699 intergenic region and the adjacent SA1699 gene. The fragments were cloned together in a three-piece ligation with KpnI-PstI-digested pBT2....”
- Whole genome sequencing and complete genetic analysis reveals novel pathways to glycopeptide resistance in Staphylococcus aureus
Renzoni, PloS one 2011 - “...vraS- G45R kan r nearby ts shuttle vector This study pAR712 pBT2, vraS -kan r -SA1699 intergenic ts shuttle vector [32] pAR784 pBT2, stp1 -kan r nearby ts shuttle vector This study pAR787 pBT2, stp1 -tet r nearby ts shuttle vector This study pAR1063 pBT2, yjbH...”
- Characterizing the effects of inorganic acid and alkaline shock on the Staphylococcus aureus transcriptome and messenger RNA turnover
Anderson, FEMS immunology and medical microbiology 2010 - “...reductase sa_c8453s7413_a_at 5.8 15 stable nuc SA0860 thermonuclease precursor sa_c2914s2476_a_at * 4.6 2.5 30 obgE SA1699 GTPase ObgE sa_c1773s1505_a_at 4.9 2.5 30 parC SA1390 DNA topoisomerase IV, A subunit sa_c2887s2452_a_at * 3.3 2.5 2.5 recJ SA1691 single-stranded-DNA-specific exonuclease sa_c1320s1091_a_at * 3.8 2.5 ND rnhB SA1261 ribonuclease...”
- “...sa_c3824s3292_a_at 3.3 2.5 5 ligA SA1965 DNA ligase, NAD-dependent sa_c2914s2476_a_at * 3.8 2.5 2.5 obgE SA1699 GTPase sa_c1768s1502_a_at 2.3 2.5 2.5 parE SA1389 DNA topoisomerase IV subunit B sa_c3826s3296_a_at 2.4 2.5 2.5 pcrA SA1966 ATP-dependent DNA helicase sa_c1376s1149_a_at 2.6 2.5 2.5 polC SA1283 DNA polymerase III...”
- Characterization of the oxygen-responsive NreABC regulon of Staphylococcus aureus
Schlag, Journal of bacteriology 2008 - “...SA0211 SA0212 SA0318 SA0580 SA0653 SA0654 SA0655 SA1014 SA1675 SA1699 SA2167 SA2209 SA2242 SA2424 SA2425 SA2426 SA2427 SA2428 fruB fruA scrA hlgB arcC arcD arcB...”
- Differential gene expression profiling of Staphylococcus aureus cultivated under biofilm and planktonic conditions
Resch, Applied and environmental microbiology 2005 - “...3.714 3.878 3.447 3.367 3.299 2.98 SA2202 2.7 SA0813 SA1699 SA0793 SA2172 SA0733 SA0493 2.67 2.633 2.615 2.59 2.575 2.52 SA0912 SA0913 SA0911 SA0910 SA0505...”
For advice on how to use these tools together, see
Interactive tools for functional annotation of bacterial genomes.
The PaperBLAST database links 793,807 different protein sequences to 1,259,118 scientific articles. Searches against EuropePMC were last performed on March 13 2025.
PaperBLAST builds a database of protein sequences that are linked
to scientific articles. These links come from automated text searches
against the articles in EuropePMC
and from manually-curated information from GeneRIF, UniProtKB/Swiss-Prot,
BRENDA, CAZy (as made available by dbCAN), BioLiP, CharProtDB, MetaCyc,
EcoCyc, TCDB, REBASE, the Fitness Browser,
and a subset of the European Nucleotide Archive with the /experiment tag.
Given this database and a protein sequence query,
PaperBLAST uses protein-protein BLAST
to find similar sequences with E < 0.001.
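As a rough illustration of the E < 0.001 cutoff, the sketch below filters BLAST hits in the 12-column tabular layout (BLAST+ -outfmt 6). It is not PaperBLAST's own code: the function name filter_hits and the example rows are made up for illustration, and the assumption is that the hits are available as tab-separated text.

```python
import csv
import io

E_VALUE_CUTOFF = 1e-3  # threshold stated in the text (E < 0.001)


def filter_hits(tabular_text, cutoff=E_VALUE_CUTOFF):
    """Return (subject_id, percent_identity, e_value) for rows with E < cutoff.

    Assumes the default 12-column tabular layout, where column 2 is the
    subject id, column 3 the percent identity, and column 11 the E-value.
    """
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        subject, identity, e_value = row[1], float(row[2]), float(row[10])
        if e_value < cutoff:
            hits.append((subject, identity, e_value))
    return hits


# Made-up rows for illustration only; not real PaperBLAST output.
EXAMPLE = "\n".join([
    "\t".join(["query", "hitA", "37.0", "360", "200", "5",
               "1", "358", "3", "361", "1e-60", "210"]),
    "\t".join(["query", "hitB", "24.0", "120", "80", "4",
               "10", "129", "15", "134", "0.5", "30"]),
])

if __name__ == "__main__":
    for subject, identity, e_value in filter_hits(EXAMPLE):
        print(subject, identity, e_value)
```

Only the first example row passes the cutoff; the second has E = 0.5 and is dropped.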
To build the database, we query EuropePMC with locus tags, with RefSeq protein
identifiers, and with UniProt
accessions. We obtain the locus tags from RefSeq or from MicrobesOnline. We use
queries of the form "locus_tag AND genus_name" to try to ensure that
the paper is actually discussing that gene. Because EuropePMC indexes
most recent biomedical papers, even if they are not open access, some
of the links may be to papers that you cannot read or that our
computers cannot read. We query each of these identifiers that
appears in the open access part of EuropePMC, as well as every locus
tag that appears in the 500 most-referenced genomes, so that a gene
may appear in the PaperBLAST results even though none of the papers
that mention it are open access. We also incorporate text-mined links
from EuropePMC that link open access articles to UniProt or RefSeq
identifiers. (This yields some additional links because EuropePMC
uses different heuristics for their text mining than we do.)
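The "locus_tag AND genus_name" query form can be sketched as follows. This is a minimal illustration, not PaperBLAST's code: the helper names are hypothetical, the genus is assumed to be the first word of the organism name, and the search URL is an assumption about the Europe PMC REST service that should be checked against its current documentation. No network request is made.

```python
from urllib.parse import urlencode

# Assumed endpoint of the Europe PMC REST search service.
SEARCH_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_query(locus_tag, organism):
    """Build a query of the form 'locus_tag AND genus_name'."""
    genus = organism.split()[0]  # assumption: genus is the first word
    return f"{locus_tag} AND {genus}"


def build_search_url(locus_tag, organism):
    """Return a search URL for the query; the parameters are assumptions."""
    params = {"query": build_query(locus_tag, organism), "format": "json"}
    return f"{SEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_query("VF_0100", "Vibrio fischeri ES114"))
    print(build_search_url("VF_0100", "Vibrio fischeri ES114"))
```

Requiring the genus name alongside the locus tag is what reduces false matches for short or ambiguous identifiers.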
For every article that mentions a locus tag, a RefSeq protein
identifier, or a UniProt accession, we try to select one or two
snippets of text that refer to the protein. If we cannot get access to
the full text, we try to select a snippet from the abstract, but
unfortunately, unique identifiers such as locus tags are rarely
provided in abstracts.
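One simple way to pick "one or two snippets" is to take a fixed window of text around the first occurrences of the identifier. The sketch below does this; it is an assumption about how such a step could work, not a description of PaperBLAST's actual heuristics, and the function name, window width, and ellipsis formatting are all hypothetical.

```python
import re


def select_snippets(text, identifier, width=80, max_snippets=2):
    """Return up to max_snippets windows of text around the identifier.

    The identifier must not be embedded in a longer word, so that a
    locus tag such as VF_0100 does not match inside VF_01001.
    """
    pattern = re.compile(
        r"(?<![A-Za-z0-9_])" + re.escape(identifier) + r"(?![A-Za-z0-9_])"
    )
    snippets = []
    last_end = -1
    for match in pattern.finditer(text):
        if match.start() < last_end:
            continue  # this mention is already inside the previous snippet
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
        snippets.append("..." + text[start:end].strip() + "...")
        last_end = end
        if len(snippets) == max_snippets:
            break
    return snippets


if __name__ == "__main__":
    sample = ("We point to the case of yihY (VF_0100, ortholog of E. coli "
              "locus tag b3886), previously annotated as ribonuclease BN.")
    print(select_snippets(sample, "VF_0100", width=30))
```

If only the abstract is available, the same function can be applied to the abstract text; it returns an empty list when the identifier does not appear.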
PaperBLAST also incorporates manually-curated protein functions:
- Proteins from NCBI's RefSeq are included if a GeneRIF
entry links the gene to an article in PubMed®.
GeneRIF also provides a short summary of the article's claim about the
protein, which is shown instead of a snippet.
- Proteins from Swiss-Prot (the curated part of UniProt)
are included if the curators
identified experimental evidence for the protein's function (evidence
code ECO:0000269). For these proteins, the fields of the Swiss-Prot entry that
describe the protein's function are shown (with bold headings).
- Proteins from BRENDA,
a curated database of enzymes, are included if they are linked to a paper in PubMed
and their full sequence is known.
- Every protein from the non-redundant subset of BioLiP, a database
of ligand-binding sites and catalytic residues in protein structures, is included. Since BioLiP itself
does not include descriptions of the proteins, those are taken from the
Protein Data Bank.
Descriptions from PDB rely on the original submitter of the
structure and cannot be updated by others, so they may be less reliable.
(For SitesBLAST and Sites on a Tree, we use a larger subset of BioLiP so that every
ligand is represented among a group of structures with similar sequences, but for
PaperBLAST, we use the non-redundant set provided by BioLiP.)
- Every protein from EcoCyc, a curated
database of the proteins in Escherichia coli K-12, is included, regardless
of whether it is characterized.
- Proteins from the MetaCyc metabolic pathway database
are included if they are linked to a paper in PubMed and their full sequence is known.
- Proteins from the Transporter Classification Database (TCDB)
are included if they have known substrate(s), have reference(s),
and are not described as uncharacterized or putative.
(Some of the references are not visible on the PaperBLAST web site.)
- Every protein from CharProtDB,
a database of experimentally characterized protein annotations, is included.
- Proteins from the CAZy database of carbohydrate-active enzymes
are included if they are associated with an Enzyme Classification number.
Even though CAZy does not provide links from individual protein sequences to papers,
these should all be experimentally-characterized proteins.
- Proteins from the REBASE database
of restriction enzymes are included if they have known specificity.
- Every protein with an evidence-based reannotation (based on mutant phenotypes)
in the Fitness Browser is included.
- Sequence-specific transcription factors (including sigma factors and DNA-binding response regulators)
with experimentally-determined DNA binding sites from the
PRODORIC database of gene regulation in prokaryotes are included.
- Putative transcription factors from RegPrecise
are included if they have manually-curated predictions for their binding sites. These predictions are based on
conserved putative regulatory sites across genomes that contain similar transcription factors,
so PaperBLAST clusters the TFs at 70% identity and retains just one member of each cluster.
- Coding sequence (CDS) features from the
European Nucleotide Archive (ENA)
are included if the /experiment tag is set (implying that there is experimental evidence for the annotation),
the nucleotide entry links to paper(s) in PubMed,
and the nucleotide entry is from the STD data class
(implying that these are targeted annotated sequences, not from shotgun sequencing).
Also, to filter out genes whose transcription or translation was detected, but whose function
was not studied, nucleotide entries or papers with more than 25 such proteins are excluded.
Descriptions from ENA rely on the original submitter of the
sequence and cannot be updated by others, so they may be less reliable.
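The ENA inclusion rule in the list above combines three criteria with a size filter, which may be easier to follow as code. The sketch below is one possible reading, not PaperBLAST's implementation: the CdsFeature record and its field names are hypothetical, and it assumes that "more than 25 such proteins" is counted over features that already pass the three criteria, and that a feature is kept if at least one of its papers survives the filter.

```python
from collections import Counter
from dataclasses import dataclass

MAX_PROTEINS = 25  # threshold stated in the text


@dataclass(frozen=True)
class CdsFeature:
    """Hypothetical record for one CDS feature; not an ENA data structure."""
    protein_id: str
    entry_id: str          # nucleotide entry that carries the feature
    has_experiment: bool   # True if the /experiment tag is set
    data_class: str        # e.g. "STD"
    pubmed_ids: tuple = ()


def select_ena_features(features, limit=MAX_PROTEINS):
    """Return (protein_id, papers) pairs that pass the inclusion rule."""
    # Step 1: /experiment tag set, STD data class, and linked to PubMed.
    candidates = [
        f for f in features
        if f.has_experiment and f.data_class == "STD" and f.pubmed_ids
    ]
    # Step 2: count candidate proteins per nucleotide entry and per paper.
    per_entry = Counter(f.entry_id for f in candidates)
    per_paper = Counter(p for f in candidates for p in set(f.pubmed_ids))
    # Step 3: exclude entries and papers with more than `limit` proteins.
    kept = []
    for f in candidates:
        if per_entry[f.entry_id] > limit:
            continue
        papers = tuple(p for p in f.pubmed_ids if per_paper[p] <= limit)
        if papers:
            kept.append((f.protein_id, papers))
    return kept
```

Under this reading, an entry that annotates 26 proteins with the /experiment tag contributes none of them, while a small entry that describes one or two proteins is kept.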
Except for GeneRIF and ENA,
the curated entries include a short curated
description of the protein's function.
For entries from BioLiP, the protein's function may not be known beyond binding to the ligand.
Many of these entries also link to articles in PubMed.
For more information see the
PaperBLAST paper (mSystems 2017)
or the code.
You can download PaperBLAST's database here.
Changes to PaperBLAST since the paper was written:
- November 2023: incorporated PRODORIC and RegPrecise. Many PRODORIC entries were not linked to a protein sequence (no UniProt identifier), so we added this information.
- February 2023: BioLiP changed their download format. PaperBLAST now includes their non-redundant subset. SitesBLAST and Sites on a Tree use a larger non-redundant subset that ensures that every ligand is represented within each cluster. This should ensure that every binding site is represented.
- June 2022: incorporated some coding sequences from ENA with the /experiment tag.
- March 2022: incorporated BioLiP.
- April 2020: incorporated TCDB.
- April 2019: EuropePMC now returns table entries in their search results. This has expanded PaperBLAST's database, but most of the new entries are of low relevance, and the resulting snippets are often just lists of locus tags with annotations.
- February 2018: the alignment page reports the conservation of the hit's functional sites (if available from Swiss-Prot or UniProt).
- January 2018: incorporated BRENDA.
- December 2017: incorporated MetaCyc, CharProtDB, CAZy, REBASE, and the reannotations from the Fitness Browser.
- September 2017: EuropePMC no longer returns some table entries in their search results. This has shrunk PaperBLAST's database, but has also reduced the number of low-relevance hits.
Many of these changes are described in Interactive tools for functional annotation of bacterial genomes.
PaperBLAST cannot provide snippets for many of the papers that are
published in non-open-access journals. This limitation applies even if
the paper is marked as "free" on the publisher's web site and is
available in PubMed Central or EuropePMC. If a journal that you publish
in is marked as "secret," please consider publishing elsewhere.
Many important articles are missing from PaperBLAST, either because
the article's full text is not in EuropePMC (as for many older
articles), or because the paper does not mention a protein identifier such as a locus tag, or because of PaperBLAST's heuristics. If you notice an
article that characterizes a protein's function but is missing from
PaperBLAST, please notify the curators at UniProt
or add an entry to GeneRIF.
Entries in either of these databases will eventually be incorporated
into PaperBLAST. Note that to add an entry to UniProt, you will need
to find the UniProt identifier for the protein. If the protein is not
already in UniProt, you can ask them to create an entry. To add an
entry to GeneRIF, you will need an NCBI Gene identifier, but
unfortunately many prokaryotic proteins in RefSeq do not have
corresponding Gene identifiers.
References
PaperBLAST: Text-mining papers for information about homologs.
M. N. Price and A. P. Arkin (2017). mSystems, 10.1128/mSystems.00039-17.
Europe PMC in 2017.
M. Levchenko et al (2017). Nucleic Acids Research, 10.1093/nar/gkx1005.
Gene indexing: characterization and analysis of NLM's GeneRIFs.
J. A. Mitchell et al (2003). AMIA Annu Symp Proc 2003:460-464.
UniProt: the universal protein knowledgebase.
The UniProt Consortium (2016). Nucleic Acids Research, 10.1093/nar/gkw1099.
BRENDA in 2017: new perspectives and new tools in BRENDA.
S. Placzek et al (2017). Nucleic Acids Research, 10.1093/nar/gkw952.
The EcoCyc database: reflecting new knowledge about Escherichia coli K-12.
I. M. Keseler et al (2016). Nucleic Acids Research, 10.1093/nar/gkw1003.
The MetaCyc database of metabolic pathways and enzymes.
R. Caspi et al (2018). Nucleic Acids Research, 10.1093/nar/gkx935.
CharProtDB: a database of experimentally characterized protein annotations.
R. Madupu et al (2012). Nucleic Acids Research, 10.1093/nar/gkr1133.
The carbohydrate-active enzymes database (CAZy) in 2013.
V. Lombard et al (2014). Nucleic Acids Research, 10.1093/nar/gkt1178.
The Transporter Classification Database (TCDB): recent advances.
M. H. Saier, Jr. et al (2016). Nucleic Acids Research, 10.1093/nar/gkv1103.
REBASE - a database for DNA restriction and modification: enzymes, genes and genomes.
R. J. Roberts et al (2015). Nucleic Acids Research, 10.1093/nar/gku1046.
Deep annotation of protein function across diverse bacteria from mutant phenotypes.
M. N. Price et al (2016). bioRxiv, 10.1101/072470.
by Morgan Price,
Arkin group
Lawrence Berkeley National Laboratory