PaperBLAST
PaperBLAST Hits for SwissProt::O32332 PTS system glucitol/sorbitol-specific EIIC component; EIIC-Gut; Glucitol/sorbitol permease IIC component (Clostridium beijerinckii (strain ATCC 51743 / NCIMB 8052) (Clostridium acetobutylicum)) (182 a.a., MDAIVYFAKG...)
>SwissProt::O32332 PTS system glucitol/sorbitol-specific EIIC component; EIIC-Gut; Glucitol/sorbitol permease IIC component (Clostridium beijerinckii (strain ATCC 51743 / NCIMB 8052) (Clostridium acetobutylicum))
MDAIVYFAKGFMYLFEVGGNTFVSWVTGIIPKVLLLLVFMNSIIAFIGQDKVDRFAKFAS
RNVILAYGVLPFLSAFMLGNPMALSMGKFLPERMKPSYYASASYHCHTNSGIFPHINVGE
IFIYLGIANGITTLGLDPTALGLRYLLVGLVMNFFAGWVTDFTTKIVMRQQGIELSNQLK
AN
Found 15 similar proteins in the literature:
PTHC_CLOB8 / O32332 PTS system glucitol/sorbitol-specific EIIC component; EIIC-Gut; Glucitol/sorbitol permease IIC component from Clostridium beijerinckii (strain ATCC 51743 / NCIMB 8052) (Clostridium acetobutylicum) (see paper)
TC 4.A.4.1.2 / O32332 Glucitol/sorbitol permease IIC component, component of The Glucitol Enzyme II complex, IICBC (GutA1A2) IIA (GutB) from Clostridium beijerinckii (strain ATCC 51743 / NCIMB 8052) (see paper)
gutA1 / CAA05513.1 GutA1 from Clostridium beijerinckii (see paper)
Cbei_0336 PTS system, glucitol/sorbitol-specific, IIC subunit from Clostridium beijerinckii NCIMB 8052
100% identity, 100% coverage
SPSF3K_00182 PTS glucitol/sorbitol transporter subunit IIC from Streptococcus parauberis
59% identity, 99% coverage
- Transcriptome analysis unveils survival strategies of Streptococcus parauberis against fish serum
Lee, PloS one 2021 - “...deaminase 4.6 2.3 1.7 G / SPSF3K_02218 Fic family protein 4.3 1.6 0.7 D srlA SPSF3K_00182 Glucitol/sorbitol permease IIC component - -4.1 -3.5 G srlE SPSF3K_00183 Protein-N(pi)-phosphohistidinesugar phosphotransferase - -4.1 -3.6 G srlB SPSF3K_00184 Protein-N(pi)-phosphohistidinesugar phosphotransferase - -3.4 -3.3 G ptsG SPSF3K_00506 Protein-N(pi)-phosphohistidinesugar phosphotransferase -1.0 -3.1...”
SMU_311 PTS glucitol/sorbitol transporter subunit IIC from Streptococcus mutans UA159
59% identity, 98% coverage
- Inhibitory Effect of Adsorption of Streptococcus mutans onto Scallop-Derived Hydroxyapatite
Usuda, International journal of molecular sciences 2023 - “...Among the upregulated genes, 5 of the 6 ( citG2 , glgD , trk , SMU_311, and SMU_1487, but not SMU_1230c) were in a network of 5 genes, with the greatest interaction around glgD ( Figure 3 a). In contrast, only 3 of the 15 downregulated...”
- Cnm of Streptococcus mutans is important for cell surface structure and membrane permeability
Naka, Frontiers in cellular and infection microbiology 2022 - “...SMU_1067c ABC transporter permease 1029512 SMU_1067c 3.304 SMU_803c ABC transporter ATP-binding protein 1029385 SMU_803c 3.297 SMU_311 PTS system sorbitol (glucitol) transporter subunit IIC2 1028201 SMU_311 3.273 Gene name Description NCBI Gene ID Locus Tag Fold-change SMU_1897 ABC transporter ATP-binding protein 1029101 SMU_1897 3.138 SMU_312 PTS system...”
- A five-species transcriptome array for oral mixed-biofilm studies
Redanz, PloS one 2011 - “...down Metabolism, EnvironmentalInformation Processing SMU_2047 ptsG - putative PTS system, glucose-specific IIABC component 2.13 up SMU_311 PTS system, sorbitol (glucitol) phosphotransferase enzyme IIC2 3.41 up SMU_312 PTS system, sorbitol phosphotransferase enzyme IIBC 2.93 up SMU_313 putative PTS system, sorbitol-specific enzyme IIA 3.86 up Genetic information processing...”
lp_3620 sorbitol PTS, EIIC from Lactobacillus plantarum WCFS1
57% identity, 97% coverage
lp_3654 sorbitol PTS, EIIC from Lactobacillus plantarum WCFS1
lp_3654 PTS glucitol/sorbitol transporter subunit IIC from Lactiplantibacillus plantarum WCFS1
58% identity, 99% coverage
- Butanol Tolerance of Lactiplantibacillus plantarum: A Transcriptome Study
Petrov, Genes 2021 - “...both strains, 2.48-fold in 8-1 and 4.35-fold in Ym1, is fructose-specific. Two genes for transporters (lp_3654 and lp_0286, celB ) are uniquely upregulated in Ym1, the first for sorbitol (2.29-fold) and the second for cellobiose (2.55-fold). The sugar uptake in Ym1 under butanol stress is much...”
- “...PTS mannitol transporter subunit IICBA +6.48 3.22 lp_2097, fruA PTS transporter subunit EIIA +4.35 +2.48 lp_3654, pts38C PTS sorbitol transporter subunit IIC +2.29 NC lp_0286, pts6C PTS cellobiose transporter subunit IIC +2.55 NC lp_2531, pts18CBA PTS transporter subunit EIIC NC +3.22 lp_0886, pts11BC PTS transporter subunit...”
SEN2673 PTS system, glucitol/sorbitol-specific IIBC component from Salmonella enterica subsp. enterica serovar Enteritidis str. P125109
53% identity, 96% coverage
- Global transcriptomic analysis of ethanol tolerance response in Salmonella Enteritidis
He, Current research in food science 2022 - “...6.52 PTS system mannose-specific transporter subunit IID SEN1206 manY 5.13 Phosphotransferase enzyme II, C component SEN2673 srlA 2.83 PTS system glucitol/sorbitol-specific transporter subunit IIBC SEN2197 fruA 3.09 Fructose PTS system EIIA component SEN2675 slrB 5.09 PTS system glucitol/sorbitol-specific transporter subunit IIA SEN2674 srlE 4.91 PTS system...”
- “...6.52 PTS system mannose-specific transporter subunit IID SEN2197 fruA 3.09 Fructose PTS system EIIA component SEN2673 srlA 2.83 PTS system glucitol/sorbitol-specific transporter subunit IIBC SEN2675 slrB 5.09 PTS system glucitol/sorbitol-specific transporter subunit IIA SEN2674 srlE 4.91 PTS system glucitol/sorbitol-specific transporter subunit IIBC Bacterial secretion systems SEN1636...”
PM1971 unknown from Pasteurella multocida subsp. multocida str. Pm70
52% identity, 98% coverage
PTHC_ERWAM / O32521 PTS system glucitol/sorbitol-specific EIIC component; EIIC-Gut; Glucitol/sorbitol permease IIC component from Erwinia amylovora (Fire blight bacteria) (see paper)
EAM_RS02625 PTS glucitol/sorbitol transporter subunit IIC from Erwinia amylovora ATCC 49946
52% identity, 98% coverage
- function: The phosphoenolpyruvate-dependent sugar phosphotransferase system (PTS), a major carbohydrate active transport system, catalyzes the phosphorylation of incoming sugar substrates concomitant with their translocation across the cell membrane. The enzyme II complex composed of SrlA, SrlB and SrlE is involved in glucitol/sorbitol transport.
- A complete twelve-gene deletion null mutant reveals that cyclic di-GMP is a global regulator of phase-transition and host colonization in Erwinia amylovora
Kharadi, PLoS pathogens 2022 - “...protein CDS hypothetical protein -4.127207088 0 EAM_RS06765 yccA CDS FtsH protease modulator YccA -4.094370008 0 EAM_RS02625 glucitol/sorbitol permease IIC component CDS glucitol/sorbitol permease IIC component -4.082619297 0 EAM_RS01695 acs CDS acetateCoA ligase -3.99767443 4.442E-206 EAM_RS02150 groL CDS chaperonin GroEL -3.934347101 3.1088E-288 EAM_RS12125 dihydrodipicolinate synthase family protein...”
UGYR_RS07350 PTS glucitol/sorbitol transporter subunit IIC from Yersinia ruckeri
52% identity, 99% coverage
STM14_RS15195 PTS glucitol/sorbitol transporter subunit IIC from Salmonella enterica subsp. enterica serovar Typhimurium str. 14028S
STM2832 PTS family, glucitol/sorbitol-specific enzyme IIC component,one of two IIC components from Salmonella typhimurium LT2
52% identity, 96% coverage
- Salmonella enterica Serovar Typhimurium 14028s Genomic Regions Required for Colonization of Lettuce Leaves
Montano, Frontiers in microbiology 2020 - “...33,516 STM14_RS22490 to STM14_RS22630 Mut2 K_77/78_F03 36188353626190 7,355 STM14_RS18330 to STM14_RS18370 Mut3 C_03_H10 29986483042149 43,501 STM14_RS15195 to STM14_RS15425 Mut4 C_01_H4 24510612455149 4,088 STM14_RS12670 to STM14_RS12690 Mut5 C_01_G2 20160002046442 30,442 STM14_RS10460 to STM14_RS10615 Mut6 C_01_F12 19480411981245 33,204 STM14_RS10090 to STM14_RS10285 Mut7 C_01_E9 15727541583690 10,936 STM14_RS08285 to STM14_RS08335...”
- Genetic Determinants of Salmonella enterica Serovar Typhimurium Proliferation in the Cytosol of Epithelial Cells
Wrande, Infection and immunity 2016 - “...(STM2832-STM2877 [this mutant contains a deletion of genes STM2832 to STM2877] and STM4565- STM4579), so their phenotypes could not be confirmed in this cell...”
CBG46_03170 PTS glucitol/sorbitol transporter subunit IIC from Actinobacillus succinogenes
51% identity, 98% coverage
c3256 PTS system, glucitol/sorbitol-specific IIC2 component from Escherichia coli CFT073
UTI89_C3064 PTS system, glucitol/sorbitol-specific IIC2 component from Escherichia coli UTI89
51% identity, 96% coverage
YE1098 pts system, glucitol/sorbitol-specific iic2 component from Yersinia enterocolitica subsp. enterocolitica 8081
51% identity, 99% coverage
- Comparison of Yersinia enterocolitica DNA Methylation at Ambient and Host Temperatures
Van, Epigenomes 2023 - “...a hypothetical protein with a Dam site 74 bp 5 from the start codon; and YE1098 with the Dam site 51 bp 5 from the start codon. This gene encodes GutA, also referred to as SrlA, a glucitol/sorbitol-specific IIC2 component, a subunit of the phoshotransferase system...”
- “...et al. [ 41 ], however, reveals little variation in the temperature expression of YE1259. YE1098, coding for GutA, also showed the same pattern of methylation as YE1259 (). Van der Woude et al. [ 52 ] reported that the Dam site 44 bp from the...”
SrlA / b2702 sorbitol-specific PTS enzyme IIC2 component (EC 2.7.1.198; EC 2.7.1.197) from Escherichia coli K-12 substr. MG1655 (see 3 papers)
SrlA / P56579 sorbitol-specific PTS enzyme IIC2 component (EC 2.7.1.198; EC 2.7.1.197) from Escherichia coli (strain K12) (see 3 papers)
PTHC_ECOLI / P56579 PTS system glucitol/sorbitol-specific EIIC component; EIIC-Gut; Glucitol/sorbitol permease IIC component from Escherichia coli (strain K12) (see 2 papers)
TC 4.A.4.1.1 / P56579 PTHC aka SRLA aka GUTA aka SBL aka B2702, component of Glucitol porter from Escherichia coli (see 7 papers)
b2702 glucitol/sorbitol-specific enzyme IIC component of PTS from Escherichia coli str. K-12 substr. MG1655
51% identity, 96% coverage
- function: The phosphoenolpyruvate-dependent sugar phosphotransferase system (PTS), a major carbohydrate active transport system, catalyzes the phosphorylation of incoming sugar substrates concomitant with their translocation across the cell membrane. The enzyme II complex composed of SrlA, SrlB and SrlE is involved in glucitol/sorbitol transport. It can also use D-mannitol.
- substrates: glucitol
- The two-component system histidine kinase EnvZ contributes to Avian pathogenic Escherichia coli pathogenicity by regulating biofilm formation and stress responses
Fu, Poultry science 2023 - “...Putative PTS multi-phosphoryl transfer protein PtsA 2.20 b2704 srlB PTS system, glucitol/sorbitol-specific IIA component 2.00 b2702 srlA PTS system, glucitol/sorbitol-specific IIC2 component 1.19 b2167 fruA PTS system, fructose-specific IIBC component 1.93 b1737 celB PTS system, cellobiose-specific IIC component 1.59 b3599 mtlA Fused mannitol-specific PTS enzymes: IIA...”
- Human body temperature (37°C) increases the expression of iron, carbohydrate, and amino acid utilization genes in Escherichia coli K-12
White-Ziegler, Journal of bacteriology 2007 - “...b2013 b2579 database b1172 b2191 b0162 b2423 b4037 b2972 b2702 b2703 b0556 b0557 b1495 b1919 b2670 b3217 b3964 b4326 b0458 b1147 Change (n-fold) at 37/23Cc...”
- DNA microarray analyses of the long-term adaptive response of Escherichia coli to acetate and propionate
Polen, Applied and environmental microbiology 2003 - “...protein Putative transport protein 1.65* 0.97 2.33* 1.54* 1.20 0.87 b2702 b2703 b2704 b2705 b2706 b2707 b2708 srlA1 srlA2 srlB srlD gutM srlR gutQ 2 2 2 2 2...”
- “...0.51* 0.60* b0929 ompF 1 Outer membrane protein 1a (la;b;F) 2.18* b2702 b2703 b2704 b2705 b2706 b2707 b2708 srlA1 srlA2 srlB srlD gutM srlR gutQ 2 2 2 2 2 2 2...”
- Third International Workshop on Reactive Arthritis. 23-26 September 1995, Berlin, Germany. Report and abstracts
Kingsley, Annals of the rheumatic diseases 1996
lmo0544 similar to PTS system, glucitol/sorbitol-specific enzyme II CII component from Listeria monocytogenes EGD-e
45% identity, 95% coverage
- Transcriptomic analysis of Listeria monocytogenes biofilm formation at different times
Gou, Canadian journal of veterinary research = Revue canadienne de recherche veterinaire 2023 - “...the quorum sensing, and the 2-component system. The top 5 upregulated DEGs were lmo0024, lmo0374, lmo0544, hly, and lmo2434. The top 5 downregulated DEGs were lmo2192, lmo1211, cheY, lmo0689, and secY. After real-time quantitative polymerase chain reaction, the expression of these 10 DEGs were consistent with...”
- DegU-mediated suppression of carbohydrate uptake in Listeria monocytogenes increases adaptation to oxidative stress
Chen, Applied and environmental microbiology 2023 (secret)
- Listeria monocytogenes GshF contributes to oxidative stress tolerance via regulation of the phosphoenolpyruvate-carbohydrate phosphotransferase system
Chen, Microbiology spectrum 2023 - “...lmo0542 PTS sorbitol transporter subunit IIA lmo0543 3.25 Yes/down lmo0543 PTS sorbitol transporter subunit IIBC lmo0544 4.74 Yes/down lmo0544 PTS sorbitol transporter subunit IIC lmo0631 6.83 Yes/down lmo0631 PTS fructose transporter subunit IIA lmo0632 3.13 Yes/down lmo0632 PTS fructose transporter subunit IIC lmo0633 lmo0633 PTS fructose...”
- New Insights into the Lactic Acid Resistance Determinants of Listeria monocytogenes Based on Transposon Sequencing and Transcriptome Sequencing Analyses
Liu, Microbiology spectrum 2023 - “...sorbitol transporter subunit IIA 0.1163 2.30E-02 lmo0543 lmo0543 PTS sorbitol transporter subunit IIBC 0.1233 5.02E-05 lmo0544 lmo0544 PTS sorbitol transporter subunit IIC 0.0485 9.23E-05 lmo0738 lmo0738 PTS beta-glucoside transporter subunit IIABC 0.0009 0.00E+00 lmo0781 lmo0781 PTS mannose transporter subunit IID 0.4964 3.58E-05 lmo0874 lmo0874 PTS sugar...”
- A Machine Learning Model for Food Source Attribution of Listeria monocytogenes
Tanui, Pathogens (Basel, Switzerland) 2022 - “...0.7952 0.7781 0.6482 0.6923 lmo0625 lmo0625 Putative lipase/acylhydrolase 0.6548 0.6242 0.6813 0.7945 0.743 0.6242 0.6548 lmo0544 srlA PTS sorbitol transporter subunit IIC 0.7125 0.6483 0.7073 0.7928 0.7713 0.6483 0.7125 lmo2728 mlrA Transcriptional regulator, MerR family protein 0.62 0.6322 0.6294 0.7909 0.6994 0.6041 0.6322 lmo2348 lmo2348 Amino...”
- Blue Light Sensing in Listeria monocytogenes Is Temperature-Dependent and the Transcriptional Response to It Is Predominantly SigB-Dependent
Dorey, Frontiers in microbiology 2019 - “...In contrast to the wild-type, the sigB mutant significantly increased the transcription of rli78 and lmo0544 and significantly decreased the transcription of lmo0481 and lmo2818. These genes were distributed across several functional categories ( Table 3 ), with three being identified as transporters ( lmo0544 ,...”
- “...visible light, in a sigB mutant. Gene name Log 2 fold change Functional category RAST_product lmo0544 2.39 Transport/binding proteins and lipoproteins PTS system, glucitol/sorbitol-specific IIC component RatA-1 (rli78) 1.03 sRNA Unknown lmo2346 1.00 From other organisms ThiJ/PfpI family protein lmo2343 1.04 Detoxification Coenzyme F420-dependent N5,N10-methylene tetrahydromethanopterin...”
For advice on how to use these tools together, see
Interactive tools for functional annotation of bacterial genomes.
The PaperBLAST database links 793,807 different protein sequences to 1,259,118 scientific articles. Searches against EuropePMC were last performed on March 13 2025.
PaperBLAST builds a database of protein sequences that are linked
to scientific articles. These links come from automated text searches
against the articles in EuropePMC
and from manually-curated information from GeneRIF, UniProtKB/Swiss-Prot,
BRENDA,
CAZy (as made available by dbCAN),
BioLiP,
CharProtDB,
MetaCyc,
EcoCyc,
TCDB,
REBASE,
the Fitness Browser,
and a subset of the European Nucleotide Archive with the /experiment tag.
Given this database and a protein sequence query,
PaperBLAST uses protein-protein BLAST
to find similar sequences with E < 0.001.
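As a minimal sketch of the E-value cutoff described above, the following filters BLAST hits. It assumes tab-separated output in BLAST+'s 12-column tabular layout (`-outfmt 6`). The function name `filter_hits` and the hit names `hitA` and `hitB` are hypothetical, and this is not PaperBLAST's own code.

```python
# Hypothetical helper: keep only BLAST hits below an E-value cutoff.
# Assumed input: BLAST+ tabular output (-outfmt 6) with the default columns
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
E_VALUE_CUTOFF = 0.001  # threshold stated in the text


def filter_hits(lines, cutoff=E_VALUE_CUTOFF):
    """Return a list of dicts for hits whose E-value is below the cutoff."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        evalue = float(fields[10])  # column 11 holds the E-value
        if evalue < cutoff:
            hits.append({
                "subject": fields[1],
                "identity": float(fields[2]),
                "evalue": evalue,
            })
    return hits


# Made-up example rows, for illustration only.
example = [
    "query\thitA\t59.0\t180\t70\t2\t1\t180\t1\t179\t1e-60\t190",
    "query\thitB\t25.0\t60\t40\t3\t5\t64\t10\t69\t0.5\t28.1",
]
kept = filter_hits(example)
print([h["subject"] for h in kept])
```

Only the first made-up row passes, because its E-value is below 0.001.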
To build the database, we query EuropePMC with locus tags, with RefSeq protein
identifiers, and with UniProt
accessions. We obtain the locus tags from RefSeq or from MicrobesOnline. We use
queries of the form "locus_tag AND genus_name" to try to ensure that
the paper is actually discussing that gene. Because EuropePMC indexes
most recent biomedical papers, even if they are not open access, some
of the links may be to papers that you cannot read or that our
computers cannot read. We query each of these identifiers that
appears in the open access part of EuropePMC, as well as every locus
tag that appears in the 500 most-referenced genomes, so that a gene
may appear in the PaperBLAST results even though none of the papers
that mention it are open access. We also incorporate text-mined links
from EuropePMC that link open access articles to UniProt or RefSeq
identifiers. (This yields some additional links because EuropePMC
uses different heuristics for their text mining than we do.)
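The query form described above can be sketched as follows. The endpoint is Europe PMC's public REST search service. Whether PaperBLAST calls this exact endpoint or uses these parameters is an assumption, and the helper names `build_query` and `build_search_url` are hypothetical.

```python
from urllib.parse import urlencode

# Europe PMC's public REST search endpoint. That PaperBLAST uses this
# exact endpoint is an assumption made for illustration.
EUROPEPMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_query(locus_tag, genus_name):
    """Build a query of the form 'locus_tag AND genus_name'."""
    return f"{locus_tag} AND {genus_name}"


def build_search_url(locus_tag, genus_name):
    """Return a search URL for the query; no network request is made here."""
    params = {"query": build_query(locus_tag, genus_name), "format": "json"}
    return EUROPEPMC_SEARCH + "?" + urlencode(params)


print(build_search_url("lmo0544", "Listeria"))
```

Requiring the genus name alongside the locus tag narrows the search to papers that are likely to discuss that organism's gene, rather than an unrelated use of the same string.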
For every article that mentions a locus tag, a RefSeq protein
identifier, or a UniProt accession, we try to select one or two
snippets of text that refer to the protein. If we cannot get access to
the full text, we try to select a snippet from the abstract, but
unfortunately, unique identifiers such as locus tags are rarely
provided in abstracts.
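One simple way to pick such snippets is to take a fixed window of words around each mention of the identifier. The sketch below illustrates that idea only. The window size, the punctuation handling, and the name `select_snippets` are assumptions, and PaperBLAST's actual heuristics may differ.

```python
def select_snippets(text, identifier, window=10, max_snippets=2):
    """Return up to max_snippets word windows centered on mentions of identifier."""
    words = text.split()
    snippets = []
    last_end = -1  # index of the last word already covered by a snippet
    for i, word in enumerate(words):
        # Strip surrounding punctuation before comparing with the identifier.
        if word.strip(".,;:()[]") != identifier:
            continue
        if i <= last_end:
            continue  # this mention already falls inside the previous snippet
        start = max(0, i - window)
        end = min(len(words), i + window + 1)
        snippets.append("..." + " ".join(words[start:end]) + "...")
        last_end = end - 1
        if len(snippets) == max_snippets:
            break
    return snippets


sentence = ("The top 5 upregulated DEGs were lmo0024, lmo0374, "
            "lmo0544, hly, and lmo2434.")
print(select_snippets(sentence, "lmo0544", window=2))
```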
PaperBLAST also incorporates manually-curated protein functions:
- Proteins from NCBI's RefSeq are included if a
GeneRIF
entry links the gene to an article in
PubMed®.
GeneRIF also provides a short summary of the article's claim about the
protein, which is shown instead of a snippet.
- Proteins from Swiss-Prot (the curated part of UniProt)
are included if the curators
identified experimental evidence for the protein's function (evidence
code ECO:0000269). For these proteins, the fields of the Swiss-Prot entry that
describe the protein's function are shown (with bold headings).
- Proteins from BRENDA,
a curated database of enzymes, are included if they are linked to a paper in PubMed
and their full sequence is known.
- Every protein from the non-redundant subset of
BioLiP,
a database
of ligand-binding sites and catalytic residues in protein structures, is included. Since BioLiP itself
does not include descriptions of the proteins, those are taken from the
Protein Data Bank.
Descriptions from PDB rely on the original submitter of the
structure and cannot be updated by others, so they may be less reliable.
(For SitesBLAST and Sites on a Tree, we use a larger subset of BioLiP so that every
ligand is represented among a group of structures with similar sequences, but for
PaperBLAST, we use the non-redundant set provided by BioLiP.)
- Every protein from EcoCyc, a curated
database of the proteins in Escherichia coli K-12, is included, regardless
of whether they are characterized or not.
- Proteins from the MetaCyc metabolic pathway database
are included if they are linked to a paper in PubMed and their full sequence is known.
- Proteins from the Transport Classification Database (TCDB)
are included if they have known substrate(s), have reference(s),
and are not described as uncharacterized or putative.
(Some of the references are not visible on the PaperBLAST web site.)
- Every protein from CharProtDB,
a database of experimentally characterized protein annotations, is included.
- Proteins from the CAZy database of carbohydrate-active enzymes
are included if they are associated with an Enzyme Classification number.
Even though CAZy does not provide links from individual protein sequences to papers,
these should all be experimentally-characterized proteins.
- Proteins from the REBASE database
of restriction enzymes are included if they have known specificity.
- Every protein with an evidence-based reannotation (based on mutant phenotypes)
in the Fitness Browser is included.
- Sequence-specific transcription factors (including sigma factors and DNA-binding response regulators)
with experimentally-determined DNA binding sites from the
PRODORIC database of gene regulation in prokaryotes are included.
- Putative transcription factors from RegPrecise
that have manually-curated predictions for their binding sites are included. These predictions are based on
conserved putative regulatory sites across genomes that contain similar transcription factors,
so PaperBLAST clusters the TFs at 70% identity and retains just one member of each cluster.
- Coding sequence (CDS) features from the
European Nucleotide Archive (ENA)
are included if the /experiment tag is set (implying that there is experimental evidence for the annotation),
the nucleotide entry links to paper(s) in PubMed,
and the nucleotide entry is from the STD data class
(implying that these are targeted annotated sequences, not from shotgun sequencing).
Also, to filter out genes whose transcription or translation was detected, but whose function
was not studied, nucleotide entries or papers with more than 25 such proteins are excluded.
Descriptions from ENA rely on the original submitter of the
sequence and cannot be updated by others, so they may be less reliable.
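The 25-protein cutoff for ENA entries and papers can be sketched as below. The record layout (protein identifier, nucleotide entry, list of PubMed identifiers) and the name `filter_ena_proteins` are hypothetical. The sketch also reads the rule one particular way: proteins from crowded nucleotide entries are dropped, links to crowded papers are dropped, and a protein left with no paper is dropped.

```python
from collections import Counter

MAX_PROTEINS = 25  # cutoff stated in the text


def filter_ena_proteins(records, max_proteins=MAX_PROTEINS):
    """Filter (protein_id, nucleotide_entry, pubmed_ids) tuples (hypothetical layout)."""
    # Count how many proteins each nucleotide entry and each paper is linked to.
    per_entry = Counter(entry for _, entry, _ in records)
    per_paper = Counter(pmid for _, _, pmids in records for pmid in pmids)
    kept = []
    for protein_id, entry, pmids in records:
        if per_entry[entry] > max_proteins:
            continue  # nucleotide entry annotates too many proteins
        # Keep only links to papers that are under the cutoff themselves.
        ok_pmids = [p for p in pmids if per_paper[p] <= max_proteins]
        if ok_pmids:
            kept.append((protein_id, entry, ok_pmids))
    return kept


# Made-up data: one entry with 26 proteins sharing a paper, plus two small entries.
records = [(f"prot{i}", "E1", ["p1"]) for i in range(26)]
records.append(("x", "E2", ["p2"]))
records.append(("y", "E3", ["p1", "p3"]))
print(filter_ena_proteins(records))
```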
Except for GeneRIF and ENA,
the curated entries include a short curated
description of the protein's function.
For entries from BioLiP, the protein's function may not be known beyond binding to the ligand.
Many of these entries also link to articles in PubMed.
For more information see the
PaperBLAST paper (mSystems 2017)
or the code.
You can download PaperBLAST's database here.
Changes to PaperBLAST since the paper was written:
- November 2023: incorporated PRODORIC and RegPrecise. Many PRODORIC entries were not linked to a protein sequence (no UniProt identifier), so we added this information.
- February 2023: BioLiP changed their download format. PaperBLAST now includes their non-redundant subset. SitesBLAST and Sites on a Tree use a larger non-redundant subset that ensures that every ligand is represented within each cluster. This should ensure that every binding site is represented.
- June 2022: incorporated some coding sequences from ENA with the /experiment tag.
- March 2022: incorporated BioLiP.
- April 2020: incorporated TCDB.
- April 2019: EuropePMC now returns table entries in their search results. This has expanded PaperBLAST's database, but most of the new entries are of low relevance, and the resulting snippets are often just lists of locus tags with annotations.
- February 2018: the alignment page reports the conservation of the hit's functional sites (if available from Swiss-Prot or UniProt).
- January 2018: incorporated BRENDA.
- December 2017: incorporated MetaCyc, CharProtDB, CAZy, REBASE, and the reannotations from the Fitness Browser.
- September 2017: EuropePMC no longer returns some table entries in their search results. This has shrunk PaperBLAST's database, but has also reduced the number of low-relevance hits.
Many of these changes are described in Interactive tools for functional annotation of bacterial genomes.
PaperBLAST cannot provide snippets for many of the papers that are
published in non-open-access journals. This limitation applies even if
the paper is marked as "free" on the publisher's web site and is
available in PubMed Central or EuropePMC. If a journal that you publish
in is marked as "secret," please consider publishing elsewhere.
Many important articles are missing from PaperBLAST, either because
the article's full text is not in EuropePMC (as for many older
articles), or because the paper does not mention a protein identifier such as a locus tag, or because of PaperBLAST's heuristics. If you notice an
article that characterizes a protein's function but is missing from
PaperBLAST, please notify the curators at UniProt
or add an entry to GeneRIF.
Entries in either of these databases will eventually be incorporated
into PaperBLAST. Note that to add an entry to UniProt, you will need
to find the UniProt identifier for the protein. If the protein is not
already in UniProt, you can ask them to create an entry. To add an
entry to GeneRIF, you will need an NCBI Gene identifier, but
unfortunately many prokaryotic proteins in RefSeq do not have
corresponding Gene identifiers.
References
PaperBLAST: Text-mining papers for information about homologs.
M. N. Price and A. P. Arkin (2017). mSystems, 10.1128/mSystems.00039-17.
Europe PMC in 2017.
M. Levchenko et al (2017). Nucleic Acids Research, 10.1093/nar/gkx1005.
Gene indexing: characterization and analysis of NLM's GeneRIFs.
J. A. Mitchell et al (2003). AMIA Annu Symp Proc 2003:460-464.
UniProt: the universal protein knowledgebase.
The UniProt Consortium (2016). Nucleic Acids Research, 10.1093/nar/gkw1099.
BRENDA in 2017: new perspectives and new tools in BRENDA.
S. Placzek et al (2017). Nucleic Acids Research, 10.1093/nar/gkw952.
The EcoCyc database: reflecting new knowledge about Escherichia coli K-12.
I. M. Keseler et al (2016). Nucleic Acids Research, 10.1093/nar/gkw1003.
The MetaCyc database of metabolic pathways and enzymes.
R. Caspi et al (2018). Nucleic Acids Research, 10.1093/nar/gkx935.
CharProtDB: a database of experimentally characterized protein annotations.
R. Madupu et al (2012). Nucleic Acids Research, 10.1093/nar/gkr1133.
The carbohydrate-active enzymes database (CAZy) in 2013.
V. Lombard et al (2014). Nucleic Acids Research, 10.1093/nar/gkt1178.
The Transporter Classification Database (TCDB): recent advances.
M. H. Saier, Jr. et al (2016). Nucleic Acids Research, 10.1093/nar/gkv1103.
REBASE - a database for DNA restriction and modification: enzymes, genes and genomes.
R. J. Roberts et al (2015). Nucleic Acids Research, 10.1093/nar/gku1046.
Deep annotation of protein function across diverse bacteria from mutant phenotypes.
M. N. Price et al (2016). bioRxiv, 10.1101/072470.
by Morgan Price,
Arkin group
Lawrence Berkeley National Laboratory