PaperBLAST

PaperBLAST – Find papers about a protein or its homologs

PaperBLAST Hits for SwissProt::O32332 PTS system glucitol/sorbitol-specific EIIC component; EIIC-Gut; Glucitol/sorbitol permease IIC component (Clostridium beijerinckii (strain ATCC 51743 / NCIMB 8052) (Clostridium acetobutylicum)) (182 a.a., MDAIVYFAKG...)

Other sequence analysis tools:

Find functional residues: SitesBLAST

Search for conserved domains

Find the best match in UniProt

Compare to protein structures

Predict transmembrane helices: Phobius

Predict protein localization: PSORTb

Find homologs in fast.genomics

Found 15 similar proteins in the literature:

PTHC_CLOB8 / O32332 PTS system glucitol/sorbitol-specific EIIC component; EIIC-Gut; Glucitol/sorbitol permease IIC component from Clostridium beijerinckii (strain ATCC 51743 / NCIMB 8052) (Clostridium acetobutylicum) (see paper)
TC 4.A.4.1.2 / O32332 Glucitol/sorbitol permease IIC component, component of The Glucitol Enzyme II complex, IICBC (GutA1A2) IIA (GutB) from Clostridium beijerinckii (strain ATCC 51743 / NCIMB 8052) (see paper)
gutA1 / CAA05513.1 GutA1 from Clostridium beijerinckii (see paper)
Cbei_0336 PTS system, glucitol/sorbitol-specific, IIC subunit from Clostridium beijerinckii NCIMB 8052
100% identity, 100% coverage

function: The phosphoenolpyruvate-dependent sugar phosphotransferase system (PTS), a major carbohydrate active transport system, catalyzes the phosphorylation of incoming sugar substrates concomitant with their translocation across the cell membrane. The enzyme II complex composed of SrlA, SrlB and SrlE is involved in glucitol/sorbitol transport.
substrates: glucitol
Transcriptomic characterization of recombinant Clostridium beijerinckii NCIMB 8052 expressing methylglyoxal synthase and glyoxal reductase from Clostridium pasteurianum ATCC 6013
Kumar, Applied and environmental microbiology 2024
- “...transporter subunit IIC 2.34 1.01E 32 Cbei_2534 feoA ABR34690.1 FeoA family protein 1.95 6.46E 09 Cbei_0336 srlA ABR32524.1 PTS system glucitol/sorbitol-specific transporter subunit IIC 1.67 8.70E 33 Cbei_2535 feoB ABR34691.1 Ferrous iron transport protein B 1.66 2.47E 84 Cbei_0337 srlE ABR32525.1 PTS system glucitol/sorbitol-specific transporter subunit...”
Transcriptional analysis of Clostridium beijerinckii NCIMB 8052 to elucidate role of furfural stress during acetone butanol ethanol fermentation
Zhang, Biotechnology for biofuels 2013
- “...(Cbei_4558), sorbose subfamily transporter subunit IIB (Cbei_4559), mannose-6-phosphate isomerase (Cbei_0996), and glucitol/sorbitol-specific transporter subunit IIC (Cbei_0336) (Figure 1 B). Besides the listed genes, N-acetylglucosamine-specific IIBC subunit (Cbei_4532) and glucose subfamily transporter subunit IIA (Cbei_4533) are also involved in amino sugar and nucleotide sugar metabolism (cbe00520) (Additional...”

SPSF3K_00182 PTS glucitol/sorbitol transporter subunit IIC from Streptococcus parauberis
59% identity, 99% coverage

Transcriptome analysis unveils survival strategies of Streptococcus parauberis against fish serum
Lee, PloS one 2021
- “...deaminase 4.6 2.3 1.7 G / SPSF3K_02218 Fic family protein 4.3 1.6 0.7 D srlA SPSF3K_00182 Glucitol/sorbitol permease IIC component - -4.1 -3.5 G srlE SPSF3K_00183 Protein-N(pi)-phosphohistidinesugar phosphotransferase - -4.1 -3.6 G srlB SPSF3K_00184 Protein-N(pi)-phosphohistidinesugar phosphotransferase - -3.4 -3.3 G ptsG SPSF3K_00506 Protein-N(pi)-phosphohistidinesugar phosphotransferase -1.0 -3.1...”

SMU_311 PTS glucitol/sorbitol transporter subunit IIC from Streptococcus mutans UA159
59% identity, 98% coverage

Inhibitory Effect of Adsorption of Streptococcus mutans onto Scallop-Derived Hydroxyapatite
Usuda, International journal of molecular sciences 2023
- “...Among the upregulated genes, 5 of the 6 ( citG2 , glgD , trk , SMU_311, and SMU_1487, but not SMU_1230c) were in a network of 5 genes, with the greatest interaction around glgD ( Figure 3 a). In contrast, only 3 of the 15 downregulated...”
Cnm of Streptococcus mutans is important for cell surface structure and membrane permeability
Naka, Frontiers in cellular and infection microbiology 2022
- “...SMU_1067c ABC transporter permease 1029512 SMU_1067c 3.304 SMU_803c ABC transporter ATP-binding protein 1029385 SMU_803c 3.297 SMU_311 PTS system sorbitol (glucitol) transporter subunit IIC2 1028201 SMU_311 3.273 Gene name Description NCBI Gene ID Locus Tag Fold-change SMU_1897 ABC transporter ATP-binding protein 1029101 SMU_1897 3.138 SMU_312 PTS system...”
A five-species transcriptome array for oral mixed-biofilm studies
Redanz, PloS one 2011
- “...down Metabolism, EnvironmentalInformation Processing SMU_2047 ptsG - putative PTS system, glucose-specific IIABC component 2.13 up SMU_311 PTS system, sorbitol (glucitol) phosphotransferase enzyme IIC2 3.41 up SMU_312 PTS system, sorbitol phosphotransferase enzyme IIBC 2.93 up SMU_313 putative PTS system, sorbitol-specific enzyme IIA 3.86 up Genetic information processing...”

lp_3620 sorbitol PTS, EIIC from Lactobacillus plantarum WCFS1
57% identity, 97% coverage

Transcriptome signatures of class I and III stress response deregulation in Lactobacillus plantarum reveal pleiotropic adaptation
Van, Microbial cell factories 2013
- “...1.33 1.26 lp_3619 pts37BC Sorbitol PTS, EIIBC 1.710 -6 2.15 1.31 2.50 2.68 1.40 1.23 lp_3620 pts37C Sorbitol PTS, EIIC 1.710 -6 1.00 1.33 1.10 1.88 1.19 1.46 lp_3621 srlM1 Sorbitol operon activator 1.710 -6 1.39 1.17 2.13 2.22 1.08 1.40 lp_3622 srlR1 Sorbitol operon transcription...”

lp_3654 sorbitol PTS, EIIC from Lactobacillus plantarum WCFS1
lp_3654 PTS glucitol/sorbitol transporter subunit IIC from Lactiplantibacillus plantarum WCFS1
58% identity, 99% coverage

Butanol Tolerance of Lactiplantibacillus plantarum: A Transcriptome Study
Petrov, Genes 2021
- “...both strains, 2.48-fold in 8-1 and 4.35-fold in Ym1, is fructose-specific. Two genes for transporters (lp_3654 and lp_0286, celB ) are uniquely upregulated in Ym1, the first for sorbitol (2.29-fold) and the second for cellobiose (2.55-fold). The sugar uptake in Ym1 under butanol stress is much...”
- “...PTS mannitol transporter subunit IICBA +6.48 3.22 lp_2097, fruA PTS transporter subunit EIIA +4.35 +2.48 lp_3654, pts38C PTS sorbitol transporter subunit IIC +2.29 NC lp_0286, pts6C PTS cellobiose transporter subunit IIC +2.55 NC lp_2531, pts18CBA PTS transporter subunit EIIC NC +3.22 lp_0886, pts11BC PTS transporter subunit...”

SEN2673 PTS system, glucitol/sorbitol-specific IIBC component from Salmonella enterica subsp. enterica serovar Enteritidis str. P125109
53% identity, 96% coverage

Global transcriptomic analysis of ethanol tolerance response in Salmonella Enteritidis
He, Current research in food science 2022
- “...6.52 PTS system mannose-specific transporter subunit IID SEN1206 manY 5.13 Phosphotransferase enzyme II, C component SEN2673 srlA 2.83 PTS system glucitol/sorbitol-specific transporter subunit IIBC SEN2197 fruA 3.09 Fructose PTS system EIIA component SEN2675 slrB 5.09 PTS system glucitol/sorbitol-specific transporter subunit IIA SEN2674 srlE 4.91 PTS system...”
- “...6.52 PTS system mannose-specific transporter subunit IID SEN2197 fruA 3.09 Fructose PTS system EIIA component SEN2673 srlA 2.83 PTS system glucitol/sorbitol-specific transporter subunit IIBC SEN2675 slrB 5.09 PTS system glucitol/sorbitol-specific transporter subunit IIA SEN2674 srlE 4.91 PTS system glucitol/sorbitol-specific transporter subunit IIBC Bacterial secretion systems SEN1636...”

PM1971 unknown from Pasteurella multocida subsp. multocida str. Pm70
52% identity, 98% coverage

Transcriptional response of Pasteurella multocida to defined iron sources
Paustian, Journal of bacteriology 2002
- “...PM0435 and PM0436 (phosphate), PM0914 (peptide), and PM1971 and PM1972 (carbohydrate and an unknown product, respectively). Genes expressed at lower levels...”

PTHC_ERWAM / O32521 PTS system glucitol/sorbitol-specific EIIC component; EIIC-Gut; Glucitol/sorbitol permease IIC component from Erwinia amylovora (Fire blight bacteria) (see paper)
EAM_RS02625 PTS glucitol/sorbitol transporter subunit IIC from Erwinia amylovora ATCC 49946
52% identity, 98% coverage

function: The phosphoenolpyruvate-dependent sugar phosphotransferase system (PTS), a major carbohydrate active transport system, catalyzes the phosphorylation of incoming sugar substrates concomitant with their translocation across the cell membrane. The enzyme II complex composed of SrlA, SrlB and SrlE is involved in glucitol/sorbitol transport.
A complete twelve-gene deletion null mutant reveals that cyclic di-GMP is a global regulator of phase-transition and host colonization in Erwinia amylovora
Kharadi, PLoS pathogens 2022
- “...protein CDS hypothetical protein -4.127207088 0 EAM_RS06765 yccA CDS FtsH protease modulator YccA -4.094370008 0 EAM_RS02625 glucitol/sorbitol permease IIC component CDS glucitol/sorbitol permease IIC component -4.082619297 0 EAM_RS01695 acs CDS acetateCoA ligase -3.99767443 4.442E-206 EAM_RS02150 groL CDS chaperonin GroEL -3.934347101 3.1088E-288 EAM_RS12125 dihydrodipicolinate synthase family protein...”

UGYR_RS07350 PTS glucitol/sorbitol transporter subunit IIC from Yersinia ruckeri
52% identity, 99% coverage

Comparative genome analysis reveals important genetic differences among serotype O1 and serotype O2 strains of Y. ruckeri and provides insights into host adaptation and virulence
Cascales, MicrobiologyOpen 2017
- “...NJ56_RS11145 UGYR_RS07355 PTS glucitol/sorbitol transporter subunit IIA NJ56_RS11150 UGYR_RS07360 PTS glucitol/sorbitol transporter subunit IIC NJ56_RS11140 UGYR_RS07350 Other proteins Transcriptional regulator LysR family NJ56_RS10085 UGYR_RS06300 Sadenosylhomocysteine hydrolase NJ56_RS02625 UGYR_RS12500 rimosomal protein NJ56_RS14055 UGYR_RS02835 Outer memb component of tripartite multidrug resistance system NJ56_RS10260 UGYR_RS06475 Chromosome partitioning protein ParA...”

STM14_RS15195 PTS glucitol/sorbitol transporter subunit IIC from Salmonella enterica subsp. enterica serovar Typhimurium str. 14028S
STM2832 PTS family, glucitol/sorbitol-specific enzyme IIC component,one of two IIC components from Salmonella typhimurium LT2
52% identity, 96% coverage

Salmonella enterica Serovar Typhimurium 14028s Genomic Regions Required for Colonization of Lettuce Leaves
Montano, Frontiers in microbiology 2020
- “...33,516 STM14_RS22490 to STM14_RS22630 Mut2 K_77/78_F03 36188353626190 7,355 STM14_RS18330 to STM14_RS18370 Mut3 C_03_H10 29986483042149 43,501 STM14_RS15195 to STM14_RS15425 Mut4 C_01_H4 24510612455149 4,088 STM14_RS12670 to STM14_RS12690 Mut5 C_01_G2 20160002046442 30,442 STM14_RS10460 to STM14_RS10615 Mut6 C_01_F12 19480411981245 33,204 STM14_RS10090 to STM14_RS10285 Mut7 C_01_E9 15727541583690 10,936 STM14_RS08285 to STM14_RS08335...”
Genetic Determinants of Salmonella enterica Serovar Typhimurium Proliferation in the Cytosol of Epithelial Cells
Wrande, Infection and immunity 2016
- “...(STM2832-STM2877 [this mutant contains a deletion of genes STM2832 to STM2877] and STM4565- STM4579), so their phenotypes could not be confirmed in this cell...”

CBG46_03170 PTS glucitol/sorbitol transporter subunit IIC from Actinobacillus succinogenes
51% identity, 98% coverage

Comparative Transcriptome Analysis Reveals the Molecular Mechanisms of Acetic Acid Reduction by Adding NaHSO3 in Actinobacillus succinogenes GXAS137
Li, Polish journal of microbiology 2023
- “...in cell NADH regeneration. Two kinds of sugar phosphotransferase system (PTS) transporter genes (CBG46_00530 and CBG46_03170) and maltose ATP-binding cassette (ABC) transporter gene were up-regulated (log 2 FC 1.5), which are conducive to the absorption of carbohydrates. ATPase (CBG46_04725) was highly up-regulated (log 2 FC =...”

c3256 PTS system, glucitol/sorbitol-specific IIC2 component from Escherichia coli CFT073
UTI89_C3064 PTS system, glucitol/sorbitol-specific IIC2 component from Escherichia coli UTI89
51% identity, 96% coverage

Antimicrobial resistance and prescribing in England, Wales and Northern Ireland, 2008
, 2008
Metabolic Requirements of Escherichia coli in Intracellular Bacterial Communities during Urinary Tract Infection Pathogenesis
Conover, mBio 2016
- “...name Fold change Function a UTI89_C4028 chuA 13.6170702 OM hemin receptor UTI89_C4030 12.62532234 Hypothetical protein UTI89_C3064 srlA 8.533024788 PTS, glucitol/sorbitol-specific IIC2 component UTI89_C4027 chuS 8.363228798 Putative hemin/Hb transport protein UTI89_C2178 ybtS 6.132778168 Salicylate synthase UTI89_C4033 chuT 5.365860462 Putative periplasmic binding protein with FepB/HutB regions UTI89_C1122 iroB...”

YE1098 pts system, glucitol/sorbitol-specific iic2 component from Yersinia enterocolitica subsp. enterocolitica 8081
51% identity, 99% coverage

Comparison of Yersinia enterocolitica DNA Methylation at Ambient and Host Temperatures
Van, Epigenomes 2023
- “...a hypothetical protein with a Dam site 74 bp 5 from the start codon; and YE1098 with the Dam site 51 bp 5 from the start codon. This gene encodes GutA, also referred to as SrlA, a glucitol/sorbitol-specific IIC2 component, a subunit of the phoshotransferase system...”
- “...et al. [ 41 ], however, reveals little variation in the temperature expression of YE1259. YE1098, coding for GutA, also showed the same pattern of methylation as YE1259 (). Van der Woude et al. [ 52 ] reported that the Dam site 44 bp from the...”

SrlA / b2702 sorbitol-specific PTS enzyme IIC₂ component (EC 2.7.1.198; EC 2.7.1.197) from Escherichia coli K-12 substr. MG1655 (see 3 papers)
SrlA / P56579 sorbitol-specific PTS enzyme IIC₂ component (EC 2.7.1.198; EC 2.7.1.197) from Escherichia coli (strain K12) (see 3 papers)
PTHC_ECOLI / P56579 PTS system glucitol/sorbitol-specific EIIC component; EIIC-Gut; Glucitol/sorbitol permease IIC component from Escherichia coli (strain K12) (see 2 papers)
TC 4.A.4.1.1 / P56579 PTHC aka SRLA aka GUTA aka SBL aka B2702, component of Glucitol porter from Escherichia coli (see 7 papers)
b2702 glucitol/sorbitol-specific enzyme IIC component of PTS from Escherichia coli str. K-12 substr. MG1655
51% identity, 96% coverage

function: The phosphoenolpyruvate-dependent sugar phosphotransferase system (PTS), a major carbohydrate active transport system, catalyzes the phosphorylation of incoming sugar substrates concomitant with their translocation across the cell membrane. The enzyme II complex composed of SrlA, SrlB and SrlE is involved in glucitol/sorbitol transport. It can also use D-mannitol.
substrates: glucitol
The two-component system histidine kinase EnvZ contributes to Avian pathogenic Escherichia coli pathogenicity by regulating biofilm formation and stress responses
Fu, Poultry science 2023
- “...Putative PTS multi-phosphoryl transfer protein PtsA 2.20 b2704 srlB PTS system, glucitol/sorbitol-specific IIA component 2.00 b2702 srlA PTS system, glucitol/sorbitol-specific IIC2 component 1.19 b2167 fruA PTS system, fructose-specific IIBC component 1.93 b1737 celB PTS system, cellobiose-specific IIC component 1.59 b3599 mtlA Fused mannitol-specific PTS enzymes: IIA...”
Human body temperature (37 °C) increases the expression of iron, carbohydrate, and amino acid utilization genes in Escherichia coli K-12
White-Ziegler, Journal of bacteriology 2007
- “...b2013 b2579 database b1172 b2191 b0162 b2423 b4037 b2972 b2702 b2703 b0556 b0557 b1495 b1919 b2670 b3217 b3964 b4326 b0458 b1147 Change (n-fold) at 37/23Cc...”
DNA microarray analyses of the long-term adaptive response of Escherichia coli to acetate and propionate
Polen, Applied and environmental microbiology 2003
- “...protein Putative transport protein 1.65* 0.97 2.33* 1.54* 1.20 0.87 b2702 b2703 b2704 b2705 b2706 b2707 b2708 srlA1 srlA2 srlB srlD gutM srlR gutQ 2 2 2 2 2...”
- “...0.51* 0.60* b0929 ompF 1 Outer membrane protein 1a (la;b;F) 2.18* b2702 b2703 b2704 b2705 b2706 b2707 b2708 srlA1 srlA2 srlB srlD gutM srlR gutQ 2 2 2 2 2 2 2...”
Third International Workshop on Reactive Arthritis. 23-26 September 1995, Berlin, Germany. Report and abstracts
Kingsley, Annals of the rheumatic diseases 1996

lmo0544 similar to PTS system, glucitol/sorbitol-specific enzyme II CII component from Listeria monocytogenes EGD-e
45% identity, 95% coverage

Transcriptomic analysis of Listeria monocytogenes biofilm formation at different times
Gou, Canadian journal of veterinary research = Revue canadienne de recherche veterinaire 2023
- “...the quorum sensing, and the 2-component system. The top 5 upregulated DEGs were lmo0024, lmo0374, lmo0544, hly, and lmo2434. The top 5 downregulated DEGs were lmo2192, lmo1211, cheY, lmo0689, and secY. After real-time quantitative polymerase chain reaction, the expression of these 10 DEGs were consistent with...”
DegU-mediated suppression of carbohydrate uptake in Listeria monocytogenes increases adaptation to oxidative stress
Chen, Applied and environmental microbiology 2023 (secret)
Listeria monocytogenes GshF contributes to oxidative stress tolerance via regulation of the phosphoenolpyruvate-carbohydrate phosphotransferase system
Chen, Microbiology spectrum 2023
- “...lmo0542 PTS sorbitol transporter subunit IIA lmo0543 3.25 Yes/down lmo0543 PTS sorbitol transporter subunit IIBC lmo0544 4.74 Yes/down lmo0544 PTS sorbitol transporter subunit IIC lmo0631 6.83 Yes/down lmo0631 PTS fructose transporter subunit IIA lmo0632 3.13 Yes/down lmo0632 PTS fructose transporter subunit IIC lmo0633 lmo0633 PTS fructose...”
New Insights into the Lactic Acid Resistance Determinants of Listeria monocytogenes Based on Transposon Sequencing and Transcriptome Sequencing Analyses
Liu, Microbiology spectrum 2023
- “...sorbitol transporter subunit IIA 0.1163 2.30E-02 lmo0543 lmo0543 PTS sorbitol transporter subunit IIBC 0.1233 5.02E-05 lmo0544 lmo0544 PTS sorbitol transporter subunit IIC 0.0485 9.23E-05 lmo0738 lmo0738 PTS beta-glucoside transporter subunit IIABC 0.0009 0.00E+00 lmo0781 lmo0781 PTS mannose transporter subunit IID 0.4964 3.58E-05 lmo0874 lmo0874 PTS sugar...”
A Machine Learning Model for Food Source Attribution of Listeria monocytogenes
Tanui, Pathogens (Basel, Switzerland) 2022
- “...0.7952 0.7781 0.6482 0.6923 lmo0625 lmo0625 Putative lipase/acylhydrolase 0.6548 0.6242 0.6813 0.7945 0.743 0.6242 0.6548 lmo0544 srlA PTS sorbitol transporter subunit IIC 0.7125 0.6483 0.7073 0.7928 0.7713 0.6483 0.7125 lmo2728 mlrA Transcriptional regulator, MerR family protein 0.62 0.6322 0.6294 0.7909 0.6994 0.6041 0.6322 lmo2348 lmo2348 Amino...”
Blue Light Sensing in Listeria monocytogenes Is Temperature-Dependent and the Transcriptional Response to It Is Predominantly SigB-Dependent
Dorey, Frontiers in microbiology 2019
- “...In contrast to the wild-type, the sigB mutant significantly increased the transcription of rli78 and lmo0544 and significantly decreased the transcription of lmo0481 and lmo2818. These genes were distributed across several functional categories ( Table 3 ), with three being identified as transporters ( lmo0544 ,...”
- “...visible light, in a sigB mutant. Gene name Log 2 fold change Functional category RAST_product lmo0544 2.39 Transport/binding proteins and lipoproteins PTS system, glucitol/sorbitol-specific IIC component RatA-1 (rli78) 1.03 sRNA Unknown lmo2346 1.00 From other organisms ThiJ/PfpI family protein lmo2343 1.04 Detoxification Coenzyme F420-dependent N5,N10-methylene tetrahydromethanopterin...”

For advice on how to use these tools together, see Interactive tools for functional annotation of bacterial genomes.

Statistics

The PaperBLAST database links 793,807 different protein sequences to 1,259,118 scientific articles. Searches against EuropePMC were last performed on March 13 2025.

How It Works

PaperBLAST builds a database of protein sequences that are linked to scientific articles. These links come from automated text searches against the articles in EuropePMC and from manually-curated information from GeneRIF, UniProtKB/Swiss-Prot, BRENDA, CAZy (as made available by dbCAN), BioLiP, CharProtDB, MetaCyc, EcoCyc, TCDB, REBASE, the Fitness Browser, and a subset of the European Nucleotide Archive with the /experiment tag. Given this database and a protein sequence query, PaperBLAST uses protein-protein BLAST to find similar sequences with E < 0.001.
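
The E-value cutoff described above can be illustrated with a minimal sketch. It assumes the hits are available as BLAST tabular output (`-outfmt 6`, the default 12 tab-separated columns); the function names are hypothetical and this is not PaperBLAST's own code. The coverage formula (aligned query span divided by query length) is also an assumption about how figures such as "59% identity, 99% coverage" could be derived.

```python
from typing import NamedTuple

E_VALUE_CUTOFF = 1e-3  # threshold stated in the text (E < 0.001)


class Hit(NamedTuple):
    subject: str
    percent_identity: float
    query_start: int
    query_end: int
    evalue: float


def parse_tabular_hits(lines):
    """Parse BLAST tabular lines (qseqid sseqid pident length mismatch
    gapopen qstart qend sstart send evalue bitscore)."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        f = line.split("\t")
        hits.append(Hit(f[1], float(f[2]), int(f[6]), int(f[7]), float(f[10])))
    return hits


def significant_hits(hits, cutoff=E_VALUE_CUTOFF):
    """Keep only hits whose E-value is strictly below the cutoff."""
    return [h for h in hits if h.evalue < cutoff]


def query_coverage(hit, query_length):
    """Percent of the query covered by the alignment (assumed definition)."""
    return 100.0 * (hit.query_end - hit.query_start + 1) / query_length
```

For a 182-residue query such as the one on this page, `query_coverage(hit, 182)` gives the percentage of the query spanned by a hit's alignment.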

To build the database, we query EuropePMC with locus tags, with RefSeq protein identifiers, and with UniProt accessions. We obtain the locus tags from RefSeq or from MicrobesOnline. We use queries of the form "locus_tag AND genus_name" to try to ensure that the paper is actually discussing that gene. Because EuropePMC indexes most recent biomedical papers, even if they are not open access, some of the links may be to papers that you cannot read or that our computers cannot read. We query each of these identifiers that appears in the open access part of EuropePMC, as well as every locus tag that appears in the 500 most-referenced genomes, so that a gene may appear in the PaperBLAST results even though none of the papers that mention it are open access. We also incorporate text-mined links from EuropePMC that link open access articles to UniProt or RefSeq identifiers. (This yields some additional links because EuropePMC uses different heuristics for their text mining than we do.)
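
A query of the form "locus_tag AND genus_name" could be assembled roughly as follows. This is a sketch only: the endpoint and the `format` and `pageSize` parameters are assumed from Europe PMC's public REST search service, and the helper names are hypothetical rather than taken from the PaperBLAST code.

```python
from urllib.parse import urlencode

# Assumed endpoint: Europe PMC's public REST search service.
EUROPE_PMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_query(locus_tag, genus_name):
    """Combine the identifier with the genus, as in 'locus_tag AND genus_name'."""
    return f"{locus_tag} AND {genus_name}"


def build_search_url(locus_tag, genus_name, page_size=25):
    """Return a search URL; no network request is made here."""
    params = {
        "query": build_query(locus_tag, genus_name),
        "format": "json",
        "pageSize": page_size,
    }
    return EUROPE_PMC_SEARCH + "?" + urlencode(params)
```

Requiring the genus name alongside the locus tag reduces the chance that an unrelated string that happens to look like the identifier is counted as a mention of the gene.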

For every article that mentions a locus tag, a RefSeq protein identifier, or a UniProt accession, we try to select one or two snippets of text that refer to the protein. If we cannot get access to the full text, we try to select a snippet from the abstract, but unfortunately, unique identifiers such as locus tags are rarely provided in abstracts.
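
The exact snippet heuristics are not described here, so the following is only an illustrative sketch under stated assumptions: it takes a fixed window of words around each whole-token mention of the identifier and returns at most two non-overlapping snippets. The function name and the `window` parameter are hypothetical.

```python
import re


def select_snippets(text, identifier, window=10, max_snippets=2):
    """Return up to max_snippets word windows around mentions of identifier."""
    # Match the identifier only as a whole token, so that "Cbei_0336"
    # does not match inside "Cbei_03360".
    pattern = re.compile(
        r"(?<![A-Za-z0-9_])" + re.escape(identifier) + r"(?![A-Za-z0-9_])"
    )
    words = text.split()
    snippets = []
    covered_until = 0  # index just past the last word already used in a snippet
    for i, word in enumerate(words):
        if len(snippets) >= max_snippets:
            break
        if i < covered_until or not pattern.search(word):
            continue
        start = max(0, i - window)
        end = min(len(words), i + window + 1)
        snippets.append("..." + " ".join(words[start:end]) + "...")
        covered_until = end
    return snippets
```

Applied to an abstract that never mentions the identifier, this returns an empty list, which mirrors the observation that locus tags are rarely provided in abstracts.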

PaperBLAST also incorporates manually-curated protein functions:

Proteins from NCBI's RefSeq are included if a GeneRIF entry links the gene to an article in PubMed®. GeneRIF also provides a short summary of the article's claim about the protein, which is shown instead of a snippet.
Proteins from Swiss-Prot (the curated part of UniProt) are included if the curators identified experimental evidence for the protein's function (evidence code ECO:0000269). For these proteins, the fields of the Swiss-Prot entry that describe the protein's function are shown (with bold headings).
Proteins from BRENDA, a curated database of enzymes, are included if they are linked to a paper in PubMed and their full sequence is known.
Every protein from the non-redundant subset of BioLiP, a database of ligand-binding sites and catalytic residues in protein structures, is included. Since BioLiP itself does not include descriptions of the proteins, those are taken from the Protein Data Bank. Descriptions from PDB rely on the original submitter of the structure and cannot be updated by others, so they may be less reliable. (For SitesBLAST and Sites on a Tree, we use a larger subset of BioLiP so that every ligand is represented among a group of structures with similar sequences, but for PaperBLAST, we use the non-redundant set provided by BioLiP.)
Every protein from EcoCyc, a curated database of the proteins in Escherichia coli K-12, is included, regardless of whether they are characterized or not.
Proteins from the MetaCyc metabolic pathway database are included if they are linked to a paper in PubMed and their full sequence is known.
Proteins from the Transport Classification Database (TCDB) are included if they have known substrate(s), have reference(s), and are not described as uncharacterized or putative. (Some of the references are not visible on the PaperBLAST web site.)
Every protein from CharProtDB, a database of experimentally characterized protein annotations, is included.
Proteins from the CAZy database of carbohydrate-active enzymes are included if they are associated with an Enzyme Classification number. Even though CAZy does not provide links from individual protein sequences to papers, these should all be experimentally-characterized proteins.
Proteins from the REBASE database of restriction enzymes are included if they have known specificity.
Every protein with an evidence-based reannotation (based on mutant phenotypes) in the Fitness Browser is included.
Sequence-specific transcription factors (including sigma factors and DNA-binding response regulators) with experimentally-determined DNA binding sites from the PRODORIC database of gene regulation in prokaryotes are included.
Putative transcription factors from RegPrecise are included if they have manually-curated predictions for their binding sites. These predictions are based on conserved putative regulatory sites across genomes that contain similar transcription factors, so PaperBLAST clusters the TFs at 70% identity and retains just one member of each cluster.
Coding sequence (CDS) features from the European Nucleotide Archive (ENA) are included if the /experiment tag is set (implying that there is experimental evidence for the annotation), the nucleotide entry links to paper(s) in PubMed, and the nucleotide entry is from the STD data class (implying that these are targeted annotated sequences, not from shotgun sequencing). Also, to filter out genes whose transcription or translation was detected, but whose function was not studied, nucleotide entries or papers with more than 25 such proteins are excluded. Descriptions from ENA rely on the original submitter of the sequence and cannot be updated by others, so they may be less reliable.
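
The ENA rule in the last item (excluding nucleotide entries or papers with more than 25 such proteins) might look like the sketch below. The input layout and function name are hypothetical, and the handling of a protein linked to both an over-represented paper and another paper is one possible interpretation, not a confirmed detail of PaperBLAST.

```python
from collections import Counter

MAX_PROTEINS = 25  # limit stated in the text


def filter_ena_features(features, max_proteins=MAX_PROTEINS):
    """features: list of (protein_id, nucleotide_entry, paper_ids) tuples.

    Drops proteins from nucleotide entries with more than max_proteins
    proteins, drops links to papers with more than max_proteins proteins,
    and drops any protein left without a paper.
    """
    per_entry = Counter(entry for _, entry, _ in features)
    per_paper = Counter(p for _, _, papers in features for p in set(papers))
    kept = []
    for protein_id, entry, papers in features:
        if per_entry[entry] > max_proteins:
            continue  # entry looks like a large-scale detection study
        usable = [p for p in papers if per_paper[p] <= max_proteins]
        if usable:
            kept.append((protein_id, entry, usable))
    return kept
```

The intent of such a filter is to keep targeted functional studies while discarding entries where many proteins were merely detected.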

Except for GeneRIF and ENA, the curated entries include a short curated description of the protein's function. For entries from BioLiP, the protein's function may not be known beyond binding to the ligand. Many of these entries also link to articles in PubMed.

For more information see the PaperBLAST paper (mSystems 2017) or the code. You can download PaperBLAST's database here.

Changes to PaperBLAST since the paper was written:

November 2023: incorporated PRODORIC and RegPrecise. Many PRODORIC entries were not linked to a protein sequence (no UniProt identifier), so we added this information.
February 2023: BioLiP changed their download format. PaperBLAST now includes their non-redundant subset. SitesBLAST and Sites on a Tree use a larger non-redundant subset that ensures that every ligand is represented within each cluster. This should ensure that every binding site is represented.
June 2022: incorporated some coding sequences from ENA with the /experiment tag.
March 2022: incorporated BioLiP.
April 2020: incorporated TCDB.
April 2019: EuropePMC now returns table entries in their search results. This has expanded PaperBLAST's database, but most of the new entries are of low relevance, and the resulting snippets are often just lists of locus tags with annotations.
February 2018: the alignment page reports the conservation of the hit's functional sites (if available from Swiss-Prot or UniProt).
January 2018: incorporated BRENDA.
December 2017: incorporated MetaCyc, CharProtDB, CAZy, REBASE, and the reannotations from the Fitness Browser.
September 2017: EuropePMC no longer returns some table entries in their search results. This has shrunk PaperBLAST's database, but has also reduced the number of low-relevance hits.

Many of these changes are described in Interactive tools for functional annotation of bacterial genomes.

Secrets

PaperBLAST cannot provide snippets for many of the papers that are published in non-open-access journals. This limitation applies even if the paper is marked as "free" on the publisher's web site and is available in PubMed Central or EuropePMC. If a journal that you publish in is marked as "secret," please consider publishing elsewhere.

Omissions from the PaperBLAST Database

Many important articles are missing from PaperBLAST, either because the article's full text is not in EuropePMC (as for many older articles), or because the paper does not mention a protein identifier such as a locus tag, or because of PaperBLAST's heuristics. If you notice an article that characterizes a protein's function but is missing from PaperBLAST, please notify the curators at UniProt or add an entry to GeneRIF. Entries in either of these databases will eventually be incorporated into PaperBLAST. Note that to add an entry to UniProt, you will need to find the UniProt identifier for the protein. If the protein is not already in UniProt, you can ask them to create an entry. To add an entry to GeneRIF, you will need an NCBI Gene identifier, but unfortunately many prokaryotic proteins in RefSeq do not have corresponding Gene identifiers.

References

PaperBLAST: Text-mining papers for information about homologs.
M. N. Price and A. P. Arkin (2017). mSystems, 10.1128/mSystems.00039-17.

Europe PMC in 2017.
M. Levchenko et al (2017). Nucleic Acids Research, 10.1093/nar/gkx1005.

Gene indexing: characterization and analysis of NLM's GeneRIFs.
J. A. Mitchell et al (2003). AMIA Annu Symp Proc 2003:460-464.

UniProt: the universal protein knowledgebase.
The UniProt Consortium (2016). Nucleic Acids Research, 10.1093/nar/gkw1099.

BRENDA in 2017: new perspectives and new tools in BRENDA.
S. Placzek et al (2017). Nucleic Acids Research, 10.1093/nar/gkw952.

The EcoCyc database: reflecting new knowledge about Escherichia coli K-12.
I. M. Keseler et al (2016). Nucleic Acids Research, 10.1093/nar/gkw1003.

The MetaCyc database of metabolic pathways and enzymes.
R. Caspi et al (2018). Nucleic Acids Research, 10.1093/nar/gkx935.

CharProtDB: a database of experimentally characterized protein annotations.
R. Madupu et al (2012). Nucleic Acids Research, 10.1093/nar/gkr1133.

The carbohydrate-active enzymes database (CAZy) in 2013.
V. Lombard et al (2014). Nucleic Acids Research, 10.1093/nar/gkt1178.

The Transporter Classification Database (TCDB): recent advances.
M. H. Saier, Jr. et al (2016). Nucleic Acids Research, 10.1093/nar/gkv1103.

REBASE - a database for DNA restriction and modification: enzymes, genes and genomes.
R. J. Roberts et al (2015). Nucleic Acids Research, 10.1093/nar/gku1046.

Deep annotation of protein function across diverse bacteria from mutant phenotypes.
M. N. Price et al (2016). bioRxiv, 10.1101/072470.

by Morgan Price, Arkin group
Lawrence Berkeley National Laboratory