PaperBLAST

PaperBLAST – Find papers about a protein or its homologs

PaperBLAST Hits for reanno::psRCH2:GFF4196 C4-dicarboxylate transporter, DctQ subunit (Pseudomonas stutzeri RCH2) (208 a.a., MNALWRVWDH...)

Other sequence analysis tools:

Find functional residues: SitesBLAST

Search for conserved domains

Find the best match in UniProt

Compare to protein structures

Predict transmembrane helices: Phobius

Predict protein localization: PSORTb

Find homologs in fast.genomics

Found 20 similar proteins in the literature:

Psest_4269 C4-dicarboxylate transporter, DctQ subunit from Pseudomonas stutzeri RCH2
100% identity, 100% coverage

mutant phenotype: Important for succinate and fumarate utilization. The other components (Psest_4268 and Psest_4270) were already annotated correctly.

DCTQ_PSEAE / Q9HU17 C4-dicarboxylate TRAP transporter small permease protein DctQ from Pseudomonas aeruginosa (strain ATCC 15692 / DSM 22644 / CIP 104116 / JCM 14847 / LMG 12228 / 1C / PRS 101 / PAO1) (see paper)
PA5168 probable dicarboxylate transporter from Pseudomonas aeruginosa PAO1
75% identity, 99% coverage

function: Part of the tripartite ATP-independent periplasmic (TRAP) transport system DctPQM involved in C4-dicarboxylates uptake.
subunit: The complex comprises the extracytoplasmic solute receptor protein DctP, and the two transmembrane proteins DctQ and DctM.
disruption phenotype: The dctA-dctPQM double mutant shows no growth on malate and fumarate and residual growth on succinate.
Catabolite repression control protein antagonist, a novel player in Pseudomonas aeruginosa carbon catabolite repression control
Sonnleitner, Frontiers in microbiology 2023
- “...Probable chaperone PA3366 9.76 4.52E-43 amiE Aliphatic amidase PA4022 3.00 2.41E-04 hdhA Hydrazone dehydrogenase, HdhA PA5168 2.41 1.29E-02 dctQ C(4)-dicarboxylate transport system protein DctQ PA5169 3.27 3.02E-04 dctM C(4)-dicarboxylate transport system protein DctM These transcripts are also up-regulated under the same growth conditions in PAO1 hfq...”
Differential transcription profiling of the phage LUZ19 infection process in different growth media
Brandão, RNA biology 2021 (secret)
The development of a new parameter for tracking post-transcriptional regulation allows the detailed map of the Pseudomonas aeruginosa Crc regulon
Corona, Scientific reports 2018
- “...periplasmic binding protein 0,8 1,32 2,95 Transport dctP PA5167 DctP 2,07 2,44 4,87 Transport dctQ PA5168 DctQ 1,98 2 3,72 Transport PA5217 Probable binding protein component of ABC iron transporter 0,01 0,83 2,37 Catabolism dadX PA5302 Catabolic alanine racemase 1,34 1,41 2,67 Catabolism dadA PA5304 D-amino...”
Additive Effects of Quorum Sensing Anti-Activators on Pseudomonas aeruginosa Virulence Traits and Transcriptome
Asfahl, Frontiers in microbiology 2017
- “...3.3 PA5156 hypothetical protein 1.8 NC NC NC PA5167 dctP DctP 3.9 NC NC NC PA5168 dctQ DctQ 4.3 NC NC NC PA5169 dctM DctM 4.9 NC NC NC PA5194 hypothetical protein 1.7 NC NC NC PA5250 conserved hypothetical protein 1.7 NC NC NC PA5251 hypothetical...”
Novel targets of the CbrAB/Crc carbon catabolite control system revealed by transcript abundance in Pseudomonas aeruginosa
Sonnleitner, PloS one 2012
- “...binding protein PA5167 dctP 4,35 4,05 3,37 6,85 probable c4-dicarboxylate-binding protein AAGAACAA (20 to 13) PA5168 dctQ 2,19 2,12 6,54 6,4 probable dicarboxylate transporter AAUAAGAA (20 to 13) PA5169 dctM 2,41 2,2 6,72 10,69 probable C4-dicarboxylate transporter PA5220 2,38 4,08 hypothetical protein AAGAACAACAAGAA (31 to 18)...”
Identification of C(4)-dicarboxylate transport systems in Pseudomonas aeruginosa PAO1
Valentini, Journal of bacteriology 2011
- “...further studied. Its Tn5 insertion was mapped to the PA5168 gene. This gene is proposed to be a probable dicarboxylate transporter gene showing 47% similarity...”
- “...gene product of R. capsulatus (55). Furthermore, the PA5168 gene is annotated as being part of the PA5167-PA5169 operon, encoding a TRAP-type C4-dicarboxylate...”
Effect of anaerobiosis and nitrate on gene expression in Pseudomonas aeruginosa
Filiatrault, Infection and immunity 2005
- “...PA4920 PA4921 PA5023 PA5048 PA5117 PA5118 PA5139 PA5167 PA5168 PA5169 PA5275 PA5296 PA5304 PA5372 PA5415 PA5436 PA5446 PA5448 PA5460 PA5496 PA5553 PA5554 PA5556...”
DNA microarrays in analysis of quorum sensing: strengths and limitations
Vasil, Journal of bacteriology 2003
- “...(hypothetical proteins) PA4442 (cysN, amino acid biosynthesis) PA5168 (dicarboxylate transporter) a 2064 GUEST COMMENTARY clearly not the case, and thus these...”

PA0885 probable C4-dicarboxylate transporter from Pseudomonas aeruginosa PAO1
64% identity, 97% coverage

Reverse engineering antibiotic sensitivity in a multidrug-resistant Pseudomonas aeruginosa isolate
Struble, Antimicrobial agents and chemotherapy 2006
- “...M98270cds3 a Permeability/membrane PA0013 PA0203 PA0450 PA0786 PA0885 PA1308 PA1361 PA1386 PA1735 PA2042 PA2070 PA2219 PA2397 PA2398 PA2549 PA2853 PA3141 PA3145...”

PA14_52820 probable C4-dicarboxylate transporter from Pseudomonas aeruginosa UCBPP-PA14
63% identity, 97% coverage

Evolution of biofilm-adapted gene expression profiles in lasR-deficient clinical Pseudomonas aeruginosa isolates
Jeske, NPJ biofilms and microbiomes 2022
- “...1.29 PA14_55340 exbD2 1.52 PA14_23190 1.09 PA14_52810 dctM 1.06 PA14_55360 exbB2 1.06 PA14_26210 hisP 1.14 PA14_52820 dctQ 1.39 PA14_26220 hisM 1.33 PA14_52840 dctP 1.06 Phosphate assimilation/T2SS PA14_26230 hisQ 1.71 PA14_52900 1.21 PA14_20300 phnC 1.39 PA14_26240 hisJ 1.18 PA14_63620 lipC 1.21 PA14_20320 phnD 1.57 PA14_26260 1.14 PA14_63640...”

PM0274 unknown from Pasteurella multocida subsp. multocida str. Pm70
57% identity, 100% coverage

Transcriptional response of Pasteurella multocida to nutrient limitation
Paustian, Journal of bacteriology 2002
- “...artP PM1885 rpL10 rpL5 artQ PM0084 PM1374 trx PM1372 PM0274 PM0091 PM1367 glpX crr PM0965 tolQ rpL21 purF dod mtr pyrD PM1193 rpL9 ttrB metC t-Protein,...”

SO3135 C4-dicarboxylate transporter, putative from Shewanella oneidensis MR-1
SO_3135 TRAP transporter small permease from Shewanella oneidensis MR-1
30% identity, 89% coverage

Knock-out of SO1377 gene, which encodes the member of a conserved hypothetical bacterial protein family COG2268, results in alteration of iron metabolism, increased spontaneous mutation and hydrogen peroxide sensitivity in Shewanella oneidensis MR-1
Gao, BMC genomics 2006
- “...ferric alcaligin siderophore receptor -2.079 0.071 *** SO3134 dctP C4-dicarboxylate-binding periplasmic protein -3.165 0.048 ***** SO3135 C4-dicarboxylate transporter, putative -2.793 0.037 ***** SO3136 dctM C4-dicarboxylate transport protein -1.969 0.051 ***** Cellular process SO3065 colicin V production protein +3.219 0.580 * SO4405 katG-2 catalase/peroxidase HPI +5.409 1.310...”
Acetylation of xenogeneic silencer H-NS regulates biofilm development through the nitrogen homeostasis regulator in Shewanella
Liu, Nucleic acids research 2024
- “...153.95 TonB-dependent receptor SO_3134 ( dctP ) 27.3 822.17 164.72 209.41 TRAP transporter substrate-binding protein SO_3135 11.46 301.86 46.07 53.37 TRAP transporter small permease SO_3136 ( dctM ) 12.66 221.85 22.16 28.44 TRAP transporter large permease SO_3146 ( hns ) 880.33 731.65 978.23 1032.82 H-NS histone...”

Sama_2210 alpha-ketoglutarate TRAP transporter, small permease component from Shewanella amazonensis SB2B
28% identity, 94% coverage

mutant phenotype: Specific phenotype on alpha-ketoglutarate. A putrescine ABC transporter (Sama_2642:Sama_2638) is also important in this condition, which is not explained. This organism can also utilize succinate or fumarate (but not L-malate), but we do not have fitness data for these substrates under aerobic conditions. This transporter could also transport some C4 compounds.

Shew_1445 dicarboxylate TRAP transporter (succinate, fumarate, L-malate, and alpha-ketoglutarate), small permease component from Shewanella loihica PV-4
26% identity, 96% coverage

mutant phenotype: Important for utilizing succinate, fumarate, and L-malate, as expected, and also for utilizing a-ketoglutarate

HF298_RS12835 TRAP transporter small permease from Vibrio parahaemolyticus
27% identity, 77% coverage

Transcriptome analysis of the biofilm formation mechanism of Vibrio parahaemolyticus under the sub-inhibitory concentrations of copper and carbenicillin
Xie, Frontiers in microbiology 2023
- “...speculated that MCP interacts with flagellin to affect bacterial attachment and biofilm formation. HF298_RS12830 and HF298_RS12835 genes were up-regulated by about 1.46 and 1.11 folds under 1/2 MIC CARB treatment, and down-regulated by about 13 folds under 1/2 MIC Cu 2+ and 1/2 MIC Cu 2+...”

VP_RS04435, WU75_21765 TRAP transporter small permease from Vibrio parahaemolyticus RIMD 2210633
27% identity, 77% coverage

Identification of Antibacterial Components and Modes in the Methanol-Phase Extract from a Herbal Plant Potentilla kleiniana Wight et Arn
Tang, Foods (Basel, Switzerland) 2023
- “...transporter WU75_01920 mcp 0.32 Chemotaxis protein WU75_21745 dctB 0.352 ATPase WU75_10200 phoA 0.353 Alkaline phosphatase WU75_21765 dctQ 0.368 C4-dicarboxylate ABC transporter permease WU75_00210 dctD 0.406 C4-dicarboxylate ABC transporter WU75_16210 qseC 0.423 Histidine kinase WU75_23015 fliC 0.435 Flagellin WU75_07100 mcp 0.453 Chemotaxis protein WU75_13380 crp 0.457 Transcriptional...”
Comparative Transcriptome Analysis Reveals Regulatory Factors Involved in Vibrio Parahaemolyticus Biofilm Formation
Wang, Frontiers in cellular and infection microbiology 2022
- “...subunit I VP_RS22155 5.10 phosphate ABC transporter substrate-binding protein VP_RS04430 4.66 TRAP transporter substrate-binding protein VP_RS04435 4.13 TRAP transporter small permease VP_RS15995 3.98 response regulator transcription factor ompR 3.40 two-component system response regulator OmpR VP_RS20550 3.07 thiolase family protein VP_RS22100 2.93 methyl-accepting chemotaxis protein VP_RS10510 2.92...”
- “...is speculated that MCP interacts with flagellin to affect bacterial attachment and clustering. VP_RS04430 and VP_RS04435 were upregulated by about 4.66 and 4.13, and their encoded proteins are TRAP transporter protein substrate binding protein and TRAP transporter protein small permease, which use ion electrochemical gradients to...”

VC_1928 TRAP transporter small permease from Vibrio cholerae O1 biovar El Tor str. N16961
25% identity, 88% coverage

Comparative genome analysis of non-toxigenic non-O1 versus toxigenic O1 Vibrio cholerae.
Mukherjee, Genomics discovery 2014
- “...transporter VC_1448 Q9KS14 Uncharacterized protein similar to VCA0109 VC_A0109 Q9KN56 C4-dicarboxylate transport protein DctQ, putative VC_1928 Q9KQS0 Trk system potassium uptake protein VC_0042 Q9KVU7 PTS system, cellobiose-specific IIC component VC_1282 Q9KSH4 Multidrug resistance protein VceB VC_1411 Q9KS49 Iron(III) compound receptor VC_0200 Q9KVE6 Sugar transporter family protein...”

Q9KQS0 C4-dicarboxylate TRAP transporter small permease protein DctQ from Vibrio cholerae serotype O1 (strain ATCC 39315 / El Tor Inaba N16961)
VC1928 C4-dicarboxylate transport protein DctQ, putative from Vibrio cholerae O1 biovar eltor str. N16961
25% identity, 83% coverage

Comparative genome analysis of non-toxigenic non-O1 versus toxigenic O1 Vibrio cholerae.
Mukherjee, Genomics discovery 2014
- “...VC_1448 Q9KS14 Uncharacterized protein similar to VCA0109 VC_A0109 Q9KN56 C4-dicarboxylate transport protein DctQ, putative VC_1928 Q9KQS0 Trk system potassium uptake protein VC_0042 Q9KVU7 PTS system, cellobiose-specific IIC component VC_1282 Q9KSH4 Multidrug resistance protein VceB VC_1411 Q9KS49 Iron(III) compound receptor VC_0200 Q9KVE6 Sugar transporter family protein VC_A0669...”
Roles of the sodium-translocating NADH:quinone oxidoreductase (Na+-NQR) on vibrio cholerae metabolism, motility and osmotic stress resistance
Minato, PloS one 2014
- “...down VC1784 neuraminidase 2.475 down 2.646 down VC1927 C4-dicarboxylate transport protein 1.745 down 1.763 down VC1928 C4-dicarboxylate transport protein DctQ, putative 1.970 down 1.947 down VC1929 C4-dicarboxylate-binding periplasmic protein 2.449 down 2.796 down VC2037 Na + /H + antiporter, nhaC-1 1.680 down 1.599 down VC2127 flagellar...”
Distinct centromere-like parS sites on the two chromosomes of Vibrio spp
Yamaichi, Journal of bacteriology 2007
- “...parS1-3 VC0062 VC0066 and VC0067 VC0069 VC0501 VC1494 VC1928 VCA1095 Sequences having up to two mismatches with the B. subtilis parS consensus sequence...”

Tmz1t_0544 Tripartite ATP-independent periplasmic transporter DctQ component from Thauera sp. MZ1T
27% identity, 91% coverage

Lessons and Considerations for the Creation of Universal Primers Targeting Non-Conserved, Horizontally Mobile Genes
Brown, Applied and environmental microbiology 2021
- “...Fig. 1 ). Two of these sites are located within T. aromatica and target dctM (Tmz1t_0544) and yfdV (Tmz1t_0790). dctM encodes a tripartite ATP-independent periplasmic transporter which falls under the C 4 -dicarboxylate transport system classification according to KEGG, while yfdV encodes an auxin efflux carrier...”

PGA1_c20670 TRAP transporter for fumarate, succinate, L-malate, and 2-oxoglutarate, small permease component from Phaeobacter inhibens DSM 17395
30% identity, 57% coverage

mutant phenotype: Specifically important for utilization of fumarate, succinate, L-malate, and alpha-ketoglutarate. Phenotypes on alpha-ketoglutarate are more mild, which might indicate some genetic redundancy.

SPO2627 TRAP transporter small permease from Ruegeria pomeroyi DSS-3
26% identity, 57% coverage

Bacterial transcriptional response to labile exometabolites from photosynthetic picoeukaryote Micromonas commoda
Ferrer-González, ISME communications 2023
- “...branched-chain amino acid ABC transporter, permease SPO2626* AAV95871.1 3.8 fumarate, succinate, malate TRAP transporter, dctM SPO2627 AAV95872.1 3.2 fumarate, succinate, malate TRAP transporter, dctQ SPO2628* AAV95873.1 2.3 fumarate, succinate, malate TRAP transporter, dctP SPO0608* AAV93923.1 7.1 glycerol ABC transporter, substrate binding protein SPO0609 AAV93924.1 12.2 glycerol...”
A mutant fitness assay identifies bacterial interactions in a model ocean hot spot
Schreier, Proceedings of the National Academy of Sciences of the United States of America 2023
- “...organic acid 0.17 0.18 SPO2626 TRAP transporter ( dctM ) Transporter organic acid 0.13 0.12 SPO2627 C4 dicarbodylateTRAP transporter ( dctQ ) Transporter organic acid 0.17 0.65 SPO2630 C4-dicarboxylate TRAP regulatory protein Transporter organic acid 0.25 SPOA0238 TRAP dicarboxylate transporter ( dctP ) Transporter organic acid...”
Transcriptional changes underlying elemental stoichiometry shifts in a marine heterotrophic bacterium
Chan, Frontiers in microbiology 2012
- “...phoB Phosphate regulon transcriptional regulatory protein 10.7 26.1 SPO2626 TRAP transporter, DctM subunit 1.8 4.2 SPO2627 TRAP transporter, DctQ subunit 3.3 SPO3198 rnc Ribonuclease III 4.1 1.6 SPO3625 cspA Cold shock protein 2.6 4.2 SPO3868 Hypothetical protein 1.6 3.6 SPOA0294 pmtA Phosphatidylethanolamine N -methyltransferase 3.6 1.2...”

DCTQ_RHOCA / O07837 C4-dicarboxylate TRAP transporter small permease protein DctQ from Rhodobacter capsulatus (Rhodopseudomonas capsulata) (see 2 papers)
TC 2.A.56.1.1 / O07837 DctQ, component of Tripartite dicarboxylate:H+ symporter (substrates include: fumarate, D- and L-malate, succinate, succinamide, orotate, itaconate and mesaconate) from Rhodobacter capsulatus (Rhodopseudomonas capsulata) (see 2 papers)
27% identity, 77% coverage

function: Part of the tripartite ATP-independent periplasmic (TRAP) transport system DctPQM involved in C4-dicarboxylates uptake.
subunit: The complex comprises the extracytoplasmic solute receptor protein DctP, and the two transmembrane proteins DctQ and DctM.
disruption phenotype: Deletion mutant is unable to transport succinate, and does not grow on D-malate, L-malate, succinate or fumarate as the sole carbon source under aerobic conditions in the dark.
substrates: Fumarate, H+, Itaconate, Mesaconate, Orotate, Succinamide, Succinate, malate

SPO2357 TRAP transporter small permease from Ruegeria pomeroyi DSS-3
30% identity, 55% coverage

Functional annotation and importance of marine bacterial transporters of plankton exometabolites
Schroer, ISME communications 2023
- “...9.210 3 nagTUVW SPO1839 GlcNAc homology 78.4 7.1 2.610 7 92.7 5.0 5.710 8 iseKLM SPO2357 Isethionate expression 101.1 10.2 1.810 9 96.0 41.9 1.010 3 SPO2358 Isethionate expression 104.2 10.2 1.610 9 69.8 41.9 5.310 3 hbtABC SPO2573 3-OH butyrate novel 92.7 3.9 5.410 10...”
Diel investments in metabolite production and consumption in a model microbial system
Uchimiya, The ISME journal 2022
- “...DMSP lyase 19.2 Sulfur compound Isethionate SPO2358 iseK TRAP transporter, periplasmic 30.0 [ 41 ] SPO2357 iseL TRAP transporter, small permease 13.2 SPO2356 iseM TRAP transporter, DctM 38.2 Sulfur compound N -acetyltaurine SPO0660 naaA ABC transporter, periplasmic substrate-binding 54.3 [ 41 ] SPO0661 naaB ABC transporter,...”
Sulfur metabolites that facilitate oceanic phytoplankton-bacteria carbon flux
Landa, The ISME journal 2019
- “...DHPS utilization tauR SPO3562 SPO2358 Regulation iseK SPO2357 iseJ iseL SPO2356 SPO2359 iseR iseM SPO2355 Regulation Isethionate transport Isethionate...”
An Updated genome annotation for the model marine bacterium Ruegeria pomeroyi DSS-3
Rivers, Standards in genomic sciences 2014
- “...YP_167578 SPO2355 Isethionate dissimilation regulator iseR Function YP_167579 SPO2356 Isethionate TRAP transporter iseM Function YP_167580 SPO2357 Isethionate TRAP transporter iseL Function YP_167581 SPO2358 Isethionate TRAP transporter iseK Function YP_167582 SPO2359 Isethionate dehydrogenase iseJ Function YP_167694 SPO2477 Manganese uptake regulator mur Function YP_168390 SPO3187 (2R)-3-sulfolactate dehydrogenase comC...”

SPO2357, YP_167580 TRAP dicarboxylate transporter, DctQ subunit from Silicibacter pomeroyi DSS-3
30% identity, 59% coverage

An Updated genome annotation for the model marine bacterium Ruegeria pomeroyi DSS-3
Rivers, Standards in genomic sciences 2014
- “...YP_167578 SPO2355 Isethionate dissimilation regulator iseR Function YP_167579 SPO2356 Isethionate TRAP transporter iseM Function YP_167580 SPO2357 Isethionate TRAP transporter iseL Function YP_167581 SPO2358 Isethionate TRAP transporter iseK Function YP_167582 SPO2359 Isethionate dehydrogenase iseJ Function YP_167694 SPO2477 Manganese uptake regulator mur Function YP_168390 SPO3187 (2R)-3-sulfolactate dehydrogenase comC...”
- “...Function YP_167578 SPO2355 Isethionate dissimilation regulator iseR Function YP_167579 SPO2356 Isethionate TRAP transporter iseM Function YP_167580 SPO2357 Isethionate TRAP transporter iseL Function YP_167581 SPO2358 Isethionate TRAP transporter iseK Function YP_167582 SPO2359 Isethionate dehydrogenase iseJ Function YP_167694 SPO2477 Manganese uptake regulator mur Function YP_168390 SPO3187 (2R)-3-sulfolactate dehydrogenase...”

HMPREF0397_0437 TRAP transporter small permease from Fusobacterium nucleatum subsp. nucleatum ATCC 23726
34% identity, 34% coverage

Forward Genetic Dissection of Biofilm Development by Fusobacterium nucleatum: Novel Functions of Cell Division Proteins FtsX and EnvC
Wu, mBio 2018
- “...protein FtsX Tn 5 - 7 HMPREF0397_0833 254 (525) Hypothetical protein Tn 5 - 8 HMPREF0397_0437 60 (471) C 4 -dicarboxylate transporter Tn 5 - 9 HMPREF0397_1811 3,032 (7,905) Filamentous hemagglutinin Tn 5 - 10 HMPREF0397_1858 376 (450) Acetyltransferase a Numbers in parentheses indicate the gene...”

Pden_1646 Tripartite ATP-independent periplasmic transporter, DctQ component from Paracoccus denitrificans PD1222
22% identity, 86% coverage

Paracoccus denitrificans PD1222 utilizes hypotaurine via transamination followed by spontaneous desulfination to yield acetaldehyde and, finally, acetate for growth
Felux, Journal of bacteriology 2013
- “...Pden_1641 Pden_1642 Pden_1643 Pden_1644 Pden_1645 Pden_1646 Pden_1647 Pden_1648 Phosphate acetyltransferase (Pta1) Sulfite exporter (TauZ) Sulfoacetaldehyde...”

For advice on how to use these tools together, see Interactive tools for functional annotation of bacterial genomes.

Statistics

The PaperBLAST database links 793,807 different protein sequences to 1,259,118 scientific articles. Searches against EuropePMC were last performed on March 13 2025.

How It Works

PaperBLAST builds a database of protein sequences that are linked to scientific articles. These links come from automated text searches against the articles in EuropePMC and from manually-curated information from GeneRIF, UniProtKB/Swiss-Prot, BRENDA, CAZy (as made available by dbCAN), BioLiP, CharProtDB, MetaCyc, EcoCyc, TCDB, REBASE, the Fitness Browser, and a subset of the European Nucleotide Archive with the /experiment tag. Given this database and a protein sequence query, PaperBLAST uses protein-protein BLAST to find similar sequences with E < 0.001.
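As a rough illustration of the E-value cutoff mentioned above, the sketch below filters hits from BLAST tabular output. It assumes the standard 12-column tab-separated layout; the page does not say which output format PaperBLAST itself parses. The function name, the Hit type, and the sample rows are hypothetical.

```python
from typing import Iterable, List, NamedTuple

E_CUTOFF = 0.001  # threshold stated in the text: E < 0.001


class Hit(NamedTuple):
    query: str
    subject: str
    identity: float  # percent identity
    evalue: float


def filter_hits(lines: Iterable[str], cutoff: float = E_CUTOFF) -> List[Hit]:
    """Keep hits whose E-value is strictly below the cutoff.

    Assumes tab-separated rows with the 12 default tabular columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        hit = Hit(cols[0], cols[1], float(cols[2]), float(cols[10]))
        if hit.evalue < cutoff:
            hits.append(hit)
    return hits


# Made-up rows for illustration only (not real PaperBLAST output).
sample = [
    "query1\thitA\t75.0\t206\t51\t0\t1\t206\t1\t206\t1e-90\t300",
    "query1\thitB\t22.0\t180\t140\t3\t5\t184\t9\t188\t0.5\t30.1",
]
kept = filter_hits(sample)
print([h.subject for h in kept])
```

The comparison is strict, so a hit with E exactly equal to 0.001 is dropped, matching the "E < 0.001" wording.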

To build the database, we query EuropePMC with locus tags, with RefSeq protein identifiers, and with UniProt accessions. We obtain the locus tags from RefSeq or from MicrobesOnline. We use queries of the form "locus_tag AND genus_name" to try to ensure that the paper is actually discussing that gene. Because EuropePMC indexes most recent biomedical papers, even if they are not open access, some of the links may be to papers that you cannot read or that our computers cannot read. We query each of these identifiers that appears in the open access part of EuropePMC, as well as every locus tag that appears in the 500 most-referenced genomes, so that a gene may appear in the PaperBLAST results even though none of the papers that mention it are open access. We also incorporate text-mined links from EuropePMC that link open access articles to UniProt or RefSeq identifiers. (This yields some additional links because EuropePMC uses different heuristics for their text mining than we do.)
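The query form described above can be sketched as follows. This only builds the query strings; how PaperBLAST quotes them or submits them to EuropePMC is not described on this page, and the function name is hypothetical.

```python
from typing import Iterable, List


def build_queries(locus_tags: Iterable[str], genus: str) -> List[str]:
    """Return one query string per locus tag, of the form 'locus_tag AND genus_name'."""
    queries = []
    for tag in locus_tags:
        tag = tag.strip()
        if tag:  # ignore empty identifiers
            queries.append(f"{tag} AND {genus}")
    return queries


# Locus tags taken from the hit list on this page; the pairing with a genus
# name follows the form given in the text.
queries = build_queries(["PA5168", "PA5169"], "Pseudomonas")
print(queries)
```

Pairing each locus tag with the genus name is what guards against an identifier that happens to match an unrelated string in a paper about a different organism.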

For every article that mentions a locus tag, a RefSeq protein identifier, or a UniProt accession, we try to select one or two snippets of text that refer to the protein. If we cannot get access to the full text, we try to select a snippet from the abstract, but unfortunately, unique identifiers such as locus tags are rarely provided in abstracts.
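One simple way to pick such snippets is to take a fixed-width window of text around each mention of the identifier. The sketch below is an assumption about how this could be done, not PaperBLAST's actual heuristic; the window width, the whole-word matching rule, and the function name are all illustrative choices.

```python
import re
from typing import List


def select_snippets(text: str, identifier: str,
                    width: int = 60, max_snippets: int = 2) -> List[str]:
    """Return up to max_snippets text windows around whole-word matches of identifier."""
    # Require non-word characters (or the text boundary) on both sides, so that
    # searching for "PA5168" does not match inside "PA51680".
    pattern = re.compile(r"(?<!\w)" + re.escape(identifier) + r"(?!\w)")
    snippets = []
    last_end = -1
    for m in pattern.finditer(text):
        start = max(0, m.start() - width)
        if start < last_end:
            continue  # this mention is already covered by the previous snippet
        end = min(len(text), m.end() + width)
        snippets.append("..." + text[start:end].strip() + "...")
        last_end = end
        if len(snippets) == max_snippets:
            break
    return snippets


# Made-up article text for illustration.
article = (
    "Its Tn5 insertion was mapped to the PA5168 gene. "
    + "Unrelated filler sentence. " * 10
    + "Furthermore, the PA5168 gene is annotated as part of an operon."
)
for snippet in select_snippets(article, "PA5168", width=30):
    print(snippet)
```

When only the abstract is available, the same function can be applied to the abstract text; it returns an empty list if the identifier never appears, which is the common case the paragraph above describes.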

PaperBLAST also incorporates manually-curated protein functions:

Proteins from NCBI's RefSeq are included if a GeneRIF entry links the gene to an article in PubMed®. GeneRIF also provides a short summary of the article's claim about the protein, which is shown instead of a snippet.
Proteins from Swiss-Prot (the curated part of UniProt) are included if the curators identified experimental evidence for the protein's function (evidence code ECO:0000269). For these proteins, the fields of the Swiss-Prot entry that describe the protein's function are shown (with bold headings).
Proteins from BRENDA, a curated database of enzymes, are included if they are linked to a paper in PubMed and their full sequence is known.
Every protein from the non-redundant subset of BioLiP, a database of ligand-binding sites and catalytic residues in protein structures, is included. Since BioLiP itself does not include descriptions of the proteins, those are taken from the Protein Data Bank. Descriptions from PDB rely on the original submitter of the structure and cannot be updated by others, so they may be less reliable. (For SitesBLAST and Sites on a Tree, we use a larger subset of BioLiP so that every ligand is represented among a group of structures with similar sequences, but for PaperBLAST, we use the non-redundant set provided by BioLiP.)
Every protein from EcoCyc, a curated database of the proteins in Escherichia coli K-12, is included, regardless of whether they are characterized or not.
Proteins from the MetaCyc metabolic pathway database are included if they are linked to a paper in PubMed and their full sequence is known.
Proteins from the Transport Classification Database (TCDB) are included if they have known substrate(s), have reference(s), and are not described as uncharacterized or putative. (Some of the references are not visible on the PaperBLAST web site.)
Every protein from CharProtDB, a database of experimentally characterized protein annotations, is included.
Proteins from the CAZy database of carbohydrate-active enzymes are included if they are associated with an Enzyme Classification number. Even though CAZy does not provide links from individual protein sequences to papers, these should all be experimentally-characterized proteins.
Proteins from the REBASE database of restriction enzymes are included if they have known specificity.
Every protein with an evidence-based reannotation (based on mutant phenotypes) in the Fitness Browser is included.
Sequence-specific transcription factors (including sigma factors and DNA-binding response regulators) with experimentally-determined DNA binding sites from the PRODORIC database of gene regulation in prokaryotes are included.
Putative transcription factors from RegPrecise are included if they have manually-curated predictions for their binding sites. These predictions are based on conserved putative regulatory sites across genomes that contain similar transcription factors, so PaperBLAST clusters the TFs at 70% identity and retains just one member of each cluster.
Coding sequence (CDS) features from the European Nucleotide Archive (ENA) are included if the /experiment tag is set (implying that there is experimental evidence for the annotation), the nucleotide entry links to paper(s) in PubMed, and the nucleotide entry is from the STD data class (implying that these are targeted annotated sequences, not from shotgun sequencing). Also, to filter out genes whose transcription or translation was detected, but whose function was not studied, nucleotide entries or papers with more than 25 such proteins are excluded. Descriptions from ENA rely on the original submitter of the sequence and cannot be updated by others, so they may be less reliable.
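The last ENA rule in the list above (excluding nucleotide entries or papers with more than 25 such proteins) can be sketched as a counting filter. The record layout, field order, and function name here are hypothetical; only the threshold of 25 comes from the text.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

MAX_PROTEINS = 25  # threshold stated in the text

# Hypothetical record layout: (protein_id, nucleotide_entry_id, pubmed_id)
Record = Tuple[str, str, str]


def filter_ena_records(records: List[Record], limit: int = MAX_PROTEINS) -> List[Record]:
    """Drop records whose nucleotide entry or paper has more than `limit` proteins."""
    per_entry: Dict[str, Set[str]] = defaultdict(set)
    per_paper: Dict[str, Set[str]] = defaultdict(set)
    for protein, entry, paper in records:
        per_entry[entry].add(protein)  # distinct proteins per nucleotide entry
        per_paper[paper].add(protein)  # distinct proteins per paper
    return [
        (protein, entry, paper)
        for protein, entry, paper in records
        if len(per_entry[entry]) <= limit and len(per_paper[paper]) <= limit
    ]


# Made-up data: one large entry with 26 proteins and one small entry with 2.
big = [(f"prot{i}", "ENTRY_BIG", "PMID_1") for i in range(26)]
small = [("protA", "ENTRY_SMALL", "PMID_2"), ("protB", "ENTRY_SMALL", "PMID_2")]
kept = filter_ena_records(big + small)
print(len(kept))
```

The intent of such a filter, per the text, is to keep targeted functional studies while dropping large surveys in which expression was detected but function was not studied.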

Except for GeneRIF and ENA, the curated entries include a short curated description of the protein's function. For entries from BioLiP, the protein's function may not be known beyond binding to the ligand. Many of these entries also link to articles in PubMed.

For more information see the PaperBLAST paper (mSystems 2017) or the code. You can download PaperBLAST's database here.

Changes to PaperBLAST since the paper was written:

November 2023: incorporated PRODORIC and RegPrecise. Many PRODORIC entries were not linked to a protein sequence (no UniProt identifier), so we added this information.
February 2023: BioLiP changed their download format. PaperBLAST now includes their non-redundant subset. SitesBLAST and Sites on a Tree use a larger non-redundant subset that ensures that every ligand is represented within each cluster. This should ensure that every binding site is represented.
June 2022: incorporated some coding sequences from ENA with the /experiment tag.
March 2022: incorporated BioLiP.
April 2020: incorporated TCDB.
April 2019: EuropePMC now returns table entries in their search results. This has expanded PaperBLAST's database, but most of the new entries are of low relevance, and the resulting snippets are often just lists of locus tags with annotations.
February 2018: the alignment page reports the conservation of the hit's functional sites (if available from Swiss-Prot or UniProt).
January 2018: incorporated BRENDA.
December 2017: incorporated MetaCyc, CharProtDB, CAZy, REBASE, and the reannotations from the Fitness Browser.
September 2017: EuropePMC no longer returns some table entries in their search results. This has shrunk PaperBLAST's database, but has also reduced the number of low-relevance hits.

Many of these changes are described in Interactive tools for functional annotation of bacterial genomes.

Secrets

PaperBLAST cannot provide snippets for many of the papers that are published in non-open-access journals. This limitation applies even if the paper is marked as "free" on the publisher's web site and is available in PubMed Central or EuropePMC. If a journal that you publish in is marked as "secret," please consider publishing elsewhere.

Omissions from the PaperBLAST Database

Many important articles are missing from PaperBLAST, either because the article's full text is not in EuropePMC (as for many older articles), or because the paper does not mention a protein identifier such as a locus tag, or because of PaperBLAST's heuristics. If you notice an article that characterizes a protein's function but is missing from PaperBLAST, please notify the curators at UniProt or add an entry to GeneRIF. Entries in either of these databases will eventually be incorporated into PaperBLAST. Note that to add an entry to UniProt, you will need to find the UniProt identifier for the protein. If the protein is not already in UniProt, you can ask them to create an entry. To add an entry to GeneRIF, you will need an NCBI Gene identifier, but unfortunately many prokaryotic proteins in RefSeq do not have corresponding Gene identifiers.

References

PaperBLAST: Text-mining papers for information about homologs.
M. N. Price and A. P. Arkin (2017). mSystems, 10.1128/mSystems.00039-17.

Europe PMC in 2017.
M. Levchenko et al (2017). Nucleic Acids Research, 10.1093/nar/gkx1005.

Gene indexing: characterization and analysis of NLM's GeneRIFs.
J. A. Mitchell et al (2003). AMIA Annu Symp Proc 2003:460-464.

UniProt: the universal protein knowledgebase.
The UniProt Consortium (2016). Nucleic Acids Research, 10.1093/nar/gkw1099.

BRENDA in 2017: new perspectives and new tools in BRENDA.
S. Placzek et al (2017). Nucleic Acids Research, 10.1093/nar/gkw952.

The EcoCyc database: reflecting new knowledge about Escherichia coli K-12.
I. M. Keseler et al (2016). Nucleic Acids Research, 10.1093/nar/gkw1003.

The MetaCyc database of metabolic pathways and enzymes.
R. Caspi et al (2018). Nucleic Acids Research, 10.1093/nar/gkx935.

CharProtDB: a database of experimentally characterized protein annotations.
R. Madupu et al (2012). Nucleic Acids Research, 10.1093/nar/gkr1133.

The carbohydrate-active enzymes database (CAZy) in 2013.
V. Lombard et al (2014). Nucleic Acids Research, 10.1093/nar/gkt1178.

The Transporter Classification Database (TCDB): recent advances
M. H. Saier, Jr. et al (2016). Nucleic Acids Research, 10.1093/nar/gkv1103.

REBASE - a database for DNA restriction and modification: enzymes, genes and genomes.
R. J. Roberts et al (2015). Nucleic Acids Research, 10.1093/nar/gku1046.

Deep annotation of protein function across diverse bacteria from mutant phenotypes.
M. N. Price et al (2016). bioRxiv, 10.1101/072470.

by Morgan Price, Arkin group
Lawrence Berkeley National Laboratory