PaperBLAST
PaperBLAST Hits for sp|A1WVR8|GATC_HALHL Aspartyl/glutamyl-tRNA(Asn/Gln) amidotransferase subunit C OS=Halorhodospira halophila (strain DSM 244 / SL1) OX=349124 GN=gatC PE=3 SV=1 (95 a.a., MAIDADEVQQ...)
>sp|A1WVR8|GATC_HALHL Aspartyl/glutamyl-tRNA(Asn/Gln) amidotransferase subunit C OS=Halorhodospira halophila (strain DSM 244 / SL1) OX=349124 GN=gatC PE=3 SV=1
MAIDADEVQQIAHLARIRIDEEAVSGYARDLTGILAFVEQMGNVDTDGVEPMAHPWDATQ
RLRPDEVTEPNLREHYQSGAPAVEAGLYLVPRVVE
Found 30 similar proteins in the literature:
PA4482 aspartyl/glutamyl-tRNA amidotransferase subunit C from Pseudomonas aeruginosa PAO1
55% identity, 99% coverage
ACIAD0824 aspartyl/glutamyl-tRNA(Asn/Gln) amidotransferase subunit C from Acinetobacter sp. ADP1
53% identity, 90% coverage
ZMO0784 glutamyl-tRNA(Gln) amidotransferase, C subunit from Zymomonas mobilis subsp. mobilis ZM4
43% identity, 97% coverage
- Investigation of the impact of a broad range of temperatures on the physiological and transcriptional profiles of Zymomonas mobilis ZM4 for high-temperature-tolerant recombinant strain development
Li, Biotechnology for biofuels 2021 - “...and cysteine synthesis-related genes ( ZMO1117 , ZMO0457 , ZMO1964 , ZMO0783 , ZMO0782 , ZMO0784 ; ZMO0480 , ZMO1105 , ZMO1962 , ZMO0752 , ZMO1508 ; ZMO0005 , ZMO0007 , and ZMO0008 ), as well as ribosome protein-related genes ( ZMO0884 , ZMO0532 , ZMO1079...”
- Genome-Scale Transcription-Translation Mapping Reveals Features of Zymomonas mobilis Transcription Units and Promoters
Vera, mSystems 2020 - “...173 ZMO0619 flgA Flagellum basal body P-ring formation protein 1,331 1,215 1,564 TSS_1798 783747 0 ZMO0784 gatC Glutamyl-tRNA(Gln) amidotransferase C subunit 1,261 1,257 1,050 TSS_3191 1420891 42 ZMO1406 Alpha/beta hydrolase fold protein 1,192 1,270 1,288 TSS_3157 1398315 + 0 ZMO1384 era GTP-binding protein 2,368 2,383 2,277...”
AZOBR_RS20640 Asp-tRNA(Asn)/Glu-tRNA(Gln) amidotransferase subunit GatC from Azospirillum baldaniorum
44% identity, 100% coverage
RL2075 putative glutamyl-tRNA(Gln) amidotransferase subunit C from Rhizobium leguminosarum bv. viciae 3841
38% identity, 100% coverage
BAB2_0645 Glu-tRNAGln amidotransferase, C subunit:Glutamyl-tRNA(Gln) amidotransferase C subunit from Brucella melitensis biovar Abortus 2308
38% identity, 100% coverage
DVU0809 glutamyl-tRNA(Gln) amidotransferase, C subunit from Desulfovibrio vulgaris Hildenborough
36% identity, 99% coverage
D9S200 Aspartyl/glutamyl-tRNA(Asn/Gln) amidotransferase subunit C from Thermosediminibacter oceani (strain ATCC BAA-1034 / DSM 16646 / JW/IW-1228P)
34% identity, 99% coverage
- Activation of PPARγ in bladder cancer via introduction of the long arm of human chromosome 9.
Shimizu, Oncology letters 2022 - “...using 21 specific sequence-tagged site (STS) markers (D9S54, 9p24.2; D9S268, 9p23; D9S285, 9p22.3; D9S165, 9p21.1; D9S200, 9p13.1; SHGC-103793, 9p12; UT801, 9p11.2; SHGC-141463, 9q12; SHGC-146514, 9q13; D9S15, 9q21.12; D9S1122, 9q21.2; D9S153, 9q21.31; D9S777, 9q22.1; D9S318, 9q22.2; D9S287, 9q22.32; D9S277, 9q31.1; D9S177, 9q33.1; D9S290, 9q34.11; D9S66, 9q34.2; and...”
GSU3383 glutamyl-tRNA(Gln) amidotransferase, C subunit from Geobacter sulfurreducens PCA
35% identity, 100% coverage
gatC / O06492 glutamyl-tRNAGln amidotransferase subunit C (EC 6.3.5.7) from Bacillus subtilis (strain 168) (see paper)
O06492 Glutamyl-tRNA(Gln) amidotransferase subunit C from Bacillus subtilis (strain 168)
BSU06670 aspartyl/glutamyl-tRNA amidotransferase subunit C from Bacillus subtilis subsp. subtilis str. 168
33% identity, 97% coverage
- Direct glutaminyl-tRNA biosynthesis and indirect asparaginyl-tRNA biosynthesis in Pseudomonas aeruginosa PAO1
Akochy, Journal of bacteriology 2004 - “...AAG07871, AAG07872; C_trac, NP_219504, NP_219505, NP_219506; B_subt, O06492, CAB12488, O30509. The AspRS sequences used were as follows: P_aeru, NP_249654;...”
- The Blueprint of a Minimal Cell: MiniBacillus
Reuß, Microbiology and molecular biology reviews : MMBR 2016 - “...asnS aspS cysS gatC gatA gatB gltX BSU22360 BSU27550 BSU00940 BSU06670 BSU06680 BSU06690 BSU00920 Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes 6.1.1.22...”
- Secondary structural entropy in RNA switch (Riboswitch) identification
Manzourolajdad, BMC bioinformatics 2015 - “...nt 728532 728731 forward BSU06640 yerI -2436 -46.30 - 0.3100 122.2779999 - 0.3600 102 gatC BSU06670 0.8268005848 200 nt 1540686 1540885 forward BSU14680 ykzC -1895 -60.72 - 0.4350 120.6729965 - 0.3350 1102 ylaA BSU14710 0.8265900016 200 nt 3746052 3746251 forward BSU36380 rapD -540 -57.10 - 0.4350...”
AH68_09115 Asp-tRNA(Asn)/Glu-tRNA(Gln) amidotransferase subunit GatC from Bifidobacterium catenulatum PV20-2
35% identity, 91% coverage
Q92J74 Glutamyl-tRNA(Gln) amidotransferase subunit C from Rickettsia conorii (strain ATCC VR-613 / Malish 7)
28% identity, 93% coverage
H375_4740 Asp-tRNA(Asn)/Glu-tRNA(Gln) amidotransferase subunit GatC from Rickettsia prowazekii str. Breinl
27% identity, 93% coverage
RT0142 glutaminyl-tRNA synthase (glutamine-hydrolyzing) subunit C from Rickettsia typhi str. wilmington
27% identity, 93% coverage
- GroEL is an immunodominant surface-exposed antigen of Rickettsia typhi
Rauch, PloS one 2021 - “...L27 (RT0737), the ribosome recycling factor (RT0143) and the aspartyl/glutamyl-tRNA (Asn/Gln) amidotransferase subunit C (GatC, RT0142). One downregulated protein, HIT-like protein (RP317), might be involved in gene regulation by binding to small nucleolar ribonucleic acids (snoRNAs) [ 107 ]. Five of the downregulated proteins are housekeeping...”
MAB_3342c Glutamyl-tRNA(Gln) amidotransferase subunit C (GatC) from Mycobacterium abscessus ATCC 19977
36% identity, 91% coverage
SERP1439 glutamyl-tRNA(Gln) amidotransferase, C subunit from Staphylococcus epidermidis RP62A
36% identity, 89% coverage
CLP_3829 Asp-tRNA(Asn)/Glu-tRNA(Gln) amidotransferase subunit GatC from Clostridium butyricum E4 str. BoNT E BL5262
31% identity, 99% coverage
HMPREF0389_00480 Asp-tRNA(Asn)/Glu-tRNA(Gln) amidotransferase subunit GatC from Filifactor alocis ATCC 35896
29% identity, 91% coverage
- Proteome variation among Filifactor alocis strains
Aruni, Proteomics 2012 - “...UDP-N-muramyl tripeptide synthetase 57.06 145/0.13 16 C-7.50 EC-0.73 Cell wall tripeptide synthetase domain Nonsecretory 27. HMPREF0389_00480 Amido transferase family protein 54.7 345/0.54 34 C-7.5 EC-0.73 GATB domain Nonsecretory 28. HMPREF0389_01130 Ferrous hydrogenase 54.3 32/0.07 3 C-7.5 EC-0.73 Ferrodoxin type Fe-S binding domain 1.88 N terminal signal...”
Bd0058 glutamyl-tRNA(Gln) amidotransferase (subunit C) from Bdellovibrio bacteriovorus HD100
34% identity, 97% coverage
SAOUHSC_02118 glutamyl-tRNA(Gln) amidotransferase, C subunit from Staphylococcus aureus subsp. aureus NCTC 8325
Q2FFJ4 Aspartyl/glutamyl-tRNA(Asn/Gln) amidotransferase subunit C from Staphylococcus aureus (strain USA300)
SA1717 glutamyl-tRNAGln amidotransferase subunit C from Staphylococcus aureus subsp. aureus N315
SAUSA300_1882 aspartyl/glutamyl-tRNA amidotransferase subunit C from Staphylococcus aureus subsp. aureus USA300_FPR3757
MW1842 glutamyl-tRNAGln amidotransferase subunit C from Staphylococcus aureus subsp. aureus MW2
36% identity, 89% coverage
- The Staphylococcus aureus KdpDE two-component system couples extracellular K+ sensing and Agr signaling to infection programming
Xue, Infection and immunity 2011 - “...SAOUHSC_01030 SAOUHSC_00465 SAOUHSC_00898 SAOUHSC_00899 SAOUHSC_02118 SAOUHSC_01191 SAOUHSC_01216 SAOUHSC_01218 SAOUHSC_00195 SAOUHSC_00196 SAOUHSC_00197...”
- The Spl Serine Proteases Modulate Staphylococcus aureus Protein Production and Virulence in a Rabbit Model of Pneumonia
Paharik, mSphere 2016 - “...0.0000033 Q2FJA3 ( RL11_STAA3 ) 50S ribosomal protein R11 RplK Housekeeping, translation 250.00 320.67 2.4E09 Q2FFJ4 ( GATC_STAA3 ) Aspartyl/glutamyl-tRNA(Asn/Gln) amidotransferase subunit C GATC Housekeeping, translation 41.67 57.00 0.0019 Q2FER1 ( IF1_STAA3 ) Translation initiation factor IF-1 InfA Housekeeping, translation 15.33 26.33 0.0011 Q2FEQ4 ( RL6_STAA3...”
- “...5.9E10 Q2FG80 ( RL21_STAA3 ) 50S ribosomal protein L21 RplU Housekeeping, translation 5.67 14.67 0.00037 Q2FFJ4 ( GATC_STAA3 ) Aspartyl/glutamyl-tRNA(Asn/Gln) amidotransferase subunit C GATC Housekeeping, translation 0.00 2.67 0.0039 Q2FEX8 ( Q2FEX8_STAA3 ) Glucosamine-fructose-6-phosphate aminotransferase (isomerizing) GlmS Metabolism, amino acid metabolism 0.00 5.67 0.0000077 Q2FJ90 (...”
- Transcriptional profiling analysis of the global regulator NorG, a GntR-like protein of Staphylococcus aureus
Truong-Bolduc, Journal of bacteriology 2011 - “...SA0946 SA2011 SA2138 SA2265 SA2266 SA2272 SA1716 SA1717 SA1718 SA1887 SA1888 SA1889 SA2173 Na/H antiporter family protein Sodium transport family protein...”
- walK and clpP mutations confer reduced vancomycin susceptibility in Staphylococcus aureus
Shoji, Antimicrobial agents and chemotherapy 2011 - “...SA1456 SA1457 SA1488 SA1506 SA1550 SA1563 SA1579 SA1715 SA1716 SA1717 serS metS lysS gltX cysS argS trpS pheS pheT ileS proS asnC alaS aspS hisS valS thrS...”
- Characterizing the effects of inorganic acid and alkaline shock on the Staphylococcus aureus transcriptome and messenger RNA turnover
Anderson, FEMS immunology and medical microbiology 2010 - “...pyrophosphokinase sa_c9351s8180_a_at 2.4 2.5 2.5 hemB SA1715 delta-aminolevulinic acid dehydratase sa_c2992s2549_a_at 4.2 2.5 2.5 hemC SA1717 porphobilinogen deaminase sa_c2990s2545_a_at 3.1 2.5 2.5 hemD SA1716 uroporphyrinogen-III synthase sa_c3585s3067_a_at 2.8 2.5 2.5 hemE SA1889 uroporphyrinogen decarboxylase sa_c3577s3059_a_at 4.1 2.5 2.5 hemG SA1887 protoporphyrinogen oxidase sa_c3584s3063_a_at 2.8 2.5 2.5...”
- Transcriptome and functional analysis of the eukaryotic-type serine/threonine kinase PknB in Staphylococcus aureus
Donat, Journal of bacteriology 2009 - “...GntR family SA1704 map Methionyl aminopeptidase SA1717 Glutamyl-tRNAGln amidotransferase subunit C SA1961 Hypothetical protein, similar to transcription...”
- Differential gene expression profiling of Staphylococcus aureus cultivated under biofilm and planktonic conditions
Resch, Applied and environmental microbiology 2005 - “...3.218 3.554 3.337 3.284 3.107 2.956 2.946 SA0731 SA2204 SA1717 SA2191 SA0802 SA0843 SA2077 SA2027 SA1548 2.874 2.862 2.861 2.833 2.832 2.621 2.614 2.557 2.516...”
- Global regulation of Staphylococcus aureus genes by Rot
Saïd-Salim, Journal of bacteriology 2003 - “...SA0373 SA0220 SAS013 SA0739 SA0651 SA1613 SA0428 SA2378 SA1717 SA0523 SA2133 SA0682 SA2261 SAS088 SA2007 SA0620 SA2284 SA2170 SA2171 SA0914 SA0407 SA0408 SA2303...”
- Novel Regulation of Alpha-Toxin and the Phenol-Soluble Modulins by Peptidyl-Prolyl cis/trans Isomerase Enzymes in Staphylococcus aureus
Keogh, Toxins 2019 - “...SAUSA300_1178 RecA 0.42 DNA metabolism SAUSA300_1269 FemA 0.42 Cellular processes (includes toxins and virulence factors) SAUSA300_1882 GatC 0.41 Signal transduction SAUSA300_1614 HemL1 0.41 Biosynthesis of cofactors, prosthetic groups, and carriers SAUSA300_0067 0.41 Unknown function SAUSA300_1634 CoaE 0.40 Biosynthesis of cofactors, prosthetic groups, and carriers SAUSA300_1288 DapA...”
- The SaeR/S gene regulatory system is essential for innate immune evasion by Staphylococcus aureus
Voyich, The Journal of infectious diseases 2009 - “...hippurate hydrolase 2.24 ES MW1517 glyS glycyl-tRNA synthetase 2.08 ES MW1721 - transaldolase 2.01 ES MW1842 - glutamyl-tRNAGln amidotransferase subunit C 2.43 ES MW1954 groES GroES protein 2.03 ES MW2115 lacG 6-phospho-beta-galactosidase 17.19 ES MW2116 lacE PTS system lactose-specific IIBC component 18.69 ES MW2117 lacF PTS...”
slr0033 unknown protein from Synechocystis sp. PCC 6803
32% identity, 88% coverage
cce_0284 glutamyl-tRNA (Gln) amidotransferase subunit C from Cyanothece sp. ATCC 51142
32% identity, 94% coverage
CBO3267 aspartyl/glutamyl-tRNA amidotransferase subunit C from Clostridium botulinum A str. ATCC 3502
26% identity, 100% coverage
- Gene expression profiling of Clostridium botulinum under heat shock stress
Liang, BioMed research international 2013 - “...addition, gatB (CBO3265, aspartyl/glutamyl-tRNA amidotransferase subunit B1), gatA (CBO3266, glutamyl-tRNA amidotransferase subunit A), and gatC (CBO3267, glutamyl-tRNA amidotransferase subunit C) compose an operon and were downregulated by heat shock stress. Moreover, proS2 (CBO3503, prolyl-tRNA synthetase), aspS (CBO1019, aspartyl-tRNA synthetase), and tyrS (CBO3323, tyrosyl-tRNA synthetase) were also...”
Gmet_0076 Glu-tRNAGln amidotransferase, C subunit from Geobacter metallireducens GS-15
32% identity, 100% coverage
FN0755 Glutamyl-tRNA(Gln) amidotransferase subunit C from Fusobacterium nucleatum subsp. nucleatum ATCC 25586
27% identity, 98% coverage
- Proteomics of Fusobacterium nucleatum within a model developing oral microbial community
Hendrickson, MicrobiologyOpen 2014 - “...FN0040, FN0054, FN0067, FN0069, FN0070, FN0110, FN0298, FN0299, FN0405, FN0466, FN0506, FN0611, FN0697, FN0753, FN0754, FN0755, FN1268, FN1340, FN1489, FN1517, FN1579, FN1597, FN1658, FN1977, FN2011, FN2122, FN2123. Ribosomal proteins are noted to correlate with growth rates (Nomura et al. 1984). However, the cells were not given...”
gatC / Q9RUV6 glutamyl-tRNAGln amidotransferase subunit C (EC 6.3.5.7; EC 6.3.5.6) from Deinococcus radiodurans (strain ATCC 13939 / DSM 20539 / JCM 16871 / CCUG 27074 / LMG 4051 / NBRC 15346 / NCIMB 9279 / VKM B-1422 / R1) (see paper)
DR1275 Glu-tRNA(Gln) amidotransferase, subunit C from Deinococcus radiodurans R1
32% identity, 97% coverage
Rv3012c aspartyl/glutamyl-tRNA amidotransferase subunit C from Mycobacterium tuberculosis H37Rv
32% identity, 93% coverage
- The efflux pumps Rv1877 and Rv0191 play differential roles in the protection of Mycobacterium tuberculosis against chemical stress
Sao, Frontiers in microbiology 2024 - “...while two genes involved in nitrogen metabolism were differentially regulated in all strains, which were rv3012c encoding a glutamyl-tRNA(GLN) amidotransferase-subunit C ( Wolfe et al., 2010 ; downregulated) and rv2780 encoding l-alanine dehydrogenase ( Giffin et al., 2012 ; upregulated), more genes in this context were...”
- “...had an altered expression in this context in the rv0191 mutant (including other strains) were rv3012c which encodes a glutamyl-tRNA(GLN) amidotransferase-subunit C that was identified to be a protein of the cell wal ( Wolfe et al., 2010 ; downregulated) and rv2780 which encodes l-alanine dehydrogenase...”
- A High Throughput Whole Blood Assay for Analysis of Multiple Antigen-Specific T Cell Responses in Human Mycobacterium tuberculosis Infection
Whatney, Journal of immunology (Baltimore, Md. : 1950) 2018 - “...Membrane Pool 42 Rv2875 MPT70 1 Secreted Pool 43 Rv2996c SerA1 2 Membrane Pool 44 Rv3012c GatC 4 Membrane Pool 45 Rv3015c Rv3015c 4 Cytoplasm Pool 46 Rv3018c PPE46 PPE family Unknown Pool 47 Rv3019c EsxR (TB10.3) 1 Predicted secreted Pool 48 Rv3020c EsxS 1 Predicted...”
- A side-by-side comparison of T cell reactivity to fifty-nine Mycobacterium tuberculosis antigens in diverse populations from five continents
Carpenter, Tuberculosis (Edinburgh, Scotland) 2015 - “...known Rv3018c PPE46 Prev. known Inform- ation pathways Rv1317c - Novel Rv3021c PPE47 Prev. known Rv3012c - Novel Rv3022c PPE48 Novel Rv3024c - Novel Rv3135 PPE50 Novel Insertion sequences and phages Rv1199c - Prev. known Rv3136 PPE51 Prev. known Rv3023c - Prev. known Virulence, detoxi- fication,...”
- “...33% Novel Rv2875 100% Prev. known Rv0294 67% Novel Rv2996c 100% Novel Rv0298 44% Novel Rv3012c 78% Novel Rv0299 67% Novel Rv3015c 33% Prev. known Rv0453 100% Prev. known Rv3018c 100% Prev. known Rv0690c 78% Novel Rv3019c 100% Prev. known Rv0985c 67% Novel Rv3020c 100% Prev....”
- Memory T cells in latent Mycobacterium tuberculosis infection are directed against three antigenic islands and largely contained in a CXCR3+CCR6+ Th1 subset
Lindestam, PLoS pathogens 2013 - “...1 - Rv0297 14% 154 PE/PPE Non-island - Rv0299 14% 467 Conserved hypotheticals Non-island - Rv3012c 14% 233 Information pathways Non-island - Rv3025c 14% 423 Intermediary metabolism and respiration Island 2 - Rv0278c 11% 45 PE/PPE Non-island - Rv0279c 11% 45 PE/PPE Non-island - Rv0298 11%...”
- Characterization of a Clp protease gene regulator and the reaeration response in Mycobacterium tuberculosis
Sherrid, PloS one 2010 - “...utilization protein viuB Rv0753c methylmalonate-semialdehyde dehydrogenase mmsA Rv2913c D-amino acid aminohydrolase Rv0762c conserved hypothetical protein Rv3012c glutamyl-tRNA(gln) amidotransferase subunit C gatC Rv0790c hypothetical protein Rv3046c conserved hypothetical protein Rv0791c conserved hypothetical protein Rv3047c hypothetical protein Rv0793 conserved hypothetical protein Rv3048c ribonucleoside-diphosphate reductase beta chain Rv0885 conserved...”
- Proteomic definition of the cell wall of Mycobacterium tuberculosis
Wolfe, Journal of proteome research 2010 - “...DRRA 3 III.A.6 D Rv2938 drrC PROBABLE DAUNORUBICIN-DIM-TRANSPORT ABC TRANSPORTER DRRC 3 III.A.6 A, B Rv3012c gatC PROBABLE GLUTAMYL-TRNA(GLN) AMIDOTRANSFERASE (GLU-ADT SUBUNIT C) 2 II.A.3 D Rv3083 Rv3083 PROBABLE MONOOXYGENASE (HYDROYLASE) 7 I.B.7 C Rv3086 adhD PROBABLE ZINC-TYPE ALCOHOL DEHYDROGENASE ADHD (ALDEHYDE REDUCTASE) 7 I.B.7 B...”
- AsnB is involved in natural resistance of Mycobacterium smegmatis to multiple drugs
Ren, Antimicrobial agents and chemotherapy 2006 - “...following genes for the transamidation pathway: aspS (Rv2572c), gatCAB (Rv3012c, Rv3011c, Rv3009c). VOL. 50, 2006 14. 15. 16. 18. 19. 20. 21. 22. 23. 255 24....”
TM0252 glutamyl tRNA-Gln amidotransferase, subunit C from Thermotoga maritima MSB8
29% identity, 95% coverage
FORC47_RS01875 Asp-tRNA(Asn)/Glu-tRNA(Gln) amidotransferase subunit GatC from Bacillus cereus
32% identity, 97% coverage
3al0C / Q9WY94,Q9X2I8 Crystal structure of the glutamine transamidosome from thermotoga maritima in the glutamylation state. (see paper)
29% identity, 16% coverage
- Ligands: rna; o5'-(l-glutamyl-sulfamoyl)-adenosine (3al0C)
For advice on how to use these tools together, see
Interactive tools for functional annotation of bacterial genomes.
The PaperBLAST database links 789,361 different protein sequences to 1,256,019 scientific articles. Searches against EuropePMC were last performed on January 10 2025.
PaperBLAST builds a database of protein sequences that are linked
to scientific articles. These links come from automated text searches
against the articles in EuropePMC
and from manually-curated information from GeneRIF, UniProtKB/Swiss-Prot,
BRENDA, CAZy (as made available by dbCAN), BioLiP, CharProtDB, MetaCyc,
EcoCyc, TCDB, REBASE, the Fitness Browser,
and a subset of the European Nucleotide Archive with the /experiment tag.
Given this database and a protein sequence query,
PaperBLAST uses protein-protein BLAST
to find similar sequences with E < 0.001.
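As a rough illustration of this cutoff, the sketch below filters BLAST+ tabular output (`-outfmt 6`, whose default columns end with `evalue` and `bitscore`) and keeps only hits with E < 0.001. The `Hit` type, the `parse_hits` helper, and the two sample rows are invented for this example. Computing coverage as the aligned fraction of the query is an assumption and may not match how PaperBLAST derives the coverage figures shown in its results.

```python
from typing import List, NamedTuple

E_VALUE_CUTOFF = 1e-3  # threshold stated in the text (E < 0.001)


class Hit(NamedTuple):
    subject: str
    identity: float  # percent identity over the aligned region
    coverage: float  # percent of the query spanned by the alignment (assumed definition)
    evalue: float


def parse_hits(tabular: str, query_length: int,
               cutoff: float = E_VALUE_CUTOFF) -> List[Hit]:
    """Parse BLAST+ -outfmt 6 text and keep hits whose E-value is below the cutoff."""
    hits = []
    for line in tabular.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Default outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
        # qstart qend sstart send evalue bitscore
        pident = float(fields[2])
        qstart, qend = int(fields[6]), int(fields[7])
        evalue = float(fields[10])
        if evalue < cutoff:
            coverage = 100.0 * (qend - qstart + 1) / query_length
            hits.append(Hit(fields[1], pident, coverage, evalue))
    return hits


# Two made-up rows for a 95-residue query: one strong hit, one above the cutoff.
sample = "\n".join([
    "query\thitA\t55.0\t94\t42\t0\t1\t94\t1\t94\t1e-30\t110",
    "query\thitB\t30.0\t40\t28\t0\t10\t49\t5\t44\t0.5\t25.0",
])
hits = parse_hits(sample, query_length=95)
for hit in hits:
    print(f"{hit.subject}: {hit.identity:.0f}% identity, {hit.coverage:.0f}% coverage")
```

Only `hitA` survives the filter here, because the second row has E = 0.5.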
To build the database, we query EuropePMC with locus tags, with RefSeq protein
identifiers, and with UniProt
accessions. We obtain the locus tags from RefSeq or from MicrobesOnline. We use
queries of the form "locus_tag AND genus_name" to try to ensure that
the paper is actually discussing that gene. Because EuropePMC indexes
most recent biomedical papers, even if they are not open access, some
of the links may be to papers that you cannot read or that our
computers cannot read. We query each of these identifiers that
appears in the open access part of EuropePMC, as well as every locus
tag that appears in the 500 most-referenced genomes, so that a gene
may appear in the PaperBLAST results even though none of the papers
that mention it are open access. We also incorporate text-mined links
from EuropePMC that link open access articles to UniProt or RefSeq
identifiers. (This yields some additional links because EuropePMC
uses different heuristics for their text mining than we do.)
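The query form described above can be sketched in a few lines. The `build_query` helper is hypothetical, and taking the genus to be the first word of the organism name is an assumption made for this example. The sketch only builds the query strings and does not contact EuropePMC.

```python
def build_query(locus_tag: str, organism: str) -> str:
    """Return a query of the form "locus_tag AND genus_name"."""
    # Assumption: the genus is the first whitespace-separated word of the name.
    genus = organism.split()[0]
    return f"{locus_tag} AND {genus}"


# Locus tags and organism names taken from the hit list on this page.
queries = [
    build_query(tag, organism)
    for tag, organism in [
        ("PA4482", "Pseudomonas aeruginosa PAO1"),
        ("ZMO0784", "Zymomonas mobilis subsp. mobilis ZM4"),
    ]
]
for query in queries:
    print(query)
```

Requiring the genus name is a guard against identifiers that also occur as unrelated strings. The D9S200 hit in the results above, whose snippet is about a human chromosome 9 marker of the same name, shows the kind of mismatch that can occur when an identifier is matched without such context.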
For every article that mentions a locus tag, a RefSeq protein
identifier, or a UniProt accession, we try to select one or two
snippets of text that refer to the protein. If we cannot get access to
the full text, we try to select a snippet from the abstract, but
unfortunately, unique identifiers such as locus tags are rarely
provided in abstracts.
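One simple way to pick such snippets is to take a window of words around each mention of the identifier. The sketch below is only an illustration of that idea and is not PaperBLAST's actual heuristic: the `select_snippets` helper, the window width, and the case-insensitive matching are all assumptions.

```python
import string
from typing import List


def select_snippets(text: str, identifier: str,
                    width: int = 10, max_snippets: int = 2) -> List[str]:
    """Return up to max_snippets word windows centred on mentions of identifier."""
    words = text.split()
    target = identifier.casefold()
    snippets = []
    next_allowed = 0  # skip mentions already covered by the previous window
    for i, word in enumerate(words):
        # Compare after trimming surrounding punctuation such as "(" or ",".
        if i < next_allowed or word.strip(string.punctuation).casefold() != target:
            continue
        start = max(0, i - width)
        end = min(len(words), i + width + 1)
        snippets.append("..." + " ".join(words[start:end]) + "...")
        next_allowed = end
        if len(snippets) == max_snippets:
            break
    return snippets


# Text adapted from one of the snippets shown in the results above.
text = ("gatA (CBO3266, glutamyl-tRNA amidotransferase subunit A), and gatC "
        "(CBO3267, glutamyl-tRNA amidotransferase subunit C) compose an operon")
snippets = select_snippets(text, "CBO3267", width=3)
print(snippets)
```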
PaperBLAST also incorporates manually-curated protein functions:
- Proteins from NCBI's RefSeq are included if a GeneRIF entry links
the gene to an article in PubMed®.
GeneRIF also provides a short summary of the article's claim about the
protein, which is shown instead of a snippet.
- Proteins from Swiss-Prot (the curated part of UniProt)
are included if the curators
identified experimental evidence for the protein's function (evidence
code ECO:0000269). For these proteins, the fields of the Swiss-Prot entry that
describe the protein's function are shown (with bold headings).
- Proteins from BRENDA,
a curated database of enzymes, are included if they are linked to a paper in PubMed
and their full sequence is known.
- Every protein from the non-redundant subset of BioLiP, a database
of ligand-binding sites and catalytic residues in protein structures, is included. Since BioLiP itself
does not include descriptions of the proteins, those are taken from the
Protein Data Bank.
Descriptions from PDB rely on the original submitter of the
structure and cannot be updated by others, so they may be less reliable.
(For SitesBLAST and Sites on a Tree, we use a larger subset of BioLiP so that every
ligand is represented among a group of structures with similar sequences, but for
PaperBLAST, we use the non-redundant set provided by BioLiP.)
- Every protein from EcoCyc, a curated
database of the proteins in Escherichia coli K-12, is included, regardless
of whether it is characterized.
- Proteins from the MetaCyc metabolic pathway database
are included if they are linked to a paper in PubMed and their full sequence is known.
- Proteins from the Transport Classification Database (TCDB)
are included if they have known substrate(s), have reference(s),
and are not described as uncharacterized or putative.
(Some of the references are not visible on the PaperBLAST web site.)
- Every protein from CharProtDB,
a database of experimentally characterized protein annotations, is included.
- Proteins from the CAZy database of carbohydrate-active enzymes
are included if they are associated with an Enzyme Classification number.
Even though CAZy does not provide links from individual protein sequences to papers,
these should all be experimentally-characterized proteins.
- Proteins from the REBASE database
of restriction enzymes are included if they have known specificity.
- Every protein with an evidence-based reannotation (based on mutant phenotypes)
in the Fitness Browser is included.
- Sequence-specific transcription factors (including sigma factors and DNA-binding response regulators)
with experimentally-determined DNA binding sites from the
PRODORIC database of gene regulation in prokaryotes are included.
- Putative transcription factors from RegPrecise
that have manually-curated predictions for their binding sites are included. These predictions are based on
conserved putative regulatory sites across genomes that contain similar transcription factors,
so PaperBLAST clusters the TFs at 70% identity and retains just one member of each cluster.
- Coding sequence (CDS) features from the
European Nucleotide Archive (ENA)
are included if the /experiment tag is set (implying that there is experimental evidence for the annotation),
the nucleotide entry links to paper(s) in PubMed,
and the nucleotide entry is from the STD data class
(implying that these are targeted annotated sequences, not from shotgun sequencing).
Also, to filter out genes whose transcription or translation was detected, but whose function
was not studied, nucleotide entries or papers with more than 25 such proteins are excluded.
Descriptions from ENA rely on the original submitter of the
sequence and cannot be updated by others, so they may be less reliable.
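The ENA size filter in the last item can be read in more than one way. The sketch below shows one possible reading: proteins from a nucleotide entry with more than 25 qualifying proteins are dropped, papers linked to more than 25 such proteins are dropped, and a protein left with no papers is dropped as well. The `filter_ena` helper, the record layout, and the sample identifiers are invented for this example.

```python
from collections import Counter
from typing import List, Tuple

MAX_PROTEINS = 25  # threshold stated in the text

# Hypothetical record layout: (protein_id, nucleotide_entry, [paper_ids])
Record = Tuple[str, str, List[str]]


def filter_ena(records: List[Record]) -> List[Record]:
    """Drop proteins from oversized nucleotide entries and links to oversized papers."""
    per_entry = Counter(entry for _, entry, _ in records)
    per_paper = Counter(p for _, _, papers in records for p in set(papers))
    kept = []
    for protein, entry, papers in records:
        if per_entry[entry] > MAX_PROTEINS:
            continue  # entry annotates too many proteins to be a targeted study
        small_papers = [p for p in papers if per_paper[p] <= MAX_PROTEINS]
        if small_papers:
            kept.append((protein, entry, small_papers))
    return kept


# Made-up data: one large entry (26 proteins, paper P1) and two small ones.
records = [(f"big{i}", "E1", ["P1"]) for i in range(26)]
records += [("a1", "E2", ["P2"]), ("a2", "E2", ["P2"])]
records += [("x1", "E3", ["P1", "P3"])]
kept = filter_ena(records)
print(kept)
```

In this toy data, `x1` is kept but loses its link to `P1`, because that paper is associated with 27 proteins in total.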
Except for GeneRIF and ENA,
the curated entries include a short curated
description of the protein's function.
For entries from BioLiP, the protein's function may not be known beyond binding to the ligand.
Many of these entries also link to articles in PubMed.
For more information see the
PaperBLAST paper (mSystems 2017)
or the code.
You can download PaperBLAST's database here.
Changes to PaperBLAST since the paper was written:
- November 2023: incorporated PRODORIC and RegPrecise. Many PRODORIC entries were not linked to a protein sequence (no UniProt identifier), so we added this information.
- February 2023: BioLiP changed their download format. PaperBLAST now includes their non-redundant subset. SitesBLAST and Sites on a Tree use a larger non-redundant subset that ensures that every ligand is represented within each cluster. This should ensure that every binding site is represented.
- June 2022: incorporated some coding sequences from ENA with the /experiment tag.
- March 2022: incorporated BioLiP.
- April 2020: incorporated TCDB.
- April 2019: EuropePMC now returns table entries in their search results. This has expanded PaperBLAST's database, but most of the new entries are of low relevance, and the resulting snippets are often just lists of locus tags with annotations.
- February 2018: the alignment page reports the conservation of the hit's functional sites (if available from Swiss-Prot or UniProt).
- January 2018: incorporated BRENDA.
- December 2017: incorporated MetaCyc, CharProtDB, CAZy, REBASE, and the reannotations from the Fitness Browser.
- September 2017: EuropePMC no longer returns some table entries in their search results. This has shrunk PaperBLAST's database, but has also reduced the number of low-relevance hits.
Many of these changes are described in Interactive tools for functional annotation of bacterial genomes.
PaperBLAST cannot provide snippets for many of the papers that are
published in non-open-access journals. This limitation applies even if
the paper is marked as "free" on the publisher's web site and is
available in PubMed Central or EuropePMC. If a journal that you publish
in is marked as "secret," please consider publishing elsewhere.
Many important articles are missing from PaperBLAST, either because
the article's full text is not in EuropePMC (as for many older
articles), or because the paper does not mention a protein identifier such as a locus tag, or because of PaperBLAST's heuristics. If you notice an
article that characterizes a protein's function but is missing from
PaperBLAST, please notify the curators at UniProt
or add an entry to GeneRIF.
Entries in either of these databases will eventually be incorporated
into PaperBLAST. Note that to add an entry to UniProt, you will need
to find the UniProt identifier for the protein. If the protein is not
already in UniProt, you can ask them to create an entry. To add an
entry to GeneRIF, you will need an NCBI Gene identifier, but
unfortunately many prokaryotic proteins in RefSeq do not have
corresponding Gene identifiers.
References
PaperBLAST: Text-mining papers for information about homologs.
M. N. Price and A. P. Arkin (2017). mSystems, 10.1128/mSystems.00039-17.
Europe PMC in 2017.
M. Levchenko et al (2017). Nucleic Acids Research, 10.1093/nar/gkx1005.
Gene indexing: characterization and analysis of NLM's GeneRIFs.
J. A. Mitchell et al (2003). AMIA Annu Symp Proc 2003:460-464.
UniProt: the universal protein knowledgebase.
The UniProt Consortium (2016). Nucleic Acids Research, 10.1093/nar/gkw1099.
BRENDA in 2017: new perspectives and new tools in BRENDA.
S. Placzek et al (2017). Nucleic Acids Research, 10.1093/nar/gkw952.
The EcoCyc database: reflecting new knowledge about Escherichia coli K-12.
I. M. Keseler et al (2016). Nucleic Acids Research, 10.1093/nar/gkw1003.
The MetaCyc database of metabolic pathways and enzymes.
R. Caspi et al (2018). Nucleic Acids Research, 10.1093/nar/gkx935.
CharProtDB: a database of experimentally characterized protein annotations.
R. Madupu et al (2012). Nucleic Acids Research, 10.1093/nar/gkr1133.
The carbohydrate-active enzymes database (CAZy) in 2013.
V. Lombard et al (2014). Nucleic Acids Research, 10.1093/nar/gkt1178.
The Transporter Classification Database (TCDB): recent advances.
M. H. Saier, Jr. et al (2016). Nucleic Acids Research, 10.1093/nar/gkv1103.
REBASE - a database for DNA restriction and modification: enzymes, genes and genomes.
R. J. Roberts et al (2015). Nucleic Acids Research, 10.1093/nar/gku1046.
Deep annotation of protein function across diverse bacteria from mutant phenotypes.
M. N. Price et al (2016). bioRxiv, 10.1101/072470.
by Morgan Price,
Arkin group
Lawrence Berkeley National Laboratory