PaperBLAST
Full List of Papers Linked to VIMSS10086476
RR17_ARATH / P16180 Small ribosomal subunit protein uS17c; 30S ribosomal protein S17, chloroplastic; CS17 from Arabidopsis thaliana (Mouse-ear cress) (see paper)
rps17 / CAA77502.1 Plastid ribosomal protein CS17 from Arabidopsis thaliana (see 2 papers)
AT1G79850 RPS17 (RIBOSOMAL PROTEIN S17); structural constituent of ribosome from Arabidopsis thaliana
- function: One of the primary rRNA binding proteins, it binds specifically to the 5'-end of 16S ribosomal RNA (By similarity). Required for optimal plastid performance in terms of photosynthesis and growth. Required for the translation of plastid mRNAs. Plays a critical role in biosynthesis of thylakoid membrane proteins encoded by chloroplast genes (PubMed:22900828).
subunit: Part of the 30S ribosomal subunit
disruption phenotype: Reduced plant size and pale green leaves.
- Plastid ribosome protein L5 is essential for post-globular embryo development in Arabidopsis thaliana
Dupouy, Plant reproduction 2022 - “...( 2011 ) S16 Essential ATCG00050 Essential Fleischmann et al. ( 2011 ) S17 Non-essential AT1G79850 Non-essential Woo et al. 2002 ; Romani et al. ( 2012 ); Lloyd and Meinke ( 2012 ) S18 Essential ATCG00650 Essential Rogalski et al. ( 2006 ) S19 Essential...”
- Spaceflight induces novel regulatory responses in Arabidopsis seedling as revealed by combined proteomic and transcriptomic analyses
Kruse, BMC plant biology 2020 - “...0.021 AT5G30510 Ribosomal protein S1 1.01 0.002 AT5G54600 50S ribosomal protein L24, chloroplastic 1.01 0.012 AT1G79850 30S ribosomal protein S17, chloroplastic 1.02 0.002 AT5G20180 Ribosomal protein L36 1.03 0.019 AT5G40950 50S ribosomal protein L27, chloroplastic 1.05 0.002 AT2G24090 Ribosomal protein L35 1.06 0.001 AT4G11175 Translation initiation...”
- “...(AT5G54600, L 2 FC RNA =1.01), PSRP3/1 (AT1G68590, L 2 FC RNA =1.13), and RPS17 (AT1G79850, L 2 FC RNA =1.02), translocons TIC55-II (AT2G24820, L 2 FC RNA =1.01), and TIC21 (AT2G15290, L 2 FC RNA =1.03), as well as OUTER ENVELOPE PROTEIN 16 (OEP16) (AT2G28900,...”
- Redox Conformation-Specific Protein-Protein Interactions of the 2-Cysteine Peroxiredoxin in Arabidopsis
Liebthal, Antioxidants (Basel, Switzerland) 2020 - “...C54D (Pseudo-Hyperoxidized) Reduced Oxidized Reduced Oxidized Reduced Oxidized 1 M NaCl 30S ribosomal protein S17 (AT1G79850) HTPA synthase 2 (AT2G45440) 30S ribosomal protein S6 alpha (AT1G64510) MECDP synthase (AT1G63970) Acyl-ACP thioesterase ATL3 (AT1G68260) HTPA synthase 1 (AT3G60880) DAHP synthase 2 (AT4G33510) Elongation factor 1 alpha (AT4G20360)...”
- “...reductase (AT1G20020) 50S ribosomal protein L27 (AT5G40950) Allene oxide synthase (AT5G42650) 50S ribosomal protein L29 (AT1G79850) Protein MET1 (AT1G55480) 50S ribosomal protein L5 (AT4G01310) AT5g64380/MSJ1_22 (AT5G64380) 60S ribosomal protein L26-1 (AT3G49910) Glutathione S-transferase F8 (AT2G47730) ATPase alpha subunit (AtCg00120) Dihydroxy-acid dehydratase (AT3G23940) PLAT domain-containing protein 1...”
- GUN1 and Plastid RNA Metabolism: Learning from Genetics
Tadini, Cells 2020 - “...: pale green cotyledons and leaves; reduced growth albino-seedling lethal A No [ 41 ] At1g79850 PRPS17: plastid ribosomal protein S17 prps17-1 d : pale green cotyledons and leaves; reduced growth albino-seedling lethal A No [ 41 ] Plastid Protein Import At5g16620 Tic40: subunit of the...”
- Separation and Paired Proteome Profiling of Plant Chloroplast and Cytoplasmic Ribosomes
Firmino, Plants (Basel, Switzerland) 2020 - “...NA ATCG01120 plastid 30S uS15c RPS15 + NA AT4G34620 plastid 30S bS16c RPS16 + NA AT1G79850 plastid 30S uS17c RPS17 + NA ATCG00650 plastid 30S bS18c RPS18 + NA ATCG00820 plastid 30S uS19c RPS19 + NA AT3G15190 plastid 30S bS20c RPS20 + NA AT3G27160 plastid 30S...”
- The ArathEULS3 Lectin Ends up in Stress Granules and Can Follow an Unconventional Route for Secretion
Dubiel, International journal of molecular sciences 2020 - “...66.0 At1g74970 30S ribosomal protein S9 RPS9 60.0 At1g78630 50S ribosomal protein L13 RPL13 107.9 At1G79850 30S ribosomal protein S17 RPS17 18.6 At1g07660 Histone H4 At1g07660 77.8 At4g35090 Catalase-2 CAT2 323.3 AtCG00820 30S ribosomal protein S19 rps19 69.7 At1g11860 Aminomethyltransferase, GDCST 39.8 At1G07320 50S ribosomal protein...”
- Systematic Review of Plant Ribosome Heterogeneity and Specialization
Martinez-Seidel, Frontiers in plant science 2020 - “...Dooner, 2004 ), albino at three leaf stage in rice ( Qiu et al., 2018 ). AT1G79850 RPS17 uS17c Reductions in growth, leaf pigments and photosynthesis ( Romani et al., 2012 ), embryo-lethal in maize ( Schultes et al., 2000 ). To reduce ambiguity of interpretation, we avoided to...”
- Integrated Transcriptional and Proteomic Profiling Reveals Potential Amino Acid Transporters Targeted by Nitrogen Limitation Adaptation
Liao, International journal of molecular sciences 2020 - “...protein 2.25 0.002 3.16 0.001 At5g47190 ribosomal protein L19 family protein 2.26 0.003 5.64 0.004 At1g79850 chloroplast 30S ribosomal protein S17 2.36 0.002 5.52 0.001 At4g29060 elongation factor Ts family protein 2.52 0.003 4.68 0.004 At3g15190 chloroplast 30S ribosomal protein S20 2.53 0.002 7.40 0.003 At2g24090...”
- Insights into the function of NADPH thioredoxin reductase C (NTRC) based on identification of NTRC-interacting proteins in vivo
González, Journal of experimental botany 2019 - “...2 (2) P56801 AtCg00770 RPS8, 30S ribosomal protein S8 C 2 2.72 1 (1) P16180 At1g79850 RPS17, 30S ribosomal protein small subunit protein 17 C 2 7.34 1 (1) P56807 AtCg00650 RPS18, 30S ribosomal protein S18 C 2 4.13 0 Q94K97 At5g24490 Putative 30S ribosomal protein...”
- Transcriptional analysis of sweet orange trees co-infected with 'Candidatus Liberibacter asiaticus' and mild or severe strains of Citrus tristeza virus
Fu, BMC genomics 2017 - “...chloroplast, putative 1.70 RPS13 orange1.1g030930m AT5G14320 30S ribosomal protein S13, chloroplast (CS13) 1.39 RPS17 orange1.1g033970m AT1G79850 ribosomal protein S17 1.48 RPS20 orange1.1g029900m AT3G15190 chloroplast 30S ribosomal protein S20, putative 1.11 1.69 RPS1 orange1.1g015066m AT5G30510 ribosomal protein S1 1.57 GHS1/S21 orange1.1g030080m AT3G27160 glucose hypersensitive1, structural constituent of...”
- Co-infection of Sweet Orange with Severe and Mild Strains of Citrus tristeza virus Is Overwhelmingly Dominated by the Severe Strain on Both the Transcriptional and Biological Levels
Fu, Frontiers in plant science 2017 - “...of ribosome 1.13 RPS10 Orange1.1g041275m AT3G13120 30S ribosomal protein S10, chloroplast, putative 1.35 RPS17 Orange1.1g033970m AT1G79850 Ribosomal protein S17; structural constituent of ribosome 1.01 RPL3 Orange1.1g023905m AT2G43030 Ribosomal protein L3 family protein 1.17 RPL5 Orange1.1g024440m AT4G01310 Ribosomal protein L5 family protein 1.21 RPL9 Orange1.1g029153m AT3G44890 Ribosomal...”
- Defects in the Expression of Chloroplast Proteins Leads to H2O2 Accumulation and Activation of Cyclic Electron Flow around Photosystem I
Strand, Frontiers in plant science 2016 - “...PRSP3 (Tiller et al., 2012 ). The rps17 mutant contains a T-DNA insert in the At1g79850 locus, resulting in decreased expression of RPS17 by 85% (Tiller et al., 2012 ). Both of these mutations resulted in partial loss of ribosomal proteins and impaired chloroplast translation (Tiller...”
- cDNA Library Screening Identifies Protein Interactors Potentially Involved in Non-Telomeric Roles of Arabidopsis Telomerase
Dokládal, Frontiers in plant science 2015 - “...0.23 2.35 0.05 Yes Tremousaygue et al., 1999 No co-regulation Genes encoding plastid ribosomal proteins At1g79850 PRPS17 0.68 0.02 1.68 0.28 Yes Tremousaygue et al., 1999 No co-regulation At2g33450 PRPL28 2.62 0.55 1.33 0.15 Yes Tremousaygue et al., 1999 No co-regulation Genes encoding translation factors At1g07940...”
- Expression profiling and functional analysis reveals that TOR is a key player in regulating photosynthesis and phytohormone signaling pathways in Arabidopsis
Dong, Frontiers in plant science 2015 - “...protein involved in chloroplast development 1.20 0.85 AT3G51890 Clathrin light chain 3 (CLC3) 1.73 0.83 AT1G79850 Pigment defective 347 (PDE347) 1.49 0.81 CARBON FIXATION AT2G01290 Ribose-5-phosphate isomerase 2 (RPI2) 1.01 0.82 AT1G71100 Ribose 5-phosphate isomerase 1.13 0.84 AT3G04790 Ribose 5-phosphate isomerase, type A protein 1.50 0.88...”
- Comparison of Leaf Sheath Transcriptome Profiles with Physiological Traits of Bread Wheat Cultivars under Salinity Stress
Takahashi, PloS one 2015 - “...e 1 Day 2 Day 1 Day 3 Day A_99_P457852 CA728141 4.04 10.91 Os04g0691600 * At1g79850 30S ribosomal protein S17 A_99_P144338 CJ851704 3.32 2.51 Os02g0190500 At1g16150 * WALL ASSOCIATED KINASE-LIKE 4 A_99_P069515 CD491253 2.74 2.49 Unknown A_99_P146292 CK153204 2.54 8.38 Unknown A_99_P481317 CD877401 2.48 6.31 Os03g0307200...”
- The translational apparatus of plastids and its role in plant development
Tiller, Molecular plant 2014 - “...et al., 2011 Atcg00050 Essential Fleischmann et al., 2011 rps17 Non-essential Shoji et al., 2011* At1g79850 Non-essential Schultes et al., 2000; Romani et al., 2012 rps18 Essential Shoji et al., 2011 Atcg00650 Essential Rogalski et al., 2006 rps19 Essential Shoji et al., 2011 Atcg00820 NA rps20...”
- Photosynthetic control of Arabidopsis leaf cytoplasmic translation initiation by protein phosphorylation
Boex-Fontvieille, PloS one 2013 - “...1 Ser 19 0 0 RPS14B At3g11510 1 1 0 0 Ser 19 0 RPS17 At1g79850 1 1 0 0 0 Thr 115 or Ser 117 RPS27A, RPS27B, RPS27C At2g45710, At3G61110, At5g47930 1 1 0 0 Ser 29 0 RPP0B At3g09200 1 1 0 0 Ser...”
- Downregulation of chloroplast RPS1 negatively modulates nuclear heat-responsive expression of HsfA2 and its target genes in Arabidopsis
Yu, PLoS genetics 2012 - “...expression in rps17 mutant plants leads to heat susceptibility. (A) Schematic diagram of RPS17 gene (At1g79850) showing the T-DNA insertion site. Open box indicates 5' or 3' UTR; Closed box indicates ORF. The T-DNA insertion site and positions of the start and stop codons are indicated (SALK_066943). (B)...”
- Cell surface and intracellular auxin signalling for H<sup>+</sup> fluxes in root growth
Li, Nature 2021 - “...2.94711492 2 8.340412262 P38666;Q42347 60S ribosomal protein L24-2;60S ribosomal protein L24-1 2.98809878 1.73797235 2 3.851309445 P16180 30S ribosomal protein S17, chloroplastic 2.459372203 0.85221949 2 1.834243789 Q94AH6 Cullin-1 2.273972829 2.46638828 2 6.212296758 F4JZ17;Q39129 Thiosulfate sulfurtransferase 16, chloroplastic 2.201576869 0.96994873 2 2.070539977 Q42431 Oleosin 20.3 kDa 2.001925786 0.87062371...”
- Insights into the function of NADPH thioredoxin reductase C (NTRC) based on identification of NTRC-interacting proteins in vivo
González, Journal of experimental botany 2019 - “...3.12 2 (2) P56801 AtCg00770 RPS8, 30S ribosomal protein S8 C 2 2.72 1 (1) P16180 At1g79850 RPS17, 30S ribosomal protein small subunit protein 17 C 2 7.34 1 (1) P56807 AtCg00650 RPS18, 30S ribosomal protein S18 C 2 4.13 0 Q94K97 At5g24490 Putative 30S ribosomal...”
For advice on how to use these tools together, see Interactive tools for functional annotation of bacterial genomes.
The PaperBLAST database links 793,807 different protein sequences to 1,259,118 scientific articles. Searches against EuropePMC were last performed on March 13 2025.
PaperBLAST builds a database of protein sequences that are linked
to scientific articles. These links come from automated text searches
against the articles in EuropePMC
and from manually-curated information from GeneRIF, UniProtKB/Swiss-Prot,
BRENDA, CAZy (as made available by dbCAN), BioLiP, CharProtDB, MetaCyc,
EcoCyc, TCDB, REBASE, the Fitness Browser,
and a subset of the European Nucleotide Archive with the /experiment tag.
Given this database and a protein sequence query,
PaperBLAST uses protein-protein BLAST
to find similar sequences with E < 0.001.
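As a rough illustration of the E < 0.001 cutoff, the sketch below filters hits from BLAST's standard 12-column tabular output, in which the E-value is the 11th column. The function name `filter_hits` and the sample rows are hypothetical; this is not PaperBLAST's own code.

```python
E_VALUE_CUTOFF = 1e-3  # threshold stated in the text (E < 0.001)


def filter_hits(tabular_lines, cutoff=E_VALUE_CUTOFF):
    """Return (subject_id, evalue) pairs for hits with E-value below the cutoff.

    Assumes the default 12-column BLAST tabular layout:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    kept = []
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        evalue = float(fields[10])
        if evalue < cutoff:
            kept.append((fields[1], evalue))
    return kept


# Hypothetical example rows: one strong hit and one weak hit.
example = [
    "query\thitA\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-30\t200",
    "query\thitB\t30.0\t50\t35\t0\t1\t50\t1\t50\t0.5\t20",
]
strong_hits = filter_hits(example)
```

Only `hitA` passes the filter here, because 0.5 is well above the cutoff.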
To build the database, we query EuropePMC with locus tags, with RefSeq protein
identifiers, and with UniProt
accessions. We obtain the locus tags from RefSeq or from MicrobesOnline. We use
queries of the form "locus_tag AND genus_name" to try to ensure that
the paper is actually discussing that gene. Because EuropePMC indexes
most recent biomedical papers, even if they are not open access, some
of the links may be to papers that you cannot read or that our
computers cannot read. We query each of these identifiers that
appears in the open access part of EuropePMC, as well as every locus
tag that appears in the 500 most-referenced genomes, so that a gene
may appear in the PaperBLAST results even though none of the papers
that mention it are open access. We also incorporate text-mined links
from EuropePMC that link open access articles to UniProt or RefSeq
identifiers. (This yields some additional links because EuropePMC
uses different heuristics for their text mining than we do.)
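The query form described above can be sketched as a small string-building helper. The function name `build_query` and the rule of taking the first word of the organism name as the genus are assumptions made for illustration; the real pipeline may quote or escape terms differently.

```python
def build_query(locus_tag, organism_name):
    """Build a search string of the form 'locus_tag AND genus_name'.

    Assumption: the genus is the first word of the organism name.
    """
    genus_name = organism_name.split()[0]
    return f"{locus_tag} AND {genus_name}"


# Example using the locus tag and organism shown on this page.
query = build_query("AT1G79850", "Arabidopsis thaliana")
```

Requiring the genus name alongside the locus tag is what guards against identifiers that happen to match unrelated text.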
For every article that mentions a locus tag, a RefSeq protein
identifier, or a UniProt accession, we try to select one or two
snippets of text that refer to the protein. If we cannot get access to
the full text, we try to select a snippet from the abstract, but
unfortunately, unique identifiers such as locus tags are rarely
provided in abstracts.
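A minimal sketch of this kind of snippet selection follows, assuming a fixed character window around each mention and a limit of two snippets per article. The window size, the case-insensitive matching, and the function names are illustrative assumptions, not PaperBLAST's actual heuristics.

```python
import re


def select_snippets(text, identifier, width=150, max_snippets=2):
    """Return up to max_snippets non-overlapping windows around the identifier."""
    snippets = []
    last_end = -1
    pattern = re.compile(re.escape(identifier), re.IGNORECASE)
    for match in pattern.finditer(text):
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
        if start < last_end:
            continue  # this window overlaps the previous snippet
        snippets.append("..." + text[start:end] + "...")
        last_end = end
        if len(snippets) == max_snippets:
            break
    return snippets


def snippets_for_article(full_text, abstract, identifier):
    """Prefer the full text; fall back to the abstract if full text is unavailable."""
    source = full_text if full_text is not None else abstract
    return select_snippets(source, identifier)
```

When only the abstract is available and it lacks the identifier, the result is an empty list, which matches the limitation noted above.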
PaperBLAST also incorporates manually-curated protein functions:
- Proteins from NCBI's RefSeq are included if a GeneRIF
entry links the gene to an article in PubMed®.
GeneRIF also provides a short summary of the article's claim about the
protein, which is shown instead of a snippet.
- Proteins from Swiss-Prot (the curated part of UniProt)
are included if the curators
identified experimental evidence for the protein's function (evidence
code ECO:0000269). For these proteins, the fields of the Swiss-Prot entry that
describe the protein's function are shown (with bold headings).
- Proteins from BRENDA,
a curated database of enzymes, are included if they are linked to a paper in PubMed
and their full sequence is known.
- Every protein from the non-redundant subset of BioLiP, a database
of ligand-binding sites and catalytic residues in protein structures, is included. Since BioLiP itself
does not include descriptions of the proteins, those are taken from the
Protein Data Bank.
Descriptions from PDB rely on the original submitter of the
structure and cannot be updated by others, so they may be less reliable.
(For SitesBLAST and Sites on a Tree, we use a larger subset of BioLiP so that every
ligand is represented among a group of structures with similar sequences, but for
PaperBLAST, we use the non-redundant set provided by BioLiP.)
- Every protein from EcoCyc, a curated
database of the proteins in Escherichia coli K-12, is included, regardless
of whether it is characterized.
- Proteins from the MetaCyc metabolic pathway database
are included if they are linked to a paper in PubMed and their full sequence is known.
- Proteins from the Transporter Classification Database (TCDB)
are included if they have known substrate(s), have reference(s),
and are not described as uncharacterized or putative.
(Some of the references are not visible on the PaperBLAST web site.)
- Every protein from CharProtDB,
a database of experimentally characterized protein annotations, is included.
- Proteins from the CAZy database of carbohydrate-active enzymes
are included if they are associated with an Enzyme Classification number.
Even though CAZy does not provide links from individual protein sequences to papers,
these should all be experimentally-characterized proteins.
- Proteins from the REBASE database
of restriction enzymes are included if they have known specificity.
- Every protein with an evidence-based reannotation (based on mutant phenotypes)
in the Fitness Browser is included.
- Sequence-specific transcription factors (including sigma factors and DNA-binding response regulators)
with experimentally-determined DNA binding sites from the
PRODORIC database of gene regulation in prokaryotes are included.
- Putative transcription factors from RegPrecise
that have manually-curated predictions for their binding sites are included. These predictions are based on
conserved putative regulatory sites across genomes that contain similar transcription factors,
so PaperBLAST clusters the TFs at 70% identity and retains just one member of each cluster.
- Coding sequence (CDS) features from the
European Nucleotide Archive (ENA)
are included if the /experiment tag is set (implying that there is experimental evidence for the annotation),
the nucleotide entry links to paper(s) in PubMed,
and the nucleotide entry is from the STD data class
(implying that these are targeted annotated sequences, not from shotgun sequencing).
Also, to filter out genes whose transcription or translation was detected, but whose function
was not studied, nucleotide entries or papers with more than 25 such proteins are excluded.
Descriptions from ENA rely on the original submitter of the
sequence and cannot be updated by others, so they may be less reliable.
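The exclusion rule for ENA entries (more than 25 proteins per nucleotide entry or per paper) can be sketched as below. This is one possible reading of the rule, in which a protein-to-paper link is kept only if neither its nucleotide entry nor its paper exceeds the limit. The tuple layout and the name `filter_links` are hypothetical.

```python
from collections import defaultdict

MAX_PROTEINS = 25  # limit stated in the text


def filter_links(links, max_proteins=MAX_PROTEINS):
    """Filter (protein_id, entry_id, pubmed_id) tuples.

    A link is dropped if its nucleotide entry or its paper is associated
    with more than max_proteins distinct proteins.
    """
    links = list(links)
    per_entry = defaultdict(set)
    per_paper = defaultdict(set)
    for protein, entry, paper in links:
        per_entry[entry].add(protein)
        per_paper[paper].add(protein)
    return [
        (protein, entry, paper)
        for protein, entry, paper in links
        if len(per_entry[entry]) <= max_proteins
        and len(per_paper[paper]) <= max_proteins
    ]
```

The intent, as described above, is to drop large surveys in which expression was detected for many genes but individual functions were not studied.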
Except for GeneRIF and ENA,
the curated entries include a short curated
description of the protein's function.
For entries from BioLiP, the protein's function may not be known beyond binding to the ligand.
Many of these entries also link to articles in PubMed.
For more information see the
PaperBLAST paper (mSystems 2017)
or the code.
You can download PaperBLAST's database here.
Changes to PaperBLAST since the paper was written:
- November 2023: incorporated PRODORIC and RegPrecise. Many PRODORIC entries were not linked to a protein sequence (no UniProt identifier), so we added this information.
- February 2023: BioLiP changed their download format. PaperBLAST now includes their non-redundant subset. SitesBLAST and Sites on a Tree use a larger non-redundant subset that ensures that every ligand is represented within each cluster. This should ensure that every binding site is represented.
- June 2022: incorporated some coding sequences from ENA with the /experiment tag.
- March 2022: incorporated BioLiP.
- April 2020: incorporated TCDB.
- April 2019: EuropePMC now returns table entries in their search results. This has expanded PaperBLAST's database, but most of the new entries are of low relevance, and the resulting snippets are often just lists of locus tags with annotations.
- February 2018: the alignment page reports the conservation of the hit's functional sites (if available from Swiss-Prot or UniProt).
- January 2018: incorporated BRENDA.
- December 2017: incorporated MetaCyc, CharProtDB, CAZy, REBASE, and the reannotations from the Fitness Browser.
- September 2017: EuropePMC no longer returns some table entries in their search results. This has shrunk PaperBLAST's database, but has also reduced the number of low-relevance hits.
Many of these changes are described in Interactive tools for functional annotation of bacterial genomes.
PaperBLAST cannot provide snippets for many of the papers that are
published in non-open-access journals. This limitation applies even if
the paper is marked as "free" on the publisher's web site and is
available in PubMed Central or EuropePMC. If a journal that you publish
in is marked as "secret," please consider publishing elsewhere.
Many important articles are missing from PaperBLAST, either because
the article's full text is not in EuropePMC (as for many older
articles), or because the paper does not mention a protein identifier such as a locus tag, or because of PaperBLAST's heuristics. If you notice an
article that characterizes a protein's function but is missing from
PaperBLAST, please notify the curators at UniProt
or add an entry to GeneRIF.
Entries in either of these databases will eventually be incorporated
into PaperBLAST. Note that to add an entry to UniProt, you will need
to find the UniProt identifier for the protein. If the protein is not
already in UniProt, you can ask them to create an entry. To add an
entry to GeneRIF, you will need an NCBI Gene identifier, but
unfortunately many prokaryotic proteins in RefSeq do not have
corresponding Gene identifiers.
References
PaperBLAST: Text-mining papers for information about homologs.
M. N. Price and A. P. Arkin (2017). mSystems, 10.1128/mSystems.00039-17.
Europe PMC in 2017.
M. Levchenko et al (2017). Nucleic Acids Research, 10.1093/nar/gkx1005.
Gene indexing: characterization and analysis of NLM's GeneRIFs.
J. A. Mitchell et al (2003). AMIA Annu Symp Proc 2003:460-464.
UniProt: the universal protein knowledgebase.
The UniProt Consortium (2016). Nucleic Acids Research, 10.1093/nar/gkw1099.
BRENDA in 2017: new perspectives and new tools in BRENDA.
S. Placzek et al (2017). Nucleic Acids Research, 10.1093/nar/gkw952.
The EcoCyc database: reflecting new knowledge about Escherichia coli K-12.
I. M. Keseler et al (2016). Nucleic Acids Research, 10.1093/nar/gkw1003.
The MetaCyc database of metabolic pathways and enzymes.
R. Caspi et al (2018). Nucleic Acids Research, 10.1093/nar/gkx935.
CharProtDB: a database of experimentally characterized protein annotations.
R. Madupu et al (2012). Nucleic Acids Research, 10.1093/nar/gkr1133.
The carbohydrate-active enzymes database (CAZy) in 2013.
V. Lombard et al (2014). Nucleic Acids Research, 10.1093/nar/gkt1178.
The Transporter Classification Database (TCDB): recent advances.
M. H. Saier, Jr. et al (2016). Nucleic Acids Research, 10.1093/nar/gkv1103.
REBASE - a database for DNA restriction and modification: enzymes, genes and genomes.
R. J. Roberts et al (2015). Nucleic Acids Research, 10.1093/nar/gku1046.
Deep annotation of protein function across diverse bacteria from mutant phenotypes.
M. N. Price et al (2016). bioRxiv, 10.1101/072470.
by Morgan Price,
Arkin group
Lawrence Berkeley National Laboratory