PaperBLAST
Full List of Papers Linked to VIMSS10099062
SECA1_ARATH / Q9SYI0 Protein translocase subunit SECA1, chloroplastic; AtcpSecA; Protein ALBINO OR GLASSY YELLOW 1; EC 7.4.2.4 from Arabidopsis thaliana (Mouse-ear cress) (see 2 papers)
AT4G01800 preprotein translocase secA subunit, putative from Arabidopsis thaliana
- function: Has a central role in coupling the hydrolysis of ATP to the transfer of proteins across the thylakoid membrane. Involved in photosynthetic acclimation and required for chloroplast biogenesis.
catalytic activity: ATP + H2O + chloroplast-proteinSide 1 = ADP + phosphate + chloroplast-proteinSide 2.
subunit: Part of the Sec protein translocation apparatus. Probably interacts with SCY1.
disruption phenotype: Seedling lethal. Albino seedlings with yellow and translucent (glassy) lateral organs when grown heterotrophically.
- Yellow barley xan-m mutants are deficient in the motor unit SECA1 of the SEC1 translocase system
Stuart, Planta 2025 (no snippet)
- Defining the heterogeneous composition of Arabidopsis thylakoid membrane
Trotta, The Plant journal : for cell and molecular biology 2025 (no snippet)
- Light Quality Modulates Plant Cold Response and Freezing Tolerance
Kameniarová, Frontiers in plant science 2022 - “...), calcium-sensing receptor CaS (AT5G23060; Huang et al., 2012 ), and protein translocase subunit SECA1 (AT4G01800; Skalitzky et al., 2011 ). An increase in abundance was found also for a protein required for chlorophyll accumulation under normal growth conditions (GUN4, AT3G59400; Larkin et al., 2003 ),...”
- artMAP: A user-friendly tool for mapping ethyl methanesulfonate-induced mutations in Arabidopsis
Javorka, Plant direct 2019 - “...SIMPLE Mapped by artMAP 300 AT3G13870 Ser584Phe + + 3004 AT3G13870 Ser584Phe + + 3007 AT4G01800 Arg752* + + EMS608 AT5G24630 Gly324Glu + + EMS633 AT3G54660 Ala264Thr + + John Wiley & Sons, Ltd 3 Experimental procedure The biggest challenge in creating artMAP was integration of...”
- Natural variation among Arabidopsis thaliana accessions in tolerance to high magnesium supply
Niu, Scientific reports 2018 - “...under the normal Mg 2+ , had a MAF of 0.079 and was located within AT4G01800 ALBINO OR GLASSY YELLOW 1 ( AGY1 ) gene which was co-expressed with MAGNESIUM CHELATASE I2 (CHLI2) while CHLI2 regulates the function of Mg 2+ chelatase 48 . It should...”
- Consideration of non-canonical splice sites improves gene prediction on the Arabidopsis thaliana Niederzenz-1 genome sequence
Pucker, BMC research notes 2017 - “...) TCGGGTTCATCAATCGAGCATCC 23 Reverse 61 S017 At1g79350 ( FGT1 ) AAGAACAGGTAGTTTCTCCTGCTCC 25 Reverse 60 S003 At4g01800 ( AGY1 ) ACTGGTGAAGGGAAAACGCTTG 22 Forward 59 S004 At4g01800 ( AGY1 ) AATGTATATCCCGCTCAAAGGCTG 24 Reverse 59 S005 At4g01800 ( AGY1 ) TCTTCTGCTTTTCATCAACAGTGTAATG 28 Reverse 58 S018 At4g27500 ( PPI1 )...”
- “...in the Col-0 nucleome within the Araport11 annotation, as well as the first transcripts of At4g01800 and At3g10350, were mapped to the Nd-1 genome sequence via BLAT [ 43 ]. Perl scripts provided in the AUGUSTUS package filterPSL.pl and blat2hints.pl ( http://bioinf.uni-greifswald.de/augustus/binaries/scripts/ ) were used to...”
- Identification and Roles of Photosystem II Assembly, Stability, and Repair Factors in Arabidopsis
Lu, Frontiers in plant science 2016 - “...; Schunemann, 2007 ; Cline and Dabney-Smith, 2008 ; Walter et al., 2015 cpSecA1 sll0616 At4g01800 117 111 CS, TM Thylakoid protein targeting: cpSec translocase Insertion and assembly of PSII proteins such as PsbO Cline and Theg, 2007 ; Schunemann, 2007 ; Cline and Dabney-Smith, 2008...”
- Transcriptional profiling unravels potential metabolic activities of the olive leaf non-glandular trichome
Koudounas, Frontiers in plant science 2015 - “...al., 2009 ). Finally, a unigene similar to Arabidopsis AGY1 (Albino or Glassy Yellow 1, At4g01800), which encodes a subunit of the translocase subunit secA, was identified. Loss of AGY1 function leads to decreased branching of Arabidopsis trichome. Transcriptional analysis employing the β-glucuronidase ( GUS )...”
- Proteasome targeting of proteins in Arabidopsis leaf mesophyll, epidermal and vascular tissues
Svozil, Frontiers in plant science 2015 - “...Peptidase M50 family protein AT1G73060 LPA3 Low PSII accumulation 3 Incorporation of proteins in photosystems AT4G01800 AGY1 Albino or glassy yellow 1 AT2G45770 cpFTSY Signal recognition particle receptor protein, chloroplast (FTSY) AT1G08380 PSAO Photosystem I subunit O Photosystem components ATCG01010 NDHF NADH-Ubiquinone oxidoreductase (complex I), chain...”
- Chaperone-assisted Post-translational Transport of Plastidic Type I Signal Peptidase 1
Endow, The Journal of biological chemistry 2015 - “...The cDNA sequence encoding residues 63-1042 of cpSecA1 (At4g01800) was amplified by PCR using cDNA synthesized from total RNA isolated from 15-day-old...”
- Plastids contain a second sec translocase system with essential functions
Skalitzky, Plant physiology 2011 (PubMed)- “...Sec system, designated as SCY1 (At2g18710), SECA1 (At4g01800), and SECE1 (At4g14870) in Arabidopsis (Arabidopsis thaliana), result in albino seedlings and...”
- “...genome contains two loci that encode SecA proteins, At4g01800 and At1g21650. We initiated a study of the functions of these proteins by identifying T-DNA...”
- A transcriptional analysis of carotenoid, chlorophyll and plastidial isoprenoid biosynthesis genes during development and osmotic stress responses in Arabidopsis thaliana
Meier, BMC systems biology 2011 - “...2;1 (PHT2;1) Pd, PP, CPl AT5G04140 0.881 Glutamate synthase 1 (GLU1)/ferredoxin-dependent AB, Pd, CPl, OR AT4G01800 0.877 Preprotein translocase secA subunit, chloroplast [precursor] AT1G11860 0.876 Aminomethyltransferase, mitochondrial precursor AT1G45474 0.874 Photosystem I light harvesting complex gene 5 (LHCA5) PS, PM, PSL, TP AT1G73110 0.874 Ribulose bisphosphate...”
- Proteome and Interactome Linked to Metabolism, Genetic Information Processing, and Abiotic Stress in Gametophytes of Two Woodferns
Ojosnegros, International journal of molecular sciences 2023 - “...Q9SKQ0 CYP19-2 PEPTIDYL-PROLYL CIS-TRANS ISOMERASE CYP19-2 21.6 27 5 24 2.24 10 90 Sorting 19573-562_5_ORF2 Q9SYI0 SECA1 PROTEIN TRANSLOCASE SUBUNIT SECA1 115.7 2 1 2 0 Sorting 146969-201_2_ORF1 F4JL11 IMPA2 IMPORTIN SUBUNIT ALPHA-2 59.1 5 0 2 6 10 61 Sorting 151836-193_1_ORF2 P40941 AAC2 ADP, ATP...”
For advice on how to use these tools together, see
Interactive tools for functional annotation of bacterial genomes.
The PaperBLAST database links 793,807 different protein sequences to 1,259,118 scientific articles. Searches against EuropePMC were last performed on March 13 2025.
PaperBLAST builds a database of protein sequences that are linked
to scientific articles. These links come from automated text searches
against the articles in EuropePMC
and from manually-curated information from GeneRIF, UniProtKB/Swiss-Prot,
BRENDA,
CAZy (as made available by dbCAN),
BioLiP,
CharProtDB,
MetaCyc,
EcoCyc,
TCDB,
REBASE,
the Fitness Browser,
and a subset of the European Nucleotide Archive with the /experiment tag.
Given this database and a protein sequence query,
PaperBLAST uses protein-protein BLAST
to find similar sequences with E < 0.001.
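The E-value cutoff described above can be illustrated with a minimal Python sketch. It assumes the BLAST hits are available in the 12-column tabular format (BLAST+ `-outfmt 6`), where the E-value is the 11th column; the source does not say which output format PaperBLAST itself parses, and the example rows and the `filter_hits` name are made up for illustration.

```python
from typing import Iterable, List, Tuple

E_VALUE_CUTOFF = 0.001  # threshold stated in the text: E < 0.001


def filter_hits(lines: Iterable[str],
                cutoff: float = E_VALUE_CUTOFF) -> List[Tuple[str, float]]:
    """Return (subject_id, evalue) for hits whose E-value is strictly below cutoff.

    Assumes tab-separated rows with the default 12 columns of BLAST+ outfmt 6:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        subject_id = fields[1]
        evalue = float(fields[10])
        if evalue < cutoff:
            hits.append((subject_id, evalue))
    return hits


# Toy rows for illustration only (not real BLAST output).
example = [
    "query\tVIMSS10099062\t100.0\t1022\t0\t0\t1\t1022\t1\t1022\t0.0\t2100",
    "query\thypothetical_hit\t25.0\t80\t60\t2\t5\t84\t10\t89\t0.5\t30.1",
]
print(filter_hits(example))
```

Only the first toy row passes the filter, because its E-value is below 0.001.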
To build the database, we query EuropePMC with locus tags, with RefSeq protein
identifiers, and with UniProt
accessions. We obtain the locus tags from RefSeq or from MicrobesOnline. We use
queries of the form "locus_tag AND genus_name" to try to ensure that
the paper is actually discussing that gene. Because EuropePMC indexes
most recent biomedical papers, even if they are not open access, some
of the links may be to papers that you cannot read or that our
computers cannot read. We query each of these identifiers that
appears in the open access part of EuropePMC, as well as every locus
tag that appears in the 500 most-referenced genomes, so that a gene
may appear in the PaperBLAST results even though none of the papers
that mention it are open access. We also incorporate text-mined links
from EuropePMC that link open access articles to UniProt or RefSeq
identifiers. (This yields some additional links because EuropePMC
uses different heuristics for their text mining than we do.)
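As a rough illustration of the query form described above, the following Python sketch builds a "locus_tag AND genus_name" query and a search URL for it. The endpoint and the `query`, `format`, and `pageSize` parameters are those of Europe PMC's public REST search service as far as we know; whether PaperBLAST uses this interface is an assumption, and the function names are hypothetical.

```python
from urllib.parse import urlencode

# Europe PMC's public REST search endpoint (assumed; not stated in the source).
EUROPEPMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_query(locus_tag: str, genus_name: str) -> str:
    """Combine a locus tag with the genus name, as in 'locus_tag AND genus_name'."""
    return f"{locus_tag} AND {genus_name}"


def build_search_url(locus_tag: str, genus_name: str, page_size: int = 25) -> str:
    """Return a search URL for the query; no network request is made here."""
    params = {
        "query": build_query(locus_tag, genus_name),
        "format": "json",
        "pageSize": page_size,
    }
    return EUROPEPMC_SEARCH + "?" + urlencode(params)


print(build_search_url("AT4G01800", "Arabidopsis"))
```

Requiring the genus name alongside the locus tag reduces false matches when the same string happens to appear in an unrelated paper.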
For every article that mentions a locus tag, a RefSeq protein
identifier, or a UniProt accession, we try to select one or two
snippets of text that refer to the protein. If we cannot get access to
the full text, we try to select a snippet from the abstract, but
unfortunately, unique identifiers such as locus tags are rarely
provided in abstracts.
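The snippet selection described above might look roughly like the sketch below. This is not PaperBLAST's actual heuristic: the fixed character window, the case-insensitive match, and the `select_snippets` name are all assumptions made for illustration.

```python
import re
from typing import List


def select_snippets(text: str, identifier: str,
                    context: int = 100, max_snippets: int = 2) -> List[str]:
    """Return up to max_snippets windows of text around mentions of identifier."""
    snippets: List[str] = []
    last_end = -1
    for match in re.finditer(re.escape(identifier), text, flags=re.IGNORECASE):
        start = max(0, match.start() - context)
        end = min(len(text), match.end() + context)
        if start < last_end:
            continue  # this mention overlaps the previous snippet; skip it
        prefix = "..." if start > 0 else ""
        suffix = "..." if end < len(text) else ""
        snippets.append(prefix + text[start:end] + suffix)
        last_end = end
        if len(snippets) == max_snippets:
            break
    return snippets


# Toy text for illustration: two mentions separated by filler.
toy_text = "a" * 300 + " At4g01800 " + "b" * 300 + " AT4G01800 " + "c" * 300
for snippet in select_snippets(toy_text, "At4g01800"):
    print(snippet)
```

If the identifier does not occur in the available text (for example, only an abstract that omits locus tags), the function returns an empty list, which corresponds to a "no snippet" entry.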
PaperBLAST also incorporates manually-curated protein functions:
- Proteins from NCBI's RefSeq are included if a
GeneRIF
entry links the gene to an article in
PubMed®.
GeneRIF also provides a short summary of the article's claim about the
protein, which is shown instead of a snippet.
- Proteins from Swiss-Prot (the curated part of UniProt)
are included if the curators
identified experimental evidence for the protein's function (evidence
code ECO:0000269). For these proteins, the fields of the Swiss-Prot entry that
describe the protein's function are shown (with bold headings).
- Proteins from BRENDA,
a curated database of enzymes, are included if they are linked to a paper in PubMed
and their full sequence is known.
- Every protein from the non-redundant subset of
BioLiP,
a database
of ligand-binding sites and catalytic residues in protein structures, is included. Since BioLiP itself
does not include descriptions of the proteins, those are taken from the
Protein Data Bank.
Descriptions from PDB rely on the original submitter of the
structure and cannot be updated by others, so they may be less reliable.
(For SitesBLAST and Sites on a Tree, we use a larger subset of BioLiP so that every
ligand is represented among a group of structures with similar sequences, but for
PaperBLAST, we use the non-redundant set provided by BioLiP.)
- Every protein from EcoCyc, a curated
database of the proteins in Escherichia coli K-12, is included, regardless
of whether they are characterized or not.
- Proteins from the MetaCyc metabolic pathway database
are included if they are linked to a paper in PubMed and their full sequence is known.
- Proteins from the Transport Classification Database (TCDB)
are included if they have known substrate(s), have reference(s),
and are not described as uncharacterized or putative.
(Some of the references are not visible on the PaperBLAST web site.)
- Every protein from CharProtDB,
a database of experimentally characterized protein annotations, is included.
- Proteins from the CAZy database of carbohydrate-active enzymes
are included if they are associated with an Enzyme Classification number.
Even though CAZy does not provide links from individual protein sequences to papers,
these should all be experimentally-characterized proteins.
- Proteins from the REBASE database
of restriction enzymes are included if they have known specificity.
- Every protein with an evidence-based reannotation (based on mutant phenotypes)
in the Fitness Browser is included.
- Sequence-specific transcription factors (including sigma factors and DNA-binding response regulators)
with experimentally-determined DNA binding sites from the
PRODORIC database of gene regulation in prokaryotes.
- Putative transcription factors from RegPrecise
that have manually-curated predictions for their binding sites. These predictions are based on
conserved putative regulatory sites across genomes that contain similar transcription factors,
so PaperBLAST clusters the TFs at 70% identity and retains just one member of each cluster.
- Coding sequence (CDS) features from the
European Nucleotide Archive (ENA)
are included if the /experiment tag is set (implying that there is experimental evidence for the annotation),
the nucleotide entry links to paper(s) in PubMed,
and the nucleotide entry is from the STD data class
(implying that these are targeted annotated sequences, not from shotgun sequencing).
Also, to filter out genes whose transcription or translation was detected, but whose function
was not studied, nucleotide entries or papers with more than 25 such proteins are excluded.
Descriptions from ENA rely on the original submitter of the
sequence and cannot be updated by others, so they may be less reliable.
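The ENA inclusion rules in the last item can be sketched as a filter. The record layout (`CdsFeature`) and function name are hypothetical, and the handling of the 25-protein limit is one possible reading of the text: here a feature is dropped if its nucleotide entry, or any paper it links to, has more than 25 qualifying proteins.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List

MAX_PROTEINS = 25  # entries or papers with more qualifying proteins are excluded


@dataclass
class CdsFeature:
    """Hypothetical record for one CDS feature in an ENA nucleotide entry."""
    protein_id: str
    entry_id: str
    data_class: str            # e.g. "STD"
    has_experiment_tag: bool   # True if the /experiment tag is set
    pubmed_ids: List[str] = field(default_factory=list)


def select_ena_features(features: List[CdsFeature]) -> List[CdsFeature]:
    # Step 1: /experiment tag set, linked to PubMed, STD data class.
    candidates = [f for f in features
                  if f.has_experiment_tag and f.pubmed_ids and f.data_class == "STD"]
    # Step 2: count qualifying proteins per nucleotide entry and per paper.
    per_entry = Counter(f.entry_id for f in candidates)
    per_paper = Counter(p for f in candidates for p in set(f.pubmed_ids))
    # Step 3: drop features from over-populated entries or papers.
    return [f for f in candidates
            if per_entry[f.entry_id] <= MAX_PROTEINS
            and all(per_paper[p] <= MAX_PROTEINS for p in f.pubmed_ids)]


# Toy data: one large entry (26 proteins) and one small, targeted entry.
big = [CdsFeature(f"P{i}", "ENTRY_BIG", "STD", True, ["111"]) for i in range(26)]
small = [CdsFeature("Q1", "ENTRY_SMALL", "STD", True, ["222"])]
untagged = [CdsFeature("R1", "ENTRY_OTHER", "STD", False, ["333"])]
print([f.protein_id for f in select_ena_features(big + small + untagged)])
```

In the toy data only `Q1` survives: the large entry exceeds the limit and the remaining feature lacks the /experiment tag.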
Except for GeneRIF and ENA,
the curated entries include a short curated
description of the protein's function.
For entries from BioLiP, the protein's function may not be known beyond binding to the ligand.
Many of these entries also link to articles in PubMed.
For more information see the
PaperBLAST paper (mSystems 2017)
or the code.
You can download PaperBLAST's database here.
Changes to PaperBLAST since the paper was written:
- November 2023: incorporated PRODORIC and RegPrecise. Many PRODORIC entries were not linked to a protein sequence (no UniProt identifier), so we added this information.
- February 2023: BioLiP changed their download format. PaperBLAST now includes their non-redundant subset. SitesBLAST and Sites on a Tree use a larger non-redundant subset that ensures that every ligand is represented within each cluster. This should ensure that every binding site is represented.
- June 2022: incorporated some coding sequences from ENA with the /experiment tag.
- March 2022: incorporated BioLiP.
- April 2020: incorporated TCDB.
- April 2019: EuropePMC now returns table entries in their search results. This has expanded PaperBLAST's database, but most of the new entries are of low relevance, and the resulting snippets are often just lists of locus tags with annotations.
- February 2018: the alignment page reports the conservation of the hit's functional sites (if available from Swiss-Prot or UniProt).
- January 2018: incorporated BRENDA.
- December 2017: incorporated MetaCyc, CharProtDB, CAZy, REBASE, and the reannotations from the Fitness Browser.
- September 2017: EuropePMC no longer returns some table entries in their search results. This has shrunk PaperBLAST's database, but has also reduced the number of low-relevance hits.
Many of these changes are described in Interactive tools for functional annotation of bacterial genomes.
PaperBLAST cannot provide snippets for many of the papers that are
published in non-open-access journals. This limitation applies even if
the paper is marked as "free" on the publisher's web site and is
available in PubMed Central or EuropePMC. If a journal that you publish
in is marked as "secret," please consider publishing elsewhere.
Many important articles are missing from PaperBLAST, either because
the article's full text is not in EuropePMC (as for many older
articles), or because the paper does not mention a protein identifier such as a locus tag, or because of PaperBLAST's heuristics. If you notice an
article that characterizes a protein's function but is missing from
PaperBLAST, please notify the curators at UniProt
or add an entry to GeneRIF.
Entries in either of these databases will eventually be incorporated
into PaperBLAST. Note that to add an entry to UniProt, you will need
to find the UniProt identifier for the protein. If the protein is not
already in UniProt, you can ask them to create an entry. To add an
entry to GeneRIF, you will need an NCBI Gene identifier, but
unfortunately many prokaryotic proteins in RefSeq do not have
corresponding Gene identifiers.
References
PaperBLAST: Text-mining papers for information about homologs.
M. N. Price and A. P. Arkin (2017). mSystems, 10.1128/mSystems.00039-17.
Europe PMC in 2017.
M. Levchenko et al (2017). Nucleic Acids Research, 10.1093/nar/gkx1005.
Gene indexing: characterization and analysis of NLM's GeneRIFs.
J. A. Mitchell et al (2003). AMIA Annu Symp Proc 2003:460-464.
UniProt: the universal protein knowledgebase.
The UniProt Consortium (2016). Nucleic Acids Research, 10.1093/nar/gkw1099.
BRENDA in 2017: new perspectives and new tools in BRENDA.
S. Placzek et al (2017). Nucleic Acids Research, 10.1093/nar/gkw952.
The EcoCyc database: reflecting new knowledge about Escherichia coli K-12.
I. M. Keseler et al (2016). Nucleic Acids Research, 10.1093/nar/gkw1003.
The MetaCyc database of metabolic pathways and enzymes.
R. Caspi et al (2018). Nucleic Acids Research, 10.1093/nar/gkx935.
CharProtDB: a database of experimentally characterized protein annotations.
R. Madupu et al (2012). Nucleic Acids Research, 10.1093/nar/gkr1133.
The carbohydrate-active enzymes database (CAZy) in 2013.
V. Lombard et al (2014). Nucleic Acids Research, 10.1093/nar/gkt1178.
The Transporter Classification Database (TCDB): recent advances.
M. H. Saier, Jr. et al (2016). Nucleic Acids Research, 10.1093/nar/gkv1103.
REBASE - a database for DNA restriction and modification: enzymes, genes and genomes.
R. J. Roberts et al (2015). Nucleic Acids Research, 10.1093/nar/gku1046.
Deep annotation of protein function across diverse bacteria from mutant phenotypes.
M. N. Price et al (2016). bioRxiv, 10.1101/072470.
by Morgan Price,
Arkin group
Lawrence Berkeley National Laboratory