PaperBLAST
Full List of Papers Linked to VIMSS10089329
AMPL1_ARATH / P30184 Leucine aminopeptidase 1; Leucyl aminopeptidase 1; AtLAP1; Proline aminopeptidase 1; Prolyl aminopeptidase 1; EC 3.4.11.1; EC 3.4.11.5 from Arabidopsis thaliana (Mouse-ear cress) (see 3 papers)
AT2G24200 cytosol aminopeptidase from Arabidopsis thaliana
NP_179997 Cytosol aminopeptidase family protein from Arabidopsis thaliana
- function: Presumably involved in the processing and regular turnover of intracellular proteins. Catalyzes the removal of unsubstituted N-terminal amino acids from various peptides (Probable). Possesses leucine aminopeptidase activity against the model substrate leucine-amido methyl coumarin (PubMed:22493451). Possesses Cys-Gly dipeptidase activity. In addition, can cleave Cys-Leu and Leu-Cys dipeptides (PubMed:25716890).
function: Functions as a molecular chaperone to protect proteins from heat-induced damage.
catalytic activity: Release of an N-terminal amino acid, Xaa-|-Yaa-, in which Xaa is preferably Leu, but may be other amino acids including Pro although not Arg or Lys, and Yaa may be Pro. Amino acid amides and methyl esters are also readily hydrolyzed, but rates on arylamides are exceedingly low.
catalytic activity: Release of N-terminal proline from a peptide.
cofactor: Mn(2+) (Binds 2 Mn(2+) ions per subunit.)
subunit: Homohexamer (dimer of homotrimers).
- Marine Invertebrates: A Promissory Still Unexplored Source of Inhibitors of Biomedically Relevant Metallo Aminopeptidases Belonging to the M1 and M17 Families
Pascual, Marine drugs 2023 - “...thaliana ) - M17.A01 Arabidopsis thaliana At4g30910 ( Arabidopsis thaliana ) - M17.A02 Arabidopsis thaliana At2g24200 ( Arabidopsis thaliana ) - M17.A03 Arabidopsis thaliana CG7340 g.p. ( Drosophila melanogaster ) - M17.A04 Drosophila melanogaster ZK353.6 ( Caenorhabditis elegans ) - M17.A05 Caenorhabditis elegans * IUBMB: International...”
- What Antarctic Plants Can Tell Us about Climate Changes: Temperature as a Driver for Metabolic Reprogramming
Bertini, Biomolecules 2021 - “...subtilase family protein 1.42 10 3 1.76 AT4G35090 CAT2, catalase 2 7.64 10 4 1.61 AT2G24200 Cytosol aminopeptidase family protein (ATLAP1, LAP1, LEUCYL AMINOPEPTIDASE 1) 9.74 10 4 1.60 AT5G58070 ATTIL, TIL, temperature-induced lipocalin 5.70 10 3 1.40 AT1G03090 MCCA, methylcrotonyl-CoA carboxylase alpha chain, mitochondrial/3-methylcrotonyl-CoA carboxylase...”
- Function and Regulation of Chloroplast Peroxiredoxin IIE
Dreyer, Antioxidants (Basel, Switzerland) 2021 - “...aldolase 4 AT4G26530 O65581 Fructose-bisphosphate aldolase 5 AT5G49910 Q9LTX9 Heat shock 70 kDa protein 7 AT2G24200 P30184 Leucine aminopeptidase 1 AT5G45930 Q5XF33 Magnesium-chelatase subunit ChII-2 AT1G70890 Q9SSK5 MLP-like protein 43 AT5G26000 P37702 Myrosinase 1 AT3G62030 P34791 Peptidyl-prolyl cis-trans isomerase CYP20-3 AT2G29630 O82392 Phosphomethylpyrimidine synthase AT5G52920 Q9FLW9...”
- Proteomic analysis of haem-binding protein from Arabidopsis thaliana and Cyanidioschyzon merolae
Shimizu, Philosophical transactions of the Royal Society of London. Series B, Biological sciences 2020 - “...38.367 338353 LVEMHFPLPEGRSPSR peak 10 2254.161 15.379 317337 TANFPQIYAVGRAAASRHAPR AspGluAlaAsp (DEAD)-box ATP-dependent RNA helicase CML137C At2g24200 nucleus 66.4 24.4 peak 4 870.56 19.981 165172 VAVLSLLR peak 5 944.552 11.224 391398 LLAEEISK peak 8 1851.882 53.312 5773 IQSVPGVPQELADTLER similar to GTPase-activating protein CMJ230C At4g15850 vesicle 55.0 23.3...”
- Identifying Early Warning Signals for the Sudden Transition from Mild to Severe Tobacco Etch Disease by Dynamical Network Biomarkers
Tarazona, Viruses 2019 - “...20S PROTEASOME BETA SUBUNIT G1 , PBG1 ), At2g16600 ( CYCLOPHILIN 19 , CYP19 ), At2g24200 ( LEUCYL AMINOPEPTIDASE 1 , LAP1 ), At3g02630 ( ACYL ACYL CARRIER PROTEIN (ACP) DESATURASE 5 , AAD5 ), At4g26130 (hypothetical protein described as a cotton fiber protein), At5g2107 (hypothetical...”
- Increases in activity of proteasome and papain-like cysteine protease in Arabidopsis autophagy mutants: back-up compensatory effect or cell-death promoting effect?
Havé, Journal of experimental botany 2018 - “...https://www.ebi.ac.uk/merops/ ), UniProt ( www.uniprot.org/ ), agriGO (bioinfo.cau.edu.cn/agriGO), and Mapman (mapman.gabipd.org/). Accession numbers AT4G38220; AT2G27020; AT2G24200; AT4G20850; AT5G35590; AT4G31300; AT3G22110; AT2G05840; AT4G14800; AT3G60820; AT1G53750; AT3G05530; AT4G17510; AT5G05780; AT5G58290; AT5G10540; AT1G21720; AT1G56450; AT5G42790; AT4G01610; AT1G47128; AT1G53850; AT5G45890; AT5G51070; AT3G13235; AT1G79340; AT1G50380; AT3G51260; AT5G23540; AT4G38630; AT1G51710; AT5G66140; AT4G30910;...”
- “...1.77 1.84 2.21 XICs AT2G27020 PAG1 Thr 20S (CP ) 1.41 1.79 1.64 1.84 XICs AT2G24200 LAP1 Met PM,C 1.44 1.55 1.42 1.41 XICs AT4G20850 TPP2 Ser Pl 1.38 1.48 1.38 1.55 XICs AT5G35590 PAA1 Thr 20S (CP ) 1.24 1.58 1.28 1.63 XICs AT4G31300 PBA1...”
- DYn-2 Based Identification of Arabidopsis Sulfenomes
Akter, Molecular & cellular proteomics : MCP 2015 - “...protein AT1G16350 AT1G09780 AT1G11840 AT5G13520 AT5G60160 AT2G24200 AT3G06650 AT3G06580 AT2G41530 AT5G58330 Tubulin binding cofactor C domain-containing protein...”
- “...Protein degradation AT5G36210 AT5G13520 AT1G22920 AT2G24200 AT1G09210 AT1G56340 Primary metabolism AT3G06650 AT4G24830 AT3G48000 AT1G24180 AT5G44340 AT5G19770...”
- Proteomic analysis of endoplasmic reticulum stress responses in rice seeds
Qian, Scientific reports 2015 - “...4.019 AT1G63800 ubiquitin-conjugating enzyme LOC_Os10g31000 gi|115474297 3.156 AT3G55410 2-oxoglutarate dehydrogenase E1 component LOC_Os07g49520 gi|75261364 4.276 AT2G24200 leucine aminopeptidase LOC_Os02g55140 gi|311893431 6.100 AT4G33150 saccharopine dehydrogenase LOC_Os02g54254 gi|222616995 4.201 AT1G55860 HECT-domain domain containing protein LOC_Os12g24080 gi|37718894 2.670 AT5G15400 U-box domain-containing protein LOC_Os03g31400 gi|125987818 7.325 Cysteine proteinase inhibitor 2...”
- Transcript profile analyses of maize silks reveal effective activation of genes involved in microtubule-based movement, ubiquitin-dependent protein degradation, and transport in the pollination process
Xu, PloS one 2013 - “...germination [81] GRMZM2G131026 AT3G04080 Apyrase 1 (AtAPY1) [ A. thaliana ] Pollen germination [82] GRMZM2G178958 AT2G24200 Leucyl aminopeptidase 1 (LAP1) [ A. thaliana ] Pollen adhesion [83] GRMZM2G075255 AT1G02205 CER1 [ A. thaliana ] Pollen hydration [84] GRMZM2G099097 GRMZM2G083526 AT5G57800 CER3 [ A. thaliana ] Pollen...”
- Plant leucine aminopeptidases moonlight as molecular chaperones to alleviate stress-induced damage
Scranton, The Journal of biological chemistry 2012 - “...Palo Alto, CA) and an oligo(dT) primer. LAP1 (At2g24200) and LAP2 (At4g30920) coding regions were cloned by RT-PCR using gene-specific primers (supplemental...”
- “...Primer name Primer sequenceA At2g24200 AtLAP1 LAP1-F 5'-AGCATATGATGGCTCACACTCYCGGT-3' LAP1-R 5'-ATGCGGCCGCTCACGAAGATGAATTCTTC-3' LAP2-F 5'-GCATATGGC-...”
- H2O2-triggered retrograde signaling from chloroplasts to nucleus plays specific role in response to stress
Maruta, The Journal of biological chemistry 2012 - “...the RNAi-trigger contained 18 bp sequences matching At5g38350, At2g24200, and At1g59610 genes, those genes were not included in the down-regulated genes in the...”
- Evidence for the Existence in Arabidopsis thaliana of the Proteasome Proteolytic Pathway: ACTIVATION IN RESPONSE TO CADMIUM
Polge, The Journal of biological chemistry 2009 - “...At4g20850 At1g67690 At5g10540 At5g65620 At2g24200 At4g30920 At4g30910 At2g14260 At3g18780 At1g49240 AAGAAGGCGTTATCGAGGTG TCTACGGGTTCTTCGACCAG...”
- “...family protein (Thimet, TOP) Leucine aminopeptidase (LAP1) At5g65620 At5g10540 At2g24200 5.90 5.45 5.66 1479 736 175 88701 78994 54475 36.0 21.3 14.4 27 15 4...”
- A proteomics dissection of Arabidopsis thaliana vacuoles isolated from cell culture
Jaquinod, Molecular & cellular proteomics : MCP 2007 - “...(At1g21680) (57); a putative leucine aminopeptidase (At2g24200); a putative pectin methylesterase (At1g11580); the glycosyl hydrolase family 17 (At4g16260),...”
- Fruit ripening-associated leucylaminopeptidase with cysteinylglycine dipeptidase activity from durian suggests its involvement in glutathione recycling
Panpetch, BMC plant biology 2021 - “...tree was constructed with MEGA v. 7 using 1000 bootstrap replicates. Arabidopsis thaliana : AtLAP1 (NP_179997), AtLAP2 (NP_194821), and AtLAP3 (NP_001328632); Solanum lycopersicum : SlLAP1 (NP_001233862.2) and SlLAP2 (NP_001233884.2); Solanum tuberosum : StLAP1 (XP_006350102.1) and StLAP2 (XP_015165363.1); Durio zibethinus: Musang King DzLAP1_MK (NW_019167860.1) and DzLAP2_MK (NW_019168159)...”
- Proteome and Interactome Linked to Metabolism, Genetic Information Processing, and Abiotic Stress in Gametophytes of Two Woodferns
Ojosnegros, International journal of molecular sciences 2023 - “...CLP PROTEASE PROTEOLYTIC SUBUNIT-RELATED PROTEIN 2 32.7 4 1 1 4.49 10 126 Degradation 170504-166_2_ORF2 P30184 LAP1 LEUCINE AMINOPEPTIDASE 1 62.5 4 1 4 0...”
- Function and Regulation of Chloroplast Peroxiredoxin IIE
Dreyer, Antioxidants (Basel, Switzerland) 2021 - “...4 AT4G26530 O65581 Fructose-bisphosphate aldolase 5 AT5G49910 Q9LTX9 Heat shock 70 kDa protein 7 AT2G24200 P30184 Leucine aminopeptidase 1 AT5G45930 Q5XF33 Magnesium-chelatase subunit ChII-2 AT1G70890 Q9SSK5 MLP-like protein 43 AT5G26000 P37702 Myrosinase 1 AT3G62030 P34791 Peptidyl-prolyl cis-trans isomerase CYP20-3 AT2G29630 O82392 Phosphomethylpyrimidine synthase AT5G52920 Q9FLW9 Plastidial...”
- Proteomic analysis on roots of Oenothera glazioviana under copper-stress conditions
Wang, Scientific reports 2017 - “...0.61 4.6E-06 38 B6T451 Importin subunit alpha Zea mays Q96321 N/A 3 0.57 0.00233 39 P30184 Leucine aminopeptidase 1 Arabidopsis thaliana P30184 LAP1 2 2.38 1.2E-08 40 Q9LXC0 GDP dissociation inhibitor Arabidopsis thaliana Q9LXC0 At5g09550 3 2.40 0.00036 41 A7PZL3 Probable polygalacturonase Vitis vinifera Q9SMT3 GSVIVT00026920001...”
- Salt-induced subcellular kinase relocation and seedling susceptibility caused by overexpression of Medicago SIMKK in Arabidopsis
Ovečka, Journal of experimental botany 2014 - “...thiolase 2 peroxisomal 48548 8.35 4 16.88 303.25 476.75 1.57 0.005 Proteolysis and protein processing P30184 Leucine aminopeptidase 1 54475 5.55 6 22.69 296.75 489.75 1.65 0.002 Q9LMU2 Kunitz type trypsin and protease inhibitor domain-containing protein 22067 8.97 4 18.36 1004.66 541.33 0.54 0.001 Lipid binding...”
- Salmonella enterica serovar typhimurium peptidase B is a leucyl aminopeptidase with specificity for acidic amino acids
Mathew, Journal of bacteriology 2000 - “...A. pernix LAP (NUMG); A.thal, Arabidopsis thaliana (P30184); B.sLAP, Bacillus subtilis LAP; B.pert, Bordetella pertussis (NUMG); B.tLAP, Bos taurus kidney...”
For advice on how to use these tools together, see
Interactive tools for functional annotation of bacterial genomes.
The PaperBLAST database links 793,807 different protein sequences to 1,259,118 scientific articles. Searches against EuropePMC were last performed on March 13 2025.
PaperBLAST builds a database of protein sequences that are linked
to scientific articles. These links come from automated text searches
against the articles in EuropePMC
and from manually-curated information from GeneRIF, UniProtKB/Swiss-Prot,
BRENDA,
CAZy (as made available by dbCAN),
BioLiP,
CharProtDB,
MetaCyc,
EcoCyc,
TCDB,
REBASE,
the Fitness Browser,
and a subset of the European Nucleotide Archive with the /experiment tag.
Given this database and a protein sequence query,
PaperBLAST uses protein-protein BLAST
to find similar sequences with E < 0.001.
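The E-value cutoff above can be illustrated with a minimal sketch. It assumes BLAST hits in the standard 12-column tabular layout (where the E-value is the eleventh column); the function name and the two example hits are made up for illustration and are not taken from PaperBLAST's own code.

```python
E_VALUE_CUTOFF = 0.001  # threshold stated in the text: E < 0.001


def filter_hits(tabular_text, cutoff=E_VALUE_CUTOFF):
    """Return (subject_id, evalue) pairs for hits with E-value below the cutoff.

    Assumes tab-separated BLAST tabular output with the default 12 columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    hits = []
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        evalue = float(fields[10])
        if evalue < cutoff:
            hits.append((fields[1], evalue))
    return hits


# Hypothetical hits: one strong, one too weak to pass the cutoff.
example = "\n".join([
    "query1\tsubjectA\t98.5\t520\t8\t0\t1\t520\t1\t520\t1e-150\t1050",
    "query1\tsubjectB\t25.0\t80\t60\t2\t10\t89\t5\t84\t0.5\t30.1",
])
print(filter_hits(example))
```

Only the first hypothetical hit survives the filter, because 0.5 is not below 0.001.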
To build the database, we query EuropePMC with locus tags, with RefSeq protein
identifiers, and with UniProt
accessions. We obtain the locus tags from RefSeq or from MicrobesOnline. We use
queries of the form "locus_tag AND genus_name" to try to ensure that
the paper is actually discussing that gene. Because EuropePMC indexes
most recent biomedical papers, even if they are not open access, some
of the links may be to papers that you cannot read or that our
computers cannot read. We query each of these identifiers that
appears in the open access part of EuropePMC, as well as every locus
tag that appears in the 500 most-referenced genomes, so that a gene
may appear in the PaperBLAST results even though none of the papers
that mention it are open access. We also incorporate text-mined links
from EuropePMC that link open access articles to UniProt or RefSeq
identifiers. (This yields some additional links because EuropePMC
uses different heuristics for their text mining than we do.)
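As a rough sketch of the query form described above, the following builds a "locus_tag AND genus_name" query string and a search URL. The endpoint and the `query`/`format` parameters are assumptions based on Europe PMC's public REST search service; the function names are hypothetical, and this is not PaperBLAST's actual code.

```python
from urllib.parse import urlencode

# Assumption: Europe PMC's public REST search endpoint.
SEARCH_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_query(locus_tag, genus_name):
    """Combine a locus tag with its genus so that hits are more likely
    to discuss the gene in the right organism."""
    return f"{locus_tag} AND {genus_name}"


def build_search_url(locus_tag, genus_name):
    """Return a search URL for the query; no network request is made here."""
    params = {"query": build_query(locus_tag, genus_name), "format": "json"}
    return f"{SEARCH_URL}?{urlencode(params)}"


print(build_query("AT2G24200", "Arabidopsis"))
print(build_search_url("AT2G24200", "Arabidopsis"))
```

Requiring the genus name alongside the locus tag is what guards against identifiers that happen to match unrelated text.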
For every article that mentions a locus tag, a RefSeq protein
identifier, or a UniProt accession, we try to select one or two
snippets of text that refer to the protein. If we cannot get access to
the full text, we try to select a snippet from the abstract, but
unfortunately, unique identifiers such as locus tags are rarely
provided in abstracts.
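Snippet selection of this kind could look roughly like the sketch below. The text does not describe PaperBLAST's real heuristics, so the fixed context window, the case-insensitive match, and the rule for skipping overlapping mentions are all assumptions made for illustration.

```python
import re


def select_snippets(text, identifier, max_snippets=2, context=60):
    """Return up to max_snippets excerpts of text surrounding an identifier.

    Matching is case-insensitive because locus tags appear as both
    'AT2G24200' and 'At2g24200' in articles.
    """
    snippets = []
    last_end = -1
    for match in re.finditer(re.escape(identifier), text, flags=re.IGNORECASE):
        start = max(0, match.start() - context)
        end = min(len(text), match.end() + context)
        if start < last_end:
            continue  # this mention overlaps the previous snippet
        snippets.append("..." + text[start:end] + "...")
        last_end = end
        if len(snippets) == max_snippets:
            break
    return snippets


sentence = "LAP1 (At2g24200) and LAP2 (At4g30920) coding regions were cloned by RT-PCR."
for snippet in select_snippets(sentence, "AT2G24200", context=10):
    print(snippet)
```

An empty result here corresponds to the case described above, where no identifier is found in the available text.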
PaperBLAST also incorporates manually-curated protein functions:
- Proteins from NCBI's RefSeq are included if a
GeneRIF
entry links the gene to an article in
PubMed®.
GeneRIF also provides a short summary of the article's claim about the
protein, which is shown instead of a snippet.
- Proteins from Swiss-Prot (the curated part of UniProt)
are included if the curators
identified experimental evidence for the protein's function (evidence
code ECO:0000269). For these proteins, the fields of the Swiss-Prot entry that
describe the protein's function are shown (with bold headings).
- Proteins from BRENDA,
a curated database of enzymes, are included if they are linked to a paper in PubMed
and their full sequence is known.
- Every protein from the non-redundant subset of
BioLiP,
a database
of ligand-binding sites and catalytic residues in protein structures, is included. Since BioLiP itself
does not include descriptions of the proteins, those are taken from the
Protein Data Bank.
Descriptions from PDB rely on the original submitter of the
structure and cannot be updated by others, so they may be less reliable.
(For SitesBLAST and Sites on a Tree, we use a larger subset of BioLiP so that every
ligand is represented among a group of structures with similar sequences, but for
PaperBLAST, we use the non-redundant set provided by BioLiP.)
- Every protein from EcoCyc, a curated
database of the proteins in Escherichia coli K-12, is included, regardless
of whether they are characterized or not.
- Proteins from the MetaCyc metabolic pathway database
are included if they are linked to a paper in PubMed and their full sequence is known.
- Proteins from the Transporter Classification Database (TCDB)
are included if they have known substrate(s), have reference(s),
and are not described as uncharacterized or putative.
(Some of the references are not visible on the PaperBLAST web site.)
- Every protein from CharProtDB,
a database of experimentally characterized protein annotations, is included.
- Proteins from the CAZy database of carbohydrate-active enzymes
are included if they are associated with an Enzyme Commission (EC) number.
Even though CAZy does not provide links from individual protein sequences to papers,
these should all be experimentally-characterized proteins.
- Proteins from the REBASE database
of restriction enzymes are included if they have known specificity.
- Every protein with an evidence-based reannotation (based on mutant phenotypes)
in the Fitness Browser is included.
- Sequence-specific transcription factors (including sigma factors and DNA-binding response regulators)
with experimentally-determined DNA binding sites from the
PRODORIC database of gene regulation in prokaryotes are included.
- Putative transcription factors from RegPrecise
are included if they have manually-curated predictions for their binding sites. These predictions are based on
conserved putative regulatory sites across genomes that contain similar transcription factors,
so PaperBLAST clusters the TFs at 70% identity and retains just one member of each cluster.
- Coding sequence (CDS) features from the
European Nucleotide Archive (ENA)
are included if the /experiment tag is set (implying that there is experimental evidence for the annotation),
the nucleotide entry links to paper(s) in PubMed,
and the nucleotide entry is from the STD data class
(implying that these are targeted annotated sequences, not from shotgun sequencing).
Also, to filter out genes whose transcription or translation was detected, but whose function
was not studied, nucleotide entries or papers with more than 25 such proteins are excluded.
Descriptions from ENA rely on the original submitter of the
sequence and cannot be updated by others, so they may be less reliable.
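The ENA inclusion rules in the last item can be sketched as a small filter. The `EnaEntry` record and its field names are hypothetical, and the per-paper counting is one possible reading of "nucleotide entries or papers with more than 25 such proteins are excluded"; the actual implementation may differ.

```python
from collections import Counter
from dataclasses import dataclass

MAX_PROTEINS = 25  # limit stated in the text


@dataclass
class EnaEntry:  # hypothetical record for one nucleotide entry
    accession: str
    data_class: str       # e.g. "STD"
    pubmed_ids: list      # papers linked from the nucleotide entry
    experiment_cds: list  # protein ids whose CDS carries the /experiment tag


def select_ena_proteins(entries):
    """Map each retained protein id to the papers it is linked to."""
    std = [e for e in entries if e.data_class == "STD" and e.experiment_cds]
    # Count tagged proteins per paper so that large-scale papers can be dropped.
    per_paper = Counter()
    for entry in std:
        for pmid in entry.pubmed_ids:
            per_paper[pmid] += len(entry.experiment_cds)
    selected = {}
    for entry in std:
        if len(entry.experiment_cds) > MAX_PROTEINS:
            continue  # entry likely reports detection, not function
        papers = [p for p in entry.pubmed_ids if per_paper[p] <= MAX_PROTEINS]
        if not papers:
            continue  # no linked paper remains
        for protein in entry.experiment_cds:
            selected[protein] = papers
    return selected


# Made-up entries: only the first satisfies every condition.
entries = [
    EnaEntry("A1", "STD", ["111"], ["p1", "p2"]),
    EnaEntry("A2", "WGS", ["222"], ["p3"]),
    EnaEntry("A3", "STD", [], ["p4"]),
    EnaEntry("A4", "STD", ["333"], [f"q{i}" for i in range(26)]),
]
print(select_ena_proteins(entries))
```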
Except for GeneRIF and ENA,
the curated entries include a short curated
description of the protein's function.
For entries from BioLiP, the protein's function may not be known beyond binding to the ligand.
Many of these entries also link to articles in PubMed.
For more information see the
PaperBLAST paper (mSystems 2017)
or the code.
You can download PaperBLAST's database here.
Changes to PaperBLAST since the paper was written:
- November 2023: incorporated PRODORIC and RegPrecise. Many PRODORIC entries were not linked to a protein sequence (no UniProt identifier), so we added this information.
- February 2023: BioLiP changed their download format. PaperBLAST now includes their non-redundant subset. SitesBLAST and Sites on a Tree use a larger non-redundant subset that ensures that every ligand is represented within each cluster. This should ensure that every binding site is represented.
- June 2022: incorporated some coding sequences from ENA with the /experiment tag.
- March 2022: incorporated BioLiP.
- April 2020: incorporated TCDB.
- April 2019: EuropePMC now returns table entries in their search results. This has expanded PaperBLAST's database, but most of the new entries are of low relevance, and the resulting snippets are often just lists of locus tags with annotations.
- February 2018: the alignment page reports the conservation of the hit's functional sites (if available from Swiss-Prot or UniProt).
- January 2018: incorporated BRENDA.
- December 2017: incorporated MetaCyc, CharProtDB, CAZy, REBASE, and the reannotations from the Fitness Browser.
- September 2017: EuropePMC no longer returns some table entries in their search results. This has shrunk PaperBLAST's database, but has also reduced the number of low-relevance hits.
Many of these changes are described in Interactive tools for functional annotation of bacterial genomes.
PaperBLAST cannot provide snippets for many of the papers that are
published in non-open-access journals. This limitation applies even if
the paper is marked as "free" on the publisher's web site and is
available in PubMed Central or EuropePMC. If a journal that you publish
in is marked as "secret," please consider publishing elsewhere.
Many important articles are missing from PaperBLAST, either because
the article's full text is not in EuropePMC (as for many older
articles), or because the paper does not mention a protein identifier such as a locus tag, or because of PaperBLAST's heuristics. If you notice an
article that characterizes a protein's function but is missing from
PaperBLAST, please notify the curators at UniProt
or add an entry to GeneRIF.
Entries in either of these databases will eventually be incorporated
into PaperBLAST. Note that to add an entry to UniProt, you will need
to find the UniProt identifier for the protein. If the protein is not
already in UniProt, you can ask them to create an entry. To add an
entry to GeneRIF, you will need an NCBI Gene identifier, but
unfortunately many prokaryotic proteins in RefSeq do not have
corresponding Gene identifiers.
References
PaperBLAST: Text-mining papers for information about homologs.
M. N. Price and A. P. Arkin (2017). mSystems, 10.1128/mSystems.00039-17.
Europe PMC in 2017.
M. Levchenko et al (2017). Nucleic Acids Research, 10.1093/nar/gkx1005.
Gene indexing: characterization and analysis of NLM's GeneRIFs.
J. A. Mitchell et al (2003). AMIA Annu Symp Proc 2003:460-464.
UniProt: the universal protein knowledgebase.
The UniProt Consortium (2016). Nucleic Acids Research, 10.1093/nar/gkw1099.
BRENDA in 2017: new perspectives and new tools in BRENDA.
S. Placzek et al (2017). Nucleic Acids Research, 10.1093/nar/gkw952.
The EcoCyc database: reflecting new knowledge about Escherichia coli K-12.
I. M. Keseler et al (2016). Nucleic Acids Research, 10.1093/nar/gkw1003.
The MetaCyc database of metabolic pathways and enzymes.
R. Caspi et al (2018). Nucleic Acids Research, 10.1093/nar/gkx935.
CharProtDB: a database of experimentally characterized protein annotations.
R. Madupu et al (2012). Nucleic Acids Research, 10.1093/nar/gkr1133.
The carbohydrate-active enzymes database (CAZy) in 2013.
V. Lombard et al (2014). Nucleic Acids Research, 10.1093/nar/gkt1178.
The Transporter Classification Database (TCDB): recent advances.
M. H. Saier, Jr. et al (2016). Nucleic Acids Research, 10.1093/nar/gkv1103.
REBASE - a database for DNA restriction and modification: enzymes, genes and genomes.
R. J. Roberts et al (2015). Nucleic Acids Research, 10.1093/nar/gku1046.
Deep annotation of protein function across diverse bacteria from mutant phenotypes.
M. N. Price et al (2016). bioRxiv, 10.1101/072470.
by Morgan Price,
Arkin group
Lawrence Berkeley National Laboratory