PaperBLAST
Full List of Papers Linked to NP_194980.1
YUC1_ARATH / Q9SZY8 Probable indole-3-pyruvate monooxygenase YUCCA1; Flavin-containing monooxygenase YUCCA1; EC 1.14.13.168 from Arabidopsis thaliana (Mouse-ear cress) (see 9 papers)
Q9SZY8 indole-3-pyruvate monooxygenase (EC 1.14.13.168) from Arabidopsis thaliana (see paper)
NP_194980 Flavin-binding monooxygenase family protein from Arabidopsis thaliana
AT4G32540 YUC1 (YUCCA 1); FAD binding / NADP or NADPH binding / flavin-containing monooxygenase/ oxidoreductase from Arabidopsis thaliana
- function: Involved in auxin biosynthesis, but not in the tryptamine or the CYP79B2/B3 branches. Catalyzes in vitro the N-oxidation of tryptamine to form N-hydroxyl tryptamine. Involved during embryogenesis and seedling development. Required for the formation of floral organs and vascular tissues. Belongs to the set of redundant YUCCA genes probably responsible for auxin biosynthesis in shoots.
catalytic activity: indole-3-pyruvate + NADPH + O2 + H(+) = (indol-3-yl)acetate + CO2 + NADP(+) + H2O (RHEA:34331)
cofactor: FAD
disruption phenotype: No visible phenotype, due to the redundancy with the other members of the YUCCA family.
- The viral suppressor HCPro decreases DNA methylation and activates auxin biosynthesis genes.
Yang, Virology 2020 (PubMed) - GeneRIF: The viral suppressor HCPro decreases DNA methylation and activates auxin biosynthesis genes.
- Integrated transcriptome and miRNA analysis uncovers molecular regulators of aerial stem-to-rhizome transition in the medical herb Gynostemma pentaphyllum
Yang, BMC genomics 2019 - “..., and red arrows indicate putative proteins in G. pentaphyllum . Accession numbers: AtYUCCA1, number: NP_194980; AtYUCCA2, number: NP_193062; AtYUCCA3, number: NP_171955; AtYUCCA4, number: NP_196693; AtYUCCA5, number: NP_199202; AtYUCCA6, number: NP_001190399; AtYUCCA7, number: NP_180881; AtYUCCA8, number: NP_194601; AtYUCCA9, number: NP_171914; AtYUCCA10, number: NP_175321; AtYUCCA11, number: NP_173564;...”
- Bioinformatics Analysis of Phylogeny and Transcription of TAA/YUC Auxin Biosynthetic Genes.
Poulet, International journal of molecular sciences 2017 - GeneRIF: Data show that endoplasmic reticulum (ER) membrane anchored YUC proteins can mainly be found in roots, while cytosolic proteins are more abundant in the shoot.
- Distinct Characteristics of Indole-3-Acetic Acid and Phenylacetic Acid, Two Common Auxins in Plants.
Sugawara, Plant & cell physiology 2015 - GeneRIF: The induction of the YUCCA (YUC) genes increases Phenylacetic acid (PAA) metabolite levels in Arabidopsis, indicating that YUC flavin-containing monooxygenases may play a role in PAA biosynthesis.
- Small-molecule auxin inhibitors that target YUCCA are powerful tools for studying auxin function.
Kakei, The Plant journal : for cell and molecular biology 2015 (PubMed) - GeneRIF: Studies indicate that YUCCA, a flavin-containing monooxygenase (YUC), catalyzes the last step of conversion from indole-3-pyruvate (IPyA) to indole-3-acetic acid (IAA).
- Yucasin is a potent inhibitor of YUCCA, a key enzyme in auxin biosynthesis.
Nishimura, The Plant journal : for cell and molecular biology 2014 (PubMed) - GeneRIF: Yucasin is a potent inhibitor of YUC enzymes that offers an effective tool for analyzing the contribution of Indole-3-acetic acid (IAA) biosynthesis via the indole-3-pyruvic acid (IPyA) pathway to plant development and physiological processes.
- Induction of somatic embryos in Arabidopsis requires local YUCCA expression mediated by the down-regulation of ethylene biosynthesis.
Bai, Molecular plant 2013 (PubMed) - GeneRIF: YUCCAs (YUCs) encoding key enzymes in auxin biosynthesis are required for somatic embryo induction
- Auxin promotes susceptibility to Pseudomonas syringae via a mechanism independent of suppression of salicylic acid-mediated defenses.
Mutka, The Plant journal : for cell and molecular biology 2013 (PubMed) - GeneRIF: Data indicate that mature YUCCA 1 (YUC1) auxin biosynthesis gene overexpressing plants have elevated auxin levels and enhanced susceptibility to Pseudomonas syringae strain DC3000.
- LEAFY COTYLEDON2 (LEC2) promotes embryogenic induction in somatic tissues of Arabidopsis, via YUCCA-mediated auxin biosynthesis.
Wójcikowska, Planta 2013 - GeneRIF: The analysis indicated that YUCCAs and TAA1, working in the IPA-YUC auxin biosynthesis pathway, are associated with SE induction, and that the expression of three YUCCA genes (YUC1, YUC4 and YUC10) is associated with LEC2 activity.
- Allelic analyses of the Arabidopsis YUC1 locus reveal residues and domains essential for the functions of YUC family of flavin monooxygenases.
Hou, Journal of integrative plant biology 2011 - GeneRIF: Residues near the putative GXGXXG FAD binding site and the putative GXGXXG NADPH binding site are highly conserved in YUC flavin monooxygenases.
- YUCCA genes are expressed in response to leaf adaxial-abaxial juxtaposition and are required for leaf margin development.
Wang, Plant physiology 2011 - GeneRIF: Expressions of YUCs in the leaf respond to the adaxial-abaxial juxtaposition, and that the activities of auxin mediate leaf margin development, which subsequently promotes blade outgrowth.
- The Arabidopsis YUCCA1 flavin monooxygenase functions in the indole-3-pyruvic acid branch of auxin biosynthesis.
Stepanova, The Plant cell 2011 - GeneRIF: YUCCA1 flavin monooxygenase functions in the indole-3-pyruvic acid branch of auxin biosynthesis
- Activation-tagged suppressors of a weak brassinosteroid receptor mutant
Kang, Molecular plant 2010 - “...libraries under the following accession numbers: NP_194980 (YUCCA1, At4g32540), NP_193062 (YUCCA2, At4g13260), NP_171955 (YUCCA3, At1g04610), NP_850808 (YUCCA4,...”
- Auxin synthesized by the YUCCA flavin monooxygenases is essential for embryogenesis and leaf formation in Arabidopsis.
Cheng, The Plant cell 2007 - GeneRIF: These data demonstrate that auxin synthesized by the YUC flavin monooxygenases is an essential auxin source for Arabidopsis thaliana embryogenesis and postembryonic organ formation.
- RETINOBLASTOMA-RELATED Has Both Canonical and Noncanonical Regulatory Functions During Thermo-Morphogenic Responses in Arabidopsis Seedlings
Hamid, Plant, cell & environment 2025 - “...2 (ORC2): At2g37560; CDKB1;1: At3g54180; CYCD3;1: At4g34160; CYCA3;1: At5g43080; PIF4: At2g43010; PIF7: At5g61270; YUCCA1 (YUC1): At4g32540; YUCCA2 (YUC2): At4g13260; YUCCA8 (YUC8): At4g28720; YUCCA9 (YUC9): At1g04180; TIR1: At3g62980; HY5: At5g11260. Conflicts of Interest The authors declare no conflicts of interest. Supporting information Supplemental Figure S1. The RBR...”
- On the cutting edge of development: laser-assisted microdissection of the Arabidopsis gynoecium reveals tissue-specific gene expression patterns
Lanctot, Plant physiology 2024 - “...AT1G70940 TRN2 Gramene: AT5G46700 TRN2 Araport: AT5G46700 REM1 Gramene: AT4G31610 REM1 Araport: AT4G31610 YUC1 Gramene: AT4G32540 YUC1 Araport: AT4G32540 WOX1 Gramene: AT3G18010 WOX1 Araport: AT3G18010 WOX12 Gramene: AT5G17810 WOX12 Araport: AT5G17810 fruit AmiGo: PO:0009001 SHP2 Gramene: AT2G42830 SHP2 Araport: AT2G42830 References Guillotin B , Birnbaum KD...”
- Exogenous application of the apocarotenoid retinaldehyde negatively regulates auxin-mediated root growth
Xu, Plant physiology 2024 - “...AT4G17870 ; SLR, AT4G14550 ; SUR2, AT4G31500 ; WEI2, AT5G05730 ; WEI7, AT1G25220 ; YUC1, AT4G32540 ; YUC2, AT4G13260 ; YUC3, AT1G04610 ; YUC4, AT5G11320 ; YUC5, AT5G43890 ; YUC6, AT5G25620 ; YUC7, AT2G33230 ; YUC8, AT4G28720 ; YUC9, AT1G04180 ; YUC10, AT1G48910 ; YUC11, AT1G21430...”
- Current Advances in the Functional Diversity and Mechanisms Underlying Endophyte-Plant Interactions
Zhao, Microorganisms 2024 - “...159 ]. The single-nucleotide polymorphisms (SNPs) at these two significant loci are located between YUC-1 (AT4G32540), which is involved in auxin biosynthesis, and LEUNIG (AT4G32551), which is associated with leaf and flower organ development. Notably, there is a significant overlap between the root-colonized microbial community regulated...”
- Unlocking the Multifaceted Mechanisms of Bud Outgrowth: Advances in Understanding Shoot Branching
Yuan, Plants (Basel, Switzerland) 2023 - “...] AXR1 AT1G05180 Arabidopsis a subunit of the RUB1 activating enzyme [ 34 ] YUCCA AT4G32540 Arabidopsis A flavin monooxygenase-like enzyme, auxin biosynthesis [ 35 ] PIN1 Os02g0743400 Rice a n auxin transporter [ 36 ] OsPIN5b Os09g0505400 Rice a n auxin transporter [ 37 ]...”
- Molecular Mechanisms of Plant Regeneration from Differentiated Cells: Approaches from Historical Tissue Culture Systems
Morinaka, Plant & cell physiology 2023 - “...MYB3R1 AT4G32730 Protoplast culture Sakamoto etal. (2022) MYB3R4 AT5G11510 Protoplast culture Sakamoto etal. (2022) YUC1 AT4G32540 Protoplast culture Sakamoto etal. (2022) Table 2 Reprogramming from differentiated cells in various plant species highlighted in this review. Species Original tissues Regenerated organs References N. tabacum Epidermis A whole...”
- Annotation of the Turnera subulata (Passifloraceae) Draft Genome Reveals the S-Locus Evolved after the Divergence of Turneroideae from Passifloroideae in a Stepwise Manner
Henning, Plants (Basel, Switzerland) 2023 - “...AtYUC6 (AT5G25620) showed a tendency towards higher expression in the stamen, while homologs of AtYUC1 (AT4G32540) and AtYUC4 (AT5G11320) showed a tendency towards higher expression in the pistil. We used RT-qPCR to test hypotheses generated from previous analyses ( Figure 8 ). For this analysis, we...”
- Significance of NatB-mediated N-terminal acetylation of auxin biosynthetic enzymes in maintaining auxin homeostasis in Arabidopsis thaliana
Liu, Communications biology 2022 - “...genes mentioned in this study are CKRC3/TCU2 (AT5G58450), NBC (AT1G03150), CKRC1/TAA1 (AT1G70560), SUR2 (AT4G31500), YUC1 (AT4G32540), YUC2 (AT4G13260), YUC5 (AT5G43890), YUC6 (AT5G25620), CKRC2/YUC8 (AT4G28720), YUC9 (AT1G04180), YUC10 (AT1G48910), YUC11 (AT1G21430), CYP79B2 (AT4G39950), and CYP79B3 (AT2G22330). Reporting summary Further information on research design is available in the...”
- BIG Modulates Stem Cell Niche and Meristem Development via SCR/SHR Pathway in Arabidopsis Roots
Liu, International journal of molecular sciences 2022 - “...4.4. Accession Number AT3G02260 ( BIG ), At3g11260 ( WOX5 ), AT4G32810 ( MAX4 ), AT4G32540 ( YUCCA1 ), AT1G73590 ( PIN1 ), AT5G57090 ( PIN2 ), AT1G70940 ( PIN3 ), AT1G23080 ( PIN7 ), At3g54220 ( SCR ), AT4G37650 ( SHR ), At3g20840 ( PLT1...”
- BnERF114.A1, a Rapeseed Gene Encoding APETALA2/ETHYLENE RESPONSE FACTOR, Regulates Plant Architecture through Auxin Accumulation in the Apex in Arabidopsis
Lyu, International journal of molecular sciences 2022 - “...transgenic plants. ( A ) relative expression levels of four YUCCA genes ( AtYUCCA1 [ At4g32540 ], AtYUCCA2 [ At4g13260 ], AtYUCCA4 [ At5g11320 ], and AtYUCCA6 [ At5g25620 ]); the expression level AtYUCCA1 in the wild-type (WT) being set as a unit. ( B )...”
- A BTB-TAZ protein is required for gene activation by Cauliflower mosaic virus 35S multimerized enhancers
Irigoyen, Plant physiology 2022 - “...At3g48360; GTE9 , At5g14270; GTE11 , At3g01770; CULLIN3A , At1g26830; CULLIN3B , At1g69670; YUCCA1 , At4g32540; PAP1 , At1g56650; JAW , At4g23713 ; PHT4;2 , At2g38060; TPT , At5g46110; EIF-4A2 , At1g54270; ACTIN7 , At5g09810; 18S , At2g01010; CAB2 , At1g29920. Supplemental data The following materials...”
- Regulation of Phytohormones on the Growth and Development of Plant Root Hair
Li, Frontiers in plant science 2022 - “...ARF7 At5g20730 + Schoenaers et al., 2018 ARF19 At1g19220 + Schoenaers et al., 2018 YUCCA At4g32540 + + Zhao et al., 2001 AUX1 At2g38120 + Yu et al., 2015 PIN2 At5g57090 + + Cho et al., 2007 EIN3 At3g20770 + + Feng et al., 2017 EIL1...”
- Plant genetic effects on microbial hubs impact host fitness in repeated field trials
Brachi, Proceedings of the National Academy of Sciences of the United States of America 2022 - “...on positions 15704377, 15704472, and 15704478. These consecutive single-nucleotide polymorphisms (SNPs) are located between YUC-1 (AT4G32540), involved in auxin biosynthesis, and LEUNIG (AT4G32551), involved in the development of the leaf blade and floral organs. A potentially more powerful strategy to detect minor quantitative trait loci (QTL)...”
- Jasmonic Acid-Dependent MYC Transcription Factors Bind to a Tandem G-Box Motif in the YUCCA8 and YUCCA9 Promoters to Regulate Biotic Stress Responses
Pérez-Alonso, International journal of molecular sciences 2021 - “...eleven YUCCA genes. The figure shows the genomic regions around ( A ) YUC1 , At4g32540; ( B ) YUC2 , At4g13260; ( C ) YUC3 , At1g04610; ( D ) YUC4 , At5g11320; ( E ) YUC5 , At5g43890; ( F ) YUC6 , At5g25620;...”
- The Diverse Salt-Stress Response of Arabidopsis ctr1-1 and ein2-1Ethylene Signaling Mutants Is Linked to Altered Root Auxin Homeostasis
Vaseva, Plants (Basel, Switzerland) 2021 - “...auxin biosynthesis ( TAA1 At1g70560 ; TAR1 At1g23320 ; TAR2 At4g24670 , and YUC1/2/3/4/5/6/7/8/9/10/11 : At4g32540, At4g13260, At1g04610, At5g11320, At5g43890, At5g25620, At2g33230, At4g28720, At1g04180, At1g48910, At1g21430 ) and transporter coding genes ( PIN1/2/3/4/5/6/7/8: At1g73590, At5g57090, At1g70940, At2g01420, At5g16530, At1g77110, At1g23080, At5g15100; AUX1/LAX1/2/3: At2g38120, At5g01240, At2g21050, At1g77690;...”
- “...genes from the auxin Trp-dependent biosynthesis: TAA1 (At1g70560), TAR1 (At1g23320), TAR2 (At4g24670) and YUC1-11 (resp. At4g32540 , At4g13260 , At1g04610 , At5g11320 , At5g43890 , At5g25620 , At2g33230 , At4g28720 , At1g04180 , At1g48910 , At1g21430 ), and Table S2: TF DeCON in silico screen of...”
- Molecular Network for Regulation of Ovule Number in Plants
Qadir, International journal of molecular sciences 2021 - “...Arabidopsis ARF5 AT1G19850 Act as a transcriptional activator [ 3 , 39 ] Arabidopsis YUC1 AT4G32540 Auxin biosynthesis [ 31 , 40 ] Arabidopsis YUC4 AT5G11320 Auxin biosynthesis [ 31 , 40 ] Arabidopsis REV AT5G60690 homeodomain-leucine zipper family [ 31 , 41 ] Cytokinin (CTK)...”
- Auxin Metabolism in Plants
Casanova-Sáez, Cold Spring Harbor perspectives in biology 2021 (secret)
- The Arabidopsis MATERNAL EFFECT EMBRYO ARREST45 protein modulates maternal auxin biosynthesis and controls seed size by inducing AINTEGUMENTA
Li, The Plant cell 2021 - “...accession number: MEE45 (At4g00260), ANT (At4g37750), MINI3 (At1g55600), IKU1 (At2g35230), IKU2 (At3g19700), SHB1 (At4g25350), YUC1 (At4g32540), YUC2 (At4g13260), YUC3 (At1g04610), YUC4 (At5g11320), YUC5 (At5g43890), YUC6 (At5g25620), YUC7 (At2g33230), YUC8 (At4g28720), YUC9 (At1g04180), YUC10 (At1g48910), and YUC11 (At1g21430). RNA-seq data discussed in this study have been deposited...”
- Cytokinin Signaling and De Novo Shoot Organogenesis
Hnatuszko-Konka, Genes 2021 - “...WUS defines the organizing center in SAM [ 45 , 53 , 66 ] YUC1 AT4G32540 Auxin synthesis OTHERS; YUC-mediated auxin biosynthesis is required for efficient shoot regeneration (callus) [ 50 , 62 ] YUC4 AT5G11320 Auxin synthesis OTHERS; YUC-mediated auxin biosynthesis is required for efficient...”
- Quantitative Trait Loci (QTLs) Associated with Microspore Culture in Raphanus sativus L. (Radish)
Kim, Genes 2020 - “...were identified as candidate genes. The genes Rs426380 and Rs426400 have the same function as AT4G32540, which is known to function as the flavin-binding monooxygenase family protein and is known to act as an enzyme in auxin biosynthesis [ 62 ]. It also plays a key...”
- “...regeneration near the QTL region. QTL Gene ID A.T ortholog Gene Description P1_Chr8_1 1 Rs426380 AT4G32540 Flavin-binding monooxygenase family protein P1_Chr8_1 Rs426400 AT4G32540 Flavin-binding monooxygenase family protein P1_Chr9_1/P1_Chr9_2 Rs465100 AT5G51230 VEFS-Box of polycomb protein P2_Chr9_1 2 Rs479580 AT4G02020 SET domain-containing protein P2_Chr9_1 Rs479680 AT4G02020 SET domain-containing...”
- Drought-Induced Regulatory Cascades and Their Effects on the Nutritional Quality of Developing Potato Tubers
Da, Genes 2020 - “...monooxygenase 3.42 AT4G28720 YUC8 68.3 AT5G43890 YUC5 67.2 PGSC0003DMG400026087 Flavin monooxygenase 3.09 AT5G11320 YUC4 57.4 AT4G32540 YUC 54.3 PGSC0003DMG400003773 SAUR family protein 8.34 AT1G75580 SAUR51 72.2 AT1G19830 SAUR54 61.5 PGSC0003DMG400001667 SAUR family protein 7.40 AT4G38860 SAUR16 64.8 AT4G34760 SAUR50 64.5 AT2G21220 SAUR12 63.5 AT2G16580 SAUR8 63.0...”
- Into the Seed: Auxin Controls Seed Development and Grain Yield
Cao, International journal of molecular sciences 2020 - “...Silique length, Seed size Influences auxin metabolism or auxin biosynthesis Shi et al., 2019 Arabidopsis At4G32540, At5G11320, At1G48910, At1G21430 YUC1, YUC4, YUC10, YUC11 Flavin monooxygenases Embryogenesis and post-embryonic organ formation Involved in auxin biosynthesis Cheng et al., 2007 Arabidopsis At1G28300 LEC2 AP2/B3-like transcriptional factor family protein...”
- The YUCCA-Auxin-WOX11 Module Controls Crown Root Development in Rice
Zhang, Frontiers in plant science 2018 - “...can be found in the GenBank/EMBL data libraries using the following accession numbers: YUCCA1 , At4g32540; YUCCA2 , At4g13260; YUCCA3 , At1g04610; YUCCA4 , At5g11320; YUCCA5 , At5g43890; YUCCA6 , At5g25620; YUCCA7 , At2g33230; YUCCA8 , At4g04610; YUCCA9 , At1g04180; YUCCA10 , At1g48910; YUCCA11 , At1g21430....”
- Maternal auxin supply contributes to early embryo patterning in Arabidopsis
Robert, Nature plants 2018 - “...BDL/IAA12 (At1g04550), LAX1 (At5g01240), PIN3 (At1g70940), R2D2 (NASC ID N2105637), TAA1 (At1g70560), TAR1 (At4g24670), YUC1 (At4g32540), YUC4 (At5g11320), YUC8 (At4g28720), YUC9 (At1g04180), WOX2 (At5g59340), p35S:DII-VENUS (NASC ID 799173), p35S:mDII-VENUS (NASC ID 799174). Author contributions H.S.R., C.P. and C.L.G. contributed equally to this work, performed experiments. H.S.R.,...”
For advice on how to use these tools together, see
Interactive tools for functional annotation of bacterial genomes.
The PaperBLAST database links 793,807 different protein sequences to 1,259,118 scientific articles. Searches against EuropePMC were last performed on March 13 2025.
PaperBLAST builds a database of protein sequences that are linked
to scientific articles. These links come from automated text searches
against the articles in EuropePMC
and from manually-curated information from GeneRIF, UniProtKB/Swiss-Prot,
BRENDA,
CAZy (as made available by dbCAN),
BioLiP,
CharProtDB,
MetaCyc,
EcoCyc,
TCDB,
REBASE,
the Fitness Browser,
and a subset of the European Nucleotide Archive with the /experiment tag.
Given this database and a protein sequence query,
PaperBLAST uses protein-protein BLAST
to find similar sequences with E < 0.001.
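The E-value cutoff described above can be illustrated with a minimal Python sketch. It assumes the BLAST hits are available in the standard 12-column tabular format (`-outfmt 6`), in which the subject identifier is the second column and the E-value is the eleventh. The function name `filter_hits` and the sample hit lines are hypothetical, and this is not PaperBLAST's own code.

```python
from typing import Iterable, List, Tuple

E_VALUE_CUTOFF = 0.001  # threshold stated in the text: keep hits with E < 0.001


def filter_hits(lines: Iterable[str], cutoff: float = E_VALUE_CUTOFF) -> List[Tuple[str, float]]:
    """Return (subject_id, evalue) pairs for hits below the cutoff.

    Assumes BLAST tabular output (-outfmt 6): tab-separated fields with the
    subject id in column 2 and the E-value in column 11.
    """
    kept = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a complete 12-column hit line
        evalue = float(fields[10])
        if evalue < cutoff:
            kept.append((fields[1], evalue))
    return kept


# Hypothetical hit lines with made-up numbers, for illustration only.
example = [
    "query1\thitA\t98.5\t400\t6\t0\t1\t400\t1\t400\t1e-150\t750",
    "query1\thitB\t25.0\t80\t60\t2\t10\t90\t5\t85\t0.5\t30.1",
]
print(filter_hits(example))
```

Only `hitA` passes here, because the E-value of `hitB` (0.5) is above the cutoff.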
To build the database, we query EuropePMC with locus tags, with RefSeq protein
identifiers, and with UniProt
accessions. We obtain the locus tags from RefSeq or from MicrobesOnline. We use
queries of the form "locus_tag AND genus_name" to try to ensure that
the paper is actually discussing that gene. Because EuropePMC indexes
most recent biomedical papers, even if they are not open access, some
of the links may be to papers that you cannot read or that our
computers cannot read. We query each of these identifiers that
appears in the open access part of EuropePMC, as well as every locus
tag that appears in the 500 most-referenced genomes, so that a gene
may appear in the PaperBLAST results even though none of the papers
that mention it are open access. We also incorporate text-mined links
from EuropePMC that link open access articles to UniProt or RefSeq
identifiers. (This yields some additional links because EuropePMC
uses different heuristics for their text mining than we do.)
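As a rough illustration of the "locus_tag AND genus_name" query form, the sketch below builds such a query string and a matching search URL. The use of Europe PMC's public REST search endpoint and its `query` and `format` parameters is an assumption made for this example; the text does not say which interface PaperBLAST uses. The helper names are hypothetical, and no request is sent.

```python
from urllib.parse import urlencode

# Europe PMC's public REST search endpoint. Using it here is an assumption;
# the text does not specify how PaperBLAST submits its queries.
EUROPEPMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_query(locus_tag: str, genus_name: str) -> str:
    """Combine an identifier and a genus in the 'locus_tag AND genus_name' form."""
    return f"{locus_tag} AND {genus_name}"


def build_search_url(locus_tag: str, genus_name: str) -> str:
    """Return a search URL for the query; this only builds the string."""
    params = {"query": build_query(locus_tag, genus_name), "format": "json"}
    return f"{EUROPEPMC_SEARCH}?{urlencode(params)}"


print(build_query("AT4G32540", "Arabidopsis"))
print(build_search_url("AT4G32540", "Arabidopsis"))
```

Pairing the identifier with the genus name is what narrows the search to papers likely to discuss that gene rather than an unrelated use of the same string.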
For every article that mentions a locus tag, a RefSeq protein
identifier, or a UniProt accession, we try to select one or two
snippets of text that refer to the protein. If we cannot get access to
the full text, we try to select a snippet from the abstract, but
unfortunately, unique identifiers such as locus tags are rarely
provided in abstracts.
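The snippet-selection step can be pictured with a small sketch. PaperBLAST's actual heuristics are not described here, so everything below is an assumption for illustration: the function name `select_snippets`, the case-insensitive match, the fixed word window, and the limit of two snippets per article.

```python
import re
from typing import List


def select_snippets(text: str, identifier: str,
                    max_snippets: int = 2, window: int = 8) -> List[str]:
    """Return up to max_snippets word windows around mentions of identifier.

    Matching is case-insensitive. The window size and the snippet limit are
    illustrative choices, not PaperBLAST's settings.
    """
    words = text.split()
    pattern = re.compile(re.escape(identifier), re.IGNORECASE)
    snippets = []
    last_end = -1  # index of the last word already covered by a snippet
    for i, word in enumerate(words):
        if i <= last_end or not pattern.search(word):
            continue  # skip words inside a previous snippet or without a match
        start = max(0, i - window)
        end = min(len(words), i + window + 1)
        snippets.append("..." + " ".join(words[start:end]) + "...")
        last_end = end - 1
        if len(snippets) == max_snippets:
            break
    return snippets


sample = "YUC1 (At4g32540) and YUC4 (At5g11320) act in auxin biosynthesis"
print(select_snippets(sample, "AT4G32540", window=2))
```

With a window of two words, the sample text yields a single snippet centered on the first mention of the identifier.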
PaperBLAST also incorporates manually-curated protein functions:
- Proteins from NCBI's RefSeq are included if a
GeneRIF
entry links the gene to an article in
PubMed®.
GeneRIF also provides a short summary of the article's claim about the
protein, which is shown instead of a snippet.
- Proteins from Swiss-Prot (the curated part of UniProt)
are included if the curators
identified experimental evidence for the protein's function (evidence
code ECO:0000269). For these proteins, the fields of the Swiss-Prot entry that
describe the protein's function are shown (with bold headings).
- Proteins from BRENDA,
a curated database of enzymes, are included if they are linked to a paper in PubMed
and their full sequence is known.
- Every protein from the non-redundant subset of
BioLiP,
a database
of ligand-binding sites and catalytic residues in protein structures, is included. Since BioLiP itself
does not include descriptions of the proteins, those are taken from the
Protein Data Bank.
Descriptions from PDB rely on the original submitter of the
structure and cannot be updated by others, so they may be less reliable.
(For SitesBLAST and Sites on a Tree, we use a larger subset of BioLiP so that every
ligand is represented among a group of structures with similar sequences, but for
PaperBLAST, we use the non-redundant set provided by BioLiP.)
- Every protein from EcoCyc, a curated
database of the proteins in Escherichia coli K-12, is included, regardless
of whether they are characterized or not.
- Proteins from the MetaCyc metabolic pathway database
are included if they are linked to a paper in PubMed and their full sequence is known.
- Proteins from the Transporter Classification Database (TCDB)
are included if they have known substrate(s), have reference(s),
and are not described as uncharacterized or putative.
(Some of the references are not visible on the PaperBLAST web site.)
- Every protein from CharProtDB,
a database of experimentally characterized protein annotations, is included.
- Proteins from the CAZy database of carbohydrate-active enzymes
are included if they are associated with an Enzyme Classification number.
Even though CAZy does not provide links from individual protein sequences to papers,
these should all be experimentally-characterized proteins.
- Proteins from the REBASE database
of restriction enzymes are included if they have known specificity.
- Every protein with an evidence-based reannotation (based on mutant phenotypes)
in the Fitness Browser is included.
- Sequence-specific transcription factors (including sigma factors and DNA-binding response regulators)
with experimentally-determined DNA binding sites from the
PRODORIC database of gene regulation in prokaryotes.
- Putative transcription factors from RegPrecise
that have manually-curated predictions for their binding sites. These predictions are based on
conserved putative regulatory sites across genomes that contain similar transcription factors,
so PaperBLAST clusters the TFs at 70% identity and retains just one member of each cluster.
- Coding sequence (CDS) features from the
European Nucleotide Archive (ENA)
are included if the /experiment tag is set (implying that there is experimental evidence for the annotation),
the nucleotide entry links to paper(s) in PubMed,
and the nucleotide entry is from the STD data class
(implying that these are targeted annotated sequences, not from shotgun sequencing).
Also, to filter out genes whose transcription or translation was detected, but whose function
was not studied, nucleotide entries or papers with more than 25 such proteins are excluded.
Descriptions from ENA rely on the original submitter of the
sequence and cannot be updated by others, so they may be less reliable.
Except for GeneRIF and ENA,
the curated entries include a short curated
description of the protein's function.
For entries from BioLiP, the protein's function may not be known beyond binding to the ligand.
Many of these entries also link to articles in PubMed.
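The inclusion rules for ENA coding sequences listed above combine several conditions, which the following sketch expresses as one predicate. The `CdsFeature` record and its field names are hypothetical, and the handling of the 25-protein limit is one simplified reading of the rule: a feature is dropped if either its nucleotide entry or any of its linked papers has more than 25 such proteins.

```python
from dataclasses import dataclass, field
from typing import List

MAX_PROTEINS = 25  # entries or papers with more than this many proteins are excluded


@dataclass
class CdsFeature:
    """Hypothetical record for one ENA coding-sequence (CDS) feature."""
    protein_id: str
    has_experiment_tag: bool          # /experiment tag is set
    data_class: str                   # e.g. "STD" for targeted annotated sequences
    pubmed_ids: List[str] = field(default_factory=list)
    proteins_in_entry: int = 1        # tagged proteins in the same nucleotide entry
    max_proteins_per_paper: int = 1   # largest tagged-protein count among linked papers


def keep_cds(cds: CdsFeature) -> bool:
    """Apply the inclusion rules described in the text (simplified)."""
    return (
        cds.has_experiment_tag
        and cds.data_class == "STD"
        and bool(cds.pubmed_ids)
        and cds.proteins_in_entry <= MAX_PROTEINS
        and cds.max_proteins_per_paper <= MAX_PROTEINS
    )


ok = CdsFeature("P1", True, "STD", ["12345"])
too_many = CdsFeature("P2", True, "STD", ["12345"], proteins_in_entry=26)
print(keep_cds(ok), keep_cds(too_many))
```

The count limit is the part that separates proteins whose function was studied from those that were merely detected in a large-scale experiment.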
For more information see the
PaperBLAST paper (mSystems 2017)
or the code.
You can download PaperBLAST's database here.
Changes to PaperBLAST since the paper was written:
- November 2023: incorporated PRODORIC and RegPrecise. Many PRODORIC entries were not linked to a protein sequence (no UniProt identifier), so we added this information.
- February 2023: BioLiP changed their download format. PaperBLAST now includes their non-redundant subset. SitesBLAST and Sites on a Tree use a larger non-redundant subset that ensures that every ligand is represented within each cluster. This should ensure that every binding site is represented.
- June 2022: incorporated some coding sequences from ENA with the /experiment tag.
- March 2022: incorporated BioLiP.
- April 2020: incorporated TCDB.
- April 2019: EuropePMC now returns table entries in their search results. This has expanded PaperBLAST's database, but most of the new entries are of low relevance, and the resulting snippets are often just lists of locus tags with annotations.
- February 2018: the alignment page reports the conservation of the hit's functional sites (if available from Swiss-Prot or UniProt).
- January 2018: incorporated BRENDA.
- December 2017: incorporated MetaCyc, CharProtDB, CAZy, REBASE, and the reannotations from the Fitness Browser.
- September 2017: EuropePMC no longer returns some table entries in their search results. This has shrunk PaperBLAST's database, but has also reduced the number of low-relevance hits.
Many of these changes are described in Interactive tools for functional annotation of bacterial genomes.
PaperBLAST cannot provide snippets for many of the papers that are
published in non-open-access journals. This limitation applies even if
the paper is marked as "free" on the publisher's web site and is
available in PubMed Central or EuropePMC. If a journal that you publish
in is marked as "secret," please consider publishing elsewhere.
Many important articles are missing from PaperBLAST, either because
the article's full text is not in EuropePMC (as for many older
articles), or because the paper does not mention a protein identifier such as a locus tag, or because of PaperBLAST's heuristics. If you notice an
article that characterizes a protein's function but is missing from
PaperBLAST, please notify the curators at UniProt
or add an entry to GeneRIF.
Entries in either of these databases will eventually be incorporated
into PaperBLAST. Note that to add an entry to UniProt, you will need
to find the UniProt identifier for the protein. If the protein is not
already in UniProt, you can ask them to create an entry. To add an
entry to GeneRIF, you will need an NCBI Gene identifier, but
unfortunately many prokaryotic proteins in RefSeq do not have
corresponding Gene identifiers.
References
PaperBLAST: Text-mining papers for information about homologs.
M. N. Price and A. P. Arkin (2017). mSystems, 10.1128/mSystems.00039-17.
Europe PMC in 2017.
M. Levchenko et al (2017). Nucleic Acids Research, 10.1093/nar/gkx1005.
Gene indexing: characterization and analysis of NLM's GeneRIFs.
J. A. Mitchell et al (2003). AMIA Annu Symp Proc 2003:460-464.
UniProt: the universal protein knowledgebase.
The UniProt Consortium (2016). Nucleic Acids Research, 10.1093/nar/gkw1099.
BRENDA in 2017: new perspectives and new tools in BRENDA.
S. Placzek et al (2017). Nucleic Acids Research, 10.1093/nar/gkw952.
The EcoCyc database: reflecting new knowledge about Escherichia coli K-12.
I. M. Keseler et al (2016). Nucleic Acids Research, 10.1093/nar/gkw1003.
The MetaCyc database of metabolic pathways and enzymes.
R. Caspi et al (2018). Nucleic Acids Research, 10.1093/nar/gkx935.
CharProtDB: a database of experimentally characterized protein annotations.
R. Madupu et al (2012). Nucleic Acids Research, 10.1093/nar/gkr1133.
The carbohydrate-active enzymes database (CAZy) in 2013.
V. Lombard et al (2014). Nucleic Acids Research, 10.1093/nar/gkt1178.
The Transporter Classification Database (TCDB): recent advances.
M. H. Saier, Jr. et al (2016). Nucleic Acids Research, 10.1093/nar/gkv1103.
REBASE - a database for DNA restriction and modification: enzymes, genes and genomes.
R. J. Roberts et al (2015). Nucleic Acids Research, 10.1093/nar/gku1046.
Deep annotation of protein function across diverse bacteria from mutant phenotypes.
M. N. Price et al (2016). bioRxiv, 10.1101/072470.
by Morgan Price,
Arkin group
Lawrence Berkeley National Laboratory