PaperBLAST
PaperBLAST Hits for NP_171829.1 agenet domain protein (DOMAIN OF UNKNOWN FUNCTION 724 1) (Arabidopsis thaliana) (670 a.a., MAIFKDCEVE...)
>NP_171829.1 agenet domain protein (DOMAIN OF UNKNOWN FUNCTION 724 1) (Arabidopsis thaliana)
MAIFKDCEVEIFSEEDGFRNAWYRAILEETPTNPTSESKKLRFSYMTKSLNKEGSSSPPT
VEQRFIRPVPPENLYNGVVFEEGTMVDADYKHRWRTGVVINKMENDSYLVLFDCPPDIIQ
FETKHLRAHLDWTGSEWVQPEVRELSKSMFSPGTLVEVSCVIDKVEVSWVTAMIVKEIEE
SGEKKFIVKVCNKHLSCRVDEAKPNMTVDSCCVRPRPPLFFVEEYDLRDCVEVFHGSSWR
QGVVKGVHIEKQYTVTLEATKDKLVVKHSDLRPFKVWEDGVWHNGPQQKPVKESPSNAIK
QKPMCSSSGARPMTPKMATKHARISFNPEENVEELSVAETVAATGKLEKMGIAEESVSCV
TPLKQTEANAEGNKLEPMRNQNCLRNDSTQQMLPEEENSKDGSTKRKREEKHNSASSVMD
EIDGTCNGSESEISNTGKSICNNDDVDDQPLSTELPYYQSLSVVNSFAADAEETPAKSAR
TISPFAKKLPFWKSYETDELYKSLPQSPHFSPLFKAKEDIREWSAVGMMVTFYCLLKEVK
DLQLDDSSSKLSSLSSSLAELEKHGFNVTDPLSRISKVLPLQDKRAKKAEERKCLEKKIE
CEEIERKRFEEEFADFERIIIEKKRQALVAKEKKEAADKRIGEMKTCAETIDQEIKDEEL
EFQTTVSTPW
Found 29 similar proteins in the literature:
DUF1_ARATH / Q9ZVT1 DUF724 domain-containing protein 1; AtDUF1 from Arabidopsis thaliana (Mouse-ear cress) (see paper)
NP_171829 agenet domain protein (DOMAIN OF UNKNOWN FUNCTION 724 1) from Arabidopsis thaliana
AT1G03300 agenet domain-containing protein from Arabidopsis thaliana
100% identity, 100% coverage
- function: May be involved in the polar growth of plant cells via transportation of RNAs.
disruption phenotype: No visible phenotype under normal growth conditions.
- Characterization of DUF724 gene family in Arabidopsis thaliana.
Cao, Plant molecular biology 2010 (PubMed) - GeneRIF: AtDuf4 were found to express in the root tips. They were localized in nucleus.
- Functional Insight of Nitric-Oxide Induced DUF Genes in Arabidopsis thaliana
Nabi, Frontiers in plant science 2020 - “...proteins in 107 species: Archae0; Bacteria4; Metazoa91; Fungi93; Plants473; Viruses0; Other Eukaryotes14 (source: NCBI BLink). AT1G03300 2.99000 0.24000 12.45833 0.00000 3.62457 Member of the plant-specific DUF724 protein family. Arabidopsis has 10 DUF724 proteins. Loss of function mutant has a WT phenotype AT3G15310 25.97000 2.10000 12.36667 0.00000...”
DUF6_ARATH / O22897 DUF724 domain-containing protein 6; AtDUF6 from Arabidopsis thaliana (Mouse-ear cress) (see paper)
AT2G47230 agenet domain-containing protein from Arabidopsis thaliana
NP_182245 agenet domain protein (DOMAIN OF UNKNOWN FUNCTION 724 6) from Arabidopsis thaliana
56% identity, 95% coverage
- function: May be involved in the polar growth of plant cells via transportation of RNAs.
- Methodological implementation of mixed linear models in multi-locus genome-wide association studies
Wen, Briefings in bioinformatics 2018 - “...0.460 1.199 [ 31 ] At2g27380 2 11703876 4.744 0.043 0.323 1.122 [ 33 ] At2g47230 2 19396129 4.208 0.038 0.298 0.911 [ 31 ] At3g56900 3 21079518 3.081 0.032 0.311 0.661 [ 31 ] At3g57000 3 21079518 3.081 0.032 0.311 0.661 [ 31 ] At5g06550...”
- “...al. [ 29 ]. For example, among seven known genes ( At1g03457 , At2g27380 , At2g47230 , At3g56900 , At3g57000 , At5g06550 and At5g06590 ) for 8W GH FT in this study, no genes were within the 133 candidate genes in Atwell et al. [ 29...”
- Characterization of DUF724 gene family in Arabidopsis thaliana.
Cao, Plant molecular biology 2010 (PubMed) - GeneRIF: Data show that AtDuf6 genes were expressed in roots, leaves, shoot apical meristems, anthers and pollen grains. They were localized in nucleus.
DUF3_ARATH / Q9FZD9 DUF724 domain-containing protein 3; AtDUF3 from Arabidopsis thaliana (Mouse-ear cress) (see paper)
AT1G26540 agenet domain-containing protein from Arabidopsis thaliana
55% identity, 96% coverage
DUF2_ARATH / F4I8W1 DUF724 domain-containing protein 2; AtDUF2 from Arabidopsis thaliana (Mouse-ear cress) (see paper)
NP_172609 agenet domain protein (DOMAIN OF UNKNOWN FUNCTION 724 2) from Arabidopsis thaliana
40% identity, 100% coverage
NP_001331103 agenet domain protein (DOMAIN OF UNKNOWN FUNCTION 724 9) from Arabidopsis thaliana
37% identity, 78% coverage
DUF7_ARATH / Q8H0V4 DUF724 domain-containing protein 7; AtDUF7; ABAP1-interacting protein 1 from Arabidopsis thaliana (Mouse-ear cress) (see 2 papers)
AT3G62300 agenet domain-containing protein from Arabidopsis thaliana
29% identity, 93% coverage
- function: May act as a link between DNA replication, transcription and chromatin remodeling during flower development. May participate in the repression of LHP1-targeted genes during flower development by direct interaction with LHP1 (PubMed:26538092). May be involved in the polar growth of plant cells via transportation of RNAs (Probable).
subunit: Homodimer (PubMed:19795213, PubMed:26538092). Interacts with ABAP1, ARIA and LHP1 (PubMed:26538092). Interacts with the non-modified histones H1, H2B, H3 and H4 (PubMed:26538092).
- AIP1 is a novel Agenet/Tudor domain protein from Arabidopsis that interacts with regulators of DNA replication, transcription and chromatin remodeling
Brasil, BMC plant biology 2015 - “...sativa Os05g04180; Angiosperm Eudicot Populus trichocarpa Potri_018G030500_5, Brassica rapa Bra022578, Manihot esculenta cassava4_1_003152, A. thaliana AT3G62300, AT5G13020. The two sequences of Agenet/Tudor repetitions from AIP1 were used (AT3G62300.1 and AT3G62300.2). c Overlapping Agenet/Tudor models generated in the I-TASSER server. The structures are colored in white (B_MA_20337g0010),...”
- “...division in leaves [ 17 ]. Among the ABAP1-interacting proteins (AIPs) identified, there was AIP1 (At3G62300), an unknown protein predicted with 722 amino acids and approximately 80,9kDa. It harbors two repeats of Agenet/Tudor domain in its N-terminal region (amino acids 1384, and 161224) as well as...”
DUF9_ARATH / Q9FFA2 DUF724 domain-containing protein 9; AtDUF9 from Arabidopsis thaliana (Mouse-ear cress) (see paper)
AT5G23780 agenet domain-containing protein from Arabidopsis thaliana
37% identity, 78% coverage
NP_001190163 agenet domain protein (DOMAIN OF UNKNOWN FUNCTION 724 7) from Arabidopsis thaliana
29% identity, 93% coverage
DUF8_ARATH / F4KEA4 DUF724 domain-containing protein 8; AtDUF8 from Arabidopsis thaliana (Mouse-ear cress) (see paper)
37% identity, 78% coverage
- function: May be involved in the polar growth of plant cells via transportation of RNAs.
disruption phenotype: No visible phenotype under normal growth conditions.
DUF10_ARATH / Q9FFA0 DUF724 domain-containing protein 10; AtDUF10 from Arabidopsis thaliana (Mouse-ear cress) (see paper)
NP_197769 agenet domain protein (DOMAIN OF UNKNOWN FUNCTION 724 10) from Arabidopsis thaliana
41% identity, 42% coverage
AT2G47220 3' exoribonuclease family domain 1 protein-related from Arabidopsis thaliana
NP_182244 polyribonucleotide phosphorylase, putative (DOMAIN OF UNKNOWN FUNCTION 724 5) from Arabidopsis thaliana
38% identity, 46% coverage
DUF5_ARATH / Q0WNB1 DUF724 domain-containing protein 5; AtDUF5 from Arabidopsis thaliana (Mouse-ear cress) (see paper)
46% identity, 29% coverage
- function: May be involved in the polar growth of plant cells via transportation of RNAs.
subunit: Homodimer.
disruption phenotype: No visible phenotype under normal growth conditions.
NP_001318629 agenet domain protein (DOMAIN OF UNKNOWN FUNCTION 724 8) from Arabidopsis thaliana
39% identity, 44% coverage
D7U2L4 Agenet domain-containing protein from Vitis vinifera
33% identity, 27% coverage
- Grape ASR-Silencing Sways Nuclear Proteome, Histone Marks and Interplay of Intrinsically Disordered Proteins
Atanassov, International journal of molecular sciences 2022 - “...1.53 D7T3I0 (D7T3I0_VITVI) CBI25061.3 VIT_00s0179g00340.t01 Histone H2A.1 1.71 F6GV41 (F6GV41_VITVI) CBI16181.3 VIT_06s0004g04230.t01 Histone H2B 1.74 D7U2L4 (D7U2L4_VITVI) CBI36980.3 VIT_07s0005g01810.t01 Agenet domain-containing protein 1.87 D7TCM4 (D7TCM4_VIT CBI27882.3 VIT_11s0016g01890.t01 Single myb histone 1.33 D7TED8 (D7TED8_VITVI) CBI28861.3 VIT_12s0059g01310.t01 SUMO protein 1.43 D7TUZ2 (D7TUZ2_VITVI) CBI34317.3 VIT_14s0030g00480.t01 RNA recognition motif family...”
AGDP1_ARATH / Q500V5 Protein AGENET DOMAIN (AGD)-CONTAINING P1; Protein ONE AGENET DOMAIN-CONTAINING PROTEIN from Arabidopsis thaliana (Mouse-ear cress) (see 2 papers)
AT1G09320 agenet domain-containing protein from Arabidopsis thaliana
NP_172403 agenet domain-containing protein from Arabidopsis thaliana
27% identity, 48% coverage
- function: Heterochromatin-binding protein that preferentially occupies long transposons and specifically recognizes the histone H3 'Lys-9' methylation (H3K9me) marks, with a stronger affinity for dimethylated H3K9 (H3K9me2) (PubMed:30382101, PubMed:30425322). Required for transcriptional silencing, non-CG DNA methylation (e.g. CHG and CHH regions), and H3K9 dimethylation (H3K9me2) at some loci (PubMed:30382101, PubMed:30425322). Mediates heterochromatin phase separation and chromocenter formation (PubMed:30425322).
disruption phenotype: Abnormal transcription up-regulation of some transposable elements (TEs) and of hypermethylated loci (including MU1, GP1, SN1 and ERT7) (PubMed:30382101, PubMed:30425322). Hypomethylated DNA CHG and CHH regions (PubMed:30382101, PubMed:30425322). Reduced H3K9me2 levels (PubMed:30382101). Increased ratio of decondensed nuclei (PubMed:30425322).
- Plant HP1 protein ADCP1 links multivalent H3K9 methylation readout to heterochromatin formation
Zhao, Cell research 2019 - “...in Arabidopsis through a 3D-carbene based SPRi platform. 17 , 18 One Agenet domain-containing protein AT1G09320 (abbreviated as ADCP1) showed a significant signal towards H3K9me2 peptide on the SPRi platform (Fig. 1a ). ADCP1 contains three conserved tandem Agenet domains, which are labelled as Agenet 1/2,...”
- AIP1 is a novel Agenet/Tudor domain protein from Arabidopsis that interacts with regulators of DNA replication, transcription and chromatin remodeling
Brasil, BMC plant biology 2015 - “...belonging to Agenet/Tudor domain family in plants, we used an Agenet/Tudor sequence from the gene At1g09320 to perform TBLASTN query against available genome sequences in Phytozome, NCBI, TAIR and Congenie databases [ 18 21 ]. The search included genomes of unicellular green algae (4 species), nonvascular...”
- Plant HP1 protein ADCP1 links multivalent H3K9 methylation readout to heterochromatin formation.
Zhao, Cell research 2019 - GeneRIF: The authors report on the discovery of ADCP1 (Agenet Domain Containing Protein 1) as a multivalent histone H3K9 methylation reader in plants, and outline its functional roles in mediating heterochromatin phase separation, histone H3K9 and DNA methylation maintenance, as well as transposon silencing.
- Arabidopsis AGDP1 links H3K9me2 to DNA methylation in heterochromatin.
Zhang, Nature communications 2018 - GeneRIF: AGDP1 links histone H3 lysine 9 dimethylation to DNA methylation in heterochromatin regions.[AGDP1]
7ytaB / A0A1S4CD95 Crystal structure of ntagdp3 agd1-2 in complex with an h3k9me2 peptide (see paper)
39% identity, 21% coverage
DUF4_ARATH / O81039 DUF724 domain-containing protein 4; AtDUF4 from Arabidopsis thaliana (Mouse-ear cress) (see paper)
AT2G46840 hypothetical protein from Arabidopsis thaliana
NP_182207 hypothetical protein (DOMAIN OF UNKNOWN FUNCTION 724 4) from Arabidopsis thaliana
32% identity, 18% coverage
6ie6A / Q500V5 Crystal structure of adcp1 tandem agenet domain 3-4 in complex with h3k9me2
31% identity, 20% coverage
6ie4A / Q500V5 Crystal structure of adcp1 tandem agenet domain 1-2 in complex with h3k9me1
33% identity, 19% coverage
5zwxA / A0A493R6M0 Crystal structure of raphanus sativus agdp1 agd12 in complex with an h3k9me2 peptide (see paper)
31% identity, 20% coverage
AT1G06340 agenet domain-containing protein from Arabidopsis thaliana
30% identity, 21% coverage
AT4G32440 agenet domain-containing protein from Arabidopsis thaliana
36% identity, 10% coverage
For advice on how to use these tools together, see
Interactive tools for functional annotation of bacterial genomes.
The PaperBLAST database links 793,807 different protein sequences to 1,259,118 scientific articles. Searches against EuropePMC were last performed on March 13 2025.
PaperBLAST builds a database of protein sequences that are linked
to scientific articles. These links come from automated text searches
against the articles in EuropePMC and from manually-curated information
from GeneRIF, UniProtKB/Swiss-Prot, BRENDA, CAZy (as made available by
dbCAN), BioLiP, CharProtDB, MetaCyc, EcoCyc, TCDB, REBASE, the Fitness
Browser, and a subset of the European Nucleotide Archive with the
/experiment tag.
Given this database and a protein sequence query,
PaperBLAST uses protein-protein BLAST
to find similar sequences with E < 0.001.
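The E-value cutoff can be illustrated with a minimal sketch. It assumes the BLAST results are available in the standard 12-column tabular format (`-outfmt 6`); the function name `filter_hits`, the `Hit` tuple and the example rows are hypothetical, and coverage is computed here over the query only, which may differ from how PaperBLAST itself reports coverage.

```python
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    subject: str
    identity: float  # percent identity over the aligned region
    coverage: float  # percent of the query covered (assumption: query-based)
    evalue: float


def filter_hits(rows: Iterable[str], query_len: int,
                max_evalue: float = 1e-3) -> List[Hit]:
    """Keep tabular BLAST rows whose E-value is strictly below max_evalue."""
    hits: List[Hit] = []
    for row in rows:
        if not row.strip() or row.startswith("#"):
            continue  # skip blank lines and comment lines
        f = row.rstrip("\n").split("\t")
        # Tabular columns: qseqid sseqid pident length mismatch gapopen
        #                  qstart qend sstart send evalue bitscore
        qstart, qend = int(f[6]), int(f[7])
        evalue = float(f[10])
        if evalue < max_evalue:
            coverage = 100.0 * (qend - qstart + 1) / query_len
            hits.append(Hit(f[1], float(f[2]), coverage, evalue))
    return hits


if __name__ == "__main__":
    # Made-up rows for a 670-residue query; not real BLAST output.
    example = [
        "query\thitA\t56.0\t640\t200\t10\t1\t637\t5\t650\t1e-150\t500",
        "query\thitB\t30.0\t50\t35\t0\t100\t149\t1\t50\t0.5\t20",
    ]
    for hit in filter_hits(example, query_len=670):
        print(hit.subject, f"{hit.identity:.0f}% identity,",
              f"{hit.coverage:.0f}% coverage")
```

The comparison is strict, so a hit with E exactly equal to 0.001 is dropped, matching the "E < 0.001" wording above.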
To build the database, we query EuropePMC with locus tags, with RefSeq protein
identifiers, and with UniProt
accessions. We obtain the locus tags from RefSeq or from MicrobesOnline. We use
queries of the form "locus_tag AND genus_name" to try to ensure that
the paper is actually discussing that gene. Because EuropePMC indexes
most recent biomedical papers, even if they are not open access, some
of the links may be to papers that you cannot read or that our
computers cannot read. We query each of these identifiers that
appears in the open access part of EuropePMC, as well as every locus
tag that appears in the 500 most-referenced genomes, so that a gene
may appear in the PaperBLAST results even though none of the papers
that mention it are open access. We also incorporate text-mined links
from EuropePMC that link open access articles to UniProt or RefSeq
identifiers. (This yields some additional links because EuropePMC
uses different heuristics for their text mining than we do.)
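As a rough illustration of the "locus_tag AND genus_name" query form described above, the sketch below builds the query string only. The function name `locus_tag_query` is hypothetical, taking the genus as the first word of the organism name is an assumption, and no EuropePMC request is made.

```python
def locus_tag_query(locus_tag: str, organism: str) -> str:
    """Build a 'locus_tag AND genus_name' text-search query string.

    Assumption: the genus is the first whitespace-separated word of the
    organism name (e.g. 'Arabidopsis thaliana' -> 'Arabidopsis').
    """
    locus_tag = locus_tag.strip()
    parts = organism.split()
    if not locus_tag or not parts:
        raise ValueError("locus tag and organism name must be non-empty")
    return f"{locus_tag} AND {parts[0]}"


if __name__ == "__main__":
    print(locus_tag_query("AT1G03300", "Arabidopsis thaliana"))
```

Requiring the genus alongside the locus tag is what filters out papers where the same string appears for an unrelated reason, such as a catalog number or a strain label.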
For every article that mentions a locus tag, a RefSeq protein
identifier, or a UniProt accession, we try to select one or two
snippets of text that refer to the protein. If we cannot get access to
the full text, we try to select a snippet from the abstract, but
unfortunately, unique identifiers such as locus tags are rarely
provided in abstracts.
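Snippet selection of this kind might look like the sketch below. It is not PaperBLAST's actual heuristic: the function name `select_snippets`, the 15-word context window and the case-insensitive substring match are all assumptions chosen for illustration.

```python
import re
from typing import List


def select_snippets(text: str, identifier: str,
                    max_snippets: int = 2,
                    context_words: int = 15) -> List[str]:
    """Return up to max_snippets word windows around mentions of identifier."""
    words = text.split()
    # Case-insensitive, so 'At2g47230' in a paper matches 'AT2G47230'.
    pattern = re.compile(re.escape(identifier), re.IGNORECASE)
    snippets: List[str] = []
    last_end = -1  # index of the last word already included in a snippet
    for i, word in enumerate(words):
        if i <= last_end or not pattern.search(word):
            continue  # avoid overlapping snippets; skip non-matching words
        start = max(0, i - context_words)
        end = min(len(words), i + context_words + 1)
        snippets.append("..." + " ".join(words[start:end]) + "...")
        last_end = end - 1
        if len(snippets) == max_snippets:
            break
    return snippets


if __name__ == "__main__":
    abstract = "We studied several loci including At2g47230 in roots."
    print(select_snippets(abstract, "AT2G47230", context_words=3))
```

If the identifier never appears in the available text, as is common for abstracts, the function returns an empty list and no snippet can be shown.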
PaperBLAST also incorporates manually-curated protein functions:
- Proteins from NCBI's RefSeq are included if a GeneRIF entry links
the gene to an article in PubMed®.
GeneRIF also provides a short summary of the article's claim about the
protein, which is shown instead of a snippet.
- Proteins from Swiss-Prot (the curated part of UniProt)
are included if the curators
identified experimental evidence for the protein's function (evidence
code ECO:0000269). For these proteins, the fields of the Swiss-Prot entry that
describe the protein's function are shown (with bold headings).
- Proteins from BRENDA,
a curated database of enzymes, are included if they are linked to a paper in PubMed
and their full sequence is known.
- Every protein from the non-redundant subset of BioLiP, a database
of ligand-binding sites and catalytic residues in protein structures, is included. Since BioLiP itself
does not include descriptions of the proteins, those are taken from the Protein Data Bank.
Descriptions from PDB rely on the original submitter of the
structure and cannot be updated by others, so they may be less reliable.
(For SitesBLAST and Sites on a Tree, we use a larger subset of BioLiP so that every
ligand is represented among a group of structures with similar sequences, but for
PaperBLAST, we use the non-redundant set provided by BioLiP.)
- Every protein from EcoCyc, a curated
database of the proteins in Escherichia coli K-12, is included, regardless
of whether it is characterized.
- Proteins from the MetaCyc metabolic pathway database
are included if they are linked to a paper in PubMed and their full sequence is known.
- Proteins from the Transporter Classification Database (TCDB)
are included if they have known substrate(s), have reference(s),
and are not described as uncharacterized or putative.
(Some of the references are not visible on the PaperBLAST web site.)
- Every protein from CharProtDB,
a database of experimentally characterized protein annotations, is included.
- Proteins from the CAZy database of carbohydrate-active enzymes
are included if they are associated with an Enzyme Commission (EC) number.
Even though CAZy does not provide links from individual protein sequences to papers,
these should all be experimentally-characterized proteins.
- Proteins from the REBASE database
of restriction enzymes are included if they have known specificity.
- Every protein with an evidence-based reannotation (based on mutant phenotypes)
in the Fitness Browser is included.
- Sequence-specific transcription factors (including sigma factors and DNA-binding response regulators)
with experimentally-determined DNA binding sites are included from the
PRODORIC database of gene regulation in prokaryotes.
- Putative transcription factors from RegPrecise
are included if they have manually-curated predictions for their binding sites. These predictions are based on
conserved putative regulatory sites across genomes that contain similar transcription factors,
so PaperBLAST clusters the TFs at 70% identity and retains just one member of each cluster.
- Coding sequence (CDS) features from the
European Nucleotide Archive (ENA)
are included if the /experiment tag is set (implying that there is experimental evidence for the annotation),
the nucleotide entry links to paper(s) in PubMed,
and the nucleotide entry is from the STD data class
(implying that these are targeted annotated sequences, not from shotgun sequencing).
Also, to filter out genes whose transcription or translation was detected, but whose function
was not studied, nucleotide entries or papers with more than 25 such proteins are excluded.
Descriptions from ENA rely on the original submitter of the
sequence and cannot be updated by others, so they may be less reliable.
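The inclusion rule for ENA coding sequences in the last item of the list above can be sketched as a filter. The `CdsFeature` record and its field names are hypothetical, not ENA's schema. The source does not say exactly how the 25-protein limit is applied, so this sketch assumes that a CDS is dropped when its nucleotide entry, or any paper it links to, has more than 25 qualifying proteins.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class CdsFeature:
    protein_id: str
    entry_id: str                # accession of the parent nucleotide entry
    data_class: str              # e.g. "STD"
    has_experiment_tag: bool     # True if the /experiment tag is set
    pubmed_ids: Tuple[str, ...]  # papers linked from the nucleotide entry


def select_ena_cds(features: Iterable[CdsFeature],
                   max_per_group: int = 25) -> List[CdsFeature]:
    """Apply the /experiment, PubMed, STD and 25-protein rules."""
    candidates = [f for f in features
                  if f.has_experiment_tag
                  and f.pubmed_ids
                  and f.data_class == "STD"]
    # Count qualifying proteins per nucleotide entry and per paper.
    per_entry = Counter(f.entry_id for f in candidates)
    per_paper = Counter(p for f in candidates for p in set(f.pubmed_ids))
    kept: List[CdsFeature] = []
    for f in candidates:
        if per_entry[f.entry_id] > max_per_group:
            continue  # entry probably reports detection, not function
        if any(per_paper[p] > max_per_group for p in f.pubmed_ids):
            continue  # assumption: one oversized paper excludes the CDS
        kept.append(f)
    return kept


if __name__ == "__main__":
    demo = [CdsFeature("P1", "E1", "STD", True, ("111",)),
            CdsFeature("P2", "E2", "WGS", True, ("222",))]
    print([f.protein_id for f in select_ena_cds(demo)])
```

Counting only the candidates that already pass the first three tests means that unannotated CDS features in a large entry do not push it over the limit.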
Except for GeneRIF and ENA,
the curated entries include a short curated
description of the protein's function.
For entries from BioLiP, the protein's function may not be known beyond binding to the ligand.
Many of these entries also link to articles in PubMed.
For more information see the
PaperBLAST paper (mSystems 2017)
or the code.
You can download PaperBLAST's database here.
Changes to PaperBLAST since the paper was written:
- November 2023: incorporated PRODORIC and RegPrecise. Many PRODORIC entries were not linked to a protein sequence (no UniProt identifier), so we added this information.
- February 2023: BioLiP changed their download format. PaperBLAST now includes their non-redundant subset. SitesBLAST and Sites on a Tree use a larger non-redundant subset that ensures that every ligand is represented within each cluster. This should ensure that every binding site is represented.
- June 2022: incorporated some coding sequences from ENA with the /experiment tag.
- March 2022: incorporated BioLiP.
- April 2020: incorporated TCDB.
- April 2019: EuropePMC now returns table entries in their search results. This has expanded PaperBLAST's database, but most of the new entries are of low relevance, and the resulting snippets are often just lists of locus tags with annotations.
- February 2018: the alignment page reports the conservation of the hit's functional sites (if available from Swiss-Prot or UniProt).
- January 2018: incorporated BRENDA.
- December 2017: incorporated MetaCyc, CharProtDB, CAZy, REBASE, and the reannotations from the Fitness Browser.
- September 2017: EuropePMC no longer returns some table entries in their search results. This has shrunk PaperBLAST's database, but has also reduced the number of low-relevance hits.
Many of these changes are described in Interactive tools for functional annotation of bacterial genomes.
PaperBLAST cannot provide snippets for many of the papers that are
published in non-open-access journals. This limitation applies even if
the paper is marked as "free" on the publisher's web site and is
available in PubMed Central or EuropePMC. If a journal that you publish
in is marked as "secret," please consider publishing elsewhere.
Many important articles are missing from PaperBLAST, either because
the article's full text is not in EuropePMC (as for many older
articles), or because the paper does not mention a protein identifier such as a locus tag, or because of PaperBLAST's heuristics. If you notice an
article that characterizes a protein's function but is missing from
PaperBLAST, please notify the curators at UniProt
or add an entry to GeneRIF.
Entries in either of these databases will eventually be incorporated
into PaperBLAST. Note that to add an entry to UniProt, you will need
to find the UniProt identifier for the protein. If the protein is not
already in UniProt, you can ask them to create an entry. To add an
entry to GeneRIF, you will need an NCBI Gene identifier, but
unfortunately many prokaryotic proteins in RefSeq do not have
corresponding Gene identifiers.
References
PaperBLAST: Text-mining papers for information about homologs.
M. N. Price and A. P. Arkin (2017). mSystems, 10.1128/mSystems.00039-17.
Europe PMC in 2017.
M. Levchenko et al (2017). Nucleic Acids Research, 10.1093/nar/gkx1005.
Gene indexing: characterization and analysis of NLM's GeneRIFs.
J. A. Mitchell et al (2003). AMIA Annu Symp Proc 2003:460-464.
UniProt: the universal protein knowledgebase.
The UniProt Consortium (2016). Nucleic Acids Research, 10.1093/nar/gkw1099.
BRENDA in 2017: new perspectives and new tools in BRENDA.
S. Placzek et al (2017). Nucleic Acids Research, 10.1093/nar/gkw952.
The EcoCyc database: reflecting new knowledge about Escherichia coli K-12.
I. M. Keseler et al (2016). Nucleic Acids Research, 10.1093/nar/gkw1003.
The MetaCyc database of metabolic pathways and enzymes.
R. Caspi et al (2018). Nucleic Acids Research, 10.1093/nar/gkx935.
CharProtDB: a database of experimentally characterized protein annotations.
R. Madupu et al (2012). Nucleic Acids Research, 10.1093/nar/gkr1133.
The carbohydrate-active enzymes database (CAZy) in 2013.
V. Lombard et al (2014). Nucleic Acids Research, 10.1093/nar/gkt1178.
The Transporter Classification Database (TCDB): recent advances.
M. H. Saier, Jr. et al (2016). Nucleic Acids Research, 10.1093/nar/gkv1103.
REBASE - a database for DNA restriction and modification: enzymes, genes and genomes.
R. J. Roberts et al (2015). Nucleic Acids Research, 10.1093/nar/gku1046.
Deep annotation of protein function across diverse bacteria from mutant phenotypes.
M. N. Price et al (2016). bioRxiv, 10.1101/072470.
by Morgan Price,
Arkin group
Lawrence Berkeley National Laboratory