PaperBLAST – Find papers about a protein or its homologs

 

PaperBLAST

PaperBLAST Hits for sp|P43898|HEM6_PSEAE Oxygen-dependent coproporphyrinogen-III oxidase OS=Pseudomonas aeruginosa (strain ATCC 15692 / DSM 22644 / CIP 104116 / JCM 14847 / LMG 12228 / 1C / PRS 101 / PAO1) OX=208964 GN=hemF PE=3 SV=1 (305 a.a., MTDRIAAVKT...)

Other sequence analysis tools:

Find functional residues: SitesBLAST

Search for conserved domains

Find the best match in UniProt

Compare to protein structures

Predict transmembrane helices: Phobius

Predict protein localization: PSORTb

Find homologs in fast.genomics


Found 80 similar proteins in the literature:

P43898 Oxygen-dependent coproporphyrinogen-III oxidase from Pseudomonas aeruginosa (strain ATCC 15692 / DSM 22644 / CIP 104116 / JCM 14847 / LMG 12228 / 1C / PRS 101 / PAO1)
PA0024 coproporphyrinogen III oxidase from Pseudomonas aeruginosa PAO1
CIA_05056 oxygen-dependent coproporphyrinogen oxidase from Pseudomonas aeruginosa PA14
100% identity, 100% coverage

PP0073 coproporphyrinogen III oxidase, aerobic from Pseudomonas putida KT2440
87% identity, 99% coverage

LINJ_06_1330 coproporphyrinogen III oxidase from Leishmania infantum JPCM5
A4HT18 coproporphyrinogen oxidase from Leishmania infantum
70% identity, 99% coverage

HEM6_LEIMA / P84155 Oxygen-dependent coproporphyrinogen-III oxidase; Coprogen oxidase; Coproporphyrinogenase; EC 1.3.3.3 from Leishmania major
LMJF_06_1270 coproporphyrinogen III oxidase from Leishmania major strain Friedlin
70% identity, 99% coverage

3dwsB / P84155 Leishmania major coproporphyrinogen iii oxidase with bound ligand
70% identity, 99% coverage

WP_001625620 oxygen-dependent coproporphyrinogen oxidase from Escherichia coli
71% identity, 96% coverage

Q9KVT4 Oxygen-dependent coproporphyrinogen-III oxidase from Vibrio cholerae serotype O1 (strain ATCC 39315 / El Tor Inaba N16961)
67% identity, 96% coverage

Sec / b2436 coproporphyrinogen III oxidase (EC 1.3.3.3) from Escherichia coli K-12 substr. MG1655 (see 3 papers)
hemF / P36553 coproporphyrinogen III oxidase (EC 1.3.3.3) from Escherichia coli (strain K12) (see 9 papers)
HEM6_ECOLI / P36553 Oxygen-dependent coproporphyrinogen-III oxidase; CPO; Coprogen oxidase; Coproporphyrinogenase; EC 1.3.3.3 from Escherichia coli (strain K12) (see 2 papers)
P36553 coproporphyrinogen oxidase (EC 1.3.3.3) from Escherichia coli (see paper)
b2436 coproporphyrinogen III oxidase from Escherichia coli str. K-12 substr. MG1655
70% identity, 96% coverage

P33771 Oxygen-dependent coproporphyrinogen-III oxidase from Salmonella typhimurium (strain LT2 / SGSC1412 / ATCC 700720)
71% identity, 96% coverage

lpg1215 oxygen-dependent coproporphyrinogen III oxidase from Legionella pneumophila subsp. pneumophila str. Philadelphia 1
66% identity, 96% coverage

lpp1223 oxygen-dependent coproporphyrinogen III oxidase from Legionella pneumophila str. Paris
67% identity, 97% coverage

YPO3032 coproporphyrinogen III oxidase, aerobic HemF from Yersinia pestis CO92
69% identity, 96% coverage

ECs3307 coproporphyrinogen III oxidase from Escherichia coli O157:H7 str. Sakai
70% identity, 96% coverage

SO_0038 oxygen-dependent coproporphyrinogen oxidase from Shewanella oneidensis MR-1
SO0038 coproporphyrinogen III oxidase, aerobic from Shewanella oneidensis MR-1
66% identity, 97% coverage

XAC4109 aerobic coproporphyrinogen III oxidase from Xanthomonas axonopodis pv. citri str. 306
68% identity, 97% coverage

B2FND0 Oxygen-dependent coproporphyrinogen-III oxidase from Stenotrophomonas maltophilia (strain K279a)
68% identity, 96% coverage

3dwrB / P84155 Leishmania major coproporphyrinogen iii oxidase with bound ligand
66% identity, 99% coverage

LF41_3101 oxygen-dependent coproporphyrinogen oxidase from Lysobacter dokdonensis DS-58
64% identity, 95% coverage

BP2310 coproporphyrinogen III oxidase from Bordetella pertussis Tohama I
64% identity, 97% coverage

XF0017 coproporphyrinogen III oxidase from Xylella fastidiosa 9a5c
64% identity, 96% coverage

Fphi_1842 Coproporphyrinogen oxidase from Francisella philomiragia subsp. philomiragia ATCC 25017
55% identity, 98% coverage

FTL_1022 Coproporphyrinogen III oxidase from Francisella tularensis subsp. holarctica
55% identity, 98% coverage

FTA_1078 coproporphyrinogen III oxidase from Francisella tularensis subsp. holarctica FTA
54% identity, 98% coverage

SYNPCC7002_A1828 coproporphyrinogen III oxidase, aerobic from Synechococcus sp. PCC 7002
51% identity, 87% coverage

all1357 coproporphyrinogen III oxidase from Nostoc sp. PCC 7120
53% identity, 84% coverage

HEM6_SYNY3 / P72848 Oxygen-dependent coproporphyrinogen-III oxidase; CPO; Coprogen oxidase; Coproporphyrinogenase; EC 1.3.3.3 from Synechocystis sp. (strain ATCC 27184 / PCC 6803 / Kazusa) (see paper)
P72848 coproporphyrinogen oxidase (EC 1.3.3.3) from Synechocystis sp. (see paper)
sll1185 coproporphyrinogen III oxidase from Synechocystis sp. PCC 6803
51% identity, 88% coverage

AM1_0615 coproporphyrinogen III oxidase, aerobic from Acaryochloris marina MBIC11017
51% identity, 89% coverage

cce_3201 coproporphyrinogen III oxidase, aerobic from Cyanothece sp. ATCC 51142
50% identity, 88% coverage

Tery_1166 Coproporphyrinogen oxidase from Trichodesmium erythraeum IMS101
49% identity, 89% coverage

M744_13435 oxygen-dependent coproporphyrinogen oxidase from Synechococcus elongatus UTEX 2973
51% identity, 88% coverage

A1S_3108 coproporphyrinogen III oxidase from Acinetobacter baumannii ATCC 17978
61% identity, 78% coverage

Synpcc7942_0674 Coproporphyrinogen oxidase from Synechococcus elongatus PCC 7942
Q31QG3 Oxygen-dependent coproporphyrinogen-III oxidase from Synechococcus elongatus (strain ATCC 33912 / PCC 7942 / FACHB-805)
52% identity, 91% coverage

BL107_08791 coproporphyrinogen III oxidase from Synechococcus sp. BL107
48% identity, 79% coverage

MXAN_6762 coproporphyrinogen III oxidase, aerobic from Myxococcus xanthus DK 1622
56% identity, 96% coverage

Caur_2599 Coproporphyrinogen oxidase from Chloroflexus aurantiacus J-10-fl
53% identity, 98% coverage

SYNW2040 Coproporphyrinogen III oxidase from Synechococcus sp. WH 8102
49% identity, 77% coverage

LOC107845388 oxygen-dependent coproporphyrinogen-III oxidase, chloroplastic from Capsicum annuum
49% identity, 74% coverage

Q2F7H7 coproporphyrinogen oxidase (EC 1.3.3.3) from Zea mays (see paper)
50% identity, 70% coverage

CPO / P35055 coproporphyrinogen III oxidase subunit (EC 1.3.3.3) from Glycine max (see paper)
P35055 Oxygen-dependent coproporphyrinogen-III oxidase, chloroplastic from Glycine max
50% identity, 77% coverage

NP_001347283 oxygen-dependent coproporphyrinogen-III oxidase, chloroplastic from Glycine max
50% identity, 77% coverage

GRMZM5G870342 uncharacterized protein LOC100500945 from Zea mays
49% identity, 75% coverage

F2DIZ2 coproporphyrinogen oxidase from Hordeum vulgare subsp. vulgare
51% identity, 73% coverage

Q2F7H8 coproporphyrinogen oxidase (EC 1.3.3.3) from Zea mays (see paper)
49% identity, 74% coverage

HEMF1 / Q9LR75 coproporphyrinogen III oxidase (EC 1.3.3.3) from Arabidopsis thaliana (see 2 papers)
HEM61_ARATH / Q9LR75 Coproporphyrinogen-III oxidase 1, chloroplastic; AtCPO-I; Coprogen oxidase; Coproporphyrinogenase; Protein LESION INITIATION 2; EC 1.3.3.3 from Arabidopsis thaliana (Mouse-ear cress) (see paper)
Q9LR75 coproporphyrinogen oxidase (EC 1.3.3.3) from Arabidopsis thaliana (see paper)
AT1G03475 LIN2 (LESION INITIATION 2); coproporphyrinogen oxidase from Arabidopsis thaliana
NP_171847 Coproporphyrinogen III oxidase from Arabidopsis thaliana
48% identity, 77% coverage

Gasu_19740 coproporphyrinogen III oxidase from Galdieria sulphuraria
47% identity, 74% coverage

XP_416596 oxygen-dependent coproporphyrinogen-III oxidase, mitochondrial from Gallus gallus
46% identity, 73% coverage

XP_008201513 oxygen-dependent coproporphyrinogen-III oxidase from Tribolium castaneum
47% identity, 78% coverage

E1BKY9 coproporphyrinogen oxidase from Bos taurus
45% identity, 67% coverage

Ot03g03170 Coproporphyrinogen III oxidase, conserved site from Ostreococcus tauri
49% identity, 84% coverage

FGSG_10739 coproporphyrinogen III oxidase from Fusarium graminearum PH-1
45% identity, 72% coverage

CCM_07483 coproporphyrinogen III oxidase from Cordyceps militaris CM01
46% identity, 72% coverage

CPOX / P36551 Coproporphyrinogen-III oxidase, mitochondrial (EC 1.3.3.3) from Homo sapiens (see 5 papers)
HEM6_HUMAN / P36551 Oxygen-dependent coproporphyrinogen-III oxidase, mitochondrial; COX; Coprogen oxidase; Coproporphyrinogenase; EC 1.3.3.3 from Homo sapiens (Human) (see 14 papers)
P36551 coproporphyrinogen oxidase (EC 1.3.3.3) from Homo sapiens (see 5 papers)
45% identity, 66% coverage

HEM6_MOUSE / P36552 Oxygen-dependent coproporphyrinogen-III oxidase, mitochondrial; COX; Coprogen oxidase; Coproporphyrinogenase; EC 1.3.3.3 from Mus musculus (Mouse) (see paper)
P36552 coproporphyrinogen oxidase (EC 1.3.3.3) from Mus musculus (see paper)
NP_031783 oxygen-dependent coproporphyrinogen-III oxidase, mitochondrial precursor from Mus musculus
45% identity, 67% coverage

HEM13 coproporphyrinogen III oxidase from Candida albicans (see 2 papers)
49% identity, 89% coverage

HEM6_RAT / Q3B7D0 Oxygen-dependent coproporphyrinogen-III oxidase, mitochondrial; COX; Coprogen oxidase; Coproporphyrinogenase; EC 1.3.3.3 from Rattus norvegicus (Rat) (see paper)
44% identity, 67% coverage

HEM6_YEAST / P11353 Oxygen-dependent coproporphyrinogen-III oxidase; COX; Coprogen oxidase; Coproporphyrinogenase; EC 1.3.3.3 from Saccharomyces cerevisiae (strain ATCC 204508 / S288c) (Baker's yeast) (see 4 papers)
P11353 coproporphyrinogen oxidase (EC 1.3.3.3) from Saccharomyces cerevisiae (see paper)
YDR044W Coproporphyrinogen III oxidase, an oxygen requiring enzyme that catalyzes the sixth step in the heme biosynthetic pathway; localizes to the mitochondrial inner membrane; transcription is repressed by oxygen and heme (via Rox1p and Hap1p) from Saccharomyces cerevisiae
46% identity, 87% coverage

Q70W35 coproporphyrinogen oxidase (EC 1.3.3.3) from Kluyveromyces lactis (see paper)
XP_455911 coproporphyrinogen oxidase from Kluyveromyces lactis
43% identity, 80% coverage

Q9S7V1 coproporphyrinogen oxidase from Chlamydomonas reinhardtii
XP_001701729 uncharacterized protein from Chlamydomonas reinhardtii
44% identity, 81% coverage

NCU01546 coproporphyrinogen III oxidase from Neurospora crassa OR74A
44% identity, 72% coverage

AFUA_1G07480, Afu1g07480 coproporphyrinogen III oxidase, putative from Aspergillus fumigatus Af293
45% identity, 60% coverage

G3XZT7 coproporphyrinogen oxidase (EC 1.3.3.3) from Aspergillus niger (see paper)
44% identity, 61% coverage

An07g10040 uncharacterized protein from Aspergillus niger
44% identity, 61% coverage

CNAG_02460 coproporphyrinogen III oxidase from Cryptococcus neoformans var. grubii H99
41% identity, 70% coverage

CNBE0930 hypothetical protein from Cryptococcus neoformans var. neoformans B-3501A
41% identity, 88% coverage

RSP_0682 coproporphyrinogen III oxidase, aerobic from Rhodobacter sphaeroides 2.4.1
43% identity, 90% coverage

WP_012151472 oxygen-dependent coproporphyrinogen oxidase from Rickettsia rickettsii str. Morgan
41% identity, 88% coverage

Wbm0709 Coproporphyrinogen III oxidase from Wolbachia endosymbiont strain TRS of Brugia malayi
WBM_RS04335 oxygen-dependent coproporphyrinogen oxidase from Wolbachia endosymbiont strain TRS of Brugia malayi
42% identity, 86% coverage

RT0874 oxygen-dependent coproporphyrinogen III oxidase from Rickettsia typhi str. wilmington
40% identity, 90% coverage

NT01EI_1236 coproporphyrinogen III oxidase, aerobic from Edwardsiella ictaluri 93-146
59% identity, 55% coverage

CC0506 coproporphyrinogen III oxidase, aerobic from Caulobacter crescentus CB15
42% identity, 90% coverage

XP_005247182 oxygen-dependent coproporphyrinogen-III oxidase, mitochondrial isoform X1 from Homo sapiens
42% identity, 59% coverage

CLIBASIA_04875 coproporphyrinogen III oxidase from Candidatus Liberibacter asiaticus str. psy62
39% identity, 85% coverage

RPA1514 putative coproporphyrinogen III oxidase precursor from Rhodopseudomonas palustris CGA009
41% identity, 86% coverage

Ecaj_0446 Coproporphyrinogen oxidase from Ehrlichia canis str. Jake
37% identity, 90% coverage

XP_014771663 oxygen-dependent coproporphyrinogen-III oxidase isoform X1 from Octopus bimaculoides
42% identity, 67% coverage

B488_08410 oxygen-dependent coproporphyrinogen oxidase from Liberibacter crescens BT-1
38% identity, 90% coverage

CKC_03595 oxygen-dependent coproporphyrinogen oxidase from Candidatus Liberibacter solanacearum CLso-ZC1
37% identity, 85% coverage

PF3D7_1142400, XP_001348105 coproporphyrinogen-III oxidase from Plasmodium falciparum 3D7
27% identity, 51% coverage

Q9FEX6 Putative lectin (Fragment) from Hordeum vulgare
59% identity, 32% coverage

AT4G03205 coproporphyrinogen III oxidase, putative / coproporphyrinogenase, putative / coprogen oxidase, putative from Arabidopsis thaliana
Q93Z96 Coproporphyrinogen-III oxidase 2, chloroplastic from Arabidopsis thaliana
42% identity, 44% coverage


For advice on how to use these tools together, see Interactive tools for functional annotation of bacterial genomes.

Statistics

The PaperBLAST database links 793,807 different protein sequences to 1,259,118 scientific articles. Searches against EuropePMC were last performed on March 13, 2025.

How It Works

PaperBLAST builds a database of protein sequences that are linked to scientific articles. These links come from automated text searches against the articles in EuropePMC and from manually-curated information from GeneRIF, UniProtKB/Swiss-Prot, BRENDA, CAZy (as made available by dbCAN), BioLiP, CharProtDB, MetaCyc, EcoCyc, TCDB, REBASE, the Fitness Browser, and a subset of the European Nucleotide Archive with the /experiment tag. Given this database and a protein sequence query, PaperBLAST uses protein-protein BLAST to find similar sequences with E < 0.001.
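The E < 0.001 cutoff described above can be illustrated with a minimal sketch. It assumes the hits are available in the standard 12-column BLAST+ tabular layout (`-outfmt 6`) and that coverage means the aligned fraction of the query; neither assumption is confirmed by the text, and the names `Hit` and `parse_hits` are hypothetical.

```python
from typing import List, NamedTuple

E_VALUE_CUTOFF = 1e-3  # threshold stated in the text (E < 0.001)


class Hit(NamedTuple):
    subject: str
    identity: float  # percent identity over the alignment
    coverage: float  # percent of the query aligned (assumed definition)
    evalue: float


def parse_hits(tabular: str, query_length: int) -> List[Hit]:
    """Keep hits below the E-value cutoff from BLAST tabular output.

    Assumed columns: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    """
    hits = []
    for line in tabular.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        pident = float(fields[2])
        qstart, qend = int(fields[6]), int(fields[7])
        evalue = float(fields[10])
        if evalue < E_VALUE_CUTOFF:
            coverage = 100.0 * (qend - qstart + 1) / query_length
            hits.append(Hit(fields[1], pident, coverage, evalue))
    return hits


# Made-up example rows (not real PaperBLAST output) for a 305 a.a. query.
example = "\n".join([
    "query\thitA\t87.0\t302\t39\t0\t2\t303\t1\t302\t1e-150\t520",
    "query\thitB\t30.0\t50\t35\t0\t10\t59\t5\t54\t0.5\t25.0",
])
hits = parse_hits(example, query_length=305)
```

Here only `hitA` passes the filter, because the E-value of `hitB` (0.5) exceeds the cutoff.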

To build the database, we query EuropePMC with locus tags, with RefSeq protein identifiers, and with UniProt accessions. We obtain the locus tags from RefSeq or from MicrobesOnline. We use queries of the form "locus_tag AND genus_name" to try to ensure that the paper is actually discussing that gene. Because EuropePMC indexes most recent biomedical papers, even if they are not open access, some of the links may be to papers that you cannot read or that our computers cannot read. We query each of these identifiers that appears in the open access part of EuropePMC, as well as every locus tag that appears in the 500 most-referenced genomes, so that a gene may appear in the PaperBLAST results even though none of the papers that mention it are open access. We also incorporate text-mined links from EuropePMC that link open access articles to UniProt or RefSeq identifiers. (This yields some additional links because EuropePMC uses different heuristics for their text mining than we do.)
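As a rough illustration of the "locus_tag AND genus_name" query form, the sketch below builds such a query string and a search URL for Europe PMC's public REST search endpoint. How PaperBLAST actually issues its queries is not described here, so the function names, the choice of endpoint, and the `format=json` parameter should be read as assumptions.

```python
from urllib.parse import urlencode

# Public Europe PMC REST search endpoint (assumed; PaperBLAST may query differently).
EUROPEPMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_query(locus_tag: str, genus_name: str) -> str:
    """Combine a locus tag and a genus name in the form given in the text."""
    return f"{locus_tag} AND {genus_name}"


def build_search_url(locus_tag: str, genus_name: str) -> str:
    """Return a URL-encoded search URL; no network request is made here."""
    params = {"query": build_query(locus_tag, genus_name), "format": "json"}
    return f"{EUROPEPMC_SEARCH}?{urlencode(params)}"


# Example with a locus tag and genus taken from the hit list above.
query = build_query("PA0024", "Pseudomonas")
url = build_search_url("PA0024", "Pseudomonas")
```

Requiring the genus name alongside the locus tag is what filters out papers in which the same string appears in an unrelated context.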

For every article that mentions a locus tag, a RefSeq protein identifier, or a UniProt accession, we try to select one or two snippets of text that refer to the protein. If we cannot get access to the full text, we try to select a snippet from the abstract, but unfortunately, unique identifiers such as locus tags are rarely provided in abstracts.
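Snippet selection might look something like the sketch below, which returns up to two windows of text around whole-word mentions of an identifier. The window size, the whole-word matching rule, and the name `select_snippets` are illustrative assumptions, not PaperBLAST's actual heuristics.

```python
import re
from typing import List


def select_snippets(text: str, identifier: str,
                    max_snippets: int = 2, window: int = 80) -> List[str]:
    """Return up to max_snippets text windows around mentions of identifier."""
    # Match the identifier only when it is not part of a longer word,
    # so that "PA0024" does not match inside "PA00245".
    pattern = re.compile(r"(?<!\w)" + re.escape(identifier) + r"(?!\w)")
    snippets = []
    last_end = -1
    for match in pattern.finditer(text):
        if match.start() < last_end:
            continue  # this mention is already inside the previous snippet
        start = max(0, match.start() - window)
        end = min(len(text), match.end() + window)
        snippets.append(text[start:end].strip())
        last_end = end
        if len(snippets) == max_snippets:
            break
    return snippets


# Made-up article text for illustration only.
article = ("We deleted PA0024 (hemF) in strain PAO1. " + "x" * 200 +
           " PA0024 was required for aerobic growth.")
snippets = select_snippets(article, "PA0024")
```

When only an abstract is available, the same function could be applied to the abstract text, although, as noted above, identifiers rarely appear there.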

PaperBLAST also incorporates manually-curated protein functions:

Except for GeneRIF and ENA, the curated entries include a short curated description of the protein's function. For entries from BioLiP, the protein's function may not be known beyond binding to the ligand. Many of these entries also link to articles in PubMed.

For more information see the PaperBLAST paper (mSystems 2017) or the code. You can download PaperBLAST's database here.

Changes to PaperBLAST since the paper was written:

Many of these changes are described in Interactive tools for functional annotation of bacterial genomes.

Secrets

PaperBLAST cannot provide snippets for many of the papers that are published in non-open-access journals. This limitation applies even if the paper is marked as "free" on the publisher's web site and is available in PubMed Central or EuropePMC. If a journal that you publish in is marked as "secret," please consider publishing elsewhere.

Omissions from the PaperBLAST Database

Many important articles are missing from PaperBLAST, either because the article's full text is not in EuropePMC (as for many older articles), or because the paper does not mention a protein identifier such as a locus tag, or because of PaperBLAST's heuristics. If you notice an article that characterizes a protein's function but is missing from PaperBLAST, please notify the curators at UniProt or add an entry to GeneRIF. Entries in either of these databases will eventually be incorporated into PaperBLAST. Note that to add an entry to UniProt, you will need to find the UniProt identifier for the protein. If the protein is not already in UniProt, you can ask them to create an entry. To add an entry to GeneRIF, you will need an NCBI Gene identifier, but unfortunately many prokaryotic proteins in RefSeq do not have corresponding Gene identifiers.

References

PaperBLAST: Text-mining papers for information about homologs.
M. N. Price and A. P. Arkin (2017). mSystems, 10.1128/mSystems.00039-17.

Europe PMC in 2017.
M. Levchenko et al (2017). Nucleic Acids Research, 10.1093/nar/gkx1005.

Gene indexing: characterization and analysis of NLM's GeneRIFs.
J. A. Mitchell et al (2003). AMIA Annu Symp Proc 2003:460-464.

UniProt: the universal protein knowledgebase.
The UniProt Consortium (2016). Nucleic Acids Research, 10.1093/nar/gkw1099.

BRENDA in 2017: new perspectives and new tools in BRENDA.
S. Placzek et al (2017). Nucleic Acids Research, 10.1093/nar/gkw952.

The EcoCyc database: reflecting new knowledge about Escherichia coli K-12.
I. M. Keseler et al (2016). Nucleic Acids Research, 10.1093/nar/gkw1003.

The MetaCyc database of metabolic pathways and enzymes.
R. Caspi et al (2018). Nucleic Acids Research, 10.1093/nar/gkx935.

CharProtDB: a database of experimentally characterized protein annotations.
R. Madupu et al (2012). Nucleic Acids Research, 10.1093/nar/gkr1133.

The carbohydrate-active enzymes database (CAZy) in 2013.
V. Lombard et al (2014). Nucleic Acids Research, 10.1093/nar/gkt1178.

The Transporter Classification Database (TCDB): recent advances.
M. H. Saier, Jr. et al (2016). Nucleic Acids Research, 10.1093/nar/gkv1103.

REBASE - a database for DNA restriction and modification: enzymes, genes and genomes.
R. J. Roberts et al (2015). Nucleic Acids Research, 10.1093/nar/gku1046.

Deep annotation of protein function across diverse bacteria from mutant phenotypes.
M. N. Price et al (2016). bioRxiv, 10.1101/072470.

by Morgan Price, Arkin group
Lawrence Berkeley National Laboratory