PaperBLAST – Find papers about a protein or its homologs

 

PaperBLAST Hits for sp|Q9HXY9|RNH2_PSEAE Ribonuclease HII OS=Pseudomonas aeruginosa (strain ATCC 15692 / DSM 22644 / CIP 104116 / JCM 14847 / LMG 12228 / 1C / PRS 101 / PAO1) OX=208964 GN=rnhB PE=3 SV=1 (201 a.a., MQLGLDFNLV...)

Other sequence analysis tools:

Find functional residues: SitesBLAST

Search for conserved domains

Find the best match in UniProt

Compare to protein structures

Predict transmembrane helices: Phobius

Predict protein localization: PSORTb

Find homologs in fast.genomics

Found 81 similar proteins in the literature:

PA3642 ribonuclease HII from Pseudomonas aeruginosa PAO1
100% identity, 100% coverage

PP1605 ribonuclease HII from Pseudomonas putida KT2440
87% identity, 93% coverage

S0176 RNAse HII from Shigella flexneri 2a str. 2457T
71% identity, 94% coverage

RnhB / b0183 RNase HII (EC 3.1.26.4) from Escherichia coli K-12 substr. MG1655 (see 20 papers)
rnhB / P10442 RNase HII (EC 3.1.26.4) from Escherichia coli (strain K12) (see 19 papers)
RNH2_ECOLI / P10442 Ribonuclease HII; RNase HII; EC 3.1.26.4 from Escherichia coli (strain K12) (see paper)
P10442 ribonuclease H (EC 3.1.26.4) from Escherichia coli (see paper)
NP_414725 RNase HII from Escherichia coli str. K-12 substr. MG1655
71% identity, 94% coverage

7uwhC / P10442 Cryo-EM structure of E. coli transcription-coupled ribonucleotide excision repair (TC-RER) complex bound to ribonucleotide substrate (see paper)
70% identity, 94% coverage

HD1026 ribonuclease HII from Haemophilus ducreyi 35000HP
65% identity, 92% coverage

APL_0129 ribonuclease HII from Actinobacillus pleuropneumoniae L20
67% identity, 92% coverage

HI1059 ribonuclease HII (rnhB) from Haemophilus influenzae Rd KW20
P43808 Ribonuclease HII from Haemophilus influenzae (strain ATCC 51907 / DSM 11121 / KW20 / Rd)
66% identity, 92% coverage

ABUW_2751 ribonuclease HII from Acinetobacter baumannii
62% identity, 92% coverage

P52021 Ribonuclease HII from Vibrio cholerae serotype O1 (strain ATCC 39315 / El Tor Inaba N16961)
62% identity, 90% coverage

swp_3507 Ribonuclease H from Shewanella piezotolerans WP3
62% identity, 90% coverage

lpg1373 ribonuclease HII from Legionella pneumophila subsp. pneumophila str. Philadelphia 1
63% identity, 92% coverage

SO_1643 ribonuclease HII from Shewanella oneidensis MR-1
63% identity, 89% coverage

Q25C12 ribonuclease H (EC 3.1.26.4) from Shewanella sp. (see paper)
62% identity, 87% coverage

LHK_00722 Rnh2 from Laribacter hongkongensis HLHK9
65% identity, 94% coverage

CCNA_00383 ribonuclease HII from Caulobacter crescentus NA1000
62% identity, 85% coverage

CV_2210 ribonuclease HII from Chromobacterium violaceum ATCC 12472
65% identity, 91% coverage

NGO1789 RnhB from Neisseria gonorrhoeae FA 1090
58% identity, 92% coverage

E2P69_RS10975 ribonuclease HII from Xanthomonas perforans
56% identity, 76% coverage

FTN_1293 ribonuclease HII from Francisella tularensis subsp. novicida U112
48% identity, 92% coverage

SAR11_0108 Ribonuclease HII (RNase HII) from Candidatus Pelagibacter ubique HTCC1062
46% identity, 92% coverage

WP_109143090 ribonuclease HII from Bradyrhizobium sp. SUTN9-2
56% identity, 69% coverage

SMSK321_0568 ribonuclease HII from Streptococcus mitis SK321
49% identity, 71% coverage

LSA0993 Ribonuclease HII (RNase HII) from Lactobacillus sakei subsp. sakei 23K
51% identity, 73% coverage

SPD_1020 ribonuclease HII from Streptococcus pneumoniae D39
49% identity, 71% coverage

BSU16060 ribonuclease HII from Bacillus subtilis subsp. subtilis str. 168
50% identity, 72% coverage

SSU05_0996 ribonuclease HII from Streptococcus suis 05ZYH33
46% identity, 72% coverage

A7J09_06320 ribonuclease HII from Streptococcus suis
46% identity, 72% coverage

SPy1162 putative ribonuclease HII from Streptococcus pyogenes M1 GAS
48% identity, 69% coverage

lp_1853 ribonuclease HII from Lactobacillus plantarum WCFS1
52% identity, 71% coverage

Q9X017 ribonuclease H (EC 3.1.26.4) from Thermotoga maritima (see 2 papers)
55% identity, 77% coverage

SAOUHSC_01215 hypothetical protein from Staphylococcus aureus subsp. aureus NCTC 8325
YP_499752 ribonuclease HII from Staphylococcus aureus subsp. aureus NCTC 8325
46% identity, 72% coverage

SA1087 RNase HII from Staphylococcus aureus subsp. aureus N315
46% identity, 72% coverage

3o3fA / Q9X017 T. maritima RNase H2 D107N in complex with nucleic acid substrate and magnesium ions (see paper)
54% identity, 83% coverage

A0QV44 ribonuclease H (EC 3.1.26.4) from Mycolicibacterium smegmatis (see paper)
MSMEG_2442 ribonuclease HII from Mycobacterium smegmatis str. MC2 155
51% identity, 68% coverage

Rv2902c ribonuclease HII from Mycobacterium tuberculosis H37Rv
52% identity, 70% coverage

TTHA0198 ribonuclease HII from Thermus thermophilus HB8
51% identity, 90% coverage

AMK58_19355 ribonuclease HII from Azospirillum brasilense
55% identity, 83% coverage

alr4332 ribonuclease HII from Nostoc sp. PCC 7120
48% identity, 84% coverage

Rru_A3209 Ribonuclease H from Rhodospirillum rubrum ATCC 11170
54% identity, 84% coverage

Q8Y7K4 ribonuclease H (EC 3.1.26.4) from Listeria monocytogenes EGD-e (see paper)
lmo1273 similar to ribonuclease H rnh from Listeria monocytogenes EGD-e
NP_464798 ribonuclease HII from Listeria monocytogenes EGD-e
50% identity, 69% coverage

DVU0834 ribonuclease HII from Desulfovibrio vulgaris Hildenborough
49% identity, 82% coverage

LLKF_1359 ribonuclease HII from Lactococcus lactis subsp. lactis KF147
48% identity, 72% coverage

Q9CG17 Ribonuclease HII from Lactococcus lactis subsp. lactis (strain IL1403)
48% identity, 72% coverage

CP0654 ribonuclease HII from Chlamydophila pneumoniae AR39
Q9Z962 Ribonuclease HII from Chlamydia pneumoniae
46% identity, 86% coverage

TRQ7_RS00060 ribonuclease HII from Thermotoga sp. RQ7
50% identity, 75% coverage

Dgeo_1623 Ribonuclease H from Deinococcus geothermalis DSM 11300
49% identity, 82% coverage

DR1949, DR_1949 ribonuclease HII from Deinococcus radiodurans R1
48% identity, 86% coverage

BB0046 ribonuclease H (rnhB) from Borrelia burgdorferi B31
41% identity, 88% coverage

slr1130 ribonuclease HII from Synechocystis sp. PCC 6803
45% identity, 88% coverage

Krad_1405 ribonuclease HII from Kineococcus radiotolerans SRS30216 = ATCC BAA-149
43% identity, 68% coverage

SCO5812 ribonuclease HII from Streptomyces coelicolor A3(2)
44% identity, 76% coverage

LA_2386 ribonuclease H II from Leptospira interrogans serovar lai str. 56601
38% identity, 78% coverage

Francci3_3588 Ribonuclease H from Frankia sp. CcI3
46% identity, 75% coverage

Cj0010c ribonuclease HII from Campylobacter jejuni subsp. jejuni NCTC 11168
35% identity, 92% coverage

HP1323 ribonuclease HII (rnhB) from Helicobacter pylori 26695
32% identity, 89% coverage

HMPREF0421_21216 ribonuclease HII from Gardnerella vaginalis ATCC 14019
32% identity, 81% coverage

BL_RS05100 ribonuclease HII from Bifidobacterium longum
33% identity, 64% coverage

M164_0197 ribonuclease HII from Sulfolobus islandicus M.16.4
35% identity, 84% coverage

Q8U036 ribonuclease H (EC 3.1.26.4) from Pyrococcus furiosus (see paper)
WP_011012922 ribonuclease HII from Pyrococcus furiosus DSM 3638
PF1781 RNaseH II from Pyrococcus furiosus DSM 3638
34% identity, 81% coverage

H0H31_RS04250 ribonuclease HII from Micrococcus luteus
38% identity, 71% coverage

Q8WR57 ribonuclease H (EC 3.1.26.4) from Leishmania donovani (see paper)
38% identity, 21% coverage

Q8WSZ0 ribonuclease H (EC 3.1.26.4) from Leishmania major (see paper)
38% identity, 21% coverage

TERMP_00671, TERMP_RS03345 ribonuclease HII from Thermococcus barophilus MP
33% identity, 81% coverage

PAB0352 ribonuclease HII from Pyrococcus abyssi GE5
Q9V1A9 Ribonuclease HII from Pyrococcus abyssi (strain GE5 / Orsay)
PAB_RS02765 ribonuclease HII from Pyrococcus abyssi GE5
33% identity, 81% coverage

WP_010978497 ribonuclease HII from Sulfurisphaera tokodaii
ST0519 208aa long hypothetical ribonuclease HII from Sulfolobus tokodaii str. 7
33% identity, 88% coverage

WP_019177553 ribonuclease HII from Methanomassiliicoccus luminyensis B10
35% identity, 86% coverage

O59351 ribonuclease H (EC 3.1.26.4) from Pyrococcus horikoshii OT3 (see paper)
30% identity, 83% coverage

O74035 ribonuclease H (EC 3.1.26.4) from Thermococcus kodakarensis (see 2 papers)
WP_011249756 ribonuclease HII from Thermococcus kodakarensis
TK0805 ribonuclease HII from Thermococcus kodakaraensis KOD1
35% identity, 68% coverage

DDB_G0277705 hypothetical protein from Dictyostelium discoideum AX4
32% identity, 28% coverage

Q9YET5 ribonuclease H (EC 3.1.26.4) from Aeropyrum pernix (see paper)
33% identity, 81% coverage

RNH2_ARCFU / O29634 Ribonuclease HII; RNase HII; EC 3.1.26.4 from Archaeoglobus fulgidus (strain ATCC 49558 / DSM 4304 / JCM 9628 / NBRC 100126 / VC-16) (see paper)
33% identity, 76% coverage

AT2G25100 ribonuclease HII family protein from Arabidopsis thaliana
NP_565584 Polynucleotidyl transferase, ribonuclease H-like superfamily protein from Arabidopsis thaliana
30% identity, 46% coverage

MMP1374 Ribonuclease HII from Methanococcus maripaludis S2
27% identity, 77% coverage

HVO_1978 ribonuclease H II from Haloferax volcanii DS2
34% identity, 72% coverage

NP_956520 ribonuclease H2 subunit A from Danio rerio
29% identity, 61% coverage

EHI_134360 ribonuclease H2 subunit A, putative from Entamoeba histolytica HM-1:IMSS
28% identity, 38% coverage

RNH2A_MOUSE / Q9CWY8 Ribonuclease H2 subunit A; RNase H2 subunit A; Ribonuclease HI large subunit; RNase HI large subunit; Ribonuclease HI subunit A; EC 3.1.26.4 from Mus musculus (Mouse) (see paper)
Q9CWY8 ribonuclease H (EC 3.1.26.4) from Mus musculus (see paper)
NP_081463 ribonuclease H2 subunit A isoform 1 from Mus musculus
30% identity, 58% coverage

E8T217 ribonuclease H (EC 3.1.26.4) from Thermovibrio ammonificans (see paper)
28% identity, 64% coverage

RNH2A_BOVIN / Q2TBT5 Ribonuclease H2 subunit A; RNase H2 subunit A; Ribonuclease HI large subunit; RNase HI large subunit; Ribonuclease HI subunit A; EC 3.1.26.4 from Bos taurus (Bovine) (see paper)
29% identity, 59% coverage

4py5A / E8T217 Thermovibrio ammonificans RNase H3 in complex with 19-mer RNA/DNA (see paper)
28% identity, 65% coverage

For advice on how to use these tools together, see Interactive tools for functional annotation of bacterial genomes.

Statistics

The PaperBLAST database links 793,807 different protein sequences to 1,259,118 scientific articles. Searches against EuropePMC were last performed on March 13, 2025.

How It Works

PaperBLAST builds a database of protein sequences that are linked to scientific articles. These links come from automated text searches against the articles in EuropePMC and from manually-curated information from GeneRIF, UniProtKB/Swiss-Prot, BRENDA, CAZy (as made available by dbCAN), BioLiP, CharProtDB, MetaCyc, EcoCyc, TCDB, REBASE, the Fitness Browser, and a subset of the European Nucleotide Archive with the /experiment tag. Given this database and a protein sequence query, PaperBLAST uses protein-protein BLAST to find similar sequences with E < 0.001.
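The E-value cutoff described above can be illustrated with a minimal sketch. It assumes the BLAST results are available in the standard 12-column tabular format (BLAST+ `-outfmt 6`), where the E-value is the eleventh column; whether PaperBLAST itself uses this format is not stated here. The function name `filter_hits` and the example rows are hypothetical.

```python
import csv
import io

E_VALUE_CUTOFF = 1e-3  # the threshold quoted above (E < 0.001)


def filter_hits(tabular_text, cutoff=E_VALUE_CUTOFF):
    """Keep rows of BLAST tabular output whose E-value is below the cutoff.

    Assumes the default 12 columns of -outfmt 6:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        subject = row[1]
        identity = float(row[2])
        evalue = float(row[10])
        if evalue < cutoff:
            hits.append((subject, identity, evalue))
    return hits


# Two made-up rows: one strong hit and one that fails the cutoff.
example = "\n".join([
    "query\tsubjA\t71.0\t190\t55\t0\t1\t190\t1\t190\t1e-95\t280",
    "query\tsubjB\t30.0\t60\t42\t0\t10\t69\t5\t64\t0.5\t25.0",
])
kept = filter_hits(example)
```

Here only `subjA` survives, because the second row has E = 0.5.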

To build the database, we query EuropePMC with locus tags, with RefSeq protein identifiers, and with UniProt accessions. We obtain the locus tags from RefSeq or from MicrobesOnline. We use queries of the form "locus_tag AND genus_name" to try to ensure that the paper is actually discussing that gene. Because EuropePMC indexes most recent biomedical papers, even if they are not open access, some of the links may be to papers that you cannot read or that our computers cannot read. We query each of these identifiers that appears in the open access part of EuropePMC, as well as every locus tag that appears in the 500 most-referenced genomes, so that a gene may appear in the PaperBLAST results even though none of the papers that mention it are open access. We also incorporate text-mined links from EuropePMC that link open access articles to UniProt or RefSeq identifiers. (This yields some additional links because EuropePMC uses different heuristics for their text mining than we do.)
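As a rough illustration of the "locus_tag AND genus_name" query form, the sketch below builds such a string from a locus tag and an organism name. The helper name `build_query` is hypothetical, and taking the genus as the first word of the organism name (skipping a leading "Candidatus") is an assumption made for this example, not a description of PaperBLAST's code.

```python
def build_query(locus_tag, organism):
    """Build a text query of the form 'locus_tag AND genus_name'.

    Assumption: the genus is the first word of the organism name,
    or the second word when the name starts with 'Candidatus'.
    """
    words = organism.split()
    if not words:
        raise ValueError("organism name is empty")
    if words[0] == "Candidatus" and len(words) > 1:
        genus = words[1]
    else:
        genus = words[0]
    return f"{locus_tag} AND {genus}"


query = build_query("PA3642", "Pseudomonas aeruginosa PAO1")
# query is "PA3642 AND Pseudomonas"
```

Requiring the genus name alongside the locus tag reduces false matches when a short identifier coincides with an unrelated term in another paper.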

For every article that mentions a locus tag, a RefSeq protein identifier, or a UniProt accession, we try to select one or two snippets of text that refer to the protein. If we cannot get access to the full text, we try to select a snippet from the abstract, but unfortunately, unique identifiers such as locus tags are rarely provided in abstracts.
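One simple way to pick snippets is sketched below. It assumes a snippet is a fixed window of words around a whole-token match of the identifier and keeps at most two non-overlapping windows per article. The function `select_snippets` and its parameters are hypothetical; the heuristics PaperBLAST actually uses may differ.

```python
import re


def select_snippets(text, identifier, max_snippets=2, context_words=10):
    """Return up to max_snippets word windows that mention the identifier."""
    words = text.split()
    # Match the identifier only when it is not embedded in a longer token,
    # so that "PA3642" does not match "PA36420".
    pattern = re.compile(
        r"(?<![A-Za-z0-9_])" + re.escape(identifier) + r"(?![A-Za-z0-9_])"
    )
    snippets = []
    next_free = 0  # first word index not covered by an earlier snippet
    for i, word in enumerate(words):
        if i < next_free or not pattern.search(word):
            continue
        start = max(0, i - context_words)
        end = min(len(words), i + context_words + 1)
        snippets.append(" ".join(words[start:end]))
        next_free = end
        if len(snippets) == max_snippets:
            break
    return snippets


text = "The rnhB gene (PA3642) encodes ribonuclease HII in strain PAO1."
snippets = select_snippets(text, "PA3642", context_words=2)
```

With a two-word window on each side, the single snippet returned for this sentence is "rnhB gene (PA3642) encodes ribonuclease".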

PaperBLAST also incorporates manually-curated protein functions:

Except for GeneRIF and ENA, the curated entries include a short curated description of the protein's function. For entries from BioLiP, the protein's function may not be known beyond binding to the ligand. Many of these entries also link to articles in PubMed.

For more information see the PaperBLAST paper (mSystems 2017) or the code. You can download PaperBLAST's database here.

Changes to PaperBLAST since the paper was written:

Many of these changes are described in Interactive tools for functional annotation of bacterial genomes.

Secrets

PaperBLAST cannot provide snippets for many of the papers that are published in non-open-access journals. This limitation applies even if the paper is marked as "free" on the publisher's web site and is available in PubMed Central or EuropePMC. If a journal that you publish in is marked as "secret," please consider publishing elsewhere.

Omissions from the PaperBLAST Database

Many important articles are missing from PaperBLAST, either because the article's full text is not in EuropePMC (as for many older articles), or because the paper does not mention a protein identifier such as a locus tag, or because of PaperBLAST's heuristics. If you notice an article that characterizes a protein's function but is missing from PaperBLAST, please notify the curators at UniProt or add an entry to GeneRIF. Entries in either of these databases will eventually be incorporated into PaperBLAST. Note that to add an entry to UniProt, you will need to find the UniProt identifier for the protein. If the protein is not already in UniProt, you can ask them to create an entry. To add an entry to GeneRIF, you will need an NCBI Gene identifier, but unfortunately many prokaryotic proteins in RefSeq do not have corresponding Gene identifiers.

References

PaperBLAST: Text-mining papers for information about homologs.
M. N. Price and A. P. Arkin (2017). mSystems, 10.1128/mSystems.00039-17.

Europe PMC in 2017.
M. Levchenko et al (2017). Nucleic Acids Research, 10.1093/nar/gkx1005.

Gene indexing: characterization and analysis of NLM's GeneRIFs.
J. A. Mitchell et al (2003). AMIA Annu Symp Proc 2003:460-464.

UniProt: the universal protein knowledgebase.
The UniProt Consortium (2016). Nucleic Acids Research, 10.1093/nar/gkw1099.

BRENDA in 2017: new perspectives and new tools in BRENDA.
S. Placzek et al (2017). Nucleic Acids Research, 10.1093/nar/gkw952.

The EcoCyc database: reflecting new knowledge about Escherichia coli K-12.
I. M. Keseler et al (2016). Nucleic Acids Research, 10.1093/nar/gkw1003.

The MetaCyc database of metabolic pathways and enzymes.
R. Caspi et al (2018). Nucleic Acids Research, 10.1093/nar/gkx935.

CharProtDB: a database of experimentally characterized protein annotations.
R. Madupu et al (2012). Nucleic Acids Research, 10.1093/nar/gkr1133.

The carbohydrate-active enzymes database (CAZy) in 2013.
V. Lombard et al (2014). Nucleic Acids Research, 10.1093/nar/gkt1178.

The Transporter Classification Database (TCDB): recent advances.
M. H. Saier, Jr. et al (2016). Nucleic Acids Research, 10.1093/nar/gkv1103.

REBASE - a database for DNA restriction and modification: enzymes, genes and genomes.
R. J. Roberts et al (2015). Nucleic Acids Research, 10.1093/nar/gku1046.

Deep annotation of protein function across diverse bacteria from mutant phenotypes.
M. N. Price et al (2016). bioRxiv, 10.1101/072470.

by Morgan Price, Arkin group
Lawrence Berkeley National Laboratory