PaperBLAST – Find papers about a protein or its homologs

 

PaperBLAST Hits for 59 a.a. (MAKLEITLKR...)

Other sequence analysis tools:

Find functional residues: SitesBLAST

Search for conserved domains

Find the best match in UniProt

Compare to protein structures

Predict transmembrane helices: Phobius

Predict protein localization: PSORTb

Find homologs in fast.genomics

Found 73 similar proteins in the literature:

RL30_BACSU / P19947 Large ribosomal subunit protein uL30; 50S ribosomal protein L30; BL27 from Bacillus subtilis (strain 168) (see paper)
BSU01340 50S ribosomal protein L30 from Bacillus subtilis subsp. subtilis str. 168
100% identity, 100% coverage

7aqcZ / P19947 Structure of the bacterial rqc complex (decoding state) (see paper)
100% identity, 98% coverage

lmo2614 ribosomal protein L30 from Listeria monocytogenes EGD-e
Q71WG4 Large ribosomal subunit protein uL30 from Listeria monocytogenes serotype 4b (strain F2365)
78% identity, 98% coverage

SAOUHSC_02493 ribosomal protein L30 from Staphylococcus aureus subsp. aureus NCTC 8325
A5IV16 Large ribosomal subunit protein uL30 from Staphylococcus aureus (strain JH9)
A8Z339 Large ribosomal subunit protein uL30 from Staphylococcus aureus (strain USA300 / TCH1516)
P0A0G0 Large ribosomal subunit protein uL30 from Staphylococcus aureus (strain N315)
Q2FEQ7 Large ribosomal subunit protein uL30 from Staphylococcus aureus (strain USA300)
Q6GEK1 Large ribosomal subunit protein uL30 from Staphylococcus aureus (strain MRSA252)
SA2030 50S ribosomal protein L30 from Staphylococcus aureus subsp. aureus N315
SACOL2221 ribosomal protein L30p/L7e from Staphylococcus aureus subsp. aureus COL
EKM74_RS05540, USA300HOU_RS12085 50S ribosomal protein L30 from Staphylococcus aureus subsp. aureus USA300_TCH1516
75% identity, 100% coverage

BC_0149 50S ribosomal protein L30 from Bacillus cereus ATCC 14579
Q81VR2 Large ribosomal subunit protein uL30 from Bacillus anthracis
BC0149, NP_830029 LSU ribosomal protein L30P from Bacillus cereus ATCC 14579
77% identity, 93% coverage

SERP1813 ribosomal protein L30 from Staphylococcus epidermidis RP62A
71% identity, 98% coverage

SE1805 50S ribosomal protein L30 from Staphylococcus epidermidis ATCC 12228
71% identity, 100% coverage

8a573 / Q927M5 Cryo-em structure of hflxr bound to the listeria monocytogenes 50s ribosomal subunit. (see paper)
77% identity, 95% coverage

7asmX / P0A0G2 Staphylococcus aureus 50s after 30 minutes incubation at 37c
74% identity, 98% coverage

5nrgW / P0A0G2 The crystal structure of the large ribosomal subunit of staphylococcus aureus in complex with rb02 (see paper)
75% identity, 97% coverage

IUJ47_RS04615 50S ribosomal protein L30 from Enterococcus faecalis
Q839E6 Large ribosomal subunit protein uL30 from Enterococcus faecalis (strain ATCC 700802 / V583)
EF0225 ribosomal protein L30 from Enterococcus faecalis V583
64% identity, 98% coverage

6o8w0 / A0A1B4XKW1 6o8w0 (see paper)
63% identity, 97% coverage

STER_RS09255 50S ribosomal protein L30 from Streptococcus thermophilus LMD-9
62% identity, 97% coverage

SUB0086 50S ribosomal protein L30 from Streptococcus uberis 0140J
57% identity, 97% coverage

SPV_0211 50S ribosomal protein L30 from Streptococcus pneumoniae
SP_0228 50S ribosomal protein L30 from Streptococcus pneumoniae TIGR4
62% identity, 97% coverage

CAC3115 Ribosomal protein L30 from Clostridium acetobutylicum ATCC 824
60% identity, 97% coverage

MSMEG_1473 ribosomal protein L30 from Mycobacterium smegmatis str. MC2 155
60% identity, 95% coverage

5zeb1 / A0QSG7 5zeb1 (see paper)
60% identity, 95% coverage

Q1GBK0 Large ribosomal subunit protein uL30 from Lactobacillus delbrueckii subsp. bulgaricus (strain ATCC 11842 / DSM 20081 / BCRC 10696 / JCM 1002 / NBRC 13953 / NCIMB 11778 / NCTC 12712 / WDCM 00102 / Lb 14)
55% identity, 95% coverage

XNR_3747 50S ribosomal protein L30 from Streptomyces albidoflavus
52% identity, 97% coverage

LSEI_2485 Ribosomal protein L30 from Lactobacillus casei ATCC 334
53% identity, 95% coverage

DU507_12850 50S ribosomal protein L30 from Lacticaseibacillus rhamnosus GG
53% identity, 95% coverage

5myjB2 / A2RNN5 of 70S ribosome from Lactococcus lactis (see paper)
54% identity, 97% coverage

B7C60_RS03530 50S ribosomal protein L30 from Vibrio fujianensis
50% identity, 98% coverage

Bbr_1623 50S ribosomal protein L30 from Bifidobacterium breve UCC2003
51% identity, 95% coverage

Cp1002_0374 50S ribosomal protein L30 from Corynebacterium pseudotuberculosis 1002
51% identity, 90% coverage

MAB_3794c 50S ribosomal protein L30 from Mycobacterium abscessus ATCC 19977
52% identity, 97% coverage

VP0275 ribosomal protein L30 from Vibrio parahaemolyticus RIMD 2210633
48% identity, 98% coverage

FP1321 50S ribosomal protein L30 from Flavobacterium psychrophilum JIP02/86
53% identity, 97% coverage

SCO4720 50S ribosomal protein L30 from Streptomyces coelicolor A3(2)
50% identity, 97% coverage

BCG_0772 putative 50S ribosomal protein L30 rpmD from Mycobacterium bovis BCG str. Pasteur 1173P2
A1KGK3 Large ribosomal subunit protein uL30 from Mycobacterium bovis (strain BCG / Pasteur 1173P2)
P9WHA3 Large ribosomal subunit protein uL30 from Mycobacterium tuberculosis (strain ATCC 25618 / H37Rv)
MT0747 50S ribosomal protein L30 from Mycobacterium tuberculosis CDC1551
Rv0722 50S ribosomal protein L30 from Mycobacterium tuberculosis H37Rv
MRA_0730 50S ribosomal protein L30 from Mycobacterium tuberculosis H37Ra
51% identity, 88% coverage

RL30_THET8 / Q5SHQ6 Large ribosomal subunit protein uL30; 50S ribosomal protein L30 from Thermus thermophilus (strain ATCC 27634 / DSM 579 / HB8) (see paper)
6b4vAA / Q72I22 blasticidin S and E. coli release factor 1 bound to the 70S ribosome (see paper)
TTC1310 No description from Thermus thermophilus HB27
46% identity, 95% coverage

PGN_1850 50S ribosomal protein L30 from Porphyromonas gingivalis ATCC 33277
52% identity, 98% coverage

lpg0347 50S ribosomal protein L30/(L7E) from Legionella pneumophila subsp. pneumophila str. Philadelphia 1
46% identity, 93% coverage

D0R1L5 Large ribosomal subunit protein uL30 from Lactobacillus johnsonii (strain FI9785)
53% identity, 92% coverage

7f0dZ / A5U0A8 Cryo-em structure of mycobacterium tuberculosis 50s ribosome subunit bound with clarithromycin (see paper)
50% identity, 95% coverage

MAP4185 RpmD from Mycobacterium avium subsp. paratuberculosis str. k10
50% identity, 79% coverage

7jilZ / A0A1M5L7Z6 7jilZ (see paper)
54% identity, 95% coverage

EAMY_3368 50S ribosomal protein L30 from Erwinia amylovora CFBP1430
47% identity, 98% coverage

RpmD / b3302 50S ribosomal subunit protein L30 from Escherichia coli K-12 substr. MG1655 (see 5 papers)
rpmD / P0AG51 50S ribosomal subunit protein L30 from Escherichia coli (strain K12) (see 2 papers)
RL30_ECOLI / P0AG51 Large ribosomal subunit protein uL30; 50S ribosomal protein L30 from Escherichia coli (strain K12) (see 9 papers)
ECs4167 50S ribosomal subunit protein L30 from Escherichia coli O157:H7 str. Sakai
NP_417761 50S ribosomal subunit protein L30 from Escherichia coli str. K-12 substr. MG1655
b3302 50S ribosomal protein L30 from Escherichia coli str. K-12 substr. MG1655
NP_709090 50S ribosomal subunit protein L30 from Shigella flexneri 2a str. 301
51% identity, 98% coverage

8a3ly / P0AG51 8a3ly (see paper)
49% identity, 93% coverage

Q88XW8 Large ribosomal subunit protein uL30 from Lactiplantibacillus plantarum (strain ATCC BAA-793 / NCIMB 8826 / WCFS1)
49% identity, 95% coverage

9c4gy / A0A2B7I5Y4 Cutibacterium acnes 50s ribosomal subunit with clindamycin bound (see paper)
45% identity, 98% coverage

SPA3288 50S ribosomal subunit protein L30 from Salmonella enterica subsp. enterica serovar Paratyphi A str. ATCC 9150
B5R1F9 Large ribosomal subunit protein uL30 from Salmonella enteritidis PT4 (strain P125109)
P0A2A7 Large ribosomal subunit protein uL30 from Salmonella typhimurium (strain LT2 / SGSC1412 / ATCC 700720)
STM3422 50S ribosomal subunit protein L30 from Salmonella typhimurium LT2
SEN3250 50S ribosomal subunit protein L30 from Salmonella enterica subsp. enterica serovar Enteritidis str. P125109
SENTW_3549 50S ribosomal protein L30 from Citrobacter braakii
47% identity, 98% coverage

bglu_1g02760 50S ribosomal protein L30 from Burkholderia glumae BGR1
49% identity, 84% coverage

BB562_12015 50S ribosomal protein L30 from Lactiplantibacillus pentosus
47% identity, 95% coverage

RL30_DEIRA / Q9RSL0 Large ribosomal subunit protein uL30; 50S ribosomal protein L30 from Deinococcus radiodurans (strain ATCC 13939 / DSM 20539 / JCM 16871 / CCUG 27074 / LMG 4051 / NBRC 15346 / NCIMB 9279 / VKM B-1422 / R1) (see 6 papers)
7a0rW / Q9RSL0 50s deinococcus radiodurans ribosome bounded with mycinamicin i (see paper)
DR2114 ribosomal protein L30 from Deinococcus radiodurans R1
47% identity, 93% coverage

GSU2839 ribosomal protein L30 from Geobacter sulfurreducens PCA
49% identity, 97% coverage

AL022_RS00600 50S ribosomal protein L30 from Cardinium endosymbiont cEper1 of Encarsia pergandiella
46% identity, 92% coverage

AS87_07825 50S ribosomal protein L30 from Riemerella anatipestifer Yb2
45% identity, 98% coverage

HMPREF0424_0269 ribosomal protein L30 from Gardnerella vaginalis 409-05
47% identity, 97% coverage

TP0206a ribosomal protein L30 from Treponema pallidum subsp. pallidum str. Nichols
48% identity, 92% coverage

7pktx / A0A2K3DSP9 7pktx (see paper)
43% identity, 32% coverage

FN1626 LSU ribosomal protein L30P from Fusobacterium nucleatum subsp. nucleatum ATCC 25586
41% identity, 95% coverage

SPO0503 50S ribosomal protein L30 from Ruegeria pomeroyi DSS-3
49% identity, 79% coverage

A1JS10 Large ribosomal subunit protein uL30 from Yersinia enterocolitica serotype O:8 / biotype 1B (strain NCTC 13174 / 8081)
44% identity, 98% coverage

B2B147 Large ribosomal subunit protein uL30m from Podospora anserina (strain S / ATCC MYA-4624 / DSM 980 / FGSC 10383)
47% identity, 46% coverage

H375_8910 50S ribosomal protein L30 from Rickettsia prowazekii str. Breinl
43% identity, 84% coverage

HI0796 ribosomal protein L30 (rpL30) from Haemophilus influenzae Rd KW20
46% identity, 98% coverage

SO0249 ribosomal protein L30 from Shewanella oneidensis MR-1
42% identity, 92% coverage

RM33_NEUCR / Q1K8Y7 Large ribosomal subunit protein uL30m from Neurospora crassa (strain ATCC 24698 / 74-OR23-1A / CBS 708.71 / DSM 1257 / FGSC 987) (see 2 papers)
43% identity, 36% coverage

NMA0111 50S ribosomal protein L30 from Neisseria meningitidis Z2491
NMB0160 50S ribosomal protein L30 from Neisseria meningitidis MC58
44% identity, 90% coverage

PA4245 50S ribosomal protein L30 from Pseudomonas aeruginosa PAO1
PA14_09030 50S ribosomal protein L30 from Pseudomonas aeruginosa UCBPP-PA14
40% identity, 97% coverage

AT5G55140 ribosomal protein L30 family protein from Arabidopsis thaliana
39% identity, 51% coverage

6xywAz / Q8L908 6xywAz (see paper)
39% identity, 68% coverage

FTH_0249 ribosomal protein L30 from Francisella tularensis subsp. holarctica OSU18
FTN_0257 50S ribosomal protein L30 from Francisella tularensis subsp. novicida U112
41% identity, 89% coverage

PD0455 50S ribosomal protein L30 from Xylella fastidiosa Temecula1
42% identity, 90% coverage

ZMO0534 50S ribosomal protein L30 from Zymomonas mobilis subsp. mobilis ZM4
46% identity, 97% coverage

BAB1_1237 Ribosomal protein L30:Ribosomal protein L30, bacterial and organelle form from Brucella melitensis biovar Abortus 2308
BMEI0775 LSU ribosomal protein L30P from Brucella melitensis 16M
47% identity, 75% coverage

6yweU / Q8X098 structure of the mitoribosome from Neurospora crassa in the P/E tRNA bound state (see paper)
44% identity, 40% coverage

PP0472, PP_0472 ribosomal protein L30 from Pseudomonas putida KT2440
39% identity, 97% coverage

AFUA_4G10480 50S ribosomal protein L30, putative from Aspergillus fumigatus Af293
41% identity, 62% coverage

8cd1Z / Q9HWF3 8cd1Z (see paper)
39% identity, 95% coverage

For advice on how to use these tools together, see Interactive tools for functional annotation of bacterial genomes.

Statistics

The PaperBLAST database links 793,807 different protein sequences to 1,259,118 scientific articles. Searches against EuropePMC were last performed on March 13, 2025.

How It Works

PaperBLAST builds a database of protein sequences that are linked to scientific articles. These links come from automated text searches against the articles in EuropePMC and from manually-curated information from GeneRIF, UniProtKB/Swiss-Prot, BRENDA, CAZy (as made available by dbCAN), BioLiP, CharProtDB, MetaCyc, EcoCyc, TCDB, REBASE, the Fitness Browser, and a subset of the European Nucleotide Archive with the /experiment tag. Given this database and a protein sequence query, PaperBLAST uses protein-protein BLAST to find similar sequences with E < 0.001.
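The E < 0.001 cutoff described above can be illustrated with a minimal sketch. It assumes the BLAST results are available in the standard 12-column tabular format (BLAST+ `-outfmt 6`); the function name `parse_hits`, the `Hit` record, and the example rows are hypothetical and are not taken from PaperBLAST's code. Coverage is computed here as the aligned fraction of the query, which is an assumption about how the percentages on the results page are defined.

```python
from typing import List, NamedTuple

E_VALUE_CUTOFF = 1e-3  # threshold stated in the text: E < 0.001


class Hit(NamedTuple):
    subject: str      # identifier of the similar protein
    identity: float   # percent identity over the alignment
    coverage: float   # percent of the query covered (assumed definition)
    evalue: float


def parse_hits(tabular: str, query_length: int) -> List[Hit]:
    """Keep hits with E < 0.001 from BLAST tabular output (-outfmt 6).

    Columns: qseqid sseqid pident length mismatch gapopen
             qstart qend sstart send evalue bitscore
    """
    hits = []
    for line in tabular.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        evalue = float(fields[10])
        if evalue >= E_VALUE_CUTOFF:
            continue  # not significant enough to report
        qstart, qend = int(fields[6]), int(fields[7])
        coverage = 100.0 * (qend - qstart + 1) / query_length
        hits.append(Hit(fields[1], float(fields[2]), coverage, evalue))
    return hits


# Hypothetical rows for a 59 a.a. query: one strong hit, one weak hit.
example = (
    "query\tsubjA\t100.0\t59\t0\t0\t1\t59\t1\t59\t1e-30\t120\n"
    "query\tsubjB\t30.0\t20\t14\t0\t5\t24\t3\t22\t0.5\t25\n"
)
for hit in parse_hits(example, query_length=59):
    print(f"{hit.subject}: {hit.identity:.0f}% identity, {hit.coverage:.0f}% coverage")
```

Only the first row passes the cutoff; the second has E = 0.5 and is dropped.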

To build the database, we query EuropePMC with locus tags, with RefSeq protein identifiers, and with UniProt accessions. We obtain the locus tags from RefSeq or from MicrobesOnline. We use queries of the form "locus_tag AND genus_name" to try to ensure that the paper is actually discussing that gene. Because EuropePMC indexes most recent biomedical papers, even if they are not open access, some of the links may be to papers that you cannot read or that our computers cannot read. We query each of these identifiers that appears in the open access part of EuropePMC, as well as every locus tag that appears in the 500 most-referenced genomes, so that a gene may appear in the PaperBLAST results even though none of the papers that mention it are open access. We also incorporate text-mined links from EuropePMC that link open access articles to UniProt or RefSeq identifiers. (This yields some additional links because EuropePMC uses different heuristics for their text mining than we do.)
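As a rough illustration of the "locus_tag AND genus_name" query form, the sketch below builds such a query string from a locus tag and an organism name. The helper names `genus_of` and `build_query` are hypothetical, and taking the first word of the organism name as the genus is a simplifying assumption; PaperBLAST's actual query construction may differ.

```python
def genus_of(organism: str) -> str:
    """Return the first word of an organism name (assumed to be the genus)."""
    words = organism.split()
    if not words:
        raise ValueError("empty organism name")
    return words[0]


def build_query(locus_tag: str, organism: str) -> str:
    """Build a text-search query of the form 'locus_tag AND genus_name'."""
    return f"{locus_tag} AND {genus_of(organism)}"


# Locus tag and organism taken from the hit list on this page.
print(build_query("BSU01340", "Bacillus subtilis subsp. subtilis str. 168"))
```

Requiring the genus name alongside the locus tag narrows the search to papers that plausibly discuss that organism, which is the motivation given in the text.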

For every article that mentions a locus tag, a RefSeq protein identifier, or a UniProt accession, we try to select one or two snippets of text that refer to the protein. If we cannot get access to the full text, we try to select a snippet from the abstract, but unfortunately, unique identifiers such as locus tags are rarely provided in abstracts.
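Snippet selection can be sketched as finding whole-word occurrences of an identifier and keeping a window of text around each one. This is only an illustration: the function `select_snippets`, the window width, and the rule for skipping overlapping windows are assumptions, not PaperBLAST's actual heuristics.

```python
import re
from typing import List


def select_snippets(text: str, identifier: str,
                    width: int = 60, max_snippets: int = 2) -> List[str]:
    """Return up to max_snippets text windows around an identifier."""
    # Match the identifier only as a whole token, so that a search for
    # BSU01340 does not match inside BSU013400.
    pattern = re.compile(r"(?<!\w)" + re.escape(identifier) + r"(?!\w)")
    snippets = []
    last_end = -1
    for match in pattern.finditer(text):
        start = max(0, match.start() - width)
        if start < last_end:
            continue  # window would overlap the previous snippet
        end = min(len(text), match.end() + width)
        snippets.append(text[start:end].strip())
        last_end = end
        if len(snippets) == max_snippets:
            break
    return snippets


# Hypothetical article text for illustration.
article = "We deleted BSU01340 and observed slow growth. BSU013400 is unrelated."
print(select_snippets(article, "BSU01340", width=20))
```

If only the abstract is available, the same function could be applied to the abstract text, although, as noted above, identifiers rarely appear there.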

PaperBLAST also incorporates manually-curated protein functions from the curated sources listed above. Except for GeneRIF and ENA, the curated entries include a short curated description of the protein's function. For entries from BioLiP, the protein's function may not be known beyond binding to the ligand. Many of these entries also link to articles in PubMed.

For more information see the PaperBLAST paper (mSystems 2017) or the code. You can download PaperBLAST's database here.

PaperBLAST has changed since the paper was written. Many of these changes are described in Interactive tools for functional annotation of bacterial genomes.

Secrets

PaperBLAST cannot provide snippets for many of the papers that are published in non-open-access journals. This limitation applies even if the paper is marked as "free" on the publisher's web site and is available in PubMed Central or EuropePMC. If a journal that you publish in is marked as "secret," please consider publishing elsewhere.

Omissions from the PaperBLAST Database

Many important articles are missing from PaperBLAST, either because the article's full text is not in EuropePMC (as for many older articles), or because the paper does not mention a protein identifier such as a locus tag, or because of PaperBLAST's heuristics. If you notice an article that characterizes a protein's function but is missing from PaperBLAST, please notify the curators at UniProt or add an entry to GeneRIF. Entries in either of these databases will eventually be incorporated into PaperBLAST. Note that to add an entry to UniProt, you will need to find the UniProt identifier for the protein. If the protein is not already in UniProt, you can ask them to create an entry. To add an entry to GeneRIF, you will need an NCBI Gene identifier, but unfortunately many prokaryotic proteins in RefSeq do not have corresponding Gene identifiers.

References

PaperBLAST: Text-mining papers for information about homologs.
M. N. Price and A. P. Arkin (2017). mSystems, 10.1128/mSystems.00039-17.

Europe PMC in 2017.
M. Levchenko et al (2017). Nucleic Acids Research, 10.1093/nar/gkx1005.

Gene indexing: characterization and analysis of NLM's GeneRIFs.
J. A. Mitchell et al (2003). AMIA Annu Symp Proc 2003:460-464.

UniProt: the universal protein knowledgebase.
The UniProt Consortium (2016). Nucleic Acids Research, 10.1093/nar/gkw1099.

BRENDA in 2017: new perspectives and new tools in BRENDA.
S. Placzek et al (2017). Nucleic Acids Research, 10.1093/nar/gkw952.

The EcoCyc database: reflecting new knowledge about Escherichia coli K-12.
I. M. Keseler et al (2016). Nucleic Acids Research, 10.1093/nar/gkw1003.

The MetaCyc database of metabolic pathways and enzymes.
R. Caspi et al (2018). Nucleic Acids Research, 10.1093/nar/gkx935.

CharProtDB: a database of experimentally characterized protein annotations.
R. Madupu et al (2012). Nucleic Acids Research, 10.1093/nar/gkr1133.

The carbohydrate-active enzymes database (CAZy) in 2013.
V. Lombard et al (2014). Nucleic Acids Research, 10.1093/nar/gkt1178.

The Transporter Classification Database (TCDB): recent advances.
M. H. Saier, Jr. et al (2016). Nucleic Acids Research, 10.1093/nar/gkv1103.

REBASE - a database for DNA restriction and modification: enzymes, genes and genomes.
R. J. Roberts et al (2015). Nucleic Acids Research, 10.1093/nar/gku1046.

Deep annotation of protein function across diverse bacteria from mutant phenotypes.
M. N. Price et al (2016). bioRxiv, 10.1101/072470.

by Morgan Price, Arkin group
Lawrence Berkeley National Laboratory