PaperBLAST – Find papers about a protein or its homologs

 

PaperBLAST

PaperBLAST Hits for sp|B1V955|RS6_PHYAS Small ribosomal subunit protein bS6 OS=Phytoplasma australiense OX=59748 GN=rpsF PE=3 SV=1 (93 a.a., MKKYEIMYIL...)

Other sequence analysis tools:

Find functional residues: SitesBLAST

Search for conserved domains

Find the best match in UniProt

Compare to protein structures

Predict transmembrane helices: Phobius

Predict protein localization: PSORTb

Find homologs in fast.genomics


Found 56 similar proteins in the literature:

C6ZDG4 Small ribosomal subunit protein bS6 from Staphylococcus xylosus
WP_017723387 30S ribosomal protein S6 from Staphylococcus xylosus
43% identity, 92% coverage

RS6_BACSU / P21468 Small ribosomal subunit protein bS6; 30S ribosomal protein S6; BS9 from Bacillus subtilis (strain 168) (see 2 papers)
8cduT / P21468 Rnase r bound to a 30s degradation intermediate (main state) (see paper)
BSU40910 30S ribosomal protein S6 from Bacillus subtilis subsp. subtilis str. 168
NP_391971 ribosomal protein S6 (BS9) from Bacillus subtilis subsp. subtilis str. 168
40% identity, 98% coverage

SERP0044 ribosomal protein S6 from Staphylococcus epidermidis RP62A
39% identity, 95% coverage

lmo0044 ribosomal protein S6 from Listeria monocytogenes EGD-e
Q725C0 Small ribosomal subunit protein bS6 from Listeria monocytogenes serotype 4b (strain F2365)
38% identity, 95% coverage

UH47_03310 30S ribosomal protein S6 from Staphylococcus pseudintermedius
38% identity, 95% coverage

8uu8f / A0A7X0WKH5 8uu8f (see paper)
37% identity, 97% coverage

NP_835129 SSU ribosomal protein S6P from Bacillus cereus ATCC 14579
BC_5476 30S ribosomal protein S6 from Bacillus cereus ATCC 14579
38% identity, 97% coverage

7bgdf / Q2G113 Staphylococcus aureus 30s ribosomal subunit in presence of spermidine (body only)
36% identity, 97% coverage

Q6GJV3 Small ribosomal subunit protein bS6 from Staphylococcus aureus (strain MRSA252)
Q2FJP8 Small ribosomal subunit protein bS6 from Staphylococcus aureus (strain USA300)
SA0352 30S ribosomal protein S6 from Staphylococcus aureus subsp. aureus N315
SAV0365 30S ribosomal protein S6 from Staphylococcus aureus subsp. aureus Mu50
SAOUHSC_00348 ribosomal protein S6 from Staphylococcus aureus subsp. aureus NCTC 8325
SACOL0437 30S ribosomal protein S6 from Staphylococcus aureus subsp. aureus COL
EKM74_RS10450, USA300HOU_RS01935 30S ribosomal protein S6 from Staphylococcus aureus
36% identity, 95% coverage

3r3tA / Q81JI2 Crystal structure of 30s ribosomal protein s from bacillus anthracis
38% identity, 98% coverage

HCW_02210 30S ribosomal protein S6 from Helicobacter cetorum MIT 00-7128
38% identity, 63% coverage

HP1246 ribosomal protein S6 (rps6) from Helicobacter pylori 26695
37% identity, 65% coverage

5myjAF / A2RNZ4 of 70S ribosome from Lactococcus lactis (see paper)
LLNZ_12790 30S ribosomal protein S6 from Lactococcus cremoris subsp. cremoris NZ9000
llmg_2475 30S ribosomal protein S6 from Lactococcus lactis subsp. cremoris MG1363
35% identity, 95% coverage

P0DE96 Small ribosomal subunit protein bS6 from Streptococcus pyogenes serotype M3 (strain ATCC BAA-595 / MGAS315)
34% identity, 96% coverage

Q8EKV4 Small ribosomal subunit protein bS6 from Oceanobacillus iheyensis (strain DSM 14371 / CIP 107618 / JCM 11309 / KCTC 3954 / HTE831)
33% identity, 98% coverage

Cj1070 30S ribosomal protein S6 from Campylobacter jejuni subsp. jejuni NCTC 11168
35% identity, 74% coverage

A1W057 Small ribosomal subunit protein bS6 from Campylobacter jejuni subsp. jejuni serotype O:23/36 (strain 81-176)
35% identity, 74% coverage

SSA_0437 30S ribosomal protein S6, putative from Streptococcus sanguinis SK36
32% identity, 96% coverage

lp_0009 ribosomal protein S6 from Lactobacillus plantarum WCFS1
Q890K2 Small ribosomal subunit protein bS6 from Lactiplantibacillus plantarum (strain ATCC BAA-793 / NCIMB 8826 / WCFS1)
32% identity, 92% coverage

HSISS4_01661 30S ribosomal protein S6 from Streptococcus salivarius
32% identity, 96% coverage

SPD_1370 ribosomal protein S6 from Streptococcus pneumoniae D39
SP_1541 30S ribosomal protein S6 from Streptococcus pneumoniae TIGR4
32% identity, 96% coverage

RP039 30S RIBOSOMAL PROTEIN S6 (rpsF) from Rickettsia prowazekii str. Madrid E
36% identity, 76% coverage

STER_1728, STER_RS08450 30S ribosomal protein S6 from Streptococcus thermophilus
32% identity, 96% coverage

SYNW2511 30S ribosomal protein S6 from Synechococcus sp. WH 8102
31% identity, 71% coverage

FE46_RS04555 30S ribosomal protein S6 from Flavobacterium psychrophilum
FP1851 30S ribosomal protein S6 from Flavobacterium psychrophilum JIP02/86
33% identity, 81% coverage

DVU0956 ribosomal protein S6 from Desulfovibrio vulgaris Hildenborough
26% identity, 91% coverage

QBX69_00365 30S ribosomal protein S6 from Rickettsia rickettsii str. 'Sheila Smith'
35% identity, 76% coverage

LSEI_0009 Ribosomal protein S6 from Lactobacillus casei ATCC 334
31% identity, 92% coverage

EF_0007 30S ribosomal protein S6 from Enterococcus faecalis V583
32% identity, 90% coverage

7nhkg / A0A1B4XKB6 7nhkg (see paper)
32% identity, 93% coverage

all4802 30S ribosomal protein S6 from Nostoc sp. PCC 7120
31% identity, 83% coverage

FN1657 SSU ribosomal protein S6P from Fusobacterium nucleatum subsp. nucleatum ATCC 25586
30% identity, 89% coverage

Teth39_2275 30S ribosomal protein S6 from Thermoanaerobacter ethanolicus ATCC 33223
34% identity, 97% coverage

RS6_THET8 / Q5SLP8 Small ribosomal subunit protein bS6; 30S ribosomal protein S6; TS9 from Thermus thermophilus (strain ATCC 27634 / DSM 579 / HB8) (see paper)
5a9zBJ / Q5SLP8 of Thermous thermophilus ribosome bound to BipA-GDPCP (see paper)
P23370 Small ribosomal subunit protein bS6 from Thermus thermophilus
27% identity, 91% coverage

K9TWE1 Small ribosomal subunit protein bS6 from Chroococcidiopsis thermalis (strain PCC 7203)
30% identity, 83% coverage

Q31SC5 Small ribosomal subunit protein bS6 from Synechococcus elongatus (strain ATCC 33912 / PCC 7942 / FACHB-805)
28% identity, 84% coverage

D0R2Q2 Small ribosomal subunit protein bS6 from Lactobacillus johnsonii (strain FI9785)
31% identity, 92% coverage

LSA0007 30S Ribosomal protein S6 from Lactobacillus sakei subsp. sakei 23K
31% identity, 92% coverage

ECH_0308 ribosomal protein S6 from Ehrlichia chaffeensis str. Arkansas
29% identity, 85% coverage

F452_RS0105155 30S ribosomal protein S6 from Porphyromonas gulae DSM 15663
PGN_0639 30S ribosomal protein S6 from Porphyromonas gingivalis ATCC 33277
PG0595 ribosomal protein S6 from Porphyromonas gingivalis W83
27% identity, 79% coverage

cg3308 30S ribosomal protein S6 from Corynebacterium glutamicum ATCC 13032
NCgl2881 30S ribosomal protein S6 from Corynebacterium glutamicum ATCC 13032
28% identity, 97% coverage

azo0718 RpsF protein from Azoarcus sp. BH72
29% identity, 72% coverage

ZMO1225 30S ribosomal protein S6 from Zymomonas mobilis subsp. mobilis ZM4
26% identity, 74% coverage

CAC3724 Ribosomal protein S6 from Clostridium acetobutylicum ATCC 824
29% identity, 97% coverage

AH67_08045 30S ribosomal protein S6 from Bifidobacterium pseudolongum PV8-2
26% identity, 94% coverage

8p8vE / P75543 8p8vE (see paper)
27% identity, 41% coverage

lpg1592 30S ribosomal protein S6 from Legionella pneumophila subsp. pneumophila str. Philadelphia 1
27% identity, 80% coverage

CDR20291_3523 30S ribosomal protein S6 from Clostridium difficile R20291
CD3663 30S ribosomal protein S6 from Clostridium difficile 630
31% identity, 98% coverage

BL0416 30S ribosomal protein S6 from Bifidobacterium longum NCC2705
29% identity, 93% coverage

DR0098 ribosomal protein S6 from Deinococcus radiodurans R1
Q9RY52 Small ribosomal subunit protein bS6 from Deinococcus radiodurans (strain ATCC 13939 / DSM 20539 / JCM 16871 / CCUG 27074 / LMG 4051 / NBRC 15346 / NCIMB 9279 / VKM B-1422 / R1)
22% identity, 91% coverage

YPTB0438 30S ribosomal protein S6 from Yersinia pseudotuberculosis IP 32953
A1JIS8 Small ribosomal subunit protein bS6 from Yersinia enterocolitica serotype O:8 / biotype 1B (strain NCTC 13174 / 8081)
YPO3539 30S ribosomal protein S6 from Yersinia pestis CO92
25% identity, 71% coverage

BAB1_0480 Ribosomal protein S6 from Brucella melitensis biovar Abortus 2308
25% identity, 62% coverage

BR0455 30S ribosomal protein S6 from Brucella suis 1330
25% identity, 62% coverage

Q5F925 Small ribosomal subunit protein bS6 from Neisseria gonorrhoeae (strain ATCC 700825 / FA 1090)
27% identity, 75% coverage

E3D771 Small ribosomal subunit protein bS6 from Gardnerella vaginalis (strain ATCC 14019 / 317)
28% identity, 69% coverage

EAMY_3145 30S ribosomal protein S6 from Erwinia amylovora CFBP1430
EAM_0448 30S ribosomal protein S6 from Erwinia amylovora ATCC 49946
25% identity, 70% coverage


For advice on how to use these tools together, see Interactive tools for functional annotation of bacterial genomes.

Statistics

The PaperBLAST database links 793,807 different protein sequences to 1,259,118 scientific articles. Searches against EuropePMC were last performed on March 13, 2025.

How It Works

PaperBLAST builds a database of protein sequences that are linked to scientific articles. These links come from automated text searches against the articles in EuropePMC and from manually-curated information from GeneRIF, UniProtKB/Swiss-Prot, BRENDA, CAZy (as made available by dbCAN), BioLiP, CharProtDB, MetaCyc, EcoCyc, TCDB, REBASE, the Fitness Browser, and a subset of the European Nucleotide Archive with the /experiment tag. Given this database and a protein sequence query, PaperBLAST uses protein-protein BLAST to find similar sequences with E < 0.001.
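As a rough illustration of the E < 0.001 cutoff and of the identity and coverage figures shown with each hit, the sketch below filters BLAST hits in the standard 12-column tabular format (BLAST+ -outfmt 6). This is not PaperBLAST's own code. The names Hit and parse_hits are hypothetical, and the coverage definition (aligned query span divided by query length) is an assumption that the page does not state.

```python
from dataclasses import dataclass

E_VALUE_CUTOFF = 1e-3  # threshold stated in the text: E < 0.001


@dataclass
class Hit:
    subject: str
    identity: float  # percent identity over the aligned region
    coverage: float  # percent of the query that is aligned (assumed definition)
    evalue: float


def parse_hits(tabular_text: str, query_length: int) -> list[Hit]:
    """Keep hits with E below the cutoff from BLAST tabular output.

    Expects the default outfmt 6 columns: qseqid, sseqid, pident, length,
    mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    hits = []
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        qstart, qend = int(fields[6]), int(fields[7])
        evalue = float(fields[10])
        if evalue >= E_VALUE_CUTOFF:
            continue  # not significant enough to report
        coverage = 100.0 * (qend - qstart + 1) / query_length
        hits.append(Hit(fields[1], float(fields[2]), coverage, evalue))
    return hits


# Made-up example rows for a 93-residue query; only the first passes the cutoff.
example = (
    "query\tsubjA\t43.0\t86\t49\t0\t1\t86\t3\t88\t1e-20\t80.0\n"
    "query\tsubjB\t25.0\t40\t30\t0\t5\t44\t9\t48\t0.5\t20.0\n"
)
for hit in parse_hits(example, query_length=93):
    print(f"{hit.subject}: {hit.identity:.0f}% identity, {hit.coverage:.0f}% coverage")
```

Under this definition, a hit that aligns to residues 1 through 86 of a 93-residue query covers about 92% of the query.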

To build the database, we query EuropePMC with locus tags, with RefSeq protein identifiers, and with UniProt accessions. We obtain the locus tags from RefSeq or from MicrobesOnline. We use queries of the form "locus_tag AND genus_name" to try to ensure that the paper is actually discussing that gene. Because EuropePMC indexes most recent biomedical papers, even if they are not open access, some of the links may be to papers that you cannot read or that our computers cannot read. We query each of these identifiers that appears in the open access part of EuropePMC, as well as every locus tag that appears in the 500 most-referenced genomes, so that a gene may appear in the PaperBLAST results even though none of the papers that mention it are open access. We also incorporate text-mined links from EuropePMC that link open access articles to UniProt or RefSeq identifiers. (This yields some additional links because EuropePMC uses different heuristics for their text mining than we do.)
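A query of the form "locus_tag AND genus_name" can be sketched as follows. The functions build_query and search_url are hypothetical names. The URL is Europe PMC's public REST search endpoint; whether PaperBLAST's pipeline uses this endpoint, or these parameters, is an assumption and is not stated here. The sketch only builds the request URL and does not contact the server.

```python
from urllib.parse import urlencode

# Europe PMC's public REST search endpoint (assumed; not named in the text).
EUROPEPMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_query(locus_tag: str, genus_name: str) -> str:
    """Combine a locus tag with its genus so that a match is more likely
    to be about that gene rather than a coincidental string."""
    return f"{locus_tag} AND {genus_name}"


def search_url(locus_tag: str, genus_name: str, page_size: int = 25) -> str:
    """Return a search URL for the combined query, requesting JSON results."""
    params = {
        "query": build_query(locus_tag, genus_name),
        "format": "json",
        "pageSize": page_size,
    }
    return f"{EUROPEPMC_SEARCH}?{urlencode(params)}"


# Example with a locus tag and genus taken from the hit list above.
print(build_query("BSU40910", "Bacillus"))
print(search_url("BSU40910", "Bacillus"))
```

Requiring the genus name alongside the locus tag trades some recall for precision: papers that mention the tag without naming the genus are missed, but accidental matches become less likely.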

For every article that mentions a locus tag, a RefSeq protein identifier, or a UniProt accession, we try to select one or two snippets of text that refer to the protein. If we cannot get access to the full text, we try to select a snippet from the abstract, but unfortunately, unique identifiers such as locus tags are rarely provided in abstracts.
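One simple way to select one or two snippets is to take a window of text around each mention of the identifier and skip windows that overlap an earlier one. The sketch below illustrates that idea only; PaperBLAST's actual snippet heuristics are not described here, and the function name select_snippets, the window size, and the overlap rule are all assumptions.

```python
import re


def select_snippets(text: str, identifier: str,
                    max_snippets: int = 2, window: int = 100) -> list[str]:
    """Return up to max_snippets non-overlapping text windows around
    whole-word mentions of identifier."""
    # Lookarounds require that the identifier is not part of a longer token.
    pattern = re.compile(r"(?<!\w)" + re.escape(identifier) + r"(?!\w)")
    snippets = []
    last_end = -1
    for match in pattern.finditer(text):
        start = max(0, match.start() - window)
        end = min(len(text), match.end() + window)
        if start < last_end:
            continue  # this window would overlap the previous snippet
        snippets.append(text[start:end].strip())
        last_end = end
        if len(snippets) == max_snippets:
            break
    return snippets


# Made-up sentence for illustration.
sentence = "The rpsF gene (BSU40910) encodes ribosomal protein S6."
print(select_snippets(sentence, "BSU40910"))
```

The same function applies to an abstract when the full text is unavailable, although, as noted above, it will often return nothing because identifiers rarely appear in abstracts.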

PaperBLAST also incorporates manually-curated protein functions:

Except for GeneRIF and ENA, the curated entries include a short curated description of the protein's function. For entries from BioLiP, the protein's function may not be known beyond binding to the ligand. Many of these entries also link to articles in PubMed.

For more information see the PaperBLAST paper (mSystems 2017) or the code. You can download PaperBLAST's database here.

Changes to PaperBLAST since the paper was written:

Many of these changes are described in Interactive tools for functional annotation of bacterial genomes.

Secrets

PaperBLAST cannot provide snippets for many of the papers that are published in non-open-access journals. This limitation applies even if the paper is marked as "free" on the publisher's web site and is available in PubMed Central or EuropePMC. If a journal that you publish in is marked as "secret," please consider publishing elsewhere.

Omissions from the PaperBLAST Database

Many important articles are missing from PaperBLAST, either because the article's full text is not in EuropePMC (as for many older articles), or because the paper does not mention a protein identifier such as a locus tag, or because of PaperBLAST's heuristics. If you notice an article that characterizes a protein's function but is missing from PaperBLAST, please notify the curators at UniProt or add an entry to GeneRIF. Entries in either of these databases will eventually be incorporated into PaperBLAST. Note that to add an entry to UniProt, you will need to find the UniProt identifier for the protein. If the protein is not already in UniProt, you can ask them to create an entry. To add an entry to GeneRIF, you will need an NCBI Gene identifier, but unfortunately many prokaryotic proteins in RefSeq do not have corresponding Gene identifiers.

References

PaperBLAST: Text-mining papers for information about homologs.
M. N. Price and A. P. Arkin (2017). mSystems, 10.1128/mSystems.00039-17.

Europe PMC in 2017.
M. Levchenko et al (2017). Nucleic Acids Research, 10.1093/nar/gkx1005.

Gene indexing: characterization and analysis of NLM's GeneRIFs.
J. A. Mitchell et al (2003). AMIA Annu Symp Proc 2003:460-464.

UniProt: the universal protein knowledgebase.
The UniProt Consortium (2016). Nucleic Acids Research, 10.1093/nar/gkw1099.

BRENDA in 2017: new perspectives and new tools in BRENDA.
S. Placzek et al (2017). Nucleic Acids Research, 10.1093/nar/gkw952.

The EcoCyc database: reflecting new knowledge about Escherichia coli K-12.
I. M. Keseler et al (2016). Nucleic Acids Research, 10.1093/nar/gkw1003.

The MetaCyc database of metabolic pathways and enzymes.
R. Caspi et al (2018). Nucleic Acids Research, 10.1093/nar/gkx935.

CharProtDB: a database of experimentally characterized protein annotations.
R. Madupu et al (2012). Nucleic Acids Research, 10.1093/nar/gkr1133.

The carbohydrate-active enzymes database (CAZy) in 2013.
V. Lombard et al (2014). Nucleic Acids Research, 10.1093/nar/gkt1178.

The Transporter Classification Database (TCDB): recent advances.
M. H. Saier, Jr. et al (2016). Nucleic Acids Research, 10.1093/nar/gkv1103.

REBASE - a database for DNA restriction and modification: enzymes, genes and genomes.
R. J. Roberts et al (2015). Nucleic Acids Research, 10.1093/nar/gku1046.

Deep annotation of protein function across diverse bacteria from mutant phenotypes.
M. N. Price et al (2016). bioRxiv, 10.1101/072470.

by Morgan Price, Arkin group
Lawrence Berkeley National Laboratory