PaperBLAST – Find papers about a protein or its homologs

 

PaperBLAST Hits for DDA3937_RS16230 (61 a.a., MLILTRRVGE...)

Other sequence analysis tools:

Find functional residues: SitesBLAST

Search for conserved domains

Find the best match in UniProt

Compare to protein structures

Predict transmembrane helices: Phobius

Predict protein localization: PSORTb

Find homologs in fast.genomics

Found 77 similar proteins in the literature:

Dd586_3183 carbon storage regulator, CsrA from Dickeya dadantii Ech586
ECA3366 carbon storage regulator from Erwinia carotovora subsp. atroseptica SCRI1043
Dd703_0999 carbon storage regulator, CsrA from Dickeya dadantii Ech703
Dda3937_03151, ECA3366, W5S_1009 carbon storage regulator CsrA from Pectobacterium atrosepticum SCRI1043
100% identity, 100% coverage

CSRA_YERE8 / A1JK11 Translational regulator CsrA; Carbon storage regulator; Post-transcriptional regulator RsmA from Yersinia enterocolitica serotype O:8 / biotype 1B (strain NCTC 13174 / 8081) (see paper)
YPO3304 carbon storage regulator from Yersinia pestis CO92
YPK_3372 carbon storage regulator, CsrA from Yersinia pseudotuberculosis YPIII
98% identity, 100% coverage

CSRA_PECCC / P0DKY7 Translational regulator CsrA; Carbon storage regulator; Repressor RsmA from Pectobacterium carotovorum subsp. carotovorum (Erwinia carotovora subsp. carotovora) (see paper)
98% identity, 100% coverage

ZfiA / b2696 carbon storage regulator from Escherichia coli K-12 substr. MG1655 (see 33 papers)
CSRA_ECOLI / P69913 Carbon storage regulator; Translational dual regulator CsrA from Escherichia coli (strain K12) (see 21 papers)
CSRA_SALTY / P69917 Translational regulator CsrA; Carbon storage regulator from Salmonella typhimurium (strain LT2 / SGSC1412 / ATCC 700720) (see paper)
B5XVB9 Translational regulator CsrA from Klebsiella pneumoniae (strain 342)
ECs_3553, NP_311580 pleiotropic regulatory protein for carbon source metabolism from Escherichia coli O157:H7 str. Sakai
NP_417176 carbon storage regulator from Escherichia coli str. K-12 substr. MG1655
NP_461747 carbon storage regulator from Salmonella enterica subsp. enterica serovar Typhimurium str. LT2
UTI89_C3057 carbon storage regulator; controls glycogen synthesis, gluconeogenesis, cell size and surface properties from Escherichia coli UTI89
STM2826 carbon storage regulator from Salmonella typhimurium LT2
NP_417176, b2696 carbon storage regulator from Escherichia coli str. K-12 substr. MG1655
ECP_2656 carbon storage regulator CsrA from Escherichia coli 536
Ent638_3171 carbon storage regulator from Enterobacter sp. 638
STM14_3412 carbon storage regulator CsrA from Salmonella enterica subsp. enterica serovar Typhimurium str. 14028S
97% identity, 100% coverage

WP_004155916 carbon storage regulator CsrA from Erwinia amylovora ACW56400
97% identity, 100% coverage

CSRA_PROMI / Q93MI1 Translational regulator CsrA; Carbon storage regulator from Proteus mirabilis (see paper)
PMI0377 carbon storage regulator from Proteus mirabilis HI4320
95% identity, 97% coverage

CSRA_SERMA / O85735 Translational regulator CsrA; Carbon storage regulator; Repressor of secondary metabolites from Serratia marcescens (see paper)
100% identity, 81% coverage

K3G22_13785 carbon storage regulator CsrA from Shewanella putrefaciens
93% identity, 92% coverage

SO3426, SO_3426 carbon storage regulator from Shewanella oneidensis MR-1
93% identity, 92% coverage

A0KPF9 Translational regulator CsrA from Aeromonas hydrophila subsp. hydrophila (strain ATCC 7966 / DSM 30187 / BCRC 13018 / CCUG 14551 / JCM 1027 / KCTC 2358 / NCIMB 9240 / NCTC 8049)
100% identity, 89% coverage

PST_1371 carbon storage regulator from Pseudomonas stutzeri A1501
87% identity, 100% coverage

EQU24_RS07950 carbon storage regulator CsrA from Methylotuvimicrobium buryatense
88% identity, 92% coverage

VP2546 carbon storage regulator from Vibrio parahaemolyticus RIMD 2210633
M892_13560 carbon storage regulator CsrA from Vibrio campbellii ATCC BAA-1116
95% identity, 88% coverage

VC0548 carbon storage regulator from Vibrio cholerae O1 biovar eltor str. N16961
95% identity, 88% coverage

TE101_05290 carbon storage regulator CsrA from Alteromonas macleodii
90% identity, 95% coverage

CSRA_PSEAE / O69078 Translational regulator CsrA; Carbon storage regulator; Global translational regulatory protein RsmA from Pseudomonas aeruginosa (strain ATCC 15692 / DSM 22644 / CIP 104116 / JCM 14847 / LMG 12228 / 1C / PRS 101 / PAO1) (see 4 papers)
NP_249596 carbon storage regulator from Pseudomonas aeruginosa PAO1
PA14_52570 RsmA, regulator of secondary metabolites from Pseudomonas aeruginosa UCBPP-PA14
PA0905 RsmA, regulator of secondary metabolites from Pseudomonas aeruginosa PAO1
85% identity, 100% coverage

HMPREF0010_03075, WP_000906487 carbon storage regulator CsrA from Acinetobacter nosocomialis
91% identity, 67% coverage

7yr6E / O69078 Cryo-em structure of pseudomonas aeruginosa rsmz RNA in complex with two rsma protein dimers (see paper)
93% identity, 90% coverage

TERTU_2809 carbon storage regulator from Teredinibacter turnerae T7901
86% identity, 89% coverage

WP_016209832 carbon storage regulator CsrA from Piscirickettsia salmonis T-GIM
88% identity, 64% coverage

MARME_RS03780 carbon storage regulator CsrA from Marinomonas mediterranea MMB-1
84% identity, 92% coverage

Alvin_1102 carbon storage regulator, CsrA from Allochromatium vinosum DSM 180
86% identity, 78% coverage

ETAE_2858 carbon storage regulator from Edwardsiella tarda EIB202
98% identity, 80% coverage

lpg0781 global regulator (carbon storage regulator) from Legionella pneumophila subsp. pneumophila str. Philadelphia 1
77% identity, 73% coverage

lpp0845 global regulator CsrA from Legionella pneumophila str. Paris
77% identity, 94% coverage

PXO_00146 carbon storage regulator from Xanthomonas oryzae pv. oryzae PXO99A
E2J5T5 Translational regulator CsrA from Xanthomonas oryzae pv. oryzae
XAC1743 carbon storage regulator from Xanthomonas axonopodis pv. citri str. 306
XOO2938 carbon storage regulator from Xanthomonas oryzae pv. oryzae KACC10331
90% identity, 74% coverage

DP16_RS13530 carbon storage regulator CsrA from Stenotrophomonas maltophilia
88% identity, 78% coverage

PD0095 carbon storage regulator from Xylella fastidiosa Temecula1
XF0125 carbon storage regulator from Xylella fastidiosa 9a5c
87% identity, 73% coverage

CSRA2_PSEPH / P69920 Translational regulator CsrA2; Carbon storage regulator 2; Regulator of secondary metabolites RsmA from Pseudomonas protegens (strain DSM 19095 / LMG 27888 / CFBP 6595 / CHA0) (see 6 papers)
Pchl3084_4387, WP_002554426 carbon storage regulator CsrA from Pseudomonas chlororaphis
PSPTO1844, PSPTO_1844 carbon storage regulator from Pseudomonas syringae pv. tomato str. DC3000
YP_236624 Carbon storage regulator from Pseudomonas syringae pv. syringae B728a
76% identity, 98% coverage

CBU_1050 carbon storage regulator from Coxiella burnetii RSA 493
73% identity, 79% coverage

XaFJ1_GM001161 carbon storage regulator CsrA from Xanthomonas albilineans
81% identity, 80% coverage

HZ99_RS09580 carbon storage regulator CsrA from Pseudomonas fluorescens
76% identity, 98% coverage

PFLU4746 carbon storage regulator homolog from Pseudomonas fluorescens SBW25
GIB64_12670, GIB65_24765, PflSS101_4138 carbon storage regulator CsrA from Pseudomonas lactis
77% identity, 97% coverage

PFLU4165 carbon storage regulator from Pseudomonas fluorescens SBW25
71% identity, 78% coverage

2mf0A / P0DPC3 Structural basis of the non-coding RNA rsmz acting as protein sponge: conformer l of rsmz(1-72)/rsme(dimer) 1to3 complex (see paper)
71% identity, 95% coverage

PFLU_4165, PflSS101_3491 carbon storage regulator CsrA from Pseudomonas lactis
71% identity, 91% coverage

CSRA1_PSEPH / P0DPC3 Translational regulator CsrA1; Carbon storage regulator 1; Regulator of secondary metabolites RsmE from Pseudomonas protegens (strain DSM 19095 / LMG 27888 / CFBP 6595 / CHA0) (see 6 papers)
Pchl3084_2024 carbon storage regulator CsrA from Pseudomonas chlororaphis subsp. aureofaciens 30-84
71% identity, 91% coverage

E2P69_RS21330 carbon storage regulator CsrA from Xanthomonas perforans
71% identity, 97% coverage

WP_003179932 carbon storage regulator CsrA from Pseudomonas sp. PSB1
71% identity, 91% coverage

Q91_0863 carbon storage regulator CsrA from Cycloclasticus sp. P1
75% identity, 92% coverage

PPUTLS46_020631, RPPX_02245 carbon storage regulator CsrA from Pseudomonas putida S12
PP_4472 carbon storage regulator CsrA from Pseudomonas putida KT2440
75% identity, 97% coverage

PSPTO_3566 carbon storage regulator from Pseudomonas syringae pv. tomato str. DC3000
YP_236409 Carbon storage regulator from Pseudomonas syringae pv. syringae B728a
78% identity, 87% coverage

PP3832, PP_3832 carbon storage regulator from Pseudomonas putida KT2440
62% identity, 94% coverage

F382_01930 carbon storage regulator CsrA from Mannheimia haemolytica D153
68% identity, 87% coverage

TERTU_2436 carbon storage regulator from Teredinibacter turnerae T7901
76% identity, 84% coverage

MARME_RS09140 carbon storage regulator CsrA from Marinomonas mediterranea MMB-1
64% identity, 90% coverage

lpg2094 carbon storage regulator RsmA from Legionella pneumophila subsp. pneumophila str. Philadelphia 1
PtVFX2014_07430 carbon storage regulator CsrA from Legionella pneumophila
65% identity, 71% coverage

HD1430 carbon storage regulator CrsA from Haemophilus ducreyi 35000HP
66% identity, 85% coverage

PSPTO_1629 carbon storage regulator from Pseudomonas syringae pv. tomato str. DC3000
56% identity, 91% coverage

Bpet1351 carbon storage regulator from Bordetella petrii DSM 12804
67% identity, 80% coverage

YP_236820 Carbon storage regulator from Pseudomonas syringae pv. syringae B728a
60% identity, 89% coverage

X994_313 carbon storage regulator CsrA from Burkholderia pseudomallei
69% identity, 70% coverage

pOZ176_186 carbon storage regulator CsrA from Pseudomonas aeruginosa PA96
65% identity, 90% coverage

CBU_0024 carbon storage regulator from Coxiella burnetii RSA 493
63% identity, 84% coverage

CSRA_GEOTN / A4ISU9 Translational regulator CsrA from Geobacillus thermodenitrificans (strain NG80-2) (see paper)
51% identity, 72% coverage

LIMLP_17575 carbon storage regulator CsrA from Leptospira interrogans serovar Manilae
49% identity, 70% coverage

LEPBI_I3210 carbon storage regulator-like protein from Leptospira biflexa serovar Patoc strain 'Patoc 1 (Paris)'
49% identity, 75% coverage

Dde_3150 carbon storage regulator from Desulfovibrio desulfuricans G20
49% identity, 73% coverage

GSU3041 carbon storage regulator from Geobacter sulfurreducens PCA
46% identity, 77% coverage

DVU0521 carbon storage regulator from Desulfovibrio vulgaris Hildenborough
52% identity, 69% coverage

CSRA_BACSU / P33911 Translational regulator CsrA from Bacillus subtilis (strain 168) (see 4 papers)
NP_391417 carbon storage regulator from Bacillus subtilis subsp. subtilis str. 168
42% identity, 81% coverage

Cbei_4295 carbon storage regulator, CsrA from Clostridium beijerincki NCIMB 8052
45% identity, 81% coverage

PFLU4324 carbon storage regulator from Pseudomonas fluorescens SBW25
52% identity, 78% coverage

TP0657 carbon storage regulator (csrA) from Treponema pallidum subsp. pallidum str. Nichols
43% identity, 74% coverage

CA_C2209 carbon storage regulator CsrA from Clostridium acetobutylicum ATCC 824
CAC2209 Carbon storage regulator, csrA from Clostridium acetobutylicum ATCC 824
38% identity, 79% coverage

PP1746, PP_1746 carbon storage regulator, putative from Pseudomonas putida KT2440
49% identity, 90% coverage

CD630_02340, CDIF630erm_00356, WP_003425267 carbon storage regulator CsrA from Clostridioides difficile CD37
CD0234 carbon storage regulator from Clostridium difficile 630
37% identity, 86% coverage

ArtHe_17680 carbon storage regulator CsrA from Arthrobacter sp. Helios
40% identity, 74% coverage

PflSS101_3653 carbon storage regulator CsrA from Pseudomonas lactis
44% identity, 88% coverage

PPUTLS46_015344 carbon storage regulator CsrA from Pseudomonas putida LS46
47% identity, 90% coverage

PSLF89_RS34715 carbon storage regulator from Piscirickettsia salmonis LF-89 = ATCC VR-1361
44% identity, 82% coverage

plpp0016 hypothetical protein from Legionella pneumophila str. Paris
50% identity, 59% coverage

PSPTO_3943 carbon storage regulator, putative from Pseudomonas syringae pv. tomato str. DC3000
45% identity, 73% coverage

TDE2355 carbon storage regulator from Treponema denticola ATCC 35405
52% identity, 72% coverage

BB0184 carbon storage regulator (csrA) from Borrelia burgdorferi B31
33% identity, 70% coverage

BH0184 carbon storage regulator from Borrelia hermsii DAH
33% identity, 69% coverage

lpg1257 LvrC from Legionella pneumophila subsp. pneumophila str. Philadelphia 1
36% identity, 88% coverage

For advice on how to use these tools together, see Interactive tools for functional annotation of bacterial genomes.

Statistics

The PaperBLAST database links 793,807 different protein sequences to 1,259,118 scientific articles. Searches against EuropePMC were last performed on March 13, 2025.

How It Works

PaperBLAST builds a database of protein sequences that are linked to scientific articles. These links come from automated text searches against the articles in EuropePMC and from manually-curated information from GeneRIF, UniProtKB/Swiss-Prot, BRENDA, CAZy (as made available by dbCAN), BioLiP, CharProtDB, MetaCyc, EcoCyc, TCDB, REBASE, the Fitness Browser, and a subset of the European Nucleotide Archive with the /experiment tag. Given this database and a protein sequence query, PaperBLAST uses protein-protein BLAST to find similar sequences with E < 0.001.
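The E-value cutoff described above can be illustrated with a minimal sketch. It assumes BLAST results in the standard 12-column tabular layout (query, subject, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, bit score). The names `Hit`, `filter_hits`, and `E_VALUE_CUTOFF` are hypothetical and are not taken from PaperBLAST's own code.

```python
from typing import List, NamedTuple

# Threshold stated in the text: keep hits with E < 0.001.
E_VALUE_CUTOFF = 1e-3


class Hit(NamedTuple):
    """One BLAST hit, reduced to the fields used here (hypothetical type)."""
    subject: str
    percent_identity: float
    evalue: float


def filter_hits(tabular_text: str, cutoff: float = E_VALUE_CUTOFF) -> List[Hit]:
    """Parse 12-column tabular BLAST output and keep hits with E-value below cutoff."""
    hits = []
    for line in tabular_text.splitlines():
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        hit = Hit(
            subject=fields[1],
            percent_identity=float(fields[2]),
            evalue=float(fields[10]),
        )
        # Strict comparison, matching "E < 0.001".
        if hit.evalue < cutoff:
            hits.append(hit)
    return hits


# Made-up example rows, for illustration only.
example = "\n".join([
    "query\tsubjA\t100.0\t61\t0\t0\t1\t61\t1\t61\t1e-38\t120",
    "query\tsubjB\t33.0\t43\t28\t1\t2\t44\t3\t44\t0.5\t25.0",
])
print([hit.subject for hit in filter_hits(example)])
```

In this made-up input, only the first row passes the cutoff; the second row has E = 0.5 and is dropped.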

To build the database, we query EuropePMC with locus tags, with RefSeq protein identifiers, and with UniProt accessions. We obtain the locus tags from RefSeq or from MicrobesOnline. We use queries of the form "locus_tag AND genus_name" to try to ensure that the paper is actually discussing that gene. Because EuropePMC indexes most recent biomedical papers, even if they are not open access, some of the links may be to papers that you cannot read or that our computers cannot read. We query each of these identifiers that appears in the open access part of EuropePMC, as well as every locus tag that appears in the 500 most-referenced genomes, so that a gene may appear in the PaperBLAST results even though none of the papers that mention it are open access. We also incorporate text-mined links from EuropePMC that link open access articles to UniProt or RefSeq identifiers. (This yields some additional links because EuropePMC uses different heuristics for their text mining than we do.)
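As a rough illustration of the "locus_tag AND genus_name" query form described above, the sketch below builds such a query string from a locus tag and an organism name. The function name `build_query` and the rule of taking the first word of the organism name as the genus are assumptions made for this example; PaperBLAST's actual code may build its queries differently.

```python
def build_query(locus_tag: str, organism: str) -> str:
    """Return a query of the form 'locus_tag AND genus_name'.

    Assumption: the genus is the first whitespace-separated word
    of the organism name.
    """
    words = organism.split()
    if not locus_tag.strip() or not words:
        raise ValueError("both a locus tag and an organism name are required")
    genus = words[0]
    return f"{locus_tag.strip()} AND {genus}"


# Example using a locus tag and organism from the hit list above.
print(build_query("DVU0521", "Desulfovibrio vulgaris Hildenborough"))
```

Pairing the locus tag with the genus is meant to reduce false matches in cases where the same string appears in papers about unrelated organisms.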

For every article that mentions a locus tag, a RefSeq protein identifier, or a UniProt accession, we try to select one or two snippets of text that refer to the protein. If we cannot get access to the full text, we try to select a snippet from the abstract, but unfortunately, unique identifiers such as locus tags are rarely provided in abstracts.
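The idea of selecting one or two snippets around an identifier can be sketched as follows. This is only an illustration under stated assumptions: the function name `select_snippets`, the fixed character window, and the rule of skipping overlapping windows are all hypothetical and do not describe PaperBLAST's actual heuristics.

```python
import re
from typing import List


def select_snippets(text: str, identifier: str,
                    max_snippets: int = 2,
                    context_chars: int = 40) -> List[str]:
    """Return up to max_snippets windows of text around whole-word matches."""
    # Match the identifier only when it is not part of a longer word,
    # so that "PA0905" does not match inside "PA09055".
    pattern = re.compile(r"(?<!\w)" + re.escape(identifier) + r"(?!\w)")
    snippets = []
    last_end = -1
    for match in pattern.finditer(text):
        start = max(0, match.start() - context_chars)
        end = min(len(text), match.end() + context_chars)
        if start < last_end:
            continue  # skip windows that overlap the previous snippet
        snippets.append(text[start:end].strip())
        last_end = end
        if len(snippets) == max_snippets:
            break
    return snippets


# Made-up article text, for illustration only.
article = ("We deleted PA0905 in strain PAO1. " + "x" * 100 +
           " PA0905 mutants showed altered motility.")
for snippet in select_snippets(article, "PA0905"):
    print(snippet)
```

When only an abstract is available, the same function could be applied to the abstract text; it returns an empty list if the identifier never appears, which matches the observation that abstracts rarely contain locus tags.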

PaperBLAST also incorporates manually-curated protein functions:

Except for GeneRIF and ENA, the curated entries include a short curated description of the protein's function. For entries from BioLiP, the protein's function may not be known beyond binding to the ligand. Many of these entries also link to articles in PubMed.

For more information see the PaperBLAST paper (mSystems 2017) or the code. You can download PaperBLAST's database here.

Changes to PaperBLAST since the paper was written:

Many of these changes are described in Interactive tools for functional annotation of bacterial genomes.

Secrets

PaperBLAST cannot provide snippets for many of the papers that are published in non-open-access journals. This limitation applies even if the paper is marked as "free" on the publisher's web site and is available in PubMed Central or EuropePMC. If a journal that you publish in is marked as "secret," please consider publishing elsewhere.

Omissions from the PaperBLAST Database

Many important articles are missing from PaperBLAST, either because the article's full text is not in EuropePMC (as for many older articles), or because the paper does not mention a protein identifier such as a locus tag, or because of PaperBLAST's heuristics. If you notice an article that characterizes a protein's function but is missing from PaperBLAST, please notify the curators at UniProt or add an entry to GeneRIF. Entries in either of these databases will eventually be incorporated into PaperBLAST. Note that to add an entry to UniProt, you will need to find the UniProt identifier for the protein. If the protein is not already in UniProt, you can ask them to create an entry. To add an entry to GeneRIF, you will need an NCBI Gene identifier, but unfortunately many prokaryotic proteins in RefSeq do not have corresponding Gene identifiers.

References

PaperBLAST: Text-mining papers for information about homologs.
M. N. Price and A. P. Arkin (2017). mSystems, 10.1128/mSystems.00039-17.

Europe PMC in 2017.
M. Levchenko et al (2017). Nucleic Acids Research, 10.1093/nar/gkx1005.

Gene indexing: characterization and analysis of NLM's GeneRIFs.
J. A. Mitchell et al (2003). AMIA Annu Symp Proc 2003:460-464.

UniProt: the universal protein knowledgebase.
The UniProt Consortium (2016). Nucleic Acids Research, 10.1093/nar/gkw1099.

BRENDA in 2017: new perspectives and new tools in BRENDA.
S. Placzek et al (2017). Nucleic Acids Research, 10.1093/nar/gkw952.

The EcoCyc database: reflecting new knowledge about Escherichia coli K-12.
I. M. Keseler et al (2016). Nucleic Acids Research, 10.1093/nar/gkw1003.

The MetaCyc database of metabolic pathways and enzymes.
R. Caspi et al (2018). Nucleic Acids Research, 10.1093/nar/gkx935.

CharProtDB: a database of experimentally characterized protein annotations.
R. Madupu et al (2012). Nucleic Acids Research, 10.1093/nar/gkr1133.

The carbohydrate-active enzymes database (CAZy) in 2013.
V. Lombard et al (2014). Nucleic Acids Research, 10.1093/nar/gkt1178.

The Transporter Classification Database (TCDB): recent advances.
M. H. Saier, Jr. et al (2016). Nucleic Acids Research, 10.1093/nar/gkv1103.

REBASE - a database for DNA restriction and modification: enzymes, genes and genomes.
R. J. Roberts et al (2015). Nucleic Acids Research, 10.1093/nar/gku1046.

Deep annotation of protein function across diverse bacteria from mutant phenotypes.
M. N. Price et al (2016). bioRxiv, 10.1101/072470.

by Morgan Price, Arkin group
Lawrence Berkeley National Laboratory