PaperBLAST

PaperBLAST – Find papers about a protein or its homologs

PaperBLAST Hits for VIMSS33397 Probable conserved membrane protein (195 a.a., MCHTAPMEPS...)

Other sequence analysis tools:

Find functional residues: SitesBLAST

Search for conserved domains

Find the best match in UniProt

Compare to protein structures

Predict transmembrane helices: Phobius

Predict protein localization: PSORTb

Find homologs in fast.genomics

Found 5 similar proteins in the literature:

Rv1624c Probable conserved membrane protein from Mycobacterium tuberculosis H37Rv
100% identity, 100% coverage

Comparative Genomic Analysis of Mycobacterium tuberculosis Isolates Circulating in North Santander, Colombia
Bohada-Lizarazo, Tropical medicine and infectious disease 2024
- “...72% 21088X1, X2, X4, X5, X7, X8, X10, X11, X12, X13, X14, X15, X17 Truncation rv1624c Integral membrane protein SNP Transition 11.10% 21088X4, X5 Truncation rv2120c Integral membrane protein SNP Transition 11.10% 21088X10, X11 Truncation rv2395 Membrane protein SNP Transition 11.10% 21088X12, X13 Truncation rv3870 EccCa1...”
Inhibition of apoptosis by Rv2456c through Nuclear factor-κB extends the survival of Mycobacterium tuberculosis
Jurcic, International journal of mycobacteriology 2016
- “...that may interact or have homology with other genes identified in the literature. For example, Rv1624c (FID12) is a conserved hypothetical protein with some similarity to M. tuberculosis NuoK. This gene lies in an operon with NuoG, part of a nicotinamide adenine dinucleotide + hydrogen dehydrogenase...”
- “...protein 10 Rv0110 Probable conserved integral membrane transport protein 11 Rv2019 Unknown, conserved protein 12 Rv1624c Probable conserved membrane protein, similarity to nuoK 13 Rv1781c malQ Probable 4-alpha-glucanotransferase 14 Rv2663 Hypothetical protein 15 Rv2141c Conserved protein 16 Rv3738c PPE66 Unknown 17 Rv1704c cycA Probable D-serine/alanine/glycine transporter...”
Characterization of a cAMP responsive transcription factor, Cmr (Rv1675c), in TB complex mycobacteria reveals overlap with the DosR (DevR) dormancy regulon
Ranganathan, Nucleic acids research 2016
- “...competitor DNA; Lane denoted NS was supplemented with a 40 bp DNA fragment upstream of Rv1624c, as non-specific competitor DNA. ( D ) DNase I footprinting assay with His-Cmr using 1C1 DNA as template. 1C1 DNA was digested in the absence of Cmr (lane indicated )...”
Dysregulation of serine biosynthesis contributes to the growth defect of a Mycobacterium tuberculosis crp mutant
Bai, Molecular microbiology 2011
- “...Bai et al ., 2005 ); 4, a 40 bp DNA sequence upstream of the Rv1624c orf that does not contain a CRP Mt -binding site, which is used as a negative control; 5, native serC -Rv0885 motif probe; 6, modified G4-to-C; and 7, modified C17-to-G....”
Characterization of Mycobacterium tuberculosis Rv3676 (CRPMt), a cyclic AMP receptor protein-like DNA binding protein
Bai, Journal of bacteriology 2005
- “...by EMSA. A 40-bp intergenic DNA fragment upstream of Rv1624c was used as a nonspecific control. The presence of CRPMt retarded, in a dose-dependent manner, the...”
- “...DNA, but not by the 40-bp intergenic DNA upstream of Rv1624c (1624c) that was used as a negative control. This control Rv1624c DNA probe also failed to bind to...”
Cloning and expression of multiple integral membrane proteins from Mycobacterium tuberculosis in Escherichia coli
Korepanova, Protein science : a publication of the Protein Society 2005
- “...Rv1440 Rv1446c Rv1459c Rv1487 Rv1567c Rv1607 Rv1616 Rv1624c Rv1634 Rv1635c Rv1819c Rv1824 Rv1857 Rv1861 Rv1892 Rv1902c Rv1924c Rv1974 Rv2076c Rv2144c Rv2146c...”
Characterization of the cydAB-encoded cytochrome bd oxidase from Mycobacterium smegmatis
Kana, Journal of bacteriology 2001
- “...open reading frame that is highly homologous to the Rv1624c gene from M. tuberculosis (7), preceded by a further 962 bp of upstream sequence. The PstI fragment...”
- “...open reading frame that showed high homology to Rv1624c from M. tuberculosis was found upstream of cydA, confirming that the genetic organization at this...”

MAP1317c hypothetical protein from Mycobacterium avium subsp. paratuberculosis str. k10
80% identity, 98% coverage

Gene expression profiling of Mycobacterium avium subsp. paratuberculosis in simulated multi-stress conditions and within THP-1 cells reveals a new kind of interactive intramacrophage behaviour
Cossu, BMC microbiology 2012
- “...for the protein folding along with resistance factors such as acid resistance membrane protein ( MAP1317c ) for resistance to acids and three entries of acyltransferase 3 ( MAP3276c MAP3514 MAP1271c ) required for peptidoglycan O-acylation in order to increase its resistance [ 48 ]. There...”
- “...) that are repressed. The stress metabolism shows an up-regulation of acid-resistance membrane protein ( MAP1317c ) specific for resistance to acidic environment, uspA ( MAP1754c ) and two entries for the repair of damaged DNA such as recR and end . On the other hand,...”

Q9AK72 Integral membrane protein from Streptomyces coelicolor (strain ATCC BAA-471 / A3(2) / M145)
51% identity, 72% coverage

Comparative genomics of transport proteins in developmental bacteria: Myxococcus xanthus and Streptomyces coelicolor
Getsin, BMC microbiology 2013
- “...ArAE Family 8.A.3.4.1 Q9KYG0 239 2 MPA1-C Family 9.A.31.1.2 Q9XA27 436 10 SdpAB Family 9.B.36.1.2 Q9AK72 226 6 Hde Family 9.B.74.4.1 Q9K3K9 357 6 PIP Family 9.B.140.1.1 Q9K4J8 280 6 DUF1206 Family Proteins were retrieved with GBLAST e-values between 0.1 and 0.001, individually verified and assigned...”
- “...Streptomyces species. The second Class 9 protein identified in Sco was a 6 TMS homologue (Q9AK72; 6 TMSs; 226 aas), a member of the Acid Resistance Membrane Protein (HdeD) Family. It was assigned TC# 9.B.36.1.2, but no functional assignment was possible. The third Class 9 protein...”

BAD_1583 hypothetical protein from Bifidobacterium adolescentis ATCC 15703
29% identity, 57% coverage

Metagenomic identification, purification and characterisation of the Bifidobacterium adolescentis BgaC β-galactosidase
Mulualem, Applied microbiology and biotechnology 2021
- “...(Fig. 1). All the clones shared the β-galactosidase BAD_1582 gene and the adjacent five BAD_1583 to BAD_1587 genes. The adjacent genes encoded a sugar transporting permease, required by many bacteria to access sugars for adaptable growth in the gut environment. BAD_1582-specific PCR, using the...”
- “...six shared genes. These include the β-galactosidase gene (BAD_1582, bgaC), HdeD family acid-resistance protein (BAD_1583), LacI family transcriptional regulator (BAD_1584), carbohydrate ABC transporter substrate-binding protein (BAD_1585), sugar ABC transporter permease (BAD_1586), and carbohydrate ABC transporter permease (BAD_1587). The numbers at the start and end of...”

lmo0596 similar to unknown proteins from Listeria monocytogenes EGD-e
LMOf6854_0637 membrane protein, putative from Listeria monocytogenes str. 1/2a F6854
31% identity, 84% coverage

The influence of stress factors on selected phenotypic and genotypic features of Listeria monocytogenes - a pilot study
Wiktorczyk-Kapischke, BMC microbiology 2023
- “...of sigB (stress induced regulator of genes), agrA, agrB (associated with biofilm formation) and lmo2230, lmo0596 (acid and alkali stress) (qPCR) for three strains of L. monocytogenes. Results Applied stress conditions contributed to changes in phenotypic features and expression levels of sigB, agrA, agrB...”
- “...stress (55°C), lmo2230 transcript level after exposure to acid and alkali stress (ATCC 19111), and lmo0596 transcript level after exposure to acid stress (ATCC 19111). Conclusions Environmental stress changes the ability to form a biofilm and the MIC values of antibiotics and affect the level of...”
Acid stress signals are integrated into the σB-dependent general stress response pathway via the stressosome in the food-borne pathogen Listeria monocytogenes
Guerreiro, PLoS pathogens 2022
- “...this pathogen in food, food-processing environments and in the gastrointestinal tract. Results The transcription of lmo0596 is induced by acid in σB-dependent manner The σB regulon in L. monocytogenes is composed of a large number of genes that are heterogeneously expressed in response to...”
- “...however we sought to use an additional σB-reporter gene in this study. The gene lmo0596 was a potential target, as it possesses a σB promoter and had been shown in other studies to belong to the σB regulon [9, 60, 61]....”
Listeria monocytogenes Requires the RsbX Protein To Prevent SigB Activation under Nonstressed Conditions
Oliveira, Journal of bacteriology 2022
- “...the transcription of two strongly SigB-dependent genes, lmo2230 , encoding a putative arsenate reductase, and lmo0596 , encoding a putative trans -membrane protein with an unknown function ( 33 , 34 ). Also, we examined expression of lmo1699 , encoding a chemotaxis protein which is only...”
- “...at 600 nm [OD600], 0.8) before being subject to Northern blot analysis. lmo2230 and lmo0596 transcription was induced under nonstressed (dark) conditions in the rsbX mutant compared to the wild type at both 23 and 37°C (Fig. 2A and B). This effect could...”
The stressosome is required to transduce low pH signals leading to increased transcription of the amino acid-based acid tolerance mechanisms in Listeria monocytogenes
Guerreiro, Access microbiology 2022
- “...at pH 5 for 15 min increased the transcription of highly σB-dependent genes lmo2230 and lmo0596, and enhanced L. monocytogenes acid tolerance in a stressosome-dependent manner. The genes lmo2230 and lmo0596 encode a putative arsenate reductase and a transmembrane protein with unknown function, respectively [...”
Mild Stress Conditions during Laboratory Culture Promote the Proliferation of Mutations That Negatively Affect Sigma B Activity in Listeria monocytogenes
Guerreiro, Journal of bacteriology 2020 (secret)
Transcriptomic and Phenotypic Analyses of the Sigma B-Dependent Characteristics and the Synergism between Sigma B and Sigma L in Listeria monocytogenes EGD-e
Mattila, Microorganisms 2020
- “...protein pai 1, putative 2.1 3 lmo2158 choloylglycine hydrolase lmo0170 conserved domain protein 3 3.6 lmo0596 conserved hypothetical protein 4.7 3.8 lmo0911 conserved hypothetical protein 1.7 1.5 lmo0995 conserved hypothetical protein 1.6 3.9 lmo1241 conserved hypothetical protein 2 1.6 lmo1776 conserved hypothetical protein 2.3 1.9 lmo2213...”
Transcriptional and phenotypic responses of Listeria monocytogenes to chlorine dioxide
Pleitner, Applied and environmental microbiology 2014
- “...Hypothetical protein lmo0170 lmo0229 lmo0231 lmo0439 lmo0496 lmo0596 lmo0670 lmo0720 lmo0761 lmo0796 lmo0869 lmo0911 lmo0964 lmo1059 lmo1069 lmo1137 lmo1332...”
Cycles of light and dark co-ordinate reversible colony differentiation in Listeria monocytogenes
Tiensuu, Molecular microbiology 2013
- “...dependent on σB (Fig. 1E). Also, expression of the genuinely σB-regulated genes, lmo0596 and lmo2230 was highly induced at light conditions as compared with dark conditions (Fig. S3). Absence of Lmo0799 at light conditions decreased lmo0596 and lmo2230 expression to a level observed...”
- “...lysine permease Lmo0798) and in the promoter region of the gene encoding the membrane protein Lmo0596 respectively (Table S1). Interestingly, the level of the σB protein was only slightly decreased (a maximum of threefold) in the above tested transposon mutants apart from one. Previously, expression of...”
Physiological and transcriptional characterization of persistent and nonpersistent Listeria monocytogenes isolates
Fox, Applied and environmental microbiology 2011
- “...Downregulated Downregulated LMOf6854_1483 LMOf6854_0650 LMOf6854_0329 LMOf6854_0637 Putative membrane protein LPXTG-motif cell wall anchor domain protein...”

For advice on how to use these tools together, see Interactive tools for functional annotation of bacterial genomes.

Statistics

The PaperBLAST database links 789,361 different protein sequences to 1,256,019 scientific articles. Searches against EuropePMC were last performed on January 10 2025.

How It Works

PaperBLAST builds a database of protein sequences that are linked to scientific articles. These links come from automated text searches against the articles in EuropePMC and from manually-curated information from GeneRIF, UniProtKB/Swiss-Prot, BRENDA, CAZy (as made available by dbCAN), BioLiP, CharProtDB, MetaCyc, EcoCyc, TCDB, REBASE, the Fitness Browser, and a subset of the European Nucleotide Archive with the /experiment tag. Given this database and a protein sequence query, PaperBLAST uses protein-protein BLAST to find similar sequences with E < 0.001.
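
The E-value cutoff and the identity and coverage figures shown for each hit can be illustrated with a short sketch. This is not PaperBLAST's own code: it assumes BLAST tabular output (`-outfmt 6` with the default twelve columns), assumes that coverage means the aligned fraction of the query, and uses invented example rows and names (`Hit`, `parse_hits`).

```python
from typing import List, NamedTuple

E_VALUE_CUTOFF = 1e-3  # threshold stated in the text (E < 0.001)


class Hit(NamedTuple):
    subject: str
    identity: float  # percent identity over the aligned region
    coverage: float  # percent of the query covered (assumed definition)
    evalue: float


def parse_hits(tabular: str, query_length: int) -> List[Hit]:
    """Parse BLAST -outfmt 6 rows and keep those with E below the cutoff."""
    hits = []
    for line in tabular.strip().splitlines():
        fields = line.split("\t")
        subject, pident = fields[1], float(fields[2])
        qstart, qend = int(fields[6]), int(fields[7])
        evalue = float(fields[10])
        if evalue >= E_VALUE_CUTOFF:
            continue  # too weak to report
        coverage = 100.0 * (qend - qstart + 1) / query_length
        hits.append(Hit(subject, pident, coverage, evalue))
    return hits


# Invented rows for a 195-residue query: one strong hit, one weak hit.
example = "\n".join([
    "query\tsubjA\t100.0\t195\t0\t0\t1\t195\t1\t195\t1e-140\t390",
    "query\tsubjB\t30.0\t50\t35\t0\t10\t59\t5\t54\t0.5\t25.0",
])
hits = parse_hits(example, query_length=195)
```

Here only `subjA` passes the cutoff; `subjB` is dropped because its E-value of 0.5 is not below 0.001.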

To build the database, we query EuropePMC with locus tags, with RefSeq protein identifiers, and with UniProt accessions. We obtain the locus tags from RefSeq or from MicrobesOnline. We use queries of the form "locus_tag AND genus_name" to try to ensure that the paper is actually discussing that gene. Because EuropePMC indexes most recent biomedical papers, even if they are not open access, some of the links may be to papers that you cannot read or that our computers cannot read. We query each of these identifiers that appears in the open access part of EuropePMC, as well as every locus tag that appears in the 500 most-referenced genomes, so that a gene may appear in the PaperBLAST results even though none of the papers that mention it are open access. We also incorporate text-mined links from EuropePMC that link open access articles to UniProt or RefSeq identifiers. (This yields some additional links because EuropePMC uses different heuristics for their text mining than we do.)
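
The query form described above, "locus_tag AND genus_name", could be assembled roughly as follows. This is a sketch, not the PaperBLAST implementation: the helper names are hypothetical, the genus is assumed to be the first word of the organism name, and the Europe PMC REST search URL and its `query` and `format` parameters are assumptions used only to show where the query string would go. No request is sent.

```python
from urllib.parse import urlencode

# Assumed Europe PMC search endpoint, for illustration only.
SEARCH_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_query(locus_tag: str, organism: str) -> str:
    """Combine a locus tag with the genus, taken as the first word of the name."""
    genus = organism.split()[0]
    return f"{locus_tag} AND {genus}"


def build_search_url(locus_tag: str, organism: str) -> str:
    """Return a search URL for the query without sending any request."""
    params = {"query": build_query(locus_tag, organism), "format": "json"}
    return f"{SEARCH_URL}?{urlencode(params)}"


query = build_query("Rv1624c", "Mycobacterium tuberculosis H37Rv")
url = build_search_url("Rv1624c", "Mycobacterium tuberculosis H37Rv")
```

Requiring the genus alongside the locus tag is what reduces false matches from identifiers that happen to look like ordinary tokens in unrelated papers.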

For every article that mentions a locus tag, a RefSeq protein identifier, or a UniProt accession, we try to select one or two snippets of text that refer to the protein. If we cannot get access to the full text, we try to select a snippet from the abstract, but unfortunately, unique identifiers such as locus tags are rarely provided in abstracts.
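
The page does not describe how snippets are chosen, so the following is only one plausible approach: take a fixed window of words around each mention of the identifier and stop after two snippets. The function name, the window size, and the rule for skipping overlapping mentions are all assumptions.

```python
import re
from typing import List


def select_snippets(text: str, identifier: str,
                    max_snippets: int = 2, context_words: int = 20) -> List[str]:
    """Return up to max_snippets word windows centred on mentions of identifier."""
    words = text.split()
    pattern = re.compile(r"\b" + re.escape(identifier) + r"\b")
    snippets = []
    last_end = -1  # index of the last word already shown in a snippet
    for i, word in enumerate(words):
        if not pattern.search(word) or i <= last_end:
            continue  # not a mention, or already covered by the previous window
        start = max(0, i - context_words)
        end = min(len(words), i + context_words + 1)
        snippets.append("..." + " ".join(words[start:end]) + "...")
        last_end = end - 1
        if len(snippets) == max_snippets:
            break
    return snippets


text = ("A 40-bp intergenic DNA fragment upstream of Rv1624c "
        "was used as a nonspecific control.")
snippets = select_snippets(text, "Rv1624c", context_words=3)
```

With a three-word window, the single mention yields one snippet: "...fragment upstream of Rv1624c was used as...".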

PaperBLAST also incorporates manually-curated protein functions:

Proteins from NCBI's RefSeq are included if a GeneRIF entry links the gene to an article in PubMed®. GeneRIF also provides a short summary of the article's claim about the protein, which is shown instead of a snippet.
Proteins from Swiss-Prot (the curated part of UniProt) are included if the curators identified experimental evidence for the protein's function (evidence code ECO:0000269). For these proteins, the fields of the Swiss-Prot entry that describe the protein's function are shown (with bold headings).
Proteins from BRENDA, a curated database of enzymes, are included if they are linked to a paper in PubMed and their full sequence is known.
Every protein from the non-redundant subset of BioLiP, a database of ligand-binding sites and catalytic residues in protein structures, is included. Since BioLiP itself does not include descriptions of the proteins, those are taken from the Protein Data Bank. Descriptions from PDB rely on the original submitter of the structure and cannot be updated by others, so they may be less reliable. (For SitesBLAST and Sites on a Tree, we use a larger subset of BioLiP so that every ligand is represented among a group of structures with similar sequences, but for PaperBLAST, we use the non-redundant set provided by BioLiP.)
Every protein from EcoCyc, a curated database of the proteins in Escherichia coli K-12, is included, regardless of whether they are characterized or not.
Proteins from the MetaCyc metabolic pathway database are included if they are linked to a paper in PubMed and their full sequence is known.
Proteins from the Transport Classification Database (TCDB) are included if they have known substrate(s), have reference(s), and are not described as uncharacterized or putative. (Some of the references are not visible on the PaperBLAST web site.)
Every protein from CharProtDB, a database of experimentally characterized protein annotations, is included.
Proteins from the CAZy database of carbohydrate-active enzymes are included if they are associated with an Enzyme Classification number. Even though CAZy does not provide links from individual protein sequences to papers, these should all be experimentally-characterized proteins.
Proteins from the REBASE database of restriction enzymes are included if they have known specificity.
Every protein with an evidence-based reannotation (based on mutant phenotypes) in the Fitness Browser is included.
Sequence-specific transcription factors (including sigma factors and DNA-binding response regulators) are included if they have experimentally-determined DNA binding sites in the PRODORIC database of gene regulation in prokaryotes.
Putative transcription factors from RegPrecise are included if they have manually-curated predictions for their binding sites. These predictions are based on conserved putative regulatory sites across genomes that contain similar transcription factors, so PaperBLAST clusters the TFs at 70% identity and retains just one member of each cluster.
Coding sequence (CDS) features from the European Nucleotide Archive (ENA) are included if the /experiment tag is set (implying that there is experimental evidence for the annotation), the nucleotide entry links to paper(s) in PubMed, and the nucleotide entry is from the STD data class (implying that these are targeted annotated sequences, not from shotgun sequencing). Also, to filter out genes whose transcription or translation was detected, but whose function was not studied, nucleotide entries or papers with more than 25 such proteins are excluded. Descriptions from ENA rely on the original submitter of the sequence and cannot be updated by others, so they may be less reliable.
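
The ENA rules in the last item above can be sketched as a small filter. The record type and its field names are hypothetical, and so is the reading that "such proteins" are counted after the /experiment, PubMed, and STD checks have been applied.

```python
from collections import Counter
from typing import List, NamedTuple, Tuple

MAX_PROTEINS = 25  # entries or papers with more than this many proteins are excluded


class CdsFeature(NamedTuple):
    protein_id: str
    entry_id: str                 # nucleotide entry containing the CDS
    pubmed_ids: Tuple[str, ...]   # papers linked from the nucleotide entry
    has_experiment: bool          # True if the /experiment tag is set
    data_class: str               # e.g. "STD"


def filter_cds(features: List[CdsFeature]) -> List[CdsFeature]:
    """Keep CDS features that pass the tag, paper, data-class and size rules."""
    candidates = [f for f in features
                  if f.has_experiment and f.pubmed_ids and f.data_class == "STD"]
    per_entry = Counter(f.entry_id for f in candidates)
    per_paper = Counter(pm for f in candidates for pm in f.pubmed_ids)
    kept = []
    for f in candidates:
        if per_entry[f.entry_id] > MAX_PROTEINS:
            continue  # large entry: likely expression detected, function not studied
        papers = tuple(pm for pm in f.pubmed_ids if per_paper[pm] <= MAX_PROTEINS)
        if papers:
            kept.append(f._replace(pubmed_ids=papers))
    return kept


# Invented records: a 26-protein entry, plus three features that each test one rule.
big = [CdsFeature(f"P{i}", "ENTRY_BIG", ("111",), True, "STD") for i in range(26)]
small = [
    CdsFeature("Q1", "ENTRY_SMALL", ("222",), True, "STD"),
    CdsFeature("Q2", "ENTRY_SMALL", ("222",), False, "STD"),  # no /experiment tag
    CdsFeature("Q3", "ENTRY_WGS", ("333",), True, "WGS"),     # not the STD class
]
kept = filter_cds(big + small)
```

Only `Q1` survives: the 26 proteins in `ENTRY_BIG` exceed the limit, `Q2` lacks the /experiment tag, and `Q3` is not from the STD data class.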

Except for GeneRIF and ENA, the curated entries include a short curated description of the protein's function. For entries from BioLiP, the protein's function may not be known beyond binding to the ligand. Many of these entries also link to articles in PubMed.

For more information see the PaperBLAST paper (mSystems 2017) or the code. You can download PaperBLAST's database here.

Changes to PaperBLAST since the paper was written:

November 2023: incorporated PRODORIC and RegPrecise. Many PRODORIC entries were not linked to a protein sequence (no UniProt identifier), so we added this information.
February 2023: BioLiP changed their download format. PaperBLAST now includes their non-redundant subset. SitesBLAST and Sites on a Tree use a larger non-redundant subset that ensures that every ligand is represented within each cluster. This should ensure that every binding site is represented.
June 2022: incorporated some coding sequences from ENA with the /experiment tag.
March 2022: incorporated BioLiP.
April 2020: incorporated TCDB.
April 2019: EuropePMC now returns table entries in their search results. This has expanded PaperBLAST's database, but most of the new entries are of low relevance, and the resulting snippets are often just lists of locus tags with annotations.
February 2018: the alignment page reports the conservation of the hit's functional sites (if available from Swiss-Prot or UniProt).
January 2018: incorporated BRENDA.
December 2017: incorporated MetaCyc, CharProtDB, CAZy, REBASE, and the reannotations from the Fitness Browser.
September 2017: EuropePMC no longer returns some table entries in their search results. This has shrunk PaperBLAST's database, but has also reduced the number of low-relevance hits.

Many of these changes are described in Interactive tools for functional annotation of bacterial genomes.

Secrets

PaperBLAST cannot provide snippets for many of the papers that are published in non-open-access journals. This limitation applies even if the paper is marked as "free" on the publisher's web site and is available in PubmedCentral or EuropePMC. If a journal that you publish in is marked as "secret," please consider publishing elsewhere.

Omissions from the PaperBLAST Database

Many important articles are missing from PaperBLAST, either because the article's full text is not in EuropePMC (as for many older articles), or because the paper does not mention a protein identifier such as a locus tag, or because of PaperBLAST's heuristics. If you notice an article that characterizes a protein's function but is missing from PaperBLAST, please notify the curators at UniProt or add an entry to GeneRIF. Entries in either of these databases will eventually be incorporated into PaperBLAST. Note that to add an entry to UniProt, you will need to find the UniProt identifier for the protein. If the protein is not already in UniProt, you can ask them to create an entry. To add an entry to GeneRIF, you will need an NCBI Gene identifier, but unfortunately many prokaryotic proteins in RefSeq do not have corresponding Gene identifiers.

References

PaperBLAST: Text-mining papers for information about homologs.
M. N. Price and A. P. Arkin (2017). mSystems, 10.1128/mSystems.00039-17.

Europe PMC in 2017.
M. Levchenko et al (2017). Nucleic Acids Research, 10.1093/nar/gkx1005.

Gene indexing: characterization and analysis of NLM's GeneRIFs.
J. A. Mitchell et al (2003). AMIA Annu Symp Proc 2003:460-464.

UniProt: the universal protein knowledgebase.
The UniProt Consortium (2016). Nucleic Acids Research, 10.1093/nar/gkw1099.

BRENDA in 2017: new perspectives and new tools in BRENDA.
S. Placzek et al (2017). Nucleic Acids Research, 10.1093/nar/gkw952.

The EcoCyc database: reflecting new knowledge about Escherichia coli K-12.
I. M. Keseler et al (2016). Nucleic Acids Research, 10.1093/nar/gkw1003.

The MetaCyc database of metabolic pathways and enzymes.
R. Caspi et al (2018). Nucleic Acids Research, 10.1093/nar/gkx935.

CharProtDB: a database of experimentally characterized protein annotations.
R. Madupu et al (2012). Nucleic Acids Research, 10.1093/nar/gkr1133.

The carbohydrate-active enzymes database (CAZy) in 2013.
V. Lombard et al (2014). Nucleic Acids Research, 10.1093/nar/gkt1178.

The Transporter Classification Database (TCDB): recent advances.
M. H. Saier, Jr. et al (2016). Nucleic Acids Research, 10.1093/nar/gkv1103.

REBASE - a database for DNA restriction and modification: enzymes, genes and genomes.
R. J. Roberts et al (2015). Nucleic Acids Research, 10.1093/nar/gku1046.

Deep annotation of protein function across diverse bacteria from mutant phenotypes.
M. N. Price et al (2016). bioRxiv, 10.1101/072470.

by Morgan Price, Arkin group
Lawrence Berkeley National Laboratory
