PaperBLAST – Find papers about a protein or its homologs

 


PaperBLAST Hits for SwissProt::Q84DC3 NAD(P)-dependent benzaldehyde dehydrogenase; EC 1.2.1.28; EC 1.2.1.7 (Pseudomonas putida (Arthrobacter siderocapsulatus)) (436 a.a., MNYLSPAKID...)

Other sequence analysis tools:

Find functional residues: SitesBLAST

Search for conserved domains

Find the best match in UniProt

Compare to protein structures

Predict transmembrane helices: Phobius

Predict protein localization: PSORTb

Find homologs in fast.genomics


Found 253 similar proteins in the literature:

mdlD / Q84DC3 NADP+-benzaldehyde dehydrogenase (EC 1.2.1.96; EC 1.2.1.7) from Pseudomonas putida (see 2 papers)
MDLD_PSEPU / Q84DC3 NAD(P)-dependent benzaldehyde dehydrogenase; EC 1.2.1.28; EC 1.2.1.7 from Pseudomonas putida (Arthrobacter siderocapsulatus) (see paper)
Q84DC3 benzaldehyde dehydrogenase (NAD+) (EC 1.2.1.28); benzaldehyde dehydrogenase (NADP+) (EC 1.2.1.7) from Pseudomonas putida (see paper)
WP_016501743 NAD(P)-dependent benzaldehyde dehydrogenase MdlD from Pseudomonas sp. SDS3-8
100% identity, 100% coverage

5ucdA / Q84DC3 Benzaldehyde dehydrogenase, a class 3 aldehyde dehydrogenase, with bound NADP+ and benzoate adduct (see paper)
100% identity, 100% coverage

dpgC / A3RJV6 benzaldehyde dehydrogenase (EC 1.2.1.28) from Stutzerimonas stutzeri (see paper)
86% identity, 100% coverage

aldH / A0A0H3KDU5 4,4'-diaponeurosporenal dehydrogenase from Staphylococcus aureus (strain Newman) (see paper)
DIALD_STAA8 / Q2FWX9 4,4'-diaponeurosporen-aldehyde dehydrogenase; 4,4'-diaponeurosporenal dehydrogenase; EC 1.2.1.- from Staphylococcus aureus (strain NCTC 8325 / PS 47) (see paper)
NWMN_1858 aldehyde dehydrogenase from Staphylococcus aureus subsp. aureus str. Newman
SAOUHSC_02142 aldehyde dehydrogenase, putative from Staphylococcus aureus subsp. aureus NCTC 8325
SACOL1984 aldehyde dehydrogenase from Staphylococcus aureus subsp. aureus COL
CH51_RS10975 aldehyde dehydrogenase from Staphylococcus aureus
44% identity, 92% coverage

Dhaf_2181 Aldehyde Dehydrogenase from Desulfitobacterium hafniense DCB-2
44% identity, 93% coverage

SAR2013 putative aldehyde dehydrogenase from Staphylococcus aureus subsp. aureus MRSA252
44% identity, 92% coverage

SA1736 aldehyde dehydrogenase from Staphylococcus aureus subsp. aureus N315
44% identity, 92% coverage

RCF35_18825 aldehyde dehydrogenase from Bacillus velezensis
45% identity, 93% coverage

HWX41_RS25615 aldehyde dehydrogenase from Bacillus paramycoides
43% identity, 92% coverage

V529_29500 aldehyde dehydrogenase from Bacillus velezensis SQR9
45% identity, 93% coverage

GBAA1296 aldehyde dehydrogenase from Bacillus anthracis str. 'Ames Ancestor'
46% identity, 93% coverage

BC1285 Aldehyde dehydrogenase (NAD(P)+) from Bacillus cereus ATCC 14579
45% identity, 93% coverage

HWX41_RS17590 aldehyde dehydrogenase from Bacillus paramycoides
45% identity, 93% coverage

Cphy_3041 Aldehyde Dehydrogenase_ from Clostridium phytofermentans ISDg
42% identity, 92% coverage

Npun_F0840 aldehyde dehydrogenase from Nostoc punctiforme
43% identity, 93% coverage

SE1603 aldehyde dehydrogenase from Staphylococcus epidermidis ATCC 12228
41% identity, 92% coverage

FQ085_01960 aldehyde dehydrogenase from Planococcus sp. ANT_H30
40% identity, 94% coverage

DCF50_p2406 aldehyde dehydrogenase from Dehalobacter sp. CF
42% identity, 94% coverage

alr3672 aldehyde dehydrogenase from Nostoc sp. PCC 7120
41% identity, 94% coverage

Ava_3615 Aldehyde dehydrogenase from Anabaena variabilis ATCC 29413
41% identity, 94% coverage

CD2206 aldehyde dehydrogenase from Clostridium difficile 630
40% identity, 93% coverage

NP_504634 Aldehyde dehydrogenase from Caenorhabditis elegans
41% identity, 85% coverage

CwatDRAFT_1032 Aldehyde dehydrogenase from Crocosphaera watsonii WH 8501
42% identity, 92% coverage

ABO_2709 aldehyde dehydrogenase (NAD) from Alcanivorax borkumensis SK2
41% identity, 87% coverage

Cbei_1953 aldehyde dehydrogenase from Clostridium beijerinckii NCIMB 8052
38% identity, 92% coverage

Syncc9605_0497 putative aldehyde dehydrogenase from Synechococcus sp. CC9605
41% identity, 93% coverage

WH5701_06196 Putative aldehyde dehydrogenase from Synechococcus sp. WH 5701
42% identity, 84% coverage

Syncc9902_1838 putative aldehyde dehydrogenase from Synechococcus sp. CC9902
42% identity, 89% coverage

PMT0191 Putative aldehyde dehydrogenase from Prochlorococcus marinus str. MIT 9313
43% identity, 88% coverage

WP_010873792 aldehyde dehydrogenase family protein from Synechocystis sp. PCC 6803
slr0091 aldehyde dehydrogenase from Synechocystis sp. PCC 6803
40% identity, 93% coverage

Synpcc7942_0489 aldehyde dehydrogenase from Synechococcus elongatus PCC 7942
42% identity, 93% coverage

Q7ZU10 Aldehyde dehydrogenase from Danio rerio
39% identity, 88% coverage

RS9917_02641 Putative aldehyde dehydrogenase from Synechococcus sp. RS9917
40% identity, 86% coverage

AL3B1_HUMAN / P43353 Aldehyde dehydrogenase family 3 member B1; Aldehyde dehydrogenase 7; Long-chain fatty aldehyde dehydrogenase; Medium-chain fatty aldehyde dehydrogenase; EC 1.2.1.28; EC 1.2.1.5; EC 1.2.1.7; EC 1.2.1.48 from Homo sapiens (Human) (see 2 papers)
P43353 aldehyde dehydrogenase [NAD(P)+] (EC 1.2.1.5) from Homo sapiens (see paper)
NP_001154945 aldehyde dehydrogenase family 3 member B1 isoform a from Homo sapiens
41% identity, 90% coverage

G7PP62 Aldehyde dehydrogenase from Macaca fascicularis
40% identity, 90% coverage

AL3A1_HUMAN / P30838 Aldehyde dehydrogenase, dimeric NADP-preferring; ALDHIII; Aldehyde dehydrogenase 3; Aldehyde dehydrogenase family 3 member A1; EC 1.2.1.5 from Homo sapiens (Human) (see 2 papers)
P30838 aldehyde dehydrogenase [NAD(P)+] (EC 1.2.1.5) from Homo sapiens (see 8 papers)
NP_000682 aldehyde dehydrogenase, dimeric NADP-preferring isoform 1 from Homo sapiens
38% identity, 95% coverage

WH7805_06416 Putative aldehyde dehydrogenase from Synechococcus sp. WH 7805
39% identity, 85% coverage

4l1oB / P30838 Crystal structure of human aldh3a1 with inhibitor 1-{[4-(1,3- benzodioxol-5-ylmethyl)piperazin-1-yl]methyl}-1h-indole-2,3-dione (see paper)
38% identity, 95% coverage

F1SDC7 Aldehyde dehydrogenase from Sus scrofa
41% identity, 77% coverage

Awo_c33720 aldehyde dehydrogenase from Acetobacterium woodii DSM 1030
37% identity, 93% coverage

alkH / P12693 aldehyde dehydrogenase (EC 1.2.1.3) from Pseudomonas oleovorans (see 3 papers)
ALDH_PSEOL / P12693 Aldehyde dehydrogenase; EC 1.2.1.3 from Pseudomonas oleovorans (see paper)
alkH / GB|CAB54053.1 aldehyde dehydrogenase from Pseudomonas oleovorans (see paper)
alkH / CAB54053.1 aldehyde dehydrogenase from Pseudomonas putida (see 6 papers)
40% identity, 88% coverage

PMN2A_1709 Putative aldehyde dehydrogenase from Prochlorococcus marinus str. NATL2A
40% identity, 90% coverage

XP_006532087 aldehyde dehydrogenase, dimeric NADP-preferring isoform X1 from Mus musculus
38% identity, 94% coverage

MAB_4605c Probable aldehyde dehydrogenase from Mycobacterium abscessus ATCC 19977
38% identity, 90% coverage

AL3B2_MOUSE / E9Q3E1 Aldehyde dehydrogenase family 3 member B2; Aldh3B2; Aldehyde dehydrogenase 8; Long-chain fatty aldehyde dehydrogenase; EC 1.2.1.3; EC 1.2.1.48 from Mus musculus (Mouse) (see paper)
E9Q3E1 long-chain-aldehyde dehydrogenase (EC 1.2.1.48) from Mus musculus (see paper)
NP_001170909 aldehyde dehydrogenase family 3 member B2 from Mus musculus
39% identity, 88% coverage

AL3A1_MOUSE / P47739 Aldehyde dehydrogenase, dimeric NADP-preferring; Aldehyde dehydrogenase 4; Aldehyde dehydrogenase family 3 member A1; Dioxin-inducible aldehyde dehydrogenase 3; EC 1.2.1.5 from Mus musculus (Mouse) (see 3 papers)
P47739 aldehyde dehydrogenase [NAD(P)+] (EC 1.2.1.5) from Mus musculus (see paper)
38% identity, 94% coverage

AL3A1_RAT / P11883 Aldehyde dehydrogenase, dimeric NADP-preferring; Aldehyde dehydrogenase family 3 member A1; HTC-ALDH; Tumor-associated aldehyde dehydrogenase; EC 1.2.1.5 from Rattus norvegicus (Rat) (see paper)
NP_114178 aldehyde dehydrogenase, dimeric NADP-preferring from Rattus norvegicus
38% identity, 93% coverage

1ad3A / P11883 Class 3 aldehyde dehydrogenase complex with nicotinamide-adenine- dinucleotide (see paper)
38% identity, 95% coverage

ALDH3A2 / P51648 fatty aldehyde dehydrogenase (EC 1.2.1.3; EC 1.2.1.94; EC 1.2.1.39) from Homo sapiens (see 10 papers)
AL3A2_HUMAN / P51648 Aldehyde dehydrogenase family 3 member A2; Aldehyde dehydrogenase 10; Fatty aldehyde dehydrogenase; Microsomal aldehyde dehydrogenase; EC 1.2.1.3; EC 1.2.1.94 from Homo sapiens (Human) (see 11 papers)
P51648 long-chain-aldehyde dehydrogenase (EC 1.2.1.48) from Homo sapiens (see paper)
40% identity, 86% coverage

Q3UNF5 aldehyde dehydrogenase (NAD+) (EC 1.2.1.3) from Mus musculus (see paper)
37% identity, 94% coverage

B7ZN13 Aldehyde dehydrogenase from Mus musculus
37% identity, 94% coverage

AL3B3_MOUSE / J3QMK6 Aldehyde dehydrogenase family 3 member B3; EC 1.2.1.3 from Mus musculus (Mouse) (see paper)
J3QMK6 long-chain-aldehyde dehydrogenase (EC 1.2.1.48) from Mus musculus (see paper)
38% identity, 87% coverage

Q6PKA6 Aldehyde dehydrogenase, dimeric NADP-preferring (Fragment) from Homo sapiens
39% identity, 80% coverage

NM219_06200, NM220_06200 aldehyde dehydrogenase from Parvimonas micra
37% identity, 94% coverage

Q60HH8 Aldehyde dehydrogenase family 3 member A2 from Macaca fascicularis
39% identity, 86% coverage

M3W3M4 Aldehyde dehydrogenase from Felis catus
37% identity, 90% coverage

AL3B1_MOUSE / Q80VQ0 Aldehyde dehydrogenase family 3 member B1; Aldehyde dehydrogenase 7; Long-chain fatty aldehyde dehydrogenase; Medium-chain fatty aldehyde dehydrogenase; EC 1.2.1.28; EC 1.2.1.5; EC 1.2.1.7; EC 1.2.1.48 from Mus musculus (Mouse) (see 2 papers)
Q80VQ0 aldehyde dehydrogenase [NAD(P)+] (EC 1.2.1.5) from Mus musculus (see paper)
Q3TX25 Aldehyde dehydrogenase from Mus musculus
37% identity, 90% coverage

XP_008297392 aldehyde dehydrogenase, dimeric NADP-preferring-like from Stegastes partitus
38% identity, 82% coverage

CRTNC_CYTFI / P0DPF0 4,4'-diapolycopene-4,4'-dial dehydrogenase; 4,4'-diapolycopene aldehyde oxidase; 4,4'-diapolycopene-4,4'-dial oxidase; 4,4'-diapolycopene-4,4'-dioate synthase; EC 1.2.99.10 from Cytobacillus firmus (Bacillus firmus) (see 2 papers)
P0DPF0 4,4'-diapolycopenoate synthase (EC 1.2.99.10) from Cytobacillus firmus (see paper)
38% identity, 93% coverage

Q53NG8 aldehyde dehydrogenase [NAD(P)+] (EC 1.2.1.5) from Oryza sativa (see paper)
40% identity, 87% coverage

I1QYB5 Aldehyde dehydrogenase from Oryza glaberrima
40% identity, 87% coverage

Pro0374 NAD-dependent aldehyde dehydrogenase from Prochlorococcus marinus str. SS120
37% identity, 90% coverage

Q5XI42 Aldehyde dehydrogenase family 3 member B1 from Rattus norvegicus
37% identity, 90% coverage

LOC100232483 LOW QUALITY PROTEIN: aldehyde dehydrogenase family 3 member B1 from Taeniopygia guttata
38% identity, 77% coverage

SYNW1956 putative aldehyde dehydrogenase from Synechococcus sp. WH 8102
42% identity, 90% coverage

ALDH3_BACSU / P46329 Putative aldehyde dehydrogenase AldX; EC 1.2.1.3 from Bacillus subtilis (strain 168) (see paper)
39% identity, 94% coverage

R0JMW7 Aldehyde dehydrogenase family 3 member A2 (Fragment) from Anas platyrhynchos
37% identity, 91% coverage

Cthe_2238 aldehyde dehydrogenase from Clostridium thermocellum ATCC 27405
37% identity, 92% coverage

CtherDRAFT_1042 Aldehyde Dehydrogenase_ from Clostridium thermocellum DSM 4150
37% identity, 92% coverage

E9QH31 Aldehyde dehydrogenase from Danio rerio
37% identity, 85% coverage

GRMZM2G060800 aldehyde dehydrogenase, dimeric NADP-preferring from Zea mays
38% identity, 87% coverage

LOC108996365 aldehyde dehydrogenase family 3 member H1-like from Juglans regia
37% identity, 78% coverage

SO3683 coniferyl aldehyde dehydrogenase from Shewanella oneidensis MR-1
36% identity, 89% coverage

V529_39560 aldehyde dehydrogenase family protein from Bacillus velezensis SQR9
38% identity, 98% coverage

Saro_1197 aldehyde dehydrogenase from Novosphingobium aromaticivorans DSM 12444
37% identity, 95% coverage

XP_001655923 aldehyde dehydrogenase, dimeric NADP-preferring isoform X12 from Aedes aegypti
36% identity, 84% coverage

RCF35_04555 aldehyde dehydrogenase family protein from Bacillus velezensis
38% identity, 98% coverage

LOC110770067 aldehyde dehydrogenase family 3 member H1-like from Prunus avium
36% identity, 78% coverage

LOC127486773 aldehyde dehydrogenase family 3 member A2 from Oryctolagus cuniculus
38% identity, 91% coverage

ALDH3-2 / Q16MV5 fatty aldehyde dehydrogenase 3 (EC 1.2.1.94) from Aedes aegypti (see paper)
Q16MV5 farnesal dehydrogenase (EC 1.2.1.94) from Aedes aegypti (see paper)
37% identity, 84% coverage

A0A5F9D390 Aldehyde dehydrogenase from Oryctolagus cuniculus
38% identity, 81% coverage

B1AV77 Aldehyde dehydrogenase from Mus musculus
39% identity, 84% coverage

AL3A2_MOUSE / P47740 Aldehyde dehydrogenase family 3 member A2; Aldehyde dehydrogenase 3; Fatty aldehyde dehydrogenase; EC 1.2.1.3; EC 1.2.1.94 from Mus musculus (Mouse) (see paper)
P47740 aldehyde dehydrogenase (NAD+) (EC 1.2.1.3) from Mus musculus (see paper)
39% identity, 85% coverage

B1ATI0 Aldehyde dehydrogenase from Mus musculus
39% identity, 78% coverage

XP_006532094 aldehyde dehydrogenase family 3 member A2 isoform X2 from Mus musculus
39% identity, 85% coverage

BTH_I0192 coniferyl aldehyde dehydrogenase from Burkholderia thailandensis E264
38% identity, 91% coverage

G3V9W6 Aldehyde dehydrogenase family 3 member A2 from Rattus norvegicus
38% identity, 74% coverage

XP_004502482 aldehyde dehydrogenase family 3 member H1-like isoform X1 from Cicer arietinum
36% identity, 86% coverage

XP_006246591 aldehyde dehydrogenase family 3 member A2 isoform X1 from Rattus norvegicus
38% identity, 71% coverage

AL3A2_RAT / P30839 Aldehyde dehydrogenase family 3 member A2; Aldehyde dehydrogenase 4; Fatty aldehyde dehydrogenase; Microsomal aldehyde dehydrogenase; msALDH; EC 1.2.1.3; EC 1.2.1.94 from Rattus norvegicus (Rat) (see paper)
38% identity, 86% coverage

XP_001655925 aldehyde dehydrogenase, dimeric NADP-preferring from Aedes aegypti
37% identity, 84% coverage

AL3H1_ARATH / Q70DU8 Aldehyde dehydrogenase family 3 member H1; AtALDH4; Ath-ALDH4; EC 1.2.1.3 from Arabidopsis thaliana (Mouse-ear cress) (see 4 papers)
Q70DU8 aldehyde dehydrogenase (NAD+) (EC 1.2.1.3) from Arabidopsis thaliana (see paper)
NP_175081 aldehyde dehydrogenase 3H1 from Arabidopsis thaliana
AT1G44170 ALDH3H1 (ALDEHYDE DEHYDROGENASE 3H1); 3-chloroallyl aldehyde dehydrogenase/ aldehyde dehydrogenase (NAD) from Arabidopsis thaliana
36% identity, 88% coverage

M5XBG8 Aldehyde dehydrogenase from Prunus persica
36% identity, 84% coverage

GRMZM2G103546 aldehyde dehydrogenase, dimeric NADP-preferring from Zea mays
37% identity, 85% coverage

Q6C0L0 Aldehyde dehydrogenase from Yarrowia lipolytica (strain CLIB 122 / E 150)
37% identity, 79% coverage

Q54DG1 Aldehyde dehydrogenase family 3 comG from Dictyostelium discoideum
37% identity, 89% coverage

PA0366 probable aldehyde dehydrogenase from Pseudomonas aeruginosa PAO1
36% identity, 89% coverage

aldh3H1 / CAE51203.1 putative aldehyde dehydrogenase, partial from Arabidopsis thaliana (see paper)
36% identity, 89% coverage

Maqu_3572 aldehyde dehydrogenase from Marinobacter aquaeolei
36% identity, 88% coverage

C4QXC1 Aldehyde dehydrogenase from Komagataella phaffii (strain GS115 / ATCC 20864)
33% identity, 84% coverage

LOC105159459 aldehyde dehydrogenase from Sesamum indicum
36% identity, 89% coverage

K2MRC3 Aldehyde dehydrogenase from Trypanosoma cruzi marinkellei
39% identity, 83% coverage

ALDH_CRAPL / Q8VXQ2 Aldehyde dehydrogenase; Cp-ALDH; EC 1.2.1.3 from Craterostigma plantagineum (Blue gem) (Torenia plantagineum) (see paper)
ALDH / CAC84900.1 aldehyde dehydrogenase from Craterostigma plantagineum (see paper)
36% identity, 87% coverage

aldh3F1 / CAE48163.1 putative aldehyde dehydrogenase from Arabidopsis thaliana (see paper)
36% identity, 86% coverage

LOC101511680, XP_004502485 aldehyde dehydrogenase family 3 member H1-like from Cicer arietinum
35% identity, 87% coverage

AL3F1_ARATH / Q70E96 Aldehyde dehydrogenase family 3 member F1; EC 1.2.1.3 from Arabidopsis thaliana (Mouse-ear cress) (see paper)
AT4G36250 ALDH3F1 (Aldehyde Dehydrogenase 3F1); 3-chloroallyl aldehyde dehydrogenase/ aldehyde dehydrogenase (NAD) from Arabidopsis thaliana
NP_195348 aldehyde dehydrogenase 3F1 from Arabidopsis thaliana
36% identity, 86% coverage

LOC101515558, XP_004498346 aldehyde dehydrogenase family 3 member H1 from Cicer arietinum
36% identity, 86% coverage

CTRG_05010 conserved hypothetical protein from Candida tropicalis MYA-3404
35% identity, 61% coverage

AFUA_4G13500, Afu4g13500 aldehyde dehydrogenase, putative from Aspergillus fumigatus Af293
36% identity, 82% coverage

LOC101219569 aldehyde dehydrogenase family 3 member H1 from Cucumis sativus
36% identity, 87% coverage

XP_001335979 aldehyde dehydrogenase family 3 member A2 from Danio rerio
35% identity, 85% coverage

CYPRO_1155 aldehyde dehydrogenase family protein from Cyclonatronum proteinivorum
39% identity, 88% coverage

SPRG_08456 hypothetical protein from Saprolegnia parasitica CBS 223.65
37% identity, 80% coverage

R4YMB5 Aldehyde dehydrogenase from Oleispira antarctica RB-8
36% identity, 89% coverage

LOC103938024 aldehyde dehydrogenase family 3 member F1-like from Pyrus x bretschneideri
34% identity, 88% coverage

XP_004503899 aldehyde dehydrogenase family 3 member H1-like isoform X1 from Cicer arietinum
36% identity, 76% coverage

LOC21396538 aldehyde dehydrogenase family 3 member H1 from Morus notabilis
36% identity, 85% coverage

Q6C5T1 Aldehyde dehydrogenase from Yarrowia lipolytica (strain CLIB 122 / E 150)
36% identity, 80% coverage

TOL_0223 coniferyl aldehyde dehydrogenase from Thalassolituus oleivorans MIL-1
34% identity, 89% coverage

SLG_20400 aldehyde dehydrogenase family protein from Sphingobium sp. SYK-6
36% identity, 81% coverage

Bcen_5677 aldehyde dehydrogenase from Burkholderia cenocepacia AU 1054
35% identity, 87% coverage

B0W47_16410 coniferyl aldehyde dehydrogenase from Komagataeibacter nataicola
34% identity, 91% coverage

CCM_09155 fatty aldehyde dehydrogenase from Cordyceps militaris CM01
35% identity, 82% coverage

WP_016502080 coniferyl aldehyde dehydrogenase from Pseudomonas sp. SDS3-8
36% identity, 90% coverage

LOC101491914, XP_004507095 aldehyde dehydrogenase family 3 member F1-like from Cicer arietinum
35% identity, 85% coverage

SDRG_06419 hypothetical protein from Saprolegnia diclina VS20
35% identity, 83% coverage

Q583M9 Aldehyde dehydrogenase from Trypanosoma brucei brucei (strain 927/4 GUTat10.1)
36% identity, 78% coverage

An01g09260 uncharacterized protein from Aspergillus niger
35% identity, 84% coverage

GRMZM2G155502 aldehyde dehydrogenase 3B1 from Zea mays
36% identity, 85% coverage

LOC101497113, XP_004486968 aldehyde dehydrogenase family 3 member F1-like from Cicer arietinum
34% identity, 84% coverage

LOC21398387 aldehyde dehydrogenase family 3 member F1 from Morus notabilis
32% identity, 87% coverage

LOC100185488 LOW QUALITY PROTEIN: aldehyde dehydrogenase, dimeric NADP-preferring-like from Ciona intestinalis
35% identity, 93% coverage

PP_5120 coniferyl aldehyde dehydrogenase from Pseudomonas putida KT2440
PP5120 conifer aldehyde dehydrogenase, putative from Pseudomonas putida KT2440
36% identity, 90% coverage

CARD_GIBF5 / F6IBC7 Beta-apo-4'-carotenal oxygenase; Beta-apo-4'-carotenal dehydrogenase; EC 1.2.1.82 from Gibberella fujikuroi (strain CBS 195.34 / IMI 58289 / NRRL A-6831) (Bakanae and foot rot disease fungus) (Fusarium fujikuroi) (see paper)
F6IBC7 beta-apo-4'-carotenal oxygenase (EC 1.2.1.82) from Fusarium fujikuroi (see paper)
FFUJ_07503 related to aldehyde dehydrogenase from Fusarium fujikuroi IMI 58289
35% identity, 80% coverage

J3QRD1 Aldehyde dehydrogenase family 3 member A2 from Homo sapiens
40% identity, 84% coverage

LOC101511819, XP_012573731 LOW QUALITY PROTEIN: aldehyde dehydrogenase family 3 member F1-like from Cicer arietinum
34% identity, 85% coverage

VCA1067 aldehyde dehydrogenase from Vibrio cholerae O1 biovar eltor str. N16961
34% identity, 88% coverage

5nnoA / C9ZQX6 Structure of tbaldh3 complexed with NAD and an3057 aldehyde (see paper)
36% identity, 87% coverage

C9JMC5 Aldehyde dehydrogenase, dimeric NADP-preferring (Fragment) from Homo sapiens
37% identity, 84% coverage

MAV_5147 fatty aldehyde dehydrogenase from Mycobacterium avium 104
35% identity, 92% coverage

AL3I1_ARATH / Q8W033 Aldehyde dehydrogenase family 3 member I1, chloroplastic; AtALDH3; Ath-ALDH3; EC 1.2.1.3 from Arabidopsis thaliana (Mouse-ear cress) (see 4 papers)
Q8W033 glycolaldehyde dehydrogenase (EC 1.2.1.21); aldehyde dehydrogenase (NAD+) (EC 1.2.1.3) from Arabidopsis thaliana (see 2 papers)
NP_567962 aldehyde dehydrogenase 3I1 from Arabidopsis thaliana
AT4G34240 ALDH3I1 (ALDEHYDE DEHYDROGENASE 3I1); 3-chloroallyl aldehyde dehydrogenase/ aldehyde dehydrogenase (NAD) from Arabidopsis thaliana
35% identity, 76% coverage

TTHERM_00530250 aldehyde dehydrogenase family protein from Tetrahymena thermophila SB210
36% identity, 85% coverage

LOC112001970 LOW QUALITY PROTEIN: aldehyde dehydrogenase family 3 member F1 from Quercus suber
33% identity, 85% coverage

Q6CGN3 Aldehyde dehydrogenase from Yarrowia lipolytica (strain CLIB 122 / E 150)
33% identity, 79% coverage

MT0155 aldehyde dehydrogenase, class 3 from Mycobacterium tuberculosis CDC1551
38% identity, 86% coverage

AL3B2_HUMAN / P48448 Aldehyde dehydrogenase family 3 member B2; ALDH3B2; Aldehyde dehydrogenase 8; Long-chain fatty aldehyde dehydrogenase; EC 1.2.1.3; EC 1.2.1.48 from Homo sapiens (Human) (see 2 papers)
NP_001026786 aldehyde dehydrogenase family 3 member B2 isoform a from Homo sapiens
39% identity, 76% coverage

P96824 Aldehyde dehydrogenase from Mycobacterium tuberculosis (strain ATCC 25618 / H37Rv)
NP_214661 aldehyde dehydrogenase from Mycobacterium tuberculosis H37Rv
Rv0147 PROBABLE ALDEHYDE DEHYDROGENASE (NAD+) DEPENDENT from Mycobacterium tuberculosis H37Rv
38% identity, 83% coverage

PMT9312_0337 putative aldehyde dehydrogenase from Prochlorococcus marinus str. MIT 9312
32% identity, 88% coverage

ABO_0087 aldehyde dehydrogenase from Alcanivorax borkumensis SK2
34% identity, 89% coverage

XP_011394900 aldehyde dehydrogenase, variant from Neurospora crassa OR74A
34% identity, 83% coverage

Q6CG32 Aldehyde dehydrogenase from Yarrowia lipolytica (strain CLIB 122 / E 150)
34% identity, 80% coverage

YLO-1 / Q870P2 apo-4'-lycopenal dehydrogenase (EC 1.2.1.82) from Neurospora crassa (see paper)
CARD_NEUCR / Q1K615 Beta-apo-4'-carotenal oxygenase; Aldehyde dehydrogenase ylo-1; Beta-apo-4'-carotenal dehydrogenase; Yellow protein 1; EC 1.2.1.82 from Neurospora crassa (strain ATCC 24698 / 74-OR23-1A / CBS 708.71 / DSM 1257 / FGSC 987) (see 3 papers)
Q1K615 beta-apo-4'-carotenal oxygenase (EC 1.2.1.82) from Neurospora crassa (see 3 papers)
34% identity, 80% coverage

H6S33_000904 uncharacterized protein from Morchella sextelata
38% identity, 85% coverage

CtCNB1_1309 aldehyde dehydrogenase from Comamonas testosteroni CNB-2
CTCNB1_RS06680 coniferyl aldehyde dehydrogenase from Comamonas thiooxydans
34% identity, 89% coverage

HMPREF0010_01789 coniferyl aldehyde dehydrogenase from Acinetobacter baumannii ATCC 19606 = CIP 70.34 = JCM 6841
34% identity, 89% coverage

Q6H627 aldehyde dehydrogenase [NAD(P)+] (EC 1.2.1.5) from Oryza sativa (see paper)
33% identity, 87% coverage

ABBFA_003085 Coniferyl aldehyde dehydrogenase(CALDH) from Acinetobacter baumannii AB307-0294
34% identity, 89% coverage

Q7XR89 aldehyde dehydrogenase [NAD(P)+] (EC 1.2.1.5) from Oryza sativa (see paper)
35% identity, 83% coverage

ACIAD0503 coniferyl aldehyde dehydrogenase (CALDH) from Acinetobacter sp. ADP1
33% identity, 89% coverage

Q0DZ46 aldehyde dehydrogenase [NAD(P)+] (EC 1.2.1.5) from Oryza sativa (see paper)
32% identity, 86% coverage

CC1849 coniferyl aldehyde dehydrogenase from Caulobacter crescentus CB15
34% identity, 88% coverage

PMM0331 Putative aldehyde dehydrogenase from Prochlorococcus marinus sp. MED4
33% identity, 89% coverage

A1S_0449 coniferyl aldehyde dehydrogenase (CALDH) from Acinetobacter baumannii ATCC 17978
36% identity, 85% coverage

LOC112010330 aldehyde dehydrogenase family 3 member F1 from Quercus suber
34% identity, 87% coverage

A9762_21150 coniferyl aldehyde dehydrogenase from Pandoraea sp. ISTKB
36% identity, 87% coverage

LOC112010332 LOW QUALITY PROTEIN: aldehyde dehydrogenase family 3 member F1 from Quercus suber
31% identity, 82% coverage

PSHAa2139 putative aldehyde dehydrogenase from Pseudoalteromonas haloplanktis TAC125
32% identity, 91% coverage

RPA1687 putative aldehyde dehydrogenase from Rhodopseudomonas palustris CGA009
35% identity, 84% coverage

CALB_PSEUH / O86447 Coniferyl aldehyde dehydrogenase; CALDH; EC 1.2.1.68 from Pseudomonas sp. (strain HR199 / DSM 7063) (see 2 papers)
calB coniferyl-aldehyde dehydrogenase; EC 1.2.1.68 from Pseudomonas sp. HR199 (see paper)
32% identity, 89% coverage

HFD1 / Q04458 fatty aldehyde dehydrogenase HFD1 (EC 1.2.1.3; EC 1.2.1.64) from Saccharomyces cerevisiae (strain ATCC 204508 / S288c) (see 2 papers)
HFD1_YEAST / Q04458 Fatty aldehyde dehydrogenase HFD1; Hexadecenal dehydrogenase; EC 1.2.1.3; EC 1.2.1.64 from Saccharomyces cerevisiae (strain ATCC 204508 / S288c) (Baker's yeast) (see 9 papers)
YMR110C Hfd1p from Saccharomyces cerevisiae
32% identity, 81% coverage

CCM_02203 aldehyde dehydrogenase, putative from Cordyceps militaris CM01
34% identity, 71% coverage

WP_154716401 coniferyl aldehyde dehydrogenase from Sterolibacterium denitrificans
32% identity, 88% coverage

Q68D64 Aldehyde dehydrogenase family 3 member A2 (Fragment) from Homo sapiens
42% identity, 60% coverage

MSMEG_2242 coniferyl aldehyde dehydrogenase from Mycobacterium smegmatis str. MC2 155
33% identity, 85% coverage

WP_099520946 aldehyde dehydrogenase family protein from Paenibacillus sp. BIHB 4019
32% identity, 89% coverage

BL1124 fatty aldehyde dehydrogenase from Bifidobacterium longum NCC2705
38% identity, 53% coverage

BBMN68_872 aldehyde dehydrogenase family protein from Bifidobacterium longum subsp. longum BBMN68
38% identity, 53% coverage

BLJ_0565 aldehyde dehydrogenase family protein from Bifidobacterium longum subsp. longum JDM301
37% identity, 53% coverage

Ga0061065_12214 coniferyl aldehyde dehydrogenase from Marinomonas fungiae
32% identity, 85% coverage

MSMEG_0889, MSMEI_0868 aldehyde dehydrogenase family protein from Mycolicibacterium smegmatis MC2 155
A0QQV4 Aldehyde dehydrogenase from Mycolicibacterium smegmatis (strain ATCC 700084 / mc(2)155)
MSMEG_0889 succinic semialdehyde dehydrogenase from Mycobacterium smegmatis str. MC2 155
33% identity, 80% coverage

LOC107802760 aldehyde dehydrogenase family 3 member H1-like from Nicotiana tabacum
38% identity, 61% coverage

Afu8g02310 aldehyde dehydrogenase ALDH from Aspergillus fumigatus Af293
31% identity, 70% coverage

LOC112010331 aldehyde dehydrogenase family 3 member F1 from Quercus suber
37% identity, 61% coverage

WP_076384861 aldehyde dehydrogenase family protein from Pseudomonas sp. A214
32% identity, 85% coverage

XP_813115 aldehyde dehydrogenase, putative from Trypanosoma cruzi
30% identity, 75% coverage

AL221_ARATH / Q0WSF1 Aldehyde dehydrogenase 22A1; Novel aldehyde dehydrogenase family 22 member A1; EC 1.2.1.3 from Arabidopsis thaliana (Mouse-ear cress) (see paper)
aldh22A1 / CAE48165.1 putative aldehyde dehydrogenase from Arabidopsis thaliana (see paper)
AT3G66658 ALDH22a1 (Aldehyde Dehydrogenase 22a1); 3-chloroallyl aldehyde dehydrogenase/ oxidoreductase from Arabidopsis thaliana
30% identity, 70% coverage

ald / Q4VKV0 4,4'-diapolycopenedial dehydrogenase (EC 1.2.99.10) from Methylomonas sp. (see 4 papers)
ALD_METSP / Q4VKV0 4,4'-diapolycopene aldehyde oxidase; 4,4'-diapolycopenedial dehydrogenase; 4,4'-diapolycopenoate synthase; EC 1.2.99.10 from Methylomonas sp. (see paper)
Q4VKV0 4,4'-diapolycopenoate synthase (EC 1.2.99.10) from Methylomonas sp. (see paper)
30% identity, 75% coverage

BADH1_ORYSJ / O24174 Betaine aldehyde dehydrogenase 1; OsBADH1; EC 1.2.1.8 from Oryza sativa subsp. japonica (Rice) (see 3 papers)
O24174 betaine-aldehyde dehydrogenase (EC 1.2.1.8) from Oryza sativa Japonica Group (see paper)
33% identity, 78% coverage

R0KX92 Aldehyde dehydrogenase, mitochondrial (Fragment) from Anas platyrhynchos
30% identity, 88% coverage

Q38AY7 Aldehyde dehydrogenase, putative from Trypanosoma brucei brucei (strain 927/4 GUTat10.1)
31% identity, 74% coverage

WP_019069277 aldehyde dehydrogenase from Streptomyces hokutonensis
35% identity, 66% coverage

WP_059082879 aldehyde dehydrogenase from Streptomyces scabiei
35% identity, 66% coverage

WP_055635242 aldehyde dehydrogenase from Streptomyces griseoruber
35% identity, 66% coverage

U3IM27 Aldehyde dehydrogenase, mitochondrial from Anas platyrhynchos platyrhynchos
30% identity, 94% coverage

RHA1_RS29865 aldehyde dehydrogenase family protein from Rhodococcus jostii RHA1
32% identity, 79% coverage

HISP_04880 aldehyde dehydrogenase from Haloarcula hispanica N601
32% identity, 84% coverage

WP_059211011 aldehyde dehydrogenase from Streptomyces canus
35% identity, 66% coverage

KW89_2719 aldehyde dehydrogenase family protein from Piscirickettsia salmonis
31% identity, 87% coverage

WP_005473480 aldehyde dehydrogenase from Streptomyces bottropensis ATCC 25435
34% identity, 66% coverage

WP_057613570 aldehyde dehydrogenase from Streptomyces sp. Root369
34% identity, 66% coverage

ligV / A2PZP3 vanillin dehydrogenase monomer (EC 1.2.1.67) from Sphingomonas paucimobilis (see 2 papers)
A2PZP3 vanillin dehydrogenase (EC 1.2.1.67) from Sphingomonas paucimobilis (see paper)
G2IMC6 vanillin dehydrogenase (EC 1.2.1.67) from Sphingobium sp. SYK-6 (see paper)
31% identity, 84% coverage

BMEI0024 L-SORBOSONE DEHYDROGENASE, NAD(P) DEPENDENT from Brucella melitensis 16M
33% identity, 65% coverage

P27463 retinal dehydrogenase (EC 1.2.1.36) from Gallus gallus (see 2 papers)
30% identity, 84% coverage

AL1A1_RABIT / Q8MI17 Aldehyde dehydrogenase 1A1; 3-deoxyglucosone dehydrogenase; ALDH-E1; ALHDII; Aldehyde dehydrogenase family 1 member A1; Aldehyde dehydrogenase, cytosolic; Retinal dehydrogenase 1; RALDH 1; RalDH1; EC 1.2.1.19; EC 1.2.1.28; EC 1.2.1.3; EC 1.2.1.36 from Oryctolagus cuniculus (Rabbit) (see paper)
30% identity, 86% coverage

Q9DD46 retinal dehydrogenase (EC 1.2.1.36) from Gallus gallus (see paper)
NP_990000 aldehyde dehydrogenase family 1 member A3 from Gallus gallus
30% identity, 83% coverage

WP_037697438 aldehyde dehydrogenase from Streptomyces scabiei
34% identity, 66% coverage

Q0QHK6 1-pyrroline-5-carboxylate dehydrogenase 2 from Glossina morsitans morsitans
33% identity, 63% coverage

Tery_2599 Aldehyde dehydrogenase (NAD+) from Trichodesmium erythraeum IMS101
31% identity, 82% coverage

WP_062046703 aldehyde dehydrogenase from Streptomyces canus
34% identity, 66% coverage

Q1LBV2 Aldehyde dehydrogenase from Cupriavidus metallidurans (strain ATCC 43123 / DSM 2839 / NBRC 102507 / CH34)
29% identity, 89% coverage

Q6TH48 Aldehyde dehydrogenase, mitochondrial from Danio rerio
31% identity, 79% coverage

CwatDRAFT_0842 Aldehyde dehydrogenase (NAD+) from Crocosphaera watsonii WH 8501
30% identity, 82% coverage

AL1A3_MOUSE / Q9JHW9 Retinaldehyde dehydrogenase 3; RALDH-3; RalDH3; Aldehyde dehydrogenase 6; Aldehyde dehydrogenase family 1 member A3; Aldh1a3; EC 1.2.1.36 from Mus musculus (Mouse) (see 7 papers)
Q9JHW9 retinal dehydrogenase (EC 1.2.1.36); aldehyde dehydrogenase [NAD(P)+] (EC 1.2.1.5) from Mus musculus (see 4 papers)
Q3UIA4 Aldehyde dehydrogenase from Mus musculus
NP_444310 retinaldehyde dehydrogenase 3 from Mus musculus
31% identity, 79% coverage

7a6qA / P47895 Crystal structure of human aldehyde dehydrogenase 1a3 in complex with selective nr6 inhibitor compound (see paper)
30% identity, 83% coverage

NP_695212 retinaldehyde dehydrogenase 3 from Rattus norvegicus
30% identity, 79% coverage

Q8K4D8 retinal dehydrogenase (EC 1.2.1.36) from Rattus norvegicus (see paper)
30% identity, 79% coverage

ALDH1A3 / P47895 retinal dehydrogenase 3 (EC 1.2.1.36) from Homo sapiens (see 3 papers)
AL1A3_HUMAN / P47895 Retinaldehyde dehydrogenase 3; RALDH-3; RalDH3; Aldehyde dehydrogenase 6; Aldehyde dehydrogenase family 1 member A3; ALDH1A3; EC 1.2.1.36 from Homo sapiens (Human) (see 9 papers)
P47895 retinal dehydrogenase (EC 1.2.1.36); aldehyde dehydrogenase [NAD(P)+] (EC 1.2.1.5) from Homo sapiens (see 6 papers)
NP_000684 retinaldehyde dehydrogenase 3 isoform 1 from Homo sapiens
30% identity, 79% coverage

SXYL_00108 NAD-dependent succinate-semialdehyde dehydrogenase from Staphylococcus xylosus
30% identity, 89% coverage

U3IF86 Aldehyde dehydrogenase from Anas platyrhynchos platyrhynchos
30% identity, 83% coverage

Q28EU7 Aldehyde dehydrogenase, mitochondrial from Xenopus tropicalis
30% identity, 78% coverage

A9EEP5 Aldehyde dehydrogenase family 1 subfamily A3 from Rattus norvegicus
30% identity, 79% coverage

SSDH2_SCHPO / Q9UTM8 Putative succinate-semialdehyde dehydrogenase C139.05 [NADP(+)]; SSDH; EC 1.2.1.16 from Schizosaccharomyces pombe (strain 972 / ATCC 24843) (Fission yeast) (see paper)
SPAC139.05 succinate-semialdehyde dehydrogenase (predicted) from Schizosaccharomyces pombe
29% identity, 82% coverage

ACIAD1725 hydroxybenzaldehyde dehydrogenase from Acinetobacter sp. ADP1
33% identity, 68% coverage

Smp_022960 putative aldehyde dehydrogenase from Schistosoma mansoni
30% identity, 82% coverage

SPAC9E9.09c aldehyde dehydrogenase from Schizosaccharomyces pombe
O14293 Putative aldehyde dehydrogenase-like protein C9E9.09c from Schizosaccharomyces pombe (strain 972 / ATCC 24843)
31% identity, 80% coverage

ZP_01726360 aldehyde dehydrogenase from Cyanothece sp. CCY 0110
30% identity, 82% coverage

XP_623084 aldehyde dehydrogenase, mitochondrial from Apis mellifera
34% identity, 65% coverage

AL1A1_MOUSE / P24549 Aldehyde dehydrogenase 1A1; 3-deoxyglucosone dehydrogenase; ALDH-E1; ALHDII; Aldehyde dehydrogenase family 1 member A1; Aldehyde dehydrogenase, cytosolic; Retinal dehydrogenase 1; RALDH 1; RalDH1; EC 1.2.1.19; EC 1.2.1.28; EC 1.2.1.3; EC 1.2.1.36 from Mus musculus (Mouse) (see 4 papers)
P24549 retinal dehydrogenase (EC 1.2.1.36) from Mus musculus (see 3 papers)
NP_038495 aldehyde dehydrogenase 1A1 from Mus musculus
31% identity, 85% coverage

A1S_1110 hydroxybenzaldehyde dehydrogenase from Acinetobacter baumannii ATCC 17978
33% identity, 75% coverage

B5X2T3 Aldehyde dehydrogenase, mitochondrial from Salmo salar
30% identity, 79% coverage

XP_002295797 betaine aldehyde dehydrogenase from Thalassiosira pseudonana CCMP1335
30% identity, 80% coverage

praB / C4TP02 2-hydroxymuconate-6-semialdehyde dehydrogenase (EC 1.2.1.85) from Paenibacillus sp. JJ-1b (see paper)
praB / BAH79100.1 2-hydroxymuconate-6-semialdehyde dehydrogenase from Paenibacillus sp. JJ-1b (see paper)
27% identity, 88% coverage

NF2_RS14385 aldehyde dehydrogenase family protein from Nocardia farcinica NBRC 15532
31% identity, 82% coverage

NP_956784 aldehyde dehydrogenase 2 family member, tandem duplicate 1 from Danio rerio
Q7SXU3 Aldehyde dehydrogenase, mitochondrial from Danio rerio
31% identity, 79% coverage

MAA_02517 aldehyde dehydrogenase from Metarhizium robertsii ARSEF 23
30% identity, 82% coverage

WP_056264373 aldehyde dehydrogenase from Hydrogenophaga sp. Root209
32% identity, 68% coverage

Q4WPA5 Aldehyde dehydrogenase, putative from Aspergillus fumigatus (strain ATCC MYA-4609 / CBS 101355 / FGSC A1100 / Af293)
Afu4g08600 aldehyde dehydrogenase, putative from Aspergillus fumigatus Af293
30% identity, 73% coverage

Bphyt_4023 Aldehyde Dehydrogenase from Burkholderia phytofirmans PsJN
32% identity, 75% coverage

XP_002939310 aldehyde dehydrogenase family 1 member A3 from Xenopus tropicalis
30% identity, 84% coverage

HD73_0368 NADP-dependent succinate-semialdehyde dehydrogenase from Bacillus thuringiensis serovar kurstaki str. HD73
31% identity, 83% coverage

NCU03415 aldehyde dehydrogenase from Neurospora crassa OR74A
32% identity, 83% coverage

P81178 Aldehyde dehydrogenase, mitochondrial from Mesocricetus auratus
29% identity, 86% coverage

NP_001124747 aldehyde dehydrogenase, mitochondrial precursor from Pongo abelii
Q5RF00 Aldehyde dehydrogenase, mitochondrial from Pongo abelii
29% identity, 82% coverage

HWX41_RS06850 aldehyde dehydrogenase family protein from Bacillus paramycoides
33% identity, 68% coverage

Q94IC0 betaine-aldehyde dehydrogenase (EC 1.2.1.8) from Hordeum vulgare (see paper)
33% identity, 76% coverage

ALDH2 / P05091 mitochondrial aldehyde dehydrogenase subunit (EC 1.2.1.3; EC 1.2.1.39) from Homo sapiens (see 14 papers)
ALDH2_HUMAN / P05091 Aldehyde dehydrogenase, mitochondrial; ALDH class 2; ALDH-E2; ALDHI; EC 1.2.1.3 from Homo sapiens (Human) (see 3 papers)
P05091 aldehyde dehydrogenase (NAD+) (EC 1.2.1.3) from Homo sapiens (see 13 papers)
NP_000681 aldehyde dehydrogenase, mitochondrial isoform 1 precursor from Homo sapiens
29% identity, 82% coverage

Q6C2W9 YALI0F04444p from Yarrowia lipolytica (strain CLIB 122 / E 150)
31% identity, 78% coverage

Rmet_5544 aldehyde dehydrogenase from Ralstonia metallidurans CH34
31% identity, 67% coverage

Q53FB6 Aldehyde dehydrogenase, mitochondrial (Fragment) from Homo sapiens
29% identity, 82% coverage

Q3UJW1 Aldehyde dehydrogenase, mitochondrial from Mus musculus
29% identity, 82% coverage


For advice on how to use these tools together, see Interactive tools for functional annotation of bacterial genomes.

Statistics

The PaperBLAST database links 789,361 different protein sequences to 1,256,019 scientific articles. Searches against EuropePMC were last performed on January 10, 2025.

How It Works

PaperBLAST builds a database of protein sequences that are linked to scientific articles. These links come from automated text searches against the articles in EuropePMC and from manually-curated information from GeneRIF, UniProtKB/Swiss-Prot, BRENDA, CAZy (as made available by dbCAN), BioLiP, CharProtDB, MetaCyc, EcoCyc, TCDB, REBASE, the Fitness Browser, and a subset of the European Nucleotide Archive with the /experiment tag. Given this database and a protein sequence query, PaperBLAST uses protein-protein BLAST to find similar sequences with E < 0.001.
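The E-value filter described above can be sketched as follows. This is a minimal illustration, not PaperBLAST's own code: it assumes BLAST's standard 12-column tabular output (`-outfmt 6`), the example rows are made up, the names `Hit` and `parse_hits` are hypothetical, and computing coverage over the query length is an assumption about how the "% coverage" figures in the hit list are defined.

```python
from typing import List, NamedTuple

E_VALUE_CUTOFF = 1e-3  # threshold stated in the text (E < 0.001)


class Hit(NamedTuple):
    subject_id: str
    percent_identity: float
    query_coverage: float  # percent of the query spanned by the alignment
    e_value: float


def parse_hits(tabular_text: str, query_length: int,
               cutoff: float = E_VALUE_CUTOFF) -> List[Hit]:
    """Keep rows of BLAST tabular output whose E-value is below the cutoff.

    Columns (outfmt 6): qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    """
    hits = []
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        subject_id = fields[1]
        percent_identity = float(fields[2])
        qstart, qend = int(fields[6]), int(fields[7])
        e_value = float(fields[10])
        if e_value < cutoff:
            # Assumption: coverage is the aligned query span over query length.
            coverage = 100.0 * (qend - qstart + 1) / query_length
            hits.append(Hit(subject_id, percent_identity, coverage, e_value))
    return hits


# Made-up rows for a 436-residue query (the length of Q84DC3).
example = "\n".join([
    "Q84DC3\thitA\t33.0\t290\t180\t5\t10\t292\t20\t305\t2e-40\t150",
    "Q84DC3\thitB\t25.0\t60\t45\t0\t100\t159\t5\t64\t0.5\t30.1",
])
hits = parse_hits(example, query_length=436)
```

Only `hitA` passes the filter here, because `hitB` has an E-value of 0.5.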

To build the database, we query EuropePMC with locus tags, with RefSeq protein identifiers, and with UniProt accessions. We obtain the locus tags from RefSeq or from MicrobesOnline. We use queries of the form "locus_tag AND genus_name" to try to ensure that the paper is actually discussing that gene. Because EuropePMC indexes most recent biomedical papers, even if they are not open access, some of the links may be to papers that you cannot read or that our computers cannot read. We query each of these identifiers that appears in the open access part of EuropePMC, as well as every locus tag that appears in the 500 most-referenced genomes, so that a gene may appear in the PaperBLAST results even though none of the papers that mention it are open access. We also incorporate text-mined links from EuropePMC that link open access articles to UniProt or RefSeq identifiers. (This yields some additional links because EuropePMC uses different heuristics for their text mining than we do.)
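The query form "locus_tag AND genus_name" can be illustrated with a short sketch. The helper names `build_query` and `search_url` are hypothetical, and the use of the public EuropePMC REST search endpoint with `query` and `format` parameters is an assumption for illustration; the text does not say how PaperBLAST actually submits its queries. The sketch only builds the URL and does not contact the service.

```python
from urllib.parse import urlencode

# Assumption: the public EuropePMC REST search endpoint.
EUROPEPMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_query(locus_tag: str, genus_name: str) -> str:
    """Combine a locus tag with its genus so that hits are more likely
    to discuss the gene in the intended organism."""
    return f"{locus_tag} AND {genus_name}"


def search_url(locus_tag: str, genus_name: str) -> str:
    """Return a search URL for the combined query (no request is made)."""
    params = {"query": build_query(locus_tag, genus_name), "format": "json"}
    return f"{EUROPEPMC_SEARCH}?{urlencode(params)}"


# Example using a locus tag and genus from the hit list above.
url = search_url("BMEI0024", "Brucella")
```

Requiring the genus name alongside the locus tag is what guards against identifiers that happen to match unrelated strings in other papers.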

For every article that mentions a locus tag, a RefSeq protein identifier, or a UniProt accession, we try to select one or two snippets of text that refer to the protein. If we cannot get access to the full text, we try to select a snippet from the abstract, but unfortunately, unique identifiers such as locus tags are rarely provided in abstracts.
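One way to picture the snippet selection is the sketch below. It is an assumption-laden illustration rather than PaperBLAST's heuristic: the function name `select_snippets`, the fixed 80-character context window, and the rule of skipping overlapping windows are all hypothetical choices.

```python
import re
from typing import List


def select_snippets(text: str, identifier: str,
                    max_snippets: int = 2,
                    context_chars: int = 80) -> List[str]:
    """Return up to max_snippets windows of text around whole-token
    mentions of the identifier."""
    # Match the identifier only when it is not part of a longer token,
    # so that ACIAD1725 does not match inside ACIAD17250.
    pattern = re.compile(
        r"(?<![A-Za-z0-9_])" + re.escape(identifier) + r"(?![A-Za-z0-9_])"
    )
    snippets: List[str] = []
    last_end = -1
    for match in pattern.finditer(text):
        start = max(0, match.start() - context_chars)
        if start < last_end:
            continue  # window would overlap the previous snippet
        end = min(len(text), match.end() + context_chars)
        snippets.append(text[start:end].strip())
        last_end = end
        if len(snippets) == max_snippets:
            break
    return snippets


# Made-up article text for illustration.
article = (
    "We deleted ACIAD1725 and measured growth. " + "x" * 200
    + " ACIAD17250 is unrelated. " + "y" * 200
    + " ACIAD1725 was restored."
)
found = select_snippets(article, "ACIAD1725")
```

If only an abstract is available, the same function could be applied to the abstract text, though as noted above it will often find no mention there.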

PaperBLAST also incorporates manually-curated protein functions from the curated resources named above.

Except for GeneRIF and ENA, the curated entries include a short curated description of the protein's function. For entries from BioLiP, the protein's function may not be known beyond binding to the ligand. Many of these entries also link to articles in PubMed.

For more information see the PaperBLAST paper (mSystems 2017) or the code. You can download PaperBLAST's database here.

PaperBLAST has changed since the paper was written. Many of these changes are described in Interactive tools for functional annotation of bacterial genomes.

Secrets

PaperBLAST cannot provide snippets for many of the papers that are published in non-open-access journals. This limitation applies even if the paper is marked as "free" on the publisher's web site and is available in PubMed Central or EuropePMC. If a journal that you publish in is marked as "secret," please consider publishing elsewhere.

Omissions from the PaperBLAST Database

Many important articles are missing from PaperBLAST, either because the article's full text is not in EuropePMC (as for many older articles), or because the paper does not mention a protein identifier such as a locus tag, or because of PaperBLAST's heuristics. If you notice an article that characterizes a protein's function but is missing from PaperBLAST, please notify the curators at UniProt or add an entry to GeneRIF. Entries in either of these databases will eventually be incorporated into PaperBLAST. Note that to add an entry to UniProt, you will need to find the UniProt identifier for the protein. If the protein is not already in UniProt, you can ask them to create an entry. To add an entry to GeneRIF, you will need an NCBI Gene identifier, but unfortunately many prokaryotic proteins in RefSeq do not have corresponding Gene identifiers.

References

PaperBLAST: Text-mining papers for information about homologs.
M. N. Price and A. P. Arkin (2017). mSystems, 10.1128/mSystems.00039-17.

Europe PMC in 2017.
M. Levchenko et al (2017). Nucleic Acids Research, 10.1093/nar/gkx1005.

Gene indexing: characterization and analysis of NLM's GeneRIFs.
J. A. Mitchell et al (2003). AMIA Annu Symp Proc 2003:460-464.

UniProt: the universal protein knowledgebase.
The UniProt Consortium (2016). Nucleic Acids Research, 10.1093/nar/gkw1099.

BRENDA in 2017: new perspectives and new tools in BRENDA.
S. Placzek et al (2017). Nucleic Acids Research, 10.1093/nar/gkw952.

The EcoCyc database: reflecting new knowledge about Escherichia coli K-12.
I. M. Keseler et al (2016). Nucleic Acids Research, 10.1093/nar/gkw1003.

The MetaCyc database of metabolic pathways and enzymes.
R. Caspi et al (2018). Nucleic Acids Research, 10.1093/nar/gkx935.

CharProtDB: a database of experimentally characterized protein annotations.
R. Madupu et al (2012). Nucleic Acids Research, 10.1093/nar/gkr1133.

The carbohydrate-active enzymes database (CAZy) in 2013.
V. Lombard et al (2014). Nucleic Acids Research, 10.1093/nar/gkt1178.

The Transporter Classification Database (TCDB): recent advances.
M. H. Saier, Jr. et al (2016). Nucleic Acids Research, 10.1093/nar/gkv1103.

REBASE - a database for DNA restriction and modification: enzymes, genes and genomes.
R. J. Roberts et al (2015). Nucleic Acids Research, 10.1093/nar/gku1046.

Deep annotation of protein function across diverse bacteria from mutant phenotypes.
M. N. Price et al (2016). bioRxiv, 10.1101/072470.

by Morgan Price, Arkin group
Lawrence Berkeley National Laboratory