Align acetaldehyde dehydrogenase (EC 1.2.1.3) (characterized)
to candidate WP_011842384.1 RSPH17029_RS17490 aldehyde dehydrogenase family protein
Query= reanno::Burk376:H281DRAFT_01117 (795 letters)

>NCBI__GCF_000015985.1:WP_011842384.1
Length = 791

Score = 831 bits (2146), Expect = 0.0
Identities = 445/799 (55%), Positives = 543/799 (67%), Gaps = 14/799 (1%)

Query: 1   MSVAEYFSSMDYGPAPEDDQPARQWLAQHEARFGHFIGGAWHAPASGAQFVSHAPASGER 60
           MS+ +   SMDYGPAPE    A+ WLA      GH+I GA+    + A  V + PA+GE
Sbjct: 1   MSIKDIMESMDYGPAPEAATDAKAWLAARGHALGHYIDGAFTGAETAAIEVEN-PATGEI 59

Query: 61  LADIAQGDAADIDAALAAARAAQPGWLALGGKGRARHLYALARMVQRHSRLFAVLEALDN 120
           LA I     A+I+AA+AAARAA  GW  L G  RAR+LYA+AR +Q+  R F+VLE LDN
Sbjct: 60  LARIPAAGEAEIEAAVAAARAAFSGWSQLPGFERARYLYAIARGLQKRERFFSVLETLDN 119

Query: 121 GKPIRETRDLDVPLVARHFLHHAGWAQLQDSEFADHAPLGVIGQIVPWNFPLLMLAWKIA 180
           GK IRETR  DVPL  RHF HHAGWA +   EF  H PLGV GQ++PWNFP+LMLAWKIA
Sbjct: 120 GKAIRETRTADVPLAIRHFYHHAGWAAVLGEEFPGHEPLGVCGQVIPWNFPMLMLAWKIA 179

Query: 181 PAIATGNCVVLKPAEYTPLTALLFAELAHQAGLPAGVLNVVTGDGSTGAALVEHPQVDKI 240
           PA+A GN VVLKPA+ TPLTA+ FAE+  + GLP GV+N+V G   TGA LV HP V K+
Sbjct: 180 PALAAGNTVVLKPADLTPLTAVAFAEMLDEIGLPRGVVNIVHGGAETGALLVRHPGVAKV 239

Query: 241 AFTGSTEVGKLIRSVTAGSGKSLTLELGGKSPFIVFDDADLDGAVEGVVDAIWFNQGQVC 300
           AFTGST VG+ IR  TAGSGKSLTLELGGKSPF+V  DADLD AVEGVV+ +WFNQG+VC
Sbjct: 240 AFTGSTAVGREIRRATAGSGKSLTLELGGKSPFVVCADADLDAAVEGVVEGVWFNQGEVC 299

Query: 301 CAGSRLLVQEGIEARFIAKLKRRMETLRVGPSLDKSIDMGAIVDPVQLERIHSLVETGRR 360
           CAGSRLL+QEGI  RF+AKL+ RME +RVG  LDKS DMGAIV   Q  RI  L+    R
Sbjct: 300 CAGSRLLLQEGIAERFLAKLRARMEKIRVGDPLDKSTDMGAIVSARQKARIEELIAGAAR 359

Query: 361 EGCAIWQAADTPLPANGCFYPPTLVTNVAPASTLAQEEIFGPVLVTMSFRTPDEAIALAN 420
           EG  + QAA  PLPA G F  P    +  PA+T+AQ EIFGP+ VT +FRT DEA+ALAN
Sbjct: 360 EGYRLEQAA-CPLPAAGHFVAPGFFADTEPAATVAQVEIFGPIAVTTTFRTVDEAVALAN 418

Query: 421 NSRYGLAASVWSETIGRALDVAPRLAAGVVWVNATNLFDAAVGFGGYRESGYGREGGREG 480
           N+ YGLAASVWSE I  A ++A R+ AGVVW+NA+NLFDA   FGG +ESG+GREG REG
Sbjct: 419 NTPYGLAASVWSENINAATELAARIRAGVVWINASNLFDAGASFGGMKESGFGREGAREG 478

Query: 481 IHEYLKPRAWLNLPKRQP---VSAATNASDDRQVSNLA-LVDRTAKLFIGGKQVRPDSGY 536
           +  YL+PR       R P   V+     +      NL+ L+DRT K +IGG QVRPD G
Sbjct: 479 LGAYLRPRT-----PRGPEALVAPVDFTAHTGMGGNLSGLIDRTMKNYIGGAQVRPDGGA 533

Query: 537 SLPVHAPDGTRVGEVGEGNRKDIRNAVQAARAAQKWSQASTHNRAQVLFYLAENLAVRAD 596
           S  V  P G  +G      RKDIRNAV+AA  A+ W+ A+ H RAQVLF+LAEN+A RA+
Sbjct: 534 SYVVRGPKGEALGLAPVSGRKDIRNAVEAALKAKGWA-ANAHGRAQVLFFLAENIAARAE 592

Query: 597 EFAHQLTVRNGATDAAAHAEVEASVTRLFTYAAWADKFDGAVHTPPLRGVALAMHEPLGV 656
           + A  L V+ GA  + A AEV + + R+F YA  ADK DG +H    R + L++ EPLGV
Sbjct: 593 DLAAAL-VQGGAGRSEAAAEVRSLIERVFFYAGMADKDDGRIHATKPRHLTLSVKEPLGV 651

Query: 657 IGIACPDEAPLLAFVSLAAPALAMGNRVVVLPGEACPLAVTDFYQVVETSDVPGGVVNIV 716
           +G+  PDEAPLL+ +SL  P +A GNRVV +P  A  L      Q+ +TSD+PGGVVN+V
Sbjct: 652 VGVLAPDEAPLLSLMSLILPLIAAGNRVVAVPSPAQALLAQPLTQIFDTSDLPGGVVNLV 711

Query: 717 TGKREALLPALARHDDVDAVWCFGSAADATLIERESVGNLKRTFTDYGRQFDW-FDRASE 775
           TG R  L   LA HD VD +W  GSA  A  +E  S GNLK+ +T+ GR  DW  D  +
Sbjct: 712 TGDRNLLARTLAEHDAVDGIWYHGSAKGAAEVEALSAGNLKQVWTNGGRALDWNADAVAC 771

Query: 776 GLPFLRQAVQVKNIWIPYG 794
           G  +L +A Q+K IW+PYG
Sbjct: 772 GRSWLDRATQIKTIWVPYG 790

Lambda     K      H
   0.320    0.135    0.411

Gapped
Lambda     K      H
   0.267   0.0410    0.140

Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Sequences: 1
Number of Hits to DB: 1718
Number of extensions: 76
Number of successful extensions: 10
Number of sequences better than 1.0e-02: 1
Number of HSP's gapped: 1
Number of HSP's successfully gapped: 1
Length of query: 795
Length of database: 791
Length adjustment: 41
Effective length of query: 754
Effective length of database: 750
Effective search space: 565500
Effective search space used: 565500
Neighboring words threshold: 11
Window for multiple hits: 40
X1: 16 ( 7.4 bits)
X2: 38 (14.6 bits)
X3: 64 (24.7 bits)
S1: 41 (21.8 bits)
S2: 55 (25.8 bits)
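As a sanity check, the summary numbers in the report above can be re-derived from one another. The sketch below is only illustrative: it uses the standard bit-score conversion S' = (lambda * S - ln K) / ln 2 with the gapped parameters printed in the report, and it assumes that percentages are truncated rather than rounded (which matches the 55%, 67% and 1% shown). The coverage values assume coverage means aligned span divided by full sequence length; the function and variable names are made up for this example.

```python
import math

# Values copied from the alignment report above.
raw_score = 2146
gapped_lambda, gapped_k = 0.267, 0.0410
query_len, subject_len, length_adjustment = 795, 791, 41
alignment_len, identities, positives, gaps = 799, 445, 543, 14
query_span = (1, 794)      # first and last aligned query positions
subject_span = (1, 790)    # first and last aligned subject positions


def bit_score(raw: int, lam: float, k: float) -> float:
    """Convert a raw score to bits: (lambda * S - ln K) / ln 2."""
    return (lam * raw - math.log(k)) / math.log(2)


def truncated_percent(count: int, total: int) -> int:
    """Integer percentage, truncated (assumed to match the report)."""
    return 100 * count // total


def coverage(span: tuple, full_length: int) -> float:
    """Fraction of the sequence covered by the aligned span (assumed definition)."""
    start, end = span
    return (end - start + 1) / full_length


bits = bit_score(raw_score, gapped_lambda, gapped_k)
effective_search_space = (query_len - length_adjustment) * (
    subject_len - length_adjustment
)

print(f"bit score          ~ {bits:.1f}")
print(f"identity           = {truncated_percent(identities, alignment_len)}%")
print(f"positives          = {truncated_percent(positives, alignment_len)}%")
print(f"gaps               = {truncated_percent(gaps, alignment_len)}%")
print(f"query coverage     = {coverage(query_span, query_len):.3f}")
print(f"subject coverage   = {coverage(subject_span, subject_len):.3f}")
print(f"eff. search space  = {effective_search_space}")
```

Because lambda and K are printed to only three significant digits, the recomputed bit score agrees with the reported 831 bits only to within about a bit.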
This GapMind analysis is from Apr 10 2024. The underlying query database was built on Sep 17 2021.
Each pathway is defined by a set of rules based on individual steps or genes. Candidates for each step are identified by using ublast (a fast alternative to protein BLAST) against a database of manually-curated proteins (most of which are experimentally characterized) or by using HMMer with enzyme models (usually from TIGRFam). Ublast hits may be split across two different proteins.
A candidate for a step is "high confidence" if either:
Otherwise, a candidate is "medium confidence" if either:
Other blast hits with at least 50% coverage are "low confidence."
Steps with no high- or medium-confidence candidates may be considered "gaps." For the typical bacterium that can make all 20 amino acids, there are 1-2 gaps in amino acid biosynthesis pathways. For diverse bacteria and archaea that can utilize a carbon source, there is a complete high-confidence catabolic pathway (including a transporter) just 38% of the time, and there is a complete medium-confidence pathway 63% of the time. Gaps may be due to:
GapMind relies on the predicted proteins in the genome and does not search the six-frame translation. In most cases, you can search the six-frame translation by clicking on links to Curated BLAST for each step definition (in the per-step page).
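The gap rule stated earlier (a step with no high- or medium-confidence candidate may be considered a gap) and the low-confidence rule (other hits with at least 50% coverage) can be sketched as below. This is not GapMind's code: the step names, the input layout and the function names are hypothetical, and the high- and medium-confidence criteria are taken as already-assigned labels rather than computed here.

```python
from typing import Dict, List, Optional


def label_other_hit(coverage: float) -> Optional[str]:
    """Label a blast hit that is neither high nor medium confidence.

    Hits with at least 50% coverage are "low"; anything shorter is
    not treated as a candidate at all in this sketch.
    """
    return "low" if coverage >= 0.5 else None


def find_gaps(candidates_by_step: Dict[str, List[str]]) -> List[str]:
    """Return the steps that have no high- or medium-confidence candidate."""
    return [
        step
        for step, labels in candidates_by_step.items()
        if not any(label in ("high", "medium") for label in labels)
    ]


# Hypothetical pathway with three steps and their candidates' labels.
example_pathway = {
    "stepA": ["high", "low"],
    "stepB": ["low"],
    "stepC": [],
}

gap_steps = find_gaps(example_pathway)
print(gap_steps)
```

A step with only low-confidence candidates (stepB) is reported as a gap in the same way as a step with no candidates at all (stepC).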
For more information, see:
If you notice any errors or omissions in the step descriptions, or any questionable results, please let us know.
by Morgan Price, Arkin group, Lawrence Berkeley National Laboratory