GapMind for Amino acid biosynthesis

 

Alignments for a candidate for ilvD in Haloferax volcanii DS2

Align Dihydroxy-acid dehydratase; DAD; EC 4.2.1.9 (characterized)
to candidate WP_004042767.1 C498_RS08250 dihydroxy-acid dehydratase

Query= SwissProt::P9WKJ5
         (575 letters)



>NCBI__GCF_000337315.1:WP_004042767.1
          Length = 584

 Score =  561 bits (1446), Expect = e-164
 Identities = 295/575 (51%), Positives = 391/575 (68%), Gaps = 8/575 (1%)

Query: 2   PQTTDEAASVSTVADIKPRSRDVTDGLEKAAARGMLRAVGMDDEDFAKPQIGVASSWNEI 61
           P+  D     S+  D   RS +VT+G +KA  R M RA+G DDEDF+ P +GV +   +I
Sbjct: 6   PREEDPDDVFSSGKDPNLRSTEVTEGPDKAPHRAMFRAMGFDDEDFSSPMVGVPNPAADI 65

Query: 62  TPCNLSLDRLANAVKEGVFSAGGYPLEFGTISVSDGISMGHEGMHFSLVSREVIADSVEV 121
           TPCN+ LD +A+A  EG+ +AGG P+EFGT+++SD ISMG EGM  SL+SREVIADSVE+
Sbjct: 66  TPCNVHLDDVADAAIEGIDAAGGMPIEFGTVTISDAISMGTEGMKASLISREVIADSVEL 125

Query: 122 VMQAERLDGSVLLAGCDKSLPGMLMAAARLDLAAVFLYAGSILPGRAKLSDGSERDVTII 181
           V   ER+D  V +AGCDK+LPGM+MAA R DL +VFLY GSI+PG+ +      R+VT+ 
Sbjct: 126 VSFGERMDALVTVAGCDKNLPGMMMAAIRTDLPSVFLYGGSIMPGQHE-----GREVTVQ 180

Query: 182 DAFEAVGACSRGLMSRADVDAIERAICPGEGACGGMYTANTMASAAEALGMSLPGSAAPP 241
           + FE VG  + G MS  ++D +ER  CPG G+CGGM+TANTMAS +EALGM+  GSA+ P
Sbjct: 181 NVFEGVGTYAEGDMSADELDDLERHACPGAGSCGGMFTANTMASISEALGMAPLGSASAP 240

Query: 242 ATDRRRDGFARRSGQAVVELLRRGITARDILTKEAFENAIAVVMAFGGSTNAVLHLLAIA 301
           A    R   ARR+G+ V++ +       DILTK++FENAI + +A GGSTNAVLHLLA+A
Sbjct: 241 AESDERYENARRAGEVVLDCVENDRRPSDILTKKSFENAITLQVATGGSTNAVLHLLALA 300

Query: 302 HEANVALSLQDFSRIGSGVPHLADVKPFGRHVMSDVDHIGGVPVVMKALLDAGLLHGDCL 361
            EA V L +++F+ I    P +A+++P G  VM+D+  IGG+PVV++ L++AGL HGD +
Sbjct: 301 AEAGVDLDIEEFNEISRRTPKIANLQPGGTRVMNDLHEIGGIPVVIRRLVEAGLFHGDAM 360

Query: 362 TVTGHTMAENLAAITPPDPDG---KVLRALANPIHPSGGITILHGSLAPEGAVVKTAGFD 418
           TVTG T+AE L  +  PD DG     L  +  P    G I IL G+LAP+GAV+K  G D
Sbjct: 361 TVTGRTIAEELDHLDLPDDDGLEADFLYTVDEPYQDEGAIKILTGNLAPDGAVLKVTGDD 420

Query: 419 SDVFEGTARVFDGERAALDALEDGTITVGDAVVIRYEGPKGGPGMREMLAITGAIKGAGL 478
           +    G ARVF+ E  A+  +++G I  GD + IR EGP+GGPGMREML +T A+ G G 
Sbjct: 421 AFHHTGPARVFENEEDAMRYVQEGHIEEGDVIAIRNEGPRGGPGMREMLGVTAAVVGQGH 480

Query: 479 GKDVLLLTDGRFSGGTTGLCVGHIAPEAVDGGPIALLRNGDRIRLDVAGRVLDVLADPAE 538
             DV LLTDGRFSG T G  VGH+APEA +GGPI LL +GD + +D+  R L V     E
Sbjct: 481 EDDVALLTDGRFSGATRGPMVGHVAPEAAEGGPIGLLEDGDEVTVDIPNRELSVDLSDEE 540

Query: 539 FASRQQDFSPPPPRYTTGVLSKYVKLVSSAAVGAV 573
             +R++D+ P PP YT+GVL+KY +   SAA GAV
Sbjct: 541 LEARKEDWEPKPPAYTSGVLAKYARDFGSAANGAV 575
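
The percent identity and coverage of this alignment can be recomputed from the summary line and the first and last coordinates shown above. The following is a minimal sketch, not part of GapMind; the helper names `parse_blast_summary` and `coverage` are hypothetical, and the coordinates are copied by hand from the alignment.

```python
import re

# Summary line copied from the BLAST report above.
summary = "Identities = 295/575 (51%), Positives = 391/575 (68%), Gaps = 8/575 (1%)"

def parse_blast_summary(line):
    """Return e.g. {'Identities': (295, 575), ...} from a BLAST summary line."""
    return {name: (int(num), int(den))
            for name, num, den in re.findall(r"(\w+) = (\d+)/(\d+)", line)}

def coverage(start, end, length):
    """Percent of a sequence of the given length spanned by start..end (1-based, inclusive)."""
    return 100.0 * (end - start + 1) / length

counts = parse_blast_summary(summary)
identical, alignment_columns = counts["Identities"]
percent_identity = 100.0 * identical / alignment_columns

# Query: residues 2-573 of 575; Sbjct: residues 6-575 of 584.
query_coverage = coverage(2, 573, 575)
subject_coverage = coverage(6, 575, 584)

print(f"identity {percent_identity:.1f}%")        # identity 51.3%
print(f"query coverage {query_coverage:.1f}%")    # query coverage 99.5%
print(f"subject coverage {subject_coverage:.1f}%")  # subject coverage 97.6%
```

Note that the denominator of the identity fraction (575) is the number of alignment columns, which here happens to equal the query length: 572 aligned query residues plus 3 gap positions in the query.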


Lambda     K      H
   0.318    0.136    0.393 

Gapped
Lambda     K      H
   0.267   0.0410    0.140 


Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Sequences: 1
Number of Hits to DB: 1066
Number of extensions: 54
Number of successful extensions: 5
Number of sequences better than 1.0e-02: 1
Number of HSP's gapped: 1
Number of HSP's successfully gapped: 1
Length of query: 575
Length of database: 584
Length adjustment: 36
Effective length of query: 539
Effective length of database: 548
Effective search space:   295372
Effective search space used:   295372
Neighboring words threshold: 11
Window for multiple hits: 40
X1: 16 ( 7.3 bits)
X2: 38 (14.6 bits)
X3: 64 (24.7 bits)
S1: 41 (21.7 bits)
S2: 53 (25.0 bits)
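
The bit score and E-value reported above can be checked against the statistics in this footer using the standard Karlin-Altschul relations, bits = (Lambda * S - ln K) / ln 2 and E = (effective search space) * 2^(-bits). The sketch below assumes that the gapped Lambda and K are the ones that apply to this gapped alignment; the variable names are illustrative only.

```python
import math

# Values copied from the BLAST report above.
gapped_lambda, gapped_k = 0.267, 0.0410
raw_score = 1446
query_len, db_len, length_adjustment = 575, 584, 36

# Raw score -> bit score.
bit_score = (gapped_lambda * raw_score - math.log(gapped_k)) / math.log(2)

# Effective search space = effective query length * effective database length.
search_space = (query_len - length_adjustment) * (db_len - length_adjustment)

# Work in log10 so that the tiny E-value stays readable.
log10_evalue = math.log10(search_space) - bit_score * math.log10(2)

print(f"bit score ~ {bit_score:.1f}")
print(f"effective search space = {search_space}")
print(f"log10(E) ~ {log10_evalue:.1f}")
```

Under these assumptions the bit score comes out between 561 and 562, the search space is 295372 as reported, and log10(E) is between -164 and -163, which agrees with the reported "561 bits" and "e-164" to within the rounding of the printed Lambda and K.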

Align candidate WP_004042767.1 C498_RS08250 (dihydroxy-acid dehydratase)
to HMM TIGR00110 (ilvD: dihydroxy-acid dehydratase (EC 4.2.1.9))

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3.1 (Jul 2020); http://hmmer.org/
# Copyright (C) 2020 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  ../tmp/path.aa/TIGR00110.hmm
# target sequence database:        /tmp/gapView.2144531.genome.faa
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       TIGR00110  [M=543]
Accession:   TIGR00110
Description: ilvD: dihydroxy-acid dehydratase
Scores for complete sequences (score includes all domains):
   --- full sequence ---   --- best 1 domain ---    -#dom-
    E-value  score  bias    E-value  score  bias    exp  N  Sequence                             Description
    ------- ------ -----    ------- ------ -----   ---- --  --------                             -----------
   4.2e-217  707.9   2.4   4.9e-217  707.7   2.4    1.0  1  NCBI__GCF_000337315.1:WP_004042767.1  


Domain annotation for each sequence (and alignments):
>> NCBI__GCF_000337315.1:WP_004042767.1  
   #    score  bias  c-Evalue  i-Evalue hmmfrom  hmm to    alifrom  ali to    envfrom  env to     acc
 ---   ------ ----- --------- --------- ------- -------    ------- -------    ------- -------    ----
   1 !  707.7   2.4  4.9e-217  4.9e-217       1     541 [.      36     576 ..      36     578 .. 0.99

  Alignments for each domain:
  == domain 1  score: 707.7 bits;  conditional E-value: 4.9e-217
                             TIGR00110   1 aarallkatGlkdedlekPiiavvnsyteivPghvhlkdlaklvkeeieaaGgvakefntiavsDGiamgheG 73 
                                           ++ra+++a+G+ ded++ P+++v n   +i+P++vhl+d+a+++ e+i+aaGg++ ef+t+++sD i+mg+eG
  NCBI__GCF_000337315.1:WP_004042767.1  36 PHRAMFRAMGFDDEDFSSPMVGVPNPAADITPCNVHLDDVADAAIEGIDAAGGMPIEFGTVTISDAISMGTEG 108
                                           69*********************************************************************** PP

                             TIGR00110  74 mkysLpsreiiaDsvetvvkahalDalvvissCDkivPGmlmaalrlniPaivvsGGpmeagktklsekidlv 146
                                           mk sL sre+iaDsve v  ++++Dalv+++ CDk++PGm+maa+r+++P+++ +GG++++g+ + ++++++ 
  NCBI__GCF_000337315.1:WP_004042767.1 109 MKASLISREVIADSVELVSFGERMDALVTVAGCDKNLPGMMMAAIRTDLPSVFLYGGSIMPGQHE-GREVTVQ 180
                                           *****************************************************************.9****** PP

                             TIGR00110 147 dvfeavgeyaagklseeeleeiersacPtagsCsGlftansmacltealGlslPgsstllatsaekkelakks 219
                                           +vfe+vg+ya+g++s +el+++er+acP+agsC+G+ftan+ma+++ealG++  gs++++a s e+ e a+++
  NCBI__GCF_000337315.1:WP_004042767.1 181 NVFEGVGTYAEGDMSADELDDLERHACPGAGSCGGMFTANTMASISEALGMAPLGSASAPAESDERYENARRA 253
                                           ************************************************************************* PP

                             TIGR00110 220 gkrivelvkknikPrdiltkeafenaitldlalGGstntvLhllaiakeagvklslddfdrlsrkvPllaklk 292
                                           g+ + + v+++ +P+diltk++fenaitl++a+GGstn+vLhlla+a+eagv+l++++f+++sr++P++a+l+
  NCBI__GCF_000337315.1:WP_004042767.1 254 GEVVLDCVENDRRPSDILTKKSFENAITLQVATGGSTNAVLHLLALAAEAGVDLDIEEFNEISRRTPKIANLQ 326
                                           ************************************************************************* PP

                             TIGR00110 293 PsgkkviedlhraGGvsavlkeldkegllhkdaltvtGktlaetlekvkvlr...vdqdvirsldnpvkkegg 362
                                           P+g +v++dlh+ GG++ v++ l ++gl+h da+tvtG+t+ae+l++ ++     ++ d + ++d+p+++eg 
  NCBI__GCF_000337315.1:WP_004042767.1 327 PGGTRVMNDLHEIGGIPVVIRRLVEAGLFHGDAMTVTGRTIAEELDHLDLPDddgLEADFLYTVDEPYQDEGA 399
                                           ***********************************************998754446678************** PP

                             TIGR00110 363 lavLkGnlaeeGavvkiagveedilkfeGpakvfeseeealeailggkvkeGdvvviryeGPkGgPGmremLa 435
                                           +++L+Gnla++Gav k++g +    +++Gpa+vfe+ee+a+  + +g+++eGdv+ ir eGP+GgPGmremL 
  NCBI__GCF_000337315.1:WP_004042767.1 400 IKILTGNLAPDGAVLKVTGDDA--FHHTGPARVFENEEDAMRYVQEGHIEEGDVIAIRNEGPRGGPGMREMLG 470
                                           *******************988..************************************************* PP

                             TIGR00110 436 PtsalvglGLgkkvaLitDGrfsGgtrGlsiGhvsPeaaegGaialvedGDkikiDienrkldlevseeelae 508
                                            t+a+vg G +++vaL+tDGrfsG+trG+++Ghv+PeaaegG+i+l+edGD++++Di+nr+l +++s+eel++
  NCBI__GCF_000337315.1:WP_004042767.1 471 VTAAVVGQGHEDDVALLTDGRFSGATRGPMVGHVAPEAAEGGPIGLLEDGDEVTVDIPNRELSVDLSDEELEA 543
                                           ************************************************************************* PP

                             TIGR00110 509 rrakakkkearevkgaLakyaklvssadkGavl 541
                                           r++++++k + +++g+Lakya+   sa +Gav+
  NCBI__GCF_000337315.1:WP_004042767.1 544 RKEDWEPKPPAYTSGVLAKYARDFGSAANGAVT 576
                                           *******************************97 PP



Internal pipeline statistics summary:
-------------------------------------
Query model(s):                            1  (543 nodes)
Target sequences:                          1  (584 residues searched)
Passed MSV filter:                         1  (1); expected 0.0 (0.02)
Passed bias filter:                        1  (1); expected 0.0 (0.02)
Passed Vit filter:                         1  (1); expected 0.0 (0.001)
Passed Fwd filter:                         1  (1); expected 0.0 (1e-05)
Initial search space (Z):                  1  [actual number of targets]
Domain search space  (domZ):               1  [number of targets reported over threshold]
# CPU time: 0.01u 0.01s 00:00:00.02 Elapsed: 00:00:00.01
# Mc/sec: 23.23
//
[ok]
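
The domain row in the hmmsearch report above can be reduced to two coverage figures: the fraction of the HMM that is aligned and the fraction of the candidate protein that is aligned. The following is a minimal sketch that parses the row by splitting on whitespace; the function name `parse_domain_row` and the dictionary keys are hypothetical, and the model length (543) and protein length (584) are copied from the report.

```python
# Domain row copied from the hmmsearch report above.
domain_row = ("   1 !  707.7   2.4  4.9e-217  4.9e-217       1     541 [.      36     576 .."
              "      36     578 .. 0.99")

def parse_domain_row(row):
    """Split one row of the per-domain table into named fields."""
    f = row.split()
    return {
        "score": float(f[2]),
        "i_evalue": float(f[5]),
        "hmm_from": int(f[6]), "hmm_to": int(f[7]),
        "ali_from": int(f[9]), "ali_to": int(f[10]),
        "env_from": int(f[12]), "env_to": int(f[13]),
        "acc": float(f[15]),
    }

dom = parse_domain_row(domain_row)
model_len, protein_len = 543, 584  # from "[M=543]" and "584 residues searched"

model_coverage = (dom["hmm_to"] - dom["hmm_from"] + 1) / model_len
protein_coverage = (dom["ali_to"] - dom["ali_from"] + 1) / protein_len

print(f"model coverage {model_coverage:.3f}")      # model coverage 0.996
print(f"protein coverage {protein_coverage:.3f}")  # protein coverage 0.926
```

Splitting on whitespace works here because the sequence name is not part of the domain row; a tabular output file would be a more robust source for batch processing.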

This GapMind analysis is from Jul 25 2024. The underlying query database was built on Jul 25 2024.

About GapMind

Each pathway is defined by a set of rules based on individual steps or genes. Candidates for each step are identified by using ublast (a fast alternative to protein BLAST) against a database of manually curated proteins (most of which are experimentally characterized) or by using HMMER with enzyme models (usually from TIGRFAMs). Ublast hits may be split across two different proteins.

A candidate for a step is "high confidence" if either:

where "other" refers to the best ublast hit to a sequence that is not annotated as performing this step (and is not "ignored").

Otherwise, a candidate is "medium confidence" if either:

Other blast hits with at least 50% coverage are "low confidence."

Steps with no high- or medium-confidence candidates may be considered "gaps." For the typical bacterium that can make all 20 amino acids, there are 1-2 gaps in amino acid biosynthesis pathways. For diverse bacteria and archaea that can utilize a carbon source, there is a complete high-confidence catabolic pathway (including a transporter) just 38% of the time, and there is a complete medium-confidence pathway 63% of the time. Gaps may be due to:

GapMind relies on the predicted proteins in the genome and does not search the six-frame translation. In most cases, you can search the six-frame translation by clicking on links to Curated BLAST for each step definition (in the per-step page).
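
To illustrate what "six-frame translation" means, here is a minimal sketch that translates a DNA string in all three reading frames on each strand. It is not the code behind Curated BLAST; it assumes the standard amino acid assignments for codons, input containing only A, C, G and T, and the function names are hypothetical.

```python
from itertools import product

# Standard codon table, built with bases in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]

def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))

def six_frame(dna):
    """Return translations keyed by frame: +1, +2, +3, -1, -2, -3."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames

frames = six_frame("ATGGCCTAA")
print(frames["+1"])  # MA*
print(frames["-1"])  # LGH
```

Stop codons are shown as `*`. A protein search against these translations can find a coding region that is absent from the predicted proteins, for example because the gene model was missed.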

For more information, see:

If you notice any errors or omissions in the step descriptions, or any questionable results, please let us know.

by Morgan Price, Arkin group, Lawrence Berkeley National Laboratory