GapMind for Amino acid biosynthesis

 

Alignments for a candidate for ilvD in Bacillus alkalinitrilicus DSM 22532

Align dihydroxyacid dehydratase (EC 4.2.1.9) (characterized)
to candidate WP_078427776.1 BK574_RS05060 dihydroxy-acid dehydratase

Query= metacyc::MONOMER-18815
         (557 letters)



>NCBI__GCF_002019605.1:WP_078427776.1
          Length = 556

 Score =  588 bits (1516), Expect = e-172
 Identities = 293/553 (52%), Positives = 395/553 (71%), Gaps = 3/553 (0%)

Query: 5   KRSQNITQGVARSPNRSMYYALGYKKEDFDKPMVGIANGHSTITPCNAGLQRLADAAIDA 64
           K   ++   + ++PNR+M  A+G   EDF+KP VGIA+  S +TPCN  +  LA  A + 
Sbjct: 7   KSRSSVFNDINKAPNRAMIRAMGITDEDFNKPFVGIASTWSEVTPCNMHIDELARKAKEG 66

Query: 65  IKASDANPQVFGTPTISDGMSMGTEGMKYSLISREVIADCIETAAQGQWMDGVVVIGGCD 124
                  P +F T T+SDG+SMGTEG+++SL SREVIAD IET    Q  DGVV IGGCD
Sbjct: 67  TLNGGGTPFIFNTITVSDGISMGTEGIRFSLPSREVIADSIETVMGAQSYDGVVAIGGCD 126

Query: 125 KNMPGGMIALARTNVPGIYVYGGTIKPGNWKGKDLTIVSSFEAVGEFTAGRMSQEDFEGV 184
           KNMPG MIA+ R N+P ++VYGGTI+ G   GKD+ IVS+FEAVG++  G + +E    V
Sbjct: 127 KNMPGCMIAIGRLNLPAVFVYGGTIRAGKVDGKDIDIVSAFEAVGKYNNGDIDREQLHKV 186

Query: 185 EKNACPSTGSCGGMYTANTMSSSFEALGMSLLYSSTMANPDQEKVDSAAESARVLVEAIK 244
           E +ACP  GSCGGMYTANTM+S+ EA+GMSL  SS+     +EK+    ++ + ++  + 
Sbjct: 187 ECHACPGAGSCGGMYTANTMASAIEAMGMSLPGSSSNPAETKEKLQDCIDAGKAVMNLLS 246

Query: 245 QDIKPRDIITRKSIENAVALIMATGGSTNAVLHYLAIAHAAEVEWTIDDFERIRRKVPVI 304
           + I P+DI+T+K+ ENA+ ++MA GGSTNAVLH LA+AH  +VE  +DDFERIR+KVP I
Sbjct: 247 KGITPKDIMTKKAFENAITVVMALGGSTNAVLHLLALAHTIDVELELDDFERIRKKVPHI 306

Query: 305 CNLKPSGQYVATDLHKAGGIPQVMKILLKAGMLHGDCLTITGRTLAEELENVPDTPRADQ 364
            +LKPSG+YV  DL   GG+P VMK+LL  G+LHGDCLT+TG+T+ E L+ +   P  + 
Sbjct: 307 ADLKPSGKYVMEDLSLIGGVPGVMKLLLDKGLLHGDCLTVTGQTIEENLKEI--VPLKEG 364

Query: 365 DVILPIEKALYAEGHLAILKGNLAEEGAVAKITGLKNPVITGPARVFEDEQSAMEAILAD 424
             I+  E      G L +L+GNLA EGA+AK++GLK   ITGPARVF+ E+ A +A+L +
Sbjct: 365 QEIISFENPKRETGPLVVLRGNLAPEGALAKMSGLKIKEITGPARVFDTEKDATDAVLNN 424

Query: 425 KINAGDILVLRYLGPKGGPGMPEMLAPTSAIIGKGLGESVGFITDGRFSGGTWGMVVGHV 484
           KIN GD++V+RY+GPKGGPGM EML+ T+ ++GKGLGE VG ITDGRFSGGT G+VVGH+
Sbjct: 425 KINPGDVIVIRYVGPKGGPGMAEMLSITAIVVGKGLGEKVGLITDGRFSGGTHGLVVGHI 484

Query: 485 APEAYVGGTIALVQEGDSITIDAHKLLLQLNVADEELARRRANWKQPAPRYTRGVLAKFS 544
           +PEA VGG I L+QEGD ITI++    L +NV+ E+ A R  +W  P  +  +G L+K++
Sbjct: 485 SPEAQVGGPIGLIQEGDLITINSETQELTVNVSPEKFAERAKDW-TPPEQNLKGYLSKYA 543

Query: 545 KLASTASKGAVTD 557
           +L S+ASKGA+TD
Sbjct: 544 RLVSSASKGAITD 556


Lambda     K      H
   0.316    0.133    0.386 

Gapped
Lambda     K      H
   0.267   0.0410    0.140 


Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Sequences: 1
Number of Hits to DB: 951
Number of extensions: 43
Number of successful extensions: 3
Number of sequences better than 1.0e-02: 1
Number of HSP's gapped: 1
Number of HSP's successfully gapped: 1
Length of query: 557
Length of database: 556
Length adjustment: 36
Effective length of query: 521
Effective length of database: 520
Effective search space:   270920
Effective search space used:   270920
Neighboring words threshold: 11
Window for multiple hits: 40
X1: 16 ( 7.3 bits)
X2: 38 (14.6 bits)
X3: 64 (24.7 bits)
S1: 41 (21.6 bits)
S2: 53 (25.0 bits)
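
The summary figures in the BLAST report above can be re-derived from the numbers it prints. The following is a minimal sketch, not part of GapMind or BLAST: the function names are hypothetical, and it assumes that the length adjustment is subtracted once from the query length and once from the database length, which is what the reported effective lengths suggest for this single-sequence database.

```python
def percent(numerator, denominator):
    """Return numerator / denominator as a percentage."""
    return 100.0 * numerator / denominator


def coverage(start, end, length):
    """Percentage of a sequence of the given length spanned by start..end (1-based, inclusive)."""
    return percent(end - start + 1, length)


def effective_search_space(query_len, db_len, length_adjustment):
    """Product of the effective lengths (assumption: one adjustment per sequence)."""
    return (query_len - length_adjustment) * (db_len - length_adjustment)


# Values copied from the report above.
identity = percent(293, 553)          # Identities = 293/553
positives = percent(395, 553)         # Positives = 395/553
query_cov = coverage(5, 557, 557)     # Query residues 5..557 of 557
subject_cov = coverage(7, 556, 556)   # Sbjct residues 7..556 of 556
space = effective_search_space(557, 556, 36)

print(f"identity  {identity:.1f}%")
print(f"positives {positives:.1f}%")
print(f"query coverage   {query_cov:.1f}%")
print(f"subject coverage {subject_cov:.1f}%")
print(f"effective search space {space}")
```

The report prints percentages as whole numbers (52% and 71%), so the values computed here carry an extra decimal place.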

Align candidate WP_078427776.1 BK574_RS05060 (dihydroxy-acid dehydratase)
to HMM TIGR00110 (ilvD: dihydroxy-acid dehydratase (EC 4.2.1.9))

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3.1 (Jul 2020); http://hmmer.org/
# Copyright (C) 2020 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  ../tmp/path.aa/TIGR00110.hmm
# target sequence database:        /tmp/gapView.17329.genome.faa
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       TIGR00110  [M=543]
Accession:   TIGR00110
Description: ilvD: dihydroxy-acid dehydratase
Scores for complete sequences (score includes all domains):
   --- full sequence ---   --- best 1 domain ---    -#dom-
    E-value  score  bias    E-value  score  bias    exp  N  Sequence                                 Description
    ------- ------ -----    ------- ------ -----   ---- --  --------                                 -----------
   3.1e-228  744.6   7.4   3.6e-228  744.4   7.4    1.0  1  lcl|NCBI__GCF_002019605.1:WP_078427776.1  BK574_RS05060 dihydroxy-acid deh


Domain annotation for each sequence (and alignments):
>> lcl|NCBI__GCF_002019605.1:WP_078427776.1  BK574_RS05060 dihydroxy-acid dehydratase
   #    score  bias  c-Evalue  i-Evalue hmmfrom  hmm to    alifrom  ali to    envfrom  env to     acc
 ---   ------ ----- --------- --------- ------- -------    ------- -------    ------- -------    ----
   1 !  744.4   7.4  3.6e-228  3.6e-228       1     542 [.      20     556 .]      20     556 .] 0.99

  Alignments for each domain:
  == domain 1  score: 744.4 bits;  conditional E-value: 3.6e-228
                                 TIGR00110   1 aarallkatGlkdedlekPiiavvnsyteivPghvhlkdlaklvkeeieaaGgvakefntiavsDGiam 69 
                                               ++ra+++a+G++ded++kP+++++++++e++P+++h+++la+++ke+  + Gg+++ fnti+vsDGi+m
  lcl|NCBI__GCF_002019605.1:WP_078427776.1  20 PNRAMIRAMGITDEDFNKPFVGIASTWSEVTPCNMHIDELARKAKEGTLNGGGTPFIFNTITVSDGISM 88 
                                               68******************************************************************* PP

                                 TIGR00110  70 gheGmkysLpsreiiaDsvetvvkahalDalvvissCDkivPGmlmaalrlniPaivvsGGpmeagktk 138
                                               g+eG+++sLpsre+iaDs+etv+ a+ +D++v+i+ CDk++PG ++a  rln+Pa++v+GG++ agk+ 
  lcl|NCBI__GCF_002019605.1:WP_078427776.1  89 GTEGIRFSLPSREVIADSIETVMGAQSYDGVVAIGGCDKNMPGCMIAIGRLNLPAVFVYGGTIRAGKVD 157
                                               ********************************************************************* PP

                                 TIGR00110 139 lsekidlvdvfeavgeyaagklseeeleeiersacPtagsCsGlftansmacltealGlslPgsstlla 207
                                                +++id+v++feavg+y++g+++ e+l+++e +acP+agsC+G++tan+ma++ ea+G+slPgss+ +a
  lcl|NCBI__GCF_002019605.1:WP_078427776.1 158 -GKDIDIVSAFEAVGKYNNGDIDREQLHKVECHACPGAGSCGGMYTANTMASAIEAMGMSLPGSSSNPA 225
                                               .9******************************************************************* PP

                                 TIGR00110 208 tsaekkelakksgkrivelvkknikPrdiltkeafenaitldlalGGstntvLhllaiakeagvklsld 276
                                                ++ek++ +  +gk +++l+ k i+P+di+tk+afenait+++alGGstn+vLhlla+a++++v+l+ld
  lcl|NCBI__GCF_002019605.1:WP_078427776.1 226 ETKEKLQDCIDAGKAVMNLLSKGITPKDIMTKKAFENAITVVMALGGSTNAVLHLLALAHTIDVELELD 294
                                               ********************************************************************* PP

                                 TIGR00110 277 dfdrlsrkvPllaklkPsgkkviedlhraGGvsavlkeldkegllhkdaltvtGktlaetlekvkvlrv 345
                                               df+r+++kvP++a+lkPsgk+v+edl   GGv++v+k l  +gllh d+ltvtG+t++e+l+++  l++
  lcl|NCBI__GCF_002019605.1:WP_078427776.1 295 DFERIRKKVPHIADLKPSGKYVMEDLSLIGGVPGVMKLLLDKGLLHGDCLTVTGQTIEENLKEIVPLKE 363
                                               ******************************************************************999 PP

                                 TIGR00110 346 dqdvirsldnpvkkegglavLkGnlaeeGavvkiagveedilkfeGpakvfeseeealeailggkvkeG 414
                                               +q++i s +np +++g l vL+Gnla+eGa++k++g +  i +++Gpa+vf+ e++a +a+l+ k++ G
  lcl|NCBI__GCF_002019605.1:WP_078427776.1 364 GQEII-SFENPKRETGPLVVLRGNLAPEGALAKMSGLK--IKEITGPARVFDTEKDATDAVLNNKINPG 429
                                               99998.79****************************96..59*************************** PP

                                 TIGR00110 415 dvvviryeGPkGgPGmremLaPtsalvglGLgkkvaLitDGrfsGgtrGlsiGhvsPeaaegGaialve 483
                                               dv+viry GPkGgPGm emL  t+ +vg GLg+kv+LitDGrfsGgt+Gl++Gh+sPea +gG+i+l++
  lcl|NCBI__GCF_002019605.1:WP_078427776.1 430 DVIVIRYVGPKGGPGMAEMLSITAIVVGKGLGEKVGLITDGRFSGGTHGLVVGHISPEAQVGGPIGLIQ 498
                                               ********************************************************************* PP

                                 TIGR00110 484 dGDkikiDienrkldlevseeelaerrakakkkearevkgaLakyaklvssadkGavld 542
                                               +GD i+i+ e ++l ++vs e +aer ++++++e ++ kg+L+kya+lvssa+kGa++d
  lcl|NCBI__GCF_002019605.1:WP_078427776.1 499 EGDLITINSETQELTVNVSPEKFAERAKDWTPPE-QNLKGYLSKYARLVSSASKGAITD 556
                                               ********************************99.88*******************986 PP



Internal pipeline statistics summary:
-------------------------------------
Query model(s):                            1  (543 nodes)
Target sequences:                          1  (556 residues searched)
Passed MSV filter:                         1  (1); expected 0.0 (0.02)
Passed bias filter:                        1  (1); expected 0.0 (0.02)
Passed Vit filter:                         1  (1); expected 0.0 (0.001)
Passed Fwd filter:                         1  (1); expected 0.0 (1e-05)
Initial search space (Z):                  1  [actual number of targets]
Domain search space  (domZ):               1  [number of targets reported over threshold]
# CPU time: 0.03u 0.01s 00:00:00.04 Elapsed: 00:00:00.02
# Mc/sec: 10.74
//
[ok]
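
To read the per-domain table in the hmmsearch output above programmatically, the row can be split on whitespace. The following is a rough sketch with hypothetical names; it assumes the column order shown in the header line of this report (score, bias, c-Evalue, i-Evalue, hmm from/to, ali from/to, env from/to, acc) and takes the model length from the `[M=543]` field of the Query line.

```python
# The single domain row from the report above, copied verbatim.
DOMAIN_ROW = (
    "   1 !  744.4   7.4  3.6e-228  3.6e-228       1     542 [.      20     556 .]"
    "      20     556 .] 0.99"
)


def parse_domain_row(line):
    """Split one hmmsearch domain row into named fields (column order assumed from the header)."""
    f = line.split()
    return {
        "score": float(f[2]),
        "bias": float(f[3]),
        "c_evalue": float(f[4]),
        "i_evalue": float(f[5]),
        "hmm_from": int(f[6]),
        "hmm_to": int(f[7]),
        "ali_from": int(f[9]),
        "ali_to": int(f[10]),
        "env_from": int(f[12]),
        "env_to": int(f[13]),
        "acc": float(f[15]),
    }


row = parse_domain_row(DOMAIN_ROW)

MODEL_LENGTH = 543     # from "Query: TIGR00110 [M=543]"
SEQUENCE_LENGTH = 556  # from "556 residues searched"

# Fraction of the model and of the candidate protein covered by the aligned region.
hmm_coverage = 100.0 * (row["hmm_to"] - row["hmm_from"] + 1) / MODEL_LENGTH
seq_coverage = 100.0 * (row["ali_to"] - row["ali_from"] + 1) / SEQUENCE_LENGTH

print(f"score {row['score']} bits, HMM coverage {hmm_coverage:.1f}%, "
      f"sequence coverage {seq_coverage:.1f}%")
```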

This GapMind analysis is from Apr 10 2024. The underlying query database was built on Apr 09 2024.

About GapMind

Each pathway is defined by a set of rules based on individual steps or genes. Candidates for each step are identified by using ublast (a fast alternative to protein BLAST) against a database of manually curated proteins (most of which are experimentally characterized) or by using HMMER with enzyme models (usually from TIGRFAMs). Ublast hits may be split across two different proteins.

A candidate for a step is "high confidence" if either:

where "other" refers to the best ublast hit to a sequence that is not annotated as performing this step (and is not "ignored").

Otherwise, a candidate is "medium confidence" if either:

Other BLAST hits with at least 50% coverage are "low confidence."

Steps with no high- or medium-confidence candidates may be considered "gaps." For the typical bacterium that can make all 20 amino acids, there are 1-2 gaps in amino acid biosynthesis pathways. For diverse bacteria and archaea that can utilize a carbon source, there is a complete high-confidence catabolic pathway (including a transporter) just 38% of the time, and there is a complete medium-confidence pathway 63% of the time. Gaps may be due to:

GapMind relies on the predicted proteins in the genome and does not search the six-frame translation. In most cases, you can search the six-frame translation by clicking on links to Curated BLAST for each step definition (in the per-step page).

For more information, see:

If you notice any errors or omissions in the step descriptions, or any questionable results, please let us know.

by Morgan Price, Arkin group, Lawrence Berkeley National Laboratory