GapMind for catabolism of small carbon sources

 

Alignments for a candidate for putA in Crocosphaera subtropica ATCC 51142

Align L-glutamate gamma-semialdehyde dehydrogenase (EC 1.2.1.88) (characterized)
to candidate WP_009544399.1 CCE_RS07540 L-glutamate gamma-semialdehyde dehydrogenase

Query= BRENDA::Q9K9B2
         (515 letters)



>NCBI__GCF_000017845.1:WP_009544399.1
          Length = 990

 Score =  494 bits (1273), Expect = e-144
 Identities = 251/509 (49%), Positives = 335/509 (65%), Gaps = 2/509 (0%)

Query: 5   YKHEPFTDFTVEANRKAFEEALGLVEKELGKEYPLIINGERVTTEDKIQSWNPARKDQLV 64
           + + P TD++ E  R+  ++AL  V+  LGK Y  +INGE V T+  I S NP++  ++V
Sbjct: 470 FVNAPDTDYSREVLREKAQQALVKVKDSLGKTYLPLINGEYVQTDVIIDSLNPSKSSEVV 529

Query: 65  GSVSKANQDLAEKAIQSADEAFQTWRNVNPEERANILVKAAAIIRRRKHEFSAWLVHEAG 124
           G +   + + AE+A+ +A EAF+ W+     ERA IL KA  ++  R+HE SAW+  E G
Sbjct: 530 GQIGLISIEQAEQALNAAKEAFKDWKKTPATERARILRKAGDLMEERRHELSAWICVEVG 589

Query: 125 KPWKEADADTAEAIDFLEYYARQMIELNRGKEILSRPGEQNRYFYTPMGVTVTISPWNFA 184
           K  ++ADA+ +EAIDF  YYA +M  L++G       GE NRY Y P G+ + ISPWNF 
Sbjct: 590 KILQQADAEVSEAIDFCRYYADEMERLDKGYNY-DVAGETNRYHYQPRGIALVISPWNFP 648

Query: 185 LAIMVGTAVAPIVTGNTVVLKPASTTPVVAAKFVEVLEDAGLPKGVINYVPGSGAEVGDY 244
            AI  G  VA +VTGN  +LKPA T+ V+AAK  E+L DAG+PKGV   VPG G++VG Y
Sbjct: 649 FAIATGMTVAALVTGNCTLLKPAETSTVIAAKIAEILVDAGIPKGVFQLVPGKGSKVGAY 708

Query: 245 LVDHPKTSLITFTGSKDVGVRLYERAAVVRPGQNHLKRVIVEMGGKDTVVVDRDADLDLA 304
           +V+HP   LI FTGS++VG R+Y  AA+++PGQ HLKRVI EMGGK+ ++VD  ADLD A
Sbjct: 709 MVNHPDVHLIAFTGSREVGCRIYADAAILQPGQKHLKRVIAEMGGKNAIIVDESADLDQA 768

Query: 305 AESILVSAFGFSGQKCSAGSRAVIHKDVYDEVLEKTVALAKNLTVGDPTNRDNYMGPVID 364
               + SAFG++GQKCSA SR ++   VYD  LE+ V   K+L VG        +GPVID
Sbjct: 769 VAGAVFSAFGYTGQKCSAASRIIVLDPVYDAFLERFVDATKSLNVGPTDEPSTQVGPVID 828

Query: 365 EKAFEKIMSYIEIGKKEGRLMTGGEGDSSTGFFIQPTIIADLDPEAVIMQEEIFGPVVAF 424
             A ++I+ YIE  K+E  L    E   + GF++ PTI  D+ P   I QEEIFGPVVA 
Sbjct: 829 ATAQKRILEYIETAKQESTLALAMEAPDN-GFYVGPTIFGDVLPNHTIAQEEIFGPVVAV 887

Query: 425 SKANDFDHALEIANNTEYGLTGAVITRNRAHIEQAKREFHVGNLYFNRNCTGAIVGYHPF 484
            +  +FD ALE+AN T+Y LTG + +R+  HIEQA++EF VGNLY NR  TGAIV   PF
Sbjct: 888 MRVKNFDEALEVANGTDYALTGGLYSRSPEHIEQAQKEFEVGNLYINRTITGAIVARQPF 947

Query: 485 GGFKMSGTDSKAGGPDYLALHMQAKTVSE 513
           GGFK+SG  SKAGGPDYL   ++ + +SE
Sbjct: 948 GGFKLSGVGSKAGGPDYLLQFLEPRHISE 976


Lambda     K      H
   0.316    0.134    0.388 

Gapped
Lambda     K      H
   0.267   0.0410    0.140 


Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Sequences: 1
Number of Hits to DB: 1163
Number of extensions: 51
Number of successful extensions: 3
Number of sequences better than 1.0e-02: 1
Number of HSP's gapped: 1
Number of HSP's successfully gapped: 1
Length of query: 515
Length of database: 990
Length adjustment: 39
Effective length of query: 476
Effective length of database: 951
Effective search space:   452676
Effective search space used:   452676
Neighboring words threshold: 11
Window for multiple hits: 40
X1: 16 ( 7.3 bits)
X2: 38 (14.6 bits)
X3: 64 (24.7 bits)
S1: 41 (21.6 bits)
S2: 55 (25.8 bits)
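
The length-adjusted figures in the statistics above can be cross-checked by hand. The following Python sketch assumes the usual BLAST convention that the length adjustment is subtracted once from the query and once per database sequence, and that the percentages on the Identities line are truncated rather than rounded. The function names are illustrative and are not part of BLAST or GapMind.

```python
def effective_search_space(query_len, db_len, length_adjustment, num_seqs=1):
    """Return (effective query length, effective db length, search space).

    Assumption: the adjustment is removed once from the query and once
    for each sequence in the database (here there is a single sequence).
    """
    eff_query = query_len - length_adjustment
    eff_db = db_len - num_seqs * length_adjustment
    return eff_query, eff_db, eff_query * eff_db


def truncated_percent(count, aligned_len):
    """Integer percentage, truncated (assumed to match the report)."""
    return 100 * count // aligned_len


# Values copied from the report above.
eff_q, eff_db, space = effective_search_space(515, 990, 39)
print(eff_q, eff_db, space)          # 476 951 452676

print(truncated_percent(251, 509))   # identities: 49
print(truncated_percent(335, 509))   # positives: 65
print(truncated_percent(2, 509))     # gaps: 0
```

With the reported lengths (515 and 990) and a length adjustment of 39, this reproduces the effective lengths 476 and 951 and the effective search space 452676.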

Align candidate WP_009544399.1 CCE_RS07540 (L-glutamate gamma-semialdehyde dehydrogenase)
to HMM TIGR01237 (pruA: putative delta-1-pyrroline-5-carboxylate dehydrogenase (EC 1.2.1.88))

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3.1 (Jul 2020); http://hmmer.org/
# Copyright (C) 2020 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  ../tmp/path.carbon/TIGR01237.hmm
# target sequence database:        /tmp/gapView.18643.genome.faa
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       TIGR01237  [M=511]
Accession:   TIGR01237
Description: D1pyr5carbox2: putative delta-1-pyrroline-5-carboxylate dehydrogenase
Scores for complete sequences (score includes all domains):
   --- full sequence ---   --- best 1 domain ---    -#dom-
    E-value  score  bias    E-value  score  bias    exp  N  Sequence                                 Description
    ------- ------ -----    ------- ------ -----   ---- --  --------                                 -----------
   1.6e-263  861.0   0.9   1.6e-263  861.0   0.9    1.6  2  lcl|NCBI__GCF_000017845.1:WP_009544399.1  CCE_RS07540 L-glutamate gamma-se


Domain annotation for each sequence (and alignments):
>> lcl|NCBI__GCF_000017845.1:WP_009544399.1  CCE_RS07540 L-glutamate gamma-semialdehyde dehydrogenase
   #    score  bias  c-Evalue  i-Evalue hmmfrom  hmm to    alifrom  ali to    envfrom  env to     acc
 ---   ------ ----- --------- --------- ------- -------    ------- -------    ------- -------    ----
   1 ?   -1.9   0.1     0.055     0.055      99     148 ..     137     186 ..     107     199 .. 0.61
   2 !  861.0   0.9  1.6e-263  1.6e-263       3     511 .]     472     978 ..     470     978 .. 0.99

  Alignments for each domain:
  == domain 1  score: -1.9 bits;  conditional E-value: 0.055
                                 TIGR01237  99 kaaailkrrrhelsallvlevGkiyaeadaevaeaidfleyyaremikla 148
                                               k+ + l++ ++ ++  l  e+  + +ea a     +d++e  a++  k++
  lcl|NCBI__GCF_000017845.1:WP_009544399.1 137 KTVERLRKEKMGFTIDLLGEAVITESEAKAYLDSYLDLMEKLADQSKKWS 186
                                               55566666666666666666666666666666666666666666666655 PP

  == domain 2  score: 861.0 bits;  conditional E-value: 1.6e-263
                                 TIGR01237   3 nepftdfadeelvqafkkalakvkellGkdyplvinGeeveteakidsinpadksevvGkvakasveda 71 
                                               n+p+td++ e l++++++al kvk+ lGk+y+++inGe+v+t+  ids+np+++sevvG+++++s+e+a
  lcl|NCBI__GCF_000017845.1:WP_009544399.1 472 NAPDTDYSREVLREKAQQALVKVKDSLGKTYLPLINGEYVQTDVIIDSLNPSKSSEVVGQIGLISIEQA 540
                                               89******************************************************************* PP

                                 TIGR01237  72 eqalqaakkafeewkktdveeraaillkaaailkrrrhelsallvlevGkiyaeadaevaeaidfleyy 140
                                               eqal+aak+af++wkkt+  era+il+ka + +++rrhelsa++++evGki+++adaev+eaidf++yy
  lcl|NCBI__GCF_000017845.1:WP_009544399.1 541 EQALNAAKEAFKDWKKTPATERARILRKAGDLMEERRHELSAWICVEVGKILQQADAEVSEAIDFCRYY 609
                                               ********************************************************************* PP

                                 TIGR01237 141 aremiklakskevlsieGeknrylyiplGvavvispwnfplailvGmtvapivtGncvvlkpaeaatvi 209
                                               a+em++l+ ++ + ++ Ge+nry y+p+G+a+vispwnfp+ai+ Gmtva++vtGnc++lkpae++tvi
  lcl|NCBI__GCF_000017845.1:WP_009544399.1 610 ADEMERLD-KGYNYDVAGETNRYHYQPRGIALVISPWNFPFAIATGMTVAALVTGNCTLLKPAETSTVI 677
                                               ********.78999******************************************************* PP

                                 TIGR01237 210 aaklveileeaGlpkGvlqfvpGkGsevGeylvdhpktrlitftGsrevGlriyedaakvqpGqkhlkr 278
                                               aak++eil +aG+pkGv+q vpGkGs+vG y+v+hp+++li+ftGsrevG+riy+daa +qpGqkhlkr
  lcl|NCBI__GCF_000017845.1:WP_009544399.1 678 AAKIAEILVDAGIPKGVFQLVPGKGSKVGAYMVNHPDVHLIAFTGSREVGCRIYADAAILQPGQKHLKR 746
                                               ********************************************************************* PP

                                 TIGR01237 279 viaelGGkdavivdesadieqavaaavtsafGfaGqkcsaasrvvvlekvydevverfveatkslkvgk 347
                                               viae+GGk+a+ivdesad++qava+av safG++Gqkcsaasr++vl++vyd+++erfv+atksl+vg+
  lcl|NCBI__GCF_000017845.1:WP_009544399.1 747 VIAEMGGKNAIIVDESADLDQAVAGAVFSAFGYTGQKCSAASRIIVLDPVYDAFLERFVDATKSLNVGP 815
                                               ********************************************************************* PP

                                 TIGR01237 348 tdeadvqvgpvidqksfdkikeyielgkaegklvlggedddskGyfikptifkdvdrkarlaqeeifGp 416
                                               tde+++qvgpvid++++++i eyie +k+e+ l+l+ ++++++G++++ptif+dv ++ ++aqeeifGp
  lcl|NCBI__GCF_000017845.1:WP_009544399.1 816 TDEPSTQVGPVIDATAQKRILEYIETAKQESTLALA-MEAPDNGFYVGPTIFGDVLPNHTIAQEEIFGP 883
                                               ************************************.999***************************** PP

                                 TIGR01237 417 vvavlrakdfdealeiansteygltGgvisnsrerierakaefevGnlyfnrkitGaivgvqpfGGfkm 485
                                               vvav+r+k+fdeale+an+t+y+ltGg++s+s+e+ie+a++efevGnly+nr+itGaiv++qpfGGfk+
  lcl|NCBI__GCF_000017845.1:WP_009544399.1 884 VVAVMRVKNFDEALEVANGTDYALTGGLYSRSPEHIEQAQKEFEVGNLYINRTITGAIVARQPFGGFKL 952
                                               ********************************************************************* PP

                                 TIGR01237 486 sGtdskaGGpdylaqflqaktvteri 511
                                               sG++skaGGpdyl+qfl++++++e+i
  lcl|NCBI__GCF_000017845.1:WP_009544399.1 953 SGVGSKAGGPDYLLQFLEPRHISENI 978
                                               ************************86 PP



Internal pipeline statistics summary:
-------------------------------------
Query model(s):                            1  (511 nodes)
Target sequences:                          1  (990 residues searched)
Passed MSV filter:                         1  (1); expected 0.0 (0.02)
Passed bias filter:                        1  (1); expected 0.0 (0.02)
Passed Vit filter:                         1  (1); expected 0.0 (0.001)
Passed Fwd filter:                         1  (1); expected 0.0 (1e-05)
Initial search space (Z):                  1  [actual number of targets]
Domain search space  (domZ):               1  [number of targets reported over threshold]
# CPU time: 0.03u 0.01s 00:00:00.04 Elapsed: 00:00:00.03
# Mc/sec: 13.89
//
[ok]
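
The per-domain table in the hmmsearch output above can be read programmatically. The sketch below is a minimal, hypothetical parser (the `DomainHit` class and both function names are not part of HMMER or GapMind) that splits one row of the domain table on whitespace and computes how much of the 511-node model each domain covers. It assumes the 16-column layout shown in this report, where `!` marks a domain that passes the inclusion threshold and `?` marks one that does not.

```python
from dataclasses import dataclass


@dataclass
class DomainHit:
    index: int
    significant: bool  # True when the row is flagged "!"
    score: float
    bias: float
    c_evalue: float
    i_evalue: float
    hmm_from: int
    hmm_to: int
    ali_from: int
    ali_to: int
    env_from: int
    env_to: int
    acc: float


def parse_domain_row(line: str) -> DomainHit:
    """Parse one row of the hmmsearch per-domain table (assumed 16 fields)."""
    f = line.split()
    if len(f) != 16:
        raise ValueError(f"expected 16 fields, got {len(f)}")
    # Fields 8, 11 and 14 are the boundary glyphs ("..", ".]", ...); skip them.
    return DomainHit(
        int(f[0]), f[1] == "!", float(f[2]), float(f[3]),
        float(f[4]), float(f[5]), int(f[6]), int(f[7]),
        int(f[9]), int(f[10]), int(f[12]), int(f[13]), float(f[15]),
    )


def model_coverage(hit: DomainHit, model_len: int) -> float:
    """Fraction of the HMM's nodes spanned by the domain alignment."""
    return (hit.hmm_to - hit.hmm_from + 1) / model_len


# Rows copied from the report above.
rows = [
    "   1 ?   -1.9   0.1     0.055     0.055      99     148 ..     137     186 ..     107     199 .. 0.61",
    "   2 !  861.0   0.9  1.6e-263  1.6e-263       3     511 .]     472     978 ..     470     978 .. 0.99",
]
hits = [parse_domain_row(r) for r in rows]
for h in hits:
    print(h.index, h.significant, h.score, round(model_coverage(h, 511), 3))
```

Only domain 2 is flagged as significant. It spans model positions 3 to 511 and candidate residues 472 to 978, consistent with the BLAST alignment to the C-terminal half of the 990-residue protein.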

This GapMind analysis is from Sep 24 2021. The underlying query database was built on Sep 17 2021.

About GapMind

Each pathway is defined by a set of rules based on individual steps or genes. Candidates for each step are identified by using ublast (a fast alternative to protein BLAST) against a database of manually curated proteins (most of which are experimentally characterized) or by using HMMER with enzyme models (usually from TIGRFAMs). Ublast hits may be split across two different proteins.

A candidate for a step is "high confidence" if either:

where "other" refers to the best ublast hit to a sequence that is not annotated as performing this step (and is not "ignored").

Otherwise, a candidate is "medium confidence" if either:

Other blast hits with at least 50% coverage are "low confidence."
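
The 50% coverage cutoff for low-confidence hits can be illustrated with a short Python sketch. It assumes that coverage is measured as the aligned fraction of the characterized (query) protein; the function names are hypothetical and this is not GapMind's own code. The criteria for high and medium confidence are not reproduced here.

```python
def alignment_coverage(start: int, end: int, length: int) -> float:
    """Fraction of a sequence covered by an alignment (1-based, inclusive)."""
    if not (1 <= start <= end <= length):
        raise ValueError("alignment coordinates out of range")
    return (end - start + 1) / length


def passes_low_confidence_coverage(start: int, end: int, length: int,
                                   min_coverage: float = 0.5) -> bool:
    """True if the alignment covers at least min_coverage of the sequence."""
    return alignment_coverage(start, end, length) >= min_coverage


# The BLAST alignment above spans query positions 5 to 513 of 515 letters.
print(round(alignment_coverage(5, 513, 515), 3))
print(passes_low_confidence_coverage(5, 513, 515))
```

For the alignment shown on this page, the query coverage is 509 of 515 residues, well above the 50% cutoff.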

Steps with no high- or medium-confidence candidates may be considered "gaps." For the typical bacterium that can make all 20 amino acids, there are 1-2 gaps in amino acid biosynthesis pathways. For diverse bacteria and archaea that can utilize a carbon source, there is a complete high-confidence catabolic pathway (including a transporter) just 38% of the time, and there is a complete medium-confidence pathway 63% of the time. Gaps may be due to:

GapMind relies on the predicted proteins in the genome and does not search the six-frame translation. In most cases, you can search the six-frame translation by clicking on links to Curated BLAST for each step definition (in the per-step page).

If you notice any errors or omissions in the step descriptions, or any questionable results, please let us know.

by Morgan Price, Arkin group, Lawrence Berkeley National Laboratory