GapMind for catabolism of small carbon sources

 

Alignments for a candidate for rocA in Sphingopyxis indica DS15

Align L-glutamate gamma-semialdehyde dehydrogenase (EC 1.2.1.88); Proline dehydrogenase (EC 1.5.5.2) (characterized)
to candidate WP_089216122.1 CHB69_RS10610 bifunctional proline dehydrogenase/L-glutamate gamma-semialdehyde dehydrogenase PutA

Query= reanno::ANA3:7023590
         (1064 letters)



>NCBI__GCF_900188185.1:WP_089216122.1
          Length = 1030

 Score =  907 bits (2345), Expect = 0.0
 Identities = 498/1030 (48%), Positives = 677/1030 (65%), Gaps = 26/1030 (2%)

Query: 37   EEQYLSELIKLVPSSDEAIERVTRRAHELVNKVRQFDKKGLMVGIDAFLQQYSLETQEGI 96
            E   +++L   +  S    E VT R   L+ K +   ++  +V     + +Y L T+EG+
Sbjct: 18   EADIVADLRTALARSPATAEAVTARGLTLIRKAKAEGERETLVA--QLMNRYRLSTEEGV 75

Query: 97   ILMCLAEALLRIPDAATADALIEDKLSGAKWDEHLSKSDSVLVNASTWGLMLTGKIVKLD 156
            +LMCLAEALLR+PD ATA+ALI DK++G  W E   +   ++V  S  GL L    + LD
Sbjct: 76   VLMCLAEALLRVPDNATANALIRDKIAGRHWAEGDDEDSPLVVALSARGLSLGSATLMLD 135

Query: 157  KK-IDGTPSNLLSRLVNRLGEPVIRQAMMAAMKIMGKQFVLGRTMKEALKNSE-DKRKLG 214
                   P  +L  ++ R GEPVIRQA +AAMK++G+QFV+G ++  A++ ++ DK +L 
Sbjct: 136  AMGSQAKPLAILRTMIRRSGEPVIRQAALAAMKLLGQQFVMGESIDAAVRRADKDKSELA 195

Query: 215  YTHSYDMLGEAALTRKDAEKYFNDYANAITELGAQSYNENESPRPTISIKLSALHPRYEV 274
               S+DMLGEAA T  DA +Y++ YA AI  +G  +   +      ISIKLSALHPRYE 
Sbjct: 196  ---SFDMLGEAARTAADARRYYDSYAAAIARIGRDAKPGDPFANHGISIKLSALHPRYEY 252

Query: 275  ANEDRVLTELYDTVIRLIKLARGLNIGISIDAEEVDRLELSLKLFQKLFNADATKGWGLL 334
                RV  EL   VI L   AR +NI + IDAEE DRLE  L ++  L +A    GW  L
Sbjct: 253  LQGQRVRDELIPRVIELAVAARRVNIPLMIDAEESDRLEPHLDVYGALIDAGIADGWTGL 312

Query: 335  GIVVQAYSKRALPVLVWLTRLAKEQGDEIPVRLVKGAYWDSELKWAQQAGEAAYPLYTRK 394
            GIVVQAY KRA  V+ W+   A+ +G  + +RLVKGAYWD+E+K AQ  G   +P++T K
Sbjct: 313  GIVVQAYQKRASEVIRWVAARARRRGVMLSMRLVKGAYWDTEIKRAQTLGLGDFPVFTAK 372

Query: 395  AGTDVSYLACARYLLSDATRGAIYPQFASHNAQTVAAISDMAGDRNHEFQRLHGMGQELY 454
              TD++YLACA+ L     +  I+P FASHNA T+A ++++    ++E QRLHGMG+  +
Sbjct: 373  LHTDLNYLACAQILRE--CQDCIFPAFASHNAMTLAFVTELFAGADYELQRLHGMGEGAH 430

Query: 455  DTILS-EAGAKAVRIYAPIGAHKDLLPYLVRRLLENGANTSFVHKLVDPKTPIESLVVHP 513
            D I++     + VR+YAP+G H+DLL YLVRRLLENGAN+SFVH+  DP    E L V P
Sbjct: 431  DAIVALSPPPRPVRVYAPVGTHRDLLAYLVRRLLENGANSSFVHQFSDPDVSAEELAVDP 490

Query: 514  LKTLTGYKTLANNKIVLPT--DIFGSDRKNSKGLNMNIISEAEPFFAALDKFKSTQWQAG 571
                   +++A+    LPT   ++   R+NS+G ++      E   AA+ + +     A 
Sbjct: 491  -------RSVASAPSKLPTGLQLYDPVRRNSRGYDLGEPGVPEALIAAIAEARDAGAVAA 543

Query: 572  PLVNGQTLTGEHKTVVSPFDTTQTVGQVAFADKAAIEQAVASADAAFATWTRTPVEVRAS 631
            P+V G+   G+ + V +P  T   +G+V  AD AA+E+AVA+A  A   W+      RA 
Sbjct: 544  PIVGGRERRGKGEPVHNPA-TGVVIGRVVEADAAAVEEAVAAARKAQGDWSLAGGAFRAE 602

Query: 632  ALQKLADLLEENREELIALCTREAGKSIQDGIDEVREAVDFCRYYAVQAKKLMSKPELLP 691
             L++ ADL+EE     + L   EAGK++ D + EVREAVDF RYYA QA+   S P  LP
Sbjct: 603  RLERAADLIEERDALFLGLAMDEAGKTLVDAVAEVREAVDFLRYYAAQARADFSWPVTLP 662

Query: 692  GPTGELNELFLQGRGVFVCISPWNFPLAIFLGQVSAALAAGNTVVAKPAEQTSIIGYRAV 751
            GPTGE NEL L+G+G+F CISPWNFPLAIFLGQVSAALAAGN V+AKPAEQT +I + AV
Sbjct: 663  GPTGERNELILEGKGIFACISPWNFPLAIFLGQVSAALAAGNAVLAKPAEQTPLIAHAAV 722

Query: 752  QLAHQAGIPTDVLQYLPGTGATVGNALTADERIGGVCFTGSTGTAKLINRTLANREGAII 811
            +   +AG+P D+L YLPG G TVG ALT  + + GV FTGST  A+ INR+LA REG I 
Sbjct: 723  ETLLEAGVPGDILHYLPGRGETVGAALTGHDDVIGVAFTGSTEVARAINRSLAMREGPIA 782

Query: 812  PLIAETGGQNAMVVDSTSQPEQVVNDVVSSSFTSAGQRCSALRVLFLQEDIADRVIDVLQ 871
             LIAETGG NAM+VDST+ PEQV  D V+S+F SAGQRCSALR+L +Q+D+AD +I ++ 
Sbjct: 783  TLIAETGGANAMIVDSTALPEQVARDAVASAFQSAGQRCSALRLLCVQDDVADAMIAMVA 842

Query: 872  GAMDELVIGNPSSVKTDVGPVIDATAKANLDAHIDHIKQVGKLIKQMS---LPAGTENGH 928
            GAM EL +G+P+ + TDVGP+ID  A++N+ A+++  +  G+LI + +   LPAG   G 
Sbjct: 843  GAMAELNVGDPAILATDVGPIIDEEAQSNIAAYVEEARAAGRLIAEAARTKLPAG---GT 899

Query: 929  FVSPTAVEIDSIKVLEKEHFGPILHVIRYKASELAHVIDEINSTGFGLTLGIHSRNEGHA 988
            FV+P  + +D +  L++E FGP+LHV  +K  EL  +ID IN++G+GLTLG+H+R +  A
Sbjct: 900  FVAPAMIRLDHVTDLKREIFGPVLHVATWKGGELDALIDAINASGYGLTLGVHTRIDSVA 959

Query: 989  LEVADKVNVGNVYINRNQIGAVVGVQPFGGQGLSGTGPKAGGPHYLTRFVTEKTRTNNIT 1048
              VA +  VGNVY+NRNQIGA+VG QPFGG+GLSGTGPKAGGP+YL RF  EK+ + +IT
Sbjct: 960  AHVAARAQVGNVYVNRNQIGAIVGSQPFGGRGLSGTGPKAGGPNYLRRFAEEKSISTDIT 1019

Query: 1049 AIGGNATLLS 1058
            A GGNA L++
Sbjct: 1020 AAGGNAALMA 1029


Lambda     K      H
   0.317    0.133    0.377 

Gapped
Lambda     K      H
   0.267   0.0410    0.140 


Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Sequences: 1
Number of Hits to DB: 2394
Number of extensions: 98
Number of successful extensions: 7
Number of sequences better than 1.0e-02: 1
Number of HSP's gapped: 1
Number of HSP's successfully gapped: 1
Length of query: 1064
Length of database: 1030
Length adjustment: 45
Effective length of query: 1019
Effective length of database: 985
Effective search space:  1003715
Effective search space used:  1003715
Neighboring words threshold: 11
Window for multiple hits: 40
X1: 16 ( 7.3 bits)
X2: 38 (14.6 bits)
X3: 64 (24.7 bits)
S1: 41 (21.6 bits)
S2: 58 (26.9 bits)
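
The summary figures in the BLAST report above can be re-derived from the values it prints. The following is a minimal Python sketch, not part of GapMind or BLAST; the variable names are illustrative. The bit-score conversion assumes the standard Karlin-Altschul formula and uses the rounded gapped Lambda and K shown above, so it should agree with the reported 907 bits only approximately.

```python
import math

# Values copied from the BLAST report above.
identities, positives, gaps, align_len = 498, 677, 26, 1030
query_len, subject_len = 1064, 1030
query_start, query_end = 37, 1058      # first and last aligned query positions
length_adjustment = 45
raw_score = 2345
gapped_lambda, gapped_k = 0.267, 0.0410  # rounded values as printed

# Percentages over the alignment length (the report shows whole percentages).
pct_identity = 100 * identities / align_len
pct_positives = 100 * positives / align_len
pct_gaps = 100 * gaps / align_len

# Fraction of the query covered by the aligned region.
query_coverage = 100 * (query_end - query_start + 1) / query_len

# Effective search space = effective query length * effective database length.
effective_space = (query_len - length_adjustment) * (subject_len - length_adjustment)

# Assumed Karlin-Altschul conversion from raw score to bits.
bit_score = (gapped_lambda * raw_score - math.log(gapped_k)) / math.log(2)

print(f"identity  {pct_identity:.2f}%")
print(f"positives {pct_positives:.2f}%")
print(f"gaps      {pct_gaps:.2f}%")
print(f"query coverage {query_coverage:.2f}%")
print(f"effective search space {effective_space}")
print(f"bit score (approx.) {bit_score:.1f}")
```

The effective search space computed this way matches the 1003715 printed in the report, which is a quick check that the length adjustment was applied to both the query and the database lengths.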

Align candidate WP_089216122.1 CHB69_RS10610 (bifunctional proline dehydrogenase/L-glutamate gamma-semialdehyde dehydrogenase PutA)
to HMM TIGR01238 (delta-1-pyrroline-5-carboxylate dehydrogenase (EC 1.2.1.88))

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3.1 (Jul 2020); http://hmmer.org/
# Copyright (C) 2020 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  ../tmp/path.carbon/TIGR01238.hmm
# target sequence database:        /tmp/gapView.179970.genome.faa
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       TIGR01238  [M=500]
Accession:   TIGR01238
Description: D1pyr5carbox3: delta-1-pyrroline-5-carboxylate dehydrogenase
Scores for complete sequences (score includes all domains):
   --- full sequence ---   --- best 1 domain ---    -#dom-
    E-value  score  bias    E-value  score  bias    exp  N  Sequence                             Description
    ------- ------ -----    ------- ------ -----   ---- --  --------                             -----------
   6.9e-196  637.5   6.0   9.5e-196  637.0   6.0    1.2  1  NCBI__GCF_900188185.1:WP_089216122.1  


Domain annotation for each sequence (and alignments):
>> NCBI__GCF_900188185.1:WP_089216122.1  
   #    score  bias  c-Evalue  i-Evalue hmmfrom  hmm to    alifrom  ali to    envfrom  env to     acc
 ---   ------ ----- --------- --------- ------- -------    ------- -------    ------- -------    ----
   1 !  637.0   6.0  9.5e-196  9.5e-196       2     497 ..     506    1010 ..     505    1013 .. 0.98

  Alignments for each domain:
  == domain 1  score: 637.0 bits;  conditional E-value: 9.5e-196
                             TIGR01238    2 lygegrknslGvdlaneselksleeqllkaaakkfqaapivgekakaegeaqpvknpadrkdivGqvsead 72  
                                            ly   r+ns G dl    + + l + + +a +    aapivg++ +  g+ +pv npa    ++G+v ead
  NCBI__GCF_900188185.1:WP_089216122.1  506 LYDPVRRNSRGYDLGEPGVPEALIAAIAEARDAGAVAAPIVGGRER-RGKGEPVHNPAT-GVVIGRVVEAD 574 
                                            89999************************************87655.68899*****96.6899******* PP

                             TIGR01238   73 aaevqeavdsavaafaewsatdakeraailerladlleshmpelvallvreaGktlsnaiaevreavdflr 143 
                                            aa v+eav +a +a   ws    + ra  ler+adl+e++   +++l++ eaGktl +a+aevreavdflr
  NCBI__GCF_900188185.1:WP_089216122.1  575 AAAVEEAVAAARKAQGDWSLAGGAFRAERLERAADLIEERDALFLGLAMDEAGKTLVDAVAEVREAVDFLR 645 
                                            *********************************************************************** PP

                             TIGR01238  144 yyakqvedvldeesaka.............lGavvcispwnfplaiftGqiaaalaaGntviakpaeqtsl 201 
                                            yya q++ +++   + +             +G + cispwnfplaif+Gq++aalaaGn+v+akpaeqt+l
  NCBI__GCF_900188185.1:WP_089216122.1  646 YYAAQARADFSWPVTLPgptgernelilegKGIFACISPWNFPLAIFLGQVSAALAAGNAVLAKPAEQTPL 716 
                                            **********9987777899999************************************************ PP

                             TIGR01238  202 iaaravellqeaGvpagviqllpGrGedvGaaltsderiaGviftGstevarlinkalakredapvpliae 272 
                                            ia  ave l+eaGvp  ++  lpGrGe+vGaalt ++ + Gv+ftGstevar in++la re + ++liae
  NCBI__GCF_900188185.1:WP_089216122.1  717 IAHAAVETLLEAGVPGDILHYLPGRGETVGAALTGHDDVIGVAFTGSTEVARAINRSLAMREGPIATLIAE 787 
                                            *********************************************************************** PP

                             TIGR01238  273 tGGqnamivdstalaeqvvadvlasafdsaGqrcsalrvlcvqedvadrvltlikGamdelkvgkpirltt 343 
                                            tGG namivdstal+eqv +d +asaf+saGqrcsalr+lcvq+dvad ++ ++ Gam el+vg p  l t
  NCBI__GCF_900188185.1:WP_089216122.1  788 TGGANAMIVDSTALPEQVARDAVASAFQSAGQRCSALRLLCVQDDVADAMIAMVAGAMAELNVGDPAILAT 858 
                                            *********************************************************************** PP

                             TIGR01238  344 dvGpvidaeakqnllahiekmkakakkvaqvkleddvesekgtfvaptlfelddldelkkevfGpvlhvvr 414 
                                            dvGp+id+ea+ n+ a++e+ +a ++ +a++ + +      gtfvap ++ ld++ +lk+e+fGpvlhv  
  NCBI__GCF_900188185.1:WP_089216122.1  859 DVGPIIDEEAQSNIAAYVEEARAAGRLIAEAARTK--LPAGGTFVAPAMIRLDHVTDLKREIFGPVLHVAT 927 
                                            *******************************9998..7789****************************** PP

                             TIGR01238  415 ykadeldkvvdkinakGygltlGvhsrieetvrqiekrakvGnvyvnrnlvGavvGvqpfGGeGlsGtGpk 485 
                                            +k  eld ++d ina+GygltlGvh+ri++  +++  ra+vGnvyvnrn++Ga+vG qpfGG+GlsGtGpk
  NCBI__GCF_900188185.1:WP_089216122.1  928 WKGGELDALIDAINASGYGLTLGVHTRIDSVAAHVAARAQVGNVYVNRNQIGAIVGSQPFGGRGLSGTGPK 998 
                                            *********************************************************************** PP

                             TIGR01238  486 aGGplylyrltr 497 
                                            aGGp yl r+ +
  NCBI__GCF_900188185.1:WP_089216122.1  999 AGGPNYLRRFAE 1010
                                            *********976 PP



Internal pipeline statistics summary:
-------------------------------------
Query model(s):                            1  (500 nodes)
Target sequences:                          1  (1030 residues searched)
Passed MSV filter:                         1  (1); expected 0.0 (0.02)
Passed bias filter:                        1  (1); expected 0.0 (0.02)
Passed Vit filter:                         1  (1); expected 0.0 (0.001)
Passed Fwd filter:                         1  (1); expected 0.0 (1e-05)
Initial search space (Z):                  1  [actual number of targets]
Domain search space  (domZ):               1  [number of targets reported over threshold]
# CPU time: 0.01u 0.00s 00:00:00.01 Elapsed: 00:00:00.01
# Mc/sec: 33.59
//
[ok]
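
How much of the HMM the hit covers can be read from the domain table above. The sketch below is an illustration only, not GapMind's own code: it splits the single domain row on whitespace, assuming the column order shown in the table header, and takes the model length from the `[M=500]` field of the `Query:` line. The function name `parse_domain_row` is hypothetical.

```python
# Domain row and model length copied from the hmmsearch output above.
DOMAIN_ROW = ("   1 !  637.0   6.0  9.5e-196  9.5e-196       2     497 .."
              "     506    1010 ..     505    1013 .. 0.98")
MODEL_LENGTH = 500    # from "Query: TIGR01238 [M=500]"
TARGET_LENGTH = 1030  # from "Target sequences: 1 (1030 residues searched)"

def parse_domain_row(row):
    """Split one hmmsearch domain-table row into named fields.

    Assumes the column order: #, flag, score, bias, c-Evalue, i-Evalue,
    hmmfrom, hmmto, bounds, alifrom, alito, bounds, envfrom, envto, bounds, acc.
    """
    f = row.split()
    return {
        "score": float(f[2]),
        "i_evalue": float(f[5]),
        "hmm_from": int(f[6]),
        "hmm_to": int(f[7]),
        "ali_from": int(f[9]),
        "ali_to": int(f[10]),
        "acc": float(f[15]),
    }

dom = parse_domain_row(DOMAIN_ROW)

# Fraction of the model's match states spanned by the alignment.
model_coverage = (dom["hmm_to"] - dom["hmm_from"] + 1) / MODEL_LENGTH
# Fraction of the candidate protein spanned by the alignment.
target_coverage = (dom["ali_to"] - dom["ali_from"] + 1) / TARGET_LENGTH

print(f"model coverage  {model_coverage:.3f}")
print(f"target coverage {target_coverage:.3f}")
```

The alignment spans nearly the whole model but only about half of the candidate (residues 506 to 1010), which is consistent with the candidate being annotated as a bifunctional protein.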

This GapMind analysis is from Sep 24 2021. The underlying query database was built on Sep 17 2021.

About GapMind

Each pathway is defined by a set of rules based on individual steps or genes. Candidates for each step are identified by using ublast (a fast alternative to protein BLAST) against a database of manually-curated proteins (most of which are experimentally characterized) or by using HMMer with enzyme models (usually from TIGRFam). Ublast hits may be split across two different proteins.

A candidate for a step is "high confidence" if either:

where "other" refers to the best ublast hit to a sequence that is not annotated as performing this step (and is not "ignored").

Otherwise, a candidate is "medium confidence" if either:

Other blast hits with at least 50% coverage are "low confidence."

Steps with no high- or medium-confidence candidates may be considered "gaps." For the typical bacterium that can make all 20 amino acids, there are 1-2 gaps in amino acid biosynthesis pathways. For diverse bacteria and archaea that can utilize a carbon source, there is a complete high-confidence catabolic pathway (including a transporter) just 38% of the time, and there is a complete medium-confidence pathway 63% of the time. Gaps may be due to:

GapMind relies on the predicted proteins in the genome and does not search the six-frame translation. In most cases, you can search the six-frame translation by clicking on links to Curated BLAST for each step definition (in the per-step page).
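
For readers unfamiliar with the term, a six-frame translation reads a DNA sequence in three codon offsets on each strand, so that a gene missed by gene prediction can still be found by protein search. The following is a minimal sketch of the idea, not the code used by GapMind or Curated BLAST; the function names are hypothetical, it assumes the standard genetic code, and it writes any codon containing a non-ACGT base as `X`.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def translate(dna):
    """Translate complete codons from the start of dna; '*' marks a stop."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)

def six_frame_translation(dna):
    """Return the translations of the three forward and three reverse frames."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames

# Toy input; a real search would run over whole contigs.
for frame, protein in sorted(six_frame_translation("ATGGCC").items()):
    print(frame, protein)
```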

For more information, see:

If you notice any errors or omissions in the step descriptions, or any questionable results, please let us know.

by Morgan Price, Arkin group, Lawrence Berkeley National Laboratory