GapMind for catabolism of small carbon sources

 

Alignments for a candidate for putA in Azohydromonas australica DSM 1124

Align L-glutamate gamma-semialdehyde dehydrogenase (EC 1.2.1.88); Proline dehydrogenase (EC 1.5.5.2) (characterized)
to candidate WP_051243110.1 H537_RS44845 trifunctional transcriptional regulator/proline dehydrogenase/L-glutamate gamma-semialdehyde dehydrogenase

Query= reanno::Cup4G11:RR42_RS20125
         (1333 letters)



>NCBI__GCF_000430725.1:WP_051243110.1
          Length = 1249

 Score = 1447 bits (3745), Expect = 0.0
 Identities = 778/1259 (61%), Positives = 927/1259 (73%), Gaps = 19/1259 (1%)

Query: 81   PFLEFAQSVQPQSVLRAAITAAYRRPESECVPVLLEQARLPHQQAEAALAMARTLATRLR 140
            PFL+FA  + PQSVLR+AITAA RRPE E V +LLE ARLP  QA + LA+A +LA RLR
Sbjct: 4    PFLDFAGELLPQSVLRSAITAACRRPEPEAVAMLLESARLPPGQAVSVLALAASLARRLR 63

Query: 141  ERKVG--TGREGLVQGLIQEFSLSSQEGVALMCLAEALLRIPDKATRDALIRDKISGANW 198
            ER  G   GREGLVQGL++EFSL+SQEGVALMCLAEALLRIPD ATRDALIRDK+   +W
Sbjct: 64   ERGAGLTAGREGLVQGLMREFSLASQEGVALMCLAEALLRIPDDATRDALIRDKLGDGDW 123

Query: 199  QSHLGQSPSVFVNAATWGLLFTGKLVATH-TEAGLSKALTRIIGKGGEPLIRKGVDMAMR 257
             +HLG+S S+FVNAATWGLL  G++ AT   EAGL  AL+R++ + GEPL+RKGVD+AMR
Sbjct: 124  GAHLGRSGSLFVNAATWGLLLGGRMAATQQAEAGLGSALSRVLARSGEPLLRKGVDLAMR 183

Query: 258  LMGEQFVTGETISEALANARKYEAEGFRYSYDMLGEAAMTEADAQRYLASYEQAINAIGQ 317
            L+GEQFV G+TI EALA ARK E++GF YS+DMLGEAA+T  DAQRYL++YE AI+A+G 
Sbjct: 184  LLGEQFVCGQTIGEALARARKRESQGFTYSFDMLGEAALTAEDAQRYLSAYEHAIHALGI 243

Query: 318  ASRGRGIYEGPGISIKLSALHPRYSRAQHERVIGELYGRLKSLTLLARQYDIGINIDAEE 377
            A  GR ++ GPGISIKLSALHPRY+R+QH RV+ ELY RL  L LLAR + IG++IDAEE
Sbjct: 244  ALAGRDLHAGPGISIKLSALHPRYTRSQHGRVMAELYPRLLQLALLARHHGIGLSIDAEE 303

Query: 378  ADRLEISLDLLERLCFEPELAGWNGIGFVVQGYQKRCPFVIDYLIDLARRSRHRLMIRLV 437
            ADRLE+SLDLL+RLC E  LAGW+G+G  VQ YQKRCP V+D+ IDLARRSR RLM+RLV
Sbjct: 304  ADRLELSLDLLQRLCGEHMLAGWSGLGLAVQAYQKRCPHVLDFCIDLARRSRRRLMLRLV 363

Query: 438  KGAYWDSEIKRAQVDGLEGYPVYTRKVYTDVSYVACARKLLSVPDVIYPQFATHNAHTLA 497
            KGAYWDSEIKRAQ+DGLE Y VYTRK +TDV+Y+ACAR+LL+ PD +YPQFATHNAHT+A
Sbjct: 364  KGAYWDSEIKRAQIDGLEDYAVYTRKAHTDVAYLACARRLLAAPDAVYPQFATHNAHTVA 423

Query: 498  AIYQIAGHNYYPGQYEFQCLHGMGEPLYDQVVGPLADGKFNRPCRIYAPVGTHETLLAYL 557
            AI  +AG  + PG+YEFQCLHGMGEPLY+ VV P  +G    PCRIYAPVGTHETLLAYL
Sbjct: 424  AIQHLAG-AWTPGRYEFQCLHGMGEPLYEMVVAPPIEGGLGLPCRIYAPVGTHETLLAYL 482

Query: 558  VRRLLENGANTSFVNRIADDTISLDELVADPVAVVEQMHADEGALGLPHPRIAQPRTLYG 617
            VRRLLENGANTSFVNRIAD +I ++ LV DPVA VE+    EG LGLPHP I  PR L+G
Sbjct: 483  VRRLLENGANTSFVNRIADASIPVEALVEDPVAQVERAARAEGTLGLPHPAIPLPRALFG 542

Query: 618  ESRANSAGIDLSNEHRLASLSSALLAGTSEAVSAVPLLGTEAAAGEDVNQPAPVRNPSDQ 677
              R NSAGIDL+NEHRLASLS+ALL G  +A +AVPL+      G+    P PV NP+D+
Sbjct: 543  ALRPNSAGIDLANEHRLASLSAALLHGARQAWTAVPLVAGLPRPGD---LPQPVLNPADR 599

Query: 678  RDVVGHVTEASMAEVEAALQAAVNAAPIWQATPADVRAAALERAAELMEAQMQSLMGIIV 737
             D VG V EA   E+E AL AA    P W ATP   RAA LER A+ +E Q+Q+L+G+IV
Sbjct: 600  SDRVGTVREARADEMEDALSAAAAVQPAWGATPPAERAALLERGADALEDQLQTLVGLIV 659

Query: 738  REAGKTFSNAIAEVREAVDFLRYYAAQVRETFSSDTHRPLGPVVCISPWNFPLAIFTGQV 797
            REAGKT   A+ EVREAVD LRY A Q R    + + RPLGPV CISPWNFPLAIFTGQ+
Sbjct: 660  REAGKTVPAAVGEVREAVDALRYAALQARTALDAGS-RPLGPVACISPWNFPLAIFTGQL 718

Query: 798  AAALAAGNTVLAKPAEQTPLIAAQAVRLLREAGVPAGAVQLLPGRGETVGAALVGDARVK 857
            AAALAAGN V+AKPA QTPL+AA+AVRLL  AGVP   +QLLPG GE+VG  L  DARV+
Sbjct: 719  AAALAAGNAVVAKPARQTPLVAAEAVRLLHAAGVPGAVLQLLPGPGESVGLRLARDARVR 778

Query: 858  GVMFTGSTEVARLLQRSVAGRLDAAGRPVPLIAETGGQNAMIVDSSALAEQVVGDVVNSA 917
            GV+FTGST VAR LQ  +A RLDA G P  L+AETGG NA+I DSSALAEQ+V DV+ SA
Sbjct: 779  GVLFTGSTAVARRLQAELALRLDARGVPPLLVAETGGLNALIADSSALAEQLVPDVLASA 838

Query: 918  FDSAGQRCSALRVLCLQEEVADRVLEMLKGAMDELTMGNPDRLSTDVGPVIDEEARGNIV 977
            FDSAGQRCSALRVLCLQ+E+A+ VL ML+GA+ EL +G PD L+TDVGPVID+ AR  + 
Sbjct: 839  FDSAGQRCSALRVLCLQQEIAEPVLRMLQGALAELCVGRPDALATDVGPVIDDTARDTVE 898

Query: 978  RHIDAMRAKGRRVHQADPNGALSAACRNGTFVSPTLIELDSIEELQREVFGPVLHVVRYP 1037
            +H+  M+A G RV +   +  L     +G+FV+PT++EL+S+ +L  EVFGPVLHV+R+ 
Sbjct: 899  KHVLHMQALGLRVTRQPLSEELR---EHGSFVAPTVVELESLSQLPGEVFGPVLHVLRWR 955

Query: 1038 RAGLDTLLAQINGTGYGLTMGIHTRIDETIEHIVERAEVGNLYVNRNIVGAVVGVQPFGG 1097
            R  LD LL +I  TGY LT+G+HTRIDETI  +  RA  GN YVNRN++GAVVGVQPFGG
Sbjct: 956  RGELDGLLQRIEATGYALTLGLHTRIDETIALVTARARAGNQYVNRNMIGAVVGVQPFGG 1015

Query: 1098 EGLSGTGPKAGGPLYLHRLLSVCPLDAVARVVRASDTVGGADETGPVRRTLTETLATLKE 1157
            EGLSGTGPKAGGPL + RL   C   A A +   +     A             L  L+ 
Sbjct: 1016 EGLSGTGPKAGGPLLVRRL---CERHAPALLAIGTPVAAVAPSGNGGGEPRLPALRVLRG 1072

Query: 1158 WAQRESAALPGLVAACERFAAASAAGLSVTLPGPTGERNTYTLLPRAAVLCLAQQETDLA 1217
            W Q        LVAAC+   A S AGL V LPGPTGERN Y LLPR  VLC A    DL 
Sbjct: 1073 WLQEALPEDAALVAACDALLAQSPAGLDVLLPGPTGERNRYALLPRRRVLCQAGDRGDLL 1132

Query: 1218 VQLAAVLAAGSQAVWVESPMARALFARLPKAVQSRVRLVADWSAGDTGFDAVLHHGDSDQ 1277
              LA VLA G + +W +S  ARAL A LP  V+ RV+L A+    D   D         +
Sbjct: 1133 FLLALVLATGGRVLWADSAAARALHAALPAVVRERVKLSANPLGED--IDLAAAQDAPGR 1190

Query: 1278 LRAVCEQLATRPGPIISVQGLAHGEPNIAI---ERLLIERSLSVNTAAAGGNASLMTIG 1333
            +  +   L+ R GPI+ +   A GE + A+   ERL++ERSL VNTAAAGGNA LM +G
Sbjct: 1191 VLELSLALSQRDGPIVPLVACAKGERDPALLPPERLMVERSLCVNTAAAGGNAGLMAMG 1249


Lambda     K      H
   0.318    0.133    0.383 

Gapped
Lambda     K      H
   0.267   0.0410    0.140 


Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Sequences: 1
Number of Hits to DB: 4010
Number of extensions: 186
Number of successful extensions: 9
Number of sequences better than 1.0e-02: 1
Number of HSP's gapped: 1
Number of HSP's successfully gapped: 1
Length of query: 1333
Length of database: 1249
Length adjustment: 48
Effective length of query: 1285
Effective length of database: 1201
Effective search space:  1543285
Effective search space used:  1543285
Neighboring words threshold: 11
Window for multiple hits: 40
X1: 16 ( 7.3 bits)
X2: 38 (14.6 bits)
X3: 64 (24.7 bits)
S1: 41 (21.7 bits)
S2: 59 (27.3 bits)
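
As a sanity check on the statistics above, the following sketch recomputes the bit score, the effective search space, the percent identity, and the alignment coverage from the numbers printed in the report. It assumes the standard Karlin-Altschul relation, bits = (lambda * S - ln K) / ln 2, applied with the gapped lambda and K. The function and variable names are illustrative and are not part of BLAST or GapMind.

```python
import math

# Values copied from the BLAST report above.
RAW_SCORE = 3745
GAPPED_LAMBDA = 0.267
GAPPED_K = 0.0410
QUERY_LEN, SUBJECT_LEN = 1333, 1249
LENGTH_ADJUSTMENT = 48
IDENTITIES, ALIGNED_COLUMNS = 778, 1259
QUERY_START, QUERY_END = 81, 1333
SUBJECT_START, SUBJECT_END = 4, 1249


def bit_score(raw, lam, k):
    """Convert a raw alignment score to bits (Karlin-Altschul statistics)."""
    return (lam * raw - math.log(k)) / math.log(2)


def effective_search_space(query_len, subject_len, adjustment):
    """Product of the length-adjusted query and database lengths."""
    return (query_len - adjustment) * (subject_len - adjustment)


def coverage(start, end, total_len):
    """Fraction of a sequence spanned by the alignment (1-based, inclusive)."""
    return (end - start + 1) / total_len


bits = bit_score(RAW_SCORE, GAPPED_LAMBDA, GAPPED_K)
space = effective_search_space(QUERY_LEN, SUBJECT_LEN, LENGTH_ADJUSTMENT)
# Work in log10 so that the tiny E-value does not underflow to zero.
log10_evalue = math.log10(space) - bits * math.log10(2)
percent_identity = 100 * IDENTITIES / ALIGNED_COLUMNS
query_coverage = coverage(QUERY_START, QUERY_END, QUERY_LEN)
subject_coverage = coverage(SUBJECT_START, SUBJECT_END, SUBJECT_LEN)

print(f"bit score ~ {bits:.1f}")
print(f"effective search space = {space}")
print(f"log10(E) ~ {log10_evalue:.0f}")
print(f"identity = {percent_identity:.1f}%")
print(f"query coverage = {query_coverage:.1%}, subject coverage = {subject_coverage:.1%}")
```

The recomputed bit score differs slightly from the reported 1447 because lambda and K are printed to only three significant figures. The E-value is far below the smallest value the report can display, which is consistent with the printed "Expect = 0.0".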

Align candidate WP_051243110.1 H537_RS44845 (trifunctional transcriptional regulator/proline dehydrogenase/L-glutamate gamma-semialdehyde dehydrogenase)
to HMM TIGR01238 (delta-1-pyrroline-5-carboxylate dehydrogenase (EC 1.2.1.88))

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3.1 (Jul 2020); http://hmmer.org/
# Copyright (C) 2020 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  ../tmp/path.carbon/TIGR01238.hmm
# target sequence database:        /tmp/gapView.7439.genome.faa
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       TIGR01238  [M=500]
Accession:   TIGR01238
Description: D1pyr5carbox3: delta-1-pyrroline-5-carboxylate dehydrogenase
Scores for complete sequences (score includes all domains):
   --- full sequence ---   --- best 1 domain ---    -#dom-
    E-value  score  bias    E-value  score  bias    exp  N  Sequence                                 Description
    ------- ------ -----    ------- ------ -----   ---- --  --------                                 -----------
   6.8e-207  673.8   0.2     9e-207  673.4   0.2    1.1  1  lcl|NCBI__GCF_000430725.1:WP_051243110.1  H537_RS44845 trifunctional trans


Domain annotation for each sequence (and alignments):
>> lcl|NCBI__GCF_000430725.1:WP_051243110.1  H537_RS44845 trifunctional transcriptional regulator/proline dehydrogenase/
   #    score  bias  c-Evalue  i-Evalue hmmfrom  hmm to    alifrom  ali to    envfrom  env to     acc
 ---   ------ ----- --------- --------- ------- -------    ------- -------    ------- -------    ----
   1 !  673.4   0.2    9e-207    9e-207       2     497 ..     540    1036 ..     539    1039 .. 0.98

  Alignments for each domain:
  == domain 1  score: 673.4 bits;  conditional E-value: 9e-207
                                 TIGR01238    2 lygegrknslGvdlaneselksleeqllkaaakkfqaapivgekakaegeaqpvknpadrkdivGqv 68  
                                                l+g  r ns+G+dlane++l+sl++ ll+ a + + a+p+v++ ++  +  qpv npadr d vG+v
  lcl|NCBI__GCF_000430725.1:WP_051243110.1  540 LFGALRPNSAGIDLANEHRLASLSAALLHGARQAWTAVPLVAGLPRPGDLPQPVLNPADRSDRVGTV 606 
                                                8999*************************************************************** PP

                                 TIGR01238   69 seadaaevqeavdsavaafaewsatdakeraailerladlleshmpelvallvreaGktlsnaiaev 135 
                                                +ea a+e+++a+++a a  + w at+++eraa ler ad le ++ +lv+l+vreaGkt+  a+ ev
  lcl|NCBI__GCF_000430725.1:WP_051243110.1  607 REARADEMEDALSAAAAVQPAWGATPPAERAALLERGADALEDQLQTLVGLIVREAGKTVPAAVGEV 673 
                                                ******************************************************************* PP

                                 TIGR01238  136 reavdflryyakqvedvldeesakalGavvcispwnfplaiftGqiaaalaaGntviakpaeqtsli 202 
                                                reavd lry a q++  ld    ++lG+v cispwnfplaiftGq+aaalaaGn+v+akpa qt+l+
  lcl|NCBI__GCF_000430725.1:WP_051243110.1  674 REAVDALRYAALQARTALDAG-SRPLGPVACISPWNFPLAIFTGQLAAALAAGNAVVAKPARQTPLV 739 
                                                ******************998.8******************************************** PP

                                 TIGR01238  203 aaravellqeaGvpagviqllpGrGedvGaaltsderiaGviftGstevarlinkalakredap... 266 
                                                aa+av ll+ aGvp +v+qllpG Ge+vG  l+ d+r++Gv+ftGst+var+++ +la r da+   
  lcl|NCBI__GCF_000430725.1:WP_051243110.1  740 AAEAVRLLHAAGVPGAVLQLLPGPGESVGLRLARDARVRGVLFTGSTAVARRLQAELALRLDARgvp 806 
                                                **************************************************************98744 PP

                                 TIGR01238  267 vpliaetGGqnamivdstalaeqvvadvlasafdsaGqrcsalrvlcvqedvadrvltlikGamdel 333 
                                                  l+aetGG na+i ds+alaeq+v dvlasafdsaGqrcsalrvlc+q+++a+ vl +++Ga+ el
  lcl|NCBI__GCF_000430725.1:WP_051243110.1  807 PLLVAETGGLNALIADSSALAEQLVPDVLASAFDSAGQRCSALRVLCLQQEIAEPVLRMLQGALAEL 873 
                                                469**************************************************************** PP

                                 TIGR01238  334 kvgkpirlttdvGpvidaeakqnllahiekmkakakkvaqvkleddvesekgtfvaptlfelddlde 400 
                                                 vg+p  l tdvGpvid+ a++ +++h+ +m+a + +v++  l +    e+g+fvapt++el++l++
  lcl|NCBI__GCF_000430725.1:WP_051243110.1  874 CVGRPDALATDVGPVIDDTARDTVEKHVLHMQALGLRVTRQPLSE-ELREHGSFVAPTVVELESLSQ 939 
                                                ***********************************9999888887.4679***************** PP

                                 TIGR01238  401 lkkevfGpvlhvvrykadeldkvvdkinakGygltlGvhsrieetvrqiekrakvGnvyvnrnlvGa 467 
                                                l  evfGpvlhv+r+++ eld ++++i+a+Gy+ltlG+h+ri+et++ ++ ra++Gn yvnrn++Ga
  lcl|NCBI__GCF_000430725.1:WP_051243110.1  940 LPGEVFGPVLHVLRWRRGELDGLLQRIEATGYALTLGLHTRIDETIALVTARARAGNQYVNRNMIGA 1006
                                                ******************************************************************* PP

                                 TIGR01238  468 vvGvqpfGGeGlsGtGpkaGGplylyrltr 497 
                                                vvGvqpfGGeGlsGtGpkaGGpl + rl +
  lcl|NCBI__GCF_000430725.1:WP_051243110.1 1007 VVGVQPFGGEGLSGTGPKAGGPLLVRRLCE 1036
                                                ************************999975 PP



Internal pipeline statistics summary:
-------------------------------------
Query model(s):                            1  (500 nodes)
Target sequences:                          1  (1249 residues searched)
Passed MSV filter:                         1  (1); expected 0.0 (0.02)
Passed bias filter:                        1  (1); expected 0.0 (0.02)
Passed Vit filter:                         1  (1); expected 0.0 (0.001)
Passed Fwd filter:                         1  (1); expected 0.0 (1e-05)
Initial search space (Z):                  1  [actual number of targets]
Domain search space  (domZ):               1  [number of targets reported over threshold]
# CPU time: 0.02u 0.00s 00:00:00.02 Elapsed: 00:00:00.02
# Mc/sec: 25.60
//
[ok]
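
To relate the hmmsearch hit to the model, the sketch below parses the single domain row shown above and computes how much of the 500-node TIGR01238 model and of the 1249-residue candidate the alignment spans. The parser is a hypothetical helper written for this one row layout; it is not part of HMMER or GapMind.

```python
# Domain row and lengths copied from the hmmsearch output above.
DOMAIN_ROW = (
    "   1 !  673.4   0.2    9e-207    9e-207       2     497 .."
    "     540    1036 ..     539    1039 .. 0.98"
)
MODEL_LENGTH = 500      # from "Query: TIGR01238 [M=500]"
PROTEIN_LENGTH = 1249   # from "(1249 residues searched)"


def parse_domain_row(row):
    """Split one row of the per-domain table into named fields."""
    f = row.split()
    return {
        "score": float(f[2]),
        "bias": float(f[3]),
        "c_evalue": float(f[4]),
        "i_evalue": float(f[5]),
        "hmm_from": int(f[6]),
        "hmm_to": int(f[7]),
        "ali_from": int(f[9]),
        "ali_to": int(f[10]),
        "env_from": int(f[12]),
        "env_to": int(f[13]),
        "acc": float(f[15]),
    }


domain = parse_domain_row(DOMAIN_ROW)
# Coordinates are 1-based and inclusive, hence the +1.
hmm_coverage = (domain["hmm_to"] - domain["hmm_from"] + 1) / MODEL_LENGTH
protein_fraction = (domain["ali_to"] - domain["ali_from"] + 1) / PROTEIN_LENGTH

print(f"HMM coverage: {hmm_coverage:.1%}")
print(f"fraction of candidate aligned: {protein_fraction:.1%}")
```

The alignment covers nearly the whole model but well under half of the candidate, as expected for a dehydrogenase domain embedded in a longer trifunctional protein.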

This GapMind analysis is from Sep 24 2021. The underlying query database was built on Sep 17 2021.

About GapMind

Each pathway is defined by a set of rules based on individual steps or genes. Candidates for each step are identified by using ublast (a fast alternative to protein BLAST) against a database of manually curated proteins (most of which are experimentally characterized) or by using HMMER with enzyme models (usually from TIGRFAMs). Ublast hits may be split across two different proteins.

A candidate for a step is "high confidence" if either:

where "other" refers to the best ublast hit to a sequence that is not annotated as performing this step (and is not "ignored").

Otherwise, a candidate is "medium confidence" if either:

Other blast hits with at least 50% coverage are "low confidence."

Steps with no high- or medium-confidence candidates may be considered "gaps." For the typical bacterium that can make all 20 amino acids, there are 1-2 gaps in amino acid biosynthesis pathways. For diverse bacteria and archaea that can utilize a carbon source, there is a complete high-confidence catabolic pathway (including a transporter) just 38% of the time, and there is a complete medium-confidence pathway 63% of the time. Gaps may be due to:

GapMind relies on the predicted proteins in the genome and does not search the six-frame translation. In most cases, you can search the six-frame translation by clicking on links to Curated BLAST for each step definition (in the per-step page).
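
For readers unfamiliar with the term, a six-frame translation translates a DNA sequence in all three reading frames on both strands, so that genes missed by the gene caller can still be found by protein search. The following is a minimal sketch of the idea using the standard genetic code. It is not the code used by GapMind or Curated BLAST, and the function names are illustrative.

```python
# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna, frame=0):
    """Translate one reading frame; unknown codons become 'X', stops '*'."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(dna):
    """Translate frames +1..+3 (forward strand) and -1..-3 (reverse strand)."""
    reverse = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna, offset)
        frames[f"-{offset + 1}"] = translate(reverse, offset)
    return frames


for name, protein in six_frame_translation("ATGGCCTAA").items():
    print(name, protein)
```

In practice, each translated frame would be split at stop codons and the resulting open reading frames searched against the curated proteins.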

If you notice any errors or omissions in the step descriptions, or any questionable results, please let us know.

by Morgan Price, Arkin group, Lawrence Berkeley National Laboratory