GapMind for catabolism of small carbon sources

 

Alignments for a candidate for rocA in Epibacterium ulvae U95

Align L-glutamate gamma-semialdehyde dehydrogenase (EC 1.2.1.88); Proline dehydrogenase (EC 1.5.5.2) (characterized)
to candidate WP_090216616.1 CV091_RS05315 bifunctional proline dehydrogenase/L-glutamate gamma-semialdehyde dehydrogenase PutA

Query= reanno::Phaeo:GFF1160
         (1158 letters)



>NCBI__GCF_002796795.1:WP_090216616.1
          Length = 1135

 Score = 1711 bits (4432), Expect = 0.0
 Identities = 869/1136 (76%), Positives = 967/1136 (85%), Gaps = 8/1136 (0%)

Query: 24   LRYRIDAGTYVDQAQMRDQLFALANLDATDRSTISANAAALVRDIRGHSSPGLMEVFLAE 83
            LR RID  TY D  Q RD+L A A L A DR  I   AA LVRDIRGHS+PGLMEVFLAE
Sbjct: 7    LRDRIDLQTYADPEQKRDELIATAALSAEDRKAICGQAAGLVRDIRGHSAPGLMEVFLAE 66

Query: 84   YGLSTDEGVALMCLAEALLRVPDADTIDALIEDKIAPSEWGKHLGKSTSSLVNASTWALM 143
            YGLSTDEGVALMCLAEALLRVPDA+TIDALIEDKIAPS+WGKHLG S+SSLVNASTWALM
Sbjct: 67   YGLSTDEGVALMCLAEALLRVPDAETIDALIEDKIAPSDWGKHLGHSSSSLVNASTWALM 126

Query: 144  LTGKVLDEKRSPVSALRGAMKRLGEPVIRTAVSRAMKEMGRQFVLGETIEGAMKRAAGME 203
            LTGKVLDE RSPV ALR A+KRLGEPVIRTAV RAMKEMGRQFVLGETIE AM RA GME
Sbjct: 127  LTGKVLDEGRSPVGALRSAIKRLGEPVIRTAVGRAMKEMGRQFVLGETIESAMTRARGME 186

Query: 204  AKGYTYSYDMLGEAARTEADAARYHLAYSRAISAIAAACNSADIRQNPGISVKLSALHPR 263
             KGYTYSYDMLGEAARTEADAARYHL+YS+AISAIA AC S DIR+NPGISVKLSALHPR
Sbjct: 187  DKGYTYSYDMLGEAARTEADAARYHLSYSKAISAIANACTSDDIRKNPGISVKLSALHPR 246

Query: 264  YELAQETSVKEQLVPRLQALALLAKAAGMGLNVDAEEADRLSLSLEVIEEVISDPALAGW 323
            YELAQET V E+LVPRL+ALALLAKAA MGLNVDAEEA+RLSLSLEVIE V+SDPALAGW
Sbjct: 247  YELAQETLVMEELVPRLKALALLAKAAKMGLNVDAEEANRLSLSLEVIEAVVSDPALAGW 306

Query: 324  DGFGVVVQAYGPRTGAALDALYDMANRYDRRLMVRLVKGAYWDTEVKRAQVEGVDGFPVF 383
            DGFG+VVQAYGPRTG ALDALY+MA+RYDR+ M+RLVKGAYWDTEVK AQVEG+DGFPV+
Sbjct: 307  DGFGIVVQAYGPRTGVALDALYEMADRYDRKFMIRLVKGAYWDTEVKLAQVEGIDGFPVY 366

Query: 384  THKSLTDVSYIANARKLLSITDRIYPQFATHNAHTVSAILHMAKDTDKGAYEFQRLHGMG 443
            T+K+LTDVSYIANARKLL++TDRIYPQFATHNAHTVSAI+HMA++    A+EFQRLHGMG
Sbjct: 367  TNKALTDVSYIANARKLLNMTDRIYPQFATHNAHTVSAIVHMAQEGQ--AFEFQRLHGMG 424

Query: 444  ETLHNMVLEQNQTHCRIYAPVGAHRDLLAYLVRRLLENGANSSFVNQIVDENVPPELVAA 503
            ETLH +VLEQN+T+CRIYAPVGAHRDLLAYLVRRLLENGANSSFVNQIVDE+V PE VA 
Sbjct: 425  ETLHQLVLEQNKTNCRIYAPVGAHRDLLAYLVRRLLENGANSSFVNQIVDESVAPERVAT 484

Query: 504  DPFAQVEDLTANLRKGPDLFQPERPNSIGFDLGHAPTLAAIDAARAPWKSHSWAAEPLLA 563
            DPF Q+ DL   +  GP+L+  ERPNS GFDL HAPTL AID+AR PW++H+W A PLLA
Sbjct: 485  DPFDQIGDLKRQIPTGPELYGAERPNSKGFDLAHAPTLTAIDSARTPWRAHNWVARPLLA 544

Query: 564  KAPETATTTDEPVRNPADLTTVGRVQTAGQAEIETALSAATPWNASAETRAEVLNRAADL 623
               +T  +  + V NP+D   VG        ++E AL+ A  W+A A+ RAE+LNRAADL
Sbjct: 545  S--DTTGSAPQNVMNPSDHALVGESSECRLEDVEQALNDAARWSAPAQERAEILNRAADL 602

Query: 624  YEANYGELFALLTREAGKTLPDCVAELREAVDFLRYYAARISAEPPVGVFTCISPWNFPL 683
            YEA+YGELFALL REAGKTL D VAELREAVDFLRYYAA I A  P G+FTCISPWNFPL
Sbjct: 603  YEAHYGELFALLHREAGKTLMDAVAELREAVDFLRYYAANIPAADPAGIFTCISPWNFPL 662

Query: 684  AIFSGQIAAALAVGNAVLAKPAEQTPLIAHRAISLLHEAGVPRSALQLLPGAG-AVGGAL 742
            AIF+GQIAAALAVGN VLAKPAE T LIAHRA+ LLHEAGVPR+ALQL PG G  +G  L
Sbjct: 663  AIFTGQIAAALAVGNGVLAKPAESTTLIAHRAVQLLHEAGVPRTALQLTPGRGREIGPLL 722

Query: 743  TSDARVGGVAFTGSTATALKIRAAMAEHLRPGAPLIAETGGLNAMIVDSTALPEQAVQSI 802
            T D RV GVAFTGSTATAL IR  MA+ LRPGAPLIAETGGLNAMIVDSTALPEQAVQ+I
Sbjct: 723  TGDPRVSGVAFTGSTATALHIRTEMAKGLRPGAPLIAETGGLNAMIVDSTALPEQAVQAI 782

Query: 803  IESAFQSAGQRCSALRCLYLQEDIADNVLKMLKGAMDALHLGDPWNLSTDSGPVIDETAR 862
            IESAFQSAGQRCSALRCLYLQEDIAD VL MLKGAMD LHLGDPWNLSTDSGPVID  A+
Sbjct: 783  IESAFQSAGQRCSALRCLYLQEDIADTVLDMLKGAMDCLHLGDPWNLSTDSGPVIDSRAQ 842

Query: 863  AGILAHIDAARAEGRVLKEMTAPQGGTFVAPTLIEITGIQALEQEIFGPVLHVVRFKSQD 922
            +GILAHI  AR+EGRV+ E+  PQGGTFVAPTLIE++GI AL++EIFGPVLHV RFK++D
Sbjct: 843  SGILAHISTARSEGRVMHELHPPQGGTFVAPTLIEVSGIDALKEEIFGPVLHVARFKARD 902

Query: 923  LDQIIRDINATGYGLTFGLHTRIDDRVQYICDRIHAGNLYVNRNQIGAIVGSQPFGGEGL 982
            LD++I  IN TGYGLTFGLHTRIDDRVQ++CDRI AGN+YVNRNQIGAIVGSQPFGGEGL
Sbjct: 903  LDKVIEAINGTGYGLTFGLHTRIDDRVQHVCDRIKAGNIYVNRNQIGAIVGSQPFGGEGL 962

Query: 983  SGTGPKAGGPFYMMRFCAPDRQKSVDSWPSDAPAMTMLPAPTGQPMQEITTSLPGPTGES 1042
            SGTGPKAGGP Y+ R+CAPDRQ S +++ +   +  ++PAPTG   Q  T +LPGPTGES
Sbjct: 963  SGTGPKAGGPLYLSRYCAPDRQTSAETFNN---STRVVPAPTGTAAQPTTQTLPGPTGES 1019

Query: 1043 NRLSQLARPPLLCLGPGPQAVVAQARAVHALGGTAIEATGPLDMRQLLTMEGTSGVIWWG 1102
            NRL+   R PLLC+GPG +A   QA+AVH+ GG AI+    LD+ QL T++  +GV+WWG
Sbjct: 1020 NRLTTAPRLPLLCMGPGKKAAAEQAKAVHSHGGLAIQMADNLDLDQLRTLDAIAGVLWWG 1079

Query: 1103 DETTAREIESWLARRNGPILPLIPGLPDKARVQAERHVCVDTTAAGGNAALLGGMG 1158
            DE TAREIE  LA R+G ILPLIPGLPD+ARV AE HVCVDTTAAGGNA+LLGG G
Sbjct: 1080 DEQTAREIEQHLAARDGAILPLIPGLPDRARVMAEHHVCVDTTAAGGNASLLGGQG 1135


Lambda     K      H
   0.317    0.132    0.387 

Gapped
Lambda     K      H
   0.267   0.0410    0.140 


Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Sequences: 1
Number of Hits to DB: 3256
Number of extensions: 127
Number of successful extensions: 5
Number of sequences better than 1.0e-02: 1
Number of HSP's gapped: 1
Number of HSP's successfully gapped: 1
Length of query: 1158
Length of database: 1135
Length adjustment: 46
Effective length of query: 1112
Effective length of database: 1089
Effective search space:  1210968
Effective search space used:  1210968
Neighboring words threshold: 11
Window for multiple hits: 40
X1: 16 ( 7.3 bits)
X2: 38 (14.6 bits)
X3: 64 (24.7 bits)
S1: 41 (21.7 bits)
S2: 58 (26.9 bits)
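
The statistics above can be cross-checked with a little arithmetic. The sketch below assumes the standard Karlin-Altschul normalization for bit scores, S' = (lambda * S - ln K) / ln 2, using the gapped Lambda and K values. It also assumes that, for a single-sequence database, the effective search space is the product of the two lengths after subtracting the length adjustment. The function names are illustrative and are not part of BLAST or GapMind.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Convert a raw alignment score to bits: (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def effective_search_space(query_len: int, db_len: int, length_adjustment: int) -> int:
    """Product of the effective query and database lengths (one database sequence assumed)."""
    return (query_len - length_adjustment) * (db_len - length_adjustment)


# Values copied from the report above.
space = effective_search_space(query_len=1158, db_len=1135, length_adjustment=46)
bits = bit_score(raw_score=4432, lam=0.267, k=0.0410)

print(space)           # 1210968, matching "Effective search space"
print(round(bits, 1))  # close to the reported 1711 bits
```

With the rounded Lambda and K printed in the report, the computed bit score comes out at roughly 1711.8, within one bit of the reported value; the small difference is presumably due to rounding of the printed parameters.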

Align candidate WP_090216616.1 CV091_RS05315 (bifunctional proline dehydrogenase/L-glutamate gamma-semialdehyde dehydrogenase PutA)
to HMM TIGR01238 (delta-1-pyrroline-5-carboxylate dehydrogenase (EC 1.2.1.88))

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3.1 (Jul 2020); http://hmmer.org/
# Copyright (C) 2020 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  ../tmp/path.carbon/TIGR01238.hmm
# target sequence database:        /tmp/gapView.1362687.genome.faa
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       TIGR01238  [M=500]
Accession:   TIGR01238
Description: D1pyr5carbox3: delta-1-pyrroline-5-carboxylate dehydrogenase
Scores for complete sequences (score includes all domains):
   --- full sequence ---   --- best 1 domain ---    -#dom-
    E-value  score  bias    E-value  score  bias    exp  N  Sequence                             Description
    ------- ------ -----    ------- ------ -----   ---- --  --------                             -----------
   2.8e-195  635.5   0.0   1.2e-192  626.8   0.0    2.4  2  NCBI__GCF_002796795.1:WP_090216616.1  


Domain annotation for each sequence (and alignments):
>> NCBI__GCF_002796795.1:WP_090216616.1  
   #    score  bias  c-Evalue  i-Evalue hmmfrom  hmm to    alifrom  ali to    envfrom  env to     acc
 ---   ------ ----- --------- --------- ------- -------    ------- -------    ------- -------    ----
   1 !  626.8   0.0  1.2e-192  1.2e-192       1     495 [.     502     978 ..     502     982 .. 0.97
   2 !    6.1   0.0   0.00017   0.00017     237     270 ..    1069    1102 ..    1062    1123 .. 0.89

  Alignments for each domain:
  == domain 1  score: 626.8 bits;  conditional E-value: 1.2e-192
                             TIGR01238   1 dlygegrknslGvdlaneselksleeqllkaaakkfqaapivgekakaegeaqpvknpadrkdivGqvseada 73 
                                           +lyg  r ns+G dla   +l  +++  + + a+++ a p++  +       q v+np d+  +vG+ se  +
  NCBI__GCF_002796795.1:WP_090216616.1 502 ELYGAERPNSKGFDLAHAPTLTAIDSARTPWRAHNWVARPLL-ASDTTGSAPQNVMNPSDHA-LVGESSECRL 572
                                           79****************************************.55566667799******85.89*******9 PP

                             TIGR01238  74 aevqeavdsavaafaewsatdakeraailerladlleshmpelvallvreaGktlsnaiaevreavdflryya 146
                                           ++v++a++     +a wsa +a+era il+r+adl e h  el all+reaGktl +a+ae+reavdflryya
  NCBI__GCF_002796795.1:WP_090216616.1 573 EDVEQALN----DAARWSA-PAQERAEILNRAADLYEAHYGELFALLHREAGKTLMDAVAELREAVDFLRYYA 640
                                           99998875....5689*98.9**************************************************** PP

                             TIGR01238 147 kqvedvldeesakalGavvcispwnfplaiftGqiaaalaaGntviakpaeqtsliaaravellqeaGvpagv 219
                                            ++       +a + G + cispwnfplaiftGqiaaala Gn v+akpae t+lia rav+ll+eaGvp ++
  NCBI__GCF_002796795.1:WP_090216616.1 641 ANI------PAADPAGIFTCISPWNFPLAIFTGQIAAALAVGNGVLAKPAESTTLIAHRAVQLLHEAGVPRTA 707
                                           *99......45789*********************************************************** PP

                             TIGR01238 220 iqllpGrGedvGaaltsderiaGviftGstevarlinkalakredapvpliaetGGqnamivdstalaeqvva 292
                                           +ql pGrG ++G  lt d+r++Gv+ftGst++a  i+ ++ak   + +pliaetGG namivdstal+eq v+
  NCBI__GCF_002796795.1:WP_090216616.1 708 LQLTPGRGREIGPLLTGDPRVSGVAFTGSTATALHIRTEMAKGLRPGAPLIAETGGLNAMIVDSTALPEQAVQ 780
                                           ************************************************************************* PP

                             TIGR01238 293 dvlasafdsaGqrcsalrvlcvqedvadrvltlikGamdelkvgkpirlttdvGpvidaeakqnllahiekmk 365
                                            +++saf+saGqrcsalr l++qed+ad vl+++kGamd l++g p +l td Gpvid++a+  +lahi   +
  NCBI__GCF_002796795.1:WP_090216616.1 781 AIIESAFQSAGQRCSALRCLYLQEDIADTVLDMLKGAMDCLHLGDPWNLSTDSGPVIDSRAQSGILAHISTAR 853
                                           ************************************************************************* PP

                             TIGR01238 366 akakkvaqvkleddvesekgtfvaptlfelddldelkkevfGpvlhvvrykadeldkvvdkinakGygltlGv 438
                                           + ++ ++++   +      gtfvaptl+e+  +d+lk+e+fGpvlhv r+ka++ldkv++ in +Gyglt+G+
  NCBI__GCF_002796795.1:WP_090216616.1 854 SEGRVMHELHPPQ-----GGTFVAPTLIEVSGIDALKEEIFGPVLHVARFKARDLDKVIEAINGTGYGLTFGL 921
                                           *******998765.....9****************************************************** PP

                             TIGR01238 439 hsrieetvrqiekrakvGnvyvnrnlvGavvGvqpfGGeGlsGtGpkaGGplylyrl 495
                                           h+ri++ v+++ +r+k+Gn+yvnrn++Ga+vG qpfGGeGlsGtGpkaGGplyl r+
  NCBI__GCF_002796795.1:WP_090216616.1 922 HTRIDDRVQHVCDRIKAGNIYVNRNQIGAIVGSQPFGGEGLSGTGPKAGGPLYLSRY 978
                                           ******************************************************997 PP

  == domain 2  score: 6.1 bits;  conditional E-value: 0.00017
                             TIGR01238  237 deriaGviftGstevarlinkalakredapvpli 270 
                                             ++iaGv++ G  ++ar i++ la r+ a  pli
  NCBI__GCF_002796795.1:WP_090216616.1 1069 LDAIAGVLWWGDEQTAREIEQHLAARDGAILPLI 1102
                                            578************************9999998 PP



Internal pipeline statistics summary:
-------------------------------------
Query model(s):                            1  (500 nodes)
Target sequences:                          1  (1135 residues searched)
Passed MSV filter:                         1  (1); expected 0.0 (0.02)
Passed bias filter:                        1  (1); expected 0.0 (0.02)
Passed Vit filter:                         1  (1); expected 0.0 (0.001)
Passed Fwd filter:                         1  (1); expected 0.0 (1e-05)
Initial search space (Z):                  1  [actual number of targets]
Domain search space  (domZ):               1  [number of targets reported over threshold]
# CPU time: 0.01u 0.00s 00:00:00.01 Elapsed: 00:00:00.01
# Mc/sec: 53.51
//
[ok]
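
As a rough illustration of how much of the TIGR01238 model each domain hit covers, the sketch below parses a row of the domain annotation table shown above and divides the aligned model span by the model length (M=500). The helper name `model_coverage` and the whitespace-based parsing are assumptions made for this example; this is not GapMind's own code.

```python
def model_coverage(domain_line: str, model_length: int) -> float:
    """Fraction of the HMM covered by one row of the hmmsearch domain table.

    After splitting on whitespace, the columns are: index, '!' or '?', score,
    bias, c-Evalue, i-Evalue, hmmfrom, hmm to, and then the remaining fields.
    """
    fields = domain_line.split()
    hmm_from, hmm_to = int(fields[6]), int(fields[7])
    return (hmm_to - hmm_from + 1) / model_length


# Rows copied from the domain annotation table above.
domain1 = "1 !  626.8   0.0  1.2e-192  1.2e-192       1     495 [.     502     978 ..     502     982 .. 0.97"
domain2 = "2 !    6.1   0.0   0.00017   0.00017     237     270 ..    1069    1102 ..    1062    1123 .. 0.89"

print(f"{model_coverage(domain1, 500):.1%}")  # 99.0%
print(f"{model_coverage(domain2, 500):.1%}")  # 6.8%
```

Domain 1 spans nearly the whole model, while domain 2 is a short, weak match to a small part of it.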

This GapMind analysis is from Sep 24 2021. The underlying query database was built on Sep 17 2021.

About GapMind

Each pathway is defined by a set of rules based on individual steps or genes. Candidates for each step are identified by using ublast (a fast alternative to protein BLAST) against a database of manually-curated proteins (most of which are experimentally characterized) or by using HMMer with enzyme models (usually from TIGRFam). Ublast hits may be split across two different proteins.

A candidate for a step is "high confidence" if either:

where "other" refers to the best ublast hit to a sequence that is not annotated as performing this step (and is not "ignored").

Otherwise, a candidate is "medium confidence" if either:

Other blast hits with at least 50% coverage are "low confidence."
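
To make the coverage criterion concrete, here is a minimal sketch that computes percent identity and coverage for an alignment and checks the 50% coverage threshold stated above. Only that 50% figure comes from the text. The `BlastHit` class, its field names, and the choice to measure coverage over the characterized (query) protein are assumptions made for illustration. The high- and medium-confidence criteria are not implemented because they are not given here.

```python
from dataclasses import dataclass

LOW_CONFIDENCE_MIN_COVERAGE = 0.5  # "at least 50% coverage", from the text above


@dataclass
class BlastHit:
    """Hypothetical container for the summary numbers of one alignment."""
    identities: int        # identical positions in the alignment
    alignment_length: int  # aligned columns, including gaps
    query_start: int       # first aligned position in the characterized protein
    query_end: int         # last aligned position in the characterized protein
    query_length: int      # full length of the characterized protein

    @property
    def percent_identity(self) -> float:
        return 100.0 * self.identities / self.alignment_length

    @property
    def coverage(self) -> float:
        return (self.query_end - self.query_start + 1) / self.query_length


def meets_low_confidence_coverage(hit: BlastHit) -> bool:
    """True if the hit satisfies the 50% coverage requirement."""
    return hit.coverage >= LOW_CONFIDENCE_MIN_COVERAGE


# Numbers taken from the rocA alignment on this page.
hit = BlastHit(identities=869, alignment_length=1136,
               query_start=24, query_end=1158, query_length=1158)
print(f"{hit.percent_identity:.1f}% identity, {hit.coverage:.1%} coverage")
print(meets_low_confidence_coverage(hit))
```

For the alignment on this page, the hit is far above the 50% coverage floor, so whether it counts as high or medium confidence depends on the criteria that are not listed in this section.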

Steps with no high- or medium-confidence candidates may be considered "gaps." For the typical bacterium that can make all 20 amino acids, there are 1-2 gaps in amino acid biosynthesis pathways. For diverse bacteria and archaea that can utilize a carbon source, there is a complete high-confidence catabolic pathway (including a transporter) just 38% of the time, and there is a complete medium-confidence pathway 63% of the time. Gaps may be due to:

GapMind relies on the predicted proteins in the genome and does not search the six-frame translation. In most cases, you can search the six-frame translation by clicking on links to Curated BLAST for each step definition (in the per-step page).

For more information, see:

If you notice any errors or omissions in the step descriptions, or any questionable results, please let us know.

by Morgan Price, Arkin group, Lawrence Berkeley National Laboratory