GapMind for catabolism of small carbon sources

Alignments for a candidate for rocA in Ochrobactrum thiophenivorans DSM 7216

Align L-glutamate gamma-semialdehyde dehydrogenase (EC 1.2.1.88); Proline dehydrogenase (EC 1.5.5.2) (characterized)
to candidate WP_094508021.1 CEV31_RS15250 trifunctional transcriptional regulator/proline dehydrogenase/L-glutamate gamma-semialdehyde dehydrogenase

Query= reanno::azobra:AZOBR_RS23695
         (1235 letters)



>NCBI__GCF_002252445.1:WP_094508021.1
          Length = 1227

 Score = 1574 bits (4076), Expect = 0.0
 Identities = 822/1222 (67%), Positives = 951/1222 (77%), Gaps = 13/1222 (1%)

Query: 18   FADFAPPIRPATELRAAITAAYRRPEPECLPFLFEQASLPPGVITAAAATARKLITALRA 77
            F +FAPPIR  T LR AIT AYRRPE EC+  + +QA+LP         TARKLI ALRA
Sbjct: 13   FQNFAPPIRVQTPLRKAITDAYRRPEAECVTAIVQQATLPEETAKQVRETARKLIEALRA 72

Query: 78   KPRGRGVEGLIHEYSLSSQEGMALMCLAEALLRIPDHATRDALIRDKIAGGDWQAHLGKG 137
            K +G GVEGL+HEYSLSSQEG+ALMCLAEALLRIPD ATRDALIRDKI+ GDW++H+G G
Sbjct: 73   KHKGTGVEGLVHEYSLSSQEGVALMCLAEALLRIPDMATRDALIRDKISNGDWKSHVGGG 132

Query: 138  GSMFVNAATWGLLITGKLTSAGGEQALSSALTRLIARGGEPLIRRGVDFAMRMMGEQFVT 197
             S+FVNAATWGL++TGKLT+   +  LS+ALTRLIAR GEP+IRRGVD AMRMMGEQFVT
Sbjct: 133  RSLFVNAATWGLVVTGKLTNTVNDSGLSAALTRLIARCGEPVIRRGVDMAMRMMGEQFVT 192

Query: 198  GQTIQEALTNARTMEAEGFRYSYDMLGEAALTAEDAARYYADYVNAIHAIGTASAGRGVY 257
            G+TI EAL  A+++E  GFRYSYDMLGEAA TA DA RYY DY  AIHAIG ASAGRG+Y
Sbjct: 193  GETIDEALKRAKSLEERGFRYSYDMLGEAATTAADAERYYKDYETAIHAIGRASAGRGIY 252

Query: 258  EGPGISIKLSAIHPRYSRAQADRVMDELLPRVKALALLAKGYDIGLNIDAEEADRLELSL 317
            +GPGISIKLSA+HPRY RAQ++RVM ELLP+VK LA L+K Y+IGLNIDAEEADRLELSL
Sbjct: 253  DGPGISIKLSALHPRYVRAQSERVMSELLPKVKELAALSKQYNIGLNIDAEEADRLELSL 312

Query: 318  DLMESLCFDPDLAGWNGIGFVVQAYGKRCPYVIDFLIDLARRSGHRLMIRLVKGAYWDSE 377
            DL++SL  DPDLA W GIGFVVQAYGKRCP+V+DF+IDLARR+  R+M+RLVKGAYWD+E
Sbjct: 313  DLLQSLIEDPDLADWEGIGFVVQAYGKRCPFVLDFIIDLARRNNRRVMVRLVKGAYWDAE 372

Query: 378  IKRAQLDGLPDFPVYTRKVYTDVSYVACARKLLAAPEAVFPQFATHNAQTLATIYEMAGS 437
            IKRAQ+DGL DFPVYTRKV+TDVSY+ACA KLL A + +FPQFATHNAQTLATIY +AG 
Sbjct: 373  IKRAQVDGLEDFPVYTRKVHTDVSYIACAAKLLGARDVIFPQFATHNAQTLATIYHLAGP 432

Query: 438  DFQVGKYEFQCLHGMGEPLYKEVVGPLK--RPCRIYAPVGTHETLLAYLVRRLLENGANS 495
            DF+ G YEFQCLHGMGEPLY EVVG  K  RP RIYAPVGTHETLLAYLVRRLLENGANS
Sbjct: 433  DFKTGSYEFQCLHGMGEPLYDEVVGASKLGRPARIYAPVGTHETLLAYLVRRLLENGANS 492

Query: 496  SFVNRIADPAVPVDELVADPVAVARAIAPTGAPHALIALPRNLYAPERANSAGIDLSDET 555
            SFVNRI D +V VDEL+ADP  V R++A  GA H  I+LP  LY   R NS G DLS+E 
Sbjct: 493  SFVNRIGDKSVSVDELIADPAEVVRSMAVVGARHDQISLPEGLYG-IRKNSVGFDLSNEE 551

Query: 556  ELARLSAALSASAEMTWTAAPLLADGERAGQAQPVRNPADRRDVVGSVTEASEALVAEAF 615
            +LA LS  L A+A   WTA P +   +  G+++PV NP D  DVVG+VTE +   VA+A 
Sbjct: 552  QLAELSETLKANATRAWTAEPQVVGSKVKGESRPVLNPGDHSDVVGTVTEIAADDVAQAM 611

Query: 616  GHAVAAASAWAATPPEERAASLFRAADTMQERMPTLLGLIVREAGKSLPNAIAEVREAID 675
              A  A ++W+   P +RAA L RAAD MQ  M  LLGLI+REAGKS+PNAIAEVREAID
Sbjct: 612  KAAEKAVASWSHVLPADRAACLDRAADIMQREMAELLGLIMREAGKSMPNAIAEVREAID 671

Query: 676  FLRYYGAQVRDRFDNATHRPLGPVVCISPWNFPLAIFSGQIAAALAAGNPVLAKPAEETP 735
            FLRYY  Q R R     H+PLGP+VCISPWNFPLAIF+GQIAAAL AGNPVLAKPAEETP
Sbjct: 672  FLRYYAEQTR-RTLGVAHKPLGPIVCISPWNFPLAIFTGQIAAALVAGNPVLAKPAEETP 730

Query: 736  LIAAEAVRILHAAGIPAGALQLLPGAGEVGAALVGHEAVRGVMFTGSTEVARLIQRQLAG 795
            LIAA+ VRILH AGIPA ALQLLPG G +GAALV  +   GVMFTGSTEVARLIQ QLA 
Sbjct: 731  LIAAQGVRILHEAGIPADALQLLPGDGRIGAALVAAQETCGVMFTGSTEVARLIQAQLAS 790

Query: 796  RLLPDGAPIPLIAETGGQNAMIVDSSALAEQVVGDVIASAFDSAGQRCSALRILCLQEDV 855
            RLLP+G PIPLIAETGGQNAMIVDSSALAEQVV DVI SAFDSAGQRCSALR+LCLQEDV
Sbjct: 791  RLLPNGKPIPLIAETGGQNAMIVDSSALAEQVVFDVIGSAFDSAGQRCSALRVLCLQEDV 850

Query: 856  ADRTLAMLKGAMRELRIGNPDRLAVDVGPVISEEARATIAAHIEAMRAKGRNVEFLPLPA 915
            ADR L MLKGA++EL IG  D+L VD+G VI+ EA+  I  H++ MR  GR VE LPL  
Sbjct: 851  ADRILTMLKGALKELSIGRTDKLKVDIGAVITAEAKDIIEKHVQTMRDMGRKVEQLPLGP 910

Query: 916  ETADGTFIAPTVIEIGGIHELEREVFGPVLHVVRFHRDDLDALVDSINATGYGLTFGLHT 975
            ET  GTF+APT++EI  + +L+REVFGPVLHVVR+ R+D+D+L+D INATGYGLTFGLHT
Sbjct: 911  ETGKGTFVAPTIVEIDSLRDLKREVFGPVLHVVRYKRNDMDSLIDDINATGYGLTFGLHT 970

Query: 976  RIDATIERVTGRIGAGNVYVNRNTIGAVVGVQPFGGHGLSGTGPKAGGPLYLSRLLSRRP 1035
            R+D TI  V  RI  GN+Y+NRN IGA+VGVQPFGG GLSGTGPKAGGPLYL RL+   P
Sbjct: 971  RLDETIAHVADRIRVGNIYINRNVIGAIVGVQPFGGRGLSGTGPKAGGPLYLGRLVETAP 1030

Query: 1036 KGWLEFRGPDAARAAGL-AYGEWLRAKGFT--AEASRCAGYVARSAIGGGAELNGPVGER 1092
               +  R         L  + +WL  +G    A+A+R  G  + SA+G   EL GPVGER
Sbjct: 1031 ---IPPRMASVHSDPALNDFAKWLGNRGMNELAQAARETG--SASALGLNIELPGPVGER 1085

Query: 1093 NLYELHGRGRVLLLPQTRTGLLLQLGAVLATGNSAAVDAPPDLAELLRGLPPALAARVRT 1152
            NLY LH RGR+LL PQT +GL  QL A LATGN A +D    L  +L+GLP ++AARV  
Sbjct: 1086 NLYALHARGRILLAPQTESGLYRQLTAALATGNEAIIDEASGLRNVLKGLPSSVAARVVW 1145

Query: 1153 TADWRDVGPLAAVLVEGDRERVTAINRRVADLPGPILLVQAATAEALAAGRGEGYDLDLL 1212
            T DW+   P A  LVEG+ ER+  IN+++A LPGP++L QAAT+  LA    + Y L  L
Sbjct: 1146 TKDWQADAPFAGALVEGEGERLIDINKKLAALPGPLVLTQAATSAQLAQ-NADCYCLSWL 1204

Query: 1213 LNERSVSVNTAAAGGNASLVAM 1234
            L E S S+NT AAGGNASL+A+
Sbjct: 1205 LEEVSTSINTTAAGGNASLMAI 1226


Lambda     K      H
   0.319    0.136    0.396 

Gapped
Lambda     K      H
   0.267   0.0410    0.140 


Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Sequences: 1
Number of Hits to DB: 3574
Number of extensions: 150
Number of successful extensions: 6
Number of sequences better than 1.0e-02: 1
Number of HSP's gapped: 1
Number of HSP's successfully gapped: 1
Length of query: 1235
Length of database: 1227
Length adjustment: 47
Effective length of query: 1188
Effective length of database: 1180
Effective search space:  1401840
Effective search space used:  1401840
Neighboring words threshold: 11
Window for multiple hits: 40
X1: 16 ( 7.4 bits)
X2: 38 (14.6 bits)
X3: 64 (24.7 bits)
S1: 41 (21.8 bits)
S2: 59 (27.3 bits)
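
The summary figures in the report above can be checked by hand. The following is a minimal Python sketch, not part of GapMind itself: it re-derives percent identity, query coverage, and the effective search space from numbers copied out of the report. The helper name `parse_fractions` is hypothetical, and the coverage formula (aligned query span divided by query length) is an assumption about how coverage is defined here.

```python
import re

# Summary line copied from the alignment report above.
summary = "Identities = 822/1222 (67%), Positives = 951/1222 (77%), Gaps = 13/1222 (1%)"

def parse_fractions(line):
    """Return {label: (numerator, denominator)} for each 'Label = a/b' field."""
    return {
        label: (int(num), int(den))
        for label, num, den in re.findall(r"(\w+) = (\d+)/(\d+)", line)
    }

fields = parse_fractions(summary)
identity_pct = 100.0 * fields["Identities"][0] / fields["Identities"][1]

# Query coverage (assumed definition): aligned query span over query length.
# The alignment covers query positions 18..1234 of 1235 letters.
query_start, query_end, query_len = 18, 1234, 1235
query_coverage_pct = 100.0 * (query_end - query_start + 1) / query_len

# The report's effective lengths equal the raw lengths minus the length
# adjustment (47), and their product matches the reported search space.
length_adjustment = 47
effective_space = (1235 - length_adjustment) * (1227 - length_adjustment)

print(f"identity: {identity_pct:.1f}%")
print(f"query coverage: {query_coverage_pct:.1f}%")
print(f"effective search space: {effective_space}")
```

The computed search space agrees with the "Effective search space" line in the report, and the identity agrees with the rounded 67% shown in the alignment header.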

Align candidate WP_094508021.1 CEV31_RS15250 (trifunctional transcriptional regulator/proline dehydrogenase/L-glutamate gamma-semialdehyde dehydrogenase)
to HMM TIGR01238 (delta-1-pyrroline-5-carboxylate dehydrogenase (EC 1.2.1.88))

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3.1 (Jul 2020); http://hmmer.org/
# Copyright (C) 2020 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  ../tmp/path.carbon/TIGR01238.hmm
# target sequence database:        /tmp/gapView.1013585.genome.faa
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       TIGR01238  [M=500]
Accession:   TIGR01238
Description: D1pyr5carbox3: delta-1-pyrroline-5-carboxylate dehydrogenase
Scores for complete sequences (score includes all domains):
   --- full sequence ---   --- best 1 domain ---    -#dom-
    E-value  score  bias    E-value  score  bias    exp  N  Sequence                             Description
    ------- ------ -----    ------- ------ -----   ---- --  --------                             -----------
   6.1e-223  726.7   0.9   8.9e-223  726.2   0.9    1.2  1  NCBI__GCF_002252445.1:WP_094508021.1  


Domain annotation for each sequence (and alignments):
>> NCBI__GCF_002252445.1:WP_094508021.1  
   #    score  bias  c-Evalue  i-Evalue hmmfrom  hmm to    alifrom  ali to    envfrom  env to     acc
 ---   ------ ----- --------- --------- ------- -------    ------- -------    ------- -------    ----
   1 !  726.2   0.9  8.9e-223  8.9e-223       2     497 ..     535    1027 ..     534    1030 .. 0.98

  Alignments for each domain:
  == domain 1  score: 726.2 bits;  conditional E-value: 8.9e-223
                             TIGR01238    2 lygegrknslGvdlaneselksleeqllkaaakkfqaapivgekakaegeaqpvknpadrkdivGqvsead 72  
                                            lyg  rkns G dl+ne++l++l+e l++ a++ + a p v   +k +ge +pv np d+ d+vG+v+e  
  NCBI__GCF_002252445.1:WP_094508021.1  535 LYGI-RKNSVGFDLSNEEQLAELSETLKANATRAWTAEPQV-VGSKVKGESRPVLNPGDHSDVVGTVTEIA 603 
                                            6887.**********************************99.4567789********************** PP

                             TIGR01238   73 aaevqeavdsavaafaewsatdakeraailerladlleshmpelvallvreaGktlsnaiaevreavdflr 143 
                                            a++v +a+++a +a+a ws + +++raa+l+r+ad+++++m el++l++reaGk++ naiaevrea+dflr
  NCBI__GCF_002252445.1:WP_094508021.1  604 ADDVAQAMKAAEKAVASWSHVLPADRAACLDRAADIMQREMAELLGLIMREAGKSMPNAIAEVREAIDFLR 674 
                                            *********************************************************************** PP

                             TIGR01238  144 yyakqvedvldeesakalGavvcispwnfplaiftGqiaaalaaGntviakpaeqtsliaaravellqeaG 214 
                                            yya+q + +l+   +k+lG++vcispwnfplaiftGqiaaal+aGn v+akpae+t+liaa++v +l+eaG
  NCBI__GCF_002252445.1:WP_094508021.1  675 YYAEQTRRTLGVA-HKPLGPIVCISPWNFPLAIFTGQIAAALVAGNPVLAKPAEETPLIAAQGVRILHEAG 744 
                                            **********988.********************************************************* PP

                             TIGR01238  215 vpagviqllpGrGedvGaaltsderiaGviftGstevarlinkalakredap...vpliaetGGqnamivd 282 
                                            +pa ++qllpG G  +Gaal + +   Gv+ftGstevarli+ +la+r  ++   +pliaetGGqnamivd
  NCBI__GCF_002252445.1:WP_094508021.1  745 IPADALQLLPGDGR-IGAALVAAQETCGVMFTGSTEVARLIQAQLASRLLPNgkpIPLIAETGGQNAMIVD 814 
                                            *************9.*********************************8765555**************** PP

                             TIGR01238  283 stalaeqvvadvlasafdsaGqrcsalrvlcvqedvadrvltlikGamdelkvgkpirlttdvGpvidaea 353 
                                            s+alaeqvv dv+ safdsaGqrcsalrvlc+qedvadr+lt++kGa++el +g+  +l+ d+G vi aea
  NCBI__GCF_002252445.1:WP_094508021.1  815 SSALAEQVVFDVIGSAFDSAGQRCSALRVLCLQEDVADRILTMLKGALKELSIGRTDKLKVDIGAVITAEA 885 
                                            *********************************************************************** PP

                             TIGR01238  354 kqnllahiekmkakakkvaqvkleddvesekgtfvaptlfelddldelkkevfGpvlhvvrykadeldkvv 424 
                                            k+ +++h++ m+++++kv q+ l    e+ kgtfvapt++e+d+l +lk+evfGpvlhvvryk++++d+++
  NCBI__GCF_002252445.1:WP_094508021.1  886 KDIIEKHVQTMRDMGRKVEQLPLGP--ETGKGTFVAPTIVEIDSLRDLKREVFGPVLHVVRYKRNDMDSLI 954 
                                            ***********************99..99****************************************** PP

                             TIGR01238  425 dkinakGygltlGvhsrieetvrqiekrakvGnvyvnrnlvGavvGvqpfGGeGlsGtGpkaGGplylyrl 495 
                                            d ina+Gyglt+G+h+r +et++++ +r++vGn+y+nrn++Ga+vGvqpfGG+GlsGtGpkaGGplyl rl
  NCBI__GCF_002252445.1:WP_094508021.1  955 DDINATGYGLTFGLHTRLDETIAHVADRIRVGNIYINRNVIGAIVGVQPFGGRGLSGTGPKAGGPLYLGRL 1025
                                            **********************************************************************9 PP

                             TIGR01238  496 tr 497 
                                            ++
  NCBI__GCF_002252445.1:WP_094508021.1 1026 VE 1027
                                            87 PP



Internal pipeline statistics summary:
-------------------------------------
Query model(s):                            1  (500 nodes)
Target sequences:                          1  (1227 residues searched)
Passed MSV filter:                         1  (1); expected 0.0 (0.02)
Passed bias filter:                        1  (1); expected 0.0 (0.02)
Passed Vit filter:                         1  (1); expected 0.0 (0.001)
Passed Fwd filter:                         1  (1); expected 0.0 (1e-05)
Initial search space (Z):                  1  [actual number of targets]
Domain search space  (domZ):               1  [number of targets reported over threshold]
# CPU time: 0.01u 0.01s 00:00:00.02 Elapsed: 00:00:00.01
# Mc/sec: 47.50
//
[ok]
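
To see how much of the TIGR01238 model the candidate matches, the domain row above can be parsed directly. This is a hedged sketch rather than GapMind's own code: the function name `parse_domain_row` is hypothetical, the row is split on whitespace under the assumption that it follows the column layout printed above, and HMM coverage is taken to be the aligned model span divided by the model length (M=500).

```python
# Domain-table row, model length, and target length copied from the report above.
domain_row = ("   1 !  726.2   0.9  8.9e-223  8.9e-223       2     497 .."
              "     535    1027 ..     534    1030 .. 0.98")
model_length = 500
target_length = 1227

def parse_domain_row(row):
    """Split one hmmsearch per-domain row into named fields (hypothetical helper)."""
    parts = row.split()
    return {
        "score": float(parts[2]),
        "i_evalue": float(parts[5]),
        "hmm_from": int(parts[6]),
        "hmm_to": int(parts[7]),
        "ali_from": int(parts[9]),
        "ali_to": int(parts[10]),
        "acc": float(parts[15]),
    }

domain = parse_domain_row(domain_row)

# Fraction of the model's match states covered by the alignment.
hmm_coverage = (domain["hmm_to"] - domain["hmm_from"] + 1) / model_length

# Number of candidate residues in the aligned region, and their share of the protein.
aligned_residues = domain["ali_to"] - domain["ali_from"] + 1
target_fraction = aligned_residues / target_length

print(f"HMM coverage: {hmm_coverage:.1%}")
print(f"aligned residues: {aligned_residues} of {target_length} ({target_fraction:.1%})")
```

The match spans nearly the whole model but only the region from residue 535 to 1027 of the candidate, which is consistent with the candidate being annotated as a multi-domain (trifunctional) protein.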

This GapMind analysis is from Sep 24 2021. The underlying query database was built on Sep 17 2021.

About GapMind

Each pathway is defined by a set of rules based on individual steps or genes. Candidates for each step are identified by using ublast (a fast alternative to protein BLAST) against a database of manually curated proteins (most of which are experimentally characterized) or by using HMMER with enzyme models (usually from TIGRFAMs). Ublast hits may be split across two different proteins.

A candidate for a step is "high confidence" if either:

where "other" refers to the best ublast hit to a sequence that is not annotated as performing this step (and is not "ignored").

Otherwise, a candidate is "medium confidence" if either:

Other blast hits with at least 50% coverage are "low confidence."

Steps with no high- or medium-confidence candidates may be considered "gaps." For the typical bacterium that can make all 20 amino acids, there are 1-2 gaps in amino acid biosynthesis pathways. For diverse bacteria and archaea that can utilize a carbon source, there is a complete high-confidence catabolic pathway (including a transporter) just 38% of the time, and there is a complete medium-confidence pathway 63% of the time. Gaps may be due to:

GapMind relies on the predicted proteins in the genome and does not search the six-frame translation. In most cases, you can search the six-frame translation by clicking on links to Curated BLAST for each step definition (in the per-step page).
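
The definition of a gap given above (a step with no high- or medium-confidence candidate) can be expressed as a short filter. This is an illustrative Python sketch only: the function `find_gaps`, the step names, and the confidence labels in the example data are all hypothetical and are not GapMind output or GapMind's implementation.

```python
def find_gaps(step_candidates):
    """Return the steps that have no high- or medium-confidence candidate.

    step_candidates maps a step name to the list of confidence labels
    ("high", "medium", or "low") assigned to its candidates.
    """
    return [
        step
        for step, confidences in step_candidates.items()
        if not any(label in ("high", "medium") for label in confidences)
    ]

# Hypothetical example data, for illustration only.
example = {
    "stepA": ["high", "low"],  # has a high-confidence candidate
    "stepB": ["low"],          # only low-confidence candidates: a gap
    "stepC": [],               # no candidates at all: a gap
    "stepD": ["medium"],       # has a medium-confidence candidate
}

gaps = find_gaps(example)
print(gaps)
```

A step with only low-confidence candidates is treated the same as a step with no candidates, which follows from the wording of the definition.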

For more information, see:

If you notice any errors or omissions in the step descriptions, or any questionable results, please let us know.

by Morgan Price, Arkin group, Lawrence Berkeley National Laboratory