GapMind for catabolism of small carbon sources

 

Alignments for a candidate for rocA in Rhizobium etli CFN 42

Align L-glutamate gamma-semialdehyde dehydrogenase (EC 1.2.1.88); Proline dehydrogenase (EC 1.5.5.2) (characterized)
to candidate WP_011428690.1 RHE_RS28430 trifunctional transcriptional regulator/proline dehydrogenase/L-glutamate gamma-semialdehyde dehydrogenase

Query= reanno::azobra:AZOBR_RS23695
         (1235 letters)



>NCBI__GCF_000092045.1:WP_011428690.1
          Length = 1235

 Score = 1628 bits (4216), Expect = 0.0
 Identities = 839/1227 (68%), Positives = 963/1227 (78%), Gaps = 10/1227 (0%)

Query: 12   PGEAAPFADFAPPIRPATELRAAITAAYRRPEPECLPFLFEQASLPPGVITAAAATARKL 71
            P   APFA FAPP+RP +ELR AITAAYRRPE ECLP L   A +         +TAR L
Sbjct: 14   PAGGAPFAAFAPPVRPQSELRRAITAAYRRPETECLPPLVAAARVSEAKRYDIRSTARTL 73

Query: 72   ITALRAKPRGRGVEGLIHEYSLSSQEGMALMCLAEALLRIPDHATRDALIRDKIAGGDWQ 131
            I ALRAK +G GVEGL+ EYSLSSQEG+ALMCLAEALLRIPD  TRDALIRDKIA G+W 
Sbjct: 74   IEALRAKHKGTGVEGLVQEYSLSSQEGVALMCLAEALLRIPDTDTRDALIRDKIAEGNWT 133

Query: 132  AHLGKGGSMFVNAATWGLLITGKLTSAGGEQALSSALTRLIARGGEPLIRRGVDFAMRMM 191
            +H+G G SMFVNAATWGL++TGKLTS   +++LS+ALTRLIAR GEP+IRRGVD AMRMM
Sbjct: 134  SHIGGGKSMFVNAATWGLVVTGKLTSTVNDRSLSAALTRLIARAGEPVIRRGVDMAMRMM 193

Query: 192  GEQFVTGQTIQEALTNARTMEAEGFRYSYDMLGEAALTAEDAARYYADYVNAIHAIGTAS 251
            GEQFVTG+TI+EAL  AR +EA GFRYSYDMLGEAA TA DA RY+ DY  AIHAIG AS
Sbjct: 194  GEQFVTGETIEEALKRARPLEARGFRYSYDMLGEAATTAADAERYFKDYEKAIHAIGKAS 253

Query: 252  AGRGVYEGPGISIKLSAIHPRYSRAQADRVMDELLPRVKALALLAKGYDIGLNIDAEEAD 311
             GRG+Y+GPGISIKLSA+HPRYSR+QA RVM ELLP+VKALA+LAKGYDIGLNIDAEEAD
Sbjct: 254  NGRGIYDGPGISIKLSALHPRYSRSQAGRVMGELLPKVKALAVLAKGYDIGLNIDAEEAD 313

Query: 312  RLELSLDLMESLCFDPDLAGWNGIGFVVQAYGKRCPYVIDFLIDLARRSGHRLMIRLVKG 371
            RLELSLDL+E LCF P+LAGWNG+GFVVQAYGKRCP+V+D++IDLARRSG R+M+RLVKG
Sbjct: 314  RLELSLDLLEELCFAPELAGWNGLGFVVQAYGKRCPFVLDYIIDLARRSGRRMMVRLVKG 373

Query: 372  AYWDSEIKRAQLDGLPDFPVYTRKVYTDVSYVACARKLLAAPEAVFPQFATHNAQTLATI 431
            AYWD+EIKRAQLDGL D+PVYTRK+YTDV+Y+ACARKLLAA +AVFPQFATHNAQTLATI
Sbjct: 374  AYWDAEIKRAQLDGLDDYPVYTRKIYTDVAYIACARKLLAAADAVFPQFATHNAQTLATI 433

Query: 432  YEMAGSDFQVGKYEFQCLHGMGEPLYKEVVG--PLKRPCRIYAPVGTHETLLAYLVRRLL 489
            Y +AG DF VGKYEFQCLHGMGEPLY EVVG   L RPCRIYAPVGTHETLLAYLVRRLL
Sbjct: 434  YHLAGPDFGVGKYEFQCLHGMGEPLYDEVVGKEKLDRPCRIYAPVGTHETLLAYLVRRLL 493

Query: 490  ENGANSSFVNRIADPAVPVDELVADPVAVARAIAPTGAPHALIALPRNLYAPERANSAGI 549
            ENGANSSFV+RI+DP V V+ L+ADP     A+   GAPH  IA P+ LY   RANS+G+
Sbjct: 494  ENGANSSFVHRISDPNVSVEALIADPAETVAAMPVVGAPHVQIAAPKALYGSARANSSGL 553

Query: 550  DLSDETELARLSAALSASAEMTWTAAPLLADGERAGQAQPVRNPADRRDVVGSVTEASEA 609
            DLS E  L+ L+  L+++A   W A P+LADG   G  + V NPAD RDVVG+VTE    
Sbjct: 554  DLSSEATLSDLAQTLASTAATPWHALPILADGSTDGVTREVLNPADHRDVVGTVTELKVE 613

Query: 610  LVAEAFGHAVAAASAWAATPPEERAASLFRAADTMQERMPTLLGLIVREAGKSLPNAIAE 669
              A     A   A  WAA PP ERAA L RAAD MQ R+  L+G+I+REAGKS  NA+ E
Sbjct: 614  EAARVVAMAAEYAPQWAAVPPAERAACLDRAADIMQARIKVLMGIIMREAGKSAANAVGE 673

Query: 670  VREAIDFLRYYGAQVRDRFDNATHRPLGPVVCISPWNFPLAIFSGQIAAALAAGNPVLAK 729
            VREA+DFLRYY  Q R      +H PLGP+VCISPWNFPLAIF+GQ+AAAL AGNPVLAK
Sbjct: 674  VREAVDFLRYYADQARKTL-GPSHLPLGPIVCISPWNFPLAIFTGQVAAALVAGNPVLAK 732

Query: 730  PAEETPLIAAEAVRILHAAGIPAGALQLLPGAGEVGAALVGHEAVRGVMFTGSTEVARLI 789
            PA  TP+IA+E+V+ILH AG+P GALQ +PG+G +GA +VG +   GVMFTGSTEVAR+I
Sbjct: 733  PAGVTPIIASESVKILHEAGVPVGALQFVPGSGRLGAGMVGAQQTAGVMFTGSTEVARMI 792

Query: 790  QRQLAGRLLPDGAPIPLIAETGGQNAMIVDSSALAEQVVGDVIASAFDSAGQRCSALRIL 849
            Q QLA RL   G PIPLIAETGGQN MIVDSSALAEQVV DV+ SAFDSAGQRCSALR+L
Sbjct: 793  QAQLAERLSATGKPIPLIAETGGQNGMIVDSSALAEQVVADVVTSAFDSAGQRCSALRVL 852

Query: 850  CLQEDVADRTLAMLKGAMRELRIGNPDRLAVDVGPVISEEARATIAAHIEAMRAKGRNVE 909
            CLQ+DVADRTLAMLKGA REL IG  DRL++DVGPVI++ A+A I  HIEAMR  GR VE
Sbjct: 853  CLQDDVADRTLAMLKGAFRELTIGRTDRLSIDVGPVINDGAKAEIDQHIEAMRGAGRKVE 912

Query: 910  FLPLPAETADGTFIAPTVIEIGGIHELEREVFGPVLHVVRFHRDDLDALVDSINATGYGL 969
             LPLP   A GTF+ PT+IEI  + +L +EVFGPVLHVVRF R+ LD L+D INA+GYGL
Sbjct: 913  QLPLPESAAKGTFVPPTIIEIKSLSDLTKEVFGPVLHVVRFKRNGLDRLIDDINASGYGL 972

Query: 970  TFGLHTRIDATIERVTGRIGAGNVYVNRNTIGAVVGVQPFGGHGLSGTGPKAGGPLYLSR 1029
            TFGLHTR+D TI  VT RI AGN+YVNRN IGAVVGVQPFGG GLSGTGPKAGGPLY+ R
Sbjct: 973  TFGLHTRLDETIAHVTSRIKAGNLYVNRNIIGAVVGVQPFGGRGLSGTGPKAGGPLYIGR 1032

Query: 1030 LLSRRPKGWLEFRGPDAARA--AGLAYGEWLRAKGFTAEASRCAGYVARSAIGGGAELNG 1087
            L+ R P    +    D+     A   Y  WL  KG T E     GY +RSA+G   EL G
Sbjct: 1033 LVQRAPVPPQQ----DSVHTDLALRDYIVWLDKKGLTDEGEAARGYASRSALGLERELTG 1088

Query: 1088 PVGERNLYELHGRGRVLLLPQTRTGLLLQLGAVLATGNSAAVDAPPDLAELLRGLPPALA 1147
            PVGERNLY LH RGR+LL+PQT TGL  Q+ A LATGN  AVDA P    +L GLP A+A
Sbjct: 1089 PVGERNLYALHPRGRILLVPQTETGLYRQIAAALATGNHLAVDAGPLSKSVLAGLPAAVA 1148

Query: 1148 ARVRTTADWRDVGPLAAVLVEGDRERVTAINRRVADLPGPILLVQAATAEALAAGRGEGY 1207
            +R+  T+DW   GP +  LVEGDR+RV A+N+++A LPGP+LLVQAAT E L +   E Y
Sbjct: 1149 SRLSWTSDWEKDGPFSGALVEGDRDRVLAVNKKIAALPGPLLLVQAATTEELTSD-PEAY 1207

Query: 1208 DLDLLLNERSVSVNTAAAGGNASLVAM 1234
             L+ LL E S S+NTAAAGGNASL+A+
Sbjct: 1208 CLNWLLEEVSTSINTAAAGGNASLMAI 1234


Lambda     K      H
   0.319    0.136    0.396 

Gapped
Lambda     K      H
   0.267   0.0410    0.140 


Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Sequences: 1
Number of Hits to DB: 3780
Number of extensions: 168
Number of successful extensions: 7
Number of sequences better than 1.0e-02: 1
Number of HSP's gapped: 1
Number of HSP's successfully gapped: 1
Length of query: 1235
Length of database: 1235
Length adjustment: 48
Effective length of query: 1187
Effective length of database: 1187
Effective search space:  1408969
Effective search space used:  1408969
Neighboring words threshold: 11
Window for multiple hits: 40
X1: 16 ( 7.4 bits)
X2: 38 (14.6 bits)
X3: 64 (24.7 bits)
S1: 41 (21.8 bits)
S2: 59 (27.3 bits)
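
The statistics above can be cross-checked with a little arithmetic. The sketch below assumes the usual BLAST conventions (effective length = raw length minus the length adjustment, effective search space = product of the two effective lengths, and the Karlin-Altschul conversion from raw score to bits). The variable names are illustrative and are not part of GapMind.

```python
import math

# Values copied from the BLAST report above.
query_length = 1235
database_length = 1235
length_adjustment = 48
raw_score = 4216          # the number in parentheses after "Score = 1628 bits"
gapped_lambda = 0.267     # "Gapped" Lambda
gapped_k = 0.0410         # "Gapped" K

# Effective lengths: raw length minus the length adjustment
# (the database holds a single sequence, so one adjustment applies).
effective_query = query_length - length_adjustment
effective_db = database_length - length_adjustment
effective_search_space = effective_query * effective_db

# Karlin-Altschul conversion: bits = (lambda * S - ln K) / ln 2.
# Lambda and K are printed rounded, so the result is only approximate.
bit_score = (gapped_lambda * raw_score - math.log(gapped_k)) / math.log(2)

print(effective_query, effective_db, effective_search_space)
print(round(bit_score, 1))
```

The product of the effective lengths reproduces the reported effective search space of 1408969, and the converted score lands within about one bit of the reported 1628 bits.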

Align candidate WP_011428690.1 RHE_RS28430 (trifunctional transcriptional regulator/proline dehydrogenase/L-glutamate gamma-semialdehyde dehydrogenase)
to HMM TIGR01238 (delta-1-pyrroline-5-carboxylate dehydrogenase (EC 1.2.1.88))

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3.1 (Jul 2020); http://hmmer.org/
# Copyright (C) 2020 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  ../tmp/path.carbon/TIGR01238.hmm
# target sequence database:        /tmp/gapView.3248135.genome.faa
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       TIGR01238  [M=500]
Accession:   TIGR01238
Description: D1pyr5carbox3: delta-1-pyrroline-5-carboxylate dehydrogenase
Scores for complete sequences (score includes all domains):
   --- full sequence ---   --- best 1 domain ---    -#dom-
    E-value  score  bias    E-value  score  bias    exp  N  Sequence                             Description
    ------- ------ -----    ------- ------ -----   ---- --  --------                             -----------
   1.8e-215  702.1   1.6   2.8e-215  701.5   1.6    1.3  1  NCBI__GCF_000092045.1:WP_011428690.1  


Domain annotation for each sequence (and alignments):
>> NCBI__GCF_000092045.1:WP_011428690.1  
   #    score  bias  c-Evalue  i-Evalue hmmfrom  hmm to    alifrom  ali to    envfrom  env to     acc
 ---   ------ ----- --------- --------- ------- -------    ------- -------    ------- -------    ----
   1 !  701.5   1.6  2.8e-215  2.8e-215       2     497 ..     542    1035 ..     541    1038 .. 0.98

  Alignments for each domain:
  == domain 1  score: 701.5 bits;  conditional E-value: 2.8e-215
                             TIGR01238    2 lygegrknslGvdlaneselksleeqllkaaakkfqaapivgekakaegeaqpvknpadrkdivGqvsead 72  
                                            lyg++r ns G+dl+ e +l +l + l ++aa  ++a+pi+  +   +g ++ v npad++d+vG+v+e +
  NCBI__GCF_000092045.1:WP_011428690.1  542 LYGSARANSSGLDLSSEATLSDLAQTLASTAATPWHALPIL-ADGSTDGVTREVLNPADHRDVVGTVTELK 611 
                                            8****************************************.6667899999******************* PP

                             TIGR01238   73 aaevqeavdsavaafaewsatdakeraailerladlleshmpelvallvreaGktlsnaiaevreavdflr 143 
                                             +e+ + v  a + +++w a++++eraa+l+r+ad+++ ++  l+++++reaGk+  na+ evreavdflr
  NCBI__GCF_000092045.1:WP_011428690.1  612 VEEAARVVAMAAEYAPQWAAVPPAERAACLDRAADIMQARIKVLMGIIMREAGKSAANAVGEVREAVDFLR 682 
                                            *********************************************************************** PP

                             TIGR01238  144 yyakqvedvldeesakalGavvcispwnfplaiftGqiaaalaaGntviakpaeqtsliaaravellqeaG 214 
                                            yya+q++++l+   + +lG++vcispwnfplaiftGq+aaal+aGn v+akpa  t++ia+++v++l+eaG
  NCBI__GCF_000092045.1:WP_011428690.1  683 YYADQARKTLGPS-HLPLGPIVCISPWNFPLAIFTGQVAAALVAGNPVLAKPAGVTPIIASESVKILHEAG 752 
                                            **********987.9******************************************************** PP

                             TIGR01238  215 vpagviqllpGrGedvGaaltsderiaGviftGstevarlinkalakredap...vpliaetGGqnamivd 282 
                                            vp g++q++pG G  +Ga +   ++ aGv+ftGstevar+i+ +la+r  a    +pliaetGGqn mivd
  NCBI__GCF_000092045.1:WP_011428690.1  753 VPVGALQFVPGSGR-LGAGMVGAQQTAGVMFTGSTEVARMIQAQLAERLSATgkpIPLIAETGGQNGMIVD 822 
                                            *************9.*********************************9876666**************** PP

                             TIGR01238  283 stalaeqvvadvlasafdsaGqrcsalrvlcvqedvadrvltlikGamdelkvgkpirlttdvGpvidaea 353 
                                            s+alaeqvvadv++safdsaGqrcsalrvlc+q+dvadr+l ++kGa  el +g+  rl  dvGpvi++ a
  NCBI__GCF_000092045.1:WP_011428690.1  823 SSALAEQVVADVVTSAFDSAGQRCSALRVLCLQDDVADRTLAMLKGAFRELTIGRTDRLSIDVGPVINDGA 893 
                                            *********************************************************************** PP

                             TIGR01238  354 kqnllahiekmkakakkvaqvkleddvesekgtfvaptlfelddldelkkevfGpvlhvvrykadeldkvv 424 
                                            k ++ +hie+m++ ++kv q+ l +   + kgtfv+pt++e+++l++l kevfGpvlhvvr+k++ ld+++
  NCBI__GCF_000092045.1:WP_011428690.1  894 KAEIDQHIEAMRGAGRKVEQLPLPE--SAAKGTFVPPTIIEIKSLSDLTKEVFGPVLHVVRFKRNGLDRLI 962 
                                            ***********************99..789***************************************** PP

                             TIGR01238  425 dkinakGygltlGvhsrieetvrqiekrakvGnvyvnrnlvGavvGvqpfGGeGlsGtGpkaGGplylyrl 495 
                                            d ina+Gyglt+G+h+r +et++++++r+k+Gn+yvnrn++GavvGvqpfGG+GlsGtGpkaGGply+ rl
  NCBI__GCF_000092045.1:WP_011428690.1  963 DDINASGYGLTFGLHTRLDETIAHVTSRIKAGNLYVNRNIIGAVVGVQPFGGRGLSGTGPKAGGPLYIGRL 1033
                                            **********************************************************************9 PP

                             TIGR01238  496 tr 497 
                                            ++
  NCBI__GCF_000092045.1:WP_011428690.1 1034 VQ 1035
                                            97 PP



Internal pipeline statistics summary:
-------------------------------------
Query model(s):                            1  (500 nodes)
Target sequences:                          1  (1235 residues searched)
Passed MSV filter:                         1  (1); expected 0.0 (0.02)
Passed bias filter:                        1  (1); expected 0.0 (0.02)
Passed Vit filter:                         1  (1); expected 0.0 (0.001)
Passed Fwd filter:                         1  (1); expected 0.0 (1e-05)
Initial search space (Z):                  1  [actual number of targets]
Domain search space  (domZ):               1  [number of targets reported over threshold]
# CPU time: 0.01u 0.01s 00:00:00.02 Elapsed: 00:00:00.01
# Mc/sec: 44.27
//
[ok]
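
As a rough illustration of how the domain table above can be read programmatically, the sketch below splits the single domain row on whitespace and computes the fraction of the 500-node model covered by the alignment. The field positions follow the column header printed by hmmsearch. The helper name `parse_domain_row` is hypothetical, and defining coverage as aligned model positions divided by model length is an assumption, not GapMind's code.

```python
def parse_domain_row(row):
    """Parse one row of the hmmsearch per-domain table into a dict."""
    f = row.split()
    return {
        "score": float(f[2]),
        "c_evalue": float(f[4]),
        "i_evalue": float(f[5]),
        "hmm_from": int(f[6]),
        "hmm_to": int(f[7]),
        "ali_from": int(f[9]),
        "ali_to": int(f[10]),
        "acc": float(f[15]),
    }

# Domain row copied from the hmmsearch output above.
row = ("1 !  701.5   1.6  2.8e-215  2.8e-215       2     497 .."
       "     542    1035 ..     541    1038 .. 0.98")
model_length = 500  # from "Query: TIGR01238 [M=500]"

domain = parse_domain_row(row)
hmm_coverage = (domain["hmm_to"] - domain["hmm_from"] + 1) / model_length

print(domain["ali_from"], domain["ali_to"], hmm_coverage)
```

For this candidate the alignment spans model positions 2 to 497, so nearly the entire TIGR01238 model is covered by residues 542 to 1035 of the protein.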

This GapMind analysis is from Apr 09 2024. The underlying query database was built on Sep 17 2021.

About GapMind

Each pathway is defined by a set of rules based on individual steps or genes. Candidates for each step are identified by using ublast (a fast alternative to protein BLAST) against a database of manually-curated proteins (most of which are experimentally characterized) or by using HMMer with enzyme models (usually from TIGRFam). Ublast hits may be split across two different proteins.

A candidate for a step is "high confidence" if either:

where "other" refers to the best ublast hit to a sequence that is not annotated as performing this step (and is not "ignored").

Otherwise, a candidate is "medium confidence" if either:

Other blast hits with at least 50% coverage are "low confidence."
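
To make the coverage criterion concrete, here is a minimal sketch using the numbers from the alignment on this page. It assumes that coverage means the aligned span of the characterized (query) protein divided by its length. That definition and the function names are illustrative assumptions, and only the 50% threshold comes from the text above.

```python
def percent_identity(identical, aligned_columns):
    """Percent of aligned columns that are identical."""
    return 100.0 * identical / aligned_columns

def query_coverage(q_start, q_end, q_len):
    """Fraction of the query covered by the alignment (assumed definition)."""
    return (q_end - q_start + 1) / q_len

# Numbers from the alignment of AZOBR_RS23695 to WP_011428690.1:
# Identities = 839/1227; query residues 12..1234 of 1235.
identity = percent_identity(839, 1227)
coverage = query_coverage(12, 1234, 1235)

# The 50% coverage cutoff for "low confidence" blast hits is from the text.
meets_low_confidence_coverage = coverage >= 0.5

print(round(identity, 1), round(coverage, 3), meets_low_confidence_coverage)
```

This hit is far above the 50% coverage cutoff. Whether it counts as high or medium confidence depends on the criteria for those classes, which are not sketched here.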

Steps with no high- or medium-confidence candidates may be considered "gaps." For the typical bacterium that can make all 20 amino acids, there are 1-2 gaps in amino acid biosynthesis pathways. For diverse bacteria and archaea that can utilize a carbon source, there is a complete high-confidence catabolic pathway (including a transporter) just 38% of the time, and there is a complete medium-confidence pathway 63% of the time. Gaps may be due to:

GapMind relies on the predicted proteins in the genome and does not search the six-frame translation. In most cases, you can search the six-frame translation by clicking on links to Curated BLAST for each step definition (in the per-step page).

For more information, see:

If you notice any errors or omissions in the step descriptions, or any questionable results, please let us know.

by Morgan Price, Arkin group, Lawrence Berkeley National Laboratory