GapMind for Amino acid biosynthesis

 

Alignments for a candidate for carB in Cupriavidus basilensis 4G11

Align carbamoyl-phosphate synthase (glutamine-hydrolysing) (EC 6.3.5.5) (characterized)
to candidate RR42_RS13510 RR42_RS13510 carbamoyl phosphate synthase large subunit

Query= BRENDA::P00968
         (1073 letters)



>FitnessBrowser__Cup4G11:RR42_RS13510
          Length = 1082

 Score = 1449 bits (3750), Expect = 0.0
 Identities = 750/1087 (68%), Positives = 867/1087 (79%), Gaps = 21/1087 (1%)

Query: 1    MPKRTDIKSILILGAGPIVIGQACEFDYSGAQACKALREEGYRVILVNSNPATIMTDPEM 60
            MPKRTDIKSILI+GAGPI+IGQACEFDYSGAQACKALREEG++V+LVNSNPATIMTDP  
Sbjct: 1    MPKRTDIKSILIIGAGPIIIGQACEFDYSGAQACKALREEGFKVVLVNSNPATIMTDPNT 60

Query: 61   ADATYIEPIHWEVVRKIIEKERPDAVLPTMGGQTALNCALELERQGVLEEFGVTMIGATA 120
            AD TYIEPI WEVV +II KERPDA+LPTMGGQTALNCAL+L R GVL ++ V +IGA+ 
Sbjct: 61   ADVTYIEPITWEVVERIIAKERPDAILPTMGGQTALNCALDLHRHGVLAKYNVELIGASP 120

Query: 121  DAIDKAEDRRRFDVAMKKIGLETARSGIAHTMEEALAVAADV-------GFPCIIRPSFT 173
            +AIDKAEDR++F  AM KIGL +A+SGIAH+MEEALAV   +       G+P +IRPSFT
Sbjct: 121  EAIDKAEDRQKFKEAMTKIGLGSAKSGIAHSMEEALAVQTQIAKETATGGYPIVIRPSFT 180

Query: 174  MGGSGGGIAYNREEFEEICARGLDLSPTKELLIDESLIGWKEYEMEVVRDKNDNCIIVCS 233
            +GGSGGGIAYNREEFEEIC RGLDLSPT+ELLI+ESL+GWKEYEMEVVRDK DNCII+CS
Sbjct: 181  LGGSGGGIAYNREEFEEICKRGLDLSPTRELLIEESLLGWKEYEMEVVRDKKDNCIIICS 240

Query: 234  IENFDAMGIHTGDSITVAPAQTLTDKEYQIMRNASMAVLREIGVETGGSNVQFAVNPKNG 293
            IEN D MGIHTGDSITVAPAQTLTDKEYQI+RNAS+AVLREIGV+TGGSNVQF++NP +G
Sbjct: 241  IENLDPMGIHTGDSITVAPAQTLTDKEYQILRNASLAVLREIGVDTGGSNVQFSINPADG 300

Query: 294  RLIVIEMNPRVSRSSALASKATGFPIAKVAAKLAVGYTLDELMNDITGGRTPASFEPSID 353
            R+IVIEMNPRVSRSSALASKATGFPIAKVAAKLAVGYTLDEL N+ITGG TPASFEPSID
Sbjct: 301  RMIVIEMNPRVSRSSALASKATGFPIAKVAAKLAVGYTLDELKNEITGGATPASFEPSID 360

Query: 354  YVVTKIPRFNFEKFAGANDRLTTQMKSVGEVMAIGRTQQESLQKALRGLEVGATGFDPKV 413
            YVVTK+PRF FEKF  A+  LTTQMKSVGEVMA+GRT QES QKALRGLEVG  G D K 
Sbjct: 361  YVVTKVPRFAFEKFPQADSHLTTQMKSVGEVMAMGRTFQESFQKALRGLEVGVDGLDDKS 420

Query: 414  SLDDPEALTKIRRELKDAGADRIWYIADAFRAGLSVDGVFNLTNIDRWFLVQIEELVRLE 473
            +  D     +I  E+ +AG DRIWY+ DAFR G+S++ V   T ID WFL QIE++V+ E
Sbjct: 421  TDRD-----EIVEEIGEAGPDRIWYVGDAFRIGMSLEEVHAETAIDPWFLAQIEDIVKTE 475

Query: 474  EKVAEVGITGLNADFLRQLKRKGFADARLAKLAGVREAEIRKLRDQYDLHPVYKRVDTCA 533
              V    +  L+A  LR LK+KGF+D RLAKL G   A +R  R    + PVYKRVDTCA
Sbjct: 476  TLVKARKLDSLSAAELRHLKQKGFSDRRLAKLMGAEPAAVRVARHAAGVRPVYKRVDTCA 535

Query: 534  AEFATDTAYMYSTYEE---ECEANPSTDREKIMVLGGGPNRIGQGIEFDYCCVHASLALR 590
            AEFAT+TAYMY TYE    ECEA+P+ +R KIMVLGGGPNRIGQGIEFDYCCVHA+LALR
Sbjct: 536  AEFATNTAYMYGTYEAEHGECEADPTANR-KIMVLGGGPNRIGQGIEFDYCCVHAALALR 594

Query: 591  EDGYETIMVNCNPETVSTDYDTSDRLYFEPVTLEDVLEIVRIEKPKGVIVQYGGQTPLKL 650
            EDGYETIMVNCNPETVSTDYDTSDRLYFEP+TLEDVLEIV  EKP GVIVQYGGQTPLKL
Sbjct: 595  EDGYETIMVNCNPETVSTDYDTSDRLYFEPLTLEDVLEIVDREKPVGVIVQYGGQTPLKL 654

Query: 651  ARALEAAGVPVIGTSPDAIDRAEDRERFQHAVERLKLKQPANATVTAIEMAVEKAKEIGY 710
            A  LEA GVP+IGTSPD ID AEDRERFQ  +  L L+QP N T  A + A+  A EIGY
Sbjct: 655  ALDLEANGVPIIGTSPDMIDAAEDRERFQKLLHELGLRQPPNRTARAEDEALRLATEIGY 714

Query: 711  PLVVRPSYVLGGRAMEIVYDEADLRRYFQTAVSVSNDAPVLLDHFLDDAVEVDVDAICDG 770
            PLVVRPSYVLGGRAMEIV++  DL RY + AV VS+D+PVLLD FL+DA+E DVDA+CDG
Sbjct: 715  PLVVRPSYVLGGRAMEIVHEPRDLERYMREAVKVSHDSPVLLDRFLNDAIECDVDALCDG 774

Query: 771  EMVLIGGIMEHIEQAGVHSGDSACSLPAYTLSQEIQDVMRQQVQKLAFELQVRGLMNVQF 830
            + V IGG+MEHIEQAGVHSGDSACSLP Y+L+Q   D +++Q   +A  L V GLMNVQF
Sbjct: 775  QRVFIGGVMEHIEQAGVHSGDSACSLPPYSLAQATVDELKRQTAAMAKALNVIGLMNVQF 834

Query: 831  AVK--NNE--VYLIEVNPRAARTVPFVSKATGVPLAKVAARVMAGKSLAEQGVTKEVIPP 886
            A++  N E  VY++EVNPRA+RTVP+VSKATG+ LAK+AAR MAG++L  QGV  EV+PP
Sbjct: 835  AIQQVNGEDIVYVLEVNPRASRTVPYVSKATGLSLAKIAARCMAGQTLDSQGVFDEVVPP 894

Query: 887  YYSVKEVVLPFNKFPGVDPLLGPEMRSTGEVMGVGRTFAEAFAKAQLGSNSTMKKHGRAL 946
            Y+SVKE V PFNKFPGVDP+LGPEMRSTGEVMGVG+TF EA  K+QL + S + + G  L
Sbjct: 895  YFSVKEAVFPFNKFPGVDPVLGPEMRSTGEVMGVGKTFGEALFKSQLAAGSRLPEKGTVL 954

Query: 947  LSVREGDKERVVDLAAKLLKQGFELDATHGTAIVLGEAGINPRLVNKVHEGRPHIQDRIK 1006
            L+V++ DK   V +A  L   G+ + AT GTA  +  AGI  R+VNKV +GRPHI D +K
Sbjct: 955  LTVKDSDKPHAVGVARMLHDMGYPIVATRGTASAIEAAGIPVRVVNKVKDGRPHIVDMLK 1014

Query: 1007 NGEYTYIINTT-SGRRAIEDSRVIRRSALQYKVHYDTTLNGGFATAMALNADATEKVISV 1065
            NGE   +  T    R AI DSR IR SAL  +V Y TT+ G  A    L    + +V  +
Sbjct: 1015 NGELALVFTTVDETRTAIADSRSIRISALASRVPYYTTIAGARAAVEGLKHMQSLEVYDL 1074

Query: 1066 QEMHAQI 1072
            Q +HA +
Sbjct: 1075 QSLHASL 1081
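The identity, positive, and gap percentages reported at the top of this alignment can be rechecked from the raw counts, and the aligned ranges give the fraction of each sequence that the alignment covers. The Python sketch below does both. The helper names are hypothetical and not part of GapMind, and rounding percentages down is an assumption inferred from the report (750/1087 is about 68.997%, yet 68% is printed).

```python
def percent_floor(count: int, total: int) -> int:
    """Integer percentage, rounded down (assumed to match the report)."""
    return (100 * count) // total


def coverage(start: int, end: int, length: int) -> float:
    """Fraction of a sequence spanned by a 1-based inclusive range."""
    return (end - start + 1) / length


ALIGNMENT_LENGTH = 1087  # alignment columns, including gap positions

identity_pct = percent_floor(750, ALIGNMENT_LENGTH)
positives_pct = percent_floor(867, ALIGNMENT_LENGTH)
gaps_pct = percent_floor(21, ALIGNMENT_LENGTH)

# Query residues 1..1072 of 1073 and subject residues 1..1081 of 1082 align.
query_cov = coverage(1, 1072, 1073)
subject_cov = coverage(1, 1081, 1082)

print(identity_pct, positives_pct, gaps_pct)
print(f"{query_cov:.3f} {subject_cov:.3f}")
```

Under these assumptions the percentages come out as 68, 79, and 1, and the alignment spans nearly the full length of both the characterized query and the candidate.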


Lambda     K      H
   0.318    0.135    0.383 

Gapped
Lambda     K      H
   0.267   0.0410    0.140 


Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Sequences: 1
Number of Hits to DB: 3119
Number of extensions: 137
Number of successful extensions: 20
Number of sequences better than 1.0e-02: 1
Number of HSP's gapped: 1
Number of HSP's successfully gapped: 1
Length of query: 1073
Length of database: 1082
Length adjustment: 46
Effective length of query: 1027
Effective length of database: 1036
Effective search space:  1063972
Effective search space used:  1063972
Neighboring words threshold: 11
Window for multiple hits: 40
X1: 16 ( 7.3 bits)
X2: 38 (14.6 bits)
X3: 64 (24.7 bits)
S1: 41 (21.7 bits)
S2: 58 (26.9 bits)
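
The statistics above are enough to reproduce two reported values. The Python sketch below converts the raw score (3750) to bits with the standard Karlin-Altschul formula, bits = (lambda * S - ln K) / ln 2, using the gapped lambda and K, and recomputes the effective search space from the length adjustment. The function name is hypothetical, and because the report rounds lambda and K, agreement with the printed bit score is only approximate.

```python
import math


def bit_score(raw_score: int, lam: float, k: float) -> float:
    """Convert a raw alignment score to bits: (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


# Gapped Karlin-Altschul parameters, as printed in the report (rounded).
GAPPED_LAMBDA = 0.267
GAPPED_K = 0.0410

bits = bit_score(3750, GAPPED_LAMBDA, GAPPED_K)

# Effective lengths: each reported length minus the length adjustment.
LENGTH_ADJUSTMENT = 46
effective_query = 1073 - LENGTH_ADJUSTMENT
effective_db = 1082 - LENGTH_ADJUSTMENT
search_space = effective_query * effective_db

print(round(bits), effective_query, effective_db, search_space)
```

With the printed parameters, this reproduces the 1449-bit score and the effective search space of 1063972.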

Align candidate RR42_RS13510 RR42_RS13510 (carbamoyl phosphate synthase large subunit)
to HMM TIGR01369 (carB: carbamoyl-phosphate synthase, large subunit (EC 6.3.5.5))

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3.1 (Jul 2020); http://hmmer.org/
# Copyright (C) 2020 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  ../tmp/path.aa/TIGR01369.hmm
# target sequence database:        /tmp/gapView.23177.genome.faa
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       TIGR01369  [M=1052]
Accession:   TIGR01369
Description: CPSaseII_lrg: carbamoyl-phosphate synthase, large subunit
Scores for complete sequences (score includes all domains):
   --- full sequence ---   --- best 1 domain ---    -#dom-
    E-value  score  bias    E-value  score  bias    exp  N  Sequence                                 Description
    ------- ------ -----    ------- ------ -----   ---- --  --------                                 -----------
          0 1533.3   0.0          0 1533.2   0.0    1.0  1  lcl|FitnessBrowser__Cup4G11:RR42_RS13510  RR42_RS13510 carbamoyl phosphate


Domain annotation for each sequence (and alignments):
>> lcl|FitnessBrowser__Cup4G11:RR42_RS13510  RR42_RS13510 carbamoyl phosphate synthase large subunit
   #    score  bias  c-Evalue  i-Evalue hmmfrom  hmm to    alifrom  ali to    envfrom  env to     acc
 ---   ------ ----- --------- --------- ------- -------    ------- -------    ------- -------    ----
   1 ! 1533.2   0.0         0         0       1    1050 [.       2    1061 ..       2    1063 .. 0.98

  Alignments for each domain:
  == domain 1  score: 1533.2 bits;  conditional E-value: 0
                                 TIGR01369    1 pkredikkvlviGsGpivigqAaEFDYsGsqalkalkeegievvLvnsniAtvmtdeeladkvYieP 67  
                                                pkr+dik++l+iG+Gpi+igqA+EFDYsG+qa+kal+eeg++vvLvnsn+At+mtd++ ad +YieP
  lcl|FitnessBrowser__Cup4G11:RR42_RS13510    2 PKRTDIKSILIIGAGPIIIGQACEFDYSGAQACKALREEGFKVVLVNSNPATIMTDPNTADVTYIEP 68  
                                                689**************************************************************** PP

                                 TIGR01369   68 ltveavekiiekErpDailltlGGqtaLnlaveleekGvLekygvkllGtkveaikkaedRekFkea 134 
                                                +t+e+ve+ii kErpDail+t+GGqtaLn+a++l+++GvL+ky+v+l+G++ eai+kaedR+kFkea
  lcl|FitnessBrowser__Cup4G11:RR42_RS13510   69 ITWEVVERIIAKERPDAILPTMGGQTALNCALDLHRHGVLAKYNVELIGASPEAIDKAEDRQKFKEA 135 
                                                ******************************************************************* PP

                                 TIGR01369  135 lkeineevakseivesveealeaaeei.......gyPvivRaaftlgGtGsgiaeneeelkelveka 194 
                                                +++i++  aks i++s+eeal+++++i       gyP+++R++ftlgG+G+gia+n+ee++e+++++
  lcl|FitnessBrowser__Cup4G11:RR42_RS13510  136 MTKIGLGSAKSGIAHSMEEALAVQTQIaketatgGYPIVIRPSFTLGGSGGGIAYNREEFEEICKRG 202 
                                                ********************999887766666668******************************** PP

                                 TIGR01369  195 lkaspikqvlvekslagwkEiEyEvvRDskdnciivcniEnlDplGvHtGdsivvaPsqtLtdkeyq 261 
                                                l++sp++++l+e+sl gwkE+E+EvvRD+kdncii+c+iEnlDp+G+HtGdsi+vaP+qtLtdkeyq
  lcl|FitnessBrowser__Cup4G11:RR42_RS13510  203 LDLSPTRELLIEESLLGWKEYEMEVVRDKKDNCIIICSIENLDPMGIHTGDSITVAPAQTLTDKEYQ 269 
                                                ******************************************************************* PP

                                 TIGR01369  262 llRdaslkiirelgvege.cnvqfaldPeskryvviEvnpRvsRssALAskAtGyPiAkvaaklavG 327 
                                                +lR+asl+++re+gv+++ +nvqf+++P + r++viE+npRvsRssALAskAtG+PiAkvaaklavG
  lcl|FitnessBrowser__Cup4G11:RR42_RS13510  270 ILRNASLAVLREIGVDTGgSNVQFSINPADGRMIVIEMNPRVSRSSALASKATGFPIAKVAAKLAVG 336 
                                                ***************9988************************************************ PP

                                 TIGR01369  328 ysLdelkndvtk.etvAsfEPslDYvvvkiPrwdldkfekvdrklgtqmksvGEvmaigrtfeealq 393 
                                                y+Ldelkn++t+  t+AsfEPs+DYvv+k+Pr++++kf ++d++l+tqmksvGEvma+grtf+e++q
  lcl|FitnessBrowser__Cup4G11:RR42_RS13510  337 YTLDELKNEITGgATPASFEPSIDYVVTKVPRFAFEKFPQADSHLTTQMKSVGEVMAMGRTFQESFQ 403 
                                                ***********878***************************************************** PP

                                 TIGR01369  394 kalrsleekllglklkekeaesdeeleealkkpndrRlfaiaealrrgvsveevyeltkidrfflek 460 
                                                kalr le ++ gl+ k    ++++e+ e++ ++ ++R++++ +a+r g+s+eev+  t id +fl +
  lcl|FitnessBrowser__Cup4G11:RR42_RS13510  404 KALRGLEVGVDGLDDK---STDRDEIVEEIGEAGPDRIWYVGDAFRIGMSLEEVHAETAIDPWFLAQ 467 
                                                ********99998776...778888999*************************************** PP

                                 TIGR01369  461 lkklvelekeleeeklkelkkellkkakklGfsdeqiaklvkvseaevrklrkelgivpvvkrvDtv 527 
                                                ++++v++e+ ++  kl+ l++ +l+++k++Gfsd+++akl++ + a+vr +r+++g+ pv+krvDt+
  lcl|FitnessBrowser__Cup4G11:RR42_RS13510  468 IEDIVKTETLVKARKLDSLSAAELRHLKQKGFSDRRLAKLMGAEPAAVRVARHAAGVRPVYKRVDTC 534 
                                                ******************************************************************* PP

                                 TIGR01369  528 aaEfeaktpYlYstyeeekddve..vtekkkvlvlGsGpiRigqgvEFDycavhavlalreagykti 592 
                                                aaEf ++t+Y+Y tye+e+ + e   t ++k++vlG+Gp+Rigqg+EFDyc+vha+lalre gy+ti
  lcl|FitnessBrowser__Cup4G11:RR42_RS13510  535 AAEFATNTAYMYGTYEAEHGECEadPTANRKIMVLGGGPNRIGQGIEFDYCCVHAALALREDGYETI 601 
                                                *****************9665541156779************************************* PP

                                 TIGR01369  593 linynPEtvstDydiadrLyFeeltvedvldiiekekvegvivqlgGqtalnlakeleeagvkilGt 659 
                                                ++n+nPEtvstDyd++drLyFe+lt+edvl+i+++ek+ gvivq+gGqt+l+la +le++gv+i+Gt
  lcl|FitnessBrowser__Cup4G11:RR42_RS13510  602 MVNCNPETVSTDYDTSDRLYFEPLTLEDVLEIVDREKPVGVIVQYGGQTPLKLALDLEANGVPIIGT 668 
                                                ******************************************************************* PP

                                 TIGR01369  660 saesidraEdRekFsklldelgikqpkgkeatsveeakeiakeigyPvlvRpsyvlgGrameivene 726 
                                                s++ id aEdRe+F+kll+elg+ qp +++a++ +ea ++a+eigyP++vRpsyvlgGrameiv++ 
  lcl|FitnessBrowser__Cup4G11:RR42_RS13510  669 SPDMIDAAEDRERFQKLLHELGLRQPPNRTARAEDEALRLATEIGYPLVVRPSYVLGGRAMEIVHEP 735 
                                                ******************************************************************* PP

                                 TIGR01369  727 eeleryleeavevskekPvlidkyledavEvdvDavadgeevliagileHiEeaGvHsGDstlvlpp 793 
                                                 +lery++eav+vs+++Pvl+d++l+da+E+dvDa++dg++v+i g++eHiE+aGvHsGDs+++lpp
  lcl|FitnessBrowser__Cup4G11:RR42_RS13510  736 RDLERYMREAVKVSHDSPVLLDRFLNDAIECDVDALCDGQRVFIGGVMEHIEQAGVHSGDSACSLPP 802 
                                                ******************************************************************* PP

                                 TIGR01369  794 qklseevkkkikeivkkiakelkvkGllniqfvvkd....eevyviEvnvRasRtvPfvskalgvpl 856 
                                                 +l + +++++k++++++ak+l+v Gl+n+qf++++    + vyv+Evn+RasRtvP+vska+g++l
  lcl|FitnessBrowser__Cup4G11:RR42_RS13510  803 YSLAQATVDELKRQTAAMAKALNVIGLMNVQFAIQQvngeDIVYVLEVNPRASRTVPYVSKATGLSL 869 
                                                *********************************9875543569************************ PP

                                 TIGR01369  857 vklavkvllgkkleelekgvkkekksklvavkaavfsfsklagvdvvlgpemkstGEvmgigrdlee 923 
                                                +k+a+++++g++l +  +gv  e  + +++vk+avf+f+k+ gvd+vlgpem+stGEvmg+g+++ e
  lcl|FitnessBrowser__Cup4G11:RR42_RS13510  870 AKIAARCMAGQTLDS--QGVFDEVVPPYFSVKEAVFPFNKFPGVDPVLGPEMRSTGEVMGVGKTFGE 934 
                                                **************9..889*********************************************** PP

                                 TIGR01369  924 allkallaskakikkkgsvllsvkdkdkeellelakklaekglkvyategtakvleeagikaevvlk 990 
                                                al+k++la+++++++kg+vll+vkd+dk +++ +a++l+++g+ ++at+gta+++e agi ++vv+k
  lcl|FitnessBrowser__Cup4G11:RR42_RS13510  935 ALFKSQLAAGSRLPEKGTVLLTVKDSDKPHAVGVARMLHDMGYPIVATRGTASAIEAAGIPVRVVNK 1001
                                                ******************************************************************* PP

                                 TIGR01369  991 vseeaekilellkeeeielvinltskkkkaaekgykirreaveykvplvteletaealle 1050
                                                v++ +++i+++lk++e+ lv+++ +++++a  ++ +ir +a+  +vp+ t++++a+a++e
  lcl|FitnessBrowser__Cup4G11:RR42_RS13510 1002 VKDGRPHIVDMLKNGELALVFTTVDETRTAIADSRSIRISALASRVPYYTTIAGARAAVE 1061
                                                *****************************************************9999876 PP



Internal pipeline statistics summary:
-------------------------------------
Query model(s):                            1  (1052 nodes)
Target sequences:                          1  (1082 residues searched)
Passed MSV filter:                         1  (1); expected 0.0 (0.02)
Passed bias filter:                        1  (1); expected 0.0 (0.02)
Passed Vit filter:                         1  (1); expected 0.0 (0.001)
Passed Fwd filter:                         1  (1); expected 0.0 (1e-05)
Initial search space (Z):                  1  [actual number of targets]
Domain search space  (domZ):               1  [number of targets reported over threshold]
# CPU time: 0.10u 0.03s 00:00:00.13 Elapsed: 00:00:00.12
# Mc/sec: 8.82
//
[ok]
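
How much of the HMM and of the candidate the domain hit covers can be read from the per-domain table in the hmmsearch report. The Python sketch below parses that single table row and computes both fractions. The function name is hypothetical, the column positions are taken from the table header shown in this report, and the model length (1052) and sequence length (1082) are copied from the report rather than parsed.

```python
def parse_domain_row(row: str) -> dict:
    """Parse one row of the hmmsearch per-domain table (layout assumed)."""
    fields = row.split()
    # Columns: #, !/?, score, bias, c-Evalue, i-Evalue, hmmfrom, hmm to,
    # bounds, alifrom, ali to, bounds, envfrom, env to, bounds, acc
    return {
        "score": float(fields[2]),
        "hmm_from": int(fields[6]),
        "hmm_to": int(fields[7]),
        "ali_from": int(fields[9]),
        "ali_to": int(fields[10]),
        "acc": float(fields[15]),
    }


ROW = ("   1 ! 1533.2   0.0         0         0       1    1050 [."
       "       2    1061 ..       2    1063 .. 0.98")
MODEL_LENGTH = 1052     # from "Query: TIGR01369 [M=1052]"
SEQUENCE_LENGTH = 1082  # from "1082 residues searched"

dom = parse_domain_row(ROW)
model_cov = (dom["hmm_to"] - dom["hmm_from"] + 1) / MODEL_LENGTH
seq_cov = (dom["ali_to"] - dom["ali_from"] + 1) / SEQUENCE_LENGTH

print(f"{model_cov:.3f} {seq_cov:.3f}")
```

For this hit, the alignment spans model positions 1 to 1050 of 1052 and candidate residues 2 to 1061 of 1082.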

This GapMind analysis is from Aug 03 2021. The underlying query database was built on Aug 03 2021.

About GapMind

Each pathway is defined by a set of rules based on individual steps or genes. Candidates for each step are identified by using ublast (a fast alternative to protein BLAST) against a database of manually curated proteins (most of which are experimentally characterized) or by using HMMER with enzyme models (usually from TIGRFam). Ublast hits may be split across two different proteins.

A candidate for a step is "high confidence" if either:

where "other" refers to the best ublast hit to a sequence that is not annotated as performing this step (and is not "ignored").

Otherwise, a candidate is "medium confidence" if either:

Other blast hits with at least 50% coverage are "low confidence."

Steps with no high- or medium-confidence candidates may be considered "gaps." For the typical bacterium that can make all 20 amino acids, there are 1-2 gaps in amino acid biosynthesis pathways. For diverse bacteria and archaea that can utilize a carbon source, there is a complete high-confidence catabolic pathway (including a transporter) just 38% of the time, and there is a complete medium-confidence pathway 63% of the time. Gaps may be due to:

GapMind relies on the predicted proteins in the genome and does not search the six-frame translation. In most cases, you can search the six-frame translation by clicking on links to Curated BLAST for each step definition (in the per-step page).

For more information, see the 2019 paper on GapMind for amino acid biosynthesis or the 2022 paper on GapMind for carbon sources, view the source code, or see the changes to amino acid biosynthesis since publication.

If you notice any errors or omissions in the step descriptions, or any questionable results, please let us know.

by Morgan Price, Arkin group, Lawrence Berkeley National Laboratory