GapMind for Amino acid biosynthesis

 

Alignments for a candidate for trpE in Methanosarcina acetivorans C2A

Align Anthranilate synthase component 1 2; EC 4.1.3.27; Anthranilate synthase component I 2 (uncharacterized)
to candidate WP_011022927.1 MA_RS15665 anthranilate synthase component I

Query= curated2:Q5V213
         (536 letters)



>NCBI__GCF_000007345.1:WP_011022927.1
          Length = 560

 Score =  404 bits (1037), Expect = e-117
 Identities = 248/576 (43%), Positives = 328/576 (56%), Gaps = 87/576 (15%)

Query: 1   MTLDISREEFVEHAKA-DRPVVVRTAAELD---VDVEPLTAYAALTGRTSDVAANDYTFL 56
           ++ D+ +EEF+E     ++P +V+  A+++       PL  Y AL G         Y++L
Sbjct: 2   LSFDLGKEEFLELVSGLEKPGLVQLFAKVEGCSPACSPLELYGALRGS----GTTGYSYL 57

Query: 57  LESAEKVASSDPDGAFAPETDDRHARFSFVGYDPRAVVTVTGDESEVEAFDDRYADLVTT 116
           LES EK  S               AR+SFVG DP AV+ +   +  +E  + + + L   
Sbjct: 58  LESVEKQES--------------RARYSFVGNDPDAVLKINDRKISLELLNPKASPLFEA 103

Query: 117 ----------------------------------DGGDVVDDLRAAMPD---VALRNFPA 139
                                              G DV D LR A P    + L N   
Sbjct: 104 ICTKMEEVCGPETAEKENESKKNAGPEKFTAAIPRGKDVFDALRLAFPPANGIELLNSRR 163

Query: 140 MDRQHLEGGLVGFLSYDAVYDLWLDEVGLDRP-DSRFPDAQFVLTTSTVRFDHVEDTVSL 198
             RQ   GG +G+ +YDA+YD WL   G+++  +S  PD Q++L + +   DH+ + V +
Sbjct: 164 FARQTFLGGAIGYTAYDAIYDSWL---GVEKGFESDIPDLQYLLVSKSFVLDHLTEEVYI 220

Query: 199 VFTPVVRQGEDAGERYGELVAEAERVEAVLS------DLSPLSTGGFRREDEVAGP---- 248
           V TP V  G DA + Y E ++EAER  +V+       D +  + G      +V+GP    
Sbjct: 221 VLTPFVSPGSDAEQVYEEALSEAERFYSVIKKATQPEDAAKAAEGIIASGTDVSGPAKAP 280

Query: 249 ----------RDEYEDAVERAKEYVLSGDIYQGVISRTRELYGDVDPLGFYEALRAVNPS 298
                     R  +E++V +AKE++ +GDI+Q V+SR  E   +  P   Y  LRA+NPS
Sbjct: 281 NSNVQVCSVDRSGFEESVLQAKEHIFAGDIFQVVLSRKCEFKMEQSPFELYIQLRAINPS 340

Query: 299 PYMYLLGYDDLTIVGASPETLVSVAGDHVVSNPIAGTCPRGNSPVEDRRLAGEMLADGKE 358
           PYMY+  + DL IVGASPETL++V    V+ NPIAGTCPRG S  ED  LA  ML D KE
Sbjct: 341 PYMYIFEFGDLAIVGASPETLLTVHKRTVIINPIAGTCPRGKSEAEDETLASHMLNDEKE 400

Query: 359 RAEHTMLVDLARNDVRRVAEAGSVRVPEFMNVLKYSHVQHIESTVTGRLAEDKDAFDAAR 418
           RAEH MLVDL RNDVR V+E+GSV+V  FM VLKYSHVQHIESTV+G L  + D FDA R
Sbjct: 401 RAEHVMLVDLGRNDVRMVSESGSVKVSGFMKVLKYSHVQHIESTVSGTLRPECDQFDAFR 460

Query: 419 ATFPAGTLSGAPKIRAMEIIDELERSPRGPYGGGVGYFDWDGDTDFAIVIRSATVEDEGD 478
           A FPAGTLSGAPKIRAMEII E E  PRG YGGGVGY+ W+GD DFAIVIR+  ++    
Sbjct: 461 AVFPAGTLSGAPKIRAMEIISEREAVPRGIYGGGVGYYSWNGDADFAIVIRTLLIQGR-- 518

Query: 479 RDRITVQAGAGIVADSDPESEYVETEQKMDGVLTAL 514
             + +VQAGAGIVADSDP  E+ ET++KM  +LTA+
Sbjct: 519 --KASVQAGAGIVADSDPAYEFRETDRKMAAMLTAI 552


Lambda     K      H
   0.315    0.135    0.383 

Gapped
Lambda     K      H
   0.267   0.0410    0.140 


Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Sequences: 1
Number of Hits to DB: 814
Number of extensions: 46
Number of successful extensions: 7
Number of sequences better than 1.0e-02: 1
Number of HSP's gapped: 1
Number of HSP's successfully gapped: 1
Length of query: 536
Length of database: 560
Length adjustment: 36
Effective length of query: 500
Effective length of database: 524
Effective search space:   262000
Effective search space used:   262000
Neighboring words threshold: 11
Window for multiple hits: 40
X1: 16 ( 7.3 bits)
X2: 38 (14.6 bits)
X3: 64 (24.7 bits)
S1: 42 (22.0 bits)
S2: 53 (25.0 bits)
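
The length-adjusted figures in the statistics above can be checked by hand. The following Python sketch recomputes the effective lengths and the effective search space from the reported values, then estimates the E-value from the rounded bit score using the usual relation E = m * n * 2^(-S). The variable names are illustrative and are not part of the BLAST output, and because the bit score is rounded to 404 the estimate only approximates the reported value.

```python
import math

# Values copied from the BLAST report above.
query_len = 536          # "Length of query"
db_len = 560             # "Length of database"
num_sequences = 1        # "Number of Sequences"
length_adjustment = 36   # "Length adjustment"
bit_score = 404          # rounded bit score from the alignment header

# Effective lengths subtract the length adjustment (once per database sequence).
eff_query = query_len - length_adjustment
eff_db = db_len - num_sequences * length_adjustment
search_space = eff_query * eff_db

# E = m * n * 2^(-S), computed in log10 space to avoid floating-point underflow.
log10_evalue = math.log10(search_space) - bit_score * math.log10(2)

print(eff_query, eff_db, search_space)   # 500 524 262000
print(f"log10(E) is about {log10_evalue:.1f}")
```

The estimate comes out near 6e-117, which agrees with the reported "Expect = e-117" to within the rounding of the bit score.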

Align candidate WP_011022927.1 MA_RS15665 (anthranilate synthase component I)
to HMM TIGR01820 (trpE: anthranilate synthase component I (EC 4.1.3.27))

# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3.1 (Jul 2020); http://hmmer.org/
# Copyright (C) 2020 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  ../tmp/path.aa/TIGR01820.hmm
# target sequence database:        /tmp/gapView.3275186.genome.faa
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       TIGR01820  [M=449]
Accession:   TIGR01820
Description: TrpE-arch: anthranilate synthase component I
Scores for complete sequences (score includes all domains):
   --- full sequence ---   --- best 1 domain ---    -#dom-
    E-value  score  bias    E-value  score  bias    exp  N  Sequence                             Description
    ------- ------ -----    ------- ------ -----   ---- --  --------                             -----------
   2.7e-212  691.9   0.0   3.1e-212  691.7   0.0    1.0  1  NCBI__GCF_000007345.1:WP_011022927.1  


Domain annotation for each sequence (and alignments):
>> NCBI__GCF_000007345.1:WP_011022927.1  
   #    score  bias  c-Evalue  i-Evalue hmmfrom  hmm to    alifrom  ali to    envfrom  env to     acc
 ---   ------ ----- --------- --------- ------- -------    ------- -------    ------- -------    ----
   1 !  691.7   0.0  3.1e-212  3.1e-212       1     449 []      39     553 ..      39     553 .. 0.97

  Alignments for each domain:
  == domain 1  score: 691.7 bits;  conditional E-value: 3.1e-212
                             TIGR01820   1 Plelykalrk..eseysflLesvekqskkaryslvgaspeavvkiner.........kavelfeeivskvkkl 62 
                                           Plely alr   +++ys+lLesvekq+++arys+vg++p+av+kin+r         ka+ lfe+i++k++++
  NCBI__GCF_000007345.1:WP_011022927.1  39 PLELYGALRGsgTTGYSYLLESVEKQESRARYSFVGNDPDAVLKINDRkislellnpKASPLFEAICTKMEEV 111
                                           99*******977888********************************************************** PP

                             TIGR01820  63 eg.......kkkae................gkdvldalrkalkklkeiellee...erqtflGglvGyvaYda 109
                                           +g       +   e                gkdv+dalr a++++++iell++   +rqtflGg++Gy+aYda
  NCBI__GCF_000007345.1:WP_011022927.1 112 CGpetaekeN---EskknagpekftaaiprGKDVFDALRLAFPPANGIELLNSrrfARQTFLGGAIGYTAYDA 181
                                           **76654430...1556678999***************************99988899*************** PP

                             TIGR01820 110 vrdywedaekekeseipeaefllvtkvlvfdhleeevslvvteevsad............eaekiveklkeae 170
                                           ++d+w+++ek +es+ip++++llv+k++v+dhl+eev++v t++vs+             eae++++++k+a+
  NCBI__GCF_000007345.1:WP_011022927.1 182 IYDSWLGVEKGFESDIPDLQYLLVSKSFVLDHLTEEVYIVLTPFVSPGsdaeqvyeealsEAERFYSVIKKAT 254
                                           **********************************************9999*********************** PP

                             TIGR01820 171 keeeekkeaeleslaekee....................feeavekakekifeGdifqvvlSrklelrldldp 223
                                           ++e+++k+ae   +++++                     fee+v +ake+if+GdifqvvlSrk+e++++++p
  NCBI__GCF_000007345.1:WP_011022927.1 255 QPEDAAKAAEGIIASGTDVsgpakapnsnvqvcsvdrsgFEESVLQAKEHIFAGDIFQVVLSRKCEFKMEQSP 327
                                           *********999999855567788889999999999999********************************** PP

                             TIGR01820 224 lelYaklreiNPSPYmyllefgdraivGaSPEtlvrvekrtveinPiAGtapRgkseeeDeelakelLsdeKe 296
                                           +elY +lr+iNPSPYmy++efgd+aivGaSPEtl++v+krtv+inPiAGt+pRgkse+eDe+la+++L+deKe
  NCBI__GCF_000007345.1:WP_011022927.1 328 FELYIQLRAINPSPYMYIFEFGDLAIVGASPETLLTVHKRTVIINPIAGTCPRGKSEAEDETLASHMLNDEKE 400
                                           ************************************************************************* PP

                             TIGR01820 297 rAEHvmLvDLaRNDvrkvsesgsvkvsefmkvlkyshvqHieSevvgtLkkeadafdalkAvfPAGtlsGaPK 369
                                           rAEHvmLvDL+RNDvr+vsesgsvkvs+fmkvlkyshvqHieS+v+gtL++e+d+fda++AvfPAGtlsGaPK
  NCBI__GCF_000007345.1:WP_011022927.1 401 RAEHVMLVDLGRNDVRMVSESGSVKVSGFMKVLKYSHVQHIESTVSGTLRPECDQFDAFRAVFPAGTLSGAPK 473
                                           ************************************************************************* PP

                             TIGR01820 370 irAmeiieelEkepRgvYgGgvGyfslngdadlAiviRtaliekkklriqaGAGivaDSdPekEfeEterKmk 442
                                           irAmeii+e E++pRg+YgGgvGy+s+ngdad+AiviRt+li+++k+++qaGAGivaDSdP++Ef+Et+rKm+
  NCBI__GCF_000007345.1:WP_011022927.1 474 IRAMEIISEREAVPRGIYGGGVGYYSWNGDADFAIVIRTLLIQGRKASVQAGAGIVADSDPAYEFRETDRKMA 546
                                           ************************************************************************* PP

                             TIGR01820 443 avlkaig 449
                                           a+l+aig
  NCBI__GCF_000007345.1:WP_011022927.1 547 AMLTAIG 553
                                           *****96 PP



Internal pipeline statistics summary:
-------------------------------------
Query model(s):                            1  (449 nodes)
Target sequences:                          1  (560 residues searched)
Passed MSV filter:                         1  (1); expected 0.0 (0.02)
Passed bias filter:                        1  (1); expected 0.0 (0.02)
Passed Vit filter:                         1  (1); expected 0.0 (0.001)
Passed Fwd filter:                         1  (1); expected 0.0 (1e-05)
Initial search space (Z):                  1  [actual number of targets]
Domain search space  (domZ):               1  [number of targets reported over threshold]
# CPU time: 0.00u 0.00s 00:00:00.00 Elapsed: 00:00:00.00
# Mc/sec: 32.89
//
[ok]
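
To read the domain table without counting columns by eye, the single domain row can be split on whitespace. This is a minimal Python sketch that assumes the row layout shown in the hmmsearch output above; the coverage definitions here (aligned span divided by model length or by target length) are illustrative conventions, not values that HMMER itself reports.

```python
# Domain table row copied from the hmmsearch output above.
row = ("   1 !  691.7   0.0  3.1e-212  3.1e-212       1     449 []"
       "      39     553 ..      39     553 .. 0.97")
model_len = 449    # "[M=449]" on the Query line
target_len = 560   # "560 residues searched"

fields = row.split()
# Column order: #, flag, score, bias, c-Evalue, i-Evalue,
# hmmfrom, hmmto, hmm-ends, alifrom, alito, ali-ends, envfrom, envto, env-ends, acc
score = float(fields[2])
hmm_from, hmm_to = int(fields[6]), int(fields[7])
ali_from, ali_to = int(fields[9]), int(fields[10])

# Fraction of the model and of the target sequence spanned by the alignment.
hmm_coverage = (hmm_to - hmm_from + 1) / model_len
target_coverage = (ali_to - ali_from + 1) / target_len

print(f"score={score} hmm_coverage={hmm_coverage:.2f} "
      f"target_coverage={target_coverage:.2f}")
```

For this candidate the alignment spans the full model (positions 1 to 449) and residues 39 to 553 of the 560-residue protein.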

This GapMind analysis is from Jul 25 2024. The underlying query database was built on Jul 25 2024.

About GapMind

Each pathway is defined by a set of rules based on individual steps or genes. Candidates for each step are identified by using ublast (a fast alternative to protein BLAST) against a database of manually curated proteins (most of which are experimentally characterized) or by using HMMER with enzyme models (usually from TIGRFAMs). Ublast hits may be split across two different proteins.

A candidate for a step is "high confidence" if either:

where "other" refers to the best ublast hit to a sequence that is not annotated as performing this step (and is not "ignored").

Otherwise, a candidate is "medium confidence" if either:

Other ublast hits with at least 50% coverage are "low confidence."
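
As an illustration of the 50% coverage threshold, the following Python sketch computes percent identity and coverage for the alignment to Q5V213 reported on this page. It assumes that coverage means the fraction of the curated (query) sequence spanned by the alignment; that definition and the function name `alignment_stats` are assumptions made for this example, not GapMind's own code.

```python
def alignment_stats(identities, align_len, q_start, q_end, q_len):
    """Return (fractional identity, fractional query coverage) for one alignment."""
    identity = identities / align_len
    coverage = (q_end - q_start + 1) / q_len
    return identity, coverage

# Values from the alignment above: Identities = 248/576,
# query positions 1 to 514 of a 536-residue curated sequence.
identity, coverage = alignment_stats(248, 576, 1, 514, 536)

# Assumed reading of the rule: the hit clears the threshold at >= 50% coverage.
meets_coverage_threshold = coverage >= 0.5

print(f"identity={identity:.1%} coverage={coverage:.1%} "
      f"meets_threshold={meets_coverage_threshold}")
```

This check covers only the coverage threshold; it does not reproduce the high- or medium-confidence rules.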

Steps with no high- or medium-confidence candidates may be considered "gaps." For the typical bacterium that can make all 20 amino acids, there are 1-2 gaps in amino acid biosynthesis pathways. For diverse bacteria and archaea that can utilize a carbon source, there is a complete high-confidence catabolic pathway (including a transporter) just 38% of the time, and there is a complete medium-confidence pathway 63% of the time. Gaps may be due to:

GapMind relies on the predicted proteins in the genome and does not search the six-frame translation. In most cases, you can search the six-frame translation by clicking on links to Curated BLAST for each step definition (in the per-step page).

For more information, see:

If you notice any errors or omissions in the step descriptions, or any questionable results, please let us know.

by Morgan Price, Arkin group, Lawrence Berkeley National Laboratory