Align Malate synthase G (EC 2.3.3.9) (characterized)
to candidate GFF2238 HP15_2188 malate synthase G
Query= reanno::psRCH2:GFF353
         (726 letters)

>FitnessBrowser__Marino:GFF2238
          Length = 726

 Score = 1020 bits (2638), Expect = 0.0
 Identities = 501/726 (69%), Positives = 593/726 (81%), Gaps = 2/726 (0%)

Query: 1   MTERVQVGGLQVAKVLYDFVNNEAIPGTGVDAAAFWAGADSVIHDLAPKNRALLAKRDDL 60
           MTERVQVGG+QVAK LYDFVNNEAIPGTG+DA  FWA  D ++++LAP+NR LLAKRD +
Sbjct: 1   MTERVQVGGIQVAKNLYDFVNNEAIPGTGIDADKFWAEFDKIVNELAPRNRELLAKRDAI 60

Query: 61  QAQIDAWHQARAGQAHDAVAYKSFLQEIGYLLPEAEDFQATTENVDEEIARMAGPQLVVP 120
           Q ++D+W++   GQ  D   YKSFL++IGYL+ E  +F+ +T NVD E+A MAGPQLVVP
Sbjct: 61  QEKMDSWNRDHKGQKLDMGEYKSFLKDIGYLVDEPSEFKISTSNVDPEVATMAGPQLVVP 120

Query: 121 IMNARFALNAANARWGSLYDALYGTDAISEADGASKGPGYNEIRGNKVIAYARNFLNEAA 180
           IMNARFALNAANARWGSLYDALYGTDAISE DGA KG GYN +RG KVI +ARN L+ +A
Sbjct: 121 IMNARFALNAANARWGSLYDALYGTDAISEEDGAEKGRGYNPVRGAKVIEWARNLLDSSA 180

Query: 181 PLETGSHVDSTGYRIEGGKLVVSLKDGSTTGLKNPAQLQGFQGEASAPIAVLLKNNGIHF 240
           PL +GSH D+  Y + GGKLVV L++G +TGLK+ A   G+ G A AP  VLL  NG+HF
Sbjct: 181 PLASGSHKDAAKYVVVGGKLVVKLQNGESTGLKDEAGFVGYTGAADAPTGVLLVKNGMHF 240

Query: 241 EIQIDPASPIGQTDAAGVKDILMESALTTIMDCEDSIAAVDADDKTVVYRNWLGLMKGDL 300
           EIQID   PIG+ D A VKD+LMESALTTIMDCEDS+AAVDADDK + YRNWLGLMKGDL
Sbjct: 241 EIQIDATHPIGKDDGAHVKDVLMESALTTIMDCEDSVAAVDADDKALAYRNWLGLMKGDL 300

Query: 301 VEELEKGGKRITRAMNPDRVYTKADGNGELTLHGRSLLFIRNVGHLMTNDAILDKEGNEV 360
            E  EKGG+++TR MN DR YT ADG+ EL+L GRSL+FIRNVGHLMTN AIL K+GNEV
Sbjct: 301 QETFEKGGEQLTRKMNADRTYTAADGS-ELSLKGRSLMFIRNVGHLMTNPAILLKDGNEV 359

Query: 361 PEGIMDGLFTSLIAVHNLNGNTSRKNTRTGSMYIVKPKMHGPEEVAFATELFGRVEDVLG 420
           PEG+MDGL TSLIA+H++ GN   +N+  GSMYIVKPKMHGPEEVAF  E FGRVED L
Sbjct: 360 PEGLMDGLITSLIAIHDMKGNGKFQNSTKGSMYIVKPKMHGPEEVAFTNEFFGRVEDALS 419

Query: 421 LPRNTLKVGIMDEERRTTINLKACIKEARERVVFINTGFLDRTGDEIHTSMEAGPMVRKA 480
           LPR +LKVGIMDEERRTT+NLKACI  A+ER VFINTGFLDRTGDEIHTSME GP +RK
Sbjct: 420 LPRFSLKVGIMDEERRTTVNLKACIHAAKERAVFINTGFLDRTGDEIHTSMELGPFIRKG 479

Query: 481 AMKAEKWISAYENNNVDVGLACGLQGKAQIGKGMWAMPDLMAAMLEQKVGHPMAGANTAW 540
           AMK   WI+AYE  NVD+GL  G +G AQIGKGMWAMPDLMA MLE K+GHP AGANTAW
Sbjct: 480 AMKQATWINAYEQWNVDIGLETGFRGVAQIGKGMWAMPDLMAGMLEAKIGHPKAGANTAW 539

Query: 541 VPSPTAATLHAMHYHKIDVQARQVELAKREKASIDDILTIPLAQD-TNWSEEEKRNELDN 599
           VPSPTAATLHA HYH++ V   Q +L  R +A++DDILT+P+  D  + S E+ + ELDN
Sbjct: 540 VPSPTAATLHATHYHQVSVADVQKQLESRTRAALDDILTVPVMDDPASLSAEDIQQELDN 599

Query: 600 NSQGILGYMVRWVEQGVGCSKVPDINDIALMEDRATLRISSQHVANWMRHGVVTKDQVVE 659
           N+QGILGY+VRW++QGVGCSKVPDIND+ LMEDRATLRI+SQ +ANW+ HG+ ++DQ++E
Sbjct: 600 NAQGILGYVVRWIDQGVGCSKVPDINDVGLMEDRATLRIASQLLANWLHHGICSEDQIME 659

Query: 660 SLKRMAPVVDRQNQGDPLYRPMAPDFDNSVAFQAALELVLEGTKQPNGYTEPVLHRRRRE 719
           ++KRMA VVD+QN GD  YRPMA +FD+SVAFQAAL+LVL+G +QPNGYTEP+LH  R +
Sbjct: 660 TMKRMAAVVDKQNAGDSAYRPMAGNFDDSVAFQAALDLVLKGREQPNGYTEPLLHAYRLK 719

Query: 720 FKAKNG 725
            KAK G
Sbjct: 720 AKAKYG 725

Lambda     K      H
   0.316    0.133    0.386

Gapped
Lambda     K      H
   0.267   0.0410    0.140

Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Sequences: 1
Number of Hits to DB: 1471
Number of extensions: 46
Number of successful extensions: 3
Number of sequences better than 1.0e-02: 1
Number of HSP's gapped: 1
Number of HSP's successfully gapped: 1
Length of query: 726
Length of database: 726
Length adjustment: 40
Effective length of query: 686
Effective length of database: 686
Effective search space: 470596
Effective search space used: 470596
Neighboring words threshold: 11
Window for multiple hits: 40
X1: 16 ( 7.3 bits)
X2: 38 (14.6 bits)
X3: 64 (24.7 bits)
S1: 41 (21.6 bits)
S2: 55 (25.8 bits)
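As a rough cross-check of the statistics in the BLAST report above, the following sketch recomputes the bit score, the effective search space, and the order of magnitude of the E-value from the printed parameters. It assumes the standard Karlin-Altschul relations (bit score = (lambda * S - ln K) / ln 2, and E = search space * 2^(-bits)) and a single database sequence; the function names are illustrative and are not part of BLAST or GapMind.

```python
import math

# Values copied from the BLAST report above.
RAW_SCORE = 2638          # raw alignment score, shown in parentheses after the bit score
GAPPED_LAMBDA = 0.267     # gapped Karlin-Altschul lambda (printed, rounded)
GAPPED_K = 0.0410         # gapped Karlin-Altschul K (printed, rounded)
QUERY_LENGTH = 726
DATABASE_LENGTH = 726
LENGTH_ADJUSTMENT = 40


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Convert a raw score to a bit score: (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def effective_search_space(query_len: int, db_len: int, adjustment: int,
                           n_sequences: int = 1) -> int:
    """Product of the length-adjusted query and database lengths."""
    return (query_len - adjustment) * (db_len - n_sequences * adjustment)


def log10_evalue(bits: float, search_space: int) -> float:
    """log10 of E = search_space * 2**(-bits), kept in log space to avoid underflow."""
    return math.log10(search_space) - bits * math.log10(2)


bits = bit_score(RAW_SCORE, GAPPED_LAMBDA, GAPPED_K)
space = effective_search_space(QUERY_LENGTH, DATABASE_LENGTH, LENGTH_ADJUSTMENT)
log10_e = log10_evalue(bits, space)

print(f"bit score from printed parameters: {bits:.1f}")
print(f"effective search space: {space}")
print(f"log10(E-value): {log10_e:.1f}")
```

With the rounded parameters as printed, the recomputed bit score lands within about one bit of the reported 1020, and the effective search space is 686 x 686 = 470596, as in the report. The E-value is far too small to print, which is consistent with `Expect = 0.0`.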
Align candidate GFF2238 HP15_2188 (malate synthase G)
to HMM TIGR01345 (glcB: malate synthase G (EC 2.3.3.9))
# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3.1 (Jul 2020); http://hmmer.org/
# Copyright (C) 2020 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  ../tmp/path.carbon/TIGR01345.hmm
# target sequence database:        /tmp/gapView.18828.genome.faa
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       TIGR01345  [M=721]
Accession:   TIGR01345
Description: malate_syn_G: malate synthase G
Scores for complete sequences (score includes all domains):
   --- full sequence ---   --- best 1 domain ---    -#dom-
    E-value  score  bias    E-value  score  bias    exp  N  Sequence                            Description
    ------- ------ -----    ------- ------ -----   ---- --  --------                            -----------
          0 1210.5   0.7          0 1210.3   0.7    1.0  1  lcl|FitnessBrowser__Marino:GFF2238  HP15_2188 malate synthase G

Domain annotation for each sequence (and alignments):
>> lcl|FitnessBrowser__Marino:GFF2238  HP15_2188 malate synthase G
   #    score  bias  c-Evalue  i-Evalue hmmfrom  hmm to    alifrom  ali to    envfrom  env to     acc
 ---   ------ ----- --------- --------- ------- -------    ------- -------    ------- -------    ----
   1 ! 1210.3   0.7         0         0       2     720 ..       4     723 ..       3     724 .. 0.99

  Alignments for each domain:
  == domain 1  score: 1210.3 bits;  conditional E-value: 0
                           TIGR01345   2 rvdagrlqvakklkdfveeevlpgtgvdaekfwsgfdeivrdlapenrellakrdeiqaaideyhrknk.gvidk 75
                                         rv++g++qvak+l+dfv++e++pgtg+da+kfw++fd+iv++lap+nrellakrd iq  +d + r+ k    d 
  lcl|FitnessBrowser__Marino:GFF2238   4 RVQVGGIQVAKNLYDFVNNEAIPGTGIDADKFWAEFDKIVNELAPRNRELLAKRDAIQEKMDSWNRDHKgQKLDM 78
                                         7899*****************************************************************4457** PP

                           TIGR01345  76 eayksflkeigylveepervtietenvdseiasqagpqlvvpvlnaryalnaanarwgslydalygsnvipeedg 150
                                           yksflk+igylv+ep + +i t nvd e+a+ agpqlvvp++nar+alnaanarwgslydalyg+++i+eedg
  lcl|FitnessBrowser__Marino:GFF2238  79 GEYKSFLKDIGYLVDEPSEFKISTSNVDPEVATMAGPQLVVPIMNARFALNAANARWGSLYDALYGTDAISEEDG 153
                                         *************************************************************************** PP

                           TIGR01345 151 aekgkeynpkrgekviefarefldeslplesgsyadvvkykivdkklavqlesgkvtrlkdeeqfvgyrgdaadp 225
                                         aekg+ ynp+rg kvie+ar++ld+s pl sgs++d+ ky +v +kl+v+l++g+ t lkde+ fvgy+g a +p
  lcl|FitnessBrowser__Marino:GFF2238 154 AEKGRGYNPVRGAKVIEWARNLLDSSAPLASGSHKDAAKYVVVGGKLVVKLQNGESTGLKDEAGFVGYTGAADAP 228
                                         *************************************************************************** PP

                           TIGR01345 226 evillktnglhielqidarhpigkadkakvkdivlesaittildcedsvaavdaedkvlvyrnllglmkgtlkek 300
                                         + +ll +ng+h e+qida+hpigk+d a+vkd+++esa+tti+dcedsvaavda+dk l yrn+lglmkg+l+e+
  lcl|FitnessBrowser__Marino:GFF2238 229 TGVLLVKNGMHFEIQIDATHPIGKDDGAHVKDVLMESALTTIMDCEDSVAAVDADDKALAYRNWLGLMKGDLQET 303
                                         *************************************************************************** PP

                           TIGR01345 301 lekngriikrklnedrsytaangeelslhgrsllfvrnvghlmtipviltdegeeipegildgvltsvialydlk 375
                                         +ek g +++rk+n dr+ytaa+g+elsl+grsl+f+rnvghlmt+p+il ++g+e+peg++dg++ts+ia++d+k
  lcl|FitnessBrowser__Marino:GFF2238 304 FEKGGEQLTRKMNADRTYTAADGSELSLKGRSLMFIRNVGHLMTNPAILLKDGNEVPEGLMDGLITSLIAIHDMK 378
                                         *************************************************************************** PP

                           TIGR01345 376 vqnklrnsrkgsvyivkpkmhgpeevafanklftriedllglerhtlkvgvmdeerrtslnlkaciakvkervaf 450
                                          ++k++ns kgs+yivkpkmhgpeevaf+n++f+r+ed l l+r +lkvg+mdeerrt++nlkaci+ +ker +f
  lcl|FitnessBrowser__Marino:GFF2238 379 GNGKFQNSTKGSMYIVKPKMHGPEEVAFTNEFFGRVEDALSLPRFSLKVGIMDEERRTTVNLKACIHAAKERAVF 453
                                         *************************************************************************** PP

                           TIGR01345 451 intgfldrtgdeihtsmeagamvrkadmksapwlkayernnvaagltcglrgkaqigkgmwampdlmaemlekkg 525
                                         intgfldrtgdeihtsme g+++rk+ mk a+w++aye+ nv+ gl +g+rg aqigkgmwampdlma mle k+
  lcl|FitnessBrowser__Marino:GFF2238 454 INTGFLDRTGDEIHTSMELGPFIRKGAMKQATWINAYEQWNVDIGLETGFRGVAQIGKGMWAMPDLMAGMLEAKI 528
                                         *************************************************************************** PP

                           TIGR01345 526 dqlragantawvpsptaatlhalhyhrvdvqkvqkeladaerraelkeiltipvaen.tnwseeeikeeldnnvq 599
                                         + ++agantawvpsptaatlha+hyh+v v  vqk+l +  +ra l++ilt+pv ++  + s+e+i++eldnn+q
  lcl|FitnessBrowser__Marino:GFF2238 529 GHPKAGANTAWVPSPTAATLHATHYHQVSVADVQKQLESR-TRAALDDILTVPVMDDpASLSAEDIQQELDNNAQ 602
                                         *************************************988.8999********98753899************** PP

                           TIGR01345 600 gilgyvvrwveqgigcskvpdihnvalmedratlrissqhlanwlrhgivskeqvleslermakvvdkqnagdea 674
                                         gilgyvvrw++qg+gcskvpdi +v lmedratlri+sq lanwl hgi s +q++e+++rma vvdkqnagd a
  lcl|FitnessBrowser__Marino:GFF2238 603 GILGYVVRWIDQGVGCSKVPDINDVGLMEDRATLRIASQLLANWLHHGICSEDQIMETMKRMAAVVDKQNAGDSA 677
                                         *************************************************************************** PP

                           TIGR01345 675 yrpmadnleasvafkaakdlilkgtkqpsgytepilharrlefkek 720
                                         yrpma+n+++svaf+aa dl+lkg +qp+gytep+lha+rl+ k+k
  lcl|FitnessBrowser__Marino:GFF2238 678 YRPMAGNFDDSVAFQAALDLVLKGREQPNGYTEPLLHAYRLKAKAK 723
                                         *******************************************987 PP

Internal pipeline statistics summary:
-------------------------------------
Query model(s):                            1  (721 nodes)
Target sequences:                          1  (726 residues searched)
Passed MSV filter:                         1  (1); expected 0.0 (0.02)
Passed bias filter:                        1  (1); expected 0.0 (0.02)
Passed Vit filter:                         1  (1); expected 0.0 (0.001)
Passed Fwd filter:                         1  (1); expected 0.0 (1e-05)
Initial search space (Z):                  1  [actual number of targets]
Domain search space  (domZ):               1  [number of targets reported over threshold]
# CPU time: 0.05u 0.01s 00:00:00.06 Elapsed: 00:00:00.05
# Mc/sec: 8.99
//
[ok]
This GapMind analysis is from Sep 17 2021. The underlying query database was built on Sep 17 2021.
Each pathway is defined by a set of rules based on individual steps or genes. Candidates for each step are identified by using ublast (a fast alternative to protein BLAST) against a database of manually-curated proteins (most of which are experimentally characterized) or by using HMMer with enzyme models (usually from TIGRFam). Ublast hits may be split across two different proteins.
A candidate for a step is "high confidence" if either:
Otherwise, a candidate is "medium confidence" if either:
Other blast hits with at least 50% coverage are "low confidence."
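To make the identity and coverage figures concrete, here is a minimal sketch that derives them from the coordinates in the two reports above. It assumes that coverage means the aligned span divided by the full length of the query (or HMM), and that identity means identical positions divided by alignment length, as in the BLAST header. GapMind's exact definitions may differ, and the helper names are hypothetical.

```python
def span_coverage(start: int, end: int, total_length: int) -> float:
    """Fraction of a sequence or model covered by an aligned span (1-based, inclusive)."""
    if not (1 <= start <= end <= total_length):
        raise ValueError("span must lie within the sequence")
    return (end - start + 1) / total_length


def meets_coverage(coverage: float, threshold: float = 0.5) -> bool:
    """True if coverage reaches the threshold (50% is the low-confidence cutoff in the text)."""
    return coverage >= threshold


# BLAST hit: query residues 1-725 of 726; 501 identities over 726 aligned columns.
blast_query_coverage = span_coverage(1, 725, 726)
blast_identity = 501 / 726

# HMMER hit: model positions 2-720 of the 721-node TIGR01345 model.
hmm_model_coverage = span_coverage(2, 720, 721)

print(f"BLAST identity:       {blast_identity:.1%}")
print(f"BLAST query coverage: {blast_query_coverage:.1%}")
print(f"HMM model coverage:   {hmm_model_coverage:.1%}")
print("passes 50% coverage:", meets_coverage(blast_query_coverage))
```

Under these assumptions the candidate covers nearly the whole query and nearly the whole model, so it clears the 50% coverage cutoff by a wide margin.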
Steps with no high- or medium-confidence candidates may be considered "gaps." For the typical bacterium that can make all 20 amino acids, there are 1-2 gaps in amino acid biosynthesis pathways. For diverse bacteria and archaea that can utilize a carbon source, there is a complete high-confidence catabolic pathway (including a transporter) just 38% of the time, and there is a complete medium-confidence pathway 63% of the time. Gaps may be due to:
GapMind relies on the predicted proteins in the genome and does not search the six-frame translation. In most cases, you can search the six-frame translation by clicking on links to Curated BLAST for each step definition (in the per-step page).
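For readers unfamiliar with the term, a six-frame translation reads a DNA sequence at three offsets on each strand. The sketch below illustrates the idea in plain Python using the standard genetic code, whose amino acid assignments are the same as in the bacterial table 11. It is only an illustration of the concept, not the code that GapMind or Curated BLAST runs, and the function names are made up for this example.

```python
# Standard genetic code, with codons ordered by base in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an unambiguous DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons from position 0; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna: str) -> dict:
    """Translate all three forward frames (+1..+3) and reverse frames (-1..-3)."""
    reverse = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


for frame, protein in six_frame_translation("ATGGCC").items():
    print(frame, protein)
```

Stop codons appear as `*` in the output, so open reading frames missed by gene prediction show up as long stretches without `*` in one of the six frames.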
For more information, see:
If you notice any errors or omissions in the step descriptions, or any questionable results, please let us know.
by Morgan Price, Arkin group, Lawrence Berkeley National Laboratory