Align Glucokinase; EC 2.7.1.2; Glucose kinase (uncharacterized)
to candidate WP_010959909.1 MCA_RS02785 glucokinase
Query= curated2:B2J224
         (341 letters)

>NCBI__GCF_000008325.1:WP_010959909.1
          Length = 330

 Score =  224 bits (570), Expect = 3e-63
 Identities = 134/351 (38%), Positives = 188/351 (53%), Gaps = 34/351 (9%)

Query: 3   LLLAGDIGGTKTILQLVETSDSQGLHTIYQESYHSADFPDLVPIVQQFLIKANTPIPEKA 62
           +LLAGD+G TKT+L L +    + L ++ +  + S D+  L  +V  FL       PE A
Sbjct: 1   MLLAGDVGATKTVLGLFDCWGDR-LVSLSEAIFASTDYASLETVVAAFLDGQEERRPEVA 59

Query: 63  CFAIAGPIVKNTAKLTNLAWFLDTERLQQELGIPHIYLINDFAAVGYGIS-GLQKQDLHP 121
           CF + GP+ +   ++TNL W L    L    G+  + L+ND  A+  G++  L + D
Sbjct: 60  CFGVPGPVSEGRCEITNLPWVLSERELAAATGVSAVRLLNDVQAMALGMAYRLGEDDWVE 119

Query: 122 LQVGKPQPETP-IGIIGAGTGLGQGFLIKQGNNYQVFPSEGGHADFAPRNEIEFQLLKYL 180
           L  G  +P +  + +I AGTGLG+  L   G  Y   P+EGGH+DFAP   +E  LL +L
Sbjct: 120 LNPGAGRPRSGNVAVIAAGTGLGEAILYWDGERYHALPTEGGHSDFAPNGPLEEGLLAFL 179

Query: 181 LDKHDIQRISVERVVSGMGIVAIYQFLRDRKFAAES----------PDIAQIVRTWEQEA 230
            D+     +S ER++SG G+  +Y +LR    A ES          PD A I+  W
Sbjct: 180 RDRF-CGHVSYERILSGSGLANLYDYLRHAGVAPESEALHAALASAPDRAPIIAEW---- 234

Query: 231 GQEEKSVDPGAAIGTAALEKRDRLSEQTLQLFIEAYGAEAGNLALKLLPYGGLYIAGGIA 290
                           ALE+RD L    L LF   YGAEAGNLALK L  GG+ + GGIA
Sbjct: 235 ----------------ALERRDALCTAVLDLFAAIYGAEAGNLALKSLALGGVILGGGIA 278

Query: 291 PKILPLIQNSGFLLNFTQKGRMRPLLEEIPVYIILNPQVGLIGAALCAARL 341
           PKILP++Q   F+  FT KGR+ PLL  +PV + ++PQ  L+GAA  A+ +
Sbjct: 279 PKILPVLQAGRFMAAFTAKGRLSPLLGRLPVRVAIHPQPALLGAAHAASAM 329

Lambda     K      H
   0.320    0.140    0.408

Gapped
Lambda     K      H
   0.267   0.0410    0.140

Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Sequences: 1
Number of Hits to DB: 291
Number of extensions: 11
Number of successful extensions: 5
Number of sequences better than 1.0e-02: 1
Number of HSP's gapped: 1
Number of HSP's successfully gapped: 1
Length of query: 341
Length of database: 330
Length adjustment: 28
Effective length of query: 313
Effective length of database: 302
Effective search space: 94526
Effective search space used: 94526
Neighboring words threshold: 11
Window for multiple hits: 40
X1: 16 ( 7.4 bits)
X2: 38 (14.6 bits)
X3: 64 (24.7 bits)
S1: 41 (21.8 bits)
S2: 49 (23.5 bits)
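The summary numbers in the BLAST report above can be cross-checked against one another. The sketch below assumes the standard Karlin-Altschul relations (bit score from the raw score and the gapped Lambda and K, effective search space from the length-adjusted query and database lengths, and E-value from the two). The function and constant names are illustrative and are not part of GapMind or BLAST; the inputs are copied from the report.

```python
import math

# Values copied from the BLAST report above.
RAW_SCORE = 570          # raw score shown in parentheses after the bit score
GAPPED_LAMBDA = 0.267    # "Gapped Lambda"
GAPPED_K = 0.0410        # "Gapped K"
QUERY_LEN = 341
DB_LEN = 330
LENGTH_ADJUSTMENT = 28


def bit_score(raw, lam, k):
    """Convert a raw alignment score to a bit score."""
    return (lam * raw - math.log(k)) / math.log(2)


def effective_search_space(query_len, db_len, adjustment):
    """Product of the length-adjusted query and database lengths."""
    return (query_len - adjustment) * (db_len - adjustment)


def expect_value(bits, search_space):
    """Expected number of chance hits scoring at least `bits`."""
    return search_space * 2.0 ** (-bits)


bits = bit_score(RAW_SCORE, GAPPED_LAMBDA, GAPPED_K)
space = effective_search_space(QUERY_LEN, DB_LEN, LENGTH_ADJUSTMENT)
evalue = expect_value(bits, space)

print(f"bit score: {bits:.1f}")
print(f"effective search space: {space}")
print(f"E-value: {evalue:.1e}")
```

Under these assumptions the computed values should agree with the reported 224 bits, search space of 94526, and Expect of 3e-63 to within the rounding of the printed parameters.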
Align candidate WP_010959909.1 MCA_RS02785 (glucokinase)
to HMM TIGR00749 (glk: glucokinase (EC 2.7.1.2))
# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3.1 (Jul 2020); http://hmmer.org/
# Copyright (C) 2020 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  ../tmp/path.carbon/TIGR00749.hmm
# target sequence database:        /tmp/gapView.2098.genome.faa
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       TIGR00749  [M=315]
Accession:   TIGR00749
Description: glk: glucokinase
Scores for complete sequences (score includes all domains):
   --- full sequence ---   --- best 1 domain ---    -#dom-
    E-value  score  bias    E-value  score  bias    exp  N  Sequence                                  Description
    ------- ------ -----    ------- ------ -----   ---- --  --------                                  -----------
    1.6e-81  260.0   0.0    1.8e-81  259.8   0.0    1.0  1  lcl|NCBI__GCF_000008325.1:WP_010959909.1  MCA_RS02785 glucokinase

Domain annotation for each sequence (and alignments):
>> lcl|NCBI__GCF_000008325.1:WP_010959909.1  MCA_RS02785 glucokinase
   #    score  bias  c-Evalue  i-Evalue hmmfrom  hmm to    alifrom  ali to    envfrom  env to     acc
 ---   ------ ----- --------- --------- ------- -------    ------- -------    ------- -------    ----
   1 !  259.8   0.0   1.8e-81   1.8e-81       1     315 []       3     323 ..       3     323 .. 0.93

  Alignments for each domain:
  == domain 1  score: 259.8 bits;  conditional E-value: 1.8e-81
                                 TIGR00749   1 lvgdiGGtnarlalvevapgeieqv..ktyssedfpsleavvrvyleeakvelkdpikgcfaiatPiig 67
                                               l+gd+G t+++l+l + ++ ++ ++ s+d+ sle vv +l+ +++ p +cf + +P+ +
  lcl|NCBI__GCF_000008325.1:WP_010959909.1   3 LAGDVGATKTVLGLFDCWGDRLVSLseAIFASTDYASLETVVAAFLDGQEER--RPEVACFGVPGPVSE 69
                                               79*************986655444446789****************998775..5669*********** PP

                                 TIGR00749  68 dfvrltnldWalsieelkqelalaklelindfaavayail.alkeedliqlg.gakveesaaiailGaG 134
                                               +++tnl W ls el+ +++ + l+nd a+a++++ l+e+d + l+ ga s+ +a++ aG
  lcl|NCBI__GCF_000008325.1:WP_010959909.1  70 GRCEITNLPWVLSERELAAATGVSAVRLLNDVQAMALGMAyRLGEDDWVELNpGAGRPRSGNVAVIAAG 138
                                               *************************************996369*******9757888899********* PP

                                 TIGR00749 135 tGlGvatliqqsdgrykvlageGghvdfaPrseleillleylrkky.grvsaervlsGsGlvliyeals 202
                                               tGlG a l + +ry++l++eGgh dfaP+ +le ll +lr ++ g+vs er+lsGsGl+++y++l+
  lcl|NCBI__GCF_000008325.1:WP_010959909.1 139 TGLGEAILYWD-GERYHALPTEGGHSDFAPNGPLEEGLLAFLRDRFcGHVSYERILSGSGLANLYDYLR 206
                                               ******99994.579****************************************************** PP

                                 TIGR00749 203 krkgere....vsklskeelkekdiseaalegsdvlarralelflsilGalagnlalklgarGGvyvaG 267
                                               ++ e + l + + +i+e ale d+l+ l+lf +i+Ga+agnlalk +a GGv + G
  lcl|NCBI__GCF_000008325.1:WP_010959909.1 207 HAGVAPEsealHAALASAPDRAPIIAEWALERRDALCTAVLDLFAAIYGAEAGNLALKSLALGGVILGG 275
                                               *96444433777889999999999********************************************* PP

                                 TIGR00749 268 GivPrfiellkkssfraafedkGrlkellasiPvqvvlkkkvGllGag 315
                                               Gi+P++++ l+++ f+aaf kGrl ll+ +Pv+v ++ ++ llGa+
  lcl|NCBI__GCF_000008325.1:WP_010959909.1 276 GIAPKILPVLQAGRFMAAFTAKGRLSPLLGRLPVRVAIHPQPALLGAA 323
                                               **********************************************97 PP

Internal pipeline statistics summary:
-------------------------------------
Query model(s):                            1  (315 nodes)
Target sequences:                          1  (330 residues searched)
Passed MSV filter:                         1  (1); expected 0.0 (0.02)
Passed bias filter:                        1  (1); expected 0.0 (0.02)
Passed Vit filter:                         1  (1); expected 0.0 (0.001)
Passed Fwd filter:                         1  (1); expected 0.0 (1e-05)
Initial search space (Z):                  1  [actual number of targets]
Domain search space  (domZ):               1  [number of targets reported over threshold]
# CPU time: 0.01u 0.01s 00:00:00.02 Elapsed: 00:00:00.01
# Mc/sec: 8.60
//
[ok]
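To read the domain hit above programmatically, one option is to split the single domain-table row on whitespace and derive how much of the model and of the protein the alignment spans. This is a minimal sketch that assumes the column order shown in the hmmsearch report. The row, the model length (M=315) and the protein length (330) are copied from the output, while the function names are hypothetical. For real pipelines, the tabular file written by hmmsearch's `--domtblout` option is usually easier to parse than the human-readable report.

```python
# Domain-table row copied from the hmmsearch report above.
DOMAIN_ROW = "1 ! 259.8 0.0 1.8e-81 1.8e-81 1 315 [] 3 323 .. 3 323 .. 0.93"
MODEL_LEN = 315    # from "[M=315]"
TARGET_LEN = 330   # from "330 residues searched"


def parse_domain_row(row):
    """Split one domain row into named fields (column order as in the report)."""
    f = row.split()
    return {
        "score": float(f[2]),
        "bias": float(f[3]),
        "c_evalue": float(f[4]),
        "i_evalue": float(f[5]),
        "hmm_from": int(f[6]),
        "hmm_to": int(f[7]),
        "ali_from": int(f[9]),
        "ali_to": int(f[10]),
        "env_from": int(f[12]),
        "env_to": int(f[13]),
        "acc": float(f[15]),
    }


def span_fraction(start, end, total):
    """Fraction of a sequence of length `total` covered by start..end (1-based, inclusive)."""
    return (end - start + 1) / total


dom = parse_domain_row(DOMAIN_ROW)
model_cov = span_fraction(dom["hmm_from"], dom["hmm_to"], MODEL_LEN)
target_cov = span_fraction(dom["ali_from"], dom["ali_to"], TARGET_LEN)

print(f"model coverage:  {model_cov:.1%}")
print(f"target coverage: {target_cov:.1%}")
```

Here the alignment spans model positions 1 to 315 and protein residues 3 to 323, so the hit covers the whole model and most of the candidate protein.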
This GapMind analysis is from Sep 24 2021. The underlying query database was built on Sep 17 2021.
Each pathway is defined by a set of rules based on individual steps or genes. Candidates for each step are identified by using ublast (a fast alternative to protein BLAST) against a database of manually-curated proteins (most of which are experimentally characterized) or by using HMMER with enzyme models (usually from TIGRFam). Ublast hits may be split across two different proteins.
A candidate for a step is "high confidence" if either:
Otherwise, a candidate is "medium confidence" if either:
Other blast hits with at least 50% coverage are "low confidence."
Steps with no high- or medium-confidence candidates may be considered "gaps." For the typical bacterium that can make all 20 amino acids, there are 1-2 gaps in amino acid biosynthesis pathways. For diverse bacteria and archaea that can utilize a carbon source, there is a complete high-confidence catabolic pathway (including a transporter) just 38% of the time, and there is a complete medium-confidence pathway 63% of the time. Gaps may be due to:
GapMind relies on the predicted proteins in the genome and does not search the six-frame translation. In most cases, you can search the six-frame translation by clicking on links to Curated BLAST for each step definition (in the per-step page).
For more information, see:
If you notice any errors or omissions in the step descriptions, or any questionable results, please let us know.
by Morgan Price, Arkin group, Lawrence Berkeley National Laboratory