Align Benzoyl-CoA-dihydrodiol lyase; EC 4.1.2.44 (characterized)
to candidate RR42_RS35130 RR42_RS35130 benzoyl-CoA-dihydrodiol lyase
Query= SwissProt::Q84HH6
         (555 letters)

>FitnessBrowser__Cup4G11:RR42_RS35130
          Length = 552

 Score =  749 bits (1933), Expect = 0.0
 Identities = 371/550 (67%), Positives = 439/550 (79%), Gaps = 5/550 (0%)

Query: 10  AELVDYRTEPSKYRHWSLATDGEIATLTLNIDEDGGIRPGYKLKLNSYDLGVDIELHDAL 69
           A  V+Y+T+PS+Y+H  L  DG IATL ++IDE+ G+RPGYKLKLNSYDLGVDIEL+DA+
Sbjct: 4   APRVEYQTDPSQYKHLKLTFDGPIATLAVDIDENAGLRPGYKLKLNSYDLGVDIELNDAV 63

Query: 70  QRVRFEHPEVRTVVVTSGKPKIFCSGANIYMLGLSTHAWKVNFCKFTNETRNGIEDSSQY 129
            R+RFEHPEVRTVVVTSGK K+FCSGANI+MLG+S+H+WKVNFCKFTNETRNG+EDSS++
Sbjct: 64  NRIRFEHPEVRTVVVTSGKDKVFCSGANIFMLGVSSHSWKVNFCKFTNETRNGLEDSSKH 123

Query: 130 SGLKFLAACNGTTAGGGYELALACDEIVLVDDRNSSVSLPEVPLLGVLPGTGGLTRVTDK 189
           SGLKFLAA NG  AGGGYELALACDEI+LVDDR+S+VSLPEVPLLGVLPGTGGLTRVTDK
Sbjct: 124 SGLKFLAAVNGACAGGGYELALACDEIILVDDRSSAVSLPEVPLLGVLPGTGGLTRVTDK 183

Query: 190 RRVRRDHADIFCTISEGVRGQRAKDWRLVDDVVKQQQFAEHIQARAKALAQTSDRPAGAK 249
           R VR D ADIFCT +EGVRGQRAKDWRLVDD+ K   FA+ +Q RA+ALA  SDRPA A
Sbjct: 184 RHVRHDLADIFCTTTEGVRGQRAKDWRLVDDIAKPAVFAQKVQERAQALAALSDRPANAS 243

Query: 250 GVKLTTLERTVDEKGYHYEFVDATIDADGRTVTLTVRAPAAVTAKTAAEIEAQGIKWWPL 309
           GV LT L RT++     Y +V   ID  GR  T TV+ P+A    + A I   G  W+PL
Sbjct: 244 GVALTPLARTLETDALRYTYVTVEIDRVGRKATFTVKGPSATQPTSVAAIAEAGAAWYPL 303

Query: 310 QMARELDDAILNLRTNHLDVGLWQLRTEGDAQVVLDIDATIDANRDNWFVRETIGMLRRT 369
           Q+AREL+DAIL++RTN LD+G W ++TEGDA  VL +DAT+ AN+D+W VRETIG+LRRT
Sbjct: 304 QLARELEDAILSMRTNELDIGTWLIKTEGDAANVLAMDATLLANQDHWLVRETIGLLRRT 363

Query: 370 LARIDVSSRSLYALIEPGSCFAGTLLEIALAADRSYML----DAAEAKNVVGLSAMNFGT 425
            +R+DVSSRSL+ALIEPGSCFAGT LE+ALA DRSY L    D A A  +  ++  NFG
Sbjct: 364 FSRLDVSSRSLFALIEPGSCFAGTFLELALACDRSYHLALPDDEARAPRIT-VAETNFGL 422

Query: 426 FPMVNGLSRIDARFYQEEAPVAAVKAKQGSLLSPAEAMELGLVTAIPDDLDWAEEVRIAI 485
           +PMV G SR+  RFY E+  + AV+AK G  L    A  +GLVTA PDD+DW +EVRIA+
Sbjct: 423 YPMVTGQSRLGRRFYDEQPALDAVRAKAGQPLDADAAFAVGLVTANPDDIDWTDEVRIAL 482

Query: 486 EERAALSPDALTGLEANLRFGPVETMNTRIFGRLSAWQNWIFNRPNAVGENGALKLFGSG 545
           EERAA+SPDALTG+EANLRF   E M TRIFGRL+AWQNWIF RPNAVGE GALK++G G
Sbjct: 483 EERAAMSPDALTGMEANLRFNGQENMFTRIFGRLTAWQNWIFQRPNAVGEKGALKVYGKG 542

Query: 546 KKAQFDWNRV 555
            KA FDWNRV
Sbjct: 543 DKAAFDWNRV 552

Lambda     K      H
   0.318    0.134    0.397

Gapped
Lambda     K      H
   0.267   0.0410    0.140

Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Sequences: 1
Number of Hits to DB: 830
Number of extensions: 28
Number of successful extensions: 2
Number of sequences better than 1.0e-02: 1
Number of HSP's gapped: 1
Number of HSP's successfully gapped: 1
Length of query: 555
Length of database: 552
Length adjustment: 36
Effective length of query: 519
Effective length of database: 516
Effective search space: 267804
Effective search space used: 267804
Neighboring words threshold: 11
Window for multiple hits: 40
X1: 16 ( 7.4 bits)
X2: 38 (14.6 bits)
X3: 64 (24.7 bits)
S1: 41 (21.7 bits)
S2: 53 (25.0 bits)
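The summary statistics in the BLAST report can be cross-checked with a little arithmetic. The sketch below is not part of GapMind; it assumes the standard Karlin-Altschul conversion from raw score to bit score, the usual definition of effective search space for a single database sequence, and that the reported percentages are truncated to integers (which is consistent with the values shown). The function names are illustrative only.

```python
import math

def bit_score(raw_score, lam, k):
    """Karlin-Altschul bit score: (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)

def effective_search_space(query_len, db_len, length_adjustment, n_seqs=1):
    """Product of the effective query length and effective database length."""
    return (query_len - length_adjustment) * (db_len - n_seqs * length_adjustment)

def truncated_percent(count, aligned_columns):
    """Integer percentage, truncated rather than rounded (assumption)."""
    return count * 100 // aligned_columns

# Values copied from the report: raw score 1933, gapped lambda and K.
bits = bit_score(1933, 0.267, 0.0410)
space = effective_search_space(555, 552, 36)

print(round(bits))                   # bit score, to compare with "749 bits"
print(space)                         # to compare with "Effective search space"
print(truncated_percent(371, 550))   # identities
print(truncated_percent(439, 550))   # positives
print(truncated_percent(5, 550))     # gaps
```

Using the gapped Lambda and K (rather than the ungapped pair) is what reproduces the reported bit score for this gapped alignment.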
Align candidate RR42_RS35130 RR42_RS35130 (benzoyl-CoA-dihydrodiol lyase)
to HMM TIGR03222 (boxC: benzoyl-CoA-dihydrodiol lyase (EC 4.1.2.44))
# hmmsearch :: search profile(s) against a sequence database
# HMMER 3.3.1 (Jul 2020); http://hmmer.org/
# Copyright (C) 2020 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query HMM file:                  ../tmp/path.carbon/TIGR03222.hmm
# target sequence database:        /tmp/gapView.11170.genome.faa
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query:       TIGR03222  [M=548]
Accession:   TIGR03222
Description: benzo_boxC: benzoyl-CoA-dihydrodiol lyase
Scores for complete sequences (score includes all domains):
   --- full sequence ---   --- best 1 domain ---    -#dom-
    E-value  score  bias    E-value  score  bias    exp  N  Sequence                                 Description
    ------- ------ -----    ------- ------ -----   ---- --  --------                                 -----------
   1.8e-297  973.3   0.1     2e-297  973.2   0.1    1.0  1  lcl|FitnessBrowser__Cup4G11:RR42_RS35130 RR42_RS35130 benzoyl-CoA-dihydro


Domain annotation for each sequence (and alignments):
>> lcl|FitnessBrowser__Cup4G11:RR42_RS35130  RR42_RS35130 benzoyl-CoA-dihydrodiol lyase
   #    score  bias  c-Evalue  i-Evalue hmmfrom  hmm to    alifrom  ali to    envfrom  env to     acc
 ---   ------ ----- --------- --------- ------- -------    ------- -------    ------- -------    ----
   1 !  973.2   0.1    2e-297    2e-297       1     548 []       7     552 .]       7     552 .] 1.00

  Alignments for each domain:
  == domain 1  score: 973.2 bits;  conditional E-value: 2e-297
                                 TIGR03222   1 vdfrtepskyrhwkltfdGpvatltldvdedgglrdGyklklnsydlGvdieladalqrlrfehpevrv 69
                                               v+++t+ps+y+h kltfdGp+atl++d+de++glr+GyklklnsydlGvdiel+da++r+rfehpevr+
  lcl|FitnessBrowser__Cup4G11:RR42_RS35130   7 VEYQTDPSQYKHLKLTFDGPIATLAVDIDENAGLRPGYKLKLNSYDLGVDIELNDAVNRIRFEHPEVRT 75
                                               79******************************************************************* PP

                                 TIGR03222  70 vvltsakdkvfcaGanikmlglsthahkvnfckftnetrngiedaseesglkflaavnGtaaGGGyela 138
                                               vv+ts+kdkvfc+Gani+mlg+s+h++kvnfckftnetrng+ed+s++sglkflaavnG++aGGGyela
  lcl|FitnessBrowser__Cup4G11:RR42_RS35130  76 VVVTSGKDKVFCSGANIFMLGVSSHSWKVNFCKFTNETRNGLEDSSKHSGLKFLAAVNGACAGGGYELA 144
                                               ********************************************************************* PP

                                 TIGR03222 139 lacdeivlvddrssavslpevpllavlpGtGGltrvtdkrrvrrdladifctieeGvkGkrakewrlvd 207
                                               lacdei+lvddrssavslpevpll+vlpGtGGltrvtdkr+vr+dladifct++eGv+G+rak+wrlvd
  lcl|FitnessBrowser__Cup4G11:RR42_RS35130 145 LACDEIILVDDRSSAVSLPEVPLLGVLPGTGGLTRVTDKRHVRHDLADIFCTTTEGVRGQRAKDWRLVD 213
                                               ********************************************************************* PP

                                 TIGR03222 208 evvksskfdaavaeraaelaaksdrpadakGveltklertieedgvryetvdvaidraartatitvkgp 276
                                               ++ k++ f+++v+era++laa sdrpa+a Gv+lt+l rt e+d++ry++v v+idr  r+at+tvkgp
  lcl|FitnessBrowser__Cup4G11:RR42_RS35130 214 DIAKPAVFAQKVQERAQALAALSDRPANASGVALTPLARTLETDALRYTYVTVEIDRVGRKATFTVKGP 282
                                               ********************************************************************* PP

                                 TIGR03222 277 eaaapadlaaikaqGaefyplklarelddailhlrlneldiglwvlrteGdaelvlaadalleakedhw 345
                                               +a++p+++aai  +Ga++ypl+larel+dail++r+neldig+w+++teGda+ vla+da+l a++dhw
  lcl|FitnessBrowser__Cup4G11:RR42_RS35130 283 SATQPTSVAAIAEAGAAWYPLQLARELEDAILSMRTNELDIGTWLIKTEGDAANVLAMDATLLANQDHW 351
                                               ********************************************************************* PP

                                 TIGR03222 346 lvreilgllkrtlkrldvssrslfalvepgscfaGtlaelvfaadrsymlegeleddedeeaaitlsel 414
                                               lvre++gll+rt+ rldvssrslfal+epgscfaGt++el++a+drsy l  +l+dde+ +++it++e+
  lcl|FitnessBrowser__Cup4G11:RR42_RS35130 352 LVRETIGLLRRTFSRLDVSSRSLFALIEPGSCFAGTFLELALACDRSYHL--ALPDDEARAPRITVAET 418
                                               *************************************************9..9**************** PP

                                 TIGR03222 415 nfgayplsnglsrlaarflaeeaaveavrdkiGealdaaeaeklglvtaalddidwedeirilleeras 483
                                               nfg yp+++g+srl +rf++e+ a++avr+k G+ lda +a   glvta +ddidw de+ri+leera+
  lcl|FitnessBrowser__Cup4G11:RR42_RS35130 419 NFGLYPMVTGQSRLGRRFYDEQPALDAVRAKAGQPLDADAAFAVGLVTANPDDIDWTDEVRIALEERAA 487
                                               ********************************************************************* PP

                                 TIGR03222 484 lspdaltGleanlrfagpetmetrifgrltawqnwifnrpnavGekGalklyGsGkkaqfdlerv 548
                                               +spdaltG+eanlrf+g+e m trifgrltawqnwif+rpnavGekGalk+yG+G ka+fd++rv
  lcl|FitnessBrowser__Cup4G11:RR42_RS35130 488 MSPDALTGMEANLRFNGQENMFTRIFGRLTAWQNWIFQRPNAVGEKGALKVYGKGDKAAFDWNRV 552
                                               ****************************************************************8 PP



Internal pipeline statistics summary:
-------------------------------------
Query model(s):                            1  (548 nodes)
Target sequences:                          1  (552 residues searched)
Passed MSV filter:                         1  (1); expected 0.0 (0.02)
Passed bias filter:                        1  (1); expected 0.0 (0.02)
Passed Vit filter:                         1  (1); expected 0.0 (0.001)
Passed Fwd filter:                         1  (1); expected 0.0 (1e-05)
Initial search space (Z):                  1  [actual number of targets]
Domain search space  (domZ):               1  [number of targets reported over threshold]
# CPU time: 0.02u 0.01s 00:00:00.03 Elapsed: 00:00:00.03
# Mc/sec: 7.81
//
[ok]
This GapMind analysis is from Sep 17 2021. The underlying query database was built on Sep 17 2021.
Each pathway is defined by a set of rules based on individual steps or genes. Candidates for each step are identified by using ublast (a fast alternative to protein BLAST) against a database of manually-curated proteins (most of which are experimentally characterized) or by using HMMer with enzyme models (usually from TIGRFam). Ublast hits may be split across two different proteins.
A candidate for a step is "high confidence" if either:
Otherwise, a candidate is "medium confidence" if either:
Other blast hits with at least 50% coverage are "low confidence."
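As a rough illustration of what "coverage" means here, the sketch below computes the fraction of a sequence or HMM spanned by an alignment from its 1-based start and end coordinates, and checks it against the 50% threshold stated above. This is not GapMind's code. Which sequence the coverage is measured on is an assumption (the characterized protein or the HMM is used here), and only the 50% threshold is encoded because the criteria for high and medium confidence are not listed in this text. All names are hypothetical.

```python
# The only threshold stated in the text: at least 50% coverage.
LOW_CONFIDENCE_MIN_COVERAGE = 0.5

def alignment_coverage(start, end, total_length):
    """Fraction of a sequence (or HMM) spanned by an alignment.

    Coordinates are 1-based and inclusive, as in BLAST and hmmsearch output.
    """
    return (end - start + 1) / total_length

def meets_low_confidence_coverage(start, end, total_length):
    """True if the alignment spans at least 50% of the sequence."""
    return alignment_coverage(start, end, total_length) >= LOW_CONFIDENCE_MIN_COVERAGE

# BLAST hit above: query positions 10-555 of the 555-residue characterized protein.
blast_cov = alignment_coverage(10, 555, 555)

# hmmsearch hit above: HMM positions 1-548 of the 548-node model.
hmm_cov = alignment_coverage(1, 548, 548)

print(f"BLAST coverage of characterized protein: {blast_cov:.3f}")
print(f"HMM coverage: {hmm_cov:.3f}")
print(meets_low_confidence_coverage(10, 555, 555))
```

For the candidate on this page, both alignments span nearly the full length of the characterized protein and the model, so the coverage threshold is easily met.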
Steps with no high- or medium-confidence candidates may be considered "gaps." For the typical bacterium that can make all 20 amino acids, there are 1-2 gaps in amino acid biosynthesis pathways. For diverse bacteria and archaea that can utilize a carbon source, there is a complete high-confidence catabolic pathway (including a transporter) just 38% of the time, and there is a complete medium-confidence pathway 63% of the time. Gaps may be due to:
GapMind relies on the predicted proteins in the genome and does not search the six-frame translation. In most cases, you can search the six-frame translation by clicking on links to Curated BLAST for each step definition (in the per-step page).
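For readers unfamiliar with the term, a six-frame translation translates a DNA sequence in all three reading frames on both strands, so that proteins missed by gene prediction can still be found. The sketch below is a minimal, self-contained illustration and is not how GapMind or Curated BLAST implements it. It assumes the standard genetic code, translates through stop codons (shown as `*`), and marks codons containing ambiguous bases as `X`. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )

def six_frame_translation(dna):
    """Return the translations of frames +1..+3 and -1..-3, keyed by frame name."""
    forward = dna.upper()
    reverse = forward.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(forward[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames

for frame, protein in six_frame_translation("ATGGCCTAA").items():
    print(frame, protein)
```

A search tool would then compare a query protein against each of the six translated frames, typically after splitting them at stop codons.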
For more information, see:
If you notice any errors or omissions in the step descriptions, or any questionable results, please let us know.
by Morgan Price, Arkin group, Lawrence Berkeley National Laboratory