Align β-galactosidase (LacL+LacM) (EC 3.2.1.23) (characterized)
to candidate 5210843 Shew_3269 Beta-galactosidase (RefSeq)
Query= CAZy::CAZ66936.1
         (626 letters)

>FitnessBrowser__PV4:5210843
          Length = 1076

 Score = 347 bits (891), Expect = 1e-99
 Identities = 216/627 (34%), Positives = 322/627 (51%), Gaps = 44/627 (7%)

Query: 10  DPEVFRVNQLPAHSDHHYYHDTA----EFKTGSRFIKSLNGAWRFNFAKTPAERPVDFYQ 65
           D  +F VN+L  H+    Y        +    S+    LNG WRF+ AK P   P +F
Sbjct: 37  DHTLFEVNKLAPHASFFGYESEPLALLDEMDRSQLYLDLNGRWRFHLAKNPDATPKEFAA 96

Query: 66  PDFDATDFDTIQVPGHIELAGYGQIQYINTLYPWEGKIYRRPPYTLNQDQLTPGLFSDAA 125
           P+FDA+ + +IQVPG+ E  GYG   Y++  YP++ K               P   SD
Sbjct: 97  PEFDASHWGSIQVPGNFETQGYGHAIYLDERYPFDTK--------------WPDAPSD-- 140

Query: 126 DNTVGSYLKTFDLDDAFKGQRIIIQFQGVEEALYVWLNGHFIGYAEDSFTPSEFDLTPYI 185
            N  G Y KTF L   ++ +++ I       AL +++NG  +GY++ + TP+EFD+TPY+
Sbjct: 141 HNPTGLYRKTFTLPAHWQQKQVFIHIGAARSALTLFVNGREVGYSQGAKTPAEFDITPYL 200

Query: 186 QDQGNVLAVRVYKRSTAAFIEDQDMFRFSGIFRDVNILAEPASHITDLDIRPVPNANLKS 245
           Q   N++A+++ + S A+++E QDM R +GI R+V + A P   I D+ +    N +L
Sbjct: 201 QAGDNLVAMQLIRWSDASYLESQDMLRMTGIEREVYLYATPKQRIEDIQVVTHLNEDLTR 260

Query: 246 GELNITTKVTG-EPATLALTVK----DHDGRVLTSQTQ----TGSGSVTFDTMLFDQLHL 296
            +L I   +   +P   AL ++    D  G+ +    Q     G     F   L     L
Sbjct: 261 AKLAIRVDIASHQPGVRALELEARLLDPQGKPVAKANQRLSLKGDAKPVFSQTLISP-KL 319

Query: 297 WSPQTPYLYQLTIEVYDADRQLLEVVPYQFGFRTVELRDDKVIYVNNKRLVINGVNRHEW 356
           W+ + P LY+L + +     + L+V   Q G R + + + + + VNNK + I GV+RHE
Sbjct: 320 WNAEMPNLYRLILTLKTEKGETLQVASQQIGVRKIAIENGQ-LKVNNKAITIRGVDRHET 378

Query: 357 NAHTGRVISMDDMRADIQTMLANNINADRTCHYPDQLPWYQLCDEAGIYLMAENNLESHG 416
           +  TG V+S + M  DI+ M  NNINA R+ HYP+   W  L D  G+Y++ E N+ESH
Sbjct: 379 DPQTGHVVSRETMELDIRLMKQNNINAVRSSHYPNHPYWLSLADRYGLYVIDEANIESHP 438

Query: 417 SWQKMGAIEPSYNVPGDNPHWLAAVIDRARSNYEWFKNHPSIIFWSLGNESYAGEDIAAM 476
                 AI+    + G+   WL A   R     E  KNHPS+I WSLGNE+  G+    +
Sbjct: 439 L-----AIDDKTQL-GNEMSWLPAHQARIERMVERDKNHPSVIIWSLGNEAGEGKLFERL 492

Query: 477 QAFYKEHDDSRLVHYEGVVHTPELKDRISDVESRMYEKPQNIVAYLEDNPTKPFLDCEYM 536
             + K  D +R V YE     P      +D+ + MY   + I  Y E    +P +  EY
Sbjct: 493 YQWIKRRDPNRPVQYEPAGEAP-----YTDIVAPMYPSIERIREYAERASDRPLIMIEYA 547

Query: 537 HDMGNSLGGMQSYNDLIDKYPMYQGGFIWDFIDQALFVHDPITDQDVLRYGGDF-DERHS 595
           H MGNS+G +Q Y D+I+ YP  QGGFIWD++DQAL   + +  Q    YG D+  +  +
Sbjct: 548 HAMGNSVGNLQDYWDVIEAYPQLQGGFIWDWVDQALAFSNDL-GQRYWAYGKDYHPDMPT 606

Query: 596 DYEFSGDGLMFADRTPKPAMQEVKYYY 622
           D  F  +GL+  DR P P + EVK  Y
Sbjct: 607 DGNFLNNGLVDPDRNPHPHLSEVKKVY 633

Lambda     K      H
   0.320    0.138    0.429

Gapped
Lambda     K      H
   0.267   0.0410    0.140

Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Sequences: 1
Number of Hits to DB: 1812
Number of extensions: 95
Number of successful extensions: 7
Number of sequences better than 1.0e-02: 1
Number of HSP's gapped: 1
Number of HSP's successfully gapped: 1
Length of query: 626
Length of database: 1076
Length adjustment: 41
Effective length of query: 585
Effective length of database: 1035
Effective search space: 605475
Effective search space used: 605475
Neighboring words threshold: 11
Window for multiple hits: 40
X1: 16 ( 7.4 bits)
X2: 38 (14.6 bits)
X3: 64 (24.7 bits)
S1: 41 (21.8 bits)
S2: 56 (26.2 bits)
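The summary numbers in the report above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative and not part of GapMind: it recomputes the effective search space from the reported lengths and length adjustment, approximates the E-value with the standard Karlin-Altschul relation E = K * m * n * exp(-lambda * S) using the gapped parameters, and derives percent identity and alignment coverage from the alignment coordinates. All variable names are hypothetical, and because the report prints lambda and K rounded, the E-value is expected to agree only to within rounding.

```python
import math

# Values copied from the BLAST report above.
query_len, subject_len = 626, 1076
length_adjustment = 41
raw_score = 891
gapped_lambda, gapped_k = 0.267, 0.0410   # rounded as printed in the report
identities, alignment_len = 216, 627
query_start, query_end = 10, 622
subject_start, subject_end = 37, 633

# Effective lengths are the sequence lengths minus the length adjustment.
effective_query = query_len - length_adjustment
effective_subject = subject_len - length_adjustment
search_space = effective_query * effective_subject

# Karlin-Altschul E-value from the raw score (approximate: inputs are rounded).
evalue = gapped_k * search_space * math.exp(-gapped_lambda * raw_score)

# Identity over the alignment, and the fraction of each sequence aligned.
percent_identity = 100 * identities / alignment_len
query_coverage = (query_end - query_start + 1) / query_len
subject_coverage = (subject_end - subject_start + 1) / subject_len

print(f"effective search space: {search_space}")
print(f"approximate E-value:    {evalue:.1e}")
print(f"percent identity:       {percent_identity:.1f}%")
print(f"query coverage:         {query_coverage:.1%}")
print(f"subject coverage:       {subject_coverage:.1%}")
```

The alignment spans nearly all of the 626-residue query but only a little over half of the 1076-residue candidate, which is the kind of difference that matters when a coverage threshold is applied.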
This GapMind analysis is from Sep 17 2021. The underlying query database was built on Sep 17 2021.
Each pathway is defined by a set of rules based on individual steps or genes. Candidates for each step are identified by using ublast (a fast alternative to protein BLAST) against a database of manually-curated proteins (most of which are experimentally characterized) or by using HMMer with enzyme models (usually from TIGRFam). Ublast hits may be split across two different proteins.
A candidate for a step is "high confidence" if either:
Otherwise, a candidate is "medium confidence" if either:
Other blast hits with at least 50% coverage are "low confidence."
Steps with no high- or medium-confidence candidates may be considered "gaps." For the typical bacterium that can make all 20 amino acids, there are 1-2 gaps in amino acid biosynthesis pathways. For diverse bacteria and archaea that can utilize a carbon source, there is a complete high-confidence catabolic pathway (including a transporter) just 38% of the time, and there is a complete medium-confidence pathway 63% of the time. Gaps may be due to:
GapMind relies on the predicted proteins in the genome and does not search the six-frame translation. In most cases, you can search the six-frame translation by clicking on links to Curated BLAST for each step definition (in the per-step page).
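To illustrate what searching the six-frame translation means, here is a minimal sketch in Python. It is not GapMind or Curated BLAST code; the function names are hypothetical, and it assumes the standard genetic code and an uppercase DNA string containing only A, C, G and T. A nucleotide sequence is translated in three reading frames on the forward strand and three on the reverse complement, so a protein whose gene was missed by gene prediction can still be found in one of the six translations.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons from position 0; '*' marks a stop codon."""
    return "".join(
        CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translate(dna: str) -> dict:
    """Map frame labels (+1..+3, -1..-3) to their translations."""
    frames = {}
    for label, strand in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{label}{offset + 1}"] = translate(strand[offset:])
    return frames


frames = six_frame_translate("ATGGCCTAA")
for name, protein in frames.items():
    print(name, protein)
```

In practice each translated frame would be split at stop codons and the resulting open reading frames compared against the curated proteins.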
If you notice any errors or omissions in the step descriptions, or any questionable results, please let us know.
by Morgan Price, Arkin group, Lawrence Berkeley National Laboratory