GapMind for catabolism of small carbon sources

 

Alignments for a candidate for lacZ in Bacteroides thetaiotaomicron VPI-5482

Align β-galactosidase (Gal4214-1) (EC 3.2.1.23) (characterized)
to candidate 351154 BT1626 beta-galactosidase (NCBI ptt file)

Query= CAZy::AAX48919.1
         (1046 letters)



>FitnessBrowser__Btheta:351154
          Length = 1022

 Score =  712 bits (1837), Expect = 0.0
 Identities = 406/1065 (38%), Positives = 582/1065 (54%), Gaps = 63/1065 (5%)

Query: 1    MNMKKRTILTSIFAFISIIVFAQEKPSRNDWENPEVFQINREPARAAFLPFADEASAIAD 60
            M +KKRT L  + A  +    AQ++P   +W++     +N+        P+AD AS I  
Sbjct: 1    MKLKKRTFLILMAALTATFASAQKQPLP-EWQSQYAVGLNKLAPHTYVWPYAD-ASDIGK 58

Query: 61   D--YTRSPWYMSLDGKWKFNWSPTPDERPKDFFNTDFNTTTWKEIGVPSNWELVGYGIPI 118
               Y +SP+YMSL+GKWKFNW   PD RPKDF+   + T  W +I VP NWE  GYG  I
Sbjct: 59   PGGYEQSPYYMSLNGKWKFNWVKNPDNRPKDFYQPSYYTGGWADINVPGNWERQGYGTAI 118

Query: 119  YTNITYPF-------VKNPPFIDHADNPVGSYRRTFELPENWDGRRVYLHFEGGTSAMYV 171
            Y N TY F        KNPP +  A+N VGSYRRTF++P +W GRRV L  EG  S  YV
Sbjct: 119  YVNETYEFDDKMFNFKKNPPLVPFAENEVGSYRRTFKVPADWKGRRVVLCCEGVISFYYV 178

Query: 172  WINGEKVGYSQNTKSPTEFDITKYVKVGKNQVAVEVYRWSDGSYLEDQDFWRLSGIDRSV 231
            W+NG+ +GY+Q +K+  E+DIT  +  G+N VA+EVYRWS G+YLE QD WRLSGI+R V
Sbjct: 179  WVNGKLLGYNQGSKTAAEWDITDVLSEGENVVALEVYRWSSGAYLECQDMWRLSGIERDV 238

Query: 232  YLYSTANTRIADFFARPDLDTS-YKNGSLSVDIKLKNANSVAKNNQTVEAKLVDAAGKEV 290
            YLYST    IAD+     LD   YK G  ++++ ++  ++ A +   +   L DA+GK V
Sbjct: 239  YLYSTPKQYIADYKVSASLDKEKYKEGIFNLEVTVEGPSATASS---IAYTLKDASGKAV 295

Query: 291  FIKTIKINLGANTVSSTTFEQMVKSPKLWNNETPNLYTLVLTLKDENGKFVETVATSIGF 350
                I I     +      E+ +   K WN E PNLYTLVL LKD  GK  E     +GF
Sbjct: 296  LQDAINIKSRGLSNFIAFDEKKIAEVKAWNAEHPNLYTLVLELKDAQGKVTELTGCEVGF 355

Query: 351  RKVELKNGQLLVNGIRIMVHGVNIHEHNPKTGHYQDEATMMKDIKLMKQLNINAVRCSHY 410
            R  E+K+G+  +NG+ ++V G N HEH+ + G    +  M +DI+LMKQ NIN VR SHY
Sbjct: 356  RTSEIKDGRFCINGVPVLVKGTNRHEHS-QLGRTVSKELMEQDIRLMKQHNINMVRNSHY 414

Query: 411  PNNLLWVKLCNKYGLFLVDEANIETHGMGAELQGSFDKTKHPAYLPE---WKAAHMDRIY 467
            P +  W +LC++YGL+++DEANIE+HGMG            PA L +   W  AHMDR +
Sbjct: 415  PTHPYWYQLCDRYGLYMIDEANIESHGMGYG----------PASLAKDSTWLTAHMDRTH 464

Query: 468  SLVERDKNQPSIILWSLGNECGNGPVFHEAYNWIKNRDKTRLVQFEQAGEQENTDVVCPM 527
             + ER KN P+I++WS GNE GNG  F   Y+W+K+ +K R VQ+E+A    NTD+ C M
Sbjct: 465  RMYERSKNHPAIVIWSQGNEAGNGINFERTYDWLKSVEKGRPVQYERAELNYNTDIYCRM 524

Query: 528  YPSMEYMKEYANRKDVKRPFIMCEYSHAMGNSNGNFQEYWDIIHSSTNMQGGFIWDWVDQ 587
            Y S++ +K Y  +KD+ RPFI+CEY HAMGNS G  +EYW++  +    QGG IWDWVDQ
Sbjct: 525  YRSVDEIKAYVGKKDIYRPFILCEYLHAMGNSCGGMKEYWEVFENEPMAQGGCIWDWVDQ 584

Query: 588  GFEETDEAGRKYWAYGGDMGGQNYTNDQNFCHNGLVWPDRTPHPGAFEVKKVYQDILFKG 647
             F E D+ G+ YW YGGD G +   +  NFC NGLV   R PHP   EVKK+YQ+I  K 
Sbjct: 585  NFREIDKDGKWYWTYGGDYGPEGIPSFGNFCGNGLVNAVREPHPHLLEVKKIYQNI--KA 642

Query: 648  VNLDKGIIEV--ENGFGYTNLDKYLFKFEVL-KNGLVIKSGVINIRLAPQSKKQIQIELP 704
               D+  ++V  +N + ++NL++Y+ ++ V  ++G V+  G   +   P +   + +   
Sbjct: 643  TLSDRKNLKVCIKNWYDFSNLNEYILRWNVKGEDGTVLAEGTKEVDCEPHATVDVTLGAV 702

Query: 705  KLTTEDGVEYLLNVFAYTKEGTELLPQNFEIAREQFSIGESNYFVKVAKASTNPIVKDSQ 764
            KL       Y LN+    KE T L+  ++E+A +QF +  +               K++ 
Sbjct: 703  KLPNTVREAY-LNLSWSRKEATPLVDTDWEVAYDQFVLAGN---------------KNTT 746

Query: 765  DAITLSANGVEVTINKKTGLMQKYTSGEENYFNQMPVPNFWRAPTDNDFGNYMQVNSNVW 824
                  A      ++K TG +   T   +         + +R  TDND  N  +  + +W
Sbjct: 747  AYRPQKAGETAFVVDKNTGALSSLTLDGKELLAAPITLSLFRPATDND--NRDRNGARLW 804

Query: 825  RTVGRFSSLDS-IEVKEVSTQTTVVAHLF--LKDIASTYTITYSMDADGSLTLQNSFKAG 881
            R  G  +     + +KE  T  TV A +              Y++D +G+L ++ +F+  
Sbjct: 805  RKAGLNNLTQKVVSLKEEKTSATVRAEILNGKGQKVGMADFVYALDKNGALKVRTTFQPD 864

Query: 882  EMALSEMPRFGMLFSLKKELDNFSYYGRGPWENYQDRNTSSLKGIYESKVADQYVPYTRP 941
               +  M R G+ F +    +  SY GRG  E Y DRN S   G+Y++ V   +  Y  P
Sbjct: 865  TAIVKSMARLGLTFRMADAYNQVSYLGRGDHETYIDRNQSGRIGLYDTTVERMFHYYATP 924

Query: 942  QENGYKTDIRWITLTNSSGNGIEILGLQPLGVSALNNYPEDFDPGLTKKQQHTNDITPRD 1001
            Q    +TD+RW  LT+ +G G+ +   +P   S +      F   L +K  H N++    
Sbjct: 925  QSTANRTDVRWAKLTDQAGEGVFMESNRPFQFSII-----PFSDVLLEKAHHINELERDG 979

Query: 1002 EVIICVDLAQRGLGGDNSW-GAMPHEQYQLRNKAYSYGFVIKPIK 1045
             + I +D  Q G+G      G +P  QY +  K  S+ F + P+K
Sbjct: 980  MITIHLDAEQAGVGTATCGPGVLP--QYLVPVKKQSFEFTLYPVK 1022


Lambda     K      H
   0.316    0.134    0.410 

Gapped
Lambda     K      H
   0.267   0.0410    0.140 


Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Sequences: 1
Number of Hits to DB: 3003
Number of extensions: 171
Number of successful extensions: 10
Number of sequences better than 1.0e-02: 1
Number of HSP's gapped: 1
Number of HSP's successfully gapped: 1
Length of query: 1046
Length of database: 1022
Length adjustment: 45
Effective length of query: 1001
Effective length of database: 977
Effective search space:   977977
Effective search space used:   977977
Neighboring words threshold: 11
Window for multiple hits: 40
X1: 16 ( 7.3 bits)
X2: 38 (14.6 bits)
X3: 64 (24.7 bits)
S1: 41 (21.6 bits)
S2: 57 (26.6 bits)
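
The statistics in this report are related by the standard Karlin-Altschul formulas. The following is a minimal sketch of that arithmetic, not code from BLAST or GapMind; the function names are illustrative, and it assumes a database of one sequence (as reported above) and the rounded gapped Lambda and K values printed in the report.

```python
import math


def bit_score(raw_score, lam, k):
    """Convert a raw alignment score to bits: (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def effective_search_space(query_len, db_len, length_adjustment):
    """Product of the effective query and database lengths.

    Assumes the database holds a single sequence, so the length
    adjustment is subtracted from the database length once.
    """
    return (query_len - length_adjustment) * (db_len - length_adjustment)


def expect_value(bits, search_space):
    """Expected number of chance hits scoring at least `bits`."""
    return search_space * 2.0 ** (-bits)


# Values copied from the report above (gapped Lambda and K).
bits = bit_score(1837, 0.267, 0.0410)
space = effective_search_space(1046, 1022, 45)
evalue = expect_value(bits, space)
print(f"{bits:.0f} bits, search space {space}, E = {evalue:.1e}")
```

With these inputs the sketch reproduces the reported 712 bits and the effective search space of 977977; the resulting E-value is small enough that BLAST prints it as 0.0.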

This GapMind analysis is from Sep 17 2021. The underlying query database was built on Sep 17 2021.

About GapMind

Each pathway is defined by a set of rules based on individual steps or genes. Candidates for each step are identified by using ublast (a fast alternative to protein BLAST) against a database of manually curated proteins (most of which are experimentally characterized) or by using HMMER with enzyme models (usually from TIGRFAMs). Ublast hits may be split across two different proteins.

A candidate for a step is "high confidence" if either:

where "other" refers to the best ublast hit to a sequence that is not annotated as performing this step (and is not "ignored").

Otherwise, a candidate is "medium confidence" if either:

Other blast hits with at least 50% coverage are "low confidence."
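
As an illustration of the coverage threshold, the sketch below computes coverage and percent identity for the alignment shown on this page. It is not GapMind's implementation: the function names are hypothetical, and it assumes that coverage means the aligned fraction of the characterized (query) protein. Only the 50% floor for "low confidence" is taken from the text; the high- and medium-confidence criteria are not reproduced here.

```python
def query_coverage(query_start, query_end, query_length):
    """Fraction of the query covered by the alignment (1-based, inclusive)."""
    return (query_end - query_start + 1) / query_length


def percent_identity(identities, alignment_length):
    """Identical positions as a percentage of the alignment length."""
    return 100.0 * identities / alignment_length


# The only threshold stated in the text: low confidence needs >= 50% coverage.
LOW_CONFIDENCE_MIN_COVERAGE = 0.50

# Values from the alignment above: query positions 1-1045 of 1046 letters,
# 406 identities over 1065 aligned columns.
coverage = query_coverage(1, 1045, 1046)
identity = percent_identity(406, 1065)
meets_coverage_floor = coverage >= LOW_CONFIDENCE_MIN_COVERAGE

print(f"coverage {coverage:.1%}, identity {identity:.1f}%, "
      f"meets 50% coverage floor: {meets_coverage_floor}")
```

For this candidate the alignment spans nearly the whole characterized protein, so it clears the coverage floor easily; whether it rates higher than "low confidence" depends on the criteria for the other two levels.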

Steps with no high- or medium-confidence candidates may be considered "gaps." For the typical bacterium that can make all 20 amino acids, there are 1-2 gaps in amino acid biosynthesis pathways. For diverse bacteria and archaea that can utilize a carbon source, there is a complete high-confidence catabolic pathway (including a transporter) just 38% of the time, and there is a complete medium-confidence pathway 63% of the time. Gaps may be due to:

GapMind relies on the predicted proteins in the genome and does not search the six-frame translation. In most cases, you can search the six-frame translation by clicking on links to Curated BLAST for each step definition (in the per-step page).
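
For readers unfamiliar with the term, a six-frame translation translates a DNA sequence in all three reading frames on both strands, so that proteins missed by gene prediction can still be found. The sketch below shows the idea only; it is not the code used by GapMind or Curated BLAST. The function and frame names are illustrative, and it assumes the standard genetic code and input limited to A, C, G and T (any other codon is translated as X).

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become X."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Map frame labels (+1, +2, +3, -1, -2, -3) to protein strings."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


frames = six_frame_translation("ATGAAATAG")
for label, protein in frames.items():
    print(label, protein)
```

Stop codons appear as `*` in the output, so open reading frames can be read off as the stretches between them.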

For more information, see:

If you notice any errors or omissions in the step descriptions, or any questionable results, please let us know.

by Morgan Price, Arkin group, Lawrence Berkeley National Laboratory