GapMind for catabolism of small carbon sources


Alignments for a candidate for lacZ in Bacteroides thetaiotaomicron VPI-5482

Align β-galactosidase (BgalB) (EC 3.2.1.23) (characterized)
to candidate 352706 BT3179 beta-galactosidase (NCBI ptt file)

Query= CAZy::AAC24219.1
         (1085 letters)



>FitnessBrowser__Btheta:352706
          Length = 1024

 Score =  670 bits (1728), Expect = 0.0
 Identities = 394/986 (39%), Positives = 549/986 (55%), Gaps = 56/986 (5%)
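
The percentages on the summary line are each count divided by the alignment length of 986 columns, with gap columns included. All three values fit BLAST truncating to a whole percent rather than rounding: 394/986 is 39.96% but is reported as 39%. Below is a minimal Python sketch of that arithmetic. The truncation is an assumption inferred from these numbers.

```python
def blast_percent(count: int, align_len: int) -> int:
    """Whole-number percentage as printed on a BLAST summary line.

    Assumption: BLAST truncates instead of rounding (inferred from 394/986 -> 39%).
    """
    return count * 100 // align_len

align_len = 986  # total alignment columns, including gaps
summary = {
    "Identities": blast_percent(394, align_len),
    "Positives": blast_percent(549, align_len),
    "Gaps": blast_percent(56, align_len),
}
print(summary)  # {'Identities': 39, 'Positives': 55, 'Gaps': 5}
```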

Query: 4   EWENPQLVSEGTEKPHASFIPYLNPFT---GEWEYPDDFILLNGNWKFFFAKNPFEVPEN 60
           EW++   V      PH    PY +      G +E    ++ LNG WKF + KNP   P++
Sbjct: 32  EWQSQYAVGLNKLDPHTYVWPYADASEVEKGTFEQSPYYMSLNGQWKFHWVKNPDTRPKD 91

Query: 61  FFLEGFDDTNWDEIEVPSNWEMKGYGKPIYTNVVYPFEP-------NPPFVPKDDNPTGI 113
           F+   +    W +I+VP NWE +GYG  IY N  Y F+        NPP VP  +N  G 
Sbjct: 92  FYKPSYYTGGWADIKVPGNWERQGYGTAIYVNETYEFDDKMFNFKKNPPLVPYKENEVGS 151

Query: 114 YRRWVEVPEEWFEKEIFLHFEGVRSFFYLWVNGKRMGFSKDSCTPAEFRVTDVLKPGKNL 173
           YRR  +VP  W  + + L  EGV SF+Y+WVNG+ +G+++ S T AE+ +TD L  G+N 
Sbjct: 152 YRRTFKVPAGWEGRRVVLCCEGVISFYYVWVNGEFLGYNQGSKTAAEWDITDKLTDGENT 211

Query: 174 ICVEVLKWSDGSYLEDQDMWWFAGIYRDVYLYALSKFHVRDIFVRTDLD-EDYRDGKIFL 232
           I +EV +WS G+YLE QDMW  +GI RDVYLY+  + ++ D  V + L+ E Y++G   L
Sbjct: 212 IALEVYRWSSGAYLECQDMWRLSGIERDVYLYSTPEQYIADYKVTSLLEKEHYKEGIFEL 271

Query: 233 DVELRNLGEEKEKDLIITLTDPQGKEMTLVEERVGPKNETLSFVFE---VKDPKKWSAET 289
           +V +          +  TL D   K +     ++         VF+   + D ++W+AE 
Sbjct: 272 EVAVGGTA-SGTSSIAYTLKDASDKTVLEGSRKLESHGSGNLIVFDEQRLPDVRRWNAEH 330

Query: 290 PHLYVLKVELGEDEKKV------NFGFKKVEVKDGRLLFNGKPLYIKGVNRHEFDPDRGH 343
           P LY L +EL +   KV        GF+  E+K+GR   NG P+ +KGVNRHE     G 
Sbjct: 331 PELYTLLLELKDAGGKVTEITGTKVGFRTSEIKNGRFCINGVPVLVKGVNRHEHS-QLGR 389

Query: 344 AVTVERMIQDIKLMKQHNINTVRTSHYPNQTKWYDLCDYYGLYVIDEANIESHGIGEAPE 403
            V+ E M QDI+LMKQHNINTVR SHYP    WY LCD YGLYVIDEANIESHG+G  P 
Sbjct: 390 TVSKELMEQDIRLMKQHNINTVRNSHYPAHPYWYQLCDRYGLYVIDEANIESHGMGYGP- 448

Query: 404 VTLANRPEWEKAHLDRIKRMVERDKNHPSIIFWSLGNEAGDGMNFEKAALWIKERDNTRL 463
            +LA    W  AH+DR +RM ER KNHPS++ WSLGNEAG+G+NFE+   W+K  +  R 
Sbjct: 449 ASLAKDSTWLPAHIDRTRRMYERSKNHPSVVIWSLGNEAGNGINFERTYDWLKSVEKNRP 508

Query: 464 VHYEGTTRRGESYYVDVFSLMYPKIDVLLEYASRKR-EKPFIMCEYAHAMGNSVGNLKDY 522
           V YE   R  E+Y  D++  MY  +DV+  Y +RK   +PFI+CEY HAMGNS G +K+Y
Sbjct: 509 VQYE---RAEENYNTDIYCRMYRSVDVIRNYVARKDIYRPFILCEYLHAMGNSCGGMKEY 565

Query: 523 WDVIEKYPYLHGGCIWDWVDQGIRKKDENGKEFWAYGGDFG--DEPNDKNFCCNGVVLPD 580
           W+V E  P   GGCIWDWVDQ  R+ D++GK +W YGGD+G  D P+  NFCCNG+V   
Sbjct: 566 WEVFENEPMAQGGCIWDWVDQSFREVDKDGKWYWTYGGDYGPKDVPSFGNFCCNGLVNAV 625

Query: 581 RTPEPELYEVKKFYQNIKVRQIAKD--TYEVENGYLFTDLEMFDGTWRIR-KDGEVVREE 637
           R P P L EVKK YQNIK   I K   T  V+N + F+DL  +   W++   DG V+ E 
Sbjct: 626 REPHPHLLEVKKIYQNIKSTLIDKKNLTVRVKNWFDFSDLNEYILHWKVTGDDGTVLAEG 685

Query: 638 RFKLSARPGEKKILKIPLPEMEDS--EYFLEICFSLSEDTLWAKKGHVVAWEQFLIKPPS 695
             +++  P     L +   ++  +  E +L++ ++  + T        +A++QF++  P+
Sbjct: 686 NKEVACEPHATVELTLGAVQLPKTIREAYLDLGWTRKKSTPLVDTAWEIAYDQFVL--PA 743

Query: 696 FEKTVVRESVDLSEDGRHLFVRSKDTELVFSKFTGLLKRIVYRGRNILTGSIVPNFWRVP 755
             K    +    SE G+  F   ++        TG LK +   G  +L   +  + +R  
Sbjct: 744 SGKVWNGKP---SEAGKTTFEVDEN--------TGALKSLCLDGEELLASPVTISLFRPA 792

Query: 756 TDNDVGNKMPERLSIWKRASKERKLFKMFFWKKEENSVSVQ-SVYQVPGNSW--VYLTYT 812
           TDND  ++M  +L  W++A       K+   K+ + S + Q ++  V G       L YT
Sbjct: 793 TDNDNRDRMGAKL--WRKAGLHTLTQKVVSLKESKTSATAQVNILNVTGKKVGDATLEYT 850

Query: 813 IFGNGDILVDLSLIP-AEGVPEIPRIGLQFAVPGDFRFVEWYGRGPHETYWDRKESGLFA 871
           +  NG + V  +  P    V  I R+GL F +   +  V + GRG HETY DR +SG   
Sbjct: 851 LNHNGSLKVQTTFQPDTTWVKSIARLGLTFEMNDTYGNVTYLGRGEHETYIDRNQSGKIG 910

Query: 872 RYRRTVQDMIHRYVRPQETGNRSDVRWFALSD--GRVNLFVSGMPVVDFSVWPFSMEDLE 929
            Y  T + M H YV PQ TGNR+DVRW  L+D  G+     S  P   FS  PFS   LE
Sbjct: 911 IYTTTPEKMFHYYVIPQSTGNRTDVRWVKLADDSGKGCWIESDSP-FQFSALPFSDLLLE 969

Query: 930 KADHVNELPERDFVTVNVDYRQMGLG 955
           KA H+N+L     +TV++D +Q G+G
Sbjct: 970 KALHINDLERNGRITVHLDAKQAGVG 995
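
The coordinates in the alignment blocks give the aligned span of each sequence. The query runs from residue 4 to 955 of 1085, and the subject (BT3179) runs from 32 to 995 of 1024. The Python sketch below extracts those spans from the text report and converts them to coverage. Two things here are assumptions made for this sketch: the regular expression, and defining coverage as the aligned span divided by the sequence length. GapMind's own coverage calculation may differ in detail.

```python
import re

# A "Query:" or "Sbjct:" row: label, start coordinate, aligned residues (with '-' gaps), end coordinate
ROW = re.compile(r"^(Query|Sbjct):\s+(\d+)\s+\S+\s+(\d+)$")

def aligned_span(report: str, label: str) -> tuple[int, int]:
    """Return (first start, last end) over all rows with the given label."""
    starts, ends = [], []
    for line in report.splitlines():
        m = ROW.match(line.strip())
        if m and m.group(1) == label:
            starts.append(int(m.group(2)))
            ends.append(int(m.group(3)))
    if not starts:
        raise ValueError(f"no {label} rows found")
    return min(starts), max(ends)

def coverage(span: tuple[int, int], length: int) -> float:
    """Fraction of the sequence covered by the aligned span (inclusive coordinates)."""
    start, end = span
    return (end - start + 1) / length

# First and last blocks of the alignment above (midlines omitted)
report = """
Query: 4   EWENPQLVSEGTEKPHASFIPYLNPFT---GEWEYPDDFILLNGNWKFFFAKNPFEVPEN 60
Sbjct: 32  EWQSQYAVGLNKLDPHTYVWPYADASEVEKGTFEQSPYYMSLNGQWKFHWVKNPDTRPKD 91
Query: 930 KADHVNELPERDFVTVNVDYRQMGLG 955
Sbjct: 970 KALHINDLERNGRITVHLDAKQAGVG 995
"""
q_span = aligned_span(report, "Query")
s_span = aligned_span(report, "Sbjct")
print(q_span, f"{coverage(q_span, 1085):.1%}")  # (4, 955) 87.7%
print(s_span, f"{coverage(s_span, 1024):.1%}")  # (32, 995) 94.1%
```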


Lambda     K      H
   0.320    0.140    0.440 

Gapped
Lambda     K      H
   0.267   0.0410    0.140 


Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Sequences: 1
Number of Hits to DB: 3446
Number of extensions: 200
Number of successful extensions: 12
Number of sequences better than 1.0e-02: 1
Number of HSP's gapped: 1
Number of HSP's successfully gapped: 1
Length of query: 1085
Length of database: 1024
Length adjustment: 45
Effective length of query: 1040
Effective length of database: 979
Effective search space:  1018160
Effective search space used:  1018160
Neighboring words threshold: 11
Window for multiple hits: 40
X1: 16 ( 7.4 bits)
X2: 38 (14.6 bits)
X3: 64 (24.7 bits)
S1: 41 (21.8 bits)
S2: 58 (26.9 bits)
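
The reported bit score and E-value can be derived from three inputs: the raw score, the gapped Karlin-Altschul parameters, and the effective search space listed above. The standard formulas are S' = (lambda * S - ln K) / ln 2 for the bit score and E = m' * n' * 2^(-S') for the E-value. Below is a minimal Python sketch of those formulas. It assumes that BLAST uses the gapped lambda and K for this gapped alignment, which is consistent with the reported 670 bits.

```python
import math

def bit_score(raw_score: int, lam: float, k: float) -> float:
    """Normalized (bit) score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)

def expect(bits: float, search_space: int) -> float:
    """Expected number of chance hits: E = m' * n' * 2**(-S')."""
    return search_space * 2.0 ** (-bits)

# Values copied from the report above
length_adjustment = 45
eff_query = 1085 - length_adjustment  # effective length of query
eff_db = 1024 - length_adjustment     # effective length of database
search_space = eff_query * eff_db

bits = bit_score(1728, lam=0.267, k=0.0410)  # gapped lambda and K
e_value = expect(bits, search_space)
print(eff_query, eff_db, search_space)  # 1040 979 1018160
print(round(bits))                      # 670
print(f"{e_value:.1e}")                 # roughly 1e-196; BLAST prints such tiny values as 0.0
```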

This GapMind analysis is from Sep 17 2021. The underlying query database was built on Sep 17 2021.


About GapMind

Each pathway is defined by a set of rules based on individual steps or genes. Candidates for each step are identified by using ublast (a fast alternative to protein BLAST) against a database of manually curated proteins (most of which are experimentally characterized) or by using HMMer with enzyme models (usually from TIGRFam). Ublast hits may be split across two different proteins.

A candidate for a step is "high confidence" if either:

where "other" refers to the best ublast hit to a sequence that is not annotated as performing this step (and is not "ignored").

Otherwise, a candidate is "medium confidence" if either:

Other blast hits with at least 50% coverage are "low confidence."

Steps with no high- or medium-confidence candidates may be considered "gaps." For the typical bacterium that can make all 20 amino acids, there are 1-2 gaps in amino acid biosynthesis pathways. For diverse bacteria and archaea that can utilize a carbon source, there is a complete high-confidence catabolic pathway (including a transporter) just 38% of the time, and there is a complete medium-confidence pathway 63% of the time. Gaps may be due to:

GapMind relies on the predicted proteins in the genome and does not search the six-frame translation. In most cases, you can search the six-frame translation by clicking on links to Curated BLAST for each step definition (in the per-step page).
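
The definition of a gap, and of a complete high- or medium-confidence pathway, can be sketched in Python as follows. This is a hypothetical illustration, not GapMind's code. It assumes each candidate has already been labeled high, medium, or low confidence. The `Confidence` enum, the helper names, and the example step list and confidence values are all invented for this sketch.

```python
from enum import IntEnum

class Confidence(IntEnum):
    """Hypothetical confidence levels; integer ordering allows comparison."""
    NONE = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3

def best(candidates: list[Confidence]) -> Confidence:
    """Best confidence among a step's candidates (NONE if there are none)."""
    return max(candidates, default=Confidence.NONE)

def gaps(steps: dict[str, list[Confidence]]) -> list[str]:
    """Steps with no high- or medium-confidence candidate."""
    return [step for step, cands in steps.items() if best(cands) < Confidence.MEDIUM]

def pathway_level(steps: dict[str, list[Confidence]]) -> Confidence:
    """A pathway is complete at a level only if every step reaches it, so take the minimum."""
    return min((best(cands) for cands in steps.values()), default=Confidence.NONE)

# Hypothetical example: step names and confidences are made up for illustration.
lactose = {
    "lacZ": [Confidence.HIGH, Confidence.LOW],
    "transporter": [Confidence.LOW],
}
print(gaps(lactose))                # ['transporter']
print(pathway_level(lactose).name)  # LOW
```

Taking the minimum over steps means that a single weak step, such as a missing transporter, prevents the whole pathway from counting as complete at the high- or medium-confidence level.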

For more information, see:

If you notice any errors or omissions in the step descriptions, or any questionable results, please let us know.

by Morgan Price, Arkin group, Lawrence Berkeley National Laboratory