Definition of L-arabinose catabolism
Rules
Overview: L-arabinose utilization in GapMind is based on three MetaCyc pathways: L-arabinose degradation I, via xylulose 5-phosphate; III, oxidation to 2-oxoglutarate; and IV, via glycolaldehyde. Pathway II, via xylitol and xylulose, is not represented in GapMind because it has not been reported in prokaryotes.
- glycolaldehyde-dehydrogenase:
- all:
- arabinose-transport, araA, araB and araD
- or arabinose-transport, xacB, xacC, xacD, xacE and xacF
- or arabinose-transport, xacB, xacC, xacD, KDG-aldolase, glycolaldehyde-dehydrogenase, gyaR and glcB
- Comment: In pathway I, isomerase araA forms L-ribulose, kinase araB forms L-ribulose 5-phosphate, and epimerase araD forms D-xylulose 5-phosphate, which is an intermediate in the pentose phosphate pathway. In pathway III, the 1-dehydrogenase xacB (which acts on the furanose form, not the usual pyranose form?) forms arabino-1,4-lactone, lactonase xacC forms arabinonate, two dehydratases (xacD and xacE) form 2-dehydro-3-deoxy-L-arabinonate and 2,5-dioxopentanoate (α-ketoglutarate semialdehyde), and dehydrogenase xacF forms 2-oxoglutarate, which is an intermediate in the TCA cycle. (Fitness data suggest that L-arabinose 1-epimerase or mutarotase is also involved, perhaps in creating the correct epimer for the 1-dehydrogenase, but it is not included in GapMind.) Pathway IV begins as in pathway III, up to 2-dehydro-3-deoxy-L-arabinonate, which KDG aldolase cleaves to pyruvate and glycolaldehyde; the glycolaldehyde is oxidized to glycolate and then to glyoxylate, which malate synthase combines with acetyl-CoA to form malate, a TCA cycle intermediate. (Other pathways for glyoxylate assimilation are known but are not represented here.)
- arabinose-transport:
- gguA, gguB and chvE
- or araF, araG and araH
- or araS, araT, araU and araV
- or xacG, xacH, xacI, xacJ and xacK
- or xylFsa, xylGsa and xylHsa
- or araUsh, araVsh, araWsh and araZsh
- or araE
- or BT0355
- or Echvi_1880
- Comment: Transporters were identified using query: transporter:arabinose:L-arabinose:L-arabinofuranose:L-arabinopyranose:beta-L-arabinose:CPD-12045:CPD-12046
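The rules above read as nested alternatives: a rule is met if every part of at least one of its "or" lines is met, and a part can itself be another rule (such as arabinose-transport). The sketch below illustrates that reading only. The `RULES` dictionary and the `satisfied` function are hypothetical names, not GapMind's code, and the yes/no test is a simplification, since GapMind grades candidates by confidence. Because no alternatives are listed above for glycolaldehyde-dehydrogenase, the sketch treats it as a plain step.

```python
# Hypothetical encoding of the rules listed above: each rule maps to a list
# of alternatives ("or" lines), and each alternative lists steps or sub-rules.
RULES = {
    "all": [
        ["arabinose-transport", "araA", "araB", "araD"],
        ["arabinose-transport", "xacB", "xacC", "xacD", "xacE", "xacF"],
        ["arabinose-transport", "xacB", "xacC", "xacD", "KDG-aldolase",
         "glycolaldehyde-dehydrogenase", "gyaR", "glcB"],
    ],
    "arabinose-transport": [
        ["gguA", "gguB", "chvE"],
        ["araF", "araG", "araH"],
        ["araS", "araT", "araU", "araV"],
        ["xacG", "xacH", "xacI", "xacJ", "xacK"],
        ["xylFsa", "xylGsa", "xylHsa"],
        ["araUsh", "araVsh", "araWsh", "araZsh"],
        ["araE"],
        ["BT0355"],
        ["Echvi_1880"],
    ],
}


def satisfied(name, found):
    """Return True if the rule or step `name` is met by the set of steps `found`.

    Names that are not keys of RULES are treated as individual steps
    (assumption: this includes glycolaldehyde-dehydrogenase, whose
    alternatives are not listed in the rules above).
    """
    if name not in RULES:
        return name in found
    # A rule holds if all parts of at least one alternative hold.
    return any(all(satisfied(part, found) for part in alternative)
               for alternative in RULES[name])


# Usage: a genome with the AraE symporter plus araA, araB and araD
# completes pathway I; without any transporter the pathway is incomplete.
with_transporter = satisfied("all", {"araE", "araA", "araB", "araD"})
without_transporter = satisfied("all", {"araA", "araB", "araD"})
```

Keeping the transporter as its own rule means that each of the nine transport systems is written once and shared by all three catabolic routes.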
Steps
gguA: L-arabinose ABC transporter, ATPase component GguA
- Curated sequence O05176: GguA aka ATU2347 aka AGR_C_4264, component of Multiple sugar (arabinose, xylose, galactose, glucose, fucose) putative porter
- Total: 1 characterized proteins
gguB: L-arabinose ABC transporter, permease component GguB
- Curated sequence O05177: GguB aka ATU2346 aka AGR_C_4262, component of Multiple sugar (arabinose, xylose, galactose, glucose, fucose) putative porter
- Total: 1 characterized proteins
chvE: L-arabinose ABC transporter, substrate-binding component ChvE
- Curated sequence P25548: CVE1 aka ChvE aka ATU2348 aka AGR_C_4267, component of Multiple sugar (arabinose, xylose, galactose, glucose, fucose) putative porter
- Ignore hits to P54083 when looking for 'other' hits (Multiple sugar-binding periplasmic protein SbpA; Sugar-binding protein A)
- Comment: The related protein sbpA (P54083) binds arabinose
- Total: 1 characterized proteins
araF: L-arabinose ABC transporter, substrate-binding component AraF
- Curated sequence P02924: L-arabinose-binding periplasmic protein; ABP. AraF aka B1901, component of Arabinose porter. arabinose ABC transporter periplasmic binding protein (EC 7.5.2.13; EC 7.5.2.12). arabinose ABC transporter periplasmic binding protein (EC 7.5.2.13)
- Total: 1 characterized proteins
araG: L-arabinose ABC transporter, ATPase component AraG
- Curated sequence P0AAF3: L-arabinose ABC transporter, ATP-binding protein AraG; EC 3.6.3.17. Arabinose import ATP-binding protein AraG; EC 7.5.2.12. Arabinose import ATP-binding protein AraG aka B1900, component of Arabinose porter. arabinose ABC transporter ATP binding subunit (EC 7.5.2.13; EC 7.5.2.12). arabinose ABC transporter ATP binding subunit (EC 7.5.2.13)
- Total: 1 characterized proteins
araH: L-arabinose ABC transporter, permease component AraH
- Curated sequence CH_014278: L-arabinose ABC transporter, permease protein AraH. L-arabinose transport system permease protein AraH. L-arabinose transport system permease protein araH aka b4460, component of Arabinose porter. arabinose ABC transporter membrane subunit (EC 7.5.2.13; EC 7.5.2.12). arabinose ABC transporter membrane subunit (EC 7.5.2.13)
- Total: 1 characterized proteins
araS: L-arabinose ABC transporter, substrate-binding component AraS
araT: L-arabinose ABC transporter, permease component 1 (AraT)
araU: L-arabinose ABC transporter, permease component 2 (AraU)
araV: L-arabinose ABC transporter, ATPase component AraV
xacG: L-arabinose ABC transporter, substrate-binding component XacG
xacH: L-arabinose ABC transporter, permease component 1 (XacH)
xacI: L-arabinose ABC transporter, permease component 2 (XacI)
xacJ: L-arabinose ABC transporter, ATPase component 1 (XacJ)
- UniProt sequence D4GP38: RecName: Full=Xylose/arabinose import ATP-binding protein XacJ {ECO:0000305}; EC=7.5.2.13 {ECO:0000269|PubMed:31089701};
- Total: 1 characterized proteins
xacK: L-arabinose ABC transporter, ATPase component 2 (XacK)
- UniProt sequence D4GP39: RecName: Full=Xylose/arabinose import ATP-binding protein XacK {ECO:0000305}; EC=7.5.2.13 {ECO:0000269|PubMed:31089701};
- Total: 1 characterized proteins
xylFsa: L-arabinose ABC transporter, substrate-binding component XylF
- UniProt sequence Q4J710: RecName: Full=Xylose/arabinose-binding protein XylF {ECO:0000305}; AltName: Full=D-xylose/L-arabinose substrate binding protein {ECO:0000303|PubMed:29150511}; Short=SBP {ECO:0000303|PubMed:29150511};
- Total: 1 characterized proteins
xylGsa: L-arabinose ABC transporter, ATPase component XylG
- UniProt sequence P0DTT6: RecName: Full=Xylose/arabinose import ATP-binding protein XylG {ECO:0000305}; EC=7.5.2.13 {ECO:0000269|PubMed:29150511};
- Total: 1 characterized proteins
xylHsa: L-arabinose ABC transporter, permease component XylH
araUsh: L-arabinose ABC transporter, substrate-binding component AraU(Sh)
- UniProt sequence A0KWY4: SubName: Full=Periplasmic binding protein/LacI transcriptional regulator {ECO:0000313|EMBL:ABK48303.1};
- Comment: Rodionov et al proposed that the Shewanella arabinose transporter is araUVWZ; this was confirmed by fitness data for Shewana3_2073:2076
- Total: 1 characterized proteins
araVsh: L-arabinose ABC transporter, ATPase component AraV(Sh)
araWsh: L-arabinose ABC transporter, permease component 1 AraW(Sh)
araZsh: L-arabinose ABC transporter, permease component 2 AraZ(Sh)
araE: L-arabinose:H+ symporter
- Curated sequence P0AE24: Arabinose-proton symporter; Arabinose transporter. Arabinose (xylose; galactose):H+ symporter, AraE (low affinity high capacity). arabinose:H+ symporter. arabinose:H+ symporter
- Curated sequence P96710: Arabinose-proton symporter; Arabinose transporter. L-arabinose:proton symporter, AraE (Sa-Nogueira and Ramos, 1997). Also transports xylose, galactose and α-1,5 arabinobiose
- Curated sequence C4B4V9: Arabinose/xylose transporter, AraE
- Total: 3 characterized proteins
BT0355: L-arabinose:Na+ symporter
- UniProt sequence Q8AAV7: SubName: Full=Na+/glucose cotransporter {ECO:0000313|EMBL:AAO75462.1};
- Comment: In the RB-TnSeq data, BT0355 is very important for L-arabinose utilization, and this does not seem to be a polar effect (the effect is found on both strands). In contrast, PMC5061871 reported a subtle effect of deleting BT0355 on L-arabinose utilization, but found that it was required for arabinobiose utilization.
- Total: 1 characterized proteins
Echvi_1880: L-arabinose:Na+ symporter
- UniProt sequence L0FZT5: SubName: Full=SSS sodium solute transporter {ECO:0000313|EMBL:AGA78135.1};
- Comment: Echvi_1880 is specifically important for L-arabinose utilization
- Total: 1 characterized proteins
araA: L-arabinose isomerase
araB: ribulokinase
- Curated proteins or TIGRFams with EC 2.7.1.16
- UniProt sequence C4B4W2: SubName: Full=L-ribulokinase {ECO:0000313|EMBL:BAH60840.1};
- UniProt sequence Q8AAW2: SubName: Full=Xylulose kinase (Xylulokinase) {ECO:0000313|EMBL:AAO75457.1};
- Comment: BT0350 (Q8AAW2) is similar to the L-ribulokinase of Corynebacterium glutamicum (PMC2687266; C4B4W2) and is specifically important during growth on L-arabinose.
- Total: 1 HMMs and 7 characterized proteins
araD: L-ribulose-5-phosphate epimerase
xacB: L-arabinose 1-dehydrogenase
xacC: L-arabinono-1,4-lactonase
- Curated proteins or TIGRFams with EC 3.1.1.15
- UniProt sequence Q92RN9: RecName: Full=Putative sugar lactone lactonase; EC=3.1.1.-;
- UniProt sequence A0A165IRV8: SubName: Full=Gluconolactonase {ECO:0000313|EMBL:KZT13455.1};
- Ignore hits to Q92RN9 when looking for 'other' hits (Putative sugar lactone lactonase; EC 3.1.1.-)
- Comment: SMc00883 (Q92RN9) is specifically important for L-arabinose utilization and does not appear polar. (It has a vague annotation in SwissProt.) Similarly for Ac3H11_615 (A0A165IRV8)
- Total: 11 characterized proteins
xacD: L-arabinonate dehydratase
- Curated proteins or TIGRFams with EC 4.2.1.25
- Ignore hits to Q92RP0 when looking for 'other' hits (Putative dehydratase IlvD1; EC 4.2.1.-)
- Comment: The function of Q92RP0 seems to be unknown so ignore it
- Total: 11 characterized proteins
xacE: 2-dehydro-3-deoxy-L-arabinonate dehydratase
xacF: alpha-ketoglutarate semialdehyde dehydrogenase
aldA: (glycol)aldehyde dehydrogenase
aldox-large: (glycol)aldehyde oxidoreductase, large subunit
- Curated sequence MONOMER-18071: glycolaldehyde oxidoreductase large subunit
- Curated sequence Q4J6M3: Glyceraldehyde dehydrogenase large chain; Glyceraldehyde dehydrogenase subunit A; Glyceraldehyde dehydrogenase subunit alpha; EC 1.2.99.8
- Ignore hits to items matching 1.2.99.8 when looking for 'other' hits
- Comment: Glycolaldehyde oxidoreductase has multiple subunits and no EC number (Q97VI4, Q97VI7, Q97VI6). This is an inference from close homologs from S. acidocaldarius, which have demonstrated activity on glyceraldehyde-3-phosphate, glyceraldehyde, and acetaldehyde, but not on glycolaldehyde itself, so there is no proof that these genes provide the activity. Related enzymes in EC 1.2.99.8 are promiscuous and may well have this activity, so hits to them are ignored.
- Total: 2 characterized proteins
aldox-med: (glycol)aldehyde oxidoreductase, medium subunit
- Curated sequence MONOMER-18072: glycolaldehyde oxidoreductase medium subunit
- Curated sequence Q4J6M6: Glyceraldehyde dehydrogenase medium chain; Glyceraldehyde dehydrogenase subunit B; Glyceraldehyde dehydrogenase subunit beta; EC 1.2.99.8
- Ignore hits to items matching 1.2.99.8 when looking for 'other' hits
- Total: 2 characterized proteins
aldox-small: (glycol)aldehyde oxidoreductase, small subunit
- Curated sequence MONOMER-18073: glycolaldehyde oxidoreductase small subunit
- Curated sequence Q4J6M5: Glyceraldehyde dehydrogenase small chain; Glyceraldehyde dehydrogenase subunit C; Glyceraldehyde dehydrogenase subunit gamma; EC 1.2.99.8
- Ignore hits to items matching 1.2.99.8 when looking for 'other' hits
- Total: 2 characterized proteins
gyaR: glyoxylate reductase
- Curated proteins or TIGRFams with EC 1.1.1.26
- Ignore hits to items matching 1.1.1.79 when looking for 'other' hits
- Comment: The NADP-based glyoxylate reductase (EC 1.1.1.79) is probably biased in the wrong direction for glycolate oxidation, so it is not included, but homology to it is ignored.
- Total: 6 characterized proteins
glcB: malate synthase
- Curated proteins or TIGRFams with EC 2.3.3.9
- Ignore hits to items matching 4.1.3.24 when looking for 'other' hits
- Comment: Besides the standard enzyme, there's an archaeal enzyme that is sometimes annotated as EC 4.1.3.24, but that only includes the formation of malyl-CoA, not the cleavage to malate.
- Total: 2 HMMs and 22 characterized proteins
KDG-aldolase: 2-dehydro-3-deoxy-L-arabinonate aldolase
- Curated proteins or TIGRFams with EC 4.1.2.18
- Ignore hits to Q97U28 when looking for 'other' hits (2-dehydro-3-deoxy-phosphogluconate/2-dehydro-3-deoxy-6 phosphogalactonate aldolase (EC 4.1.2.55). 2-dehydro-3-deoxy-phosphogluconate/2-dehydro-3-deoxy-6-phosphogalactonate aldolase; EC 4.1.2.55)
- Curated sequence Q4JC35: 2-dehydro-3-deoxy-phosphogluconate/2-dehydro-3-deoxy-6 phosphogalactonate aldolase (EC 4.1.2.55). 2-dehydro-3-deoxy-phosphogluconate/2-dehydro-3-deoxy-6-phosphogalactonate aldolase; EC 4.1.2.55
- Comment: Q97U28 is the same protein but with 14 more N-terminal amino acids, and is annotated with EC 4.1.2.55 only. A similar enzyme from S. acidocaldarius is thought to perform this reaction as well (PMC2962468).
- Total: 2 characterized proteins
About GapMind
Each pathway is defined by a set of rules based on individual steps or genes. Candidates for each step are identified by using ublast (a fast alternative to protein BLAST) against a database of manually-curated proteins (most of which are experimentally characterized) or by using HMMer with enzyme models (usually from TIGRFam). Ublast hits may be split across two different proteins.
A candidate for a step is "high confidence" if either:
- ublast finds a hit to a characterized protein at above 40% identity and 80% coverage, and bits >= other bits+10.
- (Hits to curated proteins without experimental data as to their function are never considered high confidence.)
- HMMer finds a hit with 80% coverage of the model, and either other identity < 40 or other coverage < 0.75.
where "other" refers to the best ublast hit to a sequence that is not annotated as performing this step (and is not "ignored").
Otherwise, a candidate is "medium confidence" if either:
- ublast finds a hit at above 40% identity and 70% coverage (ignoring otherBits).
- ublast finds a hit at above 30% identity and 80% coverage, and bits >= other bits.
- HMMer finds a hit (regardless of coverage or other bits).
Other blast hits with at least 50% coverage are "low confidence."
Steps with no high- or medium-confidence candidates may be considered "gaps."
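The confidence thresholds above can be written as a small decision function. This is a minimal sketch, not GapMind's implementation: the `Candidate` class, its field names, and `classify` are all hypothetical. It assumes identity is given in percent, coverage as a fraction from 0 to 1, and that the stated thresholds are inclusive. The "other" fields describe the best ublast hit to a sequence that is not annotated as performing this step and is not ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    # All field names are hypothetical, chosen for this illustration.
    identity: Optional[float] = None      # % identity of best ublast hit; None if no hit
    coverage: Optional[float] = None      # coverage of that hit, as a fraction 0-1
    bits: float = 0.0                     # bit score of that hit
    characterized: bool = False           # hit has experimental evidence of function
    other_bits: float = 0.0               # bit score of the best "other" hit
    other_identity: float = 0.0           # % identity of the best "other" hit
    other_coverage: float = 0.0           # coverage of the best "other" hit, 0-1
    hmm_coverage: Optional[float] = None  # fraction of HMM model covered; None if no hit


def classify(c: Candidate) -> Optional[str]:
    """Return "high", "medium", "low", or None (no usable hit)."""
    has_blast = c.identity is not None and c.coverage is not None
    has_hmm = c.hmm_coverage is not None

    # High confidence: a strong hit to a characterized protein that beats
    # the best "other" hit by at least 10 bits ...
    if (has_blast and c.characterized and c.identity >= 40
            and c.coverage >= 0.80 and c.bits >= c.other_bits + 10):
        return "high"
    # ... or a well-covered HMM hit with no close "other" hit.
    if (has_hmm and c.hmm_coverage >= 0.80
            and (c.other_identity < 40 or c.other_coverage < 0.75)):
        return "high"

    # Medium confidence.
    if has_blast and c.identity >= 40 and c.coverage >= 0.70:
        return "medium"  # other bits are not considered here
    if (has_blast and c.identity >= 30 and c.coverage >= 0.80
            and c.bits >= c.other_bits):
        return "medium"
    if has_hmm:
        return "medium"  # any HMM hit, regardless of coverage

    # Low confidence: any remaining blast hit with at least 50% coverage.
    if has_blast and c.coverage >= 0.50:
        return "low"
    return None


# Usage: a step is a gap when none of its candidates is high or medium.
candidates = [Candidate(identity=35, coverage=0.85, bits=120, other_bits=150)]
is_gap = not any(classify(c) in ("high", "medium") for c in candidates)
```

The checks run from strictest to loosest, so each candidate receives the highest level it qualifies for.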
For the typical bacterium that can make all 20 amino acids, there are 1-2 gaps in amino acid biosynthesis pathways.
For diverse bacteria and archaea that can utilize a carbon source, there is a complete high-confidence catabolic pathway (including a transporter) just 38% of the time, and there is a complete medium-confidence pathway 63% of the time.
Gaps may be due to:
- our ignorance of proteins' functions,
- omissions in the gene models,
- frame-shift errors in the genome sequence, or
- the absence of the pathway from the organism.
GapMind relies on the predicted proteins in the genome and does not search the six-frame translation. In most cases, you can search the six-frame translation by clicking on links to Curated BLAST for each step definition (in the per-step page).
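For readers unfamiliar with the term, a six-frame translation translates the DNA in all three reading frames on each strand, so that proteins missing from the gene models can still be found. The sketch below only illustrates the idea and is not how Curated BLAST is implemented. The function names are hypothetical, and it assumes the standard amino acid assignments, uppercase or lowercase A/C/G/T input, and "X" for any codon containing another character.

```python
# Standard codon table, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from the start of dna; '*' marks a stop."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to translations."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


# Usage: frame +1 of this toy sequence reads ATG GCC TAA.
frames = six_frame_translation("ATGGCCTAA")
```

A search tool would then split each frame at the stop codons and compare the resulting segments to the curated proteins.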
For more information, see:
If you notice any errors or omissions in the step descriptions, or any questionable results, please let us know
by Morgan Price, Arkin group, Lawrence Berkeley National Laboratory