Definition of myo-inositol catabolism

GapMind for catabolism of small carbon sources

Definition of myo-inositol catabolism

# Myo-inositol degradation in GapMind is based on MetaCyc pathways
# myo-inositol degradation I via inosose dehydratase (metacyc:P562-PWY)
# and pathway II inosose dehydrogenase (metacyc:PWY-7241).

# ABC transporters have been described in
# Caulobacter crescentus, Phaeobacter inhibens, Pseudomonas simiae, and Pseudomonas fluorescens

PGA1_c07300	myo-inositol ABC transport, substrate-binding component	curated:reanno::Phaeo:GFF715
PGA1_c07310	myo-inositol ABC transporter, permease component	curated:reanno::Phaeo:GFF716
PGA1_c07320	myo-inositol ABC transporter, ATPase component	curated:reanno::Phaeo:GFF717

# Transporters were identified using:
# query: transporter:myo-inositol:myoinositol:m-inositol:inositol
myo-inositol-transport: PGA1_c07300 PGA1_c07310 PGA1_c07320

iatP	myo-inositol ABC transporter, permease component IatP	curated:TCDB::B8H230
iatA	myo-inositol ABC transporter, ATPase component IatA	curated:TCDB::B8H229
ibpA	myo-inositol ABC transporter, substrate-binding component IbpA	curated:TCDB::B8H228
myo-inositol-transport:  iatP iatA ibpA

# The ortholog in P. fluorescens FW300-N2E3 (AO353_21380, A0A0N9WNI6)
# was not reannotated but does have the phenotype
PS417_11885	myo-inositol ABC transporter, substrate-binding component	curated:reanno::WCS417:GFF2331	uniprot:A0A0N9WNI6
PS417_11890	myo-inositol ABC transporter, ATPase component	curated:reanno::WCS417:GFF2332	curated:reanno::pseudo3_N2E3:AO353_21385
PS417_11895	myo-inositol ABC transporter, permease component	curated:reanno::WCS417:GFF2333	curated:reanno::pseudo3_N2E3:AO353_21390
myo-inositol-transport: PS417_11885 PS417_11890 PS417_11895

# Homomeric transporters

# Distantly related transporters with the same domain content were grouped together into iolT
iolT	myo-inositol:H+ symporter	curated:SwissProt::Q8VZR6	curated:CharProtDB::CH_091483	curated:CharProtDB::CH_091623	curated:CharProtDB::CH_123508	curated:SwissProt::O34718	curated:SwissProt::P30606	curated:SwissProt::P87110	curated:SwissProt::Q10286	curated:TCDB::A8DCT2	curated:TCDB::AIU34725.1	curated:TCDB::Q01440	curated:TCDB::Q8NTX0	curated:TCDB::E1WAV3	curated:TCDB::E1WAV4	curated:TCDB::Q8NL90	curated:CharProtDB::CH_123411	curated:CharProtDB::CH_124311
myo-inositol-transport: iolT

SMIT1	myo-inositol:Na+ symporter	curated:SwissProt::Q28728	curated:SwissProt::Q8K0E3	curated:SwissProt::Q8WWX8	curated:SwissProt::Q9JKZ2	curated:SwissProt::Q9Z1F2	curated:TCDB::P31637	curated:TCDB::P53794
myo-inositol-transport: SMIT1

HMIT	myo-inositol:H+ symporter	curated:CharProtDB::CH_091598	curated:SwissProt::Q921A2	curated:SwissProt::Q96QE2	curated:SwissProt::Q9C757	curated:SwissProt::Q9ZQP6
myo-inositol-transport: HMIT

iolF	myo-inositol:H+ symporter	curated:SwissProt::P42417
myo-inositol-transport: iolF

import glucosamine.steps:kdgK # 2-keto-3-deoxygluconate kinase
import glucose.steps:eda # 2-keto-3-deoxygluconate 6-phosphate aldolase
import fructose.steps:tpi # triose-phosphate isomerase

# 2-oxidation of myo-inositol is also linked to EC:1.1.5.n1 (quinoprotein inositol dehydrogenase),
# but that is not linked to sequence
iolG	myo-inositol 2-dehydrogenase	EC:1.1.1.18

iolE	scyllo-inosose 2-dehydratase	EC:4.2.1.44

# Erroneous annotations of epi-inositol hydrolases from SEED were "confirmed" by the fitness data and included in
# reannotations; these are all ignored.
iolD	3D-(3,5/4)-trihydroxycyclohexane-1,2-dione hydrolase	EC:3.7.1.22	ignore_other:epi-inositol hydrolase

# The EC number is missing from the reannotations
iolB	5-deoxy-D-glucuronate isomerase	EC:5.3.1.30	term:5-deoxy-glucuronate isomerase

iolC	5-dehydro-2-deoxy-D-gluconate kinase	EC:2.7.1.92

# DUF2090 appears to be a substitute for the aldolase. It is distantly related to aldolases
# and is found fused to iolC in inositol degradation clusters that lack any apparent iolJ.
# Fitness data confirms that these fusion proteins are required for myo-inositol utilization
# (but does not prove that the DUF2090 domain is required).
# These proteins are included in iolJ via their reannotations.
iolJ	5-dehydro-2-deoxyphosphogluconate aldolase	EC:4.1.2.29

# Related to methylmalonate-semialdehyde dehydrogenase (1.2.1.27), and many enzymes
# may have both activities.
# Q9I702 is annotated as doing this but as "putative" and without the EC number
mmsA	malonate-semialdehyde dehydrogenase	EC:1.2.1.18	ignore_other:1.2.1.27	ignore:SwissProt::Q9I702

# Both pathways begin with the 2-dehydrogenase (iolG) forming scyllo-inosose.
# In pathway I, inosose dehydratase (iolE)  forms 3D-(3,5/4)-trihydroxycyclohexane-1,2-dione,
# followed by a ring-cleaving hydrolase to 5-deoxy-D-glucuronate,
# isomerization to 5-dehydro-2-deoxy-D-gluconate, phosphorylation,
# and cleavage by an aldolase to 3-oxopropionate (malonate semialdehyde) and
# glycerone phosphate; the 3-oxopropionate is oxidized to acetyl-CoA while the
# glycerone phosphate is converted by triose-phosphate isomerase to glyceraldehyde 3-phosphate.
all: myo-inositol-transport iolG iolE iolD iolB iolC iolJ mmsA tpi

iolM	2-inosose 4-dehydrogenase	curated:metacyc::MONOMER-17949
iolN	2,4-diketo-inositol hydratase	curated:SwissProt::Q9WYP4
iolO	5-dehydro-L-gluconate epimerase	curated:SwissProt::Q9WYP7
uxaE	D-tagaturonate epimerase	EC:5.1.2.7
uxuB	D-mannonate dehydrogenase	EC:1.1.1.57

# Many proteins are annotated in SwissProt as "D-galactonate dehydratase family member" but
# have little activity on D-mannonoate; it is probably not the physiological substrate
uxuA	D-mannonate dehydratase	EC:4.2.1.8	ignore:SwissProt::A4WA78	ignore:SwissProt::A5KUH4	ignore:SwissProt::A6AMN2	ignore:SwissProt::A6M2W4	ignore:SwissProt::A6VRA1	ignore:SwissProt::A8RQK7	ignore:SwissProt::B1ELW6	ignore:SwissProt::B3PDB1	ignore:SwissProt::B5GCP6	ignore:SwissProt::B5QBD4	ignore:SwissProt::B5R541	ignore:SwissProt::B5RAG0	ignore:SwissProt::B8HCK2	ignore:SwissProt::C6CBG9	ignore:SwissProt::C6CVY9	ignore:SwissProt::C6D9S0	ignore:SwissProt::C6DI84	ignore:SwissProt::C7PW26	ignore:SwissProt::C8ZZN2	ignore:SwissProt::C9A1P5	ignore:SwissProt::C9CN91	ignore:SwissProt::C9NUM5	ignore:SwissProt::C9Y5D5	ignore:SwissProt::D0KC90	ignore:SwissProt::D0X4R4	ignore:SwissProt::D4GJ14	ignore:SwissProt::D7BPX0	ignore:SwissProt::D8ADB5	ignore:SwissProt::D9UNB2	ignore:SwissProt::E1V4Y0	ignore:SwissProt::Q1QT89	ignore:SwissProt::Q2CIN0	ignore:SwissProt::Q6DAR4	ignore:SwissProt::Q8FHC7

# In pathway II, a dehydrogenase forms 3-dehydro scyllo-inosose (also known as 2,4-diketo-inositol),
# a hydratase forms 5-dehydro-L-gluconate, an epimerase forms D-tagaturonate,
# another forms to D-fructonate, a reductase forms D-mannonate, a dehydratase forms
# 2-keto-3-deoxygluconate, a kinase forms 2-keto-3-deoxy-gluconate 6-phosphate,
# and an aldolase forms glyceraldehyde-3-phosphate and pyruvate.
all: myo-inositol-transport iolG iolM iolN iolO uxaE uxuB uxuA kdgK eda

Downloads

Candidates (tab-delimited)
Steps (tab-delimited)
Rules (tab-delimited)
Protein sequences (fasta format)
Organisms (tab-delimited)
SQLite3 databases

Related tools

About GapMind

Each pathway is defined by a set of rules based on individual steps or genes. Candidates for each step are identified by using ublast (a fast alternative to protein BLAST) against a database of manually-curated proteins (most of which are experimentally characterized) or by using HMMer with enzyme models (usually from TIGRFam). Ublast hits may be split across two different proteins.

A candidate for a step is "high confidence" if either:

ublast finds a hit to a characterized protein at above 40% identity and 80% coverage, and bits >= other bits+10.
- (Hits to curated proteins without experimental data as to their function are never considered high confidence.)
HMMer finds a hit with 80% coverage of the model, and either other identity < 40 or other coverage < 0.75.

where "other" refers to the best ublast hit to a sequence that is not annotated as performing this step (and is not "ignored").

Otherwise, a candidate is "medium confidence" if either:

ublast finds a hit at above 40% identity and 70% coverage (ignoring otherBits).
ublast finds a hit at above 30% identity and 80% coverage, and bits >= other bits.
HMMer finds a hit (regardless of coverage or other bits).

Other blast hits with at least 50% coverage are "low confidence."

Steps with no high- or medium-confidence candidates may be considered "gaps." For the typical bacterium that can make all 20 amino acids, there are 1-2 gaps in amino acid biosynthesis pathways. For diverse bacteria and archaea that can utilize a carbon source, there is a complete high-confidence catabolic pathway (including a transporter) just 38% of the time, and there is a complete medium-confidence pathway 63% of the time. Gaps may be due to:

our ignorance of proteins' functions,
omissions in the gene models,
frame-shift errors in the genome sequence, or
the organism lacks the pathway.

GapMind relies on the predicted proteins in the genome and does not search the six-frame translation. In most cases, you can search the six-frame translation by clicking on links to Curated BLAST for each step definition (in the per-step page).

For more information, see:

the paper from 2019 on GapMind for amino acid biosynthesis
the paper from 2022 on GapMind for carbon sources
the source code
instructions for running GapMind on your computer

If you notice any errors or omissions in the step descriptions, or any questionable results, please let us know

by Morgan Price, Arkin group, Lawrence Berkeley National Laboratory

GapMind for catabolism of small carbon sources