API Reference

This page documents the key functions and modules in SL-GPS.

Core Modules

main.py - Training Pipeline Entry Point

Main orchestration script that coordinates data generation and ANN training.

Key Variables (edit these to customize your run):

fuel = 'CH4'  # Fuel species name
mech_file = 'gri30.cti'  # Detailed mechanism file (Cantera CTI format)
n_cases = 100  # Number of training simulations
t_rng = [800, 2300]  # Temperature range (K)
p_rng = [2.1, 2.5]  # Pressure range (log atm)
alpha = 0.001  # GPS error tolerance (smaller = more species)
always_threshold = 0.99  # Species in >99% of cases (always included)
never_threshold = 0.01  # Species in <1% of cases (never included)
data_path = 'TrainingData/MyData'  # Where to save generated data
model_path = 'Models/MyModel.h5'  # Where to save trained ANN
scaler_path = 'Scalers/MyScaler.pkl'  # Where to save input scaler

Workflow:

if __name__ == '__main__':
    # If data doesn't exist, generate it
    if not os.path.exists(data_path):
        make_data_parallel(...)  # Generates data.csv and species.csv

    # Train neural network on the data
    make_model(input_specs, data_path, scaler_path, model_path)

make_data_parallel.py - Training Data Generation

Generates autoignition simulation data using parallel processing and GPS-based species selection.

make_data_parallel()

def make_data_parallel(
    fuel, mech_file, end_threshold, ign_HRR_threshold_div,
    ign_GPS_resolution, norm_GPS_resolution, GPS_per_interval,
    n_cases, t_rng, p_rng, phi_rng, alpha,
    always_threshold, never_threshold, pathname, species_ranges
)

Purpose: Run n_cases autoignition simulations with randomized initial conditions, apply GPS to identify important species at each interval, and save state vectors and species masks.

Parameters:

| Parameter | Type | Description | Example |
|---|---|---|---|
| fuel | str | Fuel species name | 'CH4' |
| mech_file | str | Cantera mechanism file | 'gri30.cti' |
| end_threshold | float | HRR threshold to end simulation (J/m³·s) | 2e5 |
| ign_HRR_threshold_div | int | Divisor applied to max HRR to define ignition | 300 |
| ign_GPS_resolution | int | Timesteps per interval during ignition | 200 |
| norm_GPS_resolution | int | Timesteps per interval post-ignition | 40 |
| GPS_per_interval | int | GPS evaluation points per interval | 4 |
| n_cases | int | Number of simulations to run | 100 |
| t_rng | list | Temperature range [min, max] (K) | [800, 2300] |
| p_rng | list | Log pressure range [min, max] (atm) | [2.1, 2.5] |
| phi_rng | list | Equivalence ratio range (ignored when species_ranges is set) | [0.6, 1.4] |
| alpha | float | GPS pathway threshold (smaller = more species) | 0.001 |
| always_threshold | float | Occurrence fraction above which a species is always included | 0.99 |
| never_threshold | float | Occurrence fraction below which a species is never included | 0.01 |
| pathname | str | Output directory for data | 'TrainingData/Data' |
| species_ranges | dict | Species composition ranges | {'CH4': (0, 1), 'O2': (0, 0.4)} |

Outputs:

- pathname/data.csv: state vectors [Temperature, Pressure, species mole fractions]
- pathname/species.csv: binary masks, 1 if the species is important at that timestep
- pathname/always_spec_nums.csv: indices of always-included species
- pathname/never_spec_nums.csv: indices of never-included species

Example:

make_data_parallel(
    fuel='CH4', mech_file='gri30.cti', end_threshold=2e5,
    ign_HRR_threshold_div=300, ign_GPS_resolution=200,
    norm_GPS_resolution=40, GPS_per_interval=4, n_cases=100,
    t_rng=[800, 2300], p_rng=[2.1, 2.5], phi_rng=[0.6, 1.4],
    alpha=0.001, always_threshold=0.99, never_threshold=0.01,
    pathname='TrainingData/CH4_data', species_ranges={
        'CH4': (0, 1), 'N2': (0, 0.8), 'O2': (0, 0.4)
    }
)

process_simulation() (internal)

def process_simulation(sim_data) -> tuple[np.ndarray, np.ndarray]

Internal function called by joblib.Parallel for each simulation. Returns state vectors and species masks for one autoignition run.
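
For orientation, a minimal sketch of the fan-out; the sim_cases iterable here is a hypothetical stand-in for the argument packing done inside make_data_parallel():

from joblib import Parallel, delayed

# `sim_cases` is a hypothetical iterable of per-case argument tuples.
results = Parallel(n_jobs=-1)(
    delayed(process_simulation)(sim_data) for sim_data in sim_cases
)
states = [r[0] for r in results]  # state vectors, one array per case
masks = [r[1] for r in results]   # species masks, one array per case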


mech_train.py - Neural Network Training

Trains an ensemble of ANNs that predict the important species from the thermochemical state.

spec_train()

def spec_train(X_train, Y_train) -> tuple[Model, History, History]

Purpose: Train a single ANN model for species prediction.

Parameters:

- X_train (np.ndarray): Normalized input data, shape [samples, features]
- Y_train (np.ndarray): Binary target data, shape [samples, species]

Returns:

- model: Trained Keras model
- history: Training history
- train_test_history: Train/test split history

Default Architecture:

Input Layer: [Temperature, Pressure, species mole fractions]
  ↓
Dense(16, activation='relu', kernel_initializer='he_normal')
  ↓
Dense(Y_train.shape[1], activation='sigmoid')
  ↓
Output Layer: [binary predictions for each species]

To customize architecture, edit spec_train() before the output layer:

# Current default:
model.add(tf.keras.layers.Dense(16, activation='relu', kernel_initializer='he_normal'))

# Example: Add deeper network
model.add(tf.keras.layers.Dense(64, activation='relu', kernel_initializer='he_normal'))
model.add(tf.keras.layers.Dropout(0.2))
model.add(tf.keras.layers.Dense(32, activation='relu', kernel_initializer='he_normal'))
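
For reference, a self-contained sketch of the default two-layer model; the loss and optimizer below are assumptions, and spec_train() may compile differently:

import tensorflow as tf

n_features = 9   # illustrative: T, P, and 7 input species
n_species = 53   # illustrative: GRI-Mech 3.0 species count

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_features,)),  # [T, P, mole fractions]
    tf.keras.layers.Dense(16, activation='relu',
                          kernel_initializer='he_normal'),
    tf.keras.layers.Dense(n_species, activation='sigmoid'),
])
# Assumed compile settings for multi-label binary prediction.
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])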

make_model()

def make_model(input_specs, data_path, scaler_path, model_path)

Purpose: Load the training data, normalize it, train an ensemble of ANNs in parallel, and save the best model.

Parameters:

| Parameter | Type | Description |
|---|---|---|
| input_specs | list | Species names used as ANN inputs |
| data_path | str | Directory containing data.csv and species.csv |
| scaler_path | str | Path to save the MinMaxScaler pickle |
| model_path | str | Path to save the best .h5 model |

Key Settings:

num_processes = 28  # Parallel training (configurable via `make_model(..., num_processes=...)`)
train_test_split = 0.2  # 80% train, 20% validation
early_stopping_patience = 30  # Epochs to wait for val_loss improvement
batch_size = 32
epochs = 200

Workflow:

1. Load state vectors and species masks from CSV
2. Select only the input_specs columns as features
3. Normalize the inputs with a MinMaxScaler (0-1 range)
4. Train num_processes models (28 by default) in parallel, each with a different random initialization
5. Select the best model by validation loss
6. Save the model as .h5 and the scaler as .pkl
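
A minimal sketch of the normalization and persistence steps (3 and 6); the paths reuse the example below, and the exact column handling inside make_model() may differ:

import pickle

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Steps 1 and 3: load the state vectors and map each feature to [0, 1].
data = pd.read_csv('TrainingData/CH4_data/data.csv')
scaler = MinMaxScaler()
X = scaler.fit_transform(data.values)

# Step 6: persist the scaler so inference applies the same normalization.
with open('Scalers/ch4_scaler.pkl', 'wb') as f:
    pickle.dump(scaler, f)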

Example:

make_model(
    input_specs=['CH4', 'H2O', 'OH', 'H', 'O2', 'CO2', 'O'],
    data_path='TrainingData/CH4_data',
    scaler_path='Scalers/ch4_scaler.pkl',
    model_path='Models/ch4_model.h5'
)

SL_GPS.py - Adaptive Simulation

Runs an autoignition simulation in which the trained ANN dynamically reduces the mechanism.

Key Variables:

fuel = 'CH4'
mech_file = 'gri30.cti'  # Detailed mechanism
input_specs = ['CH4', 'H2O', 'OH', 'H', 'O2', 'CO2', 'O']  # ANN inputs
scaler_path = 'Scalers/scaler.pkl'  # MinMaxScaler
model_path = 'Models/model.h5'  # Trained ANN
data_path = 'TrainingData/Data'  # Training data (for always/never species)

T0_in = 1500  # Initial temperature (K)
phi = 1.0  # Equivalence ratio
atm = 1.0  # Pressure (atm)
t_end = 0.002  # Simulation end time (s)

norm_Dt = 0.0002  # Timestep pre-ignition (s)
ign_Dt = 0.00005  # Timestep during ignition (s)
ign_threshold = 9e7  # HRR threshold to detect ignition (J/m³s)

results_path = 'Results/results.pkl'  # Output pickle file

Outputs: Pickle file containing:

{
    'time': array,  # Simulation time
    'temperature': array,  # Temperature evolution
    'heat_release_rate': array,  # HRR evolution
    'mole_fractions': dict,  # Species compositions
    'mechanism_size': array,  # Number of species/reactions over time
}
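
Loading the results for post-processing (the matplotlib plotting here is illustrative; SL_GPS.py itself only writes the pickle):

import pickle
import matplotlib.pyplot as plt

with open('Results/results.pkl', 'rb') as f:
    results = pickle.load(f)

plt.plot(results['time'], results['temperature'])
plt.xlabel('Time (s)')
plt.ylabel('Temperature (K)')
plt.show()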

utils.py - Core Utilities

Low-level functions for simulation, mechanism reduction, and GPS species selection.

auto_ign_build_SL()

def auto_ign_build_SL(
    fuel, mech_file, input_specs, norm_Dt, ign_Dt,
    T0_in, phi, atm, t_end, scaler_path, model_path,
    data_path, ign_threshold
) -> dict

Purpose: Run adaptive autoignition simulation with ANN-based mechanism reduction.

Workflow:

1. Load the trained ANN and scaler
2. Load the training data to get the always/never species
3. Start the simulation at (T0_in, phi, atm)
4. At each timestep (sketched below):
   - Get the current thermochemical state
   - Scale the state and feed it to the ANN
   - Get binary predictions for each species
   - Combine them with the always/never species
   - Build a reduced mechanism containing only the selected species
   - Step the simulation
5. Switch between the coarser norm_Dt (pre-ignition) and the finer ign_Dt (during ignition) based on HRR
6. Return the results dict
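
A sketch of one per-timestep decision (step 4). The names T, P, X_input, scaler, model, always_idx, and never_idx are illustrative placeholders, and the 0.5 decision threshold is an assumption:

import numpy as np

# Assemble the current thermochemical state: [T, P, mole fractions].
state = np.hstack(([T, P], X_input)).reshape(1, -1)
x = scaler.transform(state)              # same MinMaxScaler as training
probs = model.predict(x, verbose=0)[0]   # sigmoid output per species
keep = probs > 0.5                       # binary importance predictions
keep[always_idx] = True                  # force always-included species
keep[never_idx] = False                  # drop never-included species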

sub_mech()

def sub_mech(mech_file, species_names) -> ct.Solution

Purpose: Build a reduced Cantera solution with only specified species and their reactions.

Parameters:

- mech_file: Path to the detailed CTI mechanism
- species_names: List of species to keep (names must exactly match the Cantera species names)

Returns: Cantera Solution object with reduced mechanism

Key Logic:

- Keeps only reactions in which ALL reactants and products appear in species_names
- Removes every reaction involving an excluded species
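
A minimal sketch of that filter, following Cantera's standard sub-mechanism pattern (the packaged sub_mech() may additionally handle details such as third-body efficiencies):

import cantera as ct

species_names = ['CH4', 'O2', 'N2', 'CO2', 'H2O', 'OH', 'H', 'O']  # example
full = ct.Solution('gri30.cti')
keep = set(species_names)

# Keep a reaction only if every reactant and product is a retained species.
species = [full.species(name) for name in species_names]
reactions = [rxn for rxn in full.reactions()
             if set(rxn.reactants) | set(rxn.products) <= keep]

reduced = ct.Solution(thermo='IdealGas', kinetics='GasKinetics',
                      species=species, reactions=reactions)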

GPS_spec()

def GPS_spec(
    soln_in, fuel, raw, t_start, t_end, alpha, GPS_per_interval
) -> set

Purpose: Apply GPS algorithm to identify important species in a simulation interval.

Parameters:

- soln_in: Cantera solution object
- fuel: Fuel species name
- raw: Raw simulation data (time, T, P, X)
- t_start, t_end: Time interval bounds (s)
- alpha: GPS pathway threshold (smaller = more species)
- GPS_per_interval: Number of GPS evaluations within the interval

Returns: Set of important species

GPS Configuration (hardcoded in function):

elements = ['C', 'H', 'O']
sources = [fuel, 'O2']
targets = ['CO2', 'H2O']

findIgnInterval()

def findIgnInterval(hrr, threshold) -> tuple[int, int]

Returns the first and last time indices at which HRR exceeds the threshold (ignition start and end).
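
A plausible implementation sketch (the packaged version may differ):

import numpy as np

def findIgnInterval(hrr, threshold):
    """First and last sample indices where HRR exceeds the threshold."""
    above = np.nonzero(np.asarray(hrr) > threshold)[0]
    return above[0], above[-1]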

find_ign_delay()

def find_ign_delay(times, temperature) -> float

Returns the ignition delay time, taken as the time at which the temperature has risen by 50% of (T_max - T_min).
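
A plausible sketch of the 50%-rise criterion (the packaged version may differ):

import numpy as np

def find_ign_delay(times, temperature):
    """Time at which T first reaches T_min + 0.5 * (T_max - T_min)."""
    T = np.asarray(temperature)
    T_half = T.min() + 0.5 * (T.max() - T.min())
    return np.asarray(times)[np.argmax(T >= T_half)]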

auto_ign_build_X0()

def auto_ign_build_X0(
    soln, T0, atm, X0, end_threshold=2e3, end=5, dir_raw=None
) -> tuple[dict, float, int]

Purpose: Run detailed autoignition with full mechanism.

Parameters:

- soln: Cantera solution
- T0: Initial temperature (K)
- atm: Pressure (atm)
- X0: Initial composition (mole-fraction dict or string)
- end_threshold: HRR limit used to stop the simulation
- end: Maximum simulation time (s)
- dir_raw: Directory in which to save raw data

Returns:

- raw: Simulation data dict
- time_exec: CPU execution time (s)
- step_count: Number of timesteps
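
An illustrative call with stoichiometric methane/air (the import path mirrors the usage examples below):

import cantera as ct
from slgps.utils import auto_ign_build_X0

soln = ct.Solution('gri30.cti')
raw, time_exec, step_count = auto_ign_build_X0(
    soln, T0=1500, atm=1.0,
    X0={'CH4': 1.0, 'O2': 2.0, 'N2': 7.52},  # stoichiometric CH4/air
)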


Usage Examples

Example 1: Generate Training Data Only

from slgps.make_data_parallel import make_data_parallel

make_data_parallel(
    fuel='CH4',
    mech_file='gri30.cti',
    end_threshold=2e5,
    ign_HRR_threshold_div=300,
    ign_GPS_resolution=200,
    norm_GPS_resolution=40,
    GPS_per_interval=4,
    n_cases=50,  # Small for testing
    t_rng=[1000, 2000],
    p_rng=[2.0, 2.5],
    phi_rng=[0.8, 1.2],
    alpha=0.001,
    always_threshold=0.99,
    never_threshold=0.01,
    pathname='MyData',
    species_ranges={'CH4': (0, 1), 'O2': (0, 0.4), 'N2': (0, 0.8)}
)

Example 2: Train ANN on Existing Data

from slgps.mech_train import make_model

make_model(
    input_specs=['CH4', 'O2', 'CO2', 'H2O'],
    data_path='MyData',
    scaler_path='MyScaler.pkl',
    model_path='MyModel.h5'
)

Example 3: Run Adaptive Simulation

from slgps.utils import auto_ign_build_SL

results = auto_ign_build_SL(
    fuel='CH4',
    mech_file='gri30.cti',
    input_specs=['CH4', 'O2', 'CO2', 'H2O'],
    norm_Dt=0.0002,
    ign_Dt=0.00005,
    T0_in=1500,
    phi=1.0,
    atm=1.0,
    t_end=0.001,
    scaler_path='MyScaler.pkl',
    model_path='MyModel.h5',
    data_path='MyData',
    ign_threshold=9e7
)

File Formats

data.csv (State Vectors)

# Temperature,Atmospheres,CH4,O2,N2,CO2,H2O,...
1500,1.0,0.05,0.21,0.74,0.0,0.0,...
1550,1.0,0.04,0.20,0.76,0.001,0.01,...
...

Column order:

1. # Temperature (K); the leading '#' is part of the CSV header
2. Atmospheres: pressure (atm)
3. Species mole fractions, in the order of input_specs

species.csv (Binary Masks)

CH4,O2,N2,CO2,H2O,...
1,1,1,0,0,...
1,1,1,1,1,...
...

Values: 1 = important in this timestep, 0 = not important
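
For example, the occurrence fractions that drive the always/never thresholds can be recomputed from this file (paths follow the examples above; this sketch is not part of the package):

import pandas as pd

masks = pd.read_csv('MyData/species.csv')
frac = masks.mean()           # fraction of timesteps each species is important
print(frac[frac > 0.99])      # would qualify as "always included" at 0.99
print(frac[frac < 0.01])      # would qualify as "never included" at 0.01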


Performance Tips

  • Reduce GPS_per_interval from 4 to 2 for faster data generation
  • Reduce n_cases for initial testing
  • Use fewer input_specs to speed up ANN training
  • Increase num_processes in make_model() or via the GUI if you have more CPU cores. The Gradio frontend exposes a "Number of Processes" slider to control parallel workers.
  • Use GPU: Install tensorflow[and-cuda] for 10-100x training speedup

Debugging

Enable verbose output:

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '0'  # See TensorFlow logs
import tensorflow as tf
tf.debugging.set_log_device_placement(True)  # Log device usage

Check intermediate results:

import pandas as pd
data = pd.read_csv('MyData/data.csv')
print(data.head())
print(data.shape)  # (num_samples, num_features)