4.10. Peaks
Functions for elution peak detection and evaluation.
- asari.peaks.audit_mass_track(list_intensity, min_fwhm, min_intensity_threshold, min_peak_height, min_prominence_threshold)[source]
Computes a statistical summary of a mass track (list_intensity), then rescales, detrends, smooths and subtracts the baseline if needed. Every scan in a mass track has an intensity value, either positive or 0.
- Parameters:
list_intensity (list[np.integer]) – list of intensity values from a mass track.
min_fwhm (float) – taken as half of min_timepoints in main parameters.
min_intensity_threshold (float) – as in main parameters.
min_peak_height (float) – as in main parameters.
min_prominence_threshold (float) – as in main parameters, default min_peak_height/3.
- Returns:
_baseline_ (float) – estimated baseline level
noise_level (float) – estimated noise level
scaling_factor (float) – a normalization factor to scale the data under preset ceiling
new_prominence (float) – updated prominence threshold, replaced by noise_level if noise_level is greater than the initial min_prominence_threshold.
list_intensity (list[np.integer]) – list of intensity values after cleanup, with the baseline subtracted.
Note
If the max intensity of a mass track is higher than a preset ceiling (1E8), the mass track is rescaled under the preset ceiling for the purpose of peak detection. After peak detection, the peak height is scaled back using the same scaling factor.
If the median intensity on a mass track is below the preset min_intensity_threshold (default 1e3 for Orbitrap data), the track is treated as low-intensity, and both baseline level and noise level are set to min_intensity_threshold.
If over half the data points are above min_intensity_threshold and the median intensity is higher than 10 times the preset min_peak_height (default 1e5 for Orbitrap data), detrending (scipy.signal.detrend) is performed on the mass track.
If a track is not low-intensity, the bottom signals are taken as intensity values below the lower quartile plus min_intensity_threshold. Adding the constant min_intensity_threshold keeps this method stable, even when zeros dominate the track. The baseline level and noise level are then assigned as the mean and standard deviation of the bottom signals, respectively.
Smoothing (chromatograms.smooth_moving_average) is applied when the noise level is higher than 1% of the max intensity and the max intensity is lower than 10 times the preset min_peak_height.
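The baseline and noise rules in the note above can be sketched as follows. This is a minimal illustration, not asari's implementation: the function name sketch_baseline_noise is hypothetical, and rescaling, detrending and smoothing are left out.

```python
import numpy as np


def sketch_baseline_noise(list_intensity, min_intensity_threshold=1e3):
    """Hypothetical sketch of the baseline/noise rules (not asari's code).

    Returns (baseline, noise_level). Rescaling, detrending and smoothing
    are intentionally omitted.
    """
    arr = np.asarray(list_intensity, dtype=float)
    if np.median(arr) < min_intensity_threshold:
        # Low-intensity track: baseline and noise are both set to the threshold.
        return min_intensity_threshold, min_intensity_threshold
    # Bottom signals: values below the lower quartile plus min_intensity_threshold.
    # Adding the threshold guarantees at least one value (the minimum) qualifies.
    cutoff = np.quantile(arr, 0.25) + min_intensity_threshold
    bottom = arr[arr < cutoff]
    return float(bottom.mean()), float(bottom.std())
```

A track dominated by zeros, whose median falls below the threshold, takes the low-intensity branch; otherwise the mean and standard deviation of the bottom signals are returned.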
- asari.peaks.batch_deep_detect_elution_peaks(list_mass_tracks, number_of_scans, parameters)[source]
Performs elution peak detection of a list of mass tracks via multiprocessing.
- Parameters:
list_mass_tracks (list[tuple]) – list of mass tracks. Asari uses this on the composite mass tracks of a full experiment, but generic mass tracks also work.
number_of_scans (int) – number of scans, usually corresponding to the maximum RT scan number.
parameters (dict) – parameter dictionary passed from main.py, which imports from default_parameters and updates the dict by user arguments.
- Returns:
a list of JSON elution peaks.
- Return type:
FeatureList
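The multiprocessing pattern described here (one argument tuple per track, run with starmap, results pooled in a shared list) might look roughly like the sketch below. toy_detect and toy_batch_detect are hypothetical stand-ins, not asari functions; in asari the worker is stats_detect_elution_peaks with the arguments produced by iter_peak_detection_parameters.

```python
import itertools
import multiprocessing as mp


def toy_detect(track, min_peak_height, shared_list):
    """Hypothetical stand-in for stats_detect_elution_peaks.

    Appends one record per local maximum at or above min_peak_height.
    """
    y = track["intensity"]
    for i in range(1, len(y) - 1):
        if y[i] >= min_peak_height and y[i - 1] < y[i] >= y[i + 1]:
            shared_list.append({"track_id": track["id_number"], "apex": i})


def toy_batch_detect(list_mass_tracks, min_peak_height, processes=4):
    """Fan tracks out to workers with starmap and pool results in a shared list."""
    if processes == 1:
        # Serial path: a plain list is enough when everything runs in one process.
        shared_list = []
        iters = [(t, min_peak_height, shared_list) for t in list_mass_tracks]
        for _ in itertools.starmap(toy_detect, iters):
            pass
        return shared_list
    with mp.Manager() as manager:
        # A Manager list proxy lets worker processes append to one shared list.
        shared_list = manager.list()
        iters = [(t, min_peak_height, shared_list) for t in list_mass_tracks]
        with mp.Pool(processes) as pool:
            pool.starmap(toy_detect, iters)
        return list(shared_list)
```

On platforms that start workers with spawn, the multiprocess path should be called under `if __name__ == "__main__":`. The processes=1 path skips the Manager entirely, which is convenient for debugging.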
- asari.peaks.check_overlap_peaks(list_peaks)[source]
Checks overlap among a list of JSON peaks, already ordered by RT from find_peaks. Overlap usually comes from peak splitting.
- Parameters:
list_peaks (list[dict]) – a list of peaks in JSON format.
- Return type:
A list of unique peaks in JSON format.
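Grouping RT-ordered peaks by overlapping bases, the step that precedes cleanup in check_overlap_peaks, can be sketched like this. group_overlapping_peaks is a hypothetical helper, and treating overlap as a left_base at or before the cluster's rightmost right_base is an assumption.

```python
def group_overlapping_peaks(list_peaks):
    """Hypothetical sketch: cluster RT-ordered peaks whose base ranges overlap.

    Each peak is a dict with 'left_base' and 'right_base' scan numbers.
    """
    clusters = []
    cluster_right = None  # rightmost base seen in the current cluster
    for peak in list_peaks:
        if clusters and peak["left_base"] <= cluster_right:
            clusters[-1].append(peak)
            cluster_right = max(cluster_right, peak["right_base"])
        else:
            clusters.append([peak])
            cluster_right = peak["right_base"]
    return clusters
```

Each resulting cluster would then be passed to a cleanup step, so that a single peak survives per cluster.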
- asari.peaks.cleanup_peak_cluster(cluster_peaks)[source]
Safeguards peak boundaries when reporting overlapping peaks. If the cluster contains 3 or more peaks (indicating splitting in noisy data), they are merged. scipy.signal.find_peaks sometimes reports two overlapping peaks: a small one and a larger one that includes the small peak. This is mostly controlled already by the wlen parameter. Used by check_overlap_peaks.
- Parameters:
cluster_peaks (list[dict]) – a list of peak dictionaries from the same cluster
- Return type:
list of peaks after cleanup.
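The merge rule for clusters of 3 or more peaks might be sketched as below. merge_peak_cluster is hypothetical; it assumes each peak dict carries a 'height' key, keeps the apex of the tallest peak, and leaves smaller clusters unchanged rather than reproducing the two-peak handling.

```python
def merge_peak_cluster(cluster_peaks):
    """Hypothetical sketch: merge a cluster of 3+ peaks into one spanning peak.

    Clusters with fewer than 3 peaks are returned unchanged.
    """
    if len(cluster_peaks) < 3:
        return cluster_peaks
    tallest = max(cluster_peaks, key=lambda p: p["height"])
    merged = dict(tallest)  # keep apex and height of the tallest peak
    # Widen the bases to cover every peak in the cluster.
    merged["left_base"] = min(p["left_base"] for p in cluster_peaks)
    merged["right_base"] = max(p["right_base"] for p in cluster_peaks)
    return [merged]
```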
- asari.peaks.compute_noise_by_flanks(peak, list_intensity, noise_data_points, min_intensity_threshold, old_noise_level)[source]
Compute noise level by averaging the adjacent nonpeak data points, used for SNR (signal-to-noise ratio) calculation.
- Parameters:
peak (dict) – an elution peak in JSON format, e.g. {‘id_number’: k, ‘mz’: mz, ‘apex’: x, ‘left_base’: xx, ‘right_base’: yy}
list_intensity (list) – list of intensity values of an ROI.
noise_data_points (list[int]) – the indices of data points that do not belong to any peak. Precomputed in stats_detect_elution_peaks.
min_intensity_threshold (float) – minimal intensity threshold used for mass track extraction.
old_noise_level (float) – fallback value if it fails to collect non-peak data points.
- Returns:
Noise level as a float number.
If the computed noise level is lower than min_intensity_threshold, the latter is used.
If this fails to get usable data points, returns the old_noise_level.
Note
The default window size is 100 scans on each side of a peak, offset by 30 scans as padding. Future versions should infer these parameters from the global data distribution.
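A rough sketch of flank-based noise estimation follows. noise_by_flanks is a hypothetical helper, and the reading of the window (up to 100 scans searched on each side, starting 30 scans beyond the peak base) is an assumption about how the padding is applied.

```python
import numpy as np


def noise_by_flanks(peak, list_intensity, noise_data_points,
                    min_intensity_threshold, old_noise_level,
                    window=100, padding=30):
    """Hypothetical sketch of flank-based noise estimation, not asari's code.

    Assumes the padding is measured outward from each peak base and that
    up to `window` scans beyond it are searched on each side.
    """
    usable = set(noise_data_points)
    left_start = peak["left_base"] - padding - window
    left = [i for i in range(left_start, peak["left_base"] - padding) if i in usable]
    right_start = peak["right_base"] + padding
    right = [i for i in range(right_start, right_start + window) if i in usable]
    selected = left + right
    if not selected:
        return old_noise_level  # fallback when no non-peak points are found
    noise = float(np.mean([list_intensity[i] for i in selected]))
    return max(noise, min_intensity_threshold)  # never below the threshold
```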
- asari.peaks.detect_evaluate_peaks_on_roi(list_intensity_roi, rt_numbers_roi, min_peak_height, min_fwhm, min_prominence_threshold, wlen, snr, peakshape, min_prominence_ratio, noise_level)[source]
Returns a list of peaks detected in an ROI. An ROI is a segment of a mass track, defined by list_intensity_roi and rt_numbers_roi. ROIs are extracted from a mass track by filtering out low-intensity regions.
- Parameters:
list_intensity_roi (list[np.integer]) – list of intensity values of the ROI on a mass track.
rt_numbers_roi (list[int]) – scan numbers that define the ROI.
min_peak_height (float) – as in main parameters.
min_fwhm (float) – taken as half of min_timepoints in main parameters.
min_prominence_threshold (float) – as in main parameters, default min_peak_height/3.
wlen (int) – window size for evaluating prominence in peaks. Important to resolve clustered narrow peaks. Default 25 scans.
snr (float) – minimal signal to noise ratio required for a peak. Not used here.
peakshape (float) – parameters[‘gaussian_shape’], minimal shape score required for a peak.
min_prominence_ratio (float) – require ratio of prominence relative to peak height, default 0.05.
noise_level (float) – estimated noise level on the mass track.
- Return type:
A list of peaks in JSON format.
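The ROI-level detection can be illustrated with scipy.signal.find_peaks, whose height, width, prominence and wlen arguments correspond to thresholds listed above. detect_on_roi is a hypothetical sketch: it omits the Gaussian shape filter and the min_prominence_ratio check, and only shows how ROI indices map back to scan numbers.

```python
import numpy as np
from scipy.signal import find_peaks


def detect_on_roi(list_intensity_roi, rt_numbers_roi, min_peak_height,
                  min_fwhm, min_prominence_threshold, wlen):
    """Hypothetical sketch: find_peaks on an ROI, indices mapped to scan numbers."""
    peaks, props = find_peaks(np.asarray(list_intensity_roi, dtype=float),
                              height=min_peak_height,
                              width=min_fwhm,
                              prominence=min_prominence_threshold,
                              wlen=wlen)
    # find_peaks returns ROI indices; rt_numbers_roi converts them to scans.
    return [{"apex": rt_numbers_roi[p],
             "left_base": rt_numbers_roi[props["left_bases"][k]],
             "right_base": rt_numbers_roi[props["right_bases"][k]],
             "height": float(props["peak_heights"][k])}
            for k, p in enumerate(peaks)]
```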
- asari.peaks.evaluate_gaussian_peak_on_intensity_list(intensity_list, height, apex, left, right)[source]
Fits a peak with a Gaussian model, using R^2 as the goodness of fit.
- Parameters:
intensity_list (list[np.integer]) – list of intensity values.
height (float) – estimated height of a peak
apex (float) – estimated apex of a peak
left (float) – estimated left bound of a peak
right (float) – estimated right bound of a peak
- Return type:
goodness_fitting, fitted_sigma
Note
The parameters height, apex, left and right are relative to intensity_list. When intensity_list covers an ROI, these parameters do not refer to positions on the full mass track.
Very high peaks in LC-MS tend to get high fitting scores but inaccurate parameter estimates, e.g. predicted peaks that are too narrow. Re-weighting very high data points (e.g. by np.sqrt) could be considered.
Peak shapes may be closer to Voigt or bi-Gaussian, but the impact on the fitting result is negligible.
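A minimal sketch of Gaussian fitting with R^2 as goodness of fit, using scipy.optimize.curve_fit. The helper names and the initial guess for sigma (a quarter of the peak span) are assumptions, not taken from asari.

```python
import numpy as np
from scipy.optimize import curve_fit


def gaussian(x, a, mu, sigma):
    """Unnormalized Gaussian: height a, center mu, standard deviation sigma."""
    return a * np.exp(-(x - mu) ** 2 / (2 * sigma ** 2))


def fit_gaussian_r2(intensity_list, height, apex, left, right):
    """Hypothetical sketch: fit a Gaussian on [left, right], return (R^2, sigma)."""
    left, right = int(left), int(right)
    x = np.arange(left, right + 1)
    y = np.asarray(intensity_list, dtype=float)[left:right + 1]
    p0 = [height, apex, max((right - left) / 4, 1)]  # assumed initial guess
    (a, mu, sigma), _ = curve_fit(gaussian, x, y, p0=p0, maxfev=2000)
    ss_res = np.sum((y - gaussian(x, a, mu, sigma)) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return float(r2), abs(float(sigma))
```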
- asari.peaks.evaluate_roi_peak_json_(ii, list_intensity_roi, rt_numbers_roi, peaks, properties, peakshape, min_fwhm)[source]
Return the ii-th peak in peaks with basic properties assigned in a JSON dictionary.
- Parameters:
ii (int) – the ii-th peak to use in peaks.
list_intensity_roi (list[np.integer]) – list of intensity values of the ROI on a mass track.
rt_numbers_roi (list[int]) – scan numbers that define the ROI.
peaks (list[int]) – as returned by scipy find_peaks
properties (dict) – as returned by scipy find_peaks
peakshape (float) – parameters[‘gaussian_shape’], minimal shape score required for a peak.
min_fwhm (float) – taken as half of min_timepoints in main parameters.
Note
This handles the conversion between ROI indices and mass track indices. The peak shape is evaluated here with a Gaussian model. The left and right bases are constrained to N*stdev of the fitted model, mostly to ignore long tails; this can be optimized in the future.
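The ROI-to-track conversion and the N*stdev constraint on the bases might look like the sketch below. roi_peak_to_track is hypothetical, and n_sigma=3 is an assumed value for N, which the text does not specify.

```python
def roi_peak_to_track(ii, peaks, properties, rt_numbers_roi, fitted_sigma, n_sigma=3):
    """Hypothetical sketch: convert the ii-th find_peaks result from ROI indices
    to scan numbers, then trim bases to within n_sigma * fitted_sigma of the apex."""
    apex = rt_numbers_roi[peaks[ii]]
    left = rt_numbers_roi[properties["left_bases"][ii]]
    right = rt_numbers_roi[properties["right_bases"][ii]]
    reach = int(round(n_sigma * fitted_sigma))  # assumed N = n_sigma
    return {
        "apex": apex,
        "left_base": max(left, apex - reach),    # ignore long left tail
        "right_base": min(right, apex + reach),  # ignore long right tail
    }
```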
- asari.peaks.extend_ROI(ROI, number_of_scans)[source]
Adds 3 data points to each end of an ROI that is too short (< 3*min_fwhm), so that peak detection does not run into boundary issues.
- Parameters:
ROI (list) – representing a region of interest
number_of_scans (int) – the number of scans an ROI must have; if an ROI has fewer scans, 3 data points are added
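Padding a short ROI could be sketched as follows. pad_roi_scans and its parameters are hypothetical: here total_scans bounds the valid scan range and min_length stands in for the shortness criterion, which may differ from how extend_ROI uses number_of_scans.

```python
def pad_roi_scans(roi_scans, total_scans, min_length, pad=3):
    """Hypothetical sketch: add up to `pad` scans on each end of a short ROI,
    staying inside the valid scan range [0, total_scans)."""
    if len(roi_scans) >= min_length:
        return list(roi_scans)  # long enough, no padding needed
    first, last = roi_scans[0], roi_scans[-1]
    left = [s for s in range(first - pad, first) if s >= 0]
    right = [s for s in range(last + 1, last + 1 + pad) if s < total_scans]
    return left + list(roi_scans) + right
```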
- asari.peaks.gaussian_function__(x, a, mu, sigma)[source]
Gaussian function.
- Parameters:
x (float) – input variable.
a (float) – constant for magnitude or height
mu (float) – constant for center position
sigma (float) – constant for standard deviation
- Return type:
A computed float value
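Assuming gaussian_function__ uses the standard unnormalized form (the exact expression is not given above), it computes the first line below; integrating it over all x gives the closed-form area that a Gaussian-based peak area can rely on:

```latex
f(x) = a \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)

\int_{-\infty}^{\infty} f(x)\,dx = a\,\sigma\sqrt{2\pi}
```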
- asari.peaks.get_gaussian_peakarea_on_intensity_list(intensity_list, left, right)[source]
Use Gaussian model to fit peak and return peak area.
- Parameters:
intensity_list (list[np.integer]) – list of intensity values.
height (float) – estimated height of a peak
apex (float) – estimated apex of a peak
left (float) – estimated left bound of a peak
right (float) – estimated right bound of a peak
- Return type:
Peak area as a float, computed as the Gaussian integral.
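Peak area from a Gaussian fit can use the closed-form integral a*sigma*sqrt(2*pi) instead of summing data points. gaussian_peak_area is a hypothetical sketch; the initial guesses passed to scipy.optimize.curve_fit are assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit


def gaussian(x, a, mu, sigma):
    """Unnormalized Gaussian: height a, center mu, standard deviation sigma."""
    return a * np.exp(-(x - mu) ** 2 / (2 * sigma ** 2))


def gaussian_peak_area(intensity_list, left, right):
    """Hypothetical sketch: fit a Gaussian on [left, right], return its integral."""
    left, right = int(left), int(right)
    x = np.arange(left, right + 1)
    y = np.asarray(intensity_list, dtype=float)[left:right + 1]
    apex = left + int(np.argmax(y))
    p0 = [y.max(), apex, max((right - left) / 4, 1)]  # assumed initial guess
    (a, mu, sigma), _ = curve_fit(gaussian, x, y, p0=p0, maxfev=2000)
    return float(a * abs(sigma) * np.sqrt(2 * np.pi))
```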
- asari.peaks.iter_peak_detection_parameters(list_mass_tracks, number_of_scans, parameters, shared_list)[source]
Generate iterables for multiprocess.starmap for running elution peak detection.
- Parameters:
list_mass_tracks (list[tuple]) – list of mass tracks. Asari uses this on the composite mass tracks of a full experiment, but generic mass tracks also work.
number_of_scans (int) – number of scans, usually corresponding to the maximum RT scan number.
parameters (dict) – parameter dictionary passed from main.py, which imports from default_parameters and updates the dict by user arguments.
shared_list (list) – list object used to pass data between processes.
- Returns:
[(mass_track, number_of_scans, min_peak_height, min_fwhm, min_prominence_threshold, wlen, snr, peakshape, min_prominence_ratio, iteration, min_intensity_threshold, shared_list), …]
- Return type:
A list of iterative parameters
- asari.peaks.lowess_smooth_track(list_intensity, number_of_scans)[source]
Smooths data using LOWESS before peak detection; intended for testing. Smoothing reduces the height of narrow peaks in the CMAP but not the reported values, because peak areas are extracted from each sample afterwards. The likely slight widening of peak bases can add robustness. smooth_moving_average is preferred for most data, as LOWESS does not work well for small peaks.
- Parameters:
list_intensity (list[np.integer]) – list of intensity values for a mass track
number_of_scans (int) – the number of scans in the experiment
- Return type:
The intensities of the mass track after LOWESS smoothing.
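LOWESS itself is not shown here. To illustrate why smoothing lowers narrow peaks more than broad ones, the sketch below uses a plain centered moving average (a hypothetical stand-in, not asari's smooth_moving_average) on two toy peaks of the same height.

```python
import numpy as np


def moving_average(list_intensity, size=3):
    """Centered moving average; a simple stand-in for illustration only."""
    kernel = np.ones(size) / size
    return np.convolve(np.asarray(list_intensity, dtype=float), kernel, mode="same")


narrow = [0, 0, 0, 900, 0, 0, 0]
broad = [0, 300, 600, 900, 600, 300, 0]
```

With a window of 3, the single-scan spike of 900 drops to 300, while the broader peak of the same height only drops to 700.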
- asari.peaks.quick_detect_unique_elution_peak(intensity_track, min_peak_height=100000, min_fwhm=3, min_prominence_threshold_ratio=0.2)[source]
Quick peak detection, only looking for a high peak with high prominence. This can be used for quick check on good peaks, or selecting landmarks for alignment purposes.
- Parameters:
intensity_track (list[np.integer]) – list of intensity values from a mass track.
min_peak_height (float) – minimal peak height required for a peak.
min_fwhm (float) – minimal peak width required for a peak.
min_prominence_threshold_ratio (float) – required ratio of prominence relative to peak height.
- Return type:
A qualified peak in {'apex' : xx, } format, or None.
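A quick unique-peak check could be sketched with scipy.signal.find_peaks as below. quick_unique_peak is hypothetical; it interprets min_prominence_threshold_ratio as prominence divided by peak height, per peak, and returns a result only when exactly one peak qualifies.

```python
import numpy as np
from scipy.signal import find_peaks


def quick_unique_peak(intensity_track, min_peak_height=100000, min_fwhm=3,
                      min_prominence_threshold_ratio=0.2):
    """Hypothetical sketch: return {'apex': index} only if exactly one peak qualifies."""
    y = np.asarray(intensity_track, dtype=float)
    # prominence=0 keeps every peak but makes find_peaks report prominences.
    peaks, props = find_peaks(y, height=min_peak_height, width=min_fwhm, prominence=0)
    ratios = props["prominences"] / props["peak_heights"]
    qualified = peaks[ratios >= min_prominence_threshold_ratio]
    return {"apex": int(qualified[0])} if len(qualified) == 1 else None
```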
- asari.peaks.stats_detect_elution_peaks(mass_track, number_of_scans, min_peak_height, min_fwhm, min_prominence_threshold, wlen, snr, peakshape, min_prominence_ratio, iteration, min_intensity_threshold, shared_list)[source]
Statistics guided peak detection. This is the main method in asari for detecting elution peaks on a mass track.
- Parameters:
mass_track (dict) – {‘id_number’: k, ‘mz’: mz, ‘rt_scan_numbers’: [..], ‘intensity’: [..]}
number_of_scans (int) – number of scans in mass track.
min_peak_height (float) – as in main parameters.
min_fwhm (float) – taken as half of min_timepoints in main parameters.
min_prominence_threshold (float) – as in main parameters, default min_peak_height/3.
wlen (int) – window size for evaluating prominence in peaks. Important to resolve clustered narrow peaks. Default 25 scans.
snr (float) – minimal signal to noise ratio required for a peak.
peakshape (float) – parameters[‘gaussian_shape’], minimal shape score required for a peak.
min_prominence_ratio (float) – require ratio of prominence relative to peak height, default 0.05.
iteration (int) – not used in this function; iteration could be added on high-intensity ROIs if peaks are not detected in the first round.
min_intensity_threshold (float) – as in main parameters.
shared_list (list) – list used to exchange data in multiprocessing.
- Updates:
shared_list (list[dict]) – list of peaks in JSON format, to be pooled by batch_deep_detect_elution_peaks.
Note
The mass_track is first cleaned: rescaled, detrended, smoothed and baseline-subtracted as needed. The track-wide noise_level is estimated from the stdev of bottom-quartile values. ROIs are then separated on each mass track by a filter (i.e. baseline + noise level), allowing gaps of up to 2 scans. In practice, gaps should not exist in a composite mass track after many samples are combined. Separating ROIs has three advantages: 1) better performance on long LC runs, 2) big peaks are less likely to swallow small peaks, and 3) min_prominence_threshold can be recalculated per ROI.
Prominence requirement is critical to peak detection. It is dynamically determined based on the noise level in a region. The peakshape is calculated on cleaned mass track. SNR is computed on local noise (average of up to 100 nonpeak data points on each side of a peak).
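ROI construction with a 2-scan gap allowance can be sketched as below. split_rois is hypothetical; threshold plays the role of baseline + noise level, and ROIs are returned as inclusive (start, end) scan index pairs.

```python
import numpy as np


def split_rois(list_intensity, threshold, max_gap=2):
    """Hypothetical sketch: group scans above `threshold` into ROIs.

    Runs of up to `max_gap` scans at or below the threshold are bridged, and
    each ROI is returned as an inclusive (start, end) pair of scan indices.
    """
    above = np.flatnonzero(np.asarray(list_intensity) > threshold)
    rois = []
    for idx in above.tolist():
        if rois and idx - rois[-1][1] <= max_gap + 1:
            rois[-1][1] = idx  # gap small enough: extend the current ROI
        else:
            rois.append([idx, idx])  # start a new ROI
    return [tuple(r) for r in rois]
```

Bridging small gaps keeps a peak with a couple of dropped scans in one ROI, while a longer quiet stretch starts a new one.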