4.10. Peaks

Functions for elution peak detection and evaluation.

asari.peaks.audit_mass_track(list_intensity, min_fwhm, min_intensity_threshold, min_peak_height, min_prominence_threshold)[source]

Compute a statistical summary of a mass track (list_intensity), then rescale, detrend, smooth, and subtract the baseline as needed. Every scan in a mass track has an intensity value, either positive or 0.

Parameters:
  • list_intensity (list[np.integer]) – list of intensity values from a mass track.

  • min_fwhm (float) – taken as half of min_timepoints in main parameters.

  • min_intensity_threshold (float) – as in main parameters.

  • min_peak_height (float) – as in main parameters.

  • min_prominence_threshold (float) – as in main parameters; replaced by the estimated noise level if the latter is greater (returned as new_prominence).

Returns:

  • _baseline_ (float) – estimated baseline level

  • noise_level (float) – estimated noise level

  • scaling_factor (float) – a normalization factor that scales the data below a preset ceiling

  • new_prominence (float) – updated prominence threshold; noise_level replaces it if noise_level is greater than the initial min_prominence_threshold.

  • list_intensity (list[np.integer]) – list of intensity values after cleanup, with the baseline subtracted.

Note

If the max intensity of a mass track is higher than a preset ceiling (1E8), the mass track is rescaled below the ceiling for the purpose of peak detection. After peak detection, the peak height is scaled back using the same scaling factor.

If the median intensity on a mass track is below the preset min_intensity_threshold (default 1e3 for Orbitrap data), the track is treated as low-intensity, and both baseline level and noise level are set to min_intensity_threshold.

If over half the data points are above min_intensity_threshold and the median intensity is higher than 10 times the preset min_peak_height (default 1e5 for Orbitrap data), the mass track is detrended (scipy.signal.detrend).

If a track is not low-intensity, the bottom signals are taken as intensity values below the lower quartile plus min_intensity_threshold. Adding the constant min_intensity_threshold keeps this method stable, even when zeros dominate the track. The baseline level and noise level are then the mean and standard deviation of the bottom signals, respectively.

Smoothing (chromatograms.smooth_moving_average) is applied when the noise level is higher than 1% of the max intensity and the max intensity is lower than 10 times the preset min_peak_height.
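The rescaling and baseline rules above can be sketched in simplified form. This is a hypothetical illustration, not asari's code: audit_track_sketch is an invented name, detrending and smoothing are omitted, and clipping negative values to 0 after baseline subtraction is an assumption.

```python
import numpy as np


def audit_track_sketch(list_intensity, min_intensity_threshold=1e3, ceiling=1e8):
    """Simplified reading of the audit rules; hypothetical, not asari's implementation."""
    x = np.asarray(list_intensity, dtype=float)

    # Rescale below the preset ceiling; the factor is returned so heights can be scaled back.
    scaling_factor = 1.0
    if x.max() > ceiling:
        scaling_factor = ceiling / x.max()
        x = x * scaling_factor

    if np.median(x) < min_intensity_threshold:
        # Low-intensity track: baseline and noise both fall back to the threshold.
        baseline = noise_level = min_intensity_threshold
    else:
        # Bottom signals: values below the lower quartile plus min_intensity_threshold.
        cutoff = np.quantile(x, 0.25) + min_intensity_threshold
        bottom = x[x < cutoff]
        baseline, noise_level = float(bottom.mean()), float(bottom.std())

    # Subtract the baseline; clipping at 0 (assumption) keeps intensities non-negative.
    cleaned = np.clip(x - baseline, 0, None)
    return baseline, noise_level, scaling_factor, cleaned
```

A track dominated by zeros takes the low-intensity branch, while a track sitting on a steady background of a few thousand counts gets its baseline and noise from the bottom signals.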

asari.peaks.batch_deep_detect_elution_peaks(list_mass_tracks, number_of_scans, parameters)[source]

Perform elution peak detection on a list of mass tracks via multiprocessing.

Parameters:
  • list_mass_tracks (list[tuple]) – list of mass tracks. Asari applies this to composite mass tracks in a full experiment, but generic mass tracks can also be used.

  • number_of_scans (int) – number of scans, usually corresponding to the maximum scan number in RT.

  • parameters (dict) – parameter dictionary passed from main.py, which imports from default_parameters and updates the dict by user arguments.

Returns:

a list of JSON elution peaks.

Return type:

FeatureList

asari.peaks.check_overlap_peaks(list_peaks)[source]

Check for overlap among a list of JSON peaks, already ordered by RT from find_peaks. Overlap usually results from peak splitting.

Parameters:

list_peaks (list[dict]) – a list of peaks in JSON format.

Return type:

A list of unique peaks in JSON format.
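One way to group RT-ordered peaks into overlap clusters is sketched below, assuming each JSON peak carries left_base and right_base and that "overlap" means a peak starts before the current cluster ends. cluster_overlapping_peaks is a hypothetical helper; asari's exact overlap criterion may differ.

```python
def cluster_overlapping_peaks(list_peaks):
    """Group RT-ordered JSON peaks whose [left_base, right_base] ranges overlap."""
    clusters = []
    for peak in list_peaks:
        # Join the current cluster if this peak starts before the cluster's rightmost base.
        if clusters and peak['left_base'] < max(p['right_base'] for p in clusters[-1]):
            clusters[-1].append(peak)
        else:
            clusters.append([peak])
    return clusters


peaks = [
    {'apex': 10, 'left_base': 5, 'right_base': 15},
    {'apex': 14, 'left_base': 12, 'right_base': 20},
    {'apex': 40, 'left_base': 35, 'right_base': 45},
]
clusters = cluster_overlapping_peaks(peaks)
```

Single-peak clusters pass through unchanged; clusters with more than one member would then go to a cleanup step such as cleanup_peak_cluster.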

asari.peaks.cleanup_peak_cluster(cluster_peaks)[source]

Safeguard peak boundaries when reporting overlapping peaks. If the cluster contains 3 or more peaks (indicating a split in noisy data), merge them. scipy.signal.find_peaks sometimes reports two overlapping peaks: a small one, and a larger one that includes the small peak. This is mostly controlled already by the wlen parameter. Used by check_overlap_peaks.

Parameters:

cluster_peaks (list[dict]) – a list of peak dictionaries from the same cluster.

Return type:

list of peaks after cleanup.
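The merge rule for clusters of 3 or more peaks might look like the following hypothetical sketch. Keeping the apex and height of the tallest member and taking the outermost bases are assumptions; clusters of 1 or 2 peaks are returned unchanged because their handling is not specified here.

```python
def merge_peak_cluster(cluster_peaks):
    """Hypothetical sketch: merge a cluster of 3+ overlapping peaks into a single peak."""
    if len(cluster_peaks) < 3:
        # Clusters of 1-2 peaks are returned as-is in this sketch.
        return list(cluster_peaks)
    tallest = max(cluster_peaks, key=lambda p: p['height'])
    merged = dict(tallest)  # copy apex and height from the tallest member
    merged['left_base'] = min(p['left_base'] for p in cluster_peaks)
    merged['right_base'] = max(p['right_base'] for p in cluster_peaks)
    return [merged]
```

Copying the tallest peak into a new dict leaves the input peaks untouched, so the original cluster can still be inspected afterwards.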

asari.peaks.compute_noise_by_flanks(peak, list_intensity, noise_data_points, min_intensity_threshold, old_noise_level)[source]

Compute noise level by averaging the adjacent nonpeak data points, used for SNR (signal-to-noise ratio) calculation.

Parameters:
  • peak (dict) – an elution peak in JSON format, e.g. {‘id_number’: k, ‘mz’: mz, ‘apex’: x, ‘left_base’: xx, ‘right_base’: yy}

  • list_intensity (list) – list of intensity values of an ROI.

  • noise_data_points (list[int]) – the indices of data points that do not belong to any peak. Precomputed in stats_detect_elution_peaks.

  • min_intensity_threshold (float) – minimal intensity threshold used for mass track extraction.

  • old_noise_level (float) – fallback value if no non-peak data points can be collected.

Returns:

  • Noise level as a float number.

  • If the computed noise level is lower than min_intensity_threshold, the latter is used.

  • If no usable data points are found, returns old_noise_level.

Note

The default window size is 100 scans on each side of a peak, offset by 30 scans as padding. Future versions should infer these parameters from the global data distribution.
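A sketch of flank-based noise estimation under one reading of these defaults: skip 30 scans next to each peak base, then average up to 100 non-peak data points on each side. noise_by_flanks is a hypothetical name, and this interpretation of the window and padding is an assumption.

```python
import numpy as np


def noise_by_flanks(peak, list_intensity, noise_data_points, min_intensity_threshold,
                    old_noise_level, window=100, padding=30):
    """Average up to `window` non-peak points per side, skipping `padding` scans next to the bases."""
    points = sorted(noise_data_points)
    # Nearest non-peak points left of (left_base - padding) and right of (right_base + padding).
    left = [i for i in points if i < peak['left_base'] - padding][-window:]
    right = [i for i in points if i > peak['right_base'] + padding][:window]
    selected = left + right
    if not selected:
        return old_noise_level  # fallback when no usable data points are found
    noise = float(np.mean([list_intensity[i] for i in selected]))
    # Floor the result at min_intensity_threshold.
    return max(noise, min_intensity_threshold)
```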

asari.peaks.detect_evaluate_peaks_on_roi(list_intensity_roi, rt_numbers_roi, min_peak_height, min_fwhm, min_prominence_threshold, wlen, snr, peakshape, min_prominence_ratio, noise_level)[source]

Return a list of peaks detected in an ROI. An ROI is a segment of a mass track, defined by list_intensity_roi and rt_numbers_roi. ROIs are extracted from a mass track by filtering out low-intensity regions.

Parameters:
  • list_intensity_roi (list[np.integer]) – list of intensity values within the ROI.

  • rt_numbers_roi (list[int]) – scan numbers that define the ROI.

  • min_peak_height (float) – as in main parameters.

  • min_fwhm (float) – taken as half of min_timepoints in main parameters.

  • min_prominence_threshold (float) – as in main parameters, default min_peak_height/3.

  • wlen (int) – window size for evaluating prominence in peaks. Important to resolve clustered narrow peaks. Default 25 scans.

  • snr (float) – minimal signal to noise ratio required for a peak. Not used here.

  • peakshape (float) – parameters[‘gaussian_shape’], minimal shape score required for a peak.

  • min_prominence_ratio (float) – require ratio of prominence relative to peak height, default 0.05.

  • noise_level (float) – estimated noise level on the mass track.

Return type:

A list of peaks in JSON format.
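The core detection step can be approximated with scipy.signal.find_peaks, mapping ROI indices back to scan numbers through rt_numbers_roi. This is a simplified, hypothetical sketch: detect_on_roi_sketch is an invented name, the noise-based prominence update and the Gaussian shape filter are omitted, and passing the parameters straight to find_peaks is an assumption about how asari uses them.

```python
import numpy as np
from scipy.signal import find_peaks


def detect_on_roi_sketch(list_intensity_roi, rt_numbers_roi, min_peak_height,
                         min_fwhm, min_prominence_threshold, wlen, min_prominence_ratio):
    """Hypothetical simplified detection on one ROI (no shape or SNR filtering)."""
    x = np.asarray(list_intensity_roi, dtype=float)
    peaks, props = find_peaks(x, height=min_peak_height, width=min_fwhm,
                              prominence=min_prominence_threshold, wlen=wlen)
    results = []
    for k, idx in enumerate(peaks):
        height = props['peak_heights'][k]
        # Require prominence to be at least min_prominence_ratio of the peak height.
        if props['prominences'][k] < min_prominence_ratio * height:
            continue
        # Convert ROI indices to scan numbers on the full mass track.
        results.append({
            'apex': rt_numbers_roi[idx],
            'left_base': rt_numbers_roi[props['left_bases'][k]],
            'right_base': rt_numbers_roi[props['right_bases'][k]],
            'height': float(height),
        })
    return results
```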

asari.peaks.evaluate_gaussian_peak_on_intensity_list(intensity_list, height, apex, left, right)[source]

Fit a peak with a Gaussian model, using R^2 as the goodness of fit.

Parameters:
  • intensity_list (list[np.integer]) – list of intensity values.

  • height (float) – estimated height of a peak

  • apex (float) – estimated apex of a peak

  • left (float) – estimated left bound of a peak

  • right (float) – estimated right bound of a peak

Return type:

goodness_fitting, fitted_sigma

Note

The parameters height, apex, left and right are relative to intensity_list. When intensity_list is an ROI, these parameters do not refer to positions on the full mass track.

Very high peaks in LC-MS tend to get high fitness scores but inaccurate parameter estimates, e.g. predicted peaks that are too narrow. Re-weighting methods (e.g. by np.sqrt) could be considered for very high data points.

Peak shapes may be more Voigt or bigaussian, but the impact on fitting result is negligible.
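To illustrate fitting a Gaussian and scoring it by R^2, the sketch below uses a swapped-in technique, not necessarily asari's: the log-parabola (Caruana-style) fit, where ln(y) of a Gaussian is a quadratic in x, so np.polyfit recovers height, center and sigma without the initial guesses (height, apex) an iterative fitter would need. fit_gaussian_logparabola is a hypothetical name.

```python
import numpy as np


def fit_gaussian_logparabola(intensity_list, left, right):
    """Fit ln(y) = c2*x^2 + c1*x + c0 on [left, right]; return (R^2, sigma)."""
    y = np.asarray(intensity_list, dtype=float)[left:right + 1]
    x = np.arange(left, right + 1, dtype=float)
    mask = y > 0  # the logarithm needs positive intensities
    c2, c1, c0 = np.polyfit(x[mask], np.log(y[mask]), 2)
    if c2 >= 0:
        return 0.0, None  # opens upward: not peak-shaped
    # Map quadratic coefficients back to Gaussian parameters.
    sigma = np.sqrt(-1.0 / (2.0 * c2))
    mu = -c1 / (2.0 * c2)
    a = np.exp(c0 - c1 ** 2 / (4.0 * c2))
    fitted = a * np.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2))
    # R^2 computed on the original (not log) scale.
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot), float(sigma)
```

Because the log transform up-weights low points, this method is sensitive to noise in the tails; an iterative least-squares fit on the raw scale behaves differently, which is one reason results may not match asari's scores.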

asari.peaks.evaluate_roi_peak_json_(ii, list_intensity_roi, rt_numbers_roi, peaks, properties, peakshape, min_fwhm)[source]

Return the ii-th peak in peaks with basic properties assigned in a JSON dictionary.

Parameters:
  • ii (int) – the ii-th peak to use in peaks.

  • list_intensity_roi (list[np.integer]) – list of intensity values within the ROI.

  • rt_numbers_roi (list[int]) – scan numbers that define the ROI.

  • peaks (list[int]) – as returned by scipy.signal.find_peaks.

  • properties (dict) – as returned by scipy.signal.find_peaks.

  • peakshape (float) – parameters[‘gaussian_shape’], minimal shape score required for a peak.

  • min_fwhm (float) – taken as half of min_timepoints in main parameters.

Note

This handles the conversion between ROI indices and mass track indices. The peak shape is evaluated here on a Gaussian model. The left and right bases are constrained by N*stdev of the fitted model, mostly to ignore long tails; this can be optimized in the future.
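The index conversion and N*stdev constraint can be sketched as follows. roi_peak_to_track_coords is a hypothetical helper, and n_sigma=3 is an assumed value since N is not stated here.

```python
def roi_peak_to_track_coords(rt_numbers_roi, apex_idx, left_idx, right_idx, sigma, n_sigma=3):
    """Hypothetical helper: constrain bases to apex +/- n_sigma * sigma, then map to scan numbers."""
    # Work in ROI index space first, never widening the bases reported by find_peaks.
    left = max(left_idx, int(round(apex_idx - n_sigma * sigma)))
    right = min(right_idx, int(round(apex_idx + n_sigma * sigma)))
    # rt_numbers_roi[i] is the scan number of the i-th ROI data point.
    return rt_numbers_roi[apex_idx], rt_numbers_roi[left], rt_numbers_roi[right]
```

With a narrow fitted sigma, long tails beyond apex +/- 3*sigma are trimmed; with a wide sigma, the find_peaks bases are kept.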

asari.peaks.extend_ROI(ROI, number_of_scans)[source]

Add 3 data points to each end if the ROI is too short (< 3*min_fwhm), so that peak detection does not run into boundary issues.

Parameters:
  • ROI (list) – representing a region of interest

  • number_of_scans (int) – the number of scans an ROI must possess; if an ROI has fewer scans, 3 data points are added.

asari.peaks.gaussian_function__(x, a, mu, sigma)[source]

Gaussian function.

Parameters:
  • x (float) – input variable.

  • a (float) – constant for magnitude or height

  • mu (float) – constant for center position

  • sigma (float) – constant for standard deviation

Return type:

A computed float value

asari.peaks.get_gaussian_peakarea_on_intensity_list(intensity_list, left, right)[source]

Fit a peak with a Gaussian model and return the peak area.

Parameters:
  • intensity_list (list[np.integer]) – list of intensity values.

  • height (float) – estimated height of a peak

  • apex (float) – estimated apex of a peak

  • left (float) – estimated left bound of a peak

  • right (float) – estimated right bound of a peak

Return type:

Peak area as a float, computed as the Gaussian integral.
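For a Gaussian of height a and standard deviation sigma, the integral over all x is a * sigma * sqrt(2*pi), independent of the center. The sketch below assumes gaussian_function__ has the standard form a * exp(-(x - mu)^2 / (2 * sigma^2)) and checks the closed-form area against a per-scan sum; gaussian and gaussian_area are illustrative names.

```python
import math


def gaussian(x, a, mu, sigma):
    # Assumed form of gaussian_function__: height a, center mu, standard deviation sigma.
    return a * math.exp(-(x - mu) ** 2 / (2 * sigma ** 2))


def gaussian_area(a, sigma):
    # Closed-form integral of the Gaussian over the real line.
    return a * sigma * math.sqrt(2 * math.pi)


# With one data point per scan, summing the curve approximates the integral.
approx = sum(gaussian(x, 1000.0, 50.0, 5.0) for x in range(101))
exact = gaussian_area(1000.0, 5.0)
```

Since the area depends only on a and sigma, a fitted sigma that is too narrow (as noted above for very high peaks) directly underestimates the area.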

asari.peaks.goodness_fitting__(y_orignal, y_fitted)[source]

Return R^2 as the goodness of fit.

asari.peaks.iter_peak_detection_parameters(list_mass_tracks, number_of_scans, parameters, shared_list)[source]

Generate iterables for multiprocessing.Pool.starmap to run elution peak detection.

Parameters:
  • list_mass_tracks (list[tuple]) – list of mass tracks. Asari applies this to composite mass tracks in a full experiment, but generic mass tracks can also be used.

  • number_of_scans (int) – number of scans, usually corresponding to the maximum scan number in RT.

  • parameters (dict) – parameter dictionary passed from main.py, which imports from default_parameters and updates the dict by user arguments.

  • shared_list (list) – list object used to pass data between processes.

Returns:

[(mass_track, number_of_scans, min_peak_height, min_fwhm, min_prominence_threshold, wlen, snr, peakshape, min_prominence_ratio, iteration, min_intensity_threshold, shared_list), …]

Return type:

A list of parameter tuples, one per mass track.
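A hypothetical sketch of building one argument tuple per mass track, in the order listed above. The dictionary keys other than 'gaussian_shape' (min_peak_height, min_timepoints, min_prominence_threshold, wlen, signal_noise_ratio, min_prominence_ratio, min_intensity_threshold) and the fixed iteration value are assumptions, not confirmed asari names.

```python
def iter_detection_params_sketch(list_mass_tracks, number_of_scans, parameters,
                                 shared_list, iteration=1):
    """Hypothetical sketch: one starmap argument tuple per mass track."""
    min_peak_height = parameters['min_peak_height']
    min_fwhm = 0.5 * parameters['min_timepoints']  # half of min_timepoints
    # Documented default: min_peak_height / 3 when not set explicitly.
    min_prominence_threshold = parameters.get('min_prominence_threshold', min_peak_height / 3)
    detection_opts = (parameters['wlen'], parameters['signal_noise_ratio'],
                      parameters['gaussian_shape'], parameters['min_prominence_ratio'])
    return [
        (mass_track, number_of_scans, min_peak_height, min_fwhm,
         min_prominence_threshold, *detection_opts, iteration,
         parameters['min_intensity_threshold'], shared_list)
        for mass_track in list_mass_tracks
    ]
```

With multiprocessing, such a list would typically be consumed by pool.starmap(worker, tuples), with shared_list created via multiprocessing.Manager().list() so that worker processes can append peaks.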

asari.peaks.lowess_smooth_track(list_intensity, number_of_scans)[source]

Smooth data using LOWESS before peak detection; intended for testing. Smoothing reduces the height of narrow peaks in the CMAP, but not the reported values, because peak areas are extracted from each sample afterwards. The likely slight expansion of peak bases can add robustness. smooth_moving_average is preferred for most data, as LOWESS performs poorly on small peaks.

Parameters:
  • list_intensity (list[np.integer]) – list of intensity values for a mass track

  • number_of_scans (int) – the number of scans in the experiment

Return type:

The intensities of the mass track after LOWESS smoothing.
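The effect of smoothing on narrow peaks is easy to see with a plain centered moving average. This generic numpy sketch stands in for smoothing in general; it is not asari's chromatograms.smooth_moving_average or its LOWESS routine, and the window size of 3 is arbitrary.

```python
import numpy as np


def moving_average(list_intensity, size=3):
    """Centered moving average via convolution (generic, not asari's implementation)."""
    kernel = np.ones(size) / size
    # mode='same' keeps the output aligned with, and the same length as, the input.
    return np.convolve(np.asarray(list_intensity, dtype=float), kernel, mode='same')


spike = [0, 0, 0, 9, 0, 0, 0]
smoothed = moving_average(spike)
```

The one-scan spike drops from 9 to 3 while its total is preserved and its base widens to three scans, the same trade-off described above.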

asari.peaks.quick_detect_unique_elution_peak(intensity_track, min_peak_height=100000, min_fwhm=3, min_prominence_threshold_ratio=0.2)[source]

Quick peak detection, only looking for a high peak with high prominence. This can be used for quick check on good peaks, or selecting landmarks for alignment purposes.

Parameters:
  • intensity_track (list[np.integer]) – list of intensity values from a mass track.

  • min_peak_height (float) – minimal peak height required for a peak.

  • min_fwhm (float) – minimal peak width required for a peak.

  • min_prominence_threshold_ratio (float) – required ratio of prominence relative to peak height.

Return type:

A qualified peak in {'apex': xx, ...} format, or None.
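A hypothetical reading of this quick check using scipy.signal.find_peaks: keep peaks that pass the height and width limits and whose prominence is at least min_prominence_threshold_ratio times their height, and return a peak only if exactly one qualifies. Treating "unique" as "exactly one qualifying peak" is an assumption, as is the extra 'height' key in the returned dict.

```python
import numpy as np
from scipy.signal import find_peaks


def quick_unique_peak_sketch(intensity_track, min_peak_height=100000, min_fwhm=3,
                             min_prominence_threshold_ratio=0.2):
    """Hypothetical sketch of a quick, single-peak detection."""
    x = np.asarray(intensity_track, dtype=float)
    # Requesting width also makes find_peaks report prominences.
    peaks, props = find_peaks(x, height=min_peak_height, width=min_fwhm)
    keep = [k for k in range(len(peaks))
            if props['prominences'][k] >= min_prominence_threshold_ratio * props['peak_heights'][k]]
    if len(keep) != 1:
        return None  # no qualified peak, or more than one
    k = keep[0]
    return {'apex': int(peaks[k]), 'height': float(props['peak_heights'][k])}
```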

asari.peaks.stats_detect_elution_peaks(mass_track, number_of_scans, min_peak_height, min_fwhm, min_prominence_threshold, wlen, snr, peakshape, min_prominence_ratio, iteration, min_intensity_threshold, shared_list)[source]

Statistics-guided peak detection. This is the main method in asari for detecting elution peaks on a mass track.

Parameters:
  • mass_track (dict) – {‘id_number’: k, ‘mz’: mz, ‘rt_scan_numbers’: [..], ‘intensity’: [..]}

  • number_of_scans (int) – number of scans in mass track.

  • min_peak_height (float) – as in main parameters.

  • min_fwhm (float) – taken as half of min_timepoints in main parameters.

  • min_prominence_threshold (float) – as in main parameters, default min_peak_height/3.

  • wlen (int) – window size for evaluating prominence in peaks. Important to resolve clustered narrow peaks. Default 25 scans.

  • snr (float) – minimal signal to noise ratio required for a peak.

  • peakshape (float) – parameters[‘gaussian_shape’], minimal shape score required for a peak.

  • min_prominence_ratio (float) – require ratio of prominence relative to peak height, default 0.05.

  • iteration (int) – not used in this function; could be used to iterate on high-intensity ROIs if no peaks are detected in the first round.

  • min_intensity_threshold (float) – as in main parameters.

  • shared_list (list) – list used to exchange data in multiprocessing.

Updates:

shared_list (list[dict]) – list of peaks in JSON format, to pool with batch_deep_detect_elution_peaks.

Note

The mass_track is first cleaned: rescaled, detrended, smoothed and baseline-subtracted as needed. The track-wide noise_level is estimated from the standard deviation of the bottom quartile values. ROIs are then separated on each mass track by a filter (i.e. baseline + noise level), allowing gaps of up to 2 scans when constructing ROIs. In a composite mass track, gaps should not actually exist after combining many samples. The advantages of separating ROIs are 1) better performance on long LC runs, 2) big peaks are less likely to swallow small peaks, and 3) min_prominence_threshold can be recalculated.

The prominence requirement is critical to peak detection. It is dynamically determined based on the noise level in a region. The peakshape is calculated on the cleaned mass track. SNR is computed on local noise (average of up to 100 nonpeak data points on each side of a peak).
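ROI construction as described can be sketched as: mark scans above a cutoff (baseline + noise level), then join runs separated by gaps of at most 2 scans, filling the gap scans into the ROI. split_rois is a hypothetical helper, and filling the gaps is an assumption.

```python
def split_rois(list_intensity, cutoff, gap_allowed=2):
    """Return ROIs as lists of scan indices above cutoff, bridging gaps of <= gap_allowed scans."""
    above = [i for i, v in enumerate(list_intensity) if v > cutoff]
    rois = []
    for i in above:
        # A difference of gap_allowed + 1 means exactly gap_allowed skipped scans.
        if rois and i - rois[-1][-1] <= gap_allowed + 1:
            rois[-1].extend(range(rois[-1][-1] + 1, i + 1))  # fill the gap scans
        else:
            rois.append([i])
    return rois
```

Each ROI is then a contiguous run of scan indices, which can be passed on as rt_numbers_roi together with the matching intensities.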