4.2. Analyze

Functions for the analyze subcommand.

asari.analyze.analyze_single_sample(infile, mz_tolerance_ppm=5, min_intensity=100, min_timepoints=5, min_peak_height=1000, parameters={})

Analyze a single mzML file and print statistics. Used by the asari subcommand analyze. It uses ext_Experiment and the default HMDB data to estimate mass accuracy.

Parameters:
  • infile (str) – input mzML filepath.

  • mz_tolerance_ppm (float, optional, default: 5) – m/z tolerance in parts per million. Used to segregate m/z regions here.

  • min_intensity (float, optional, default: 100) – minimal intensity value to consider, also used to filter out 0s.

  • min_timepoints (int, optional, default: 5) – minimal consecutive scans to be considered real signal.

  • min_peak_height (float, optional, default: 1000) – a bin is not considered if the max intensity < min_peak_height.

  • parameters (dict, optional, default: {}) – not used; just a placeholder for using the ext_Experiment class.
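The three thresholds above can be illustrated on the intensity trace of a single m/z bin. This is a minimal sketch of how such filters might combine, not asari's implementation: the function name `passes_track_filters`, the plain-list input, and the use of `>=` for the intensity cutoff are all assumptions made for illustration.

```python
def passes_track_filters(intensities, min_intensity=100,
                         min_timepoints=5, min_peak_height=1000):
    """Return True if one m/z bin's intensity trace looks like real signal.

    Hypothetical helper; `intensities` is a plain list ordered by scan.
    """
    # A bin is not considered if its max intensity < min_peak_height.
    if not intensities or max(intensities) < min_peak_height:
        return False
    # Find the longest run of consecutive scans at or above min_intensity
    # (this also discards zeros).
    longest = current = 0
    for value in intensities:
        current = current + 1 if value >= min_intensity else 0
        longest = max(longest, current)
    return longest >= min_timepoints
```

For example, `passes_track_filters([0, 150, 300, 1200, 400, 200, 0])` is accepted because five consecutive scans exceed 100 and the apex exceeds 1000.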

asari.analyze.estimate_min_peak_height(list_input_files, mz_tolerance_ppm=5, min_intensity=100, min_timepoints=5, min_peak_height=500, num_files_to_use=3)

Compute the estimated min peak height. This gets the min peak height from the landmark tracks in each file, which is the min of the mass tracks with a paired 13C/12C pattern (based on m/z diff only).

Parameters:
  • list_input_files (list[str]) – input mzML filepaths; only num_files_to_use of them are used.

  • mz_tolerance_ppm (float, optional, default: 5) – m/z tolerance in parts per million. Used to segregate m/z regions here.

  • min_intensity (float, optional, default: 100) – minimal intensity value to consider, also used to filter out 0s.

  • min_timepoints (int, optional, default: 5) – minimal consecutive scans to be considered real signal.

  • min_peak_height (float, optional, default: 500) – a bin is not considered if the max intensity < min_peak_height.

  • num_files_to_use (int, optional, default: 3) – number of files to use, chosen randomly from list_input_files.

Return type:

int, an estimated value for min_peak_height, taken as half of the minimal height among verified landmark peaks.
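The "half of the minimum" rule can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: the function name, the dict input (filepath to a list of landmark track heights), the `seed` argument, and taking the smallest per-file minimum across the chosen files are all hypothetical choices. The documentation above does not specify how per-file values are combined.

```python
import random

def estimate_min_peak_height_sketch(landmark_heights_by_file,
                                    num_files_to_use=3, seed=None):
    """Hypothetical sketch: half of the minimal landmark height.

    landmark_heights_by_file maps a filepath to the heights of its
    landmark tracks (assumed to be extracted already).
    """
    rng = random.Random(seed)
    files = sorted(landmark_heights_by_file)
    # Use at most num_files_to_use randomly chosen files.
    chosen = rng.sample(files, min(num_files_to_use, len(files)))
    # Minimal landmark height within each chosen file.
    per_file_min = [min(landmark_heights_by_file[f]) for f in chosen]
    # Assumption: combine files by taking the smallest value, then halve.
    return int(min(per_file_min) / 2)
```

With a minimal landmark height of 2,334, as in the example output further down this page, the sketch would recommend 1167.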

asari.analyze.ext_estimate_min_peak_height(list_input_files, mz_tolerance_ppm=5, min_intensity=100, min_timepoints=5, min_peak_height=500, num_files_to_use=3)

Extended estimate_min_peak_height for Xasari use.

Parameters:
  • list_input_files (list[str]) – input mzML filepaths; only num_files_to_use of them are used.

  • mz_tolerance_ppm (float, optional, default: 5) – m/z tolerance in parts per million. Used to segregate m/z regions here.

  • min_intensity (float, optional, default: 100) – minimal intensity value to consider, also used to filter out 0s.

  • min_timepoints (int, optional, default: 5) – minimal consecutive scans to be considered real signal.

  • min_peak_height (float, optional, default: 500) – a bin is not considered if the max intensity < min_peak_height.

  • num_files_to_use (int, optional, default: 3) – number of files to use, chosen randomly from list_input_files.

Returns:

  • A dict mapping ion mode to the recommended min_peak_height.

  • The latter is an estimated value for min_peak_height, taken as half of the minimal height among verified landmark peaks.
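The shape of such a return value might look like the sketch below, which groups per-file results by ion mode. The function name, the `(ion_mode, min_landmark_height)` input pairs, and the `"pos"`/`"neg"` keys are assumptions for illustration; the actual keys and aggregation used by asari are not stated here.

```python
from collections import defaultdict

def recommend_by_ion_mode(file_stats):
    """Hypothetical sketch: ion mode -> recommended min_peak_height.

    file_stats is an iterable of (ion_mode, min_landmark_height) pairs,
    one pair per analyzed file.
    """
    lowest = defaultdict(lambda: float("inf"))
    for ion_mode, height in file_stats:
        # Track the smallest landmark height seen for each ion mode.
        lowest[ion_mode] = min(lowest[ion_mode], height)
    # Recommend half of that minimum, as an int, for each mode.
    return {mode: int(height / 2) for mode, height in lowest.items()}
```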

asari.analyze.get_file_masstrack_stats(infile, mz_tolerance_ppm=5, min_intensity=100, min_timepoints=5, min_peak_height=1000, return_sample=False)

Extract mass tracks from a file and get statistics. The ionization_mode is inferred from one scan, so polarity switching within a single file is not supported. Landmark tracks are the mass tracks with a paired 13C/12C pattern (based on m/z diff only).

Parameters:
  • infile (str) – input mzML filepath.

  • mz_tolerance_ppm (float, optional, default: 5) – m/z tolerance in parts per million. Used to segregate m/z regions here.

  • min_intensity (float, optional, default: 100) – minimal intensity value to consider, also used to filter out 0s.

  • min_timepoints (int, optional, default: 5) – minimal consecutive scans to be considered real signal.

  • min_peak_height (float, optional, default: 1000) – a bin is not considered if the max intensity < min_peak_height.

  • return_sample (bool, optional, default: False) – if True, return full sample dictionary with mass tracks. Else, return _mz_landmarks_, ionization_mode, min_peak_height_.
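Pairing mass tracks by the 13C/12C m/z difference alone can be sketched as below. This is not the library's implementation: the function name, the singly charged assumption, and applying the ppm tolerance to the expected 13C m/z are illustrative choices. The constant 1.003355 is the mass difference between 13C and 12C in Da.

```python
from bisect import bisect_left, bisect_right

C13_C12_DIFF = 1.003355  # mass difference between 13C and 12C, in Da

def find_isotope_pairs(mz_values, mz_tolerance_ppm=5):
    """Return (mz_12C, mz_13C) pairs matched on m/z difference only.

    Hypothetical helper; assumes singly charged ions.
    """
    mzs = sorted(mz_values)
    pairs = []
    for mz in mzs:
        target = mz + C13_C12_DIFF
        # Convert the ppm tolerance to an absolute m/z window.
        tol = target * mz_tolerance_ppm * 1e-6
        lo = bisect_left(mzs, target - tol)
        hi = bisect_right(mzs, target + tol)
        pairs.extend((mz, partner) for partner in mzs[lo:hi])
    return pairs
```

Because only the m/z difference is checked, unrelated tracks that happen to sit 1.003355 apart would also be paired; a stricter check could additionally compare intensity ratios or retention times.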

Note

Example output:

Total number of MS1 spectra: 741
of which 0 are positive ionization mode.

Assuming ionization mode is neg.

Maxium retention time (sec): 299.818228
m/z range: (min 80.011578, median 358.010062, max 997.616794)

Found 14063 mass tracks.
Found 4054 12C/13C isotopic pairs as landmarks.
Max intensity in any landmark track:  687,714,048
Minimal height of landmark tracks:  2,334

Mass accuracy was estimated on 124 matched values as -1.8 ppm.

To-do: add output info on instrumentation.
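The mass accuracy line in the example output reports a signed error in ppm. A minimal sketch of that calculation follows, assuming the matched values are (observed, theoretical) m/z pairs and that they are summarized by their median. Both the function names and the choice of median are assumptions, since the aggregation method is not documented here.

```python
from statistics import median

def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6

def estimate_mass_accuracy(matches):
    """Hypothetical sketch: median ppm error over matched m/z pairs.

    matches is a list of (observed_mz, theoretical_mz) tuples.
    """
    return median(ppm_error(obs, theo) for obs, theo in matches)
```

Under this sign convention, a negative result means the observed m/z values tend to fall below the reference values.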