4.2. Analyze
Functions for the analyze subcommand
- asari.analyze.analyze_single_sample(infile, mz_tolerance_ppm=5, min_intensity=100, min_timepoints=5, min_peak_height=1000, parameters={})[source]
Analyze a single mzML file and print statistics. Used by the asari subcommand analyze. This uses ext_Experiment and the default HMDB data to estimate mass accuracy.
- Parameters:
infile (str) – input mzML filepath.
mz_tolerance_ppm (float, optional, default: 5) – m/z tolerance in parts per million. Used here to segregate m/z regions.
min_intensity (float, optional, default: 100) – minimal intensity value to consider, also used to filter out 0s.
min_timepoints (int, optional, default: 5) – minimal consecutive scans to be considered real signal.
min_peak_height (float, optional, default: 1000) – a bin is not considered if the max intensity < min_peak_height.
parameters (dict, optional, default: {}) – not used; a placeholder required by the ext_Experiment class.
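The three thresholds above can be illustrated on a toy mass track. This is a minimal sketch, not asari's implementation: the helper names `ppm_window`, `longest_run_above` and `passes_filters` are hypothetical, and treating values equal to a threshold as passing is an assumption based only on the parameter descriptions.

```python
def ppm_window(mz, mz_tolerance_ppm=5):
    """Return the (low, high) m/z bounds within the given ppm tolerance."""
    delta = mz * mz_tolerance_ppm * 1e-6
    return mz - delta, mz + delta


def longest_run_above(intensities, min_intensity):
    """Length of the longest run of consecutive scans at or above min_intensity."""
    best = current = 0
    for value in intensities:
        current = current + 1 if value >= min_intensity else 0
        best = max(best, current)
    return best


def passes_filters(intensities, min_intensity=100, min_timepoints=5,
                   min_peak_height=1000):
    """Hypothetical check mirroring the documented thresholds."""
    # A bin is not considered if its max intensity is below min_peak_height.
    if max(intensities, default=0) < min_peak_height:
        return False
    # Require enough consecutive scans above the intensity floor.
    return longest_run_above(intensities, min_intensity) >= min_timepoints


track = [0, 150, 400, 1200, 800, 300, 0]
print(ppm_window(200.0))
print(passes_filters(track))
```

At m/z 200, a 5 ppm tolerance corresponds to a window of about ±0.001.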
- asari.analyze.estimate_min_peak_height(list_input_files, mz_tolerance_ppm=5, min_intensity=100, min_timepoints=5, min_peak_height=500, num_files_to_use=3)[source]
Compute the estimated min peak height. This gets the min peak height from the landmark tracks in each file, i.e. the minimum over the mass tracks with a paired 13C/12C pattern (based on m/z difference only).
- Parameters:
list_input_files (list[str]) – input mzML filepaths; only num_files_to_use of them are used.
mz_tolerance_ppm (float, optional, default: 5) – m/z tolerance in parts per million. Used here to segregate m/z regions.
min_intensity (float, optional, default: 100) – minimal intensity value to consider, also used to filter out 0s.
min_timepoints (int, optional, default: 5) – minimal consecutive scans to be considered real signal.
min_peak_height (float, optional, default: 500) – a bin is not considered if the max intensity < min_peak_height.
num_files_to_use (int, optional, default: 3) – number of files to use, chosen randomly from list_input_files.
- Return type:
int – the estimated min_peak_height parameter, taken as half of the minimum height of the verified landmark peaks.
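The halving rule can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the function name `sketch_estimate_min_peak_height` and the input dictionary of precomputed per-file minimal landmark heights are hypothetical, and combining the per-file values with a median is an assumption, since the text above does not say how files are aggregated.

```python
import random
from statistics import median


def sketch_estimate_min_peak_height(per_file_min_heights, num_files_to_use=3,
                                    seed=None):
    """Halve the (assumed) median of per-file minimal landmark heights.

    per_file_min_heights: hypothetical dict mapping a file path to the
    minimal landmark track height already measured in that file.
    """
    rng = random.Random(seed)
    files = sorted(per_file_min_heights)
    # Use a random subset when more files are supplied than requested.
    if len(files) > num_files_to_use:
        files = rng.sample(files, num_files_to_use)
    minima = [per_file_min_heights[f] for f in files]
    # Assumption: aggregate across files with the median, then halve.
    return int(0.5 * median(minima))


heights = {"a.mzML": 2334, "b.mzML": 2000, "c.mzML": 3000}
print(sketch_estimate_min_peak_height(heights))
```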
- asari.analyze.ext_estimate_min_peak_height(list_input_files, mz_tolerance_ppm=5, min_intensity=100, min_timepoints=5, min_peak_height=500, num_files_to_use=3)[source]
Extended estimate_min_peak_height for Xasari use.
- Parameters:
list_input_files (list[str]) – input mzML filepaths; only num_files_to_use of them are used.
mz_tolerance_ppm (float, optional, default: 5) – m/z tolerance in parts per million. Used here to segregate m/z regions.
min_intensity (float, optional, default: 100) – minimal intensity value to consider, also used to filter out 0s.
min_timepoints (int, optional, default: 5) – minimal consecutive scans to be considered real signal.
min_peak_height (float, optional, default: 500) – a bin is not considered if the max intensity < min_peak_height.
num_files_to_use (int, optional, default: 3) – number of files to use, chosen randomly from list_input_files.
- Returns:
A dict mapping ion mode to the recommended min_peak_height.
The recommended value is an estimate, taken as half of the minimum height of the verified landmark peaks.
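The shape of the returned dictionary can be illustrated with a small sketch. Everything here is an assumption made for illustration: the function name `sketch_ext_estimate`, the input list of (ion mode, minimal landmark height) tuples, the mode keys "pos" and "neg", and the use of a median to combine files within one mode.

```python
from collections import defaultdict
from statistics import median


def sketch_ext_estimate(per_file_stats):
    """Group per-file minimal landmark heights by ion mode and halve.

    per_file_stats: hypothetical list of (ionization_mode, min_height)
    tuples, one per analyzed file.
    """
    by_mode = defaultdict(list)
    for mode, min_height in per_file_stats:
        by_mode[mode].append(min_height)
    # Assumption: median within each mode, then half of that value.
    return {mode: int(0.5 * median(values)) for mode, values in by_mode.items()}


stats = [("pos", 3000), ("neg", 2334), ("pos", 5000)]
print(sketch_ext_estimate(stats))
```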
See also: estimate_min_peak_height
- asari.analyze.get_file_masstrack_stats(infile, mz_tolerance_ppm=5, min_intensity=100, min_timepoints=5, min_peak_height=1000, return_sample=False)[source]
Extract mass tracks from a file and get statistics. The ionization_mode is inferred from a single scan, so polarity switching within a single file is not supported. Landmark tracks are the mass tracks with a paired 13C/12C pattern (based on m/z difference only).
- Parameters:
infile (str) – input mzML filepath.
mz_tolerance_ppm (float, optional, default: 5) – m/z tolerance in parts per million. Used here to segregate m/z regions.
min_intensity (float, optional, default: 100) – minimal intensity value to consider, also used to filter out 0s.
min_timepoints (int, optional, default: 5) – minimal consecutive scans to be considered real signal.
min_peak_height (float, optional, default: 1000) – a bin is not considered if the max intensity < min_peak_height.
return_sample (bool, optional, default: False) – if True, return the full sample dictionary with mass tracks. Otherwise, return _mz_landmarks_, ionization_mode, min_peak_height_.
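Pairing tracks by m/z difference only can be sketched as below. This is not asari's landmark detection: the function name `find_isotope_pairs` is hypothetical, the sketch assumes singly charged ions, and it applies the ppm tolerance to the expected 13C m/z.

```python
from bisect import bisect_left, bisect_right

# Mass difference between 13C and 12C, in Da.
C13_C12_DIFF = 1.003355


def find_isotope_pairs(mz_values, mz_tolerance_ppm=5):
    """Return (mz_12C, mz_13C) pairs whose m/z difference matches 13C/12C."""
    mzs = sorted(mz_values)
    pairs = []
    for mz in mzs:
        target = mz + C13_C12_DIFF
        delta = target * mz_tolerance_ppm * 1e-6
        # Binary search for all m/z values inside the tolerance window.
        lo = bisect_left(mzs, target - delta)
        hi = bisect_right(mzs, target + delta)
        pairs.extend((mz, partner) for partner in mzs[lo:hi])
    return pairs


print(find_isotope_pairs([180.0634, 181.066755, 200.0, 250.5]))
```

Because only the m/z difference is checked, a pair found this way is a candidate landmark rather than a confirmed isotopologue.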
Note
Example output:
    Total number of MS1 spectra: 741 of which 0 are positive ionization mode.
    Assuming ionization mode is neg.
    Maxium retention time (sec): 299.818228
    m/z range: (min 80.011578, median 358.010062, max 997.616794)
    Found 14063 mass tracks.
    Found 4054 12C/13C isotopic pairs as landmarks.
    Max intensity in any landmark track: 687,714,048
    Minimal height of landmark tracks: 2,334
    Mass accuracy was estimated on 124 matched values as -1.8 ppm.
To-do: add output information on instrumentation.
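A mass accuracy figure in ppm, like the one in the example output, can be derived from matched observed and theoretical m/z values. The following is a minimal sketch under assumptions: the names `ppm_error` and `estimate_mass_accuracy` are hypothetical, the sign convention is observed minus theoretical, and the median is used as the summary statistic; how asari matches values against HMDB is not shown.

```python
from statistics import median


def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million (observed minus theoretical)."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def estimate_mass_accuracy(matched_pairs):
    """Summarize ppm errors over (observed, theoretical) pairs.

    Assumption: the median is used as a robust summary.
    """
    return median(ppm_error(obs, theo) for obs, theo in matched_pairs)


matched = [(199.9996, 200.0), (100.0, 100.0), (300.0006, 300.0)]
print(round(estimate_mass_accuracy(matched), 3))
```

Under this convention, a negative value means the observed m/z values tend to fall below the theoretical ones.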