Thanks for your response, Rob. I will do my best to answer your questions. Please let me know if anything is unclear and more info would help. I appreciate your attention to this!
This is a rather powerful Dell workstation running Ubuntu 22.04 LTS, with a 12-core Intel processor and 503GB RAM.
I'm running as a user with admin privileges but am not using sudo, so as I understand it, these should not be root processes.
In short, we're running some custom Python code to analyze ~1.3GB hyperspectral images, do some linear algebra and output some plots and arrays describing the biochemical composition in these images. This is benchmarked to take 2-4GB of RAM per image. There is one image per job. By default, parallel is running 24 jobs, dual-threading on each of 12 cores... There should be plenty of RAM to run 24 4GB jobs at once.
Since this is an embarrassingly parallel computation and we already use bash scripting in this workflow, I prefer to keep it simple and use GNU Parallel rather than Python parallel frameworks... it always worked great in the past.
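To spell out the memory arithmetic behind "plenty of RAM", here is a small bash sketch. The figures are the ones quoted above; the variable names are only for illustration, and the 4GB value is the upper end of the benchmark, not a measured peak during the crashing runs.

```bash
#!/usr/bin/env bash
# Back-of-the-envelope memory budget using the figures from this post.
# Assumption: every job hits the upper benchmark (4 GB) at the same time.
total_ram_gb=503
jobs=24
max_gb_per_job=4

# Worst-case simultaneous usage and what is left over.
peak_gb=$(( jobs * max_gb_per_job ))
headroom_gb=$(( total_ram_gb - peak_gb ))

echo "Worst-case peak: ${peak_gb} GB of ${total_ram_gb} GB (headroom: ${headroom_gb} GB)"
```

Even in that worst case the jobs would need under a fifth of the installed RAM, which is why I did not expect memory to be the limiting factor.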
# This is what we run to execute the .jobs file
parallel -a $job_list_name
# I have also tried limiting the number of jobs to 20, which also leads to the same crashing problem after a few runs.
parallel --jobs 20 -a $job_list_name
# Here is how we prepare the .jobs file. We produce one job per image, each given its own line in a text file,
# with options set by a bunch of variables in a Jupyter notebook.
# Note, I have also confirmed it still crashes if we run outside of Jupyter.
for file in "$data"/*.hdr
do
    if [[ "$file" != *'hroma'* ]] && [[ "$file" != *'roadband'* ]]; then
        echo "python wrappers/analyze_sample.py \
        --file_path $file \
        --fluorophores ${fluorophores[*]} \
        --min_desired_wavelength ${desired_wavelength_range[0]} \
        --max_desired_wavelength ${desired_wavelength_range[1]} \
        --red_channel ${FalseColor_channels[0]} \
        --green_channel ${FalseColor_channels[1]} \
        --blue_channel ${FalseColor_channels[2]} \
        --red_cap ${FalseColor_caps[0]} \
        --green_cap ${FalseColor_caps[1]} \
        --blue_cap ${FalseColor_caps[2]} \
        --plot 1 \
        --spectral_library_path "$spectral_library_path" \
        --output_dir $output_directory_full \
        --threshold 38" >> "$job_list_name"
    fi
done
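In case it helps rule out a malformed job list, below is a rough sketch of the kind of sanity check that can be run on a .jobs file before handing it to parallel. The demo file and its two entries are hypothetical stand-ins (not our real paths or full option set), and the variable names are made up for illustration.

```bash
#!/usr/bin/env bash
# Sketch: sanity-check a .jobs file before running it.
# The demo file and its two entries are hypothetical stand-ins for a real job list.
demo_jobs=$(mktemp)
printf '%s\n' \
  "python wrappers/analyze_sample.py --file_path sample_a.hdr --threshold 38" \
  "python wrappers/analyze_sample.py --file_path sample_b.hdr --threshold 38" \
  > "$demo_jobs"

# One job per line, so the line count should equal the number of images.
n_jobs=$(grep -c '' "$demo_jobs")

# Every line should start with the python command; anything else would
# suggest a broken line continuation when the file was generated.
# (grep exits non-zero when the count is 0, hence the "|| true".)
n_bad=$(grep -cv '^python ' "$demo_jobs" || true)

echo "jobs: $n_jobs, malformed lines: $n_bad"
rm -f "$demo_jobs"
```

Pointing the same two grep commands at the real job list instead of the demo file gives the job count and flags any line that does not begin with the python command.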
Thanks again!