Tutorial: Workflow Parallelization

Warning: Optimize workflows on a small test set of images before running them over a whole dataset. See the VIS workflow tutorial or the VIS/NIR tutorial. Our download tool, which talks to a LemnaTec database system, produces a specific file structure that may differ from yours unless you are using our tool. Instructions for running PlantCV over a flat file directory are also provided further down.

Running PlantCV over PhenoFront image data-set structure

We normally execute workflows in a shell script or in a Condor job file (or a DAGMan workflow if running multiple workflows into one JSON database).

  • First call the plantcv-workflow.py script that does the parallelization
  • -d is the --dir directory of images
  • -p is the --workflow that you are going to run over the images, see the VIS tutorial and PSII tutorial
  • -i is the --outdir your desired location for the output images
  • -a is the --adaptor, which indicates the structure to grab the metadata from: either 'filename' or the default, 'phenofront' (LemnaTec structured output)
  • -t is the --type extension 'png' is the default. Any format readable by opencv is accepted such as 'tif' or 'jpg'
  • -l is the --delimiter for the filename, default is "_"
  • -C is the --coprocess option, which coprocesses the specified imgtype with the imgtype specified in --match (e.g. coprocess NIR images with VIS).
  • -f is the --meta (data) format map, for example the default, "imgtype_camera_frame_zoom_id".
  • -M is the --match metadata option, used to select, for instance, a certain zoom or angle. For example: 'imgtype:VIS,camera:SV,zoom:z500'
  • -D is the --dates option, to select a certain date range of data. YYYY-MM-DD-hh-mm-ss_YYYY-MM-DD-hh-mm-ss. If the second date is excluded then the current date is assumed.
  • -j is the --json, json database name
  • -m is the --mask any image mask that you would like to provide
  • -T is the --threads (cpus) you would like to use.
  • -w is the --writeimg option; if True, output images will be written. The default is False
  • -c is the --create option, which overwrites a JSON database if it exists. If you are creating a new database or appending to an existing one, do NOT add the -c flag
  • -o is the --other_args option, used to pass non-standard options to the workflow script. Must take the form --other_args="--option1 value1 --option2 value2"
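To make the --match and --dates string formats concrete, here is a minimal Python sketch of how such strings could be interpreted. It is an illustration only and is not taken from the PlantCV source: the function names parse_match, parse_dates and image_matches are hypothetical, and the handling of a missing second date simply follows the description above.

```python
from datetime import datetime

# Format described for --dates: YYYY-MM-DD-hh-mm-ss
DATE_FORMAT = "%Y-%m-%d-%H-%M-%S"


def parse_match(match):
    """Turn 'key:value,key:value' into a dict of metadata filters (hypothetical helper)."""
    filters = {}
    for pair in match.split(","):
        key, value = pair.split(":", 1)
        filters[key] = value
    return filters


def parse_dates(dates, now=None):
    """Return (start, end); if the second date is absent, fall back to the current time."""
    start_text, _, end_text = dates.partition("_")
    start = datetime.strptime(start_text, DATE_FORMAT)
    if end_text:
        end = datetime.strptime(end_text, DATE_FORMAT)
    else:
        end = now or datetime.now()
    return start, end


def image_matches(metadata, filters):
    """True if every filter key is present in the image metadata with the same value."""
    return all(metadata.get(key) == value for key, value in filters.items())


filters = parse_match("imgtype:VIS,camera:SV,zoom:z500")
print(filters)
print(image_matches({"imgtype": "VIS", "camera": "SV", "zoom": "z500", "id": "1"}, filters))
print(parse_dates("2014-03-14-00-00-00_2014-03-15-00-00-00"))
```

An image whose metadata lacks one of the requested keys, or has a different value for it, would be skipped under this reading of --match.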

If running as a command in a shell script

#!/bin/bash

# Here we are running a VIS top-view workflow

time \
/home/nfahlgren/programs/plantcv/plantcv-workflow.py \
-d /home/nfahlgren/projects/lemnatec/burnin2/images3 \
-p /home/nfahlgren/programs/plantcv/scripts/image_analysis/vis_tv/vis_tv_z300_L1.py \
-t png \
-j burnin2.json \
-i /home/nfahlgren/projects/lemnatec/burnin2/plantcv3/images \
-m /home/nfahlgren/programs/plantcv/masks/vis_tv/mask_brass_tv_z300_L1.png \
-f imgtype_camera_frame_zoom_id \
-M imgtype:VIS,camera:TV,zoom:z300 \
-C NIR \
-T 10 \
-w


# Here we are running a second VIS top-view workflow at a second zoom level

time \
/home/nfahlgren/programs/plantcv/plantcv-workflow.py \
-d /home/nfahlgren/projects/lemnatec/burnin2/images3 \
-p /home/nfahlgren/programs/plantcv/scripts/image_analysis/vis_tv/vis_tv_z1000_L1.py \
-t png \
-j burnin2.json \
-i /home/nfahlgren/projects/lemnatec/burnin2/plantcv3/images \
-m /home/nfahlgren/programs/plantcv/masks/vis_tv/mask_brass_tv_z1000_L1.png \
-f imgtype_camera_frame_zoom_id \
-M imgtype:VIS,camera:TV,zoom:z1000 \
-C NIR \
-T 10 \
-w 
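The two invocations above differ only in the zoom level, which appears in the workflow script name, the mask name and the --match string. As a sketch of one way to avoid the duplication, the Python snippet below builds both argument lists from a single template. The helper name build_command and the constants PLANTCV and PROJECT are made up for this illustration; the paths and flags are copied from the shell script above. The snippet only prints the commands and leaves running them as a commented-out step.

```python
import shlex

# Locations copied from the shell script above
PLANTCV = "/home/nfahlgren/programs/plantcv"
PROJECT = "/home/nfahlgren/projects/lemnatec/burnin2"


def build_command(zoom):
    """Return the plantcv-workflow.py argument list for one zoom level (hypothetical helper)."""
    return [
        f"{PLANTCV}/plantcv-workflow.py",
        "-d", f"{PROJECT}/images3",
        "-p", f"{PLANTCV}/scripts/image_analysis/vis_tv/vis_tv_{zoom}_L1.py",
        "-t", "png",
        "-j", "burnin2.json",
        "-i", f"{PROJECT}/plantcv3/images",
        "-m", f"{PLANTCV}/masks/vis_tv/mask_brass_tv_{zoom}_L1.png",
        "-f", "imgtype_camera_frame_zoom_id",
        "-M", f"imgtype:VIS,camera:TV,zoom:{zoom}",
        "-C", "NIR",
        "-T", "10",
        "-w",
    ]


for zoom in ("z300", "z1000"):
    # Print the command as a shell-quoted string for inspection
    print(shlex.join(build_command(zoom)))
    # To actually run it:
    # import subprocess; subprocess.run(build_command(zoom), check=True)
```

Both commands write to the same burnin2.json and neither passes -c, matching the original script, where the second run adds to the database created by the first.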

Example Batch Script (Windows)

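If you are running on Windows without WSL, you will need to use a batch script. Assuming you are using Anaconda Prompt, make sure you run conda activate plantcv and cd to your project directory first. Note that the script below contains no comments, because a comment line cannot be placed inside a command that is continued across lines with ^. Also, python can only find files in your immediate working directory (even if the file is in your PATH), which is why the example gives the full path to plantcv-workflow.py.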

python.exe ^
%CONDA_PREFIX%\Scripts\plantcv-workflow.py ^
-d C:\Users\nfahlgren\Documents\projects\lemnatec\burnin2\images3 ^
-p C:\Users\nfahlgren\Documents\programs\plantcv\scripts\image_analysis\vis_tv\vis_tv_z300_L1.py ^
-t png ^
-j burnin2.json ^
-i C:\Users\nfahlgren\Documents\projects\lemnatec\burnin2\plantcv3\images ^
-m C:\Users\nfahlgren\Documents\programs\plantcv\masks\vis_tv\mask_brass_tv_z300_L1.png ^
-f imgtype_camera_frame_zoom_id ^
-M imgtype:VIS,camera:TV,zoom:z300 ^
-C NIR ^
-T 10 ^
-w

If saved as run_workflow.cmd you can then execute it in the Anaconda Prompt:

(plantcv) C:\Users\nfahlgren\Documents\projects\lemnatec> run_workflow.cmd

Example Condor Jobfile

#################################
# HTCondor job description file #
#################################

universe         = vanilla
executable       = /home/mgehan/plantcv/plantcv-workflow.py
arguments        = -d /shares/tmockler_share/mgehan/LemnaTec/bnapus_phenotyping_katie/images-full -p /home/mgehan/kt-greenham-lemnatec/scripts/vis_nir_tv_z500_h2_e10000_brassica.py -j ktbrassica.json -i /home/mgehan/kt-greenham-lemnatec/output/output500 -f imgtype_camera_zoom_lifter_gain_exposure_id -M imgtype:VIS,camera:TV,zoom:z500 -T 16 -C NIR -w
log              = $(Cluster).$(Process).log
output           = $(Cluster).$(Process).out
error            = $(Cluster).$(Process).error
request_cpus     = 16
notification     = always
nice_user        = False
getenv           = true
####################

queue

Running PlantCV workflows over a flat directory of images

Note: We plan to update PlantCV so that it can run over flat directories more flexibly. For now, please follow the instructions in this section carefully.

In order for PlantCV to scrape all of the necessary metadata from the image files, image files need to be named in a particular way.

An image name might include:

  1. Plant ID
  2. Timestamp
  3. Measurement/Experiment Label
  4. Image Type
  5. Camera Label
  6. Zoom

Example Name:

AABA002948_2014-03-14 03-29-45_Pilot-031014_VIS_TV_z3500.png

  1. Plant ID = AABA002948
  2. Timestamp = 2014-03-14 03-29-45
  3. Measurement Label = Pilot-031014
  4. Image Type = VIS
  5. Camera Label = TV
  6. Zoom = z3500

Valid Metadata

Valid metadata that can be collected from filenames are camera, imgtype, zoom, exposure, gain, frame, lifter, timestamp, id, plantbarcode, treatment, cartag, measurementlabel, and other.
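The naming scheme above amounts to splitting the filename on the delimiter and pairing each piece with a field name from the --meta format map. The following Python sketch illustrates that idea. It is not PlantCV's own parser: parse_filename is a hypothetical helper, splitting the format map on "_" is an assumption based on the examples in this tutorial, and the format map used for the example name (with the plant ID mapped to id) is likewise an assumption made for illustration.

```python
import os

# Field names listed under "Valid Metadata"
VALID_FIELDS = {
    "camera", "imgtype", "zoom", "exposure", "gain", "frame", "lifter",
    "timestamp", "id", "plantbarcode", "treatment", "cartag",
    "measurementlabel", "other",
}


def parse_filename(filename, meta_format, delimiter="_"):
    """Map the delimited parts of a filename onto the fields of a format map (hypothetical helper)."""
    stem, _extension = os.path.splitext(filename)
    fields = meta_format.split("_")  # assumption: the format map is always "_"-separated
    unknown = [field for field in fields if field not in VALID_FIELDS]
    if unknown:
        raise ValueError(f"Unsupported metadata fields: {unknown}")
    values = stem.split(delimiter)
    if len(values) != len(fields):
        raise ValueError(
            f"{filename} has {len(values)} parts but the format map has {len(fields)} fields"
        )
    return dict(zip(fields, values))


# The example name from this section, with an assumed format map
print(parse_filename(
    "AABA002948_2014-03-14 03-29-45_Pilot-031014_VIS_TV_z3500.png",
    "id_timestamp_measurementlabel_imgtype_camera_zoom",
))

# A shorter name that fits the format map camera_timestamp_id_other
print(parse_filename("cam1_16-08-06-16:45_el1100s1_p19.jpg", "camera_timestamp_id_other"))
```

The sketch also shows why consistent naming matters: a file with a different number of delimited parts than the format map has fields cannot be mapped unambiguously.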

Next, run the workflow over a flat directory of images named as described above.

We normally execute workflows as a shell script or as a Condor job file (or a DAGMan workflow).

#!/bin/bash

# Here we are running a VIS top-view workflow over a flat directory of images

# Image names for this example look like this: cam1_16-08-06-16:45_el1100s1_p19.jpg

/home/mgehan/plantcv/plantcv-workflow.py \
-d /shares/mgehan_share/raw_data/raw_image/2016-08_pat-edger/data/split-round1/split-cam1 \
-a filename \
-p /home/mgehan/pat-edger/round1-python-pipelines/2016-08_pat-edger_brassica-cam1-splitimg.py \
-j edger-round1-brassica.json \
-i /shares/mgehan_share/raw_data/raw_image/2016-08_pat-edger/data/split-round1/split-cam1/output \
-f camera_timestamp_id_other \
-t jpg \
-T 16 \
-w

Convert the output JSON file into CSV tables

plantcv-utils.py json2csv -j output.json -c result-table

This command can also be added as an additional line in the shell script that runs the workflow.

See Accessory Tools for more information.