Tutorial: Workflow Parallelization

Warning: Workflows should be optimized on a test set of images before they are run over a whole dataset. See the VIS workflow tutorial or VIS/NIR tutorial. Our download tool, which talks to a LemnaTec database system, produces a specific file structure that may differ from yours unless you are using our tool. We also provide instructions for running PlantCV over a flat file directory.

Running PlantCV over a PhenoFront image dataset structure

We normally execute workflows in a shell script or in a condor job file (or a dagman workflow if running multiple workflows into one json database).

  • First call the plantcv-workflow.py script that does the parallelization
  • -d is the --directory of images
  • -p is the --workflow that you are going to run over the images, see the VIS tutorial and PSII tutorial
  • -i is the --outdir, your desired location for the output images
  • -a is the --adaptor to indicate structure to grab the metadata from, either 'filename' or the default, which is 'phenofront' (lemnatec structured output)
  • -t is the --type extension, default is 'png'
  • -l is the --delimiter for the filename, default is "_"
  • -C is the --coprocess option, used to coprocess the specified imgtype with the imgtype specified in --match (e.g. coprocess NIR images with VIS).
  • -f is the --meta (metadata) format map, default is "imgtype_camera_frame_zoom_id".
  • -M is the --match metadata option, used to select a certain zoom or angle, for example: 'imgtype:VIS,camera:SV,zoom:z500'
  • -D is the --dates option, used to select a certain date range of data, in the form YYYY-MM-DD-hh-mm-ss_YYYY-MM-DD-hh-mm-ss. If the second date is omitted, the current date is assumed.
  • -j is the --json, json database name
  • -m is the --mask any image mask that you would like to provide
  • -T is the --threads (cpus) you would like to use.
  • -w is the --writeimg option; if set, output images are written. Default is False.
  • -c is the --create option, used to overwrite a json database if it exists. If you are creating a new database or appending to a database, do NOT add the -c flag.
  • -o is the --other_args option, used to pass non-standard options to the workflow script. Must take the form --other_args="--option1 value1 --option2 value2"
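
The --match and --dates values are compact strings, so it can help to see how they break apart. The following is a minimal sketch in Python and not PlantCV's own parser: the function names parse_match, parse_dates, and image_matches are hypothetical, and the handling of a missing end date simply follows the description of -D above.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d-%H-%M-%S"  # corresponds to YYYY-MM-DD-hh-mm-ss


def parse_match(match):
    """Split 'key:value,key:value' into a dictionary of metadata filters."""
    filters = {}
    for pair in match.split(","):
        key, value = pair.split(":", 1)
        filters[key.strip()] = value.strip()
    return filters


def parse_dates(dates, now=None):
    """Split 'start_end' into two datetimes; a missing end date means 'now'."""
    start_text, _, end_text = dates.partition("_")
    start = datetime.strptime(start_text, DATE_FORMAT)
    if end_text:
        end = datetime.strptime(end_text, DATE_FORMAT)
    else:
        end = now or datetime.now()
    return start, end


def image_matches(metadata, filters):
    """Return True when every filter key has the requested value."""
    return all(metadata.get(key) == value for key, value in filters.items())


filters = parse_match("imgtype:VIS,camera:SV,zoom:z500")
print(filters)
print(image_matches({"imgtype": "VIS", "camera": "SV", "zoom": "z500"}, filters))
print(parse_dates("2014-03-14-00-00-00_2014-03-15-00-00-00"))
```

An image is kept only when all listed terms agree, which is why adding more terms to -M narrows the selection.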

If running as a command in a shell script

#!/bin/bash

# Here we are running a VIS top-view workflow

time \
/home/nfahlgren/programs/plantcv/plantcv-workflow.py \
-d /home/nfahlgren/projects/lemnatec/burnin2/images3 \
-p /home/nfahlgren/programs/plantcv/scripts/image_analysis/vis_tv/vis_tv_z300_L1.py \
-t png \
-j burnin2.json \
-i /home/nfahlgren/projects/lemnatec/burnin2/plantcv3/images \
-m /home/nfahlgren/programs/plantcv/masks/vis_tv/mask_brass_tv_z300_L1.png \
-f imgtype_camera_frame_zoom_id \
-M imgtype:VIS,camera:TV,zoom:z300 \
-C NIR \
-T 10 \
-w


# Here we are running a second VIS top-view workflow at a second zoom level

time \
/home/nfahlgren/programs/plantcv/plantcv-workflow.py \
-d /home/nfahlgren/projects/lemnatec/burnin2/images3 \
-p /home/nfahlgren/programs/plantcv/scripts/image_analysis/vis_tv/vis_tv_z1000_L1.py \
-t png \
-j burnin2.json \
-i /home/nfahlgren/projects/lemnatec/burnin2/plantcv3/images \
-m /home/nfahlgren/programs/plantcv/masks/vis_tv/mask_brass_tv_z1000_L1.png \
-f imgtype_camera_frame_zoom_id \
-M imgtype:VIS,camera:TV,zoom:z1000 \
-C NIR \
-T 10 \
-w 
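
The two commands above differ only in the zoom level, so the calls could also be generated programmatically. This is a hedged sketch rather than a PlantCV interface: build_command is a hypothetical helper, the paths are copied from the script above, and the command is only printed, not executed.

```python
import shlex

BASE = "/home/nfahlgren/programs/plantcv"
PROJECT = "/home/nfahlgren/projects/lemnatec/burnin2"


def build_command(zoom, threads=10):
    """Assemble the plantcv-workflow.py argument list for one zoom level."""
    return [
        f"{BASE}/plantcv-workflow.py",
        "-d", f"{PROJECT}/images3",
        "-p", f"{BASE}/scripts/image_analysis/vis_tv/vis_tv_{zoom}_L1.py",
        "-t", "png",
        "-j", "burnin2.json",
        "-i", f"{PROJECT}/plantcv3/images",
        "-m", f"{BASE}/masks/vis_tv/mask_brass_tv_{zoom}_L1.png",
        "-f", "imgtype_camera_frame_zoom_id",
        "-M", f"imgtype:VIS,camera:TV,zoom:{zoom}",
        "-C", "NIR",
        "-T", str(threads),
        "-w",
    ]


for zoom in ("z300", "z1000"):
    command = build_command(zoom)
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(command))
    # To actually run it, one could call: subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems if a path ever contains spaces.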

Example Condor Jobfile

#################################
# HTCondor job description file #
#################################

universe         = vanilla
executable       = /home/mgehan/plantcv/plantcv-workflow.py
arguments        = -d /shares/tmockler_share/mgehan/LemnaTec/bnapus_phenotyping_katie/images-full -p /home/mgehan/kt-greenham-lemnatec/scripts/vis_nir_tv_z500_h2_e10000_brassica.py -j ktbrassica.json -i /home/mgehan/kt-greenham-lemnatec/output/output500 -f imgtype_camera_zoom_lifter_gain_exposure_id -M imgtype:VIS,camera:TV,zoom:z500 -T 16 -C NIR -w
log              = $(Cluster).$(Process).log
output           = $(Cluster).$(Process).out
error            = $(Cluster).$(Process).error
request_cpus     = 16
notification     = always
nice_user        = False
getenv           = true
####################

queue
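
In the job file above, request_cpus equals the value passed to -T, so the scheduler reserves as many CPUs as the workflow will use. The sketch below shows one possible way to keep the two in sync by deriving both from a single variable. The helper make_jobfile and all paths are placeholders for illustration, not part of PlantCV or HTCondor.

```python
def make_jobfile(executable, workflow_args, cpus):
    """Return HTCondor submit-file text with -T and request_cpus kept equal."""
    arguments = f"{workflow_args} -T {cpus}"
    lines = [
        "universe         = vanilla",
        f"executable       = {executable}",
        f"arguments        = {arguments}",
        "log              = $(Cluster).$(Process).log",
        "output           = $(Cluster).$(Process).out",
        "error            = $(Cluster).$(Process).error",
        f"request_cpus     = {cpus}",
        "getenv           = true",
        "",
        "queue",
    ]
    return "\n".join(lines) + "\n"


# Placeholder paths for illustration only.
text = make_jobfile(
    executable="/path/to/plantcv-workflow.py",
    workflow_args="-d /path/to/images -p /path/to/workflow.py "
                  "-j results.json -i /path/to/output -w",
    cpus=16,
)
print(text)
```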

Running PlantCV workflows over a flat directory of images

Note: We will try to update PlantCV so that it can run over flat directories in a more flexible manner. For now, please follow the instructions in this section carefully.

In order for PlantCV to scrape all of the necessary metadata from the image files, image files need to be named in a particular way.

Image names might include:

  1. Plant ID
  2. Timestamp
  3. Measurement/Experiment Label
  4. Image Type
  5. Camera Label
  6. Zoom

Example Name:

AABA002948_2014-03-14 03-29-45_Pilot-031014_VIS_TV_z3500.png

  1. Plant ID = AABA002948
  2. Timestamp = 2014-03-14 03-29-45
  3. Measurement Label = Pilot-031014
  4. Image Type = VIS
  5. Camera Label = TV
  6. Zoom = z3500
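
The breakdown above amounts to splitting the file name on the delimiter and pairing each piece with a metadata term. Here is a minimal sketch of that idea. It is not PlantCV's own implementation: parse_filename is a hypothetical helper, and using the term plantbarcode for the Plant ID field is an assumption made for this example.

```python
import os


def parse_filename(filename, format_map, delimiter="_"):
    """Map each delimited field of an image name to its metadata term."""
    stem, _extension = os.path.splitext(os.path.basename(filename))
    values = stem.split(delimiter)
    terms = format_map.split("_")
    if len(values) != len(terms):
        raise ValueError(
            f"{filename}: expected {len(terms)} fields, found {len(values)}"
        )
    return dict(zip(terms, values))


name = "AABA002948_2014-03-14 03-29-45_Pilot-031014_VIS_TV_z3500.png"
# Assumed format map: 'plantbarcode' stands in for the Plant ID field.
fmt = "plantbarcode_timestamp_measurementlabel_imgtype_camera_zoom"
metadata = parse_filename(name, fmt)
for term, value in metadata.items():
    print(f"{term} = {value}")
```

Note that the delimiter must not occur inside any field; the timestamp here uses a space and hyphens, so splitting on "_" still yields exactly six fields.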

Valid Metadata

Valid metadata that can be collected from filenames are camera, imgtype, zoom, exposure, gain, frame, lifter, timestamp, id, plantbarcode, treatment, cartag, measurementlabel, and other.
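
A format map passed to -f can be checked against this list before a run. The following sketch assumes the list above is complete; unknown_terms is a hypothetical helper and not a PlantCV function.

```python
# Terms copied from the list of valid metadata above.
VALID_TERMS = {
    "camera", "imgtype", "zoom", "exposure", "gain", "frame", "lifter",
    "timestamp", "id", "plantbarcode", "treatment", "cartag",
    "measurementlabel", "other",
}


def unknown_terms(format_map):
    """Return the terms in a '_'-separated format map that are not valid."""
    return [term for term in format_map.split("_") if term not in VALID_TERMS]


print(unknown_terms("imgtype_camera_frame_zoom_id"))  # → []
print(unknown_terms("camera_timestamp_id_other"))     # → []
print(unknown_terms("camera_date_id"))                # → ['date']
```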

Next, run the workflow over a flat directory of images named as described above:

We normally execute workflows as a shell script or as a condor jobfile (or dagman workflow).

#!/bin/bash

# Here we are running a VIS top-view workflow over a flat directory of images

# Image names for this example look like this: cam1_16-08-06-16:45_el1100s1_p19.jpg

/home/mgehan/plantcv/plantcv-workflow.py \
-d /shares/mgehan_share/raw_data/raw_image/2016-08_pat-edger/data/split-round1/split-cam1 \
-a filename \
-p /home/mgehan/pat-edger/round1-python-pipelines/2016-08_pat-edger_brassica-cam1-splitimg.py \
-j edger-round1-brassica.json \
-i /shares/mgehan_share/raw_data/raw_image/2016-08_pat-edger/data/split-round1/split-cam1/output \
-f camera_timestamp_id_other \
-t jpg \
-T 16 \
-w
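
Before launching a long run over a flat directory, it may be worth confirming that every image name has as many fields as the format map. The sketch below is one possible pre-flight check under that assumption. The helper find_misnamed is hypothetical, and the demonstration uses a temporary directory with made-up file names alongside the example name from the script above.

```python
import os
import tempfile


def find_misnamed(directory, format_map, extension="jpg", delimiter="_"):
    """List image files whose field count differs from the format map."""
    expected = len(format_map.split("_"))
    misnamed = []
    for entry in sorted(os.listdir(directory)):
        stem, ext = os.path.splitext(entry)
        if ext.lower() != "." + extension:
            continue  # only files with the chosen extension are checked
        if len(stem.split(delimiter)) != expected:
            misnamed.append(entry)
    return misnamed


# Demonstration on a temporary directory (file names are illustrative).
with tempfile.TemporaryDirectory() as folder:
    for name in (
        "cam1_16-08-06-16:45_el1100s1_p19.jpg",
        "cam1_el1100s1.jpg",
        "notes.txt",
    ):
        open(os.path.join(folder, name), "w").close()
    problems = find_misnamed(folder, "camera_timestamp_id_other")
    print(problems)  # → ['cam1_el1100s1.jpg']
```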

Convert the output JSON file into CSV tables

plantcv-utils.py json2csv -j output.json -c result-table

See Accessory Tools for more information.