The repository local_pipeline_management in the LEEF-UZH organisation on GitHub contains the bash functions to manage the pipeline remotely. These commands run in the Linux terminal as well as in the Mac terminal. (Windows has not been checked yet!)
To use these commands, you can either download the repository as a zip file and unzip it somewhere, or clone the repository using git. Cloning is slightly more complicated, but makes it easier to update the local commands from the GitHub repo.
To clone the commands, do the following:
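```bash
# Clone the repository from the LEEF-UZH organisation on GitHub
git clone https://github.com/LEEF-UZH/local_pipeline_management.git
```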
which will create a directory called local_pipeline_management. When downloading the zip file, you have to extract it, which will create a directory called local_pipeline_management-main. The contents of these two directories are identical for the purposes of the discussion here.
Inside this directory is a directory called bin, which contains the scripts to manage the pipeline remotely. The commands are:
server
check_connection
upload
prepare
start
status
wait_till_done
download
download_logs
download_RRD
report_diag
report_interactive
archive
clean
do_all
To execute these commands, you have to be either in the directory where the commands are located, or the directory has to be in the path. If they are not in the path, you have to prepend ./ to the command, e.g. ./upload -h instead of upload -h when they are in the path. For this tutorial, I will put them in the path.
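One way of putting them in the path for the current terminal session is, for example (the path below depends on where the repository was cloned or extracted):

```bash
# Add the repository's bin directory to the PATH for this session;
# adjust the path to where the repository actually resides
export PATH="$PATH:$HOME/local_pipeline_management/bin"
```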
All commands contain a basic usage help, which can be called by using the -h or --help argument, e.g. ./upload -h.
We will now go through the available commands and explain what they do and how they can be used. Finally, we will show a basic workflow on how to upload data, start the pipeline, download results, and prepare the pipeline server for the next run.
server
The command server returns the address of the pipeline server. When the address of the pipeline server changes, you can open the script in a text editor and simply replace the address in the last line with the new address.
A typical usage would be
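```bash
# Print the address of the pipeline server
server
```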
check_connection
Checks the reachability of the server and verifies the credentials, i.e. whether you can execute the commands successfully.
A typical usage would be
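```bash
# Verify that the pipeline server is reachable and the credentials work
check_connection
```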
upload
This command uploads data to the pipeline server. The most common usage is to upload the data for a pipeline run. This is done by specifying the directory in which the 00.general.parameter and 0.raw.data directories reside locally.
The copying could also be done by mounting the leef_data as a samba share, but it would be slower.
A typical usage would be to upload the folder ./20210101 into the folder Incoming on the pipeline server, for example (the argument order shown below is an assumption; check upload -h):
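```bash
# Hypothetical invocation -- argument order assumed, verify with `upload -h`
upload ./20210101 Incoming
```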
prepare
Copies the data from within the uploaded folder into the LEEF folder, where it can be processed by the pipeline. Before copying the data, folder leftovers from earlier pipeline runs are deleted by running the clean script.
A typical usage, assuming the name of the uploaded folder is passed as the argument (check prepare -h), would be
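```bash
# Hypothetical invocation -- the argument is an assumption, see `prepare -h`
prepare 20210101
```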
start
The pipeline consists of three actual pipelines:

- bemovi.mag.16 - bemovi magnification 16
- bemovi.mag.25 - bemovi magnification 25
- fast - remaining measurements

The typical usage is to run all of them (first fast, and afterwards the bemovi pipelines) by providing the argument all.
During the pipeline runs, logfiles are created in the pipeline folder. These have the extension .txt:

- the general log file, which should be looked at to make sure that there are no errors; errors should be logged in the error.txt file.
- done.txt - this file contains the timing info and is created at the end of the pipeline.

These files are created for each pipeline run, named as above.
A typical usage would be
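```bash
# Run all three pipelines (first fast, then the bemovi pipelines)
start all
```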
status
The status returned is only reliable when the pipeline is started using start. When started manually from the pipeline server (or via ssh), the status will not be reported correctly.
A typical usage would be
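```bash
# Report the status of the pipeline run
status
```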
wait_till_done
Waits and displays a spinning symbol (spinning every five minutes) until the pipeline is finished.
Interruption of this command will not interrupt the pipeline!
A typical usage would be
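```bash
# Block until the pipeline is finished; interrupting this command
# does not interrupt the pipeline itself
wait_till_done
```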
download
Downloads files or folders from the LEEF directory on the pipeline server. If you want to download files from other folders, use .. to move one directory up. For example, ../Incoming would download the whole Incoming directory.
A typical usage would be
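```bash
# Download the whole Incoming directory (one level above LEEF);
# the remote path is assumed to be the first argument -- see `download -h`
download ../Incoming
```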
download_logs
This is a specialised version of the download command. It downloads the log files into the directory ./pipeline_logs.
A typical usage would be
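```bash
# Fetch the pipeline log files into ./pipeline_logs
download_logs
```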
download_RRD
This is a specialised version of the download command. It downloads the RRD (Research Ready Data), either only the main database or the complete set. Downloading all RRD can take a long time!
A typical usage, assuming that without further arguments only the main database is downloaded (check download_RRD -h), would be
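```bash
# Download only the main RRD database (assumed default behaviour)
download_RRD
```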
report_diag
Creates a diagnostic report of the RRD database and opens it. The second parameter specifies the format of the report. Supported at the moment are html, pdf and word.
A typical usage, with html as the format, would be (the first parameter is not documented here; check report_diag -h):
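```bash
# Hypothetical invocation: <first-argument> is a placeholder for the
# undocumented first parameter; the second parameter selects the format
report_diag <first-argument> html
```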
report_interactive
Creates an interactive report of the RRD database and opens it in the web browser.
A typical usage would be
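```bash
# Build the interactive report and open it in the web browser
report_interactive
```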
archive
Moves all content in the folder LEEF/3.archived.data to the container LEEF.archived.data and copies the content of the folder LEEF/9.backend to the container LEEF.backend on the S3 Swift Object Storage. The transfer uses the swift command.
A typical usage would be
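```bash
# Move archived data and copy the backend to the object storage
archive
```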
clean
Deletes all raw data and results folders from the pipeline. The folders containing the archived data as well as the backend (containing the Research Ready Data databases) are not deleted!
This script is run automatically when the script prepare is executed.
The script asks for confirmation before deleting anything!
A typical usage would be
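```bash
# Remove raw data and results from the pipeline (asks for confirmation)
clean
```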
do_all
This is a convenience function which executes the following commands in order:
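Judging from the description below, these are presumably upload, prepare, start, wait_till_done, download_logs, download_RRD and report_diag.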
A typical usage would be
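```bash
# Run the complete workflow for the data in ./20210101
do_all ./20210101
```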
which runs the pipeline using the data in ./20210101, downloads the logs and the RRD, and opens the diagnostic report.
A typical workflow for the pipeline consists of the steps outlined below. It assumes that the pipeline folder is complete, as described in the section Raw Data Folder Structure for the Pipeline in the document 01 Background LEEF Data.
Let’s assume that one sampling day is complete and all data has been collected in the folder ./20210401. The local preparations are covered in the document LINK.
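Assuming the argument syntax sketched earlier (verify with the -h option of each command), the first step would be:

```bash
# Hypothetical invocations -- argument syntax assumed,
# see `upload -h` and `prepare -h`
upload ./20210401 Incoming
prepare 20210401
```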
This will upload the data folder ./20210401 and prepare the pipeline to process that data.
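```bash
# Start all pipelines
start all
```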
This will start the pipeline processing, check if it is running, and give a message accordingly.
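```bash
wait_till_done
```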
This will then wait until the pipeline is finished and display a spinning symbol.
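```bash
download_logs
```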
This will download the log files, which can be viewed to assess the progress and possible errors.
The logs should be checked, and if everything is fine, the RRD can be downloaded by using
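```bash
# Download only the main RRD database (assumed default, see above)
download_RRD
```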
or, for the complete set of RRD,
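(selecting the complete set via an argument is an assumption; check download_RRD -h)

```bash
# Hypothetical invocation for the complete set of RRD
download_RRD all
```

Afterwards, running (with the placeholder as discussed under report_diag)

```bash
report_diag <first-argument> html
```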
will create and open an html report of the RRD database, which can be used to evaluate whether the measurements and the pipeline provided consistent results that can be used for further analysis.
Only if the previous evaluation is successful should the pipeline data be archived, i.e. moved to a different storage, by using
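```bash
# Move the data to the archive storage
archive
```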
It is important to note the following points:
- Leftovers from earlier runs can remain in the 0.raw.data, 1.pre-processed.data or the 2.extracted.data folder. You will recognise them when they are there.
- 3.archived.data and 9.backend must not be deleted, as data is added to them during each run and they are managed by the pipeline (TODO).