The pipeline server is located in the S3IT Science Cloud and is accessible from within the UZH network.
To access it from outside the UZH network, it is necessary to use the UZH VPN!
This applies to all activities on the server, e.g. uploading, downloading, managing and mounting the samba share!
To manage the pipeline server instance itself, you have to connect to the Dashboard of the S3IT Science Cloud at https://cloud.s3it.uzh.ch/auth/login/?next=/. Normally, no interaction with the dashboard is necessary. See the Admin Guide for details.
Before you can use the bash scripts for managing the pipeline, you need a terminal running a bash shell. These are included in Mac and Linux, but need to be installed on Windows. Probably the easiest approach is to use the Windows Subsystem for Linux (WSL), which can be installed relatively easily as described here. Please see this How-To Geek article for details on how to run bash scripts after the WSL is installed. In this Linux bash shell on Windows you can execute the bash scripts provided.
Other options for Windows will be added after input from Uriah.
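Once a bash shell is available (e.g. in the WSL), the management scripts are run like any other bash script. A minimal sketch, with a hypothetical script name standing in for the actual scripts:

# make a management script executable and run it
# (manage_pipeline.sh is a placeholder, not an actual pipeline script)
chmod +x ./manage_pipeline.sh
./manage_pipeline.sh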
To be able to remotely log in to the pipeline server, an ssh client is needed. On Mac, Linux and the WSL, one is built in. Another widely used ssh client for Windows is PuTTY.
For any interaction with the pipeline server you have to authenticate. The pipeline server is set up to only accept passwordless logins (except for mounting the SAMBA share), which authenticate using a so-called ssh key that is unique to your computer. This is safer than password authentication and much easier to use once set up.
Before you generate a new one, you should check if you already have one (not impossible) by looking for an existing key from the bash shell (a sketch of such a check is shown below). If this tells you No such file or directory, you have to generate one as described in the S3IT training handout for Mac, Windows or Linux.
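A minimal sketch of such a check, assuming the default key location and file names used by ssh-keygen (the exact commands in the handout may differ):

# check for an existing public ssh key in the default location
ls ~/.ssh/id_*.pub
# if this reports "No such file or directory", generate a new key pair
# (follow the S3IT training handout for the recommended options)
ssh-keygen -t ed25519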
After you have an ssh key for your computer, you have to contact somebody who already has access to the pipeline server to have it added to the instance. You can then interact with the pipeline server either via the local pipeline management scripts (which use ssh) or by logging in directly using ssh (username ubuntu) and executing commands on the pipeline server.
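A minimal sketch of such a direct login; the server address is a placeholder, not the actual hostname or IP of the pipeline server:

# log in to the pipeline server as the ubuntu user
# (replace the placeholder with the actual server address)
ssh ubuntu@<pipeline-server-address>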
The data submitted to the pipeline consists of two folders: the folder 0.raw.data, containing the measured data and measurement-specific metadata, and the folder 00.general.parameter, containing the metadata and some data used by all measurements. The folder structure has to be as follows:
0.raw.data
├── bemovi.mag.16
│ ├── ........_00097.cxd
│ ├── ...
│ ├── ........_00192.cxd
│ ├── bemovi_extract.mag.16.yml
│ ├── svm_video.description.txt
│ ├── svm_video_classifiers_18c_16x.rds
│ └── svm_video_classifiers_increasing_16x_best_available.rds
├── bemovi.mag.25
│ ├── ........_00001.cxd
│ ├── ...
│ ├── ........_00096.cxd
│ ├── bemovi_extract.mag.25.cropped.yml
│ ├── bemovi_extract.mag.25.yml
│ ├── svm_video.description.txt
│ ├── svm_video_classifiers_18c_25x.rds
│ └── svm_video_classifiers_increasing_25x_best_available.rds
├── flowcam
│ ├── 11
│ ├── 12
│ ├── 13
│ ├── 14
│ ├── 15
│ ├── 16
│ ├── 17
│ ├── 21
│ ├── 22
│ ├── 23
│ ├── 24
│ ├── 25
│ ├── 26
│ ├── 27
│ ├── 34
│ ├── 37
│ ├── flowcam_dilution.csv
│ ├── flowcam.yml
│ ├── svm_flowcam_classifiers_18c.rds
│ └── svm_flowcam_classifiers_increasing_best_available.rds
├── flowcytometer
│ ├── ........
│ ├── .........ciplus
│ ├── gates_coordinates.csv
│ └── metadata_flowcytometer.csv
├── manualcount
│ └── .........xlsx
└── o2meter
└── .........csv
00.general.parameter
├── compositions.csv
├── experimental_design.csv
└── sample_metadata.yml
See the document on Teams for the detailed steps necessary to assemble the data and the necessary metadata.
These two folders need to be uploaded to the pipeline server, and the pipeline needs to be started.
There are two approaches to uploading the data to the pipeline server and starting the pipeline afterwards: using the local bash scripts from a local computer, or executing the commands directly on the pipeline server.
The recommended approach is to use the local bash scripts, as this will minimise the likelihood of errors or accidental data loss. Nevertheless, for some actions it might be necessary to work directly on the pipeline server, usually by executing commands in an ssh session.
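For illustration only, a manual upload over ssh could look like the following sketch; the server address and the destination directory are assumptions, and the provided management scripts remain the preferred way to upload:

# copy the two data folders to the pipeline server
# (server address and destination directory are placeholders)
rsync -avz --progress 0.raw.data 00.general.parameter \
    ubuntu@<pipeline-server-address>:~/LEEF/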
After completing the pipeline, the folder LEEF on the pipeline server will look as follows:
./LEEF
├── 0.raw.data
│ ├── bemovi.mag.16
│ ├── bemovi.mag.25
│ ├── flowcam
│ ├── flowcytometer
│ ├── manualcount
│ └── o2meter
├── 00.general.parameter
│ ├── compositions.csv
│ ├── experimental_design.csv
│ └── sample_metadata.yml
├── 1.pre-processed.data
│ ├── bemovi.mag.16
│ ├── bemovi.mag.25
│ ├── flowcam
│ ├── flowcytometer
│ ├── manualcount
│ └── o2meter
├── 2.extracted.data
│ ├── bemovi.mag.16
│ ├── bemovi.mag.25
│ ├── flowcam
│ ├── flowcytometer
│ ├── manualcount
│ └── o2meter
├── 3.archived.data
│ ├── extracted
│ ├── pre_processed
│ └── raw
├── 9.backend
│ ├── LEEF.RRD.sqlite
│ ├── LEEF.RRD_bemovi_master.sqlite
│ ├── LEEF.RRD_bemovi_master_cropped.sqlite
│ ├── LEEF.RRD_flowcam_algae_metadata.sqlite
│ └── LEEF.RRD_flowcam_algae_traits.sqlite
├── log.2021-03-03--15-06-32.fast.done.txt
├── log.2021-03-03--15-06-32.fast.txt
├── log.2021-03-03--15-14-20.bemovi.mag.16.done.txt
├── log.2021-03-03--15-14-20.bemovi.mag.16.error.txt
├── log.2021-03-03--15-14-20.bemovi.mag.16.txt
├── log.2021-03-03--15-14-20.bemovi.mag.25.done.txt
├── log.2021-03-03--15-14-20.bemovi.mag.25.error.txt
└── log.2021-03-03--15-14-20.bemovi.mag.25.txt
The folder 1.pre-processed.data contains the pre-processed data. Pre-processed means that the raw data (0.raw.data) has been converted into open formats where this can be done losslessly, and compressed (in the case of the bemovi videos). All further processing is done with the pre-processed data.
The folder 2.extracted.data contains the data which will be used in the further analysis outside the pipeline. It contains the intermediate extracted data as well as the data which will finally be added to the backend (9.backend). The final extracted data for the backend is in csv format.
In the folder 3.archived.data, data is archived as raw data, pre-processed data, and extracted data. In the respective folders, a folder named after the timestamp specified in sample_metadata.yml is created which contains the actual data. The raw data as well as the pre-processed bemovi data is simply copied over, while the others are stored as .tar.gz archives. For all files, sha256 hashes are calculated to guarantee the integrity of the data.
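As an illustration, the hashes can be recomputed with the standard sha256sum tool; the folder path below is a placeholder:

# compute sha256 hashes for all files below an archive folder
find 3.archived.data -type f -exec sha256sum {} \;
# if a checksum file is available, it can be verified with:
# sha256sum -c <checksum-file>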
The backend (9.backend) consists of the sqlite database LEEF.RRD.sqlite, which contains the Research Ready Data (RRD) of all measurements. This database will be used for the further analysis.
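A minimal sketch of inspecting the backend database with the sqlite3 command line tool; the path is relative to the LEEF folder, and .tables is used because the table names are not listed here:

# list the tables in the Research Ready Data database
sqlite3 9.backend/LEEF.RRD.sqlite ".tables"
# inspect the schema of a table of interest (the table name is a placeholder)
# sqlite3 9.backend/LEEF.RRD.sqlite ".schema <table_name>"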
The top level of the LEEF folder contains the log files for the pipeline runs. In addition, each numbered directory also contains a single log file.
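A quick way to check the logs of a run, based on the log file naming shown above; the paths assume you are in the directory containing the LEEF folder, and the timestamp and step in the file name are placeholders:

# list all run log files (done, error and main logs share the same timestamp)
ls LEEF/log.*.txt
# follow the log of a running step
# tail -f LEEF/log.<timestamp>.<step>.txt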