01 Background LEEF Data

Initial Remarks and Prerequisites

The pipeline server is located in the S3IT Science Cloud and is accessible from within the UZH network.

To access it from outside the UZH network, it is necessary to use the UZH VPN!

This applies to all activities on the server, e.g. uploading, downloading, managing and mounting the samba share!

Manage Pipeline Server Instance

To manage the pipeline server instance itself, you have to connect to the Dashboard of the S3IT Science Cloud at https://cloud.s3it.uzh.ch/auth/login/?next=/. Normally, no interaction with the dashboard is necessary. See the Admin Guide for details.

bash Shell

Before you can use the bash scripts for managing the pipeline, you need a terminal running a bash shell. One is included on Mac and Linux, but it needs to be installed on Windows. Probably the easiest approach is to use the Windows Subsystem for Linux (WSL), which can be installed relatively easily as described here. Please see this How-To Geek article for details on how to run bash scripts once the WSL is installed. In this Linux bash shell on Windows you can execute the provided bash scripts.
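
As a quick check after installing the WSL, you can run bash from a Windows command prompt; the script name below is purely illustrative, not one of the actual pipeline scripts:

wsl bash --version
wsl bash ./pipeline_script.sh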

Other options for Windows: to be added after input from Uriah.

ssh Client

To be able to log in to the pipeline server remotely, an ssh client is needed. On Mac, Linux, and the WSL it is built in. A widely used ssh client for Windows is provided by putty.

ssh keys

For any interaction with the pipeline server you have to authenticate. The pipeline server is set up to only accept passwordless logins (except for mounting the SAMBA share), which authenticate using a so-called ssh key that is unique to your computer. This is safer than password authentication and much easier to use once set up.

Before you generate a new one, you should check if you already have one (which is quite possible) by executing

cat ~/.ssh/id_rsa.pub

from the bash shell. If it tells you No such file or directory, you have to generate one as described in the S3IT training handout for Mac, Windows or Linux.
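
If you do need to generate one, the handout essentially boils down to a single command (shown here for an RSA key to match the id_rsa.pub file name above; accept the default file location when prompted):

ssh-keygen -t rsa -b 4096 -C "your.name@uzh.ch"

Afterwards, the cat command above prints the public key, which is the part you hand over in the next step.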

After you have an ssh key for your computer, you have to contact somebody who already has access to the pipeline server to have it added to the instance. You can then interact with the pipeline server either via the local pipeline management scripts (which use ssh) or by logging in directly via ssh (username ubuntu) and executing commands on the pipeline server.
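
Once your key has been added, a direct login looks like this (the server address is a placeholder here; the real one is documented internally):

ssh ubuntu@<pipeline-server-address>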

LEEF Data

Measurements in the Pipeline

  • bemovi (see link for manual)
  • flowcam (see link for manual)
  • flowcytometer (see link for manual)
  • manual count (see link for manual)
  • O2 meter (see link for manual)

Raw Data Folder Structure for the Pipeline

The data submitted to the pipeline consists of two folders: the folder 0.raw.data containing the measured data and measurement-specific metadata, and the folder 00.general.parameter containing the metadata and some data used by all measurements. The folder structure has to be as follows:

0.raw.data
├── bemovi.mag.16
│   ├── ........_00097.cxd
│   ├── ...
│   ├── ........_00192.cxd
│   ├── bemovi_extract.mag.16.yml
│   ├── svm_video.description.txt
│   ├── svm_video_classifiers_18c_16x.rds
│   └── svm_video_classifiers_increasing_16x_best_available.rds
├── bemovi.mag.25
│   ├── ........_00001.cxd
│   ├── ...
│   ├── ........_00096.cxd
│   ├── bemovi_extract.mag.25.cropped.yml
│   ├── bemovi_extract.mag.25.yml
│   ├── svm_video.description.txt
│   ├── svm_video_classifiers_18c_25x.rds
│   └── svm_video_classifiers_increasing_25x_best_available.rds
├── flowcam
│   ├── 11
│   ├── 12
│   ├── 13
│   ├── 14
│   ├── 15
│   ├── 16
│   ├── 17
│   ├── 21
│   ├── 22
│   ├── 23
│   ├── 24
│   ├── 25
│   ├── 26
│   ├── 27
│   ├── 34
│   ├── 37
│   ├── flowcam_dilution.csv
│   ├── flowcam.yml
│   ├── svm_flowcam_classifiers_18c.rds
│   └── svm_flowcam_classifiers_increasing_best_available.rds
├── flowcytometer
│   ├── ........
│   ├── .........ciplus
│   ├── gates_coordinates.csv
│   └── metadata_flowcytometer.csv
├── manualcount
│   └── .........xlsx
└── o2meter
    └── .........csv
00.general.parameter
├── compositions.csv
├── experimental_design.csv
└── sample_metadata.yml

See the document on Teams for the detailed steps necessary to assemble the data and the necessary metadata.

These two folders need to be uploaded to the pipeline server, and the pipeline needs to be started afterwards.
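
Before uploading, it is worth double-checking the structure locally, e.g. with tree (or with ls -R if tree is not installed):

tree -L 1 0.raw.data 00.general.parameter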

Uploading and managing the Pipeline

There are two approaches to uploading the data to the pipeline server and starting the pipeline afterwards: using the local bash scripts from a local computer, or executing the commands directly on the pipeline server.

The recommended approach is to use the local bash scripts, as this will minimise the likelihood of errors or accidental data loss. Nevertheless, for some actions it might be necessary to work directly on the pipeline server, usually by executing commands in an ssh session.
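
The local scripts wrap the transfer for you; should you ever need to upload the two folders by hand, a sketch using rsync over ssh could look like this (the server address and target path are assumptions, the management scripts remain the authoritative reference):

rsync -avz --progress 0.raw.data 00.general.parameter ubuntu@<pipeline-server-address>:LEEF/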

Folder structure after pipeline

After completing the pipeline, the folder LEEF on the pipeline server will look as follows:

./LEEF
├── 0.raw.data
│   ├── bemovi.mag.16
│   ├── bemovi.mag.25
│   ├── flowcam
│   ├── flowcytometer
│   ├── manualcount
│   └── o2meter
├── 00.general.parameter
│   ├── compositions.csv
│   ├── experimental_design.csv
│   └── sample_metadata.yml
├── 1.pre-processed.data
│   ├── bemovi.mag.16
│   ├── bemovi.mag.25
│   ├── flowcam
│   ├── flowcytometer
│   ├── manualcount
│   └── o2meter
├── 2.extracted.data
│   ├── bemovi.mag.16
│   ├── bemovi.mag.25
│   ├── flowcam
│   ├── flowcytometer
│   ├── manualcount
│   └── o2meter
├── 3.archived.data
│   ├── extracted
│   ├── pre_processed
│   └── raw
├── 9.backend
│   ├── LEEF.RRD.sqlite
│   ├── LEEF.RRD_bemovi_master.sqlite
│   ├── LEEF.RRD_bemovi_master_cropped.sqlite
│   ├── LEEF.RRD_flowcam_algae_metadata.sqlite
│   └── LEEF.RRD_flowcam_algae_traits.sqlite
├── log.2021-03-03--15-06-32.fast.done.txt
├── log.2021-03-03--15-06-32.fast.txt
├── log.2021-03-03--15-14-20.bemovi.mag.16.done.txt
├── log.2021-03-03--15-14-20.bemovi.mag.16.error.txt
├── log.2021-03-03--15-14-20.bemovi.mag.16.txt
├── log.2021-03-03--15-14-20.bemovi.mag.25.done.txt
├── log.2021-03-03--15-14-20.bemovi.mag.25.error.txt
└── log.2021-03-03--15-14-20.bemovi.mag.25.txt

1.pre-processed.data

This folder contains the pre-processed data. Pre-processed means that the raw data (0.raw.data) is converted into open formats where this can be done losslessly, and compressed (in the case of the bemovi videos). All further processing is done with the pre-processed data.

2.extracted.data

This folder contains the data which will be used in the further analysis outside the pipeline. It contains the intermediate extracted data as well as the data which will finally be added to the backend (9.backend). The final extracted data for the backend is in csv format.

3.archived.data

Data is archived as raw data, pre-processed data, and extracted data. In each of the respective folders, a folder named after the timestamp specified in the sample_metadata.yml is created, containing the actual data. The raw data as well as the pre-processed bemovi data is simply copied over, while the rest is stored as .tar.gz archives. For all files, sha256 hashes are calculated to guarantee the integrity of the data.
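
If needed, the hashes can be re-checked manually with standard tools; a sketch (the checksum file name is illustrative, and the pipeline's own hash files may use a different layout):

find 3.archived.data/raw -type f -exec sha256sum {} + > raw.sha256
sha256sum -c raw.sha256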

9.backend

The backend consists of the sqlite database LEEF.RRD.sqlite, which contains the Research Ready Data (RRD) of all measurements. This database is used for all further analysis.
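
For a quick look into the backend from an ssh session, the sqlite3 command-line client can be used (which tables exist depends on the pipeline version):

sqlite3 LEEF/9.backend/LEEF.RRD.sqlite '.tables'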

log.xxx.txt

These files contain the logs of the pipeline runs. In addition, each numbered directory contains a single log file.
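
From an ssh session the progress of a run can be followed live; the .done.txt and .error.txt files indicate finished and failed steps, respectively:

tail -f LEEF/log.*.txt
ls LEEF/log.*.done.txt LEEF/log.*.error.txt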

Pipeline server samba share

The LEEF folder of the pipeline server is accessible via a samba share, although this is rather slow and not the preferred way of uploading / downloading data. Nevertheless, mounting the share is useful to follow the progress of the pipeline and to investigate errors further.

The share shows the LEEF folder; its contents are described in the Folder structure after pipeline section above.

The credentials are available in a private internal document.

Mount samba share on a Mac

  • open Finder
  • press ⌘K to open ‘Connect to Server …’
  • enter smb://USERNAME@IP and, when prompted for a password, enter it and tick the box ‘Remember password’ (or similar wording); the next time, you will not have to enter the password
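
The same dialog can also be triggered from the Terminal, which then mounts and opens the share in the Finder (USERNAME and IP as in the credentials document):

open 'smb://USERNAME@IP'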

Mount samba share on Windows

  • to be documented: somebody with a Windows computer has to verify the exact steps; until then, see the untested sketch below
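
As an untested sketch, the standard Windows approach should work: map the share from a command prompt with net use (the share name LEEF is an assumption, and the * makes the command prompt for the password):

net use L: \\IP\LEEF * /user:USERNAME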