Setup & preparations

OVarFlow has been designed to be used in two alternative ways:

  • directly, executing a so-called Snakefile containing the workflow or

  • using a pre-built Docker container containing the Snakefile as well as all executables and their dependencies.

Both solutions have their own strengths and weaknesses. Using the Snakefile directly gives you more control over the workflow, including the opportunity to easily update the individual programs it uses. Changes are not limited to those programs: you can also modify the Snakefile itself and thereby alter the data evaluation procedure. Of course, these options require at least a basic understanding of the Python 3 programming language and the Snakemake syntax. Docker, on the other hand, hides some of the complexity of OVarFlow, including the installation of additional programs, but it also limits the end user to the program versions bundled within the Docker container. Ultimately, the Docker container only encapsulates the Snakefile and the applications that are used by OVarFlow. To sum things up:

  • Snakemake & Conda will allow for more control and easy updating of applications.

  • Docker, on the other hand, needs less configuration but gives the user less control.

Both options are designed to be used under a Linux-based operating system and have not been tested on other platforms. In any case, variant calling is a computationally demanding task for which consumer hardware is poorly suited. It therefore calls for high performance computing (HPC), a field that is dominated by Linux.

The following paragraphs are aimed at novice users with no prior experience in using Conda & Snakemake or Docker. The descriptions are intended to create a basic setup and refer to the broader documentation of the respective software.

Setting up a Conda environment

Conda is a package and environment manager. It allows various software to be installed, comparable to an app store. Different versions of the same software can also be installed side by side, completely independent of one another.

Different distributions of Conda are available, namely Anaconda and Miniconda. The basic functionality of both distributions is identical, but Anaconda is meant to provide a full-fledged application suite for data science using Python and R. In doing so, Anaconda installs a plethora of software that is commonly used in the field. Most of this software is not needed to use OVarFlow, so the installation of Miniconda is recommended. This minimal installer for Conda still allows for the manual installation of any software that comes bundled with Anaconda, should it be needed at a later time.

  • Download the Python 3 installer for Linux in the 64-bit version. 32-bit computers would be overwhelmed with variant calling anyway.

  • Verifying the download is optional but highly recommended: compute the checksum (sha256sum Miniconda3-latest-Linux-x86_64.sh) and compare it with the hash published on the download page.

  • A detailed description of the installation is available, but it essentially comes down to a single command:

    bash Miniconda3-latest-Linux-x86_64.sh
    

    The installer will ask a few questions. Novice users can accept the defaults.

  • After closing and reopening the shell, the Conda command should now be available. This can easily be tested by running the command conda help.
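
The checksum comparison mentioned above can be scripted. The following is a minimal sketch, not part of OVarFlow: the function name verify_sha256 is hypothetical, and the expected hash is a placeholder that has to be replaced with the value published on the Miniconda download page.

```shell
#!/bin/sh
# verify_sha256 FILE EXPECTED_HASH   (hypothetical helper)
# Succeeds (exit status 0) only if the SHA-256 digest of FILE equals EXPECTED_HASH.
verify_sha256() (
    file="$1"
    expected="$2"
    # sha256sum prints "<digest>  <filename>"; keep only the digest.
    actual="$(sha256sum "$file" | cut -d ' ' -f 1)"
    [ "$actual" = "$expected" ]
)

# Example usage. The hash below is a placeholder, not a real value.
installer="Miniconda3-latest-Linux-x86_64.sh"
expected_hash="<hash published on the download page>"
if [ -f "$installer" ]; then
    if verify_sha256 "$installer" "$expected_hash"; then
        echo "Checksum OK, safe to run: bash $installer"
    else
        echo "Checksum mismatch, do not run the installer" >&2
    fi
fi
```

The installer is only reported as safe when both digests match exactly; a mismatch usually indicates an incomplete or corrupted download.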

Now that Conda is installed, additional software resources - channels in Conda terminology - have to be made available.

  • List the currently available channels via conda info.

  • Conda-forge and Bioconda need to be added:

    conda config --add channels defaults
    conda config --add channels bioconda
    conda config --add channels conda-forge
    
  • Verify that the list of channels has indeed changed by running conda info again.
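
As an alternative to reading the conda info output by eye, the channel check can be automated. This is a rough sketch under the assumption that conda config --show channels prints one "- <name>" entry per line; the helper name missing_channels is made up for this example.

```shell
#!/bin/sh
# missing_channels   (hypothetical helper)
# Reads the output of `conda config --show channels` on standard input
# and prints every required channel that is absent from it.
missing_channels() (
    listing="$(cat)"
    for channel in conda-forge bioconda defaults; do
        # Channel entries are expected as "  - <name>", one per line.
        if ! printf '%s\n' "$listing" | grep -Eq "^[[:space:]]*-[[:space:]]+${channel}[[:space:]]*\$"; then
            printf '%s\n' "$channel"
        fi
    done
)

# Only query Conda if it is actually installed.
if command -v conda >/dev/null 2>&1; then
    missing="$(conda config --show channels | missing_channels)"
    if [ -z "$missing" ]; then
        echo "All required channels are configured."
    else
        echo "Missing channels: $missing" >&2
    fi
fi
```

An empty result means that defaults, bioconda and conda-forge are all present in the configuration.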

Your Conda installation is now ready to be used with OVarFlow. It will enable you to obtain all of the software that is used by OVarFlow. Installation of software dependencies and further usage with OVarFlow is covered in the Conda & Snakemake usage section.

Setting up Docker or Singularity

As an alternative to the Conda setup described above, container virtualization can be employed. This technology has the advantage of bundling an application together with its dependencies. In this case no Conda installation is required, as all necessary software components are included in the container. On the other hand, the container software itself has to be present on the system, and a certain understanding of container technologies is needed to use it efficiently. Docker and Singularity are two widely used, compatible container technologies.

Docker

Docker provides comprehensive documentation, but the Docker curriculum might be better suited for novice users. Docker's biggest drawback is probably its need for root access to the respective computer. If that is a hindrance, Singularity might be an alternative.

The company behind Docker provides .deb and .rpm packages for various Linux distributions. As Docker is written in the Go programming language, statically linked binaries are available as well.

The Docker installation can easily be tested:

sudo docker run hello-world

By adding your user to the group docker, the need to prefix every docker command with sudo is circumvented.
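
Whether the current user already belongs to the docker group can be checked with a few lines of shell. The sketch below is only an illustration: in_group is a hypothetical helper, and the usermod command is printed rather than executed, since it requires administrator rights. Keep in mind that membership in the docker group effectively grants root-level access to the machine.

```shell
#!/bin/sh
# in_group GROUP   (hypothetical helper)
# Succeeds if the current user is a member of GROUP in this login session.
in_group() (
    # id -nG prints the names of all groups, separated by spaces.
    id -nG | tr ' ' '\n' | grep -qxF -- "$1"
)

if in_group docker; then
    echo "docker can be run without sudo"
else
    echo "Not in group 'docker' yet. An administrator can add you with:"
    echo "  sudo usermod -aG docker $(id -un)"
    echo "Log out and back in for the change to take effect."
fi
```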

However, using Docker is far from self-explanatory, and a basic understanding of OS-level virtualization and the concept of images and containers is required. Briefly, an image is the blueprint of a container. The image itself is immutable and contains all the code of an application. A container is a running instance of the image, and the application is executed from the container. Without caution, a new container is created every time docker run is invoked. (For programmers: it's a bit like the concept of class and object.)

An overview of the basic docker commands is available:

docker --help

The most basic docker usage shall be shown with the example image godlovedc/lolcow. This image can be obtained via:

docker pull godlovedc/lolcow

This image should now be listed in the locally available images:

docker images

You can create and run a new container from the image:

docker run godlovedc/lolcow

All containers available on the system can be listed via:

docker ps -a

A second container can be created from the image by executing docker run godlovedc/lolcow a second time. docker ps -a will then list two containers that were created from the image godlovedc/lolcow.

An existing container can also be started again. First obtain its name via docker ps -a, then run:

docker start -i <container_code_name>
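
Because every docker run creates a new container, stopped containers tend to pile up over time. The following sketch tallies how many containers exist per image; containers_per_image is a hypothetical helper, while docker ps --format, docker run --rm and docker container prune are regular Docker options and commands.

```shell
#!/bin/sh
# containers_per_image   (hypothetical helper)
# Reads one image name per line on standard input and prints
# "<count> <image>" for each distinct image.
containers_per_image() {
    sort | uniq -c | awk '{ print $1, $2 }'
}

# Only query Docker if it is actually installed.
if command -v docker >/dev/null 2>&1; then
    # Print the image of every container, running or stopped, and tally them.
    docker ps -a --format '{{.Image}}' | containers_per_image
fi

# To avoid leaving a stopped container behind after each run:
#   docker run --rm godlovedc/lolcow
# To delete all stopped containers:
#   docker container prune
```

After running docker run godlovedc/lolcow twice, the tally would be expected to show a count of 2 for that image.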

Singularity

Singularity offers equally comprehensive documentation. The quick start section is especially worth a look. A detailed description of the installation process and an introduction to the usage of the singularity command are given there.

OVarFlow does not provide a dedicated Singularity image, but Docker images can be used with Singularity as well. The documentation also includes a usage example with the lolcow image:

singularity pull docker://godlovedc/lolcow

Further details can be found in the linked documentation.
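
After the pull, the image is stored as a local file that can be run directly. The sketch below assumes the default naming of Singularity 3.x, where an image pulled from docker://<repository>/<image>:<tag> is saved as <image>_<tag>.sif in the current directory; sif_name is a hypothetical helper, and the commands are only printed, not executed.

```shell
#!/bin/sh
# sif_name URI   (hypothetical helper)
# Prints the file name that `singularity pull` is assumed to choose by
# default for a docker:// URI, i.e. <image>_<tag>.sif.
sif_name() (
    ref="${1#docker://}"      # strip the docker:// prefix
    image="${ref##*/}"        # keep the part after the last slash
    case "$image" in
        *:*) name="${image%%:*}"; tag="${image#*:}" ;;
        *)   name="$image";       tag="latest" ;;
    esac
    printf '%s_%s.sif\n' "$name" "$tag"
)

uri="docker://godlovedc/lolcow"
sif="$(sif_name "$uri")"
echo "Pull the image with:  singularity pull $uri"
echo "Then run it with:     singularity run $sif"
```

If the file name on your system differs, check the output of the pull command, which reports where the image was written.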

Finally, note that the provided links point to the documentation of version 3.5, which is current at the time of writing. By changing the version number in the links, you can also obtain the documentation for other versions of Singularity.