Newer
Older
# obis
Data-provenance tracking tools like openBIS make it possible to understand and follow the research process. What was studied, what data was acquired and how, how was data analyzed to arrive at final results for publication -- this is information that is captured in openBIS. In the standard usage scenario, openBIS stores and manages data directly. This has the advantage that openBIS acts as a gatekeeper to the data, making it easy to keep backups or enforce access restrictions, etc. However, this way of working is not a good solution for all situations.
Some research groups work with large amounts of data (e.g., multiple TB), which makes it inefficient and impractical to give openBIS control of the data. Other research groups require that data be stored on a shared file system under a well-defined directory structure, be it for historical reasons or because of the tools they use. In this case as well, it is difficult to give openBIS full control of the data.
For situations like these, we have developed `obis`, a tool for orderly management of data in conditions that require great flexibility. `obis` makes it possible to track data on a file system, where users have complete freedom to structure and manipulate the data as they wish, while retaining the benefits of openBIS. With `obis`, only metadata is actually stored and managed by openBIS. The data itself is managed externally, by the user, but openBIS is aware of its existence and the data can be used for provenance tracking. `obis` is packaged as a stand-alone utility, which, to be available, only needs to be added to the `PATH` variable in a UNIX or UNIX-like environment.
Under the covers, `obis` takes advantage of publicly available and tested tools to manage data on the file system. In particular, it uses `git` and `git-annex` to track the content of a dataset. Using `git-annex`, even large binary artifacts can be tracked efficiently. For communication with openBIS, `obis` uses the openBIS API, which offers the power to register and track all metadata supported by openBIS.
# Installation
Since `obis` is based on `pybis`, it requires python 3.6 and the corresponding `pip3`.
First, install `pybis` if it is not already installed. Then install `obis` (paths relative to repository root):
```
pip3 install src/python/PyBis
pip3 install src/python/Obis
```
# See also
V. Korolev, A. Joshi, V. Korolev, M.A. Grasso, A. Joshi, M.A. Grasso, et al., "PROB: A tool for tracking provenance and reproducibility of big data experiments", Reproduce'14. HPCA 2014, vol. 11, pp. 264-286, 2014.
http://ebiquity.umbc.edu/_file_directory_/papers/693.pdf