# Welcome to pyBIS!
pyBIS is a Python module for interacting with openBIS, designed to be used in Jupyter. It works like a lightweight IDE for openBIS: it supports TAB completion and input checks, which should make the life of a researcher easier.
## Dependencies and Requirements
- pyBIS relies on the openBIS API v3
- openBIS version 16.05.2 or newer is required
- 18.06.2 or later is recommended
Installing pybis with `pip install pybis` downloads and installs pybis and all its dependencies.
If you haven't done so yet, install Jupyter Notebook:
```
pip install jupyter
```
## Tab completion and other hints
Used in a Jupyter Notebook environment, pybis helps you to enter the commands. After every dot `.` you might hit the `TAB` key in order to look at the available commands.
If you are unsure what parameters to add to a method, add a question mark right after the method and hit `SHIFT+ENTER`. Jupyter will then look up the signature of the method and show its docstring.
When working with properties of entities, they might use a **controlled vocabulary** or be of a specific **property type**. Add an underscore `_` character right after the property and hit `SHIFT+ENTER` to show the valid values. When a property only accepts a controlled vocabulary, you will be shown the valid terms in a nicely formatted table.
## connect to OpenBIS
Interactively, i.e. within a Jupyter notebook, you can use `getpass` to enter your password:
```
from pybis import Openbis
o = Openbis('https://example.com', verify_certificates=False)
import getpass
password = getpass.getpass()
o.login('username', password, save_token=True)   # save the session token in ~/.pybis/example.com.token
```
In a script you would rather use environment variables to provide the host, username and password:
```
import os
from pybis import Openbis
o = Openbis(os.environ['OPENBIS_HOST'], verify_certificates=False)
o.login(os.environ['OPENBIS_USERNAME'], os.environ['OPENBIS_PASSWORD'])
```
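A script that reads its credentials directly from `os.environ` stops with a bare `KeyError` as soon as one variable is missing. The following sketch uses only the standard library: it collects the three variables named above and reports all missing ones at once. The helper name `read_openbis_credentials` is hypothetical and not part of pyBIS.

```python
import os


def read_openbis_credentials(env=None):
    """Return (host, username, password) read from the environment.

    Hypothetical helper, not part of pyBIS. It only looks up the three
    variable names used in the script example above.
    """
    env = os.environ if env is None else env
    names = ("OPENBIS_HOST", "OPENBIS_USERNAME", "OPENBIS_PASSWORD")
    # collect every unset or empty variable so the user sees all problems at once
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return tuple(env[name] for name in names)
```

The returned values could then be handed to `Openbis(host, ...)` and `o.login(username, password)` as in the script above.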
Check whether the session token is still valid and log out:
```
o.token
o.is_session_active()
o.logout()
```
# Masterdata
OpenBIS stores quite a lot of meta-data along with your dataSets. The collection of data that describes this meta-data (i.e. meta-meta-data) is called masterdata. It consists of:
* sample types
* dataSet types
* material types
* experiment types
* property types
* vocabularies
* vocabulary terms
* plugins (jython scripts that allow complex data checks)
* tags
* semantic annotations
## browse masterdata
```
sample_types = o.get_sample_types() # get a list of sample types
sample_types.df # DataFrame object
st = o.get_sample_types()[3] # get 4th element of that list
st = o.get_sample_type('YEAST')
st.attrs.all() # get all attributes as a dict
st.get_validationPlugin() # returns a plugin object
st.get_property_assignments() # show the list of properties
# for that sample type
o.get_material_types()
o.get_dataset_types()
o.get_experiment_types()
o.get_property_types()
pt = o.get_property_type('BARCODE_COMPLEXITY_CHECKER')
pt.attrs.all()
o.get_plugins()
pl = o.get_plugin('Diff_time')
pl.script # the Jython script that processes this property
o.get_vocabularies()
o.get_vocabulary('BACTERIAL_ANTIBIOTIC_RESISTANCE')
o.get_terms(vocabulary='STORAGE')
o.get_tags()
```
## create property types
Samples (objects), experiments (collections) and dataSets contain general **attributes** as well as type-specific **properties**. Before properties can be assigned to their respective entity type, the property types need to be created first.
```
pt = o.new_property_type(
    code        = 'MY_NEW_PROPERTY_TYPE',
    label       = 'yet another property type',
    description = 'my first property',
    dataType    = 'VARCHAR'
)
pt.save()

pt_voc = o.new_property_type(
    code        = 'MY_CONTROLLED_VOCABULARY',
    label       = 'label me',
    description = 'give me a description',
    dataType    = 'CONTROLLEDVOCABULARY',
    vocabulary  = 'STORAGE'
)
pt_voc.save()
```
The `dataType` attribute can contain any of these values:
* `INTEGER`
* `VARCHAR`
* `MULTILINE_VARCHAR`
* `REAL`
* `TIMESTAMP`
* `BOOLEAN`
* `HYPERLINK`
* `XML`
* `CONTROLLEDVOCABULARY`
* `MATERIAL`
When choosing `CONTROLLEDVOCABULARY`, you must specify a `vocabulary` attribute (see example). Likewise, when choosing `MATERIAL`, a `materialType` attribute must be provided.
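The rules above can be checked locally before anything is sent to openBIS. The sketch below is plain Python and does not call pyBIS; the function name `check_property_type_args` is hypothetical, and the set of data types is simply the list given above.

```python
# the dataType values listed above
DATA_TYPES = {
    "INTEGER", "VARCHAR", "MULTILINE_VARCHAR", "REAL", "TIMESTAMP",
    "BOOLEAN", "HYPERLINK", "XML", "CONTROLLEDVOCABULARY", "MATERIAL",
}


def check_property_type_args(dataType, vocabulary=None, materialType=None):
    """Raise ValueError if the arguments violate the rules listed above.

    Hypothetical helper, not part of pyBIS.
    """
    if dataType not in DATA_TYPES:
        raise ValueError("unknown dataType: %r" % (dataType,))
    if dataType == "CONTROLLEDVOCABULARY" and not vocabulary:
        raise ValueError("CONTROLLEDVOCABULARY requires a vocabulary")
    if dataType == "MATERIAL" and not materialType:
        raise ValueError("MATERIAL requires a materialType")
```

Calling `check_property_type_args('CONTROLLEDVOCABULARY', vocabulary='STORAGE')` passes silently, while omitting `vocabulary` raises a `ValueError`.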
## create sample types
```
sample_type = o.new_sample_type(
code = 'my_own_sample_type', # mandatory
generatedCodePrefix = 'S', # mandatory
description = '',
autoGeneratedCode = True,
subcodeUnique = False,
listable = True,
showContainer = False,
showParents = True,
showParentMetadata = False,
validationPlugin = 'Has_Parents' # see plugins below
)
sample_type.save()
```
## assign properties to sample type
A sample type needs to be saved before properties can be assigned to it. This assignment procedure applies to all entity types (dataset type, experiment type, material type).
```
sample_type.assign_property(
prop = 'diff_time', # mandatory
section = '',
ordinal = 5,
mandatory = True,
initialValueForExistingEntities = 'initial value',
showInEditView = True,
showRawValueInForms = True
)
sample_type.revoke_property('diff_time')
sample_type.get_property_assignments()
```
## create dataset types
```
dataset_type = o.new_dataset_type(
code = 'my_dataset_type', # mandatory
description=None,
mainDataSetPattern=None,
mainDataSetPath=None,
disallowDeletion=False,
validationPlugin=None,
)
dataset_type.save()
dataset_type.assign_property('property_name')
dataset_type.revoke_property('property_name')
dataset_type.get_property_assignments()
```
## create experiment types
```
experiment_type = o.new_experiment_type(
code,
description=None,
validationPlugin=None,
)
experiment_type.save()
experiment_type.assign_property('property_name')
experiment_type.revoke_property('property_name')
experiment_type.get_property_assignments()
```
## create material types
```
material_type = o.new_material_type(
code,
description=None,
validationPlugin=None,
)
material_type.save()
material_type.assign_property('property_name')
material_type.revoke_property('property_name')
material_type.get_property_assignments()
```
## create plugins
Plugins are Jython scripts that can accomplish more complex data-checks than ordinary types and vocabularies can achieve. They are assigned to entity types (dataset type, sample type etc). [Documentation and examples can be found here](https://wiki-bsse.ethz.ch/display/openBISDoc/Properties+Handled+By+Scripts)
```
pl = o.new_plugin(
name ='my_new_entry_validation_plugin',
pluginType ='ENTITY_VALIDATION', # or 'DYNAMIC_PROPERTY' or 'MANAGED_PROPERTY',
entityKind = None, # or 'SAMPLE', 'MATERIAL', 'EXPERIMENT', 'DATA_SET'
script = 'def calculate(): pass' # a JYTHON script
)
pl.save()
```
## Users, Groups and RoleAssignments
```
o.get_groups()
group = o.new_group(code='group_name', description='...')
group = o.get_group('group_name')
group.save()
group.assign_role(role='ADMIN', space='DEFAULT')
group.get_roles()
group.revoke_role(role='ADMIN', space='DEFAULT')
group.add_members(['admin'])
group.get_members()
group.del_members(['admin'])
group.delete()
o.get_persons()
person = o.new_person(userId='username')
person.space = 'USER_SPACE'
person.save()
person.assign_role(role='ADMIN', space='MY_SPACE')
person.assign_role(role='OBSERVER')
person.get_roles()
person.revoke_role(role='ADMIN', space='MY_SPACE')
person.revoke_role(role='OBSERVER')
o.get_role_assignments()
o.get_role_assignments(space='MY_SPACE')
o.get_role_assignments(group='MY_GROUP')
ra = o.get_role_assignment(techId)
ra.delete()
```
## Spaces
```
space = o.new_space(code='space_name', description='')
space.save()
space.delete('reason for deletion')
o.get_spaces(
start_with = 1, # start_with and count
count = 7, # enable paging
)
space = o.get_space('MY_SPACE')
space.code
space.description
space.registrator
space.registrationDate
space.modifier
space.modificationDate
space.attrs.all() # returns a dict containing all attributes
```
## Projects
```
project = o.new_project(
space=space,
code='project_name',
description='some project description'
)
project = space.new_project(
code='project_code',
description='project description'
)
project.save()
o.get_projects(
space = 'MY_SPACE', # show only projects in MY_SPACE
start_with = 1, # start_with and count
count = 7, # enable paging
)
o.get_projects(space='MY_SPACE')
space.get_projects()
project.get_experiments()
project.get_attachments()
project.add_attachment(fileName='testfile', description='another file', title='one more attachment')
project.download_attachments()
project.code
project.description
project.attrs.all() # returns a dict containing all attributes
project.freeze = True
project.freezeForExperiments = True
project.freezeForSamples = True
```
## Samples
Samples are nowadays called **Objects** in openBIS. pyBIS does not yet fully support this term in every method where «sample» occurs.
NOTE: In openBIS, `samples` entities have recently been renamed to `objects`. All methods have synonyms using the term `object`, e.g. `get_object`, `new_object`, `get_object_types`.
```
sample = o.new_sample(
space = 'MY_SPACE',
experiment = '/MY_SPACE/MY_PROJECT/EXPERIMENT_1',
parents = [parent_sample, '/MY_SPACE/YEA66'],
children = [child_sample],
props = {"name": "some name", "description": "something interesting"}
)
sample = space.new_sample( type='YEAST' )
sample.save()
sample = o.get_sample('/MY_SPACE/MY_SAMPLE_CODE')
sample = o.get_sample('20170518112808649-52')
sample.space
sample.code
sample.permId
sample.identifier
sample.type # once the sample type is defined, you cannot modify it
sample.space
sample.space = 'MY_OTHER_SPACE'
sample.experiment # a sample can belong to one experiment only
sample.experiment = '/MY_SPACE/MY_PROJECT/MY_EXPERIMENT'
sample.project
sample.project = '/MY_SPACE/MY_PROJECT' # only works if project samples are enabled
sample.tags
sample.tags = ['guten_tag', 'zahl_tag' ]
sample.attrs.all() # returns all attributes as a dict
sample.props.all() # returns all properties as a dict
sample.get_attachments()
sample.download_attachments()
sample.add_attachment('testfile.xls')
```
### parents, children, components and container
```
sample.get_parents()
sample.set_parents(['/MY_SPACE/PARENT_SAMPLE_NAME'])
sample.add_parents('/MY_SPACE/PARENT_SAMPLE_NAME')
sample.del_parents('/MY_SPACE/PARENT_SAMPLE_NAME')
sample.get_children()
sample.set_children('/MY_SPACE/CHILD_SAMPLE_NAME')
sample.add_children('/MY_SPACE/CHILD_SAMPLE_NAME')
sample.del_children('/MY_SPACE/CHILD_SAMPLE_NAME')
# A Sample may belong to another Sample, which acts as a container.
# As opposed to DataSets, a Sample may only belong to one container.
sample.container # returns a sample object
sample.container = '/MY_SPACE/CONTAINER_SAMPLE_NAME' # watch out, this will change the identifier of the sample to:
# /MY_SPACE/CONTAINER_SAMPLE_NAME:SAMPLE_NAME
sample.container = '' # this will remove the container.
# A Sample may contain other Samples, in order to act like a container (see above)
# The Sample-objects inside that Sample are called «components» or «contained Samples»
# You may also use the xxx_contained() functions, which are just aliases.
sample.get_components()
sample.set_components('/MY_SPACE/COMPONENT_NAME')
sample.add_components('/MY_SPACE/COMPONENT_NAME')
sample.del_components('/MY_SPACE/COMPONENT_NAME')
```
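The identifier change described in the comments above follows a simple pattern: container identifier, a colon, then the code of the contained sample. As a minimal sketch of that pattern in plain Python (the function name `identifier_in_container` is hypothetical; pyBIS and openBIS compute the identifier themselves):

```python
def identifier_in_container(sample_identifier, container_identifier):
    """Return the identifier a sample gets once it is placed in a container.

    Hypothetical helper that only mirrors the pattern from the comment above:
    <container identifier>:<sample code>
    """
    # the sample code is the last path segment; drop an old container prefix if present
    code = sample_identifier.rsplit("/", 1)[-1].rsplit(":", 1)[-1]
    return "{}:{}".format(container_identifier, code)


print(identifier_in_container("/MY_SPACE/SAMPLE_NAME", "/MY_SPACE/CONTAINER_SAMPLE_NAME"))
# prints /MY_SPACE/CONTAINER_SAMPLE_NAME:SAMPLE_NAME
```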
### sample tags
```
sample.get_tags()
sample.set_tags('tag1')
sample.add_tags(['tag2','tag3'])
sample.del_tags('tag1')
```
### useful tricks when dealing with properties, using Jupyter or IPython
```
sample.p + TAB # in IPython or Jupyter: show list of available properties
sample.p.my_property_ + TAB # in IPython or Jupyter: show datatype or controlled vocabulary
sample.p['my-weird.property-name'] # accessing properties containing a dash or a dot
sample.set_props({ ... }) # set properties by providing a dict
sample.p # same thing as .props
sample.p.my_property = "some value" # set the value of a property
# value is checked (type/vocabulary)
sample.save() # update the sample in openBIS
```
### querying samples
```
samples = o.get_samples(
    space ='MY_SPACE',
    type  ='YEAST',
    tags  =['*'],                    # only samples with existing tags
    start_with = 1,                  # start_with and count
    count = 7,                       # enable paging
    NAME  = 'some name',             # properties are always uppercase
                                     # to distinguish them from attributes
    **{ "SOME.WEIRD:PROP": "value"}, # property name contains a dot or a
                                     # colon: cannot be passed as an argument
    props=['NAME', 'MATING_TYPE']    # show these properties in the result
)
samples.df                           # returns a pandas DataFrame object
samples.get_datasets(type='ANALYZED_DATA')
```
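The `**{...}` form used in the query above is ordinary Python keyword unpacking. A name such as `SOME.WEIRD:PROP` is not a valid identifier, so it cannot be written as `name=value`, but it can still be a dictionary key. The stand-alone sketch below shows the mechanism; `fake_get_samples` is a hypothetical stand-in that only echoes its keyword arguments and never contacts openBIS.

```python
def fake_get_samples(**criteria):
    """Hypothetical stand-in for a search method: returns the criteria it received."""
    return criteria


weird = {"SOME.WEIRD:PROP": "value"}
criteria = fake_get_samples(space="MY_SPACE", NAME="some name", **weird)

print("NAME".isidentifier())             # True
print("SOME.WEIRD:PROP".isidentifier())  # False
print(sorted(criteria))                  # ['NAME', 'SOME.WEIRD:PROP', 'space']
```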
### freezing samples
```
sample.freeze = True
sample.freezeForComponents = True
sample.freezeForChildren = True
sample.freezeForParents = True
sample.freezeForDataSets = True
```
## Experiments
NOTE: In openBIS, `experiment` entities have recently been renamed to `collection`. All methods have synonyms using the term `collection`, e.g. `get_collections`, `new_collection`, `get_collection_types`.
```
exp = o.new_experiment(
    type='DEFAULT_EXPERIMENT',
    space='MY_SPACE',
    project='YEASTS'
)
exp.save()
o.get_experiments(
project='YEASTS',
space='MY_SPACE',
type='DEFAULT_EXPERIMENT',
tags='*',
finished_flag=False,
props=['name', 'finished_flag']
)
exp = o.get_experiment('/MY_SPACE/MY_PROJECT/MY_EXPERIMENT')
exp.set_props({ key: value})
exp.props
exp.p # same thing as .props
exp.p.finished_flag=True
exp.p.my_property = "some value" # set the value of a property (value is checked)
exp.p + TAB # in IPython or Jupyter: show list of available properties
exp.p.my_property_ + TAB # in IPython or Jupyter: show datatype or controlled vocabulary
exp.p['my-weird.property-name'] # accessing properties containing a dash or a dot
exp.attrs.all() # returns all attributes as a dict
exp.props.all() # returns all properties as a dict
exp.attrs.tags = ['some', 'tags']
exp.tags = ['some', 'tags'] # same thing
exp.code
exp.description
exp.registrator
exp.registrationDate
exp.modifier
exp.modificationDate
exp.freeze = True
exp.freezeForDataSets = True
exp.freezeForSamples = True
```
## Datasets
### working with existing dataSets
```
sample.get_datasets()
ds = o.get_dataset('20160719143426517-259')
ds.get_parents()
ds.get_children()
ds.status # AVAILABLE LOCKED ARCHIVED
# ARCHIVE_PENDING UNARCHIVE_PENDING
# BACKUP_PENDING
ds.archive()
ds.unarchive()
ds.attrs.all() # returns all attributes as a dict
ds.props.all() # returns all properties as a dict
ds.add_attachment() # attachments usually contain meta-data
ds.get_attachments() # about the dataSet, not the data itself.
ds.download_attachments()
```
### download dataSets
```
ds.get_files(start_folder="/") # get file list as pandas table
ds.file_list # get file list as array
ds.download() # simply download all files to hostname/permId/
ds.download(
destination = 'my_data', # download files to folder my_data/
create_default_folders = False,# ignore the /original/DEFAULT folders made by openBIS
wait_until_finished = False, # download in background, continue immediately
workers = 10 # 10 downloads parallel (default)
)
```
### dataSet attributes and properties
```
ds.set_props({ key: value})
ds.props
ds.p # same thing as .props
ds.p.my_property = "some value" # set the value of a property
ds.p + TAB # show list of available properties
ds.p.my_property_ + TAB # show datatype or controlled vocabulary
ds.p['my-weird.property-name'] # accessing properties containing a dash or a dot
ds.attrs.all() # returns all attributes as a dict
ds.props.all() # returns all properties as a dict
```
### querying dataSets
* examples of complex queries with method chaining
* NOTE: properties must be in UPPERCASE to distinguish them from attributes
```
datasets = o.get_experiments(project='YEASTS')\
.get_samples(type='FLY')\
.get_datasets(
type='ANALYZED_DATA',
props=['MY_PROPERTY'],
MY_PROPERTY='some analyzed data'
)
```
```
datasets = o.get_experiment('/MY_NEW_SPACE/MY_PROJECT/MY_EXPERIMENT4')\
.get_samples(type='UNKNOWN')\
.get_parents()\
.get_datasets(type='RAW_DATA')
```
### deal with dataSets query results
```
datasets.df # get a pandas dataFrame object
# use it in a for-loop:
for dataset in datasets:
    print(dataset.permId)
    dataset.delete('give me a reason')
```
### freeze dataSets
* once a dataSet has been frozen, it cannot be changed by anyone anymore
* so be careful!
```
ds.freeze = True
ds.freezeForChildren = True
ds.freezeForParents = True
ds.freezeForComponents = True
ds.freezeForContainers = True
ds.save()
```
### create a new dataSet
```
ds_new = o.new_dataset(
    type       = 'ANALYZED_DATA',
    experiment = '/SPACE/PROJECT/EXP1',
    sample     = '/SPACE/SAMP1',
    files      = ['my_analyzed_data.dat'],
)
ds_new.save()
```
### create dataSet with zipfile
```
# DataSet containing one zipfile which will be unzipped in openBIS
ds_new = o.new_dataset(
type = 'RAW_DATA',
sample = '/SPACE/SAMP1',
zipfile = 'my_zipped_folder.zip',
)
ds_new.save()
```
### create dataSet with mixed content
* mixed content means: folders and files are provided
* a folder specified by a relative path (and all its content) will end up in the root, while keeping its structure
    * `../measurements/` --> `/measurements/`
    * `some/folder/somewhere/` --> `/somewhere/`
* files specified by a relative path will also end up in the root
    * `my_file.txt` --> `/my_file.txt`
    * `../somewhere/else/my_other_file.txt` --> `/my_other_file.txt`
    * `some/folder/file.txt` --> `/file.txt`
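The mapping described in the list above keeps only the last component of each relative path. The following sketch reproduces that rule in plain Python so the resulting layout can be previewed before uploading. It is a model of the rule as stated here, not pyBIS's own code, and the function name `dataset_path` is hypothetical.

```python
import posixpath


def dataset_path(local_path):
    """Map a relative local path to its location in the dataSet root.

    Hypothetical helper: only the last path component is kept,
    and a trailing slash marks a folder.
    """
    is_folder = local_path.endswith("/")
    # normpath drops the trailing slash, basename keeps the last component
    name = posixpath.basename(posixpath.normpath(local_path))
    return "/" + name + ("/" if is_folder else "")


for path in ["../measurements/", "some/folder/somewhere/", "my_file.txt", "some/folder/file.txt"]:
    print(path, "-->", dataset_path(path))
```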
```
# Dataset containing files and folders
# the content of the folder will be zipped (on-the-fly) and uploaded to openBIS.
# openBIS will keep the folder structure intact.
# relative path will be shortened to its basename. For example:
# local openBIS
# ../../myData/ myData/
# some/experiment/results/ results/
ds_new = o.new_dataset(
    type       = 'RAW_DATA',
    sample     = '/SPACE/SAMP1',
    files      = ['../measurements/', 'my_analysis.ipynb', 'results/']
)
ds_new.save()
```
### create dataSet container
```
# DataSet CONTAINER (contains other DataSets, but no files)
ds_new = o.new_dataset(
    type       = 'ANALYZED_DATA',
    experiment = '/SPACE/PROJECT/EXP1',
    sample     = '/SPACE/SAMP1',
    kind       = 'CONTAINER',
)
ds_new.save()
```
### get, set, add and remove parent datasets
```
dataset.get_parents()
dataset.set_parents(['20170115220259155-412'])
dataset.add_parents(['20170115220259155-412'])
dataset.del_parents(['20170115220259155-412'])
```
### get, set, add and remove child datasets
```
dataset.get_children()
dataset.set_children(['20170115220259155-412'])
dataset.add_children(['20170115220259155-412'])
dataset.del_children(['20170115220259155-412'])
```
### dataSet containers
* A DataSet may belong to other DataSets, which must be of kind=CONTAINER
* As opposed to Samples, a DataSet may be contained in more than one DataSet container
```
dataset.get_containers()
dataset.set_containers(['20170115220259155-412'])
dataset.add_containers(['20170115220259155-412'])
dataset.del_containers(['20170115220259155-412'])
```
* a DataSet of kind=CONTAINER may contain other DataSets, to act like a folder (see above)
* the DataSet-objects inside that DataSet are called components or contained DataSets
* you may also use the xxx_contained() functions, which are just aliases.
```
dataset.get_components()
dataset.set_components(['20170115220259155-412'])
dataset.add_components(['20170115220259155-412'])
dataset.del_components(['20170115220259155-412'])
```
## Semantic Annotations
```
# create semantic annotation for sample type 'UNKNOWN'
sa = o.new_semantic_annotation(
entityType = 'UNKNOWN',
predicateOntologyId = 'po_id',
predicateOntologyVersion = 'po_version',
predicateAccessionId = 'pa_id',
descriptorOntologyId = 'do_id',
descriptorOntologyVersion = 'do_version',
descriptorAccessionId = 'da_id'
)
# create semantic annotation for property type
# (predicate and descriptor values omitted for brevity)
sa = o.new_semantic_annotation(propertyType = 'DESCRIPTION', ...)
sa.save()
# create semantic annotation for sample property assignment (predicate and descriptor values omitted for brevity)
sa = o.new_semantic_annotation(entityType = 'UNKNOWN', propertyType = 'DESCRIPTION', ...)
sa.save()
# create a semantic annotation directly from a sample type
# will also create sample property assignment annotations when propertyType is given
st = o.get_sample_type("ORDER")
st.new_semantic_annotation(...)
# get all semantic annotations
o.get_semantic_annotations()
# get semantic annotation by perm id
sa = o.get_semantic_annotation("20171015135637955-30")
# update semantic annotation
sa.predicateOntologyId = 'new_po_id'
sa.descriptorOntologyId = 'new_do_id'
sa.save()
# delete semantic annotation
sa.delete('reason')
```
## Tags
```
new_tag = o.new_tag(
    code        = 'my_tag',
    description = 'some descriptive text'
)
new_tag.description = 'some new description'
new_tag.save()
o.get_tags()
o.get_tag('/username/TAG_Name')
tag = o.get_tag('TAG_Name')
tag.get_experiments()
tag.get_samples()
```
## Vocabulary and VocabularyTerms
An entity such as Sample (Object), Experiment (Collection), Material or DataSet can be of a specific *entity type*:
* Sample Type
* Experiment Type
* DataSet Type
* Material Type
Every type defines which **Properties** may be defined. Properties act like **Attributes**, but they are type-specific. Properties can contain all sorts of information, such as free text, XML, Hyperlink, Boolean and also **Controlled Vocabulary**. Such a Controlled Vocabulary consists of many **VocabularyTerms**. These terms are used to only allow certain values entered in a Property field.
So for example, you want to add a property called **Animal** to a Sample and you want to control which terms are entered in this Property field. For this you need to do a couple of steps:
1. create a new vocabulary *AnimalVocabulary*
2. add terms to that vocabulary: *Cat, Dog, Mouse*
3. create a new PropertyType (e.g. *Animal*) of DataType *CONTROLLEDVOCABULARY* and assign the *AnimalVocabulary* to it
4. create a new SampleType (e.g. *Pet*) and *assign* the created PropertyType to that Sample type.
5. If you now create a new Sample of type *Pet* you will be able to add a property *Animal* to it which only accepts the terms *Cat, Dog* or *Mouse*.
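Steps 1 to 4 above could be sketched as one function. This is a sketch that has not been run against a live server: it takes a logged-in `Openbis` object `o` as its argument and only uses call signatures that appear in the examples elsewhere in this document. The function name `create_pet_type`, all codes, labels and descriptions are made up for illustration, and leaving out `urlTemplate` in `new_vocabulary` assumes that this argument is optional.

```python
def create_pet_type(o):
    """Run steps 1-4 above against a logged-in Openbis object `o`.

    Hypothetical helper; codes and labels are illustrative only.
    """
    # steps 1 and 2: create the vocabulary together with its terms
    voc = o.new_vocabulary(
        code="ANIMAL_VOCABULARY",
        description="animals that can be kept as pets",
        terms=[
            {"code": "CAT", "label": "Cat", "description": "a cat"},
            {"code": "DOG", "label": "Dog", "description": "a dog"},
            {"code": "MOUSE", "label": "Mouse", "description": "a mouse"},
        ],
    )
    voc.save()

    # step 3: property type that only accepts terms of that vocabulary
    pt = o.new_property_type(
        code="ANIMAL",
        label="Animal",
        description="kind of animal",
        dataType="CONTROLLEDVOCABULARY",
        vocabulary="ANIMAL_VOCABULARY",
    )
    pt.save()

    # step 4: sample type; it must be saved before a property can be assigned
    st = o.new_sample_type(code="PET", generatedCodePrefix="PET")
    st.save()
    st.assign_property(prop="ANIMAL")
    return st
```

Passing `o` in as an argument keeps the function free of connection details, so the same steps can be run against a test instance first.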
**create new Vocabulary with three VocabularyTerms**
```
voc = o.new_vocabulary(
code = 'BBB',
description = 'description of vocabulary aaa',
urlTemplate = 'https://ethz.ch',
terms = [
{ "code": 'term_code1', "label": "term_label1", "description": "term_description1"},
{ "code": 'term_code2', "label": "term_label2", "description": "term_description2"},
{ "code": 'term_code3', "label": "term_label3", "description": "term_description3"}
]
)
voc.save()
```
**create additional VocabularyTerms**
```
term = o.new_term(
code='TERM_CODE_XXX',
vocabularyCode='BBB',
label='here comes a label',
description='here might appear a meaningful description'
)
term.save()
```
**update VocabularyTerms**
To change the ordinal of a term, it has to be moved either to the top with the `.move_to_top()` method or after another term using the `.move_after_term('TERM_BEFORE')` method.
```
voc = o.get_vocabulary('STORAGE')
term = voc.get_terms()['RT']
term.label = "Room Temperature"
term.official = True
term.move_to_top()
term.move_after_term('-40')
term.save()
term.delete()
```