    Dropboxes
    =========
    
    Jython Dropboxes
    ----------------
    
    ### Introduction
    
    The jython dropbox feature makes it possible for a script written in the
    Python language to control the data set registration process of the
    openBIS Data Store Server. A script can modify the files in the dropbox
    and register data sets, samples, and experiments as part of its
    processing. The framework provides tools to track file operations and,
    if necessary, revert them, ensuring that the incoming file or directory
    is returned to its original state in the event of an error.
    
    By default python 2.5 is used, but it is possible to use python 2.7.
    
    Dropboxes are dss core plugins: [Core
    Plugins](/display/openBISDoc2010/Core+Plugins)
    
    ### Simple Example
    
    Here is an example that registers files that arrive in the drop box as
    data sets. They are explicitly attached to the experiment "JYTHON" in
    the project "TESTPROJ" and space "TESTGROUP".
    
    **data-set-handler-basic.py**
    
        def process(transaction):
          # Create a data set
          dataSet = transaction.createNewDataSet()
         
          # Reference the incoming file that was placed in the dropbox
          incoming = transaction.getIncoming()
          # Add the incoming file into the data set
          transaction.moveFile(incoming.getAbsolutePath(), dataSet)
         
          # Get an experiment for the data set
          exp = transaction.getExperiment("/TESTGROUP/TESTPROJ/JYTHON")
    
          # Set the owner of the data set -- the specified experiment
          dataSet.setExperiment(exp)
    
    This example is unrealistically simple, but contains all the elements
    necessary to implement a jython drop box. The main idea is to perform
    several operations within the bounds of a transaction on the data and
    metadata. The transaction is used to track the changes made so they can
    be executed together or all reverted if a problem occurs.
    
    ### More Realistic Example
    
    The above example demonstrates the concept, but it is unrealistically
    simple. In general, we want to be able to determine and specify the
    experiment/sample for a data set and explicitly set the data set type as
    well.
    
    In this example, we handle a usage scenario where there is one
    experiment done every day. All data produced on a single day is
    associated with the experiment for that date. If the experiment for a
    given day does not exist, it is created.
    
    **data-set-handler-experiment-reg.py**
    
        from datetime import datetime
         
        def process(transaction):
    
          # Try to get the experiment for today
          now_str = datetime.today().strftime('%Y%m%d')
          expid = "/TESTGROUP/TESTPROJ/" + now_str
          exp = transaction.getExperiment(expid)
    
    
          # Create an experiment if necessary
          if exp is None:
            exp = transaction.createNewExperiment(expid, "COMPOUND_HCS")
            exp.setPropertyValue("DESCRIPTION", "An experiment created on " + datetime.today().strftime('%Y-%m-%d'))
            exp.setPropertyValue("COMMENT", now_str)
          
          dataSet = transaction.createNewDataSet()
            
          incoming = transaction.getIncoming()
          transaction.moveFile(incoming.getAbsolutePath(), dataSet)
          dataSet.setDataSetType("HCS_IMAGE")
          dataSet.setExperiment(exp)
    
    More complex processing is also possible. In the following sections, we
    explain how to configure a jython dropbox and describe the API in
    greater detail.
    
    ### Model
    
    The model underlying dropbox registration is the following: when a new
    file or folder is found in the dropbox folder, the process function of
    the script file is invoked with a [data set registration
    transaction](#Dropboxes-IDataSetRegistrationTransaction) as an argument.
    The process function has the responsibility of looking at the incoming
    file or folder and determining what needs to be registered or modified
    in the metadata database and what data needs to be stored on the file
    system. The
    [IDataSetRegistrationTransaction](#Dropboxes-IDataSetRegistrationTransaction) interface
    defines the API for specifying entities to register and update.
    
    Committing a transaction is actually a two-part process. The metadata is
    stored in the openBIS application server's database; the data is kept on
    the file system in a sharded directory structure beneath the data store
    server's *store* directory. All modifications requested as part of a
    transaction are committed atomically — they either all succeed or all
    fail.
    
    Several [Events](#Dropboxes-Events) occur in the process of committing a
    transaction. By defining jython functions, it is possible to be notified
    and intervene when an event occurs. Because the infrastructure reserves
    the right to delay or retry actions if resources become unavailable, the
    process function and event functions cannot use global variables to
    communicate with each other. Instead, they should use the registration
    context object to communicate. Anything stored in the registration
    context must, however, be serializable by Java serialization.
    
    Details
    -------
    
    ### Dropbox Configuration
    
    A jython dropbox is typically distributed as a [core
    plugin](/display/openBISDoc2010/Core+Plugins) and configured in its
    plugin.properties file; the jython script it runs is kept in the same
    directory. The configuration requires a storage processor and the name
    of the script (a full path is not necessary if the script is in the
    same directory as plugin.properties). Here is an example configuration
    for a dropbox that uses the jython handler.
    
    **plugin.properties**
    
        #
        # REQUIRED PARAMETERS
        #
        # The directory to watch for new data sets
        incoming-dir = ${root-dir}/incoming-jython
    
        # The handler class. Must be either ch.systemsx.cisd.etlserver.registrator.api.v2.JythonTopLevelDataSetHandlerV2 or a subclass thereof
        top-level-data-set-handler = ch.systemsx.cisd.etlserver.registrator.api.v2.JythonTopLevelDataSetHandlerV2
    
        # The script to execute, reloaded and recompiled each time a file/folder is placed in the dropbox
        script-path = ${root-dir}/data-set-handler.py
    
        # The appropriate storage processor
        storage-processor = ch.systemsx.cisd.etlserver.DefaultStorageProcessor
    
        # Specify jython version. Default is whatever is specified in datastore server service.properties under property "jython-version"
        plugin-jython-version=2.5
        #
        # OPTIONAL PARAMETERS
        #
         
        # False if incoming directory is assumed to exist.
        # Default - true: Incoming directory will be created on start up if it doesn't exist.
        incoming-dir-create = true
    
        # Defines how the dropbox decides that a folder is ready to process: either by a 'marker-file' or by a timeout, the latter called 'auto-detection'.
        # The timeout is set globally in service.properties via the 'quiet-period' property: once that many seconds have passed without
        # changes to the incoming folder, the dropbox will start to register. The marker file must follow the naming schema: '.MARKER_is_finished_<incoming_folder_name>'
        incoming-data-completeness-condition = marker-file
         
        # Defines whether the dropbox should handle .h5 archives as folders (true) or as files (false). Default is true.
        h5-folders = true
         
        # Defines whether the dropbox should handle .h5ar archives as folders (true) or as files (false). Default is true.
        h5ar-folders = true
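The `marker-file` completeness condition quoted above can be exercised with a plain-Python sketch of the client side: write the data folder completely first, then create the empty marker file so the dropbox knows the folder is finished. The directory and folder name below are temporary stand-ins, not real openBIS paths.

```python
import os
import tempfile

# Client-side half of the marker-file protocol. In production the
# incoming_dir would be the dropbox's configured incoming-dir.
incoming_dir = tempfile.mkdtemp()   # stands in for ${root-dir}/incoming-jython
folder_name = "run42"               # invented incoming folder name

# 1. write the data folder completely first
os.mkdir(os.path.join(incoming_dir, folder_name))
with open(os.path.join(incoming_dir, folder_name, "data.txt"), "w") as f:
    f.write("measurement data\n")

# 2. then create the marker: '.MARKER_is_finished_<incoming_folder_name>'
marker = os.path.join(incoming_dir, ".MARKER_is_finished_" + folder_name)
open(marker, "w").close()
```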
    
    #### Development mode
    
    Set the property `development-mode = true` in your dropbox to get a
    quick feedback loop while developing it. By default dropboxes use a
    complex auto-recovery mechanism which, on error, waits and retries the
    registration several times. This is useful for riding out short network
    problems or other unexpected turbulence, but it can put a long delay
    between the failed registration attempt and the actual error report.
    During development it is essential to get quick feedback on whether your
    dropbox does what it should, so enable development mode while you are
    modifying your script and remember to disable it when you are done.
    
    #### Jython version
    
    Set the property `plugin-jython-version=2.7` in your dropbox
    plugin.properties to change the jython version for that single dropbox.
    Versions 2.5 and 2.7 are available.
    
    Jython API
    ----------
    
    When a new file is placed in the dropbox, the framework compiles and
    executes the script, checks that the signatures of the `process`
    function and any defined event-handling functions are correct, and then
    invokes its `process` function.
    
    ### IDataSetRegistrationTransaction
    
    Have a look
    at [IDataSetRegistrationTransactionV2](https://openbis.ch/javadoc/20.10.x/javadoc-dropbox-api/ch/systemsx/cisd/etlserver/registrator/api/v2/IDataSetRegistrationTransactionV2.html)
    for the calls available in a transaction. Note that you need to use the
    file methods of the transaction, e.g. `moveFile()`, rather than
    manipulating the file system directly, in order to get fully
    transactional behavior.
    
    #### Database Queries
    
    The query object returned
    by `getDatabaseQuery(String dataSourceName)` makes it possible to
    perform any query and execute any statement on the given query database
    in the context of a database transaction. These are the methods
    available from the query interface:
    
        public interface DynamicQuery {
    
            /**
             * Performs a SQL query. The returned List is connected to the database and
             * updateable.
             * 
             * @param query  The SQL query template.
             * @param parameters  The parameters to fill into the SQL query template.
             * 
             * @return The result set as List; each row is represented as one Map<String,Object>.
             */
            List<Map<String, Object>> select(final String query,
                    final Object... parameters);
    
            /**
             * Performs a SQL query. The returned List is connected and
             * updateable.
             * 
             * @param type  The Java type in which to return the rows of the
             *            returned result set.
             * @param query  The SQL query template.
             * @param parameters  The parameters to fill into the SQL query template.
             * 
             * @return The result set as List; each row is represented as one object of type <code>T</code>.
             */
            <T> List<T> select(final Class<T> type, final String query,
                    final Object... parameters);
    
            /**
             * Executes a SQL statement.
             * 
             * @param query  The SQL query template.
             * @param parameters  The parameters to fill into the SQL query template.
             * 
             * @return The number of rows updated by the SQL statement, or -1 if not
             *         applicable. <b>Note:</b> Not all JDBC drivers support this
             *         cleanly.
             */
            int update(final String query, final Object... parameters);
    
            /**
             * Executes a SQL statement as a batch for all parameter values provided.
             * 
             * @param query  The SQL query template.
             * @param parameters  The parameters to fill into the SQL query template. At least
             *            one of the parameters needs to be an array or
             *            <code>Collection</code>. If multiple parameters are arrays or
             *            <code>Collection</code>, all of them need to have the same
             *            size.
             * 
             * @return The number of rows updated by the SQL statement, or -1 if not
             *         applicable. <b>Note:</b> Not all JDBC drivers support this
             *         cleanly.
             */
            int batchUpdate(final String query, final Object... parameters);
    
            /**
             * Executes a SQL statement. Supposed to be used for INSERT statements with
             * an automatically generated integer key.
             * 
             * @param query  The SQL query template.
             * @param parameters  The parameters to fill into the SQL query template.
             * 
             * @return The automatically generated key. <b>Note:</b> Not all JDBC
             *         drivers support this cleanly.
             */
            long insert(final String query, final Object... parameters);
    
            /**
             * Executes a SQL statement. Supposed to be used for INSERT statements with
             * one or more automatically generated keys.
             * 
             * @param query  The SQL query template.
             * @param parameters  The parameters to fill into the SQL query template.
             * 
             * @return The automatically generated keys. <b>Note:</b> Not all JDBC
             *         drivers support this cleanly and it is in general driver-dependent 
             *         what keys are present in the returned map.
             */
            Map<String, Object> insertMultiKeys(final String query,
                    final Object... parameters);
    
            /**
             * Executes a SQL statement as a batch for all parameter values provided.
             * Supposed to be used for INSERT statements with an automatically generated
             * integer key.
             * 
             * @param query  The SQL query template.
             * @param parameters  The parameters to fill into the SQL query template. At least
             *            one of the parameters needs to be an array or
             *            <code>Collection</code>. If multiple parameters are arrays or
             *            <code>Collection</code>, all of them need to have the same
             *            size.
             * 
             * @return The automatically generated key for each element of the batch.
             *         <b>Note:</b> Not all JDBC drivers support this cleanly.
             */
            long[] batchInsert(final String query, final Object... parameters);
    
            /**
             * Executes a SQL statement as a batch for all parameter values provided.
             * Supposed to be used for INSERT statements with one or more automatically
             * generated keys.
             * 
             * @param query  The SQL query template.
             * @param parameters  The parameters to fill into the SQL query template. At least
             *            one of the parameters needs to be an array or
             *            <code>Collection</code>. If multiple parameters are arrays or
             *            <code>Collection</code>, all of them need to have the same
             *            size.
             * 
             * @return The automatically generated keys for each element of the batch.
             *         <b>Note:</b> Not all JDBC drivers support this cleanly and it is
             *         in general driver-dependent what keys are present in the returned map.
             */
            Map<String, Object>[] batchInsertMultiKeys(final String query,
                    final Object... parameters);
        }
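As a rough illustration of the `select()` result shape (a List in which each row is one Map of column name to value), here is a plain-Python sketch using sqlite3 as a stand-in for the configured query database. The `runs` table is invented, and note that the parameter placeholder syntax in a real dropbox follows the underlying query library, not sqlite's `?`.

```python
import sqlite3

# sqlite3 stands in for the query database configured under a data source
# name; in a dropbox you would call transaction.getDatabaseQuery(name)
# and then query.select(...) instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (code TEXT, status TEXT)")
conn.execute("INSERT INTO runs VALUES ('RUN-1', 'DONE')")
conn.execute("INSERT INTO runs VALUES ('RUN-2', 'FAILED')")

def select(query, *parameters):
    # mimic DynamicQuery.select(): a list of maps, one per row
    cur = conn.execute(query, parameters)
    cols = [d[0] for d in cur.description]
    return [dict(zip(cols, row)) for row in cur.fetchall()]

rows = select("SELECT * FROM runs WHERE status = ?", "DONE")
# rows -> [{'code': 'RUN-1', 'status': 'DONE'}]
```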
    
    ### Events / Registration Process Hooks
    
      
    The script can be informed of events that occur during the registration
    process. To be informed of an event, define a function in the script
    file with the name specified in the table. The script can do anything it
    wants within an event function. Typical things to do in event functions
    include sending emails or registering data in secondary databases. Some
    of the event functions can be used to control the behavior of the
    registration.
    
    This table summarizes the supported events.
    
    #### Events Table
    
    |Function Name|Return Value|Description|
    |--- |--- |--- |
    |pre_metadata_registration(DataSetRegistrationContext context)|void|Called before the openBIS AS is informed of the metadata modifications. Throwing an exception in this method aborts the transaction.|
    |post_metadata_registration(DataSetRegistrationContext context)|void|The metadata has been successfully stored in the openBIS AS. This can also be a place to register data in a secondary transaction, with the semantics that any errors are ignored.|
    |rollback_pre_registration(DataSetRegistrationContext context, Exception exception)|void|Called if the metadata was not successfully stored in the openBIS AS.|
    |post_storage(DataSetRegistrationContext context)|void|Called once the data has been placed in the appropriate sharded directory of the store. This can only happen if the metadata was successfully registered with the AS.|
    |should_retry_processing(DataSetRegistrationContext context, Exception problem)|boolean|A problem occurred with the process function, should the operation be retried? A retry happens only if this method returns true.|
    
    Note: the `rollback_pre_registration` function is intended to handle
    cases where the dropbox code finished properly but the registration of
    the data in openBIS failed. Such problems cannot be handled from inside
    the `process` function. Exceptions raised during the call to `process`
    should be caught and handled by the function itself.
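A minimal skeleton of these hooks, matching the signatures in the events table, might look as follows. The bodies are placeholders, and the retry policy shown (retrying only on `IOError`) is an invented example of how `should_retry_processing` can distinguish transient from permanent failures.

```python
def pre_metadata_registration(context):
    # last chance to abort: raising an exception here rolls the
    # transaction back before anything reaches the openBIS AS
    pass

def post_metadata_registration(context):
    # metadata is now in the AS; errors raised here are ignored
    pass

def rollback_pre_registration(context, exception):
    # the AS rejected the registration; undo secondary-database work
    # or notify an admin here
    pass

def post_storage(context):
    # data is in the store; notify users or external systems here
    pass

def should_retry_processing(context, problem):
    # invented policy: retry transient I/O failures, give up otherwise
    return isinstance(problem, IOError)
```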
    
    #### Typical Usage Table
    
    |Function Name|Usage|
    |--- |--- |
    |pre_metadata_registration(DataSetRegistrationContext context)|This event can be used as a place to register information in a secondary database. If the transaction in the secondary database does not commit, an exception can be thrown to prevent the data from entering openBIS.|
    |post_metadata_registration(DataSetRegistrationContext context)|This event can be used as a place to register information in a secondary database. Errors encountered are ignored.|
    |rollback_pre_registration(DataSetRegistrationContext context, Exception exception)|Undoing a commit to a secondary transaction. Sending an email to the admin that the data set could not be stored.|
    |post_storage(DataSetRegistrationContext context)|Sending an email to tell the user that the data has been successfully registered. Notifying an external system that a data set has been registered.|
    |should_retry_processing(DataSetRegistrationContext context, Exception problem)|Informing openBIS if it should retry processing a data set.|
    
    Example Scripts
    ---------------
    
    A simple script that registers the incoming file as a data set
    associated with a particular experiment.
    
    **data-set-handler-basic.py**
    
        def process(transaction):
            dataSet = transaction.createNewDataSet()
            incoming = transaction.getIncoming()
            transaction.moveFile(incoming.getAbsolutePath(), dataSet)
            dataSet.setExperiment(transaction.getExperiment("/TESTGROUP/TESTPROJ/JYTHON"))
    
    A script that registers the incoming file and associates it to a daily
    experiment, which is created if necessary.
    
    **data-set-handler-experiment-reg.py**
    
        from datetime import datetime
        def process(transaction):
            # Try to get the experiment for today
            now_str = datetime.today().strftime('%Y%m%d')
            expid = "/TESTGROUP/TESTPROJ/" + now_str
            exp = transaction.getExperiment(expid)
            # Create an experiment
            if exp is None:
                exp = transaction.createNewExperiment(expid, "COMPOUND_HCS")
                exp.setPropertyValue("DESCRIPTION", "An experiment created on " + datetime.today().strftime('%Y-%m-%d'))
                exp.setPropertyValue("COMMENT", now_str)
            dataSet = transaction.createNewDataSet()
            incoming = transaction.getIncoming()    
            transaction.moveFile(incoming.getAbsolutePath(), dataSet)
            dataSet.setDataSetType("HCS_IMAGE")
            dataSet.setExperiment(exp)
    
    Delete, Move, or Leave Alone on Error
    -------------------------------------
    
    When a problem occurs processing a file in the dropbox, the processing
    is retried. This behavior can be controlled (see
    [\#Errors](#Dropboxes-Errors)). If openBIS determines that it should not
    retry after an error or that it cannot successfully register the
    entities requested, the registration fails. It is possible to configure
    what happens to a file in the dropbox if a registration fails. The
    configuration can specify a behavior – delete the file, move it to an
    error folder, or leave it untouched – for each of several possible
    sources of errors.
    
    By default, the file is left untouched in every case. To change this
    behavior, specify an on-error-decision property on the drop box. This
    has one required sub-key, "class"; other sub-keys are determined by the
    class.
    
    ### Summary
    
    -   Main Key:  
        -   on-error-decision
    
    -   Required Sub Keys:
        -   class: The class that implements the decision
    
    There is currently one class available:
    ch.systemsx.cisd.etlserver.registrator.ConfiguredOnErrorActionDecision
    
    This class has the following sub keys:
    
    -   invalid-data-set (a data set that fails validation)
    -   validation-script-error (the validation script did not execute
        correctly)
    -   registration-error (openBIS failed to register the data set)
    -   registration-script-error (the registration script did not
        execute correctly)
    -   storage-processor-error (the storage processor reports an error)
    -   post-registration-error (an error happened after the data set
        had been registered and stored)
    
    ### Example
    
    **plugin.properties**
    
        #
        # On Error Decision
        #
        # The class that implements the decision
        on-error-decision.class = ch.systemsx.cisd.etlserver.registrator.ConfiguredOnErrorActionDecision
         
        # What to do if the data set fails validation
        on-error-decision.invalid-data-set = MOVE_TO_ERROR
         
        # What to do if the validation script has problems
        on-error-decision.validation-script-error = MOVE_TO_ERROR
         
        # What to do if openBIS does not accept the entities
        on-error-decision.registration-error = MOVE_TO_ERROR
         
        # What to do if the registration script has problems
        on-error-decision.registration-script-error = MOVE_TO_ERROR
         
        # What to do if the storage processor does not run correctly
        on-error-decision.storage-processor-error = MOVE_TO_ERROR
         
        # What to do if an error occurs after the entities have been registered in openBIS
        on-error-decision.post-registration-error = MOVE_TO_ERROR
    
    ### Search
    
    The transaction provides an interface for listing and searching for the
    core entities: experiments, samples, and data sets.
    
    #### API
    
    To use the search capability, one must first retrieve the search service
    from the transaction. By default the search service returns only those
    entities accessible by the user on whose behalf the script is running.
    It is still possible to search all existing entities by using the
    unfiltered search service, accessible from the transaction via the
    method getSearchServiceUnfiltered().
    
    #### Experiment
    
    For experiment, there is a facility for listing all experiments that
    belong to a specified project.
    
    #### Sample and Data Set
    
    For sample and data set, a more powerful search capability is available.
    This requires a bit more knowledge of the java classes, but is very
    flexible. For each entity, there is a simplified method that performs a
    search for samples or data sets, respectively, with a specified value
    for a particular property, optionally restricted by entity type (sample
    type or data set type). This provides an easy-to-use interface for a
    common case. More complex searches, however, need to use the more
    powerful API.
    
    ### Authorization Service
    
    The transaction provides an interface for querying the access privileges
    of a user and for filtering collections of entities down to those
    visible to a user.
    
    #### API
    
    To use the authorization service, one must first retrieve it from the
    transaction.
    
    ### Example
    
    #### Combined Example
    
    In this example, we create a data set, list experiments belonging to a
    project, search for samples, search for data sets, and assign the
    experiment, sample, and parent data sets based on the results of the
    searches.
    
    **data-set-handler-with-search.py**
    
        def process(tr):
            data_set = tr.createNewDataSet()
            incoming = tr.getIncoming()
            tr.moveFile(incoming.getAbsolutePath(), data_set)
            # Get the search service
            search_service = tr.getSearchService()
    
            # List all experiments in a project
            experiments = search_service.listExperiments("/cisd/noe")
    
            # Search for all samples with a property value determined by the file name; we don't care about the type
            samplePropValue = incoming.getName()
            samples = search_service.searchForSamples("ORGANISM", samplePropValue, None)
    
            # If possible, set the owner to the first sample, otherwise the first experiment
            if samples.size() > 0:
                data_set.setSample(samples[0])
            else:
                data_set.setExperiment(experiments[0])
    
            # Search for any potential parent data sets and use them as parents
            parent_data_sets = search_service.searchForDataSets("COMMENT", "no comment", "HCS_IMAGE")
            parent_data_set_codes = map(lambda each : each.getDataSetCode(), parent_data_sets)
            data_set.setParentDatasets(parent_data_set_codes)
    
    An example from the Deep Sequencing environment handling BAM files:
    
    **data-set-handler-alignment.py**
    
        '''
        This is handling bowtie-BAM files and extracts some properties from the BAM header and
        the samtools flagstat command. The results are formatted and attached  as a property
        to the openBIS DataSet.
        Prerequisites are the DataSetType: ALIGNMENT and
        the following properties assigned to the DataSetType mentioned above:
        ALIGNMENT_SOFTWARE, ISSUED_COMMAND, SAMTOOLS_FLAGSTAT,
        TOTAL_READS, MAPPED_READS
        Obviously you need a working samtools binary
        Note:
        print statements go to: ~openbis/sprint/datastore_server/log/startup_log.txt
        '''
        import os
        from ch.systemsx.cisd.openbis.generic.shared.api.v1.dto import SearchCriteria
        FOLDER='/net/bs-dsu-data/array0/dsu/dss/incoming-jython-alignment/'
        SAMTOOLS='/usr/local/dsu/samtools/samtools'
        def process(transaction):
            incoming = transaction.getIncoming()
            # Create a data set and set type
            dataSet = transaction.createNewDataSet("ALIGNMENT")
            dataSet.setMeasuredData(False)
            incomingPath = incoming.getAbsolutePath()
            # Get the incoming name
            name = incoming.getName()
            # expected incoming Name, e.g.:ETHZ_BSSE_110429_63558AAXX_1_sorted.bam
            split = name.split("_")
            sample=split[2]+ '_'+ split[3] + ':' + split[4]
            # Extract values from a samtools view and set the results as DataSet properties
            # Command: samtools view -H ETHZ_BSSE_110429_63558AAXX_1_sorted.bam
            arguments = SAMTOOLS + ' view -H ' + FOLDER + name
            #print('Arguments: '+ arguments)
            cmdResult = os.popen(arguments).read()
            properties = cmdResult.split("\n")[-2].split('\t')
            aligner = (properties[1].split(':')[1].upper() +  '_' + properties[2].split(':')[1])
            command = properties[3]
            arguments = SAMTOOLS + ' flagstat ' + FOLDER + name
            cmdResult = os.popen(arguments).read()
            totalReads = cmdResult.split('\n')[0].split(' ')[0]
            mappedReads = cmdResult.split('\n')[2].split(' ')[0]
            dataSet.setPropertyValue("ALIGNMENT_SOFTWARE", aligner)
            dataSet.setPropertyValue("ISSUED_COMMAND", command)
            dataSet.setPropertyValue("SAMTOOLS_FLAGSTAT", cmdResult)
            dataSet.setPropertyValue("TOTAL_READS", totalReads)
            dataSet.setPropertyValue("MAPPED_READS", mappedReads)
            # Add the incoming file into the data set
            transaction.moveFile(incomingPath, dataSet)
            # Get the search service
            search_service = transaction.getSearchService()
            # Search for the sample
            sc = SearchCriteria()
            sc.addMatchClause(SearchCriteria.MatchClause.createAttributeMatch(SearchCriteria.MatchClauseAttribute.CODE, sample));
            foundSamples = search_service.searchForSamples(sc)
            if foundSamples.size() > 0:
              dataSet.setSample(foundSamples[0])
    
    Error Handling
    --------------
    
    ### Automatic Retry (auto recovery)
    
OpenBIS has a complex mechanism to ensure that data registration via
dropboxes is atomic. When an error occurs during data registration, the
dropbox retries several times before it gives up on the process. Retries
can happen both during the initial processing of the data and during the
registration of metadata in the application server. Even if these fail,
there is still a chance to finish the registration: once the registration
reaches a certain stage, a checkpoint is stored on disk. If at any point
the process fails, or the DSS goes down, the registration is recovered
from that checkpoint.
    
There are two types of checkpoint files: state files and marker files.
They are stored in two different directories. The default location for
the state files is `datastore_server/recovery-state`. This can be changed
by the property `dss-recovery-state-dir` in the DSS `service.properties`.
The default location for the marker files used to be
`<store location>/<share id>/recovery-marker`, which may lead to problems
if this location is on a remote file system. Since version 20.10.6 the
default location is `datastore_server/recovery-marker-dir`. This can be
changed by the property `dss-recovery-marker-dir` in the DSS
`service.properties`.
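Both directories can thus be overridden in the DSS `service.properties`; a minimal sketch (the absolute paths are hypothetical):

```properties
# DSS service.properties -- hypothetical absolute paths
dss-recovery-state-dir = /home/openbis/datastore_server/recovery-state
dss-recovery-marker-dir = /home/openbis/datastore_server/recovery-marker
```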
    
The `process` function will be retried if a
`should_retry_processing` function is defined in the dropbox script and
it returns true. The number of times to retry before giving up, and the
waiting periods between retries, are defined by the properties shown in
the table below; for example, `process-max-retry-count` limits the number
of times the `process` function can be retried.
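A retry hook can be sketched as follows (the function name and signature follow the dropbox API described above; the error-classification logic is a hypothetical example):

```python
# Retry only when the failure looks transient; returning False makes the
# registration fail immediately instead of being retried.
RETRYABLE_MESSAGES = ("Connection refused", "timed out")

def should_retry_processing(context, problem):
    message = str(problem)
    return any(snippet in message for snippet in RETRYABLE_MESSAGES)
```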
    
IMPORTANT NOTE: The registration is considered failed only after the
whole retrying / recovery process has failed. This means that it can
take a long time before the .faulty\_paths file is created, even when
there is a simple dropbox error.

Therefore, during development of a dropbox we recommend using
**[development mode](#Dropboxes-Developmentmode)**, which basically sets
all retry values to 0, thus disabling the auto-recovery feature.
    
    |Key|Default Value|Meaning|
    |--- |--- |--- |
    |process-max-retry-count|6|The maximum number of times the process function can be retried.|
    |process-retry-pause-in-sec|300|The amount of time to wait between retries of the process function.|
    |metadata-registration-max-retry-count|6|The number of times registering metadata with the server can be retried.|
|metadata-registration-retry-pause-in-sec|300|The amount of time to wait between retries of metadata registration.|
|recovery-max-retry-count|50|The number of times the recovery from a checkpoint can be retried.|
    |recovery-min-retry-period|60|The amount of time to wait between recovery from checkpoint retries.|
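During development you may want these limits as low as possible; a hypothetical override using the property names from the table above (development mode sets these for you):

```properties
# Fail fast while developing -- equivalent to what development mode sets
process-max-retry-count = 0
metadata-registration-max-retry-count = 0
recovery-max-retry-count = 0
```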
    
    ![image info](img/771.png)
    
    ### Manual Recovery
    
    The registration of data sets with Jython dropboxes has been designed to
    be quite robust. Nonetheless, there are situations in which problems may
    arise. This can especially be a problem during the development of
    dropboxes. Here are the locations and semantics of several important
    files and folders that can be useful for debugging a dropbox.
    
    |File or Folder|Meaning|
    |--- |--- |
    |datastore_server/log-registrations|Keeps logs of registrations. See the registration log documentation for more information.|
|[store]/[share]/pre-staging|Contains hard-link copies of the original data. Dropbox processes operate on these hard-link copies.|
    |[store]/[share]/staging|The location used to prepare data sets for registration.|
|[store]/[share]/pre-commit|Where data set files are kept while the metadata is registered with the AS. Once metadata registration succeeds, files are moved from this folder into the final store directory.|
|[store]/[share]/recovery-marker (before version 20.10.6)<br>datastore_server/recovery-marker-dir (since version 20.10.6)|Directories, one per dropbox, where marker files are kept that indicate that a recovery should happen on an incoming file if it is reprocessed. Deleting a marker file will force the incoming file to be processed as a new file, not a recovery.|
    
    Classpath / Configuration
    -------------------------
    
If you want other Jython modules to be available to the code that
implements the dropbox, you need to modify the
`datastore_server.conf` file and add something like
    
        -Dpython.path=data/dropboxes/scripts:lib/jython-lib
    
to the `JAVA_OPTS` environment variable. The line should then look
something like this:
    
        JAVA_OPTS=${JAVA_OPTS:=-server -d64 -Dpython.path=data/dropboxes/scripts:lib/jython-lib}
    
If the Jython dropbox needs third-party JAR files, they have to be added
to the core plugin in a sub-folder `lib/`.
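For illustration, a hypothetical core-plugin layout with a bundled JAR might look like this:

```
servers/core-plugins/my-module/1/dss/drop-boxes/my-dropbox/
├── plugin.properties
├── my-dropbox.py
└── lib/
    └── third-party-library.jar
```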
    
    Validation scripts
    ------------------
    
    See [Jython
    DataSetValidator](/display/openBISDoc2010/Jython+DataSetValidator).
    
    Global Thread Parameters
    ------------------------
    
If you want to write a dropbox which uses parameters defined in
`service.properties`, you can access those properties via
`getGlobalState`. Here is an example of how to use it:

**Global thread properties**
    
    def getThreadProperties(transaction):
        threadPropertyDict = {}
        threadProperties = transaction.getGlobalState().getThreadParameters().getThreadProperties()
        for key in threadProperties:
            try:
                threadPropertyDict[key] = threadProperties.getProperty(key)
            except:
                pass
        return threadPropertyDict

    # You can later access the thread properties like this:
    threadPropertyDict = getThreadProperties(transaction)
    incomingRootDir = threadPropertyDict[u'incoming-root-dir']
    
    Sending Emails from a Drop box
    ------------------------------
    
    # PERSISTANT_KEY_MAP is an arbitrary, user-defined key for the persistent map;
    # sendEmail and fcId are placeholders for your own helper function and data.
    PERSISTANT_KEY_MAP = "results"

    def process(transaction):
        # Stash values during processing so they are available after storage
        transaction.getRegistrationContext().getPersistentMap().put(PERSISTANT_KEY_MAP, [fcId])

    def post_storage(context):
        mailClient = context.getGlobalState().getMailClient()
        results = context.getPersistentMap().get(PERSISTANT_KEY_MAP)
        sendEmail(mailClient, results[0])
    
    Java Dropboxes
    --------------
    
The above examples show how to implement dropboxes in Python. Python,
however, is not the only language option: it is also possible to write
dropboxes in Java. Whereas Python has the advantage of short turnaround
and less verbose syntax, Java is a good choice if the dropbox employs
complex logic and/or does not need to be modified frequently. A natural
progression is to use Python at the beginning, when creating a new
dropbox, to take advantage of the short turnaround cycle, and then move
to Java once the dropbox implementation becomes more stable. Since the
API is the same, this transition is quite painless.
    
    ### Configuration
    
    As with other dropboxes, a Java dropbox should be deployed as a
    core-plugin.
    
    **plugin.properties**
    
        #
        # REQUIRED PARAMETERS
        #
        # The directory to watch for new data sets
        incoming-dir = ${root-dir}/incoming-java-dropbox
    
        # The handler class. Must be either ch.systemsx.cisd.etlserver.registrator.api.v2.JavaTopLevelDataSetHandlerV2 or a subclass thereof
        top-level-data-set-handler = ch.systemsx.cisd.etlserver.registrator.api.v2.JavaTopLevelDataSetHandlerV2
    
        # The class that implements the dropbox (must implement ch.systemsx.cisd.etlserver.registrator.api.v2.IJavaDataSetRegistrationDropboxV2)
        program-class = ch.systemsx.cisd.etlserver.registrator.api.v2.ExampleJavaDataSetRegistrationDropboxV2
    
        # The appropriate storage processor
        storage-processor = ch.systemsx.cisd.etlserver.DefaultStorageProcessor
    
        #
        # OPTIONAL PARAMETERS
        #
         
        # False if incoming directory is assumed to exist.
        # Default - true: Incoming directory will be created on start up if it doesn't exist.
        incoming-dir-create = true
    
The `program-class` parameter specifies the class that implements the
logic of the dropbox. This class must implement the
`IJavaDataSetRegistrationDropboxV2` interface. This class, and any other
code it uses, should be packaged in a jar file that is provided with the
core-plugin. The name of the jar file can be freely chosen.
    
    ### Implementation
    
To implement a dropbox in Java, implement
the `IJavaDataSetRegistrationDropboxV2` interface, which codifies the
interaction between the datastore server and the dropbox. We recommend
subclassing `AbstractJavaDataSetRegistrationDropboxV2` to bootstrap the
implementation of this interface.
    
    **IJavaDataSetRegistrationDropboxV2**
    
        /**
         * The interface that V2 dropboxes must implement. Defines the process method, which is called to
         * handle new data in the dropbox's incoming folder, and various event methods called as the
         * registration process progresses.
         * 
         * @author Pawel Glyzewski
         */
        public interface IJavaDataSetRegistrationDropboxV2
        {
            /**
             * Invoked when new data is found in the incoming folder. Implements the logic of registering
             * and modifying entities.
             * 
             * @param transaction The transaction that offers methods for registering and modifying entities
             *            and performing operations on the file system.
             */
            public void process(IDataSetRegistrationTransactionV2 transaction);
            /**
             * Invoked just before the metadata is registered with the openBIS AS. Gives dropbox
             * implementations an opportunity to perform additional operations. If an exception is thrown in
         * this method, the transaction is rolled back.
             * 
             * @param context Context of the registration. Offers access to the global state and persistent
             *            map.
             */
            public void preMetadataRegistration(DataSetRegistrationContext context);
            /**
         * Invoked if the transaction is rolled back before the metadata is registered with the openBIS
         * AS.
             * 
             * @param context Context of the registration. Offers access to the global state and persistent
             *            map.
             * @param throwable The throwable that triggered rollback.
             */
            public void rollbackPreRegistration(DataSetRegistrationContext context, Throwable throwable);
            /**
             * Invoked just after the metadata is registered with the openBIS AS. Gives dropbox
             * implementations an opportunity to perform additional operations. If an exception is thrown in
             * this method, it is logged but otherwise ignored.
             * 
             * @param context Context of the registration. Offers access to the global state and persistent
             *            map.
             */
            public void postMetadataRegistration(DataSetRegistrationContext context);
            /**
             * Invoked after the data has been stored in its final location on the file system and the
             * storage has been confirmed with the AS.
             * 
             * @param context Context of the registration. Offers access to the global state and persistent
             *            map.
             */
            public void postStorage(DataSetRegistrationContext context);
            /**
         * Is a function defined that can be used to check if a failed registration should be retried?
         * Primarily for use by implementations of this interface that dispatch to dynamic languages.
         * 
         * @return true if shouldRetryProcessing is defined, false otherwise.
             */
            public boolean isRetryFunctionDefined();
            /**
             * Given the problem with registration, should it be retried?
             * 
             * @param context Context of the registration. Offers access to the global state and persistent
             *            map.
             * @param problem The exception that caused the registration to fail.
             * @return true if the registration should be retried.
             */
            public boolean shouldRetryProcessing(DataSetRegistrationContext context, Exception problem)
                    throws NotImplementedException;
        }
    
    Sending Emails in a drop box (simple)
    -------------------------------------
    
    from ch.systemsx.cisd.common.mail import EMailAddress

    def process(transaction):
        replyTo = EMailAddress("manuel.kohler@id.ethz.ch")
        fromAddress = replyTo
        recipient1 = EMailAddress("recipient1@ethz.ch")
        recipient2 = EMailAddress("recipient2@ethz.ch")

        transaction.getGlobalState().getMailClient().sendEmailMessage("This is the subject", \
            "This is the body", replyTo, fromAddress, recipient1, recipient2)
    
    ### Java Dropbox Example
    
    This is a simple example of a pure-java dropbox that creates a sample
    and registers the incoming file as a data set of this sample.
    
    **ExampleJavaDataSetRegistrationDropboxV2.java**
    
        package ch.systemsx.cisd.etlserver.registrator.api.v2;
        import ch.systemsx.cisd.etlserver.registrator.api.v1.IDataSet;
        import ch.systemsx.cisd.etlserver.registrator.api.v1.ISample;
        import ch.systemsx.cisd.openbis.dss.generic.shared.api.internal.v1.IExperimentImmutable;
        /**
         * An example dropbox implemented in Java.
         * 
         * @author Chandrasekhar Ramakrishnan
         */
        public class ExampleJavaDataSetRegistrationDropboxV2 extends
                AbstractJavaDataSetRegistrationDropboxV2
        {
            @Override
            public void process(IDataSetRegistrationTransactionV2 transaction)
            {
                String sampleId = "/CISD/JAVA-TEST";
                ISample sample = transaction.createNewSample(sampleId, "DYNAMIC_PLATE");
                IExperimentImmutable exp = transaction.getExperiment("/CISD/NEMO/EXP-TEST-1");
                sample.setExperiment(exp);
                IDataSet dataSet = transaction.createNewDataSet();
                dataSet.setSample(sample);
                transaction.moveFile(transaction.getIncoming().getAbsolutePath(), dataSet);
            }
        }
    
### Java Code Location
    
    The Java file should go into a `lib` folder and should be wrapped as a
    `jar`. The name does not matter.
    
    While building a jar, the project should have the following
    dependencies: `openBIS-API-dropbox-<version>.jar`,
    `lib-commonbase-<version>.jar` and `cisd-hotdeploy-13.01.0.jar`. The
    first two are available in the distribution in the archives
    `openBIS-API-commonbase-<version>.zip` and
    `openBIS-API-dropbox-<version>.zip`, the third one is available in [the
    Ivy
    repo](https://sissource.ethz.ch/openbis/openbis-public/openbis-ivy/-/blob/main/cisd/cisd-hotdeploy/13.01.0/cisd-hotdeploy-13.01.0.jar).
    
    Example path where the created `jar` should reside:
    
    `servers/core-plugins/illumina-ngs/2/dss/drop-boxes/register-cluster-alignment-java/lib`
    
    Create a `jar` from your java dropbox file:
    
    `jar cvf foo.jar foo.java`
    
Finally, restart the DSS.
    
    Calling an Aggregation Service from a drop box
    ----------------------------------------------
    
    **drop box code**
    
        '''
        @author:
        Manuel Kohler
        '''
        from ch.systemsx.cisd.openbis.dss.generic.server.EncapsulatedOpenBISService import createQueryApiServer
         
        def process(transaction):
            # use the etl server session token
            session_token = transaction.getOpenBisServiceSessionToken()
    
        # To find the data store code, run this SQL on the openBIS DB: select code from data_stores;
            dss = "STANDARD"
    
            # folder name under the reporting_plugins
            service_key = "reporting_experimental"   
    
            # some parameters which are handed over
            d = {"param1": "hello", "param2": "from a drop box"}
    
            # connection to the openbis server returns IQueryApiServer
            s = createQueryApiServer("http://127.0.0.1:8888/openbis/openbis/", "600")
    
            # Actual call
            # Parameters: String sessionToken, String dataStoreCode,String serviceKey, Map<String, Object> parameters)
            s.createReportFromAggregationService(session_token, dss, service_key, d)
    
    Known limitations
    -----------------
    
### Blocking
    
    Registering/updating a large number of entities can cause other
    concurrent operations that try to modify the same or related entities to
    be blocked. This limitation applies to both dropboxes and batch
    operations triggered from the web UI. Lists of operations that are
    blocked are presented below. Each list contains operations that cannot
    be performed when a specific kind of entity is being registered/updated.
    
    Experiment:
    
    -   creating/updating an experiment in the same project
    -   updating the same space
    -   updating the same project
    -   updating the same experiment
    
    Sample:
    
    -   creating/updating an experiment in the same project
    -   creating/updating a sample in the same experiment
    -   updating the same space