objectstore Package

The objectstore package is an abstraction for storing blobs of data for use in Galaxy. All providers ensure that data can be accessed on the filesystem for running tools.

class galaxy.objectstore.CachingObjectStore(path, backend)[source]

Bases: galaxy.objectstore.ObjectStore

Object store that uses a directory for caching files, but defers to, and writes back to, another object store.

class galaxy.objectstore.DiskObjectStore(config, config_xml=None, file_path=None, extra_dirs=None)[source]

Bases: galaxy.objectstore.ObjectStore

Standard Galaxy object store. Stores objects in files under a specific directory on disk.

>>> from galaxy.util.bunch import Bunch
>>> import tempfile
>>> file_path = tempfile.mkdtemp()
>>> obj = Bunch(id=1)
>>> s = DiskObjectStore(Bunch(umask=0o077, job_working_directory=file_path, new_file_path=file_path, object_store_check_old_style=False), file_path=file_path)
>>> s.create(obj)
>>> s.exists(obj)
True
>>> assert s.get_filename(obj) == file_path + '/000/dataset_1.dat'

create(obj, **kwargs)[source]
delete(obj, entire_dir=False, **kwargs)[source]
empty(obj, **kwargs)[source]
exists(obj, **kwargs)[source]
get_data(obj, start=0, count=-1, **kwargs)[source]
get_filename(obj, **kwargs)[source]
get_object_url(obj, **kwargs)[source]
get_store_usage_percent()[source]
size(obj, **kwargs)[source]
update_from_file(obj, file_name=None, create=False, **kwargs)[source]

The create parameter is not used in this implementation.

class galaxy.objectstore.DistributedObjectStore(config, config_xml=None, fsmon=False)[source]

Bases: galaxy.objectstore.NestedObjectStore

ObjectStore that defers to a list of backends. When getting an object, the first store in which the object exists is used. Objects are created in a store selected randomly, but with weighting.

create(obj, **kwargs)[source]

create() is the only method in which obj.object_store_id may be None.
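The weighted random selection described for this class might be sketched as follows. This is an illustration only, not Galaxy's implementation: the function name pick_backend_id, the weights mapping, and the backend ids are all hypothetical.

```python
import random


def pick_backend_id(weights, object_store_id=None, rng=random):
    """Return the id of the backend that should hold an object.

    weights: hypothetical mapping of backend id -> non-negative weight.
    object_store_id: id already assigned to the object, or None.
    """
    if object_store_id is not None:
        # The object is already bound to a backend; keep it there.
        return object_store_id
    backend_ids = list(weights)
    # random.choices draws with probability proportional to each weight.
    return rng.choices(
        backend_ids, weights=[weights[b] for b in backend_ids], k=1
    )[0]
```

With weights of 3 and 1, the first backend would receive roughly three quarters of newly created objects. An object that already carries an id is never reassigned.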

shutdown()[source]

class galaxy.objectstore.HierarchicalObjectStore(config, config_xml=None, fsmon=False)[source]

Bases: galaxy.objectstore.NestedObjectStore

ObjectStore that defers to a list of backends. When getting an object, the first store in which the object exists is used. Objects are always created in the first store.

create(obj, **kwargs)[source]

create() will always be called by the primary object_store.

exists(obj, **kwargs)[source]

exists() must check all child object stores.
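The lookup and creation rules for this class can be illustrated with toy in-memory backends. This is a sketch under stated assumptions, not the Galaxy code: DictStore, HierarchicalSketch, and first_store_with are hypothetical names, and objects are reduced to bare ids.

```python
class DictStore:
    """Toy in-memory store standing in for a real backend (hypothetical)."""

    def __init__(self):
        self.objects = {}

    def exists(self, obj_id):
        return obj_id in self.objects

    def create(self, obj_id):
        # Mark the object as existing, with no content.
        self.objects.setdefault(obj_id, b"")


class HierarchicalSketch:
    """Defers to an ordered list of backends."""

    def __init__(self, backends):
        self.backends = list(backends)

    def exists(self, obj_id):
        # Every child store is consulted before answering False.
        return any(store.exists(obj_id) for store in self.backends)

    def first_store_with(self, obj_id):
        # Reads use the first backend that reports the object as present.
        for store in self.backends:
            if store.exists(obj_id):
                return store
        return None

    def create(self, obj_id):
        # New objects always go to the first (primary) backend.
        self.backends[0].create(obj_id)
```

An object that lives only in a secondary backend is still found by exists(), while anything newly created lands in the primary backend.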

class galaxy.objectstore.NestedObjectStore(config, config_xml=None)[source]

Bases: galaxy.objectstore.ObjectStore

Base class for ObjectStores that use other ObjectStores (DistributedObjectStore, HierarchicalObjectStore).

create(obj, **kwargs)[source]
delete(obj, **kwargs)[source]
empty(obj, **kwargs)[source]
exists(obj, **kwargs)[source]
file_ready(obj, **kwargs)[source]
get_data(obj, **kwargs)[source]
get_filename(obj, **kwargs)[source]
get_object_url(obj, **kwargs)[source]
shutdown()[source]
size(obj, **kwargs)[source]
update_from_file(obj, **kwargs)[source]

class galaxy.objectstore.ObjectStore(config, config_xml=None, **kwargs)[source]

Bases: object

Abstract interface for object stores.

create(obj, base_dir=None, dir_only=False, extra_dir=None, extra_dir_at_root=False, alt_name=None, obj_dir=False)[source]

Mark the object identified by obj as existing in the store, but with no content. This method will create a proper directory structure for the file if the directory does not already exist. See exists method for the description of other fields.

delete(obj, entire_dir=False, base_dir=None, extra_dir=None, extra_dir_at_root=False, alt_name=None, obj_dir=False)[source]

Deletes the object identified by obj. See exists method for the description of other fields.

Parameters: entire_dir (bool) – If True, delete the entire directory pointed to by extra_dir. For safety reasons, this option applies only in conjunction with the extra_dir or obj_dir options.

empty(obj, base_dir=None, extra_dir=None, extra_dir_at_root=False, alt_name=None, obj_dir=False)[source]

Test if the object identified by obj has content. If the object does not exist, raises ObjectNotFound. See exists method for the description of the fields.

exists(obj, base_dir=None, dir_only=False, extra_dir=None, extra_dir_at_root=False, alt_name=None)[source]

Returns True if the object identified by obj exists in this file store, False otherwise.

FIELD DESCRIPTIONS (these apply to all the methods in this class):

Parameters:
  • obj (object) – A Galaxy object with an assigned database ID accessible via the .id attribute.
  • base_dir (string) – A key in self.extra_dirs corresponding to the base directory in which this object should be created, or None to specify the default directory.
  • dir_only (bool) – If True, check only the path where the file identified by obj should be located, not the dataset itself. This option applies to extra_dir argument as well.
  • extra_dir (string) – Append extra_dir to the directory structure where the dataset identified by obj should be located. (e.g., 000/extra_dir/obj.id)
  • extra_dir_at_root (bool) – Applicable only if extra_dir is set. If True, the extra_dir argument is placed at root of the created directory structure rather than at the end (e.g., extra_dir/000/obj.id vs. 000/extra_dir/obj.id)
  • alt_name (string) – Use this name as the alternative name for the created dataset rather than the default.
  • obj_dir (bool) – Append a subdirectory named with the object’s ID (e.g. 000/obj.id)
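How these fields combine into a relative path can be sketched as below. This is an illustration built only from the field descriptions and the DiskObjectStore doctest, not the library's own path logic. The name sketch_rel_path is hypothetical, and the sketch assumes that ids below 1000 share the "000" bucket and that the default file name is dataset_<id>.dat.

```python
import posixpath


def sketch_rel_path(obj_id, extra_dir=None, extra_dir_at_root=False,
                    alt_name=None, obj_dir=False, dir_only=False):
    """Compose a store-relative path from the documented fields.

    Assumption: ids 0-999 live under "000", as in the doctest; larger
    ids are out of scope for this sketch.
    """
    if not 0 <= obj_id < 1000:
        raise ValueError("sketch only covers ids 0-999")
    rel_path = "000"
    if extra_dir is not None:
        if extra_dir_at_root:
            # extra_dir/000
            rel_path = posixpath.join(extra_dir, rel_path)
        else:
            # 000/extra_dir
            rel_path = posixpath.join(rel_path, extra_dir)
    if obj_dir:
        # Append a subdirectory named with the object's id.
        rel_path = posixpath.join(rel_path, str(obj_id))
    if dir_only:
        return rel_path
    file_name = alt_name if alt_name else "dataset_%s.dat" % obj_id
    return posixpath.join(rel_path, file_name)
```

For example, sketch_rel_path(1) gives 000/dataset_1.dat, matching the doctest, while passing extra_dir="x" with extra_dir_at_root=True and dir_only=True gives x/000.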
file_ready(obj, base_dir=None, dir_only=False, extra_dir=None, extra_dir_at_root=False, alt_name=None, obj_dir=False)[source]

A helper method that checks if a file corresponding to a dataset is ready and available to be used. Return True if so, False otherwise.

get_data(obj, start=0, count=-1, base_dir=None, extra_dir=None, extra_dir_at_root=False, alt_name=None, obj_dir=False)[source]

Fetch count bytes of data starting at offset start from the object identified uniquely by obj. If the object does not exist, raises ObjectNotFound. See exists method for the description of other fields.

Parameters:
  • start (int) – Set the position to start reading the dataset file
  • count (int) – Read at most count bytes from the dataset
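The start and count semantics can be sketched with an ordinary file read. This is a minimal illustration, assuming the object is backed by a local file and read in binary mode. The helper name read_range is hypothetical and is not part of the package.

```python
import os
import tempfile


def read_range(path, start=0, count=-1):
    """Read at most `count` bytes from `path`, beginning at byte `start`.

    A count of -1 reads through to the end of the file.
    """
    with open(path, "rb") as handle:
        handle.seek(start)
        return handle.read(count)


# Usage with a throwaway file standing in for a dataset.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as handle:
    handle.write(b"0123456789")
print(read_range(demo_path, start=2, count=4))  # b'2345'
print(read_range(demo_path, start=7))           # b'789'
os.remove(demo_path)
```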
get_filename(obj, base_dir=None, dir_only=False, extra_dir=None, extra_dir_at_root=False, alt_name=None, obj_dir=False)[source]

Get the expected filename (including the absolute path) which can be used to access the contents of the object uniquely identified by obj. See exists method for the description of the fields.

get_object_url(obj, extra_dir=None, extra_dir_at_root=False, alt_name=None, obj_dir=False)[source]

If the store supports direct URL access, return a URL. Otherwise return None. Note: be careful not to bypass dataset security with this. See exists method for the description of the fields.

get_store_usage_percent()[source]

Return the percentage indicating how full the store is.
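For a store backed by a local directory, such a percentage could be computed from filesystem statistics. The sketch below assumes exactly that. The name usage_percent is hypothetical, and how each store actually measures its usage is not stated here.

```python
import shutil


def usage_percent(path):
    """Return how full the filesystem holding `path` is, as a percentage."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    if usage.total == 0:
        # Guard against pseudo-filesystems that report no capacity.
        return 0.0
    return 100.0 * usage.used / usage.total
```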

shutdown()[source]

size(obj, extra_dir=None, extra_dir_at_root=False, alt_name=None, obj_dir=False)[source]

Return the size of the object identified by obj. If the object does not exist, return 0. See exists method for the description of the fields.

update_from_file(obj, base_dir=None, extra_dir=None, extra_dir_at_root=False, alt_name=None, obj_dir=False, file_name=None, create=False)[source]

Inform the store that the file associated with the object has been updated. If file_name is provided, update from that file instead of the default. If the object does not exist, raises ObjectNotFound. See exists method for the description of other fields.

Parameters:
  • file_name (string) – Use file pointed to by file_name as the source for updating the dataset identified by obj
  • create (bool) – If True and the default dataset does not exist, create it first.

galaxy.objectstore.build_object_store_from_config(config, fsmon=False, config_xml=None)[source]

Depending on the configuration setting, invoke the appropriate object store.

galaxy.objectstore.convert_bytes(bytes)[source]

A helper function used for pretty-printing disk usage.
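A pretty-printing helper of this kind might look like the sketch below. It is not the implementation of convert_bytes: the name pretty_bytes, the 1024-based units, and the two-decimal output format are all assumptions made for illustration.

```python
def pretty_bytes(num_bytes):
    """Format a byte count using binary (1024-based) units."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB"):
        if size < 1024.0:
            return "%.2f%s" % (size, unit)
        size /= 1024.0
    # Anything that survives four divisions is reported in terabytes.
    return "%.2fTB" % size
```

Under these assumptions, pretty_bytes(1536) returns 1.50KB.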

galaxy.objectstore.create_object_in_session(obj)[source]

galaxy.objectstore.local_extra_dirs(func)[source]

A decorator for non-local plugins to utilize local directories for their extra_dirs (job_working_directory and temp).

s3_multipart_upload Module

Split a large file into multiple pieces for upload to S3. This parallelizes the task over available cores using multiprocessing. Code mostly taken from CloudBioLinux.

galaxy.objectstore.s3_multipart_upload.map_wrap(f)[source]
galaxy.objectstore.s3_multipart_upload.mp_from_ids(s3server, mp_id, mp_keyname, mp_bucketname)[source]

Get the multipart upload from the bucket and multipart IDs.

This allows us to reconstitute a connection to the upload from within multiprocessing functions.

galaxy.objectstore.s3_multipart_upload.multimap(*args, **kwds)[source]

Provide a multiprocessing imap-like function.

The context manager handles setting up the pool, working around interrupt issues, and terminating the pool on completion.

galaxy.objectstore.s3_multipart_upload.multipart_upload(s3server, bucket, s3_key_name, tarball, mb_size)[source]

Upload large files using Amazon’s multipart upload functionality.
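The splitting step behind a multipart upload can be sketched as a computation of byte ranges. This is an illustration only: the name part_ranges is hypothetical, and how this module actually cuts the file into pieces is not stated here. The sketch assumes mb_size is the part size in mebibytes and numbers the parts from 1.

```python
def part_ranges(file_size, mb_size):
    """Yield (part_number, offset, length) for each piece of an upload.

    Part numbers start at 1; the final part may be shorter than the rest.
    """
    part_size = mb_size * 1024 * 1024
    if part_size <= 0:
        raise ValueError("mb_size must be positive")
    offset = 0
    part_number = 1
    while offset < file_size:
        length = min(part_size, file_size - offset)
        yield part_number, offset, length
        offset += length
        part_number += 1
```

Each tuple describes an independent piece, which is what lets the parts be handed to separate worker processes and transferred in parallel.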

galaxy.objectstore.s3_multipart_upload.transfer_part(*args, **kwargs)[source]

Transfer a part of a multipart upload. Designed to be run in parallel.