jobs Package

jobs Package

Support for running a tool in Galaxy via an internal job management system

class galaxy.jobs.ComputeEnvironment[source]

Bases: object

Definition of the job as it will be run on the (potentially) remote compute server.

config_directory()[source]

Directory containing config files (potentially remote)

input_paths()[source]

Input DatasetPaths defined by job.

new_file_path()[source]

Absolute path to dump new files for this job on compute server.

output_paths()[source]

Output DatasetPaths defined by job.

sep()[source]

os.path.sep for the platform this job will execute in.

tool_directory()[source]

Absolute path to tool files for this job on compute server.

unstructured_path_rewriter()[source]

Return a function that takes in a value, determines if it is path to be rewritten (will be passed non-path values as well - onus is on this function to determine both if its input is a path and if it should be rewritten.)

version_path()[source]

Location of the version file for the underlying tool.

working_directory()[source]

Job working directory (potentially remote)

class galaxy.jobs.JobConfiguration(app)[source]

Bases: object

A parser and interface to advanced job management features.

These features are configured in the job configuration, by default, job_conf.xml

DEFAULT_NWORKERS = 4
convert_legacy_destinations(job_runners)[source]

Converts legacy (from a URL) destinations to contain the appropriate runner params defined in the URL.

Parameters:job_runners (list of job runner plugins) – All loaded job runner plugins.
default_job_tool_configuration

The default JobToolConfiguration, used if a tool does not have an explicit defintion in the configuration. It consists of a reference to the default handler and default destination.

Returns:JobToolConfiguration – a representation of a <tool> element that uses the default handler and destination
get_destination(id_or_tag)[source]

Given a destination ID or tag, return the JobDestination matching the provided ID or tag

Parameters:id_or_tag (str) – A destination ID or tag.
Returns:JobDestination – A valid destination

Destinations are deepcopied as they are expected to be passed in to job runners, which will modify them for persisting params set at runtime.

get_destinations(id_or_tag)[source]

Given a destination ID or tag, return all JobDestinations matching the provided ID or tag

Parameters:id_or_tag (str) – A destination ID or tag.
Returns:list or tuple of JobDestinations

Destinations are not deepcopied, so they should not be passed to anything which might modify them.

get_handler(id_or_tag)[source]

Given a handler ID or tag, return the provided ID or an ID matching the provided tag

Parameters:id_or_tag (str) – A handler ID or tag.
Returns:str – A valid job handler ID.
get_job_runner_plugins(handler_id)[source]

Load all configured job runner plugins

Returns:list of job runner plugins
get_job_tool_configurations(ids)[source]

Get all configured JobToolConfigurations for a tool ID, or, if given a list of IDs, the JobToolConfigurations for the first id in ids matching a tool definition.

Note

You should not mix tool shed tool IDs, versionless tool shed IDs, and tool config tool IDs that refer to the same tool.

Parameters:ids (list or str.) – Tool ID or IDs to fetch the JobToolConfiguration of.
Returns:list – JobToolConfiguration Bunches representing <tool> elements matching the specified ID(s).

Example tool ID strings include:

  • Full tool shed id: toolshed.example.org/repos/nate/filter_tool_repo/filter_tool/1.0.0
  • Tool shed id less version: toolshed.example.org/repos/nate/filter_tool_repo/filter_tool
  • Tool config tool id: filter_tool
get_tool_resource_parameters(tool_id)[source]

Given a tool id, return XML elements describing parameters to insert into job resources.

Tool id:A tool ID (a string)
Returns:List of parameter elements.
is_handler(server_name)[source]

Given a server name, indicate whether the server is a job handler

Parameters:server_name (str) – The name to check
Returns:bool
is_id(collection)[source]

Given a collection of handlers or destinations, indicate whether the collection represents a tag or a real ID

Parameters:collection (tuple or list) – A representation of a destination or handler
Returns:bool
is_tag(collection)[source]

Given a collection of handlers or destinations, indicate whether the collection represents a tag or a real ID

Parameters:collection (tuple or list) – A representation of a destination or handler
Returns:bool
class galaxy.jobs.JobDestination(**kwds)[source]

Bases: galaxy.util.bunch.Bunch

Provides details about where a job runs

class galaxy.jobs.JobToolConfiguration(**kwds)[source]

Bases: galaxy.util.bunch.Bunch

Provides details on what handler and destination a tool should use

A JobToolConfiguration will have the required attribute ‘id’ and optional attributes ‘handler’, ‘destination’, and ‘params’

get_resource_group()[source]
class galaxy.jobs.JobWrapper(job, queue, use_persisted_destination=False)[source]

Bases: object

Wraps a ‘model.Job’ with convenience methods for running processes and state management.

can_split()[source]
change_ownership_for_run()[source]
change_state(state, info=False)[source]
check_limits(runtime=None)[source]
check_tool_output(stdout, stderr, tool_exit_code, job)[source]
cleanup(delete_files=True)[source]
clear_working_directory()[source]
commands_in_new_shell
compute_outputs()[source]
create_working_directory()[source]
default_compute_environment(job=None)[source]
fail(message, exception=False, stdout='', stderr='', exit_code=None)[source]

Indicate job failure by setting state and message on all output datasets.

finish(stdout, stderr, tool_exit_code=None, remote_working_directory=None)[source]

Called to indicate that the associated command has been run. Updates the output datasets based on stderr and stdout from the command, and the contents of the output files.

galaxy_lib_dir
galaxy_system_pwent
get_command_line()[source]
get_dataset_finish_context(job_context, dataset)[source]
get_env_setup_clause()[source]
get_id_tag()[source]
get_input_dataset_fnames(ds)[source]
get_input_fnames()[source]
get_input_paths(job=None)[source]
get_job()[source]
get_job_runner()
get_job_runner_url()[source]
get_mutable_output_fnames()[source]
get_output_destination(output_path)[source]

Destination for outputs marked as from_work_dir. This is the normal case, just copy these files directly to the ulimate destination.

get_output_file_id(file)[source]
get_output_fnames()[source]
get_output_hdas_and_fnames()[source]
get_output_sizes()[source]
get_parallelism()[source]
get_param_dict()[source]

Restore the dictionary of parameters from the database.

get_session_id()[source]
get_state()[source]
get_tool_provided_job_metadata()[source]
get_version_string_path()[source]
has_limits()[source]
invalidate_external_metadata()[source]
job_destination

Return the JobDestination that this job will use to run. This will either be a configured destination, a randomly selected destination if the configured destination was a tag, or a dynamically generated destination from the dynamic runner.

Calling this method for the first time causes the dynamic runner to do its calculation, if any.

Returns:JobDestination
mark_as_resubmitted(info=None)[source]
pause(job=None, message=None)[source]
prepare(compute_environment=None)[source]

Prepare the job to run by creating the working directory and the config files.

reclaim_ownership()[source]
requires_setting_metadata
set_job_destination(job_destination, external_id=None)[source]

Persist job destination params in the database for recovery.

self.job_destination is not used because a runner may choose to rewrite parts of the destination (e.g. the params).

set_runner(runner_url, external_id)[source]
setup_external_metadata(exec_dir=None, tmp_dir=None, dataset_files_path=None, config_root=None, config_file=None, datatypes_config=None, set_extension=True, **kwds)[source]
user
user_system_pwent
class galaxy.jobs.NoopQueue[source]

Bases: object

Implements the JobQueue / JobStopQueue interface but does nothing

put(*args, **kwargs)[source]
put_stop(*args)[source]
shutdown()[source]
class galaxy.jobs.ParallelismInfo(tag)[source]

Bases: object

Stores the information (if any) for running multiple instances of the tool in parallel on the same set of inputs.

class galaxy.jobs.SharedComputeEnvironment(job_wrapper, job)[source]

Bases: galaxy.jobs.SimpleComputeEnvironment

Default ComputeEnviornment for job and task wrapper to pass to ToolEvaluator - valid when Galaxy and compute share all the relevant file systems.

input_paths()[source]
new_file_path()[source]
output_paths()[source]
tool_directory()[source]
version_path()[source]
working_directory()[source]
class galaxy.jobs.SimpleComputeEnvironment[source]

Bases: object

config_directory()[source]
sep()[source]
unstructured_path_rewriter()[source]
class galaxy.jobs.TaskWrapper(task, queue)[source]

Bases: galaxy.jobs.JobWrapper

Extension of JobWrapper intended for running tasks. Should be refactored into a generalized executable unit wrapper parent, then jobs and tasks.

can_split()[source]
change_state(state, info=False)[source]
cleanup(delete_files=True)[source]
fail(message, exception=False)[source]
finish(stdout, stderr, tool_exit_code=None)[source]

Called to indicate that the associated command has been run. Updates the output datasets based on stderr and stdout from the command, and the contents of the output files.

get_command_line()[source]
get_dataset_finish_context(job_context, dataset)[source]
get_exit_code()[source]
get_id_tag()[source]
get_job()[source]
get_output_destination(output_path)[source]

Destination for outputs marked as from_work_dir. These must be copied with the same basenme as the path for the ultimate output destination. This is required in the task case so they can be merged.

get_output_file_id(file)[source]
get_param_dict()[source]

Restore the dictionary of parameters from the database.

get_session_id()[source]
get_state()[source]
get_task()[source]
get_tool_provided_job_metadata()[source]
prepare(compute_environment=None)[source]

Prepare the job to run by creating the working directory and the config files.

set_runner(runner_url, external_id)[source]
setup_external_metadata(exec_dir=None, tmp_dir=None, dataset_files_path=None, config_root=None, config_file=None, datatypes_config=None, set_extension=True, **kwds)[source]
galaxy.jobs.config_exception(e, file)[source]

handler Module

Galaxy job handler, prepares, runs, tracks, and finishes Galaxy jobs

class galaxy.jobs.handler.DefaultJobDispatcher(app)[source]

Bases: object

put(job_wrapper)[source]
recover(job, job_wrapper)[source]
shutdown()[source]
stop(job)[source]

Stop the given job. The input variable job may be either a Job or a Task.

url_to_destination(url)[source]

This is used by the runner mapper (a.k.a. dynamic runner) and recovery methods to have runners convert URLs to destinations.

New-style runner plugin IDs must match the URL’s scheme for this to work.

class galaxy.jobs.handler.JobHandler(app)[source]

Bases: object

Handle the preparation, running, tracking, and finishing of jobs

shutdown()[source]
start()[source]
class galaxy.jobs.handler.JobHandlerQueue(app, dispatcher)[source]

Bases: object

Job Handler’s Internal Queue, this is what actually implements waiting for jobs to be runnable and dispatching to a JobRunner.

STOP_SIGNAL = <object object>
get_total_job_count_per_destination()[source]
get_user_job_count(user_id)[source]
get_user_job_count_per_destination(user_id)[source]
increase_running_job_count(user_id, destination_id)[source]
job_pair_for_id(id)[source]
job_wrapper(job, use_persisted_destination=False)[source]
put(job_id, tool_id)[source]

Add a job to the queue (by job identifier)

shutdown()[source]

Attempts to gracefully shut down the worker thread

start()[source]

Starts the JobHandler’s thread after checking for any unhandled jobs.

class galaxy.jobs.handler.JobHandlerStopQueue(app, dispatcher)[source]

Bases: object

A queue for jobs which need to be terminated prematurely.

STOP_SIGNAL = <object object>
monitor()[source]

Continually iterate the waiting jobs, stop any that are found.

monitor_step()[source]

Called repeatedly by monitor to stop jobs.

put(job_id, error_msg=None)[source]
shutdown()[source]

Attempts to gracefully shut down the worker thread

manager Module

Top-level Galaxy job manager, moves jobs to handler(s)

class galaxy.jobs.manager.JobManager(app)[source]

Bases: object

Highest level interface to job management.

TODO: Currently the app accesses “job_queue” and “job_stop_queue” directly.
This should be decoupled.
shutdown()[source]
start()[source]
class galaxy.jobs.manager.NoopHandler(*args, **kwargs)[source]

Bases: object

shutdown(*args)[source]
start()[source]

mapper Module

exception galaxy.jobs.mapper.JobMappingException(failure_message)[source]

Bases: exceptions.Exception

exception galaxy.jobs.mapper.JobNotReadyException(job_state=None, message=None)[source]

Bases: exceptions.Exception

class galaxy.jobs.mapper.JobRunnerMapper(job_wrapper, url_to_destination, job_config)[source]

Bases: object

This class is responsible to managing the mapping of jobs (in the form of job_wrappers) to job runner url strings.

cache_job_destination(raw_job_destination)[source]
get_job_destination(params)[source]

Cache the job_destination to avoid recalculation.

transfer_manager Module

Manage transfers from arbitrary URLs to temporary files. Socket interface for IPC with multiple process configurations.

class galaxy.jobs.transfer_manager.Sleeper[source]

Bases: object

Provides a ‘sleep’ method that sleeps for a number of seconds unless the notify method is called (from a different thread).

sleep(seconds)[source]
wake()[source]
class galaxy.jobs.transfer_manager.TransferManager(app)[source]

Bases: object

Manage simple data transfers from URLs to temporary locations.

get_state(transfer_jobs, via_socket=False)[source]
new(path=None, **kwd)[source]
run(transfer_jobs)[source]

This method blocks, so if invoking the transfer manager ever starts taking too long, we should move it to a thread. However, the transfer_manager will either daemonize or return after submitting to a running daemon, so it should be fairly quick to return.

shutdown()[source]