datatypes Package¶
assembly
Module¶
velvet datatypes. James E Johnson - University of Minnesota. For the velvet assembler tool in Galaxy.
-
class
galaxy.datatypes.assembly.
Amos
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
Class describing the AMOS assembly file
-
file_ext
= 'afg'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'¶
-
-
class
galaxy.datatypes.assembly.
Roadmaps
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
Class describing the Roadmaps file generated by velveth
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'¶
-
-
class
galaxy.datatypes.assembly.
Sequences
(**kwd)[source]¶ Bases:
galaxy.datatypes.sequence.Fasta
Class describing the Sequences file generated by velveth
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', sequences (MetadataParameter): Number of sequences, defaults to '0'¶
-
-
class
galaxy.datatypes.assembly.
Velvet
(**kwd)[source]¶ Bases:
galaxy.datatypes.images.Html
-
allow_datatype_change
= False¶
-
composite_type
= 'auto_primary_file'¶
-
file_ext
= 'html'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', base_name (MetadataParameter): base name for velveth dataset, defaults to 'velvet', paired_end_reads (MetadataParameter): has paired-end reads, defaults to 'False', long_reads (MetadataParameter): has long reads, defaults to 'False', short2_reads (MetadataParameter): has 2nd short reads, defaults to 'False'¶
-
binary
Module¶
Binary classes
-
class
galaxy.datatypes.binary.
Ab1
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Binary
Class describing an ab1 binary sequence file
-
file_ext
= 'ab1'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?'¶
-
-
class
galaxy.datatypes.binary.
Bam
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Binary
Class describing a BAM binary file
-
data_sources
= {'index': 'bigwig', 'data': 'bai'}¶
-
dataproviders
= {'chunk64': <function chunk64_dataprovider at 0x7f85def345f0>, 'id-seq-qual': <function id_seq_qual_dataprovider at 0x7f85df3ff7d0>, 'header': <function header_dataprovider at 0x7f85df3ff668>, 'column': <function column_dataprovider at 0x7f85df3ff398>, 'chunk': <function chunk_dataprovider at 0x7f85def34488>, 'samtools': <function samtools_dataprovider at 0x7f85df3ffc08>, 'regex-line': <function regex_line_dataprovider at 0x7f85df3ff230>, 'genomic-region': <function genomic_region_dataprovider at 0x7f85df3ff938>, 'base': <function base_dataprovider at 0x7f85def34320>, 'dict': <function dict_dataprovider at 0x7f85df3ff500>, 'line': <function line_dataprovider at 0x7f85df3ff0c8>, 'genomic-region-dict': <function genomic_region_dict_dataprovider at 0x7f85df3ffaa0>}¶
-
file_ext
= 'bam'¶
-
groom_dataset_content
(file_name)[source]¶ Ensures that the Bam file contents are sorted. This function is called on an output dataset after the content is initially generated.
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', bam_index (FileParameter): BAM Index File, defaults to 'None', bam_version (MetadataParameter): BAM Version, defaults to 'None', sort_order (MetadataParameter): Sort Order, defaults to 'None', read_groups (MetadataParameter): Read Groups, defaults to '[]', reference_names (MetadataParameter): Chromosome Names, defaults to '[]', reference_lengths (MetadataParameter): Chromosome Lengths, defaults to '[]', bam_header (MetadataParameter): Dictionary of BAM Headers, defaults to '{}'¶
-
samtools_dataprovider
(*args, **kwargs)[source]¶ Generic samtools interface - all options available through settings.
-
track_type
= 'ReadTrack'¶
-
-
class
galaxy.datatypes.binary.
Bcf
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Binary
Class describing a BCF file
-
file_ext
= 'bcf'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?'¶
-
-
class
galaxy.datatypes.binary.
BigBed
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.BigWig
BigBed support from UCSC.
-
data_sources
= {'data_standalone': 'bigbed'}¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?'¶
-
-
class
galaxy.datatypes.binary.
BigWig
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Binary
Accessing binary BigWig files from UCSC. The supplemental info in the paper has the binary details: http://bioinformatics.oxfordjournals.org/cgi/content/abstract/btq351v1
-
data_sources
= {'data_standalone': 'bigwig'}¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?'¶
-
track_type
= 'LineTrack'¶
-
-
class
galaxy.datatypes.binary.
Binary
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Data
Binary data
-
display_data
(trans, dataset, preview=False, filename=None, to_ext=None, size=None, offset=None, **kwd)[source]¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?'¶
-
sniffable_binary_formats
= [{'ext': 'bam', 'type': 'bam', 'class': <class 'galaxy.datatypes.binary.Bam'>}, {'ext': 'bcf', 'type': 'bcf', 'class': <class 'galaxy.datatypes.binary.Bcf'>}, {'ext': 'sff', 'type': 'sff', 'class': <class 'galaxy.datatypes.binary.Sff'>}, {'ext': 'bigwig', 'type': 'bigwig', 'class': <class 'galaxy.datatypes.binary.BigWig'>}, {'ext': 'bigbed', 'type': 'bigbed', 'class': <class 'galaxy.datatypes.binary.BigBed'>}, {'ext': 'twobit', 'type': 'twobit', 'class': <class 'galaxy.datatypes.binary.TwoBit'>}, {'ext': 'gemini.sqlite', 'type': 'gemini.sqlite', 'class': <class 'galaxy.datatypes.binary.GeminiSQLite'>}, {'ext': 'sqlite', 'type': 'sqlite', 'class': <class 'galaxy.datatypes.binary.SQlite'>}, {'ext': 'xlsx', 'type': 'xlsx', 'class': <class 'galaxy.datatypes.binary.Xlsx'>}, {'ext': 'sra', 'type': 'sra', 'class': <class 'galaxy.datatypes.binary.Sra'>}, {'ext': 'pdf', 'type': 'pdf', 'class': <class 'galaxy.datatypes.images.Pdf'>}]¶
-
unsniffable_binary_formats
= ['ab1', 'compressed_archive', 'asn1-binary', 'h5', 'scf']¶
-
-
class
galaxy.datatypes.binary.
CompressedArchive
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Binary
Class describing a compressed binary file. This class can be subclassed to implement archive filetypes that will not be unpacked by upload.py.
-
compressed
= True¶
-
file_ext
= 'compressed_archive'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?'¶
-
-
class
galaxy.datatypes.binary.
GeminiSQLite
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.SQlite
Class describing a Gemini Sqlite database
-
file_ext
= 'gemini.sqlite'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', tables (ListParameter): Database Tables, defaults to '[]', table_columns (DictParameter): Database Table Columns, defaults to '{}', table_row_count (DictParameter): Database Table Row Count, defaults to '{}', gemini_version (MetadataParameter): Gemini Version, defaults to '0.10.0'¶
-
-
class
galaxy.datatypes.binary.
GenericAsn1Binary
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Binary
Class for generic ASN.1 binary format
-
file_ext
= 'asn1-binary'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?'¶
-
-
class
galaxy.datatypes.binary.
H5
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Binary
Class describing an HDF5 file
-
file_ext
= 'h5'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?'¶
-
-
class
galaxy.datatypes.binary.
SQlite
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Binary
Class describing a Sqlite database
-
dataproviders
= {'chunk64': <function chunk64_dataprovider at 0x7f85def345f0>, 'chunk': <function chunk_dataprovider at 0x7f85def34488>, 'sqlite': <function sqlite_dataprovider at 0x7f85df4018c0>, 'base': <function base_dataprovider at 0x7f85def34320>, 'sqlite-dict': <function sqlite_datadictprovider at 0x7f85df401b90>, 'sqlite-table': <function sqlite_datatableprovider at 0x7f85df401a28>}¶
-
file_ext
= 'sqlite'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', tables (ListParameter): Database Tables, defaults to '[]', table_columns (DictParameter): Database Table Columns, defaults to '{}', table_row_count (DictParameter): Database Table Row Count, defaults to '{}'¶
-
-
class
galaxy.datatypes.binary.
Scf
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Binary
Class describing an scf binary sequence file
-
file_ext
= 'scf'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?'¶
-
-
class
galaxy.datatypes.binary.
Sff
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Binary
Standard Flowgram Format (SFF)
-
file_ext
= 'sff'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?'¶
-
-
class
galaxy.datatypes.binary.
Sra
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Binary
Sequence Read Archive (SRA) datatype originally from mdshw5/sra-tools-galaxy
-
file_ext
= 'sra'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?'¶
-
sniff
(filename)[source]¶ The first 8 bytes of any NCBI sra file are ‘NCBI.sra’, and the file is binary. For details about the format, see http://www.ncbi.nlm.nih.gov/books/n/helpsra/SRA_Overview_BK/#SRA_Overview_BK.4_SRA_Data_Structure
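A minimal sketch of the magic-byte check described above, not the Galaxy implementation. The helper name `looks_like_sra` is hypothetical, and the sketch only compares the first 8 bytes of the file against `NCBI.sra`.

```python
import os
import tempfile

SRA_MAGIC = b"NCBI.sra"  # the 8-byte prefix named in the docstring above


def looks_like_sra(filename):
    """Hypothetical helper: True if the file starts with the SRA magic bytes."""
    with open(filename, "rb") as handle:
        return handle.read(len(SRA_MAGIC)) == SRA_MAGIC


# Usage: write two small files and sniff them.
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.sra")
    bad = os.path.join(tmp, "bad.sra")
    with open(good, "wb") as fh:
        fh.write(SRA_MAGIC + b"\x00\x01\x02")
    with open(bad, "wb") as fh:
        fh.write(b"not an sra file")
    good_result = looks_like_sra(good)
    bad_result = looks_like_sra(bad)
```

Opening the file in binary mode matters here: the comparison is on raw bytes, so no text decoding is attempted on binary content.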
-
-
class
galaxy.datatypes.binary.
TwoBit
(**kwd)[source]¶ Bases:
galaxy.datatypes.binary.Binary
Class describing a TwoBit format nucleotide file
-
file_ext
= 'twobit'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?'¶
-
checkers
Module¶
chrominfo
Module¶
-
class
galaxy.datatypes.chrominfo.
ChromInfo
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
-
file_ext
= 'len'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '0', column_types (ColumnTypesParameter): Column types, defaults to '[]', column_names (MetadataParameter): Column names, defaults to '[]', chrom (ColumnParameter): Chrom column, defaults to '1', length (ColumnParameter): Length column, defaults to '2'¶
-
coverage
Module¶
Coverage datatypes
-
class
galaxy.datatypes.coverage.
LastzCoverage
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
-
file_ext
= 'coverage'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '3', column_types (ColumnTypesParameter): Column types, defaults to '[]', column_names (MetadataParameter): Column names, defaults to '[]', chromCol (ColumnParameter): Chrom column, defaults to '1', positionCol (ColumnParameter): Position column, defaults to '2', forwardCol (ColumnParameter): Forward or aggregate read column, defaults to '3', reverseCol (ColumnParameter): Optional reverse read column, defaults to 'None'¶
-
data
Module¶
-
class
galaxy.datatypes.data.
Data
(**kwd)[source]¶ Bases:
object
Base class for all datatypes. Implements basic interfaces as well as class methods for metadata.
>>> class DataTest( Data ):
...     MetadataElement( name="test" )
...
>>> DataTest.metadata_spec.test.name
'test'
>>> DataTest.metadata_spec.test.desc
'test'
>>> type( DataTest.metadata_spec.test.param )
<class 'galaxy.datatypes.metadata.MetadataParameter'>
-
CHUNKABLE
= False¶
-
add_display_app
(app_id, label, file_function, links_function)[source]¶ Adds a display app to the datatype.
app_id is a unique id.
label is the primary display label, e.g., display at ‘UCSC’.
file_function is a string containing the name of the function that returns a properly formatted display.
links_function is a string containing the name of the function that returns a list of (link_name, link).
-
after_setting_metadata
(dataset)[source]¶ This function is called on the dataset after metadata is set.
-
allow_datatype_change
= True¶
-
as_display_type
(dataset, type, **kwd)[source]¶ Returns modified file contents for a particular display type
-
before_setting_metadata
(dataset)[source]¶ This function is called on the dataset before metadata is set.
-
composite_files
= {}¶
-
composite_type
= None¶
-
convert_dataset
(trans, original_dataset, target_type, return_output=False, visible=True, deps=None, set_output_history=True)[source]¶ This function adds a job to the queue to convert a dataset to another type. Returns a message about success/failure.
-
copy_safe_peek
= True¶
-
data_sources
= {}¶
-
dataprovider
(dataset, data_format, **settings)[source]¶ Base dataprovider factory for all datatypes that returns the proper provider for the given data_format or raises a NoProviderAvailable.
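The factory behaviour described above can be sketched as a dictionary lookup that raises when no provider is registered. This is an illustration under assumptions, not Galaxy's code: the `Data` and `NoProviderAvailable` classes below are stand-ins defined only for this sketch, and the provider signature `(datatype, dataset, **settings)` is hypothetical.

```python
class NoProviderAvailable(Exception):
    """Stand-in for the exception named in the docstring above."""


class Data(object):
    # Maps a data_format name to a provider function, in the spirit of the
    # `dataproviders` attribute listed for this class.
    dataproviders = {}

    def dataprovider(self, dataset, data_format, **settings):
        if data_format not in self.dataproviders:
            raise NoProviderAvailable(data_format)
        return self.dataproviders[data_format](self, dataset, **settings)


def base_dataprovider(datatype, dataset, **settings):
    # Hypothetical provider: yields the dataset's items unchanged.
    return iter(dataset)


Data.dataproviders = {"base": base_dataprovider}

items = list(Data().dataprovider(["a", "b"], "base"))
try:
    Data().dataprovider(["a", "b"], "no-such-format")
    raised = False
except NoProviderAvailable:
    raised = True
```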
-
dataproviders
= {'chunk64': <function chunk64_dataprovider at 0x7f85def345f0>, 'base': <function base_dataprovider at 0x7f85def34320>, 'chunk': <function chunk_dataprovider at 0x7f85def34488>}¶
-
dataset_content_needs_grooming
(file_name)[source]¶ This function is called on an output dataset file after the content is initially generated.
-
display_data
(trans, data, preview=False, filename=None, to_ext=None, size=None, offset=None, **kwd)[source]¶ Old display method, kept for transition, though still used by the API and test framework. Datatypes should be very careful when overriding this method, and this interface between datatypes and Galaxy will likely change.
TODO: Document alternatives to overriding this method (data providers?).
-
find_conversion_destination
(dataset, accepted_formats, datatypes_registry, **kwd)[source]¶ Returns ( target_ext, existing converted dataset )
-
get_converter_types
(original_dataset, datatypes_registry)[source]¶ Returns available converters by type for this dataset
-
get_display_links
(dataset, type, app, base_url, target_frame='_blank', **kwd)[source]¶ Returns a list of tuples of (name, link) for a particular display type. No check on ‘access’ permissions is done here - if you can view the dataset, you can also save it or send it to a destination outside of Galaxy, so Galaxy security restrictions do not apply anyway.
-
get_raw_data
(dataset)[source]¶ Returns the full data. To stream it, open the file_name and read/write as needed.
-
groom_dataset_content
(file_name)[source]¶ This function is called on an output dataset file if dataset_content_needs_grooming returns True.
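The two grooming hooks form a small check-then-fix protocol. The sketch below illustrates it with a hypothetical `SortedText` datatype and a hypothetical caller `finish_output`; neither exists in Galaxy, and sorting text lines is only a stand-in for real grooming work.

```python
import os
import tempfile


class SortedText(object):
    """Hypothetical datatype whose files must hold sorted lines."""

    def dataset_content_needs_grooming(self, file_name):
        with open(file_name) as handle:
            lines = handle.readlines()
        return lines != sorted(lines)

    def groom_dataset_content(self, file_name):
        with open(file_name) as handle:
            lines = sorted(handle.readlines())
        with open(file_name, "w") as handle:
            handle.writelines(lines)


def finish_output(datatype, file_name):
    # Caller side of the protocol: groom only when the datatype asks for it.
    if datatype.dataset_content_needs_grooming(file_name):
        datatype.groom_dataset_content(file_name)
        return True
    return False


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.txt")
    with open(path, "w") as fh:
        fh.write("b\na\nc\n")
    groomed_first = finish_output(SortedText(), path)
    groomed_again = finish_output(SortedText(), path)
    with open(path) as fh:
        content = fh.read()
```

Keeping the check separate from the fix lets the caller skip the rewrite when the content is already in the expected state.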
-
has_resolution
¶
-
is_binary
= True¶
-
matches_any
(target_datatypes)[source]¶ Check if this datatype is of any of the target_datatypes or is a subtype thereof.
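A check of this kind maps naturally onto `isinstance` with a tuple of classes, since `isinstance` also matches subclasses. The sketch below uses stand-in classes defined only for illustration; accepting either classes or instances in `target_datatypes` is an assumption.

```python
class Data(object):
    """Stand-in base class for this sketch only."""

    def matches_any(self, target_datatypes):
        # isinstance accepts a tuple of classes and also matches subclasses.
        datatype_classes = tuple(
            datatype if isinstance(datatype, type) else datatype.__class__
            for datatype in target_datatypes
        )
        return isinstance(self, datatype_classes)


class Text(Data):
    pass


class Binary(Data):
    pass


text_matches_data = Text().matches_any([Data])        # Text is a subtype of Data
text_matches_binary = Text().matches_any([Binary()])  # unrelated sibling type
```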
-
max_optional_metadata_filesize
¶
-
static
merge
(split_files, output_file)[source]¶ Merge files with shutil.copyfileobj(), which will not hit the max argument limitation of cat. gz and bz2 files also work.
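A minimal sketch of merging split files by streaming each one into the output with `shutil.copyfileobj`, so no shell command line is built. The function body is an illustration under that assumption, not the Galaxy source.

```python
import os
import shutil
import tempfile


def merge(split_files, output_file):
    """Concatenate split_files into output_file, one chunked copy at a time."""
    with open(output_file, "wb") as out:
        for path in split_files:
            with open(path, "rb") as part:
                shutil.copyfileobj(part, out)


with tempfile.TemporaryDirectory() as tmp:
    parts = []
    for i, text in enumerate([b"chr1\t10\n", b"chr2\t20\n"]):
        path = os.path.join(tmp, "part%d.txt" % i)
        with open(path, "wb") as fh:
            fh.write(text)
        parts.append(path)
    merged = os.path.join(tmp, "merged.txt")
    merge(parts, merged)
    with open(merged, "rb") as fh:
        merged_bytes = fh.read()
```

Byte-wise concatenation of gzip or bzip2 streams yields a valid multi-member stream, which is presumably why compressed parts can be merged the same way.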
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?'¶ dictionary of metadata fields for this datatype:
-
missing_meta
(dataset, check=[], skip=[])[source]¶ Checks for empty metadata values. Returns True if non-optional metadata is missing. Specifying a list of ‘check’ values will only check those names provided; when used, optionality is ignored. Specifying a list of ‘skip’ items will return True even when a named metadata value is missing.
-
primary_file_name
= 'index'¶
-
repair_methods
(dataset)[source]¶ Unimplemented method, returns dict with method/option for repairing errors
-
set_meta
(dataset, overwrite=True, **kwd)[source]¶ Unimplemented method, allows guessing of metadata from contents of file
-
supported_display_apps
= {}¶
-
track_type
= None¶
-
writable_files
¶
-
-
class
galaxy.datatypes.data.
DataMeta
(name, bases, dict_)[source]¶ Bases:
type
Metaclass for Data class. Sets up metadata spec.
-
class
galaxy.datatypes.data.
GenericAsn1
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
Class for generic ASN.1 text format
-
file_ext
= 'asn1'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'¶
-
-
class
galaxy.datatypes.data.
LineCount
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
Dataset contains a single line with a single integer that denotes the line count for a related dataset. Used for custom builds.
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'¶
-
-
class
galaxy.datatypes.data.
Newick
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
New Hampshire/Newick Format
-
file_ext
= 'nhx'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'¶
-
-
class
galaxy.datatypes.data.
Nexus
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
Nexus format as used by PAUP, MrBayes, etc.
-
file_ext
= 'nex'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'¶
-
-
class
galaxy.datatypes.data.
Text
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Data
-
count_data_lines
(dataset)[source]¶ Count the number of lines of data in dataset, skipping all blank lines and comments.
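A minimal sketch of such a count over an open text handle. Treating `#` as the comment character is an assumption made for illustration, and the function takes a file-like object rather than a Galaxy dataset.

```python
import io


def count_data_lines(handle, comment_char="#"):
    """Count lines that are neither blank nor comments."""
    count = 0
    for line in handle:
        stripped = line.strip()
        if stripped and not stripped.startswith(comment_char):
            count += 1
    return count


# One comment, two data lines, and two blank (or whitespace-only) lines.
sample = io.StringIO("# header\nchr1\t1\t10\n\nchr2\t5\t20\n   \n")
data_line_count = count_data_lines(sample)
```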
-
dataproviders
= {'chunk64': <function chunk64_dataprovider at 0x7f85def345f0>, 'base': <function base_dataprovider at 0x7f85def34320>, 'line': <function line_dataprovider at 0x7f85def34b90>, 'chunk': <function chunk_dataprovider at 0x7f85def34488>, 'regex-line': <function regex_line_dataprovider at 0x7f85def34cf8>}¶
-
estimate_file_lines
(dataset)[source]¶ Perform a rough estimate by extrapolating number of lines from a small read.
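One way to extrapolate a line count from a small read is to count newlines in a fixed-size sample and scale by the ratio of file size to sample size. The sketch below works on a file path, and the `sample_size` parameter and its default are assumptions, not values taken from Galaxy.

```python
import os
import tempfile


def estimate_file_lines(path, sample_size=1048576):
    """Estimate the line count by extrapolating from the first sample_size bytes."""
    file_size = os.path.getsize(path)
    with open(path, "rb") as handle:
        sample = handle.read(sample_size)
    if not sample:
        return 0
    sample_lines = sample.count(b"\n")
    # Scale the sampled line count by the fraction of the file that was read.
    return int(sample_lines * (float(file_size) / len(sample)))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.txt")
    with open(path, "wb") as fh:
        fh.write(b"0123456789\n" * 1000)  # 1000 lines of 11 bytes each
    exact = estimate_file_lines(path)                       # whole file fits in the sample
    estimate = estimate_file_lines(path, sample_size=1100)  # reads only 100 lines
```

The estimate is only as good as the assumption that line lengths in the sample are representative of the rest of the file.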
-
file_ext
= 'txt'¶
-
line_class
= 'line'¶ Add metadata elements
-
line_dataprovider
(*args, **kwargs)[source]¶ Returns an iterator over the dataset’s lines (that have been `strip`ed) optionally excluding blank lines and lines that start with a comment character.
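The filtering described above can be sketched as a generator. The function name `iter_lines` and its parameters `keep_blank` and `comment_char` are hypothetical and chosen for this illustration; they are not the provider's actual settings.

```python
import io


def iter_lines(source, keep_blank=False, comment_char=None):
    """Yield stripped lines, optionally dropping blank and comment lines."""
    for line in source:
        line = line.strip()
        if not keep_blank and line == "":
            continue
        if comment_char and line.startswith(comment_char):
            continue
        yield line


sample = io.StringIO("# comment\n  chr1\t1\t10  \n\nchr2\t5\t20\n")
all_stripped = list(iter_lines(sample, keep_blank=True))
sample.seek(0)
data_only = list(iter_lines(sample, comment_char="#"))
```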
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'¶
-
regex_line_dataprovider
(*args, **kwargs)[source]¶ Returns an iterator over the dataset’s lines optionally including/excluding lines that match one or more regex filters.
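A sketch of include/exclude filtering with one or more regular expressions. The name `regex_lines` and the parameters `regex_list` and `invert` are assumptions made for illustration; a line counts as a match if any of the patterns is found in it.

```python
import re


def regex_lines(lines, regex_list=None, invert=False):
    """Yield lines matching any regex, or lines matching none if invert is True."""
    patterns = [re.compile(regex) for regex in (regex_list or [])]
    for line in lines:
        matched = any(pattern.search(line) for pattern in patterns)
        # With no patterns every line passes; otherwise keep matches,
        # or non-matches when invert is set.
        if not patterns or matched != invert:
            yield line


lines = ["chr1\t1\t10", "chrX\t5\t20", "scaffold_7\t0\t3"]
included = list(regex_lines(lines, [r"^chr\d", r"^chrX"]))
excluded = list(regex_lines(lines, [r"^chr"], invert=True))
```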
-
set_peek
(dataset, line_count=None, is_multi_byte=False, WIDTH=256, skipchars=[])[source]¶ Set the peek. This method is used by various subclasses of Text.
-
-
galaxy.datatypes.data.
get_file_peek
(file_name, is_multi_byte=False, WIDTH=256, LINE_COUNT=5, skipchars=[])[source]¶ Returns the first LINE_COUNT lines wrapped to WIDTH
## >>> fname = get_test_fname('4.bed')
## >>> get_file_peek(fname)
## 'chr22 30128507 31828507 uc003bnx.1_cds_2_0_chr22_29227_f 0 +\n'
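A rough sketch of a peek over an open text handle. It assumes that "wrapped to WIDTH" means each line is cut to at most WIDTH characters, and that `skipchars` lists line prefixes to skip; both readings are assumptions, and the helper name `file_peek` is hypothetical.

```python
import io


def file_peek(handle, width=256, line_count=5, skipchars=()):
    """Return up to line_count lines, each cut to at most width characters."""
    peeked = []
    for line in handle:
        line = line.rstrip("\r\n")
        if any(line.startswith(prefix) for prefix in skipchars):
            continue
        peeked.append(line[:width])
        if len(peeked) == line_count:
            break
    return "\n".join(peeked)


sample = io.StringIO("#skip me\nchr22\t30128507\t31828507\nchr1\t1\t2\nchr2\t3\t4\n")
peek = file_peek(sample, width=11, line_count=2, skipchars=("#",))
```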
genetics
Module¶
rgenetics datatypes. Use at your peril. Ross Lazarus, for the rgenetics and galaxy projects.
Genome graphs datatypes are derived from Interval datatypes. Genome graphs datasets have a header row with appropriate column names. The first column is always the marker, e.g. column name = rs, first row = rs12345 if the rows are SNPs. Subsequent row values are all numeric! Will fail if there are any non-numeric values (e.g. ‘+’ or ‘NA’). Ross Lazarus for rgenetics, August 20 2007.
-
class
galaxy.datatypes.genetics.
Affybatch
(**kwd)[source]¶ Bases:
galaxy.datatypes.genetics.RexpBase
derived class for BioC data structures in Galaxy
-
file_ext
= 'affybatch'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '0', column_names (MetadataParameter): Column names, defaults to '[]', pheCols (MetadataParameter): Select list for potentially interesting variables, defaults to '[]', base_name (MetadataParameter): base name for all transformed versions of this expression dataset, defaults to 'rexpression', pheno_path (MetadataParameter): Path to phenotype data for this experiment, defaults to 'rexpression.pheno'¶
-
-
class
galaxy.datatypes.genetics.
Eigenstratgeno
(**kwd)[source]¶ Bases:
galaxy.datatypes.genetics.Rgenetics
Eigenstrat format - may be able to get rid of this if we move to shellfish Rgenetics data collections
-
file_ext
= 'eigenstratgeno'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', base_name (MetadataParameter): base name for all transformed versions of this genetic dataset, defaults to 'RgeneticsData'¶
-
-
class
galaxy.datatypes.genetics.
Eigenstratpca
(**kwd)[source]¶ Bases:
galaxy.datatypes.genetics.Rgenetics
Eigenstrat PCA file for case control adjustment Rgenetics data collections
-
file_ext
= 'eigenstratpca'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', base_name (MetadataParameter): base name for all transformed versions of this genetic dataset, defaults to 'RgeneticsData'¶
-
-
class
galaxy.datatypes.genetics.
Eset
(**kwd)[source]¶ Bases:
galaxy.datatypes.genetics.RexpBase
derived class for BioC data structures in Galaxy
-
file_ext
= 'eset'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '0', column_names (MetadataParameter): Column names, defaults to '[]', pheCols (MetadataParameter): Select list for potentially interesting variables, defaults to '[]', base_name (MetadataParameter): base name for all transformed versions of this expression dataset, defaults to 'rexpression', pheno_path (MetadataParameter): Path to phenotype data for this experiment, defaults to 'rexpression.pheno'¶
-
-
class
galaxy.datatypes.genetics.
Fped
(**kwd)[source]¶ Bases:
galaxy.datatypes.genetics.Rgenetics
FBAT pedigree format - single file, map is header row of rs numbers. Strange. Rgenetics data collections
-
file_ext
= 'fped'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', base_name (MetadataParameter): base name for all transformed versions of this genetic dataset, defaults to 'RgeneticsData'¶
-
-
class
galaxy.datatypes.genetics.
Fphe
(**kwd)[source]¶ Bases:
galaxy.datatypes.genetics.Rgenetics
fbat pedigree file - mad format with ! as first char on header row Rgenetics data collections
-
file_ext
= 'fphe'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', base_name (MetadataParameter): base name for all transformed versions of this genetic dataset, defaults to 'RgeneticsData'¶
-
-
class
galaxy.datatypes.genetics.
GenomeGraphs
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
Tab delimited data containing a marker id and any number of numeric values
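A row-level check for this layout might look like the sketch below. The helper `is_genome_graphs_row` is hypothetical, requiring at least one numeric value per row is an assumption, and "numeric" is taken to mean anything Python's `float` accepts.

```python
def is_genome_graphs_row(line):
    """True if the row is a marker id followed by one or more numeric values."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 2 or not fields[0]:
        return False
    try:
        for value in fields[1:]:
            float(value)
    except ValueError:
        return False
    return True


valid_row = is_genome_graphs_row("rs12345\t0.5\t3\n")
na_row = is_genome_graphs_row("rs12345\tNA\n")
plus_row = is_genome_graphs_row("rs12345\t+\n")
```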
-
file_ext
= 'gg'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '3', column_types (MetadataParameter): Column types, defaults to '[]', column_names (MetadataParameter): Column names, defaults to '[]', markerCol (ColumnParameter): Marker ID column, defaults to '1'¶
-
ucsc_links
(dataset, type, app, base_url)[source]¶ from the ever-helpful angie hinrichs angie@soe.ucsc.edu a genome graphs call looks like this
http://genome.ucsc.edu/cgi-bin/hgGenome?clade=mammal&org=Human&db=hg18&hgGenome_dataSetName=dname &hgGenome_dataSetDescription=test&hgGenome_formatType=best%20guess&hgGenome_markerType=best%20guess &hgGenome_columnLabels=best%20guess&hgGenome_maxVal=&hgGenome_labelVals= &hgGenome_maxGapToFill=25000000&hgGenome_uploadFile=http://galaxy.esphealth.org/datasets/333/display/index &hgGenome_doSubmitUpload=submit
Galaxy gives this for an interval file
http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg18&position=chr1:1-1000&hgt.customText= http%3A%2F%2Fgalaxy.esphealth.org%2Fdisplay_as%3Fid%3D339%26display_app%3Ducsc
-
-
class
galaxy.datatypes.genetics.
Lped
(**kwd)[source]¶ Bases:
galaxy.datatypes.genetics.Rgenetics
linkage pedigree (ped,map) Rgenetics data collections
-
file_ext
= 'lped'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', base_name (MetadataParameter): base name for all transformed versions of this genetic dataset, defaults to 'RgeneticsData'¶
-
-
class
galaxy.datatypes.genetics.
MAlist
(**kwd)[source]¶ Bases:
galaxy.datatypes.genetics.RexpBase
derived class for BioC data structures in Galaxy
-
file_ext
= 'malist'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '0', column_names (MetadataParameter): Column names, defaults to '[]', pheCols (MetadataParameter): Select list for potentially interesting variables, defaults to '[]', base_name (MetadataParameter): base name for all transformed versions of this expression dataset, defaults to 'rexpression', pheno_path (MetadataParameter): Path to phenotype data for this experiment, defaults to 'rexpression.pheno'¶
-
-
class
galaxy.datatypes.genetics.
Pbed
(**kwd)[source]¶ Bases:
galaxy.datatypes.genetics.Rgenetics
Plink Binary compressed 2bit/geno Rgenetics data collections
-
file_ext
= 'pbed'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', base_name (MetadataParameter): base name for all transformed versions of this genetic dataset, defaults to 'RgeneticsData'¶
-
-
class
galaxy.datatypes.genetics.
Phe
(**kwd)[source]¶ Bases:
galaxy.datatypes.genetics.Rgenetics
Phenotype file
-
file_ext
= 'phe'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', base_name (MetadataParameter): base name for all transformed versions of this genetic dataset, defaults to 'RgeneticsData'¶
-
-
class
galaxy.datatypes.genetics.
Pheno
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
base class for pheno files
-
file_ext
= 'pheno'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '0', column_types (ColumnTypesParameter): Column types, defaults to '[]', column_names (MetadataParameter): Column names, defaults to '[]'¶
-
-
class
galaxy.datatypes.genetics.
Pphe
(**kwd)[source]¶ Bases:
galaxy.datatypes.genetics.Rgenetics
Plink phenotype file - header must have FID IID... Rgenetics data collections
-
file_ext
= 'pphe'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', base_name (MetadataParameter): base name for all transformed versions of this genetic dataset, defaults to 'RgeneticsData'¶
-
-
class
galaxy.datatypes.genetics.
RexpBase
(**kwd)[source]¶ Bases:
galaxy.datatypes.images.Html
base class for BioC data structures in Galaxy must be constructed with the pheno data in place since that goes into the metadata for each instance
-
allow_datatype_change
= False¶
-
composite_type
= 'auto_primary_file'¶
-
file_ext
= 'rexpbase'¶
-
generate_primary_file
(dataset=None)[source]¶ This is called only at upload to write the html file. Cannot rename the datasets here - they come with the default, unfortunately.
-
get_file_peek
(filename)[source]¶ can’t really peek at a filename - need the extra_files_path and such?
-
get_phecols
(phenolist=[], maxConc=20)[source]¶ Sept 2009: cannot use whitespace to split, so make a more complex structure here and adjust the methods that rely on this structure. Returns interesting phenotype column names for an rexpression eset or affybatch to use in array subsetting and so on, as a data structure for a dynamic Galaxy select parameter. A column with only 1 value doesn’t change, so it is not interesting for analysis. A column with a different value in every row is equivalent to a unique identifier, so it is also not interesting for anova or limma analysis. Both are removed after the concordance (count of unique terms) is constructed for each column. Then a complication: each remaining pair of columns is tested for redundancy. If two columns are always paired, then only one is needed :)
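The column filtering described above can be sketched as follows. This is an illustration, not the Galaxy implementation: the function name `interesting_columns` and its inputs (a list of column names plus a list of rows) are assumptions, "always paired" is read as a one-to-one mapping between the values of two columns, and the `maxConc` parameter is not modelled.

```python
def interesting_columns(names, rows):
    """Return names of columns worth offering for subsetting or modelling."""
    n_rows = len(rows)
    columns = {name: [row[i] for row in rows] for i, name in enumerate(names)}
    # Concordance: the number of unique terms in each column.
    concordance = {name: len(set(values)) for name, values in columns.items()}
    # Drop constant columns and columns that act as unique identifiers.
    kept = [name for name in names if 1 < concordance[name] < n_rows]
    result = []
    for name in kept:
        redundant = False
        for other in result:
            pairs = set(zip(columns[name], columns[other]))
            # Two columns are always paired when the value pairs are one-to-one.
            if len(pairs) == concordance[name] == concordance[other]:
                redundant = True
                break
        if not redundant:
            result.append(name)
    return result


names = ["sample", "batch", "sex", "gender_code", "dose"]
rows = [
    ["s1", "b1", "F", "0", "low"],
    ["s2", "b1", "M", "1", "low"],
    ["s3", "b1", "F", "0", "high"],
    ["s4", "b1", "M", "1", "high"],
]
selected = interesting_columns(names, rows)
```

Here `sample` is dropped as a unique identifier, `batch` as a constant, and `gender_code` as redundant with `sex`.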
-
get_pheno
(dataset)[source]¶ Expects a .pheno file in the extra_files_dir - ugh. Note that R is weird and adds the row.name in the header so the columns are all wrong, unless you tell it not to. A file can be written as write.table(file='foo.pheno',pData(foo),sep='\t',quote=F,row.names=F)
-
html_table
= None¶
-
is_binary
= True¶
-
make_html_table
(pp='nothing supplied from peek\n')[source]¶ Create HTML table, used for displaying peek
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '0', column_names (MetadataParameter): Column names, defaults to '[]', pheCols (MetadataParameter): Select list for potentially interesting variables, defaults to '[]', base_name (MetadataParameter): base name for all transformed versions of this expression dataset, defaults to 'rexpression', pheno_path (MetadataParameter): Path to phenotype data for this experiment, defaults to 'rexpression.pheno'¶
-
-
class
galaxy.datatypes.genetics.
Rgenetics
(**kwd)[source]¶ Bases:
galaxy.datatypes.images.Html
base class to use for rgenetics datatypes derived from html - composite datatype elements stored in extra files path
-
allow_datatype_change
= False¶
-
composite_type
= 'auto_primary_file'¶
-
file_ext
= 'rgenetics'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', base_name (MetadataParameter): base name for all transformed versions of this genetic dataset, defaults to 'RgeneticsData'¶
-
-
class
galaxy.datatypes.genetics.
SNPMatrix
(**kwd)[source]¶ Bases:
galaxy.datatypes.genetics.Rgenetics
BioC SNPMatrix Rgenetics data collections
-
file_ext
= 'snpmatrix'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', base_name (MetadataParameter): base name for all transformed versions of this genetic dataset, defaults to 'RgeneticsData'¶
-
-
class
galaxy.datatypes.genetics.
Snptest
(**kwd)[source]¶ Bases:
galaxy.datatypes.genetics.Rgenetics
BioC snptest Rgenetics data collections
-
file_ext
= 'snptest'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', base_name (MetadataParameter): base name for all transformed versions of this genetic dataset, defaults to 'RgeneticsData'¶
-
-
class
galaxy.datatypes.genetics.
ldIndep
(**kwd)[source]¶ Bases:
galaxy.datatypes.genetics.Rgenetics
LD (a good measure of redundancy of information) depleted Plink Binary compressed 2bit/geno. This is really a plink binary, but some tools work better with less redundancy so are constrained to these files.
-
file_ext
= 'ldreduced'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', base_name (MetadataParameter): base name for all transformed versions of this genetic dataset, defaults to 'RgeneticsData'¶
-
-
class
galaxy.datatypes.genetics.
rgFeatureList
(**kwd)[source]¶ Bases:
galaxy.datatypes.genetics.rgTabList
For featureid lists of exclusions or inclusions in the clean tool. Output from QC, e.g. low MAF, high missingness, bad HWE in controls, excess Mendel errors, ... Featureid subsets on statistical criteria -> specialized display such as gg. Same infrastructure for expression?
-
file_ext
= 'rgFList'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '0', column_types (ColumnTypesParameter): Column types, defaults to '[]', column_names (MetadataParameter): Column names, defaults to '[]'¶
-
-
class
galaxy.datatypes.genetics.
rgSampleList
(**kwd)[source]¶ Bases:
galaxy.datatypes.genetics.rgTabList
For sampleid exclusions or inclusions in the clean tool. Output from QC, e.g. excess het, gender error, IBD pair member, eigen outlier, excess Mendel errors, ... Since they can be uploaded, they should be flexible, but they are persistent at least. Same infrastructure for expression?
-
file_ext
= 'rgSList'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '0', column_types (ColumnTypesParameter): Column types, defaults to '[]', column_names (MetadataParameter): Column names, defaults to '[]'¶
-
-
class
galaxy.datatypes.genetics.
rgTabList
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
For sampleid and featureid lists of exclusions or inclusions in the clean tool. Featureid subsets on statistical criteria -> specialized display such as gg.
-
file_ext
= 'rgTList'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '0', column_types (ColumnTypesParameter): Column types, defaults to '[]', column_names (MetadataParameter): Column names, defaults to '[]'¶
-
images
Module¶
Image classes
-
class
galaxy.datatypes.images.
Bmp
(**kwd)[source]¶ Bases:
galaxy.datatypes.images.Image
-
file_ext
= 'bmp'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?'¶
-
-
class
galaxy.datatypes.images.
Eps
(**kwd)[source]¶ Bases:
galaxy.datatypes.images.Image
-
file_ext
= 'eps'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?'¶
-
-
class
galaxy.datatypes.images.
Gif
(**kwd)[source]¶ Bases:
galaxy.datatypes.images.Image
-
file_ext
= 'gif'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?'¶
-
-
class
galaxy.datatypes.images.
Gmaj
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Data
Class describing a GMAJ Applet
-
copy_safe_peek
= False¶
-
file_ext
= 'gmaj.zip'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?'¶
-
-
class
galaxy.datatypes.images.
Html
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
Class describing an html file
-
file_ext
= 'html'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'¶
-
-
class
galaxy.datatypes.images.
Im
(**kwd)[source]¶ Bases:
galaxy.datatypes.images.Image
-
file_ext
= 'im'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?'¶
-
-
class
galaxy.datatypes.images.
Image
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Data
Class describing an image
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?'¶
-
-
class
galaxy.datatypes.images.
Jpg
(**kwd)[source]¶ Bases:
galaxy.datatypes.images.Image
-
file_ext
= 'jpg'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?'¶
-
-
class
galaxy.datatypes.images.
Laj
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
Class describing a LAJ Applet
-
copy_safe_peek
= False¶
-
file_ext
= 'laj'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'¶
-
-
class
galaxy.datatypes.images.
Pbm
(**kwd)[source]¶ Bases:
galaxy.datatypes.images.Image
-
file_ext
= 'pbm'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?'¶
-
-
class
galaxy.datatypes.images.
Pcd
(**kwd)[source]¶ Bases:
galaxy.datatypes.images.Image
-
file_ext
= 'pcd'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?'¶
-
-
class
galaxy.datatypes.images.
Pcx
(**kwd)[source]¶ Bases:
galaxy.datatypes.images.Image
-
file_ext
= 'pcx'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?'¶
-
-
class
galaxy.datatypes.images.
Pdf
(**kwd)[source]¶ Bases:
galaxy.datatypes.images.Image
-
file_ext
= 'pdf'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?'¶
-
-
class
galaxy.datatypes.images.
Pgm
(**kwd)[source]¶ Bases:
galaxy.datatypes.images.Image
-
file_ext
= 'pgm'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?'¶
-
-
class
galaxy.datatypes.images.
Png
(**kwd)[source]¶ Bases:
galaxy.datatypes.images.Image
-
file_ext
= 'png'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?'¶
-
-
class
galaxy.datatypes.images.
Ppm
(**kwd)[source]¶ Bases:
galaxy.datatypes.images.Image
-
file_ext
= 'ppm'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?'¶
-
-
class
galaxy.datatypes.images.
Psd
(**kwd)[source]¶ Bases:
galaxy.datatypes.images.Image
-
file_ext
= 'psd'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?'¶
-
-
class
galaxy.datatypes.images.
Rast
(**kwd)[source]¶ Bases:
galaxy.datatypes.images.Image
-
file_ext
= 'rast'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?'¶
-
-
class
galaxy.datatypes.images.
Rgb
(**kwd)[source]¶ Bases:
galaxy.datatypes.images.Image
-
file_ext
= 'rgb'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?'¶
-
-
class
galaxy.datatypes.images.
Tiff
(**kwd)[source]¶ Bases:
galaxy.datatypes.images.Image
-
file_ext
= 'tiff'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?'¶
-
-
class
galaxy.datatypes.images.
Xbm
(**kwd)[source]¶ Bases:
galaxy.datatypes.images.Image
-
file_ext
= 'xbm'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?'¶
-
-
class
galaxy.datatypes.images.
Xpm
(**kwd)[source]¶ Bases:
galaxy.datatypes.images.Image
-
file_ext
= 'xpm'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?'¶
-
interval
Module¶
Interval datatypes
-
class
galaxy.datatypes.interval.
Bed
(**kwd)[source]¶ Bases:
galaxy.datatypes.interval.Interval
Tab delimited data in BED format
-
as_ucsc_display_file
(dataset, **kwd)[source]¶ Returns file contents with only the bed data. If bed 6+, treat as interval.
-
data_sources
= {'index': 'bigwig', 'data': 'tabix', 'feature_search': 'fli'}¶
-
file_ext
= 'bed'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '3', column_types (ColumnTypesParameter): Column types, defaults to '[]', column_names (MetadataParameter): Column names, defaults to '[]', chromCol (ColumnParameter): Chrom column, defaults to '1', startCol (ColumnParameter): Start column, defaults to '2', endCol (ColumnParameter): End column, defaults to '3', strandCol (ColumnParameter): Strand column (click box & select), defaults to 'None', nameCol (ColumnParameter): Name/Identifier column (click box & select), defaults to 'None', viz_filter_cols (ColumnParameter): Score column for visualization, defaults to '[4]'¶
-
set_meta
(dataset, overwrite=True, **kwd)[source]¶ Sets the metadata information for datasets previously determined to be in bed format.
-
sniff
(filename)[source]¶ Checks for ‘bedness’
BED lines have three required fields and nine additional optional fields. The number of fields per line must be consistent throughout any single set of data in an annotation track. The order of the optional fields is binding: lower-numbered fields must always be populated if higher-numbered fields are used. The data type of all 12 columns is: 1-str, 2-int, 3-int, 4-str, 5-int, 6-str, 7-int, 8-int, 9-int or list, 10-int, 11-list, 12-list
For complete details see http://genome.ucsc.edu/FAQ/FAQformat#format1
>>> fname = get_test_fname( 'test_tab.bed' )
>>> Bed().sniff( fname )
True
>>> fname = get_test_fname( 'interval1.bed' )
>>> Bed().sniff( fname )
True
>>> fname = get_test_fname( 'complete.bed' )
>>> Bed().sniff( fname )
True
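The column types listed above can be checked line by line. The following is a minimal sketch of such a check, not Galaxy's own sniffer: the function name `looks_like_bed_line` is hypothetical, and only the columns documented as plain integers are tested (column 9 may be an int or a list, so it is skipped).

```python
def looks_like_bed_line(line):
    """Hypothetical helper: True if a tab-delimited line has plausible BED field types."""
    fields = line.rstrip("\r\n").split("\t")
    # Three required fields plus up to nine optional ones.
    if not 3 <= len(fields) <= 12:
        return False
    # Zero-based indexes of the columns documented as plain integers
    # (columns 2, 3, 5, 7, 8 and 10 in the one-based list above).
    int_columns = [1, 2, 4, 6, 7, 9]
    for idx in int_columns:
        if idx < len(fields):
            try:
                int(fields[idx])
            except ValueError:
                return False
    return True
```

A real sniffer would also have to confirm that every line in the file has the same number of fields, as the description requires.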
-
track_type
= 'FeatureTrack'¶
-
-
class
galaxy.datatypes.interval.
Bed12
(**kwd)[source]¶ Bases:
galaxy.datatypes.interval.BedStrict
Tab delimited data in strict BED format - no non-standard columns allowed; column count forced to 12
-
file_ext
= 'bed12'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '3', column_types (ColumnTypesParameter): Column types, defaults to '[]', column_names (MetadataParameter): Column names, defaults to '[]', chromCol (MetadataParameter): Chrom column, defaults to '1', startCol (MetadataParameter): Start column, defaults to '2', endCol (MetadataParameter): End column, defaults to '3', strandCol (MetadataParameter): Strand column (click box & select), defaults to 'None', nameCol (MetadataParameter): Name/Identifier column (click box & select), defaults to 'None', viz_filter_cols (ColumnParameter): Score column for visualization, defaults to '[4]'¶
-
-
class
galaxy.datatypes.interval.
Bed6
(**kwd)[source]¶ Bases:
galaxy.datatypes.interval.BedStrict
Tab delimited data in strict BED format - no non-standard columns allowed; column count forced to 6
-
file_ext
= 'bed6'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '3', column_types (ColumnTypesParameter): Column types, defaults to '[]', column_names (MetadataParameter): Column names, defaults to '[]', chromCol (MetadataParameter): Chrom column, defaults to '1', startCol (MetadataParameter): Start column, defaults to '2', endCol (MetadataParameter): End column, defaults to '3', strandCol (MetadataParameter): Strand column (click box & select), defaults to 'None', nameCol (MetadataParameter): Name/Identifier column (click box & select), defaults to 'None', viz_filter_cols (ColumnParameter): Score column for visualization, defaults to '[4]'¶
-
-
class
galaxy.datatypes.interval.
BedGraph
(**kwd)[source]¶ Bases:
galaxy.datatypes.interval.Interval
Tab delimited chrom/start/end/datavalue dataset
-
as_ucsc_display_file
(dataset, **kwd)[source]¶ Returns file contents as is with no modifications. TODO: this is a functional stub and will need to be enhanced moving forward to provide additional support for bedgraph.
-
data_sources
= {'index': 'bigwig', 'data': 'bigwig'}¶
-
file_ext
= 'bedgraph'¶
-
get_estimated_display_viewport
(dataset, chrom_col=0, start_col=1, end_col=2)[source]¶ Set viewport based on dataset’s first 100 lines.
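As a rough illustration of a viewport estimated from the first 100 lines, the sketch below scans an iterable of lines instead of a Galaxy dataset. The function name `estimate_viewport` and the policy are assumptions rather than Galaxy's behaviour: it keeps the first chromosome seen, widens the range over matching lines, and skips comment, `track` and `browser` lines.

```python
def estimate_viewport(lines, chrom_col=0, start_col=1, end_col=2, max_lines=100):
    """Hypothetical helper: return (chrom, start, end) covering the first lines of a file."""
    chrom, start, end = None, None, None
    for i, line in enumerate(lines):
        if i >= max_lines:
            break
        # Skip blank lines and header-style lines (an assumption of this sketch).
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\r\n").split("\t")
        try:
            c, s, e = fields[chrom_col], int(fields[start_col]), int(fields[end_col])
        except (IndexError, ValueError):
            continue  # ignore malformed lines
        if chrom is None:
            chrom, start, end = c, s, e
        elif c == chrom:
            start, end = min(start, s), max(end, e)
    return chrom, start, end
```

If no usable line is found, the sketch returns `(None, None, None)`, leaving the fallback to the caller.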
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '3', column_types (ColumnTypesParameter): Column types, defaults to '[]', column_names (MetadataParameter): Column names, defaults to '[]', chromCol (ColumnParameter): Chrom column, defaults to '1', startCol (ColumnParameter): Start column, defaults to '2', endCol (ColumnParameter): End column, defaults to '3', strandCol (ColumnParameter): Strand column (click box & select), defaults to 'None', nameCol (ColumnParameter): Name/Identifier column (click box & select), defaults to 'None'¶
-
track_type
= 'LineTrack'¶
-
-
class
galaxy.datatypes.interval.
BedStrict
(**kwd)[source]¶ Bases:
galaxy.datatypes.interval.Bed
Tab delimited data in strict BED format - no non-standard columns allowed
-
allow_datatype_change
= False¶
-
file_ext
= 'bedstrict'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '3', column_types (ColumnTypesParameter): Column types, defaults to '[]', column_names (MetadataParameter): Column names, defaults to '[]', chromCol (MetadataParameter): Chrom column, defaults to '1', startCol (MetadataParameter): Start column, defaults to '2', endCol (MetadataParameter): End column, defaults to '3', strandCol (MetadataParameter): Strand column (click box & select), defaults to 'None', nameCol (MetadataParameter): Name/Identifier column (click box & select), defaults to 'None', viz_filter_cols (ColumnParameter): Score column for visualization, defaults to '[4]'¶
-
-
class
galaxy.datatypes.interval.
ChromatinInteractions
(**kwd)[source]¶ Bases:
galaxy.datatypes.interval.Interval
Chromatin interactions obtained from 3C/5C/Hi-C experiments.
-
column_names
= ['Chrom1', 'Start1', 'End1', 'Chrom2', 'Start2', 'End2', 'Value']¶
-
data_sources
= {'index': 'bigwig', 'data': 'tabix'}¶
-
file_ext
= 'chrint'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '7', column_types (ColumnTypesParameter): Column types, defaults to '[]', column_names (MetadataParameter): Column names, defaults to '[]', chromCol (ColumnParameter): Chrom column, defaults to '1', startCol (ColumnParameter): Start column, defaults to '2', endCol (ColumnParameter): End column, defaults to '3', strandCol (ColumnParameter): Strand column (click box & select), defaults to 'None', nameCol (ColumnParameter): Name/Identifier column (click box & select), defaults to 'None', chrom1Col (ColumnParameter): Chrom1 column, defaults to '1', start1Col (ColumnParameter): Start1 column, defaults to '2', end1Col (ColumnParameter): End1 column, defaults to '3', chrom2Col (ColumnParameter): Chrom2 column, defaults to '4', start2Col (ColumnParameter): Start2 column, defaults to '5', end2Col (ColumnParameter): End2 column, defaults to '6', valueCol (ColumnParameter): Value column, defaults to '7'¶
-
track_type
= 'DiagonalHeatmapTrack'¶
-
-
class
galaxy.datatypes.interval.
CustomTrack
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
UCSC CustomTrack
-
file_ext
= 'customtrack'¶
-
get_estimated_display_viewport
(dataset, chrom_col=None, start_col=None, end_col=None)[source]¶ Return a chrom, start, stop tuple for viewing a file.
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '0', column_types (ColumnTypesParameter): Column types, defaults to '[]', column_names (MetadataParameter): Column names, defaults to '[]'¶
-
sniff
(filename)[source]¶ Determines whether the file is in customtrack format.
CustomTrack files are built within Galaxy and are basically bed or interval files with the first line looking something like this:
track name="User Track" description="User Supplied Track (from Galaxy)" color=0,0,0 visibility=1
>>> fname = get_test_fname( 'complete.bed' )
>>> CustomTrack().sniff( fname )
False
>>> fname = get_test_fname( 'ucsc.customtrack' )
>>> CustomTrack().sniff( fname )
True
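The track line shown above is a sequence of `key=value` pairs with optional quoting. A minimal sketch of parsing it with the standard library follows; `parse_track_line` is a hypothetical helper and not part of Galaxy.

```python
import shlex


def parse_track_line(line):
    """Hypothetical helper: return the attributes of a 'track' line, or None."""
    try:
        tokens = shlex.split(line)  # honours the double-quoted values
    except ValueError:
        return None  # unbalanced quotes
    if not tokens or tokens[0] != "track":
        return None
    attrs = {}
    for token in tokens[1:]:
        key, sep, value = token.partition("=")
        if sep:
            attrs[key] = value
    return attrs
```

Returning `None` for anything that does not start with `track` lets the same function double as a crude first-line test.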
-
-
class
galaxy.datatypes.interval.
ENCODEPeak
(**kwd)[source]¶ Bases:
galaxy.datatypes.interval.Interval
Human ENCODE peak format. There are both broad and narrow peak formats. Formats are very similar; narrow peak has an additional column, though.
Broad peak ( http://genome.ucsc.edu/FAQ/FAQformat#format13 ): This format is used to provide called regions of signal enrichment based on pooled, normalized (interpreted) data. It is a BED 6+3 format.
Narrow peak ( http://genome.ucsc.edu/FAQ/FAQformat#format12 ): This format is used to provide called peaks of signal enrichment based on pooled, normalized (interpreted) data. It is a BED 6+4 format.
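Since broad peak is BED 6+3 (nine columns) and narrow peak is BED 6+4 (ten columns), the two variants can be told apart by counting fields. This is only a sketch; `classify_encode_peak` is a hypothetical name and the returned labels are illustrative.

```python
def classify_encode_peak(line):
    """Hypothetical helper: guess the ENCODE peak variant from the column count."""
    n_columns = len(line.rstrip("\r\n").split("\t"))
    # 6 + 3 = 9 columns for broad peak, 6 + 4 = 10 for narrow peak.
    return {9: "broad", 10: "narrow"}.get(n_columns)
```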
-
column_names
= ['Chrom', 'Start', 'End', 'Name', 'Score', 'Strand', 'SignalValue', 'pValue', 'qValue', 'Peak']¶
-
data_sources
= {'index': 'bigwig', 'data': 'tabix'}¶
-
file_ext
= 'encodepeak'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '3', column_types (ColumnTypesParameter): Column types, defaults to '[]', column_names (MetadataParameter): Column names, defaults to '[]', chromCol (ColumnParameter): Chrom column, defaults to '1', startCol (ColumnParameter): Start column, defaults to '2', endCol (ColumnParameter): End column, defaults to '3', strandCol (ColumnParameter): Strand column (click box & select), defaults to 'None', nameCol (ColumnParameter): Name/Identifier column (click box & select), defaults to 'None'¶
-
-
class
galaxy.datatypes.interval.
Gff
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
,galaxy.datatypes.interval._RemoteCallMixin
Tab delimited data in Gff format
-
column_names
= ['Seqname', 'Source', 'Feature', 'Start', 'End', 'Score', 'Strand', 'Frame', 'Group']¶
-
data_sources
= {'index': 'bigwig', 'data': 'interval_index', 'feature_search': 'fli'}¶
-
dataproviders
= {'dataset-column': <function dataset_column_dataprovider>, 'chunk64': <function chunk64_dataprovider>, 'genomic-region-dict': <function genomic_region_dict_dataprovider>, 'column': <function column_dataprovider>, 'interval-dict': <function interval_dict_dataprovider>, 'chunk': <function chunk_dataprovider>, 'interval': <function interval_dataprovider>, 'regex-line': <function regex_line_dataprovider>, 'genomic-region': <function genomic_region_dataprovider>, 'base': <function base_dataprovider>, 'dict': <function dict_dataprovider>, 'dataset-dict': <function dataset_dict_dataprovider>, 'line': <function line_dataprovider>}¶
-
file_ext
= 'gff'¶
-
get_estimated_display_viewport
(dataset)[source]¶ Return a chrom, start, stop tuple for viewing a file. There are slight differences between gff 2 and gff 3 formats. This function should correctly handle both...
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '9', column_types (ColumnTypesParameter): Column types, defaults to '['str', 'str', 'str', 'int', 'int', 'int', 'str', 'str', 'str']', column_names (MetadataParameter): Column names, defaults to '[]', attributes (MetadataParameter): Number of attributes, defaults to '0', attribute_types (DictParameter): Attribute types, defaults to '{}'¶
-
sniff
(filename)[source]¶ Determines whether the file is in gff format
GFF lines have nine required fields that must be tab-separated.
For complete details see http://genome.ucsc.edu/FAQ/FAQformat#format3
>>> fname = get_test_fname( 'gff_version_3.gff' )
>>> Gff().sniff( fname )
False
>>> fname = get_test_fname( 'test.gff' )
>>> Gff().sniff( fname )
True
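The nine-field rule can be illustrated with a small parser that labels each field using the `column_names` listed for this class. This is a sketch under stated assumptions, not Galaxy's sniffer: `parse_gff_line` is a hypothetical helper, and only Start and End are converted to integers.

```python
GFF_COLUMNS = ["Seqname", "Source", "Feature", "Start", "End",
               "Score", "Strand", "Frame", "Group"]


def parse_gff_line(line):
    """Hypothetical helper: return a dict for a GFF data line, or None."""
    if line.startswith("#"):
        return None  # comment line
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 9:
        return None  # fields must be tab-separated, and there must be nine
    record = dict(zip(GFF_COLUMNS, fields))
    try:
        record["Start"] = int(record["Start"])
        record["End"] = int(record["End"])
    except ValueError:
        return None
    return record
```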
-
track_type
= 'FeatureTrack'¶
-
-
class
galaxy.datatypes.interval.
Gff3
(**kwd)[source]¶ Bases:
galaxy.datatypes.interval.Gff
Tab delimited data in Gff3 format
-
column_names
= ['Seqid', 'Source', 'Type', 'Start', 'End', 'Score', 'Strand', 'Phase', 'Attributes']¶
-
file_ext
= 'gff3'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '9', column_types (ColumnTypesParameter): Column types, defaults to '['str', 'str', 'str', 'int', 'int', 'float', 'str', 'int', 'list']', column_names (MetadataParameter): Column names, defaults to '[]', attributes (MetadataParameter): Number of attributes, defaults to '0', attribute_types (DictParameter): Attribute types, defaults to '{}'¶
-
sniff
(filename)[source]¶ Determines whether the file is in gff version 3 format
GFF 3 format:
- adds a mechanism for representing more than one level of hierarchical grouping of features and subfeatures.
- separates the ideas of group membership and feature name/id
- constrains the feature type field to be taken from a controlled vocabulary.
- allows a single feature, such as an exon, to belong to more than one group at a time.
- provides an explicit convention for pairwise alignments
- provides an explicit convention for features that occupy disjunct regions
The format consists of 9 columns, separated by tabs (NOT spaces).
Undefined fields are replaced with the "." character, as described in the original GFF spec.
For complete details see http://song.sourceforge.net/gff3.shtml
>>> fname = get_test_fname( 'test.gff' )
>>> Gff3().sniff( fname )
False
>>> fname = get_test_fname('gff_version_3.gff')
>>> Gff3().sniff( fname )
True
-
track_type
= 'FeatureTrack'¶
-
valid_gff3_phase
= ['.', '0', '1', '2']¶
-
valid_gff3_strand
= ['+', '-', '.', '?']¶
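The two value lists above lend themselves to a simple per-line check. The sketch below is an assumption about how such lists might be used, not the logic of `Gff3.sniff`; `is_plausible_gff3_line` is a hypothetical name and the constants are copies of the class attributes.

```python
# Copies of the valid_gff3_strand and valid_gff3_phase values listed above.
VALID_GFF3_STRAND = ["+", "-", ".", "?"]
VALID_GFF3_PHASE = [".", "0", "1", "2"]


def is_plausible_gff3_line(line):
    """Hypothetical check: nine tab-separated columns with valid strand and phase."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 9:
        return False
    start, end, score, strand, phase = fields[3:8]
    for value in (start, end):
        # "." marks an undefined field.
        if value != "." and not value.isdigit():
            return False
    if score != ".":
        try:
            float(score)
        except ValueError:
            return False
    return strand in VALID_GFF3_STRAND and phase in VALID_GFF3_PHASE
```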
-
-
class
galaxy.datatypes.interval.
Gtf
(**kwd)[source]¶ Bases:
galaxy.datatypes.interval.Gff
Tab delimited data in Gtf format
-
column_names
= ['Seqname', 'Source', 'Feature', 'Start', 'End', 'Score', 'Strand', 'Frame', 'Attributes']¶
-
file_ext
= 'gtf'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '9', column_types (ColumnTypesParameter): Column types, defaults to '['str', 'str', 'str', 'int', 'int', 'float', 'str', 'int', 'list']', column_names (MetadataParameter): Column names, defaults to '[]', attributes (MetadataParameter): Number of attributes, defaults to '0', attribute_types (DictParameter): Attribute types, defaults to '{}'¶
-
sniff
(filename)[source]¶ Determines whether the file is in gtf format
GTF lines have nine required fields that must be tab-separated. The first eight GTF fields are the same as GFF. The group field has been expanded into a list of attributes. Each attribute consists of a type/value pair. Attributes must end in a semi-colon, and be separated from any following attribute by exactly one space. The attribute list must begin with the two mandatory attributes:
gene_id value - A globally unique identifier for the genomic source of the sequence.
transcript_id value - A globally unique identifier for the predicted transcript.
For complete details see http://genome.ucsc.edu/FAQ/FAQformat#format4
>>> fname = get_test_fname( '1.bed' )
>>> Gtf().sniff( fname )
False
>>> fname = get_test_fname( 'test.gff' )
>>> Gtf().sniff( fname )
False
>>> fname = get_test_fname( 'test.gtf' )
>>> Gtf().sniff( fname )
True
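The attribute rules above (type/value pairs, each ending in a semicolon, starting with `gene_id` and `transcript_id`) can be sketched as follows. Both function names are hypothetical, and the split on `;` is a simplification that assumes no semicolons occur inside quoted values.

```python
def parse_gtf_attributes(column9):
    """Hypothetical helper: split a GTF attribute column into (type, value) pairs."""
    attributes = []
    for chunk in column9.strip().split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue  # skip the empty piece after the final semicolon
        key, _, value = chunk.partition(" ")
        attributes.append((key, value.strip().strip('"')))
    return attributes


def has_mandatory_gtf_attributes(column9):
    """True if the list begins with gene_id followed by transcript_id."""
    keys = [key for key, _ in parse_gtf_attributes(column9)]
    return keys[:2] == ["gene_id", "transcript_id"]
```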
-
track_type
= 'FeatureTrack'¶
-
-
class
galaxy.datatypes.interval.
Interval
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
Tab delimited data containing interval information
-
data_sources
= {'index': 'bigwig', 'data': 'tabix'}¶
-
dataproviders
= {'dataset-column': <function dataset_column_dataprovider>, 'chunk64': <function chunk64_dataprovider>, 'genomic-region-dict': <function genomic_region_dict_dataprovider>, 'column': <function column_dataprovider>, 'interval-dict': <function interval_dict_dataprovider>, 'chunk': <function chunk_dataprovider>, 'interval': <function interval_dataprovider>, 'regex-line': <function regex_line_dataprovider>, 'genomic-region': <function genomic_region_dataprovider>, 'base': <function base_dataprovider>, 'dict': <function dict_dataprovider>, 'dataset-dict': <function dataset_dict_dataprovider>, 'line': <function line_dataprovider>}¶
-
file_ext
= 'interval'¶
-
get_estimated_display_viewport
(dataset, chrom_col=None, start_col=None, end_col=None)[source]¶ Return a chrom, start, stop tuple for viewing a file.
-
get_track_window
(dataset, data, start, end)[source]¶ Assumes the incoming track data is sorted already.
-
line_class
= 'region'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '3', column_types (ColumnTypesParameter): Column types, defaults to '[]', column_names (MetadataParameter): Column names, defaults to '[]', chromCol (ColumnParameter): Chrom column, defaults to '1', startCol (ColumnParameter): Start column, defaults to '2', endCol (ColumnParameter): End column, defaults to '3', strandCol (ColumnParameter): Strand column (click box & select), defaults to 'None', nameCol (ColumnParameter): Name/Identifier column (click box & select), defaults to 'None'¶
-
set_meta
(dataset, overwrite=True, first_line_is_header=False, **kwd)[source]¶ Tries to guess from the line the column numbers for the chromosome, region start, region end and strand.
-
sniff
(filename)[source]¶ Checks for ‘intervalness’
This format is mostly used by galaxy itself. Valid interval files should include a valid header comment, but this seems to be loosely regulated.
>>> fname = get_test_fname( 'test_space.txt' )
>>> Interval().sniff( fname )
False
>>> fname = get_test_fname( 'interval.interval' )
>>> Interval().sniff( fname )
True
-
track_type
= 'FeatureTrack'¶
-
-
class
galaxy.datatypes.interval.
Wiggle
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
,galaxy.datatypes.interval._RemoteCallMixin
Tab delimited data in wiggle format
-
data_sources
= {'index': 'bigwig', 'data': 'bigwig'}¶
-
dataproviders
= {'dataset-column': <function dataset_column_dataprovider>, 'chunk64': <function chunk64_dataprovider>, 'wiggle-dict': <function wiggle_dict_dataprovider>, 'column': <function column_dataprovider>, 'chunk': <function chunk_dataprovider>, 'regex-line': <function regex_line_dataprovider>, 'wiggle': <function wiggle_dataprovider>, 'base': <function base_dataprovider>, 'dict': <function dict_dataprovider>, 'dataset-dict': <function dataset_dict_dataprovider>, 'line': <function line_dataprovider>}¶
-
file_ext
= 'wig'¶
-
get_estimated_display_viewport
(dataset)[source]¶ Return a chrom, start, stop tuple for viewing a file.
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '3', column_types (ColumnTypesParameter): Column types, defaults to '[]', column_names (MetadataParameter): Column names, defaults to '[]'¶
-
sniff
(filename)[source]¶ Determines whether the file is in wiggle format
The .wig format is line-oriented. Wiggle data is preceded by a track definition line, which adds a number of options for controlling the default display of this track. Following the track definition line is the track data, which can be entered in several different formats.
The track definition line begins with the word ‘track’ followed by the track type. The track type with version is REQUIRED, and it currently must be wiggle_0. For example, track type=wiggle_0...
For complete details see http://genome.ucsc.edu/goldenPath/help/wiggle.html
>>> fname = get_test_fname( 'interval1.bed' )
>>> Wiggle().sniff( fname )
False
>>> fname = get_test_fname( 'wiggle.wig' )
>>> Wiggle().sniff( fname )
True
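The track definition rule described above can be sketched as a scan for the first line starting with `track`, followed by a check that it declares `type=wiggle_0`. The function name `has_wiggle_track_line` and the 100-line limit are assumptions of this sketch, not documented Galaxy behaviour.

```python
def has_wiggle_track_line(lines, max_lines=100):
    """Hypothetical check: True if the first 'track' line declares type=wiggle_0."""
    for i, line in enumerate(lines):
        if i >= max_lines:
            break
        tokens = line.split()
        if tokens and tokens[0] == "track":
            # The track type with version is required to be wiggle_0.
            return "type=wiggle_0" in tokens
    return False
```

Lines before the track definition, such as `browser` lines, are simply skipped.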
-
track_type
= 'LineTrack'¶
-
metadata
Module¶
Galaxy Metadata
-
class
galaxy.datatypes.metadata.
FileParameter
(spec)[source]¶ Bases:
galaxy.datatypes.metadata.MetadataParameter
-
from_external_value
(value, parent, path_rewriter=None)[source]¶ Turns a value read from an external dict into its value to be pushed directly into the metadata dict.
-
-
class
galaxy.datatypes.metadata.
JobExternalOutputMetadataWrapper
(job)[source]¶ Bases:
object
Class with methods allowing set_meta() to be called externally to the Galaxy head. This class allows access to external metadata filenames for all outputs associated with a job. We will use JSON as the medium of exchange of information, except for the DatasetInstance object which will use pickle (in the future this could be JSONified as well)
-
class
galaxy.datatypes.metadata.
MetadataCollection
(parent)[source]¶ Bases:
object
MetadataCollection is not a collection at all, but rather a proxy to the real metadata which is stored as a Dictionary. This class handles processing the metadata elements when they are set and retrieved, returning default values in cases when metadata is not set.
-
parent
¶
-
spec
¶
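The proxy idea described for MetadataCollection (attribute access backed by a dictionary, with defaults returned when nothing is set) can be illustrated in a few lines. This is a standalone sketch, not Galaxy's class: `MetadataProxy`, its constructor arguments and the plain dict of defaults are all hypothetical stand-ins for the real parent and spec objects.

```python
class MetadataProxy:
    """Hypothetical proxy: reads fall back to spec defaults when no value is stored."""

    def __init__(self, spec_defaults, stored=None):
        # Bypass our own __setattr__ while creating the two backing dicts.
        object.__setattr__(self, "_defaults", dict(spec_defaults))
        object.__setattr__(self, "_stored", {} if stored is None else stored)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        if name in self._stored:
            return self._stored[name]
        if name in self._defaults:
            return self._defaults[name]
        raise AttributeError(name)

    def __setattr__(self, name, value):
        if name not in self._defaults:
            raise AttributeError("no metadata element named %r" % name)
        self._stored[name] = value
```

With `MetadataProxy({"dbkey": "?"})`, reading `dbkey` yields the default until a value is assigned, after which the stored value is returned.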
-
-
galaxy.datatypes.metadata.
MetadataElement
= <galaxy.datatypes.metadata.Statement object>¶ MetadataParameter sub-classes.
-
class
galaxy.datatypes.metadata.
MetadataElementSpec
(datatype, name=None, desc=None, param=<class 'galaxy.datatypes.metadata.MetadataParameter'>, default=None, no_value=None, visible=True, set_in_upload=False, **kwargs)[source]¶ Bases:
object
Defines a metadata element and adds it to the metadata_spec (which is a MetadataSpecCollection) of datatype.
-
class
galaxy.datatypes.metadata.
MetadataParameter
(spec)[source]¶ Bases:
object
-
from_external_value
(value, parent)[source]¶ Turns a value read from an external dict into its value to be pushed directly into the metadata dict.
-
get_html
(value, context=None, other_values=None, **kwd)[source]¶ The “context” is simply the metadata collection/bunch holding this piece of metadata. This is passed in to allow for metadata to validate against each other (note: this could turn into a huge, recursive mess if not done with care). For example, a column assignment should validate against the number of columns in the dataset.
-
classmethod
marshal
(value)[source]¶ This method should/can be overridden to convert the incoming value to whatever type it is supposed to be.
-
-
class
galaxy.datatypes.metadata.
MetadataSpecCollection
(dict=None)[source]¶ Bases:
galaxy.util.odict.odict
A simple extension of dict which allows cleaner access to items and allows the values to be iterated over directly as if it were a list. append() is also implemented for simplicity and does not “append”.
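One way to read this description is sketched below with a plain `dict` subclass. It is an interpretation and not Galaxy's code: the names `SpecCollection` and `Spec` are hypothetical, and keying `append()` on the item's `name` is an assumption about what "does not append" means.

```python
from collections import namedtuple

# Hypothetical stand-in for a metadata element spec.
Spec = namedtuple("Spec", ["name", "default"])


class SpecCollection(dict):
    """Hypothetical dict extension: attribute access, iteration over values."""

    def append(self, item):
        # Stores by name, so a second item with the same name replaces the first.
        self[item.name] = item

    def __iter__(self):
        # Iterate over the stored values, as if the collection were a list.
        return iter(self.values())

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)
```

Because plain dicts keep insertion order in Python 3.7 and later, iteration here follows the order in which the specs were added.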
-
class
galaxy.datatypes.metadata.
MetadataTempFile
(**kwds)[source]¶ Bases:
object
-
file_name
¶
-
tmp_dir
= 'database/tmp'¶
-
ngsindex
Module¶
NGS indexes
-
class
galaxy.datatypes.ngsindex.
BowtieBaseIndex
(**kwd)[source]¶ Bases:
galaxy.datatypes.ngsindex.BowtieIndex
Bowtie base space index
-
file_ext
= 'bowtie_base_index'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', base_name (MetadataParameter): base name for this index set, defaults to 'galaxy_generated_bowtie_index', sequence_space (MetadataParameter): sequence_space for this index set, defaults to 'base'¶
-
-
class
galaxy.datatypes.ngsindex.
BowtieColorIndex
(**kwd)[source]¶ Bases:
galaxy.datatypes.ngsindex.BowtieIndex
Bowtie color space index
-
file_ext
= 'bowtie_color_index'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', base_name (MetadataParameter): base name for this index set, defaults to 'galaxy_generated_bowtie_index', sequence_space (MetadataParameter): sequence_space for this index set, defaults to 'color'¶
-
-
class
galaxy.datatypes.ngsindex.
BowtieIndex
(**kwd)[source]¶ Bases:
galaxy.datatypes.images.Html
Base class for Bowtie indexes; subclassed by BowtieColorIndex and BowtieBaseIndex
-
allow_datatype_change
= False¶
-
composite_type
= 'auto_primary_file'¶
-
file_ext
= 'bowtie_index'¶
-
generate_primary_file
(dataset=None)[source]¶ Called only at upload to write the HTML file. The datasets cannot be renamed here; they come with the default, unfortunately.
-
is_binary
= True¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', base_name (MetadataParameter): base name for this index set, defaults to 'galaxy_generated_bowtie_index', sequence_space (MetadataParameter): sequence_space for this index set, defaults to 'unknown'¶
-
qualityscore
Module¶
Qualityscore class
-
class
galaxy.datatypes.qualityscore.
QualityScore
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
until we know more about quality score formats
-
file_ext
= 'qual'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'¶
-
-
class
galaxy.datatypes.qualityscore.
QualityScore454
(**kwd)[source]¶ Bases:
galaxy.datatypes.qualityscore.QualityScore
until we know more about quality score formats
-
file_ext
= 'qual454'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'¶
-
-
class
galaxy.datatypes.qualityscore.
QualityScoreIllumina
(**kwd)[source]¶ Bases:
galaxy.datatypes.qualityscore.QualityScore
until we know more about quality score formats
-
file_ext
= 'qualillumina'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'¶
-
-
class
galaxy.datatypes.qualityscore.
QualityScoreSOLiD
(**kwd)[source]¶ Bases:
galaxy.datatypes.qualityscore.QualityScore
until we know more about quality score formats
-
file_ext
= 'qualsolid'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'¶
-
-
class
galaxy.datatypes.qualityscore.
QualityScoreSolexa
(**kwd)[source]¶ Bases:
galaxy.datatypes.qualityscore.QualityScore
until we know more about quality score formats
-
file_ext
= 'qualsolexa'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'¶
-
registry
Module¶
Provides mapping between extensions and datatypes, mime-types, etc.
-
class
galaxy.datatypes.registry.
Registry
[source]¶ Bases:
object
-
find_conversion_destination_for_dataset_by_extensions
(dataset, accepted_formats, converter_safe=True)[source]¶ Returns ( target_ext, existing converted dataset )
-
get_converter_by_target_type
(source_ext, target_ext)[source]¶ Returns a converter based on source and target datatypes
-
get_datatype_class_by_name
(name)[source]¶ Return the datatype class where the datatype’s type attribute (as defined in the datatype_conf.xml file) contains name.
-
get_mimetype_by_extension
(ext, default='application/octet-stream')[source]¶ Returns a mimetype based on an extension
-
get_upload_metadata_params
(context, group, tool)[source]¶ Returns dict of case value:inputs for metadata conditional for upload tool
-
integrated_datatypes_configs
¶
-
load_datatype_converters
(toolbox, installed_repository_dict=None, deactivate=False)[source]¶ If deactivate is False, add datatype converters from self.converters or self.proprietary_converters to the calling app’s toolbox. If deactivate is True, eliminates relevant converters from the calling app’s toolbox.
-
load_datatype_sniffers
(root, deactivate=False, handling_proprietary_datatypes=False, override=False)[source]¶ Process the sniffers element from a parsed datatypes XML file located at root_dir/config (if processing the Galaxy distributed config) or contained within an installed Tool Shed repository. If deactivate is True, an installed Tool Shed repository that includes custom sniffers is being deactivated or uninstalled, so appropriate loaded sniffers will be removed from the registry. The value of override will be False when a Tool Shed repository is being installed. Since installation is occurring after the datatypes registry has been initialized at server startup, its contents cannot be overridden by newly introduced conflicting sniffers.
-
load_datatypes
(root_dir=None, config=None, deactivate=False, override=True)[source]¶ Parse a datatypes XML file located at root_dir/config (if processing the Galaxy distributed config) or contained within an installed Tool Shed repository. If deactivate is True, an installed Tool Shed repository that includes custom datatypes is being deactivated or uninstalled, so appropriate loaded datatypes will be removed from the registry. The value of override will be False when a Tool Shed repository is being installed. Since installation is occurring after the datatypes registry has been initialized at server startup, its contents cannot be overridden by newly introduced conflicting data types.
-
load_display_applications
(app, installed_repository_dict=None, deactivate=False)[source]¶ If deactivate is False, add display applications from self.display_app_containers or self.proprietary_display_app_containers to appropriate datatypes. If deactivate is True, eliminates relevant display applications from appropriate datatypes.
-
sequence
Module¶
Sequence classes
-
class
galaxy.datatypes.sequence.
Alignment
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
Class describing an alignment
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', species (SelectParameter): Species, defaults to '[]'¶
-
-
class
galaxy.datatypes.sequence.
Axt
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
Class describing an axt alignment
-
file_ext
= 'axt'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'¶
-
sniff
(filename)[source]¶ Determines whether the file is in axt format
axt alignment files are produced from Blastz, an alignment tool available from Webb Miller’s lab at Penn State University.
Each alignment block in an axt file contains three lines: a summary line and 2 sequence lines. Blocks are separated from one another by blank lines.
The summary line contains chromosomal position and size information about the alignment. It consists of 9 required fields.
The sequence lines contain the sequence of the primary assembly (line 2) and aligning assembly (line 3) with inserts. Repeats are indicated by lower-case letters.
For complete details see http://genome.ucsc.edu/goldenPath/help/axt.html
>>> fname = get_test_fname( 'alignment.axt' )
>>> Axt().sniff( fname )
True
>>> fname = get_test_fname( 'alignment.lav' )
>>> Axt().sniff( fname )
False
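The block structure described for axt files (a 9-field summary line followed by two sequence lines) can be checked roughly as follows. This is a sketch, not the actual `Axt.sniff` code: `looks_like_axt` is a hypothetical helper, it inspects only the first block, and skipping leading `#` lines is an assumption.

```python
def looks_like_axt(lines):
    """Check the first alignment block: a 9-field summary plus 2 sequence lines."""
    block = []
    for line in lines:
        line = line.strip()
        # Assumption: leading blank lines and '#' comment lines are ignored.
        if not block and (not line or line.startswith("#")):
            continue
        if not line:
            break  # a blank line ends the block
        block.append(line)
    if len(block) != 3:
        return False
    # The summary line must have exactly 9 whitespace-separated fields.
    return len(block[0].split()) == 9


axt_lines = [
    "0 chr19 3001012 3001075 chr11 70568380 70568443 - 3500",
    "TCAGCTCATAAATCACCTCCTGCCACAAGCCTGGCCTGGTCCCAGGAGAGTGTCCAGGCTCAGA",
    "TCTGTTCATAAACCACCTGCCATGACAAGCCTGGCCTGTTCCCAAGACAATGTCCAGGCTCAGA",
    "",
]
lav_lines = ["#:lav", "d {"]
```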
-
-
class
galaxy.datatypes.sequence.
Fasta
(**kwd)[source]¶ Bases:
galaxy.datatypes.sequence.Sequence
Class representing a FASTA sequence
-
file_ext
= 'fasta'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', sequences (MetadataParameter): Number of sequences, defaults to '0'¶
-
sniff
(filename)[source]¶ Determines whether the file is in fasta format
A sequence in FASTA format consists of a single-line description, followed by lines of sequence data. The first character of the description line is a greater-than (“>”) symbol in the first column. All lines should be shorter than 80 characters.
For complete details see http://www.ncbi.nlm.nih.gov/blast/fasta.shtml
Rules for sniffing as True:
We don’t care about line length (other than empty lines).
The first non-empty line must start with ‘>’ and the Very Next line.strip() must have sequence data and not be a header.
‘sequence data’ here is loosely defined as non-empty lines which do not start with ‘>’
This will cause Color Space FASTA (csfasta) to be detected as True (they are, after all, still FASTA files - they have a header line followed by sequence data)
Previously this method did some checking to determine if the sequence data had integers (presumably to differentiate between fasta and csfasta).
This should be done through sniff order, where csfasta (which currently has a null sniff function) is checked first (stricter definition), followed sometime later by fasta.
We will only check that the first purported sequence is correctly formatted.
>>> fname = get_test_fname( 'sequence.maf' )
>>> Fasta().sniff( fname )
False
>>> fname = get_test_fname( 'sequence.fasta' )
>>> Fasta().sniff( fname )
True
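The sniffing rules listed for FASTA can be sketched over an iterable of lines. `looks_like_fasta` is a hypothetical helper written only from those rules (first non-empty line starts with ">", the very next line is non-empty and not a header); it is not the `Fasta.sniff` source.

```python
def looks_like_fasta(lines):
    """Apply the header-then-sequence rule to the first purported record only."""
    stripped = (line.strip() for line in lines)
    for line in stripped:
        if not line:
            continue  # skip leading empty lines
        if not line.startswith(">"):
            return False  # the first non-empty line must be a header
        following = next(stripped, "")
        # The very next line must hold sequence data and must not be a header.
        return bool(following) and not following.startswith(">")
    return False
```

Under this loose definition a csfasta file also passes, which matches the note that the distinction is left to sniff order.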
-
classmethod
split
(input_datasets, subdir_generator_function, split_params)[source]¶ Split a FASTA file sequence by sequence.
Note that even if split_mode="number_of_parts", the actual number of sub-files produced may not match that requested by split_size.
If split_mode="to_size" then split_size is treated as the number of FASTA records to put in each sub-file (not size in bytes).
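To make the "to_size" semantics concrete, here is a small sketch that groups FASTA lines into chunks holding at most `split_size` records each. `split_fasta_records` is a hypothetical generator working on in-memory lines; the real classmethod operates on datasets and sub-directories, which this sketch does not attempt to model.

```python
def split_fasta_records(lines, split_size):
    """Yield lists of lines, each holding at most split_size FASTA records."""
    chunk, records = [], 0
    for line in lines:
        if line.startswith(">"):
            # A header starts a new record; flush the chunk once it is full.
            if records == split_size:
                yield chunk
                chunk, records = [], 0
            records += 1
        chunk.append(line)
    if chunk:
        yield chunk  # the last chunk may hold fewer records


fasta_lines = [">a", "AC", "GT", ">b", "TT", ">c", "GG"]
parts = list(split_fasta_records(fasta_lines, 2))
```

Splitting on record boundaries rather than byte counts is why the number of sub-files can differ from what a size-based request would suggest.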
-
-
class
galaxy.datatypes.sequence.
Fastq
(**kwd)[source]¶ Bases:
galaxy.datatypes.sequence.Sequence
Class representing a generic FASTQ sequence
-
file_ext
= 'fastq'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', sequences (MetadataParameter): Number of sequences, defaults to '0'¶
-
static
process_split_file
(data)[source]¶ This is called in the context of an external process launched by a Task (possibly not on the Galaxy machine) to create the input files for the Task. Parameters: data - a dict containing the contents of the split file.
-
set_meta
(dataset, **kwd)[source]¶ Set the number of sequences and the number of data lines in dataset. FIXME: This does not properly handle line wrapping
-
sniff
(filename)[source]¶ Determines whether the file is in generic fastq format. For details, see http://maq.sourceforge.net/fastq.shtml
- Note: There are three kinds of FASTQ files, known as “Sanger” (sometimes called “Standard”), Solexa, and Illumina
- These differ in the representation of the quality scores
>>> fname = get_test_fname( '1.fastqsanger' )
>>> Fastq().sniff( fname )
True
>>> fname = get_test_fname( '2.fastqsanger' )
>>> Fastq().sniff( fname )
True
-
-
class
galaxy.datatypes.sequence.
FastqCSSanger
(**kwd)[source]¶ Bases:
galaxy.datatypes.sequence.Fastq
Class representing a Color Space FASTQ sequence ( e.g., a SOLiD variant )
-
file_ext
= 'fastqcssanger'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', sequences (MetadataParameter): Number of sequences, defaults to '0'¶
-
-
class
galaxy.datatypes.sequence.
FastqIllumina
(**kwd)[source]¶ Bases:
galaxy.datatypes.sequence.Fastq
Class representing a FASTQ sequence ( the Illumina 1.3+ variant )
-
file_ext
= 'fastqillumina'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', sequences (MetadataParameter): Number of sequences, defaults to '0'¶
-
-
class
galaxy.datatypes.sequence.
FastqSanger
(**kwd)[source]¶ Bases:
galaxy.datatypes.sequence.Fastq
Class representing a FASTQ sequence ( the Sanger variant )
-
file_ext
= 'fastqsanger'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', sequences (MetadataParameter): Number of sequences, defaults to '0'¶
-
-
class
galaxy.datatypes.sequence.
FastqSolexa
(**kwd)[source]¶ Bases:
galaxy.datatypes.sequence.Fastq
Class representing a FASTQ sequence ( the Solexa variant )
-
file_ext
= 'fastqsolexa'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', sequences (MetadataParameter): Number of sequences, defaults to '0'¶
-
-
class
galaxy.datatypes.sequence.
Lav
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
Class describing a LAV alignment
-
file_ext
= 'lav'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'¶
-
sniff
(filename)[source]¶ Determines whether the file is in lav format
LAV is an alignment format developed by Webb Miller’s group. It is the primary output format for BLASTZ. The first line of a .lav file begins with #:lav.
For complete details see http://www.bioperl.org/wiki/LAV_alignment_format
>>> fname = get_test_fname( 'alignment.lav' )
>>> Lav().sniff( fname )
True
>>> fname = get_test_fname( 'alignment.axt' )
>>> Lav().sniff( fname )
False
-
-
class
galaxy.datatypes.sequence.
Maf
(**kwd)[source]¶ Bases:
galaxy.datatypes.sequence.Alignment
Class describing a Maf alignment
-
file_ext
= 'maf'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', species (SelectParameter): Species, defaults to '[]', blocks (MetadataParameter): Number of blocks, defaults to '0', species_chromosomes (FileParameter): Species Chromosomes, defaults to 'None', maf_index (FileParameter): MAF Index File, defaults to 'None'¶
-
set_meta
(dataset, overwrite=True, **kwd)[source]¶ Parses and sets species, chromosomes, index from MAF file.
-
sniff
(filename)[source]¶ Determines whether the file is in maf format
The .maf format is line-oriented. Each multiple alignment ends with a blank line. Each sequence in an alignment is on a single line, which can get quite long, but there is no length limit. Words in a line are delimited by any white space. Lines starting with # are considered to be comments. Lines starting with ## can be ignored by most programs, but contain meta-data of one form or another.
The first line of a .maf file begins with ##maf. This word is followed by white-space-separated variable=value pairs. There should be no white space surrounding the “=”.
For complete details see http://genome.ucsc.edu/FAQ/FAQformat#format5
>>> fname = get_test_fname( 'sequence.maf' )
>>> Maf().sniff( fname )
True
>>> fname = get_test_fname( 'sequence.fasta' )
>>> Maf().sniff( fname )
False
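The first-line rule for .maf files (the word `##maf` followed by white-space-separated variable=value pairs) lends itself to a short sketch. `parse_maf_header` is a hypothetical helper and not part of the Maf class; returning `None` for a non-matching line is a choice made here for illustration.

```python
def parse_maf_header(first_line):
    """Return the variable=value pairs from a '##maf' line, or None if it is not one."""
    words = first_line.split()
    if not words or words[0] != "##maf":
        return None
    pairs = {}
    for word in words[1:]:
        key, sep, value = word.partition("=")
        if not sep:
            return None  # every remaining word must be a variable=value pair
        pairs[key] = value
    return pairs
```

A sniffer built on this would treat any non-`None` result as a match.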
-
-
class
galaxy.datatypes.sequence.
MafCustomTrack
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
-
file_ext
= 'mafcustomtrack'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', vp_chromosome (MetadataParameter): Viewport Chromosome, defaults to 'chr1', vp_start (MetadataParameter): Viewport Start, defaults to '1', vp_end (MetadataParameter): Viewport End, defaults to '100'¶
-
-
class
galaxy.datatypes.sequence.
RNADotPlotMatrix
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Data
-
file_ext
= 'rna_eps'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?'¶
-
-
class
galaxy.datatypes.sequence.
Sequence
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
Class describing a sequence
-
classmethod
do_fast_split
(input_datasets, toc_file_datasets, subdir_generator_function, split_params)[source]¶
-
static
get_split_commands_sequential
(is_compressed, input_name, output_name, start_sequence, sequence_count)[source]¶ Does a brain-dead sequential scan & extract of certain sequences
>>> Sequence.get_split_commands_sequential(True, './input.gz', './output.gz', start_sequence=0, sequence_count=10)
['zcat "./input.gz" | ( tail -n +1 2> /dev/null) | head -40 | gzip -c > "./output.gz"']
>>> Sequence.get_split_commands_sequential(False, './input.fastq', './output.fastq', start_sequence=10, sequence_count=10)
['tail -n +41 "./input.fastq" 2> /dev/null | head -40 > "./output.fastq"']
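The arithmetic behind the uncompressed doctest can be reproduced in a few lines, assuming four lines per FASTQ record (an assumption consistent with the `head -40` for ten sequences, and one that ignores wrapped records). `sequential_split_command` is a hypothetical name; the compressed case is left out.

```python
def sequential_split_command(input_name, output_name, start_sequence, sequence_count):
    """Build a tail/head pipeline that extracts a run of 4-line FASTQ records."""
    lines_per_record = 4  # assumption: no line wrapping inside records
    start_line = start_sequence * lines_per_record + 1  # tail -n +N is 1-based
    line_count = sequence_count * lines_per_record
    return ['tail -n +%d "%s" 2> /dev/null | head -%d > "%s"'
            % (start_line, input_name, line_count, output_name)]


commands = sequential_split_command("./input.fastq", "./output.fastq", 10, 10)
```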
-
static
get_split_commands_with_toc
(input_name, output_name, toc_file, start_sequence, sequence_count)[source]¶ Uses a Table of Contents dict, parsed from an FQTOC file, to come up with a set of shell commands that will extract the parts necessary
>>> three_sections=[dict(start=0, end=74, sequences=10), dict(start=74, end=148, sequences=10), dict(start=148, end=148+76, sequences=10)]
>>> Sequence.get_split_commands_with_toc('./input.gz', './output.gz', dict(sections=three_sections), start_sequence=0, sequence_count=10)
['dd bs=1 skip=0 count=74 if=./input.gz 2> /dev/null >> ./output.gz']
>>> Sequence.get_split_commands_with_toc('./input.gz', './output.gz', dict(sections=three_sections), start_sequence=1, sequence_count=5)
['(dd bs=1 skip=0 count=74 if=./input.gz 2> /dev/null )| zcat | ( tail -n +5 2> /dev/null) | head -20 | gzip -c >> ./output.gz']
>>> Sequence.get_split_commands_with_toc('./input.gz', './output.gz', dict(sections=three_sections), start_sequence=0, sequence_count=20)
['dd bs=1 skip=0 count=148 if=./input.gz 2> /dev/null >> ./output.gz']
>>> Sequence.get_split_commands_with_toc('./input.gz', './output.gz', dict(sections=three_sections), start_sequence=5, sequence_count=10)
['(dd bs=1 skip=0 count=74 if=./input.gz 2> /dev/null )| zcat | ( tail -n +21 2> /dev/null) | head -20 | gzip -c >> ./output.gz', '(dd bs=1 skip=74 count=74 if=./input.gz 2> /dev/null )| zcat | ( tail -n +1 2> /dev/null) | head -20 | gzip -c >> ./output.gz']
>>> Sequence.get_split_commands_with_toc('./input.gz', './output.gz', dict(sections=three_sections), start_sequence=10, sequence_count=10)
['dd bs=1 skip=74 count=74 if=./input.gz 2> /dev/null >> ./output.gz']
>>> Sequence.get_split_commands_with_toc('./input.gz', './output.gz', dict(sections=three_sections), start_sequence=5, sequence_count=20)
['(dd bs=1 skip=0 count=74 if=./input.gz 2> /dev/null )| zcat | ( tail -n +21 2> /dev/null) | head -20 | gzip -c >> ./output.gz', 'dd bs=1 skip=74 count=74 if=./input.gz 2> /dev/null >> ./output.gz', '(dd bs=1 skip=148 count=76 if=./input.gz 2> /dev/null )| zcat | ( tail -n +1 2> /dev/null) | head -20 | gzip -c >> ./output.gz']
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', sequences (MetadataParameter): Number of sequences, defaults to '0'¶
-
set_meta
(dataset, **kwd)[source]¶ Set the number of sequences and the number of data lines in dataset.
-
classmethod
-
class
galaxy.datatypes.sequence.
SequenceSplitLocations
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
Class storing information about a sequence file composed of multiple gzip files concatenated as one OR an uncompressed file. In the GZIP case, each sub-file’s location is stored in start and end.
The format of the file is JSON:
{ "sections" : [ { "start" : "x", "end" : "y", "sequences" : "z" }, ... ]}
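Given the JSON layout above, one way to work out which sections a requested run of sequences touches is sketched below. `sections_for_range` is a hypothetical helper: it only computes byte offsets and per-section skip/take counts and does not build any shell commands.

```python
import json


def sections_for_range(toc_text, start_sequence, sequence_count):
    """Return (start, end, skip, take) for each section overlapping the request."""
    toc = json.loads(toc_text)
    wanted_end = start_sequence + sequence_count
    first = 0  # index of the first sequence held by the current section
    hits = []
    for section in toc["sections"]:
        last = first + int(section["sequences"])
        lo, hi = max(first, start_sequence), min(last, wanted_end)
        if lo < hi:
            # skip = sequences to drop inside this section, take = sequences to keep
            hits.append((int(section["start"]), int(section["end"]), lo - first, hi - lo))
        first = last
    return hits


toc_text = json.dumps({"sections": [
    {"start": 0, "end": 74, "sequences": 10},
    {"start": 74, "end": 148, "sequences": 10},
    {"start": 148, "end": 224, "sequences": 10},
]})
```

A section with skip 0 and take equal to its full sequence count can be copied byte-for-byte; any other section has to be decompressed and trimmed.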
-
file_ext
= 'fqtoc'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'¶
-
-
class
galaxy.datatypes.sequence.
csFasta
(**kwd)[source]¶ Bases:
galaxy.datatypes.sequence.Sequence
Class representing the SOLiD Color-Space sequence ( csfasta )
-
file_ext
= 'csfasta'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', sequences (MetadataParameter): Number of sequences, defaults to '0'¶
-
sniff
Module¶
File format detector
-
exception
galaxy.datatypes.sniff.
InappropriateDatasetContentError
[source]¶ Bases:
exceptions.Exception
-
galaxy.datatypes.sniff.
check_newlines
(fname, bytes_to_read=52428800)[source]¶ Determines if there are any non-POSIX newlines in the first bytes_to_read bytes (by default, 50MB) of the file.
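A check of this kind can be sketched by looking for a carriage return in the leading bytes. `has_non_posix_newlines` and `_write_temp` are hypothetical names, and reading the whole window in one call is a simplification rather than a claim about how `check_newlines` is written.

```python
import os
import tempfile


def has_non_posix_newlines(fname, bytes_to_read=52428800):
    """Return True if a carriage return occurs in the first bytes_to_read bytes."""
    with open(fname, "rb") as handle:
        # Windows (CR LF) and old Mac (bare CR) endings both contain a CR byte.
        return b"\r" in handle.read(bytes_to_read)


def _write_temp(data):
    # Helper for the demonstration only: write bytes to a fresh temporary file.
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return path


mac_style = _write_temp(b"1 2\r3 4")
posix_style = _write_temp(b"1 2\n3 4\n")
```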
-
galaxy.datatypes.sniff.
convert_newlines
(fname, in_place=True, tmp_dir=None, tmp_prefix=None)[source]¶ Converts in place a file from universal line endings to Posix line endings.
>>> fname = get_test_fname('temp.txt')
>>> file(fname, 'wt').write("1 2\r3 4")
>>> convert_newlines(fname, tmp_prefix="gxtest", tmp_dir=tempfile.gettempdir())
(2, None)
>>> file(fname).read()
'1 2\n3 4\n'
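The doctest uses the Python 2 `file()` builtin. A Python 3 sketch of the same in-place conversion might look like the following; `convert_newlines_sketch` is a hypothetical name, it returns only the line count rather than the `(count, None)` tuple, and it omits the `tmp_dir` and `tmp_prefix` options.

```python
import os
import tempfile


def convert_newlines_sketch(fname):
    """Rewrite fname with POSIX line endings and return the number of lines."""
    # Text mode with default newline handling translates CR and CR LF to LF on read.
    with open(fname, "r") as src:
        lines = src.read().split("\n")
    if lines[-1] == "":
        lines.pop()  # the input already ended with a newline (or was empty)
    # Write next to the original, then swap the files.
    fd, tmp_name = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(fname)))
    with os.fdopen(fd, "w", newline="\n") as dst:
        for line in lines:
            dst.write(line + "\n")
    os.replace(tmp_name, fname)
    return len(lines)


fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as handle:
    handle.write(b"1 2\r3 4")
line_count = convert_newlines_sketch(demo_path)
with open(demo_path, "rb") as handle:
    converted = handle.read()
os.remove(demo_path)
```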
-
galaxy.datatypes.sniff.
convert_newlines_sep2tabs
(fname, in_place=True, patt='\\s+', tmp_dir=None, tmp_prefix=None)[source]¶ Combines convert_newlines() and sep2tabs() so that files do not need to be read twice
>>> fname = get_test_fname('temp.txt')
>>> file(fname, 'wt').write("1 2\r3 4")
>>> convert_newlines_sep2tabs(fname, tmp_prefix="gxtest", tmp_dir=tempfile.gettempdir())
(2, None)
>>> file(fname).read()
'1\t2\n3\t4\n'
-
galaxy.datatypes.sniff.
get_headers
(fname, sep, count=60, is_multi_byte=False)[source]¶ Returns a list with the first ‘count’ lines split by ‘sep’
>>> fname = get_test_fname('complete.bed')
>>> get_headers(fname,'\t')
[['chr7', '127475281', '127491632', 'NM_000230', '0', '+', '127486022', '127488767', '0', '3', '29,172,3225,', '0,10713,13126,'], ['chr7', '127486011', '127488900', 'D49487', '0', '+', '127486022', '127488767', '0', '2', '155,490,', '0,2399']]
-
galaxy.datatypes.sniff.
guess_ext
(fname, sniff_order=None, is_multi_byte=False)[source]¶ Returns an extension that can be used in the datatype factory to generate a data for the ‘fname’ file
>>> fname = get_test_fname('megablast_xml_parser_test1.blastxml')
>>> guess_ext(fname)
'xml'
>>> fname = get_test_fname('interval.interval')
>>> guess_ext(fname)
'interval'
>>> fname = get_test_fname('interval1.bed')
>>> guess_ext(fname)
'bed'
>>> fname = get_test_fname('test_tab.bed')
>>> guess_ext(fname)
'bed'
>>> fname = get_test_fname('sequence.maf')
>>> guess_ext(fname)
'maf'
>>> fname = get_test_fname('sequence.fasta')
>>> guess_ext(fname)
'fasta'
>>> fname = get_test_fname('file.html')
>>> guess_ext(fname)
'html'
>>> fname = get_test_fname('test.gtf')
>>> guess_ext(fname)
'gtf'
>>> fname = get_test_fname('test.gff')
>>> guess_ext(fname)
'gff'
>>> fname = get_test_fname('gff_version_3.gff')
>>> guess_ext(fname)
'gff3'
>>> fname = get_test_fname('temp.txt')
>>> file(fname, 'wt').write("a\t2\nc\t1\nd\t0")
>>> guess_ext(fname)
'tabular'
>>> fname = get_test_fname('temp.txt')
>>> file(fname, 'wt').write("a 1 2 x\nb 3 4 y\nc 5 6 z")
>>> guess_ext(fname)
'txt'
>>> fname = get_test_fname('test_tab1.tabular')
>>> guess_ext(fname)
'tabular'
>>> fname = get_test_fname('alignment.lav')
>>> guess_ext(fname)
'lav'
>>> fname = get_test_fname('1.sff')
>>> guess_ext(fname)
'sff'
>>> fname = get_test_fname('1.bam')
>>> guess_ext(fname)
'bam'
>>> fname = get_test_fname('3unsorted.bam')
>>> guess_ext(fname)
'bam'
-
galaxy.datatypes.sniff.
handle_uploaded_dataset_file
(filename, datatypes_registry, ext='auto', is_multi_byte=False)[source]¶
-
galaxy.datatypes.sniff.
is_column_based
(fname, sep='\t', skip=0, is_multi_byte=False)[source]¶ Checks whether the file is column based with respect to a separator (defaults to tab separator).
>>> fname = get_test_fname('test.gff')
>>> is_column_based(fname)
True
>>> fname = get_test_fname('test_tab.bed')
>>> is_column_based(fname)
True
>>> is_column_based(fname, sep=' ')
False
>>> fname = get_test_fname('test_space.txt')
>>> is_column_based(fname)
False
>>> is_column_based(fname, sep=' ')
True
>>> fname = get_test_fname('test_ensembl.tab')
>>> is_column_based(fname)
True
>>> fname = get_test_fname('test_tab1.tabular')
>>> is_column_based(fname, sep=' ', skip=0)
False
>>> fname = get_test_fname('test_tab1.tabular')
>>> is_column_based(fname)
True
-
galaxy.datatypes.sniff.
sep2tabs
(fname, in_place=True, patt='\\s+')[source]¶ Transforms in place a file whose fields are separated by text matching patt into a tab-separated one
>>> fname = get_test_fname('temp.txt')
>>> file(fname, 'wt').write("1 2\n3 4\n")
>>> sep2tabs(fname)
(2, None)
>>> file(fname).read()
'1\t2\n3\t4\n'
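The separator substitution itself is a one-line regular-expression replacement. The sketch below works on a string rather than a file so that it stays self-contained; `sep2tabs_text` is a hypothetical name and the returned `(text, line_count)` pair is chosen for illustration, not taken from the function's signature.

```python
import re


def sep2tabs_text(text, patt=r"\s+"):
    """Replace each run matching patt with a tab, line by line."""
    regexp = re.compile(patt)
    out = []
    for line in text.splitlines():
        # Line endings are already removed, so the pattern only sees field separators.
        out.append(regexp.sub("\t", line) + "\n")
    return "".join(out), len(out)
```

Processing line by line matters with the default pattern, because `\s+` would otherwise also swallow the newlines between records.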
tabular
Module¶
Tabular datatype
-
class
galaxy.datatypes.tabular.
Eland
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
Support for the export.txt.gz file used by Illumina’s ELANDv2e aligner
-
file_ext
= '_export.txt.gz'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comments, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '0', column_types (ColumnTypesParameter): Column types, defaults to '[]', column_names (MetadataParameter): Column names, defaults to '[]', tiles (ListParameter): Set of tiles, defaults to '[]', reads (ListParameter): Set of reads, defaults to '[]', lanes (ListParameter): Set of lanes, defaults to '[]', barcodes (ListParameter): Set of barcodes, defaults to '[]'¶
-
sniff
(filename)[source]¶ Determines whether the file is in ELAND export format
A file in ELAND export format consists of lines of tab-separated data. There is no header.
Rules for sniffing as True:
- There must be 22 columns on each line
- LANE, TILE, X, Y, INDEX, READ_NO, SEQ, QUAL, POSITION, *STRAND, FILT must be correct
- We will only check that up to the first 5 alignments are correctly formatted.
-
-
class
galaxy.datatypes.tabular.
ElandMulti
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
-
file_ext
= 'elandmulti'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '0', column_types (ColumnTypesParameter): Column types, defaults to '[]', column_names (MetadataParameter): Column names, defaults to '[]'¶
-
-
class
galaxy.datatypes.tabular.
FeatureLocationIndex
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
An index that stores feature locations in tabular format.
-
file_ext
= 'fli'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '2', column_types (ColumnTypesParameter): Column types, defaults to '['str', 'str']', column_names (MetadataParameter): Column names, defaults to '[]'¶
-
-
class
galaxy.datatypes.tabular.
Pileup
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
Tab delimited data in pileup (6- or 10-column) format
-
data_sources
= {'data': 'tabix'}¶ Add metadata elements
-
dataproviders
= {'dataset-column': <function dataset_column_dataprovider at 0x7f85df3fcd70>, 'chunk64': <function chunk64_dataprovider at 0x7f85def345f0>, 'genomic-region-dict': <function genomic_region_dict_dataprovider at 0x7f85df40c758>, 'column': <function column_dataprovider at 0x7f85df3fcc08>, 'chunk': <function chunk_dataprovider at 0x7f85def34488>, 'regex-line': <function regex_line_dataprovider at 0x7f85def34cf8>, 'genomic-region': <function genomic_region_dataprovider at 0x7f85df40c5f0>, 'base': <function base_dataprovider at 0x7f85def34320>, 'dict': <function dict_dataprovider at 0x7f85df3fced8>, 'dataset-dict': <function dataset_dict_dataprovider at 0x7f85df4080c8>, 'line': <function line_dataprovider at 0x7f85def34b90>}¶
-
file_ext
= 'pileup'¶
-
line_class
= 'genomic coordinate'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '0', column_types (ColumnTypesParameter): Column types, defaults to '[]', column_names (MetadataParameter): Column names, defaults to '[]', chromCol (ColumnParameter): Chrom column, defaults to '1', startCol (ColumnParameter): Start column, defaults to '2', endCol (ColumnParameter): End column, defaults to '2', baseCol (ColumnParameter): Reference base column, defaults to '3'¶
-
sniff
(filename)[source]¶ Checks for ‘pileup-ness’
There are two main types of pileup: 6-column and 10-column. For both, the first three and last two columns are the same. We only check the first three to allow for some personalization of the format.
>>> fname = get_test_fname( 'interval.interval' )
>>> Pileup().sniff( fname )
False
>>> fname = get_test_fname( '6col.pileup' )
>>> Pileup().sniff( fname )
True
>>> fname = get_test_fname( '10col.pileup' )
>>> Pileup().sniff( fname )
True
-
-
class
galaxy.datatypes.tabular.
Sam
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
-
data_sources
= {'index': 'bigwig', 'data': 'bam'}¶
-
dataproviders
= {'dataset-column': <function dataset_column_dataprovider at 0x7f85df4089b0>, 'chunk64': <function chunk64_dataprovider at 0x7f85def345f0>, 'id-seq-qual': <function id_seq_qual_dataprovider at 0x7f85df408f50>, 'header': <function header_dataprovider at 0x7f85df408de8>, 'column': <function column_dataprovider at 0x7f85df408848>, 'chunk': <function chunk_dataprovider at 0x7f85def34488>, 'regex-line': <function regex_line_dataprovider at 0x7f85df4086e0>, 'genomic-region': <function genomic_region_dataprovider at 0x7f85df40c140>, 'base': <function base_dataprovider at 0x7f85def34320>, 'dict': <function dict_dataprovider at 0x7f85df408b18>, 'dataset-dict': <function dataset_dict_dataprovider at 0x7f85df408c80>, 'line': <function line_dataprovider at 0x7f85df408578>, 'genomic-region-dict': <function genomic_region_dict_dataprovider at 0x7f85df40c2a8>}¶
-
file_ext
= 'sam'¶
-
static
merge
(split_files, output_file)[source]¶ Multiple SAM files may each have headers. Since the headers should all be the same, remove the headers from files 1-n, keeping them in the first file only
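The header handling described for `merge` can be sketched on in-memory lines. `merge_sam_lines` is a hypothetical helper, and treating the leading run of `@` lines as the header is an assumption about how headers are recognised.

```python
from itertools import dropwhile


def merge_sam_lines(parts):
    """Concatenate SAM line lists, keeping the header of the first part only."""
    merged = []
    for index, lines in enumerate(parts):
        if index > 0:
            # Drop the leading run of '@' header lines from every later part.
            lines = dropwhile(lambda line: line.startswith("@"), lines)
        merged.extend(lines)
    return merged


sam_parts = [
    ["@HD\tVN:1.0", "r1\t0"],
    ["@HD\tVN:1.0", "r2\t16"],
]
merged_lines = merge_sam_lines(sam_parts)
```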
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '0', column_types (ColumnTypesParameter): Column types, defaults to '[]', column_names (MetadataParameter): Column names, defaults to '[]'¶
-
sniff
(filename)[source]¶ Determines whether the file is in SAM format
A file in SAM format consists of lines of tab-separated data. The following header line may be the first line:
@QNAME FLAG RNAME POS MAPQ CIGAR MRNM MPOS ISIZE SEQ QUAL
or
@QNAME FLAG RNAME POS MAPQ CIGAR MRNM MPOS ISIZE SEQ QUAL OPT
Data in the OPT column is optional and can consist of tab-separated data
For complete details see http://samtools.sourceforge.net/SAM1.pdf
Rules for sniffing as True:
There must be 11 or more columns of data on each line.
Columns 2 (FLAG), 4 (POS), 5 (MAPQ), 8 (MPOS), and 9 (ISIZE) must be numbers (9 can be negative).
We will only check that up to the first 5 alignments are correctly formatted.
>>> fname = get_test_fname( 'sequence.maf' )
>>> Sam().sniff( fname )
False
>>> fname = get_test_fname( '1.sam' )
>>> Sam().sniff( fname )
True
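The three sniffing rules for SAM translate fairly directly into code. The sketch below is written from those rules alone: `looks_like_sam` is a hypothetical helper, and skipping lines that start with `@` as headers is an assumption.

```python
def looks_like_sam(lines, max_alignments=5):
    """Apply the column rules to at most max_alignments alignment lines."""
    checked = 0
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("@"):
            continue  # assumption: blank and header lines are skipped
        fields = line.split("\t")
        if len(fields) < 11:
            return False
        for col in (2, 4, 5, 8, 9):  # 1-based: FLAG, POS, MAPQ, MPOS, ISIZE
            try:
                value = int(fields[col - 1])
            except ValueError:
                return False
            if value < 0 and col != 9:  # only ISIZE may be negative
                return False
        checked += 1
        if checked == max_alignments:
            break
    return checked > 0


good_alignment = "r1\t0\tchr1\t100\t255\t36M\t*\t0\t-150\tACGT\tIIII"
```

Stopping after a handful of alignments keeps the check cheap on large files, at the cost of accepting files that go wrong further down.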
-
track_type
= 'ReadTrack'¶
-
-
class
galaxy.datatypes.tabular.
Tabular
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
Tab delimited data
-
CHUNKABLE
= True¶ Add metadata elements
-
dataproviders
= {'dataset-column': <function dataset_column_dataprovider at 0x7f85df3fcd70>, 'chunk64': <function chunk64_dataprovider at 0x7f85def345f0>, 'column': <function column_dataprovider at 0x7f85df3fcc08>, 'chunk': <function chunk_dataprovider at 0x7f85def34488>, 'regex-line': <function regex_line_dataprovider at 0x7f85def34cf8>, 'base': <function base_dataprovider at 0x7f85def34320>, 'dict': <function dict_dataprovider at 0x7f85df3fced8>, 'dataset-dict': <function dataset_dict_dataprovider at 0x7f85df4080c8>, 'line': <function line_dataprovider at 0x7f85def34b90>}¶
-
dataset_column_dataprovider
(*args, **kwargs)[source]¶ Attempts to get column settings from dataset.metadata
-
dataset_dict_dataprovider
(*args, **kwargs)[source]¶ Attempts to get column settings from dataset.metadata
-
make_html_peek_header
(dataset, skipchars=None, column_names=None, column_number_format='%s', column_parameter_alias=None, **kwargs)[source]¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '0', column_types (ColumnTypesParameter): Column types, defaults to '[]', column_names (MetadataParameter): Column names, defaults to '[]'¶
-
set_meta
(dataset, overwrite=True, skip=None, max_data_lines=100000, max_guess_type_data_lines=None, **kwd)[source]¶ Tries to determine the number of columns in the dataset, as well as which of those columns contain numerical values.
The skip parameter exists because various tabular data types reuse this function, and each data type class is responsible for determining how many invalid comment lines should be skipped. Using None for skip causes skip to be zero, but the first line will be processed as a header.
The max_data_lines parameter exists for the same reason: each data type class is responsible for determining how many data lines should be processed to ensure that the non-optional metadata parameters are properly set. If max_data_lines is used, optional metadata parameters will be set to None unless the entire file has already been read. Using None for max_data_lines processes all data lines.
Items of interest:
- We treat ‘overwrite’ as always True (we always want to set tabular metadata when called).
- If a tabular file has no data, it will have one column of type ‘str’.
- We used to check only the first 100 lines when setting metadata and this class’s set_peek() method read the entire file to determine the number of lines in the file. Since metadata can now be processed on cluster nodes, we’ve merged the line count portion of the set_peek() processing here, and we now check the entire contents of the file.
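The kind of column counting and type guessing described for set_meta can be sketched as below. This is a simplified illustration, not Galaxy's code: `guess_column_types` is a hypothetical function, it only distinguishes 'int', 'float' and 'str', and it assumes that lines starting with `#` are comments.

```python
def guess_column_types(lines, max_data_lines=None):
    """Guess the column count and per-column types of tab-delimited lines.

    Hypothetical simplification for illustration only.
    Returns (number_of_columns, column_types, data_lines_read).
    """
    rank = {"int": 0, "float": 1, "str": 2}  # more general types rank higher

    def guess(value):
        for name, cast in (("int", int), ("float", float)):
            try:
                cast(value)
                return name
            except ValueError:
                pass
        return "str"

    column_types = []
    data_lines = 0
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # assumption: blank and '#' lines are not data
        if max_data_lines is not None and data_lines >= max_data_lines:
            break  # stop once the requested number of data lines was read
        data_lines += 1
        for i, value in enumerate(line.split("\t")):
            kind = guess(value)
            if i >= len(column_types):
                column_types.append(kind)
            elif rank[kind] > rank[column_types[i]]:
                # Widen the column type when a more general value appears.
                column_types[i] = kind
    if not column_types:
        column_types = ["str"]  # a file with no data has one 'str' column
    return len(column_types), column_types, data_lines


rows = ["#comment\n", "chr1\t10\t0.5\n", "chr2\t20\t3\n"]
print(guess_column_types(rows))
```

Each column keeps the most general type seen so far, so a column holding both `3` and `0.5` is reported as 'float', and any non-numeric value turns the column into 'str'.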
-
-
class
galaxy.datatypes.tabular.
Taxonomy
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '0', column_types (ColumnTypesParameter): Column types, defaults to '[]', column_names (MetadataParameter): Column names, defaults to '[]'¶
-
-
class
galaxy.datatypes.tabular.
Vcf
(**kwd)[source]¶ Bases:
galaxy.datatypes.tabular.Tabular
Variant Call Format for describing SNPs and other simple genome variations.
-
column_names
= ['Chrom', 'Pos', 'ID', 'Ref', 'Alt', 'Qual', 'Filter', 'Info', 'Format', 'data']¶
-
data_sources
= {'index': 'bigwig', 'data': 'tabix'}¶
-
dataproviders
= {'dataset-column': <function dataset_column_dataprovider at 0x7f85df3fcd70>, 'chunk64': <function chunk64_dataprovider at 0x7f85def345f0>, 'genomic-region-dict': <function genomic_region_dict_dataprovider at 0x7f85df40cb90>, 'column': <function column_dataprovider at 0x7f85df3fcc08>, 'chunk': <function chunk_dataprovider at 0x7f85def34488>, 'regex-line': <function regex_line_dataprovider at 0x7f85def34cf8>, 'genomic-region': <function genomic_region_dataprovider at 0x7f85df40ca28>, 'base': <function base_dataprovider at 0x7f85def34320>, 'dict': <function dict_dataprovider at 0x7f85df3fced8>, 'dataset-dict': <function dataset_dict_dataprovider at 0x7f85df4080c8>, 'line': <function line_dataprovider at 0x7f85def34b90>}¶
-
file_ext
= 'vcf'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0', comment_lines (MetadataParameter): Number of comment lines, defaults to '0', columns (MetadataParameter): Number of columns, defaults to '10', column_types (ColumnTypesParameter): Column types, defaults to '['str', 'int', 'str', 'str', 'str', 'int', 'str', 'list', 'str', 'str']', column_names (MetadataParameter): Column names, defaults to '[]', viz_filter_cols (ColumnParameter): Score column for visualization, defaults to '[5]', sample_names (MetadataParameter): Sample names, defaults to '[]'¶
-
track_type
= 'VariantTrack'¶
-
tracks
Module¶
Datatype classes for tracks/track views within galaxy.
-
class
galaxy.datatypes.tracks.
GeneTrack
(**kwargs)[source]¶ Bases:
galaxy.datatypes.binary.Binary
-
file_ext
= 'genetrack'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?'¶
-
xml
Module¶
XML format classes
-
class
galaxy.datatypes.xml.
CisML
(**kwd)[source]¶ Bases:
galaxy.datatypes.xml.GenericXml
CisML XML data
-
file_ext
= 'cisml'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'¶
-
-
class
galaxy.datatypes.xml.
GenericXml
(**kwd)[source]¶ Bases:
galaxy.datatypes.data.Text
Base format class for any XML file.
-
dataproviders
= {'xml': <function xml_dataprovider at 0x7f85dda1d050>, 'chunk64': <function chunk64_dataprovider at 0x7f85def345f0>, 'chunk': <function chunk_dataprovider at 0x7f85def34488>, 'regex-line': <function regex_line_dataprovider at 0x7f85def34cf8>, 'base': <function base_dataprovider at 0x7f85def34320>, 'line': <function line_dataprovider at 0x7f85def34b90>}¶
-
file_ext
= 'xml'¶
-
static
merge
(split_files, output_file)[source]¶ Merging multiple XML files is non-trivial and must be done in subclasses.
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'¶
-
-
class
galaxy.datatypes.xml.
MEMEXml
(**kwd)[source]¶ Bases:
galaxy.datatypes.xml.GenericXml
MEME XML Output data
-
file_ext
= 'memexml'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'¶
-
-
class
galaxy.datatypes.xml.
Owl
(**kwd)[source]¶ Bases:
galaxy.datatypes.xml.GenericXml
Web Ontology Language (OWL) format. For a description, see http://www.w3.org/TR/owl-ref/
-
file_ext
= 'owl'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'¶
-
-
class
galaxy.datatypes.xml.
Phyloxml
(**kwd)[source]¶ Bases:
galaxy.datatypes.xml.GenericXml
Format for defining phyloxml data. See http://www.phyloxml.org/
-
file_ext
= 'phyloxml'¶
-
metadata_spec
= dbkey (DBKeyParameter): Database/Build, defaults to '?', data_lines (MetadataParameter): Number of data lines, defaults to '0'¶
-
Subpackages¶
- converters Package
  - bed_to_genetrack_converter Module
  - bed_to_gff_converter Module
  - bedgraph_to_array_tree_converter Module
  - bgzip Module
  - fasta_to_len Module
  - fasta_to_tabular_converter Module
  - fastq_to_fqtoc Module
  - fastqsolexa_to_fasta_converter Module
  - fastqsolexa_to_qual_converter Module
  - gff_to_bed_converter Module
  - gff_to_interval_index_converter Module
  - interval_to_bed_converter Module
  - interval_to_bedstrict_converter Module
  - interval_to_coverage Module
  - interval_to_fli Module
  - interval_to_interval_index_converter Module
  - interval_to_summary_tree_converter Module
  - interval_to_tabix_converter Module
  - lped_to_fped_converter Module
  - lped_to_pbed_converter Module
  - maf_to_fasta_converter Module
  - maf_to_interval_converter Module
  - pbed_ldreduced_converter Module
  - pbed_to_lped_converter Module
  - picard_interval_list_to_bed6_converter Module
  - sam_or_bam_to_summary_tree_converter Module
  - sam_to_bam Module
  - vcf_to_interval_index_converter Module
  - vcf_to_summary_tree_converter Module
  - vcf_to_vcf_bgzip Module
  - wiggle_to_array_tree_converter Module
  - wiggle_to_simple_converter Module
- display_applications Package
- util Package