HTSeq is a Python library to facilitate processing and analysis of data from high-throughput sequencing (HTS) experiments.

HTSeq

DEVS: https://github.com/htseq/htseq

DOCS: https://htseq.readthedocs.io

A Python library to facilitate programmatic analysis of data from high-throughput sequencing (HTS) experiments. A popular component of HTSeq is htseq-count, a script to quantify gene expression in bulk and single-cell RNA-Seq and similar experiments.

Requirements

To use HTSeq you need:

  • Python >= 3.7 (note: Python 2.7 support has been dropped)
  • numpy
  • pysam

To manipulate BigWig files, you also need:

  • pyBigWig

To run the htseq-qa script, you also need:

  • matplotlib

To run htseq-count and htseq-count-barcodes with custom output formats for the counts table, you need:

  • mtx file: scipy
  • h5ad file: anndata
  • loom file: loompy
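
Whether the packages listed above are available can be checked before running the scripts. The following is a minimal sketch using only the standard library; the helper name `missing_modules` and the grouping into features are illustrative assumptions, not part of HTSeq. The import names are taken from the lists above.

```python
from importlib.util import find_spec

# Import names come from the requirement lists above; the grouping is illustrative.
REQUIREMENTS = {
    "core": ["numpy", "pysam"],
    "BigWig files": ["pyBigWig"],
    "htseq-qa": ["matplotlib"],
    "mtx output": ["scipy"],
    "h5ad output": ["anndata"],
    "loom output": ["loompy"],
}


def missing_modules(requirements=REQUIREMENTS):
    """Return {feature: [module names that cannot be found]}."""
    report = {}
    for feature, modules in requirements.items():
        # find_spec returns None when a top-level module is not installed.
        missing = [name for name in modules if find_spec(name) is None]
        if missing:
            report[feature] = missing
    return report


if __name__ == "__main__":
    for feature, modules in missing_modules().items():
        print(f"{feature}: missing {', '.join(modules)}")
```

An empty report means every listed module was found; it does not check version constraints.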

Both Linux and OSX are supported and binaries are provided on PyPI. We would like to support Windows but currently lack the expertise to do so. If you would like to take on the Windows release and maintenance, please open an issue and we'll try to help.

A source package, which should require neither Cython nor SWIG, is also provided on PyPI.

To develop HTSeq you will also need:

  • Cython >=0.29.5
  • SWIG >=3.0.8

Installation

PIP

To install directly from PyPI:

pip install HTSeq

To install a specific version:

pip install 'HTSeq==0.13.5'

If this fails, please install all dependencies first:

pip install matplotlib
pip install Cython
pip install pysam
pip install HTSeq

setup.py (distutils/setuptools)

Install the dependencies with your favourite tool (pip, conda, etc.).

To install HTSeq itself, run:

python setup.py build install

Testing

To test locally, run

./test.sh

To test htseq-count alone, run the test script with the -o option.

A virtual environment is created in the .venv folder and HTSeq is installed inside it, including all modules and scripts.

Comments
  • ModuleNotFoundError: No module named 'HTSeq._HTSeq'

    I installed using conda, everything appeared to install. However, checking the install failed, and importing HTSeq in python failed. Please advise as I've tried several times without success.

    conda create --prefix=~/myprog/htseq_env1 python=3.8.2 numpy pysam pyBigWig matplotlib
    pip install HTSeq
    python setup.py build install

    However, "./test.sh" failed with: ERROR: Failed building wheel for pybigwig. I checked the install with "pip install pyBigWig", which reported: Requirement already satisfied.

    python
    import HTSeq
    ModuleNotFoundError: No module named 'HTSeq._HTSeq'

    opened by slives-lab 13
  • Counts difference htseq-count version 0.13.5 and 1.99.2.

    First of all thanks for developing HTSeq!

    We were running a simple test to compare the output of htseq-count 2.0.1 and 0.11.0 and noticed some differences:

    The older version (0.11.0) reported the following summary metrics:

    __no_feature            12522350
    __ambiguous             800482
    __too_low_aQual         0
    __not_aligned           0
    __alignment_not_unique  640187

    While the new version (2.0.1) reports:

    __no_feature            6771128
    __ambiguous             422058
    __too_low_aQual         0
    __not_aligned           11903020
    __alignment_not_unique  348779

    Also the counts are about half in the new version:

    old (0.11.0) version:

    ENSG00000288714.1  RP3-460G2.1     32
    ENSG00000288715.1  GS2-740I5.1     0
    ENSG00000288716.1  AC001226.8      0
    ENSG00000288717.1  RP11-852E15.4   1
    ENSG00000288718.1  RP11-509I21.4   0
    ENSG00000288719.1  RP4-669P10.21   26
    ENSG00000288720.1  RP11-852E15.3   1
    ENSG00000288721.1  RP5-973N23.5    11
    ENSG00000288722.1  F8A1            111
    ENSG00000288723.1  RP11-553N16.6   0
    ENSG00000288724.1  RP13-546I2.2    0
    ENSG00000288725.1  RP11-413H22.3   0

    new (2.0.1) version:

    ENSG00000288714.1  RP3-460G2.1     18
    ENSG00000288715.1  GS2-740I5.1     0
    ENSG00000288716.1  AC001226.8      0
    ENSG00000288717.1  RP11-852E15.4   1
    ENSG00000288718.1  RP11-509I21.4   0
    ENSG00000288719.1  RP4-669P10.21   14
    ENSG00000288720.1  RP11-852E15.3   1
    ENSG00000288721.1  RP5-973N23.5    7
    ENSG00000288722.1  F8A1            56
    ENSG00000288723.1  RP11-553N16.6   0
    ENSG00000288724.1  RP13-546I2.2    0
    ENSG00000288725.1  RP11-413H22.3   0
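
The "about half" observation can be quantified per gene. A small sketch, using only the non-zero counts quoted in the two listings; the function name `new_to_old_ratios` is made up for illustration and nothing here is part of htseq-count.

```python
# gene_name -> (count from 0.11.0, count from 2.0.1), copied from the listings.
counts = {
    "RP3-460G2.1": (32, 18),
    "RP11-852E15.4": (1, 1),
    "RP4-669P10.21": (26, 14),
    "RP11-852E15.3": (1, 1),
    "RP5-973N23.5": (11, 7),
    "F8A1": (111, 56),
}


def new_to_old_ratios(table):
    """Return new/old count ratio for every gene with a non-zero old count."""
    return {gene: new / old for gene, (old, new) in table.items() if old > 0}


ratios = new_to_old_ratios(counts)
for gene, ratio in ratios.items():
    print(f"{gene}\t{ratio:.2f}")
```

Genes with a single read are unchanged, while the more highly covered genes fall to roughly 0.5 to 0.65 of the old value.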

    Note: Another counter (VERSE: https://github.com/gerbenvoshol/VERSE) matches the output of the older version, but was designed to match htseq-count (v 0.x) output

    Both were run with the same parameters: singularity run -B /mnt htseq-2.0.1.sif htseq-count -f bam -t exon -i gene_id --additional-attr gene_name --stranded=yes test.bam annotation.gtf >test.raw

    Is there an option I should set? Or is there another explanation for these observations?

    Thanks!

    Singularity container htseq-count 0.11.0:

    • HTSeq 0.11.0
    • Python 3.6.7
    • Ubuntu 18.04
    • STAR 2.5.2

    Singularity container htseq-count 2.0.1:

    • HTSeq 0.11.0
    • Python 3.10.4
    • Ubuntu 22.04
    • STAR 2.5.2
    opened by gerbenvoshol 13
  • -n parallel CPUs do not speed up

    Hi,

    I recently switched to python3 version of htseq-count which supports parallel CPUs, hoping that it would speed up the quantification step significantly. I have run tests with -n 1 and -n 20. However, I did not observe any utilization of multiple CPUs or speed up compared to single CPU usage, indeed the two jobs finished in identical time, although the latter 'fakely' seemed to utilize 20 CPUs by increasing the SHR (Shared Memory) rather than %CPU in the top command output:

    time htseq-count -i gene_name -r pos -m intersection-nonempty -s no -n 1 -c test_n1 x.sorted.bam hg19.genes.gtf
    
    real	15m36.056s
    user	15m34.173s
    sys	0m0.780s
    
    time htseq-count -i gene_name -r pos -m intersection-nonempty -s no -n 20 -c test_n20 x.sorted.bam hg19.genes.gtf
    
    real	15m36.121s
    user	15m32.738s
    sys	0m3.773s
    
    I am wondering if I am right to expect speed up by increasing this parameter and how it was implemented. Any bench-marking done by others possibly?
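
For reference, the speed-up implied by the two `time` reports above can be computed directly. A minimal sketch; `parse_time` is a hypothetical helper that only handles the `<minutes>m<seconds>s` form shown in the output above.

```python
import re


def parse_time(text):
    """Convert a `time` value such as '15m36.056s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised time value: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Wall-clock ("real") times quoted above for -n 1 and -n 20.
real_n1 = parse_time("15m36.056s")
real_n20 = parse_time("15m36.121s")
speedup = real_n1 / real_n20
print(f"speed-up with -n 20: {speedup:.4f}x")
```

A value close to 1 confirms that the run with -n 20 took essentially the same wall-clock time as the run with -n 1.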

    @iosonofabio @simon-anders Thanks for the great tool and help.

    opened by bounlu 10
  • htseq-count error, Could not retrieve index file

    Hello,

    I am attempting to use the htseq-count tool to generate a count file from a name sorted alignment file generated from paired end data using the following code:

    htseq-count --format bam --order name -a 0 -q -m intersection-strict --supplementary-alignments ignore --secondary-alignments ignore --stranded no SS0200_S58_vs_iso.mmetsp_shuff.bam $gtfFile > $outdir"S58_iso.mmetsp.HTSeqCounts.tab"

    When I run this script I get the following error though:

    Could not retrieve index file for 'SS0200_S58_vs_iso.mmetsp_shuff.bam'

    Since you cannot generate an index file for name sorted bam file, I'm not sure how to resolve the error. I have also run this same script previously with Htseq v0.9.1 and didn't have any problems.

    Any help would be appreciated, thanks.

    Software versions: Htseq0.13.5, Python v3.6.8, Samtools v1.9

    opened by sluxlerch 9
  • [BUG] GFF parser doesn't follow GFF spec on quotes, crashes on latest RefSeq GFF3

    Software versions

    • HTSeq 1.99.2 (latest)
    • Python 3.8
    • operating system Ubuntu

    Describe the bug

    HTSeq GFF_Reader throws an exception parsing a line with unbalanced quotes.

    The GFF3 spec says

    Attribute values do not need to be and should not be quoted. The quotes should be included as part of the value by parsers and not stripped.

    An alternative implementation - "bcbio-gff" - parses this file without errors (includes the quote in the data)

    Minimal example showing the bug

    HTSeq fails parsing the latest RefSeq GFF annotation:

    wget https://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/annotation/annotation_releases/109.20211119/GCF_000001405.39_GRCh38.p13/GCF_000001405.39_GRCh38.p13_genomic.gff.gz
    

    Minimal example is the line:

    NC_000001.11	RefSeqFE	enhancer	6127709	6128003	.	.	.	ID=id-GeneID:112590813;Dbxref=GeneID:112590813;Note=tiled region #11871%3B K562 Activating DNase matched - State 1:Tss;experiment=EXISTENCE:reporter gene assay evidence [ECO:0000049][PMID:27701403];function=activates a minimal TATA promoter and a strong SV40 promoter by Sharpr-MPRA in K562 cells {active_cell/tissue: K562}";gbkey=regulatory;regulatory_class=enhancer
    
    import HTSeq
    gff_containing_quotes = "GCF_000001405.39_GRCh38.p13_genomic.gff.gz" # or minimal example
    list(HTSeq.GFF_Reader(gff_containing_quotes))
    

    throws exception:

        list(HTSeq.GFF_Reader(gff_containing_quotes))
      File "/usr/local/lib/python3.8/dist-packages/HTSeq/features.py", line 146, in __iter__
        (attr, name) = parse_GFF_attribute_string(attributeStr, True)
      File "/usr/local/lib/python3.8/dist-packages/HTSeq/features.py", line 92, in parse_GFF_attribute_string
        quotesafe_split(attrStr.encode())):
      File "src/HTSeq/_HTSeq.pyx", line 2039, in HTSeq._HTSeq.quotesafe_split
      File "src/HTSeq/_HTSeq.pyx", line 2060, in HTSeq._HTSeq.quotesafe_split
    ValueError: unmatched quote
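
For comparison, the behaviour the GFF3 spec describes (quotes kept as part of the value) can be sketched in a few lines. This is a simplified illustration, not HTSeq's parser and not bcbio-gff's: the function name `parse_gff3_attributes` is hypothetical, comma-separated multi-values are left unsplit, and percent-decoding is done with `urllib.parse.unquote`. The attribute string is an abridged copy of column 9 from the minimal example.

```python
from urllib.parse import unquote


def parse_gff3_attributes(attr_str):
    """Split a GFF3 attribute column into a dict, leaving quotes untouched."""
    attrs = {}
    for field in attr_str.strip().split(";"):
        if not field:
            continue  # tolerate a trailing semicolon
        key, sep, value = field.partition("=")
        if not sep:
            raise ValueError(f"attribute without '=': {field!r}")
        # Percent-escapes such as %3B are decoded; quote characters are kept.
        attrs[unquote(key)] = unquote(value)
    return attrs


# Abridged from the minimal example (the 'experiment' attribute is omitted).
example = (
    'ID=id-GeneID:112590813;Dbxref=GeneID:112590813;'
    'Note=tiled region #11871%3B K562 Activating DNase matched - State 1:Tss;'
    'function=activates a minimal TATA promoter and a strong SV40 promoter '
    'by Sharpr-MPRA in K562 cells {active_cell/tissue: K562}";'
    'gbkey=regulatory;regulatory_class=enhancer'
)

attributes = parse_gff3_attributes(example)
print(attributes["function"])
```

With this approach the unbalanced quote simply ends up at the end of the `function` value instead of raising an error.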
    
    opened by davmlaw 8
  • handling output from minimap2

    Dear developers, I'm using htseq on the results generated using minimap2 on nanopore reads.

    It looks like I need to turn off the generation of secondary alignment from minimap2 or I get the following error:

    Error occured when processing input (record #1 in file data_espress---unclassified_s.bam):
      'NoneType' object has no attribute 'encode'
      [Exception type: AttributeError, raised in _HTSeq.pyx:1379]
    

    I'm not sure whether this is an htseq problem or a minimap2 problem.

    opened by lucacozzuto 8
  • Exception type: ValueError, raised in __init__.py:209

    Hi,

    I came across the following error when I was running htseq in our HPRC.

    Error occured when processing GFF file (line 2338856 of file cr.working_models.pm.locus_assign.gtf):
      not enough values to unpack (expected 9, got 1)
      [Exception type: ValueError, raised in __init__.py:209]
    

    I was using the following command htseq-count --format bam --order pos --mode intersection-strict --stranded reverse --minaqual 1 --type exon --idattr gene_id ${Mock}.bam cr.working_models.pm.locus_assign.gtf > htseq_counts/${Mock}.tsv

    Version: 0.11.3. Can you please help me to resolve this issue? If you need further details please let me know. Thank you, Venura
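
GFF/GTF records have nine tab-separated columns, so "expected 9, got 1" suggests a line without tabs (for example a blank or space-separated line). A minimal sketch for locating such lines; `find_malformed_lines` is a hypothetical helper, not part of HTSeq, and it flags blank lines as well.

```python
def find_malformed_lines(lines, expected_columns=9):
    """Return (line_number, column_count) for non-comment lines with the wrong column count."""
    problems = []
    for lineno, line in enumerate(lines, start=1):
        if line.startswith("#"):
            continue  # comment or directive line
        ncols = len(line.rstrip("\n").split("\t"))
        if ncols != expected_columns:
            problems.append((lineno, ncols))
    return problems


# In-memory example; for a real file, pass an open file handle instead.
sample = [
    "# comment\n",
    "chr1\tsrc\texon\t1\t100\t.\t+\t.\tgene_id \"g1\";\n",
    "chr1 src exon 1 100 . + . gene_id \"g2\";\n",  # spaces instead of tabs
    "\n",                                             # blank line
]
print(find_malformed_lines(sample))
```

For the file in question, something like `find_malformed_lines(open("cr.working_models.pm.locus_assign.gtf"))` would list the offending line numbers.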

    opened by venuraherath 6
  • MacOS (arm) - make sure `swig` is installed

    Hey,

    I just want to mention that I faced some issues with installing HTSeq - and I was able to fix it by installing swig with homebrew (brew install swig).

    Cheers!

    opened by bzaruk 6
  • #38 -

    #38 - "auto" GenomicArray creates finite length chromosome if 1st acc…

    test_access_out_of_range demonstrates the problem - the second loop is the same as the first except it has a line genomic_array[known_iv] = "test" - calling the setter first before the getter used to cause the test to fail with IndexError: stop too large

    After making the change, I also had to modify the bedgraph file test. I believe this is due to an unrelated bug (which I haven't fixed) - write_bedgraph_file tries to deal with infinite sized chromosomes via:

    if iv.start == -sys.maxsize - 1 or iv.end == sys.maxsize:
    

    The code checks for start == -sys.maxsize - 1, used to represent infinity, but I can't see anywhere that sets it that way; add_chrom by default sets start to 0.

    opened by davmlaw 6
  • [BUG] Latest release

    [BUG] Latest release "auto" GenomicArray crashes existing code due to finite length chromosome if 1st accessed via setter

    The following code ran without crashing on all previous versions of HTSeq, i.e. up to and including v0.13.5 (Dec 29, 2020):

    import HTSeq
    from importlib import metadata                                                                                           
    
    print(f'HTSeq version: {metadata.version("HTSeq")}')        
    
    
    ga = HTSeq.GenomicArray("auto")                                                                                          
    
    iv = HTSeq.GenomicInterval("1", 100, 150, "+")                                                               
    iv2 = HTSeq.GenomicInterval("1", 200, 300, "+")                                                                          
    
    ga[iv] = 2
    data = ga[iv2]
    
    print("Ran ok!")
    

    but now crashes:

    HTSeq version: 1.99.2
    Traceback (most recent call last):
      File "./htseq_test.py", line 15, in <module>
        data = ga[iv2]
      File "src/HTSeq/_HTSeq.pyx", line 701, in HTSeq._HTSeq.GenomicArray.__getitem__
      File "src/HTSeq/_HTSeq.pyx", line 480, in HTSeq._HTSeq.ChromVector.__getitem__
    IndexError: stop too large
    

    This was due to commit:

    https://github.com/htseq/htseq/commit/b2d20a3d50a23e0e5f2634b960dec480f863662e#diff-0f12eb8d539626111020f6eaf57e93427ae1fb352b95688ba0e8e0c9b09012c1L639

    Which changed the GenomicArray setter to make a non-infinite length new chromosome.

    GenomicArray used to create infinite length chromosomes in this case, and still does in all other instances, eg explicitly naming chroms makes them infinite length:

    In [1]: import HTSeq                                                                                                                     
    
    In [2]: HTSeq.__version__                                                                                                                
    Out[2]: '1.99.2'
    
    In [3]: ga_chroms = HTSeq.GenomicArray(["1"])                                                                                            
    
    In [4]: ga_chroms["1"]                                                                                                                   
    Out[4]: 
    {'+': <ChromVector object, 1:[0,Inf)/+, step>,
     '-': <ChromVector object, 1:[0,Inf)/-, step>}
    

    Accessing the unknown chromosome for the first time via a get makes it infinite:

    In [5]: iv = HTSeq.GenomicInterval("1", 1, 100, "+")                                                                                     
    
    In [6]: ga_getter = HTSeq.GenomicArray("auto")                                                                                           
    
    In [7]: ga_getter[iv]
    Out[7]: <ChromVector object, 1:[1,100)/+, step>
    
    In [8]: ga_getter["1"]                                                                                                                  
    Out[8]: 
    {'+': <ChromVector object, 1:[0,Inf)/+, step>,
     '-': <ChromVector object, 1:[0,Inf)/-, step>}
    

    However if you access it for the first time via a set, it is only of that interval's size:

    In [9]: ga_setter = HTSeq.GenomicArray("auto")                                                                                           
    
    In [10]: ga_setter[iv] = 42                                                                                                               
    
    In [11]: ga_setter["1"]                                                                                                                   
    Out[11]: 
    {'+': <ChromVector object, 1:[1,100)/+, step>,
     '-': <ChromVector object, 1:[1,100)/-, step>}
    

    Expected Result

    Old code written for HTSeq continues to run without crashing.

    This means GenomicArray "auto" chromosomes need to be consistently infinite in size when created via a set, as they are when created by the other methods.

    Software versions

    • HTSeq - 1.99.2
    • Python - Python 3.8.10
    • operating system - Ubuntu 20.04.3
    opened by davmlaw 6
  • Cannot convert HTseq tagged SAM files to BAM

    Hello,

    I'm trying to use the XF tag generated by HTSeq for some downstream analysis. I'm easily able to write out to SAM:

    python -m HTSeq.scripts.count downsampled.bam \
    /SAN/vyplab/vyplab_reference_genomes/annotation/human/GRCh38/gencode.v34.annotation.gff3 \
    --stranded yes --samout downsampled.tagged.sam 
    

    But my problem comes when I try to convert the sam to a bam

    samtools view downsampled.tagged.sam  > test.bam
    [E::sam_parse1] missing SAM header
    [W::sam_read1] Parse error at line 1
    [main_samview] truncated file.
    

    I wasn't able to find any answers on Google, nor whether anyone else has come across this particular error.

    Caveat: for speed of testing I've been running HTSeq on downsampled BAMs. Possible the error comes from the downsampling and not an issue with HTSeq's sam output.
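
The samtools message says that no header was found, and SAM header lines are the ones starting with "@" at the top of the file. A quick way to see which header records a SAM file carries is sketched below; `header_summary` is a hypothetical helper, and whether the --samout file is expected to contain a header is not established here.

```python
def header_summary(lines):
    """Count header record types (@HD, @SQ, ...) that precede the first alignment."""
    summary = {}
    for line in lines:
        if not line.startswith("@"):
            break  # first alignment record reached
        tag = line.rstrip("\n").split("\t", 1)[0]
        summary[tag] = summary.get(tag, 0) + 1
    return summary


# In-memory examples; for a real file, pass an open file handle instead.
with_header = [
    "@HD\tVN:1.0\tSO:coordinate\n",
    "@SQ\tSN:chr1\tLN:1000\n",
    "read1\t0\tchr1\t1\t255\t4M\t*\t0\t0\tACGT\t*\n",
]
without_header = ["read1\t0\tchr1\t1\t255\t4M\t*\t0\t0\tACGT\t*\tXF:Z:gene1\n"]

print(header_summary(with_header))
print(header_summary(without_header))
```

An empty summary for downsampled.tagged.sam would be consistent with the "missing SAM header" message.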

    opened by aleighbrown 6
  • [BUG] yeast_RNASeq_excerpt.sam file

    Software versions

    • HTSeq: 2.0.1
    • Python: 3.10.4
    • samtools: 1.15.1
    • operating system: MacOS 12.4

    Describe the bug

    The yeast_RNASeq_excerpt.sam file in HTSeq_example_data.tgz can't be parsed by HTSeq.SAM_Reader or samtools -S

    Both HTSeq and samtools throw the same error.

    [E::sam_hrecs_error] Malformed key:value pair at line 20: "@PG  ID=Bowtie       VN=0.11.3       CL="bowtie --sam --solexa1.3-quals Scerv yeast_RNASeq_excerpt_sequence.txt yeast_RNASeq_excerpt.sam""
    

    After deleting line 20, both software can parse this file correctly.

    So, technically this is not a bug, but I think it would be better to replace this file to spare newcomers (like me) the frustration. 😅
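
The message points at fields written as KEY=VALUE, whereas SAM header fields take the form TAG:VALUE with a two-character tag. A minimal sketch for spotting such fields; `malformed_header_fields` is a hypothetical helper, and it assumes the fields in the file are tab-separated (the quoted message shows only whitespace).

```python
import re

# Two-character tag, a colon, then a non-empty value.
FIELD_PATTERN = re.compile(r"^[A-Za-z][A-Za-z0-9]:.+$")


def malformed_header_fields(line):
    """Return the fields of a SAM header line that are not TAG:VALUE pairs."""
    fields = line.rstrip("\n").split("\t")
    if not fields[0].startswith("@") or fields[0] == "@CO":
        return []  # not a header line, or a free-text comment line
    return [field for field in fields[1:] if not FIELD_PATTERN.match(field)]


bad_line = '@PG\tID=Bowtie\tVN=0.11.3\tCL="bowtie --sam --solexa1.3-quals Scerv"\n'
good_line = "@HD\tVN:1.0\tSO:coordinate\n"

print(malformed_header_fields(bad_line))
print(malformed_header_fields(good_line))
```

Running this over the header of yeast_RNASeq_excerpt.sam should single out the @PG line that had to be deleted.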

    opened by panyq357 0
  • [BUG] GenomicArray doesn't implement __contains__ and thus behaves spuriously

    Software versions

    • HTSeq 2.0.1
    • Python 3.9.2
    • operating system Ubuntu 18.04

    Describe the bug

    I was trying to use the in operator to determine whether any values were set in a specific interval for a GenomicArray. Because GenomicArray doesn't implement __contains__, Python falls back to the iteration methods described in https://docs.python.org/3/reference/datamodel.html#object.__contains__ and https://docs.python.org/3/reference/expressions.html#membership-test-details. This causes an unhelpful KeyError: 0 to be emitted. It would be helpful if GenomicArray either implemented __contains__ or threw a NotImplementedError.

    Minimal example showing the bug

    mekey = HTSeq.GenomicInterval( l_chr, l_begin, l_end+1)
    if not mekey in medata.MEIndexingArray:
        continue
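
The fallback can be reproduced without HTSeq. In the sketch below, `IntervalStore` is a toy stand-in (not GenomicArray) that defines `__getitem__` but neither `__contains__` nor `__iter__`, so the `in` operator probes integer indices starting at 0; the subclass shows one possible fix.

```python
class IntervalStore:
    """Toy dict-backed container with __getitem__ only (stand-in, not HTSeq)."""

    def __init__(self):
        self._data = {}

    def __setitem__(self, key, value):
        self._data[key] = value

    def __getitem__(self, key):
        return self._data[key]


class IntervalStoreWithContains(IntervalStore):
    """Same container, with an explicit membership test."""

    def __contains__(self, key):
        return key in self._data


def membership_error(container, key):
    """Return the KeyError raised by `key in container`, or None if it succeeds."""
    try:
        key in container
    except KeyError as exc:
        return exc
    return None


key = ("chr1", 100, 150)
plain = IntervalStore()
plain[key] = "test"
fixed = IntervalStoreWithContains()
fixed[key] = "test"

print(repr(membership_error(plain, key)))  # the fallback calls plain[0]
print(key in fixed)
```

Even though the key is present, the plain container raises `KeyError(0)` because the lookup never uses the key itself.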
    
    
    opened by mp15 1
  • HTSEQ Count file showing 0 reads for a particular gene ( ENSG00000205755 CRLF2, ENSG00000260596 DUX4)

    Software versions

    Specify what versions of the following you are using:

    • HTSeq- 0.12.4
    • Python - 2.7.17
    • operating system - Ubuntu 18.04.5 LTS
    • STAR - STAR-2.7.3a
    • genome- Hg38
    • gtf- gencode.v36.annotation.gtf

    Hello Fabio, I am using HTSEQ to generate the count file of the RNA-Seq data of B-cell acute lymphoblastic leukemia (B-ALL) patients. While analyzing the output file, I observed 0 reads in the samples that are positive for the CRLF2 translocation (P2RY8::CRLF2, characterized by high expression of CRLF2). Similarly, 0 reads are observed for the DUX4 gene in DUX4-rearranged cases. Kindly help in troubleshooting the issue.

    Thanks and Regards, JAY

    opened by JAYRJPT 1
  • Cannot process paired-end alignment found with 'unknown' 'pe_which' status

    [kscott94]$ python3 -m HTSeq.scripts.count -f bam -r pos ../../final_bams/file.bam annotation.gtf

    Error occured when processing SAM input (record #0 in file ../../final_bams/TS559exoS_totalRNA_March2016_rmdup.bam):
      Cannot process paired-end alignment found with 'unknown' 'pe_which' status.
      [Exception type: ValueError, raised in __init__.py:767]

    I don't know what this error means. Here is the head of my file.

    samtools view -h ../../final_bams/TS559exoS_totalRNA_Oct2020_rmdup.sam | head -13
    @HD VN:1.0 SO:coordinate
    @SQ SN:TS559_Genomic_Sequence.seq LN:2087105
    @PG PN:BS Seeker 2 ID:1 CL:/projects/[email protected]/tools/BSseeker2/bs_seeker2-align.py -i /scratch/summit/[email protected]/bs7/TS559/rep3/TS559exoS_totalRNA_Oct2020_R1_CA_filtered.fastq -g /projects/[email protected]/genome/TS559_genome.fa --temp_dir=/scratch/summit/[email protected]/tmp -m 2 --XS=0.03,3 --bt2--mm --bt-p 10 --aligner=bowtie2 -p /projects/[email protected]/tools/bowtie2/
    @PG PN:BS Seeker 2 ID:1-3297022D CL:/projects/[email protected]/tools/BSseeker2/bs_seeker2-align.py -i /scratch/summit/[email protected]/bs7/TS559/rep3/TS559exoS_totalRNA_Oct2020_R2_C_filtered.fastq -g /projects/[email protected]/genome/TS559_genome.fa --temp_dir=/scratch/summit/[email protected]/tmp -m 2 --XS=0.03,3 --bt2--mm --bt-p 12 --aligner=bowtie2 -p /projects/[email protected]/tools/bowtie2/
    @PG ID:samtools PN:samtools PP:1-3297022D VN:1.12 CL:samtools merge TS559exoS_totalRNA_Oct2020_filtered_unsorted.bam TS559exoS_totalRNA_Oct2020_R1_CA_filtered.fastq_bsse.bam TS559exoS_totalRNA_Oct2020_R2_C_filtered.fastq_bsse.bam
    @PG ID:samtools.1 PN:samtools PP:samtools VN:1.12 CL:samtools view -h -b -q 20 TS559exoS_totalRNA_Oct2020_filtered_unsorted.bam
    @PG ID:samtools.2 PN:samtools PP:samtools.1 VN:1.12 CL:samtools sort -n -o TS559exoS_totalRNA_Oct2020_mapq_nsorted.bam
    @PG ID:samtools.3 PN:samtools PP:samtools.2 VN:1.12 CL:samtools fixmate -rm TS559exoS_totalRNA_Oct2020_mapq_nsorted.bam TS559exoS_totalRNA_Oct2020_fixmate.bam
    @PG ID:samtools.4 PN:samtools PP:samtools.3 VN:1.12 CL:samtools sort TS559exoS_totalRNA_Oct2020_fixmate.bam
    @PG ID:samtools.5 PN:samtools PP:samtools.4 VN:1.12 CL:samtools markdup -r TS559exoS_totalRNA_Oct2020_fixmate_csorted.bam TS559exoS_totalRNA_Oct2020_rmdup.bam
    @PG ID:samtools.6 PN:samtools PP:samtools.5 VN:1.12 CL:samtools view -h ../../final_bams/TS559exoS_totalRNA_Oct2020_rmdup.bam
    @PG ID:samtools.7 PN:samtools PP:samtools.6 VN:1.12 CL:samtools view -h ../../final_bams/TS559exoS_totalRNA_Oct2020_rmdup.sam
    A00336:A00336:HT232DRXX:1:2111:6063:35603 1 TS559_Genomic_Sequence.seq 1 255 76S74M = 1 0 ATATAATTGAGGATGGAAAGTTTGTTATAAGAATTTTTAAGAAGGAAAATGGTGAGTTTAAGATTGAGTATGAAAGATGATTTTTGATATTGATTATATAATTGAGGATGGAAAGTTTGTTATAAGAATTTTTAAGAAGGAAAATGGTGA * XO:Z:+FW XS:i:0 NM:i:0 XM:Z:-----zz-x--z-y---z--z----yx------------zy---z-----------z-----------x--x-- XG:Z:NN_ATGATCCTCGACACTGACTACATAACCGAGGATGGAAAGCCTGTCATAAGAATTTTCAAGAAGGAAAACGGCGA_GT MQ:i:255 MC:Z:42S92M16S ms:i:255

    opened by kscott94 3
Releases(release_0.12.3)
  • release_0.12.3(Apr 18, 2020)

    First release since migration to the new Github organization htseq.

    Binaries for Linux and OSX are provided on PyPI: https://pypi.org/project/HTSeq/#files.

    As usual, installation with pip is recommended.

    New features:

    • Negative indices for StepVector (thanks to shouldsee for the original PR).
    • htseq-count-barcodes counts features in barcoded SAM/BAM files, e.g. 10X Genomics single cell outputs. It supports cell barcodes, which result in different columns of the output count table, and unique molecular identifiers.
    • htseq-count has new option -n for multicore parallel processing
    • htseq-count has new option -d for separating output columns by arbitrary character (default TAB; "," is also common)
    • htseq-count has new option -c for output into a file instead of stdout
    • htseq-count has new option --append-output for output into a file by appending to any existing text (e.g. a header with the feature attribute names and sample names)
    • htseq-count has two new values for option --nonunique, namely fraction, which will count an N-multimapper as 1/N for each feature, and random, which will assign the alignment to a random one of its N-multimapped features. This feature was added by ewallace (thank you!).
    • htseq-qa got refactored and now accepts an option --primary-only which ignores non-primary alignments in SAM/BAM files. This means that the final number of alignments scored is equal to the number of reads even when multimapped reads are present.
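
The two new --nonunique values described above can be illustrated with a toy counter. This is a sketch of the counting rule as worded in these notes, not htseq-count's actual code; the function name `count_multimapper` and the gene names are made up.

```python
import random
from collections import defaultdict


def count_multimapper(counts, features, mode="fraction", rng=random):
    """Add one alignment that overlaps N features to the count table."""
    if not features:
        return
    if mode == "fraction":
        # Each of the N features receives 1/N of the alignment.
        share = 1 / len(features)
        for feature in features:
            counts[feature] += share
    elif mode == "random":
        # The whole alignment goes to one randomly chosen feature.
        counts[rng.choice(features)] += 1
    else:
        raise ValueError(f"unknown mode: {mode!r}")


fraction_counts = defaultdict(float)
count_multimapper(fraction_counts, ["geneA", "geneB"], mode="fraction")

random_counts = defaultdict(float)
count_multimapper(random_counts, ["geneA", "geneB"], mode="random",
                  rng=random.Random(0))

print(dict(fraction_counts))
print(dict(random_counts))
```

In both modes the table grows by exactly one alignment in total; only the distribution across features differs.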

    Testing improvements:

    • Extensive testing and installation changes for Mac OSX 10.14 and later versions
    • Testing Python 2.7, 3.6, 3.7, and 3.8 on OSX
    • Testing and deployment now uses conda environments

    Numerous bugfixes and doc improvements.

    This is the last version of HTSeq supporting Python 2.7, which has been unmaintained since Jan 1st, 2020. HTSeq will support Python 3.5+ from the next version.
