A Python library for unevenly-spaced time series analysis


traces


A Python library for unevenly-spaced time series analysis.

Why?

Taking measurements at irregular intervals is common, but most tools are primarily designed for evenly-spaced measurements. Also, in the real world, time series have missing observations or you may have multiple series with different frequencies: it can be useful to model these as unevenly-spaced.

Traces was designed by the team at Datascope based on several practical applications in different domains, because it turns out unevenly-spaced data is actually pretty great, particularly for sensor data analysis.

Installation

To install traces, run this command in your terminal:

$ pip install traces

Quickstart: using traces

To see a basic use of traces, let's look at these data from a light switch, also known as Big Data from the Internet of Things.

The main object in traces is a TimeSeries, which you create just like a dictionary, adding the five measurements at 6:00am, 7:45:56am, etc.

>>> from datetime import datetime
>>> import traces
>>> time_series = traces.TimeSeries()
>>> time_series[datetime(2042, 2, 1,  6,  0,  0)] = 0 #  6:00:00am
>>> time_series[datetime(2042, 2, 1,  7, 45, 56)] = 1 #  7:45:56am
>>> time_series[datetime(2042, 2, 1,  8, 51, 42)] = 0 #  8:51:42am
>>> time_series[datetime(2042, 2, 1, 12,  3, 56)] = 1 # 12:03:56pm
>>> time_series[datetime(2042, 2, 1, 12,  7, 13)] = 0 # 12:07:13pm

What if you want to know if the light was on at 11am? Unlike a python dictionary, you can look up the value at any time even if it's not one of the measurement times.

>>> time_series[datetime(2042, 2, 1, 11,  0, 0)] # 11:00am
0
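
The lookup above returns the most recent measurement at or before the requested time. Here is a minimal sketch of that idea in plain Python, using the standard bisect module. StepSeries is a hypothetical stand-in written for illustration, not the traces implementation.

```python
from bisect import bisect_right
from datetime import datetime


class StepSeries:
    """Hypothetical stand-in for a step-function time series (not traces itself)."""

    def __init__(self, default=None):
        self._times = []   # measurement times, kept sorted
        self._values = []  # value that takes effect at the matching time
        self.default = default  # returned before the first measurement

    def __setitem__(self, time, value):
        index = bisect_right(self._times, time)
        if index and self._times[index - 1] == time:
            self._values[index - 1] = value  # overwrite an existing measurement
        else:
            self._times.insert(index, time)
            self._values.insert(index, value)

    def __getitem__(self, time):
        # Find the last measurement at or before `time`.
        index = bisect_right(self._times, time)
        return self._values[index - 1] if index else self.default


light = StepSeries()
light[datetime(2042, 2, 1, 6, 0, 0)] = 0
light[datetime(2042, 2, 1, 7, 45, 56)] = 1
light[datetime(2042, 2, 1, 8, 51, 42)] = 0
light[datetime(2042, 2, 1, 12, 3, 56)] = 1
light[datetime(2042, 2, 1, 12, 7, 13)] = 0

# The light was switched off at 8:51:42am and stayed off until 12:03:56pm.
print(light[datetime(2042, 2, 1, 11, 0, 0)])  # 0
```

Keeping the keys sorted makes each lookup a binary search, which is why any orderable key type works.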

The distribution function gives you the fraction of time that the TimeSeries is in each state.

>>> time_series.distribution(
...   start=datetime(2042, 2, 1,  6,  0,  0), # 6:00am
...   end=datetime(2042, 2, 1,  13,  0,  0)   # 1:00pm
... )
Histogram({0: 0.8355952380952381, 1: 0.16440476190476191})

The light was on about 16% of the time between 6am and 1pm.
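
The 16% figure can be checked by hand: the light is on for 3946 seconds (7:45:56am to 8:51:42am) plus 197 seconds (12:03:56pm to 12:07:13pm) out of a 25200-second window. The sketch below does that arithmetic in plain Python. time_fractions is a hypothetical helper, not part of traces, and it assumes the window starts at or after the first measurement.

```python
from collections import defaultdict
from datetime import datetime


def time_fractions(points, start, end):
    """Fraction of [start, end) spent in each state of a step function.

    `points` is a time-sorted list of (time, value) pairs; each value holds
    until the next time. Assumes `start` is at or after the first time.
    """
    totals = defaultdict(float)
    for i, (time, value) in enumerate(points):
        next_time = points[i + 1][0] if i + 1 < len(points) else end
        # Clip each period to the requested window.
        period_start = max(time, start)
        period_end = min(next_time, end)
        if period_end > period_start:
            totals[value] += (period_end - period_start).total_seconds()
    window = (end - start).total_seconds()
    return {value: seconds / window for value, seconds in totals.items()}


points = [
    (datetime(2042, 2, 1, 6, 0, 0), 0),
    (datetime(2042, 2, 1, 7, 45, 56), 1),
    (datetime(2042, 2, 1, 8, 51, 42), 0),
    (datetime(2042, 2, 1, 12, 3, 56), 1),
    (datetime(2042, 2, 1, 12, 7, 13), 0),
]
shares = time_fractions(
    points,
    start=datetime(2042, 2, 1, 6, 0, 0),
    end=datetime(2042, 2, 1, 13, 0, 0),
)
print(round(shares[1], 4))  # 0.1644, i.e. 4143 / 25200
```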

Adding more data...

Now let's get a little more complicated and look at the sensor readings from forty lights in a house.

How many lights are on throughout the day? The merge function takes the forty individual TimeSeries and efficiently merges them into one TimeSeries where each value is a list of the states of all the lights.

>>> trace_list = [... list of forty traces.TimeSeries ...]
>>> count = traces.TimeSeries.merge(trace_list, operation=sum)

We also applied a sum operation to the list of states to get the TimeSeries of the number of lights that are on.
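
To make the merge-then-sum idea concrete, here is a small sketch in plain Python that works on lists of (time, value) pairs instead of TimeSeries objects. merge_sum and the three example lights are hypothetical, and the sketch assumes a light counts as 0 before its first measurement. It re-sums the states at every change for clarity, whereas an efficient implementation would keep a running total.

```python
def merge_sum(series_list):
    """Merge step functions into one whose value is the sum of the inputs.

    Each series is a time-sorted list of (time, value) pairs. A series counts
    as 0 before its first measurement (an assumption of this sketch).
    """
    # Turn every measurement into a change event: (time, series index, value).
    events = sorted(
        (time, index, value)
        for index, series in enumerate(series_list)
        for time, value in series
    )
    current = [0] * len(series_list)  # latest known state of each series
    merged = []
    for time, index, value in events:
        current[index] = value
        total = sum(current)
        if merged and merged[-1][0] == time:
            merged[-1] = (time, total)  # several series changed at this time
        else:
            merged.append((time, total))
    return merged


kitchen = [(0, 0), (2, 1), (5, 0)]
hallway = [(0, 1), (3, 0)]
porch = [(1, 1), (4, 0)]

print(merge_sum([kitchen, hallway, porch]))
# [(0, 1), (1, 2), (2, 3), (3, 2), (4, 1), (5, 0)]
```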

How many lights are on in the building on average during business hours, from 8am to 6pm?

>>> histogram = count.distribution(
...   start=datetime(2042, 2, 1,  8,  0,  0),   # 8:00am
...   end=datetime(2042, 2, 1,  12 + 6,  0,  0) # 6:00pm
... )
>>> histogram.median()
17

The distribution function returns a Histogram that can be used to get summary metrics such as the mean or quantiles.
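
As a rough illustration of what such summary metrics mean for time-weighted data, the sketch below computes a weighted mean and a simple weighted quantile from a plain dict of value to weight. Both functions and the example numbers are hypothetical. The Histogram class in traces may define quantiles differently (for example with interpolation), and this sketch returns None when the total weight is zero rather than dividing by it.

```python
def weighted_mean(weights):
    """Mean of the values in `weights`, a dict of value -> weight."""
    total = sum(weights.values())
    if total == 0:
        return None  # nothing was observed, so the mean is undefined
    return sum(value * weight for value, weight in weights.items()) / total


def weighted_quantile(weights, q):
    """Smallest value whose cumulative weight reaches fraction `q` of the total."""
    total = sum(weights.values())
    if total == 0:
        return None
    cumulative = 0.0
    for value in sorted(weights):
        cumulative += weights[value]
        if cumulative >= q * total:
            return value
    return max(weights)  # guard against floating-point shortfall


# Hypothetical example: hours spent with 16, 17, or 20 lights on.
hours_by_count = {16: 3.0, 17: 4.0, 20: 3.0}
print(weighted_mean(hours_by_count))           # (48 + 68 + 60) / 10 = 17.6
print(weighted_quantile(hours_by_count, 0.5))  # 17
```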

It's flexible

The measurement points (keys) in a TimeSeries can be in any units as long as they can be ordered. The values can be anything.

For example, you can use a TimeSeries to keep track of the contents of a grocery basket by the number of minutes within a shopping trip.

>>> time_series = traces.TimeSeries()
>>> time_series[1.2] = {'broccoli'}
>>> time_series[1.7] = {'broccoli', 'apple'}
>>> time_series[2.2] = {'apple'}          # puts broccoli back
>>> time_series[3.5] = {'apple', 'beets'} # mmm, beets

To learn more, check the examples and the detailed reference.

More info

Contributing

Contributions are welcome and greatly appreciated! Please visit our guidelines for more info.

Comments
  • Trying to calculate the mean of an empty Histogram fails

    Running .mean() on an empty Histogram object (Histogram(None, 1000, {0: 0.0})) fails with a divide by zero error:

      File "/src/traces/traces/histogram.py", line 30, in mean
        return weighted_sum / float(self.total())
    ZeroDivisionError: float division by zero
    
    Bug Report 
    opened by vlsd 6
  • How are the plots in the documentation created?

    Not a bug, but just curious about how you've plotted the charts in the documentation and what the recommended approach for plotting TimeSeries objects is? I couldn't find a trace of this information in the repo. Thanks in advance!

    opened by Ogaday 5
  • Dev

    This covers an initial implementation of the EventSeries features described in #229

    I ended up leaving out the histogram plotting feature as creating reasonable and responsive log binned histograms of time units felt a little outside the scope of this project, though something I may yet tackle.

    Would love a review for readability, test coverage, or feature suggestions!

    opened by nsteins 4
  • add possibility to write ts[start:end] = v to change value on an interval

    I have a use case where I need to change the value of a timeseries on an interval without changing the value outside of the interval, i.e. do something like ts[start:end] = value. Just setting

    ts[end] = ts[end]   # freezing/anchoring the current value of ts as of [end, ...)
    ts[start] = value      # changing the value as of [start, ...)
    

    may fail as intermediate points in [start,end) may exist ==> we need to remove all intermediate points (which is easy as ts.iterperiods(start,end) provides them nicely).

    I think the function below does it properly (but it would be better integrated in the item to use the slice notation)

    def set_slice(ts, start, end, value):
        """
        ts[start:end] = value ==> call set_slice(ts, start, end, value)

        Replace the value of ts on an interval so that
        - on the interval [start, end) we have the new value
        - on [end, ...) the value is unchanged
        - on (..., start) the value is unchanged as well

        :param ts: the TimeSeries to modify in place
        :param start: start of the interval (inclusive)
        :param end: end of the interval (exclusive)
        :param value: new value on [start, end)
        """
        # remember the value at `end` before modifying anything, so that an
        # existing measurement exactly at `end` is not overwritten
        end_value = ts[end]
        # for each period overlapping [start, end) (always at least 1)
        for i, (s, e, v) in enumerate(list(ts.iterperiods(start, end))):
            if i == 0:
                # first period: set the new value at the start of the range
                ts[s] = value
            else:
                # otherwise, remove the intermediate key
                del ts[s]
        # finish by restoring the original value from `end` onwards
        ts[end] = end_value
    
    
    
    opened by sdementen 4
  • Values in TimeSeries.distribution() are sentence-cased regardless of how values were added to the TimeSeries

    If you are using strings as values in a TimeSeries:

    ts = traces.TimeSeries()
    ts[1] = 'JUNK'
    ts[3] = 'JANK'
    ts[5] = 'WHAT'
    

    If you call something like ts.distribution(min, max), you would see something like this:

    Histogram(None, 1000, {'Jank': 0.16008504570112725, 'Junk': 0.04229136076598496, 'What': 0.797577092766277})
    

    It looks like somewhere along the line, the string-values are getting sentence-cased. Not sure exactly where yet, but this could be confusing or cause silly bugs if looking-up these objects with the wrong value.

    opened by michaelmoliterno 4
  • Fix conversion of window_size to float breaking timedelta compatibility

    With commit 05a14608d06b06dfc589ae9c247d300b89f956b5, using a timedelta as sampling_period in moving_average throws an exception when converting window_size to a float. Multiplying by 1. (as previously done) serves the same purpose and still allows timedelta to be used.

    opened by cesarrodrig 2
  • Feature Request: linear interpolation for mean

    So I recently discovered this nice library and decided to try it since I got unevenly spaced data, however I found out today that the .mean() wasn't doing linear interpolation as I thought it would be:

    >>> from traces import TimeSeries
    >>> t = TimeSeries()
    >>> t[0] = 0
    >>> t[1] = 0
    >>> t[3] = 20
    >>> t.mean(0, 2)
    0.0
    

    With linear interpolation between 2 points we would find that t[2] = 10 and doing the average from 0 to 2 would give us 3.333 in this example. A simple optional argument in mean() to choose the interpolation method would be fantastic, and I really think that it would be useful to many users who are not using traces exclusively for binary data (where linear interpolation would make no sense). I know that we can re-sample the TimeSeries but I think a shortcut like this would be really neat since this library is designed with ease of use in mind.

    Thanks for reading and have a nice day 👋

    opened by Inspirateur 2
  • [Question] How to recreate traces chart?

    I wonder how one could plot traces' signature plot?

    I was wondering if the library has anything to do with the charts (as per the docs that is not the case) but seeing a couple of charts like that in the docs made me think that maybe producing that kind of charts is within the scope of the projects.

    opened by manugarri 2
  • Can't pickle TimeSeries objects

    [UPDATE] This only seems to happen on python 2.7

    Trying to pickle a TimeSeries object:

    import traces
    ofile = open('test.pkl', 'wb')
    import pickle
    ts = traces.TimeSeries()
    ts[23]="blah"
    ts[2]="foo"
    pickle.dump(ts, ofile)
    

    I get the following error:

    In [9]: pickle.dump(ts, ofile)
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-9-f1eed5bd8d83> in <module>()
    ----> 1 pickle.dump(ts, ofile)
    
    /Users/vlad/.pyenv/versions/2.7.13/lib/python2.7/pickle.pyc in dump(obj, file, protocol)
       1374
       1375 def dump(obj, file, protocol=None):
    -> 1376     Pickler(file, protocol).dump(obj)
       1377
       1378 def dumps(obj, protocol=None):
    
    /Users/vlad/.pyenv/versions/2.7.13/lib/python2.7/pickle.pyc in dump(self, obj)
        222         if self.proto >= 2:
        223             self.write(PROTO + chr(self.proto))
    --> 224         self.save(obj)
        225         self.write(STOP)
        226
    
    /Users/vlad/.pyenv/versions/2.7.13/lib/python2.7/pickle.pyc in save(self, obj)
        329
        330         # Save the reduce() output and finally memoize the object
    --> 331         self.save_reduce(obj=obj, *rv)
        332
        333     def persistent_id(self, obj):
    
    /Users/vlad/.pyenv/versions/2.7.13/lib/python2.7/pickle.pyc in save_reduce(self, func, args, state, listitems, dictitems, obj)
        423
        424         if state is not None:
    --> 425             save(state)
        426             write(BUILD)
        427
    
    /Users/vlad/.pyenv/versions/2.7.13/lib/python2.7/pickle.pyc in save(self, obj)
        284         f = self.dispatch.get(t)
        285         if f:
    --> 286             f(self, obj) # Call unbound method with explicit self
        287             return
        288
    
    /Users/vlad/.pyenv/versions/2.7.13/lib/python2.7/pickle.pyc in save_dict(self, obj)
        653
        654         self.memoize(obj)
    --> 655         self._batch_setitems(obj.iteritems())
        656
        657     dispatch[DictionaryType] = save_dict
    
    /Users/vlad/.pyenv/versions/2.7.13/lib/python2.7/pickle.pyc in _batch_setitems(self, items)
        667             for k, v in items:
        668                 save(k)
    --> 669                 save(v)
        670                 write(SETITEM)
        671             return
    
    /Users/vlad/.pyenv/versions/2.7.13/lib/python2.7/pickle.pyc in save(self, obj)
        284         f = self.dispatch.get(t)
        285         if f:
    --> 286             f(self, obj) # Call unbound method with explicit self
        287             return
        288
    
    /Users/vlad/.pyenv/versions/2.7.13/lib/python2.7/pickle.pyc in save_dict(self, obj)
        653
        654         self.memoize(obj)
    --> 655         self._batch_setitems(obj.iteritems())
        656
        657     dispatch[DictionaryType] = save_dict
    
    /Users/vlad/.pyenv/versions/2.7.13/lib/python2.7/pickle.pyc in _batch_setitems(self, items)
        667             for k, v in items:
        668                 save(k)
    --> 669                 save(v)
        670                 write(SETITEM)
        671             return
    
    /Users/vlad/.pyenv/versions/2.7.13/lib/python2.7/pickle.pyc in save(self, obj)
        304             reduce = getattr(obj, "__reduce_ex__", None)
        305             if reduce:
    --> 306                 rv = reduce(self.proto)
        307             else:
        308                 reduce = getattr(obj, "__reduce__", None)
    
    /Users/vlad/.pyenv/versions/2.7.13/envs/prelude_monitor/lib/python2.7/copy_reg.pyc in _reduce_ex(self, proto)
         68     else:
         69         if base is self.__class__:
    ---> 70             raise TypeError, "can't pickle %s objects" % base.__name__
         71         state = base(self)
         72     args = (self.__class__, base, state)
    
    TypeError: can't pickle instancemethod objects
    opened by vlsd 2
  • When using a mask with TimeSeries.distribution(), mask.start() is called in `timeseries.py` but `start()` doesn't exist

    I think this will be fixed with the next bump; looked for an issue related to this but didn't find one. Feel free to close this out if it was as simple as defining start() for TimeSeries.

    Traceback (most recent call last):
      File "run_plots.py", line 25, in <module>
        make_plots()
      File "/Users/mjfm/projects/modustri/analysis/plots/see_cart_trips.py", line 55, in make_plots
        mask = front_ts,
      File "/Users/mjfm/Virtualenvs/modustri/lib/python2.7/site-packages/traces/timeseries.py", line 622, in distribution
        new_ts = self.slice(mask.start(), mask.end())
    AttributeError: 'TimeSeries' object has no attribute 'start'
    
    opened by michaelmoliterno 2
  • Add `compact` option to `iterperiods()`

    This would merge adjacent periods that have the same value and return them as only one period. Ideally this would be done efficiently, although I'm unclear what that means (store a compact version of the timeseries along with the non-compact one?)

    Enhancement Request 
    opened by vlsd 2
  • `max` for distribution with `start` and `end` gives wrong result

    Hello, there seems to be a bug with the Histogram initialization when a start and end are passed.

    versions:

    • python: 3.10.5
    • traces: 0.6.0

    Given the following TimeSeries:

    from traces import TimeSeries
    from pandas import Timestamp
    
    
    ts = TimeSeries(
        {
            Timestamp('2022-10-09 08:48:47'): 5.5,
            Timestamp('2022-10-09 10:36:47'): 51.4,
            Timestamp('2022-10-09 10:38:47'): 15.2,
            Timestamp('2022-10-09 10:38:56'): 0.1,
            Timestamp('2022-10-09 10:41:25'): 4.5
        }
    )
    

    Computing the maximum value with

    ts.distribution().max()
    

    gives 51.4 (as expected)

    However

    ts.distribution(
        start=Timestamp('2022-10-09 07:55:10'),
        end=Timestamp('2022-10-09 10:56:32'),
    ).max()
    

    gives 5.5

    Thank you.

    opened by RuiLoureiro 1
  • No longer maintained?

    This repo looks like it's no longer maintained, with the last PR merged over two years ago. Are you looking for active maintainers? What's the plan for this repo?

    opened by nielsuit227 0
  • Incorrect handling of Numpy array passed as times of measurements

    In the following example, although ts1 and ts2 are equal, ts2.distribution() fails with a TypeError as if Numpy arrays weren't recognized properly.
    Somewhat similar to issue #145

    import numpy as np  # Numpy version 1.22.3
    import traces  # traces version 0.6.0
    
    ts1 = traces.TimeSeries(zip(range(4), range(4)), default=0)
    ts2 = traces.TimeSeries(zip(np.arange(4), range(4)), default=0)
    
    ts1 == ts2  # True
    ts1.distribution()  # Histogram({0: 0.3333333333333333, 1: 0.3333333333333333, 2: 0.3333333333333333})
    ts2.distribution()  # TypeError: duration is an unknown type (1)
    
    opened by yportier 0
  • [... list of forty traces.TimeSeries ...] is not functioning.

    Hi, Thanks for creating traces. I am trying to learn it. But while I run the following command,

    [... list of forty traces.TimeSeries ...]
    

    I get an error which is mentioned below,

     File "/tmp/ipykernel_51/3316415681.py", line 3
        trace_list = [... list of forty traces.TimeSeries ...]
                             ^
    SyntaxError: invalid syntax
    

    Could anybody please help? Thanks a lot.

    opened by bhavinmoriya 2
  • Allow more flexible type checks on duration

    The current way of checking int and float cannot handle numpy's data types, such as np.int64 and np.float64, which requires extra effort to convert a numpy element into int or float to pass the check.

    Using numeric ABCs solves the problem and allows more flexible "implementations" of integers and real numbers. (Better than the approach in #224, no extra dependencies needed)

    Update: the tests pass locally. CI seems to complain about repo_token and marks them as failed.

    opened by zzrcxb 0
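
The numeric-ABC approach mentioned in the last item can be sketched with the standard numbers module. duration_kind below is a hypothetical function written for illustration, not the code in traces. NumPy documents that its scalar types are registered with these ABCs, so values such as np.int64 and np.float64 should pass the same check, though this sketch deliberately avoids importing NumPy.

```python
import datetime
import numbers
from fractions import Fraction


def duration_kind(duration):
    """Classify a duration for a hypothetical type check (not the traces code)."""
    if isinstance(duration, datetime.timedelta):
        return "timedelta"
    # numbers.Real covers int and float plus any type registered as a real
    # number, so no explicit list of concrete types is needed.
    if isinstance(duration, numbers.Real):
        return "number"
    raise TypeError(f"duration is an unknown type ({duration!r})")


print(duration_kind(1))                              # number
print(duration_kind(2.5))                            # number
print(duration_kind(Fraction(1, 3)))                 # number
print(duration_kind(datetime.timedelta(seconds=5)))  # timedelta
```

A check against `(int, float)` would reject Fraction here, which is the same kind of failure the issue describes for NumPy scalars.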