Whisper is a file-based time-series database format for Graphite.

Whisper

Overview

Whisper is one of three components within the Graphite project:

  1. Graphite-Web, a Django-based web application that renders graphs and dashboards
  2. The Carbon metric processing daemons
  3. The Whisper time-series database library

Whisper is a fixed-size database, similar in design and purpose to RRD (round-robin database). It provides fast, reliable storage of numeric data over time. Whisper allows high-resolution recent data (few seconds per point) to degrade into lower resolutions for long-term retention of historical data.

Installation, Configuration and Usage

Please refer to the instructions at readthedocs.

Whisper Scripts

rrd2whisper.py

Convert an RRD file into a whisper (.wsp) file.

Usage: rrd2whisper.py rrd_path

Options:
  -h, --help            show this help message and exit
  --xFilesFactor=XFILESFACTOR
                        The xFilesFactor to use in the output file. Defaults
                        to the input RRD's xFilesFactor
  --aggregationMethod=AGGREGATIONMETHOD
                        The consolidation function to fetch from on input and
                        aggregationMethod to set on output. One of: average,
                        last, max, min, avg_zero, absmax, absmin
  --destinationPath=DESTINATIONPATH
                        Path to place created whisper file. Defaults to the
                        RRD file's source path.

whisper-create.py

Create a new whisper database file.

Usage: whisper-create.py path timePerPoint:timeToStore [timePerPoint:timeToStore]*
       whisper-create.py --estimate timePerPoint:timeToStore [timePerPoint:timeToStore]*

timePerPoint and timeToStore specify lengths of time, for example:

60:1440      60 seconds per datapoint, 1440 datapoints = 1 day of retention
15m:8        15 minutes per datapoint, 8 datapoints = 2 hours of retention
1h:7d        1 hour per datapoint, 7 days of retention
12h:2y       12 hours per datapoint, 2 years of retention


Options:
  -h, --help            show this help message and exit
  --xFilesFactor=XFILESFACTOR
  --aggregationMethod=AGGREGATIONMETHOD
                        Function to use when aggregating values (average, sum,
                        last, max, min, avg_zero, absmax, absmin)
  --overwrite
  --estimate            Don't create a whisper file, estimate storage requirements based on archive definitions
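
The retention examples above can be worked through in a few lines of Python. This is only a sketch, not Whisper's own parser: the names `parse_retention` and `estimate_size` are hypothetical, a year is assumed to be 365 days, and the byte sizes (16-byte file header, 12 bytes per archive header, 12 bytes per datapoint) are inferred from the whisper-info output quoted in the comments further down, where four archives start at offset 64 and each archive's size equals points × 12.

```python
# Hypothetical helpers for illustration; not part of the whisper package.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}


def to_seconds(text):
    """Return (value, had_unit) for strings such as '60', '15m', '1h' or '2y'."""
    if text[-1].isdigit():
        return int(text), False
    return int(text[:-1]) * UNIT_SECONDS[text[-1]], True


def parse_retention(definition):
    """Turn 'timePerPoint:timeToStore' into (seconds_per_point, points)."""
    per_point_text, to_store_text = definition.split(":")
    seconds_per_point, _ = to_seconds(per_point_text)
    stored, is_duration = to_seconds(to_store_text)
    # A bare number is a datapoint count; a number with a unit is a duration.
    points = stored // seconds_per_point if is_duration else stored
    return seconds_per_point, points


def estimate_size(definitions):
    """Estimate the file size in bytes under the layout assumptions above."""
    archives = [parse_retention(d) for d in definitions]
    header_bytes = 16 + 12 * len(archives)
    return header_bytes + sum(12 * points for _, points in archives)


for definition in ("60:1440", "15m:8", "1h:7d", "12h:2y"):
    print(definition, parse_retention(definition))
print(estimate_size(["10s:1d", "1m:30d", "5m:2y", "30m:10y"]))
```

The last line uses the four archives from the quoted whisper-info output, so the estimate can be compared with the fileSize reported there.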

whisper-dump.py

Dump the whole whisper file content to stdout.

Usage: whisper-dump.py path

Options:
  -h, --help            show this help message and exit
  --pretty              Show human-readable timestamps instead of unix times
  -t TIME_FORMAT, --time-format=TIME_FORMAT
                        Time format to use with --pretty; see time.strftime()
  -r, --raw             Dump value only in the same format for whisper-update
                        (UTC timestamps)

whisper-fetch.py

Fetch all the metrics stored in a whisper file to stdout.

Usage: whisper-fetch.py [options] path

Options:
  -h, --help     show this help message and exit
  --from=_FROM   Unix epoch time of the beginning of your requested interval
                 (default: 24 hours ago)
  --until=UNTIL  Unix epoch time of the end of your requested interval
                 (default: now)
  --json         Output results in JSON form
  --pretty       Show human-readable timestamps instead of unix times
  -t TIME_FORMAT, --time-format=TIME_FORMAT
                 Time format to use with --pretty; see time.strftime()
  --drop=DROP    Specify 'nulls' to drop all null values. Specify 'zeroes' to
                 drop all zero values. Specify 'empty' to drop both null and
                 zero values.
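
The three --drop modes described above amount to a simple filter over (timestamp, value) pairs. The following is a minimal sketch of that filtering rule only, assuming missing datapoints are represented as `None`; `drop_points` is a hypothetical name and this is not the script's actual code.

```python
# Hypothetical filter mirroring the documented --drop modes.
KEEP_RULES = {
    "nulls": lambda value: value is not None,
    "zeroes": lambda value: value != 0,
    "empty": lambda value: value is not None and value != 0,
}


def drop_points(points, mode):
    """Return the (timestamp, value) pairs that survive the given drop mode."""
    if mode not in KEEP_RULES:
        raise ValueError("mode must be one of: nulls, zeroes, empty")
    keep = KEEP_RULES[mode]
    return [(timestamp, value) for timestamp, value in points if keep(value)]


sample = [(1454230740, 8.0), (1454230800, None), (1454230860, 0.0)]
for mode in ("nulls", "zeroes", "empty"):
    print(mode, drop_points(sample, mode))
```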

whisper-info.py

Dump the metadata about a whisper file to stdout.

Usage: whisper-info.py [options] path [field]

Options:
  -h, --help  show this help message and exit
  --json      Output results in JSON form

whisper-merge.py

Join two existing whisper files together.

Usage: whisper-merge.py [options] from_path to_path

Options:
  -h, --help  show this help message and exit

whisper-fill.py

Copies data from src to dst where it is missing. Unlike whisper-merge, it does not overwrite data that is already present in the target file; it only adds the missing data (i.e. where the gaps in the target file are). Because no values are overwritten, no data or precision is lost. Also unlike whisper-merge, it tries to take the data from the highest-precision archive instead of the one with the largest retention.

Usage: whisper-fill.py [options] src_path dst_path

Options:
  -h, --help  show this help message and exit
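
The "only add the missing data" rule can be illustrated on two aligned lists of values. This is a sketch of the rule as described above, assuming gaps are represented as `None` and both lists cover the same timestamps; `fill_gaps` is a hypothetical name, and the real script works on whisper archives rather than plain lists.

```python
def fill_gaps(src_values, dst_values):
    """Return dst_values with each None replaced by the source value at that slot."""
    if len(src_values) != len(dst_values):
        raise ValueError("both series must cover the same timestamps")
    # Values already present in the destination are never overwritten.
    return [
        src if dst is None else dst
        for src, dst in zip(src_values, dst_values)
    ]


src = [1.0, 2.0, 3.0, None]
dst = [9.0, None, None, None]
print(fill_gaps(src, dst))
```

Here the first destination value stays untouched, the two middle gaps are filled from the source, and the last slot remains empty because neither file has data for it.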

whisper-resize.py

Change the retention rates of an existing whisper file.

Usage: whisper-resize.py path timePerPoint:timeToStore [timePerPoint:timeToStore]*

timePerPoint and timeToStore specify lengths of time, for example:

60:1440      60 seconds per datapoint, 1440 datapoints = 1 day of retention
15m:8        15 minutes per datapoint, 8 datapoints = 2 hours of retention
1h:7d        1 hour per datapoint, 7 days of retention
12h:2y       12 hours per datapoint, 2 years of retention


Options:
  -h, --help            show this help message and exit
  --xFilesFactor=XFILESFACTOR
                        Change the xFilesFactor
  --aggregationMethod=AGGREGATIONMETHOD
                        Change the aggregation function (average, sum, last,
                        max, min, avg_zero, absmax, absmin)
  --force               Perform a destructive change
  --newfile=NEWFILE     Create a new database file without removing the
                        existing one
  --nobackup            Delete the .bak file after successful execution
  --aggregate           Try to aggregate the values to fit the new archive
                        better. Note that this will make things slower and use
                        more memory.

whisper-set-aggregation-method.py

Change the aggregation method of an existing whisper file.

Usage: whisper-set-aggregation-method.py path <average|sum|last|max|min|avg_zero|absmax|absmin>

Options:
  -h, --help  show this help message and exit
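
To make the method names concrete, here is a minimal sketch of what each one computes when several higher-resolution points are folded into one lower-resolution point. The semantics are assumptions based on the method names and on the absmax description in the comments below (the value furthest from zero, with its sign preserved); `aggregate` here is an illustrative function, not necessarily identical to the library's own.

```python
def aggregate(method, values):
    """Fold a list of values (None = missing) into a single value."""
    known = [v for v in values if v is not None]
    if not known:
        return None
    if method == "average":
        return sum(known) / len(known)
    if method == "sum":
        return sum(known)
    if method == "last":
        return known[-1]
    if method == "max":
        return max(known)
    if method == "min":
        return min(known)
    if method == "avg_zero":
        # Assumed meaning: missing points count as zero in the average.
        return sum(known) / len(values)
    if method == "absmax":
        return max(known, key=abs)
    if method == "absmin":
        return min(known, key=abs)
    raise ValueError("unknown aggregation method: %s" % method)


points = [1.0, 2.0, None, 3.0]
print(aggregate("average", points), aggregate("avg_zero", points))
print(aggregate("absmax", [-5.0, 3.0]), aggregate("absmin", [-5.0, 3.0]))
```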

whisper-update.py

Update a whisper file with one or more values. Each value must be given together with its timestamp.

Usage: whisper-update.py [options] path timestamp:value [timestamp:value]*

Options:
  -h, --help  show this help message and exit

whisper-diff.py

Check the differences between whisper files. Use it as a sanity check before merging.

Usage: whisper-diff.py [options] path_a path_b

Options:
  -h, --help      show this help message and exit
  --summary       show summary of differences
  --ignore-empty  skip comparison if either value is undefined
  --columns       print output in simple columns
  --no-headers    do not print column headers
  --until=UNTIL   Unix epoch time of the end of your requested interval
                  (default: now)
  --json          Output results in JSON form

License

Whisper is licensed under version 2.0 of the Apache License. See the LICENSE file for details.

Comments
  • Performance/efficiency optimisation

    Hi, I'm working on optimising whisper storage efficiency and speed. If you'll be interested in any of the following, we can discuss the implementation details so that they are suitable for you to merge later:

    1. Enable different data formats to reduce storage overhead. It's done, and it was easy. We needed that to store, e.g., byte values; using double for that is a great waste of space. The code can be made compatible with the old format: new code could read the old format for seamless migration, but not the other way around (obviously).
       1.1. Allow passing arguments to the storage backend from storage-schema.conf. Due to the way the Archive class and loadStorageSchemas() are implemented, this will break things, unless you have a better idea.
    2. Remove timestamps from data files. Since I'm disk/IO bound and my data is not sparse, there are more efficient (space-wise at least) methods to keep timestamps.
    3. Introduce reduced-precision storage. For example, map (linearly at first, and with arbitrary functions later) values from a bigger range (like 0-65000) into a byte to reduce storage size. There are some cases where a reduction in precision is acceptable and the benefits are worth the overhead of extra calculations, and doing it inside the storage backend makes it comfortable to use and control.

    Points 1. and 2. together can in some cases (1 byte values) reduce data size by a factor of 12, and probably in most cases by a factor of 3 - that's quite a lot in my opinion. Also reduction in size could offset any additional overhead that would result from those changes. And if it doesn't, there are other possibilities that can be (and will be) implemented to speed things up:

    • pure python code optimisations
    • native/configurable byte order for data (what's the reason for BE in the first place?)
    • numpy
    • cython
    • pure C

    I'd also welcome some help in benchmarking - if someone has a setup they can test on or are interested in the same gains I am, you're welcome to contact me here.

    enhancement discussion stale 
    opened by mkaluza 29
  • whisper: disable buffering for update operations

    We found out that 200GiB of memory isn't enough for 8M metrics, and past this limit whisper will start reading from disk at every update.

    After this patch, updates() will get a 100% hit rate on the page cache with only 100GiB of memory.

    opened by iksaif 29
  • Handle zero length time ranges by returning the next valid point

    This appears to match current behavior when a user requests a time range between two intervals in the archive. The next valid point is returned even though it's outside the initial time range.

    Previously, as untilInterval was not greater than fromInterval, the code would do a wrap-read of the archive, reading all data points into memory, and would return the correct timeInfo but a very long array of Nones for values. This resulted in very large pickle objects being exchanged between graphite webapps and in very long query times.

    opened by jjneely 23
  • Whisper queries on existing files broken after upgrade from 0.9.10 to 0.9.14

    I have a really odd whisper problem after upgrading to 0.9.14 from 0.9.10. I can no longer see any data for a range that is less than 24 hours and is before the current day. I can see data for the current day, and I can see data for something like "the last week" at one time, but if I try to look at a single day during that last week or whatever time period, no data points are returned.

    For example:

    On my 0.9.14 installation, this whisper-fetch.py query for data from Mon Nov 16 17:39:35 UTC 2015 until now returns what I would expect. All timestamps return data. whisper-fetch.py --pretty --from=1447695575 myfile.wsp.

    However, this query for an hour's worth of data starting at the previous timestamp returns None for all values. whisper-fetch.py --pretty --from=1447695575 --until=1447699175 myfile.wsp.

    I thought I had seriously broken something with the installation, so I copied one of the whisper files back to the old server with 0.9.10 on it and ran the same whisper-fetch.py test queries on the exact same whisper files from the new server, and data shows up as I would expect.

    My first thought was that somehow, somewhere I had screwed up retention, but the retention on these files hasn't changed in several years and it was working and continues to work 100% correctly on the old graphite server, even with whisper files copied back from the 0.9.14 server.

    This is the retention information from the specific file that I've been testing with:

    # From whisper-info.py:
    maxRetention: 315360000
    xFilesFactor: 0.5
    aggregationMethod: average
    fileSize: 5247424
    
    Archive 0
    retention: 86400
    secondsPerPoint: 10
    points: 8640
    size: 103680
    offset: 64
    
    Archive 1
    retention: 2592000
    secondsPerPoint: 60
    points: 43200
    size: 518400
    offset: 103744
    
    Archive 2
    retention: 63072000
    secondsPerPoint: 300
    points: 210240
    size: 2522880
    offset: 622144
    
    Archive 3
    retention: 315360000
    secondsPerPoint: 1800
    points: 175200
    size: 2102400
    offset: 3145024
    
    opened by dbeckham 15
  • py3 compat: enableDebug and auto-resize, update tox file and pep8

    This makes enableDebug() compatible with Python 3 and adds test cases. Added an option to disable debugging again. Configured the tox file to run tests on all active Python implementations. Made flake8 pass and updated the auto-resize raw_input call to support Python 3.

    opened by piotr1212 14
  • allow for optional comparison of source and destination directories

    All original functionality is untouched, but if you need to backfill a large directory of files, firing off a python instance for every single one of them wastes a ton of resources.

    question stale 
    opened by fuzzy 14
  • Select archive

    Introduces one new function to override the default fetch algorithm. In this new algorithm the user can pass the file that they want to get data from instead of getting the one with the highest precision.

    needs backport to 1.0.x 
    opened by adriangalera 13
  • create: unlink the file we've created on ENOSPC  (master)

    Instead of leaving a 0 byte or corrupted whisper file on the filesystem, this allows carbon to retry the creation later when we might have some free space.

    Same fix as in #105.

    This is basically wrapping the body of create() in a try/except IOError, and unlinking the file if we caught ENOSPC. The file is explicitly closed in the try/except to catch IOError on close(), which is an edge case that can happen (see the test cases in #105).

    opened by jraby 13
  • Problem querying old data with multiple retentions

    There was a question asked at https://answers.launchpad.net/graphite/+question/285178, the text of which I have copied below. I have discovered that this is resolved by reverting the change made by @obfuscurity in https://github.com/graphite-project/whisper/commit/ccd0c89204f2266fa2fc20bad7e49739568086fa

    i.e. change

      diff = untilTime - fromTime
      for archive in header['archives']:
        if archive['retention'] >= diff:
          break
    

    back to

      diff = now - fromTime
      for archive in header['archives']:
        if archive['retention'] >= diff:
          break
    

    Changing this back satisfies the requirement in http://graphite.readthedocs.io/en/latest/whisper.html namely

    When data is retrieved (scoped by a time range), the first archive which can satisfy the entire time period is used. If the time period overlaps an archive boundary, the lower-resolution archive will be used. This allows for a simpler behavior while retrieving data as the data’s resolution is consistent through an entire returned series.

    Here is a copy of the original question:

    We've been running a grafana/graphite-api/carbon/whisper stack for a while now and it's working generally ok. However, I've noticed that if I drill into data in grafana, once I get to a certain level of detail, the chart is blank.

    Here is some config. Our storage schema looks like this, store on a 10 sec interval for 7 days, then 1 minute for 2 years.

    [Web_Prod]
    priority = 90
    pattern = ^Production..web..WebServer.*
    retentions = 10s:7d,1m:2y

    I can verify this in the whisper files themselves, like this: -

    /usr/local/src/whisper/bin/whisper-dump.py /opt/graphite/storage/whisper/Production/Live/web/web2-vm/WebServer/Customer/HPS.wsp | less

    Meta data:
      aggregation method: average
      max retention: 63072000
      xFilesFactor: 0

    Archive 0 info:
      offset: 40
      seconds per point: 10
      points: 60480
      retention: 604800
      size: 725760

    Archive 1 info:
      offset: 725800
      seconds per point: 60
      points: 1051200
      retention: 63072000
      size: 12614400

    I've noticed the problem only happens when querying data older than 7 days, i.e. after it's been averaged to a 60-second interval. If I pick a time period older than 7 days, across a three-minute interval, and look directly inside the whisper file, it all looks good:

    /usr/local/src/whisper/bin/whisper-fetch.py --from 1454230700 --until 1454230880 /opt/graphite/storage/whisper/Production/Live/web/web2-vm/WebServer/Customer/HPS.wsp

    1454230740 8.000000
    1454230800 8.700000
    1454230860 8.233333

    However, if I query through graphite-api, it returns a 10 second interval (the wrong retention period, because I'm querying older than 7 days), and all items (even the ones that match the timestamps above) are null.

    http://www.dashboard.com/render?target=Production.Live.web.web2-vm.WebServer.Customer.HPS&from=1454230700&until=1454230880&format=json&maxDataPoints=1000

    [{"target": "Production.Live.web.571854-web2-vm.WebServer.Customer.HPS", "datapoints": [[null, 1454230710], [null, 1454230720], [null, 1454230730], [null, 1454230740], [null, 1454230750], [null, 1454230760], [null, 1454230770], [null, 1454230780], [null, 1454230790], [null, 1454230800], [null, 1454230810], [null, 1454230820], [null, 1454230830], [null, 1454230840], [null, 1454230850], [null, 1454230860], [null, 1454230870], [null, 1454230880]]}]

    If I go for a wider time span, I start to get data back, but some are null and some are populated.

    question 
    opened by fmacgregor 12
  • Adds new absmax aggregation method

    absmax is a new aggregation method which returns the largest absolute value when aggregating to a lower precision archive.

    Useful for data such as time offsets, where you care about retaining the value furthest from zero when aggregating but would prefer to preserve whether the offset was positive or negative.

    enhancement 
    opened by yadsirhc 12
  • added --dropnulls and --dropzeroes options to whisper-fetch.py

    These options help to cut down the size of the whisper metrics export file for my particular data series, which contains a lot of useless nulls and zeroes.

    opened by coderfi 12
  • Fix whisper-fetch.py --drop timestamps

    Possible fix for #305.

    I didn't test this, not sure if this is correct.

    Specifically I don't know how timestamps and offsets in Whisper files work. Is a simple t + step appropriate, or is taking into account an offset and wrap-around needed?

    pinned 
    opened by cdeil 3
  • `__archive_fetch` from/until intervals rounding

    At the beginning of the function __archive_fetch, the following code rounds the from and until times to the granularity of the archive:

      step = archive['secondsPerPoint']
      fromInterval = int(fromTime - (fromTime % step)) + step
      untilInterval = int(untilTime - (untilTime % step)) + step
    

    I do not understand why after this rounding we add step to the result. This will give wrong results.

    Take for instance:

    • fromTime = 1501639200 (02/08/2017 2:00 AM UTC)
    • untilTime = 1501660800 (02/08/2017 8:00 AM UTC)

    with a step of 1 hour (that is, step = 3600). In this case both fromTime % step and untilTime % step give 0, but since step is then added, we return a result for the range 02/08/2017 3:00 AM UTC -- 02/08/2017 9:00 AM UTC.

    question pinned 
    opened by albertored 9
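
The arithmetic in the last comment above can be reproduced directly. The sketch below only re-runs the rounding expression quoted there, with a step of one hour (3600 seconds) and the two timestamps from that comment; `round_interval` and `utc` are hypothetical helper names, and nothing here says whether the behavior is intended.

```python
from datetime import datetime, timezone


def round_interval(timestamp, step):
    """Round down to a multiple of step, then add one step, as in the quoted snippet."""
    return int(timestamp - (timestamp % step)) + step


def utc(timestamp):
    """Format a Unix timestamp as a UTC date and time."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime("%Y-%m-%d %H:%M")


step = 3600
from_time = 1501639200   # 2017-08-02 02:00 UTC
until_time = 1501660800  # 2017-08-02 08:00 UTC

from_interval = round_interval(from_time, step)
until_interval = round_interval(until_time, step)

print(utc(from_time), "->", utc(from_interval))
print(utc(until_time), "->", utc(until_interval))
```

Both inputs are already exact multiples of the step, so the modulo term is zero and each boundary moves forward by exactly one hour.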
Releases(1.1.10)
  • 1.1.10(May 22, 2022)

    Graphite release 1.1.10 Please see Release Notes https://graphite.readthedocs.io/en/latest/releases/1_1_10.html or changelog https://github.com/graphite-project/graphite-web/blob/master/CHANGELOG.md

    Source code(tar.gz)
    Source code(zip)
  • 1.1.8(Apr 18, 2021)

    Graphite release 1.1.8 Please see Release Notes https://graphite.readthedocs.io/en/latest/releases/1_1_8.html or changelog https://github.com/graphite-project/graphite-web/blob/master/CHANGELOG.md

    Source code(tar.gz)
    Source code(zip)
  • 1.1.7(Mar 16, 2020)

    Graphite release 1.1.7 Please see Release Notes https://graphite.readthedocs.io/en/latest/releases/1_1_7.html or changelog https://github.com/graphite-project/graphite-web/blob/master/CHANGELOG.md

    Source code(tar.gz)
    Source code(zip)
  • 1.1.6(Oct 24, 2019)

  • 1.1.5(Dec 23, 2018)

  • 1.1.4(Sep 3, 2018)

  • 1.1.3(Apr 4, 2018)

  • 1.1.2(Feb 13, 2018)

  • 1.1.1(Dec 19, 2017)

  • 1.1.0-rc(Dec 8, 2017)

    The final release candidate for next major release.

    Detailed release notes and changelog will follow.

    Please check and report any issues.

    Source code(tar.gz)
    Source code(zip)
  • 1.1.0-pre5(Dec 4, 2017)

  • 1.1.0-pre4(Nov 30, 2017)

  • 1.1.0-pre3(Nov 29, 2017)

  • 1.1.0-pre1(Nov 27, 2017)

  • 1.0.2(Jul 11, 2017)

  • 1.0.1(Apr 23, 2017)

  • 1.0.0(Apr 11, 2017)

  • 0.9.16(Apr 11, 2017)
