☀️ Measuring the accuracy of BBC weather forecasts in Honolulu, USA

Overview

Accuracy of BBC Weather forecasts for Honolulu

This repository records the forecasts made by BBC Weather for the city of Honolulu, USA. Essentially, there's a GitHub Action that runs at each 30 minute mark and saves the latest forecasts. The data is stored in a separate branch called data. Therefore, the data is versioned. This allows going back into the past to see the forecasts that were made for any given hour in the (relative) future.

I made this after watching Git scraping, the five minute lightning talk by Simon Willison. It blew my mind! I agree with Simon that collecting and versioning API data via git is a powerful pattern. You could use this pattern to keep a ledger of any dynamic forecasting system, such as the predicted outcomes of football games. In these dynamical systems, the forecasts are updated when new information becomes available. Therefore, the forecasted values depend on the point in time when they were made. I think that it's super interesting to analyse how these forecasts evolve through time.

The build_database.py script iterates through all the commits in the data branch and consolidates the data into an SQLite database. You can run the script yourself by simply cloning this repository. Then, go into a terminal, navigate to the cloned repository, and install the necessary Python dependencies:

python -m venv .env
source .env/bin/activate
pip install -r requirements.txt

Then, run the consolidation script:

python build_database.py

This will create a bbc_weather.sqlite file. You can load the latter into your preferred database access tool — I have a personal preference for DataGrip — to analyse the data. At present, the database contains two tables:

forecasts

These are the predicted weather values made at one point in time for a future point in time.

issued_at at celsius feels_like_celsius wind_speed_kph
2021-03-10 09:00:00 2021-03-10 11:00:00 24 30 16
2021-03-10 09:00:00 2021-03-10 12:00:00 25 31 17
2021-03-10 09:00:00 2021-03-10 13:00:00 26 32 17
2021-03-10 09:00:00 2021-03-10 14:00:00 27 33 17
2021-03-10 09:00:00 2021-03-10 15:00:00 26 33 17

observations

These are the weather values that actually occurred — as opposed to those that were forecasted.

at celsius wind_speed_kph
2021-03-09 19:00:00 23 0
2021-03-09 20:00:00 22 8
2021-03-09 21:00:00 22 0
2021-03-09 22:00:00 21 9
2021-03-09 23:00:00 21 0

Check out measure_accuracy.sql for an example of how to evaluate the correctness of the forecasts.

Owner
Max Halford
Data wizard @alan-eu. PhD in machine learning applied to query optimization. Kaggle competitions Master when it was cool. Online machine learning nut.
Max Halford
CATs: Semantic Correspondence with Transformers

CATs: Semantic Correspondence with Transformers For more information, check out the paper on [arXiv]. Training with different backbones and evaluation

74 Dec 10, 2021
[ICCV 2021] Instance-level Image Retrieval using Reranking Transformers

Instance-level Image Retrieval using Reranking Transformers Fuwen Tan, Jiangbo Yuan, Vicente Ordonez, ICCV 2021. Abstract Instance-level image retriev

UVA Computer Vision 86 Dec 28, 2022
Utilize Korean BERT model in sentence-transformers library

ko-sentence-transformers 이 프로젝트는 KoBERT 모델을 sentence-transformers 에서 보다 쉽게 사용하기 위해 만들어졌습니다. Ko-Sentence-BERT-SKTBERT 프로젝트에서는 KoBERT 모델을 sentence-trans

Junghyun 40 Dec 20, 2022
MicBot - MicBot uses Google Translate to speak everyone's chat messages

MicBot MicBot uses Google Translate to speak everyone's chat messages. It can al

2 Mar 09, 2022
A natural language modeling framework based on PyTorch

Overview PyText is a deep-learning based NLP modeling framework built on PyTorch. PyText addresses the often-conflicting requirements of enabling rapi

Meta Research 6.4k Jan 08, 2023
Russian GPT3 models.

Russian GPT-3 models (ruGPT3XL, ruGPT3Large, ruGPT3Medium, ruGPT3Small) trained with 2048 sequence length with sparse and dense attention blocks. We also provide Russian GPT-2 large model (ruGPT2Larg

Sberbank AI 1.6k Jan 05, 2023
Maha is a text processing library specially developed to deal with Arabic text.

An Arabic text processing library intended for use in NLP applications Maha is a text processing library specially developed to deal with Arabic text.

Mohammad Al-Fetyani 184 Nov 27, 2022
[WWW 2021 GLB] New Benchmarks for Learning on Non-Homophilous Graphs

New Benchmarks for Learning on Non-Homophilous Graphs Here are the codes and datasets accompanying the paper: New Benchmarks for Learning on Non-Homop

94 Dec 21, 2022
Translation for Trilium Notes. Trilium Notes 中文版.

Trilium Translation 中文说明 This repo provides a translation for the awesome Trilium Notes. Currently, I have translated Trilium Notes into Chinese. Test

743 Jan 08, 2023
Line as a Visual Sentence: Context-aware Line Descriptor for Visual Localization

Line as a Visual Sentence with LineTR This repository contains the inference code, pretrained model, and demo scripts of the following paper. It suppo

SungHo Yoon 158 Dec 27, 2022
自然言語で書かれた時間情報表現を抽出/規格化するルールベースの解析器

ja-timex 自然言語で書かれた時間情報表現を抽出/規格化するルールベースの解析器 概要 ja-timex は、現代日本語で書かれた自然文に含まれる時間情報表現を抽出しTIMEX3と呼ばれるアノテーション仕様に変換することで、プログラムが利用できるような形に規格化するルールベースの解析器です。

Yuki Okuda 116 Nov 09, 2022
NeMo: a toolkit for conversational AI

NVIDIA NeMo Introduction NeMo is a toolkit for creating Conversational AI applications. NeMo product page. Introductory video. The toolkit comes with

NVIDIA Corporation 5.3k Jan 04, 2023
Count the frequency of letters or words in a text file and show a graph.

Word Counter By EBUS Coding Club Count the frequency of letters or words in a text file and show a graph. Requirements Python 3.9 or higher matplotlib

EBUS Coding Club 0 Apr 09, 2022
IEEEXtreme15.0 Questions And Answers

IEEEXtreme15.0 Questions And Answers IEEEXtreme is a global challenge in which teams of IEEE Student members – advised and proctored by an IEEE member

Dilan Perera 15 Oct 24, 2022
Contains the code and data for our #ICSE2022 paper titled as "CodeFill: Multi-token Code Completion by Jointly Learning from Structure and Naming Sequences"

CodeFill This repository contains the code for our paper titled as "CodeFill: Multi-token Code Completion by Jointly Learning from Structure and Namin

Software Analytics Lab 11 Oct 31, 2022
Transformer-based Text Auto-encoder (T-TA) using TensorFlow 2.

T-TA (Transformer-based Text Auto-encoder) This repository contains codes for Transformer-based Text Auto-encoder (T-TA, paper: Fast and Accurate Deep

Jeong Ukjae 13 Dec 13, 2022
Ongoing research training transformer language models at scale, including: BERT & GPT-2

Megatron (1 and 2) is a large, powerful transformer developed by the Applied Deep Learning Research team at NVIDIA.

NVIDIA Corporation 3.5k Dec 30, 2022
Code for Text Prior Guided Scene Text Image Super-Resolution

Code for Text Prior Guided Scene Text Image Super-Resolution

82 Dec 26, 2022
The Internet Archive Research Assistant - Daily search Internet Archive for new items matching your keywords

The Internet Archive Research Assistant - Daily search Internet Archive for new items matching your keywords

Kay Savetz 60 Dec 25, 2022
Understanding the Difficulty of Training Transformers

Admin Understanding the Difficulty of Training Transformers Guided by our analyses, we propose Adaptive Model Initialization (Admin), which successful

Liyuan Liu 300 Dec 29, 2022