The Internet Archive Research Assistant - Daily search Internet Archive for new items matching your keywords

Last update: Dec 25, 2022

Overview

tiara - The Internet Archive Research Assistant

by Kay Savetz, May 2021.

Searches Internet Archive using its full text search for new items matching the keywords you specify. Run this script once a day via crontab for daily updates about new items relevant to your ongoing research subjects. It keeps track of the items it has already found, so it only alerts you to new-to-you items. The script writes its findings to an HTML file, and optionally emails that file to you via SendGrid or your system mail (e.g. Sendmail or Postfix).
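The bookkeeping described above (remember what has already been reported, surface only new items, write an HTML report) can be sketched in Python as follows. This is a minimal illustration and not the script's actual code: the file name seen_items.txt, the function names, and the shape of the result dictionaries are all assumptions, and the search step is replaced by a plain list of results so the sketch runs without network access.

```python
from html import escape
from pathlib import Path


def load_seen(path):
    """Return the set of item identifiers recorded on earlier runs."""
    p = Path(path)
    if not p.exists():
        return set()
    lines = p.read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def select_new(results, seen):
    """Return results not yet in `seen`, in order; adds them to `seen`."""
    new = []
    for item in results:
        ident = item["identifier"]
        if ident not in seen:
            seen.add(ident)
            new.append(item)
    return new


def save_seen(path, seen):
    """Persist the identifiers so the next run can skip them."""
    Path(path).write_text("\n".join(sorted(seen)) + "\n", encoding="utf-8")


def render_html(term, items):
    """Build a small HTML fragment listing the new items for one search term."""
    rows = "\n".join(
        '<li><a href="https://archive.org/details/{0}">{1}</a></li>'.format(
            escape(i["identifier"], quote=True),
            escape(i.get("title", i["identifier"])),
        )
        for i in items
    )
    return "<h2>{}</h2>\n<ul>\n{}\n</ul>\n".format(escape(term), rows)
```

A daily run would call load_seen, pass each term's search results through select_new, write the rendered HTML, and finish with save_seen. Storing one identifier per line keeps the state file easy to inspect and edit by hand.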

Put your keywords in searchlist.txt, one search term per line. Very general terms (like "dogs") provide too many daily hits to be useful. More specific phrases work better.
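As a rough sketch of how a one-term-per-line file like searchlist.txt might be read in Python: the function name is hypothetical, and skipping blank lines and repeated terms is an assumption made here, not documented behavior of the script.

```python
from pathlib import Path


def read_search_terms(path="searchlist.txt"):
    """Return the search terms in file order, one per non-blank line."""
    terms = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        term = line.strip()
        # Assumption: blank lines and repeated terms are ignored.
        if term and term not in terms:
            terms.append(term)
    return terms
```

Each line is kept whole, so a multi-word phrase such as Pliny The Elder stays a single search term.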

Dependency: Internet Archive command line tool (install with pip install internetarchive). The script also requires read-write access to the directory it lives in.

Issue: Internet Archive cannot generate thumbnails for all items. In these cases, you may see a broken image icon.

Issue: Internet Archive's full text search doesn't seem to allow exact phrase matching. So a search for "Pliny The Elder" may turn up items mentioning Pliny The Younger, or with "Pliny" on one page and "elder" on another.
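One possible way to cope with the phrase-matching limitation is to re-check a hit's text locally before trusting it. The Python sketch below is only an illustration under stated assumptions: it is not part of the script, the function name is hypothetical, and it presumes you have already obtained the item's text by some other means.

```python
import re


def contains_phrase(text, phrase):
    """True if `phrase` occurs in `text` as consecutive words.

    Matching ignores case and treats any run of whitespace (including
    line breaks) between words as a single separator.
    """
    words = [re.escape(w) for w in phrase.split()]
    if not words:
        return False
    # \b anchors assume the phrase starts and ends with a letter or digit.
    pattern = r"\b" + r"\s+".join(words) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None
```

With this check, text mentioning "Pliny the Younger" and, elsewhere, "elder" would be rejected for the phrase Pliny The Elder, while "Pliny the Elder" split across a line break would still match.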

If you find this tool useful, please donate to Internet Archive.

Owner

Kay Savetz

A Japanese tokenizer based on recurrent neural networks

This project was built for the FALLABOUT2021 event under SRMMIC. It deals with NLP poetry generation.

Deduplication is the task of combining different representations of the same real-world entity.

A Python script that compares files in directories

I label phrases on a scale of five values: negative, somewhat negative, neutral, somewhat positive, positive

API for the GPT-J language model 🦜. Including a FastAPI backend and a streamlit frontend

Deep Learning Topics with Computer Vision & NLP

A BERT-based reverse dictionary of Korean proverbs

Searching keywords in PDF file folders

vits chinese, tts chinese, tts mandarin

Code for the paper in Findings of EMNLP 2021: "EfficientBERT: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation".

Gpt2-WebAPI - The objective of this API is to provide the 3 best possible responses to sentences that the user would input via http GET request as a parameter

Code for the paper "BERT Loses Patience: Fast and Robust Inference with Early Exit".

Convolutional 2D Knowledge Graph Embeddings resources

FewCLUE: a few-shot learning evaluation benchmark tailored for Chinese NLP

To be a next-generation DL-based phenotype prediction from genome mutations.

Main repository for the chatbot Bobotinho.

🗣️ NALP is a library that covers Natural Adversarial Language Processing.

LSTM model - IMDB review sentiment analysis

🚀 RocketQA, dense retrieval for information retrieval and question answering, including both Chinese and English state-of-the-art models.