Automatically download the daily arXiv papers and crop the key information from them.

Last update: Jul 30, 2022

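Arxiv daily quick overview

Function: filters each day's newest arXiv papers by keyword, automatically fetches their abstracts, and automatically crops the tables and figures out of each paper.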
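
The keyword filtering step can be pictured with a minimal sketch like the one below. It is not the project's own code: the `Paper` class and the `matches_keywords` and `filter_papers` functions are hypothetical names, and it assumes a paper matches when any keyword appears in its title or abstract, ignoring case.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Paper:
    """Hypothetical container for one arXiv entry (not the project's own type)."""
    title: str
    abstract: str


def matches_keywords(paper: Paper, keywords: Iterable[str]) -> bool:
    """Return True if any keyword occurs in the title or abstract, ignoring case."""
    text = f"{paper.title} {paper.abstract}".lower()
    return any(kw.lower() in text for kw in keywords)


def filter_papers(papers: Iterable[Paper], keywords: Iterable[str]) -> List[Paper]:
    """Keep only the papers that match at least one keyword."""
    keywords = list(keywords)  # allow the keywords to be reused for every paper
    return [p for p in papers if matches_keywords(p, keywords)]
```

With this sketch an empty keyword list matches nothing; whether the project treats an empty list the same way is not stated in the README.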

1 Test environment

Ubuntu 16+
Python3.7
torch 1.9
Colab GPU

2 Usage demo

First download the weights from baiduyun (extraction code: il87) and place them at code/ParseServer/models/PubLayNet/faster_rcnn_R_50_FPN_3x/model_final.pth
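
Before starting the notebooks it can help to confirm that the weights landed in the expected place. The check below is a small sketch, not part of the project: the `weights_present` function and the `WEIGHTS_RELATIVE_PATH` constant are hypothetical names, and only the path itself comes from the instructions above.

```python
from pathlib import Path

# Path taken from the README; the constant name is an assumption for this sketch.
WEIGHTS_RELATIVE_PATH = Path(
    "code/ParseServer/models/PubLayNet/faster_rcnn_R_50_FPN_3x/model_final.pth"
)


def weights_present(project_root: str = ".") -> bool:
    """Return True if model_final.pth exists under the given project root."""
    return (Path(project_root) / WEIGHTS_RELATIVE_PATH).is_file()


if __name__ == "__main__":
    if weights_present():
        print("Weights found.")
    else:
        print(f"Weights missing, expected at: {WEIGHTS_RELATIVE_PATH}")
```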

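2.1 Environment setup

You can run the project locally or on Colab; local use is taken as the example here.

1. Install the GPU version of Pytorch in advance
2. Start jupyter notebook in this project's root directory and run Overview_RUNME_Local.ipynb
3. On the first run, install the environment first

4. Run the document layout analysis service, and confirm it has started normally before running the next step

5. Fill in the keywords to filter by as needed; set needPDF=True if you want the PDF files, and needZip=True if you want the results packed into an archive

6. Once started, downloading and document layout analysis run at the same time and crop out the required content. Downloaded files are saved under the ./arxiv directory; if needZip=True, a ./arxiv.zip file is also produced.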
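
The effect of needZip=True can be reproduced by hand with the standard library. The sketch below is an assumption about the packing step, not the notebook's actual code: `pack_results` and its parameters are hypothetical names, and it simply archives the output directory next to itself using `shutil.make_archive`.

```python
import shutil
from pathlib import Path
from typing import Optional


def pack_results(output_dir: str = "./arxiv", need_zip: bool = True) -> Optional[Path]:
    """Zip output_dir into a sibling <name>.zip; return its path, or None if skipped."""
    out = Path(output_dir)
    if not need_zip or not out.is_dir():
        return None
    # base_name omits the ".zip" suffix; make_archive appends it.
    archive = shutil.make_archive(
        base_name=str(out),
        format="zip",
        root_dir=str(out.parent),  # archive paths are relative to this directory
        base_dir=out.name,         # so entries start with "arxiv/..."
    )
    return Path(archive)
```

Calling `pack_results()` from the project root would write ./arxiv.zip, with every entry stored under an arxiv/ prefix.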

2.2 Colab

Compress the code directory and upload it to the root directory of Google Drive
Run Overview_RUNME_Colab.ipynb in Colab; the remaining steps are the same as in 2.1

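3 Results

After extracting the archive locally, you can view the results with the Typora markdown viewer.

The abs.md file in each folder holds the introduction to that folder's PDF.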
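
If you prefer to skim the results from a script instead of a markdown viewer, the abs.md files can be gathered with a few lines of Python. This is a sketch under an assumed layout (one folder per paper somewhere below ./arxiv); `collect_summaries` is a hypothetical helper, not something the project provides.

```python
from pathlib import Path
from typing import Dict


def collect_summaries(output_dir: str = "./arxiv") -> Dict[str, str]:
    """Map each folder (relative to output_dir) to the text of its abs.md file."""
    root = Path(output_dir)
    summaries: Dict[str, str] = {}
    # rglob also finds abs.md files nested more than one level deep.
    for abs_file in sorted(root.rglob("abs.md")):
        folder = abs_file.parent.relative_to(root).as_posix()
        summaries[folder] = abs_file.read_text(encoding="utf-8")
    return summaries


if __name__ == "__main__":
    for folder, text in collect_summaries().items():
        print(f"== {folder} ==")
        print(text[:200])  # show only the beginning of each summary
```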

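ps: Papers with irregular layouts will produce messy crops, which also indirectly says something about the quality of the paper.

Other

ps: The code is a "pile of hacks" written on the principle that working is good enough. If you hit a bug, open an issue with a clear description; the project is maintained periodically.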

Owner

HeoLis
