Web Downloader With Python

Overview

Web Downloader

Introduction

This module provides an API to download the components of a web page (HTML file, image files, CSS files, JavaScript files and href-linked files) from an input URL. The URL must start with 'http' or 'https'.

To process multiple URLs at the same time, list every URL you want to download in the file "urllist.txt", as shown below:

# Add the URL you want to download line by line(The url must start with 'http' or 'https' ):
# example: https://www.google.com
https://www.google.com
https://www.carousell.sg/
https://www.google.com/search?q=github&sxsrf=AOaemvJh3t5_h8H85AE8Ajbb1IMnBrRISA%3A1636698503535&source=hp&ei=hwmOYY6mHdGkqtsPq8S9sAY&iflsig=ALs-wAMAAAAAYY4Xl7GLWS16_xc2Q9XrG0p3q277DpkL&oq=&gs_lcp=Cgdnd3Mtd2l6EAEYADIHCCMQ6gIQJzIHCCMQ6gIQJzIHCCMQ6gIQJzIHCCMQ6gIQJzIHCCMQ6gIQJzIHCCMQ6gIQJzINCC4QxwEQowIQ6gIQJzIHCCMQ6gIQJzIHCCMQ6gIQJzIHCCMQ6gIQJ1AAWABgjgdoAXAAeACAAQCIAQCSAQCYAQCwAQo&sclient=gws-wiz
https://stackoverflow.com/questions/66022042/how-to-let-kubernetes-pod-run-a-local-script/66025424
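
A file in this format could be read roughly as follows. This is a minimal sketch, not the code in webDownload.py: the helper name `load_url_list` and the rule of skipping blank lines and lines starting with '#' are assumptions made for illustration.

```python
from pathlib import Path


def load_url_list(path="urllist.txt"):
    """Return the usable URLs in a list file (hypothetical helper)."""
    urls = []
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Assumption: blank lines and lines starting with '#' are ignored.
        if not line or line.startswith("#"):
            continue
        # The README requires every URL to start with 'http' or 'https'.
        if line.startswith(("http://", "https://")):
            urls.append(line)
    return urls
```

Keeping the order of the file matters here, because the position of a URL in the list is used later in the name of its output folder.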

Program Setup

Development Environment : python 3.7.4
Additional Lib/Software Needed
  1. beautifulsoup4 4.10.0

    install:

    pip install beautifulsoup4
    

    Lib link: https://pypi.org/project/beautifulsoup4/

Hardware Needed : None
Program File List

version: v0.1

Program File      Execution Env    Description
webDownload.py    python 3         Main executable program that uses the API.
urllist.txt                        URL record list.

Program Usage

Module API Usage
  1. Downloader init:

    soup = urlDownloader(imgFlg=True, linkFlg=True, scriptFlg=True)

  • imgFlg: Set to "True" to download all the "<img>" tag files.
  • linkFlg: Set to "True" to download all the HTML sections, images, icons and CSS files imported by "<link>" tags.
  • scriptFlg: Set to "True" to download all the JS files.
  2. Call the API method savePage to scrape the URL and save the data in a folder:

    soup.savePage('<url>', '<folder name>')

    # Example:
    soup.savePage('https://www.google.com', 'www_google_com')
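
To give an idea of what the three flags select, the sketch below collects the resource URLs of a page by tag. It is not the module's implementation: the class name `ResourceCollector` is hypothetical, it assumes the flags map to the img, link and script tags, and it uses the standard library's html.parser instead of beautifulsoup4 so that it runs without extra packages.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceCollector(HTMLParser):
    """Collect resource URLs by tag (hypothetical stand-in for the parsing step)."""

    def __init__(self, base_url, img=True, link=True, script=True):
        super().__init__()
        self.base_url = base_url
        # Map each enabled tag to the attribute that holds its URL.
        self.wanted = {}
        if img:
            self.wanted["img"] = "src"
        if link:
            self.wanted["link"] = "href"
        if script:
            self.wanted["script"] = "src"
        self.found = {tag: [] for tag in self.wanted}

    def handle_starttag(self, tag, attrs):
        attr = self.wanted.get(tag)
        if attr is None:
            return
        value = dict(attrs).get(attr)
        if value:
            # Resolve relative paths against the page URL.
            self.found[tag].append(urljoin(self.base_url, value))


# Usage: script collection is switched off, so only img and link are reported.
collector = ResourceCollector("https://www.example.com/", script=False)
collector.feed('<img src="/a.png"><link href="s.css"><script src="x.js"></script>')
print(collector.found)
```

Each collected URL would then be downloaded and written into the matching sub-folder of the output folder.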
    
          
         
Program Execution
  1. Copy the URLs you want to check into the URL record file "urllist.txt".

  2. Cd to the program folder and run the program execution cmd:

    python webDownload.py
    
  3. Check the result:

    For example, if you copy the URL "https://www.carousell.sg/" into "urllist.txt" as the first URL to check, all the HTML files, image files and JS files will be under the folder "1_www.carousell.sg_files":

    • The main web page will be saved as: "1_www.carousell.sg_files/1_www.carousell.sg.html"
    • The images used in the page will be saved in the folder: "1_www.carousell.sg_files/img"
    • The HTML/image/CSS files imported by href will be saved in the folder: "1_www.carousell.sg_files/link"
    • The JS files used by the page will be saved in the folder: "1_www.carousell.sg_files/script"
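
The folder layout above can be sketched as a small function. The naming rule used here (list position, then the host name of the URL, then "_files") is inferred from the carousell example only. How the program names folders for URLs with long paths or query strings is not stated, so `output_paths` is a hypothetical helper rather than the program's own logic.

```python
from urllib.parse import urlparse


def output_paths(index, url):
    """Build the folder layout described above (naming rule is an assumption)."""
    host = urlparse(url).netloc
    base = f"{index}_{host}"
    folder = f"{base}_files"
    return {
        "page": f"{folder}/{base}.html",   # main web page
        "img": f"{folder}/img",            # images used in the page
        "link": f"{folder}/link",          # files imported by href
        "script": f"{folder}/script",      # JS files
    }


paths = output_paths(1, "https://www.carousell.sg/")
print(paths["page"])
```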

Problem and Solution

Problem[0]: Downloaded files are slightly different

Why is there a slight difference between the files downloaded by the program and the files saved with a web browser's "page save as" for the same URL, such as www.google.com?

OS Platform : n.a

Error Message: n.a

Type: n.a

Solution:

This is normal: a web scraper and a browser do not send the same request. If different people open www.google.com in their own browsers, the pages they see also differ, because the browser cache, the tokens in local storage and the cookies all influence the "GET" request. That is why each person sees their own Gmail icon in the top right corner. If you clear the cache, local-storage tokens and cookies of your browser and then use "page save as", the saved file should be the same as the one downloaded by the program.
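
The difference can be seen in the request itself. The sketch below builds two requests for the same URL with the standard library and sends nothing over the network. Both header values are placeholders chosen for illustration, and the program may build its requests differently.

```python
from urllib.request import Request

url = "https://www.google.com"

# A bare request with no stored state, similar to what a script sends.
plain = Request(url)

# A request carrying browser-like state. Both values are placeholders.
browser_like = Request(url, headers={
    "User-Agent": "Mozilla/5.0 (placeholder)",
    "Cookie": "session=placeholder",
})

print(plain.header_items())
print(browser_like.header_items())
```

A server that personalizes its pages can answer these two requests with different HTML, which is the difference seen between the program's output and "page save as".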

Problem[2]: Some downloaded images are empty

OS Platform : n.a

Error Message: n.a

Type: n.a

Solution:

If a website uses third-party storage for its images and that storage requires authorization before download, the program's download request is rejected and returns 'null'. The saved image is then empty.
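
One possible guard against such empty files is to check the downloaded data before writing it. This is a hedged sketch rather than part of the program: `save_if_nonempty` is a hypothetical helper, and it assumes the download step hands back either bytes or None.

```python
import os


def save_if_nonempty(data, path):
    """Write data to path only when there is something to write.

    Returns True if the file was written, False if the download was empty.
    """
    if not data:  # covers both None and b""
        return False
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    return True
```

With a check like this, a rejected download leaves no file behind, which makes it easier to tell which images need another attempt.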


Last edit by LiuYuancheng at 13/11/2021
