Downloader Middleware to support Playwright in Scrapy & Gerapy

Overview

Gerapy Playwright

This package adds Playwright support to Scrapy. It is also a module of Gerapy.

Installation

pip3 install gerapy-playwright

Usage

You can use PlaywrightRequest to mark a request that should be rendered with Playwright.

For example:

yield PlaywrightRequest(detail_url, callback=self.parse_detail)

You also need to enable PlaywrightMiddleware in DOWNLOADER_MIDDLEWARES:

DOWNLOADER_MIDDLEWARES = {
    'gerapy_playwright.downloadermiddlewares.PlaywrightMiddleware': 543,
}

Congratulations, you've finished all of the required configuration.

If you run the Spider again, Playwright will start and render every page requested with PlaywrightRequest.

Settings

GerapyPlaywright provides some optional settings.

Concurrency

You can use Scrapy's own setting to control the concurrency of Playwright, for example:

CONCURRENT_REQUESTS = 3

Pretend as Real Browser

Some websites detect WebDriver or headless browsers. GerapyPlaywright can disguise Chromium as a regular browser by injecting scripts. This is enabled by default.

If the website does not detect WebDriver, you can turn this off to speed up rendering:

GERAPY_PLAYWRIGHT_PRETEND = False

You can also use the pretend attribute of PlaywrightRequest to override this setting.

Logging Level

By default, Playwright logs all debug messages, so GerapyPlaywright sets the logging level of Playwright to WARNING.

If you want to see more logs from Playwright, you can change this setting:

import logging
GERAPY_PLAYWRIGHT_LOGGING_LEVEL = logging.DEBUG

Download Timeout

Playwright may take some time to render a web page. You can change the timeout with this setting; the default is 30s:

# playwright timeout
GERAPY_PLAYWRIGHT_DOWNLOAD_TIMEOUT = 30

Headless

By default, Playwright runs in headless mode. Set this to False to show the browser window; the default is True:

GERAPY_PLAYWRIGHT_HEADLESS = False

Window Size

You can also set the width and height of Playwright window:

GERAPY_PLAYWRIGHT_WINDOW_WIDTH = 1400
GERAPY_PLAYWRIGHT_WINDOW_HEIGHT = 700

The defaults are 1400 and 700.

Proxy

You can set a proxy with the settings below:

GERAPY_PLAYWRIGHT_PROXY = 'http://tps254.kdlapi.com:15818'
GERAPY_PLAYWRIGHT_PROXY_CREDENTIAL = {
  'username': 'xxx',
  'password': 'xxxx'
}
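
Playwright's own launch and context options take a proxy as one dict with a server key and optional username and password keys. How GerapyPlaywright maps the two settings above onto that dict internally is not shown in this document, so the following is only a sketch of one plausible mapping; build_playwright_proxy is a hypothetical helper, not part of the package.

```python
def build_playwright_proxy(proxy, credential=None):
    """Merge a proxy URL and optional credentials into one Playwright-style dict.

    Hypothetical helper for illustration only.
    """
    if not proxy:
        return None  # no proxy configured
    options = {'server': proxy}
    if credential:
        options['username'] = credential['username']
        options['password'] = credential['password']
    return options


GERAPY_PLAYWRIGHT_PROXY = 'http://tps254.kdlapi.com:15818'
GERAPY_PLAYWRIGHT_PROXY_CREDENTIAL = {'username': 'xxx', 'password': 'xxxx'}

proxy_options = build_playwright_proxy(
    GERAPY_PLAYWRIGHT_PROXY, GERAPY_PLAYWRIGHT_PROXY_CREDENTIAL)
```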

Screenshot

You can take a screenshot of the loaded page by passing screenshot args to PlaywrightRequest as a dict.

The supported args are:

  • type (str): Specify screenshot type, can be either jpeg or png. Defaults to png.
  • quality (int): The quality of the image, between 0-100. Not applicable to png image.
  • full_page (bool): When true, take a screenshot of the full scrollable page. Defaults to False.
  • clip (dict): An object which specifies clipping region of the page. This option should have the following fields:
    • x (int): x-coordinate of top-left corner of clip area.
    • y (int): y-coordinate of top-left corner of clip area.
    • width (int): width of clipping area.
    • height (int): height of clipping area.
  • omit_background (bool): Hide default white background and allow capturing screenshot with transparency.
  • timeout (float): Maximum time in milliseconds. Defaults to 30 seconds. Pass 0 to disable the timeout.

For more details, see https://playwright.dev/python/docs/api/class-page#page-screenshot
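
The constraints in the list above can be checked before a request is sent. The following is a minimal sketch that only encodes the rules stated there (type is jpeg or png, quality is 0-100 and does not apply to png, clip needs four fields); check_screenshot_args is a hypothetical helper and is not shipped with GerapyPlaywright.

```python
def check_screenshot_args(args):
    """Validate a screenshot dict against the rules listed above.

    Hypothetical helper for illustration. Returns the dict unchanged
    when it is valid and raises ValueError otherwise.
    """
    image_type = args.get('type', 'png')  # png is the documented default
    if image_type not in ('jpeg', 'png'):
        raise ValueError(f"type must be 'jpeg' or 'png', got {image_type!r}")

    quality = args.get('quality')
    if quality is not None:
        if image_type == 'png':
            raise ValueError('quality is not applicable to png images')
        if not 0 <= quality <= 100:
            raise ValueError('quality must be between 0 and 100')

    clip = args.get('clip')
    if clip is not None:
        missing = {'x', 'y', 'width', 'height'} - set(clip)
        if missing:
            raise ValueError(f'clip is missing fields: {sorted(missing)}')
    return args


# A jpeg screenshot of a 400x300 region at reduced quality.
screenshot_args = check_screenshot_args({
    'type': 'jpeg',
    'quality': 80,
    'clip': {'x': 0, 'y': 0, 'width': 400, 'height': 300},
})
```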

For example:

yield PlaywrightRequest(
    start_url,
    callback=self.parse_index,
    wait_for='.item .name',
    screenshot={
        'type': 'png',
        'full_page': True
    })

You can then get the screenshot result from response.meta['screenshot'].

The simplest use is to save it to a file:

def parse_index(self, response):
    with open('screenshot.png', 'wb') as f:
        f.write(response.meta['screenshot'].getbuffer())
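
When several pages are crawled, writing every screenshot to screenshot.png overwrites the previous one. The sketch below gives each page its own file. It assumes, as the snippet above implies, that response.meta['screenshot'] is a BytesIO-like object; the hash-based naming scheme and both helper functions are hypothetical, and a plain dict stands in for response.meta so the example runs without Scrapy.

```python
import hashlib
import io
import os
import tempfile


def screenshot_path(url, directory, extension='png'):
    """Build a stable file name from the page URL (hypothetical naming scheme)."""
    digest = hashlib.sha256(url.encode('utf-8')).hexdigest()[:16]
    return os.path.join(directory, f'{digest}.{extension}')


def save_screenshot(url, meta, directory):
    """Write meta['screenshot'] to its own file and return the path.

    Returns None when the response carries no screenshot.
    """
    screenshot = meta.get('screenshot')
    if screenshot is None:
        return None
    path = screenshot_path(url, directory)
    with open(path, 'wb') as f:
        f.write(screenshot.getbuffer())
    return path


# Stand-ins for response.url and response.meta.
fake_meta = {'screenshot': io.BytesIO(b'\x89PNG fake image bytes')}
out_dir = tempfile.mkdtemp()
saved = save_screenshot('https://antispider1.scrape.center/page/1', fake_meta, out_dir)
```

In a spider callback, the call would be save_screenshot(response.url, response.meta, some_directory).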

If you want to enable screenshots for all requests, you can configure this with GERAPY_PLAYWRIGHT_SCREENSHOT.

For example:

GERAPY_PLAYWRIGHT_SCREENSHOT = {
    'type': 'png',
    'full_page': True
}

PlaywrightRequest

PlaywrightRequest provides args which can override the global settings above.

  • url: request url
  • callback: callback
  • wait_until: one of "load", "domcontentloaded", "networkidle" see https://playwright.dev/python/docs/api/class-page#page-wait-for-load-state, default is domcontentloaded
  • wait_for: wait for some element to load, also supports dict
  • script: script to execute
  • actions: an async function that receives the Page object and runs actions on it
  • proxy: proxy to use for this request, like http://x.x.x.x:x
  • proxy_credential: the proxy credential, like {'username': 'xxxx', 'password': 'xxxx'}
  • sleep: time to sleep after loaded, override GERAPY_PLAYWRIGHT_SLEEP
  • timeout: load timeout, override GERAPY_PLAYWRIGHT_DOWNLOAD_TIMEOUT
  • ignore_resource_types: ignored resource types, override GERAPY_PLAYWRIGHT_IGNORE_RESOURCE_TYPES
  • pretend: pretend as normal browser, override GERAPY_PLAYWRIGHT_PRETEND
  • screenshot: screenshot args, see https://playwright.dev/python/docs/api/class-page#page-screenshot, override GERAPY_PLAYWRIGHT_SCREENSHOT

For example, you can configure PlaywrightRequest as:

from gerapy_playwright import PlaywrightRequest

def parse(self, response):
    yield PlaywrightRequest(url,
        callback=self.parse_detail,
        wait_until='domcontentloaded',
        wait_for='title',
        script='() => { return {name: "Germey"} }',
        sleep=2)

Then Playwright will:

  • wait for document to load
  • wait for title to load
  • execute the given script
  • sleep for 2s
  • return the rendered web page content as the response
  • return the script executed result, get from response.meta['script_result']

For a waiting mechanism controlled by JavaScript, you can use await in the script, for example:

js = '''async () => {
    await new Promise(resolve => setTimeout(resolve, 10000));
    return {
        'name': 'Germey'
    }
}
'''
yield PlaywrightRequest(url, callback=self.parse, script=js)

Then you can get the script result from response.meta['script_result'], result is {'name': 'Germey'}.

If you find the JavaScript awkward to write, you can use the actions argument to pass a Python function that operates on the page, for example:

async def execute_actions(page):
    await page.evaluate('() => { document.title = "Hello World"; }')
    return 1
yield PlaywrightRequest(url, callback=self.parse, actions=execute_actions)

Then you can get the actions result from response.meta['actions_result'], result is 1.
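
An actions function can also chain several page operations. The sketch below is an illustration rather than code from the project: the selectors are hypothetical, and FakePage is a small stand-in that lets the function run without a browser. The methods it mimics (fill, click, wait_for_selector, title) are regular methods of Playwright's Page.

```python
import asyncio


async def search_actions(page):
    """Fill a search box, submit it, and return the resulting page title.

    The selectors are hypothetical placeholders.
    """
    await page.fill('input[name="q"]', 'scrapy')
    await page.click('button[type="submit"]')
    await page.wait_for_selector('.result')
    return await page.title()


class FakePage:
    """Tiny stand-in for a Playwright Page that records the calls it receives."""

    def __init__(self):
        self.calls = []

    async def fill(self, selector, value):
        self.calls.append(('fill', selector, value))

    async def click(self, selector):
        self.calls.append(('click', selector))

    async def wait_for_selector(self, selector):
        self.calls.append(('wait_for_selector', selector))

    async def title(self):
        return 'Search results'


fake_page = FakePage()
actions_result = asyncio.run(search_actions(fake_page))
```

In a spider, the function would be passed as actions=search_actions, and its return value would be read from response.meta['actions_result'].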

You can also define proxy and proxy_credential for each request, for example:

yield PlaywrightRequest(
  self.base_url,
  callback=self.parse_index,
  priority=10,
  proxy='http://tps254.kdlapi.com:15818',
  proxy_credential={
      'username': 'xxxx',
      'password': 'xxxx'
})

proxy and proxy_credential will override the settings GERAPY_PLAYWRIGHT_PROXY and GERAPY_PLAYWRIGHT_PROXY_CREDENTIAL.
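
The override rule can be pictured as "use the request value unless it was left unset". The sketch below assumes that an unset request argument is None, which matches the playwright_meta entries in the example log but is otherwise an assumption about the middleware; resolve_option is a hypothetical helper.

```python
def resolve_option(request_value, setting_value):
    """Return the per-request value unless it is None, else the global setting."""
    return setting_value if request_value is None else request_value


settings = {
    'GERAPY_PLAYWRIGHT_PROXY': 'http://tps254.kdlapi.com:15818',
    'GERAPY_PLAYWRIGHT_PRETEND': True,
}
# Per-request arguments; None means "not specified on this request".
request_args = {'proxy': None, 'pretend': False}

effective_proxy = resolve_option(
    request_args['proxy'], settings['GERAPY_PLAYWRIGHT_PROXY'])
effective_pretend = resolve_option(
    request_args['pretend'], settings['GERAPY_PLAYWRIGHT_PRETEND'])
```

Comparing against None rather than testing truthiness matters here: pretend=False on a request is a real value and should still override a global True.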

Example

For more details, please see the example.

You can also run it directly with Docker:

docker run germey/gerapy-playwright-example

Outputs:

2021-12-27 16:54:14 [scrapy.utils.log] INFO: Scrapy 2.2.0 started (bot: example)
2021-12-27 16:54:14 [scrapy.utils.log] INFO: Versions: lxml 4.7.1.0, libxml2 2.9.12, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 21.7.0, Python 3.7.9 (default, Aug 31 2020, 07:22:35) - [Clang 10.0.0 ], pyOpenSSL 21.0.0 (OpenSSL 1.1.1l  24 Aug 2021), cryptography 35.0.0, Platform Darwin-21.1.0-x86_64-i386-64bit
2021-12-27 16:54:14 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.asyncioreactor.AsyncioSelectorReactor
2021-12-27 16:54:14 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'example',
 'CONCURRENT_REQUESTS': 1,
 'NEWSPIDER_MODULE': 'example.spiders',
 'RETRY_HTTP_CODES': [403, 500, 502, 503, 504],
 'SPIDER_MODULES': ['example.spiders']}
2021-12-27 16:54:14 [scrapy.extensions.telnet] INFO: Telnet Password: e931b241390ad06a
2021-12-27 16:54:14 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.logstats.LogStats']
2021-12-27 16:54:14 [gerapy.playwright] INFO: playwright libraries already installed
2021-12-27 16:54:14 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'gerapy_playwright.downloadermiddlewares.PlaywrightMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2021-12-27 16:54:14 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2021-12-27 16:54:14 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2021-12-27 16:54:14 [scrapy.core.engine] INFO: Spider opened
2021-12-27 16:54:14 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2021-12-27 16:54:14 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2021-12-27 16:54:14 [example.spiders.movie] DEBUG: start url https://antispider1.scrape.center/page/1
2021-12-27 16:54:14 [gerapy.playwright] DEBUG: processing request <GET https://antispider1.scrape.center/page/1>
2021-12-27 16:54:14 [gerapy.playwright] DEBUG: playwright_meta {'wait_until': 'domcontentloaded', 'wait_for': '.item', 'script': None, 'actions': None, 'sleep': None, 'proxy': None, 'proxy_credential': None, 'pretend': None, 'timeout': None, 'screenshot': None}
2021-12-27 16:54:14 [gerapy.playwright] DEBUG: set options {'headless': False}
cookies []
2021-12-27 16:54:16 [gerapy.playwright] DEBUG: PRETEND_SCRIPTS is run
2021-12-27 16:54:16 [gerapy.playwright] DEBUG: timeout 10
2021-12-27 16:54:16 [gerapy.playwright] DEBUG: crawling https://antispider1.scrape.center/page/1
2021-12-27 16:54:16 [gerapy.playwright] DEBUG: request https://antispider1.scrape.center/page/1 with options {'url': 'https://antispider1.scrape.center/page/1', 'wait_until': 'domcontentloaded'}
2021-12-27 16:54:18 [gerapy.playwright] DEBUG: waiting for .item
2021-12-27 16:54:18 [gerapy.playwright] DEBUG: sleep for 1s
2021-12-27 16:54:19 [gerapy.playwright] DEBUG: taking screenshot using args {'type': 'png', 'full_page': True}
2021-12-27 16:54:19 [gerapy.playwright] DEBUG: close playwright
2021-12-27 16:54:20 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://antispider1.scrape.center/page/1> (referer: None)
2021-12-27 16:54:20 [example.spiders.movie] DEBUG: start url https://antispider1.scrape.center/page/2
2021-12-27 16:54:20 [gerapy.playwright] DEBUG: processing request <GET https://antispider1.scrape.center/page/2>
2021-12-27 16:54:20 [gerapy.playwright] DEBUG: playwright_meta {'wait_until': 'domcontentloaded', 'wait_for': '.item', 'script': None, 'actions': None, 'sleep': None, 'proxy': None, 'proxy_credential': None, 'pretend': None, 'timeout': None, 'screenshot': None}
2021-12-27 16:54:20 [gerapy.playwright] DEBUG: set options {'headless': False}
2021-12-27 16:54:20 [example.spiders.movie] INFO: detail url https://antispider1.scrape.center/detail/1
2021-12-27 16:54:20 [example.spiders.movie] INFO: detail url https://antispider1.scrape.center/detail/2
2021-12-27 16:54:20 [example.spiders.movie] INFO: detail url https://antispider1.scrape.center/detail/3
2021-12-27 16:54:20 [example.spiders.movie] INFO: detail url https://antispider1.scrape.center/detail/4
2021-12-27 16:54:20 [example.spiders.movie] INFO: detail url https://antispider1.scrape.center/detail/5
2021-12-27 16:54:20 [example.spiders.movie] INFO: detail url https://antispider1.scrape.center/detail/6
2021-12-27 16:54:20 [example.spiders.movie] INFO: detail url https://antispider1.scrape.center/detail/7
2021-12-27 16:54:20 [example.spiders.movie] INFO: detail url https://antispider1.scrape.center/detail/8
2021-12-27 16:54:20 [example.spiders.movie] INFO: detail url https://antispider1.scrape.center/detail/9
2021-12-27 16:54:20 [example.spiders.movie] INFO: detail url https://antispider1.scrape.center/detail/10
cookies []
2021-12-27 16:54:21 [gerapy.playwright] DEBUG: PRETEND_SCRIPTS is run
2021-12-27 16:54:21 [gerapy.playwright] DEBUG: timeout 10
2021-12-27 16:54:21 [gerapy.playwright] DEBUG: crawling https://antispider1.scrape.center/page/2
2021-12-27 16:54:21 [gerapy.playwright] DEBUG: request https://antispider1.scrape.center/page/2 with options {'url': 'https://antispider1.scrape.center/page/2', 'wait_until': 'domcontentloaded'}
2021-12-27 16:54:23 [gerapy.playwright] DEBUG: waiting for .item
2021-12-27 16:54:24 [gerapy.playwright] DEBUG: sleep for 1s
2021-12-27 16:54:25 [gerapy.playwright] DEBUG: taking screenshot using args {'type': 'png', 'full_page': True}
2021-12-27 16:54:25 [gerapy.playwright] DEBUG: close playwright
2021-12-27 16:54:25 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://antispider1.scrape.center/page/2> (referer: None)
2021-12-27 16:54:25 [gerapy.playwright] DEBUG: processing request <GET https://antispider1.scrape.center/detail/10>
2021-12-27 16:54:25 [gerapy.playwright] DEBUG: playwright_meta {'wait_until': 'domcontentloaded', 'wait_for': '.item', 'script': None, 'actions': None, 'sleep': None, 'proxy': None, 'proxy_credential': None, 'pretend': None, 'timeout': None, 'screenshot': None}
2021-12-27 16:54:25 [gerapy.playwright] DEBUG: set options {'headless': False}
...
Comments
  • twisted.internet.error.ReactorAlreadyInstalledError: reactor already installed

    python: 3.9 GerapyPlaywright: 0.2.4 os: mac 11.6

    Running scrapy crawl spider fails immediately with this error:

    Traceback (most recent call last):
      File "/Users/zz/.virtualenvs/crawler-apk/bin/scrapy", line 8, in <module>
        sys.exit(execute())
      File "/Users/zz/.virtualenvs/crawler-apk/lib/python3.9/site-packages/scrapy/cmdline.py", line 145, in execute
        _run_print_help(parser, _run_command, cmd, args, opts)
      File "/Users/zz/.virtualenvs/crawler-apk/lib/python3.9/site-packages/scrapy/cmdline.py", line 100, in _run_print_help
        func(*a, **kw)
      File "/Users/zz/.virtualenvs/crawler-apk/lib/python3.9/site-packages/scrapy/cmdline.py", line 153, in _run_command
        cmd.run(args, opts)
      File "/Users/zz/.virtualenvs/crawler-apk/lib/python3.9/site-packages/scrapy/commands/crawl.py", line 22, in run
        crawl_defer = self.crawler_process.crawl(spname, **opts.spargs)
      File "/Users/zz/.virtualenvs/crawler-apk/lib/python3.9/site-packages/scrapy/crawler.py", line 205, in crawl
        crawler = self.create_crawler(crawler_or_spidercls)
      File "/Users/zz/.virtualenvs/crawler-apk/lib/python3.9/site-packages/scrapy/crawler.py", line 238, in create_crawler
        return self._create_crawler(crawler_or_spidercls)
      File "/Users/zz/.virtualenvs/crawler-apk/lib/python3.9/site-packages/scrapy/crawler.py", line 313, in _create_crawler
        return Crawler(spidercls, self.settings, init_reactor=True)
      File "/Users/zz/.virtualenvs/crawler-apk/lib/python3.9/site-packages/scrapy/crawler.py", line 82, in __init__
        default.install()
      File "/Users/zz/.virtualenvs/crawler-apk/lib/python3.9/site-packages/twisted/internet/selectreactor.py", line 194, in install
        installReactor(reactor)
      File "/Users/zz/.virtualenvs/crawler-apk/lib/python3.9/site-packages/twisted/internet/main.py", line 32, in installReactor
        raise error.ReactorAlreadyInstalledError("reactor already installed"
    twisted.internet.error.ReactorAlreadyInstalledError: reactor already installed
    

    Where should this be changed?

    opened by yyyy777 3
  • Fix leaking file descriptors by using the context manager

    I was running into OSError: [Errno 24] Too many open files while using this with scrapy for scraping a domain.

    By using async_playwright() as a context manager, we ensure it's closed once finished. This fixes the issue.
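
The general pattern behind this fix can be shown without Playwright. In the sketch below, fake_browser_driver is a hypothetical stand-in for async_playwright(), and open_handles stands in for OS-level resources such as file descriptors; the point is only that the cleanup in an async context manager runs even when the body raises.

```python
import asyncio
from contextlib import asynccontextmanager

open_handles = []  # stands in for OS-level resources such as file descriptors


@asynccontextmanager
async def fake_browser_driver():
    """Stand-in for async_playwright(): acquire on enter, always release on exit."""
    handle = object()
    open_handles.append(handle)
    try:
        yield handle
    finally:
        open_handles.remove(handle)  # runs even if the body raises


async def render(url, fail=False):
    """Pretend to render a page while holding the driver open."""
    async with fake_browser_driver():
        if fail:
            raise RuntimeError(f'rendering {url} failed')
        return f'<html>{url}</html>'


async def main():
    await render('https://example.com/ok')
    try:
        await render('https://example.com/broken', fail=True)
    except RuntimeError:
        pass  # the failure is expected; the handle must still be released
    return len(open_handles)


leaked = asyncio.run(main())
```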

    opened by xolan 2
  •     raise BadGzipFile('Not a gzipped file (%r)' % magic) gzip.BadGzipFile: Not a gzipped file (b'<!')

    Hi Cui, it doesn't work for me either. This error is raised as soon as Scrapy starts. Environment: Python 3.9.4, macOS 11.5.2

    Traceback (most recent call last):
      File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/twisted/internet/defer.py", line 1445, in _inlineCallbacks
        result = current_context.run(g.send, result)
      File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/scrapy/core/downloader/middleware.py", line 54, in process_response
        response = yield deferred_from_coro(method(request=request, response=response, spider=spider))
      File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/scrapy/downloadermiddlewares/httpcompression.py", line 62, in process_response
        decoded_body = self._decode(response.body, encoding.lower())
      File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/scrapy/downloadermiddlewares/httpcompression.py", line 82, in _decode
        body = gunzip(body)
      File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/scrapy/utils/gz.py", line 27, in gunzip
        chunk = f.read1(8196)
      File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/gzip.py", line 313, in read1
        return self._buffer.read1(size)
      File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/_compression.py", line 68, in readinto
        data = self.read(len(byte_view))
      File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/gzip.py", line 487, in read
        if not self._read_gzip_header():
      File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/gzip.py", line 435, in _read_gzip_header
        raise BadGzipFile('Not a gzipped file (%r)' % magic)
    gzip.BadGzipFile: Not a gzipped file (b'<!')

    opened by wf4867612 2
  • Method is not Json serializable for actions

    Hello,

    I am running into an issue with 'yield' for gerapy-playwright when I need to access a login page. I try to run: yield PlaywrightRequest(login_page_url, self.parse_login, actions = self.login_action) in order to first use playwright to login and then access data that can only be accessed when logging in with self.parse_login. I am getting a: builtins.TypeError: is not JSON serializable.

    I am using scrapy cluster along with gerapy-playwright in order to run a scheduler for all the spiders that I have: https://github.com/istresearch/scrapy-cluster

    It seems that the action is saved in the meta data as a method and cannot be passed to the scheduler. Is it possible to type cast the action as a string and then when the action is called later, to do a method call on the string? If I understand correctly, the action is produced on line 339 of the downloadermiddlewares.py inside of gerepy-playwright. Would it be possible to evaluate the string as a method so that the scrapy-cluster scheduler can pass the string but gerapy-playwright still calls the self.login_action method prior to the self.parse_login?
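
The idea proposed in this comment, storing the method name instead of the method, can be sketched as follows. This only illustrates the proposal; nothing here indicates that gerapy-playwright accepts a string for actions, and LoginSpider is a minimal hypothetical stand-in for a real spider.

```python
import json


class LoginSpider:
    """Minimal stand-in for a spider; only what the sketch needs."""

    def login_action(self, page):
        return f'logged in via {page}'


spider = LoginSpider()

# A bound method cannot be serialized, which is the error reported above.
try:
    json.dumps({'actions': spider.login_action})
    serializable = True
except TypeError:
    serializable = False

# Storing the method *name* keeps the meta JSON-friendly,
# so it survives a round trip through a JSON-based scheduler.
meta = json.loads(json.dumps({'actions': 'login_action'}))

# The callable is looked up again on the spider right before it is needed.
action = getattr(spider, meta['actions'])
result = action('fake-page')
```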

    opened by BenzTivianne 0
  • playwright._impl._api_types.Error: Browser closed.

    What could be causing this kind of error?

    2022-03-23 06:21:06 [scrapy.core.scraper] ERROR: Error downloading <GET https://apkpure.com/bikers-men-women-bike-photo-editor-future-trends/com.dsrtech.bikers>
    Traceback (most recent call last):
      File "/usr/local/lib/python3.8/dist-packages/twisted/internet/defer.py", line 1656, in _inlineCallbacks
        result = current_context.run(
      File "/usr/local/lib/python3.8/dist-packages/twisted/python/failure.py", line 489, in throwExceptionIntoGenerator
        return g.throw(self.type, self.value, self.tb)
      File "/usr/local/lib/python3.8/dist-packages/scrapy/core/downloader/middleware.py", line 41, in process_request
        response = yield deferred_from_coro(method(request=request, spider=spider))
      File "/usr/local/lib/python3.8/dist-packages/twisted/internet/defer.py", line 1030, in adapt
        extracted = result.result()
      File "/usr/local/lib/python3.8/dist-packages/gerapy_playwright/downloadermiddlewares.py", line 243, in _process_request
        context = await browser.new_context(
      File "/usr/local/lib/python3.8/dist-packages/playwright/async_api/_generated.py", line 11254, in new_context
        await self._async(
      File "/usr/local/lib/python3.8/dist-packages/playwright/_impl/_browser.py", line 117, in new_context
        channel = await self._channel.send("newContext", params)
      File "/usr/local/lib/python3.8/dist-packages/playwright/_impl/_connection.py", line 39, in send
        return await self.inner_send(method, params, False)
      File "/usr/local/lib/python3.8/dist-packages/playwright/_impl/_connection.py", line 63, in inner_send
        result = next(iter(done)).result()
    playwright._impl._api_types.Error: Browser closed.
==================== Browser output: ==================== /ms-playwright/chromium-978106/chrome-linux/chrome --disable-background-networking --enable-features=NetworkService,NetworkServiceInProcess --disable-background-timer-throttling --disable-backgrounding-occluded-windows --disable-breakpad --disable-client-side-phishing-detection --disable-component-extensions-with-background-pages --disable-default-apps --disable-dev-shm-usage --disable-extensions --disable-features=ImprovedCookieControls,LazyFrameLoading,GlobalMediaControls,DestroyProfileOnBrowserClose,MediaRouter,AcceptCHFrame,AutoExpandDetailsElement --allow-pre-commit-input --disable-hang-monitor --disable-ipc-flooding-protection --disable-popup-blocking --disable-prompt-on-repost --disable-renderer-backgrounding --disable-sync --force-color-profile=srgb --metrics-recording-only --no-first-run --enable-automation --password-store=basic --use-mock-keychain --no-service-autorun --export-tagged-pdf --headless --hide-scrollbars --mute-audio --blink-settings=primaryHoverType=2,availableHoverTypes=2,primaryPointerType=4,availablePointerTypes=4 --no-sandbox --disable-extensions --hide-scrollbars --mute-audio --no-sandbox --disable-setuid-sandbox --disable-gpu --user-data-dir=/tmp/playwright_chromiumdev_profile-LGppgb --remote-debugging-pipe --no-startup-window pid=1185 [pid=1185][err] [0323/062041.773971:ERROR:platform_thread_posix.cc(151)] pthread_create: Resource temporarily unavailable (11) [pid=1185][err] [0323/062041.774268:ERROR:platform_thread_posix.cc(151)] pthread_create: Resource temporarily unavailable (11) [pid=1185][err] [0323/062041.778128:ERROR:zygote_communication_linux.cc(142)] Did not receive ping from zygote child [pid=1185][err] [0323/062041.778090:ERROR:zygote_linux.cc(607)] Zygote could not fork: process_type gpu-process numfds 3 child_pid -1 [pid=1185][err] [0323/062041.778548:ERROR:gpu_process_host.cc(968)] GPU process launch failed: error_code=1002 [pid=1185][err] 
[0323/062041.778568:WARNING:gpu_process_host.cc(1279)] The GPU process has crashed 1 time(s) [pid=1185][err] [0323/062041.778540:ERROR:zygote_linux.cc(271)] Unexpected real PID message from browser [pid=1185][err] [0323/062041.780496:ERROR:zygote_communication_linux.cc(142)] Did not receive ping from zygote child [pid=1185][err] [0323/062041.785120:ERROR:zygote_linux.cc(607)] Zygote could not fork: process_type gpu-process numfds 3 child_pid -1 [pid=1185][err] [0323/062041.785947:ERROR:gpu_process_host.cc(968)] GPU process launch failed: error_code=1002 [pid=1185][err] [0323/062041.785963:WARNING:gpu_process_host.cc(1279)] The GPU process has crashed 2 time(s) [pid=1185][err] [0323/062041.786835:ERROR:bus.cc(397)] Failed to connect to the bus: Failed to connect to socket /var/run/dbus/system_bus_socket: No such file or directory [pid=1185][err] [0323/062041.786892:ERROR:bus.cc(397)] Failed to connect to the bus: Failed to connect to socket /var/run/dbus/system_bus_socket: No such file or directory [pid=1185][err] [0323/062041.787157:WARNING:bluez_dbus_manager.cc(248)] Floss manager not present, cannot set Floss enable/disable. 
[pid=1185][err] [0323/062041.787335:ERROR:zygote_linux.cc(271)] Unexpected real PID message from browser [pid=1185][err] [0323/062041.816287:ERROR:zygote_communication_linux.cc(142)] Did not receive ping from zygote child [pid=1185][err] [0323/062041.815965:ERROR:zygote_linux.cc(607)] Zygote could not fork: process_type gpu-process numfds 3 child_pid -1 [pid=1185][err] [0323/062041.816903:ERROR:gpu_process_host.cc(968)] GPU process launch failed: error_code=1002 [pid=1185][err] [0323/062041.816914:WARNING:gpu_process_host.cc(1279)] The GPU process has crashed 3 time(s) [pid=1185][err] [0323/062041.816721:ERROR:zygote_linux.cc(271)] Unexpected real PID message from browser [pid=1185][err] [0323/062041.821091:ERROR:zygote_communication_linux.cc(142)] Did not receive ping from zygote child [pid=1185][err] [0323/062041.821123:ERROR:zygote_linux.cc(607)] Zygote could not fork: process_type gpu-process numfds 3 child_pid -1 [pid=1185][err] [0323/062041.821310:ERROR:gpu_process_host.cc(968)] GPU process launch failed: error_code=1002 [pid=1185][err] [0323/062041.821321:WARNING:gpu_process_host.cc(1279)] The GPU process has crashed 4 time(s) [pid=1185][err] [0323/062041.822089:ERROR:zygote_linux.cc(271)] Unexpected real PID message from browser [pid=1185][err] [0323/062041.823058:ERROR:zygote_communication_linux.cc(142)] Did not receive ping from zygote child [pid=1185][err] [0323/062041.823172:ERROR:zygote_linux.cc(607)] Zygote could not fork: process_type gpu-process numfds 3 child_pid -1 [pid=1185][err] [0323/062041.823358:ERROR:gpu_process_host.cc(968)] GPU process launch failed: error_code=1002 [pid=1185][err] [0323/062041.823369:WARNING:gpu_process_host.cc(1279)] The GPU process has crashed 5 time(s) [pid=1185][err] [0323/062041.824213:ERROR:zygote_linux.cc(271)] Unexpected real PID message from browser [pid=1185][err] [0323/062041.825010:ERROR:zygote_communication_linux.cc(142)] Did not receive ping from zygote child [pid=1185][err] 
[0323/062041.825129:ERROR:zygote_linux.cc(607)] Zygote could not fork: process_type gpu-process numfds 3 child_pid -1 [pid=1185][err] [0323/062041.825312:ERROR:gpu_process_host.cc(968)] GPU process launch failed: error_code=1002 [pid=1185][err] [0323/062041.825323:WARNING:gpu_process_host.cc(1279)] The GPU process has crashed 6 time(s) [pid=1185][err] [0323/062041.825608:ERROR:zygote_linux.cc(271)] Unexpected real PID message from browser [pid=1185][err] [0323/062041.825332:FATAL:gpu_data_manager_impl_private.cc(447)] GPU process isn't usable. Goodbye. [pid=1185][err] #0 0x55f9da32a369 base::debug::CollectStackTrace() [pid=1185][err] #1 0x55f9da2908c3 base::debug::StackTrace::StackTrace() [pid=1185][err] #2 0x55f9da2a3650 logging::LogMessage::~LogMessage() [pid=1185][err] #3 0x55f9d7e92bf7 content::(anonymous namespace)::IntentionallyCrashBrowserForUnusableGpuProcess() [pid=1185][err] #4 0x55f9d7e903fe content::GpuDataManagerImplPrivate::FallBackToNextGpuMode() [pid=1185][err] #5 0x55f9d7e8f303 content::GpuDataManagerImpl::FallBackToNextGpuMode() [pid=1185][err] #6 0x55f9d7e99d13 content::GpuProcessHost::RecordProcessCrash() [pid=1185][err] #7 0x55f9d7e9af44 content::GpuProcessHost::OnProcessLaunchFailed() [pid=1185][err] #8 0x55f9d7d15421 content::BrowserChildProcessHostImpl::OnProcessLaunchFailed() [pid=1185][err] #9 0x55f9d7d6faf5 content::internal::ChildProcessLauncherHelper::PostLaunchOnClientThread() [pid=1185][err] #10 0x55f9d7d6fd15 base::internal::Invoker<>::RunOnce() [pid=1185][err] #11 0x55f9da2e8bb0 base::TaskAnnotator::RunTaskImpl() [pid=1185][err] #12 0x55f9da2fca99 base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::DoWorkImpl() [pid=1185][err] #13 0x55f9da2fc7bc base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::DoWork() [pid=1185][err] #14 0x55f9da2fcf92 base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::DoWork() [pid=1185][err] #15 0x55f9da2ac06b base::(anonymous 
namespace)::WorkSourceDispatch() [pid=1185][err] #16 0x7fe589c2b17d g_main_context_dispatch [pid=1185][err] #17 0x7fe589c2b400 (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.6400.6+0x523ff) [pid=1185][err] #18 0x7fe589c2b4a3 g_main_context_iteration [pid=1185][err] #19 0x55f9da2abeb3 base::MessagePumpGlib::Run() [pid=1185][err] #20 0x55f9da2fd1fe base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::Run() [pid=1185][err] #21 0x55f9da2ca3ed base::RunLoop::Run() [pid=1185][err] #22 0x55f9d7d2d2ad content::BrowserMainLoop::RunMainMessageLoop() [pid=1185][err] #23 0x55f9d7d2eb62 content::BrowserMainRunnerImpl::Run() [pid=1185][err] #24 0x55f9df75683e headless::HeadlessContentMainDelegate::RunProcess() [pid=1185][err] #25 0x55f9d9e42862 content::RunBrowserProcessMain() [pid=1185][err] #26 0x55f9d9e43d0f content::ContentMainRunnerImpl::RunBrowser() [pid=1185][err] #27 0x55f9d9e4389f content::ContentMainRunnerImpl::Run() [pid=1185][err] #28 0x55f9d9e40cb4 content::RunContentProcess() [pid=1185][err] #29 0x55f9d9e415ce content::ContentMain() [pid=1185][err] #30 0x55f9d9e9cc5a headless::(anonymous namespace)::RunContentMain() [pid=1185][err] #31 0x55f9d9e9c965 headless::HeadlessShellMain() [pid=1185][err] #32 0x55f9d6961fa8 ChromeMain [pid=1185][err] #33 0x7fe588ea60b3 __libc_start_main [pid=1185][err] #34 0x55f9d6961dea _start [pid=1185][err] Task trace: [pid=1185][err] #0 0x55f9d7d6f9ac content::internal::ChildProcessLauncherHelper::PostLaunchOnLauncherThread() [pid=1185][err] #1 0x55f9d7d6f3aa content::internal::ChildProcessLauncherHelper::StartLaunchOnClientThread() [pid=1185][err] #2 0x55f9da682456 mojo::SimpleWatcher::Context::Notify() [pid=1185][err] #3 0x55f9d7d6f3aa content::internal::ChildProcessLauncherHelper::StartLaunchOnClientThread() [pid=1185][err] #4 0x55f9da682456 mojo::SimpleWatcher::Context::Notify() [pid=1185][err] Task trace buffer limit hit, update PendingTask::kTaskBacktraceLength to increase. [pid=1185][err]

    opened by yyyy777 0
Releases(v0.2.3)
  • v0.2.3(Jan 11, 2022)

  • v0.2.0(Dec 28, 2021)

    • New Feature: Add support for:
      • Specifying channel for launching
      • Specifying executablePath for launching
      • Specifying slowMo for launching
      • Specifying devtools for launching
      • Specifying --disable-extensions in args for launching
      • Specifying --hide-scrollbars in args for launching
      • Specifying --no-sandbox in args for launching
      • Specifying --disable-setuid-sandbox in args for launching
      • Specifying --disable-gpu in args for launching
    • Update: change GERAPY_PLAYWRIGHT_SLEEP default to 0
  • v0.1.2(Dec 28, 2021)

  • v0.1.1(Dec 27, 2021)

    First version of Playwright, add basic support for:

    • Auto Installation
    • Render with Playwright
    • Setting Concurrency
    • Setting Proxy
    • Setting Cookies
    • Screenshot
    • Evaluating Script
    • Wait for Elements
    • Wait loading control
    • Setting Timeout
    • Pretending Webdriver
Owner
Gerapy
Distributed Crawler Management Framework Based on Scrapy, Scrapyd.

Thomas 0 Feb 11, 2022
Bulk Downloader for Reddit

saveddit is a bulk media downloader for reddit pip3 install saveddit Setting up authorization Register an application with Reddit Write down your clie

Pranav 136 Jan 03, 2023
Noto fonts go universal! Download Noto fonts combined to suit your region

noto-cjk Noto CJK fonts Noto Serif CJK update was released on 25 October 2021. We moved the release history and other notes into both Sans and Serif s

Google Fonts 2k Jan 02, 2023
Python-Youtube-Downloader - An Open Source Python Youtube Downloader

Python-Youtube-Downloader Hello There This Is An Open Source Python Youtube Down

Flex Tools 3 Jun 14, 2022
A Quick demo of how to use the youtube_dl module in python.

youtube_dl python module demo A Quick demo of how to use the youtube_dl module in python. Whole documentation for the youtube_dl Installation git

7 Aug 27, 2021
A simple Python program which uses youtube-dl for downloading YouTube videos as mp3 files.

yt-mp3 converter This is a simple Python program which uses youtube-dl for downloading YouTube videos as mp3 files. This program is for you if you are

nostalgicnerdpenguin 1 Oct 24, 2021
Download Apple Music Cover Artwork in the best Quality by providing an Apple Music Link. It downloads the jpg, png and webp version since they often differ from another.

amogus.py - Version 0.0.5 amogus - Apple Music Hi-Res Artwork Fetcher this is my first real python tool so sorry if its bad amogus is a Python script

reaper 46 Jan 09, 2023
A collection of modules I have created to programmatically search for/download imagery from live cam feeds across the state of California.

A collection of modules that I have created to programmatically search for/download imagery from all publicly available live cam feeds across the state of California. In no way am I affiliated with a

Chad Groom 5 Nov 21, 2022
Download India Stocks Historical Data

Kite Helper - Download Stock Market Data 🌎 Website Simple Application to Download any stock market data in .csv format using Kite 🏃‍♂️ Running Serve

Pishang Ujeniya 12 Dec 06, 2022
Download YouTube videos that are available in the given playlist

Youtube-Playlist-Downloader Download YouTube videos that are in a playlist Project assets: music downloaded music folder. (will be generated) music.db

Sultan Aljaberi 1 Dec 22, 2021
FireDM is a python open source (Internet Download Manager) with multi-connections, high speed engine, it downloads general files and videos from youtube and tons of other streaming websites .

python open source (Internet Download Manager) with multi-connections, high speed engine, based on python, LibCurl, and youtube_dl https://github.com/firedm/FireDM

1.6k Apr 12, 2022