Materials to reproduce our findings in our stories, "Amazon Puts Its Own 'Brands' First Above Better-Rated Products" and "When Amazon Takes the Buy Box, it Doesn’t Give it up"

Overview

Amazon Brands and Exclusives

This repository contains code to reproduce the findings featured in our story "Amazon Puts Its Own 'Brands' First Above Better-Rated Products" and "When Amazon Takes the Buy Box, it Doesn’t Give it up" from our series Amazon's Advantage.

Our methodology is described in "How We Analyzed Amazon’s Treatment of Its Brands in Search Results".

Data that we collected and analyzed is in the data folder.
To use the full input dataset (which is not hosted here), please refer to Download data.

Jupyter notebooks used for data preprocessing and analysis are available in the notebooks folder.
Descriptions for each notebook are outlined in the Notebooks section below.

Installation

Python

Make sure you have Python 3.6+ installed. We used Miniconda to create a Python 3.8 virtual environment.

Then install the Python packages:
pip install -r requirements.txt

Notebooks

These notebooks are intended to be run sequentially, but they are not dependent on one another. If you want a quick overview of the methodology, you only need to concern yourself with the notebooks with an asterisk(*).

0-data-preprocessing.ipynb

This notebook parses Amazon search results and Amazon product pages, and produces the intermediary datasets (data/output/datasets/) used in ranking analysis and random forest classifiers.

1-data-analysis-search-results.ipynb *

Bulk of the ranking analysis and stats in the data analysis.

2-random-forest-analysis.ipynb *

Feature engineering training set, finding optimal hyperparameters, and performing the ablation study on a random forest model. The most predictive feature is verified using three separate methods.

3-survey-results.ipynb

Visualizing the survey results from our national panel of 1,000 adults.

4-limiations-product-page-changes.ipynb

Analysis of how often the Buy Box's default shipper and seller change between Amazon and a third party.

utils.py

Contains convenient functions used in the notebooks.

parsers.py

Contains parsers for search results and product pages.

Data

This directory is where inputs, intermediaries, and outputs are saved.

data
├── output
│   ├── figures
│   ├── tables
│   └── datasets
│       ├── amazon_private_label.csv.xz
│       ├── products.csv.xz
│       ├── searches.csv.xz
│       ├── training_set.csv.gz
│       ├── pairwise_training_set.csv.gz
│       └── trademarks
└── input
    ├── combined_queries_with_source.csv
    ├── best_sellers
    ├── generic_search_terms
    ├── search-private-label
    ├── search-selenium
    ├── search-selenium-our-brands-filter_
    ├── selenium-products
    ├── seller_central
    └── spotcheck

data/output/ contains tables, figures, and datasets used in our methodology.

data/output/datasets/amazon_private_label.csv.xz is our dataset of Amazon brands, exclusives, and proprietary electronics (N=137,428 products). We use each product's unique ID (called an ASIN) to identify Amazon's own products in our methodology.

data/output/datasets/trademarks contains a dataset of trademarked brands registered by Amazon. The data was collected from USPTO.gov and Amazon. We included an additional README with the exact steps we took to build this dataset in the directory.

data/output/datasets/searches.csv.xz parsed search result pages from top and generic searches (N=187,534 product positions). You can filter this by search_term for each of these subsets from data/input/combined_queries_with_source.csv.

data/output/datasets/products.csv.xz parsed product pages from the searches above (N=157,405 product pages).

data/output/training_set.csv.gz metadata used to train and evaluate the random forest. Additionally, feature engineering is conducted in notebooks/2-random-forest-analysis.ipynb, which produces pairwise_training_set.csv.gz.

Every file in data/input except combined_queries_with_source.csv is stored in AWS s3. Those are not hosted in this repository.

Download Data

You can find the raw inputs in data/input in s3://markup-public-data/amazon-brands/.

If you trust us, you can download the HTML and JSON files in data/input using this script: sh data/download_input_data.sh

Note this is not necessary to run notebooks and see full results.

data/input/search-selenium/ (12 GB uncompressed)

First page of search results collected in January 2021. Download the HTML files search-selenium.tar.xz (238 MB compressed) here.

data/input/selenium-products/ (220 GB uncompressed)

Product pages collected in February 2021. Download the HTML files selenium-products.tar.xz (9 GB compressed) here.

data/input/search-selenium-our-brands-filter_/ (35 GB uncompressed)

Search results filtered by "our brands". Contains every page of search results. Download search-selenium-our-brands-filter_.tar.xz (403 MB compressed) here.

data/input/search-private-label/ (25 GB uncompressed)

API responses for search results filtered down to products Amazon identifies as "our brands". Contains paginated API results. Download the JSON files search-private-label.tar.xz (402 MB uncompressed) here.

data/input/seller_central/ (105 MB)

Seller central data for Q4 2020. Download the CSV file All_Q4_2020.csv.xz (105 MB compressioned) here.

data/input/best_sellers/ (4 GB)

Amazon's best sellers under the category "Amazon Devices & Accessories". Download the HTML files best_sellers.tar.xz (60MB compressed) here.

data/input/spotcheck/ (4 GB)

A sub-sample of product pages for spot-checking Buy Box changes. Download the HTML files spotcheck.tar.xz (159 MB compressed) here.

A simple MTProto-based bot that can download various types of media (>10MB) on a local storage

TG Media Downloader Bot 🤖 A telegram bot based on Pyrogram that downloads on a local storage the following media files: animation, audio, document, p

Alessio Tudisco 11 Nov 01, 2022
GroupMenter : New Telegram Group Manager Bot🔸Fast 🔸Python🔸Pyrogram 🔸

GroupMenter An PowerFull Group Manager Bot. Written In Pytelethon. Info • A modular Telegram Python bot running on python3. • Can be found on telegram

Group Menter 24 Jun 28, 2022
Free and Open Source Group Voice chat music player for telegram ❤️ with button support youtube playback support

Free and Open Source Group Voice chat music player for telegram ❤️ with button support youtube playback support

Sehath Perera 1 Jan 08, 2022
IMDb + Auto + Unlimited Filter BoT

Telegram Movie Bot Features Auto Filter Manuel Filter IMDB Admin Commands Broadcast Index IMDB search Inline Search Random pics ids and User info Stat

Team AlinaX 1 Dec 03, 2021
Python 3 SDK/Wrapper for Huobi Crypto Exchange Api

This packages intents to be an idiomatic PythonApi wrapper for https://www.huobi.com/ Huobi Api Doc: https://huobiapi.github.io/docs Showcase TODO Con

3 Jul 28, 2022
A Python wrapper around the Twitter API.

Python Twitter A Python wrapper around the Twitter API. By the Python-Twitter Developers Introduction This library provides a pure Python interface fo

Mike Taylor 3.4k Jan 01, 2023
Freqtrade is a free and open source crypto trading bot written in Python.

Freqtrade is a free and open source crypto trading bot written in Python. It is designed to support all major exchanges and be controlled via Telegram. It contains backtesting, plotting and money man

Kazune Takeda 5 Dec 30, 2021
Official implementation of DeepSportLab (a fork of OpenPifPaf)

DeepSportLab DeepSportLab: a Unified Framework for BallDetection, Player Instance Segmentationand Pose Estimation in Team Sports Scenes This paper pre

ISPGroupUCL 8 Sep 27, 2022
This Wrapper is a Discum Copy With Addons, original one is made by Merubokkusu

Remaded Discum Its not Official Discum Wrapper ! This Wrapper is a Discum Copy With Addons, original one is made by Merubokkusu Authors @merubokkusu (

discum-remaded 8 Aug 09, 2022
This is a Python package to create a snowflake identifier similar to Discord's or Twitter's.

snowflake2 Based on falcondai and fenhl's Python snowflake tool, but with documentation and simliarities to Discord. Installation instructions Install

Learnloot 2 Mar 19, 2022
The fastest nuker on discord, Proxy support and more

About okuru nuker is a nuker for discord written in python, It uses methods such as threading and requests to ban faster and perform at higher speeds.

63 Dec 31, 2022
An open source raffle bot made to increase the chance of winning limited sneaker raffles by automating entries.

🚀 SyneziaRaffles An open source raffle bot made to increase the chance of winning limited sneaker raffles by automating entries. 🏄‍♂️ Quick Start Pr

Alexis M. 29 Dec 22, 2022
Tools to download and aggregate feeds of vaccination clinic location information in the United States.

vaccine-feed-ingest Pipeline for ingesting nationwide feeds of vaccine facilities. Contributing How to Configure your environment (instructions on the

Call the Shots 26 Aug 05, 2022
Intelligent Trading Bot: Automatically generating signals and trading based on machine learning and feature engineering

Intelligent Trading Bot: Automatically generating signals and trading based on machine learning and feature engineering

Alexandr Savinov 326 Jan 03, 2023
Automatically searching for vaccine appointments

Vaccine Appointments Automatically searching for vaccine appointments Usage To copy this package, run: git clone https://github.com/TheIronicCurtain/v

58 Apr 13, 2021
Match-making API for OpenSanctions

OpenSanctions Match-making API This directory contains code and a Docker image for running an API to match data against OpenSanctions. It is intended

OpenSanctions.org 26 Dec 15, 2022
Weather_besac is a French twitter bot that tweet the weather of the city of Besançon in Franche-Comté in France every day at 8am and 4pm.

Weather Bot Besac Weather_besac is a French twitter bot that tweet the weather of the city of Besançon in Franche-Comté in France every day at 8am and

Rgld_ 1 Nov 15, 2021
A discord bot with a leveling system (similar to mee6).

Discord.py A discord bot with a leveling system (like mee6) Pre-requisites Knowing how to get create an app/bot via discord's developer portal. Websit

26 Dec 11, 2022
🔮 A usefull set of scripts to dig into your Discord data package.

Discord DataExtractor 🔮 Discord DataExtractor is a set of scripts that allows you to dig into your Discord Data package. Repository guide ☕ Coffee_Ga

3 Dec 29, 2021
an OSU! bot sdk based on IRC

osu-bot-sdk an OSU! bot sdk based on IRC Start! The following is an example of event triggering import osu_irc_sdk from osu_irc_sdk import models bot

chinosk 2 Dec 16, 2021