Python utility to extract differences between two pandas dataframes.

Overview

#Pandas Diff

Installation

Install pandas_diff with pip

  pip install pandas_diff

Usage/Examples

import pandas_diff as pd_diff

import pandas as pd

# Create two example dataframes
df_infinity = pd.DataFrame([
                {"hero" : "hulk" , "power" : "strength"},
                {"hero" : "black_widow" , "power" : "spy"},
                {"hero" : "thor" , "hammers" : 0 },
                {"hero" : "thor" , "hammers" : 1 } ] )
df_endgame = pd.DataFrame([
                {"hero" : "hulk" , "power" : "smart"},
                {"hero" : "captain marvel" , "power" : "strength"},
                {"hero" : "thor" , "hammers" : 2 } ] )

# Get differences, using the key "hero"
df = pd_diff.get_diffs(df_infinity ,df_endgame ,"hero")

df

  operation object_keys  object_values                     object_json                     attribute_changed old_value new_value
0   create     [hero]    captain marvel  {'hero': 'captain marvel', 'power': 'strength'...           NaN           NaN      NaN
1   delete     [hero]       black_widow  {'hero': 'black_widow', 'power': 'spy', 'hamme...           NaN           NaN      NaN
2   modify     [hero]              thor     {'hero': 'thor', 'power': nan, 'hammers': 2.0}       hammers             1        2
3   modify     [hero]              hulk  {'hero': 'hulk', 'power': 'smart', 'hammers': ...         power      strength    smart

Features

  • Support for stand alone app
Comments
  • Unused import click

    Unused import click

    opened by jaimevalero 1
  • Bump pip from 19.2.3 to 21.1

    Bump pip from 19.2.3 to 21.1

    Bumps pip from 19.2.3 to 21.1.

    Changelog

    Sourced from pip's changelog.

    21.1 (2021-04-24)

    Process

    • Start installation scheme migration from distutils to sysconfig. A warning is implemented to detect differences between the two implementations to encourage user reports, so we can avoid breakages before they happen.

    Features

    • Add the ability for the new resolver to process URL constraints. ([#8253](https://github.com/pypa/pip/issues/8253) <https://github.com/pypa/pip/issues/8253>_)
    • Add a feature --use-feature=in-tree-build to build local projects in-place when installing. This is expected to become the default behavior in pip 21.3; see Installing from local packages <https://pip.pypa.io/en/stable/user_guide/#installing-from-local-packages>_ for more information. ([#9091](https://github.com/pypa/pip/issues/9091) <https://github.com/pypa/pip/issues/9091>_)
    • Bring back the "(from versions: ...)" message, that was shown on resolution failures. ([#9139](https://github.com/pypa/pip/issues/9139) <https://github.com/pypa/pip/issues/9139>_)
    • Add support for editable installs for project with only setup.cfg files. ([#9547](https://github.com/pypa/pip/issues/9547) <https://github.com/pypa/pip/issues/9547>_)
    • Improve performance when picking the best file from indexes during pip install. ([#9748](https://github.com/pypa/pip/issues/9748) <https://github.com/pypa/pip/issues/9748>_)
    • Warn instead of erroring out when doing a PEP 517 build in presence of --build-option. Warn when doing a PEP 517 build in presence of --global-option. ([#9774](https://github.com/pypa/pip/issues/9774) <https://github.com/pypa/pip/issues/9774>_)

    Bug Fixes

    • Fixed --target to work with --editable installs. ([#4390](https://github.com/pypa/pip/issues/4390) <https://github.com/pypa/pip/issues/4390>_)
    • Add a warning, discouraging the usage of pip as root, outside a virtual environment. ([#6409](https://github.com/pypa/pip/issues/6409) <https://github.com/pypa/pip/issues/6409>_)
    • Ignore .dist-info directories if the stem is not a valid Python distribution name, so they don't show up in e.g. pip freeze. ([#7269](https://github.com/pypa/pip/issues/7269) <https://github.com/pypa/pip/issues/7269>_)
    • Only query the keyring for URLs that actually trigger error 401. This prevents an unnecessary keyring unlock prompt on every pip install invocation (even with default index URL which is not password protected). ([#8090](https://github.com/pypa/pip/issues/8090) <https://github.com/pypa/pip/issues/8090>_)
    • Prevent packages already-installed alongside with pip to be injected into an isolated build environment during build-time dependency population. ([#8214](https://github.com/pypa/pip/issues/8214) <https://github.com/pypa/pip/issues/8214>_)
    • Fix pip freeze permission denied error in order to display an understandable error message and offer solutions. ([#8418](https://github.com/pypa/pip/issues/8418) <https://github.com/pypa/pip/issues/8418>_)
    • Correctly uninstall script files (from setuptools' scripts argument), when installed with --user. ([#8733](https://github.com/pypa/pip/issues/8733) <https://github.com/pypa/pip/issues/8733>_)
    • New resolver: When a requirement is requested both via a direct URL (req @ URL) and via version specifier with extras (req[extra]), the resolver will now be able to use the URL to correctly resolve the requirement with extras. ([#8785](https://github.com/pypa/pip/issues/8785) <https://github.com/pypa/pip/issues/8785>_)
    • New resolver: Show relevant entries from user-supplied constraint files in the error message to improve debuggability. ([#9300](https://github.com/pypa/pip/issues/9300) <https://github.com/pypa/pip/issues/9300>_)
    • Avoid parsing version to make the version check more robust against lousily debundled downstream distributions. ([#9348](https://github.com/pypa/pip/issues/9348) <https://github.com/pypa/pip/issues/9348>_)
    • --user is no longer suggested incorrectly when pip fails with a permission error in a virtual environment. ([#9409](https://github.com/pypa/pip/issues/9409) <https://github.com/pypa/pip/issues/9409>_)
    • Fix incorrect reporting on Requires-Python conflicts. ([#9541](https://github.com/pypa/pip/issues/9541) <https://github.com/pypa/pip/issues/9541>_)

    ... (truncated)

    Commits
    • 2b2a268 Bump for release
    • ea761a6 Update AUTHORS.txt
    • 2edd3fd Postpone a deprecation to 21.2
    • 3cccfbf Rename mislabeled news fragment
    • 21cd124 Fix NEWS.rst placeholder position
    • e46bdda Merge pull request #9827 from pradyunsg/fix-git-improper-tag-handling
    • 0e4938d :newspaper:
    • ca832b2 Don't split git references on unicode separators
    • 1320bac Merge pull request #9814 from pradyunsg/revamp-ci-apr-2021-v2
    • e9cc23f Skip checks on PRs only
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 1
  • Config file for pyup.io

    Config file for pyup.io

    Hi there and thanks for using pyup.io!

    Since you are using a non-default config I've created one for you.

    There are a lot of things you can configure on top of that, so make sure to check out the docs to see what I can do for you.

    opened by pyup-bot 1
  • Bump wheel from 0.33.6 to 0.38.1

    Bump wheel from 0.33.6 to 0.38.1

    Bumps wheel from 0.33.6 to 0.38.1.

    Changelog

    Sourced from wheel's changelog.

    Release Notes

    UNRELEASED

    • Updated vendored packaging to 22.0

    0.38.4 (2022-11-09)

    • Fixed PKG-INFO conversion in bdist_wheel mangling UTF-8 header values in METADATA (PR by Anderson Bravalheri)

    0.38.3 (2022-11-08)

    • Fixed install failure when used with --no-binary, reported on Ubuntu 20.04, by removing setup_requires from setup.cfg

    0.38.2 (2022-11-05)

    • Fixed regression introduced in v0.38.1 which broke parsing of wheel file names with multiple platform tags

    0.38.1 (2022-11-04)

    • Removed install dependency on setuptools
    • The future-proof fix in 0.36.0 for converting PyPy's SOABI into a abi tag was faulty. Fixed so that future changes in the SOABI will not change the tag.

    0.38.0 (2022-10-21)

    • Dropped support for Python < 3.7
    • Updated vendored packaging to 21.3
    • Replaced all uses of distutils with setuptools
    • The handling of license_files (including glob patterns and default values) is now delegated to setuptools>=57.0.0 (#466). The package dependencies were updated to reflect this change.
    • Fixed potential DoS attack via the WHEEL_INFO_RE regular expression
    • Fixed ValueError: ZIP does not support timestamps before 1980 when using SOURCE_DATE_EPOCH=0 or when on-disk timestamps are earlier than 1980-01-01. Such timestamps are now changed to the minimum value before packaging.

    0.37.1 (2021-12-22)

    • Fixed wheel pack duplicating the WHEEL contents when the build number has changed (#415)
    • Fixed parsing of file names containing commas in RECORD (PR by Hood Chatham)

    0.37.0 (2021-08-09)

    • Added official Python 3.10 support
    • Updated vendored packaging library to v20.9

    ... (truncated)

    Commits
    • 6f1608d Created a new release
    • cf8f5ef Moved news item from PR #484 to its proper place
    • 9ec2016 Removed install dependency on setuptools (#483)
    • 747e1f6 Fixed PyPy SOABI parsing (#484)
    • 7627548 [pre-commit.ci] pre-commit autoupdate (#480)
    • 7b9e8e1 Test on Python 3.11 final
    • a04dfef Updated the pypi-publish action
    • 94bb62c Fixed docs not building due to code style changes
    • d635664 Updated the codecov action to the latest version
    • fcb94cd Updated version to match the release
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0
  • Unused variable 'a'

    Unused variable 'a'

    opened by jaimevalero 0
  • Initial Update

    Initial Update

    The bot created this issue to inform you that pyup.io has been set up on this repo. Once you have closed it, the bot will open pull requests for updates as soon as they are available.

    opened by pyup-bot 0
Releases(v1.4.0)
Owner
Jaime Valero
Devops, Machine learning learner and sysadmin. I also cook omelettes.
Jaime Valero
An ETL Pipeline of a large data set from a fictitious music streaming service named Sparkify.

An ETL Pipeline of a large data set from a fictitious music streaming service named Sparkify. The ETL process flows from AWS's S3 into staging tables in AWS Redshift.

1 Feb 11, 2022
Employee Turnover Analysis

Employee Turnover Analysis Submission to the DataCamp competition "Can you help reduce employee turnover?"

Jannik Wiedenhaupt 1 Feb 13, 2022
Office365 (Microsoft365) audit log analysis tool

Office365 (Microsoft365) audit log analysis tool The header describes it all WHY?? The first line of code was written long time before other colleague

Anatoly 1 Jul 27, 2022
.npy, .npz, .mtx converter.

npy-converter Matrix Data Converter. Expand matrix for multi-thread, multi-process Divid matrix for multi-thread, multi-process Support: .mtx, .npy, .

taka 1 Feb 07, 2022
A powerful data analysis package based on mathematical step functions. Strongly aligned with pandas.

The leading use-case for the staircase package is for the creation and analysis of step functions. Pretty exciting huh. But don't hit the close button

48 Dec 21, 2022
Single-Cell Analysis in Python. Scales to >1M cells.

Scanpy – Single-Cell Analysis in Python Scanpy is a scalable toolkit for analyzing single-cell gene expression data built jointly with anndata. It inc

Theis Lab 1.4k Jan 05, 2023
songplays datamart provide details about the musical taste of our customers and can help us to improve our recomendation system

Songplays User activity datamart The following document describes the model used to build the songplays datamart table and the respective ETL process.

Leandro Kellermann de Oliveira 1 Jul 13, 2021
This mini project showcase how to build and debug Apache Spark application using Python

Spark app can't be debugged using normal procedure. This mini project showcase how to build and debug Apache Spark application using Python programming language. There are also options to run Spark a

Denny Imanuel 1 Dec 29, 2021
Detailed analysis on fraud claims in insurance companies, gives you information as to why huge loss take place in insurance companies

Insurance-Fraud-Claims Detailed analysis on fraud claims in insurance companies, gives you information as to why huge loss take place in insurance com

1 Jan 27, 2022
:truck: Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark

To launch a live notebook server to test optimus using binder or Colab, click on one of the following badges: Optimus is the missing framework to prof

Iron 1.3k Dec 30, 2022
Catalogue data - A Python Scripts to prepare catalogue data

catalogue_data Scripts to prepare catalogue data. Setup Clone this repo. Install

BigScience Workshop 3 Mar 03, 2022
PySpark bindings for H3, a hierarchical hexagonal geospatial indexing system

h3-pyspark: Uber's H3 Hexagonal Hierarchical Geospatial Indexing System in PySpark PySpark bindings for the H3 core library. For available functions,

Kevin Schaich 12 Dec 24, 2022
Fit models to your data in Python with Sherpa.

Table of Contents Sherpa License How To Install Sherpa Using Anaconda Using pip Building from source History Release History Sherpa Sherpa is a modeli

134 Jan 07, 2023
Bearsql allows you to query pandas dataframe with sql syntax.

Bearsql adds sql syntax on pandas dataframe. It uses duckdb to speedup the pandas processing and as the sql engine

14 Jun 22, 2022
Utilize data analytics skills to solve real-world business problems using Humana’s big data

Humana-Mays-2021-HealthCare-Analytics-Case-Competition- The goal of the project is to utilize data analytics skills to solve real-world business probl

Yongxian (Caroline) Lun 1 Dec 27, 2021
Weather analysis with Python, SQLite, SQLAlchemy, and Flask

Surf's Up Weather analysis with Python, SQLite, SQLAlchemy, and Flask Overview The purpose of this analysis was to examine weather trends (precipitati

Art Tucker 1 Sep 05, 2021
BIGDATA SIMULATION ONE PIECE WORLD CENSUS

ONE PIECE is a Japanese manga of great international success. The story turns inhabited in a fictional world, tells the adventures of a young man whose body gained rubber properties after accidentall

Maycon Cypriano 3 Jun 30, 2022
MeSH2Matrix - A set of Python codes for the generation of biomedical ontologies from the MeSH keywords of the PubMed scholarly publications

A set of Python codes for the generation of biomedical ontologies from the MeSH keywords of the PubMed scholarly publications

SisonkeBiotik 6 Nov 30, 2022
Lale is a Python library for semi-automated data science.

Lale is a Python library for semi-automated data science. Lale makes it easy to automatically select algorithms and tune hyperparameters of pipelines that are compatible with scikit-learn, in a type-

International Business Machines 293 Dec 29, 2022
This creates a ohlc timeseries from downloaded CSV files from NSE India website and makes a SQLite database for your research.

NSE-timeseries-form-CSV-file-creator-and-SQL-appender- This creates a ohlc timeseries from downloaded CSV files from National Stock Exchange India (NS

PILLAI, Amal 1 Oct 02, 2022