A tool to compare differences between dataframes and create a differences report in Excel

Last update: Sep 15, 2022

Related tags

Data Analysis similarpanda

Overview

similarpanda

A module to check for differences between pandas Dataframes, and generate a report in Excel format.

This is helpful in a workplace setting, where you might need to check between different versions of an Excel file.

Inputs (to create the object)

new: DataFrame
old: DataFrame
key_column: A reference to a common DataFrame column. If omitted, the data will be matched on the row position.

Returns:

A similarpanda object
Use similarpanda.output_excel() to generate an Excel report.

Note:

Tabular data must contain unique column names.
Only checks cell values. Formulae, formatting ets is disregarded.

Includes a helper Jupyer notebook, which supports in loading of tabular data:

From specified named tables within named Excel files
From the clipboard

Owner

Andre Pretorius

GitHub Repository

ASTR 302: Python for Astronomy (Winter '22)

ASTR 302, Winter 2022, University of Washington: Python for Astronomy Mario Jurić Location When: 2:30-3:50, Monday & Wednesday, Winter quarter 2022 Wh

4 Jan 12, 2022

Maximum Covariance Analysis in Python

xMCA | Maximum Covariance Analysis in Python The aim of this package is to provide a flexible tool for the climate science community to perform Maximu

39 Jan 03, 2023

Semi-Automated Data Processing

Perform semi automated exploratory data analysis, feature engineering and feature selection on provided dataset by visualizing every possibilities on each step and assisting the user to make a meanin

1 Jan 17, 2022

CleanX is an open source python library for exploring, cleaning and augmenting large datasets of X-rays, or certain other types of radiological images.

cleanX CleanX is an open source python library for exploring, cleaning and augmenting large datasets of X-rays, or certain other types of radiological

20 Jan 05, 2023

The official pytorch implementation of ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias

104 Nov 27, 2022

In this tutorial, raster models of soil depth and soil water holding capacity for the United States will be sampled at random geographic coordinates within the state of Colorado.

Raster_Sampling_Demo (Resulting graph of this demo) Background Sampling values of a raster at specific geographic coordinates can be done with a numbe

2 Dec 13, 2022

Methylation/modified base calling separated from basecalling.

Remora Methylation/modified base calling separated from basecalling. Remora primarily provides an API to call modified bases for basecaller programs s

72 Jan 05, 2023

An ETL Pipeline of a large data set from a fictitious music streaming service named Sparkify.

An ETL Pipeline of a large data set from a fictitious music streaming service named Sparkify. The ETL process flows from AWS's S3 into staging tables in AWS Redshift.

1 Feb 11, 2022

High Dimensional Portfolio Selection with Cardinality Constraints

High-Dimensional Portfolio Selecton with Cardinality Constraints This repo contains code for perform proximal gradient descent to solve sample average

2 Mar 22, 2022

🌍 Create 3d-printable STLs from satellite elevation data 🌏

mapa 🌍 Create 3d-printable STLs from satellite elevation data Installation pip install mapa Usage mapa uses numpy and numba under the hood to crunch

13 Dec 15, 2022

Demonstrate a Dataflow pipeline that saves data from an API into BigQuery table

Overview dataflow-mvp provides a basic example pipeline that pulls data from an API and writes it to a BigQuery table using GCP's Dataflow (i.e., Apac

1 Dec 03, 2021

DefAP is a program developed to facilitate the exploration of a material's defect chemistry

DefAP is a program developed to facilitate the exploration of a material's defect chemistry. A large number of features are provided and rapid exploration is supported through the use of autoplotting

6 Oct 25, 2022

A distributed block-based data storage and compute engine

Nebula is an extremely-fast end-to-end interactive big data analytics solution. Nebula is designed as a high-performance columnar data storage and tabular OLAP engine.

131 Dec 26, 2022

CPSPEC is an astrophysical data reduction software for timing

CPSPEC manual Introduction CPSPEC is an astrophysical data reduction software for timing. Various timing properties, such as power spectra and cross s

1 Oct 20, 2021

statDistros is a Python library for dealing with various statistical distributions

StatisticalDistributions statDistros statDistros is a Python library for dealing with various statistical distributions. Now it provides various stati

1 Oct 03, 2021

Time ranges with python

timeranges Time ranges. Read the Docs Installation pip timeranges is available on pip: pip install timeranges GitHub You can also install the latest v

2 Sep 01, 2022

Python implementation of Principal Component Analysis

Principal Component Analysis Principal Component Analysis (PCA) is a dimension-reduction algorithm. The idea is to use the singular value decompositio

1 Nov 06, 2021

A library to create multi-page Streamlit applications with ease.

107 Jan 04, 2023

Collections of pydantic models

pydantic-collections The pydantic-collections package provides BaseCollectionModel class that allows you to manipulate collections of pydantic models

20 Dec 26, 2022

NumPy aware dynamic Python compiler using LLVM

Numba A Just-In-Time Compiler for Numerical Functions in Python Numba is an open source, NumPy-aware optimizing compiler for Python sponsored by Anaco

8.2k Jan 07, 2023