Automatically Visualize any dataset, any size with a single line of code. Created by Ram Seshadri. Collaborators Welcome. Permission Granted upon Request.

Overview

AutoViz

banner

Pepy Downloads Pepy Downloads per week Pepy Downloads per month standard-readme compliant Python Versions PyPI Version PyPI License

Automatically Visualize any dataset, any size with a single line of code.

AutoViz performs automatic visualization of any dataset with one line. Give any input file (CSV, txt or json) and AutoViz will visualize it.

Table of Contents

Install

Prerequsites

To clone AutoViz, it's better to create a new environment, and install the required dependencies:

To install from PyPi:

conda create -n <your_env_name> python=3.7 anaconda
conda activate <your_env_name> # ON WINDOWS: `source activate <your_env_name>`
pip install autoviz

To install from source:

cd <AutoViz_Destination>
git clone [email protected]:AutoViML/AutoViz.git
# or download and unzip https://github.com/AutoViML/AutoViz/archive/master.zip
conda create -n <your_env_name> python=3.7 anaconda
conda activate <your_env_name> # ON WINDOWS: `source activate <your_env_name>`
cd AutoViz
pip install -r requirements.txt

Usage

Read this Medium article to know how to use AutoViz.

In the AutoViz directory, open a Jupyter Notebook and use this line to instantiate the library

from autoviz.AutoViz_Class import AutoViz_Class

AV = AutoViz_Class()

Load a dataset (any CSV or text file) into a Pandas dataframe or give the name of the path and filename you want to visualize. If you don't have a filename, you can simply assign the filename argument "" (empty string).

Call AutoViz using the filename (or dataframe) along with the separator and the name of the target variable in the input. AutoViz will do the rest. You will see charts and plots on your screen.

filename = ""
sep = ","
dft = AV.AutoViz(
    filename,
    sep=",",
    depVar="",
    dfte=None,
    header=0,
    verbose=0,
    lowess=False,
    chart_format="svg",
    max_rows_analyzed=150000,
    max_cols_analyzed=30,
)

AV.AutoViz is the main plotting function in AV.

Notes:

  • AutoViz will visualize any sized file using a statistically valid sample.
  • COMMA is assumed as default separator in file. But you can change it.
  • Assumes first row as header in file but you can change it.
  • verbose option
    • if 0, display minimal information but displays charts on your notebook
    • if 1, print extra information on the notebook and also display charts
    • if 2, will not display any charts, it will simply save them in your local machine under AutoViz_Plots directory

API

Arguments

  • filename - Make sure that you give filename as empty string ("") if there is no filename associated with this data and you want to use a dataframe, then use dfte to give the name of the dataframe. Otherwise, fill in the file name and leave dfte as empty string. Only one of these two is needed to load the data set.
  • sep - this is the separator in the file. It can be comma, semi-colon or tab or any value that you see in your file that separates each column.
  • depVar - target variable in your dataset. You can leave it as empty string if you don't have a target variable in your data.
  • dfte - this is the input dataframe in case you want to load a pandas dataframe to plot charts. In that case, leave filename as an empty string.
  • header - the row number of the header row in your file. If it is the first row, then this must be zero.
  • verbose - it has 3 acceptable values: 0, 1 or 2. With zero, you get all charts but limited info. With 1 you get all charts and more info. With 2, you will not see any charts but they will be quietly generated and save in your local current directory under the AutoViz_Plots directory which will be created. Make sure you delete this folder periodically, otherwise, you will have lots of charts saved here if you used verbose=2 option a lot.
  • lowess - this option is very nice for small datasets where you can see regression lines for each pair of continuous variable against the target variable. Don't use this for large data sets (that is over 100,000 rows)
  • chart_format - this can be SVG, PNG or JPG. You will get charts generated and saved in this format if you used verbose=2 option. Very useful for generating charts and using them later.
  • max_rows_analyzed - limits the max number of rows that is used to display charts. If you have a very large data set with millions of rows, then use this option to limit the amount of time it takes to generate charts. We will take a statistically valid sample.
  • max_cols_analyzed - limits the number of continuous vars that can be analyzed

Maintainers

Contributing

See the contributing file!

PRs accepted.

License

Apache License, Version 2.0

DISCLAIMER

This project is not an official Google project. It is not supported by Google and Google specifically disclaims all warranties as to its quality, merchantability, or fitness for a particular purpose.

Owner
AutoViz and Auto_ViML
Automated Machine Learning: Build Variant Interpretable Machine Learning models. Project Created by Ram Seshadri.
AutoViz and Auto_ViML
D-Analyst : High Performance Visualization Tool

D-Analyst : High Performance Visualization Tool D-Analyst is a high performance data visualization built with python and based on OpenGL. It allows to

4 Apr 14, 2022
Cartopy - a cartographic python library with matplotlib support

Cartopy is a Python package designed to make drawing maps for data analysis and visualisation easy. Table of contents Overview Get in touch License an

1.2k Jan 01, 2023
A Python Library for Self Organizing Map (SOM)

SOMPY A Python Library for Self Organizing Map (SOM) As much as possible, the structure of SOM is similar to somtoolbox in Matlab. It has the followin

Vahid Moosavi 497 Dec 29, 2022
Learning Convolutional Neural Networks with Interactive Visualization.

CNN Explainer An interactive visualization system designed to help non-experts learn about Convolutional Neural Networks (CNNs) For more information,

Polo Club of Data Science 6.3k Jan 01, 2023
coordinate to draw the nimbus logo on the graffitiwall

This is a community effort to draw the nimbus logo on beaconcha.in's graffitiwall. get started clone repo with git clone https://github.com/tennisbowl

4 Apr 04, 2022
ipyvizzu - Jupyter notebook integration of Vizzu

ipyvizzu - Jupyter notebook integration of Vizzu. Tutorial · Examples · Repository About The Project ipyvizzu is the Jupyter Notebook integration of V

Vizzu 729 Jan 08, 2023
An application that allows you to design and test your own stock trading algorithms in an attempt to beat the market.

StockBot is a Python application for designing and testing your own daily stock trading algorithms. Installation Use the

Ryan Cullen 280 Dec 19, 2022
A small tool to test and visualize protein embeddings and amino acid proportions.

polyprotein_stats A small tool to test and visualize protein embeddings and amino acid proportions. Currently deployed on streamlit.io. Given a set of

2 Jan 07, 2023
This repository contains a streaming Dataflow pipeline written in Python with Apache Beam, reading data from PubSub.

Sample streaming Dataflow pipeline written in Python This repository contains a streaming Dataflow pipeline written in Python with Apache Beam, readin

Israel Herraiz 9 Mar 18, 2022
This is my favourite function - the Rastrigin function.

This is my favourite function - the Rastrigin function. What sparked my curiosity and interest in the function was its complexity in terms of many local optimum points, which makes it particularly in

1 Dec 27, 2021
Geocoding library for Python.

geopy geopy is a Python client for several popular geocoding web services. geopy makes it easy for Python developers to locate the coordinates of addr

geopy 3.8k Jan 02, 2023
PyPassword is a simple follow up to PyPassphrase

PyPassword PyPassword is a simple follow up to PyPassphrase. After finishing that project it occured to me that while some may wish to use that option

Scotty 2 Jan 22, 2022
visualize_ML is a python package made to visualize some of the steps involved while dealing with a Machine Learning problem

visualize_ML visualize_ML is a python package made to visualize some of the steps involved while dealing with a Machine Learning problem. It is build

Ayush Singh 164 Dec 12, 2022
基于python爬虫爬取COVID-19爆发开始至今全球疫情数据并利用Echarts对数据进行分析与多样化展示。

COVID-19-Epidemic-Map 基于python爬虫爬取COVID-19爆发开始至今全球疫情数据并利用Echarts对数据进行分析与多样化展示。 觉得项目还不错的话欢迎给一个star! 项目的源码可以正常运行,各个库的版本、数据库的建表语句、运行过程中遇到的坑以及解决方式在笔记.md中都

31 Dec 15, 2022
Generating interfaces(CLI, Qt GUI, Dash web app) from a Python function.

oneFace is a Python library for automatically generating multiple interfaces(CLI, GUI, WebGUI) from a callable Python object. oneFace is an easy way t

NaNg 31 Oct 21, 2022
A simple code for plotting figure, colorbar, and cropping with python

Python Plotting Tools This repository provides a python code to generate figures (e.g., curves and barcharts) that can be used in the paper to show th

Guanying Chen 134 Jan 02, 2023
eoplatform is a Python package that aims to simplify Remote Sensing Earth Observation by providing actionable information on a wide swath of RS platforms and provide a simple API for downloading and visualizing RS imagery

An Earth Observation Platform Earth Observation made easy. Report Bug | Request Feature About eoplatform is a Python package that aims to simplify Rem

Matthew Tralka 4 Aug 11, 2022
finds grocery stores and stuff next to route (gpx)

Route-Report Route report is a command-line utility that can be used to locate points-of-interest near your planned route (gpx). The results are based

Clemens Mosig 5 Oct 10, 2022
Make sankey, alluvial and sankey bump plots in ggplot

The goal of ggsankey is to make beautiful sankey, alluvial and sankey bump plots in ggplot2

David Sjoberg 156 Jan 03, 2023
Data visualization electromagnetic spectrum

Datenvisualisierung-Elektromagnetischen-Spektrum Anhand des Moduls matplotlib sollen die Daten des elektromagnetischen Spektrums dargestellt werden. D

Pulsar 1 Sep 01, 2022