Full-featured Decision Trees and Random Forests learner.

Overview

CID3

Latest Release License Github All Releases GitHub Follow Twitter Follow

This is a full-featured Decision Trees and Random Forests learner. It can save trees or forests to disk for later use. It is possible to query trees and Random Forests and to fill out an unlabeled file with the predicted classes. Documentation is not yet available, although the program options can be shown with command:

% java -jar cid3.jar -h

usage: java -jar cid3.jar
 -a,--analysis <name>    show causal analysis report
 -c,--criteria <name>    input criteria: c[Certainty], e[Entropy], g[Gini]
 -f,--file <name>        input file
 -h,--help               print this message
 -o,--output <name>      output file
 -p,--partition          partition train/test data
 -q,--query <type>       query model, enter: t[Tree] or r[Random forest]
 -r,--forest <amount>    create random forest, enter # of trees
 -s,--save               save tree/random forest
 -t,--threads <amount>   maximum number of threads (default is 500)
 -v,--validation         create 10-fold cross-validation
 -ver,--version          version

List of features

  • It uses a new Certainty formula as splitting criteria.
  • Provides causal analysis report, which shows how some attribute values cause a particular classification.
  • Creates full trees, showing error rates for train and test data, attribute importance, causes and false positives/negatives.
  • If no test data is provided, it can split the train dataset in 80% for training and 20% for testing.
  • Creates random forests, showing error rates for train and test data, attribute importance, causes and false positives/negatives. Random forests are created in parallel, so it is very fast.
  • Creates 10 Fold Cross-Validation for trees and random forests, showing error rates, mean and Standard Error and false positives/negatives. Cross-Validation folds are created in parallel.
  • Saves trees and random forests to disk in a compressed file. (E.g. model.tree, model.forest)
  • Query trees and random forest from saved files. Queries can contain missing values, just enter the character: “?”.
  • Make predictions and fill out cases files with those predictions, either from single trees or random forests.
  • Missing values imputation for train and test data is implemented. Continuous attributes are imputed as the mean value. Discrete attributes are imputed as MODE, which selects the value that is most frequent.
  • Ignoring attributes is implemented. In the .names file just set the attribute type as: ignore.
  • Three different splitting criteria can be used: Certainty, Entropy and Gini. If no criteria is invoked then Certainty will be used.

Example run with titanic dataset

[email protected] datasets % java -jar cid3.jar -f titanic

CID3 [Version 1.1]              Saturday October 30, 2021 06:34:11 AM
------------------
[ ✓ ] Read data: 891 cases for training. (10 attributes)
[ ✓ ] Decision tree created.

Rules: 276
Nodes: 514

Importance Cause   Attribute Name
---------- -----   --------------
      0.57   yes ············ Sex
      0.36   yes ········· Pclass
      0.30   yes ··········· Fare
      0.28   yes ······· Embarked
      0.27   yes ·········· SibSp
      0.26   yes ·········· Parch
      0.23    no ············ Age


[==== TRAIN DATA ====] 

Correct guesses:  875
Incorrect guesses: 16 (1.8%)

# Of Cases  False Pos  False Neg   Class
----------  ---------  ---------   -----
       549         14          2 ····· 0
       342          2         14 ····· 1

Time: 0:00:00

Requirements

CID3 requires JDK 15 or higher.

The data format is similar to that of C4.5 and C5.0. The data file format is CSV, and it could be split in two separated files, like: titanic.data and titanic.test. The class attribute column must be the last column of the file. The other necessary file is the "names" file, which should be named like: titanic.names, and it contains the names and types of the attributes. The first line is the class attribute possible values. This line could be left empty with just a dot(.) Below is an example of the titanic.names file:

0,1.  
PassengerId: ignore.  
Pclass: 1,2,3.  
Sex : male,female.  
Age: continuous.  
SibSp: discrete.  
Parch: discrete.  
Ticket: ignore.  
Fare: continuous.  
Cabin: ignore.  
Embarked: discrete.  

Example of causal analysis

% java -jar cid3.jar -f adult -a education

From this example we can see that attribute "education" is a cause, which is based on the certainty-raising inequality. Once we know that it is a cause we then compare the causal certainties of its values. When it's value is "Doctorate" it causes the earnings to be greater than $50,000, with a probability of 0.73. A paper will soon be published with all the formulas used to calculate the Certainty for splitting the nodes and the certainty-raising inequality, used for causal analysis.

Importance Cause   Attribute Name
---------- -----   --------------
      0.56   yes ······ education

Report of causal certainties
----------------------------

[ Attribute: education ]

    1st-4th --> <=50K  (0.97)

    5th-6th --> <=50K  (0.95)

    7th-8th --> <=50K  (0.94)

    9th --> <=50K  (0.95)

    10th --> <=50K  (0.94)

    11th --> <=50K  (0.95)

    12th --> <=50K  (0.93)

    Assoc-acdm --> <=50K  (0.74)

    Assoc-voc --> <=50K  (0.75)

    Bachelors --> Non cause.

    Doctorate --> >50K  (0.73)

    HS-grad --> <=50K  (0.84)

    Masters --> >50K  (0.55)

    Preschool --> <=50K  (0.99)

    Prof-school --> >50K  (0.74)

    Some-college --> <=50K  (0.81)
You might also like...
Full body anonymization - Realistic Full-Body Anonymization with Surface-Guided GANs
Full body anonymization - Realistic Full-Body Anonymization with Surface-Guided GANs

Realistic Full-Body Anonymization with Surface-Guided GANs This is the official

Random-Afg - Afghanistan Random Old Idz Cloner Tools
Random-Afg - Afghanistan Random Old Idz Cloner Tools

AFGHANISTAN RANDOM OLD IDZ CLONER TOOLS Install $ apt update $ apt upgrade $ apt

ElegantRL is featured with lightweight, efficient and stable, for researchers and practitioners.
ElegantRL is featured with lightweight, efficient and stable, for researchers and practitioners.

Lightweight, efficient and stable implementations of deep reinforcement learning algorithms using PyTorch. 🔥

This program writes christmas wish programmatically. It is using turtle as a pen pointer draw christmas trees and stars.

Introduction This is a simple program is written in python and turtle library. The objective of this program is to wish merry Christmas programmatical

Simulate genealogical trees and genomic sequence data using population genetic models

msprime msprime is a population genetics simulator based on tskit. Msprime can simulate random ancestral histories for a sample of individuals (consis

TreeSubstitutionCipher - Encryption system based on trees and substitution

Tree Substitution Cipher Generation Algorithm: Generate random tree. Tree nodes

Python implementation of cover trees, near-drop-in replacement for scipy.spatial.kdtree

This is a Python implementation of cover trees, a data structure for finding nearest neighbors in a general metric space (e.g., a 3D box with periodic

Code for: Gradient-based Hierarchical Clustering using Continuous Representations of Trees in Hyperbolic Space. Nicholas Monath, Manzil Zaheer, Daniel Silva, Andrew McCallum, Amr Ahmed. KDD 2019.

gHHC Code for: Gradient-based Hierarchical Clustering using Continuous Representations of Trees in Hyperbolic Space. Nicholas Monath, Manzil Zaheer, D

A python library to build Model Trees with Linear Models at the leaves.
A python library to build Model Trees with Linear Models at the leaves.

A python library to build Model Trees with Linear Models at the leaves.

Releases(v1.2.4)
Owner
Alejandro Penate-Diaz
Machine learner, web developer, scientist and photo edition enthusiast.
Alejandro Penate-Diaz
Domain Adaptation with Invariant RepresentationLearning: What Transformations to Learn?

Domain Adaptation with Invariant RepresentationLearning: What Transformations to Learn? Repository Structure: DSAN |└───amazon |    └── dataset (Amazo

DMIRLAB 17 Jan 04, 2023
Source code for the Paper: CombOptNet: Fit the Right NP-Hard Problem by Learning Integer Programming Constraints}

CombOptNet: Fit the Right NP-Hard Problem by Learning Integer Programming Constraints Installation Run pipenv install (at your own risk with --skip-lo

Autonomous Learning Group 65 Dec 27, 2022
Official PyTorch implementation of Synergies Between Affordance and Geometry: 6-DoF Grasp Detection via Implicit Representations

Synergies Between Affordance and Geometry: 6-DoF Grasp Detection via Implicit Representations Zhenyu Jiang, Yifeng Zhu, Maxwell Svetlik, Kuan Fang, Yu

UT-Austin Robot Perception and Learning Lab 63 Jan 03, 2023
Reproduction process of AlexNet

PaddlePaddle论文复现杂谈 背景 注:该repo基于PaddlePaddle,对AlexNet进行复现。时间仓促,难免有所疏漏,如果问题或者想法,欢迎随时提issue一块交流。 飞桨论文复现赛地址:https://aistudio.baidu.com/aistudio/competitio

19 Nov 29, 2022
YOLOv5 🚀 is a family of object detection architectures and models pretrained on the COCO dataset

YOLOv5 🚀 is a family of object detection architectures and models pretrained on the COCO dataset, and represents Ultralytics open-source research int

阿才 73 Dec 16, 2022
Bio-Computing Platform Featuring Large-Scale Representation Learning and Multi-Task Deep Learning “螺旋桨”生物计算工具集

English | 简体中文 Latest News 2021.10.25 Paper "Docking-based Virtual Screening with Multi-Task Learning" is accepted by BIBM 2021. 2021.07.29 PaddleHeli

633 Jan 04, 2023
ZEBRA: Zero Evidence Biometric Recognition Assessment

ZEBRA: Zero Evidence Biometric Recognition Assessment license: LGPLv3 - please reference our paper version: 2020-06-11 author: Andreas Nautsch (EURECO

Voice Privacy Challenge 2 Dec 12, 2021
Mask-invariant Face Recognition through Template-level Knowledge Distillation

Mask-invariant Face Recognition through Template-level Knowledge Distillation This is the official repository of "Mask-invariant Face Recognition thro

Fadi Boutros 35 Dec 06, 2022
Nb workflows - A workflow platform which allows you to run parameterized notebooks programmatically

NB Workflows Description If SQL is a lingua franca for querying data, Jupyter sh

Xavier Petit 6 Aug 18, 2022
Awesome Weak-Shot Learning

Awesome Weak-Shot Learning In weak-shot learning, all categories are split into non-overlapped base categories and novel categories, in which base cat

BCMI 162 Dec 30, 2022
Neural network graphs and training metrics for PyTorch, Tensorflow, and Keras.

HiddenLayer A lightweight library for neural network graphs and training metrics for PyTorch, Tensorflow, and Keras. HiddenLayer is simple, easy to ex

Waleed 1.7k Dec 31, 2022
基于PaddleClas实现垃圾分类,并转换为inference格式用PaddleHub服务端部署

百度网盘链接及提取码: 链接:https://pan.baidu.com/s/1HKpgakNx1hNlOuZJuW6T1w 提取码:wylx 一个垃圾分类项目带你玩转飞桨多个产品(1) 基于PaddleClas实现垃圾分类,导出inference模型并利用PaddleHub Serving进行服务

thomas-yanxin 22 Jul 12, 2022
A strongly-typed genetic programming framework for Python

monkeys "If an army of monkeys were strumming on typewriters they might write all the books in the British Museum." monkeys is a framework designed to

H. Chase Stevens 115 Nov 27, 2022
A PyTorch Implementation of FaceBoxes

FaceBoxes in PyTorch By Zisian Wong, Shifeng Zhang A PyTorch implementation of FaceBoxes: A CPU Real-time Face Detector with High Accuracy. The offici

Zi Sian Wong 797 Dec 17, 2022
Selecting Parallel In-domain Sentences for Neural Machine Translation Using Monolingual Texts

DataSelection-NMT Selecting Parallel In-domain Sentences for Neural Machine Translation Using Monolingual Texts Quick update: The paper got accepted o

Javad Pourmostafa 6 Jan 07, 2023
In the case of your data having only 1 channel while want to use timm models

timm_custom Description In the case of your data having only 1 channel while want to use timm models (with or without pretrained weights), run the fol

2 Nov 26, 2021
Users can free try their models on SIDD dataset based on this code

SIDD benchmark 1 Train python train.py If you want to train your network, just modify the yaml in the options folder. 2 Validation python validation.p

Yuzhi ZHAO 2 May 20, 2022
SpineAI Bilsky Grading With Python

SpineAI-Bilsky-Grading SpineAI Paper with Code 📫 Contact Address correspondence to J.T.P.D.H. (e-mail: james_hallinan AT nuhs.edu.sg) Disclaimer This

<a href=[email protected]"> 2 Dec 16, 2021
Accurate Phylogenetic Inference with Symmetry-Preserving Neural Networks

Accurate Phylogenetic Inference with a Symmetry-preserving Neural Network Model Claudia Solis-Lemus Shengwen Yang Leonardo Zepeda-Núñez This repositor

Leonardo Zepeda-Núñez 2 Feb 11, 2022
Densely Connected Search Space for More Flexible Neural Architecture Search (CVPR2020)

DenseNAS The code of the CVPR2020 paper Densely Connected Search Space for More Flexible Neural Architecture Search. Neural architecture search (NAS)

Jamin Fong 291 Nov 18, 2022