The audio-video synchronization of MKV Container Format is exploited to achieve data hiding

Overview

1.0 Data Hiding in MKV Container Format

1.1 Brief Description

The audio-video synchronization of MKV Container Format is exploited to achieve data hiding, where the hidden data can be utilized for various management purposes, including hyper-linking, annotation, and authentication

1.2 Video Demonstration @ YouTube

Data Hiding (Hidden Watermark) in MKV Container Format

1.3 Requirements

  • Linux (not tested anywhere else)
  • Python
  • .MKV reader (like VLC player)
  • All the files are required:
    • .MKV video (./VideoForTesting/2mb.mkv)
    • ./convert_xml2mkv.py
    • ./parse_and_convert_mkv2xml.py
    • ./find_data.py
    • ./hide_data.py
    • ./find
    • ./hide
  • Ensure that you have all the permission to access these files. Run the following command: chmod +x convert_xml2mkv.py && chmod +x find_data.py && chmod +x hide_data.py && chmod +x parse_and_convert_mkv2xml.py
  • If the command above doesn't work and Linux prevents your access you may use the following command on any of the affected files: chmod +x filename.extension

1.4 How To Run Data Embedding Process

Note: for screenshots refer to the end of the ./Maxim_Zaika_Data_Hiding_in_MKV_Container.pdf file

  1. Ensure 1.3 Requirements are fulfilled
  2. Run ./hide from your terminal within the folder where files are located.
  3. Enter the name of the .MKV container: 2mb.mkv.
  4. Enter the data that needs to be hidden: 'example'. Write it down!
  5. Enter the SECRET KEY that will be used to decrypt your data in the data detecting process: 'encryption key'. Write it down!
  6. Enter the timecode where data will be saved to: 10.523 or type 'help' to display all the available timecodes. Write it down!
  7. File modified_mkv.mkv should now be created that stores your hidden data.

Note: do not lose text of the hidden data, SECRET KEY, and the timecode. Otherwise, you won't be able to verify it later.

1.5 How To Run Data Detecting Process

  1. Ensure 1.3 Requirements are fulfilled
  2. Run ./find from your terminal within the folder where files are located.
  3. Enter the file name: modified_mkv.mkv.
  4. Enter the text of your hidden data: 'example'.
  5. Enter the SECRET KEY used: 'encryption key'.
  6. Enter the timecode used: 10.523.
  7. If the data is matching then it will show a success.

2.0 Data Embedding Process

2.1 Software Architecture of Data Embedding

DataEmbeddingDesign

2.2 Data Embedding Design

DataEmbeddingDesign

2.3 Data Embedding Pseudocode

Note: this is incomplete representation.

Function main {
  Set a_word -> “word that needs to be written in”
  Set encryption_key -> “key used for the encryption”
  If (length of encryption_key) < (length of a_word) {
	  Set encryption_key -> same length as a_word
  }
  Set a_word -> convert to ascii
  Set encryption_key -> convert to ascii
  Set ascii_a_word -> convert to hexadecimal
  Set ascii_encryption_key -> convert to hexadecimal
  If (length of ascii_encryption_key) < (length of ascii_a_word) { 
	  Set ascii_encryption_key = -> same length as ascii_a_word
  }
  Encrypt a_word(ascii_a_word, ascii_encryption_key, a_word) // encrypt ascii word
                                                             // using original word 
  Convert encrypted word to hexadecimal // because MKV parser accepts hexadecimals
                                        // inside the cluster’s timecode
  Timecodes = [] // read the XML file and identify the timecodes
  Set input_timecode -> “input timecode here”
  Call function embed data (filename, input_timecode, encrypted_word_in_hexadecimal_format)
}

Function embed data {
	Loop through the file {
		Identify the location of the timecode {
			Identify the location of the data inside the cluster’s timecode {
				Write-in the data
			}
		} else not found timecode {
			Try again
		}
	}
}

3.0 Data Detecting Process

3.1 Software Architecture of Data Detecting

DataEmbeddingDesign

3.2 Data Detecting Design

DataEmbeddingDesign

3.3 Data Embedding Pseudocode

Note: this is incomplete representation.

Function detect data {
	Set hexadecimal_word -> ‘the encrypted word’ \\ basically the identical process like in data 
						                                    \\ hiding process
	Loop through the file {
		Loop each line of the file {
			Identify the location of the timecode {
				Identify the data inside the cluster’s timecode {
					Read through the line ignoring first 6 characters // format
				}
				If there is at least 1 miss-match {
					Return error
				} else fully matched {
					Return success
				}
			}
		}
	}
}

4.0 Results

Description Explanation
Limited Number of Cluster's Timecodes Modifying more than two cluster’s timecodes cause slight video distortion; however, modifying even more timecodes causes both video and audio distortions.
Embedding Capacity Passed test of up to 2,500 characters. Assumption is that 2,500 characters should be more than enough for the user.
File Size Increment Original file: 2.1 MB (2,097,641 bytes) -> Modified File (2,500 characters): 2.1 MB (2,122,058 bytes). Increased by 23,417 bytes (1.00%).

5.0 Additional Information

For more information (like testing and background information), refer to the .PDF file attached to this repository: ./Maxim_Zaika_Data_Hiding_in_MKV_Container.pdf

6.0 Credits

It would not be possible to complete this project without MKV > XML > MKV parser created by Vitaly "_Vi" Shukela: https://github.com/vi/mkvparse.

Parser is rewritten for my own needs (for better understanding) and included in this repository to ensure that there is no mismatch with Vitaly's version. If you are interested in the parser, please, refer to his repository provided above. I do not take any credit for its creation.

Owner
Maxim Zaika
Maxim Zaika
Car Parking Tracker Using OpenCv

Car Parking Vacancy Tracker Using OpenCv I used basic image processing methods i

Adwait Kelkar 30 Dec 03, 2022
Anomaly detection in multi-agent trajectories: Code for training, evaluation and the OpenAI highway simulation.

Anomaly Detection in Multi-Agent Trajectories for Automated Driving This is the official project page including the paper, code, simulation, baseline

12 Dec 02, 2022
Example scripts for the detection of lanes using the ultra fast lane detection model in Tensorflow Lite.

TFlite Ultra Fast Lane Detection Inference Example scripts for the detection of lanes using the ultra fast lane detection model in Tensorflow Lite. So

Ibai Gorordo 12 Aug 27, 2022
Multiview Dataset Toolkit

Multiview Dataset Toolkit Using multi-view cameras is a natural way to obtain a complete point cloud. However, there is to date only one multi-view 3D

11 Dec 22, 2022
"Graph Neural Controlled Differential Equations for Traffic Forecasting", AAAI 2022

Graph Neural Controlled Differential Equations for Traffic Forecasting Setup Python environment for STG-NCDE Install python environment $ conda env cr

Jeongwhan Choi 55 Dec 28, 2022
Source code of our work: "Benchmarking Deep Models for Salient Object Detection"

SALOD Source code of our work: "Benchmarking Deep Models for Salient Object Detection". In this works, we propose a new benchmark for SALient Object D

22 Dec 30, 2022
Revitalizing CNN Attention via Transformers in Self-Supervised Visual Representation Learning

Revitalizing CNN Attention via Transformers in Self-Supervised Visual Representation Learning This repository is the official implementation of CARE.

ChongjianGE 89 Dec 02, 2022
MM1 and MMC Queue Simulation using python - Results and parameters in excel and csv files

implementation of MM1 and MMC Queue on randomly generated data and evaluate simulation results then compare with analytical results and draw a plot curve for them, simulate some integrals and compare

Mohamadreza Rezaei 1 Jan 19, 2022
Repo for "Event-Stream Representation for Human Gaits Identification Using Deep Neural Networks"

Summary This is the code for the paper Event-Stream Representation for Human Gaits Identification Using Deep Neural Networks by Yanxiang Wang, Xian Zh

zhangxian 54 Jan 03, 2023
Official Pytorch implementation of Online Continual Learning on Class Incremental Blurry Task Configuration with Anytime Inference (ICLR 2022)

The Official Implementation of CLIB (Continual Learning for i-Blurry) Online Continual Learning on Class Incremental Blurry Task Configuration with An

NAVER AI 34 Oct 26, 2022
A Next Generation ConvNet by FaceBookResearch Implementation in PyTorch(Original) and TensorFlow.

ConvNeXt A Next Generation ConvNet by FaceBookResearch Implementation in PyTorch(Original) and TensorFlow. A FacebookResearch Implementation on A Conv

Raghvender 2 Feb 14, 2022
Code for the CVPR 2021 paper: Understanding Failures of Deep Networks via Robust Feature Extraction

Welcome to Barlow Barlow is a tool for identifying the failure modes for a given neural network. To achieve this, Barlow first creates a group of imag

Sahil Singla 33 Dec 05, 2022
A Comprehensive Study on Learning-Based PE Malware Family Classification Methods

A Comprehensive Study on Learning-Based PE Malware Family Classification Methods Datasets Because of copyright issues, both the MalwareBazaar dataset

8 Oct 21, 2022
Acoustic mosquito detection code with Bayesian Neural Networks

HumBugDB Acoustic mosquito detection with Bayesian Neural Networks. Extract audio or features from our large-scale dataset on Zenodo. This repository

31 Nov 28, 2022
Codebase to experiment with a hybrid Transformer that combines conditional sequence generation with regression

Regression Transformer Codebase to experiment with a hybrid Transformer that combines conditional sequence generation with regression . Development se

International Business Machines 27 Jan 05, 2023
Official repository of the paper 'Essentials for Class Incremental Learning'

Essentials for Class Incremental Learning Official repository of the paper 'Essentials for Class Incremental Learning' This Pytorch repository contain

33 Nov 27, 2022
Defending against Model Stealing via Verifying Embedded External Features

Defending against Model Stealing Attacks via Verifying Embedded External Features This is the official implementation of our paper Defending against M

20 Dec 30, 2022
Using pretrained language models for biomedical knowledge graph completion.

LMs for biomedical KG completion This repository contains code to run the experiments described in: Scientific Language Models for Biomedical Knowledg

Rahul Nadkarni 41 Nov 30, 2022
Code for paper "Learning to Reweight Examples for Robust Deep Learning"

learning-to-reweight-examples Code for paper Learning to Reweight Examples for Robust Deep Learning. [arxiv] Environment We tested the code on tensorf

Uber Research 261 Jan 01, 2023
Py-FEAT: Python Facial Expression Analysis Toolbox

Py-FEAT is a suite for facial expressions (FEX) research written in Python. This package includes tools to detect faces, extract emotional facial expressions (e.g., happiness, sadness, anger), facial

Computational Social Affective Neuroscience Laboratory 147 Jan 06, 2023