Searches through git repositories for high entropy strings and secrets, digging deep into commit history

Overview

truffleHog

codecov

Searches through git repositories for secrets, digging deep into commit history and branches. This is effective at finding secrets accidentally committed.

Join The Slack

Have questions? Feedback? Jump in slack and hang out with me

https://join.slack.com/t/trufflehog-community/shared_invite/zt-pw2qbi43-Aa86hkiimstfdKH9UCpPzQ

NEW

truffleHog previously functioned by running entropy checks on git diffs. This functionality still exists, but high signal regex checks have been added, and the ability to suppress entropy checking has also been added.

truffleHog --regex --entropy=False https://github.com/dxa4481/truffleHog.git

or

truffleHog file:///user/dxa4481/codeprojects/truffleHog/

With the --include_paths and --exclude_paths options, it is also possible to limit scanning to a subset of objects in the Git history by defining regular expressions (one per line) in a file to match the targeted object paths. To illustrate, see the example include and exclude files below:

include-patterns.txt:

src/
# lines beginning with "#" are treated as comments and are ignored
gradle/
# regexes must match the entire path, but can use python's regex syntax for
# case-insensitive matching and other advanced options
(?i).*\.(properties|conf|ini|txt|y(a)?ml)$
(.*/)?id_[rd]sa$

exclude-patterns.txt:

(.*/)?\.classpath$
.*\.jmx$
(.*/)?test/(.*/)?resources/

These filter files could then be applied by:

trufflehog --include_paths include-patterns.txt --exclude_paths exclude-patterns.txt file://path/to/my/repo.git

With these filters, issues found in files in the root-level src directory would be reported, unless they had the .classpath or .jmx extension, or if they were found in the src/test/dev/resources/ directory, for example. Additional usage information is provided when calling trufflehog with the -h or --help options.

These features help cut down on noise, and makes the tool easier to shove into a devops pipeline.

Example

Install

pip install truffleHog

Customizing

Custom regexes can be added with the following flag --rules /path/to/rules. This should be a json file of the following format:

{
    "RSA private key": "-----BEGIN EC PRIVATE KEY-----"
}

Things like subdomain enumeration, s3 bucket detection, and other useful regexes highly custom to the situation can be added.

Feel free to also contribute high signal regexes upstream that you think will benefit the community. Things like Azure keys, Twilio keys, Google Compute keys, are welcome, provided a high signal regex can be constructed.

trufflehog's base rule set sources from https://github.com/dxa4481/truffleHogRegexes/blob/master/truffleHogRegexes/regexes.json

To explicitly allow particular secrets (e.g. self-signed keys used only for local testing) you can provide an allow list --allow /path/to/allow in the following format:

{
    "local self signed test key": "-----BEGIN EC PRIVATE KEY-----\nfoobar123\n-----END EC PRIVATE KEY-----",
    "git cherry pick SHAs": "regex:Cherry picked from .*",
}

Note that values beginning with regex: will be used as regular expressions. Values without this will be literal, with some automatic conversions (e.g. flexible newlines).

How it works

This module will go through the entire commit history of each branch, and check each diff from each commit, and check for secrets. This is both by regex and by entropy. For entropy checks, truffleHog will evaluate the shannon entropy for both the base64 char set and hexidecimal char set for every blob of text greater than 20 characters comprised of those character sets in each diff. If at any point a high entropy string >20 characters is detected, it will print to the screen.

Help

usage: trufflehog [-h] [--json] [--regex] [--rules RULES] [--allow ALLOW]
                  [--entropy DO_ENTROPY] [--since_commit SINCE_COMMIT]
                  [--max_depth MAX_DEPTH]
                  git_url

Find secrets hidden in the depths of git.

positional arguments:
  git_url               URL for secret searching

optional arguments:
  -h, --help            show this help message and exit
  --json                Output in JSON
  --regex               Enable high signal regex checks
  --rules RULES         Ignore default regexes and source from json list file
  --allow ALLOW         Explicitly allow regexes from json list file
  --entropy DO_ENTROPY  Enable entropy checks
  --since_commit SINCE_COMMIT
                        Only scan from a given commit hash
  --branch BRANCH       Scans only the selected branch
  --max_depth MAX_DEPTH
                        The max commit depth to go back when searching for
                        secrets
  -i INCLUDE_PATHS_FILE, --include_paths INCLUDE_PATHS_FILE
                        File with regular expressions (one per line), at least
                        one of which must match a Git object path in order for
                        it to be scanned; lines starting with "#" are treated
                        as comments and are ignored. If empty or not provided
                        (default), all Git object paths are included unless
                        otherwise excluded via the --exclude_paths option.
  -x EXCLUDE_PATHS_FILE, --exclude_paths EXCLUDE_PATHS_FILE
                        File with regular expressions (one per line), none of
                        which may match a Git object path in order for it to
                        be scanned; lines starting with "#" are treated as
                        comments and are ignored. If empty or not provided
                        (default), no Git object paths are excluded unless
                        effectively excluded via the --include_paths option.

Running with Docker

First, enter the directory containing the git repository

cd /path/to/git

To launch the trufflehog with the docker image, run the following"

docker run --rm -v "$(pwd):/proj" dxa4481/trufflehog file:///proj

-v mounts the current working dir (pwd) to the /proj dir in the Docker container

file:///proj references that very same /proj dir in the container (which is also set as the default working dir in the Dockerfile)

Wishlist

  • A way to detect and not scan binary diffs
  • Don't rescan diffs if already looked at in another branch
  • A since commit X feature
  • Print the file affected
Comments
  • fix #8 - add `--include` and `--exclude` options

    fix #8 - add `--include` and `--exclude` options

    Fixes issue #8 by adding --include_paths and --exclude_paths options that allow the user to limit scanning to a subset of objects in the Git history by defining regular expressions (one per line) in a file to match the targeted object paths.

    If provided, the --include_paths option should point to a file with regular expressions (one per line), at least one of which must match a Git object path in order for it to be scanned. If empty or not provided (default), all Git object paths are included (unless otherwise excluded via the --exclude_paths option).

    Likewise, the --exclude_paths option, when provided, should point to a file with regular expressions, none of which may match a Git object path in order for it to be scanned. If empty or not provided (default), no Git object paths are excluded (unless effectively excluded via the --include_paths option).

    In either file, lines starting with "#" are treated as comments and are ignored.

    opened by milo-minderbinder 22
  • fix --since_commit parameter

    fix --since_commit parameter

    Hi, how can I contribute to this project? I was running truffleHog and using the --since_commit parameter, however it was buggy and did not work as expected. I made a very small change, and it worked as expected. Do you accept PRs or should I just tell you the change so you can verify it?

    opened by fahrishb 18
  • The regex functionality is not working as expected

    The regex functionality is not working as expected

    I git cloned the truffleHog repository. Changed my regexChecks.py file to look like below:

    import re
    
    regexes = {
        "Slack Token XOXP": re.compile('xoxp.*'),
        "Slack Token XOXB": re.compile('xoxb.*'),
        "Slack Token XOXO": re.compile('xoxo.*'),
        "Slack Token XOXA": re.compile('xoxa.*'),
        "AWS API Key": re.compile('AKIA.*'),
        "Private key": re.compile('-----BEGIN PRIVATE KEY-----.*')
    }
    

    I then installed the libraries required to run the tool by typing pip install -r requirements.txt. My requirements.txt file looked like below:

    GitPython==2.1.5
    gitdb2==2.0.2
    smmap2==2.0.2
    

    Finally, I ran the tool by typing - python truffleHog.py --regex --entropy=False https://github.com/secretuser1/secretrepo.git

    It printed out the Private Key, Slack Token XOXP and Slack Token XOXB. It should have also printed out the AWS key here - https://github.com/secretuser1/secretrepo/blob/master/secretfile.txt#L2 but it did not, even though the regex is present.

    Any idea why?

    opened by anshumanbh 14
  • Adding the capability for scanning a directory

    Adding the capability for scanning a directory

    This PR adds the capability for truffleHog to recursively scan a directory instead of a Git repository with all its history. This can be useful in CI pipelines or other situations where it is desirable to scan the codebase at a single point in time. Additionally, it can also be used to scan code that is not stored in Git.

    I've done some minor refactoring to the existing scanning code to reduce code duplication.

    opened by runako 14
  • ValueError: unknown reasons (During run application on EC2 RedHat)

    ValueError: unknown reasons (During run application on EC2 RedHat)

    If anyone can help I will be appreciate! Describe the bug Having an error during running the app on EC2 RedHat : ValueError: unknown reasons

    I installed on redhat ec2 instance trufflehog. By default there python 2.7 and 3.6 Trufflehog was installed from pip3. pip3 freeze shows that everything installed : gitdb==4.0.5 gitdb2==4.0.2 GitPython==3.0.6 smmap==3.0.5 truffleHog==2.2.1 truffleHogRegexes==0.0.7

    After installation and running next command ( just to check does it work or not) trufflehog --regex --entropy=False https://github.com/dxa4481/truffleHog.git ( I got next error) : Traceback (most recent call last): File "/usr/local/bin/trufflehog", line 11, in sys.exit(main()) File "/usr/local/lib64/python3.6/site-packages/truffleHog/truffleHog.py", line 93, in main surpress_output=False, branch=args.branch, repo_path=args.repo_path, path_inclusions=path_inclusions, path_exclusions=path_exclusions, allow=allow) File "/usr/local/lib64/python3.6/site-packages/truffleHog/truffleHog.py", line 351, in find_strings diff_hash = hashlib.md5((str(prev_commit) + str(curr_commit)).encode('utf-8')).digest() ValueError: unknown reasons

    opened by RepositoryOfCode 12
  •  Git issue when trying to scan cloned project in Apple-mac: trufflehog file:///

    Git issue when trying to scan cloned project in Apple-mac: trufflehog file:///

    Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/3.6/bin/trufflehog", line 10, in sys.exit(main()) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/truffleHog/truffleHog.py", line 82, in main surpress_output=False, branch=args.branch, repo_path=args.repo_path, path_inclusions=path_inclusions, path_exclusions=path_exclusions) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/truffleHog/truffleHog.py", line 309, in find_strings project_path = clone_git_repo(git_url) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/truffleHog/truffleHog.py", line 152, in clone_git_repo Repo.clone_from(git_url, project_path) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/git/repo/base.py", line 925, in clone_from return cls._clone(git, url, to_path, GitCmdObjectDB, progress, **kwargs) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/git/repo/base.py", line 880, in _clone finalize_process(proc, stderr=stderr) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/git/util.py", line 341, in finalize_process proc.wait(**kwargs) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/git/cmd.py", line 291, in wait raise GitCommandError(self.args, status, errstr) git.exc.GitCommandError: Cmd('git') failed due to: exit code(128) cmdline: git clone -v file:///GitHub/Github-guardian/ /var/folders/37/md_m401d073bw1vt7q863pbw0000gn/T/tmpchgjs_ln stderr: 'Cloning into '/var/folders/37/md_m401d073bw1vt7q863pbw0000gn/T/tmpchgjs_ln'... fatal: '/GitHub/Github-guardian/' does not appear to be a git repository fatal: Could not read from remote repository.

    Please make sure you have the correct access rights and the repository exists. '

    opened by dgurazada 12
  • Depth limits are needed to prevent long jobs

    Depth limits are needed to prevent long jobs

    When leveraging trufflehog for repo scans, it would be helpful to introduce the concept of depth limits, to ensure that when a scan is performed, it only goes to a certain number of commits back. On a test repository that I have, there is a huge number of commits dating back to 2014, and the job is running well more than 24 hours to go deep across all of them.

    opened by dend 11
  • WindowsError: [Error 5] Access is denied

    WindowsError: [Error 5] Access is denied

    Traceback (most recent call last): File "trufflehog.py", line 106, in <module> find_strings(args.git_url) File "trufflehog.py", line 98, in find_strings shutil.rmtree(project_path) File "C:\Python27\lib\shutil.py", line 247, in rmtree rmtree(fullname, ignore_errors, onerror) File "C:\Python27\lib\shutil.py", line 247, in rmtree rmtree(fullname, ignore_errors, onerror) File "C:\Python27\lib\shutil.py", line 247, in rmtree rmtree(fullname, ignore_errors, onerror) File "C:\Python27\lib\shutil.py", line 252, in rmtree onerror(os.remove, fullname, sys.exc_info()) File "C:\Python27\lib\shutil.py", line 250, in rmtree os.remove(fullname) WindowsError: [Error 5] Access is denied: 'temp\\[uuid]\\.git\\objects\\pack\\pack-[uuid].idx'

    When scanning some repos. (This one crashes half way through, This one crashes at startup)

    opened by Peter-Maguire 11
  • i cant see the result

    i cant see the result

    1. See error

    {"level":"debug","msg":"Cloning remote Git repo without authentication","time":"2022-04-05T16:19:28Z"} {"level":"debug","msg":"Git repo local path: /tmp/trufflehog944564607","time":"2022-04-05T16:23:19Z"}

    2022/04/05 16:44:16 [updater parent] prog exited with 1

    I can see the result if found or note even if I use --json I cant see the saved file its always clone the repo in tmp folder after finish scanning it should delete the cloned folder in the tmp

    bug 
    opened by abramas 10
  • Hardcoded thresholds of 20 in get_strings_of_set()

    Hardcoded thresholds of 20 in get_strings_of_set()

    threshold keyword variable is declared and used on the last if statement Line 39, but not in the first else statement Line 35

    def get_strings_of_set(word, char_set, threshold=20):
        count = 0
        letters = ""
        strings = []
        for char in word:
            if char in char_set:
                letters += char
                count += 1
            else:
                if count > 20:
                    strings.append(letters)
                letters = ""
                count = 0
        if count > threshold:
            strings.append(letters)
    
    opened by bandrel 10
  • gitdb update breaks trufflehog

    gitdb update breaks trufflehog

    Probably related to #198

    We install inside a docker container using:

    $ pip install truffleHog==2.0.99
    

    We run:

    $ trufflehog --regex --entropy=False .
    

    Starting today this errored with:

    Traceback (most recent call last):
       File "/usr/local/bin/trufflehog", line 5, in <module>
         from truffleHog.truffleHog import main
       File "/usr/local/lib/python3.8/site-packages/truffleHog/truffleHog.py", line 17, in <module>
         from git import Repo
       File "/usr/local/lib/python3.8/site-packages/git/__init__.py", line 38, in <module>
         from git.config import GitConfigParser  # @NoMove @IgnorePep8
       File "/usr/local/lib/python3.8/site-packages/git/config.py", line 16, in <module>
         from git.compat import (
       File "/usr/local/lib/python3.8/site-packages/git/compat.py", line 16, in <module>
         from gitdb.utils.compat import (
     ModuleNotFoundError: No module named 'gitdb.utils.compat'
    

    A quick dive down the dependency tree showed that the trufflehog dependency on gitpython-2.1.1 (here) is pulling in gitdb2-3.0.2 (here) which has removed the gitdb.utils.compat (PR)

    Our fix for now is to use (may be useful to others):

    pip install gitdb2==3.0.0 truffleHog==2.0.99
    
    opened by danieldooley 9
  • Use access-token endpoint for validity check

    Use access-token endpoint for validity check

    This PR fixes the issue https://github.com/trufflesecurity/trufflehog/issues/990, it should correctly report keys as valid even if they are missing the user_read scope.

    opened by clonsdale-canva 1
  • Buildkite token validation missing tokens without user_read scope

    Buildkite token validation missing tokens without user_read scope

    Community Note

    • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
    • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
    • If you are interested in working on this issue or have submitted a pull request, please leave a comment

    TruffleHog Version

    3.21.0

    Expected Behavior

    Buildkite token is reported as valid

    Actual Behavior

    Buildkite token is not validated as the API call fails due to missing user_read scope

    Additional Context

    The logic to check if a buildkite token is valid will send out an API call to the /user endpoint https://github.com/trufflesecurity/trufflehog/blob/009756dce61948a66cf90a8b14018460c91ab4f0/pkg/detectors/buildkite/buildkite.go#L51. This will miss all tokens which do not have the read_user scope.

    Instead, we can use the access-token endpoint, which will return 200 for any valid token, and report on the scopes present / ID of the token - https://buildkite.com/docs/apis/rest-api/access-token.

    bug 
    opened by clonsdale-canva 0
  • Add max-depth limit to GitHub subcommand

    Add max-depth limit to GitHub subcommand

    Community Note

    • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
    • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
    • If you are interested in working on this issue or have submitted a pull request, please leave a comment

    Description

    Ability to limit the depth of the commit history being scanned for GitHub users We need the ability to set a --max-depth= limit to GitHub sub command.

    Problem to be Addressed

    It is very noisy for large GitHub enterprises to detect new issues due to the inability to ignore historical commit history that one has already remediated. Results have to be saved into a spreadsheet or database and then diff'd to see what has changed.

    Description of the Preferred Solution

    The ability to set a --max-depth= limit to GitHub sub command. This would be very beneficial when attempting to scan a GitHub enterprise repositories as a group.

    Additional Context

    References

    • #0000
    enhancement 
    opened by dwilliamsstc 0
  • go install - missing dot in first path element

    go install - missing dot in first path element

    build github.com/trufflesecurity/trufflehog/v3: cannot load embed: malformed module path "embed": missing dot in first path element

    Community Note

    • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
    • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
    • If you are interested in working on this issue or have submitted a pull request, please leave a comment

    TruffleHog Version

    Trace Output

    Expected Behavior

    Actual Behavior

    Steps to Reproduce

    Distributor ID: Elementary Description: elementary OS 6.1 Jólnir Release: 6.1 Codename: jolnir

    Additional Context

    References

    • #0000
    bug 
    opened by rip752 0
  • Run certain Detector Type

    Run certain Detector Type

    trufflehog version: trufflehog dev

    Currently I am running trufflehog as a pre-commit hook with all possible Detector type. Is it possible to only run few Detector types , say AWS keys, Private keys as such?

    bug 
    opened by Priyadhana 0
Releases(v3.21.0)
AIL LeakFeeder: A Module for AIL Framework that automate the process to feed leaked files automatically to AIL

AIL LeakFeeder: A Module for AIL Framework that automates the process to feed leaked files automatically to AIL, So basically this feeder will help you ingest AIL with your leaked files automatically

ail project 8 May 03, 2022
HatSploit collection of generic payloads designed to provide a wide range of attacks without having to spend time writing new ones.

HatSploit collection of generic payloads designed to provide a wide range of attacks without having to spend time writing new ones.

EntySec 5 May 10, 2022
Python implementation for CVE-2021-42278 (Active Directory Privilege Escalation)

Pachine Python implementation for CVE-2021-42278 (Active Directory Privilege Escalation). Installtion $ pip3 install impacket Usage Impacket v0.9.23 -

Oliver Lyak 250 Dec 31, 2022
Port scanning tool that uses Python3. Created by Noble Wilson

Hello There! My name is Noble Wilson and I am an aspiring IT/InfoSec coder practicing for my future. ________________________________________________

1 Nov 23, 2021
This is a multi-password‌ cracking tool that can help you hack facebook accounts very quickly

Pro_Crack Facebook Fast Cracking Tool This is a multi-password‌ cracking tool that can help you hack facebook accounts very quickly Installation On Te

•JINN• 1 Jan 16, 2022
xray多线程批量扫描工具

Auto_xray xray多线程批量扫描工具 简介 xray社区版貌似没有批量扫描,这就让安服仔使用起来很不方便,扫站得一个个手动添加,非常难受 Auto_xray目录下记得放xray,就跟平时一样的。 选项1:oneforall+xray 输入一个主域名,自动采集子域名然后添加到xray任务列表

1frame 13 Nov 09, 2022
Kunyu, more efficient corporate asset collection

Kunyu(坤舆) - More efficient corporate asset collection English | 中文文档 0x00 Introduce Tool introduction Kunyu (kunyu), whose name is taken from , is act

Knownsec, Inc. 772 Jan 05, 2023
Acc-Data-Gen - Allows you to generate a password, e-mail & token for your Minecraft Account

Acc-Data-Gen Allows you to generate a password, e-mail & token for your Minecraft Account How to use the generator: Move all the files in a single dir

KarmaBait 2 May 16, 2022
Script Crack Facebook Yang Kaya Akan Teh Hijau 🚶‍♂

r-mbf Script Crack Facebook 🚶‍♂ Bukti Recode [•] Install Script $ pkg update && pkg upgrade $ pkg install python $ pkg install git $ pip install requ

O'Hayo Smrn 3 Apr 02, 2022
IDA2Obj is a tool to implement SBI (Static Binary Instrumentation).

IDA2Obj IDA2Obj is a tool to implement SBI (Static Binary Instrumentation). The working flow is simple: Dump object files (COFF) directly from one exe

Mickey 94 Dec 13, 2022
open detection and scanning tool for discovering and fuzzing for Log4J RCE CVE-2021-44228 vulnerability

CVE-2021-44228-log4jVulnScanner-metasploit open detection and scanning tool for discovering and fuzzing for Log4J RCE CVE-2021-44228 vulnerability pre

Taroballz 7 Nov 09, 2022
md5 hash cracking with python.

Python-Md5-Cracker- md5 hash cracking with python. Original files added First create a file called word.txt then run the wordCreate.py script The task

Nebil Sharifi 0 Aug 31, 2022
Repository for a project of the course EP2520 Building Networked Systems Security

EP2520_ACME_Project Repository for a project of the course EP2520 Building Networked Systems Security in Royal Institute of Technology (KTH), Stockhol

1 Dec 11, 2021
The self-hostable proxy tunnel

TTUN Server The self-hostable proxy tunnel. Running Running: docker run -e TUNNEL_DOMAIN=Your tunnel domain -e SECURE=True if using SSL ghcr.io/to

Tom van der Lee 2 Jan 11, 2022
simple python keylogger

HELLogger simple python keylogger DISCLAIMERS: DON'T DO BAD THINGS. THIS PROGRAM IS MEANT FOR PERSONAL USES ONLY. USE IT ONLY IN COMPUTERS WHERE YOU H

Arya 10 Nov 10, 2022
CVE-2022-22965 : about spring core rce

CVE-2022-22965: Spring-Core-Rce EXP 特性: 漏洞探测(不写入 webshell,简单字符串输出) 自定义写入 webshell 文件名称及路径 不会追加写入到同一文件中,每次检测写入到不同名称 webshell 文件 支持写入 冰蝎 webshell 代理支持,可

东方有鱼名为咸 53 Nov 09, 2022
Sonoff NSPanel protocol and hacking information. Tasmota Berry driver for NSPanel

NSPanel Hacking Sonoff NSPanel protocol and hacking information and Tasmota Berry driver. NSPanel protocol manual Tasmota driver nspanel.be Installati

blakadder 98 Dec 26, 2022
DomainMonitor is a web project that has a RESTful API to get a domain's subdomains and whois data.

DomainMonitor is a web project that has a RESTful API to get a domain's subdomains and whois data.

2 Feb 05, 2022
Python decompiler for Python 1.5-2.4 (for historical archive)

This preserves the early code of a Python decompiler for Python versions 1.5 to 2.4. I have been able to install this using pyenv using Python 2.3.7 u

R. Bernstein 2 Jan 04, 2022
MSDorkDump is a Google Dork File Finder that queries a specified domain name and variety of file extensions

MSDorkDump is a Google Dork File Finder that queries a specified domain name and variety of file extensions (pdf, doc, docx, etc), and downloads them.

Joe Helle 150 Jan 03, 2023