Natural Language Processing - Sommer Semester 2022

Overview

Natural Language Processing (DIS25a/NLP)

This course can be taken for the Bachelor Programm Data and Information Science (DIS25a) or the Master Program Digital Sciences (NLP).

After easter all sessions are hosted at TH Köln, Claudiusstraße 1. The sessions will be held life. Slides will be usually available a night before the actual lecture. We try to record all lectures and tutorials for later referal (not sure how this works out with the sessions at Claudiusstraße).

Schedule for Summer Semester 2022

(L) Lectures; (T) Tutorials; (P) Project

The first lectures and tutorial were recorded and are available online. The password is the same as for the Zoom sessions.

Date Slot 13:30h Slot 15:15h DIS25a (DIS B.Sc.) NLP (DS M.Sc.)
1.4.2022 Introduction and Overview (L) Basic Text Processing (L) x x
8.4.2022 Basic NLP Pipeline: NLTK (T) (solution) Common Toolkit: Spacy (T) (solution) x x
15.4.2022 no lecture
22.4.2022 WordNet (L) Vector Semantics (L) x x
29.4.2022 WordNet, GermaNet (T) (solution) Vector Semantics (T) (solution) x x
6.5.2022 Information Extraction (L) Sentiment Analysis (L) x x
13.5.2022 no lecture
20.5.2022 Language Models and Ethics in NLP (L) Group assignment (P) x x
27.5.2022 Group work (P) Group work (P) x
3.6.2022 Data Programming for IE (L) Group work (P) / Oral Exam Master x x
10.6.2022 Guest Lecture: Dimitar Dimitrov(L) Group work (P) x
17.6.2022 Group work (P) Group work (P) x
24.6.2022 Student talks - Project presentation (P) Student talks - Project presentation (P) x
31.8.2022 Submission of term papers x

Bachelor: Group Assignments

In the group assignments a group of four students has to work on a bias-related topic with a specific focus and on one of three datasets. In the group work phases starting on 20.5.2022 we will be available during the lecture time to help and advise.

In the presentations on 24.6.2022 you are expected to present a concept regarding your specific topic and dataset. Please decribe the motivation, the dataset, your methods and NLP pipeline, a working prototype and some first insights and results.

The feedback gathered during the presentation should be used to write a final term paper on your specific topic and work. Please read the guidelines for the term paper.

Datasets

Choose one of the following datasets to work on:

Topics

Choose one of the following topcis:

Gender Bias

Gender bias is a group bias in which different genders are represented differently in terms of an aspect in a given (set of) document(s) than expected. Aspects for which there can be a bias range from quantitative measures (e.g., how many documents have male/female authors) to more complex NLP measures (e.g., different sentiments in texts about male/female politicians or topical bias, different distributions of topics in texts geared towards male/female readers).

Exaples for papers that investigate gender bias:

Ethnic Bias

Like gender bias, ethnic or racial bias describes bias towards groups of people belonging to an ethnical (or religious) group. Ethnic bias includes harmful stereotypes and less blatant but still dangerous aspects like topical bias. Detecting ethnic bias is not only important because it may lead to even more severe instances of racism, and it is an infringement of the constitutional right to equal treatment.

Exaples for papers that investigate ethnic bias:

Non-Neutral Speech

Non-neutral language consists of many aspects of language that is subjective, opinionated, or otherwise implies valuation. This includes toxicity, ranging from forms of hate speech such as racism, incivility, profane, offensive and aggressive language to over-positive praises. Non-neutral language is especially problematic when it appears in types of documents that claim to be neutral, such as wikipedia or (public) news. A related concept is framing bias, defined as the use of subjective words or phrases linked with a particular opinion.

Exaples for papers that investigate non-neutral language:

Stance Detection

Stance is a concept that describes an opinion on a subject, most often in a political context. The goal of stance detection is to detect the stances of users/authors towards these subjects. Often, the subjects are known due to context (for example, abortion, weapon laws and gay marriage in political texts) or they have to be determined using approaches like entity recognition. A related concept is that of target-dependent or aspect-based sentiment analysis, in which the opinions on aspects (targets) are detected.

Exaples for papers that investigate stance detection:

Owner
Classrooms of IR Group at Technische Hochschule Köln
Classrooms of IR Group at Technische Hochschule Köln
🔐 A simple command-line password manager.

PassVault What Is It? It is a command-line password manager, for educational purposes, that stores localy, in AES encryption, your sensitives datas in

5 Aug 15, 2022
CVE-2021-41773 Path Traversal for Apache 2.4.49

CVE-2021-41773 Path Traversal for Apache 2.4.49

ac1d 3 Oct 20, 2021
A Tool to find subdomains from hackerone reports.

Hactivity A Tool to find subdomains from Hackerone reports of a given company or a search term (xss, ssrf, etc). It can also print out URL and Title o

Stinger 15 Jul 24, 2022
Polkit - Local Privilege Escalation (CVE-2021-3560)

CVE-2021-3560 Polkit - Local Privilege Escalation Original discovery by kevin_backhouse from GitHub Security Lab References https://github.blog/2021-0

Salman Asad 1 Nov 12, 2021
A python base script from which you can hack or clone any person's facebook friendlist or followers accounts which have simple password

Hcoder This is a python base script from which you can hack or clone any person's facebook friendlist or followers accounts which have simple password

Muhammad Hamza 3 Dec 06, 2021
This respository contains the source code of the printjack and phonejack attacks.

Printjack-Phonejack This repository contains the source code of the printjack and phonejack attacks. The Printjack directory contains the script to ca

pietrobiondi 2 Feb 12, 2022
Evil-stalker - A simple tool written in python, it is so simple that it is based on google dorks

evil-stalker How to run First of all, you must install the necessary libraries.

rock3d 6 Nov 16, 2022
♻️ Password Generator (PSG) 📚 This plugin is made for more familiarity with Python, but can also be used to create passwords

About Tool This plugin is made for more familiarity with Python, but can also be used to create passwords.

STgazing 2 Jul 23, 2022
Search Shodan for Minecraft server IPs to grief

GriefBuddy This script searches Shodan for Minecraft server IPs to grief. This will return all servers connected to the public internet which Shodan h

26 Dec 29, 2022
A python script to decrypt media files encrypted using the Android application 'Decrypting 'LOCKED Secret Calculator Vault''. Will identify PIN / pattern.

A python script to decrypt media files encrypted using the Android application 'Decrypting 'LOCKED Secret Calculator Vault''. Will identify PIN / pattern.

3 Sep 26, 2022
Guess the password for Tik Tok accounts

Guess the password for Tik Tok accounts Tool features : You don't need proxies There is no captcha Running on a private api Combo T

32 Dec 25, 2022
Dependency injection in python with autoconfiguration

The base is a DynamicContainer to autoconfigure services using the decorators @services for regular services and @command_handler for using command pattern.

Sergio Gómez 2 Jan 17, 2022
Rouge Spammers with a mission to disrupt the peace of the valley ? Fear not we will STOMP the Spammers

Rouge Spammers with a mission to disrupt the peace of the valley ? Fear not we will STOMP the Spammers New Update : adding 'on-review' tag on an issue

A N U S H 13 Sep 19, 2021
Python APK Reverser & Patcher Tool

DTL-X An Advanced Python APK Reverser and Patcher Tool. --rmads1: target=AndroidManifest.xml,replace=com.google.android.gms.ad --rmads2: No Internet (

DedSecTL 10 Oct 31, 2022
🏃 Python Solutions of All Problems in FHC 2021 (In Progress)

FacebookHackerCup-2021 Python solutions of Facebook Hacker Cup 2021. Solution begins with * means it will get TLE in the largest data set (total compu

kamyu 14 Oct 15, 2022
Some Attacks of Exchange SSRF ProxyLogon&ProxyShell

Some Attacks of Exchange SSRF This project is heavily replicated in ProxyShell, NtlmRelayToEWS https://mp.weixin.qq.com/s/GFcEKA48bPWsezNdVcrWag Get 1

Jumbo 129 Dec 30, 2022
Raphael is a vulnerability scanning tool based on Python3.

Raphael Raphael是一款基于Python3开发的插件式漏洞扫描工具。 Raphael is a vulnerability scanning too

b4zinga 5 Mar 21, 2022
S2-061 的payload,以及对应简单的PoC/Exp

S2-061 脚本皆根据vulhub的struts2-059/061漏洞测试环境来写的,不具普遍性,还望大佬多多指教 struts2-061-poc.py(可执行简单系统命令) 用法:python struts2-061-poc.py http://ip:port command 例子:python

dreamer 46 Oct 20, 2022
:closed_lock_with_key: multi factor authentication system (2FA, MFA, OTP Server)

privacyIDEA privacyIDEA is an open solution for strong two-factor authentication like OTP tokens, SMS, smartphones or SSH keys. Using privacyIDEA you

1.3k Jan 03, 2023
A simple python-function, to gain all wlan passwords from stored wlan-profiles on a computer.

Wlan Fetcher Windows10 Description A simple python-function, to gain all wlan passwords from stored wlan-profiles on a computer. Usage This Script onl

2 Nov 20, 2021