INFO 3350/6350, Spring 2022, Cornell

Overview

Information Science 3350/6350

Text mining for history and literature

Staff and sections

Instructor: Matthew Wilkens
Graduate TAs: Federica Bologna, Rosamund Thalken
Undergrad TAs: Steven Chen, Naba Deyab, Isabel Frank, Zilu Li, Lindsey Luo, Daniel Riley, Aishwarya Singh, Ray Wang

Term: Spring 2022
Credits: 3
Mode: In person except as required by Cornell (¯\_(ツ)_/¯)

Lecture: MW 9:05-9:55am, Olin 245
Sections:

  • F 9:05-9:55 Olin 145
  • F 10:10-11:00 Upson 146
  • F 11:20-12:10 Hollister 312
  • F 12:25-1:15 Philips 307
  • Additional grad section: F 2:40-3:30pm (Gates 114)

Office hours: See Canvas
Online sessions and resources: See the Mechanics section below.

Waitlist

Both the course and the waitlist are full. If you are currently on the waitlist, we will send you a Zoom link so that you can attend lectures during the add/drop period.

Summary

A course on the uses of text mining and other data-intensive techniques to assist analysis of textual humanities materials. Special emphasis on literary and historical documents. Intended for students with programming experience equivalent to CS 1110 (Intro to Computing Using Python).

Description

Broadly speaking, the course covers text mining, content analysis, and basic machine learning, emphasizing approaches with demonstrated value in literary studies and other humanistic fields. Students will learn how to clean and process textual corpora, extract information from unstructured texts, identify relevant textual and extra-textual features, assess document similarity, cluster and classify authors and texts using a variety of machine-learning methods, visualize the outputs of statistical models, and incorporate quantitative evidence into literary and humanistic analysis. The course will also introduce some of the more interesting recently published results in computational and quantitative humanities.

Most of the methods treated in the class are relevant in multiple fields. Students from all majors are welcome. Students with backgrounds in the humanities are especially encouraged to join.

Objectives and learning goals

The primary objective of the course is to build proficiency in text analysis and data mining for the humanities. Students who complete the course will have knowledge of standard approaches to text analysis and will be familiar with the humanistic ends to which those approaches might be put. Secondary objectives include acquiring basic understanding of relevant literary history, of integrating quantitative with qualitative evidence, and of best practices in small-scale project management.

Mechanics

We will use:

  • GitHub (right here) to distribute lecture materials, code, and datasets. The current versions of the syllabus (this page) and the schedule are always on GitHub, too. You might want to watch or star this repo to be notified of changes.
  • CMS to manage problem sets and other code work and to track grades.
  • Canvas to distribute restricted readings and other non-public materials, including Zoom links.
  • Ed for Q&A.
  • Google Drive to build and manage a class corpus.

Links and detailed info about each of these are available via the course Canvas site.

Note that you must generally be logged in through your Cornell account to access non-public resources such as Zoom meetings, lecture recordings, Google Drive folders.

Work and grading

Grades will be based on weekly problem sets (50% in sum), a midterm mini-project (15%), reading responses (10% in sum), a take-home final exam or optional final project (20%), and consistent professionalism (including attendance and participation, 5%). You must achieve a passing grade in each of these components to pass the course.

Graduate students (enrolled in 6350) must complete a final project in place of the final exam and are strongly encouraged to attend an additional weekly section meeting for 6350 (Fridays, time to be determined).

A small amount of extra credit will be awarded for IS/Communications SONA study completion (0.5 course point per SONA credit assigned to this class, up to 1.0 total course point) and for consistent, helpful contributions to Ed discussions (up to 1.0 course point).

Texts and readings

There is one required textbook for the course:

Readings from this textbook will be assigned for most weeks. All other assigned readings will be available online, either through the open web or via Canvas. See the schedule for details.

There are five textbooks that may be useful for students who wish to consult them. They are not required and most students will not need them.

Schedule

In general, Monday lectures will introduce new technical material. Wednesday sessions will combine technical instruction with discussion of assigned readings from the scholarly literature. Friday sections are smaller and devoted to focused work on problem sets and to follow-up questions about topics previously introduced.

For the detailed (and updating) list of topics and readings, see the course schedule.

Final exam

A final exam in the form of a take-home project is due during the finals period. More information will be posted as that time draws closer.

Undergraduates (enrolled in 3350) may elect to complete a project in lieu of the exam. If you take this route, you may work in a group of up to three students. The expected amount of work on the project will be scaled by the number of group members. Except in unusual circumstances, all group members will receive the same grade.

Graduate students (enrolled in 6350) must complete an independent project in place of the final exam.

Policies

Harassment and respect

All students are entitled to respect from course staff and from their fellow students. All staff are entitled to respect from students and from fellow staff members. Violations of this principle, whether large or small, will not be tolerated.

Respect means that your ideas are taken seriously, that you feel welcome in class settings (including in study groups and online fora), and that you are treated as a full, co-equal member of the class. Harassment describes any action, intentional or otherwise, that abridges the respect owed to every member of the class.

If you experience harassment in any form, or if you would like to discuss your experience in the class, please see me in office hours or contact me by email. The university also has reporting and counseling resources available, including those for sexual harassment and for other bias incidents.

Attendance

This is a class of moderate size that will make frequent use of class time to discuss readings and to debate different approaches to academic inquiry. It also has a waitlist a mile long. For this reason, attendance (virtual and physical, as circumstances dictate) is required. When meeting by Zoom, your camera must be on or you will be marked absent. If you feel your circumstances require an exception to this policy, you should consult with Professor Wilkens directly to explain why.

Students in mandatory isolation or who are in highly displaced time zones and who have received individual permission are excused from attending Monday and Wednesday lectures. These students should watch the recorded lectures as soon as they are available and post any questions to Ed.

If you need to miss a class meeting, please complete the absence form before the meeting in question and watch the recorded video of the session you missed once it is available. If you miss section on Friday, a recording may not be available. Consult with your section leader for appropriate steps. In every case, assigned work remains due at the appointed time.

Note: Participation is much more important than attendance. Your grade will not suffer if you make the wise decision to stay home when you might infect others.

Slip days

Late work is accepted subject to a limit of ten total slip days for the semester. You may submit any individual assignment up to four days (96 hours) late. The slip day policy does not apply to the reading responses, which may not be submitted late, since they are tied to in-class activities.

If you expect to miss a deadline or to be absent for an extended period due to truly exceptional circumstances, contact Professor Wilkens as far in advance as possible so that we can discuss potential accommodations.

Regrade requests

If you feel that the graders have made a clear, objective, and significant mistake in assessing your work, you may request a regrade via CMS not later than one week after feedback is released. Remember that this process exists to correct mistakes. This process does not exist to lobby for points.

Regrade requests are typically processed within a week or two of submission. You will be notified of the outcome as soon as it is ready.

We want to give grades that accurately represent our assessment of your understanding of course material. Hence, if you are given a lower score than you should have been, due to an obvious grading error, you should absolutely bring it to our attention. However, we must explicitly mention an additional consequence of the importance of grade accuracy: if we notice that you have been assigned more points than you should have been, we are duty-bound to correct such scores downward to the correct value.

Academic integrity

Each student in this course is expected to abide by the Cornell University Code of Academic Integrity. Any work submitted by a student in this course for academic credit will be the student's own work unless specifically and explicitly permitted otherwise.

Using other people's code is an important part of programming but, for group projects, the code should be substantially the work of the group members (except for standard libraries). Any code used in projects that was not written by the group members should be placed in separate files and clearly labeled with their source URLs. If you have benefitted from online resources such as StackOverflow, list the URLs in comments in your own code, even if you did not directly copy anything.

Project work that relates to your other classes or research is encouraged, but you may not recycle assignments. There must be no doubt that the work you turn in for this class was done for this class. When in doubt, consult with me during office hours.

Disabilities

Every student's access is important to us. If you have, or think you may have, a disability, please contact Student Disability Services for a confidential discussion: [email protected], 607-254-4545, or sds.cornell.edu.

  • Please request any accommodation letter early in the semester, or as soon as you become registered with Student Disability Services (SDS), so that we have adequate time to arrange your approved academic accommodations.
  • Once SDS approves your accommodation letter, it will be emailed to you and to me. Please follow up with me to discuss the necessary logistics of your accommodations.
  • If you are approved for exam accommodations, please consult with me at least two weeks before the scheduled exam date to confirm the testing arrangements.
  • If you experience any access barriers in this course, such as with printed content, graphics, online materials, or any communication barriers, reach out to me and/or your SDS counselor right away.
  • If you need an immediate accommodation, please speak with me after class or send an email message to me and to SDS.
Owner
Wilkens Teaching
Courses taught by Matthew Wilkens
Wilkens Teaching
The next level Python obfuscator, nearly impossible to deobfuscate.

🐸 Kramer 🐸 Kramer is a next level obfuscation tool written in Python3 allowing you to obfuscate your Python3 code easily and securely. It uses Berse

Billy 114 Dec 26, 2022
Script Crack Facebook Premium 🚶‍♂

prem Script Crack Facebook Premium 🚶‍♂ Install Script $ pkg update && pkg update $ termux-setup-storage $ pkg install git $ pkg install python $ pip

Yumasaa 1 Dec 03, 2021
Apache OFBiz rmi反序列化EXP(CVE-2021-26295)

Apache OFBiz rmi反序列化EXP(CVE-2021-26295) 目前仅支持nc弹shell 将ysoserial.jar放置在同目录下,py3运行,根据提示输入漏洞url,你的vps地址和端口 第二次使用建议删除exp.ot 本工具仅用于安全测试,禁止未授权非法攻击站点,否则后果自负

15 Nov 09, 2022
Extensive Python3 network scanner, simplified.

Snake Map Extensive Python3 network scanner, simplified. _,.--. --..,_ .'`__ o `;__, `'.'. .'.'` '---'` '

Miss Bliss 4 Apr 16, 2022
Phishing-Crack tools to punish friends

Phishing-Crack Phishing Tool Version 1.0.0 Created By temirovazat A Phishing Tool With PHP and Python3 Features Fake Instagram Phishing Page Fake Face

3 Oct 04, 2022
IDA Python Script for anti ollvm

IDA Python Script for anti ollvm

Shocker 62 Dec 23, 2022
EyeJo是一款自动化资产风险评估平台,可以协助甲方安全人员或乙方安全人员对授权的资产中进行排查,快速发现存在的薄弱点和攻击面。

EyeJo EyeJo是一款自动化资产风险评估平台,可以协助甲方安全人员或乙方安全人员对授权的资产中进行排查,快速发现存在的薄弱点和攻击面。 免责声明 本平台集成了大量的互联网公开工具,主要是方便安全人员整理、排查资产、安全测试等,切勿用于非法用途。使用者存在危害网络安全等任何非法行为,后果自负,作

429 Dec 31, 2022
Tor Relay availability checker, for using it as a bridge in countries with censorship

Tor Relay Availability Checker This small script downloads all Tor Relay IP addresses from onionoo.torproject.org and checks whether random Relays are

ValdikSS 161 Dec 30, 2022
Detection And Breaking With Python

Detection And Breaking IIIIIIIIIIIIIIIIIIII PPPPPPPPPPPPPPPPP VVVVVVVV VVVVVVVV I::::::::II::::::::I P:::::::

Baris Dincer 1 Dec 26, 2021
MVT is a forensic tool to look for signs of infection in smartphone devices

Mobile Verification Toolkit Mobile Verification Toolkit (MVT) is a collection of utilities to simplify and automate the process of gathering forensic

8.3k Jan 08, 2023
A set of blender assets created for the $yb NFT project.

fyb-blender A set of blender assets created for the $yb NFT project. Install just as you would any other Blender Add-on (via Edit-Preferences-Add-on

Pedro Arroyo 1 May 06, 2022
python写的一款免杀工具(shellcode加载器)BypassAV,国内杀软全过(windows denfend)

python写的一款免杀工具(shellcode加载器)BypassAV,国内杀软全过(windows denfend)

1frame 266 Jan 02, 2023
PyExtractor is a decompiler that can fully decompile exe's compiled with pyinstaller or py2exe

PyExtractor is a decompiler that can fully decompile exe's compiled with pyinstaller or py2exe with additional features such as malware checker/detector! Also checks file(s) for suspicious words, dis

Rdimo 56 Jul 31, 2022
POC for CVE-2022-1388

CVE-2022-1388 POC for CVE-2022-1388 affecting multiple F5 products. Follow the Horizon3.ai Attack Team on Twitter for the latest security research: Ho

Horizon 3 AI Inc 231 Dec 07, 2022
This repository contains wordlists for each versions of common web applications and content management systems (CMS). Each version contains a wordlist of all the files directories for this version.

webapp-wordlists This repository contains wordlists for each versions of common web applications and content management systems (CMS). Each version co

Podalirius 396 Jan 08, 2023
PasswordManager is a command-line program that helps you manage your secret files like passwords

PasswordManager is a command-line program that helps you manage your secret files like passwords. It's very minimalistic and easy to use.

Michael 3 Dec 30, 2021
exchange-ssrf-rce

Usage python3 .\exchange-exp.py -------------------------------------------------------------------------------- |

Jen 76 Nov 09, 2022
TCP/UDP port scanner on python, usong scapy and multiprocessin

Port Scanner TCP/UDP port scanner on python, usong scapy and multiprocessing. Usage python3 scanner.py [OPTIONS] IP_ADDRESS [{tcp|udp}[/[PORT|PORT-POR

Egor Krokhin 1 Dec 05, 2021
一款辅助探测Orderby注入漏洞的BurpSuite插件,Python3编写,适用于上xray等扫描器被ban的场景

OrderbyHunter 一款辅助探测Orderby注入漏洞的BurpSuite插件,Python3编写,适用于上xray等扫描器被ban的场景 1. 支持Get/Post型请求参数的探测,被动探测,对于存在Orderby注入的请求将会在HTTP Histroy里标红 2. 自定义排序参数list

Automne 21 Aug 12, 2022
Open Source Tool - Cybersecurity Graph Database in Neo4j

GraphKer Open Source Tool - Cybersecurity Graph Database in Neo4j |G|r|a|p|h|K|e|r| { open source tool for a cybersecurity graph database in neo4j } W

Adamantios - Marios Berzovitis 27 Dec 06, 2022