A repo of open resources and information to help people succeed in a CS PhD and in a career in AI / NLP

Overview

Resources to Help Global Equality for PhDs in NLP / AI

This repo originates from a wish to promote Global Equality for people who want to do a PhD in NLP, following the idea that mentorship programs are an effective way to fight against segregation, according to The Human Network (Jackson, 2019). Specifically, we hope that people from all over the world and from all types of backgrounds can share the same sources of information, so that success is a reward for those who are determined and hardworking, regardless of external constraints.

One non-negligible factor in success is access to information, such as (1) knowing what a PhD in NLP is like, (2) knowing what top grad schools look for when reviewing PhD applications, (3) broadening your horizon of what counts as good work, (4) knowing what careers in NLP are like in both academia and industry, and many others.

Contributor: Zhijing Jin (PhD student in NLP at Max Planck Institute, co-organizer of the ACL Year-Round Mentorship Program).

You are welcome to be a collaborator: open an issue or pull request, and I can add you :).

Endorsers of this repo: Prof Rada Mihalcea (University of Michigan). Please add your name here (by a pull request) if you endorse this repo :).

Contents (Actively Updating)

Top Resources

  1. Online ACL Year-Round Mentorship Program: https://acl-mentorship.github.io (You can apply as a mentee, as a mentor, or as a volunteer. As a mentee, you will be able to attend monthly Zoom Q&A sessions hosted by senior researchers in NLP. You will also join a global Slack channel, where you can post your questions at any time, and we will collect answers from senior NLP researchers.)

Stage 1. (Non-PhD -> PhD) How to Apply for a PhD?

  1. (Prof Philip [email protected]) Finding CS Ph.D. programs to apply to. [Video]

  2. (Prof Mor Harchol-Balter@CMU) Applying to Ph.D. Programs in Computer Science (2014). [Guide]

  3. (Prof Jason [email protected]) Advice for Research Students (last updated: 2021). [List of suggestions]

  4. (CS Rankings) Advice on Applying to Grad School in Computer Science. [Pointers]

  5. (Nelson Liu, [email protected]) Student Perspectives on Applying to NLP PhD Programs (2019). [Suggestions Based on Surveys]

  6. A Princeton CS Major's Guide to Applying to Graduate School. [List of suggestions]

  7. (John Hewitt, [email protected]) Undergrad to PhD, or not - advice for undergrads interested in research (2018). [Suggestions]

  8. (Kalpesh Krishna, [email protected] Amherst) Grad School Resources (2018). [Article] (This lists lots of useful pointers!)

  9. (Prof Scott E. [email protected]) Quora answers on the LTI program at CMU (2017). [Article]

  10. (Albert Webson et al., [email protected] University) Resources for Underrepresented Groups, including Brown's Own Applicant Mentorship Program (2020, but we will keep updating it throughout the 2021 application season.) [List of Resources]

Specific Suggestions

  1. (Prof Nathan [email protected] University) Inside Ph.D. admissions: What readers look for in a Statement of Purpose. [Article]

Improve Your Proficiency with Tools

  1. (MIT 2020) The Missing Semester of Your CS Education (e.g., master the command-line, ssh into remote machines, use fancy features of version control systems).

Stage 2. (Doing a PhD) How to Succeed in a PhD?

  1. (Maxwell Forbes, [email protected]) Every PhD Is Different. [Suggestions]

  2. (Prof Mark [email protected], Prof Hanna M. [email protected] Amherst) How to be a successful PhD student (in computer science (in NLP/ML)). [Suggestions]

  3. (Andrej Karpathy) A Survival Guide to a PhD (2016). [Suggestions]

  4. (Prof Kevin [email protected]) Kevin Gimpel's Advice to PhD Students. [Suggestions]

  5. (Prof Marie [email protected] University) How to Succeed in Graduate School: A Guide for Students and Advisors (1994). [Article] [Part II]

  6. (Prof Eric [email protected]) Syllabus for Eric’s PhD students (incl. Prof's expectation for PhD students). [syllabus]

  7. (Prof H.T. [email protected]) Useful Thoughts about Research (1987). [Suggestions]

  8. (Prof Phil [email protected]) Networking on the Network: A Guide to Professional Skills for PhD Students (last updated: 2015). [Suggestions]

  9. (Prof Stephen C. [email protected]) Some Modest Advice for Graduate Students. [Article]

  10. (Prof Tao [email protected]) Graduate Student Survival/Success Guide. [Slides]

  11. (Mu [email protected]) 博士这五年 (A Chinese article about five years of doing a PhD at CMU). [Article]

  12. (Karl Stratos) A Note to a Prospective Student. [Suggestions]

What Are Weekly Meetings with Advisors Like?

  1. (Prof Jason [email protected]) What do PhD students talk about in their once-a-week meetings with their advisers during their first year? (2015). [Article]

  2. (Brown University) Guide to Meetings with Your Advisor. [Suggestions]

Practical Guides

  1. (Prof Srinivasan [email protected]) How to Read a Paper (2007). [Suggestions]

  2. (Prof Jason [email protected]) How to Read a Technical Paper (2009). [Suggestions]

  3. (Prof Jason [email protected]) How to write a paper? (2010). [Suggestions]

Memoir-Like Narratives

  1. (Prof Philip [email protected]) The Ph.D. Grind: A Ph.D. Student Memoir (last updated: 2015). [Video] (The book itself takes some digging to find.)

  2. (Prof Tianqi [email protected]) 陈天奇:机器学习科研的十年 (2019) (A Chinese article about ten years of research in ML). [Article]

  3. (Jean Yang) What My PhD Was Like. [Article]

How to Excel in Your Research

  1. The most important step: (Prof Jason [email protected]) How to Find Research Problems (1997). [Suggestions]

Grad School Fellowships

  1. (List compiled by CMU) Graduate Fellowship Opportunities [link]
  2. CYD Fellowship for Grad Students in Switzerland [link]

Other Books

  1. The Craft of Research by Wayne Booth, Greg Colomb and Joseph Williams

  2. How to Write a Better Thesis by Paul Gruba and David Evans

  3. Helping Doctoral Students Write by Barbara Kamler and Pat Thomson

  4. The Unwritten Rules of PhD Research by Marian Petre and Gordon Rugg

Stage 3. (After PhD -> Industry) How is life as an industry researcher?

  1. (Mu [email protected]) 工作五年反思 (A Chinese article reflecting on five years of working in industry). [Article]

Stage 4. (Being a Prof) How to get an academic position? And how to be a good prof?

  1. (Prof Jason [email protected]) How to write an academic research statement (when applying for a faculty job) (2017). [Article]

  2. (Prof Jason [email protected]) How to Give a Talk (2015). [Suggestions]

  3. (Prof Jason [email protected]) Teaching Philosophy. [Article]

Stage 5. (Whole Career Path) How to live out a lifelong career as an NLP researcher?

  1. (Prof Charles [email protected] University, Prof Qiang [email protected]) Crafting Your Research Future: A Guide to Successful Master's and Ph.D. Degrees in Science & Engineering. [Book]

Further Readings: Technical Materials to Improve Your NLP Research Skills

  1. (Prof Jason [email protected]) Technical Tutorials, Notes, and Suggested Reading (last updated: 2018) [Reading list]

Contributions

All types of contributions to this resource list are welcome. Feel free to open a Pull Request.

Contact: Zhijing Jin, PhD in NLP at Max Planck Institute for Intelligent Systems, working on NLP & Causality.

How to Cite This Repo

@misc{resources2021jin,
  author = {Zhijing Jin},
  title = {Resources to Help Global Equality for PhDs in NLP},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/zhijing-jin/nlp-phd-global-equality}}
}
Owner
PhD in NLP & Causality. Affiliated with Max Planck Institute, Germany & ETH & UMich. Supervised by Bernhard Schoelkopf, Rada Mihalcea, and Mrinmaya Sachan.