PubMed Mapper: A Python library that map PubMed XML to Python object

Overview

pubmed-mapper: A Python Library that map PubMed XML to Python object

中文文档

1. Philosophy

view UML

Programmatically access PubMed article is a common task for me. Luckily, with the help of eutils, we can access full article data in XML format. What I need is Python objects, not just XML strings, so pubmed-mapper was born.

2. Installation

pip install pubmed-mapper

3. Usage

3.1 use as library

3.1.1 parse a PubMed ID

from pubmed_mapper import Article


article = Article.parse_pmid('32329900')

# PubMed ID
print(article.pmid)  # 32329900

# ids
print(article.ids)  # [pubmed: 32329900, doi: 10.1111/jgs.16467]
print(article.ids[1].id_type)  # doi
print(article.ids[1].id_value)  # 10.1111/jgs.16467

# title
print(article.title)  # Associations of Coffee...

# abstract
print(article.abstract)  # <p><strong>Background: </strong>Coffee and tea...

# keywords
print(article.keywords)  # ['aging', 'coffee; diet; longevity', 'tea']

# MeSH headings
print(article.mesh_headings)  # ['Aged', 'Body Mass Index', '...']

# authors
print(article.authors)  # [hadyab AH Aladdin H, Manson JE JoAnn E, ...]
print(article.authors[0].last_name)  # Shadyab
print(article.authors[0].forename)  # Aladdin H
print(article.authors[0].initials)  # AH
print(article.authors[0].affiliation)  # Department of Family...

# journal
print(article.journal)  # Journal of the American Geriatrics Society
print(article.journal.issn)  # 1532-5415
print(article.journal.issn_type)  # Electronic
print(article.journal.title)  # Journal of the American Geriatrics Society
print(article.journal.abbr)  # J Am Geriatr Soc

# volume
print(article.volume)  # 68

# issue
print(article.issue)  # 9

# references
print(article.references)  # [n. 2013;129:643-659....]
print(article.references[0].citation)  # Lotfield E, Freedman ND...
print(article.references[0].ids)  # []

# pubdate
print(article.pubdate)  # 2020-09-01

3.1.2 parse a downloaded XML file

from lxml import etree
from pubmed_mapper import Article


infile = 'xxx.xml'
with open(infile) as fp:
    root = etree.parse(fp)


articles = []
for pubmed_article_element in root.xpath('/PubmedArticleSet/PubmedArticle'):
    article =  Article.parse_element(pubmed_article_element)
    articles.append(article)

3.2 use as command line software

3.2.1 parse PubMed ID

pubmed-mapper pmid -p 32329900

3.2.2 parse single PubMed XML file

pubmed-mapper file -i data/pubmed21n0001.xml -o output/pubmed21n0001.jl

3.2.3 parse a directory who contains multiple PubMed XML files

pubmed-mapper directory -i data/ -o output/pubmed-mapper.jl

4. FAQs

4.1 There many types of PubMed article publication date, how do you convert it to datetime.date object?

Parse publication date is a hard work, until now pubmed-mapper can't parse all types of them. The types pubmed-mapper can be parsed and the parsed value are:

type value
2021-03-13 2021-03-13
2021-03 2021-03-01
2021 Spring 2021-04-01
2021 2021-01-01
2021 Jan-Feb 2021-01-01
2021 Mar 13-15 2021-03-13
2021 Mar-2022 Jan 2021-03-01
2021-2022 2021-01-01
2021 Mar 13-Dec 15 2021-03-13
1976-1977 Winter 1976-01-01
1977-1978 Fall-Winter 1977-10-01

4.2 What is pubmed-mapper.log generated by pubmed-mapper?

pubmed-mapper.log is the default log file generate by pubmed-mapper, you can change the file by using --log-file options:

pubmed-mapper --log-file my-custom.log file -i data/pubmed21n0001.xml -o output/pubmed21n0001.jl

You can go to this log file to find out more parsing details.

4.3 I want log detail message in my log file?

Using --log-level can log more detail message:

pubmed-mapper --log-file my-custom.log --log-level DEBUG file -i data/pubmed21n0001.xml -o output/pubmed21n0001.jl
Owner
灵魂工具人
大家好,我是灵魂工具人,我会分享一些由我做的生物信息工具,希望大家喜欢。
灵魂工具人
A Python Object-Document-Mapper for working with MongoDB

MongoEngine Info: MongoEngine is an ORM-like layer on top of PyMongo. Repository: https://github.com/MongoEngine/mongoengine Author: Harry Marr (http:

MongoEngine 3.9k Jan 08, 2023
AWS SDK for Python

Boto3 - The AWS SDK for Python Boto3 is the Amazon Web Services (AWS) Software Development Kit (SDK) for Python, which allows Python developers to wri

the boto project 7.8k Jan 04, 2023
Pure Python MySQL Client

PyMySQL Table of Contents Requirements Installation Documentation Example Resources License This package contains a pure-Python MySQL client library,

PyMySQL 7.2k Jan 09, 2023
DataStax Python Driver for Apache Cassandra

DataStax Driver for Apache Cassandra A modern, feature-rich and highly-tunable Python client library for Apache Cassandra (2.1+) and DataStax Enterpri

DataStax 1.3k Dec 25, 2022
This is a repository for a task assigned to me by Bilateral solutions!

Processing-Files-using-MySQL This is a repository for a task assigned to me by Bilateral solutions! Task: Make Folders named Processing,queue and proc

Kandal Khandeka 1 Nov 07, 2022
aioodbc - is a library for accessing a ODBC databases from the asyncio

aioodbc aioodbc is a Python 3.5+ module that makes it possible to access ODBC databases with asyncio. It relies on the awesome pyodbc library and pres

aio-libs 253 Dec 31, 2022
google-cloud-bigtable Apache-2google-cloud-bigtable (🥈31 · ⭐ 3.5K) - Google Cloud Bigtable API client library. Apache-2

Python Client for Google Cloud Bigtable Google Cloud Bigtable is Google's NoSQL Big Data database service. It's the same database that powers many cor

Google APIs 39 Dec 03, 2022
Script em python para carregar os arquivos de cnpj dos dados públicos da Receita Federal em MYSQL.

cnpj-mysql Script em python para carregar os arquivos de cnpj dos dados públicos da Receita Federal em MYSQL. Dados públicos de cnpj no site da Receit

17 Dec 25, 2022
GINO Is Not ORM - a Python asyncio ORM on SQLAlchemy core.

GINO - GINO Is Not ORM - is a lightweight asynchronous ORM built on top of SQLAlchemy core for Python asyncio. GINO 1.0 supports only PostgreSQL with

GINO Community 2.5k Dec 27, 2022
A tiny python web application based on Flask to set, get, expire, delete keys of Redis database easily with direct link at the browser.

First Redis Python (CRUD) A tiny python web application based on Flask to set, get, expire, delete keys of Redis database easily with direct link at t

Max Base 9 Dec 24, 2022
Application which allows you to make PostgreSQL databases with Python

Automate PostgreSQL Databases with Python Application which allows you to make PostgreSQL databases with Python I used the psycopg2 library which is u

Marc-Alistair Coffi 0 Dec 31, 2021
MySQLdb is a Python DB API-2.0 compliant library to interact with MySQL 3.23-5.1 (unofficial mirror)

==================== MySQLdb Installation ==================== .. contents:: .. Prerequisites ------------- + Python 2.3.4 or higher * http://ww

Sébastien Arnaud 17 Oct 10, 2021
Python script to clone SQL dashboard from one workspace to another

Databricks dashboard clone Unofficial project to allow Databricks SQL dashboard copy from one workspace to another. Resource clone Setup: Create a fil

Quentin Ambard 12 Jan 01, 2023
Simple DDL Parser to parse SQL (HQL, TSQL, AWS Redshift, Snowflake and other dialects) ddl files to json/python dict with full information about columns: types, defaults, primary keys, etc.

Simple DDL Parser Build with ply (lex & yacc in python). A lot of samples in 'tests/. Is it Stable? Yes, library already has about 5000+ usage per day

Iuliia Volkova 95 Jan 05, 2023
Neo4j Bolt driver for Python

Neo4j Bolt Driver for Python This repository contains the official Neo4j driver for Python. Each driver release (from 4.0 upwards) is built specifical

Neo4j 762 Dec 30, 2022
Py2neo is a comprehensive toolkit for working with Neo4j from within Python applications or from the command line.

Py2neo v3 Py2neo is a client library and toolkit for working with Neo4j from within Python applications and from the command line. The core library ha

64 Oct 14, 2022
python-bigquery Apache-2python-bigquery (🥈34 · ⭐ 3.5K · 📈) - Google BigQuery API client library. Apache-2

Python Client for Google BigQuery Querying massive datasets can be time consuming and expensive without the right hardware and infrastructure. Google

Google APIs 550 Jan 01, 2023
#crypto #cipher #encode #decode #hash

🌹 CYPHER TOOLS 🌹 Written by TMRSWRR Version 1.0.0 All in one tools for CRYPTOLOGY. Instagram: Capture the Root 🖼️ Screenshots 🖼️ 📹 How to use 📹

50 Dec 23, 2022
Import entity definition document into SQLie3. Manage the entity. Also, create a "Create Table SQL file".

EntityDocumentMaker Version 1.00 After importing the entity definition (Excel file), store the data in sqlite3. エンティティ定義(Excelファイル)をインポートした後、データをsqlit

G-jon FujiYama 1 Jan 09, 2022
edaSQL is a library to link SQL to Exploratory Data Analysis and further more in the Data Engineering.

edaSQL is a python library to bridge the SQL with Exploratory Data Analysis where you can connect to the Database and insert the queries. The query results can be passed to the EDA tool which can giv

Tamil Selvan 8 Dec 12, 2022