Python client for the Socrata Open Data API

Overview

PyPI version Build Status Code Coverage

sodapy

sodapy is a python client for the Socrata Open Data API.

Installation

You can install with pip install sodapy.

If you want to install from source, then clone this repository and run python setup.py install from the project root.

Requirements

At its core, this library depends heavily on the Requests package. All other requirements can be found in requirements.txt. sodapy is currently compatible with Python 3.5, 3.6, 3.7 and 3.8.

Documentation

The official Socrata Open Data API docs provide thorough documentation of the available methods, as well as other client libraries. A quick list of eligible domains to use with this API is available via the Socrata Discovery API or Socrata's Open Data Network.

This library supports writing directly to datasets with the Socrata Open Data API. For write operations that use data transformations in the Socrata Data Management Experience (the user interface for creating datasets), use the Socrata Data Management API. For more details on when to use SODA vs the Data Management API, see the Data Management API documentation. A Python SDK for the Socrata Data Management API can be found at socrata-py.

Examples

There are some jupyter notebooks in the examples directory with usage examples of sodapy in action.

Interface

Table of Contents

client

Import the library and set up a connection to get started.

>>> from sodapy import Socrata
>>> client = Socrata(
        "sandbox.demo.socrata.com",
        "FakeAppToken",
        username="[email protected]",
        password="mypassword",
        timeout=10
    )

username and password are only required for creating or modifying data. An application token isn't strictly required (can be None), but queries executed from a client without an application token will be subjected to strict throttling limits. You may want to increase the timeout seconds when making large requests. To create a bare-bones client:

>>> client = Socrata("sandbox.demo.socrata.com", None)

A client can also be created with a context manager to obviate the need for teardown:

>>> with Socrata("sandbox.demo.socrata.com", None) as client:
>>>    # do some stuff

The client, by default, makes requests over HTTPS. To modify this behavior, or to make requests through a proxy, take a look here.

datasets(limit=0, offset=0)

Retrieve datasets associated with a particular domain. The optional limit and offset keyword args can be used to retrieve a subset of the datasets. By default, all datasets are returned.

>>> client.datasets()
[{"resource" : {"name" : "Approved Building Permits", "id" : "msk6-43c6", "parent_fxf" : null, "description" : "Data of approved building/construction permits",...}, {resource : {...}}, ...]

get(dataset_identifier, content_type="json", **kwargs)

Retrieve data from the requested resources. Filter and query data by field name, id, or using SoQL keywords.

>>> client.get("nimj-3ivp", limit=2)
[{u'geolocation': {u'latitude': u'41.1085', u'needs_recoding': False, u'longitude': u'-117.6135'}, u'version': u'9', u'source': u'nn', u'region': u'Nevada', u'occurred_at': u'2012-09-14T22:38:01', u'number_of_stations': u'15', u'depth': u'7.60', u'magnitude': u'2.7', u'earthquake_id': u'00388610'}, {...}]

>>> client.get("nimj-3ivp", where="depth > 300", order="magnitude DESC", exclude_system_fields=False)
[{u'geolocation': {u'latitude': u'-15.563', u'needs_recoding': False, u'longitude': u'-175.6104'}, u'version': u'9', u':updated_at': 1348778988, u'number_of_stations': u'275', u'region': u'Tonga', u':created_meta': u'21484', u'occurred_at': u'2012-09-13T21:16:43', u':id': 132, u'source': u'us', u'depth': u'328.30', u'magnitude': u'4.8', u':meta': u'{\n}', u':updated_meta': u'21484', u'earthquake_id': u'c000cnb5', u':created_at': 1348778988}, {...}]

>>> client.get("nimj-3ivp/193", exclude_system_fields=False)
{u'geolocation': {u'latitude': u'21.6711', u'needs_recoding': False, u'longitude': u'142.9236'}, u'version': u'C', u':updated_at': 1348778988, u'number_of_stations': u'136', u'region': u'Mariana Islands region', u':created_meta': u'21484', u'occurred_at': u'2012-09-13T11:19:07', u':id': 193, u'source': u'us', u'depth': u'300.70', u'magnitude': u'4.4', u':meta': u'{\n}', u':updated_meta': u'21484', u':position': 193, u'earthquake_id': u'c000cmsq', u':created_at': 1348778988}

>>> client.get("nimj-3ivp", region="Kansas")
[{u'geolocation': {u'latitude': u'38.10', u'needs_recoding': False, u'longitude': u'-100.6135'}, u'version': u'9', u'source': u'nn', u'region': u'Kansas', u'occurred_at': u'2010-09-19T20:52:09', u'number_of_stations': u'15', u'depth': u'300.0', u'magnitude': u'1.9', u'earthquake_id': u'00189621'}, {...}]

get_all(dataset_identifier, content_type="json", **kwargs)

Read data from the requested resource, paginating over all results. Accepts the same arguments as get(). Returns a generator.

>>> client.get_all("nimj-3ivp")
<generator object Socrata.get_all at 0x7fa0dc8be7b0>

>>> for item in client.get_all("nimj-3ivp"):
...     print(item)
...
{'geolocation': {'latitude': '-15.563', 'needs_recoding': False, 'longitude': '-175.6104'}, 'version': '9', ':updated_at': 1348778988, 'number_of_stations': '275', 'region': 'Tonga', ':created_meta': '21484', 'occurred_at': '2012-09-13T21:16:43', ':id': 132, 'source': 'us', 'depth': '328.30', 'magnitude': '4.8', ':meta': '{\n}', ':updated_meta': '21484', 'earthquake_id': 'c000cnb5', ':created_at': 1348778988}
...

>>> import itertools
>>> items = client.get_all("nimj-3ivp")
>>> first_five = list(itertools.islice(items, 5))
>>> len(first_five)
5

get_metadata(dataset_identifier, content_type="json")

Retrieve the metadata associated with a particular dataset.

>>> client.get_metadata("nimj-3ivp")
{"newBackend": false, "licenseId": "CC0_10", "publicationDate": 1436655117, "viewLastModified": 1451289003, "owner": {"roleName": "administrator", "rights": [], "displayName": "Brett", "id": "cdqe-xcn5", "screenName": "Brett"}, "query": {}, "id": "songs", "createdAt": 1398014181, "category": "Public Safety", "publicationAppendEnabled": true, "publicationStage": "published", "rowsUpdatedBy": "cdqe-xcn5", "publicationGroup": 1552205, "displayType": "table", "state": "normal", "attributionLink": "http://foo.bar.com", "tableId": 3523378, "columns": [], "metadata": {"rdfSubject": "0", "renderTypeConfig": {"visible": {"table": true}}, "availableDisplayTypes": ["table", "fatrow", "page"], "attachments": ... }}

update_metadata(dataset_identifier, update_fields, content_type="json")

Update the metadata for a particular dataset. update_fields should be a dictionary containing only the metadata keys that you wish to overwrite.

Note: Invalid payloads to this method could corrupt the dataset or visualization. See this comment for more information.

>>> client.update_metadata("nimj-3ivp", {"attributionLink": "https://anothertest.com"})
{"newBackend": false, "licenseId": "CC0_10", "publicationDate": 1436655117, "viewLastModified": 1451289003, "owner": {"roleName": "administrator", "rights": [], "displayName": "Brett", "id": "cdqe-xcn5", "screenName": "Brett"}, "query": {}, "id": "songs", "createdAt": 1398014181, "category": "Public Safety", "publicationAppendEnabled": true, "publicationStage": "published", "rowsUpdatedBy": "cdqe-xcn5", "publicationGroup": 1552205, "displayType": "table", "state": "normal", "attributionLink": "https://anothertest.com", "tableId": 3523378, "columns": [], "metadata": {"rdfSubject": "0", "renderTypeConfig": {"visible": {"table": true}}, "availableDisplayTypes": ["table", "fatrow", "page"], "attachments": ... }}

download_attachments(dataset_identifier, content_type="json", download_dir="~/sodapy_downloads")

Download all attachments associated with a dataset. Return a list of paths to the downloaded files.

>>> client.download_attachments("nimj-3ivp", download_dir="~/Desktop")
    ['/Users/xmunoz/Desktop/nimj-3ivp/FireIncident_Codes.PDF', '/Users/xmunoz/Desktop/nimj-3ivp/AccidentReport.jpg']

create(name, **kwargs)

Create a new dataset. Optionally, specify keyword args such as:

  • description description of the dataset
  • columns list of fields
  • category dataset category (must exist in /admin/metadata)
  • tags list of tag strings
  • row_identifier field name of primary key
  • new_backend whether to create the dataset in the new backend

Example usage:

>>> columns = [{"fieldName": "delegation", "name": "Delegation", "dataTypeName": "text"}, {"fieldName": "members", "name": "Members", "dataTypeName": "number"}]
>>> tags = ["politics", "geography"]
>>> client.create("Delegates", description="List of delegates", columns=columns, row_identifier="delegation", tags=tags, category="Transparency")
{u'id': u'2frc-hyvj', u'name': u'Foo Bar', u'description': u'test dataset', u'publicationStage': u'unpublished', u'columns': [ { u'name': u'Foo', u'dataTypeName': u'text', u'fieldName': u'foo', ... }, { u'name': u'Bar', u'dataTypeName': u'number', u'fieldName': u'bar', ... } ], u'metadata': { u'rowIdentifier': 230641051 }, ... }

publish(dataset_identifier, content_type="json")

Publish a dataset after creating it, i.e. take it out of 'working copy' mode. The dataset id id returned from create will be used to publish.

>>> client.publish("2frc-hyvj")
{u'id': u'2frc-hyvj', u'name': u'Foo Bar', u'description': u'test dataset', u'publicationStage': u'unpublished', u'columns': [ { u'name': u'Foo', u'dataTypeName': u'text', u'fieldName': u'foo', ... }, { u'name': u'Bar', u'dataTypeName': u'number', u'fieldName': u'bar', ... } ], u'metadata': { u'rowIdentifier': 230641051 }, ... }

set_permission(dataset_identifier, permission="private", content_type="json")

Set the permissions of a dataset to public or private.

>>> client.set_permission("2frc-hyvj", "public")
<Response [200]>

upsert(dataset_identifier, payload, content_type="json")

Create a new row in an existing dataset.

>>> data = [{'Delegation': 'AJU', 'Name': 'Alaska', 'Key': 'AL', 'Entity': 'Juneau'}]
>>> client.upsert("eb9n-hr43", data)
{u'Errors': 0, u'Rows Deleted': 0, u'Rows Updated': 0, u'By SID': 0, u'Rows Created': 1, u'By RowIdentifier': 0}

Update/Delete rows in a dataset.

>>> data = [{'Delegation': 'sfa', ':id': 8, 'Name': 'bar', 'Key': 'doo', 'Entity': 'dsfsd'}, {':id': 7, ':deleted': True}]
>>> client.upsert("eb9n-hr43", data)
{u'Errors': 0, u'Rows Deleted': 1, u'Rows Updated': 1, u'By SID': 2, u'Rows Created': 0, u'By RowIdentifier': 0}

upsert's can even be performed with a csv file.

>>> data = open("upsert_test.csv")
>>> client.upsert("eb9n-hr43", data)
{u'Errors': 0, u'Rows Deleted': 0, u'Rows Updated': 1, u'By SID': 1, u'Rows Created': 0, u'By RowIdentifier': 0}

replace(dataset_identifier, payload, content_type="json")

Similar in usage to upsert, but overwrites existing data.

>>> data = open("replace_test.csv")
>>> client.replace("eb9n-hr43", data)
{u'Errors': 0, u'Rows Deleted': 0, u'Rows Updated': 0, u'By SID': 0, u'Rows Created': 12, u'By RowIdentifier': 0}

create_non_data_file(params, file_obj)

Creates a new file-based dataset with the name provided in the files tuple. A valid file input would be:

files = (
    {'file': ("gtfs2", open('myfile.zip', 'rb'))}
)
>>> with open(nondatafile_path, 'rb') as f:
>>>     files = (
>>>         {'file': ("nondatafile.zip", f)}
>>>     )
>>>     response = client.create_non_data_file(params, files)

replace_non_data_file(dataset_identifier, params, file_obj)

Same as create_non_data_file, but replaces a file that already exists in a file-based dataset.

Note: a table-based dataset cannot be replaced by a file-based dataset. Use create_non_data_file in order to replace.

>>>  with open(nondatafile_path, 'rb') as f:
>>>      files = (
>>>          {'file': ("nondatafile.zip", f)}
>>>      )
>>>      response = client.replace_non_data_file(DATASET_IDENTIFIER, {}, files)

delete(dataset_identifier, row_id=None, content_type="json")

Delete an individual row.

>>> client.delete("nimj-3ivp", row_id=2)
<Response [200]>

Delete the entire dataset.

>>> client.delete("nimj-3ivp")
<Response [200]>

close()

Close the session when you're finished.

>>> client.close()

Run tests

$ pytest

Contributing

See CONTRIBUTING.md.

Meta

This package uses semantic versioning.

Source and wheel distributions are available on PyPI. Here is how I create those releases.

python3 setup.py bdist_wheel
python3 setup.py sdist
twine upload dist/*
Owner
Cristina
ACAB
Cristina
Instrument asyncio Python for distributed tracing with AWS X-Ray.

xraysink (aka xray-asyncio) Extra AWS X-Ray instrumentation to use distributed tracing with asyncio Python libraries that are not (yet) supported by t

Gary Donovan 12 Nov 10, 2022
Telegram 隨機色圖,支援每日自動爬取

Telegram 隨機色圖機器人 使用此原始碼的Bot 開放的隨機色圖機器人: @katonei_bot 已實現的功能 爬取每日R18排行榜 不夠色!再來一張 Tag 索引,指定Tag色圖 將爬取到的色圖轉為 WebP 格式儲存,節省空間 需要注意的事件 好久之前的怪東西,代碼質量不保證 請在使用A

cluckbird 15 Oct 18, 2021
A youtube search telegram bot.

YouTube-Search-Bot A youtube search telegram bot. Made with Python3 (C) @FayasNoushad Copyright permission under MIT License License - https://github

Fayas Noushad 22 Nov 12, 2022
Bot telegram yang menggemakan pesan apa pun yang Anda kirim atau modifikasi untuk menganonimkan pesan

Bot telegram yang menggemakan pesan apa pun yang Anda kirim atau modifikasi untuk menganonimkan pesan

KEN KAN 2 Oct 21, 2022
Autofilterv5 With Same more Features

Autofilterv5 With Same more Features ✨ Imbd + Index +.....

Selfie SD 8 Oct 21, 2022
Python client for the Echo Nest API

Pyechonest Tap into The Echo Nest's Musical Brain for the best music search, information, recommendations and remix tools on the web. Pyechonest is an

The Echo Nest 655 Dec 29, 2022
ALIEN: idA Local varIables rEcogNizer

ALIEN: idA Local varIables rEcogNizer ALIEN is an IDA Pro plugin that allows the user to get more information about ida local variables with the help

16 Nov 26, 2022
Currency And Gold Prices - Currency And Gold Prices For Python

Currency_And_Gold_Prices Photos from the app New Update Show range Change better

Ali HemmatNia 4 Sep 19, 2022
Music bot for Discord

Treble Music bot for Discord Youtube is after music bots on Discord. So we are here to fill the void. Introducing Treble, the next generation of Disco

Aja Khanal 0 Sep 16, 2022
Upvotes and karma for Discord: Heart 💗 or Crush 💔 a comment to give points to an user, or Star ⭐ it to add it to the Best Of!

🤖 Reto Reto is a community-oriented Discord bot, featuring a karma system, a way to reward the best comments, leaderboards, and so much more! React t

Erik Bianco Vera 3 May 07, 2022
A simple Telegram bot that can broadcast messages and media to the bot subscribers. with mongo DB support

𝘽𝙧𝙤𝙖𝙙𝙘𝙖𝙨𝙩 𝘽𝙤𝙩 A simple Telegram bot that can broadcast messages and media to the bot subscribers using MongoDB. Features Support mongodb.c

N A C BOTS 70 Jan 02, 2023
A slack bot that notifies you when a restaurant is available for orders

Slack Wolt Notifier A Slack bot that notifies you when a Wolt restaurant or venue is available for orders. How does it work? Slack supports bots that

Gil Matok 8 Oct 24, 2022
Pinopoly is a tool to remove the "banker" player and replace them with a digitalized system

Pinopoly is a tool to remove the "banker" player and replace them with a digitalized system. It is intended to be used on a Raspberry Pi but can be used in the command line as well.

Alex Overstreet 11 Jul 09, 2022
GET-ACQ is a python tool used to gather all companies acquired by a given company domain name.

get-acq 🏢 GET-ACQ is a python tool used to gather all companies acquired by a given company domain name. It is done by calling SecurityTrails API. Us

Milan 7 Dec 19, 2022
Select random winners for a Twitter giveaway

twitter_picker Select random winners for a Twitter giveaway Once the Twitter giveaway (or airdrop) is closed, assign a number to each participant. The

Michael Rawner 1 Dec 11, 2021
This bot can mention members upto 10,000 in groups and can mention members upto 200 in channels !

Mention All Bot This bot can mention members upto 10,000 in groups and can mention members upto 200 in channels ! 🏷 Infomation Language: Python. Tele

Anjana Madu 52 Dec 29, 2022
A small module to communicate with Triller's API

A small, UNOFFICIAL module to communicate with Triller's API. I plan to add more features/methods in the future.

A3R0 1 Nov 01, 2022
livestream-chat: Overlay para chats de livestreams

livestream-chat Overlay para chats de livestreams. Inicialmente para rodar dentro do browser do obs-studio. TODO: Issues iniciais Suporte a API do You

Eduardo Mendes 10 Dec 16, 2022
ВКонтакте бот для управления Sugar кошельком

Sugarchain VK ВКонтакте бот для управления Sugar кошельком Установка Установить зависимости можно командой: pip install -r requirements.txt Запуск (из

Vladimir 4 Jun 06, 2021
Обертка для мини-игры "рабы" на python

Slaves API Библиотека для игры Рабы на Python. Большая просьба Поставьте звездочку на репозиторий. Это много для меня значит. Версии Т.к. разработчики

Zdorov Philipp 13 Mar 31, 2021