# API documentation
## Datasets
{% swagger baseUrl="/" path="datasets" method="get" summary="Get datasets" %}
{% swagger-description %}
Retrieves a list of all created datasets that the user is permitted to view.
{% endswagger-description %}
{% swagger-parameter in="header" name="token" type="string" %}
Personal access token
{% endswagger-parameter %}
{% swagger-response status="200" description="Success response:" %}
[{"name": "<dataset_name>", "id": "<dataset_id>"}]
{% endswagger-response %}
{% endswagger %}
Retrieves a list of all datasets within a user’s AWS S3 bucket.
### Example:
```bash
curl --location --request GET 'https://api.sigtech.com/ingestion/datasets' \
--header 'Authorization: Bearer <token>'
```
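The same request can be prepared from Python. The following is a minimal sketch using only the standard library; the helper names `build_list_datasets_request` and `dataset_ids_by_name` are illustrative and not part of the API, and the sample response values are made up. The request is built but not sent.

```python
import json
import urllib.request
from typing import Dict, List

# Base URL copied from the curl example above.
BASE_URL = "https://api.sigtech.com/ingestion"


def build_list_datasets_request(token: str) -> urllib.request.Request:
    """Build (but do not send) the GET request that lists datasets."""
    return urllib.request.Request(
        f"{BASE_URL}/datasets",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def dataset_ids_by_name(response_body: str) -> Dict[str, List[str]]:
    """Group dataset IDs by dataset name from a response shaped like the one documented above."""
    grouped: Dict[str, List[str]] = {}
    for item in json.loads(response_body):
        grouped.setdefault(item["name"], []).append(item["id"])
    return grouped


request = build_list_datasets_request("<token>")
# Hypothetical response body, for illustration only.
sample = '[{"name": "prices", "id": "a1"}, {"name": "prices", "id": "b2"}]'
print(request.get_method(), request.full_url)
print(dataset_ids_by_name(sample))
```

Passing `request` to `urllib.request.urlopen` would send it. IDs are collected in lists per name because dataset names are not required to be unique.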
{% swagger baseUrl="/" path="datasets" method="post" summary="Post dataset" %}
{% swagger-description %}
Create a single new dataset resource.
\
If schema is not provided, then all of `file`, `upload_format`, `file_format` and `file_id` are required.
{% endswagger-description %}
{% swagger-parameter in="header" name="Token" type="string" %}
Personal access token
{% endswagger-parameter %}
{% swagger-parameter in="body" name="name" type="string" %}
Dataset name
{% endswagger-parameter %}
{% swagger-parameter in="body" name="tags" type="string" %}
Key-value pair in the format:
\
{[string] : [string]}
{% endswagger-parameter %}
{% swagger-parameter in="body" name="schema" type="string" %}
Schema of data in the format:
\
\[{"name":[string], "type":[string]}]
{% endswagger-parameter %}
{% swagger-parameter in="body" name="file" type="string" %}
File
{% endswagger-parameter %}
{% swagger-parameter in="body" name="upload_format" type="string" %}
Format of upload, one of the following:
\
\[base64 | link]
{% endswagger-parameter %}
{% swagger-parameter in="body" name="file_format" type="string" %}
Format of file, one of the following:
\
\[parquet | csv | xls | xlsx]
{% endswagger-parameter %}
{% swagger-parameter in="body" name="file_id" type="string" %}
ID of file
{% endswagger-parameter %}
{% swagger-response status="200" description="Success response: A UUID for the new dataset will be randomly generated and returned in the response body." %}
```javascript
{
    "id": "<dataset_uuid>",
    "name": "<dataset_name>",
    "schema": [
        {"name": "<column_name>", "type": "<column_type>"}
    ],
    "permissions": [{
        "entityId": "<entity_id>",
        "entityType": "<entity_type>",
        "action": "<allowed_action>"
    }],
    "tags": {
        "<tag_name>": "<tag_value>"
    }
}
```
{% endswagger-response %}
{% endswagger %}
Creates a new dataset and generates a new UUID.
### Example (with schema):
```bash
curl --location --request POST 'https://api.sigtech.com/ingestion/datasets' \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: text/plain' \
--data-raw '{
"name": "<dataset_name>",
"schema": [
{"name": "col1", "type": "str"},
{"name": "col2", "type": "int"},
{"name": "col3", "type": "timestamp[ms]"}
]
}'
```
Note: If a schema is not provided with the request, users must include values for `file`, `upload_format`, `file_format`, and `file_id`.
### Example (without schema):
```bash
# Base64-encode the file inline and pipe the resulting JSON body to curl.
echo '{
    "name": "<dataset_name>",
    "file": "'"$(base64 <file_path>)"'",
    "upload_format": "<upload_format>",
    "file_format": "<file_format>",
    "file_id": "<file_id>"
}' | curl --location --request POST 'https://api.sigtech.com/ingestion/datasets' \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: text/plain' \
-d @-
```
Notes:

* A UUID for the new dataset is randomly generated and returned in the response body.
* A schema can be provided in the request body in two ways:
  1. As an array of objects with `name` and `type` fields, each representing a single column in the dataset. An attempt will be made to parse the schema into a pyarrow table.
  2. As an instruction to download a file and parse it into a pyarrow table.
* Schemas are currently unenforced. Files with different schemas may be uploaded to the same dataset. Although these individual files will be accessible, the entire dataset will be unreadable.
* Although datasets may share the same name, IDs must be unique.
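As a rough illustration of the first schema form, the sketch below assembles a request body in Python and compares two schemas on the client side, which may help because schemas are not enforced by the service. The helper names are hypothetical; only the `name`, `schema` and `tags` fields and the example column types are taken from this page.

```python
import json
from typing import Dict, Iterable, List, Optional, Tuple

Schema = List[Dict[str, str]]


def build_schema(columns: Iterable[Tuple[str, str]]) -> Schema:
    """Turn (column name, column type) pairs into the documented schema array."""
    schema: Schema = []
    for column_name, column_type in columns:
        if not column_name or not column_type:
            raise ValueError("every column needs a non-empty name and type")
        schema.append({"name": column_name, "type": column_type})
    return schema


def build_dataset_body(name: str, schema: Schema,
                       tags: Optional[Dict[str, str]] = None) -> str:
    """Serialize the body of a dataset request that carries a schema."""
    body: Dict[str, object] = {"name": name, "schema": schema}
    if tags:
        body["tags"] = tags
    return json.dumps(body)


def same_schema(expected: Schema, actual: Schema) -> bool:
    """Client-side check: same columns, same types, same order."""
    return expected == actual


schema = build_schema([("col1", "str"), ("col2", "int"), ("col3", "timestamp[ms]")])
body = build_dataset_body("<dataset_name>", schema)
print(body)
```

Running `same_schema` against the schema of each file before uploading is one possible way to avoid ending up with a dataset that cannot be read as a whole.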
{% swagger baseUrl="/" path="datasets/<dataset_id>" method="get" summary="Get dataset" %}
{% swagger-description %}
Get details for a single dataset resource.
\
{% endswagger-description %}
{% swagger-parameter in="path" name="dataset_id" type="string" %}
Dataset ID
{% endswagger-parameter %}
{% swagger-parameter in="header" name="Token" type="string" %}
Personal access token
{% endswagger-parameter %}
{% swagger-response status="200" description="Success response:" %}
```javascript
{
    "id": "<dataset_uuid>",
    "name": "<dataset_name>",
    "schema": [
        {"name": "<column_name>", "type": "<column_type>"}
    ],
    "permissions": [{
        "entityId": "<entity_id>",
        "entityType": "<entity_type>",
        "action": "<allowed_action>"
    }],
    "tags": {
        "<tag_name>": "<tag_value>"
    }
}
```
{% endswagger-response %}
{% endswagger %}
Retrieves a list of all available file IDs and pre-signed download links for a specific dataset.
### Example:
```bash
curl --location --request GET 'https://api.sigtech.com/ingestion/datasets/<dataset_id>' \
--header 'Authorization: Bearer <token>'
```
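Once the response is received, its fields can be read with ordinary JSON handling. The sketch below assumes a response shaped like the success response documented for this endpoint; the concrete values (`prices`, `user`, `read` and so on) and the helper names are invented for illustration.

```python
import json
from typing import Dict, Set

# Hypothetical response body following the documented shape.
SAMPLE_RESPONSE = """
{
    "id": "1234",
    "name": "prices",
    "schema": [
        {"name": "col1", "type": "str"},
        {"name": "col2", "type": "int"}
    ],
    "permissions": [
        {"entityId": "u1", "entityType": "user", "action": "read"},
        {"entityId": "u1", "entityType": "user", "action": "write"},
        {"entityId": "u2", "entityType": "user", "action": "read"}
    ],
    "tags": {"source": "vendor"}
}
"""


def column_types(dataset: dict) -> Dict[str, str]:
    """Map each column name in the dataset schema to its type."""
    return {column["name"]: column["type"] for column in dataset["schema"]}


def allowed_actions(dataset: dict, entity_id: str) -> Set[str]:
    """Collect the actions listed for one entity in the permissions array."""
    return {
        permission["action"]
        for permission in dataset["permissions"]
        if permission["entityId"] == entity_id
    }


dataset = json.loads(SAMPLE_RESPONSE)
print(column_types(dataset))
print(sorted(allowed_actions(dataset, "u1")))
```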
{% swagger baseUrl="/" path="datasets/<dataset_id>" method="put" summary="Put dataset" %}
{% swagger-description %}
Create or replace a single dataset resource.
\
If schema is not provided, then all of `file`, `upload_format`, `file_format` and `file_id` are required.
{% endswagger-description %}
{% swagger-parameter in="path" name="api_url" type="string" %}
Domain
{% endswagger-parameter %}
{% swagger-parameter in="path" name="dataset_id" type="string" %}
Dataset ID
{% endswagger-parameter %}
{% swagger-parameter in="header" name="Token" type="string" %}
Personal access token
{% endswagger-parameter %}
{% swagger-parameter in="body" name="name" type="string" %}
Dataset name
{% endswagger-parameter %}
{% swagger-parameter in="body" name="tags" type="string" %}
Key-value pair in the format :
\
{[string] : [string]}
{% endswagger-parameter %}
{% swagger-parameter in="body" name="schema" type="string" %}
Schema of data in the format:
\
\[{"name":[string], "type":[string]}]
{% endswagger-parameter %}
{% swagger-parameter in="body" name="file" type="string" %}
File
{% endswagger-parameter %}
{% swagger-parameter in="body" name="upload_format" type="string" %}
Format of upload, one of the following:
\
\[base64 | link]
{% endswagger-parameter %}
{% swagger-parameter in="body" name="file_format" type="string" %}
Format of file, one of the following:
\
\[parquet | csv | xls | xlsx]
{% endswagger-parameter %}
{% swagger-parameter in="body" name="file_id" type="string" %}
ID of file
{% endswagger-parameter %}
{% swagger-response status="200" description="" %}
```javascript
{
    "id": "<dataset_uuid>",
    "name": "<dataset_name>",
    "schema": [
        {"name": "<column_name>", "type": "<column_type>"}
    ],
    "permissions": [{
        "entityId": "<entity_id>",
        "entityType": "<entity_type>",
        "action": "<allowed_action>"
    }],
    "tags": {
        "<tag_name>": "<tag_value>"
    }
}
```
{% endswagger-response %}
{% endswagger %}
Creates or replaces a specific dataset.
Note: If a schema is not provided with the request, users must include values for `file`, `upload_format`, `file_format`, and `file_id`.
### Example (with schema):
```bash
curl --location --request PUT 'https://api.sigtech.com/ingestion/datasets/<dataset_id>' \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: text/plain' \
--data-raw '{
    "name": "<dataset_name>",
    "schema": [
        {"name": "col1", "type": "str"},
        {"name": "col2", "type": "int"},
        {"name": "col3", "type": "timestamp[ms]"}
    ]
}'
```
### Example (without schema):
```bash
curl --location --request PUT 'https://api.sigtech.com/ingestion/datasets/<dataset_id>' \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: text/plain' \
--data-raw '{
    "name": "<dataset_name>",
    "file": "<presigned_file_url>",
    "upload_format": "link",
    "file_format": "csv",
    "file_id": "<file_id>",
    "parse_options": {
        "delimiter": ";"
    }
}'
```
The notes for Post dataset also apply to this request.
{% swagger baseUrl="/" path="datasets/<dataset_id>" method="delete" summary="Delete dataset" %}
{% swagger-description %}
Delete a single dataset resource.
\
{% endswagger-description %}
{% swagger-parameter in="path" name="api_url" type="string" %}
Domain
{% endswagger-parameter %}
{% swagger-parameter in="path" name="dataset_id" type="string" %}
Dataset ID
{% endswagger-parameter %}
{% swagger-parameter in="header" name="Token" type="string" %}
Personal access token
{% endswagger-parameter %}
{% swagger-response status="204" description="" %}
{% endswagger-response %}
{% endswagger %}
Deletes a specific dataset.
### Example:
```bash
curl --location --request DELETE 'https://api.sigtech.com/ingestion/datasets/<dataset_id>' \
--header 'Authorization: Bearer <token>'
```
## Files
{% swagger baseUrl="/" path="datasets/<dataset_id>/files" method="get" summary="Get dataset files" %}
{% swagger-description %}
Get a collection of pre-signed download links for each file uploaded to a single dataset resource.
{% endswagger-description %}
{% swagger-parameter in="path" name="api_url" type="string" %}
Domain
{% endswagger-parameter %}
{% swagger-parameter in="path" name="dataset_id" type="string" %}
Dataset ID
{% endswagger-parameter %}
{% swagger-parameter in="header" name="Token" type="string" %}
Personal access token
{% endswagger-parameter %}
{% swagger-response status="200" description="Success response:" %}
```javascript
{"ids": ["<file_id>"]}
```
{% endswagger-response %}
{% endswagger %}
Retrieves a list of files uploaded to a specific dataset.
### Example:
```bash
curl --location --request GET 'https://api.sigtech.com/ingestion/datasets/<dataset_id>/files' \
--header 'Authorization: Bearer <token>'
```
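The file IDs in the response can be turned into the per-file URLs used by the single-file endpoints further down this page. This is only a sketch: `file_urls` is a hypothetical helper, the sample IDs are invented, and percent-encoding the IDs is a defensive assumption rather than a documented requirement.

```python
import json
from typing import List
from urllib.parse import quote

# Base URL copied from the curl examples on this page.
BASE_URL = "https://api.sigtech.com/ingestion"


def file_urls(dataset_id: str, response_body: str) -> List[str]:
    """Build one file URL per ID from a body shaped like {"ids": [...]}."""
    ids = json.loads(response_body)["ids"]
    dataset_part = quote(dataset_id, safe="")
    return [
        f"{BASE_URL}/datasets/{dataset_part}/files/{quote(file_id, safe='')}"
        for file_id in ids
    ]


# Hypothetical response body, for illustration only.
urls = file_urls("ds1", '{"ids": ["a", "b c"]}')
for url in urls:
    print(url)
```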
{% swagger baseUrl="/" path="datasets/<dataset_id>/files" method="post" summary="Post dataset file" %}
{% swagger-description %}
Create a new file within a single dataset resource.
\
{% endswagger-description %}
{% swagger-parameter in="path" name="api_url" type="string" %}
Domain
{% endswagger-parameter %}
{% swagger-parameter in="path" name="dataset_id" type="string" %}
Dataset ID
{% endswagger-parameter %}
{% swagger-parameter in="header" name="Token" type="string" %}
Personal access token
{% endswagger-parameter %}
{% swagger-parameter in="body" name="file" type="string" %}
File
{% endswagger-parameter %}
{% swagger-parameter in="body" name="upload_format" type="string" %}
Format of upload, one of the following:
\
\[base64 | link]
{% endswagger-parameter %}
{% swagger-parameter in="body" name="file_format" type="string" %}
Format of file, one of the following:
\
\[parquet | csv | xls | xlsx]
{% endswagger-parameter %}
{% swagger-parameter in="body" name="file_id" type="string" %}
ID of file
{% endswagger-parameter %}
{% swagger-response status="200" description="" %}
[<download_link>]
{% endswagger-response %}
{% endswagger %}
Creates a new parquet file within a specific dataset.
### Example:
```bash
curl --location --request POST 'https://api.sigtech.com/ingestion/datasets/<dataset_id>/files' \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: text/plain' \
--data-raw '{
    "file": "<presigned_file_url>",
    "upload_format": "link",
    "file_format": "parquet",
    "file_id": "<file_id>"
}'
```
Notes:

* A dataset file must be provided in a format parsable into a pyarrow table.
* This pyarrow table is exported as a parquet file into S3. Download links for these files are retrievable via `GET` requests.
* The raw file provided will be uploaded to S3 and is also retrievable via a `GET` request.
* The dataset file must be provided in a form corresponding to one of the available `upload_format` parameters:
  * `link`: Provide a pre-signed download link for the file.
  * `base64`: Provide the file in the form of base64-encoded bytes, with a maximum size of 10MB.
* The file formats supported are parquet, csv, and Excel files (xls, xlsx).
* Additional parsing parameters are available depending on format:
  * CSV: Optional arguments such as delimiters can be specified in one of the following:
    * `read_options`
    * `parse_options`
    * `convert_options`
  * Excel (xls/xlsx): Alternative parameters for IO or engine cannot be passed.
  * Parquet: No additional arguments are available.
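The `base64` upload form described in the notes above can be sketched in Python as follows. `build_file_body` is a hypothetical helper, not part of the API. Two assumptions are made and flagged in the code: the 10MB limit is read as 10 MiB measured on the raw bytes, and `parse_options` is placed at the top level of the body, as in the Put dataset example.

```python
import base64
import json
from typing import Dict, Optional

# Assumption: "10MB" is interpreted as 10 MiB of raw (pre-encoding) bytes.
MAX_UPLOAD_BYTES = 10 * 1024 * 1024
SUPPORTED_FORMATS = {"parquet", "csv", "xls", "xlsx"}


def build_file_body(raw: bytes, file_format: str, file_id: str,
                    parse_options: Optional[Dict[str, str]] = None) -> str:
    """Serialize a request body that uploads a file as base64-encoded bytes."""
    if file_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported file_format: {file_format}")
    if len(raw) > MAX_UPLOAD_BYTES:
        raise ValueError("file too large for base64 upload; use a link instead")
    body: Dict[str, object] = {
        "file": base64.b64encode(raw).decode("ascii"),
        "upload_format": "base64",
        "file_format": file_format,
        "file_id": file_id,
    }
    if parse_options:
        # The notes list CSV as the format that accepts parse_options.
        if file_format != "csv":
            raise ValueError("parse_options is only used here for csv files")
        body["parse_options"] = parse_options
    return json.dumps(body)


csv_bytes = b"a;b\n1;2\n"
body = build_file_body(csv_bytes, "csv", "<file_id>", {"delimiter": ";"})
print(body)
```

For files above the size limit, the `link` form with a pre-signed download URL is the alternative described in the notes.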
{% swagger baseUrl="/" path="datasets/<dataset_id>/files" method="delete" summary="Delete dataset files" %}
{% swagger-description %}
Delete all files in a dataset.
{% endswagger-description %}
{% swagger-parameter in="path" name="api_url" type="string" %}
Domain
{% endswagger-parameter %}
{% swagger-parameter in="path" name="dataset_id" type="string" %}
Dataset ID
{% endswagger-parameter %}
{% swagger-parameter in="header" name="Token" type="string" %}
Personal access token
{% endswagger-parameter %}
{% swagger-response status="204" description="" %}
{% endswagger-response %}
{% endswagger %}
Deletes all files from a specific dataset.
### Example:
```bash
curl --location --request DELETE 'https://api.sigtech.com/ingestion/datasets/<dataset_id>/files' \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: text/plain'
```
Notes:

* Even if the dataset no longer exists, files that once resided in that dataset will be deleted without error.
* This request deletes both parsed and raw files within a dataset.
{% swagger baseUrl="/" path="datasets/<dataset_id>/files/<file_id>" method="get" summary="Get dataset file" %}
{% swagger-description %}
Get download links for a single dataset file.
{% endswagger-description %}
{% swagger-parameter in="path" name="api_url" type="string" %}
Domain
{% endswagger-parameter %}
{% swagger-parameter in="path" name="dataset_id" type="string" %}
Dataset ID
{% endswagger-parameter %}
{% swagger-parameter in="path" name="file_id" type="string" %}
File ID
{% endswagger-parameter %}
{% swagger-parameter in="header" name="Token" type="string" %}
Personal access token
{% endswagger-parameter %}
{% swagger-response status="200" description="" %}
```javascript
{
    "file_key": "/datasets/<dataset_id>/<file_id>.snappy.parquet",
    "raw_file_key": "/datasets/<dataset_id>/raw/<file_id>.<file_extension>",
    "schema": [{"name": "<column_name>", "type": "<column_data_type>"}]
}
```
{% endswagger-response %}
{% endswagger %}
Retrieves a list of details for a specific file within a specific dataset.
### Example:
```bash
curl --location --request GET 'https://api.sigtech.com/ingestion/datasets/<dataset_id>/files/<file_id>' \
--header 'Authorization: Bearer <token>'
```
{% swagger baseUrl="/" path="datasets/<dataset_id>/files/<file_id>" method="put" summary="Put dataset file" %}
{% swagger-description %}
Create or replace a single dataset file
\
{% endswagger-description %}
{% swagger-parameter in="header" name="Token" type="string" %}
Personal access token
{% endswagger-parameter %}
{% swagger-parameter in="body" name="file" type="string" %}
File
{% endswagger-parameter %}
{% swagger-parameter in="body" name="upload_format" type="string" %}
Format of upload, one of the following:
\
\[base64 | link]
{% endswagger-parameter %}
{% swagger-parameter in="body" name="file_format" type="string" %}
Format of file, one of the following:
\
\[parquet | csv | xls | xlsx]
{% endswagger-parameter %}
{% swagger-parameter in="body" name="file_id" type="string" %}
ID of file
{% endswagger-parameter %}
{% swagger-response status="200" description="Success response: A UUID for the new dataset will be randomly generated and returned in the response body." %}
```javascript
{
    "file_key": "datasets/<dataset_id>/<file_id>",
    "raw_file_key": "datasets/<dataset_id>/raw/<file_id>.<file_extension>",
    "schema": [{"name": "<column_name>", "type": "<column_data_type>"}]
}
```
{% endswagger-response %}
{% endswagger %}
Creates or replaces a dataset file.
### Example:
```bash
curl --location --request PUT 'https://api.sigtech.com/ingestion/datasets/<dataset_id>/files/<file_id>' \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: text/plain' \
--data-raw '{
"file": "<presigned_file_url>",
"upload_format": "link",
"file_format": "parquet",
"file_id": "<file_id>"
}'
```
Note: Optional parameters can also be provided for file parsing logic. To learn more, see the notes included for Post Dataset File.
{% swagger baseUrl="/" path="datasets/<dataset_id>/files/<file_id>" method="delete" summary="Delete dataset file" %}
{% swagger-description %}
Delete a file within a dataset.
{% endswagger-description %}
{% swagger-parameter in="header" name="Token" type="string" %}
Personal access token
{% endswagger-parameter %}
{% swagger-response status="204" description="" %}
{% endswagger-response %}
{% endswagger %}
Deletes a specific dataset file.
### Example:
```bash
curl --location --request DELETE 'https://api.sigtech.com/ingestion/datasets/<dataset_id>/files/<file_id>' \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: text/plain'
```
Notes:

* Even if the dataset no longer exists, files that once resided in that dataset will be deleted without error.
* This request deletes both parsed and raw files within a dataset.
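The two file deletion requests differ only in whether a file ID is appended to the URL. The sketch below builds either request in Python without sending it; `build_delete_request` and `deletion_succeeded` are hypothetical helpers, and treating only status 204 as success follows the responses documented on this page.

```python
import urllib.request
from typing import Optional
from urllib.parse import quote

# Base URL copied from the curl examples on this page.
BASE_URL = "https://api.sigtech.com/ingestion"


def build_delete_request(token: str, dataset_id: str,
                         file_id: Optional[str] = None) -> urllib.request.Request:
    """Build a DELETE request for one file, or for all files when file_id is None."""
    url = f"{BASE_URL}/datasets/{quote(dataset_id, safe='')}/files"
    if file_id is not None:
        url += f"/{quote(file_id, safe='')}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="DELETE",
    )


def deletion_succeeded(status: int) -> bool:
    """The delete endpoints document 204 (no content) as their response."""
    return status == 204


all_files = build_delete_request("<token>", "ds1")
one_file = build_delete_request("<token>", "ds1", "f1")
print(all_files.get_method(), all_files.full_url)
print(one_file.get_method(), one_file.full_url)
```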