API documentation

Datasets

Get datasets

GET /datasets

Retrieves a list of all created datasets that the user is permitted to view.

Headers

Name   Type    Description
Token  string  Personal access token

[{"name": "<dataset_name>", "id": "<dataset_id>"}] 

Retrieves a list of all datasets within a user's AWS S3 bucket.

Example:

curl --location --request GET 'https://api.sigtech.com/ingestion/datasets' \
--header 'Authorization: Bearer <token>'
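
The same request can be prepared from Python. The following is a minimal sketch using only the standard library; the helper name build_request is hypothetical, while the base URL and the Bearer header are copied from the curl example above. The request is only constructed here; passing it to urllib.request.urlopen would send it.

```python
import urllib.request

BASE_URL = "https://api.sigtech.com/ingestion"  # taken from the curl example


def build_request(method: str, path: str, token: str) -> urllib.request.Request:
    """Prepare (but do not send) an authenticated request.

    build_request is a hypothetical helper, not part of any SigTech client.
    """
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        method=method,
        headers={"Authorization": f"Bearer {token}"},
    )


request = build_request("GET", "/datasets", token="<token>")
print(request.get_method(), request.full_url)
```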

Post dataset

POST /datasets

Create a single new dataset resource.

If schema is not provided, then all of file, upload_format, file_format, and file_id are required.

Headers

Name   Type    Description
Token  string  Personal access token

Request Body

Name           Type    Description
name           string  Dataset name
tags           string  Key-value pairs in the format {[string] : [string]}
schema         string  Schema of the data in the format [{"name":[string], "type":[string]}]
file           string  File, given as a pre-signed download link or as base64-encoded bytes, according to upload_format
upload_format  string  Format of upload, one of [base64 | link]
file_format    string  Format of file, one of [parquet | csv | xls | xlsx]
file_id        string  ID of file

{
    "id": "<dataset_uuid>",
    "name": "<dataset_name>",
    "schema": [
        {"name": "<column_name>", "type": "<column_type>"}
    ],
    "permissions": [{
        "entityId": "<entity_id>",
        "entityType": "<entity_type>",
        "action": "<allowed_action>"
    }],
    "tags": {
        "<tag_name>": "<tag_value>"
    }
}

Creates a new dataset and generates a new UUID.

Example (with schema):

curl --location --request POST 'https://api.sigtech.com/ingestion/datasets' \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: text/plain' \
--data-raw '{
    "name": "<dataset_name>",
    "schema": [
        {"name": "col1", "type": "str"},
        {"name": "col2", "type": "int"},
        {"name": "col3", "type": "timestamp[ms]"}
    ]
}'

Note: If a schema is not provided with the request, users must include values for file, upload_format, file_format, and file_id.
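
The rule in this note can be checked on the client before a request is sent. The sketch below is illustrative only: missing_file_fields is a hypothetical helper, and it assumes that "provided" means the key is present in the JSON body.

```python
# Fields that the documentation requires whenever "schema" is omitted.
FILE_FIELDS = ("file", "upload_format", "file_format", "file_id")


def missing_file_fields(body: dict) -> list[str]:
    """Return the file-related fields still missing from a request body.

    An empty list means the body satisfies the rule stated in the note.
    Hypothetical helper; "provided" is assumed to mean "key is present".
    """
    if "schema" in body:
        return []
    return [name for name in FILE_FIELDS if name not in body]


with_schema = {"name": "prices", "schema": [{"name": "col1", "type": "str"}]}
incomplete = {"name": "prices", "file": "<presigned_file_url>", "upload_format": "link"}

print(missing_file_fields(with_schema))
print(missing_file_fields(incomplete))
```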

Example (without schema):

echo '{                        
  "name": "<dataset_name>",
  "file": "'"$(base64 <file_path>)"'",
  "upload_format": "<upload_format>",
  "file_format": "<file_format>",
  "file_id": "<file_id>"
}' | curl --location --request POST 'https://api.sigtech.com/ingestion/datasets' \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: text/plain' \
-d @-

Notes:

  • A UUID for the new dataset is randomly generated and returned in the response body.

  • A schema can be provided in the request body in two ways:

    1. As an array of objects with name and type fields, each representing a single column in the dataset. An attempt will be made to parse the schema into a pyarrow table.

    2. As an instruction to download a file and parse it into a pyarrow table.

  • Schemas are currently unenforced. Files with different schemas may be uploaded to the same dataset. Although these individual files will be accessible, the entire dataset will be unreadable.

  • Although datasets may share the same name, IDs must be unique.
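
Because schemas are not enforced by the service, a client may want to compare a file's schema with the dataset's schema before uploading. The following sketch assumes both schemas use the [{"name": ..., "type": ...}] shape shown above; schema_diff is a hypothetical helper and the column values are made up for illustration.

```python
def schema_diff(expected: list, actual: list) -> dict:
    """Map each mismatched column name to (expected type, actual type).

    A type of None means the column is absent on that side.
    Hypothetical helper; column order is deliberately ignored.
    """
    expected_types = {column["name"]: column["type"] for column in expected}
    actual_types = {column["name"]: column["type"] for column in actual}
    return {
        name: (expected_types.get(name), actual_types.get(name))
        for name in sorted(expected_types.keys() | actual_types.keys())
        if expected_types.get(name) != actual_types.get(name)
    }


dataset_schema = [
    {"name": "col1", "type": "str"},
    {"name": "col2", "type": "int"},
]
file_schema = [
    {"name": "col1", "type": "str"},
    {"name": "col2", "type": "float"},
    {"name": "col3", "type": "timestamp[ms]"},
]

diff = schema_diff(dataset_schema, file_schema)
print(diff)
```

An empty result means the two schemas agree on every column name and type.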

Get dataset

GET /datasets/<dataset_id>

Get details for a single dataset resource.

Path Parameters

Name        Type    Description
dataset_id  string  Dataset ID

Headers

Name   Type    Description
Token  string  Personal access token

{
    "id": "<dataset_uuid>",
    "name": "<dataset_name>",
    "schema": [
        {"name": "<column_name>", "type": "<column_type>"}
    ],
    "permissions": [{
        "entityId": "<entity_id>",
        "entityType": "<entity_type>",
        "action": "<allowed_action>"
    }],
    "tags": {
        "<tag_name>": "<tag_value>"
    }
}

Retrieves a list of all available file IDs and pre-signed download links for a specific dataset.

Example:

curl --location --request GET 'https://api.sigtech.com/ingestion/datasets/<dataset_id>' \
--header 'Authorization: Bearer <token>'
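
A response of the shape shown above could be parsed into a typed object as follows. This is a sketch under assumptions: the Dataset class and parse_dataset are hypothetical names, the sample values are invented, and treating schema, permissions, and tags as optional is a guess rather than documented behaviour.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical container mirroring the documented response fields."""
    id: str
    name: str
    schema: list = field(default_factory=list)
    permissions: list = field(default_factory=list)
    tags: dict = field(default_factory=dict)


def parse_dataset(raw: str) -> Dataset:
    """Parse a dataset response body (JSON text) into a Dataset."""
    data = json.loads(raw)
    return Dataset(
        id=data["id"],
        name=data["name"],
        # Assumption: these keys may be missing, so fall back to empty values.
        schema=data.get("schema", []),
        permissions=data.get("permissions", []),
        tags=data.get("tags", {}),
    )


sample = """{
    "id": "example-uuid",
    "name": "prices",
    "schema": [{"name": "col1", "type": "str"}],
    "permissions": [],
    "tags": {"team": "research"}
}"""

dataset = parse_dataset(sample)
print(dataset.name, [column["name"] for column in dataset.schema])
```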

Put dataset

PUT /datasets/<dataset_id>

Create or replace a single dataset resource.

If schema is not provided, then all of file, upload_format, file_format, and file_id are required.

Path Parameters

Name        Type    Description
api_url     string  Domain
dataset_id  string  Dataset ID

Headers

Name   Type    Description
Token  string  Personal access token

Request Body

Name           Type    Description
name           string  Dataset name
tags           string  Key-value pairs in the format {[string] : [string]}
schema         string  Schema of the data in the format [{"name":[string], "type":[string]}]
file           string  File, given as a pre-signed download link or as base64-encoded bytes, according to upload_format
upload_format  string  Format of upload, one of [base64 | link]
file_format    string  Format of file, one of [parquet | csv | xls | xlsx]
file_id        string  ID of file

{
    "id": "<dataset_uuid>",
    "name": "<dataset_name>",
    "schema": [
        {"name": "<column_name>", "type": "<column_type>"}
    ],
    "permissions": [{
        "entityId": "<entity_id>",
        "entityType": "<entity_type>",
        "action": "<allowed_action>"
    }],
    "tags": {
        "<tag_name>": "<tag_value>"
    }
}

Creates or replaces a specific dataset.

Note: If a schema is not provided with the request, users must include values for file, upload_format, file_format, and file_id.

Example (with schema):

curl --location --request PUT 'https://api.sigtech.com/ingestion/datasets/<dataset_id>' \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: text/plain' \
--data-raw '{
    "name": "<dataset_name>",
    "schema": [
        {"name": "col1", "type": "str"},
        {"name": "col2", "type": "int"},
        {"name": "col3", "type": "timestamp[ms]"}
    ]
}'

Example (without schema):

curl --location --request PUT 'https://api.sigtech.com/ingestion/datasets/<dataset_id>' \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: text/plain' \
--data-raw '{
    "name": "<dataset_name>",
    "file": "<presigned_file_url>",
    "upload_format": "link",
    "file_format": "csv",
    "file_id": "<file_id>",
    "parse_options": {
      "delimiter": ";"
    }
}'

The notes applying to Post Dataset are also applicable in this instance.
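
The PUT example without a schema can also be prepared from Python. This is a minimal sketch using the standard library; build_put_dataset is a hypothetical helper, the dataset ID is a made-up placeholder, and the Content-Type value simply mirrors the curl examples. The request is constructed but not sent.

```python
import json
import urllib.request

BASE_URL = "https://api.sigtech.com/ingestion"  # taken from the curl examples


def build_put_dataset(dataset_id: str, token: str, body: dict) -> urllib.request.Request:
    """Prepare (but do not send) a PUT /datasets/<dataset_id> request."""
    return urllib.request.Request(
        url=f"{BASE_URL}/datasets/{dataset_id}",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "text/plain",  # mirrors the curl examples
        },
    )


request = build_put_dataset(
    "example-dataset-id",  # placeholder ID for illustration
    "<token>",
    {
        "name": "<dataset_name>",
        "file": "<presigned_file_url>",
        "upload_format": "link",
        "file_format": "csv",
        "file_id": "<file_id>",
        "parse_options": {"delimiter": ";"},
    },
)
print(request.get_method(), request.full_url)
```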

Delete dataset

DELETE /datasets/<dataset_id>

Delete a single dataset resource.

Path Parameters

Name        Type    Description
api_url     string  Domain
dataset_id  string  Dataset ID

Headers

Name   Type    Description
Token  string  Personal access token

Deletes a specific dataset.

Example:

curl --location --request DELETE 'https://api.sigtech.com/ingestion/datasets/<dataset_id>' \
--header 'Authorization: Bearer <token>'

Files

Get dataset files

GET /datasets/<dataset_id>/files

Get a collection of pre-signed download links for each file uploaded to a single dataset resource.

Path Parameters

Name        Type    Description
api_url     string  Domain
dataset_id  string  Dataset ID

Headers

Name   Type    Description
Token  string  Personal access token

{"ids": ["<file_id>"]}

Retrieves a list of files uploaded to a specific dataset.

Example:

curl --location --request GET 'https://api.sigtech.com/ingestion/datasets/<dataset_id>/files' \
--header 'Authorization: Bearer <token>'
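
Given a response of the form {"ids": [...]}, a client can derive the URL of each individual file resource. The sketch below assumes the /datasets/<dataset_id>/files/<file_id> path used in the curl examples; file_urls is a hypothetical helper and the IDs are invented.

```python
import json
from urllib.parse import quote

BASE_URL = "https://api.sigtech.com/ingestion"  # taken from the curl examples


def file_urls(dataset_id: str, response_body: str) -> list[str]:
    """Build one file URL per ID listed in a Get dataset files response."""
    file_ids = json.loads(response_body)["ids"]
    dataset = quote(dataset_id, safe="")  # percent-encode unsafe characters
    return [
        f"{BASE_URL}/datasets/{dataset}/files/{quote(file_id, safe='')}"
        for file_id in file_ids
    ]


urls = file_urls("example-dataset-id", '{"ids": ["2023-01", "2023-02"]}')
for url in urls:
    print(url)
```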

Post dataset file

POST /datasets/<dataset_id>/files

Upload a new file to a single dataset resource.

Path Parameters

Name        Type    Description
api_url     string  Domain
dataset_id  string  Dataset ID

Headers

Name   Type    Description
Token  string  Personal access token

Request Body

Name           Type    Description
file           string  File, given as a pre-signed download link or as base64-encoded bytes, according to upload_format
upload_format  string  Format of upload, one of [base64 | link]
file_format    string  Format of file, one of [parquet | csv | xls | xlsx]
file_id        string  ID of file

 [<download_link>]

Creates a new parquet file within a specific dataset.

Example:

curl --location --request POST 'https://api.sigtech.com/ingestion/datasets/<dataset_id>/files' \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: text/plain' \
--data-raw '{
    "file": "<presigned_file_url>",
    "upload_format": "link",
    "file_format": "parquet",
    "file_id": "<file_id>"
}'

Notes:

  • A dataset file must be provided in a format parsable into a pyarrow table.

    This pyarrow table is exported as a parquet file into S3. Download links for these files are retrievable via GET requests.

    The raw file provided will be uploaded to S3 and is also retrievable via a GET request.

  • The dataset file must be provided in a form corresponding to one of the available upload_format parameters:

    link: Provide a pre-signed download link for the file.

    base64: Provide the file in the form of base64-encoded bytes, with a maximum size of 10MB.

  • The supported file formats are parquet, csv, and Excel files (xls, xlsx).

  • Additional parsing parameters are available depending on format:

    CSV: Optional arguments, such as delimiters, can be specified in one of read_options, parse_options, or convert_options.

    Excel (xls/xlsx): Alternative parameters for IO or engine cannot be passed.

    Parquet: No additional arguments are available.
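
The upload rules in these notes can be combined into a small request-body builder. This is a sketch, not a reference implementation: file_payload is a hypothetical helper, the 10MB limit is assumed to apply to the encoded data and to mean 10 * 1024 * 1024 bytes, and extra keyword arguments such as parse_options are passed through at the top level of the body, as in the PUT dataset example.

```python
import base64
import json

# Assumption: "10MB" means 10 * 1024 * 1024 bytes of base64-encoded data.
MAX_BASE64_BYTES = 10 * 1024 * 1024
FILE_FORMATS = {"parquet", "csv", "xls", "xlsx"}


def file_payload(file_id, file_format, *, link=None, content=None, **options):
    """Return the JSON request body for uploading one file.

    Pass either link (a pre-signed download URL) or content (raw bytes).
    Hypothetical helper; options are copied into the body unchanged.
    """
    if file_format not in FILE_FORMATS:
        raise ValueError(f"unsupported file_format: {file_format!r}")
    if (link is None) == (content is None):
        raise ValueError("provide exactly one of link or content")
    if link is not None:
        body = {"file": link, "upload_format": "link"}
    else:
        encoded = base64.b64encode(content)
        if len(encoded) > MAX_BASE64_BYTES:
            raise ValueError("base64 payload exceeds the assumed 10MB limit")
        body = {"file": encoded.decode("ascii"), "upload_format": "base64"}
    body.update(file_format=file_format, file_id=file_id, **options)
    return json.dumps(body)


payload = file_payload(
    "f1", "csv", content=b"a;b\n1;2\n", parse_options={"delimiter": ";"}
)
print(payload)
```

For files above the base64 limit, the link upload format is the alternative described in the notes.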

Delete dataset files

DELETE /datasets/<dataset_id>/files

Delete all files in a dataset.

Path Parameters

Name        Type    Description
api_url     string  Domain
dataset_id  string  Dataset ID

Headers

Name   Type    Description
Token  string  Personal access token

Deletes all files from a specific dataset.

Example:

curl --location --request DELETE 'https://api.sigtech.com/ingestion/datasets/<dataset_id>/files' \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: text/plain'

Notes:

  • Even if the dataset no longer exists, files that once resided in that dataset will be deleted without error.

  • This request deletes both parsed and raw files within a dataset.

Get dataset file

GET /datasets/<dataset_id>/files/<file_id>

Get download links for a single dataset file.

Path Parameters

Name        Type    Description
api_url     string  Domain
dataset_id  string  Dataset ID
file_id     string  File ID

Headers

Name   Type    Description
Token  string  Personal access token

{
  "file_key": "/datasets/<dataset_id>/<file_id>.snappy.parquet",
  "raw_file_key": "/datasets/<dataset_id>/raw/<file_id>.<file_extension>",
  "schema": [{"name":  "<column_name>", "type": "<column_data_type>"}]
}

Retrieves a list of details for a specific file within a specific dataset.

Example:

curl --location --request GET 'https://api.sigtech.com/ingestion/datasets/<dataset_id>/files/<file_id>' \
--header 'Authorization: Bearer <token>'
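
The file_key in the response encodes both the dataset ID and the file ID. The following sketch recovers them, assuming the key layout shown in the responses in this document (with or without the .snappy.parquet suffix) and assuming IDs contain no slashes; split_file_key is a hypothetical helper and requires Python 3.9 or later.

```python
from pathlib import PurePosixPath

PARQUET_SUFFIX = ".snappy.parquet"  # suffix shown in the example response


def split_file_key(file_key: str) -> tuple:
    """Return (dataset_id, file_id) extracted from a file_key string.

    Assumes the layout datasets/<dataset_id>/<file_id>[.snappy.parquet].
    """
    path = PurePosixPath(file_key)
    dataset_id = path.parent.name
    file_id = path.name.removesuffix(PARQUET_SUFFIX)
    return dataset_id, file_id


print(split_file_key("/datasets/example-dataset-id/2023-01.snappy.parquet"))
```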

Put dataset file

PUT /datasets/<dataset_id>/files/<file_id>

Create or replace a single dataset file.

Headers

Name   Type    Description
Token  string  Personal access token

Request Body

Name           Type    Description
file           string  File, given as a pre-signed download link or as base64-encoded bytes, according to upload_format
upload_format  string  Format of upload, one of [base64 | link]
file_format    string  Format of file, one of [parquet | csv | xls | xlsx]
file_id        string  ID of file

{
  "file_key": "datasets/<dataset_id>/<file_id>",
  "raw_file_key": "datasets/<dataset_id>/raw/<file_id>.<file_extension>",
  "schema": [{"name":  "<column_name>", "type": "<column_data_type>"}]
}

Creates or replaces a dataset file.

Example:

curl --location --request PUT 'https://api.sigtech.com/ingestion/datasets/<dataset_id>/files/<file_id>' \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: text/plain' \
--data-raw '{
    "file": "<presigned_file_url>",
    "upload_format": "link",
    "file_format": "parquet",
    "file_id": "<file_id>"
}'

Note: Optional parameters can also be provided for file parsing logic. To learn more, see the notes included for Post Dataset File.

Delete dataset file

DELETE /datasets/<dataset_id>/files/<file_id>

Delete a file within a dataset.

Headers

Name   Type    Description
Token  string  Personal access token

Deletes a specific dataset file.

Example:

curl --location --request DELETE 'https://api.sigtech.com/ingestion/datasets/<dataset_id>/files/<file_id>' \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: text/plain'

Notes:

  • Even if the dataset no longer exists, files that once resided in that dataset will be deleted without error.

  • This request deletes both parsed and raw files within a dataset.

© 2023 SIG Technologies Limited