# API documentation

## Datasets

{% swagger baseUrl="/" path="datasets" method="get" summary="Get datasets" %}
{% swagger-description %}
Retrieves a list of all datasets that the user is permitted to view.

{% endswagger-description %}

{% swagger-parameter in="header" name="token" type="string" %}
Personal access token
{% endswagger-parameter %}

{% swagger-response status="200" description="Success response:" %}
```javascript
[{"name": "<dataset_name>", "id": "<dataset_id>"}]
```
{% endswagger-response %}
{% endswagger %}

Retrieves a list of all datasets within a user’s AWS S3 bucket.

### Example:

```bash
curl --location --request GET 'https://api.sigtech.com/ingestion/datasets' \
--header 'Authorization: Bearer <token>'
```

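As a sketch of scripting this call, the shell function below reads the token from an environment variable instead of embedding it in the command, and uses curl's `--fail` flag so that HTTP errors produce a non-zero exit status. The function name `list_datasets` and the variables `SIGTECH_TOKEN`, `SIGTECH_API_URL`, and `DRY_RUN` are hypothetical, chosen for this example; setting `DRY_RUN=1` prints the request instead of sending it.

```bash
#!/usr/bin/env bash
# Minimal sketch: call Get datasets with the token taken from the environment.
# list_datasets, SIGTECH_TOKEN, SIGTECH_API_URL and DRY_RUN are hypothetical names.
SIGTECH_API_URL="${SIGTECH_API_URL:-https://api.sigtech.com/ingestion}"

list_datasets() {
  if [ -z "${SIGTECH_TOKEN:-}" ]; then
    echo "SIGTECH_TOKEN is not set" >&2
    return 1
  fi
  if [ "${DRY_RUN:-0}" = 1 ]; then
    # Show the request that would be sent, without the token.
    printf 'GET %s/datasets\n' "$SIGTECH_API_URL"
    return 0
  fi
  # --fail makes curl exit non-zero on HTTP error statuses (400 and above)
  # instead of printing the error body as if it were the dataset list.
  curl --silent --show-error --fail \
    --header "Authorization: Bearer ${SIGTECH_TOKEN}" \
    "${SIGTECH_API_URL}/datasets"
}

# Usage:
#   export SIGTECH_TOKEN='<token>'
#   list_datasets
```

Keeping the token in the environment also keeps it out of shell history and process listings of the script's own command line.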
{% swagger baseUrl="/" path="datasets" method="post" summary="Post dataset" %}
{% swagger-description %}
Create a single new dataset resource.

If `schema` is not provided, then `file`, `upload_format`, `file_format`, and `file_id` are all required.
{% endswagger-description %}

{% swagger-parameter in="header" name="Token" type="string" %}
Personal access token
{% endswagger-parameter %}

{% swagger-parameter in="body" name="name" type="string" %}
Dataset name
{% endswagger-parameter %}

{% swagger-parameter in="body" name="tags" type="string" %}
Key-value pairs in the format: `{[string]: [string]}`
{% endswagger-parameter %}

{% swagger-parameter in="body" name="schema" type="string" %}
Schema of the data in the format: `[{"name": [string], "type": [string]}]`
{% endswagger-parameter %}

{% swagger-parameter in="body" name="file" type="string" %}
File
{% endswagger-parameter %}

{% swagger-parameter in="body" name="upload_format" type="string" %}
Format of upload, one of the following: `[base64 | link]`
{% endswagger-parameter %}

{% swagger-parameter in="body" name="file_format" type="string" %}
Format of file, one of the following: `[parquet | csv | xls | xlsx]`
{% endswagger-parameter %}

{% swagger-parameter in="body" name="file_id" type="string" %}
ID of file
{% endswagger-parameter %}

{% swagger-response status="200" description="Success response: A UUID for the new dataset will be randomly generated and returned in the response body." %}
```javascript
{
    "id": "<dataset_uuid>",
    "name": "<dataset_name>",
    "schema": [
        {"name": "<column_name>", "type": "<column_type>"}
    ],
    "permissions": [{
        "entityId": "<entity_id>",
        "entityType": "<entity_type>",
        "action": "<allowed_action>"
    }],
    "tags": {
        "<tag_name>": "<tag_value>"
    }
}
```
{% endswagger-response %}
{% endswagger %}

Creates a new dataset and generates a new UUID.

### Example (with schema):

```bash
curl --location --request POST 'https://api.sigtech.com/ingestion/datasets' \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: text/plain' \
--data-raw '{
    "name": "<dataset_name>",
    "schema": [
        {"name": "col1", "type": "str"},
        {"name": "col2", "type": "int"},
        {"name": "col3", "type": "timestamp[ms]"}
    ]
}'
```

Note: If a schema is not provided with the request, users must include values for `file`, `upload_format`, `file_format`, and `file_id`.
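The `schema` array in the example with schema can also be generated from a list of `name:type` pairs. The following is a minimal sketch; `build_schema_body` is a hypothetical helper, it inserts names and types verbatim (so they must not contain double quotes or backslashes), and it does not check types against the set the service accepts.

```bash
#!/usr/bin/env bash
# Minimal sketch: build a Post dataset body with an explicit schema.
# build_schema_body is a hypothetical helper: build_schema_body NAME COL:TYPE...
# Values are inserted verbatim, so they must not contain " or \.
build_schema_body() {
  local name="$1" columns="" pair col typ
  shift
  for pair in "$@"; do
    case "$pair" in
      *:*) ;;
      *) printf 'expected name:type, got %s\n' "$pair" >&2; return 1 ;;
    esac
    col=${pair%%:*}   # text before the first colon
    typ=${pair#*:}    # text after the first colon
    if [ -n "$columns" ]; then
      columns="${columns}, "
    fi
    columns="${columns}{\"name\": \"${col}\", \"type\": \"${typ}\"}"
  done
  printf '{"name": "%s", "schema": [%s]}\n' "$name" "$columns"
}

# Usage, piping the body to the same request as the example with schema:
#   build_schema_body '<dataset_name>' col1:str col2:int 'col3:timestamp[ms]' |
#     curl --location --request POST 'https://api.sigtech.com/ingestion/datasets' \
#       --header 'Authorization: Bearer <token>' \
#       --header 'Content-Type: text/plain' \
#       -d @-
```

Quote pairs such as `'col3:timestamp[ms]'` so the shell does not treat the brackets as a glob pattern.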

### Example (without schema):

```bash
echo '{
  "name": "<dataset_name>",
  "file": "'"$(base64 <file_path> | tr -d '\n')"'",
  "upload_format": "<upload_format>",
  "file_format": "<file_format>",
  "file_id": "<file_id>"
}' | curl --location --request POST 'https://api.sigtech.com/ingestion/datasets' \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: text/plain' \
-d @-
```
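When embedding a file as base64, note that GNU `base64` wraps its output every 76 characters, and a raw line break inside a JSON string makes the body invalid; piping through `tr -d '\n'` avoids this. The sketch below builds the same body as a function and also rejects encodings over the 10MB limit described in the notes for Post dataset file. `build_base64_body` is a hypothetical helper, and reading the limit as 10,000,000 bytes of encoded text is an assumption.

```bash
#!/usr/bin/env bash
# Minimal sketch: build a Post dataset body that embeds a local file as base64.
# build_base64_body is a hypothetical helper; the 10,000,000-byte reading of
# the 10MB limit is an assumption. String values are inserted verbatim.
build_base64_body() {
  local name="$1" path="$2" file_format="$3" file_id="$4" encoded
  if [ ! -r "$path" ]; then
    echo "cannot read $path" >&2
    return 1
  fi
  # tr removes the line breaks that GNU base64 inserts every 76 characters.
  encoded=$(base64 < "$path" | tr -d '\n')
  if [ "${#encoded}" -gt 10000000 ]; then
    echo "encoded file exceeds 10MB; use upload_format \"link\" instead" >&2
    return 1
  fi
  printf '{"name": "%s", "file": "%s", "upload_format": "base64", "file_format": "%s", "file_id": "%s"}\n' \
    "$name" "$encoded" "$file_format" "$file_id"
}

# Usage:
#   build_base64_body '<dataset_name>' prices.csv csv '<file_id>' |
#     curl --location --request POST 'https://api.sigtech.com/ingestion/datasets' \
#       --header 'Authorization: Bearer <token>' \
#       --header 'Content-Type: text/plain' \
#       -d @-
```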

Notes:

  • A UUID for the new dataset is randomly generated and returned in the response body.

  • A schema can be provided in the request body in one of two ways:
    1. As an array of objects with `name` and `type` fields, each representing a single column in the dataset. The service attempts to parse this schema into a pyarrow table.
    2. As a file, supplied through the `file`, `upload_format`, `file_format`, and `file_id` fields, which is parsed into a pyarrow table.

  • Schemas are not currently enforced: files with different schemas can be uploaded to the same dataset. Each individual file remains accessible, but the dataset as a whole becomes unreadable.

  • Datasets may share the same name, but their IDs must be unique.
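Because schemas are not enforced, one mismatched file can make a whole dataset unreadable, so a client-side check before uploading can help. The sketch below compares a CSV file's header row with the expected column names. `check_csv_columns` is a hypothetical helper; it assumes a comma delimiter and unquoted header fields, and it compares column names only, not types.

```bash
#!/usr/bin/env bash
# Minimal sketch: check a CSV header against a dataset's expected column names.
# check_csv_columns is a hypothetical helper: check_csv_columns FILE NAME...
# Assumes a comma delimiter and unquoted header fields; types are not checked.
check_csv_columns() {
  local path="$1" expected actual cr
  shift
  # Join the expected names with commas, e.g. "col1,col2,col3".
  expected=$(IFS=,; printf '%s' "$*")
  cr=$(printf '\r')
  actual=""
  # read fails on a final line without a newline, so also accept a non-empty result.
  IFS= read -r actual < "$path" || [ -n "$actual" ] || return 1
  actual=${actual%"$cr"}   # tolerate Windows (CRLF) line endings
  if [ "$actual" != "$expected" ]; then
    printf 'header mismatch in %s\n  expected: %s\n  found:    %s\n' \
      "$path" "$expected" "$actual" >&2
    return 1
  fi
}

# Usage:
#   check_csv_columns prices.csv col1 col2 col3 && echo "header matches"
```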

{% swagger baseUrl="/" path="datasets/<dataset_id>" method="get" summary="Get dataset" %}
{% swagger-description %}
Get details for a single dataset resource.
{% endswagger-description %}

{% swagger-parameter in="path" name="dataset_id" type="string" %}
Dataset ID
{% endswagger-parameter %}

{% swagger-parameter in="header" name="Token" type="string" %}
Personal access token
{% endswagger-parameter %}

{% swagger-response status="200" description="Success response:" %}
```javascript
{
    "id": "<dataset_uuid>",
    "name": "<dataset_name>",
    "schema": [
        {"name": "<column_name>", "type": "<column_type>"}
    ],
    "permissions": [{
        "entityId": "<entity_id>",
        "entityType": "<entity_type>",
        "action": "<allowed_action>"
    }],
    "tags": {
        "<tag_name>": "<tag_value>"
    }
}
```
{% endswagger-response %}
{% endswagger %}

Retrieves the details of a specific dataset, including its name, schema, permissions, and tags.

### Example:

```bash
curl --location --request GET 'https://api.sigtech.com/ingestion/datasets/<dataset_id>' \
--header 'Authorization: Bearer <token>'
```

{% swagger baseUrl="/" path="datasets/<dataset_id>" method="put" summary="Put dataset" %}
{% swagger-description %}
Create or replace a single dataset resource.

If `schema` is not provided, then `file`, `upload_format`, `file_format`, and `file_id` are all required.
{% endswagger-description %}

{% swagger-parameter in="path" name="api_url" type="string" %}
Domain
{% endswagger-parameter %}

{% swagger-parameter in="path" name="dataset_id" type="string" %}
Dataset ID
{% endswagger-parameter %}

{% swagger-parameter in="header" name="Token" type="string" %}
Personal access token
{% endswagger-parameter %}

{% swagger-parameter in="body" name="name" type="string" %}
Dataset name
{% endswagger-parameter %}

{% swagger-parameter in="body" name="tags" type="string" %}
Key-value pairs in the format: `{[string]: [string]}`
{% endswagger-parameter %}

{% swagger-parameter in="body" name="schema" type="string" %}
Schema of the data in the format: `[{"name": [string], "type": [string]}]`
{% endswagger-parameter %}

{% swagger-parameter in="body" name="file" type="string" %}
File
{% endswagger-parameter %}

{% swagger-parameter in="body" name="upload_format" type="string" %}
Format of upload, one of the following: `[base64 | link]`
{% endswagger-parameter %}

{% swagger-parameter in="body" name="file_format" type="string" %}
Format of file, one of the following: `[parquet | csv | xls | xlsx]`
{% endswagger-parameter %}

{% swagger-parameter in="body" name="file_id" type="string" %}
ID of file
{% endswagger-parameter %}

{% swagger-response status="200" description="" %}
```javascript
{
    "id": "<dataset_uuid>",
    "name": "<dataset_name>",
    "schema": [
        {"name": "<column_name>", "type": "<column_type>"}
    ],
    "permissions": [{
        "entityId": "<entity_id>",
        "entityType": "<entity_type>",
        "action": "<allowed_action>"
    }],
    "tags": {
        "<tag_name>": "<tag_value>"
    }
}
```
{% endswagger-response %}
{% endswagger %}

Creates or replaces a specific dataset.

Note: If a schema is not provided with the request, users must include values for `file`, `upload_format`, `file_format`, and `file_id`.

### Example (with schema):

```bash
curl --location --request PUT 'https://api.sigtech.com/ingestion/datasets/<dataset_id>' \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: text/plain' \
--data-raw '{
    "name": "<dataset_name>",
    "schema": [
        {"name": "col1", "type": "str"},
        {"name": "col2", "type": "int"},
        {"name": "col3", "type": "timestamp[ms]"}
    ]
}'
```

### Example (without schema):

```bash
curl --location --request PUT 'https://api.sigtech.com/ingestion/datasets/<dataset_id>' \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: text/plain' \
--data-raw '{
    "name": "<dataset_name>",
    "file": "<presigned_file_url>",
    "upload_format": "link",
    "file_format": "csv",
    "file_id": "<file_id>",
    "parse_options": {
      "delimiter": ";"
    }
}'
```

The notes for Post dataset also apply to this endpoint.
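The example without schema passes a custom CSV delimiter through `parse_options`. A minimal sketch of building that body in a script follows. `build_csv_link_body` is a hypothetical helper, restricting the delimiter to a single character is an assumption of this sketch, and the other values are inserted verbatim, so they must not contain double quotes or backslashes. A tab delimiter is emitted as the JSON escape `\t`.

```bash
#!/usr/bin/env bash
# Minimal sketch: build a Put dataset body for a CSV file given as a pre-signed link.
# build_csv_link_body is a hypothetical helper: NAME URL FILE_ID [DELIMITER]
# The single-character delimiter rule is an assumption of this sketch.
build_csv_link_body() {
  local name="$1" url="$2" file_id="$3" delimiter="${4:-,}" tab
  tab=$(printf '\t')
  case "$delimiter" in
    "$tab") delimiter='\t' ;;  # JSON escape sequence for a tab character
    '"'|\\) echo "unsupported delimiter" >&2; return 1 ;;
    ?) ;;                      # any other single character is used as-is
    *) echo "delimiter must be a single character" >&2; return 1 ;;
  esac
  printf '{"name": "%s", "file": "%s", "upload_format": "link", "file_format": "csv", "file_id": "%s", "parse_options": {"delimiter": "%s"}}\n' \
    "$name" "$url" "$file_id" "$delimiter"
}

# Usage:
#   build_csv_link_body '<dataset_name>' '<presigned_file_url>' '<file_id>' ';' |
#     curl --location --request PUT 'https://api.sigtech.com/ingestion/datasets/<dataset_id>' \
#       --header 'Authorization: Bearer <token>' \
#       --header 'Content-Type: text/plain' \
#       -d @-
```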

{% swagger baseUrl="/" path="datasets/<dataset_id>" method="delete" summary="Delete dataset" %}
{% swagger-description %}
Delete a single dataset resource.
{% endswagger-description %}

{% swagger-parameter in="path" name="api_url" type="string" %}
Domain
{% endswagger-parameter %}

{% swagger-parameter in="path" name="dataset_id" type="string" %}
Dataset ID
{% endswagger-parameter %}

{% swagger-parameter in="header" name="Token" type="string" %}
Personal access token
{% endswagger-parameter %}

{% swagger-response status="204" description="" %}
{% endswagger-response %}
{% endswagger %}

Deletes a specific dataset.

### Example:

```bash
curl --location --request DELETE 'https://api.sigtech.com/ingestion/datasets/<dataset_id>' \
--header 'Authorization: Bearer <token>'
```

## Files

{% swagger baseUrl="/" path="datasets/<dataset_id>/files" method="get" summary="Get dataset files" %}
{% swagger-description %}
Get a collection of pre-signed download links for each file uploaded to a single dataset resource.
{% endswagger-description %}

{% swagger-parameter in="path" name="api_url" type="string" %}
Domain
{% endswagger-parameter %}

{% swagger-parameter in="path" name="dataset_id" type="string" %}
Dataset ID
{% endswagger-parameter %}

{% swagger-parameter in="header" name="Token" type="string" %}
Personal access token
{% endswagger-parameter %}

{% swagger-response status="200" description="Success response:" %}
```javascript
{"ids": ["<file_id>"]}
```
{% endswagger-response %}
{% endswagger %}

Retrieves a list of files uploaded to a specific dataset.

### Example:

```bash
curl --location --request GET 'https://api.sigtech.com/ingestion/datasets/<dataset_id>/files' \
--header 'Authorization: Bearer <token>'
```

{% swagger baseUrl="/" path="datasets/<dataset_id>/files" method="post" summary="Post dataset file" %}
{% swagger-description %}
Create a single new file within a dataset resource.
{% endswagger-description %}

{% swagger-parameter in="path" name="api_url" type="string" %}
Domain
{% endswagger-parameter %}

{% swagger-parameter in="path" name="dataset_id" type="string" %}
Dataset ID
{% endswagger-parameter %}

{% swagger-parameter in="header" name="Token" type="string" %}
Personal access token
{% endswagger-parameter %}

{% swagger-parameter in="body" name="file" type="string" %}
File
{% endswagger-parameter %}

{% swagger-parameter in="body" name="upload_format" type="string" %}
Format of upload, one of the following: `[base64 | link]`
{% endswagger-parameter %}

{% swagger-parameter in="body" name="file_format" type="string" %}
Format of file, one of the following: `[parquet | csv | xls | xlsx]`
{% endswagger-parameter %}

{% swagger-parameter in="body" name="file_id" type="string" %}
ID of file
{% endswagger-parameter %}

{% swagger-response status="200" description="" %}

[<download_link>]

{% endswagger-response %}
{% endswagger %}

Creates a new parquet file within a specific dataset.

### Example:

```bash
curl --location --request POST 'https://api.sigtech.com/ingestion/datasets/<dataset_id>/files' \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: text/plain' \
--data-raw '{
    "file": "<presigned_file_url>",
    "upload_format": "link",
    "file_format": "parquet",
    "file_id": "<file_id>"
}'
```

Notes:

  • A dataset file must be provided in a format that can be parsed into a pyarrow table.

    The pyarrow table is exported to S3 as a parquet file, and download links for these files can be retrieved with GET requests.

    The raw file as provided is also uploaded to S3 and can likewise be retrieved with a GET request.

  • The dataset file must be provided in a form matching one of the available `upload_format` values:

    link: Provide a pre-signed download link for the file.

    base64: Provide the file as base64-encoded bytes, with a maximum size of 10MB.

  • Supported file formats are parquet, csv, and Excel files (xls, xlsx).

  • Additional parsing parameters are available depending on the file format:

    CSV: Optional arguments, such as a delimiter, can be specified in any of the following objects: `read_options`, `parse_options`, or `convert_options`. These names match the corresponding arguments of pyarrow's `pyarrow.csv.read_csv`.

    Excel (xls/xlsx): Optional parsing arguments can also be specified, but alternative values for the `io` and `engine` parameters cannot be passed.

    Parquet: No additional arguments are available.
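Scripts that upload many files need a `file_format` value for each one. As a sketch, assuming file names carry a conventional extension, the hypothetical helper `file_format_for` below maps a local file name or a pre-signed URL to one of the supported values and rejects anything else. The result is still sent explicitly as the `file_format` parameter.

```bash
#!/usr/bin/env bash
# Minimal sketch: derive a file_format value from a file name or pre-signed URL.
# file_format_for is a hypothetical helper; inferring the format from the
# extension is an assumption of this sketch.
file_format_for() {
  local path ext
  path=${1%%\?*}      # drop a query string, such as a pre-signed URL signature
  ext=${path##*.}     # text after the last dot
  ext=$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    parquet|csv|xls|xlsx) printf '%s\n' "$ext" ;;
    *) printf 'unsupported file format: %s\n' "$1" >&2; return 1 ;;
  esac
}

# Usage:
#   fmt=$(file_format_for '<presigned_file_url>') || exit 1
```

Stripping the query string first matters because pre-signed URL signatures can themselves contain dots.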

{% swagger baseUrl="/" path="datasets/<dataset_id>/files" method="delete" summary="Delete dataset files" %}
{% swagger-description %}
Delete all files in a dataset.
{% endswagger-description %}

{% swagger-parameter in="path" name="api_url" type="string" %}
Domain
{% endswagger-parameter %}

{% swagger-parameter in="path" name="dataset_id" type="string" %}
Dataset ID
{% endswagger-parameter %}

{% swagger-parameter in="header" name="Token" type="string" %}
Personal access token
{% endswagger-parameter %}

{% swagger-response status="204" description="" %}
{% endswagger-response %}
{% endswagger %}

Deletes all files from a specific dataset.

### Example:

```bash
curl --location --request DELETE 'https://api.sigtech.com/ingestion/datasets/<dataset_id>/files' \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: text/plain'
```

Notes:

  • Files that once resided in the dataset are deleted without error, even if the dataset itself no longer exists.

  • This request deletes both the parsed and raw files within the dataset.

{% swagger baseUrl="/" path="datasets/<dataset_id>/files/<file_id>" method="get" summary="Get dataset file" %}
{% swagger-description %}
Get download links for a single dataset file.
{% endswagger-description %}

{% swagger-parameter in="path" name="api_url" type="string" %}
Domain
{% endswagger-parameter %}

{% swagger-parameter in="path" name="dataset_id" type="string" %}
Dataset ID
{% endswagger-parameter %}

{% swagger-parameter in="path" name="file_id" type="string" %}
File ID
{% endswagger-parameter %}

{% swagger-parameter in="header" name="Token" type="string" %}
Personal access token
{% endswagger-parameter %}

{% swagger-response status="200" description="" %}
```javascript
{
  "file_key": "/datasets/<dataset_id>/<file_id>.snappy.parquet",
  "raw_file_key": "/datasets/<dataset_id>/raw/<file_id>.<file_extension>",
  "schema": [{"name": "<column_name>", "type": "<column_data_type>"}]
}
```
{% endswagger-response %}
{% endswagger %}

Retrieves details for a specific file within a dataset: the keys of its parsed and raw files, and its schema.

### Example:

```bash
curl --location --request GET 'https://api.sigtech.com/ingestion/datasets/<dataset_id>/files/<file_id>' \
--header 'Authorization: Bearer <token>'
```

{% swagger baseUrl="/" path="datasets/<dataset_id>/files/<file_id>" method="put" summary="Put dataset file" %}
{% swagger-description %}
Create or replace a single dataset file.

{% endswagger-description %}

{% swagger-parameter in="header" name="Token" type="string" %}
Personal access token
{% endswagger-parameter %}

{% swagger-parameter in="body" name="file" type="string" %}
File
{% endswagger-parameter %}

{% swagger-parameter in="body" name="upload_format" type="string" %}
Format of upload, one of the following: `[base64 | link]`
{% endswagger-parameter %}

{% swagger-parameter in="body" name="file_format" type="string" %}
Format of file, one of the following: `[parquet | csv | xls | xlsx]`
{% endswagger-parameter %}

{% swagger-parameter in="body" name="file_id" type="string" %}
ID of file
{% endswagger-parameter %}

{% swagger-response status="200" description="Success response:" %}
```javascript
{
  "file_key": "datasets/<dataset_id>/<file_id>",
  "raw_file_key": "datasets/<dataset_id>/raw/<file_id>.<file_extension>",
  "schema": [{"name": "<column_name>", "type": "<column_data_type>"}]
}
```
{% endswagger-response %}
{% endswagger %}

Creates or replaces a specific dataset file.

### Example:

```bash
curl --location --request PUT 'https://api.sigtech.com/ingestion/datasets/<dataset_id>/files/<file_id>' \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: text/plain' \
--data-raw '{
    "file": "<presigned_file_url>",
    "upload_format": "link",
    "file_format": "parquet",
    "file_id": "<file_id>"
}'
```

Note: Optional parameters controlling file parsing can also be provided. To learn more, see the notes for Post dataset file.

{% swagger baseUrl="/" path="datasets/<dataset_id>/files/<file_id>" method="delete" summary="Delete dataset file" %}
{% swagger-description %}
Delete a file within a dataset.
{% endswagger-description %}

{% swagger-parameter in="header" name="Token" type="string" %}
Personal access token
{% endswagger-parameter %}

{% swagger-response status="204" description="" %}
{% endswagger-response %}
{% endswagger %}

Deletes a specific dataset file.

### Example:

```bash
curl --location --request DELETE 'https://api.sigtech.com/ingestion/datasets/<dataset_id>/files/<file_id>' \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: text/plain'
```

Notes:

  • Even if the dataset no longer exists, a file that once resided in it is deleted without error.

  • This request deletes both the parsed and raw versions of the file.
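To remove several files from one dataset, the request above can be repeated for each file ID. The sketch below does this and treats any status other than the documented 204 as a failure. The function name `delete_files_by_id` and the variables `SIGTECH_TOKEN`, `SIGTECH_API_URL`, and `DRY_RUN` are hypothetical; with `DRY_RUN=1` the requests are printed instead of sent.

```bash
#!/usr/bin/env bash
# Minimal sketch: delete several files from one dataset, expecting HTTP 204 each time.
# delete_files_by_id, SIGTECH_TOKEN, SIGTECH_API_URL and DRY_RUN are hypothetical names.
SIGTECH_API_URL="${SIGTECH_API_URL:-https://api.sigtech.com/ingestion}"

delete_files_by_id() {
  local dataset_id="$1" file_id url status failed=0
  shift
  if [ "${DRY_RUN:-0}" != 1 ] && [ -z "${SIGTECH_TOKEN:-}" ]; then
    echo "SIGTECH_TOKEN is not set" >&2
    return 1
  fi
  for file_id in "$@"; do
    url="${SIGTECH_API_URL}/datasets/${dataset_id}/files/${file_id}"
    if [ "${DRY_RUN:-0}" = 1 ]; then
      printf 'DELETE %s\n' "$url"   # print instead of sending
      continue
    fi
    # Discard the empty body and capture only the HTTP status code
    # ("000" if no response was received).
    status=$(curl --silent --output /dev/null --write-out '%{http_code}' \
      --request DELETE \
      --header "Authorization: Bearer ${SIGTECH_TOKEN}" \
      "$url")
    if [ "$status" != 204 ]; then
      printf 'failed to delete %s (HTTP %s)\n' "$file_id" "$status" >&2
      failed=1
    fi
  done
  return "$failed"
}

# Usage:
#   delete_files_by_id '<dataset_id>' '<file_id_1>' '<file_id_2>'
```

The loop continues past a failed deletion and reports it at the end through the exit status, so one bad ID does not stop the rest.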