SigTech platform user guide

API documentation

Datasets

GET /datasets
Get datasets
Retrieves a list of all datasets within a user's AWS S3 bucket.

Example:

curl --location --request GET 'https://api.sigtech.com/ingestion/datasets' \
--header 'Authorization: Bearer <token>'
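The same call can be prepared from Python. The following is a minimal sketch using only the standard library; the helper name build_request is hypothetical, and the base URL, bearer header, and text/plain content type are copied from the curl examples on this page. The request is built but not sent.

```python
import urllib.request

# Base URL taken from the curl examples on this page.
BASE_URL = "https://api.sigtech.com/ingestion"


def build_request(method, path, token, body=None):
    """Return an unsent urllib Request carrying the bearer token (hypothetical helper)."""
    headers = {"Authorization": f"Bearer {token}"}
    data = None
    if body is not None:
        # The curl examples send JSON text with a text/plain content type.
        headers["Content-Type"] = "text/plain"
        data = body.encode("utf-8")
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


list_request = build_request("GET", "/datasets", "<token>")
print(list_request.get_method(), list_request.full_url)
# To send it: urllib.request.urlopen(list_request).read()
```

The response format is not described here, so the sketch stops before reading or parsing a reply.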
POST /datasets
Post dataset
Creates a new dataset and generates a new UUID.

Example (with schema):

curl --location --request POST 'https://api.sigtech.com/ingestion/datasets' \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: text/plain' \
--data-raw '{
"name": "<dataset_name>",
"schema": [
{"name": "col1", "type": "str"},
{"name": "col2", "type": "int"},
{"name": "col3", "type": "timestamp[ms]"}
]
}'
Note: If a schema is not provided with the request, users must include values for file, upload_format, file_format, and file_id.

Example (without schema):

echo '{
"name": "<dataset_name>",
"file": "'"$(base64 <file_path> | tr -d '\n')"'",
"upload_format": "<upload_format>",
"file_format": "<file_format>",
"file_id": "<file_id>"
}' | curl --location --request POST 'https://api.sigtech.com/ingestion/datasets' \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: text/plain' \
-d @-
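For reference, the request body for a base64 upload could also be assembled in Python. This is a sketch rather than an official client: the function name build_base64_body and the sample bytes are hypothetical, while the field names and the "base64" value for upload_format come from this page.

```python
import base64
import json


def build_base64_body(name, raw_bytes, file_format, file_id):
    """Serialise a POST /datasets body that embeds the file as base64 text."""
    # b64encode returns a single line, so the JSON string contains no line breaks.
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    return json.dumps({
        "name": name,
        "file": encoded,
        "upload_format": "base64",
        "file_format": file_format,
        "file_id": file_id,
    })


sample = b"col1;col2\na;1\n"  # hypothetical file contents
body = build_base64_body("<dataset_name>", sample, "csv", "<file_id>")
print(body)
```

Using json.dumps rather than string concatenation keeps the body valid JSON even if the dataset name contains quotes.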
Notes:
  • A UUID for the new dataset is randomly generated and returned in the response body.
  • A schema can be provided in the request body in two ways:
    1. As an array of objects with name and type fields, each representing a single column in the dataset. An attempt will be made to parse the schema into a pyarrow table.
    2. As an instruction to download a file and parse it into a pyarrow table.
  • Schemas are currently unenforced. Files with different schemas may be uploaded to the same dataset. Although these individual files will be accessible, the entire dataset will be unreadable.
  • Although datasets may share the same name, IDs must be unique.
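Because schemas are not enforced by the service, a client may want to check a file against the declared schema before uploading it. The sketch below is one possible client-side check, not part of the API: header_matches_schema is a hypothetical helper that compares only the column names in a CSV header with the schema's name fields, and does not verify column types.

```python
import csv
import io


def header_matches_schema(schema, csv_text, delimiter=","):
    """Return True when the CSV header lists exactly the schema's column names, in order."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    header = next(reader, [])
    expected = [column["name"] for column in schema]
    return header == expected


schema = [
    {"name": "col1", "type": "str"},
    {"name": "col2", "type": "int"},
    {"name": "col3", "type": "timestamp[ms]"},
]

print(header_matches_schema(schema, "col1,col2,col3\na,1,2023-01-01\n"))  # True
print(header_matches_schema(schema, "col1,col3\na,2023-01-01\n"))         # False
```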
GET /datasets/<dataset_id>
Get dataset
Retrieves a list of all available file IDs and pre-signed download links for a specific dataset.

Example:

curl --location --request GET 'https://api.sigtech.com/ingestion/datasets/<dataset_id>' \
--header 'Authorization: Bearer <token>'
PUT /datasets/<dataset_id>
Put dataset
Creates or replaces a specific dataset.
Note: If a schema is not provided with the request, users must include values for file, upload_format, file_format, and file_id.

Example (with schema):

curl --location --request PUT 'https://api.sigtech.com/ingestion/datasets/<dataset_id>' \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: text/plain' \
--data-raw '{
"name": "<dataset_name>",
"schema": [
{"name": "col1", "type": "str"},
{"name": "col2", "type": "int"},
{"name": "col3", "type": "timestamp[ms]"}
]
}'

Example (without schema):

curl --location --request PUT 'https://api.sigtech.com/ingestion/datasets/<dataset_id>' \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: text/plain' \
--data-raw '{
"name": "<dataset_name>",
"file": "<presigned_file_url>",
"upload_format": "link",
"file_format": "csv",
"file_id": "<file_id>",
"parse_options": {
"delimiter": ";"
}
}'
The notes for Post dataset also apply to this request.
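The rule that a request without a schema must carry file, upload_format, file_format, and file_id can be checked before the request is sent. A minimal sketch follows, assuming the body is held as a Python dict; missing_fields is a hypothetical helper, and the service may apply further validation that is not shown here.

```python
# Fields this page lists as required when no schema is supplied.
REQUIRED_WITHOUT_SCHEMA = ("file", "upload_format", "file_format", "file_id")


def missing_fields(body):
    """List the fields a dataset request body still needs; empty when a schema is present."""
    if "schema" in body:
        return []
    return [field for field in REQUIRED_WITHOUT_SCHEMA if field not in body]


with_schema = {"name": "<dataset_name>", "schema": [{"name": "col1", "type": "str"}]}
incomplete = {"name": "<dataset_name>", "file": "<presigned_file_url>", "upload_format": "link"}

print(missing_fields(with_schema))  # []
print(missing_fields(incomplete))   # ['file_format', 'file_id']
```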
DELETE /datasets/<dataset_id>
Delete dataset
Deletes a specific dataset.

Example:

curl --location --request DELETE 'https://api.sigtech.com/ingestion/datasets/<dataset_id>' \
--header 'Authorization: Bearer <token>'

Files

GET /datasets/<dataset_id>/files
Get dataset files
Retrieves a list of files uploaded to a specific dataset.

Example:

curl --location --request GET 'https://api.sigtech.com/ingestion/datasets/<dataset_id>/files' \
--header 'Authorization: Bearer <token>'
POST /datasets/<dataset_id>/files
Post dataset file
Creates a new parquet file within a specific dataset.

Example:

curl --location --request POST 'https://api.sigtech.com/ingestion/datasets/<dataset_id>/files' \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: text/plain' \
--data-raw '{
"file": "<presigned_file_url>",
"upload_format": "link",
"file_format": "parquet",
"file_id": "<file_id>"
}'
Notes:
  • A dataset file must be provided in a format parsable into a pyarrow table.
    This pyarrow table is exported as a parquet file into S3. Download links for these files are retrievable via GET requests.
    The raw file provided will be uploaded to S3 and is also retrievable via a GET request.
  • The dataset file must be provided in a form corresponding to one of the available upload_format values:
    link: Provide a pre-signed download link for the file.
    base64: Provide the file in the form of base64-encoded bytes, with a maximum size of 10MB.
  • The supported file formats are parquet, csv, and Excel (xls, xlsx).
  • Additional parsing parameters are available depending on the format:
    CSV: Optional arguments, such as the delimiter, can be specified in one of the following objects: read_options, parse_options, or convert_options.
    Excel (xls/xlsx): Additional arguments can be passed, but alternative values for io or engine cannot.
    Parquet: No additional arguments are available.
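The upload rules above can be turned into a small pre-flight check. This sketch makes two assumptions that the notes do not settle: that "10MB" means 10 MiB measured on the raw file before encoding, and that the file type can be inferred from the file extension. The helper name pick_upload_format and the file names are hypothetical.

```python
import os

# Assumption: "10MB" is read as 10 MiB of raw (unencoded) file bytes.
MAX_BASE64_BYTES = 10 * 1024 * 1024
# Extensions for the formats listed above: parquet, csv, and Excel (xls, xlsx).
SUPPORTED_EXTENSIONS = {".parquet", ".csv", ".xls", ".xlsx"}


def pick_upload_format(filename, size_bytes):
    """Suggest an upload_format value for a file, or raise if the extension is unsupported."""
    extension = os.path.splitext(filename)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {extension or filename}")
    return "base64" if size_bytes <= MAX_BASE64_BYTES else "link"


print(pick_upload_format("prices.csv", 2_000_000))        # base64
print(pick_upload_format("history.parquet", 50_000_000))  # link
```

Files over the limit fall back to "link", which requires a pre-signed download URL to be generated separately.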
DELETE /datasets/<dataset_id>/files
Delete dataset files
Deletes all files from a specific dataset.

Example:

curl --location --request DELETE 'https://api.sigtech.com/ingestion/datasets/<dataset_id>/files' \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: text/plain'
Notes:
  • Even if the dataset no longer exists, files that once resided in that dataset will be deleted without error.
  • This request deletes both parsed and raw files within a dataset.
GET /datasets/<dataset_id>/files/<file_id>
Get dataset file
Retrieves a list of details for a specific file within a specific dataset.

Example:

curl --location --request GET 'https://api.sigtech.com/ingestion/datasets/<dataset_id>/files/<file_id>' \
--header 'Authorization: Bearer <token>'
PUT /datasets/<dataset_id>/files/<file_id>
Put dataset file
Creates or replaces a specific dataset file.

Example:

curl --location --request PUT 'https://api.sigtech.com/ingestion/datasets/<dataset_id>/files/<file_id>' \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: text/plain' \
--data-raw '{
"file": "<presigned_file_url>",
"upload_format": "link",
"file_format": "parquet",
"file_id": "<file_id>"
}'
Note: Optional parameters can also be provided for file parsing logic. To learn more, see the notes included for Post Dataset File.
DELETE /datasets/<dataset_id>/files/<file_id>
Delete dataset file
Deletes a specific dataset file.

Example:

curl --location --request DELETE 'https://api.sigtech.com/ingestion/datasets/<dataset_id>/files/<file_id>' \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: text/plain'
Notes:
  • Even if the dataset no longer exists, files that once resided in that dataset will be deleted without error.
  • This request deletes both parsed and raw files within a dataset.
© 2023 SIG Technologies Limited