API documentation
GET /datasets
Get datasets
Retrieves a list of all datasets within a user's AWS S3 bucket.
curl --location --request GET 'https://api.sigtech.com/ingestion/datasets' \
--header 'Authorization: Bearer <token>'
POST /datasets
Post dataset
Creates a new dataset and generates a new UUID.
curl --location --request POST 'https://api.sigtech.com/ingestion/datasets' \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: text/plain' \
--data-raw '{
"name": "<dataset_name>",
"schema": [
{"name": "col1", "type": "str"},
{"name": "col2", "type": "int"},
{"name": "col3", "type": "timestamp[ms]"}
]
}'
Note: If a schema is not provided with the request, users must include values for file, upload_format, file_format, and file_id.
echo '{
"name": "<dataset_name>",
"file": "'"$(base64 <file_path> | tr -d '\n')"'",
"upload_format": "<upload_format>",
"file_format": "<file_format>",
"file_id": "<file_id>"
}' | curl --location --request POST 'https://api.sigtech.com/ingestion/datasets' \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: text/plain' \
-d @-
Notes:
- A UUID for the new dataset is randomly generated and returned in the response body.
- A schema can be provided in the request body in two ways:
  1. As an array of objects with name and type fields, each representing a single column in the dataset. An attempt will be made to parse the schema into a pyarrow table.
  2. As an instruction to download a file and parse it into a pyarrow table.
- Schemas are currently unenforced. Files with different schemas may be uploaded to the same dataset. Although these individual files will be accessible, the entire dataset will be unreadable.
- Although datasets may share the same name, IDs must be unique.
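The array-of-objects schema form described in the notes above can be assembled from a short list of column specifications instead of being typed by hand. The following is a minimal sketch and not part of the API: build_schema_payload is a hypothetical helper name, and the name:type argument convention is an assumption made for illustration. It only prints the JSON body and sends nothing.

```shell
#!/bin/sh
# Hypothetical helper (not part of the API): prints a request body whose
# "schema" array is built from name:type pairs.
# Usage: build_schema_payload <dataset_name> <column:type> [<column:type> ...]
# Limitation: values are inserted verbatim, so they must not contain
# double quotes or backslashes.
build_schema_payload() {
  dataset_name=$1
  shift
  printf '{"name": "%s", "schema": [' "$dataset_name"
  sep=''
  for spec in "$@"; do
    col_name=${spec%%:*}   # text before the first colon
    col_type=${spec#*:}    # text after the first colon
    printf '%s{"name": "%s", "type": "%s"}' "$sep" "$col_name" "$col_type"
    sep=', '
  done
  printf ']}\n'
}
```

The printed body can be piped into the POST request shown earlier by ending the curl command with -d @-, in the same way as the echo example.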
GET /datasets/<dataset_id>
Get dataset
Retrieves a list of all available file IDs and pre-signed download links for a specific dataset.
curl --location --request GET 'https://api.sigtech.com/ingestion/datasets/<dataset_id>' \
--header 'Authorization: Bearer <token>'
PUT /datasets/<dataset_id>
Put dataset
Creates or replaces a specific dataset.
Note: If a schema is not provided with the request, users must include values for file, upload_format, file_format, and file_id.
curl --location --request PUT 'https://api.sigtech.com/ingestion/datasets/<dataset_id>' \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: text/plain' \
--data-raw '{
"name": "<dataset_name>",
"schema": [
{"name": "col1", "type": "str"},
{"name": "col2", "type": "int"},
{"name": "col3", "type": "timestamp[ms]"}
]
}'
curl --location --request PUT 'https://api.sigtech.com/ingestion/datasets/<dataset_id>' \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: text/plain' \
--data-raw '{
"name": "<dataset_name>",
"file": "<presigned_file_url>",
"upload_format": "link",
"file_format": "csv",
"file_id": "<file_id>",
"parse_options": {
"delimiter": ";"
}
}'
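The second request above uploads a CSV by link and passes a delimiter through parse_options. As a hedged sketch of how that body might be generated, the helper below prints the same JSON and includes parse_options only when a delimiter is given. The name build_csv_link_payload and its argument order are hypothetical, chosen for illustration, and the helper performs no request itself.

```shell
#!/bin/sh
# Hypothetical helper (not part of the API): prints a request body for a
# CSV upload by pre-signed link, mirroring the fields used in the example.
# Usage: build_csv_link_payload <dataset_name> <presigned_url> <file_id> [<delimiter>]
# Limitation: values are inserted verbatim, so they must not contain
# double quotes or backslashes.
build_csv_link_payload() {
  printf '{"name": "%s", "file": "%s", "upload_format": "link", "file_format": "csv", "file_id": "%s"' \
    "$1" "$2" "$3"
  # Add parse_options only when a delimiter was supplied.
  if [ -n "${4:-}" ]; then
    printf ', "parse_options": {"delimiter": "%s"}' "$4"
  fi
  printf '}\n'
}
```

Leaving out the fourth argument omits parse_options entirely, so the request carries no delimiter setting.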
The notes for Post dataset also apply here.
DELETE /datasets/<dataset_id>
Delete dataset
Deletes a specific dataset.
curl --location --request DELETE 'https://api.sigtech.com/ingestion/datasets/<dataset_id>' \
--header 'Authorization: Bearer <token>'
GET /datasets/<dataset_id>/files
Get dataset files
Retrieves a list of files uploaded to a specific dataset.
curl --location --request GET 'https://api.sigtech.com/ingestion/datasets/<dataset_id>/files' \
--header 'Authorization: Bearer <token>'
POST /datasets/<dataset_id>/files
Post dataset file
Creates a new parquet file within a specific dataset.
curl --location --request POST 'https://api.sigtech.com/ingestion/datasets/<dataset_id>/files' \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: text/plain' \
--data-raw '{
"file": "<presigned_file_url>",
"upload_format": "link",
"file_format": "parquet",
"file_id": "<file_id>"
}'
Notes:
- A dataset file must be provided in a format parsable into a pyarrow table. This pyarrow table is exported as a parquet file into S3. Download links for these files are retrievable via GET requests. The raw file provided will be uploaded to S3 and is also retrievable via a GET request.
- The dataset file must be provided in a form corresponding to one of the available upload_format parameters:
  - link: Provide a pre-signed download link for the file.
  - base64: Provide the file in the form of base64-encoded bytes, with a maximum size of 10MB.
- The file formats supported are parquet, csv, and Excel files (xls, xlsx).
- Additional parsing parameters are available depending on format:
  - CSV: Optional arguments such as delimiters can be specified, for example through parse_options.
  - Alternative parameters for IO or engine cannot be passed.
  - Parquet: No additional arguments are available.
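Because base64 uploads are capped at 10MB while link uploads are not, a client may want to check a local file before deciding how to send it. The sketch below is one way to do that under stated assumptions: the function names are hypothetical, the limit is read as 10 * 1024 * 1024 bytes, and it is applied to the raw file size, although the API may instead measure the encoded size.

```shell
#!/bin/sh
# Assumption: "10MB" is interpreted as 10 MiB of raw (unencoded) file data.
MAX_BASE64_BYTES=$((10 * 1024 * 1024))

# Hypothetical helper: prints "base64" if the file fits under the limit,
# otherwise "link". An optional second argument overrides the limit.
choose_upload_format() {
  file_path=$1
  limit=${2:-$MAX_BASE64_BYTES}
  [ -f "$file_path" ] || { echo "no such file: $file_path" >&2; return 1; }
  # Arithmetic expansion strips the padding some wc implementations emit.
  size=$(( $(wc -c < "$file_path") ))
  if [ "$size" -le "$limit" ]; then
    echo base64
  else
    echo link
  fi
}

# Hypothetical helper: base64-encodes a file on a single line, removing the
# line wrapping that some base64 implementations add (it would break JSON).
encode_file_base64() {
  base64 < "$1" | tr -d '\n'
}
```

When the result is link, the file first has to be made available through a pre-signed download link, which is outside the scope of this sketch.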
DELETE /datasets/<dataset_id>/files
Delete dataset files
Deletes all files from a specific dataset.
curl --location --request DELETE 'https://api.sigtech.com/ingestion/datasets/<dataset_id>/files' \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: text/plain'
Notes:
- Even if the dataset no longer exists, files that once resided in that dataset will be deleted without error.
- This request deletes both parsed and raw files within a dataset.
GET /datasets/<dataset_id>/files/<file_id>
Get dataset file
Retrieves a list of details for a specific file within a specific dataset.
curl --location --request GET 'https://api.sigtech.com/ingestion/datasets/<dataset_id>/files/<file_id>' \
--header 'Authorization: Bearer <token>'
PUT /datasets/<dataset_id>/files/<file_id>
Put dataset file
Creates or replaces a dataset file.
curl --location --request PUT 'https://api.sigtech.com/ingestion/datasets/<dataset_id>/files/<file_id>' \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: text/plain' \
--data-raw '{
"file": "<presigned_file_url>",
"upload_format": "link",
"file_format": "parquet",
"file_id": "<file_id>"
}'
Note: Optional parameters can also be provided for file parsing logic. To learn more, see the notes included for Post dataset file.
DELETE /datasets/<dataset_id>/files/<file_id>
Delete dataset file
Deletes a specific dataset file.
curl --location --request DELETE 'https://api.sigtech.com/ingestion/datasets/<dataset_id>/files/<file_id>' \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: text/plain'
Notes:
- Even if the dataset no longer exists, files that once resided in that dataset will be deleted without error.
- This request deletes both parsed and raw files within a dataset.