Get started

Introduction

The Data Ingestion API enables you to ingest custom data into the SigTech platform.

This article covers a range of use cases and data types: from working with signal and market price data, to using custom market price data to replace existing tradable instruments or create new ones.

Concepts

  • File: A portion of custom data uploaded to the SigTech platform and added to an existing dataset. To successfully add a file to a dataset, the file's schema must match the dataset's schema.

  • Dataset: Data that follows a specified schema. A dataset can ingest additional data, provided that the new data adheres to the specified schema. Each additional portion of data added to a dataset is called a file.

  • Schema: Every dataset uploaded to the SigTech platform must have a schema. A schema defines the organisation of the columns within a table: both the column names and the type of data each column contains. Once a dataset has been created and its schema specified, files following that same schema can be uploaded and added to the dataset.
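
For example, in pandas a dataframe's column names and dtypes together describe such a schema. A minimal illustration, using hypothetical column names:

import pandas as pd

# A two-column dataframe: its column names and dtypes together
# form the schema the resulting dataset would follow.
df = pd.DataFrame({
    'date': pd.to_datetime(['2023-01-02', '2023-01-03']),
    'value': [1.5, 2.25],
})
print(df.dtypes)
# date     datetime64[ns]
# value           float64
# dtype: object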

Prerequisites

Personal access token

To access the Data Ingestion API, you need to generate a personal access token. This token is passed as an authorisation header with each request.

To generate the token, click on your user profile in the top right corner of the SigTech platform > Access Tokens > GENERATE TOKEN.
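
Each request in the scripts below passes the token as a bearer token. As a minimal sketch of the header those scripts construct:

token = ''  # Paste your generated personal access token here

headers = {
    'Authorization': f'Bearer {token}'
}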

Upload data from local memory or local files

Upload from local memory

There are two available processes:

  • Creating a new dataset from an existing pandas dataframe.

  • Adding a pandas dataframe to an existing dataset.

Upload data

Note:

If no dataset_id is provided, the script generates a new dataset based on the pandas dataframe. The new dataset's schema will match the dataframe's column structure.

If a dataset_id is provided, the script adds the pandas dataframe to the existing dataset. This only succeeds if the dataframe's schema matches the existing dataset's schema.

import base64
import datetime
import io
from typing import Optional

import pandas as pd
import pyarrow as pa
from pyarrow import parquet as pq
import requests


def upload_dataframe(token: str, df: pd.DataFrame, dataset_name: str = "new_test",
                     file_id: Optional[str] = None, dataset_id: Optional[str] = None):
    """
    Upload a dataframe, either creating a new dataset or appending to
    an existing dataset if dataset_id is provided.
    """
    with io.BytesIO() as f:
        pq.write_table(pa.Table.from_pandas(df), f)
        f.seek(0)

        # Set file_id to current time if not provided
        if not file_id:
            file_id = datetime.datetime.now().strftime('%Y%m%d%H%M%S')

        body = {
            'name': dataset_name,
            'file': base64.b64encode(f.read()).decode(),
            'upload_format': 'base64',
            'file_format': 'parquet',
            'file_id': file_id,
        }
        headers = {
            'Authorization': f'Bearer {token}'
        }
        if dataset_id:
            response = requests.put(f'https://api.sigtech.com/ingestion/datasets/{dataset_id}/files', json=body,
                                    headers=headers)
        else:
            response = requests.post('https://api.sigtech.com/ingestion/datasets', json=body, headers=headers)
    return response.json()
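
The function returns the decoded JSON body whether or not the upload succeeded. If you prefer failed requests to raise an exception instead, one option (not part of the script above) is requests' built-in status check before decoding:

response.raise_for_status()  # Raises requests.HTTPError on 4xx/5xx responses
return response.json()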

Create a new dataset

To create a new dataset, the following code should be added to the above script:

token = ''                      # Acquired SigTech token
df = pd.DataFrame()             # Dataframe to be uploaded
dataset_name = 'test_dataset'   # Assign a relevant name to the dataset
response = upload_dataframe(token, df, dataset_name)

# Retrieve the dataset_id to be used on the SigTech platform
# for querying the dataset
dataset_id = response.get('id')

Add data to an existing dataset

Note: Adding data to an existing dataset requires that the dataset_id is provided.

To add data to an existing dataset, use the following code together with the upload script above:

token = ''                      # Acquired SigTech token
df = pd.DataFrame()             # Dataframe to be uploaded
dataset_name = 'test_dataset'   # Assign a relevant name to the dataset
dataset_id = ''                 # Assign the dataset_id of the dataset to append to
response = upload_dataframe(token, df, dataset_name, dataset_id=dataset_id)

If a dataset hasn't been created, see Create a new dataset.
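
Putting both steps together, the sketch below (using the upload_dataframe function defined above and illustrative placeholder dataframes) creates a dataset and then appends a second dataframe with the same column structure:

import pandas as pd

token = ''  # Acquired SigTech token

# Both dataframes must share the same column structure (schema)
df_initial = pd.DataFrame({'date': ['2023-01-02'], 'value': [1.5]})
df_update = pd.DataFrame({'date': ['2023-01-03'], 'value': [2.25]})

# Create the dataset, then append to it using the returned id
response = upload_dataframe(token, df_initial, 'test_dataset')
dataset_id = response.get('id')
upload_dataframe(token, df_update, 'test_dataset', dataset_id=dataset_id)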

Upload from local files

There are two available processes:

  • Creating a new dataset from an existing local file.

  • Adding the contents from a local file to an existing dataset.

The API currently supports the upload of parquet, csv, xls, and xlsx file formats.

Upload data

If no dataset_id is provided, the script will generate a new dataset.

import base64
import datetime
from typing import Optional

import requests


def upload_data_from_file(token: str, path: str, dataset_name: str, dataset_id: Optional[str] = None,
                          file_format: str = 'csv', file_id: Optional[str] = None):
    """
    Upload data from a local file, either creating a new dataset or appending
    to an existing dataset if dataset_id is provided.
    """
    # Set file_id to current time if none provided
    if not file_id:
        file_id = datetime.datetime.now().strftime('%Y%m%d%H%M%S')
    with open(path, 'rb') as f:
        body = {
            'name': dataset_name,
            'file': base64.b64encode(f.read()).decode(),
            'upload_format': 'base64',
            'file_format': file_format,
            'file_id': file_id,
        }
        headers = {
            'Authorization': f'Bearer {token}'
        }
        if dataset_id:
            response = requests.put(f'https://api.sigtech.com/ingestion/datasets/{dataset_id}/files',
                                    json=body, headers=headers)
        else:
            response = requests.post('https://api.sigtech.com/ingestion/datasets',
                                     json=body, headers=headers)
    return response.json()

Create a new dataset

To create a new dataset, the following code should be added to the script for uploading data:

token = ''                      # Acquired SigTech token
path = 'dummy_signal.csv'       # Path to existing local file
dataset_name = 'test_dataset'   # Assign a relevant name to the dataset
response = upload_data_from_file(token, path, dataset_name)

# Retrieve the dataset_id to be used on the SigTech platform
# for querying the dataset
dataset_id = response.get('id')

Add data to an existing dataset

Note: Adding data to an existing dataset requires that the dataset_id is provided.

To add data to an existing dataset, use the following code together with the upload script above:

token = ''                      # Acquired SigTech token
path = 'dummy_signal.csv'       # Path to existing local file
dataset_name = 'test_dataset'   # Assign a relevant name to the dataset
dataset_id = ''                 # Assign the dataset_id of the dataset to append to
response = upload_data_from_file(token, path, dataset_name, dataset_id=dataset_id)

If a dataset hasn't been created, see Create a new dataset.
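
The file_format argument defaults to 'csv'. To upload one of the other supported formats, pass it explicitly; for example, for a parquet file (the file name below is a placeholder):

token = ''                          # Acquired SigTech token
path = 'dummy_signal.parquet'       # Path to existing local parquet file
response = upload_data_from_file(token, path, 'test_dataset',
                                 file_format='parquet')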
