Instantly fork feature engineering pipelines
Experiment with new features
Define features in notebooks
Jupyter Notebook
In []:
from chalk.client import ChalkClient

client = ChalkClient(branch="testing")
In []:
def name_match_score_v2(
  name: User.full_name,
  plaid_name: User.plaid.name_on_account,
) -> User.name_match_score:
  return jaccard_similarity(name, plaid_name)
In []:
df = client.offline_query(...)
Out []:
{ "name_match_score": 0.82 }
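The notebook cells call `jaccard_similarity` without showing its definition. A minimal sketch of one possible implementation follows, assuming names are compared as sets of lowercase, whitespace-separated tokens. The tokenization and the handling of empty names are assumptions for illustration, not Chalk's implementation, and the Chalk feature annotations are replaced with plain `str` and `float` so the snippet runs anywhere.

```python
def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard similarity over lowercase, whitespace-separated tokens."""
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    if not tokens_a and not tokens_b:
        # Assumption: two empty names are treated as identical.
        return 1.0
    # Size of the intersection divided by size of the union.
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def name_match_score_v2(name: str, plaid_name: str) -> float:
    # Plain-Python stand-in for the resolver body shown in the notebook.
    return jaccard_similarity(name, plaid_name)


score = name_match_score_v2("Jane Q Doe", "jane doe")
print(round(score, 2))
```

Because the comparison is token-based, word order and letter case do not affect the score, while a missing middle initial lowers it.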
Re-use your online pipelines
Rather than requiring a separate pipeline for training set generation, Chalk lets you re-use your online serving infrastructure to compute historical datasets. Chalk transforms inference pipelines into efficient batch pipelines and automatically time-filters data to keep historical values accurate.
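To make the idea of time-filtering concrete, here is a minimal sketch of a point-in-time lookup: for a given timestamp, return the latest value observed at or before it, so later values never leak into a historical row. The observation log, the `value_as_of` helper, and the reading of "time-filtering" as a point-in-time lookup are all assumptions for illustration. The sketch does not use or describe Chalk's internals.

```python
from bisect import bisect_right
from datetime import datetime
from typing import List, Optional, Tuple

# Hypothetical observation log for one feature of one user,
# as (observed_at, value) pairs sorted by observation time.
observations: List[Tuple[datetime, float]] = [
    (datetime(2023, 1, 1), 0.40),
    (datetime(2023, 3, 1), 0.82),
    (datetime(2023, 6, 1), 0.91),
]


def value_as_of(
    obs: List[Tuple[datetime, float]], as_of: datetime
) -> Optional[float]:
    """Return the latest value observed at or before `as_of`, if any."""
    timestamps = [ts for ts, _ in obs]
    # Index of the first observation strictly after `as_of`.
    idx = bisect_right(timestamps, as_of)
    if idx == 0:
        return None  # nothing had been observed yet
    return obs[idx - 1][1]


print(value_as_of(observations, datetime(2023, 4, 15)))
```

A training row labeled at April 15 sees the March value, not the June one, which is the property a historically accurate dataset needs.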
Batch and Streaming
In addition to online data sources, Chalk supports batch and streaming data sources. When sourcing historical data points, Chalk can automatically read from data warehouses instead of online APIs or databases.
Notebook Support
Chalk’s Python SDK works in the notebook of your choice: local Jupyter, Google Colab, Deepnote, Hex, or Databricks. If it can execute Python, you can generate dataframes of training data.
Dataset Governance
Every dataset that you generate is automatically versioned and retained, which lets you seamlessly travel back in time to view the output of any past computation. Datasets can be named and shared so that teammates can use your work. Track which features are used in which datasets to help with discovery.
Get Started with Code Examples
Unlock the power of real-time data pipelines.
Tags & Owners
Feature Discovery
Assign tags and owners to features.
Preview deployments
GitHub Actions
Set up preview deployments for all PRs.
Serve many end-customers with differentiated behavior.
Unit tests
Resolvers are just Python functions, so they are easy to unit test.
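As a rough illustration of that point, the sketch below tests a resolver-style function with plain assertions. The `account_age_days` function and its inputs are hypothetical and do not come from Chalk's examples. Chalk's decorators and feature types are omitted so the snippet runs on its own, and the test functions follow the `test_` naming convention that pytest collects.

```python
from datetime import date


def account_age_days(opened_on: date, as_of: date) -> int:
    """Hypothetical resolver body: whole days between two dates."""
    if as_of < opened_on:
        raise ValueError("as_of precedes opened_on")
    return (as_of - opened_on).days


def test_counts_whole_days() -> None:
    assert account_age_days(date(2024, 1, 1), date(2024, 1, 31)) == 30


def test_same_day_is_zero() -> None:
    assert account_age_days(date(2024, 1, 1), date(2024, 1, 1)) == 0


def test_rejects_as_of_before_open_date() -> None:
    try:
        account_age_days(date(2024, 2, 1), date(2024, 1, 1))
    except ValueError:
        return
    raise AssertionError("expected ValueError")
```

Since the function takes ordinary values and returns an ordinary value, the tests need no running services or fixtures.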