We have developed foundation models that can reliably explain their reasoning and are easy to steer, debug, and align.

We have developed interpretable foundation models (LLMs, Diffusion models, and large-scale classifiers) that can:

explain and steer a model's output using human-understandable features;

provide reliable prompt attribution, i.e., indicate what part of the prompt is important; and,

identify tokens, in the pre-training and fine-tuning data, that most influence the model's generated output.

from guidelabs import Client

gl = Client(api_key="<your secret key>")
prompt_poem = "Once upon a time there was a pumpkin, "

response, explanation = …(model="cb-llm-v1",
                          prompt=prompt_poem,
                          prompt_attribution=True,
                          concept_importance=True,
                          influential_points=10)

# Explanations
prompt_attribution = explanation['prompt_attribution']
concept_importance = explanation['concept_importance']
influential_points = explanation['influential_points']

from guidelabs import Client

gl = Client(api_key="<your secret key>")

# upload file
gl.files.create(file=open("pathtodata.jsonl", "rb"),
                purpose="fine-tune")

# fine-tune model
…("file-name",
  model="cb-llm-v1")

Get high-quality output and explanations

Our interpretable models are as accurate and high-performing as standard foundation models, and they also explain their outputs.

Fine-tune the model with customized data

Use your own data to insert high-level concepts into the model to steer and control the model's output.


Interpretable models that provide reliable explanations for debugging, aligning, and steering the model.


Our team built these new models
based on a decade of insights from the machine learning literature.


Our interpretable models do not underperform
black-box counterparts.


Identify the part of the prompt that
the generated output is most sensitive to.
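
To make the idea of prompt sensitivity concrete, here is a minimal sketch of leave-one-out (occlusion) attribution: remove each prompt token in turn and record how much a score drops. This is a generic illustration, not necessarily how our models compute attributions. The names `leave_one_out_attribution` and `toy_score` are hypothetical, and `toy_score` is a keyword counter standing in for a real model.

```python
from typing import Callable, List, Tuple


def leave_one_out_attribution(
    tokens: List[str],
    score: Callable[[List[str]], float],
) -> List[Tuple[str, float]]:
    """Pair each token with the drop in score observed when it is removed."""
    baseline = score(tokens)
    attributions = []
    for i, token in enumerate(tokens):
        ablated = tokens[:i] + tokens[i + 1:]  # prompt without token i
        attributions.append((token, baseline - score(ablated)))
    return attributions


def toy_score(tokens: List[str]) -> float:
    """Hypothetical stand-in for a model: a weighted keyword count."""
    keywords = {"pumpkin": 2.0, "once": 1.0}
    return sum(keywords.get(t.lower().strip(","), 0.0) for t in tokens)


prompt_tokens = "Once upon a time there was a pumpkin,".split()
attributions = leave_one_out_attribution(prompt_tokens, toy_score)

# The token whose removal changes the score the most.
most_important = max(attributions, key=lambda pair: pair[1])
print(most_important)
```

With a real model, `score` would typically be something like the log-probability of the generated output given the prompt, which costs one forward pass per ablated token.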


Identify which pre-training and/or fine-tuning data
is most influential for the generated output.
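
One simple way to think about data influence is leave-one-out retraining: refit the model without a training example and measure how the prediction changes. The sketch below does this for a deliberately tiny model (predicting the mean of the training targets). It is only an illustration under that assumption; it does not describe how influence is computed for our foundation models, and `fit_mean` and `top_influential` are hypothetical names.

```python
from typing import List, Tuple


def fit_mean(targets: List[float]) -> float:
    """Toy 'model': its prediction is the mean of the training targets."""
    return sum(targets) / len(targets)


def top_influential(targets: List[float], k: int) -> List[Tuple[int, float]]:
    """Return the k (index, influence) pairs with the largest absolute influence.

    Influence of example i = prediction with all data
                             - prediction with example i left out.
    """
    full_prediction = fit_mean(targets)
    influences = []
    for i in range(len(targets)):
        held_out = targets[:i] + targets[i + 1:]
        influences.append((i, full_prediction - fit_mean(held_out)))
    return sorted(influences, key=lambda pair: abs(pair[1]), reverse=True)[:k]


train_targets = [1.0, 2.0, 3.0, 10.0]
top_two = top_influential(train_targets, k=2)
print(top_two)
```

Exact leave-one-out retraining is infeasible at foundation-model scale, so practical methods approximate this quantity rather than compute it directly.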


Customize and explain the foundation model
using high-level, human-understandable features provided
by a domain expert.
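
As a rough illustration of explaining and steering through named features, the sketch below routes a prediction through a small set of concept activations. Each concept's contribution is readable directly, and overriding one activation steers the output. The concept names, weights, and the helpers `predict`, `concept_importance`, and `steer` are all hypothetical; the actual model architecture is not specified here.

```python
from typing import Dict

# Hypothetical weights linking each named concept to a single output score.
CONCEPT_WEIGHTS: Dict[str, float] = {"halloween": 2.0, "food": 0.5, "finance": -1.0}


def predict(concepts: Dict[str, float]) -> float:
    """Output score as a weighted sum of concept activations."""
    return sum(CONCEPT_WEIGHTS[name] * value for name, value in concepts.items())


def concept_importance(concepts: Dict[str, float]) -> Dict[str, float]:
    """Per-concept contribution to the output score (weight * activation)."""
    return {name: CONCEPT_WEIGHTS[name] * value for name, value in concepts.items()}


def steer(concepts: Dict[str, float], name: str, value: float) -> Dict[str, float]:
    """Return a copy of the activations with one concept overridden."""
    steered = dict(concepts)
    steered[name] = value
    return steered


activations = {"halloween": 1.0, "food": 0.5, "finance": 0.0}
original_score = predict(activations)
importance = concept_importance(activations)

# Steering: switch off the "halloween" concept and recompute the output.
steered_score = predict(steer(activations, "halloween", 0.0))
print(original_score, importance, steered_score)
```

Because the output depends on the concepts only through their activations, the same numbers serve as both the explanation and the control surface.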



Our models can be trained/fine-tuned on any
data modality.

Julius Adebayo

PhD in ML interpretability from MIT,
Google Brain, Meta, & Prescient Design,
Published more than 10 ML interpretability papers.

Fulton Wang

PhD in ML interpretability from MIT,
Meta & Apple,
Key developer on Captum package.

