Reliable Foundation Models

Interpretable Foundation Models

We have developed foundation models that can reliably explain their reasoning and are easy to steer, debug, and align.

Interpretable Model API

We have developed interpretable foundation models (LLMs, diffusion models, and large-scale classifiers) that can:

explain and steer a model's output using human-understandable features;

provide reliable prompt attribution, i.e., indicate which parts of the prompt are important; and

identify tokens, in the pre-training and fine-tuning data, that most influence the model's generated output.

from guidelabs import Client

gl = Client(api_key="<your secret key>")
prompt_poem = "Once upon a time there was a pumpkin, "

response, explanation = …("cb-llm-v1",
                          prompt=prompt_poem,
                          prompt_attribution=True,
                          concept_importance=True,
                          influential_points=10)

# Explanations
prompt_attribution = explanation['prompt_attribution']
concept_importance = explanation['concept_importance']
influential_points = explanation['influential_points']

from guidelabs import Client

gl = Client(api_key="<your secret key>")

# upload file
gl.files.create(file=open("pathtodata.jsonl", "rb"),
                purpose="fine-tune")

# fine-tune model
…("file-name",
  model="cb-llm-v1")

Fine-tuning API for interpretable models

Get high-quality output and explanations

Our interpretable models are as accurate and high-performing as standard foundation models, and they also explain their outputs.

Fine-tune the model with customized data

Use your own data to insert high-level concepts into the model, so you can steer and control its output.
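As a rough illustration of preparing a fine-tuning file, the sketch below writes a small JSONL file (one JSON object per line) of the kind the upload call expects by extension. The record schema is an assumption: the field names `text` and `concepts` are hypothetical and are not taken from the Guide Labs API; only the file name `pathtodata.jsonl` comes from the snippet above.

```python
import json
import os
import tempfile
from typing import Dict, List

# Hypothetical schema: "text" and "concepts" are illustrative field names,
# not a documented Guide Labs format.
records: List[Dict] = [
    {"text": "Once upon a time there was a pumpkin.", "concepts": ["autumn"]},
    {"text": "The lantern glowed on the porch.", "concepts": ["autumn", "light"]},
]


def write_jsonl(path: str, rows: List[Dict]) -> None:
    """Write one JSON object per line, which is what the JSONL format requires."""
    with open(path, "w", encoding="utf-8") as handle:
        for row in rows:
            handle.write(json.dumps(row, ensure_ascii=False) + "\n")


# Write to a temporary directory so the example has no side effects elsewhere.
path = os.path.join(tempfile.mkdtemp(), "pathtodata.jsonl")
write_jsonl(path, records)

# Read the file back to confirm every line parses as a standalone JSON object.
with open(path, encoding="utf-8") as handle:
    loaded = [json.loads(line) for line in handle]
```

Validating that each line parses on its own before uploading tends to catch formatting mistakes early, whatever schema the service actually requires.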


Interpretable models that provide reliable explanations for debugging, aligning, and steering the model.


Novel Research

Our team built these new models
based on a decade of insights from the machine learning literature.


Accurate and Interpretable

Our interpretable models do not underperform
black-box counterparts.


Prompt Attribution

Identify the part of the prompt that
the generated output is most sensitive to.
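To make the idea of prompt sensitivity concrete, here is a minimal sketch of leave-one-out attribution: remove one token at a time and measure how much a score changes. This is a generic, well-known technique used for illustration only, not a description of how the models above compute attributions. The scoring function `toy_score` is a hypothetical stand-in for a real model.

```python
from typing import Callable, List, Tuple


def leave_one_out_attribution(
    tokens: List[str],
    score_fn: Callable[[List[str]], float],
) -> List[Tuple[str, float]]:
    """Attribute to each token the drop in score caused by removing it."""
    baseline = score_fn(tokens)
    attributions = []
    for i, token in enumerate(tokens):
        ablated = tokens[:i] + tokens[i + 1:]
        attributions.append((token, baseline - score_fn(ablated)))
    return attributions


def toy_score(tokens: List[str]) -> float:
    """Hypothetical stand-in for a model: sums weights of autumn-themed words."""
    keywords = {"pumpkin": 2.0, "harvest": 1.0}
    return sum(keywords.get(t.lower().strip(",."), 0.0) for t in tokens)


prompt_tokens = "Once upon a time there was a pumpkin,".split()
scores = leave_one_out_attribution(prompt_tokens, toy_score)

# The token whose removal changes the score the most.
most_important = max(scores, key=lambda pair: pair[1])
print(most_important)
```

With this toy scorer, only the token containing "pumpkin" changes the score when removed, so it receives all of the attribution.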


Data Attribution

Identify which pre-training and/or fine-tuning data
is most influential for the generated output.
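One common family of data attribution methods scores a training example by how well its loss gradient aligns with the gradient of the test example. The sketch below applies that gradient-similarity idea to a tiny linear model with squared loss. It is an illustrative assumption, not the method used by the models described here, and all data and weights are made up.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], float]  # (features, target)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def loss_gradient(w: Sequence[float], x: Sequence[float], y: float) -> List[float]:
    """Gradient of 0.5 * (w.x - y)^2 with respect to w."""
    residual = dot(w, x) - y
    return [residual * xi for xi in x]


def rank_training_points(
    w: Sequence[float],
    train: List[Example],
    test_point: Example,
    top_k: int,
) -> List[Tuple[int, float]]:
    """Return (index, score) for the top_k training points by |gradient similarity|."""
    test_grad = loss_gradient(w, *test_point)
    scored = [
        (i, dot(loss_gradient(w, x, y), test_grad))
        for i, (x, y) in enumerate(train)
    ]
    scored.sort(key=lambda pair: abs(pair[1]), reverse=True)
    return scored[:top_k]


# Made-up model weights and data, for illustration only.
weights = [0.5, -0.5]
train_set = [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0), ([2.0, 0.0], 3.0)]
test_example = ([1.0, 0.0], 2.0)

top = rank_training_points(weights, train_set, test_example, top_k=2)
print(top)
```

Training points that share features with the test example and have large residuals rank highest; the point whose features are orthogonal to the test example scores zero.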


Concept Explanation

Customize and explain the foundation model
using high-level, human-understandable features provided
by a domain expert.
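A simple way to picture concept-based explanation and steering is a model whose output is computed from named concept scores, so each concept's contribution can be read off and overridden. The sketch below assumes a linear layer on top of two hypothetical concepts; the concept names, weights, and the linear form are all illustrative assumptions, not the architecture of the models above.

```python
from typing import Dict


def concept_contributions(
    concepts: Dict[str, float], weights: Dict[str, float]
) -> Dict[str, float]:
    """Per-concept contribution to the output: weight times concept score."""
    return {name: weights[name] * value for name, value in concepts.items()}


def predict_from_concepts(
    concepts: Dict[str, float], weights: Dict[str, float]
) -> float:
    """Output is the sum of the per-concept contributions."""
    return sum(concept_contributions(concepts, weights).values())


# Hypothetical concept scores and output weights.
concept_scores = {"formal_tone": 0.5, "medical_terms": 0.25}
output_weights = {"formal_tone": 1.0, "medical_terms": 4.0}

explanation = concept_contributions(concept_scores, output_weights)
before = predict_from_concepts(concept_scores, output_weights)

# Steering: a domain expert overrides one concept score and the output follows.
steered_scores = dict(concept_scores, medical_terms=1.0)
after = predict_from_concepts(steered_scores, output_weights)

print(explanation, before, after)
```

Because the output depends on the input only through the concept scores, the explanation (contributions) and the steering handle (overriding a score) refer to the same quantities.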



Our models can be trained/fine-tuned on any
data modality.

About Us

Interpretable ML Experts

Julius Adebayo

PhD in ML interpretability from MIT,
Google Brain, Meta, & Prescient Design,
Published more than 10 ML interpretability papers.

Fulton Wang

PhD in ML interpretability from MIT,
Meta & Apple,
Key developer of the Captum package.

Join the Waitlist for Exclusive Early Access!

We are currently working with selected users to test these models. Sign up to get early access.


Stay Informed



If you have any questions, please feel free to contact us at or



Follow us on Twitter @guidelabsai for any news and product updates.