Reliable Foundation Models

Interpretable Foundation Models

We have developed foundation models that can reliably explain their reasoning and are easy to steer, debug, and align.

Interpretable Model API

We have developed interpretable foundation models (LLMs, Diffusion models, and large-scale classifiers) that can:

explain and steer a model's output using human-understandable features;

provide reliable prompt attribution, i.e., indicate which parts of the prompt are important; and

identify the tokens in the pre-training and fine-tuning data that most influence the model's generated output.

from guidelabs import Client

gl = Client(api_key="<your secret key>")
prompt_poem = "Once upon a time there was a pumpkin, "

response, explanation = gl.chat.create(model="cb-llm-v1",
                                       prompt=prompt_poem,
                                       prompt_attribution=True,
                                       concept_importance=True,
                                       influential_points=10)

# Explanations
prompt_attribution = explanation['prompt_attribution']
concept_importance = explanation['concept_importance']
influential_points = explanation['influential_points']
from guidelabs import Client

gl = Client(api_key="<your secret key>")

# Upload the fine-tuning file
gl.files.create(file=open("pathtodata.jsonl", "rb"),
                purpose="fine-tune")

# Fine-tune the model
gl.fine_tuning.jobs.create(training_file="file-name",
                           model="cb-llm-v1")

Fine-tuning API for interpretable models

Get high-quality output and explanations

Our interpretable models are as accurate and high-performing as standard foundation models, while also providing explanations.

Fine-tune the model with custom data

Use your own data to insert high-level concepts into the model to steer and control the model's output.
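
As a rough sketch, each record in the uploaded JSONL file could pair text with expert-provided concept labels; the field names below are illustrative assumptions, not the documented file schema:

import json

# Hypothetical fine-tuning record: "text" and "concepts" are illustrative
# field names, not the schema documented by the API.
records = [
    {"text": "Once upon a time there was a pumpkin, ",
     "concepts": ["fairy tale", "autumn imagery"]},
]

# Write the records to the JSONL file that gl.files.create() uploads above.
with open("pathtodata.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")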

Benefits

Interpretable models that provide reliable explanations for debugging, aligning, and steering the model.

Novel Research

Our team built these new models based on a decade of insights from the machine learning literature.

Accurate and Interpretable

Our interpretable models do not underperform their black-box counterparts.

Prompt Attribution

Identify the parts of the prompt that the generated output is most sensitive to.
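
For instance, the prompt_attribution field returned by gl.chat.create() above could be used to rank prompt tokens by importance; the (token, score) structure assumed here is illustrative, not the documented return format:

# Hypothetical shape of explanation['prompt_attribution']: (token, score) pairs.
prompt_attribution = [("Once", 0.12), ("upon", 0.05), ("a", 0.02),
                      ("time", 0.08), ("pumpkin", 0.61)]

# Rank prompt tokens by attribution score, highest first.
ranked = sorted(prompt_attribution, key=lambda pair: pair[1], reverse=True)
for token, score in ranked[:3]:
    print(f"{token!r}: {score:.2f}")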

Data Attribution

Identify which pre-training and/or fine-tuning data is most influential for the generated output.
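
Continuing the sketch, explanation['influential_points'] (requested via influential_points=10 above) might identify training examples by index and influence score; the record structure here is an assumption for illustration:

# Hypothetical shape of explanation['influential_points']: one record per
# influential training example, with its dataset, index, and influence score.
influential_points = [
    {"dataset": "fine-tune", "index": 412, "influence": 0.37},
    {"dataset": "pre-train", "index": 99182, "influence": 0.21},
]

# List the most influential examples so they can be audited or corrected.
for point in sorted(influential_points, key=lambda p: p["influence"], reverse=True):
    print(point["dataset"], point["index"], round(point["influence"], 2))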

Concept Explanation

Customize and explain the foundation model using high-level, human-understandable features provided by a domain expert.
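
As one illustration, the concept_importance field returned above could expose how strongly each high-level concept contributed to the generation; the concept names and dictionary shape below are hypothetical:

# Hypothetical shape of explanation['concept_importance']: concept -> weight.
concept_importance = {"fairy tale": 0.52, "autumn imagery": 0.31, "recipe": 0.04}

# Surface the dominant concepts for a domain expert to review or override.
for concept, weight in sorted(concept_importance.items(),
                              key=lambda item: item[1], reverse=True):
    print(f"{concept}: {weight:.2f}")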

Multi-Modal

Our models can be trained or fine-tuned on any data modality.

About Us

Interpretable ML Experts

Julius Adebayo

PhD in ML interpretability from MIT,
Google Brain, Meta, & Prescient Design,
Published more than 10 ML interpretability papers.

Fulton Wang

PhD in ML interpretability from MIT,
Meta & Apple,
Key developer of the Captum package.

Join the Waitlist for Exclusive Early Access!

We are currently working with selected users to test these models. Sign up to get early access.

Contact

Stay Informed

Email

If you have any questions, please feel free to contact us at info@guidelabs.ai or julius@guidelabs.ai.

Twitter

Follow us on Twitter @guidelabsai for news and product updates.