Reliable Foundation Models

Interpretable Foundation Models

We have developed foundation models that can reliably explain their reasoning and are easy to steer, debug, and align.

Interpretable Model API

We have developed interpretable foundation models (LLMs, Diffusion models, and large-scale classifiers) that can:

explain and steer a model's output using human-understandable features;

provide reliable prompt attribution, i.e., indicate which parts of the prompt are important; and

identify the tokens in the pre-training and fine-tuning data that most influence the model's generated output.

from guidelabs import Client

gl = Client(api_key="<your secret key>")
prompt_poem = "Once upon a time there was a pumpkin, "

response, explanation = gl.chat.create(model="cb-llm-v1",
                                       prompt=prompt_poem,
                                       prompt_attribution=True,
                                       concept_importance=True,
                                       influential_points=10)

# Explanations
prompt_attribution = explanation['prompt_attribution']
concept_importance = explanation['concept_importance']
influential_points = explanation['influential_points']
from guidelabs import Client

gl = Client(api_key="<your secret key>")

# Upload the fine-tuning file
gl.files.create(file=open("pathtodata.jsonl", "rb"),
                purpose="fine-tune")

# Fine-tune the model
gl.fine_tuning.jobs.create(training_file="file-name",
                           model="cb-llm-v1")

Fine-tuning API for interpretable models

Get high-quality output and explanations

Our interpretable models are as accurate and high-performing as standard foundation models, while also providing explanations.

Fine-tune the model with custom data

Use your own data to insert high-level concepts into the model to steer and control the model's output.
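
As a rough sketch, each record in the uploaded JSONL file could pair text with expert-provided concept labels; the field names below are illustrative assumptions, not the documented file schema:

import json

# Hypothetical fine-tuning record: "text" and "concepts" are illustrative
# field names, not the schema documented by the API.
records = [
    {"text": "Once upon a time there was a pumpkin, ",
     "concepts": ["fairy tale", "autumn imagery"]},
]

# Write the records to the JSONL file that gl.files.create() uploads above.
with open("pathtodata.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")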

Benefits

Interpretable models that provide reliable explanations for debugging, aligning, and steering the model.

Novel Research

Our team built these new models based on a decade of insights from the machine learning literature.

Accurate and Interpretable

Our interpretable models do not underperform their black-box counterparts.

Prompt Attribution

Identify the parts of the prompt that the generated output is most sensitive to.
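
For instance, the prompt_attribution field returned by gl.chat.create() above could be used to rank prompt tokens by importance; the (token, score) structure assumed here is illustrative, not the documented return format:

# Hypothetical shape of explanation['prompt_attribution']: (token, score) pairs.
prompt_attribution = [("Once", 0.12), ("upon", 0.05), ("a", 0.02),
                      ("time", 0.08), ("pumpkin", 0.61)]

# Rank prompt tokens by attribution score, highest first.
ranked = sorted(prompt_attribution, key=lambda pair: pair[1], reverse=True)
for token, score in ranked[:3]:
    print(f"{token!r}: {score:.2f}")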

Data Attribution

Identify which pre-training and/or fine-tuning data is most influential for the generated output.
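
Continuing the sketch, explanation['influential_points'] (requested via influential_points=10 above) might identify training examples by index and influence score; the record structure here is an assumption for illustration:

# Hypothetical shape of explanation['influential_points']: one record per
# influential training example, with its dataset, index, and influence score.
influential_points = [
    {"dataset": "fine-tune", "index": 412, "influence": 0.37},
    {"dataset": "pre-train", "index": 99182, "influence": 0.21},
]

# List the most influential examples so they can be audited or corrected.
for point in sorted(influential_points, key=lambda p: p["influence"], reverse=True):
    print(point["dataset"], point["index"], round(point["influence"], 2))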

Concept Explanation

Customize and explain the foundation model using high-level, human-understandable features provided by a domain expert.
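
As one illustration, the concept_importance field returned above could expose how strongly each high-level concept contributed to the generation; the concept names and dictionary shape below are hypothetical:

# Hypothetical shape of explanation['concept_importance']: concept -> weight.
concept_importance = {"fairy tale": 0.52, "autumn imagery": 0.31, "recipe": 0.04}

# Surface the dominant concepts for a domain expert to review or override.
for concept, weight in sorted(concept_importance.items(),
                              key=lambda item: item[1], reverse=True):
    print(f"{concept}: {weight:.2f}")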

Multi-Modal

Our models can be trained or fine-tuned on any data modality.

About Us

Interpretable ML Experts

Julius Adebayo

PhD in ML interpretability from MIT,
Google Brain, Meta, & Prescient Design,
Published more than 10 ML interpretability papers.

Fulton Wang

PhD in ML interpretability from MIT,
Meta & Apple,
Key developer of the Captum package.

Join the Waitlist for Exclusive Early Access!

We are currently working with selected users to test these models. Sign up to get early access.

Contact

Stay Informed

Email

If you have any questions, please feel free to contact us at info@guidelabs.ai or julius@guidelabs.ai.

Twitter

Follow us on Twitter @guidelabsai for news and product updates.