


Research & Fine-tuning an LLM

But first, background (feel free to skip)

I've been utilizing a lot of the technologies covered in this class in my Red Line project, which I've mentioned both in class and in some of the weekly assignments. One of the biggest parts of that project is calculating a score, and in recent weeks I've been researching how to do that in a way that makes sense for the project.

I'm using a news API to periodically grab news articles about Israel/Palestine, then asking for a score of how much the event depicted in each article contributes to what I consider an acceptably livable situation back home, on a scale of 0 to 1: 0 being extremely detrimental to livability, and 1 being highly contributing.

I started out by using a sentiment analysis model through Replicate. Sentiment analysis models work by detecting how positive or negative the input is, which was just what I needed, but it proved to be way too literal. Dealing with the subject of an ongoing violent and complex conflict, pretty much every article that mentions it includes words that are "negative", so I really wasn't getting the granular results I was looking for. Everything came out very negative, even things that I personally viewed as quite positive, all things considered.

So I switched to Meta's 70B Llama model; I figured this huge LLM would have more context, and I could provide more specific instructions that take my outlook into account. While I was getting a wider range of results, it still felt a bit arbitrary and like it was missing the point: the project isn't about letting an LLM figure out the situation for me, but about using an ML model as a tool, one that can reliably provide reasonable scores quickly(!), saving me the awful dread of constantly reading the news and figuring out what it means for the foreseeable future of my family, friends, and myself. So merely using an LLM wasn't it either.

I then tried to use embeddings. This made sense because I assumed the "idea" of "good" and "bad" things happening would be distinct enough, and it allowed me to sort of insert myself into the mix; I could provide points of reference, an extremely negative and an extremely positive scenario, and for each news article see where it lands in between. I explain more about this attempt in my week #08 post titled Distillation. To put it briefly, what I ended up realizing was that the supposedly positive and negative scenarios were in fact very close to each other in the great latent space. So close that I couldn't reliably calculate a score. Opposite things being close to each other actually makes sense in retrospect. I did eventually manage to get clear enough separation between clusters of positive news and negative news, but only with a great deal of abstraction of the text and only with highly contrasting ideas. This was not going to work with actual news.
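For the curious, the scoring idea in that attempt was roughly the following. This is only a sketch with made-up helper names; it assumes the embeddings arrive as plain arrays of numbers, which is how most embedding APIs return them.

// Sketch of the embeddings attempt: place an article's embedding between a
// "very negative" and a "very positive" reference scenario using cosine similarity.
function cosineSimilarity(a, b) {
  let dot = 0, magA = 0, magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  return dot / (Math.sqrt(magA) * Math.sqrt(magB));
}

// Score = where the article falls between the two references (0 = negative, 1 = positive).
function scoreBetween(articleEmb, negativeEmb, positiveEmb) {
  const toPositive = cosineSimilarity(articleEmb, positiveEmb);
  const toNegative = cosineSimilarity(articleEmb, negativeEmb);
  return toPositive / (toPositive + toNegative); // hovers around 0.5 when both references are equally close, which is exactly the problem I ran into
}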

At this point the only thing I could think of was training my own model. But that requires a lot of time and knowledge I do not have, and after talking to some people it also did not sound like a good and/or feasible idea. I kept searching for ways to do this until Matthew (thank you Matthew!) suggested I use Google AI Studio, which lets you fine-tune Gemini (Google's LLM). I looked into it and it made a lot of sense: I'd be able to sort of "train" Gemini on my own viewpoint, while retaining the LLM's large corpus of data that provides much-needed supporting context. It's like an additional ML model filter layer on top of Gemini.

This setup was relatively straightforward; the "hardest" part is providing the training examples, which, at least in my case, meant I had to sit and read and score many articles myself. I didn't feel like I could just give articles scores (that felt a bit arbitrary too), so I came up with a few metrics from which one final score was eventually calculated. For each article I would rate it separately on the following metrics: contribution to peace, contribution to the socio-economy, and contribution to human rights and equality. This made it much easier to score articles, because for each piece I could ask myself these questions and just answer them.
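To illustrate, here's a minimal sketch of how the per-metric ratings could be folded into a single training score. The equal weighting is just an assumption for the example, not necessarily the weighting you'd want to use.

// Hypothetical helper: combine per-metric ratings (each from 0 to 1) into one score by averaging.
function combineMetrics({ peace, socioEconomy, humanRights }) {
  return (peace + socioEconomy + humanRights) / 3;
}

// Example: 0.6 on peace, 0.4 on socio-economy, 0.5 on human rights gives 0.5 overall.
const score = combineMetrics({ peace: 0.6, socioEconomy: 0.4, humanRights: 0.5 });
console.log(score.toFixed(2)); // "0.50"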

Anyway, the following is a quick guide on how to fine-tune Google's Gemini (I'm using Gemini 1.5 Flash).

Fine-tuning Gemini with Google AI Studio

After signing in to Google AI Studio (sign in with the same email address you registered with Firebase; this will make integrating everything later much easier), you should see something like this:

Click on New Prompt. It will close the pop-up window and reveal the interface in the background. Then navigate to the New tuned model button on the left side menu. You should see the following:

As mentioned, tuning only works with text at this time, but maybe by the time you're reading this it will have changed. Technology moves fast.

Clicking on Create a Structured prompt will bring you to a page similar to what we've seen before, only now it has additional room for examples (you may include up to 500 of them).

In the System Instructions section you can include instructions that would precede your input for the model. The following is a simple example:
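In text form, an instruction along these lines would do (this is a simplified stand-in, not the exact wording I used):

You will be given a summary of a news article about Israel/Palestine. Respond only with a single number between 0 and 1, where 0 means the event is extremely detrimental to livability and 1 means it contributes strongly to it.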

I haven't included examples yet — I can upload some by clicking on Actions on the top right corner of the examples section. I can also just generate examples right inside the interface itself:

After testing the prompt I can choose to add the response as a tuning example

Or I can just type in examples manually: 

One or two examples will not do much, but even with 20 (the minimum you're going to need in order to fine-tune Gemini) you can start seeing results. Under 20 (and you can freely test this) I wasn't really able to get anything useful. But you should probably aim much higher and provide as many examples as possible, up to 500.
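For reference, each structured-prompt example is just an input/output pair: the input is what you'd normally paste into the prompt (in my case an article, or a summary of one), and the output is the exact response you want back (in my case a score). A made-up pair, purely for illustration, might look like:

input: "Mediators announce an extended humanitarian pause and renewed ceasefire talks."
output: 0.7

input: "Overnight strikes hit residential areas; aid deliveries suspended."
output: 0.1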

When you finish adding examples and testing the results you're getting, you might want to make it usable in your project. For that you need to actually tune Gemini: navigate to the New tuned model button again, and now that you have a structured prompt set up, you can select it in the Select a data source dropdown menu. It should look something like this:

...And then click Tune! It'll take a few minutes; you'll be taken to your library, where you can click on the model and see the training in progress, like so:

Congrats! You now have a tuned model! In order to use it you'll need to add API access: clicking on that button will prompt you to select a Google Cloud project (the one we've been using for Shared Minds should be in there too if you've used the same login email).

The way I did it was by creating an API key. Once you do that, you can include it in your code (or, if you're using Firebase Functions, you can include it in the environment variables).
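If you go the environment-variable route, the newer Firebase CLI will load a dotenv-style file from your functions directory into process.env when you deploy. As a sketch (the variable name here just matches the code below; call it whatever you like):

# functions/.env (keep this file out of version control)
GOOGLE_AI_STUDIO_API_KEY=your-api-key-here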

This part might look different for you, depending on what you're going for, but here are the parts of my code that handle the requests and responses from the tuned model:

First, include the Google AI library (the @google/generative-ai package from npm), along with firebase-functions, since this example runs as a Firebase Function:

const { GoogleGenerativeAI } = require('@google/generative-ai');
const functions = require('firebase-functions');

The following is a Firebase Function I used to test my tuned model, comparing the results from a tuned Gemini to a non-tuned Gemini. The "exports.functionName..." part is just the syntax used for Firebase Functions. The rest might look familiar:

exports.compareModels = functions.https.onRequest(async (req, res) => {
  try {
    console.log("Starting model comparison test...");

    // This is where the API key comes in. I'm reading it from a Firebase environment
    // variable, but you can paste the key in directly as long as you're not exposing
    // it on the client side.
    const genAI = new GoogleGenerativeAI(process.env.GOOGLE_AI_STUDIO_API_KEY);

    // Point this at your own tuned model. Its ID appears under Model ID when you
    // click the tuned model in your library and looks like: tunedModels/test-two-1jvfu8tyvs0m
    const tunedModel = genAI.getGenerativeModel({
      model: "tunedModels/REDACTED"
    });

    const baseModel = genAI.getGenerativeModel({
      model: "gemini-1.5-flash"
    });

    const testCases = [
      // REDACTED. But this is where I had an array of test prompts - just strings separated by commas.
    ];

    const results = [];

    for (const testCase of testCases) {
      const prompt = `// REDACTED. But I asked for analysis of: ${testCase}`;

      // Run the same prompt through both models and keep the raw text responses.
      const tunedResult = await tunedModel.generateContent(prompt);
      const baseResult = await baseModel.generateContent(prompt);
      const tunedResponse = tunedResult.response.text().trim();
      const baseResponse = baseResult.response.text().trim();

      results.push({
        summary: testCase,
        tunedModel: {
          response: tunedResponse,
          parsedScore: parseFloat(tunedResponse)
        },
        baseModel: {
          response: baseResponse,
          parsedScore: parseFloat(baseResponse)
        }
      });
    }

    res.status(200).json({
      success: true,
      modelInfo: {
        tunedModel: "tunedModels/test-two-1jvfu8tyvs0m",
        baseModel: "gemini-1.5-flash"
      },
      results: results,
      timestamp: new Date().toISOString()
    });
  } catch (error) {
    console.error("Error comparing models:", error);
    res.status(500).json({
      success: false,
      error: error.message,
      timestamp: new Date().toISOString()
    });
  }
});
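Since compareModels above is just a test harness, here's also a minimal sketch of how the tuned model could score a single article summary in the actual project. The function name and prompt wording here are hypothetical; it assumes the same API key setup and tuned-model ID as above.

// Hypothetical helper, not the exact code from my project: send one article summary
// to the tuned model and parse the 0 to 1 score it returns.
async function scoreArticle(genAI, summary) {
  const model = genAI.getGenerativeModel({
    model: "tunedModels/REDACTED" // same tuned-model ID as above
  });
  const result = await model.generateContent(
    `Score the following news summary between 0 and 1:\n\n${summary}`
  );
  const score = parseFloat(result.response.text().trim());
  return Number.isNaN(score) ? null : score; // guard against non-numeric replies
}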

This is pretty much it! You can customize it to your own needs, keep adding examples, and retrain the tuned model to get better results. I hope I covered everything and that it's helpful to someone! If anyone's trying this and running into issues, feel free to let me know and I'll update the guide accordingly.