Intelligent Svelte: Unleashing AI with Reactivity
- Conner (Semicognitive) Open Source Engineer
Learn how to harness the power of Svelte and SvelteKit to create elegant AI-driven experiences. Svelte stores, streaming endpoints, and Python endpoints (through preprocessors) are magic you can only find in Svelte.
Transcript
Hi, I'm Conner. Welcome to Intelligent Svelte: Unleashing AI with Reactivity.
AI-driven applications are quickly becoming mainstream and are widely expected to play a big role in future software.
I believe Svelte and SvelteKit are the best ways to build software, so today I'll be showing you all how to combine the two.
So when it comes to building around AI, there are a few very important things.
Number one is speed. These models are groundbreaking but still exceptionally slow,
meaning the speed of all the infrastructure around them is make-or-break for your application.
Number two is iteration.
In an AI-driven world, it becomes a lot easier to build applications.
This is a good thing for developers, but it means you also have to be more careful about
what tech stack you use and how fast you can change direction.
Number three is complexity.
The hard part of your app should be the models you use and how you use them.
The rest of your stack needs to be simple and straightforward.
So Svelte, of course, does all of these exceptionally well.
Today we're diving into two examples of how Svelte is perfect for building intelligent
applications.
The first will be a straightforward chatbot using the ChatGPT API.
The second will be image generation with Stable Diffusion.
First up is our chatbot.
It's pretty straightforward for an app.
You send messages to the large language model and wait for a response back.
On the surface, not too complicated.
In Svelte, it's of course very easy to show a list of chats and then update it when you
send and receive one.
Svelte's declarative reactivity means that simple state and form handling and even visual
tricks like animations are all very straightforward.
But what about streaming?
It's not a great UX to wait for an entire response.
So how do we stream new tokens of a message into the front end declaratively?
Or markdown?
Text is easy to show, but what about when our requirements get more difficult?
Tables, headings, and images are suddenly less simple.
So how do we tackle this?
So of course, the simplicity of Svelte and SvelteKit makes them amazing frameworks for tackling these unknowns.
Svelte stores can very easily wrap web streams,
meaning a server-side stream from SvelteKit integrates perfectly into a client-side Svelte store.
For our chatbot today, we're using LangChain on the server.
LangChain is a common package for building applications with large language models,
and it runs seamlessly on the edge in JavaScript environments like SvelteKit.
LangChain works by chaining together multiple modules to create a more complex
language application.
Here, we're simply using GPT-4, which is the latest and greatest LLM out of OpenAI.
You could hypothetically add tools such as Google Search and a calculator, or retrievers
to search a vector database, or document loaders to accept PDFs and Markdown files instead of just
text.
The other great thing about OpenAI and LangChain is that you can stream back responses.
Language models work by returning a single token at a time, each in succession, until
it decides it is done.
A token can be anything from an entire word to just a character, but each is streamed
back the same way.
For LangChain, we use this callback manager to enqueue each new token to our readable
stream's body.
The biggest takeaway here is that no matter how complex of a language pipeline you build,
you can still stream back a very simple response.
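As a hedged sketch of that pattern, here's roughly what such a streaming SvelteKit endpoint could look like. This is not the talk's verbatim code: the route path is an assumption, and the LangChain imports follow the LangChain JS API of the time, which may differ in newer releases.

```ts
// src/routes/api/chat/+server.ts — a minimal sketch, not the exact code
// from the talk; LangChain import paths here follow the era's API.
import { ChatOpenAI } from "langchain/chat_models/openai";
import { HumanChatMessage } from "langchain/schema";
import { CallbackManager } from "langchain/callbacks";

export const POST = async ({ request }: { request: Request }) => {
  const { prompt } = await request.json();
  const encoder = new TextEncoder();

  const stream = new ReadableStream({
    async start(controller) {
      const chat = new ChatOpenAI({
        modelName: "gpt-4",
        streaming: true,
        // The callback manager enqueues each new token onto the
        // response body as soon as the model emits it.
        callbackManager: CallbackManager.fromHandlers({
          async handleLLMNewToken(token: string) {
            controller.enqueue(encoder.encode(token));
          },
        }),
      });

      await chat.call([new HumanChatMessage(prompt)]);
      controller.close();
    },
  });

  return new Response(stream, { headers: { "Content-Type": "text/plain" } });
};
```

However complex the chain becomes, the response stays this simple: a plain stream of text tokens.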
In combination, web streams work perfectly with stores, where new events from the stream
can simply update the store.
Our readable stream store here is what's called a custom store, so like any store, it returns
a subscribe method.
Instead of the external update and set methods a basic
writable store has, we have an external request method.
First, the value of our store is an object holding two
values: whether or not the store is loading, and the
current text.
Our request method here is special in that it's async.
Usually store methods are synchronous.
This method is simple but pretty powerful.
It has a single parameter, a request object, so it can pass
any parameters to your API.
While it's fetching a request, loading is true and the text
key continuously updates.
When the request finishes, loading is false and the text
resets.
Internally, when the request is fetched, the body is streamed into this while loop.
For live value updates, each new token is concatenated to the previous one, then the
store's value is updated to match the new string.
The final value is also kept track of so that the request method itself can return this
value.
This might seem a bit weird, but on our front end it makes perfect sense.
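Here's a minimal sketch of such a store, built only on web streams and Svelte's writable store. The file name and exact shape are my assumptions rather than the talk's verbatim code.

```ts
// src/lib/readableStreamStore.ts — a minimal sketch of the custom store
// described above; names are assumptions, not the talk's exact code.
import { writable } from "svelte/store";

export function readableStreamStore() {
  // The store's value holds two things: a loading flag and the current text.
  const { subscribe, set } = writable({ loading: false, text: "" });

  // Unlike a basic writable store, the only external method is `request`,
  // and it is async so callers can await the final, complete response.
  async function request(req: Request): Promise<string> {
    set({ loading: true, text: "" });

    const response = await fetch(req);
    if (!response.body) throw new Error("No response body to stream");

    const reader = response.body.getReader();
    const decoder = new TextDecoder();
    let result = "";

    // Stream the body: each new chunk of tokens is concatenated onto the
    // previous text, and the store's value is updated to match.
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      result += decoder.decode(value, { stream: true });
      set({ loading: true, text: result });
    }

    // When the request finishes, loading flips off and the text resets,
    // while the final value is returned to the caller.
    set({ loading: false, text: "" });
    return result;
  }

  return { subscribe, request };
}
```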
Now for our entire chatbot, this is our client code.
A few imports, store chat history, handle our form, and define our request.
That's it.
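As a rough sketch, assuming the store above, the client component might look something like this; the markup is condensed and the names are illustrative rather than the talk's verbatim code.

```svelte
<!-- +page.svelte — a condensed sketch of the chat client -->
<script lang="ts">
  import { readableStreamStore } from "$lib/readableStreamStore";

  const response = readableStreamStore();
  let history: { role: "user" | "assistant"; text: string }[] = [];

  async function handleSubmit(event: SubmitEvent) {
    const form = event.currentTarget as HTMLFormElement;
    const prompt = new FormData(form).get("prompt") as string;
    history = [...history, { role: "user", text: prompt }];

    // The awaited return value is the complete message, pushed into the
    // chat history once streaming finishes.
    const answer = await response.request(
      new Request("/api/chat", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ prompt }),
      })
    );
    history = [...history, { role: "assistant", text: answer }];
  }
</script>

{#each history as message}
  <p>{message.role}: {message.text}</p>
{/each}

{#if $response.loading}
  <!-- The in-flight message streams in token by token. -->
  <p>assistant: {$response.text}</p>
{/if}

<form on:submit|preventDefault={handleSubmit}>
  <input name="prompt" />
  <button>Send</button>
</form>
```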
With such simplicity, it's extremely easy to redesign and take a new angle at your project
without having to rebuild from the ground up every time.
So with these three simple building blocks,
you have the perfect foundation for a chat application:
first, a SvelteKit endpoint returning a stream from LangChain;
second, a store that wraps the stream;
and third, a UI that handles chats.
And now, here's our final chatbot.
Our second example today is this.
You type in a prompt, an image is generated that matches it.
We're using Stable Diffusion under the hood, a text-to-image
model released late last year.
So while this example wasn't too complicated of an
experience, this is actually our entire server-side code.
It runs on the Edge and is perfectly integrated into
your SvelteKit app.
Like any SvelteKit endpoint, you can export a POST function,
take in parameters, and return a response.
What's different here, though, is that you can run any Python
code, use GPUs, use high memory, and it
all works the same.
Here we import PyTorch, a popular machine learning
library, and the Stable Diffusion pipeline from Hugging Face's
Diffusers library.
Then we define which scheduler we use,
which is important for high-quality images,
and pass it into our Stable Diffusion pipeline.
If you notice, Stable Diffusion is
being loaded from pretrained model files in a cache
and being told to run on CUDA, NVIDIA's GPU computing
platform.
This is possible because we have a separate SvelteKit Modal
config file that defines our GPU, our cache,
and our environment variables.
Modal plays a big part here because they
engineered a custom runtime where you can define containers
programmatically, and the containers themselves
are cached to have an extremely fast cold start.
After we have our Stable Diffusion pipeline,
we enter inference mode with PyTorch
and create our images from it.
Finally, we use the Python base64 library
to encode the image and return it
inside a JSON object for our frontend.
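On the client, consuming that JSON response might look like this minimal sketch; the route path and the response shape (an object with a base64 `image` field) are my assumptions, not confirmed details from the talk.

```ts
// src/lib/generateImage.ts — a hedged sketch of calling the endpoint;
// the path and { image } response shape are assumptions.
export async function generateImage(prompt: string): Promise<string> {
  const res = await fetch("/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt }),
  });
  if (!res.ok) throw new Error(`Generation failed: ${res.status}`);

  const { image } = await res.json();
  // The base64 payload becomes a data URL usable directly as an <img> src.
  return `data:image/png;base64,${image}`;
}
```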
All of this takes full advantage
of file system-based routing.
Normal SvelteKit server endpoints
are written as +server.ts files,
but thanks to the power of preprocessors in SvelteKit,
we can write +server.py files that, internally,
SvelteKit will serve the exact same way as +server.ts files.
So how do we get this experience?
I wrote a plugin for SvelteKit called SvelteKit Modal.
Installation is really easy.
You install the Vite plugin and you add .py files as valid route modules in your SvelteKit
config.
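As a hypothetical sketch only: the plugin's actual import name and options may differ, so defer to the sveltekit-modal README, but the setup might look something like this. The `moduleExtensions` option is standard SvelteKit and tells the router which file extensions count as route modules.

```ts
// vite.config.ts — hypothetical sketch; the plugin import name is an
// assumption, not confirmed from the sveltekit-modal docs.
import { sveltekit } from "@sveltejs/kit/vite";
import { defineConfig } from "vite";
import { sveltekitModal } from "sveltekit-modal"; // hypothetical import name

export default defineConfig({
  plugins: [sveltekitModal(), sveltekit()],
});
```

```js
// svelte.config.js — register .py files as valid route modules.
const config = {
  kit: {
    moduleExtensions: [".js", ".ts", ".py"],
  },
};

export default config;
```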
It takes your Python endpoints and deploys them to Modal, a serverless Python service.
This works in both development environments with fast live reloading and in production
environments with a stable endpoint.
This endpoint works like any other server endpoint and has all the advantages of a full-stack
SvelteKit app.
You can submit to it from a form and handle responses with the use:enhance action.
Here our entire UI is just 48 lines of code.
Our request logic is just an HTML form.
Because of this, the prompt input maps one-to-one to our request parameters from earlier.
But of course, these endpoints can also take JSON, or a GET request, or handle any
HTTP request whatsoever.
The rest of our front-end code simply takes advantage of SvelteKit's use:enhance,
tracking the loading state, any error, and of course the image.
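Putting it together, a condensed sketch of such a UI might look like this; it uses a plain submit handler instead of form actions, plus the hypothetical generateImage helper sketched earlier, so treat the details as illustrative.

```svelte
<!-- +page.svelte — a condensed sketch of the image generator UI -->
<script lang="ts">
  import { generateImage } from "$lib/generateImage"; // sketched earlier

  let loading = false;
  let error: string | null = null;
  let image: string | null = null;

  async function handleSubmit(event: SubmitEvent) {
    const form = event.currentTarget as HTMLFormElement;
    const prompt = new FormData(form).get("prompt") as string;

    loading = true;
    error = null;
    try {
      // generateImage returns a data URL built from the base64 response.
      image = await generateImage(prompt);
    } catch (e) {
      error = e instanceof Error ? e.message : "Something went wrong";
    } finally {
      loading = false;
    }
  }
</script>

<form on:submit|preventDefault={handleSubmit}>
  <input name="prompt" placeholder="Describe an image..." />
  <button disabled={loading}>{loading ? "Generating..." : "Generate"}</button>
</form>

{#if error}<p>{error}</p>{/if}
{#if image}<img src={image} alt="Generated result" />{/if}
```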
And now, here's our final image generator.
Today we put together a chatbot and image generator.
Our chatbot highlights LangChain, SvelteKit streaming responses, and Svelte stores.
Our image generator highlights Stable Diffusion, SvelteKit preprocessing, and Modal serverless
Python.
Thanks for watching.
I'm Conner, also known as Semicognitive.
You can find today's examples on my GitHub, and you can find me on Twitter @Semicognitive.
I'm happy to answer any more questions.
Good luck building with AI.