
Tuesday, June 17, 2025

Llama3.1 LLM, RAG and production deployment PT 1



As a backend and Ops engineer, I was very curious about AI this year, because in my mind it was something only for PhDs, scientists, and big companies working on cutting-edge technologies.


I found that this is partially true: only big companies and specialized teams can develop foundational models (Gemini, Llama, DeepSeek), because it requires a lot of human resources and money.



On the other hand, fortunately, there are open-source AI models like Llama: a foundational model that required tons of training hours and compute power, yet is free to the world.


So you can build your own AI platform. You read that right: you don't need to pay for ChatGPT, Claude, etc., or share your personal information with them, but you will have limited resources; it's all about your budget.



Of course, this is not for everyone; you would want to have your own AI platform because you are a company, a researcher, an engineer, etc.


To simplify, I would say there are 3 levels:






  • Top level, AI consumers: people who use AI for assistance and as a source of information, paying for subscriptions.
  • Middle level, AI builders/MLOps: private companies, engineers, etc. who deploy their own AI systems on top of foundational models.
  • Bottom level, AI researchers: big companies and AI researchers who create the algorithms and training methodologies for new foundational models.


In that context I’m going to put myself in the middle level where I want to use a foundational model for my research needs.


AI World


  • LLM/text processing
  • Image processing
  • Video processing
  • Audio processing


LLMs, in my opinion, are the most popular and the "easiest" to start with, because there is currently a lot of information and there are plenty of platforms ready to work with them.


LLM - Foundational models


  • Llama
  • Deepseek
  • Mistral


Llama



Llama AI models are open source and are the ones with the most documentation and community. Don't forget that Llama was one of the first open models, and DeepSeek, Mistral, and the others were probably born thanks to it.

It's true that it is currently not the most accurate or fastest model, but again, it has a lot of documentation.

Another advantage is the PyTorch framework, which makes it easy to run Llama code on NVIDIA video cards. It's super intuitive and well documented for using CUDA.
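A quick sanity check that PyTorch actually sees the card (a minimal sketch, assuming torch is already installed in your Python environment):

python3 -c "import torch; print(torch.__version__, torch.cuda.is_available())"
python3 -c "import torch; print(torch.cuda.get_device_name(0))"   # only meaningful if the previous line printed True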


Hardware



A key component of working with AI is the hardware; understanding the requirements, costs, and power consumption is very important.


While you can start in Google Colab, AWS Bedrock, etc. and use cloud resources and GPUs, it's better to understand the foundations first.

We picked Llama3.1 8B-Instruct, so we are focused on NVIDIA hardware and software for local environments.



Llama3.1 Requirements:

https://github.com/meta-llama/llama3?tab=readme-ov-file#inference



Model              Nodes/Video cards   vRAM (FP16)
8B / 8B-Instruct   1                   16 GB
70B                8                   140 GB
405B               8++                 810 GB
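A rough way to sanity-check those numbers (my own rule of thumb, not from Meta's table): FP16 weights take about 2 bytes per parameter, and the KV cache and activations come on top of that.

python3 -c "print(round(8e9 * 2 / 1024**3, 1), 'GiB')"   # ~14.9 GiB just for the 8B model's weights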



Local AI System


  • Linux Ubuntu 24.04

  • GeForce 4070 Ti SUPER 16 GB

  • Nvidia T400 4 GB

  • Nvidia Tesla K80 24 GB

  • Intel i9-14900K

  • 64 GB RAM
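To double check that all the cards are detected (this assumes the NVIDIA driver is already installed so nvidia-smi is available):

nvidia-smi -L            # GPUs visible to the driver
lspci | grep -i nvidia   # GPUs visible on the PCI bus, even without a loaded driver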






 

Drivers



Driver   Video card
5xx      GeForce 4070 Ti SUPER
5xx      Nvidia T400
4xx      Nvidia Tesla K80
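On Ubuntu you can check which driver branch is recommended for each detected card and which driver is actually loaded (assuming the ubuntu-drivers-common package is installed):

ubuntu-drivers devices                                    # recommended driver per detected GPU
nvidia-smi --query-gpu=name,driver_version --format=csv   # driver currently in use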



Findings:


  • The K80 can't be combined with the 4070, so forget about 24 GB + 16 GB; the drivers are not compatible.

  • The Nvidia K80 only runs on Ubuntu 18.x.

  • Running torchrun needs the full 16 GB free, so gdm must be off and you work from a tty only (see the commands below).

  • The T400 and the 4070 can be combined, running gdm on the 4 GB card and the full Llama model on the 4070's 16 GB.
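These are the commands I mean (a sketch for Ubuntu 24.04, assuming GDM is the display manager):

# Switch to a text console first (e.g. Ctrl+Alt+F3), then stop the desktop so its vRAM is released
sudo systemctl stop gdm          # on some installs the unit is gdm3; "sudo systemctl stop display-manager" also works
# Optional: boot straight into text mode on the next reboot
sudo systemctl set-default multi-user.target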







Download and Run Llama3.1



At this point we already have 16 GB free for Llama and are ready to run examples on our local AI system.
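To confirm the 16 GB really is free before launching, a simple nvidia-smi query is enough:

nvidia-smi --query-gpu=name,memory.free,memory.total --format=csv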

torchrun --nproc_per_node 1 example_chat_completion.py \
  --ckpt_dir /home/torukmnk/.llama/checkpoints/Llama3.1-8B-Instruct/ \
  --tokenizer_path /home/torukmnk/.llama/checkpoints/Llama3.1-8B-Instruct/tokenizer.model \
  --max_seq_len 1200 --max_batch_size 1






Common errors are:

  • CUDA out of memory

  • Bad configuration for max_seq_len

  • Bad configuration for max_batch_size




Power Consumption


Idle: 111 W

torchrun: 366 W






Nvidia Tool for monitoring


nvidia-smi
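These are the two ways I look at it while the model is running (both are standard nvidia-smi usage):

watch -n 1 nvidia-smi                                                           # refresh the full view every second
nvidia-smi --query-gpu=power.draw,memory.used,memory.total --format=csv -l 1   # or log just power and memory once per second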








After getting a successful run of torchrun with Llama3.1 8B-Instruct, it's time to optimize and to understand max_seq_len and max_batch_size:

  • max_seq_len: the maximum sequence length in tokens (prompt plus generated response)

  • max_batch_size: how many prompts can be processed concurrently in one batch




Play around with a chat example prompt and watch the power consumption and memory in nvidia-smi.
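In practice that means re-running the same command with different values and watching nvidia-smi from another tty; the values below are just an example to probe the limit, not a recommendation:

torchrun --nproc_per_node 1 example_chat_completion.py \
  --ckpt_dir /home/torukmnk/.llama/checkpoints/Llama3.1-8B-Instruct/ \
  --tokenizer_path /home/torukmnk/.llama/checkpoints/Llama3.1-8B-Instruct/tokenizer.model \
  --max_seq_len 2048 --max_batch_size 2   # too big for a 16 GB card; it fails with CUDA out of memory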



For my system:


Max configurations allowed

  • max_seq_len: 1200

  • max_batch_size: 1


So, to allow bigger values and configurations, more vRAM is what you need.


At this point, having a foundational model running locally is a big achievement, and while it may look easy to get started, it's not; along the way there are many other issues to fix.



Next steps



Let’s say this is our development environment. We need to add features and then deploy them to a production environment.



It's true that we should expect better results in a production environment because of better resources, but the dev environment helps us understand all the little details needed to run it.



Part #2


  • API to receive prompts and return responses

  • Chat interface

  • RAG, to give the foundational model access to our own data and get answers from it




The goal for the first feature is to get answers from our own documents or database; that's the RAG (Retrieval-Augmented Generation) part.


Tools


  • Llama3.1 8B-Instruct

  • PyTorch

  • Vector storage system (TBD)




Part #3



Deploy to production: we are going to start with AWS Bedrock, since AWS is the cloud platform I have the most experience with.


While we might think it's the last step, it's not; this is the most critical part, since each project has its own needs, like how many users are going to use it and how many of them concurrently, in terms of max_batch_size processing.


Because it's the first time setting up this system, I don't expect the best results from the first deployment, so it will be a continuous improvement of configurations and resources.



The production deployment covers the API and the PyTorch/Llama3 processing, but also the automation for the RAG: we need something similar to Continuous Integration, but for data ingestion and training.


Conclusion


I started with zero AI knowledge and, to my surprise, I ended up running Llama3.1 locally, so at this point there is a light showing that you don't have to be a scientist to build with foundational models.

The truth is that, for now, this research is just for fun.


Links


paperweight


Nvidia Tesla K80






Tuesday, April 9, 2024

AWS IAM Identity Center and Datadog SSO

When you join a company it's very common to get access to many applications like Slack, Sentry, Datadog, Cloudflare, etc. But have you thought about it? How is that possible? How do you manage the increasing number of users across a whole ecosystem of applications and services?

Single Sign-On, also known as SSO, allows users to access multiple applications by signing in with a single existing account, and SAML is an XML-based open standard that can power an SSO implementation. It consists of two parts: the SAML identity provider (IdP) and the SAML service provider (SP).

IdP examples:

- Jumpcloud

- Okta

- AWS IAM Identity Center


SP examples:

- Datadog

- Slack

- Sentry

- AWS Resources


Whether you are starting to grow a team or looking for more IdP options, here we are going to show an integration using AWS IAM Identity Center as IdP and DataDog as SP.

Why AWS IAM Identity Center?

One important concern, in my opinion, is saving money, so we are looking for the best-priced solutions. Here comes AWS IAM Identity Center, a service that makes it easy to centrally manage access to multiple AWS accounts and business applications.





AWS IAM Identity Center is available to you at no additional cost. IAM Identity Center APIs are available in all regions supported by IAM Identity Center.

So, since we are already running servers on AWS, IAM Identity Center is free, and it supports many application integrations, we decided to go for it. Datadog is the first service we want to integrate for new team members, and since there is no clear information on the internet about this integration, we are going to share it step by step.


Setup


  1. You have an administrator account/root on AWS

  2. Log in to your AWS console

  3. Go to IAM Identity Center

  4. Enable IAM Identity Center and choose Enable with AWS Organizations


  5. Go to Applications and then click Add Application

  6. Select “I want to select an application from the catalog” and search for Datadog


  7. Download IAM Identity Center SAML metadata file and keep the window open for configurations


  8. You have an administrator account on Datadog

  9. Log in to your Datadog dashboard

  10. In the Datadog app, hover over your username in the bottom left corner and select Organization Settings. Select Login Methods and click on Configure under SAML.


  11. Upload the IdP metadata file we downloaded from IAM Identity Center by clicking the Choose File button. After choosing the file, click Upload File.


  12. After uploading the IdP metadata, return to the Login Methods page and turn SAML on by default

  13. Now copy the Single Sign-on URL from Datadog and paste it in the AWS IAM Identity Center window under Application Start URL


  14. Then go back to Datadog Service Provider Details and download the Service Provider Metadata file named sp_metadata.xml


  15. Also add your domain to the Just In Time Provisioning 


  16. Go back to AWS IAM Identity Center and, under Application metadata, select Upload application SAML metadata file

  17. Upload sp_metadata.xml from Datadog and submit


  18. Verify your Datadog application is active on AWS


  19. Click on your Datadog application, find the Actions button, and select Edit attribute mappings


  20. Verify you have givenName and sn; you are probably missing eduPersonPrincipalName. Add it, then save the changes.

    It's very important to map the attributes correctly, since each Service Provider has its own attributes and most of the time this is the source of authentication errors.
    You can also see these attributes in the sp_metadata.xml from Datadog.


  21. After that, the IdP and the SP are configured and we need to assign users

  22. Go to Users and Add User on IAM Identity Center

  23. Create a user with your domain email, name, etc.


  24. After creating the user, they will receive an invitation email from AWS IAM Identity Center


  25. Accept the invitation from the email

  26. Then the user will need to create a password and set up MFA

  27. At this moment the invited user is logged in to the AWS Access Portal, but without applications

  28. Now, back in the AWS administrator account, go to Applications and find Datadog

  29. Click on Assign Users & Groups, select the user we just created, and assign

  30. Back in the new user's AWS Access Portal, they should see the Datadog application


  31. Click on the Datadog application; the user should now have access to Datadog



Did you know that your RoR application can support SAML client authorization? You just need https://github.com/SAML-Toolkits/ruby-saml, but that is another story.


Conclusion


Each Service Provider has its own caveats, whether it's an optional start URL, attribute mappings, certificates, etc. Here we set up a Datadog integration that is not very clear in the DD documentation. FYI, we did not cover IAM Identity Center groups or Datadog role mapping. I hope this helps anyone start using SSO, or discover a new IdP like AWS IAM Identity Center, to manage users and access more easily and in a secure manner... you can find me on X as @torukmnk. Cheers!
