AI & ML interests

Our team builds AI with open models and open source, collaborating privately with security and advanced access controls.

victorย 
posted an update 10 days ago
alvarobarttย 
posted an update 15 days ago
view post
Post
316
Open agents on AWS SageMaker AI with open models from the Hugging Face Hub!

> Deploy an open model from the Hugging Face Hub on SageMaker AI
> Connect the deployed model to Strands Agents
> Add built-in and custom tools for tool calling
> Expose external capabilities through MCP integration
> Bonus: talk to your agent and visualize traces with Gradio

https://alvarobartt.com/agents-on-aws-sagemaker
alvarobarttย 
posted an update 19 days ago
view post
Post
3289
Latest hf-mem release added a breakdown of Mixture-of-Experts (MoE) memory usage!

TL; DR MoEs can be misleading to reason about from active parameters alone, since each token only activates a subset of experts, while the serving setup still needs to account for the full resident memory footprint.

๐Ÿง  hf-mem now splits MoE memory into base model weights, routed experts, and KV cache
๐Ÿ—๏ธ Dense models usually load and use most weights every forward pass, while MoEs load many experts but only route each token to a few of them
โšก Active params isn't the same as memory footprint, especially for sparse architectures
๐Ÿ“ฆ Runtime memory is about what is used per request/token, while loading memory also includes the expert weights that need to be resident
๐Ÿ“š KV cache can still dominate depending on context length, batch size, and concurrency
๐Ÿ”€ Expert Parallelism (EP) helps shard experts across accelerators when expert weights dominate
๐Ÿš€ Data Parallelism (DP) + EP is often a good fit for throughput-oriented MoE serving

Check the repository at https://github.com/alvarobartt/hf-mem
victorย 
posted an update about 2 months ago
view post
Post
6121
Want to share my enthusiasm for zai-org/GLM-5.1 here too ๐Ÿ”ฅ

I think we have it: our open source Claude Code = GLM-5.1 + Pi (https://pi.dev/) - Built a Three.js racing game to eval and it's extremely impressive. Thoughts:

- One-shot car physics with real drift mechanics (this is hard)

- My fav part: Awesome at self iterating (with no vision!) created 20+ Bun.WebView debugging tools to drive the car programmatically and read game state. Proved a winding bug with vector math without ever seeing the screen

- 531-line racing AI in a single write: 4 personalities, curvature map, racing lines, tactical drifting. Built telemetry tools to compare player vs AI speed curves and data-tuned parameters

- All assets from scratch: 3D models, procedural textures, sky shader, engine sounds, spatial AI audio!

- Can do hard math: proved road normals pointed DOWN via vector cross products, computed track curvature normalized by arc length to tune AI cornering speed

You are going to hear about this model a lot in the next months - open source let's go - and thanks z-ai๐Ÿš€๐Ÿš€
  • 5 replies
ยท
alvarobarttย 
posted an update 3 months ago
view post
Post
3743
Learn how to deploy Microsoft Research VibeVoice ASR on Microsoft Azure Foundry with Hugging Face to generate rich audio transcriptions with Who, When, and What! ๐Ÿ’ฅ

> ๐Ÿ•’ 60-minute single-pass processing, no chunking or stitching
> ๐Ÿ‘ค Customized hotwords to guide recognition on domain-specific content
> ๐Ÿ“ Rich transcription: joint ASR + diarization + timestamping in one pass
> ๐ŸŒ 50+ languages with automatic detection and code-switching support
> ๐Ÿค— Deployed on Microsoft Foundry via an OpenAI-compatible Chat Completions API

https://huggingface.co/docs/microsoft-azure/foundry/examples/deploy-vibevoice-asr
victorย 
posted an update 4 months ago
alvarobarttย 
posted an update 4 months ago
view post
Post
3271
๐Ÿ’ฅ hf-mem v0.4.1 now also estimates KV cache memory requirements for any context length and batch size with the --experimental flag!

uvx hf-mem --model-id ... --experimental will automatically pull the required information from the Hugging Face Hub to include the KV cache estimation, when applicable.

๐Ÿ’ก Alternatively, you can also set the --max-model-len, --batch-size and --kv-cache-dtype arguments (ร  la vLLM) manually if preferred.
  • 1 reply
ยท
pcuenqย 
posted an update 5 months ago
view post
Post
5110
๐Ÿ‘‰ What happened in AI in 2025? ๐Ÿ‘ˆ

We prepared the 2025 version of the HF AI Timeline Grid, highlighting open vs API-based model releases, and allowing you to browse and filter by access, modality, and release type!

Play with it here:
2025-ai-timeline/2025-ai-timeline

Here's my personal quarterly TL;DR:

1๏ธโƒฃ Q1 โ€” Learning to Reason
Deepseek not only releases a top-notch reasoning model, but shows how to train them and compete with closed frontier models. OpenAI debuts Deep Research.

Significant milestones: DeepSeek R1 & R1-Zero, Qwen 2.5 VL, OpenAI Deep Research, Gemini 2.5 Pro (experimental)

2๏ธโƒฃ Q2 โ€” Multimodality and Coding
More LLMs embrace multimodality by default, and there's a surge in coding agents. Strong vision, audio, and generative models emerge.

Significant milestones: Llama 4, Qwen 3, Imagen 4, OpenAI Codex, Google Jules, Claude 4

3๏ธโƒฃ Q3 โ€” "Gold" rush, OpenAI opens up, the community goes bananas
Flagship models get gold in Math olympiads and hard benchmarks. OpenAI releases strong open source models and Google releases the much anticipated nano-banana for image generation and editing. Agentic workflows become commonplace.

Significant milestones: Gemini and OpenAI IMO Gold, gpt-oss, Gemini 2.5 Flash Image, Grok 4, Claude Sonnet 4.5

4๏ธโƒฃ Q4 โ€” Mistral returns, leaderboard hill-climbing
Mistral is back with updated model families. All labs release impressive models to wrap up the year!

Significant milestones: Claude Opus 4.5, DeepSeek Math V2, FLUX 2, GPT 5.1, Kimi K2 Thinking, Nano Banana Pro, GLM 4.7, Gemini 3, Mistral 3, MiniMax M2.1 ๐Ÿคฏ

Credits
๐Ÿ™ NHLOCAL for the source data https://github.com/NHLOCAL/AiTimeline

๐Ÿซก @reach-vb for the original idea, design and recipe

๐Ÿ™Œ @ariG23498 and yours truly for compiling and verifying the 2025 edition

๐Ÿฅณ Here's to 2026, wishing it becomes the best year ever for open releases and on-device-first use-cases! ๐Ÿฅ‚
  • 3 replies
ยท
victorย 
posted an update 6 months ago
pagezyhfย 
posted an update 7 months ago
view post
Post
2997
๐Ÿš€ Big news for AI builders!

Weโ€™re thrilled to announce that the Qwen3-VL family of vision-language models is now available on Azure AI Foundry, thanks to our collaboration with Microsoft.

We bring open-source innovation to enterprise-grade AI infrastructure, making it easier than ever for enterprise to deploy and scale the latest and greatest from models from hugging Face securely within Azure.

๐Ÿ” Highlights:

- Deploy Qwen3-VL instantly via managed endpoints
- Built-in governance, telemetry, and lifecycle management
- True multimodal reasoning โ€” vision, language, and code understanding
- State-of-the-art performance, outperforming closed-source models like Gemini 2.5 Pro and GPT-5
- Available in both *Instruct* and *Thinking* modes, across 24 model sizes

๐Ÿ‘‰ Get started today: search for Qwen3-VL in the Hugging Face Collection on Azure AI Foundry.
  • 1 reply
ยท
multimodalartย 
posted an update 8 months ago
view post
Post
29506
Want to iterate on a Hugging Face Space with an LLM?

Now you can easily convert any HF entire repo (Model, Dataset or Space) to a text file and feed it to a language model!

multimodalart/repo2txt
  • 2 replies
ยท
pagezyhfย 
posted an update 9 months ago
view post
Post
874
Whatโ€™s your biggest headache deploying Hugging Face models to the cloudโ€”and how can we fix it for you?
  • 8 replies
ยท
pagezyhfย 
posted an update 9 months ago
pagezyhfย 
posted an update 9 months ago
view post
Post
3963
๐Ÿค Collaborating with AMD to ensure Hugging Face Transformers runs smoothly on AMD GPUs!

We run daily CI on AMD MI325 to track the health of the most important model architectures and weโ€™ve just made our internal dashboard public.

By making this easily accessible, we hope to spark community contributions and improve support for everyone!
  • 2 replies
ยท
jeffboudierย 
posted an update 9 months ago
view post
Post
3400
Quick 30s demo of the new Hub > Azure AI integration to deploy HF models in your own Azure account. Now with Py and CLI!

GG @alvarobartt @kramp @pagezyhf
pagezyhfย 
posted an update 10 months ago
view post
Post
3234
We've improved the Deploy button on Hugging Face model pages for Microsoft Azure

1/ no more long waits before seeing model support status

2/ ready-to-use CLI and Python snippets

3/ redirection to Azure AI Foundry rather than Azure ML

โœ‹ if you see any bugs or have feedback, open an issue on our repo:
https://github.com/huggingface/Microsoft-Azure
pagezyhfย 
posted an update 10 months ago
view post
Post
2208
Deploy GPT OSS models with Hugging Face on Azure AI!

Weโ€™re thrilled to enable OpenAI GPT OSS models on Azure AI Model Catalog for Azure users to try the model securely the day of its release.

In our official launch blogpost, thereโ€™s a section on how to deploy the model to your Azure AI Hub. Get started today!

https://huggingface.co/blog/welcome-openai-gpt-oss#azure
pagezyhfย 
posted an update 10 months ago
view post
Post
296
We now have the newest Open AI models available on the Dell Enterprise Hub!

We built the Dell Enterprise Hub to provide access to the latest and greatest model from the Hugging Face community to our on-prem customers. Weโ€™re happy to give secure access to this amazing contribution from Open AI on the day of its launch!

https://dell.huggingface.co/