Nebius Text Embeddings

Nebius AI Studio provides API access to high-quality embedding models through a unified interface. The Nebius embedding models convert text into numerical vectors that capture semantic meaning, making them useful for various applications like semantic search, clustering, and recommendations.

This notebook demonstrates how to use Nebius Embeddings through the LangChain integration.

Installation

%pip install --upgrade langchain-nebius

Environment

Nebius requires an API key that can be passed as an initialization parameter api_key or set as the environment variable NEBIUS_API_KEY. You can obtain an API key by creating an account on Nebius AI Studio.

import os

# Make sure you've set your API key as an environment variable
os.environ["NEBIUS_API_KEY"] = "YOUR-NEBIUS-API-KEY"

Embedding Model Usage

The NebiusEmbeddings class allows you to generate vector embeddings using Nebius AI Studio's embedding models.

from langchain_nebius import NebiusEmbeddings

# Initialize the embeddings model
embeddings = NebiusEmbeddings(
    # api_key="YOUR_API_KEY",  # You can pass the API key directly
    model="BAAI/bge-en-icl"  # The default embedding model
)

Embedding a Single Text

You can use the embed_query method to embed a single piece of text:

query = "What is machine learning?"
query_embedding = embeddings.embed_query(query)

# Check the embedding dimension
print(f"Embedding dimension: {len(query_embedding)}")
print(f"First few values: {query_embedding[:5]}")

Embedding dimension: 4096
First few values: [0.007419586181640625, 0.002246856689453125, 0.00193023681640625, -0.0066070556640625, -0.0179901123046875]

Embedding Multiple Texts

You can embed multiple texts at once using the embed_documents method:

documents = [
    "Machine learning is a branch of artificial intelligence",
    "Deep learning is a subfield of machine learning",
    "Natural language processing deals with interactions between computers and human language",
]

document_embeddings = embeddings.embed_documents(documents)

# Check the results
print(f"Number of document embeddings: {len(document_embeddings)}")
print(f"Each embedding has {len(document_embeddings[0])} dimensions")

Number of document embeddings: 3
Each embedding has 4096 dimensions

Async Support

NebiusEmbeddings supports async operations:

import asyncio


async def generate_embeddings_async():
    # Embed a single query
    query_result = await embeddings.aembed_query("What is the capital of France?")
    print(f"Async query embedding dimension: {len(query_result)}")

    # Embed multiple documents
    docs = [
        "Paris is the capital of France",
        "Berlin is the capital of Germany",
        "Rome is the capital of Italy",
    ]
    docs_result = await embeddings.aembed_documents(docs)
    print(f"Async document embeddings count: {len(docs_result)}")


await generate_embeddings_async()

Async query embedding dimension: 4096
Async document embeddings count: 3

Available Models

The list of supported models is available at https://studio.nebius.com/?modality=embedding

Practical Use Cases

Document Similarity

import numpy as np
from scipy.spatial.distance import cosine

# Create some documents
documents = [
    "Machine learning algorithms build mathematical models based on sample data",
    "Deep learning uses neural networks with many layers",
    "Climate change is a major global environmental challenge",
    "Neural networks are inspired by the human brain's structure",
]

# Embed the documents
embeddings_list = embeddings.embed_documents(documents)


# Function to calculate similarity
def calculate_similarity(embedding1, embedding2):
    return 1 - cosine(embedding1, embedding2)


# Print similarity matrix
print("Document Similarity Matrix:")
for i, emb_i in enumerate(embeddings_list):
    similarities = []
    for j, emb_j in enumerate(embeddings_list):
        similarity = calculate_similarity(emb_i, emb_j)
        similarities.append(f"{similarity:.4f}")
    print(f"Document {i+1}: {similarities}")

Document Similarity Matrix:
Document 1: ['1.0000', '0.8282', '0.5811', '0.7985']
Document 2: ['0.8282', '1.0000', '0.5897', '0.8315']
Document 3: ['0.5811', '0.5897', '1.0000', '0.5918']
Document 4: ['0.7985', '0.8315', '0.5918', '1.0000']

Using with LangChain VectorStores

You can use NebiusEmbeddings with any LangChain vector store, such as FAISS:

from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document

# Prepare documents
docs = [
    Document(
        page_content="Machine learning algorithms build mathematical models based on sample data"
    ),
    Document(page_content="Deep learning uses neural networks with many layers"),
    Document(page_content="Climate change is a major global environmental challenge"),
    Document(
        page_content="Neural networks are inspired by the human brain's structure"
    ),
]

# Create vector store
vector_store = FAISS.from_documents(docs, embeddings)

# Perform similarity search
query = "How does the brain influence AI?"
results = vector_store.similarity_search(query, k=2)

print("Search results for query:", query)
for i, doc in enumerate(results):
    print(f"Result {i+1}: {doc.page_content}")

API Reference:FAISS | Document

Search results for query: How does the brain influence AI?
Result 1: Neural networks are inspired by the human brain's structure
Result 2: Deep learning uses neural networks with many layers

API Reference

For more details about the Nebius AI Studio API, visit the Nebius AI Studio Documentation.

Embedding model conceptual guide
Embedding model how-to guides

Installation​

Environment​

Embedding Model Usage​

Embedding a Single Text​

Embedding Multiple Texts​

Async Support​

Available Models​

Practical Use Cases​

Document Similarity​

Using with LangChain VectorStores​

API Reference​

Related​

Was this page helpful?