Image Similarity System for Fashion Items

10 min read
Computer Vision, Fashion, Python, PyTorch, EfficientNet

A deep learning-based system for comparing fashion items and determining their similarity. This system uses EfficientNet-B4 to extract features from images and cosine similarity to compare them.

Overview

This system can:

  • Compare fashion items and determine their similarity
  • Identify identical items
  • Distinguish between different types of clothing
  • Recognize similar items with color variations

Example Results

  1. Same Item (Perfect Match):
Image 1: red_tshirt.webp
Image 2: red_tshirt.webp
Similarity Score: 1.0000
Interpretation: Very similar items
  2. Different Types of Clothing:
Image 1: red_tshirt.webp
Image 2: red_dress.jpg
Similarity Score: 0.0218
Interpretation: Different items
  3. Same Type, Different Color:
Image 1: red_tshirt.webp
Image 2: yellow_tshirt.webp
Similarity Score: 0.7540
Interpretation: Similar items with some differences

Technical Details

Libraries Used

import torch
from torchvision import transforms
from PIL import Image
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
import timm
  1. PyTorch (torch):

    • Deep learning framework
    • Used for loading and running pre-trained models
    • Provides efficient tensor operations and GPU support
  2. TorchVision (transforms):

    • Image transformation utilities
    • Used for preprocessing images
    • Part of PyTorch ecosystem
  3. PIL (Image):

    • Python Imaging Library, installed through the Pillow package
    • Used for opening and manipulating images
    • Provides basic image processing capabilities
  4. NumPy (np):

    • Scientific computing package
    • Used for array operations
    • Essential for handling image data and embeddings
  5. scikit-learn (cosine_similarity):

    • Machine learning library
    • Used for computing similarity between embeddings
    • Provides efficient implementation of cosine similarity
  6. timm:

    • PyTorch Image Models
    • Provides access to various pre-trained models
    • Used here to load EfficientNet-B4 (the 'tf_efficientnet_b4_ns' Noisy Student weights)

Model Setup

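# Load the pre-trained EfficientNet-B4 weights from timm
model = timm.create_model('tf_efficientnet_b4_ns', pretrained=True)
# Drop the final classification layer so the model outputs pooled feature vectors
model = torch.nn.Sequential(*list(model.children())[:-1])
# Switch to inference mode (fixed batch-norm statistics, no dropout)
model.eval()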

Image Preprocessing

transform = transforms.Compose([
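    transforms.Resize(380),  # Scale the shorter side to 380 px (EfficientNet-B4 input size)
    transforms.CenterCrop(380),  # Keep the central 380x380 region
    transforms.ToTensor(),  # Convert to a float tensor in [0, 1], channels first
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])  # ImageNet statistics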
])
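
To make the preprocessing arithmetic concrete, here is a minimal sketch in plain Python of what the pipeline above does to an image's geometry and pixel values. The helper names (resized_size, center_crop_box, normalize_pixel) are hypothetical and only approximate the torchvision transforms; they are not part of the original system.

```python
# Hypothetical helpers that approximate the transforms above (illustration only).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def resized_size(width, height, target=380):
    """Scale so the shorter side equals `target`, keeping the aspect ratio."""
    if width <= height:
        return target, int(target * height / width)
    return int(target * width / height), target


def center_crop_box(width, height, crop=380):
    """Return (left, top, right, bottom) of a centered crop-by-crop window."""
    left = round((width - crop) / 2)
    top = round((height - crop) / 2)
    return left, top, left + crop, top + crop


def normalize_pixel(rgb):
    """Map an 8-bit RGB pixel to the normalized values the model receives."""
    return tuple(
        (value / 255.0 - mean) / std
        for value, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )


w, h = resized_size(800, 1200)       # a portrait product photo
print(w, h)                          # 380 570
print(center_crop_box(w, h))         # (0, 95, 380, 475)
print(normalize_pixel((255, 0, 0)))  # roughly (2.249, -2.036, -1.804)
```

One consequence worth noting: for a tall photo, the center crop discards the top and bottom strips, so garments that fill the full frame may lose their hem or neckline before the model sees them.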

Core Functions

Getting Image Embeddings

def get_image_embedding(image_path):
    """Return the model's feature vector for one image as a 1D NumPy array."""
    # Load the image, force RGB, and add a batch dimension
    image = Image.open(image_path).convert('RGB')
    image_tensor = transform(image).unsqueeze(0)

    # Run the model without tracking gradients
    with torch.no_grad():
        embedding = model(image_tensor)
        # Drop the batch dimension to get a 1D array
        embedding = embedding.squeeze().numpy()

    return embedding

Comparing Images

def compare_images(image1_path, image2_path):
    # Get embeddings
    embedding1 = get_image_embedding(image1_path)
    embedding2 = get_image_embedding(image2_path)
    
    # Compute similarity
    similarity = cosine_similarity([embedding1], [embedding2])[0][0]
    
    # Print comparison results
    print("\nImage Comparison Results:")
    print(f"Image 1: {image1_path}")
    print(f"Image 2: {image2_path}")
    print(f"Similarity Score: {similarity:.4f}")
    
    # Interpret the score
    if similarity > 0.85:
        print("\nInterpretation: Very similar items")
    elif similarity > 0.7:
        print("\nInterpretation: Similar items with some differences")
    elif similarity > 0.5:
        print("\nInterpretation: Different items with some similarities")
    else:
        print("\nInterpretation: Different items")
    
    return similarity
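
If the label is needed elsewhere (for example in a batch report), the thresholding can be separated from the printing. The following is a small sketch of that idea; interpret_similarity is a hypothetical helper name, and the thresholds are copied from compare_images above.

```python
def interpret_similarity(score):
    """Map a cosine similarity score to the label used in this post."""
    if score > 0.85:
        return "Very similar items"
    if score > 0.7:
        return "Similar items with some differences"
    if score > 0.5:
        return "Different items with some similarities"
    return "Different items"


# Scores taken from the Example Results section
for score in (1.0000, 0.7540, 0.0218):
    print(f"{score:.4f}: {interpret_similarity(score)}")
```

Returning the label rather than printing it keeps the thresholds in one place, which makes them easier to tune against your own catalog.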

How It Works

  1. Feature Extraction:

    • The model extracts high-level features from images
    • These features capture both structural and color information
    • The embedding represents the image as a point in a high-dimensional space (a 1792-value vector for EfficientNet-B4)
  2. Similarity Calculation:

    • Cosine similarity measures the angle between embeddings
    • Values range from -1 to 1; a score of 1 means the embeddings point in the same direction, as they do for identical images
    • Higher values indicate more similar items
  3. Interpretation:

    • > 0.85: Very similar items
    • > 0.7: Similar items with some differences
    • > 0.5: Different items with some similarities
    • ≤ 0.5: Different items
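
The similarity calculation described above can be sketched in a few lines of plain Python. This is an illustration of the formula, dot(a, b) / (|a| * |b|), not the scikit-learn implementation the system actually calls; the cosine function and the toy vectors are assumptions made for the example.

```python
import math


def cosine(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Same direction, different length: similarity is (approximately) 1
print(cosine([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
# Perpendicular vectors: similarity is 0
print(cosine([1.0, 0.0], [0.0, 1.0]))
# Opposite directions: similarity is (approximately) -1
print(cosine([1.0, 2.0], [-1.0, -2.0]))
```

Because only the angle matters, scaling an embedding does not change the score. Note that this sketch raises ZeroDivisionError for an all-zero vector.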

Usage

# Example usage: compare two images by file path
image_paths = [
    'path/to/image1.webp',
    'path/to/image2.webp'
]

# Compare images
similarity_score = compare_images(image_paths[0], image_paths[1])

Requirements

  • Python 3.7+
  • PyTorch
  • TorchVision
  • Pillow
  • NumPy
  • scikit-learn
  • timm