Upload two images to calculate their visual similarity.
The CLIP model generates a numerical representation (embedding) for each image, capturing its semantic meaning.
By calculating the cosine similarity between the two image embeddings, we can determine how visually similar they are.
A higher score (closer to 100%) means the model considers the two images more alike in content and style.
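The comparison itself reduces to a cosine similarity between the two embedding vectors. A minimal sketch of that calculation is below; the short toy vectors stand in for real CLIP image embeddings (which are typically 512-dimensional or larger), and scaling the score to a percentage mirrors the display described above.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy stand-ins for CLIP image embeddings of two similar images.
emb_a = [0.2, 0.7, 0.1]
emb_b = [0.25, 0.65, 0.05]

score = cosine_similarity(emb_a, emb_b)
print(f"Similarity: {score * 100:.1f}%")  # near 100% for near-identical content
```

In practice the embeddings would come from a CLIP image encoder (e.g. via the `transformers` or `open_clip` libraries), but the similarity step is exactly this normalized dot product.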