Published On: June 15th, 2023
Categories: AI News

AI Telephone — A Battle of Multimodal Models | by Jacob Marks, Ph.D. ...

DALL-E2, Stable Diffusion, BLIP, and more!

Artistic rendering of a game of AI Telephone. Image generated by the author using DALL-E2.

Generative AI is on fire right now. The past few months especially have seen an explosion in multimodal machine learning: AI that connects concepts across different “modalities” such as text, images, and audio. As an example, Midjourney is a multimodal text-to-image model, because it takes in natural language and outputs images. The magnum opus for this recent renaissance in multimodal synergy was Meta AI’s ImageBind, which can take six (!) kinds of input and represent them in the same “space”.

With all of this excitement, I wanted to put multimodal models to the test and see how good they actually are. In particular, I wanted to answer three questions:

  1. Which text-to-image model is the best?
  2. Which image-to-text model is the best?
  3. What is more important — image-to-text, or text-to-image?

Of course, each model brings its own biases to the…

Continue reading this article at:

https://towardsdatascience.com/ai-telephone-a-battle-of-multimodal-models-282b01daf044?source=rss----7f60cf5620c9---4

Feed Name : Towards Data Science – Medium

deep-dives,machine-learning,artificial-intelligence,openai,multimodal