
Microsoft unveils an AI model that recognizes image content and solves visual puzzles

The new AI model, Kosmos-1, is a Multimodal Large Language Model (MLLM). It responds to visual cues as well as linguistic ones, which lets it answer questions about images as well as text.

Multimodal artificial intelligence could be key to the development of artificial general intelligence (AGI), a technology that could one day replace humans in any intellectual task or job.

What is Kosmos-1

Kosmos-1 is a multimodal model developed by Microsoft researchers. Last Monday, it was unveiled as a model capable of:

reading the content of images,
solving visual puzzles,
recognizing text in images,
scoring well on visual IQ tests,
understanding instructions given in natural language.

The development of multimodal artificial intelligence is seen as a crucial step towards creating an artificial general intelligence (AGI) capable of performing general tasks at a human level.

Language Is Not All You Need: Aligning Perception with Language Models

“Being a basic part of intelligence, multimodal perception is a necessity to achieve artificial general intelligence, in terms of knowledge acquisition and grounding to the real world,” the researchers write in their academic paper, Language Is Not All You Need: Aligning Perception with Language Models.

The Kosmos-1 model can analyze images and answer questions about them, read text from an image, and write captions for images. It scores between 22 and 26 percent on a visual IQ test, as demonstrated in the visual examples in the Kosmos-1 study.

AGI for OpenAI

OpenAI, Microsoft's key business partner in artificial intelligence, has set AGI as its primary goal. Kosmos-1, however, appears to be solely a Microsoft project, developed without OpenAI's involvement.

BlogInnovazione.it