computer vision

I Went Birding With the World’s First AI-Powered Binoculars

When the Bird ID setting is active, the AX Visio identifies birds using a modified version of the extensive bird database behind the Cornell Lab of Ornithology’s Merlin Bird ID. The Mammals ID, Butterfly ID, and Dragonfly ID settings on the binoculars are powered by the Sunbird database. However, while mammals and flying insects can currently only be identified in Europe and North America, the Bird ID software works everywhere, even Antarctica. The identification apps use a combination of image recognition and geolocation, which is enabled by a built-in GPS sensor that tells the software where you are in the world. That can help…

From Pixels To Perception: The Impact Of Foundation Models For Vision

Artificial intelligence (AI) has made incredible progress, from deciphering the nuances of written language to interpreting the rich complexity of images, videos, and even LiDAR data. This progress has advanced computer vision (CV), enabling machines to “see” and perceive the visual world. Extracting intelligence from visual data is often more intricate than processing text. This is due to factors such as the high dimensionality of visual data; occlusion, perspective, and lighting conditions in images; and the relative complexity of feature extraction. Machine learning (ML) in CV faces additional challenges, such as a shortage of relevant images for pre-training, the…

Are We Ready for Multi-Image Reasoning? Launching VHs: The Visual Haystacks Benchmark!

Humans excel at processing vast arrays of visual information, a skill that is crucial for achieving artificial general intelligence (AGI). Over the decades, AI researchers have developed Visual Question Answering (VQA) systems to interpret scenes within single images and answer related questions. While recent advancements in foundation models have significantly closed the gap between human and machine visual processing, conventional VQA has been restricted to reasoning about a single image at a time rather than whole collections of visual data. This limitation poses challenges in more complex scenarios. Take, for example, the challenges of discerning patterns in collections of medical…

Apple Ushers In The Era Of Spatial Computing, Building On Computer Vision Advances

Welcome to the frontier of technology, where the giants of Silicon Valley vie to turn science fiction into reality. For years, ambitious companies like Google and Microsoft have been venturing into spatial computing, each with varying degrees of success. For instance, Google Glass and Microsoft HoloLens have had their moment in the limelight, but neither fully captured the mainstream market. Now, enter Apple with its Vision Pro. This device promises to redefine our interaction with technology and usher in the long-anticipated era of spatial computing. But what sets Apple’s Vision Pro apart from its predecessors? While previous attempts at spatial…

Modeling Extremely Large Images with xT

As computer vision researchers, we believe that every pixel can tell a story. However, there seems to be a writer’s block settling into the field when it comes to dealing with large images. Large images are no longer rare—the cameras we carry in our pockets and those orbiting our planet snap pictures so big and detailed that they stretch our current best models and hardware to their breaking points when handling them. Generally, we face a quadratic increase in memory usage as a function of image size. Today, we make one of two sub-optimal choices when handling large images: down-sampling…