MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models

arXiv:2408.02718v1 Announce Type: new Abstract: The capability to process multiple images is crucial for Large Vision-Language Models (LVLMs) to develop a more thorough and nuanced understanding of a scene. Recent multi-image LVLMs have begun to address this need. However, their evaluation has not kept pace with their development. To fill this gap, we introduce the Multimodal Multi-image Understanding (MMIU) benchmark, a comprehensive evaluation suite designed to assess LVLMs across a wide range of multi-image tasks. MMIU encompasses 7 types of multi-image relationships, 52 tasks, 77K images, and 11K meticulously curated multiple-choice questions, making it the most extensive benchmark of its…
OpenAI has finally released the No. 1 feature developers have been desperate for

The JavaScript Object Notation (JSON) file and data interchange format is an industry standard because it is both easily readable by humans and parsable by machines. However, large language models (LLMs) notoriously struggle with JSON: they might hallucinate, create wonky responses that only partially adhere to instructions, or produce output that fails to parse at all. This often requires developers to use workarounds such as open-source tooling, many different prompts or repeated requests to ensure output interoperability. Now, OpenAI is helping ease these frustrations with…
Examining Gender and Power on Wikipedia Through Face and Politeness

arXiv:2408.02798v1 Announce Type: new Abstract: We propose a framework for analyzing discourse by combining two interdependent concepts from sociolinguistic theory: face acts and politeness. While politeness has robust existing tools and data, face acts are less resourced. We introduce a new corpus created by annotating Wikipedia talk pages with face acts and we use this to train a face act tagger. We then employ our framework to study how face and politeness interact with gender and power in discussions between Wikipedia editors. Among other findings, we observe that female Wikipedians are not only more polite, which is consistent with prior…
He won gold for the Philippines. Then the gifts started pouring in — including a $600,000 condo and a lifetime supply of ramen.

After making history as the first Filipino man to win a gold at the Olympics, gymnast Carlos Yulo left the Paris Olympics with more than victory. The 24-year-old placed first in the men's floor and vault artistic gymnastics events. Then, gifts and freebies started pouring in. Filipino real-estate company Megaworld offers each Filipino gold medalist a fully furnished two-bedroom property worth around $415,000 at McKinley Hill in Taguig, a coastal city in Manila. According to Megaworld's press release, the development is home to several athletes from the country's previous national teams. It also sits less than two miles from Shangri-La The Fort, a…
Reddit Sales Exceed Analysts’ Expectations on Advertising Growth

Reddit Inc. said it expects steady revenue growth from new data licensing partnerships and advertising technology, charting its path forward as a newly public company. Current-quarter sales will be $290 million to $310 million, Reddit said Tuesday in a statement. Analysts had forecast $281.8 million, according to the average of estimates compiled by Bloomberg.
The DOJ says a man with Iranian ties in New York tried to hire hitmen, a female spy, and a fake protest mob to assassinate US officials

The Justice Department has charged a Pakistani man who traveled to New York and Houston with planning the political assassination of US officials. Asif Raza Merchant, 46, was arrested on July 12 and has been accused by federal prosecutors of having close ties with Iran's government. The Justice Department said he'd planned to hire hitmen to carry out the assassinations, as well as a woman who would perform "reconnaissance" and 25 people who would stage a protest as a distraction after the killings. The DOJ's complaint, unsealed on Tuesday, did not mention who Merchant was targeting. But CNN and Reuters reported, each citing…
On Biases in a UK Biobank-based Retinal Image Classification Model

arXiv:2408.02676v1 Announce Type: new Abstract: Recent work has uncovered alarming disparities in the performance of machine learning models in healthcare. In this study, we explore whether such disparities are present in the UK Biobank fundus retinal images by training and evaluating a disease classification model on these images. We assess possible disparities across various population groups and find substantial differences despite strong overall performance of the model. In particular, we discover unfair performance for certain assessment centres, which is surprising given the rigorous data standardisation protocol. We compare how these differences emerge and apply a range of existing bias mitigation…
RIP Chromecast: Looking back at 11 years of Google streaming

Google’s Chromecast is no more. With Tuesday’s introduction of its successor, the company laid to rest the brand that kicked off 11 years ago with a novel product that helped move streaming onto the center stage of home entertainment. With the Google TV Streamer taking the baton, it’s time to look back at 11 years of Chromecast. Google’s casting-centric brand arrived on July 24, 2013, with the first-generation Chromecast. The streaming stick plugged directly into a TV’s HDMI port and lacked a remote control. Instead, you fired up content using a mobile device or computer. Most importantly, the innovative gizmo only cost…
Coupang Posts First Loss Since 2022 as Farfetch Deal Saps Profit

Coupang Inc. posted its first loss in two years, after the acquisition of unprofitable Farfetch Holdings Plc and a government regulatory fine offset strong growth in its core e-commerce business. South Korea’s largest online retailer posted a net loss of $77 million for the June quarter, versus an average estimate for a loss of $11.7 million. Excluding Farfetch and a fine from the Korean authorities, Coupang’s second-quarter net income came to about $124 million, it said in a statement.
Compositional Physical Reasoning of Objects and Events from Videos

arXiv:2408.02687v1 Announce Type: new Abstract: Understanding and reasoning about objects' physical properties in the natural world is a fundamental challenge in artificial intelligence. While some properties like colors and shapes can be directly observed, others, such as mass and electric charge, are hidden from the objects' visual appearance. This paper addresses the unique challenge of inferring these hidden physical properties from objects' motion and interactions and predicting corresponding dynamics based on the inferred physical properties. We first introduce the Compositional Physical Reasoning (ComPhy) dataset. For a given set of objects, ComPhy includes limited videos of them moving and interacting under…