stp2y

32697 Posts

AI, Platforms, And Big Promises: What We Saw At Red Hat Summit

IT platform teams’ responsibilities in 2024 are expanding radically. They’re being asked to provide cost-effective alternative solutions for the VMware virtualization technology that typically dominates enterprise IT. While they’re at it, could they please also stand up state-of-the-art AI platforms that can support generative AI in the data center and cloud that work as well as big cloud-managed AI services but for less money? Those were the big questions hanging over some 6,000 attendees of Red Hat Summit in Denver May 6–9. Red Hat responded with a series of announcements of offerings — some immediate, some long-term — that are…

Asymmetric Certified Robustness via Feature-Convex Neural Networks

TLDR: We propose the asymmetric certified robustness problem, which requires certified robustness for only one class and reflects real-world adversarial scenarios. This focused setting allows us to introduce feature-convex classifiers, which produce closed-form and deterministic certified radii on the order of milliseconds. Figure 1. Illustration of feature-convex classifiers and their certification for sensitive-class inputs. This architecture composes a Lipschitz-continuous feature map $\varphi$ with a learned convex function $g$. Since $g$ is convex, it is globally underapproximated by its tangent plane at $\varphi(x)$, yielding certified norm balls in the feature space. Lipschitzness of $\varphi$…

Ethical, trust, and skill barriers slow generative AI progress in EMEA

76% of consumers in EMEA think AI will significantly impact the next five years, yet 47% question the value that AI will bring and 41% are worried about its applications. This is according to research from enterprise analytics AI firm Alteryx. Since the release of ChatGPT by OpenAI in November 2022, there has been significant buzz about the transformative potential of generative AI, with many considering it one of the most revolutionary technologies of our time.  With a significant 79% of organisations reporting that generative AI contributes positively to business, it is evident that a gap needs to be addressed to demonstrate AI’s value to consumers both in…

Pocket-Sized AI Models Could Unlock a New Era of Computing

When ChatGPT was released in November 2022, it could only be accessed through the cloud because the model behind it was downright enormous. Today I am running a similarly capable AI program on a MacBook Air, and it isn’t even warm. The shrinkage shows how rapidly researchers are refining AI models to make them leaner and more efficient. It also shows how going to ever larger scales isn’t the only way to make machines significantly smarter. The model now infusing my laptop with ChatGPT-like wit and wisdom is called Phi-3-mini. It’s part of a family of smaller AI models recently released by…

Most US TikTok Creators Don’t Think a Ban Will Happen

A majority of US TikTok creators don’t believe the platform will be banned within a year, and most haven’t seen brands they work for shift their marketing budgets away from the app, according to a new survey of people who earn money from posting content on TikTok shared exclusively with WIRED. The findings suggest that TikTok’s influencer economy largely isn’t experiencing existential dread after Congress passed a law last month that put the future of the app’s US operations in jeopardy. The bill demands that TikTok separate from its Chinese parent company within a year or face a nationwide ban; TikTok…

Instruction Tuning OPT-125M

Large language models are pretrained on terabytes of language data. However, the pretraining dataset and strategy only teach the model to generate the next token or word. On its own, this is not very useful, because in the end we want to accomplish a task using the LLM, either through chat or instruction. We can do so by fine-tuning the LLM, a process generally called instruction tuning. To this end, this article uses the OPT-125M model for instruction tuning. Figure 1. Output sample after instruction tuning OPT-125M on the Open Assistant Guanaco…

Data Moats in Generative AI

The deep learning wave of the early 2010s led to a surge of data-hungry products. These products needed so much data that gathering it required significant investment. So, the business community started honing the idea of data as a strategic asset and a business moat. As the Economist put it in a 2017 issue, “The world’s most valuable resource is no longer oil, but data.” This essay discusses data moats in today’s context of generative AI, which is driven by models that are exponentially more data-hungry. But first, what is a data moat? What is even an “AI product”? A data…

Accelerate Mixtral 8x7B pre-training with expert parallelism on Amazon SageMaker | Amazon Web Services

Mixture of Experts (MoE) architectures for large language models (LLMs) have recently gained popularity due to their ability to increase model capacity and computational efficiency compared to fully dense models. By utilizing sparse expert subnetworks that process different subsets of tokens, MoE models can effectively increase the number of parameters while requiring less computation per token during training and inference. This enables more cost-effective training of larger models within fixed compute budgets compared to dense architectures. Despite their computational benefits, training and fine-tuning large MoE models efficiently present some challenges. MoE models can struggle with load balancing if the tokens…

Study Finds That 52 Percent of ChatGPT Answers to Programming Questions Are Wrong

Ah yes. And yet...

Not So Smart

In recent years, computer programmers have flocked to chatbots like OpenAI's ChatGPT to help them code, dealing a blow to places like Stack Overflow, which had to lay off nearly 30 percent of its staff last year. The only problem? A team of researchers from Purdue University presented research this month at the Computer-Human Interaction conference that shows that 52 percent of programming answers generated by ChatGPT are incorrect. That's a staggeringly large proportion for a program that people are relying on to be accurate and precise, underlining what other end users like writers and teachers are experiencing:…

Least-Squares Concept Erasure with Oracle Concept Labels

This post assumes some familiarity with the idea of concept erasure and our LEACE concept erasure method. We encourage the reader to consult our arXiv paper for background. For a PyTorch implementation of this method, see the OracleFitter class in our GitHub repository. WARNING: Because this erasure transformation depends on the ground truth concept label, it can increase the nonlinearly-extractable information about the target concept inside a representation, even though it eliminates the linearly available information. For this reason, optimizing deep neural networks on top of O-LEACE'd representations is not recommended; for those use cases we recommend vanilla LEACE. In…