Instruction Tuning OPT-125M

Large language models are pretrained on terabytes of text data. However, the pretraining dataset and objective only teach the model to generate the next token or word. On its own, this is not very useful, because in the end we want to accomplish a task with the LLM, either through chat or instructions. We can get there by fine-tuning the LLM, a process generally called instruction tuning. To this end, in this article, we will use the OPT-125M model for instruction tuning.
Figure 1. Output sample after instruction tuning OPT-125M on the Open Assistant Guanaco…
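Below is a minimal sketch of what such a run could look like with Hugging Face Transformers; the dataset identifier, sequence length, and hyperparameters are illustrative assumptions, not the exact setup from the article.

```python
# A minimal sketch of instruction tuning OPT-125M with Hugging Face Transformers.
# Dataset name and hyperparameters below are assumptions for illustration only.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "facebook/opt-125m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# An Open Assistant Guanaco-style dataset where each row has a "text" field
# that already interleaves prompts and responses (assumed format).
dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="opt125m-instruct",
        per_device_train_batch_size=8,
        num_train_epochs=1,
        learning_rate=2e-5,
        logging_steps=50,
    ),
    train_dataset=tokenized,
    # Causal LM collator: labels are the input ids, shifted inside the model.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```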
Read More
Data Moats in Generative AI

The deep learning wave of the early 2010s led to a surge of data-hungry products. These products needed so much data that gathering it required significant investment. So the business community started honing the idea of data as a strategic asset and a business moat. As The Economist put it in a 2017 issue, “The world’s most valuable resource is no longer oil, but data.” This essay discusses data moats in today’s context of generative AI, which is driven by models that are exponentially more data-hungry. But first, what is a data moat? What is even an “AI product”? A data…
Read More
Accelerate Mixtral 8x7B pre-training with expert parallelism on Amazon SageMaker | Amazon Web Services

Mixture of Experts (MoE) architectures for large language models (LLMs) have recently gained popularity due to their ability to increase model capacity and computational efficiency compared to fully dense models. By utilizing sparse expert subnetworks that process different subsets of tokens, MoE models can effectively increase the number of parameters while requiring less computation per token during training and inference. This enables more cost-effective training of larger models within fixed compute budgets compared to dense architectures. Despite their computational benefits, training and fine-tuning large MoE models efficiently present some challenges. MoE models can struggle with load balancing if the tokens…
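As a rough illustration of the routing idea, here is a toy top-2 MoE layer in PyTorch; the dimensions, gating scheme, and absence of a load-balancing loss are simplifying assumptions, not Mixtral's actual implementation.

```python
# A toy top-2 Mixture-of-Experts layer: sparse expert subnetworks process
# different subsets of tokens selected by a learned router. Sizes are arbitrary.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMoELayer(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):                                   # x: (tokens, d_model)
        gates = F.softmax(self.router(x), dim=-1)           # routing probabilities
        weights, idx = gates.topk(self.top_k, dim=-1)       # top-k experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            for k in range(self.top_k):
                mask = idx[:, k] == e                        # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(16, 64)
print(SimpleMoELayer()(tokens).shape)   # torch.Size([16, 64])
```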
Read More
Study Finds That 52 Percent of ChatGPT Answers to Programming Questions Are Wrong

Ah yes. And yet... Not So Smart. In recent years, computer programmers have flocked to chatbots like OpenAI's ChatGPT to help them code, dealing a blow to places like Stack Overflow, which had to lay off nearly 30 percent of its staff last year. The only problem? A team of researchers from Purdue University presented research this month at the Computer-Human Interaction conference that shows that 52 percent of programming answers generated by ChatGPT are incorrect. That's a staggeringly large proportion for a program that people are relying on to be accurate and precise, underlining what other end users like writers and teachers are experiencing:…
Read More
Least-Squares Concept Erasure with Oracle Concept Labels

This post assumes some familiarity with the idea of concept erasure and our LEACE concept erasure method. We encourage the reader to consult our arXiv paper for background. For a PyTorch implementation of this method, see the OracleFitter class in our GitHub repository. WARNING: Because this erasure transformation depends on the ground truth concept label, it can increase the nonlinearly-extractable information about the target concept inside a representation, even though it eliminates the linearly available information. For this reason, optimizing deep neural networks on top of O-LEACE'd representations is not recommended; for those use cases we recommend vanilla LEACE. In…
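As a rough, simplified sketch of the underlying least-squares idea (not the actual OracleFitter API from the repository), one can regress the representation on the ground-truth concept label and subtract the fitted component, which zeroes the cross-covariance between the erased representation and the concept.

```python
# Illustrative sketch only: oracle-label least-squares erasure by subtracting
# the least-squares prediction of x from the concept label z. This is an
# assumed simplification, not the exact OracleFitter implementation.
import torch

def oracle_erase(x, z):
    """x: (n, d) representations; z: (n, k) one-hot or real-valued concept labels."""
    x_mean, z_mean = x.mean(0), z.mean(0)
    xc, zc = x - x_mean, z - z_mean
    sigma_xz = xc.T @ zc / len(x)                  # cross-covariance (d, k)
    sigma_zz = zc.T @ zc / len(x)                  # concept covariance (k, k)
    beta = sigma_xz @ torch.linalg.pinv(sigma_zz)  # least-squares regression of x on z
    return x - zc @ beta.T                         # erased representations

x = torch.randn(1000, 32)
z = torch.nn.functional.one_hot(torch.randint(0, 2, (1000,)), 2).float()
x_erased = oracle_erase(x, z)
# Cross-covariance with the concept is (numerically) zero after erasure.
print(((x_erased - x_erased.mean(0)).T @ (z - z.mean(0)) / len(x)).abs().max())
```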
Read More
Sigma Secures $200M Round to Advance Its BI and Analytics Solutions

Sigma Computing, a cloud-based analytics solutions provider, has raised $200 million in Series D funding to further advance its efforts in broadening BI use within organizations by enabling users to query and analyze data without writing code. The latest round of funding takes the vendor’s total funding to $581.3 million with a valuation estimated to be around $1.5 billion, a staggering rise of 60% since the last funding round in 2021. The steep rise in valuation is partially a result of rising demand for greater productivity and monetization in the era of cloud data transition. Spark Capital and Avenir…
Read More
Why ChatGPT feels more “intelligent” than Google Search

The Google Search bar doesn’t feel like an artificial intelligence. No one speculates that it might soon become an artificial general intelligence (AGI) — an entity competitive with a human being across many domains. But do you know many “generally intelligent” humans who can muster a decent translation into and out of 133 languages? Or a co-worker who can perform mathematical operations instantly, know a good route between pretty much any two locations on the planet, and proffer a plausible answer to every question you might ask, in less than a second? Sure, these answers aren’t original, nor are they…
Read More
LeoLabs zeroes in on anomalies in satellite operations

COLORADO SPRINGS – LeoLabs, the Silicon Valley startup mapping activity in low-Earth orbit, is relying on artificial intelligence to spot anomalous satellite operations. A LeoLabs visualization tool shown at the 39th Space Symposium tracks maneuvers performed by satellites that change their orbits frequently. And it highlights maneuvers conducted by satellites that did not typically perform them. Three of the first satellites in a Chinese communications constellation that could include 12,000 satellites, for example, remained in stable orbits for months after they were launched in late 2023. “Then at the same time, all three executed an organized maneuver campaign,” Owen Marshall,…
Read More
AT&T CEO: Focused on Always-On Connectivity

AT&T CEO John Stankey joins Caroline Hyde and Ed Ludlow following his comments on the company's multi-year growth strategy at the JPMorgan technology conference. He discusses slowing demand for handsets, always-on connectivity, and why he thinks AST is more "consumer-centric" than Starlink. He speaks on "Bloomberg Technology." (Source: Bloomberg)
Read More
Data Machina #251

Three New Powerful Open AI Models. I’m told by colleagues at Hugging Face that, just a week since Llama-3 was released, more than 10,000 model derivatives have been developed! The pressure on black-box, closed AI models is huge, and achieving GPT-4 performance with open, smallish models is upon us. Which is great. In the last few days, three new, smallish, powerful open AI models were released. Interestingly enough, the power of these three models is based on a combination of: 1) innovative training architectures and optimisation techniques, and 2) data quality for different types of data (synthetic, public or private).…
Read More