stp2y

29191 Posts
Announcing General Availability of Liquid Clustering

We’re excited to announce the General Availability of Delta Lake Liquid Clustering in the Databricks Data Intelligence Platform. Liquid Clustering is an innovative data management technique that replaces table partitioning and ZORDER so you no longer have to fine-tune your data layout to achieve optimal query performance. Liquid Clustering significantly simplifies data layout-related decisions and provides the flexibility to redefine clustering keys without data rewrites. It allows data layout to evolve alongside analytic needs over time – something you could never do with partitioning on Delta. Since the Public Preview of Liquid Clustering at the Data and AI Summit last year, we’ve…

Beautiful dashboards in Python with first-class real-time integration | Deephaven

from deephaven import ui, agg, empty_table
from deephaven.stream.table_publisher import table_publisher
from deephaven.stream import blink_to_append_only
from deephaven.plot import express as dx
from deephaven import updateby as uby
from deephaven import dtypes as dht

stocks = dx.data.stocks().reverse()

def set_bol_properties(fig):
    fig.update_layout(showlegend=False)
    fig.update_traces(fill="tonexty", fillcolor='rgba(255,165,0,0.08)')

@ui.component
def line_plot(
        filtered_source, exchange, window_size, bol_bands):
    window_size_key = {
        "5 seconds": ("priceAvg5s", "priceStd5s"),
        "30 seconds": ("priceAvg30s", "priceStd30s"),
        "1 minute": ("priceAvg1m", "priceStd1m"),
        "5 minutes": ("priceAvg5m", "priceStd5m")}
    bol_bands_key = {"None": None, "80%": 1.282, "90%": 1.645, "95%": 1.960, "99%": 2.576}
    base_plot = ui.use_memo(lambda: (
        dx.line(filtered_source, x="timestamp", y="price",
                by="exchange" if exchange == "All" else None,
                unsafe_update_figure=lambda fig: fig.update_traces(opacity=0.4))
    ), [filtered_source, exchange])
    window_size_avg_key_col = window_size_key[window_size][0]
    window_size_std_key_col = window_size_key[window_size][1]
    avg_plot = ui.use_memo(lambda: dx.line(filtered_source, x="timestamp",…
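
The multipliers in `bol_bands_key` above match two-sided z-scores of the standard normal distribution for each coverage level. The excerpt is cut off before it shows how they are applied, so the following is only a minimal sketch, in plain Python without Deephaven, that assumes the bands are drawn at the rolling average plus and minus the multiplier times the rolling standard deviation. The names `band_multiplier` and `bollinger_band` are hypothetical and do not come from the article.

```python
from statistics import NormalDist


def band_multiplier(coverage: float) -> float:
    """Return z such that the central `coverage` fraction of a standard
    normal distribution lies within +/- z standard deviations."""
    return NormalDist().inv_cdf(0.5 + coverage / 2.0)


def bollinger_band(avg: float, std: float, coverage: float) -> tuple[float, float]:
    """Assumed band definition: (avg - z * std, avg + z * std)."""
    z = band_multiplier(coverage)
    return avg - z * std, avg + z * std


# Print the multiplier for each coverage level named in the excerpt.
for label, coverage in [("80%", 0.80), ("90%", 0.90), ("95%", 0.95), ("99%", 0.99)]:
    print(label, round(band_multiplier(coverage), 3))
```

Rounded to three decimals, these values agree with the constants hard-coded in the excerpt, which is why a lookup table is enough there and no statistics call is needed at render time.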

Ghostbuster: Detecting Text Ghostwritten by Large Language Models

The structure of Ghostbuster, our new state-of-the-art method for detecting AI-generated text.

Large language models like ChatGPT write impressively well; so well, in fact, that they’ve become a problem. Students have begun using these models to ghostwrite assignments, leading some schools to ban ChatGPT. In addition, these models are prone to producing text with factual errors, so wary readers may want to know if generative AI tools have been used to ghostwrite news articles or other sources before trusting them. What can teachers and consumers do? Existing tools to detect AI-generated text sometimes do poorly on data that differs from…

OpenAI and Wall Street Journal owner News Corp sign content deal

ChatGPT developer OpenAI has signed a deal to bring news content from the Wall Street Journal, New York Post, the Times and the Sunday Times to the artificial intelligence platform, the companies said on Wednesday. Neither party disclosed a dollar figure for the deal. The deal will give OpenAI access to current and archived content from all of News Corp’s publications. The deal comes weeks after the AI heavyweight signed a deal with the Financial Times to license its content for the development of AI models. Earlier this year, OpenAI inked a similar contract with Axel Springer, parent of Business Insider…

Documenting your PhD — Keeping Track of Meetings, Experiments and Decisions • David Stutz

A PhD can be a difficult endeavour. While becoming an expert in tackling a specific problem, it is easy to lose track of things: Have I read this paper before? What was the paper saying? Why did we decide to change course? Why am I running these experiments? Who did I talk to at the last conference? Throughout my PhD, I learned that documenting these things made me significantly more effective with minimal overhead. In this article, I want to share some learnings around what and how to keep track of PhD work.

Introduction

In the beginning of my PhD,…

What I learned from looking at 900 most popular open source AI tools

[Hacker News discussion, LinkedIn discussion, Twitter thread]

Four years ago, I did an analysis of the open source ML ecosystem. Since then, the landscape has changed, so I revisited the topic. This time, I focused exclusively on the stack around foundation models. The full list of open source AI repos is hosted at llama-police. The list is updated every 6 hours. You can also find most of them on my cool-llm-repos list on GitHub.

Table of contents
Data
    How to add missing repos
The New AI Stack
    AI stack over time
        Applications
        AI engineering
        Model development
        Infrastructure
Open source AI developers
    One-person billion-dollar…

Top Data Validation Tools for Machine Learning in 2024

Image generated with Midjourney

It was challenging to stop myself from starting this article with some variation of the popular phrase "garbage in, garbage out." Well, I did it anyway. But jokes aside, we can easily imagine a situation in which we have built and deployed a machine learning model (possibly a black box) that accepts some input and returns some predictions. So far, so good. However, with tons of complexity happening before the model (data preprocessing and manipulation), the model itself, and any post-processing of the outputs, many things can go wrong. And in some mission-critical fields (finance, healthcare, or security),…

Data Centers’ Doubling Power Demand Seen Stressing Energy Grids – EE Times

A doubling in power consumption by the world’s data centers during the next few years is expected to strain the capacity of electricity suppliers, according to experts who spoke with EE Times. Those power constraints, without improvements in data center efficiency, could impede the expansion of AI. Electricity demand from data centers, AI and cryptocurrency miners will surge by 2026, the Paris-based International Energy Agency (IEA) said in a January report. After consuming an estimated 460 terawatt-hours (TWh) worldwide in 2022, data centers’ total energy intake could more…

Create a multimodal assistant with advanced RAG and Amazon Bedrock | Amazon Web Services

Retrieval Augmented Generation (RAG) models have emerged as a promising approach to enhance the capabilities of language models by incorporating external knowledge from large text corpora. However, despite their impressive performance in various natural language processing tasks, RAG models still face several limitations that need to be addressed. Naive RAG models face limitations such as missing content, reasoning mismatch, and challenges in handling multimodal data. Although they can retrieve relevant information, they may struggle to generate complete and coherent responses when required information is absent, leading to incomplete or inaccurate outputs. Additionally, even with relevant information retrieved, the models may…

Newsrooms are experimenting with generative AI, warts and all

The journalism industry has been under immense economic pressure over the past two decades, so it makes sense that journalists have started experimenting with generative AI to boost their productivity. An Associated Press survey published in April 2024 asked journalists about the use of generative artificial intelligence in their work. Nearly 70% of those who responded said they had used these tools to generate text, whether it was composing article drafts, crafting headlines or writing social media posts. A May 2024 global survey conducted by the public relations firm Cision found the slice to be somewhat smaller – 47% of…