stp2y

32533 Posts
Building MLOps Capabilities at GitLab As a One-Person ML Platform Team

Eduardo Bonet is an incubation engineer at GitLab, building out their MLOps capabilities. One of the first features Eduardo implemented in this role was a diff for Jupyter Notebooks, bringing code reviews into the data science process. Eduardo believes in an iterative, feedback-driven product development process, although he emphasizes that “minimum viable change” does not necessarily mean that there is an immediately visible value-add from the user’s point of view. While LLMs are quickly gaining traction, Eduardo thinks they’ll not replace ML or traditional software engineering but add to the capabilities. Thus, he believes that GitLab’s current focus on MLOps…

Generative AI Translation Startup DeepL Locks Up $300M

Investors can’t get enough of different ways to use generative AI. Translation and language startup DeepL became the latest startup using generative AI to raise big, nabbing $300 million at a $2 billion post-money valuation in a round led by Index Ventures. The valuation is about double its previous $1 billion-plus valuation from January 2023. The round included participation from ICONIQ Growth, Teachers’ Venture Growth, IVP, Atomico and WiL (World Innovation Lab). The German startup’s language AI platform offers writing, editing and translation services for 63 markets and 32 languages for business use cases. The company said it has already…

Announcing General Availability of Liquid Clustering

We’re excited to announce the General Availability of Delta Lake Liquid Clustering in the Databricks Data Intelligence Platform. Liquid Clustering is an innovative data management technique that replaces table partitioning and ZORDER so you no longer have to fine-tune your data layout to achieve optimal query performance.  Liquid clustering significantly simplifies data layout-related decisions and provides the flexibility to redefine clustering keys without data rewrites. It allows data layout to evolve alongside analytic needs over time – something you could never do with partitioning on Delta.  Since the Public Preview of Liquid Clustering at the Data and AI Summit last year, we’ve…

Beautiful dashboards in Python with first-class real-time integration | Deephaven

from deephaven import ui, agg, empty_table
from deephaven.stream.table_publisher import table_publisher
from deephaven.stream import blink_to_append_only
from deephaven.plot import express as dx
from deephaven import updateby as uby
from deephaven import dtypes as dht

stocks = dx.data.stocks().reverse()

def set_bol_properties(fig):
    fig.update_layout(showlegend=False)
    fig.update_traces(fill="tonexty", fillcolor='rgba(255,165,0,0.08)')

@ui.component
def line_plot(filtered_source, exchange, window_size, bol_bands):
    window_size_key = {
        "5 seconds": ("priceAvg5s", "priceStd5s"),
        "30 seconds": ("priceAvg30s", "priceStd30s"),
        "1 minute": ("priceAvg1m", "priceStd1m"),
        "5 minutes": ("priceAvg5m", "priceStd5m")}
    bol_bands_key = {"None": None, "80%": 1.282, "90%": 1.645, "95%": 1.960, "99%": 2.576}
    base_plot = ui.use_memo(lambda: (
        dx.line(filtered_source, x="timestamp", y="price",
                by="exchange" if exchange == "All" else None,
                unsafe_update_figure=lambda fig: fig.update_traces(opacity=0.4))
    ), [filtered_source, exchange])
    window_size_avg_key_col = window_size_key[window_size][0]
    window_size_std_key_col = window_size_key[window_size][1]
    avg_plot = ui.use_memo(lambda: dx.line(filtered_source, x="timestamp",…
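The `bol_bands_key` values in the excerpt above match the two-sided z-scores of a standard normal distribution for each coverage level. The sketch below shows where those numbers come from and how a band of average plus or minus z times standard deviation could be computed. It uses only the Python standard library, not Deephaven. The helper names `band_multiplier` and `bollinger_bands` are hypothetical, and the use of the population standard deviation over a fixed-length window is an assumption, since the excerpt is cut off before the original band calculation.

```python
from statistics import NormalDist, mean, pstdev


def band_multiplier(coverage: float) -> float:
    """Two-sided z multiplier so that mean +/- z*std covers `coverage` of a normal distribution."""
    return NormalDist().inv_cdf(0.5 + coverage / 2)


def bollinger_bands(prices, window, coverage):
    """Return one (lower, average, upper) tuple per full trailing window of `prices`."""
    z = band_multiplier(coverage)
    bands = []
    for end in range(window, len(prices) + 1):
        chunk = prices[end - window:end]
        avg = mean(chunk)
        std = pstdev(chunk)  # assumption: population standard deviation
        bands.append((avg - z * std, avg, avg + z * std))
    return bands


# Recompute the multipliers listed in the excerpt's bol_bands_key mapping.
multipliers = {f"{pct}%": round(band_multiplier(pct / 100), 3) for pct in (80, 90, 95, 99)}
print(multipliers)

# Illustrative prices only, not data from the article.
print(bollinger_bands([100.0, 101.0, 99.5, 102.0, 103.5], window=3, coverage=0.95))
```

In the original dashboard the rolling average and standard deviation appear to come from precomputed columns such as `priceAvg5s` and `priceStd5s`, so only the multiplier step would carry over directly.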

Ghostbuster: Detecting Text Ghostwritten by Large Language Models

The structure of Ghostbuster, our new state-of-the-art method for detecting AI-generated text. Large language models like ChatGPT write impressively well—so well, in fact, that they’ve become a problem. Students have begun using these models to ghostwrite assignments, leading some schools to ban ChatGPT. In addition, these models are also prone to producing text with factual errors, so wary readers may want to know if generative AI tools have been used to ghostwrite news articles or other sources before trusting them. What can teachers and consumers do? Existing tools to detect AI-generated text sometimes do poorly on data that differs from…

OpenAI and Wall Street Journal owner News Corp sign content deal

ChatGPT developer OpenAI has signed a deal to bring news content from the Wall Street Journal, New York Post, the Times and the Sunday Times to the artificial intelligence platform, the companies said on Wednesday. Neither party disclosed a dollar figure for the deal. The deal will give OpenAI access to current and archived content from all of News Corp’s publications. The deal comes weeks after the AI heavyweight signed a deal with the Financial Times to license its content for the development of AI models. Earlier this year, OpenAI inked a similar contract with Axel Springer, parent of Business Insider…

Documenting your PhD — Keeping Track of Meetings, Experiments and Decisions • David Stutz

A PhD can be a difficult endeavour. While becoming an expert in tackling a specific problem, it is easy to lose track of things: Have I read this paper before? What was the paper saying? Why did we decide to change course? Why am I running these experiments? Who did I talk to at the last conference? Throughout my PhD, I learned that documenting these things made me significantly more effective with minimal overhead. In this article, I want to share some learnings around what and how to keep track of PhD work. Introduction In the beginning of my PhD,…

What I learned from looking at 900 most popular open source AI tools

[Hacker News discussion, LinkedIn discussion, Twitter thread] Four years ago, I did an analysis of the open source ML ecosystem. Since then, the landscape has changed, so I revisited the topic. This time, I focused exclusively on the stack around foundation models. The full list of open source AI repos is hosted at llama-police. The list is updated every 6 hours. You can also find most of them on my cool-llm-repos list on GitHub.

Table of contents
Data
    How to add missing repos
The New AI Stack
    AI stack over time
        Applications
        AI engineering
        Model development
        Infrastructure
Open source AI developers
    One-person billion-dollar…

Top Data Validation Tools for Machine Learning in 2024

Image generated with Midjourney. It was challenging to stop myself from starting this article with some variation of the popular phrase "garbage in, garbage out." Well, I did it anyway. But jokes aside, we can easily imagine a situation in which we have built and deployed a machine learning model (possibly a black box) that accepts some input and returns some predictions. So far, so good. However, with tons of complexity happening before the model (data preprocessing and manipulation), the model itself, and any post-processing of the outputs, many things can go wrong. And in some mission-critical fields (finance, healthcare, or security),…

Data Centers’ Doubling Power Demand Seen Stressing Energy Grids – EE Times

An expected doubling in power consumption by the world’s data centers during the next few years is likely to strain the capacity of electricity suppliers, according to experts who spoke with EE Times. Those power constraints, without improvements in data center efficiency, will potentially impede the expansion of AI. Electricity demand from data centers, AI and cryptocurrency miners will surge by 2026, the Paris-based International Energy Agency (IEA) said in a January report. After consuming an estimated 460 terawatt-hours (TWh) worldwide in 2022, data centers’ total energy intake could more…