Changelog · Prodigy · An annotation tool for AI, Machine Learning & NLP

This page lists the history of changes to Prodigy. Whenever a new update is
available, you’ll receive an email notification sent to the address specified
at checkout. You can then download the new version via your personal
download link. If your free upgrades expired, you can now add 12 months of
updates to your license via our online shop. Please allow up to 24 hours
for your download link to be reactivated.

This update features a brand new container interface, pages, to split a single annotation task over multiple sections, and even combine different interfaces, without losing the simplicity and efficiency of Prodigy’s card-based design. Paginated data can be loaded directly from JSON or from a simple file format and is already supported across all relevant built-in annotation recipes and in the Prodigy-PDF plugin.


new	`pages` interface for multi-page tasks like longer documents, PDFs or collections of images.
new	`Pages` loader for loading paginated files and support for `--loader pages` and paginated input data across relevant built-in recipes.
new	`split_pages` and `merge_pages` preprocessors and support for pages in `train` and `data-to-spacy`.
fix	Enhance front-end validation of task data to provide more helpful error messages in the UI.
doc	Update documentation on custom interfaces, computer vision for PDFs, and NER with long texts.
fyi	`"pages"` and `"page_titles"` are now a protected keys in the JSON data (like `"text"`) and you should avoid using them for anything that’s not paginated examples.
fyi	Properties in a JSON task’s `"meta"` that start with an underscore `_` are now considered internal and not displayed in the annotation interface.

For this release, we’ve removed all Cython-compiled source code and are now shipping Prodigy as cross-platform Python wheels, which makes it easier to develop custom recipes and improves type checking and IDE support. The update also includes several front-end fixes, like the restoration of the timeline for the audio interface and enabling wrapping of versions in review. On the backend, we’ve refactored drop to mitigate issues related to SQLITE_MAX_VARIABLE_NUMBER.


fyi	Prodigy is now distributed as pure Python wheels without compiled Cython.
fix	Restore timeline functionality for the `audio` interface.
fix	Enable wrapping of versions in the `review` interface.
fix	Improve `llm.fetch` family of recipes by adding the default `accept` answer.
fix	Improve the `drop` logic to mitigate potential issues with `SQLITE_MAX_VARIABLE_NUMBER`.

This patch release restores the possibility to return an instance of the Controller object from the recipe.


fix	Re-enable returning `Controller` object from the recipe.

This patch release fixes a bug in training config generation for the default textcat and textcat-multilabel spaCy components.


fix	Training config generation for `textcat` and `textcat-multilabel`.

This patch release pins the version of numpy to <2.0.0 to avoid installation
issues due to backwards-incompatible change introduced in numpy 2.0.0.

This patch release updates the version of wavesurfer.js dependency, fixing a
regression in audio.manual that prevented marking regions to the left of
the cursor.


fix	Restore marking regions to the left of the cursos in audio.manual.
fyi	Updated `wavesurfer.js` to ^7.7.15

This patch release fixes a few bugs in review, rel.manual and

metrics.iaa.doc recipes.


fix	Support non string labels in `metric.iaa.doc`.
fix	Fix token disabling via `_` property patterns in `relations`.
fix	Fix the stream generation in `review` so that the relations annotations that differ in order only are not shown as different.

This release adds support for fastapi up to 0.111.0.


new	Allow for `fastapi`<0.111.0.
fix	Fix the issue with `accept_single` flag not filtering the tasks annotated by a single annotator correctly in `review`.
fyi	Make the `No tasks available` screen more informative.

This patch release updates ab.llm.tournament recipe to spacy-llm >=
0.7.0


fix	Update the processing of the model’s response in `ab.llm.tournament` to `spacy-llm` >= 0.7.0 .
fyi	Internal refactoring of dataset processing to support `prodigy-evaluate` plugin .

Plugin updates

Prodigy 1.15.2 release also comes with a new open-source plugin that adds to
your QA toolset. prodigy-evaluate provides commands
for evaluating spaCy pipeline overall or on a per-example basis using a variety
of metrics.

We’ve updated Prodigy company SSO plugin to offer more flexibility
in scopes defnition making the profile scope optional. This can be helpful if
profile results in excessive network payloads.


new	Make `profile` scope optional for `prodigy-sso` plugin.

This release improves the training config generation used by train,

train-curve and data-to-spacy by fixing a bug that previously
prevented the use of transformer spaCy pipelines as base models. Additionally,
the sourcing of the tokenizer from the base model is now automated. We have also
bumped uvicorn dependency to allow <0.27.


new	Automate the sourcing of the tokenizer from the base model.
new	Allow for `uvicorn` <0.27.
fix	Allow `transformer` as the embedding layer for spaCy base model.

This release adds support for the new Prodigy Company Plugins
package that can be downloaded with a Company License. This first
premium plugin for SSO (Single Sign-On) features OIDC authentication across a
variety of providers, including Auth0, Okta, Google, Microsoft Entra, and more.
For more details on the new company features see the OIDC docs.


new	Add support for `prodigy-company-license` 0.1.0

This update allows Prodigy to use spacy-llm <1.0.0 with its
recent addition of
new tasks including
entity linking and
translation, as well as
support for arbitrarily long docs.


new	Allow `spacy-llm` <1.0.0.

This update fixes the recipe config override from the task level so that the
view_id attribute can be updated. It is now possible to design data streams
with different view_ids without requiring a server restart.


fix	It is now possible to overwrite `view_id` (and other config atrributes) from the task level.
new	Added a “Support” column to `metric.iaa.span` result table to represent that number of examples on which the metric was calculated.

This patch release restores the functionality of the audio_rate setting. We
have also upgraded our wavesurfer.js dependency to 7.4.4 which is related to
the deprecation of the show_audio_timeline setting and a slight change in how
the sound waves are rendered in the UI. We’re also excited to announce a new
plugin for image segmentation that leverages Meta’s
Segment Anything model.


new	Updated `wavesurfer.js` dependency to 7.4.4.
new	Added a new `window.prodigy.resetQueue` method available in the frontend, meant to be used together with custom events.
fix	Fixed `audio_rate` setting for audio recipes.
fyi	Deprecated the `show_audio_timeline` setting due to `wavesurfer.js` upgrade.
doc	Updated the custom recipes section to contain examples that leverage `radicli` instead of `plac`.

Plugin Updates

During this release we also released a new plugin:
Prodigy-Segment! This allows you leverage Meta’s
segment-anything model to select pixels from images.

We’ve also introduces new features for Prodigy-ANN and
Prodigy-LUNR, which allows you to reset the stream from
the UI.


new	`Prodigy-Segment` plugin is added for image segmention with Meta’s Segment Anything model.
new	Many `Prodigy-ANN` and `Prodigy-LUNR` recipes now carry a `--allow-reset` flag that allows users to reset the stream from the UI.
doc	Updated the plugins section to reflect the new additions.

This patch release adds support for Python 3.12 and fixes a regression related
to the prodigy.serve function introduced during the transition to radicli.
Additionally, it restores the functionality of using stdin as a source.


new	Added support for Python 3.12.
new	Added the Controller.reset_stream method to allow custom recipes to reset the stream
fix	Fixed CLI argument proccessing in `prodigy.serve` to make it compatible with `radicli`.
fix	Restored the use of stdin as source.

This patch release fixes inferring exclusive labels from spaCy textcat models.


fix	Fixed checking for exclusive labels in spaCy textcat models.

This patch release updates Prodigy to use spacy-llm <0.7.0.


new	Alow `spacy-llm` <0.7.0.

This patch release fixes a bug when sourcing custom JavaScript from a string.


fix	Fixed a bug related to injecting inline JavaScript.

Plugin Updates

This release is paired with a new
prodigy-whisper plugin which
helps with audio transcription by using
OpenAI’s Whisper model in the loop. The
docs has all the details on how to use
this new feature.


new	Added `whisper.audio.transcribe` recipe for audio transcription with a model in the loop.
new	Added `--segment` feature to `whisper.audio.transcribe` to automatically split audio into segments

This release adds an extra validation step to some textcat recipes to make sure
no empty annotations are written into the database. This behavior can be turned
off with a flag if users prefer the original behavior.


new	Added validation for empty annotations for exclusive textcat models in `textcat.manual` and `textcat.correct`.
new	Added `--accept_empty` flag to `textcat.manual` and `textcat.correct` to turn off the new validation.
fix	Fixed a bug in `texcat.correct` that mistook exclusive textcat models for non-exclusive ones.
fix	Fixed a runtime error when combining the `PatternMatcher` with an nlp model in `textcat` recipes.
fix	Corrected the spelling of spaCy in the output of the `stats` command.

Plugin Updates

This release is paired with new features via
the Prodigy Hugging Face plugin. As
of v0.2.0 new
recipes have been added for text classification as well as model-in-the-loop
recipes. The docs for this plugin have been
updated too.


new	Added `hf.train.ner` and `hf.train.textcat` to train text classification transformer models recipe.
new	Added `hf.correct.ner` and `hf.correct.textcat` to use transformer models while annotating new data
new	Added `hf.upload` to upload your annotations to the Hugging Face Hub
doc	Added a new section for the plugin on the docs

This release updates Prodigy to be compatible with spaCy >=3.1.1,<3.8.0 and
Pydantic >=1.10.8,<3.0.


new	Updated `spacy` and `pydantic` dependencies.

This release adds an improved character highlighting feature for

ner_manual and spans_manual that allows for switching between character and token
highlighting from the UI, while annotating.

We have also facilitated developing with custom css and javascript by adding
support for mounting css and javascript files from local directories and remote
URLs.


new	Added a toggle for switching between character and token highlighting in `ner_manual` and `spans_manual` UIs.
new	Support mounting CSS and JS code from local directories and remote URLs.
fix	Fixed annotator filtering in IAA recipes.

This patch release improves error messages and fixes a bug in spacy-config
that prevented to config to be saved on disk properly.


fix	Improve error messages for `iaa` and `stream` modules.
fix	Fixed an issue with saving the config file to disc in `spacy-config`

Plugin Updates

This release is paired with new features for some Prodigy plugins.

Prodigy-PDF, as of
v0.2.0, now
support the pdf.image.ocr recipe. This recipe adds OCR to annotated segments
from the pdf.image.manual recipe and gives a textbox for corrections. This
recipe uses pytessaract under the hood.

Prodigy-ANN, as of
v0.2.0, now
supports recipes that handle retreival for images, besides text via
ann.image.index, ann.image.fetch and image.ann.manual.


new	Add `pdf.image.ocr` recipe.
new	Add `ann.image.index`, `ann.image.fetch` and `image.ann.manual` recipes.
fix	Ensure that `pdf.image.manual` stores all required information for OCR.
fix	Fixed a bug in the `ner.ann.manual`, `spans.ann.manual`, `ner.lunr.manual` and `spans.lunr.manual` recipes that prevented the server from starting.
fyi	Changed the first default color in `pdf.image.manual` to a darker color to ensure high constrast.

This release adds two new commands for computing inter-annotator agreement for
document level and token level annotations. We also introduce Prodigy Plugins:
Prodigy-PDF,
Prodigy-ANN and
Prodigy-LUNR. Prodigy Plugins are
add-ons that extend Prodigy’s functionality with third party libraries. They are
open source and can be installed separately to be used with 1.14.3 and above.


new	Inter annotator agreement for document level and token level annotations.
fyi	Prodigy Plugins for PDF processing and selecting relevant subsets of data.
fix	Fixed the display of the history in the sidebar.
fix	Fixed the truncated display of the available recipes outputted by the `prodigy` command.
doc	Added a new section on inter annotator agreement metrics.
doc	Added a new section on Prodigy Plugins.

This patch update addresses a backward compatibility issue that was introduced
in version 1.14.0, where the get_labels helper function was removed
potentially affecting custom recipes.


fix	Restored `get_label` function for backward compatibility.

This release adds support for custom
Recipe Event Hooks to allow for basic
interactivity in custom Prodigy Recipes. It adds a new window.prodigy.event
function to the window.prodigy object available for use in Custom Recipe
JavaScript. This completes the initial work on an ongoing, undocumented feature
we’ve been using for a while.


new	Added support for basic interface interactivity via custom event hooks.

This release focuses on improving Prodigy internals. We have substituted plac
with radicli for CLI development which
brings DX improvements such as using type hints for argument parsing including
support for custom types as well as custom CLI errors. Please check radicli
documentation for a complete overview of
benefits.

Higher versions of pydantic (<3.0), fastapi (<0.103.0) and spacy-llm
(<0.6.0) dependencies are now supported. Since spacy-llm 0.5.0 adds support
for chain-of-thought prompting, there’s
now a corresponding section in the
docs with examples.

We have also improved typing and error handling across Prodigy.

Finally, some of the older, deprecated helper functions are no longer available:

Reddit dataset loader
read_jsonl, write_jsonl, read_json, b64_uri_to_bytes,
pretty_print_ner, pretty_print_tc


new	Improved CLI by substituting `plac` with `radicli`.
new	Allowed latest versions of `pydantic`, `fastapi` and `spacy-llm`.
doc	Added LLM section with explainers for chain of thought prompts for NER and spancat.
fyi	Deprecated `Reddit` loader and older helpers: `read_jsonl`, `write_jsonl`, `read_json`, `b64_uri_to_bytes`, `pretty_print_ner`, `pretty_print_tc`.

This patch release fixes a bug in the review recipe which prevented
overwriting view-id attribute on CLI. This is particularly relevant when using
datasets with block view-id as input to review, including the
output of *.llm.correct recipes.


fix	Fixed a bug in `review` that didn’t permit overwriting `view-id` attribute for `blocks` interface.

This release introduces spacy-llm variants of terms.openai.fetch and

ab.openai.tournament recipes. The terms.llm.fetch recipe can generate terms and
phrases using an LLM. And the ab.llm.tournament recipe can be used for prompt engineering
and/or comparing different LLM backends. This means that we now have replacements for all the
*.openai.* recipes, which is why they now all carry deprecation notices.

We have also added a new annotation interface llm-io to facilitate
writing custom LLM recipes and fixed a task router bug related to server
restarts.


new	Added `terms.llm.fetch` which can use `spacy-llm` to fetch relevant phrases and terms.
new	Added `ab.llm.tournament` which can be used for prompt engineering and comparing LLM backends.
new	Added `llm-io` interface to show the prompt/response from an LLM.
fix	Fixed a bug that caused inconsistencies in task routers when dealing with many server restarts.
fyi	All the `.openai.` recipes now carry deprecation warnings, because there are `spacy-llm` variants to replace them.

This release introduces recipes that allow spaCy pipelines to annotate examples.
When you combine these recipes with the review recipe, you’re able to
focus on examples where models disagree.

This pattern is powerful because these examples typically carry a lot of
information for your model. But it is also very useful given the spaCy-LLM
integration introduced in v1.13.0, which makes it relatively easy to compare
your own model against an LLM pipeline.

This release introduces support for
spacy-llm, which gives an even
wider support for large language models for NER, textcat and spancat
annotations. Future recipes that leverage large language models will also use
the spaCy-LLM backend and the OpenAI recipes will be deprecated.

This release fixes the issue where DatasetSource, GeneratorSource and
ListSource might end up in erroneous state at the end of the iteration due to
incorrect position resetting. This would also lead to unexpected progress bar
updates.


fix	Remove position reset when closing `DatasetSource`, `GeneratorSource` and `ListSource`.

This release fixes intermittent MySQL integrity errors during bulk DB
operations.


fix	Remove bulk inserts to DB to make the operations more stable.

This release adds temporary support to legacy (pre 1.12.X) loaders in the new
get_stream utility. It also fixes a
few minor CLI and config processing bugs. We have also improved the error
message for missing DB drivers.


fix	Add support to legacy (pre 1.12.X) loaders in `get_stream` utility.
fix	Fix the processing of the `:ignore`, `:accept`, `:reject` suffixes in dataset source CLI.
fix	Improve error message in the event of missing DB drivers.
fix	Fix the support for the `delimiter` argument of the legacy `CSV` loader.
fix	Fix the processing of `hide_arrow_heads` and `hide_true_newline_token` config settings in the `rel.manual` recipe.

This release includes an additional bug fix for the frontend.


fix	Fix issue where individual image spans could not be selected in the `image_manual` view for the Polygon and Freehand tools. This builds on the fix in `v1.12.3` which only fixed selection of image spans annotated with the rectangle tool.

This release includes major bug fixes for the frontend and an extra video docs
for task routing:


fix	Fix issue where individual image spans could not be selected in the `image_manual` view
fix	Fix an issue where the “Save” button could be clicked twice and save a duplicate answer to the database.
fix	Fix an issue that would render `br` elements in the frontend
doc	Added docs on Database.get_hashes and Database.count_dataset `Database` methods
doc	Added a new video on task routers that does a deep dive on how to construct your own

Fixes a bug when using a Prodigy Dataset as a source for an Audio or Image
recipe using the dataset:my_dataset_name syntax.


fix	Fix `FileNotFoundError` when using a Dataset as a source for an Audio or Image recipe
doc	Fixed inconsistencies related to session ids.

This update adds support for the latest spaCy version.


new	Extend spaCy support to the latest v3.6.

For this version we have completely refactored Prodigy internals to make the
annotation flow more tractable and more customizable. We have reimplemented the
Controller and added new abstractions to better represent the stream of tasks
and the input source. This let us deliver a number of new, exciting features
such as partial, configurable feed overlap, custom task routers, custom session
factories, source-based progress estimation, support for Parquet input files,
experimental support for training coref component in train, new

filter-by-patterns recipe and DX improvements.

v1.12 also provides support for LLM-assisted workflows for data annotation
and prompt engineering. We have provided 4 new recipes for bootstrapping NER and
Textcat annotation, 1 for terminology generation and 2 for prompt engineering
including a really creative ab.openai.tournament recipe. As of this
version we support python 3.11 and we drop support for python 3.7.

Thanks to everyone who’s helped us by testing the alpha versions. See the
changelog below for a full list of new features.


new	Added a new Controller to facilitate annotation workflow customization.
new	Added support for task routing, allowing you to customise who annotates each example.
new	Added `annotations_per_task` setting to easily configure a task router for partial annotator overlap.
new	Added a selection of task routers to the public API that can be used in custom recipes.
new	Added a `session_factory` callback for custom recipes, giving you control on how sessions are created.
new	Added support for spacy-experimental `coref` component in the `train` and `train-curve` recipes.
new	All of Prodigy’s internal recipes now support the `.parquet` file format as a data source.
new	Added `allow_work_stealing` setting in `prodigy.json` that allows you to turn off work stealing.
new	Added `PRODIGY_LOG_LOCALS` environment variable to supply local variables for when debugging Prodigy error messages
new	Added get_hash_count and get_hashes_min_cardinality methods to database class, which are useful in custom task routers.
new	The `review` recipe now provides a `--accept-single` flag to also automatically accept annotations from a single annotator when `--auto-accept` is also turned on.
new	Added a new `filter-by-patterns` recipe that can use match patterns to produce a relevant subset for a downstream task.
new	Added support for annotation workflows that use large language models from OpenAI as a model-in-the-loop via the `ner.openai.correct`, `ner.openai.fetch`, `textcat.openai.correct` and `textcat.openai.fetch` recipes.
new	Added support for pattern file generation via OpenAI’s large language model with `terms.openai.fetch` recipe.
new	Added support for prompt engineering recipes via the `ab.openai.prompts` and `ab.openai.tournament` recipes.
new	Added new progress calculation based on relative position in a source object.
new	Distinguish between `target progress` and `source progess`in the UI.
fix	Fixed a bug related to `allow_newline_highlight` setting in NER recipes.
fix	Fixed a bug multiple labels in the `mark` recipe.
fix	Fixed a bug related to multiple labels in the `choice` interface
fix	Fixed a bug related to trailing slashes in session names. Prodigy will now ignore trailing slashes.
fix	Added a more helpful error message when a user needs to provide a `/?session=` via the URL
fyi	Removed `auto_count_stream` setting and automated counting of generators passed to `stream` component. See Controller Progress docs for new information on progress calculation.
fyi	Added a warning when custom recipes don’t ensure the hashes are set appropriately on their examples.
fyi	Tracebacks that Prodigy error messages try to hide will now appear in the logs when `PRODIGY_LOGGING` is configured. Details are explained here.
fyi	Dropped support for python 3.7 due to end of life on 2023-06-27.
doc	Added a new usage guides on task routing, large language models and cloud deployments
doc	Fixed typos and inconsistencies

This patch release adds a constraint for
typing_extensions (<4.6.0) to
avoid a version conflict in Pydantic.


fix	Add dependency for `typing_extensions`.

This patch release relaxes the following requirements:
tqdm (>=4.38.0,<5.0.0),
Jinja2 (no pin),
python-dotenv (>=0.21.1,<2.0.0).


fix	Update dependency versions for tqdm, Jinja2, and python-dotenv.

This patch release fixes an issue with the --base-model argument in

train which resulted in erroneous sourcing of the tok2vec component
from the spaCy base model. It also updates the
peewee (<3.17),
pydantic (<2.0),
FastAPI (<0.95.1) and
typeguard (<4.0) dependencies.


fix	Fix `--base-model` parsing in `train`.
fix	Update dependency versions for peewee, pydantic, typeguard and FastAPI.

This release reverts a backward incompatible change from v1.11.10 that removed
database methods and helpers get_db, set_db and disconnect. These methods
were not part of the documented API, but could be accessed via the Prodigy
source available to users, which might have potentially caused problems.


fix	Restore database methods `get_db`, `set_db` and `disconnect`.

This release updates Prodigy to use superior versions of
spaCy (up to 3.5),
pydantic (up to 1.10.4) and
FastAPI (up to 0.89.1) dependencies.


fix	Update dependency versions for spaCy, pydantic and FastAPI.

This update fixes an issue with duplicate examples in multi-user annotation
flows and includes several smaller bug fixes in recipes.


new	Add logging from the frontend to the backend if the frontend ever receives a batch with duplicate tasks.
fix	Prevent duplicate examples from being shown to annotators, specifically in higher latency, production scenarios.
fix	Fix the `audio.transcribe` recipe by putting the custom UI attributes on each task, instead of in the config.
fix	Fix an issue where some unsaved examples could be lost during a browser refresh.
fix	Fix the `--wrap` functionality in `relations` interface.
fix	Fix multiple copies of session IDs being rendered in `review` recipe.

This update includes various bug fixes and usability improvements and extends
support to the latest spaCy versions.


new	Extend spaCy support to the latest v3.4.
new	Add `--use_annotations` argument to `Span Categorization` recipes for `suggesters` that require annotations from other components.
new	`/health` and `/healthz` API endpoints for health checks.
new	Support tokens with `"disabled": true` in input data for `rel.manual`.
new	`Controller.from_components` classmethod.
new	Add validation to ensure `choice` options are unique.
fix	Allow highlighting newlines in `sent.correct`.
fix	Prevent binary conflicting spans from being added to both positive and negative examples in `train`.
fix	Fix `--auto-accept` for binary rejection agreements in `review`.
fix	Automatically prevent duplicates from appearing in training and evaluation set in `train` and `data-to-spacy`.
fix	Improve default config generation for base models with vectors and ensure vectors are used.
fix	Improve warnings for invalid setting combinations in `PatternMatcher`.

This update includes fixes to better support
spaCy v3.2 and improves sentence segmentation
correction, data export for binary NER annotations, and stream handling for
multi-user sessions.


new	Update spaCy version range to allow installing spaCy v3.2 with Prodigy by default.
fix	Fix vector handling for spaCy v3.2 compatibility.
fix	Improve caching to prevent duplication in named multi-user sessions.
fix	Enable newline and whitespace highlighting by default in `sent.correct`.
fix	Ensure binary rejected entity spans are ignored in `train` and `data-to-spacy` if they’re also in the accepted annotations.
doc	Fix typos and inconsistencies.

This update includes wheels for Python 3.10, as well as small fixes to recipes
and config generation.


new	Pre-built wheels for Python 3.10.
fix	Always use best score in `train-curve` and `train` with `--label-stats`, not last.
fix	Ensure hashes are set correctly in `audio.transcribe`.
fix	Fix issue that could cause progress `0` to not be reported correctly in custom functions.
fix	Correctly preserve `tok2vec` and `transformer` in `train` with base config.
fix	Don’t auto-add examples already in dataset in `review` with `--auto-accept`.
fix	Use relative paths for fonts to support custom path prefixes.

This update includes small fixes to the feed logic for multi-user sessions and
custom progress handling.


fix	Simplify feed history tracking for multi-user sessions to prevent duplication.
fix	Correctly handle progress `0.0` returned by custom progress functions.
fix	Fix display of input content in `compare` recipe.
fix	Include `.prodigy-spans` CSS class in `ner_manual` UI.


fix	Fix issue that could cause stream to repeat batches of questions in some scenarios.

This release includes various small fixes to stream handling, data loading and
config generation.


new	Exclude previously accepted labels during binary annotation with `--exclusive` in `textcat.manual`.
fix	Fixing feed overlap bug that could cause session history to not be tracked correctly after reset.
fix	Ensure recipe’s `update` callback is executed when annotating without a dataset.
fix	Fix handling of `meta` column in CSV loader and add its value to a dict.
fix	Handle non-dictionary `"meta"` values correctly in `PatternMatcher`.
fix	Correctly set both `_session_id` and `_annotator_id` in controller.
fix	Don’t remove frozen scores used by non-frozen components during config generation in `train`.
fix	Ensure that pre-defined spans receive a `"score"` in `ner.teach`.

This update includes small fixes to the train curve plots and recipes that don’t
save data.


fyi	Add warning when using deprecated `--ner-missing` in `train` and `data-to-spacy`.
fix	Add workaround for using v2.x and v3.x of `plotext` in `train-curve` with `--show-plot`.
fix	Fix handling of `False` as dataset value returned by recipe.
fix	Fix handling of `"meta"` dict in `choice` options.
fix	Fix links in PyPi server `/index` endpoint.

This update includes a new workflow for correcting a trained span categorizer,
as well as various small fixes to the config generation, stream setup and UI.


new	`spans.correct` workflow for correcting a trained span categorizer.
new	Support `/index` endpoint in PyPi download server for package index to use in `requirements.txt`.
fix	Fix config generation in `train` if no logger is present in the config.
fix	Prevent error in stream counting for slower streams and don’t pre-count if `--update` is set.
fix	Fix issue that’d cause CSV reader to fail for `None` column headers.
fix	Ensure hashes are always added before task validation.
fix	Prevent labels from wrapping in `spans_manual`.

This release updates Prodigy to use the new
spaCy v3, which brings you lots of new exciting
features like end-to-end support for transformer-based pipelines, a training
config system for reproducible results, as well as new trainable components for
sentence segmentation and span categorization that you can create annotations
for with Prodigy. Thanks to the 300+ (!) nightly users who helped us test this
new release!

Prodigy v1.11 includes a bunch of new features, including a new installation
process via pip and new wheels for Python 3.9 and ARM architectures, a new
recipe and UI for annotating overlapping and nested spans, new recipes for
improving a sentence recognizer model, new training and data export recipes
that seamlessly integrate with spaCy’s config system and let you train multiple
components with different evaluation sets, support for updating the model in the
loop in ner.correct and a new textcat.correct to go along with it, improved
handling of binary annotations in ner.teach for better results, as well as new
customization options and settings.


new	Support for spaCy v3, including transformer-based pipelines, training configs, new trainable components and more.
new	Improved wheel installation: download the best-matching wheel via `pip` using your license key!
new	Pre-built wheels for Python 3.9 and ARM architectures.
new	New `train` command that supports training multiple components from different datasets and evaluation datasets (using the `eval:` prefix) and mixing manual and binary datasets.
new	New `data-to-spacy` command that generates all data you need: training and evaluation corpora in spaCy’s binary format, initialized label sets for faster training and optional config file.
new	New `train-curve` command with support for multiple components and visual plots.
new	`spans.manual` workflow and `spans_manual` UI for annotating any number of potentially overlapping and nested spans. Also see the span categorization docs for details.
new	Support for training spaCy’s new `SpanCategorizer` in `train` and `data-to-spacy`, based on annotations collected with `spans.manual`.
new	Update the model in the loop in `ner.correct` using the `--update` flag.
new	`textcat.correct` recipe for correcting and updating an existing text classifier.
new	`sent.teach` and `sent.correct` recipes for improving a sentence recognizer model.
new	Improved `ner.teach` workflow for more accurate results with spaCy v3. In addition to binary questions about entity suggestions, you’re now also asked questions about texts with no entities at all. If an example includes no highlighted suggestions, you can hit `accept` to confirm that it contains no entities, or `reject` if it contains entities.
new	`progress` command to calculate annotation progress over time.
new	The `-F` argument to provide a file path for custom recipe scripts now supports multiple, comma-separated paths and can also be used to load custom registered functions for spaCy configs. It works across all recipes, including the built-in workflows.
new	Add `--auto-accept` flag to `review` recipe to automatically accept annotations with no conflicts and add them to the database.
new	Support `"history_text"` property in task for customizing preview shown in sidebar history.
new	Allow optional `"meta"` and `"score"` in `choice` options, which will be displayed with the option.
new	Include the UNIX timestamp of when an annotation was answered in the UI as `"_timestamp"`.
new	Support counting finite and potentially filtered generator streams for better progress estimation via `"auto_count_stream": true`. Note that this setting should only be used for streams that are not dynamic and depend on outside state (e.g. an updated model in the loop).
new	Add `total_examples_target` for the total number of examples that should be annotated to reach a progress of 100%. Useful for infinite streams or if completion doesn’t map to stream size.
new	`PRODIGY_CONFIG` and `PRODIGY_CONFIG_OVERRIDES` environment variables to provide custom path to global config JSON and override individual config settings on the CLI.
new	Dark mode theme! Enable it by setting `"theme": "dark"` in your `prodigy.json`.
new	Support overriding color palettes used for labels via `"palettes"` in `"custom_theme"`.
new	Support `"style"` property on individual `"tokens"` in `ner_manual` and `spans_manual`.
new	Add more human-readable CSS class names and data attributes.
fyi	The `train` command now handles binary annotations out-of-the-box, so you won’t have to explicitly set `--binary` or `--ner-missing` anymore. Future annotations created with binary workflows like `ner.teach` will now also set `"_is_binary": true` explicitly in the data.
fyi	Per-label stats in the output of `train` can now be toggled via the `--label-stats` flag.
fyi	The `--textcat-exclusive` argument is not needed anymore in `train` and related workflows and has been removed. Instead, you can explicitly provide datasets via `--textcat` (exclusive categories) and `--textcat-multilabel` (non-exclusive categories).
fyi	The `--init-tok2vec` argument has been removed from `textcat.teach`. You can now pretrained embeddings directly via the spaCy pipeline you load in.
fyi	Examples are now accepted automatically in `textcat.manual` if when you select an option with mutually exclusive categories. You can override this by setting `"choice_auto_accept": false`..
fyi	The `force_stream_order` config setting is now deprecated and the default behavior of the feeds. Batches are now always sent and re-sent in the same order wherever possible.
fyi	Long deprecated old recipes and functions have been removed.
fix	Fix various issues and inconsistencies around stream handling and feed overlap when using named multi-user sessions with a single instance of Prodigy.
fix	Fix span selection in `relations` UI via tap on mobile devices.
fix	Correctly handle vectors for languages without uppercase/lowercase distinction in `terms.teach`.
fix	Fix issue that could cause next batch to be blocked when using `"instant_submit": true`.
fix	Fix issue in CSV loader that would handle title-cased `Label` columns incorrectly.
fix	Fix base64 conversion to be forwards-compatible.
fix	Ensure that `html_template` overrides are correctly interpreted in `blocks` UI.
fix	Deep-merge all config settings provided via global and local `prodigy.json`, recipe config and overrides to support changing only individual nested properties.
fix	Fix issue that could cause config keyword arguments to not be set correctly in `prodigy.serve`.
doc	Add span categorization usage docs and feature page.
doc	Update installation docs and add instructions for PyPi.
doc	Update various documentation references for spaCy v3 and recipe API changes.

This release includes various small fixes to the annotation interfaces, UI
customizability and progress reporting.


new	Fire `prodigyend` event if no more tasks are available.
new	Add `batch_size` argument to `PatternMatcher.__call__`.
new	Add human-readable `.prodigy-spans` class for containers in `ner`, `ner_manual` etc.
fix	Correctly recognize `relations` as manual UI in `review`.
fix	Fix issue with disabled pipes during `--binary` training.
fix	Handle trailing newlines and wrapping correctly in `relations` UI.
fix	Fix handling of label position for long labels in `relations` UI.
fix	Ensure total and session annotation counts are reflected correctly in controller passed to custom `progress` callback with `update` callback.
fix	Improve error handling when using `prodigy.serve` with non-servable functions.


fix	Ensure Prodigy works with the latest `pydantic`.

This release includes a few small fixes, including an update to prevent
Prodigy’s dependencies from pulling in a package incompatible with Python 3.6.


new	Add newlines for newline tokens in `relations` UI (if wrapping is enabled).
new	Support path-based routing and path-strip proxies by using relative paths in the web app.
fix	Prevent dependencies from pulling in newer `uvloop` version that’s incompatible with Python 3.6.
fix	Fix issue that could cause config overrides to not be reflected correctly when calling `prodigy.serve`.
fix	Fix bug in `dep.correct` that could cause the heads to not be sent to spaCy correctly.
fix	Prevent closed socket from killing the server by removing custom signal handlers.
fix	Use better pattern ID delimiter in pattern matcher to prevent conflicts with user-defined labels.

This release includes updates to the relations and review workflows, fixes for
feed overlap and multi-user session handling, new UI customization options and a
Portuguese UI translation, as well as various other small fixes and
improvements.


new	Support custom span label colors in `relations` UI.
new	Support `label_style` setting config instead of just `ner_manual_label_style` to indicate that it applies to `ner_manual`, `image_manual` and `relations` UI.
new	Add `--show-skipped` option to `review` interface to include answers that would otherwise be skipped, like ignored answers or rejected examples in manual interfaces.
new	Allow clicking on a version in `review` interface to update final annotation.
new	Add `swipe_gestures` config setting to customize left/right mapping.
new	Fire `prodigyspanselected` event in `image_manual` and `relations`.
new	Add UI translation for Portuguese. Thanks to Cristiana S Parada for the contribution!
fyi	The `feed_overlap` config setting now defaults to `false` and Prodigy will show a warning if an overlapping feed is used without named sessions.
fix	Use symbols for whitespace characters in `relations` UI.
fix	Fix issue that could cause head or child span of first token to not be represented correctly in `relations` interface.
fix	Fix issue that could cause span label changes to not be reflected correctly in `"relations"` meta generated by `relations` UI and make sure `"spans"` are always sorted by default.
fix	Set default batch size in calls to `nlp.pipe` in recipes.
fix	Ensure that custom database is correctly passed to `RepeatingFeed`.
fix	Fix issue that could cause excluded datasets to not be represented correctly in named sessions.
fix	Always sort files alphabetically in loaders that read from directories.
fix	Make `train` fail more gracefully if no data is available.
fix	Fix issue that could cause `honor_token_whitespace` to not be reflected correctly.
fix	Don’t explicitly set delimiter in CSV loader and let Python guess.
fix	Correctly interpret image spans without `"points"` in `image` UI.

This update includes small fixes to the stream state management and hashing and
improves handling of pre-defined tokens and spans in recipes and interfaces. It
also introduces new customization options for the manual image and audio UIs.


new	Support customizing the JSON key used to store image spans in `image_manual`. Can be used to combine the interface with other interfaces that use `"spans"`, e.g. `ner_manual`.
new	Expose `WaveSurfer` instance in `audio_manual` to allow implementing custom controls.
fix	Support pre-defined `"tokens"` in `rel.manual`.
fix	Ensure that custom values in pre-defined, non-edited `"spans"` are preserved in `ner_manual`.
fix	Fix issue that could cause `--exclude` datasets to not be considered with non-overlapping feeds.
fix	Add filter to prevent state conflicts of incoming answers in high-latency or low-CPU situations.
fix	Fix incorrect span offset issue in `print-dataset` and `print-stream`.
fix	Make label detection in NER annotation model consistent with spaCy if label contains hyphen.
fix	Fix hasing consistency in `review` and ensure existing binary answer doesn’t impact hash.

This update fixes a problem with the exclude logic by input hashes and adds
support for custom tokenization in the manual relation annotation workflow.


new	Allow pre-defined `"tokens"` in `rel.manual`.
fix	Fix issue that could cause `exclude_by": "input"` to re-send tasks with overlapping feeds.

This update includes small fixes to the exclude logic in repeating streams,
improvements to dependency parsing annotation and training, and updates to the
ASGI server. Check out the release notes for v1.10 for all the new
features introduced in the latest update!


fix	Fix issue that’d cause repeating feed to not honor `exclude_by`.
fix	Fix problem in `dep.correct` sentence segmentation that could lead Prodigy to incorrectly report misaligned tokenization.
fix	Correctly print dependency parser training results in `train`.
fix	Update to support latest version of `uvicorn`.
fix	Allow changing expected max length for MySQL via `PRODIGY_MYSQL_MAX_LEN` environment variable to prevent Prodigy from raising an error if the field type was changed to `mediumblob`.

This update includes small fixes for problems introduced in v1.10.0, as well as
improvements to the spaCy v2.3 integration, more customization for audio and
video transcription, and a new UI translation for French. Check out the
release notes for v1.10 for all the new features introduced in the
latest update!


new	Support customizing field ID used to store transcript in `audio.transcribe`.
new	Add UI translation for French. Thanks to Thierno Ibrahima DIOP for the contribution!
fix	Fix sentence segmentation issue that could cause `ner_manual` UI to crash.
fix	Make `terms.teach` compatible with spaCy v2.3 by pre-populating the lexeme cache.
fix	Only show pattern matches for provided labels in `ner.teach`.
fix	Fix issue in `ner_manual` on touch devices that would prevent selecting first token.
fix	Ensure overriding `"text": None` in `blocks` doesn’t cause errors in certain interfaces.

Our biggest release yet includes a bunch of new features, interfaces and recipes
for dependency and relation annotation,
audio and video annotation, as well as a new and improved
manual image annotation interface with support for
editing shapes and bounding boxes. We’ve also added new
recipe callbacks for modifying examples placed
in the database and validating answers at runtime, added more settings for
whitespace-handling in manual NER annotation, including a mode for
character-based highlighting,
and introduced various new config settings to customize the web app and
annotation interfaces. Thanks to everyone who’s helped us beta test the new
features – your feedback has helped a lot! See the changelog below for a full
list of new features.

Important notes

This release includes a few breaking changes, mostly small improvements to
inconsistent APIs, replacement of already deprecated behavior, or bug fixes that
now lead to different – but more correct – results. It also updates to
spaCy v2.3, which
means you’ll have to download new models and retrain your existing models.

The source argument of built-in recipes does not default to reading from
standard input anymore. If you want to
load from standard input, e.g. to pipe data
forward, set it to - explicitly. The deprecated --api arguments on the
built-in recipes have been removed and merged with --loader. The
progress function returned by custom
recipes now follows a consistent API and will always receive the
controller and the return value of the
update callback (if available) as arguments.


new	Flexible `relations` interface for fully manual dependency and relationship annotation and joint span and dependency relation annotation.
new	New recipes `rel.manual`, `coref.manual` and `dep.correct` for efficient manual and model-assisted dependency annotation.
new	`audio` and `audio_manual` interfaces binary and fully manual audio and video annotation. Add and modify segments for different labels and collect feedback about pre-highlighted regions.
new	`audio.manual` and `audio.transcribe` recipes for audio and video annotation and transcription, as well as community recipes for using Prodigy with pretrained `pyannote.audio` models for speaker diarization in the loop.
new	New and improved `image_manual` interface with support for moving and resizing shapes, adjusting polygons, freehand annotation, more detailed data format and more settings.
new	Support `dataset:{name}` and `dataset:{name}:{answer}` syntax as `source` argument in recipes to allow loading from existing datasets. For example, `dataset:my_set` will use examples dataset `my_set` as the input data and `dataset:my_set:accept` will only load in accepted answers.
new	Add `validate_answer` recipe component to perform custom validation of annotations created in the UI and prevent invalid answers from being submitted.
new	Allow recipes to return a `before_db` callback for modifying examples before they’re placed in the database, e.g. to strip out base64 data.
new	Update Prodigy for the latest spaCy v2.3 and new models.
new	Support multi-arc dependency annotations (e.g. created with `dep.correct`) in `train`.
new	Set information about trailing whitespace in `add_tokens` and reflect whitespace (or lack of whitespace) between tokens in `ner_manual` (can be changed using the `"honor_token_whitespace"` setting).
new	Add `--highlight-chars` flag to `ner.manual` and `use_chars` argument to `add_tokens` to allow highlighting individual characters instead of full tokens.
new	Add `"field_suggestions"` property to `text_input` UI to allow specifying a list of auto-suggestions to show when the user types or presses `↓`.
new	Allow disabling and reordering of the `accept`, `reject`, `ignore` and `undo` buttons at the bottom of the screen via the `"buttons"` config setting.
new	Add options `--width` (card with and maximum image width) and `--remove-base64` (remove base64-encoded image data) to `image.manual`.
new	Add `file_ext` argument to `Images` and `ImageServer` loaders, always preserve original local file path as `"path"` and add `Audio`, `AudioServer`, `Video` and `VideoServer` loaders.
new	Expose generic `Base64` and `Server` helpers to load any data as a base64-string or via a web server and add generic `fetch_media` preprocessor.
new	Add `--rehash` flag to `db-merge` to force-overwrite hashes.
new	Add `--base-model` argument to `data-to-spacy` to customize tokenizer and sentencizer.
new	Allow individual tasks to override global or UI config via a key `"config"`.
new	Add `"ui_lang"` config and translations of descriptions, messages and tooltips in the annotation UI to German, Spanish, Dutch and Chinese.
new	Make sidebar history length default to `batch_size` and allow customizing it via the `history_size` config setting. Note that the history size can’t be larger than the batch size.
new	Show recipe name in project info in sidebar and allow customizing info via `"project_info"` config.
new	Add `Controller` methods and attributes for retrieving total counts and progress by session ID.
new	Warn if global or local `prodigy.json` settings override potentially critical recipe components.
new	Support custom label colors manual interfaces and automatically pick contrasting text color.
new	Show keyboard shortcuts for toolbar buttons on hover.
new	Show friendlier error if `prodigy.json` contains invalid JSON.
fix	Make `progress` function returned by recipes consistent and always pass it the controller and the return value of the `update` callback, if available.
fix	Correctly report per-session progress for streams with a length and multi-user sessions and take `feed_overlap` into account.
fix	Improve support for using `"force_stream_order": True` (repeating feed) with `"feed_overlap": False` (no overlap between sessions).
fix	Make all manual recipes default to `"force_stream_order": True` for more intuitive stream behavior: batches of tasks are now always re-sent until they’re answered and refreshing the page will show the same batch again.
fix	Fix issue that could cause the `review` to not displayed changes when user hits `undo`.
fix	Preserve `"choice_style"` config setting on tasks so it can be re-applied when running `review`.
fix	Support simpler data format in `diff` interface to make it work combined with `choice` in a `blocks` interface and prevent clash of `"accept"` property used by both UIs.
fix	Adjust display of spans with RTL text when `"writing_mode": "rtl"` is enabled.
fyi	The deprecated `--api` recipe argument has been removed and merged with `--loader`.
fyi	`textcat.manual` now doesn’t perform additional checks for the pre-v1.9 syntax with an (unused) spaCy model argument anymore.
fyi	Loading from standard input now requires the `source` argument to be set to `-` explicitly.
fyi	The `"show_stats"` setting to display detailed stats in the sidebar is now set to `true` by default.
fyi	The `"spans"` data created with `image_manual` now also include a `"type"` (either `"rect"`, `"polygon"` or `"freehand"`), as well as `"width"`, `"height"`, `"x"`, `"y"` and `"center"` values for rects.
fyi	Forced stream order and repeating batches by default means that you should use named sessions or set `"force_stream_order": False` if you want multiple users connecting to the same instance. Otherwise, you may get duplicate questions.
doc	Add documentation and feature page for dependency relation annotation.
doc	Add documentation and feature page for audio and speech annotation.
doc	Update feature pages for named entity recognition and computer vision.
doc	Add docs section on efficient NER annotation for fine-tuning transformers like BERT.
doc	Add docs section on recipe callback functions in detail.
doc	Document `b64_to_bytes`, `file_to_b64` and `bytes_to_b64` utilities for converting base64.
doc	Tidy up global config docs and move settings specific to a single interface to interface docs.
doc	Fix various typos and inconsistencies.

This patch release includes small fixes to the force_stream_order setting to
prevent a race condition and duplicate examples. Stay tuned for v1.10, which is
coming soon and will include lots of cool new features!


new	Add `Controller.all_session_ids`, all named sessions that have connected to the current instance.
fix	Fix race condition that could cause `force_stream_order` to produce duplicate tasks.
fix	Correctly exclude currently shown task when requesting new questions with `force_stream_order`.

This release includes an important fix for a training regression introduced in
the previous version, as well as small improvements.


fix	Fix issue that’d cause rejected binary text classification annotations to be filtered out in `train`.
fix	Improve handling of rejected and ignored examples across different annotation types in `train`.
fix	Relax unnecessarily strict validation for `diff` tasks.

This release includes a new built-in recipe match for selecting examples
based on pattern matches, as well as various bug fixes and improvements.


new	General-purpose `match` recipe to only match patterns in text with various configurations.
fix	Use custom `--view-id` set in `review` to determine how to merge examples.
fix	Improve default configuration in `train` for NER models with `--init-tok2vec`.
fix	Fix filtering that could cause incorrect totals to be reported before training.
fix	Fix async handling of built-in and user-provided databases.
fix	Fix hashing of patterns that’d cause incorrect line numbers to be displayed.
fix	Fix compiler setting that’d cause `print-stream` and `print-dataset` to not output colored results.
fix	Check for correct view ID when printing text classification datasets with `print-dataset`.
fix	Correctly pass `--eval-split` to `data-to-spacy`.
fix	Show correct path to Prodigy installation root (not recipe root) in `stats` command.
fix	Fix issue that could cause span rendering problems in `ner` if text contains emoji.
fix	Fix UI issue that’d cause card headings to overlay expanded sidebar on small screens.
doc	Fix various typos and links.

This release includes small fixes and improvements to the built-in recipes and
interfaces.


new	Add `overwrite` flag to `add_tokens` preprocessor to overwrite existing `"tokens"`.
new	Allow `review` recipe to overwrite view ID (e.g. to render `blocks` annotations differently).
fix	Accept pre-set tokens correctly in `add_tokens` to make it easier to provide custom tokenization.
fix	Improve backwards-compatibility checks of arguments in `textcat.manual`.
fix	Correctly report numbers of `textcat` examples in `train` and filter out ignored answers instead of just ignoring the examples during training and evaluation.
fix	Fix handling of integer option `"id"` values in `print-dataset`.
fix	Fix issue that’d cause `text_input` value to not reset and auto-focus correctly between tasks.
fix	Fix incorrect validation errors for `dep` UI and `"card_css"` setting.
fix	Set more explicit MIME types for JS bundle for server configs that prevent MIME type sniffing.
fix	Adjust `eighties` theme to prevent dark text on dark background in `choice` options.

This release includes small fixes related to async database usage in Python 3.7+
and text classification training with the new train recipe.


fix	Fix issue with async database usage in Python 3.7+ that could cause MySQL connection errors.
fix	Ensure `--textcat-exclusive` setting is passed down correctly in `train`.

This release includes small fixes related to multiprocessing and new features
introduced in v1.9.0.


fix	Make `ner.manual` with `--patterns` correctly return all examples instead of only the matches.
fix	Fix error when loading evaluation examples in new `train` recipe with `--binary` enabled.
fix	Fix issue in `train` with `--binary` when restoring pipeline component before saving the model.
fix	Fix Foreign Key constraint error that could occur in `Database.drop_examples`.
fix	Add thread locking to database reconnect methods in controller.
fix	Fix accuracy output for tagger and parser in `train`.

This release includes small fixes, a new option for changing keyboard shortcuts
for labels and multiple choice options, and a new loader for serving images.

This release includes small fixes to the new interfaces introduced in v1.9.0.

This release includes small fixes to bugs introduced in v1.9.0.


fix	Fix error in `PatternMatcher` when assigning combined matches to tasks with no `"meta"`.
fix	Fix too strict validation for `html` tasks with no `"html"` key but `"html_template"`.

This release includes small fixes to bugs introduced in v1.9.0.

This release introduces tons of new features and improvements, including new
recipes, interfaces and workflows. We also redesigned the website, rewrote the
documentation from scratch and added lots of new pages, usage guides, demos and
examples. We hope you like it! Some highlights in Prodigy v1.9 include new
unified training recipes, two new
annotation interfaces for free-form text input and
combining different UIs, config settings for making streams repeat questions
until they’re answered, and changing keyboard shortcuts, official support for
spaCy v2.2 and a
new recipe for converting Prodigy annotations of
different types to a single training corpus in spaCy’s JSON format. See the
changelog below for a full list of new features.


new	Add new general-purpose `train` and `train-curve` recipes to replace the task-specific training recipes and make overall training process more consistent.
new	Show accuracy per entity type, tag or text category in training results.
new	Add `data-to-spacy` recipe that takes Prodigy datasets for NER, text classification, tagging and parsing and outputs a merged corpus (optionally split into training and evaluation data) in spaCy’s JSON format that you can use with `spacy train`.
new	Add `--patterns` argument to `ner.manual` to pre-highlight suggestions from patterns. This workflow is going to replace the binary `ner.match`.
new	Add general-purpose `print-stream` and `print-dataset` recipes that can output different data types. Those recipes are going to replace the more specific print utilities like `ner.print-stream`.
new	Add `blocks` interface to freely combine annotation interfaces.
new	Add `text_input` interface to collect free-form text input from annotators.
new	New `"force_stream_order"` config setting. If `True`, tasks will always be sent out in the same order and re-sent until they’re answered – even if you refresh the app in your browser.
new	Support customizing keyboard shortcuts.
new	Support tokenizing terms in `terms.to-patterns` to create patterns for multi-token terms.
new	Add `"exclude_by"` config setting to allow recipes to specify whether to filter by input hash or task hash so that manual recipes don’t repeat the same content with different suggestions.
new	Support `blank:{lang}`, e.g. `blank:en` as an alternative spaCy model in `ner.manual`, `textcat.teach` and `train` to start off with a blank model.
new	Pass `--label` values added to `mark` to the `"labels"` config so the recipe can be used with manual interfaces like `ner_manual` and `image_manual`.
new	Add `--no-fetch` flag to `image.manual` to disable base64 conversion of images.
new	Add `--fetch-media` flag to `review` recipe to temporarily replace paths with base64 data.
new	Also support `-` as the value of `source` arguments to read from standard input and make this the recommended best practice (instead of omitting the argument).
new	Always auto-create datasets and deprecate `dataset` command.
new	Make `compare` and `ner.eval-ab` recipes use the more flexible `choice` interface and deprecate the `compare` UI.
new	Add more human-readable class names to use in custom CSS and JS.
new	Support new syntax in `prodigy.serve` that lets you pass in the full command-line command to start Prodigy from within Python.
new	Add data validation for `prodigy.json` / recipe config, recipe components and training examples.
new	Make the printed output and messages prettier and more consistent.
new	Make FastAPI the default REST API library and include interactive API docs.
new	Drop support for Python 3.5 and make wheel installers support Python 3.8.
new	Update Prodigy for spaCy v2.2.
fix	Show an example suggested by patterns in `textcat.teach` only once with all matches instead of once per match.
fix	Fix error that’d occur when passing in long label sets on the command line (due to Prodigy checking if it’s a valid file path).
fix	Remove unused `spacy_model` argument from `textcat.manual`.
fix	Exclude by input hash instead of task hash in `ner.manual`, `ner.correct`, `pos.correct`, `textcat.manual` and `image.manual`, using the new `"exclude_by"` setting. Examples will only be shown again if their content is identical, not if they include different highlighted suggestions.
fix	Fix handling of newline tokens in `ner.manual` for multiple newline character and adjust style of `↵` symbols. Newline-only tokens are now unselectable by default to prevent creating newline token entities. You can set `"allow_newline_highlight": true` to change this.
fix	Show error if MySQL database is used and JSON blob saved to the database is longer than 65535 characters, to prevent MySQL DB from truncating example.
fyi	Rename `ner.make-gold` and `pos.make-gold` to `ner.correct` and `pos.correct`. The old names are still supported so your code won’t break.
fyi	Deprecate various outdated recipes, the built-in live APIs and the `recipe_args` dict. You can still use all of these features and your code shouldn’t break but they’ll be removed in v2.
fyi	Refactor the whole code base and module organization and various other internals, and added simple type annotations to recipe functions.
doc	New documentation and website redesigned and rewritten completely from scratch, with tons of new content, demos and usage examples. The new site also replaces the `PRODIGY_README.html` that used to be available for download with Prodigy.
doc	Update `prodigy-recipes` repo.

This update includes a fix for a regression introduced in v1.8.4, as well as
small improvements to the dataset creation and stream handling.


new	Warn after exhausting streams with many duplicates.
fix	Fix issue introduced in v1.8.4 that could cause the client to send back empty answers if users annotated very quickly.
fix	Remove `default` session from client and correctly populate session datasets.

This update includes various small fixes to the interfaces and recipes.


new	Experimental: Allow moving selected bounding boxes in `image_manual` interface via keyboard shortcuts `←` `→` `↑` `↓`.
new	Add `prodigyundo` event for custom JavaScript.
fix	Fix issue that’d cause label change in `image_manual` to not be reflected correctly.
fix	Disable unselecting of radio button if `choice_auto_accept` is enabled.
fix	Always prefer rending `"html"` in `classification` interface, if available.
fix	Improve handling of `choice` tasks in `review` recipe.
fix	Re-add default spacing for most common HTML elements in `html` interface.
fix	Ensure `bin/prodigy` and `bin/pgy` are interpreted as shell scripts.
fix	Make `textcat.manual` correctly support single-label use cases.
fix	Fix handling of pre-defined spans in `EntityRecognizer`.
fix	Fix detection of user databases via entry points.
fix	Fix race condition that’d fire `prodigyanswer` event incorrectly.
fix	Prevent card labels from being displayed on top of modals.
fix	Improve fallback if labels are provided to the app in incorrect format.
fix	Fix handling of related sessions in feeds if `"feed_overlap"` is enabled.

This update includes fixes to textcat.batch-train, the NER preprocessing
logic and Prodigy’s dependencies.


fix	Fix issue in `textcat.batch-train` that wouldn’t pass `exclusive` setting to the model and converter functions correctly.
fix	Fix handling of multiple choice data in `textcat.batch-train`.
fix	Fix segmentation bug that caused spans ending on text boundaries to be dropped.
fix	Make sure span is fully excluded if `skip=True` is set in `add_tokens` preprocessor.
fix	Add `srsly` to direct dependencies and pin to latest version.
doc	Fix typos and inconsistencies.

This update includes small fixes to the terms.teach and review
recipes, as well as improvements to the pretraining support.


fix	Fix issue in `review` recipe that’d raise error if no versions were generated.
fix	Make `terms.teach` skip vocab entries with no vectors to prevent unnecessary warnings.
fix	Fix serialization issue of sentencizer in `textcat.batch-train`.
fix	Ensure hyperparameters from pretraining are passed to `textcat.batch-train`.

This update includes small fixes to the text classification workflows.

This release updates Prodigy for the brand new
spaCy v2.1, which features BERT-style
language model pretraining,
an
extended match pattern API
and faster tokenization. We’ve also added support for basic authentication and
several completely new built-in recipes and workflows for reviewing annotations
from multiple sessions and resolving conflicts, manual multiple-choice text
classification, and merging two or more existing datasets.


new	Update Prodigy for spaCy v2.1.
new	Add language model pretraining support via `--init-tok2vec` in training recipes.
new	New interface and `review` recipe for reviewing and reconciling annotations from multiple sessions on the same data. View conflicting annotations, resolve them in the UI and create a final training set.
new	Add `textcat.manual` recipe to annotate text categories using the `choice` UI.
new	Make `textcat.batch-train` accept annotations in `choice` format.
new	Add `--exclusive` flag to `textcat.batch-train` to train mutually exclusive categories.
new	Add `ner.silver-to-gold` recipe to convert binary accept/reject annotations to gold-standard data with no missing values.
new	Add `db-merge` recipe to merge two or more datasets into a new set.
new	Add basic authentication to the app with `PRODIGY_BASIC_AUTH_USER` and `PRODIGY_BASIC_AUTH_PASS` env variables.
new	Add `PRODIGY_ALLOWED_SESSIONS` env variable to specify allowed named sessions.
new	Store `"_session_id"` and `"_view_id"` with annotations.
new	New REST API powered by FastAPI. Set the `PRODIGY_FASTAPI` environment variable and install `fastapi` (Python 3.6+) to try it out.
fix	Fix issue in `image_manual` UI that’d cause boxes to not be deleted correctly.
fix	Make sure flag button isn’t covered by title in annotation UI.
fix	Use named logger `"prodigy"` to allow customizing logging behavior.
fix	Allow `textcat.eval` recipe to read from `stdin` as expected.
fix	Prevent incorrectly raised `KeyError` in `split_sentences` preprocessor.
fix	Raise error if `Database.add_examples` doesn’t receive list/tuple of dataset names.
fix	Make sure `choice` interface adds `"accept": []` if no selection is made.
fix	If `instant_submit` is enabled, send answer before requesting new questions.
fix	Prevent keyboard in custom `<input>` and `<textarea>` elements.
fix	Preserve docstrings of compiled Cython classes, methods and functions.
doc	Improve various typos and inconsistencies and add new sections for new features.

This update includes a small fix to the "instant_submit" feature introduced in
the previous release.


fix	Fix issue that could cause tasks to not receive an `"answer"` when `"instant_submit"` was enabled.

This update makes it easy to set up named multi-user sessions in a single
instance. It also introduces a new setting for instant submissions and support for custom
CSS and JavaScript across all interfaces. Of course, we also fixed various bugs
and inconsistencies to make sure Prodigy runs as smoothly as possible.

By the way, if you want to add 12 months of updates to your license, you can now
do so via our online shop!


new	Add `"instant_submit"` option to send back a task instantly after it’s answered in the app, skipping the history and immediately triggering the update callback if available.
new	Support custom named sessions via query parameters in the app to enable multi-user workflows in single instances. For example, accessing the app with `/?session=alex` will add all annotations to a session dataset `dataset-alex`. The boolean `"feed_overlap"` setting lets you control whether to have each example sent out once so it’s annotated by someone or whether to allow overlaps and send out each example to everyone (default).
new	Add `"global_css"` option across all interfaces, more human-readable class names and expose `data-prodigy-view-id` and `data-prodigy-recipe` for custom interface or recipe-specific styling.
new	Add `"javascript"` option across all interfaces and fire custom events on mount, update and answer.
new	Add `--batch-size` option to `drop` command to prevent database errors when deleting large datasets.
fix	Make labels in `pos.teach` and `pos.make_gold` correctly default to built-in label scheme and raise error if no fine-grained labels are provided.
fix	Make sure `PatternMatcher` only shows matches for recipe labels.
fix	Fix bug that would cause `add_tokens` preprocessor to raise an error.
fix	Correctly handle `min_length` in `split_sentences` preprocessor.
fix	Fix bug that’d cause text classification tasks to not be deep copied correctly.
fix	Raise error if `terms.to-patterns` is used without label to prevent `null` value.
fix	Fix problem that’d cause dependency arcs to be rendered incorrectly.
fix	Improve relative sizing of bounding boxes and labels for large images.
fix	Ensure task can only be flagged via keyboard shortcut if `"show_flag"` is enabled.
fix	Drop third-party dependency `mmh3` that was causing problems for some users.
fix	Make manual NER interface more touch-friendly.
doc	New video: FAQ #1: Tips & tricks for NLP, annotation and training.
doc	Improve various typos and inconsistencies and add new sections for new features.


fix	Fix `split_sentences` pre-processor for untokenized examples.

This update takes advantage of pre-built binary wheels for our dependencies and
speeds up the installation by up to 10 times! We’ve also added official support
for Python 3.7, made excluding the current dataset the default behavior, fixed
issues related to patterns, text classification and NER training and improved
some internals to get Prodigy ready for multi-user workflows.


new	Add official support and wheels for Python 3.7.
new	Use spaCy v2.0.16 to take advantage of pre-built wheels and allow up to 10 times faster installation.
new	Automatically exclude examples already present in the current dataset (e.g. make `--exclude dataset` the default behavior). To disable this feature, you can set `"auto_exclude_current": false` in your `prodigy.json` or recipe config.
new	Add `--loader` argument to `image.manual`.
new	Make annotation card header sticky for long content.
new	Improve internal handling of sessions and streams to get Prodigy ready for better multi-user workflows.
fix	Fix prior in `PatternMatcher` to prevent matches from being excluded by sorter.
fix	Ensure spans and tokens are correctly updated in `split_sentences` preprocessor.
fix	Improve `textcat.eval` recipe and make sure labels are added automatically.
fix	Fix issue that would require refreshing the app when using the manual interface with a low batch size.
fix	Make sure dataset links are removed when dropping a dataset via `drop`.
fix	Fix memory leak in NER training that could cause segmentation fault for large datasets.
fix	Fix issue in `TextClassifier`, where active learning didn’t resume from weights.
fix	Exclude `"model"` key from hashes so that identical predictions by different models receive the same hash.
fix	Add server middleware to prevent response caching in IE 11.
fix	Improve NER model loading with large vectors.
doc	Fix various typos and inconsistencies.

This update includes several bug fixes and stability improvements related to the
new part-of-speech tagging recipes and the built-in pattern matcher model, as
well as a better identification system for match patterns.


new	Add `--resume` argument to `ner.match` to update matcher from dataset.
new	Use hashes as pattern IDs to allow updating existing matchers even if pattern files change across sessions.
fix	Make `pos.teach` and `pos.batch-train` work as expected with both fine-grained and coarse-grained part-of-speech tags.
fix	Fix bug in `ner.iob-to-gold` that’d cause export to fail.
fix	Small improvements to UI and web app stability.

This update includes new recipes for part-of-speech tagging, an experimental
release of the new manual image labeling interface and a new mechanism for
adding custom loaders, database connectors and recipes via Python entry points.
We’ve also added validation for incoming streams and detailed error messages for
incorrect task formats, enhanced the training options for sparse and
gold-standard named entity data, and improved handling of newlines and
formatting tokens in the manual NER interface.


new	New recipes for part-of-speech tagging: `pos.teach`, `pos.batch-train` and `pos.train-curve`.
new	Experimental: Manual image annotation interface and `image.manual` recipe.
new	Add annotation task validation. Before the Prodigy server starts, your stream is checked against a schema to make sure it has the correct format. If not, Prodigy tells you what the problem is.
new	Allow adding custom recipes, databases and loaders via entry points.
new	Add `--no-missing` flag to `ner.batch-train` to assume all correct spans are in the gold annotation, and any spans not in the gold annotation are incorrect. This is especially useful when training from annotations collected with `ner.manual` or `ner.make-gold`.
new	Add `--resume` argument to `terms.teach` to update target vector from dataset.
new	Add “true” newlines to newline tokens `↵` manual interfaces. The behavior can be turned off by setting `"hide_true_newline_tokens": true`.
new	Allow marking tokens as `"disabled": true` in manual interfaces. Disabled tokens can’t be highlighted and can be used to assist annotators with formatting.
new	Converter recipe `ner.iob-to-gold` to convert IOB tags to Prodigy’s JSONL.
fix	Disable and restore other pipeline components in `batch-train` recipes.
fix	Ensure seed terms are added to the dataset correctly.
fix	Fix bug that would cause web app to fail with annotation instructions.
fix	Make keyboard shortcuts in `choice` interface work as expected again.
fix	Add missing import and make `image.test` work out-of-the-box again.
doc	Add sections on Python entry points and document new recipes and interfaces.
doc	Fix various typos and inconsistencies.

This update includes various bug fixes and efficiency improvements.


new	Allow custom HTML in `classification` interface.
new	Allow pre-defined selections in `choice` interface, e.g. `"accept": [1, 3]`.
fix	Improve memory usage of `terms.teach`.
fix	Fix data integrity error when dropping datasets using MySQL.
fix	Fix bug in error message of custom recipe validation introduced in `v1.4.1`.
fix	Resolve problem with image preloading in `image` interfaces.
fix	Make keyboard shortcuts work as expected in `choice` interface.
doc	Fix various typos and inconsistencies.

This update improves efficiency of the ner.batch-train recipe and fixes
the handling of task and input hashes in the database methods and --exclude
option. It also comes with various improvements to error messages and web app
stability.


new	Improve efficiency of `ner.batch-train` – up to 10× faster for some workloads!
fix	Fix problem that would cause text classification tasks created from pattern matches to not have a label assigned to the task.
fix	Ensure that `--exclude` logic is always applied after the stream is (re)hashed.
fix	Fix bug that would cause hashes to not be returned correctly by the database.
fix	Allow the `"instructions"` setting to be `false` or `null`.
fix	Improve error messages if recipe file is not valid and if dataset doesn’t exist in `terms.to-patterns`.
fix	Various improvements to UI and web app stability.

This update includes a new
annotation interface for relations and
dependencies, as well as an experimental dep.teach recipe.

textcat.teach now takes a file of match patterns instead of seed terms, and manual interfaces
now support lists of up to 30 labels with keyboard shortcuts. We’ve also improved the customization
of various components.


new	Dependency and relation annotation interface and recipes `dep.teach`, `dep.batch-train` and `dep.train-curve` recipes for training a dependency parsing model. Still experimental!
new	Allow using `textcat.teach` with a patterns file instead of seed terms.
new	Support list view and keyboard shortcuts for larger label sets in manual interfaces.
new	Add option to display modal with annotation instructions.
new	Allow skipping examples with mismatched tokenization in `add_tokens`.
new	Make swipe gestures optional via `"swipe": true`.
new	Allow overwriting the host and port via `PRODIGY_HOST` and `PRODIGY_PORT` environment variables.
new	Add `split_sents_threshold` config setting and `--unsegmented` command-line option to disable sentence segmentation.
new	Update NewsAPI loader to use v2.
fix	Prevent MySQL server from timing out between requests.
fix	Correctly port over spans in `split_sentences` preprocessor.
fix	Always add labels from examples and `--labels` in `ner.batch-train` and consistently allow loading label sets from a string or a text file.
fix	Fix issue that caused print recipes to not display colors when piped to `less`.
fix	Ensure that pre-set task meta isn’t overwritten in the `PatternMatcher`.
fix	Show error message in the web app if `view_id` is invalid.
doc	Add live demo for new `dep` interface.
doc	Add Prodigy Cookbook with quick solutions to various tasks.
doc	Add glossary to “First Steps” workflow.
doc	Order recipes in `PRODIGY_README.html` table of contents by type.

v1.3.0 2018-02-01

This update introduces a new ner.make-gold recipe that lets you create
gold-standard data faster by manually correcting a model’s predictions. We’ve
also added a new pos.make-gold recipe for
annotating part-of-speech tags, as
well as converters to create spaCy training data from Prodigy datasets.


new	Improved `ner.make-gold` workflow: run a model over your text and manually correct the entities to create gold-standard data.
new	Add `"ner_manual_label_style"` option to display label set as list of dropdown (always uses dropdown for more than 10 labels) and add number keyboard shortcuts to list of labels.
new	Experimental `pos.make-gold` recipe for manual POS annotation.
new	Experimental `ner.gold-to-spacy` and `pos.gold-to-spacy` converters.
new	Add option for custom label color schemes for NER and POS tagging.
new	Add UI option to “flag” tasks to bookmark them for later via `"show_flag"` setting and a flag icon and `f` keyboard shortcut. Add `--flagged-only` setting to `db-out` command.
new	Rename `split_tokens` pre-processor to `add_tokens`.
fix	Fix rendering and use icons for whitespace tokens in `ner_manual`.
fix	Fix rendering of RTL languages in manual interfaces via `"writing_dir"` setting.
fix	Overwrite database settings correctly when using `connect()`.
fix	Fix bug in logging timestamp and log minutes correctly.
fix	Only use colored CLI output if supported by user’s terminal.
fix	Don’t disable entity recognizer in `textcat.batch-train`.
doc	Document preprocessor components and models’ `batch_train` methods.
doc	Fix various typos and add more examples.
doc	Add docstrings to internals so they can be inspected using `help()`.

v1.2.0 2018-01-09

This update introduces ner.manual, a new recipe and interface for
manual NER annotation. You can now
highlight one or more text spans per task and select the entity label from a
dropdown menu. To allow faster annotation and less fiddly clicking, token
boundaries are used to determine the entity spans when highlighting them. Note
that this workflow replaces ner.mark and boundaries.


new	`ner.manual` recipe and interface for manual NER annotation.
new	`"card_css"` option to inject custom CSS into annotation card.
new	Experimental `"show_whitespace"` for basic `ner` interface.
fix	Make `--exclude` argument and recipe option work as expected.
fix	Don’t merge and modify NER spans before adding example to the database.
doc	Document API of `PatternMatcher` model.
doc	Improve formatting of available recipes in `prodigy --help`.
doc	Fix various typos and inconsistencies.

v1.1.0 2017-12-18


new	Automatically add new entity labels in `ner.batch-train`.
new	Improve speed during NER training and allow setting the beam width via CLI.
new	Filter out ignored examples before creating training and evaluation sets.
new	Re-add improved version of `ner.eval` recipe.
new	Handle broken JSONL in Reddit loader.
new	Use spaCy model to assign labels in `ner.print-stream`.
doc	Small improvements to documentation.

Source link
lol

Changelog · Prodigy · An annotation tool for AI, Machine Learning & NLP

v1.3.0 2018-02-01

v1.2.0 2018-01-09

v1.1.0 2017-12-18

By stp2y

Leave a Reply Cancel reply