SAVOR Corpus savor-corpus@1.0.3
A bilingual Romanian–Italian corpus of 3,304 traditional and contemporary recipes, structured against schema.org/Recipe and FOODon, embedded with LaBSE, and exposed via REST, SPARQL, and OAI-PMH. FAIR, reproducible, version-pinned.
01 Overview
The SAVOR corpus packages 3,304 recipes from three sources — interwar Romanian manuscripts, contemporary Romanian student recipes, and Pellegrino Artusi's 1891 cookbook — into a single SAVOR-JSON schema (extending schema.org/Recipe). Each record carries multilingual titles, structured ingredients linked to FOODon, regional and temporal metadata via GeoNames and PeriodO, FAO-derived nutritional estimates, and full provenance to the source manuscript or born-digital file.
Distribution is split. Pre-1925 records (n = 791) are released CC0. Post-1925 manuscript recipes and contemporary student records (n = 2,513) are released CC BY-SA 4.0 with documented consent. Multimedia attachments (~150 records) ship under their own per-asset licences via IIIF endpoints.
02 SAVOR-JSON schema
SAVOR-JSON extends schema.org/Recipe with culinary-heritage-specific fields. The metadata spine is Qualified Dublin Core for Cloud interoperability; recipe detail is carried in schema.org and FOODon. Every record is validated against the SAVOR-JSON Schema before ingestion; its fields are summarised below.
Fields reference
| Field | Type | Req. | Description |
|---|---|---|---|
| @type | string | required | Always "Recipe" at top level. |
| recipeId | URI | required | Persistent identifier in the sav: namespace, e.g. sav:ro/cluj/1934/placinta-mere. |
| name | { ro, it, en? } | required | Multilingual title. ro and it mandatory; en optional gloss. |
| description | { ro, it, en? } | optional | Standfirst paragraph from the source. |
| ingredients[] | FoodEntity[] | required | Each entry carries food (FOODon URI), qty, unit, and optional note. |
| recipeInstructions[] | Step[] | required | Ordered list of structured steps; each step links semantic actions to foodon:culinary-process. |
| region | GeoNames URI | required | Primary geographical origin. |
| period | PeriodO URI | required | Temporal coverage. Predominant periods: interwar RO, late-XIX IT. |
| festive[] | Wikidata URI[] | optional | Religious or seasonal occasions. |
| diet[] | enum[] | optional | vegan, vegetarian, de-post, kosher, … |
| nutrition | NutritionInformation | optional | FAO-derived per-serving estimate. Validated by USAMV. |
| provenance | string | required | Source manuscript shelfmark or born-digital row reference. |
| license | SPDX | required | CC0-1.0 or CC-BY-SA-4.0. |
| embedding | float[768] | computed | LaBSE bilingual embedding. Computed at build time; not editable. |
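The required/optional split in the table can be illustrated with a small pre-flight check. This is a minimal sketch using only the Python standard library, not the project's validator: the function name check_record and the wording of its messages are assumptions, and a real pipeline would validate against the published JSON Schema instead.

```python
# Hypothetical pre-flight check mirroring the "Req." column of the fields table.
REQUIRED_FIELDS = (
    "@type", "recipeId", "name", "ingredients", "recipeInstructions",
    "region", "period", "provenance", "license",
)
ALLOWED_LICENSES = {"CC0-1.0", "CC-BY-SA-4.0"}  # SPDX identifiers from the table


def check_record(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record passed."""
    problems = [f"missing required field: {f}" for f in REQUIRED_FIELDS if f not in record]

    if "@type" in record and record["@type"] != "Recipe":
        problems.append('@type must be "Recipe"')

    if "recipeId" in record and not str(record["recipeId"]).startswith("sav:"):
        problems.append("recipeId must live in the sav: namespace")

    name = record.get("name")
    if isinstance(name, dict):
        for lang in ("ro", "it"):  # en is an optional gloss
            if not name.get(lang):
                problems.append(f"name.{lang} is mandatory")
    elif name is not None:
        problems.append("name must be an object keyed by language")

    if "license" in record and record["license"] not in ALLOWED_LICENSES:
        problems.append("license must be CC0-1.0 or CC-BY-SA-4.0")

    for i, item in enumerate(record.get("ingredients", [])):
        if not isinstance(item, dict):
            problems.append(f"ingredients[{i}] must be an object")
            continue
        for key in ("food", "qty", "unit"):  # note is optional
            if key not in item:
                problems.append(f"ingredients[{i}] lacks {key}")

    return problems
```

An empty return value means that the record carries every required field with a plausible shape; it does not check that the URIs resolve.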
03 Example record
// /recipes/ro/cluj/1934/placinta-varza-ciuperci.json
{
  "@context": "https://savor.eu/context.jsonld",
  "@type": "Recipe",
  "recipeId": "sav:ro/cluj/1934/placinta-varza-ciuperci",
  "name": {
    "ro": "Plăcintă cu varză și ciuperci",
    "it": "Sfoglia ripiena di cavolo e funghi",
    "en": "Cabbage and mushroom pie"
  },
  "region": "geonames:665087",     // Cluj-Napoca
  "period": "periodo:p0qhvzd",     // RO interwar (1918–1944)
  "festive": ["wd:Q3406641"],      // Christmas Eve (Ajun)
  "diet": ["vegan", "de-post"],
  "yield": 8,
  "timeMinutes": 160,
  "ingredients": [
    { "food": "foodon:00002569", "qty": 1000, "unit": "g", "label": "varză albă" },
    { "food": "foodon:03411184", "qty": 200, "unit": "g", "label": "ciuperci de pădure" }
  ],
  "nutrition": { "calories": 218, "protein_g": 5.4, "source": "FAO-INFOODS" },
  "provenance": "AR-CJ ms.b3.f47r",
  "license": "CC-BY-SA-4.0",
  "embedding": "<float[768], LaBSE-multilingual>"
}
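The record above stores linked-data references as compact identifiers such as geonames:665087. As a rough illustration of how a client might turn them into full URIs, the sketch below uses a hand-written prefix table. The table is an assumption pieced together from the namespaces used elsewhere on this page plus the usual Wikidata entity base; the authoritative mapping is whatever https://savor.eu/context.jsonld declares. The function name expand is hypothetical.

```python
# Assumed prefix table; the real mapping is defined by the corpus's JSON-LD context.
PREFIX_TEMPLATES = {
    "sav": "https://savor.eu/ns/{}",
    "foodon": "http://purl.obolibrary.org/obo/foodon_{}",
    "geonames": "https://sws.geonames.org/{}/",
    "wd": "http://www.wikidata.org/entity/{}",
}


def expand(curie: str) -> str:
    """Expand 'prefix:local' to a full URI; unknown prefixes are returned unchanged."""
    prefix, sep, local = curie.partition(":")
    template = PREFIX_TEMPLATES.get(prefix)
    if not sep or template is None:
        return curie
    return template.format(local)


def ingredient_uris(record: dict) -> list[str]:
    """Collect the expanded FOODon URI of every ingredient in a record."""
    return [expand(item["food"]) for item in record.get("ingredients", [])]
```

Prefixes that are missing from the table, such as periodo: here, pass through untouched, so the function is safe to run over every field of a record.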
04 REST API
The REST surface is generated from the same SAVOR-JSON schema. All endpoints are read-only, return JSON-LD, support content negotiation for Markdown and Turtle, and are CORS-enabled for browser clients. Rate limit: 100 req/min unauthenticated; 1000 req/min with an API key.
Filter and full-text search parameters: ?region=&period=&diet=&festive=&lang=&q=
Nearest-neighbour lookup parameters: ?k=10&cross_lingual=true
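A client-side sketch of a filtered request, using only the Python standard library. The endpoint path is not given on this page, so BASE_URL below is a hypothetical placeholder; the filter names come from the parameter list above, and the Accept values are the standard media types for JSON-LD and Turtle. API-key handling is left out because the header name is not documented here.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical path: this page does not state the listing endpoint.
BASE_URL = "https://savor.eu/api/recipes"


def build_search_request(accept: str = "application/ld+json", **filters) -> Request:
    """Build (but do not send) a read-only GET request with the given filters."""
    # Drop unset filters so the query string only carries what the caller asked for.
    query = urlencode({key: value for key, value in filters.items() if value is not None})
    return Request(f"{BASE_URL}?{query}", headers={"Accept": accept})


request = build_search_request(region="geonames:665087", diet="vegan", lang="ro")
print(request.full_url)
```

Passing accept="text/turtle" would ask for the Turtle serialisation instead. Sending the request with urllib.request.urlopen(request) is left to the caller, who should stay within the stated rate limits.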
05 SPARQL endpoint
Federated SPARQL is exposed at https://savor.eu/sparql. The endpoint co-resolves with FOODon, GeoNames, and Wikidata via SERVICE clauses. Below: a query for vegan recipes from Transylvania that use fermented cabbage.
# Vegan recipes from Transylvania with fermented cabbage
PREFIX sav: <https://savor.eu/ns/>
PREFIX fdn: <http://purl.obolibrary.org/obo/foodon_>
PREFIX schema: <https://schema.org/>
PREFIX geonames: <http://www.geonames.org/ontology#>

SELECT ?id ?title_ro ?title_it ?period
WHERE {
  ?r a schema:Recipe ;
     sav:recipeId ?id ;
     sav:diet "vegan" ;
     sav:region/geonames:parentFeature* <https://sws.geonames.org/665087/> ;
     sav:hasIngredient/sav:food fdn:00003356 ;   # fermented cabbage
     sav:period ?period ;
     schema:name [ sav:lang "ro" ; sav:value ?title_ro ] ,
                 [ sav:lang "it" ; sav:value ?title_it ] .
}
ORDER BY ?period
LIMIT 50
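One way to run such a query from Python is sketched below with the standard library alone. It assumes the endpoint follows the SPARQL 1.1 Protocol (a form-encoded POST with a query parameter) and can return the standard SPARQL JSON results format; neither detail is confirmed on this page. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "https://savor.eu/sparql"


def build_sparql_request(query: str) -> Request:
    """Build (but do not send) a SPARQL 1.1 Protocol POST request."""
    body = urlencode({"query": query}).encode("utf-8")
    return Request(
        ENDPOINT,
        data=body,  # the presence of a body makes urllib issue a POST
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
    )


def rows(results_json: str) -> list[dict]:
    """Flatten SPARQL JSON results into one plain dict per solution."""
    document = json.loads(results_json)
    return [
        {variable: cell["value"] for variable, cell in binding.items()}
        for binding in document["results"]["bindings"]
    ]
```

To execute, pass the request to urllib.request.urlopen and feed the decoded response body to rows; each returned dict is keyed by the SELECT variables (id, title_ro, title_it, period for the query above).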
06 Embeddings & FAISS index
Every record carries a 768-dim LaBSE embedding computed over the concatenation of name.ro, name.it, description, and the FOODon-resolved ingredient labels. The full index is shipped as a FAISS IndexFlatIP binary alongside the JSON dump, plus an HNSW variant for low-latency neighbour lookup.
$ pip install savor-corpus
$ python -c "from savor import load; c = load(); print(c.search('Lenten cabbage pie', k=5, lang='auto'))"
[
  ('sav:ro/cluj/1934/placinta-varza-ciuperci', 0.913),
  ('sav:ro/sibiu/1928/placinta-de-post', 0.872),
  ('sav:it/artusi/1891/erbazzone-reggio', 0.834),
  ('sav:it/romagna/c20/torta-cavolo-verza', 0.812),
  ('sav:ro/turda/1931/bob-fiert-hrean', 0.789)
]
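For readers unfamiliar with flat inner-product indexes, the sketch below shows in plain Python the exhaustive search that an IndexFlatIP index performs: score the query against every stored vector and keep the top k. It is a toy illustration, not the package's implementation. It assumes vectors are L2-normalised so that the inner product equals cosine similarity, which this page implies (the CLI reports cosine scores) but does not state; the function names are hypothetical.

```python
import math


def normalise(vector):
    """Scale a vector to unit length so inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def search(query_vector, index, k=5):
    """Exhaustive inner-product search over {record_id: vector}; best match first."""
    query = normalise(query_vector)
    scored = [
        (record_id, sum(a * b for a, b in zip(query, normalise(vector))))
        for record_id, vector in index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional vectors standing in for the 768-dimensional embeddings.
toy_index = {"a": [1.0, 0.0, 0.0], "b": [0.0, 1.0, 0.0], "c": [1.0, 1.0, 0.0]}
print(search([2.0, 0.0, 0.0], toy_index, k=2))
```

The flat variant is exact but scans every record on each query, which is manageable at 3,304 records; an HNSW graph trades exactness for lower latency.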
07 Install & CLI
$ pip install savor-corpus
$ savor pull --version 1.0.3 --format jsonld
$ savor validate ./my-recipes/*.json --schema savor-recipe-1.0
$ savor index --embed labse --out ./savor.faiss
$ savor query "Christmas Eve, no meat, Transylvania" --k 5
→ returns ranked JSON results with provenance & cosine scores
08 License & ethics
Pre-1925 records are released CC0 1.0. Post-1925 manuscript recipes and contemporary student records are released CC BY-SA 4.0 with documented consent on file. Student names are pseudonymised by default; contributors retain veto rights over culturally sensitive items per the CARE Principles. Provenance fields trace every record back to its manuscript folio or born-digital row, so any record can be checked against its source.
Multimedia attachments (photographs, videos contributed by Casa Artusi) ship under their own per-asset licences and are referenced by IIIF manifest URI rather than embedded.
09 How to cite
@dataset{savor2026,
author = {Pop, Liviu and Mocanu, Cosmina and Ion, Radu and Mititelu, Verginica
and Cuibus, Lucian and Mărgăoan, Rodica and Montanari, Massimo
and Fiandaca, Mattia},
title = {{SAVOR}: Semantic AI Valorisation of Original Recipes},
year = {2026},
publisher = {Zenodo},
version = {1.0.3},
doi = {10.5281/zenodo.8729471},
url = {https://savor.eu}
}