SAVOR Corpus savor-corpus@1.0.3

A bilingual Romanian–Italian corpus of 3,304 traditional and contemporary recipes, structured against schema.org/Recipe and FOODon, embedded with LaBSE, and exposed via REST, SPARQL, and OAI-PMH. FAIR, reproducible, version-pinned.

v1.0.3 · records 3,304 · triples 181,447 · CI passing · licence CC0 · CC BY-SA · DOI 10.5281/zenodo.8729471

01 Overview

The SAVOR corpus packages 3,304 recipes from three sources — interwar Romanian manuscripts, contemporary Romanian student recipes, and Pellegrino Artusi's 1891 cookbook — into a single SAVOR-JSON schema (extending schema.org/Recipe). Each record carries multilingual titles, structured ingredients linked to FOODon, regional and temporal metadata via GeoNames and PeriodO, FAO-derived nutritional estimates, and full provenance to the source manuscript or born-digital file.

3,304 recipes: 2,003 RO manuscript · 502 RO student · 799 IT Artusi
181k RDF triples: Turtle · 45 MB compressed
96.4% QA accuracy: 10% sample, manual review
2 languages: RO + IT, with EN gloss in 412 records

Distribution is split. Pre-1925 records (n = 791) are released CC0. Post-1925 manuscript recipes and contemporary student records (n = 2,513) are released CC BY-SA 4.0 with documented consent. Multimedia attachments (~150 records) ship under their own per-asset licences via IIIF endpoints.
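Each record carries its licence as an SPDX identifier in the `license` field. Reusers can therefore select a redistributable subset without re-deriving the date rule. The following is a minimal sketch that assumes the records are already parsed into Python dicts; how they are loaded is left open. The helper names are hypothetical.

```python
from collections import Counter

# Minimal sketch: relies only on the per-record SPDX `license` field
# (CC0-1.0 or CC-BY-SA-4.0). `records` is assumed to be an iterable of
# already-parsed SAVOR-JSON dicts; helper names are hypothetical.


def licence_counts(records) -> Counter:
    """Tally records per SPDX licence identifier."""
    return Counter(r.get("license", "UNKNOWN") for r in records)


def cc0_only(records) -> list:
    """Keep only records released under CC0-1.0 (no attribution or share-alike obligations)."""
    return [r for r in records if r.get("license") == "CC0-1.0"]
```

Run over the full dump, the tally should reproduce the split stated above (n = 791 and n = 2,513). Multimedia assets are not covered because their licences live with each IIIF asset.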

02 SAVOR-JSON schema

SAVOR-JSON extends schema.org/Recipe with fields specific to culinary heritage. The metadata spine is Qualified Dublin Core for Cloud interoperability; recipe detail is carried in schema.org and FOODon terms. Every record is validated against the SAVOR-JSON Schema (savor-recipe-1.0) before ingestion; its fields are summarised below.

extends schema.org/Recipe · spine Qualified Dublin Core · vocab FOODon, Wikidata, GeoNames, PeriodO · format JSON-LD + Markdown + Turtle

Fields reference

Field | Type | Req. | Description
@type | string | required | Always "Recipe" at top level.
recipeId | URI | required | Persistent identifier in the sav: namespace, e.g. sav:ro/cluj/1934/placinta-mere.
name | { ro, it, en? } | required | Multilingual title; ro and it are mandatory, en is an optional gloss.
description | { ro, it, en? } | optional | Standfirst paragraph from the source.
ingredients[] | FoodEntity[] | required | Each entry carries food (FOODon URI), qty, unit, and an optional note.
recipeInstructions[] | Step[] | required | Ordered list of structured steps; each step links its semantic actions to foodon:culinary-process.
region | GeoNames URI | required | Primary geographical origin.
period | PeriodO URI | required | Temporal coverage. Predominant periods: interwar RO, late-19th-century IT.
festive[] | Wikidata URI[] | optional | Religious or seasonal occasions.
diet[] | enum[] | optional | vegan, vegetarian, de-post, kosher, …
nutrition | NutritionInformation | optional | FAO-derived per-serving estimate, validated by USAMV.
provenance | string | required | Source manuscript shelfmark or born-digital row reference.
license | SPDX | required | CC0-1.0 or CC-BY-SA-4.0.
embedding | float[768] | computed | LaBSE bilingual embedding, computed at build time; not editable.
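The constraints in the table can be spot-checked before a full schema run. The following is a minimal sketch covering only the "required" column and the constraints the table spells out. It is not the official savor-recipe-1.0 JSON Schema, and the function name is hypothetical.

```python
# Minimal structural check mirroring the fields table. This is NOT the
# official savor-recipe-1.0 JSON Schema (`savor validate` runs that);
# `check_record` is a hypothetical helper for quick pre-flight checks.
REQUIRED = ("@type", "recipeId", "name", "ingredients", "recipeInstructions",
            "region", "period", "provenance", "license")
ALLOWED_LICENCES = {"CC0-1.0", "CC-BY-SA-4.0"}
EMBEDDING_DIM = 768  # LaBSE output size


def check_record(rec: dict) -> list:
    """Return a list of problems; an empty list means every check passed."""
    errors = [f"missing required field: {f}" for f in REQUIRED if f not in rec]

    if "@type" in rec and rec["@type"] != "Recipe":
        errors.append('@type must be "Recipe"')

    # ro and it titles are mandatory; en is an optional gloss
    name = rec.get("name")
    if isinstance(name, dict):
        for lang in ("ro", "it"):
            if not name.get(lang):
                errors.append(f"name.{lang} is required")

    if "license" in rec and rec["license"] not in ALLOWED_LICENCES:
        errors.append(f"unexpected licence: {rec['license']}")

    # every ingredient must point at a FOODon term
    for i, ing in enumerate(rec.get("ingredients", [])):
        if not ing.get("food"):
            errors.append(f"ingredients[{i}] lacks a FOODon 'food' URI")

    # the embedding is computed at build time, so only its shape is checked
    emb = rec.get("embedding")
    if isinstance(emb, list) and len(emb) != EMBEDDING_DIM:
        errors.append(f"embedding has {len(emb)} dims, expected {EMBEDDING_DIM}")

    return errors
```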

03 Example record

// /recipes/ro/cluj/1934/placinta-varza-ciuperci.json
{
  "@context": "https://savor.eu/context.jsonld",
  "@type": "Recipe",
  "recipeId": "sav:ro/cluj/1934/placinta-varza-ciuperci",
  "name": {
    "ro": "Plăcintă cu varză și ciuperci",
    "it": "Sfoglia ripiena di cavolo e funghi",
    "en": "Cabbage and mushroom pie"
  },
  "region": "geonames:665087",            // Cluj-Napoca
  "period": "periodo:p0qhvzd",            // RO interwar (1918–1944)
  "festive": ["wd:Q3406641"],            // Christmas Eve (Ajun)
  "diet": ["vegan", "de-post"],
  "yield": 8, "timeMinutes": 160,
  "ingredients": [
    { "food": "foodon:00002569", "qty": 1000, "unit": "g", "label": "varză albă" },
    { "food": "foodon:03411184", "qty": 200,  "unit": "g", "label": "ciuperci de pădure" }
  ],
  // recipeInstructions[] (required) omitted here for brevity
  "nutrition": { "calories": 218, "protein_g": 5.4, "source": "FAO-INFOODS" },
  "provenance": "AR-CJ ms.b3.f47r",
  "license": "CC-BY-SA-4.0",
  "embedding": "<float[768], LaBSE-multilingual>"
}
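The record uses compact IRIs (CURIEs) such as geonames:665087 and foodon:03411184. A JSON-LD processor expands them through context.jsonld. The following is a minimal stand-in for that step. The GeoNames pattern mirrors the URI used in the SPARQL example, the FOODon pattern follows the OBO PURL convention, and the Wikidata base is the standard entity namespace. The sav: and periodo: bases are assumptions to be checked against context.jsonld.

```python
# Hypothetical prefix table; the authoritative mapping is context.jsonld.
# sav: and periodo: bases are assumptions.
PREFIXES = {
    "sav": "https://savor.eu/ns/{}",
    "geonames": "https://sws.geonames.org/{}/",
    "foodon": "http://purl.obolibrary.org/obo/FOODON_{}",
    "wd": "http://www.wikidata.org/entity/{}",
    "periodo": "http://n2t.net/ark:/99152/{}",
}


def expand(curie: str) -> str:
    """Expand a compact IRI such as 'geonames:665087' into a full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown or malformed CURIE: {curie!r}")
    return PREFIXES[prefix].format(local)
```

Unknown prefixes raise an error instead of passing through silently, so a typo in a record surfaces at load time.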

04 REST API

The REST surface is generated from the same SAVOR-JSON schema. All endpoints are read-only (POST /v1/converse accepts a query but does not modify the corpus), return JSON-LD, support content negotiation for Markdown and Turtle, and are CORS-enabled for browser clients. Rate limits: 100 requests/min unauthenticated; 1,000 requests/min with an API key.

GET /v1/recipes List with facets: ?region=&period=&diet=&festive=&lang=&q=
GET /v1/recipes/{recipeId} Single record with full JSON-LD context
GET /v1/recipes/{recipeId}/neighbours k-nearest by LaBSE cosine. ?k=10&cross_lingual=true
POST /v1/converse Natural-language query → ranked recipes + grounded citations
GET /v1/oai-pmh OAI-PMH metadata harvest, Dublin Core profile
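The following is a small standard-library client sketch for the two GET endpoints above. It makes several assumptions. The API is served under https://savor.eu/v1, and a recipeId is sent percent-encoded as a single path segment. The API key travels in a header whose name (X-API-Key) is hypothetical. The facet values are illustrative only.

```python
import json
from typing import Optional
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

# Assumptions: base URL, single-segment recipeId encoding, and the
# "X-API-Key" header name are not documented and are hypothetical.
BASE = "https://savor.eu/v1"


def recipes_url(**facets) -> str:
    """Build GET /v1/recipes with facets such as region, period, diet, festive, lang, q."""
    params = {k: v for k, v in facets.items() if v is not None}
    return f"{BASE}/recipes" + (f"?{urlencode(params)}" if params else "")


def neighbours_url(recipe_id: str, k: int = 10, cross_lingual: bool = True) -> str:
    """Build GET /v1/recipes/{recipeId}/neighbours."""
    rid = quote(recipe_id, safe="")  # ':' and '/' are percent-encoded
    query = urlencode({"k": k, "cross_lingual": str(cross_lingual).lower()})
    return f"{BASE}/recipes/{rid}/neighbours?{query}"


def get_json(url: str, api_key: Optional[str] = None):
    """Fetch one endpoint as JSON-LD and parse the body."""
    headers = {"Accept": "application/ld+json"}
    if api_key:
        headers["X-API-Key"] = api_key  # hypothetical header name
    with urlopen(Request(url, headers=headers), timeout=30) as resp:
        return json.load(resp)
```

Requesting Turtle or Markdown would use a different Accept value and read the body as text rather than JSON. Unkeyed clients should also pace requests to stay under 100 per minute.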

05 SPARQL endpoint

A SPARQL endpoint is exposed at https://savor.eu/sparql; it can federate with FOODon, GeoNames, and Wikidata through SERVICE clauses. Below: a query for vegan recipes from Transylvania that use fermented (brined) cabbage.

# Vegan recipes from Transylvania with fermented cabbage
PREFIX sav: <https://savor.eu/ns/>
PREFIX fdn: <http://purl.obolibrary.org/obo/FOODON_>
PREFIX schema: <https://schema.org/>
PREFIX geonames: <http://www.geonames.org/ontology#>

SELECT ?id ?title_ro ?title_it ?period WHERE {
  ?r a schema:Recipe ;
     sav:recipeId ?id ;
     sav:diet "vegan" ;
     sav:region/geonames:parentFeature* <https://sws.geonames.org/665087/> ;
     sav:hasIngredient/sav:food fdn:00003356 ;   # fermented cabbage
     sav:period ?period ;
     schema:name [ sav:lang "ro" ; sav:value ?title_ro ] ,
                 [ sav:lang "it" ; sav:value ?title_it ] .
} ORDER BY ?period LIMIT 50
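A query like this can be sent from Python with nothing but the standard library. The following sketch uses the SPARQL 1.1 Protocol: an HTTP GET with a `query` parameter, requesting the SPARQL JSON results format. It assumes the endpoint accepts GET requests. Very long queries may need POST instead.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "https://savor.eu/sparql"


def run_select(query: str, timeout: float = 60.0) -> dict:
    """Send a SELECT query via HTTP GET (SPARQL 1.1 Protocol) and parse the JSON results."""
    url = f"{ENDPOINT}?{urlencode({'query': query})}"
    req = Request(url, headers={"Accept": "application/sparql-results+json"})
    with urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def rows(results: dict) -> list:
    """Flatten SPARQL JSON results into {variable: value} dicts; unbound variables are omitted."""
    variables = results["head"]["vars"]
    return [
        {v: binding[v]["value"] for v in variables if v in binding}
        for binding in results["results"]["bindings"]
    ]
```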

06 Embeddings & FAISS index

Every record carries a 768-dim LaBSE embedding computed over the concatenation of name.ro, name.it, description, and the FOODon-resolved ingredient labels. The full index ships as a FAISS IndexFlatIP binary alongside the JSON dump, plus an HNSW variant for low-latency neighbour lookup. IndexFlatIP ranks by inner product, which equals cosine similarity when the vectors are L2-normalised.
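The embedding input described above can be assembled as shown below. This is a sketch, not the build code. The " | " separator and the use of both the ro and it descriptions are assumptions. `labels` is a hypothetical lookup table from FOODon CURIE to label.

```python
from typing import Dict

# Sketch of the LaBSE input text. Assumptions: the " | " separator, using
# both ro and it descriptions, and the hypothetical `labels` table
# (FOODon CURIE -> resolved label).


def embedding_text(rec: dict, labels: Dict[str, str], sep: str = " | ") -> str:
    """Concatenate name.ro, name.it, description and FOODon-resolved ingredient labels."""
    name = rec.get("name", {})
    desc = rec.get("description", {})
    parts = [name.get("ro", ""), name.get("it", ""),
             desc.get("ro", ""), desc.get("it", "")]
    # labels are resolved from the FOODon URI, not taken from free-text fields
    parts += [labels.get(ing.get("food", ""), "") for ing in rec.get("ingredients", [])]
    return sep.join(p for p in parts if p)  # skip empty fields
```

Any LaBSE implementation can then encode the string. With sentence-transformers, for instance, `SentenceTransformer("sentence-transformers/LaBSE").encode(text, normalize_embeddings=True)` returns a unit-length 768-dim vector.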

$ pip install savor-corpus
$ python -c "from savor import load; c = load(); print(c.search('Lenten cabbage pie', k=5, lang='auto'))"
[
  ('sav:ro/cluj/1934/placinta-varza-ciuperci',  0.913),
  ('sav:ro/sibiu/1928/placinta-de-post',        0.872),
  ('sav:it/artusi/1891/erbazzone-reggio',       0.834),
  ('sav:it/romagna/c20/torta-cavolo-verza',     0.812),
  ('sav:ro/turda/1931/bob-fiert-hrean',         0.789)
]
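The scores above are cosine similarities. The following pure-Python sketch shows what a flat inner-product index computes. It scores every stored vector against the query and keeps the k best. Because both sides are L2-normalised first, each score is a cosine. Toy 2-d vectors stand in for the 768-dim LaBSE vectors, and the helper names are hypothetical.

```python
import math

# Pure-Python stand-in for exhaustive inner-product search (what IndexFlatIP
# does). Vectors are L2-normalised first, so each score is a cosine.


def l2_normalise(v):
    """Scale v to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def flat_ip_search(query, vectors, k=5):
    """Return [(index, cosine)] for the k most similar vectors, best first."""
    q = l2_normalise(query)
    scored = [
        (i, sum(a * b for a, b in zip(q, l2_normalise(v))))
        for i, v in enumerate(vectors)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

With FAISS itself, `faiss.normalize_L2` applied to the float32 matrix before `index.add` yields the same ranking exactly. The HNSW variant trades that exactness for lower latency.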

07 Install & CLI

$ pip install savor-corpus
$ savor pull --version 1.0.3 --format jsonld
$ savor validate ./my-recipes/*.json --schema savor-recipe-1.0
$ savor index --embed labse --out ./savor.faiss

$ savor query "Christmas Eve, no meat, Transylvania" --k 5
  → returns ranked JSON results with provenance & cosine scores

08 License & ethics

Pre-1925 records are released under CC0 1.0. Post-1925 manuscript recipes and contemporary student records are released under CC BY-SA 4.0, with documented consent on file. Student names are pseudonymised by default, and contributors retain veto rights over culturally sensitive items in line with the CARE Principles. Provenance fields trace every record back to its manuscript folio or born-digital row.

Multimedia attachments (photographs and videos contributed by Casa Artusi) ship under their own per-asset licences and are referenced by IIIF manifest URI rather than embedded.

09 How to cite

BibTeX
@dataset{savor2026,
  author    = {Pop, Liviu and Mocanu, Cosmina and Ion, Radu and Mititelu, Verginica
               and Cuibus, Lucian and Mărgăoan, Rodica and Montanari, Massimo
               and Fiandaca, Mattia},
  title     = {{SAVOR}: Semantic AI Valorisation of Original Recipes},
  year      = {2026},
  publisher = {Zenodo},
  version   = {1.0.3},
  doi       = {10.5281/zenodo.8729471},
  url       = {https://savor.eu}
}