structured-data

Generate JSON-LD and YARRRML for a rendered web page using Playwright and Agent WordLift.

Usage

  • worai structured-data create <url> [schema_type] [options]

Arguments

| Argument | Type | Description |
| --- | --- | --- |
| url | string | Target page URL. |
| schema_type | string | Schema.org type to generate (e.g., Review). Required unless provided with --type. |

Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| --type | string | none | Schema.org type to generate (e.g., Review). Required if schema_type is omitted. |
| --output-dir | string | . | Output directory for generated files. |
| --base-name | string | structured-data | Base output filename. |
| --jsonld | string | none | Write JSON-LD to this file path. |
| --yarrml | string | none | Write YARRRML to this file path. |
| --debug | bool | false | Write agent prompt/response to .structured-data/agent_debug.json and echo to stderr. |
| --headed | bool | false | Run the browser with a visible UI instead of headless. |
| --timeout-ms | int | 30000 | Timeout (ms) for page loads. |
| --max-xhtml-chars | int | 40000 | Max characters to keep in cleaned XHTML sent to the agent. |
| --max-text-node-chars | int | 400 | Max characters per text node in cleaned XHTML. |
| --max-nesting-depth | int | 2 | Max depth for related schema types in the property guide. |
| --verbose | bool | true | Emit progress logs to stderr. |
| --wait-until | choice | networkidle | Playwright wait strategy: domcontentloaded, load, networkidle. |
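
To make the browser-related options more concrete, the sketch below shows how --headed, --timeout-ms, and --wait-until could map onto Playwright's Python API. This illustrates the Playwright calls only and is not worai's actual implementation. The function names and the use of Chromium are assumptions.

```python
VALID_WAIT_STRATEGIES = ("domcontentloaded", "load", "networkidle")


def playwright_settings(headed=False, timeout_ms=30000, wait_until="networkidle"):
    """Translate the documented CLI defaults into keyword arguments for Playwright."""
    if wait_until not in VALID_WAIT_STRATEGIES:
        raise ValueError(f"unsupported wait strategy: {wait_until}")
    return {
        "launch": {"headless": not headed},
        "goto": {"timeout": timeout_ms, "wait_until": wait_until},
    }


def fetch_rendered_html(url, **options):
    """Load the page in a browser and return the rendered HTML (hypothetical helper)."""
    # Imported lazily so the option mapping above works without Playwright installed.
    from playwright.sync_api import sync_playwright

    settings = playwright_settings(**options)
    with sync_playwright() as p:
        # Chromium is an assumption; the source does not say which browser worai uses.
        browser = p.chromium.launch(**settings["launch"])
        try:
            page = browser.new_page()
            page.goto(url, **settings["goto"])
            return page.content()
        finally:
            browser.close()
```

With networkidle, Playwright waits until network activity has quieted down. This can help on pages that render content through JavaScript, but it may time out on pages that poll continuously. In that case load or domcontentloaded may be a better fit.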

Notes

  • Requires WORDLIFT_KEY (or wordlift.api_key in config) to resolve the dataset URI.
  • Requires yarrrml-parser (npm install -g @rmlio/yarrrml-parser).
  • morph-kgc is included in project dependencies.
  • Each JSON-LD node includes an @id built as <dataset_uri>/<pluralized-type>/<name>-<hash>.
  • YARRRML uses XPath selectors.
  • Intermediate artifacts are stored under <output-dir>/.structured-data/ (HTML, XHTML, cleaned XHTML, mapping, validation reports).
  • The generator rejects hard-coded literals (except schema:url) and checks XPath evidence before accepting a mapping.
  • Missing Google-required properties are reported as warnings (not hard failures).
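
The @id pattern from the notes, <dataset_uri>/<pluralized-type>/<name>-<hash>, can be illustrated with a minimal sketch. The source does not specify the pluralization rule, the slug format, or the hash, so everything here is an assumption chosen for illustration: the plural is formed by appending "s", the name is lowercased and hyphenated, and the hash is the first 8 hex characters of SHA-256 over the name. The helper names are hypothetical.

```python
import hashlib
import re


def slugify(text: str) -> str:
    """Lowercase the text and collapse non-alphanumeric runs into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_node_id(dataset_uri: str, schema_type: str, name: str) -> str:
    """Assemble <dataset_uri>/<pluralized-type>/<name>-<hash>.

    Assumptions (not confirmed by worai): plural = type + "s",
    hash = first 8 hex characters of SHA-256 over the name.
    """
    plural = slugify(schema_type) + "s"
    digest = hashlib.sha256(name.encode("utf-8")).hexdigest()[:8]
    return f"{dataset_uri.rstrip('/')}/{plural}/{slugify(name)}-{digest}"
```

Calling build_node_id("https://data.example.org/ds", "Review", "Great Coffee Maker!") yields an identifier under https://data.example.org/ds/reviews/ that begins with great-coffee-maker- and ends with an 8-character suffix. The real generator may derive each segment differently.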

Examples

  • worai structured-data create https://example.com/article Review --output-dir ./structured-data
  • worai structured-data create https://example.com/article --type Review --output-dir ./structured-data
  • worai structured-data create https://example.com/article Review --jsonld ./out/page.jsonld --yarrml ./out/page.yarrml
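
When the command is driven from a script, the invocations above can be assembled programmatically. The following is a minimal sketch that uses only the flags documented on this page. The function names are hypothetical, and the prerequisite check is a convenience assumed here, not something worai provides.

```python
import os
import shutil
import subprocess


def build_command(url, schema_type, output_dir=None, jsonld=None, yarrml=None):
    """Build the argv list for `worai structured-data create` from documented flags."""
    cmd = ["worai", "structured-data", "create", url, schema_type]
    if output_dir:
        cmd += ["--output-dir", output_dir]
    if jsonld:
        cmd += ["--jsonld", jsonld]
    if yarrml:
        cmd += ["--yarrml", yarrml]
    return cmd


def missing_prerequisites():
    """Return human-readable warnings for prerequisites that look absent."""
    warnings = []
    if not os.environ.get("WORDLIFT_KEY"):
        # The key may also come from wordlift.api_key in the config file.
        warnings.append("WORDLIFT_KEY is not set (wordlift.api_key in config may still apply)")
    if shutil.which("yarrrml-parser") is None:
        warnings.append("yarrrml-parser not found on PATH")
    return warnings


def run_create(url, schema_type, **options):
    """Print prerequisite warnings, then run the command and return its exit code."""
    for warning in missing_prerequisites():
        print(f"warning: {warning}")
    return subprocess.run(build_command(url, schema_type, **options)).returncode
```

For example, run_create("https://example.com/article", "Review", output_dir="./structured-data") runs the same command as the first example. Passing the arguments as a list avoids shell quoting issues with URLs that contain characters such as & or ?.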