Vignette refresh #642

hfrick · 2025-08-21T14:19:08Z

As discussed, I'm chipping away at a refresh of the vignettes. Overall, I'm aiming to have

- a vignette covering the core functionality of the various validation rules and how they can be combined into a validation plan/agent
- a vignette focused on schema validation, aka "validating the fundamentals" of the shape of the data before validating the content of the data
- a vignette centered around "Taking action", covering getting notified, stopping automation, and inspecting results
- and possibly one around tailoring the validation report to stakeholders/people who understand the context of the data well but may not be data-focused themselves

I'm gonna keep this PR as a draft for now but your comments on the first two vignettes would be very welcome already!

hfrick · 2025-08-21T14:23:52Z

vignettes/validation-custom.Rmd

This one leans heavily on the user guide for the Python version, adapted for the R version.

hfrick · 2025-08-21T14:26:29Z

vignettes/schema.Rmd

+You can relax the validation further by allowing `NULL` types in the schema, which means that the column can be of any type or even missing from the table. 
+<!-- This is useful when you want to validate the presence of a column without enforcing a specific type or the column -->


Is there a way to check that a column exists but not bothering with the type? I wasn't expecting the NULL to allow the column to be missing.

There is but it's hacky, not well-explained, and so should be improved in the future:

    small_table %>%
      expect_col_schema_match(
        schema = col_schema(
          date_time = "POSIXct",
          date = "Date",
          a = NULL,  # Column exists but type is ignored
          b = NULL,  # Column exists but type is ignored
          f = "character",
          e = "logical"
        ),
        complete = FALSE,
        in_order = FALSE,
        is_exact = FALSE  # Required for NULL to work
      )

This seems more like a side effect of exact type-matching and isn't very good API design.

I think your example is passing more because complete = FALSE and your schema is missing columns c and d.

Only using is_exact = FALSE does not turn b = NULL into a check that b exists:

    library(pointblank)

    # baseline: passes
    data.frame(a = 1:2) |>
      col_schema_match(col_schema(a = "integer"))
    #>   a
    #> 1 1
    #> 2 2

    # add b to data frame and to schema as NULL and strict check fails as it should
    data.frame(a = 1:2, b = 1:2) |>
      col_schema_match(col_schema(a = "integer", b = NULL))
    #> Error: Failure to validate that column schemas match.
    #> The `col_schema_match()` validation failed beyond the absolute threshold level (1).
    #> * failure level (1) >= failure threshold (1)

    # relaxing `is_exact` allows the check to pass
    data.frame(a = 1:2, b = 1:2) |>
      col_schema_match(col_schema(a = "integer", b = NULL), is_exact = FALSE)
    #>   a b
    #> 1 1 1
    #> 2 2 2

    # but it still passes when b is missing from the data frame
    # i.e. it's not a check for existence
    data.frame(a = 1:2) |>
      col_schema_match(col_schema(a = "integer", b = NULL), is_exact = FALSE)
    #>   a
    #> 1 1
    #> 2 2

Created on 2025-08-22 with reprex v2.1.1

Thanks for checking how these options interact. Definitely need to just make NULL ignore the column type check (but still check column existence), regardless of the options!
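Until `NULL` behaves that way, one possible workaround is to split the two concerns: check types only for the columns whose type matters, and check existence separately with `col_exists()`. The sketch below is a suggestion, not something proposed in the thread. It assumes the `test_*()` variants `test_col_schema_match()` and `test_col_exists()`, which return a single logical, and the data frames `with_b` and `without_b` are made up for illustration.

```r
library(pointblank)

# Hypothetical tables: one with column `b`, one without.
with_b <- data.frame(a = 1:2, b = 1:2)
without_b <- data.frame(a = 1:2)

# Type check only for `a`; `complete = FALSE` lets the table carry
# columns (here `b`) that the schema does not mention.
type_ok <- test_col_schema_match(
  with_b,
  schema = col_schema(a = "integer"),
  complete = FALSE
)

# Existence check for `b`, independent of its type.
b_in_with <- test_col_exists(with_b, columns = vars(b))
b_in_without <- test_col_exists(without_b, columns = vars(b))
```

Keeping the existence check as its own step means the result no longer depends on how `complete`, `in_order`, and `is_exact` interact.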

hfrick · 2025-08-21T14:27:47Z

vignettes/schema.Rmd

+The default is to define the schema in R types like `"numeric"` or `"character"` and you can use it to validate any of the tables pointblank supports, so not just data frames in R but also tables in databases such as `tbl_dbi` objects. While it may be convenient to define the schema in R types, note that this requires the data to be pulled into R first, which may not be efficient for large datasets. Alternatively, you can define the schema in SQL types and validate directly against the SQL table without pulling data into R.
+
+```{r}
+#| label: types-sql


I'm not particularly database-savvy, so if you spot any ways to improve this example, please let me know!

You could say something like "...in SQL types (like VARCHAR and BIGINT) and validate..."

hfrick · 2025-08-21T14:29:44Z

vignettes/schema.Rmd

+schema_sql <- col_schema(
+  amount = "REAL",
+  customer_name = "TEXT",
+  sale_date = "REAL",


Not loving this conversion of the date format from R, is there a way to make this better?

One option is to use DuckDB instead. It's much better with dates/times and it's a supported input format.
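A rough sketch of what the DuckDB route could look like, assuming the DBI, duckdb, and dplyr packages are installed. The `sales` table and its values are invented for illustration. Rather than guessing the SQL type names DuckDB reports, the sketch derives the schema from the table via `col_schema(.tbl = ..., .db_col_types = "sql")` and prints it, so the names can be copied into a hand-written schema afterwards.

```r
library(pointblank)

# Hypothetical in-memory DuckDB table standing in for the vignette's sales data.
con <- DBI::dbConnect(duckdb::duckdb())
DBI::dbWriteTable(
  con, "sales",
  data.frame(
    amount = c(10.5, 20),
    customer_name = c("Ada", "Grace"),
    sale_date = as.Date(c("2025-01-01", "2025-01-02"))
  )
)
sales_tbl <- dplyr::tbl(con, "sales")

# Derive the schema from the table, asking for SQL types instead of R types.
schema_sql <- col_schema(.tbl = sales_tbl, .db_col_types = "sql")
print(schema_sql)

# Validate the table against the derived schema without collecting the data.
schema_ok <- test_col_schema_match(sales_tbl, schema_sql)

DBI::dbDisconnect(con, shutdown = TRUE)
```

Because the `Date` column is written through DBI, it should arrive in DuckDB as a date type instead of being stored as a number, which is the conversion the comment above was unhappy with.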

rich-iannone · 2025-08-22T15:45:29Z

I've just read through the vignettes more carefully and I think they are both well written!

rich-iannone

Everything looks good! I think we need to leave out the part about checking columns w/o column types/classes until we fix it in the codebase (I'll create an issue for that). Once that's implemented the vignette could be revised in a separate PR to put that example back in (it's a valuable usage example!).

hfrick added 2 commits August 1, 2025 11:23

draft for vignette on building a custom validation plan

c48d6ff

Draft of vignette on schema validation

6b0bd4a

hfrick commented Aug 21, 2025

hfrick requested a review from rich-iannone August 21, 2025 14:38

rich-iannone requested changes Aug 22, 2025
