Bugzilla – Attachment 174126 Details for
Bug 39667
Enable accessible/tagged PDF export options by default
R script to process and visualise file sizes
process.R (text/x-r-source), 2.35 KB, created by
Stéphane Guillou (stragu)
on 2021-08-07 11:27:54 UTC
#### Get sizes of PDFs ####

# function to get file size and file name in a dataframe
get_sizes <- function(directory) {
  size <- file.size(dir(directory, full.names = TRUE))
  data.frame(file = dir(directory),
             size)
}

# get names, sizes and stages
library(dplyr)
source_files <- get_sizes("source_files") |>
  mutate(what = "source")
pdf_export <- get_sizes("pdf_export") |>
  mutate(what = "pdf")
pdfTagged_export <- get_sizes("pdfTagged_export") |>
  mutate(what = "pdfTagged")
pdfUA_export <- get_sizes("pdfUA_export") |>
  mutate(what = "pdfUA")

# bind into single dataframe
library(stringr)
long <- bind_rows(source_files, pdf_export, pdfTagged_export, pdfUA_export) |>
  # remove file extensions
  mutate(file = str_extract(file, "^[^\\.]+"))
# names are unique as an ID number was appended before converting to PDF

# make dataframe wide, calculate rate of change
library(tidyr)
wide <- long |>
  pivot_wider(names_from = what,
              values_from = size) |>
  mutate(change_tagged = pdfTagged / pdf,
         change_UA = pdfUA / pdf)

#### Visualise ####

library(ggplot2)
# make dataframe tidy (i.e. long) for ggplot2
tidied <- wide |>
  select(file, change_tagged, change_UA) |>
  pivot_longer(-file,
               names_to = "type", values_to = "change")
# separate dataframe for each facet's median
medians <- tidied |>
  group_by(type) |>
  summarise(median = median(change, na.rm = TRUE))

tidied |>
  ggplot(aes(x = change)) +
  geom_histogram(binwidth = 0.5, boundary = 1) +
  geom_vline(mapping = aes(xintercept = 1),
             colour = "blue") +
  geom_vline(data = medians,
             mapping = aes(xintercept = median),
             colour = "red", linetype = "dashed") +
  geom_label(data = medians,
             aes(x = median, y = 0, label = median |> round(2)),
             colour = "red") +
  scale_x_continuous(breaks = seq(1, 21, 2)) +
  facet_wrap(vars(type)) +
  labs(title = "Rate of change in PDF size",
       x = "Size change rate",
       y = "Number of files",
       caption = paste("Blue line is no change (rate of change = 1).
                       Red line is median change rate.
                       Number of files in test sample:",
                       nrow(wide)))

ggsave("tagged_vs_UA_histograms.svg",
       width = 10, height = 6, dpi = 600)

#### Export data ####

# export to ODS
library(readODS)
write_ods(wide, "results.ods")
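The widening and ratio step in the script can be tried without any PDF files on disk. The following is a minimal sketch, assuming a hand-made `long` data frame; the file names (`doc1`, `doc2`) and the sizes are invented for illustration and are not taken from the bug's test sample.

```r
library(dplyr)
library(tidyr)

# Hypothetical sizes in bytes; file names and values are made up for illustration
long <- data.frame(
  file = rep(c("doc1", "doc2"), each = 4),
  what = rep(c("source", "pdf", "pdfTagged", "pdfUA"), times = 2),
  size = c(1000, 2000, 3000, 4000,
           500, 1000, 1000, 2500)
)

# One row per file, one column per export type, then size ratios
# relative to the plain PDF export (same step as in the script)
wide <- long |>
  pivot_wider(names_from = what, values_from = size) |>
  mutate(change_tagged = pdfTagged / pdf,
         change_UA = pdfUA / pdf)

print(wide)
```

A ratio of 1 means the tagged or PDF/UA export is the same size as the plain PDF export; a ratio of 2 means it is twice as large.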