Changelog • manydata

manydata 1.1.1

Package

Updated website address
Updated authorship

manydata 1.1.0

Package

Updated GitHub actions to use code coverage secrets

Wrangling

Added filter_datacube() for filtering datasets in a datacube by date
Added find_ID() and find_common_ID() for identifying ID columns in datasets

Evaluation

Added find_year() for extracting just the year from a date (potentially unnecessary if messydates::year() available)
Added compare_new() and compare_diff() for comparing what is new or different in one dataset over another
Added a range of score_*() functions for scoring datasets on various criteria, including consistency, completeness, accuracy, timeliness, and uniqueness of the data
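As a rough illustration of the year extraction mentioned above, the following base R sketch pulls the first four-digit run out of a date string. The helper name sketch_year() is hypothetical and this is not the package's find_year() implementation; it assumes years are written with four digits.

```r
# Hypothetical helper: a sketch only, not manydata's find_year()
sketch_year <- function(x) {
  x <- as.character(x)
  # Position of the first run of four digits (-1 if none, NA if x is NA)
  pos <- regexpr("[0-9]{4}", x)
  out <- rep(NA_integer_, length(x))
  hit <- !is.na(pos) & pos > 0
  # Keep only the four matched characters and convert them to integer
  out[hit] <- as.integer(substring(x[hit], pos[hit], pos[hit] + 3L))
  out
}

sketch_year(c("2012-01-12", "1999?", NA, "unknown"))
```

Values with no recognisable year come back as NA rather than raising an error.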

Maintaining

Added find_duplicates() for identifying duplicate observations in datasets
Added code_extend_glove() and code_extend_bert() for extending existing coding to new or missing data
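For readers unfamiliar with duplicate detection, here is a minimal base R sketch of the idea behind identifying duplicate observations. It does not call find_duplicates(); the data frame and its column names are made up for illustration.

```r
# Illustrative data only; column names are assumptions
treaties <- data.frame(
  manyID = c("A1", "B2", "A1", "C3"),
  Title = c("Treaty A", "Treaty B", "Treaty A", "Treaty C"),
  stringsAsFactors = FALSE
)

# duplicated() alone flags only the later copies, so combine both directions
# to flag every row that has an identical twin
dup_rows <- duplicated(treaties) | duplicated(treaties, fromLast = TRUE)
treaties[dup_rows, ]
```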

manydata 1.0.3

CRAN release: 2025-06-18

Connection

Added new getID() helper that obtains the one or two ID columns that appear as the first one or two columns in a datacube
compare_overlap() now returns a list of each dataset’s IDs to avoid issues with ggVennDiagram
plot.compare_overlap() now always returns an upset plot (closes #292)
Fixed testing of ggplot objects (closes #308)
Fixed how plot.compare_categories() treated identifier variables (closes #291)

manydata 1.0.2

CRAN release: 2025-06-03

Connection

Fixed global variables in several resolve_*() functions

manydata 1.0.1

CRAN release: 2025-03-21

Package

Updated website

Connection

resolve_*() functions now have a parameter indicating whether missing values should be included; unlike base R, by default missing values are excluded
Restored resolve_mean()
Restored resolve_median()
Added resolve_mode() for retaining the most common values
Added resolve_consensus() for retaining only values where there are no conflicts
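The behaviour described above can be sketched in base R. The helpers below are hypothetical stand-ins, not the package's resolve_mode() or resolve_consensus(), and the parameter name na.rm is an assumption, since the entry does not name the parameter. Note the default of TRUE, which is the opposite of base R functions such as mean().

```r
# Hypothetical helpers: sketches of the idea, not manydata's implementations
sketch_mode <- function(x, na.rm = TRUE) {
  if (na.rm) x <- x[!is.na(x)]
  if (length(x) == 0) return(NA)
  vals <- unique(x)
  # Count how often each distinct value occurs and keep the most common one
  vals[which.max(tabulate(match(x, vals)))]
}

sketch_consensus <- function(x, na.rm = TRUE) {
  if (na.rm) x <- x[!is.na(x)]
  vals <- unique(x)
  # Return a value only when every source agrees on it
  if (length(vals) == 1) vals else NA
}

sketch_mode(c(3, 3, 7, NA))
sketch_consensus(c(5, 5, NA))
sketch_consensus(c(5, NA), na.rm = FALSE)
```

In this sketch, keeping missing values (na.rm = FALSE) makes NA count as a conflicting value, so the last call yields NA.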

manydata 1.0.0

Package

Updated GitHub checks and release actions
Fixes to URLs
Updated website
Improved ease of operation by making cli, dplyr, and messydates Depends
Dropped usethis Suggest

Collection

Updated emperors dataset
- Using zero-padded messydates
- Added citation prompts
- Datasets capitalised:
  - emperors$Wikipedia
  - emperors$UNRV
  - emperors$Britannica
- Fixed non-unique IDs bugs
- Fixed inc

Calling

Added call_citations() to print citations added as hidden information
Fixed finicky call_sources() bug related to calling help files
Improved call_sources() and call_citations() to accept datacubes or datasets, as objects or characters
Moved mreport() from messydates
- Added mreport.list() to make it easier to report on datacubes
Added describe_data() for describing key aspects of datasets in datacubes
Fixed call_releases() to use messydates::vmin()

Connection

Improved pluck()
- Function now wraps dplyr::pluck() but adds a citation prompt
Improved consolidate()
- Improved usability with cli progress messages and success alerts
- Improved speed using dtplyr in place of dplyr::full_join() (closes #288)
  - duckplyr considered: faster, but couldn’t handle mdate class
  - collapse considered: even faster, but inconsistent output
- Improved compatibility by converting ‘rows’ argument to ‘join’ (breaking)
  - “all” becomes “inner”
  - “any” becomes “full”
  - “favour” becomes “left”
- Fixed handling of a single dataset being passed
- Prompts users to cite datasets (closes #280)
- Fixed bug in ‘resolve’ argument: a named ‘resolve’ vector no longer has to be the same length as the variables
- Dropped ‘cols’ argument
Updated tests for consolidate() to use new ‘join’ argument
- testthat tests use cli in quiet mode
Updated resolve_coalesce() for coalescing (taking first non-NA value)
Updated resolve_random() for returning random values sampling from those available
Updated resolve_min() and resolve_max() for returning min or max values
Added resolve_unite() for returning all possible values as a set
Added resolve_precision() for returning most precise values available (closes #265)
- Added precision.numeric() to return most significant figures
- Added precision.character() to return most characters
Dropped resolve_median() and resolve_mean() as uncommon choices
Dropped resolve_multiple() in favour of always using a more flexible for loop
Dropped favour() in favour of left joins and coalesces
Dropped coalesce_rows() as no longer necessary
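Because the change from ‘rows’ to ‘join’ in consolidate() is breaking, older scripts need their argument values translated. The lookup below is a small sketch of the mapping listed above; rows_to_join() is a hypothetical helper and is not part of the package.

```r
# Hypothetical migration helper built from the mapping in this changelog
rows_to_join <- function(rows) {
  mapping <- c(all = "inner", any = "full", favour = "left")
  # match.arg() errors with an informative message for unknown values
  rows <- match.arg(rows, names(mapping))
  unname(mapping[rows])
}

rows_to_join("any")
rows_to_join("favour")
```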

manydata 0.9.3

CRAN release: 2024-05-06

Connection

Updated call_sources() to be more flexible when gathering data from datacube documentation
Closed #279 by updating documentation across many packages to be compatible with call_sources()
Updated compare_dimensions() by fixing bugs related to dates and NA observations

manydata 0.9.2

CRAN release: 2024-02-22

Package

Fixed the emperors data documentation issues related to lost braces with CRAN submission

manydata 0.9.1

Package

Updated test expectations to make package compatible with the new release of ggplot2

Connection

Closed #266 by adding startup messages to ‘many’ packages
Closed #267 by adding links to package websites in console messages
Closed #282 by updating all references from ‘database’ to ‘datacube’
Closed #293 by fixing bugs related to missing dates when using consolidate()
Closed #294 by updating how call_sources() identifies datasets within datacubes

manydata 0.9.0

Package

Closed #259 by revising CCC package structure and updating the package cheatsheet
Updated documentation for ‘emperors’ data to new style to improve visibility and transparency
Closed #264 by removing tibble and janitor package imports in DESCRIPTION file
Closed #276 by reviewing package vignettes
Closed #277 by updating ‘manydata-defunct’ file
Closed #284 by removing vignette and updating README to include more information on how to use the package
Updated all references and arguments from ‘database’ to ‘datacube’

Connection

Renamed and updated ‘call_’ family of functions
- Closed #250, #251, and #262 by renaming get_packages() to call_packages() and updating how the function works and looks up packages, version updates, and availability
- Closed #269 and # by adding a call_sources() function that displays sources and variable changes for datasets in datacubes
- Closed #271 by updating the retrieve_ family of functions to call_ functions
- Closed #283 by renaming plot_releases() to call_releases()
Renamed and updated ‘compare_’ family of functions
- Closed #243 and #257 by creating a compare_missing() function to compare missing values in datasets in a ‘many’ datacube
- Closed #249 and #253 by renaming db_plot() function to compare_categories() and updating variable categories
- Closed #261 by renaming and updating other db_ functions to compare_ functions
- Closed #268 by adding compare_overlap() to help users investigate overlap for datasets within datacubes
- Closed #285 by adding compare_dimensions() and compare_ranges() to compare dimensions and ranges in datacubes

manydata 0.8.3

CRAN release: 2023-06-15

Connection

Made network_map() function defunct

manydata 0.8.2

CRAN release: 2022-11-19

Connection

Updated consolidate() to require two keys when joining memberships’ databases
Updated db_comp() to follow consolidation defaults for memberships’ databases
Closed #231 by adding a retrieve_texts() function to retrieve treaty texts from other ‘many’ packages

manydata 0.8.1

CRAN release: 2022-11-11

Package

Added ‘RDataTmp’ files to .Rbuildignore and .gitignore
Updated data_evolution() to use inherits() instead of class() for condition comparison
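The switch from class() to inherits() matters because an object can carry more than one class. The base R snippet below illustrates the general pitfall and is not taken from data_evolution().

```r
now <- Sys.time()

# class() returns a vector of two classes here, so the comparison yields
# two values; using that in if() is an error in R 4.2.0 and later
class(now) == "POSIXct"

# inherits() always returns a single TRUE or FALSE
inherits(now, "POSIXct")
inherits(now, "Date")
```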

manydata 0.8.0

Package

Closed #212 by implementing package caching in GitHub actions workflows
Closed #218 by fixing bug with GitHub actions workflows
Closed #225 by changing the structure of datasets in “many” data packages
Closed #240 by updating the package cheatsheet

Connection

Closed #134 by adding a data_evolution() function to the report family of functions that gets original datasets, if available, or opens the preparation scripts, if not available
Added ‘db_profile’ family of functions to visualise databases
- Closed #214 by adding db_plot() function to plot a profile of the database to facilitate comparison of matched observations across datasets
- Closed #224 by adding db_comp() function that creates a tibble of the database to facilitate comparison of matched observations across datasets
Updated get_packages() function
- Closed #215 by making get_packages() interactive so that users can choose which branch to download
- Closed #219 by improving get_packages() printing
- Updated get_packages() and plot_releases() to use messydates, instead of lubridate, for date coercion
Closed #222 by adding network_map() function for plotting geographical networks
Updated consolidate() function to make it over 20 times faster
- Closed #227 by making consolidate() ignore text related variables due to their size
- Closed #230 by making consolidate() more concise to avoid running into memory limits
- Closed #228 and #232 by replacing coalesce_compatible() with a faster approach to coalescing compatible missing observations that relies on zoo::na.locf()
- Made coalesce_compatible() function defunct
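The ‘last observation carried forward’ idea behind zoo::na.locf() can be sketched in base R as follows. This is an illustration of the technique only, not the code used inside consolidate(); sketch_locf() is a hypothetical name.

```r
# Hypothetical helper: fill each NA with the most recent non-missing value
sketch_locf <- function(x) {
  # For every position, count how many non-missing values have been seen so far
  idx <- cumsum(!is.na(x))
  # Positions before the first observed value have a count of 0 and stay NA
  c(NA, x[!is.na(x)])[idx + 1]
}

sketch_locf(c(NA, "a", NA, NA, "b", NA))
```

This sketch keeps leading missing values in place, so the output has the same length as the input.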

manydata 0.7.5

CRAN release: 2022-06-07

Package

Removed skimr table from emperors database documentation
Updated path for binaries in push release GitHub actions

manydata 0.7.4

Package

Closed #187 by updating GitHub actions to implement package caching
Closed #209 by removing all non-ASCII characters in package
Closed #210 by removing pkgdown dependency
Updated emperors data to contain correct date class name consistent with messydates

manydata 0.7.3

CRAN release: 2022-04-01

Connection

Updated how the get_packages() function identifies installed packages to avoid using installed.packages()
Updated documentation for coalesce_compatible() function to include the returns
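The entry above does not say what replaced installed.packages(). One common base R alternative, shown here purely as an assumption, is to check a single package by name instead of listing the whole library; is_installed() is a hypothetical helper.

```r
# Hypothetical helper: checks one package without scanning every installed package
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

is_installed("stats")

# Another lightweight check: system.file() returns "" when the package is absent
nzchar(system.file(package = "stats"))
```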

manydata 0.7.2

Ignored CRAN-SUBMISSION and resubmitted.

manydata 0.7.1

Package

Updated DESCRIPTION by removing ambiguous word from title
Updated README by correcting the URL for life cycle badge

Connection

Updated helper functions for consolidate() to use inherits() to identify variable’s class

manydata 0.7.0

Package

Closed #194 by updating all remaining references from “qID” to “manyID”
Updated package website
- Closed #196 by updating elements that configure website to work properly
- Updated ‘_pkgdown.yml’ file to use bootstrap 5 template to build website

Connection

Updated consolidate() function
- Closed #191 by making consolidate() function more concise and faster by removing redundant code lines
- Fixed date-related warnings by changing how messydates package is used to resolve dates
- Updated how consolidate() substitutes missing observations with first non-missing observation from other datasets
- Closed #201 by fixing how consolidate() detects variables to be resolved to avoid ambiguous variable matching
- Closed #202 by allowing for multiple key vectors to be declared as arguments for consolidate()
Closed #199 by adding favour() (also favor()) function that re-orders datasets within a database

manydata 0.6.0

Package

Closed #189 by renaming package from {qData} to manydata
Updated user vignette to include more examples on working with consolidate()
Updated package website
Closed #167 by adding a cheatsheet to README

Connection

Updated consolidate() function
- Closed #169 by making default key variable “many_ID” instead of “qID”
- Closed #183 by adding further methods to resolve conflicts between observations:
  - Added “max” resolve argument which resolves conflicts in favor of the largest non-NA value
  - Added “min” resolve argument which resolves conflicts in favor of the smallest non-NA value
  - Added “mean” resolve argument which resolves conflicts in favor of the average non-NA value
  - Added “median” resolve argument which resolves conflicts in favor of the median non-NA value
  - Added “random” resolve argument which resolves conflicts in favor of a random non-NA value
- Closed #185 by allowing users to specify the resolve argument differently for different variables
Closed #188 by adding more informative warnings for GitHub download limits for get_packages() function
Added extraction functions to generate edgelists from agreements membership datasets
- Added extract_bilaterals() for extracting adjacency edgelist for bilateral agreements
- Added extract_multilaterals() for extracting adjacency edgelist for multilateral agreements
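To give a sense of what an adjacency edgelist built from membership data might look like, the base R sketch below pairs up the members of each agreement. It does not call extract_bilaterals() or extract_multilaterals(); the data and the column names (manyID, stateID, from, to) are assumptions made for illustration.

```r
# Illustrative membership data; one row per agreement-member pair
memberships <- data.frame(
  manyID = c("T1", "T1", "T1", "T2", "T2"),
  stateID = c("CHE", "FRA", "DEU", "CHE", "ITA"),
  stringsAsFactors = FALSE
)

# Group members by agreement, then list every unordered pair of members
by_treaty <- split(memberships$stateID, memberships$manyID)
edges <- lapply(names(by_treaty), function(id) {
  members <- sort(unique(by_treaty[[id]]))
  if (length(members) < 2) return(NULL)
  pairs <- t(utils::combn(members, 2))
  data.frame(manyID = id, from = pairs[, 1], to = pairs[, 2],
             stringsAsFactors = FALSE)
})
edgelist <- do.call(rbind, edges)
edgelist
```

In this sketch an agreement with two members contributes a single edge, while one with three members contributes three.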