Changelog
Source: NEWS.md
manydata 1.1.0
Wrangling
- Added `filter_datacube()` for filtering datasets in a datacube by date
- Added `find_ID()` and `find_common_ID()` for identifying ID columns in datasets
Evaluation
- Added `find_year()` for extracting just the year from a date (potentially unnecessary if `messydates::year()` is available)
- Added `compare_new()` and `compare_diff()` for comparing what is new or different in one dataset over another
- Added a range of `score_*()` functions for scoring datasets on various criteria, including consistency, completeness, accuracy, timeliness, and uniqueness of the data
Maintaining
- Added `find_duplicates()` for identifying duplicate observations in datasets
- Added `code_extend_glove()` and `code_extend_bert()` for extending existing coding to new or missing data
manydata 1.0.3
CRAN release: 2025-06-18
Connection
- Added new `getID()` helper that obtains the one or two ID columns that appear as the first one or two columns in a datacube
- `compare_overlap()` now returns a list of each dataset's IDs to avoid issues with ggVennDiagram
- `plot.compare_overlap()` now always returns an upset plot (closes #292)
- Fixed testing of ggplot objects (closes #308)
- Fixed how `plot.compare_categories()` treated identifier variables (closes #291)
manydata 1.0.1
CRAN release: 2025-03-21
Connection
- `resolve_*()` functions now have a parameter indicating whether missing values should be included; unlike base R, by default missing values are excluded
- Restored `resolve_mean()`
- Restored `resolve_median()`
- Added `resolve_mode()` for retaining the most common values
- Added `resolve_consensus()` for retaining only values where there are no conflicts
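As a rough illustration of the difference in defaults noted above, the sketch below uses only base R functions. It does not call manydata's `resolve_*()` functions, whose parameter name is not given in these notes; it only shows what base R does by default and what excluding missing values looks like.

```r
# A small vector with one missing value (illustrative data only).
x <- c(2, NA, 4, 6)

# Base R propagates missing values unless told otherwise.
base_mean <- mean(x)

# Excluding missing values first, which the note above describes
# as the default behaviour of the resolve_*() functions.
kept_mean <- mean(x, na.rm = TRUE)
kept_median <- median(x, na.rm = TRUE)

print(base_mean)
print(kept_mean)
print(kept_median)
```

Here `base_mean` is `NA`, while `kept_mean` and `kept_median` are both 4.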
manydata 1.0.0
Package
- Updated GitHub checks and release actions
- Fixes to URLs
- Updated website
- Improved ease of operation by making cli, dplyr, and messydates Depends
- Dropped usethis Suggest
Collection
- Updated `emperors` dataset
  - Using zero-padded messydates
  - Added citation prompts
  - Datasets capitalised: `emperors$Wikipedia`, `emperors$UNRV`, `emperors$Britannica`
  - Fixed non-unique IDs bugs
  - Fixed inc
Calling
- Added `call_citations()` to print citations added as hidden information
- Fixed finicky `call_sources()` bug related to calling help files
- Improved `call_sources()` and `call_citations()` to accept datacubes or datasets, as objects or characters
- Moved `mreport()` from messydates
  - Added `mreport.list()` to make it easier to report on datacubes
- Added `describe_data()` for describing key aspects of datasets in datacubes
- Fixed `call_releases()` to use `messydates::vmin()`
Connection
- Improved `pluck()`
  - Function now wraps `dplyr::pluck()` but adds a citation prompt
- Improved `consolidate()`
  - Improved usability with cli progress messages and success alerts
  - Improved speed using dtplyr in place of `dplyr::full_join()` (closes #288)
  - Improved compatibility by converting ‘rows’ argument to ‘join’ (breaking)
    - “all” becomes “inner”
    - “any” becomes “full”
    - “favour” becomes “left”
  - Fixed being passed a single dataset
  - Prompts users to cite datasets (closes #280)
  - Fixed bug in ‘resolve’ argument: a named ‘resolve’ vector no longer has to be the same length as the variables
  - Dropped ‘cols’ argument
- Updated tests for `consolidate()` to use new ‘join’ argument
  - testthat tests use cli on quiet mode
- Updated `resolve_coalesce()` for coalescing (taking first non-NA value)
- Updated `resolve_random()` for returning random values sampling from those available
- Updated `resolve_min()` and `resolve_max()` for returning min or max values
- Added `resolve_unite()` for returning all possible values as a set
- Added `resolve_precision()` for returning most precise values available (closes #265)
  - Added `precision.numeric()` to return most significant figures
  - Added `precision.character()` to return most characters
- Dropped `resolve_median()` and `resolve_mean()` as uncommon choices
- Dropped `resolve_multiple()` in favour of always using more flexible for loop
- Dropped `favour()` in favour of left joins and coalesces
- Dropped `coalesce_rows()` as no longer necessary
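For code written against earlier releases, the breaking ‘rows’ to ‘join’ renaming in `consolidate()` listed in this release can be captured in a small lookup. This is only a sketch: `rows_to_join` and `translate_rows()` are hypothetical names, not part of manydata, and the mapping is taken directly from the three renamings above.

```r
# Hypothetical lookup from the old 'rows' values to the new 'join' values,
# copied from the renaming listed in this release's notes.
rows_to_join <- c(all = "inner", any = "full", favour = "left")

# Hypothetical helper (not part of manydata): translate one old value,
# failing loudly on anything that is not a recognised 'rows' value.
translate_rows <- function(rows) {
  if (!rows %in% names(rows_to_join)) {
    stop("Unknown 'rows' value: ", rows)
  }
  unname(rows_to_join[rows])
}

print(translate_rows("any"))
```

The translated value could then be supplied wherever the new ‘join’ argument is expected.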
manydata 0.9.3
CRAN release: 2024-05-06
Connection
- Updated `call_sources()` to be more flexible when gathering data from datacube documentation
- Closed #279 by updating documentation across many packages to be compatible with `call_sources()`
- Updated `compare_dimensions()` by fixing bugs related to dates and NA observations
manydata 0.9.1
Package
- Updated test expectations to make package compatible with the new release of ggplot2
Connection
- Closed #266 by adding startup messages to ‘many’ packages
- Closed #267 by adding links to package websites in console messages
- Closed #282 by updating all references from ‘database’ to ‘datacube’
- Closed #293 by fixing bugs related to missing dates when using `consolidate()`
- Closed #294 by updating how `call_sources()` identifies datasets within datacubes
manydata 0.9.0
Package
- Closed #259 by revising CCC package structure and updating the package cheatsheet
- Updated documentation for ‘emperors’ data to new style to improve visibility and transparency
- Closed #264 by removing tibble and janitor package imports in DESCRIPTION file
- Closed #276 by reviewing package vignettes
- Closed #277 by updating ‘manydata-defunct’ file
- Closed #284 by removing vignette and updating README to include more information on how to use the package
- Updated all references and arguments from ‘database’ to ‘datacube’
Connection
- Renamed and updated ‘call_’ family of functions
  - Closed #250, #251, and #262 by renaming `get_packages()` to `call_packages()` and updating how the function works and looks up packages, version updates, and availability
  - Closed #269 and # by adding a `call_sources()` function that displays sources and variable changes for datasets in datacubes
  - Closed #271 by updating the `retrieve_` family of functions to `call_` functions
  - Closed #283 by renaming `plot_releases()` to `call_releases()`
- Renamed and updated ‘compare_’ family of functions
  - Closed #243 and #257 by creating a `compare_missing()` function to compare missing values in datasets in a ‘many’ datacube
  - Closed #249 and #253 by renaming `db_plot()` function to `compare_categories()` and updating variable categories
  - Closed #261 by renaming and updating other `db_` functions to `compare_` functions
  - Closed #268 by adding `compare_overlap()` to help users investigate overlap for datasets within datacubes
  - Closed #285 by adding `compare_dimensions()` and `compare_ranges()` to compare dimensions and ranges in datacubes
manydata 0.8.2
CRAN release: 2022-11-19
Connection
- Updated `consolidate()` to require two keys when joining memberships’ databases
- Updated `db_comp()` to follow consolidation defaults for memberships’ databases
- Closed #231 by adding a `retrieve_texts()` function to retrieve treaty texts from other ‘many’ packages
manydata 0.8.1
CRAN release: 2022-11-11
Package
- Added ‘RDataTmp’ files to Rbuildignore and .gitignore
- Updated `data_evolution()` to use `inherits()` instead of `class()` for condition comparison
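The switch from `class()` to `inherits()` reflects a general base R idiom. The sketch below is not taken from `data_evolution()` itself; it uses a made-up object with several classes to show why a direct `class()` comparison is fragile inside a condition.

```r
# An illustrative object carrying more than one class.
x <- structure(list(), class = c("special_error", "error", "condition"))

# Comparing class() directly yields one result per class element,
# which cannot be used safely as a single if() condition.
class_check <- class(x) == "error"

# inherits() always returns a single TRUE or FALSE.
inherits_check <- inherits(x, "error")

print(class_check)
print(inherits_check)
```

Here `class_check` has length three, while `inherits_check` is a single `TRUE`.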
manydata 0.8.0
Connection
- Closed #134 by adding a `data_evolution()` function to the report family of functions that gets original datasets, if available, or opens the preparation scripts, if not available
- Added ‘db_profile’ family of functions to visualise databases
- Updated `get_packages()` function
  - Closed #215 by making `get_packages()` interactive so that users can choose which branch to download
  - Closed #219 by improving `get_packages()` printing
  - Updated `get_packages()` and `plot_releases()` to use messydates, instead of lubridate, for date coercion
- Closed #222 by adding `network_map()` function for plotting geographical networks
- Updated `consolidate()` function to make function over 20 times faster
  - Closed #227 by making `consolidate()` ignore text-related variables due to their size
  - Closed #230 by making `consolidate()` more concise to avoid running into memory limits
  - Closed #228 and #232 by replacing `coalesce_compatible()` with a faster approach to coalescing compatible missing observations that relies on `zoo::na.locf()`
  - Made `coalesce_compatible()` function defunct
manydata 0.7.5
CRAN release: 2022-06-07
Package
- Removed skimr table from `emperors` database documentation
- Updated path for binaries in push release GitHub actions
manydata 0.7.4
Package
- Closed #187 by updating GitHub actions to implement package caching
- Closed #209 by removing all non-ASCII characters in package
- Closed #210 by removing pkgdown dependency
- Updated `emperors` data to contain correct date class name consistent with messydates
manydata 0.7.3
CRAN release: 2022-04-01
Connection
- Updated how the `get_packages()` function identifies installed packages to avoid using `installed.packages()`
- Updated documentation for `coalesce_compatible()` function to include the returns
manydata 0.7.1
Package
- Updated DESCRIPTION by removing ambiguous word from title
- Updated README by correcting the URL for life cycle badge
Connection
- Updated helper functions for `consolidate()` to use `inherits()` to identify variable’s class
manydata 0.7.0
Connection
- Updated `consolidate()` function
  - Closed #191 by making `consolidate()` function more concise and faster by removing redundant code lines
  - Fixed dates-related warnings by changing how messydates package is used to resolve dates
  - Updated how `consolidate()` substitutes missing observations with first non-missing observation from other datasets
  - Closed #201 by fixing how `consolidate()` detects variables to be resolved to avoid ambiguous variable matching
  - Closed #202 by allowing for multiple key vectors to be declared as arguments for `consolidate()`
- Closed #199 by adding `favour()` (also `favor()`) function that re-orders datasets within a database
manydata 0.6.0
Package
- Closed #189 by renaming package from `{qData}` to manydata
- Updated user vignette to include more examples on working with `consolidate()`
- Updated package website
- Closed #167 by adding a cheatsheet to README
Connection
- Updated `consolidate()` function
  - Closed #169 by making default key variable “many_ID” instead of “qID”
  - Closed #183 by adding further methods to resolve conflicts between observations:
    - Added “max” resolve argument which resolves conflicts in favor of the largest non NA value
    - Added “min” resolve argument which resolves conflicts in favor of the smallest non NA value
    - Added “mean” resolve argument which resolves conflicts in favor of the average non NA value
    - Added “median” resolve argument which resolves conflicts in favor of the median non NA value
    - Added “random” resolve argument which resolves conflicts in favor of a random non NA value
  - Closed #185 by making it so that users can specify resolve argument differently for different variables
- Closed #188 by adding more informative warnings for GitHub download limits for `get_packages()` function
- Added extraction functions to generate edgelists from agreements membership datasets
  - Added `extract_bilaterals()` for extracting adjacency edgelist for bilateral agreements
  - Added `extract_multilaterals()` for extracting adjacency edgelist for multilateral agreements