merge_by_species(x, y, by, all.x)

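Species-level merge: standardises names (trimmed, lowercased, non-alphanumeric characters replaced with underscores), joins the two data frames on the normalised key, and reports match coverage. The code shown here contains no fuzzy-matching step; rows of x with no exact match on the normalised key are kept with NA in the y columns when all.x = TRUE.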

Tags: data, taxonomy
Args:
  x, y: data frames
  by: column name in both
  all.x = TRUE
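merge_by_species <- function(x, y, by = "species", all.x = TRUE) {
  # Normalise names: trim, replace each non-alphanumeric character with "_", lowercase.
  clean <- function(v) tolower(gsub("[^a-zA-Z0-9]", "_", trimws(v)))
  x$.key <- clean(x[[by]])
  y$.key <- clean(y[[by]])
  # Columns present in both inputs (including `by`) get a ".y" suffix on the y side.
  merged <- merge(x, y, by = ".key", all.x = all.x, suffixes = c("", ".y"))
  # Coverage: rows of x whose normalised key occurs in y. Counting from the keys
  # avoids relying on a ".y"-suffixed column, which exists only on a name clash.
  matched <- sum(x$.key %in% y$.key)
  message(sprintf("Merged: %d / %d rows matched (%.1f%%)",
                  matched, nrow(x), 100 * matched / nrow(x)))
  merged$.key <- NULL
  merged
}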
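
The description mentions fuzzy matching for unmatched rows, but the function body above has no such step. What follows is a minimal sketch of one way to add it, not the author's method. It assumes a hypothetical helper, fuzzy_match_keys(), and a hypothetical threshold, max_dist. It uses base R's utils::adist() (generalised Levenshtein distance) to map each unmatched key to its nearest candidate key.

```r
# Hypothetical helper (not part of the original snippet): for each key, return
# the closest candidate by edit distance, or NA if none is within max_dist.
fuzzy_match_keys <- function(keys, candidates, max_dist = 2) {
  vapply(keys, function(k) {
    if (is.na(k)) return(NA_character_)
    d <- utils::adist(k, candidates)[1, ]  # distances from k to every candidate
    i <- which.min(d)                      # nearest candidate (first on ties)
    if (length(i) == 1 && d[i] <= max_dist) candidates[i] else NA_character_
  }, character(1), USE.NAMES = FALSE)
}

# Example with already-normalised keys; the first one contains a typo.
keys <- c("homo_sapeins", "pan_troglodytes", "xyz")
candidates <- c("homo_sapiens", "pan_troglodytes", "gorilla_gorilla")
fuzzy_match_keys(keys, candidates)
```

Inside merge_by_species, such a helper could be applied to the values of x$.key that are not in y$.key, replacing each with its fuzzy match before the call to merge(). The threshold is a judgment call: a small max_dist catches typos, while a large one risks joining distinct species with similar names, so the fuzzy matches are worth reviewing before they are trusted.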