---
title: "Downloading data with vegbankr"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Downloading data with vegbankr}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  markdown:
    wrap: 72
---

## Introduction to `vegbankr`

This package is an R client for VegBank, the vegetation plot database
of the Ecological Society of America's [Panel on Vegetation
Classification](https://esa.org/vegpanel/), hosted by the [National
Center for Ecological Analysis and Synthesis](https://www.nceas.ucsb.edu)
(NCEAS). VegBank contains vegetation plot data, community types
recognized by the U.S. National Vegetation Classification and others,
and all ITIS/USDA plant taxa along with other taxa recorded in plot
records.

As a VegBank API client, the `vegbankr` package currently supports
querying and downloading vegetation plot records, as well as validating
and uploading new data to the VegBank database.

## Summary of Functions

| Function | Description |
|---|---|
| `vb_get_plot_observations()` | Plot observation data |
| `vb_get_community_concepts()` | Community concepts (assertions) linked to community names through usages |
| `vb_get_community_classifications()` | Community classification events in which one or more community concepts were applied to a plot observation |
| `vb_get_community_interpretations()` | Assignments of community names and authorities (i.e., community concepts) to specific plot observations, as part of a community classification event |
| `vb_get_plant_concepts()` | Plant concepts (assertions) linked to plant names through usages |
| `vb_get_taxon_observations()` | Data provider's determination of taxa observed on a plot, and the overall cover of those taxa |
| `vb_get_taxon_interpretations()` | Assignments of taxon names and authorities (i.e., plant concepts) to specific taxon observations |
| `vb_get_cover_methods()` | Information about registered coverclass methods |
| `vb_get_stratum_methods()` | Information about registered strata sampling protocols |
| `vb_get_references()` | Information about references cited within VegBank |
| `vb_get_projects()` | Information about projects established to collect vegetation plot data |
| `vb_get_parties()` | Information about people and organizations who have contributed to the collection or interpretation of a plot |

Many of the functions accept the following arguments:

| Argument | Description |
|---|---|
| `vb_code` | A VegBank code, for example an observation code (`ob.*`) or a project code (`pj.*`) |
| `detail` | Level of detail returned: `"full"` includes all available fields; the default returns a summary subset |
| `with_nested` | If `TRUE`, nested child records (e.g., taxon observations, stratum data) are included as list columns |
| `limit` | Maximum number of records to return (default: 100) |
| `offset` | Number of records to skip before returning results; useful for pagination |
| `sort` | Field name to sort by; prefix with `-` for descending order (e.g., `"-obs_count"`) |
| `search` | Optional search string for filtering results |

```{r, message = FALSE, echo = FALSE}
library(vegbankr)
```

## Plot Observation Data

The `vb_get_plot_observations()` function downloads vegetation plot
data from VegBank.

### Single Plot Observation by Observation Code

To retrieve a specific plot observation record using its observation
(`ob.*`) code, use the following command:

```{r}
# Retrieve a specific plot observation with all fields and nested records
ob.135454 <- vb_get_plot_observations(
  vb_code = "ob.135454",
  detail = "full",
  with_nested = TRUE
)

# Preview the downloaded data
head(ob.135454)
```

Omitting `detail = "full"` or setting `with_nested = FALSE` returns a
smaller summary, which is better suited to quick browsing or to large
downloads.

### Plot Observations for a Project

You can also download multiple plot observations for a specific project.
For example, to retrieve the first 100 plot observations (the default
`limit`) from the Southwest GAP, Nevada Project (`pj.10510`):

```{r}
# Retrieve plot observations for a specific project (first 100 by default)
pj.10510 <- vb_get_plot_observations(vb_code = "pj.10510")

# Preview the data
head(pj.10510)
```

## Project Data

`vb_get_projects()` returns information about projects established to
collect vegetation plot data.

### Search by Project Name

This example searches for projects matching "GAP", sorted in descending
order by observation count so that the most data-rich projects appear
first.

```{r}
vb_get_projects(search = "GAP", sort = "-obs_count")
```

### Plot Observations by Project Code

Once you have identified a project code, you can pass it directly to
`vb_get_plot_observations()`. This example retrieves the first 100 plot
observations associated with project `pj.11044` (Pennsylvania HPD
Delaware Water Gap), sorted by the author's observation code.

```{r}
vb_get_plot_observations("pj.11044", sort = "author_obs_code")
```

## Party Data

`vb_get_parties()` returns information about the people and
organizations who have contributed to a plot, project, or
plant/community interpretation.

```{r}
# Get people associated with a project
vb_get_parties(vb_code = "pj.11044")

# Get people associated with a plot observation
vb_get_parties(vb_code = "ob.3298")
```

### Plot Observations by Party

Once you have identified a party code (`py.*`), you can use it to return
the plot observations associated with that person or organization:

```{r}
vb_get_plot_observations(vb_code = "py.1062")
```

## Taxon Observations

`vb_get_taxon_observations()` retrieves the individual plant taxon
records associated with a given plot observation.
Each row represents one taxon recorded in the plot.

### Taxon Observations by Plot

This example retrieves the taxon (plant) observations associated with
plot observation `ob.135454`:

```{r}
vb_get_taxon_observations("ob.135454")
```

## Plant Species Concepts

`vb_get_plant_concepts()` retrieves the plant species concepts
associated with plot observations and projects.

```{r}
vb_get_plant_concepts("ob.135454")
```

## Plant Community Concepts

The example below searches for community concepts that include the
genus *Sequoiadendron*.

```{r}
sequoia_communities <- vb_get_community_concepts(search = "sequoiadendron")

# View the matching community concepts
head(sequoia_communities)
```

Next, we can find the concept with the most plot observations and
retrieve those plot observations from VegBank by passing its community
concept code (`cc_code`) directly to `vb_get_plot_observations()`:

```{r}
sequoia_plots <- sequoia_communities |>
  dplyr::arrange(dplyr::desc(obs_count)) |>
  dplyr::slice(1) |>
  dplyr::pull(cc_code) |>
  vb_get_plot_observations()

head(sequoia_plots)
```

## Other Function Options

### Changing Limit & Offset

To download more than 100 records, increase the `limit` argument. To
page through a large result set in fixed-size chunks, combine `limit`
and `offset`:

```{r}
# Download up to 500 records
vb_get_plot_observations(vb_code = "pj.10510", limit = 500)

# Download the second page of 100 records (records 101 to 200)
vb_get_plot_observations(vb_code = "pj.10510", limit = 100, offset = 100)
```

### Sorting Results

Use the `sort` argument to order results by the following fields:

| Endpoint | Sortable Fields |
|---|---|
| `plot-observations` | `default`, `author_obs_code` |
| `plant-concepts` | `default`, `plant_name`, `obs_count` |
| `community-concepts` | `default`, `comm_name`, `obs_count` |
| `projects` | `default`, `project_name`, `obs_count` |
| `parties` | `default`, `surname`, `organization_name`, `obs_count` |

Prefix the field name with `-` for descending order.
The example below retrieves plot observations for project `pj.10510`
sorted by `author_obs_code` in descending order:

```{r}
vb_get_plot_observations(vb_code = "pj.10510", sort = "-author_obs_code")
```

## Manipulating Downloaded Data with `dplyr`

Because VegBank downloads are returned as data frames, they can be
manipulated with base R or `dplyr` functions. Below we highlight a few
possible data manipulations.

```{r, message = FALSE}
library(dplyr)
```

### Select a Subset of Columns

This example retrieves the plant concepts for plot observation
`ob.135454` and then uses `dplyr::select()` to keep only a subset of
columns.

```{r}
# Download plant concept data
plants <- vb_get_plant_concepts("ob.135454")

# Keep only the plant_name, plant_code, and status columns
plants_small <- plants |>
  dplyr::select(plant_name, plant_code, status)

head(plants_small)
```

### Filter Rows by a Condition

This example first retrieves the first 100 records for project
`pj.11044` (Pennsylvania HPD Delaware Water Gap), sorted by the author's
observation code. It then keeps only the observations with an elevation
greater than 250 meters.

```{r}
obs <- vb_get_plot_observations(
  vb_code = "pj.11044",
  sort = "author_obs_code",
  limit = 100
)

# Keep observations where the elevation is greater than 250 meters
obs |>
  dplyr::filter(elevation > 250)
```

### Summarize Numeric Variables

This example downloads the full plot observation records for project
`pj.10510` and computes the mean, minimum, and maximum slope gradient
across the downloaded plots (the first 100, given the default `limit`).

```{r}
plot_data <- vb_get_plot_observations(vb_code = "pj.10510", detail = "full")

# Summary statistics of slope gradient, ignoring missing values
slope_summary <- plot_data |>
  dplyr::summarise(
    slope_mean = mean(slope_gradient, na.rm = TRUE),
    slope_min = min(slope_gradient, na.rm = TRUE),
    slope_max = max(slope_gradient, na.rm = TRUE)
  )

head(slope_summary)
```
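### Combine Multiple Pages

Because each request returns at most `limit` records, a large project
can be downloaded page by page with `offset` and the pages stacked with
`dplyr::bind_rows()`. The sketch below wraps that loop in
`fetch_all_pages()`, a hypothetical helper written for this vignette (it
is not part of `vegbankr`). It assumes that a request whose `offset`
reaches past the end of the results returns fewer than `limit` rows,
possibly none; check this against the API before relying on it.

```{r}
# Hypothetical helper (not part of vegbankr): calls `fetch` with increasing
# offsets and stacks the returned pages into a single data frame.
# `fetch` must accept `limit` and `offset` arguments and return a data frame.
fetch_all_pages <- function(fetch, limit = 100, max_pages = 50) {
  pages <- list()
  offset <- 0
  for (i in seq_len(max_pages)) {
    page <- fetch(limit = limit, offset = offset)
    # Assumption: a request past the end of the results returns zero rows
    if (is.null(page) || nrow(page) == 0) break
    pages[[length(pages) + 1]] <- page
    # Assumption: a page shorter than `limit` is the last one
    if (nrow(page) < limit) break
    offset <- offset + limit
  }
  # bind_rows() returns an empty data frame when `pages` is empty
  dplyr::bind_rows(pages)
}
```

For example,
`fetch_all_pages(function(limit, offset) vb_get_plot_observations(vb_code = "pj.10510", limit = limit, offset = offset))`
would download project `pj.10510` 100 records at a time. The
`max_pages` cap (50 pages, or at most 5,000 records with the default
`limit`) stops the loop if the end-of-results assumption does not hold;
raise it for larger projects.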