This function scrapes a web page for all links (<a> tags) and extracts both
the URLs and the link text.
Usage
scrape_link(url, sort_by = c("link", "link_text"))
Value
A tibble with two columns: link_text, containing the text of each link, and
link, containing the absolute URL of each link. By default the tibble is
sorted by link and then by link text; pass sort_by = "link_text" to sort by
link text instead. Only unique links are included.
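
The steps described above (collect the <a> tags, resolve each href to an absolute URL, drop duplicates, sort) can be sketched as follows. This is a minimal illustration and not the package's actual source: the helper name extract_links and its base_url argument are hypothetical, it assumes the rvest, xml2, tibble and dplyr packages are installed, and it assumes anchors without an href attribute are skipped. It parses an inline HTML string so that it runs without network access.

```r
library(rvest)   # html_elements(), html_attr(), html_text2()
library(xml2)    # read_html(), url_absolute()
library(tibble)
library(dplyr)

# Hypothetical helper: NOT the real implementation of scrape_link().
extract_links <- function(html, base_url, sort_by = c("link", "link_text")) {
  sort_by <- match.arg(sort_by)

  # Assumption: only anchors that carry an href attribute count as links.
  anchors <- html_elements(html, "a[href]")

  links <- tibble(
    link_text = html_text2(anchors),
    # Resolve relative hrefs such as "/docs" against the page URL.
    link = url_absolute(html_attr(anchors, "href"), base_url)
  )

  # Keep unique rows, then sort by the chosen column and break ties
  # with the other column.
  other <- setdiff(c("link", "link_text"), sort_by)
  links |>
    distinct() |>
    arrange(.data[[sort_by]], .data[[other]])
}

# A small in-memory page with one duplicated relative link.
page <- read_html('<html><body>
  <a href="/docs">Docs</a>
  <a href="https://example.com/about">About</a>
  <a href="/docs">Docs</a>
</body></html>')

result <- extract_links(page, base_url = "https://example.com")
print(result)
```

Matching sort_by with match.arg() mirrors the c("link", "link_text") default shown under Usage, so the first value is used when the argument is omitted.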
Examples
head(
  scrape_link(url = "https://github.com/tidyverse/dplyr")
)
#> # A tibble: 6 × 2
#>   link_text          link
#>   <chr>              <chr>
#> 1 Acero              https://arrow.apache.org/docs/cpp/streaming_execution.html
#> 2 arrow              https://arrow.apache.org/docs/r/
#> 3 dbplyr             https://dbplyr.tidyverse.org/
#> 4 Documentation      https://docs.github.com
#> 5 Docs               https://docs.github.com/
#> 6 Search syntax tips https://docs.github.com/search-github/github-code-search/u…
head(
  scrape_link(
    url = "https://github.com/tidyverse/dplyr",
    sort_by = "link_text"
  )
)
#> # A tibble: 6 × 2
#>   link_text          link
#>   <chr>              <chr>
#> 1 + 257 contributors https://github.com/tidyverse/dplyr/graphs/contributors
#> 2 + 42 releases      https://github.com/tidyverse/dplyr/releases
#> 3 .Rbuildignore      https://github.com/tidyverse/dplyr/blob/main/.Rbuildignore
#> 4 .github            https://github.com/tidyverse/dplyr/tree/main/.github
#> 5 .gitignore         https://github.com/tidyverse/dplyr/blob/main/.gitignore
#> 6 .vscode            https://github.com/tidyverse/dplyr/tree/main/.vscode
# This will give an "Invalid url" error
if (FALSE) { # \dontrun{
  scrape_link(url = "https://github50.com")
} # }
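
When scraping many pages, it may be preferable to catch the "Invalid url" error rather than let it stop the script. The sketch below shows one way to do that with base R's tryCatch(). The names try_scrape and fake_scrape are hypothetical: fake_scrape is only a stand-in that always raises the error, so the example runs without the package or a network connection. In practice scrape_link would be passed as the scraper argument.

```r
# Hypothetical stand-in that always fails the way an unreachable URL would.
fake_scrape <- function(url) stop("Invalid url")

# Hypothetical wrapper: returns NULL and prints a message instead of
# aborting when the scraper signals an error.
try_scrape <- function(url, scraper) {
  tryCatch(
    scraper(url),
    error = function(e) {
      message("Could not scrape ", url, ": ", conditionMessage(e))
      NULL
    }
  )
}

result <- try_scrape("https://github50.com", fake_scrape)
```

Returning NULL keeps the failure easy to detect with is.null() while letting a loop over several URLs continue.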