This function scrapes a web page for all links (<a> tags) and extracts both
the URLs and the link text.
Usage
scrape_link(url, sort_by = c("link", "link_text"))
Value
A tibble with two columns: link_text, containing the text of each
link, and link, containing the absolute URL of each link. By default the
tibble is sorted by link and then by link text; pass sort_by = "link_text"
to sort by link text instead. Only unique links are included.
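The behaviour described above (collect every <a> tag, resolve each href to an absolute URL, drop duplicates, then sort) can be sketched roughly as follows. This is an illustration only, not the package's actual implementation: the helper name extract_links() and its base_url argument are hypothetical, the use of rvest, xml2, and dplyr is an assumption, and de-duplicating on the (link_text, link) pair is one possible reading of "unique links".

```r
library(rvest)  # html_elements(), html_attr(), html_text2(), read_html()
library(xml2)   # url_absolute()
library(dplyr)  # tibble(), distinct(), arrange()

# Hypothetical helper (not part of the documented package): extracts links
# from an already-parsed HTML document so it can be tried without network access.
extract_links <- function(doc, base_url, sort_by = c("link", "link_text")) {
  sort_by <- match.arg(sort_by)

  # Keep only anchors that actually carry an href attribute.
  anchors <- html_elements(doc, "a[href]")

  out <- tibble(
    link_text = html_text2(anchors),
    # Resolve relative hrefs such as "/docs" against the page URL.
    link = url_absolute(html_attr(anchors, "href"), base_url)
  )

  # Assumption: "unique" means unique (link_text, link) pairs.
  out <- distinct(out)

  # Sort by the requested column first, then by the other one.
  other <- setdiff(c("link", "link_text"), sort_by)
  arrange(out, .data[[sort_by]], .data[[other]])
}

# Small inline page standing in for a downloaded one.
doc <- read_html('<html><body>
  <a href="/docs">Reference</a>
  <a href="https://example.org/">Home</a>
  <a href="/docs">Reference</a>
  <a>No target</a>
</body></html>')

links <- extract_links(doc, base_url = "https://example.com/start")
by_text <- extract_links(doc, base_url = "https://example.com/start",
                         sort_by = "link_text")
```

Here the duplicate "/docs" anchor collapses to a single row and the anchor without an href is skipped, so both results have two rows; only the row order differs between links and by_text.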
Examples
head(scrape_link(url = "https://github.com/tidyverse/dplyr"))
#> # A tibble: 6 × 2
#>   link_text       link
#>   <chr>           <chr>
#> 1 Archive Program https://archiveprogram.github.com
#> 2 Acero           https://arrow.apache.org/docs/cpp/streaming_execution.html
#> 3 arrow           https://arrow.apache.org/docs/r/
#> 4 dbplyr          https://dbplyr.tidyverse.org/
#> 5 Documentation   https://docs.github.com
#> 6 Docs            https://docs.github.com/
head(
  scrape_link(
    url = "https://github.com/tidyverse/dplyr",
    sort_by = "link_text"
  )
)
#> # A tibble: 6 × 2
#>   link_text          link
#>   <chr>              <chr>
#> 1 + 266 contributors https://github.com/tidyverse/dplyr/graphs/contributors
#> 2 + 42 releases      https://github.com/tidyverse/dplyr/releases
#> 3 .Rbuildignore      https://github.com/tidyverse/dplyr/blob/main/.Rbuildignore
#> 4 .github            https://github.com/tidyverse/dplyr/tree/main/.github
#> 5 .gitignore         https://github.com/tidyverse/dplyr/blob/main/.gitignore
#> 6 .vscode            https://github.com/tidyverse/dplyr/tree/main/.vscode
# This will give an "Invalid url" error
try(scrape_link(url = "https://github50.com"))
#> Error in scrape_link(url = "https://github50.com") :
#> Invalid url
#>
#> ----- Metadata -----
#>
#> url [url]: <character>
#> https://github50.com