The get_and_save function takes a tibble of locations (usually URLs) and file
names, downloads the PDF at each location to the corresponding file name,
saving as it goes, and lets you know where it is up to. It politely waits
around 5 seconds between calls to the location, and skips locations that give
an error.
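The behaviour described above (loop over the rows, skip failures, pause between calls) can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's actual implementation: the function name `polite_download_sketch`, the `download_fun` argument, and the one-second upper bound on the noise are all hypothetical and were chosen only so the loop can be tried without a network connection.

```r
# Illustrative sketch only: not the heapsofpapers source code.
# `polite_download_sketch` and `download_fun` are hypothetical names.
polite_download_sketch <- function(data,
                                   links = "links",
                                   save_names = "save_names",
                                   delay = 5,
                                   print_every = 1,
                                   download_fun = function(url, destfile) {
                                     # mode = "wb" keeps binary files such as PDFs intact
                                     utils::download.file(url, destfile, mode = "wb")
                                   }) {
  saved <- logical(nrow(data))

  for (i in seq_len(nrow(data))) {
    url <- data[[links]][i]
    destination <- data[[save_names]][i]

    # Try the download; if it raises an error, report it and move on.
    saved[i] <- tryCatch({
      download_fun(url, destination)
      TRUE
    }, error = function(e) {
      message("Skipping ", url, ": ", conditionMessage(e))
      FALSE
    })

    # Print a progress update every `print_every` rows.
    if (i %% print_every == 0) {
      message("Done with row ", i, " of ", nrow(data))
    }

    # Wait between calls: the base delay plus a little random noise
    # (assumed here to be at most one second).
    if (i < nrow(data)) {
      Sys.sleep(delay + stats::runif(1, min = 0, max = 1))
    }
  }

  # One TRUE/FALSE per row, returned invisibly.
  invisible(saved)
}
```

Passing the downloader in as an argument is a design choice made for this sketch only, so that a stand-in function can replace the real download during testing.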
get_and_save(
  data,
  links = "links",
  save_names = "save_names",
  dir = "heaps_of",
  bucket = NULL,
  delay = 5,
  print_every = 1,
  dupe_strategy = "overwrite"
)
Argument | Description |
---|---|
data | A data frame that contains the URLs that you want to download and the names that you want to save them as. |
links | The name of the column whose values are the URLs that you want to download. |
save_names | The name of the column whose values are the file names under which the downloaded files will be saved. |
dir | The directory to download files to; the default in the signature is "heaps_of". |
bucket | The name of an AWS S3 bucket to save files to. NULL by default. |
delay | The number of seconds to wait between downloads. The default (and minimum) is five seconds. A bit of random noise is added automatically so that the calls are less regular and interfere less with other systematic processes that might be running. |
print_every | The default is that you get a print message for every file, but you can change this. If you want to print an update for every second file then set this equal to 2, for a printed update every tenth file, set it to 10, etc. |
dupe_strategy | How to deal with files that you have already downloaded. By default the function gets them again and overwrites the existing copies. Alternatively, specify 'ignore', in which case those files are skipped. You can also investigate duplicates yourself using heapsofpapers::check_for_existence(). |
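As a rough illustration of what the 'ignore' option amounts to, the sketch below drops rows whose target file already exists before any download would be attempted. It is not the package's code: `drop_existing_sketch` is a hypothetical helper, and it assumes that the `save_names` column holds paths that can be checked directly with `file.exists()` rather than names relative to `dir`.

```r
# Illustrative sketch only: not the heapsofpapers source code.
# `drop_existing_sketch` is a hypothetical helper name.
drop_existing_sketch <- function(data,
                                 save_names = "save_names",
                                 dupe_strategy = "overwrite") {
  if (dupe_strategy == "ignore") {
    # Keep only the rows whose target file is not on disk yet.
    already_there <- file.exists(data[[save_names]])
    data <- data[!already_there, , drop = FALSE]
  }
  # With "overwrite", every row is kept and would be downloaded again.
  data
}
```

Filtering the data frame up front keeps the download loop itself unchanged whichever strategy is chosen.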
A print statement in the console about whether each of the links was saved
(if not turned off by the user), and a notification that the function has
finished.
# Not run: the whole example is wrapped in if (FALSE) because it downloads files.
if (FALSE) {
  two_pdfs <- tibble::tibble(
    locations_are = c(
      "https://osf.io/preprints/socarxiv/z4qg9/download",
      "https://osf.io/preprints/socarxiv/a29h8/download"
    ),
    save_here = c(
      "competing_effects_on_the_average_age_of_infant_death.pdf",
      "cesr_an_r_package_for_the_canadian_election_study.pdf"
    )
  )

  heapsofpapers::get_and_save(
    data = two_pdfs,
    links = "locations_are",
    save_names = "save_here"
  )
}