# thingy_grabber
Script for archiving Thingiverse things. Because it is a glorified web scraper, it is likely to be fragile.

## Usage:
````
usage: thingy_grabber.py [-h] [-l {debug,info,warning}] [-d DIRECTORY] [-f LOG_FILE] {collection,thing,user,batch,version} ...

positional arguments:
  {collection,thing,user,batch,version}
                        Type of thing to download
    collection          Download one or more entire collection(s)
    thing               Download a single thing.
    user                Download all things by one or more users
    batch               Perform multiple actions written in a text file
    version             Show the current version

optional arguments:
  -h, --help            show this help message and exit
  -l {debug,info,warning}, --log-level {debug,info,warning}
                        level of logging desired
  -d DIRECTORY, --directory DIRECTORY
                        Target directory to download into
  -f LOG_FILE, --log-file LOG_FILE
                        Place to log debug information to
````

### Things
`thingy_grabber.py thing thingid1 thingid2 ...`
This will create a directory named after the title of the thing(s) with the given ID(s) and download the files into it.

### Collections
`thingy_grabber.py collection user_name collection_name1 collection_name2`
Where `user_name` is the name of the creator of the collection (not necessarily your name!) and `collection_name1...etc` are the name(s) of the collection(s) you want.

This will create a series of directories `user-collection/thing-name` for each thing in the collection.

If for some reason a download fails, it will get moved sideways to `thing-name-failed` - this way if you rerun it, it will only reattempt any failed things.

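The "moved sideways" behaviour described above can be pictured with a minimal sketch. The helper name `mark_failed` is hypothetical and is not taken from the script itself; it only illustrates renaming a directory to `thing-name-failed`, and it assumes no `-failed` directory already exists for that thing.

```python
import tempfile
from pathlib import Path


def mark_failed(thing_dir: Path) -> Path:
    """Rename `thing-name` to `thing-name-failed` (hypothetical helper)."""
    failed_dir = thing_dir.with_name(thing_dir.name + "-failed")
    # Assumption: failed_dir does not exist yet; a real tool would need
    # to decide what to do if an earlier failed attempt is still there.
    thing_dir.rename(failed_dir)
    return failed_dir


# Small demonstration inside a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    thing = Path(tmp) / "thing-name"
    thing.mkdir()
    moved = mark_failed(thing)
    print(moved.name)
```

Because the original `thing-name` directory no longer exists after the rename, a later run sees that thing as missing and tries it again, which matches the rerun behaviour described above.
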
### User designs
`thingy_grabber.py user user_name1 user_name2 ...`
Where `user_name1 ...` are the names of the creators.

This will create a series of directories `user designs/thing-name` for each thing that user has designed.

If for some reason a download fails, it will get moved sideways to `thing-name-failed` - this way if you rerun it, it will only reattempt any failed things.

### Batch mode
`thingy_grabber.py batch batch_file`
This will load the given text file and parse it as a series of calls to this script. Each line of the file should be of the form `command arg1 ...`.
Be warned that there is currently NO validation that you have given a correct set of commands!

An example:
````
thing 3670144
collection cwoac bike
user cwoac
````

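To make the `command arg1 ...` format concrete, here is a minimal sketch of how such a file could be split into actions. It is not the script's actual parser: the function name `parse_batch_lines` is hypothetical, and skipping blank lines is an assumption of this sketch. Like the real batch mode, it does no validation of the commands.

```python
def parse_batch_lines(lines):
    """Split batch-file lines into (command, args) pairs (hypothetical helper)."""
    actions = []
    for raw in lines:
        line = raw.strip()
        if not line:
            # Assumption: blank lines are ignored rather than treated as errors.
            continue
        # First word is the command, everything after it is an argument.
        command, *args = line.split()
        actions.append((command, args))
    return actions


example = """thing 3670144
collection cwoac bike
user cwoac
"""

for command, args in parse_batch_lines(example.splitlines()):
    print(command, args)
```

Each resulting pair corresponds to one invocation of the script, for example `thingy_grabber.py collection cwoac bike`.
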
If you are using Linux, you can just add an appropriate call to the crontab. If you are using Windows, it's a bit more of a faff, but at least according to [this link](https://www.technipages.com/scheduled-task-windows), you should be able to do it with a command something like this (this is not tested!): `schtasks /create /tn thingy_grabber /tr "c:\path\to\thingy_grabber.py -d c:\path\to\output\directory batch c:\path\to\batchfile.txt" /sc weekly /d wed /st 13:00:00`
You may have to play with the quotation marks to make that work though.

## Examples
`thingy_grabber.py collection cwoac bike`
Download the collection 'bike' by the user 'cwoac'.

`thingy_grabber.py -d downloads -l warning thing 1234 4321 1232`
Download the three things 1234, 4321 and 1232 into the directory downloads. Only give warnings.

`thingy_grabber.py -d c:\downloads -l debug user jim bob`
Download all designs by jim and bob into directories under `c:\downloads`, giving lots of debug messages.

## Requirements
python3, beautifulsoup4, requests, lxml

## Current features:
- Can download an entire collection, creating separate subdirs for each thing in the collection
- If you run it again with the same settings, it will check for updated files and only update what has changed. This should make it suitable for syncing a collection on a cronjob
- If there is an updated file, the old directory will be moved to `name_timestamp` where `timestamp` is the last upload time of the old files. The code will then copy unchanged files across and download any new ones.

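The `name_timestamp` naming can be sketched as follows. This only illustrates the idea: the helper name `archived_name` is hypothetical, and the timestamp format shown is an assumption, since the README does not say which format the script uses.

```python
from datetime import datetime
from pathlib import Path


def archived_name(thing_dir: Path, last_upload: datetime) -> Path:
    """Build the `name_timestamp` path for an outdated directory (hypothetical helper)."""
    # Assumption: this format is illustrative only; it avoids characters
    # such as ':' that are not allowed in Windows directory names.
    stamp = last_upload.strftime("%Y-%m-%d_%H-%M-%S")
    return thing_dir.with_name(f"{thing_dir.name}_{stamp}")


old_dir = Path("downloads") / "bike-mount"
print(archived_name(old_dir, datetime(2020, 1, 2, 3, 4, 5)))
```

Deriving the suffix from the last upload time of the old files, rather than the time of the run, means the archived directory name records which revision of the thing it holds.
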
## Changelog
* v0.6.3
  - Caught edge case involving old dir clashes
  - Added support for separate log file
* v0.6.2
  - Added catches for 404s, 504s and malformed pages
* v0.6.1
  - Now downloads readme.txt and licence details
* v0.6.0
  - Added support for downloading multiple things/design sets/collections from the command line
* v0.5.0
  - Better logging options
  - Batch mode
* v0.4.0
  - Added a changelog
  - Now downloads associated images
  - Support `-d` to specify base download directory

## Todo features (maybe):
- attempt to use -failed dirs for resuming
- GUI?