# thingy_grabber
Script for archiving thingiverse things. Due to this being a glorified webscraper, it's going to be very fragile.

## Usage:
````
usage: thingy_grabber.py [-h] [-l {debug,info,warning}] [-d DIRECTORY] [-f LOG_FILE] [-q] {collection,thing,user,batch,version} ...

positional arguments:
  {collection,thing,user,batch,version}
                        Type of thing to download
    collection          Download one or more entire collection(s)
    thing               Download a single thing.
    user                Download all things by one or more users
    batch               Perform multiple actions written in a text file
    version             Show the current version

optional arguments:
  -h, --help            show this help message and exit
  -l {debug,info,warning}, --log-level {debug,info,warning}
                        level of logging desired
  -d DIRECTORY, --directory DIRECTORY
                        Target directory to download into
  -f LOG_FILE, --log-file LOG_FILE
                        Place to log debug information to
  -q, --quick           Assume date ordering on posts
````

### Things
`thingy_grabber.py thing thingid1 thingid2 ...`
This will create a directory named after the title of the thing(s) with the given ID(s) and download the files into it.

### Collections
`thingy_grabber.py collection user_name collection_name1 collection_name2`
Where `user_name` is the name of the creator of the collection (not necessarily your name!) and `collection_name1...etc` are the name(s) of the collection(s) you want.

This will create a series of directories `user-collection/thing-name` for each thing in the collection.

If for some reason a download fails, it will get moved sideways to `thing-name-failed` - this way if you rerun it, it will only reattempt any failed things.

### User designs
`thingy_grabber.py user user_name1 user_name2 ...`
Where `user_name1 ...` are the names of the creators.

This will create a series of directories `user designs/thing-name` for each thing that user has designed.

If for some reason a download fails, it will get moved sideways to `thing-name-failed` - this way if you rerun it, it will only reattempt any failed things.

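The "moved sideways" behaviour for failed downloads could look roughly like the sketch below. This is an illustration only, not the script's actual code: the function names `mark_failed` and `needs_download` are hypothetical, and the real script also checks existing directories for updated files rather than simply skipping them.

```python
import shutil
from pathlib import Path


def mark_failed(thing_dir: Path) -> Path:
    """Move a partially downloaded thing to '<name>-failed' and return the new path."""
    failed_dir = thing_dir.with_name(thing_dir.name + "-failed")
    if failed_dir.exists():
        # A previous failure is already parked here; replace it (assumed policy).
        shutil.rmtree(failed_dir)
    thing_dir.rename(failed_dir)
    return failed_dir


def needs_download(base_dir: Path, thing_name: str) -> bool:
    """A thing is attempted again unless a directory with its plain name exists."""
    return not (base_dir / thing_name).is_dir()
```

Because the failed copy no longer occupies `thing-name`, a rerun sees the thing as missing and tries it again.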
### Batch mode
`thingy_grabber.py batch batch_file`
This will load a given text file and parse it as a series of calls to this script. Each line of the file should be of the form `command arg1 ...`.
Be warned that there is currently NO validation that you have given a correct set of commands!

An example:
````
thing 3670144
collection cwoac bike
user cwoac
````

If you are using Linux, you can just add an appropriate call to the crontab. If you are using Windows, it's a bit more of a faff, but at least according to [this link](https://www.technipages.com/scheduled-task-windows), you should be able to with a command something like this (this is not tested!): `schtasks /create /tn thingy_grabber /tr "c:\path\to\thingy_grabber.py -d c:\path\to\output\directory batch c:\path\to\batchfile.txt" /sc weekly /d wed /st 13:00:00`
You may have to play with the quotation marks to make that work though.

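As a rough illustration of the batch file format, the sketch below splits each non-empty line into a command and its arguments. The function name `parse_batch` is hypothetical, and splitting on whitespace is an assumption; the script's own parsing may differ.

```python
from typing import List


def parse_batch(text: str) -> List[List[str]]:
    """Split batch-file text into one argument list per non-empty line."""
    calls = []
    for line in text.splitlines():
        args = line.split()  # assumption: plain whitespace splitting, no quoting
        if args:  # skip blank lines
            calls.append(args)
    return calls


example = """\
thing 3670144
collection cwoac bike
user cwoac
"""

for command, *args in parse_batch(example):
    print(command, args)
```

Like the script itself, this sketch does not validate that `command` is one of the supported modes.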
### Quick mode
All modes now support 'quick mode' (`-q`), although this has no effect for individual item downloads. As Thingiverse sorts its returned items in descending last-modified order (I believe), once we have determined that we have the most recent version of a given thing in a collection, we can safely stop processing that collection, as we should already have _all_ the remaining items in it. This _substantially_ speeds up the process of keeping big collections up to date and will noticeably reduce the server load it generates.

*Warning:* As it stops as soon as it finds an up-to-date successful model, if you have unfixed failed downloads further down the list (for want of a better term), they will _not_ be retried.

*Warning:* At the moment I have not conclusively proven to myself that the result is ordered by last updated and not upload time. Once I have verified this, I will probably be making this the default option.

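The early stop that quick mode relies on can be sketched as follows. This is a simplified model rather than the script's implementation: `things_to_process` and `is_up_to_date` are hypothetical names, and the newest-first ordering of `thing_ids` is exactly the assumption the warnings above describe.

```python
from typing import Callable, Iterable, List


def things_to_process(thing_ids: Iterable[str],
                      is_up_to_date: Callable[[str], bool],
                      quick: bool) -> List[str]:
    """Return the ids that need work, stopping early when quick is set."""
    pending = []
    for thing_id in thing_ids:  # assumed to arrive most recently modified first
        if is_up_to_date(thing_id):
            if quick:
                break  # everything after this point is assumed to be current
            continue
        pending.append(thing_id)
    return pending
```

With ids `a, b, c, d` where only `b` is up to date, quick mode returns just `a`, while the normal mode returns `a, c, d`. That difference is why failed downloads further down the list are not retried in quick mode.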
## Examples
`thingy_grabber.py collection cwoac bike`
Download the collection 'bike' by the user 'cwoac'
`thingy_grabber.py -d downloads -l warning thing 1234 4321 1232`
Download the three things 1234, 4321 and 1232 into the directory downloads. Only give warnings.
`thingy_grabber.py -d c:\downloads -l debug user jim bob`
Download all designs by jim and bob into directories under `c:\downloads`, give lots of debug messages

## Requirements
python3, beautifulsoup4, requests, lxml

## Current features:
- can download an entire collection, creating separate subdirs for each thing in the collection
- If you run it again with the same settings, it will check for updated files and only update what has changed. This should make it suitable for syncing a collection on a cronjob
- If there is an updated file, the old directory will be moved to `name_timestamp` where `timestamp` is the last upload time of the old files. The code will then copy unchanged files across and download any new ones.

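The update behaviour in the last bullet might be sketched like this. It is a guess at the general shape, not the script's code: `archive_and_refresh` is a hypothetical name, the two dictionaries mapping file names to upload times are assumed inputs, and the timestamp format is an assumption.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Dict, List


def archive_and_refresh(thing_dir: Path,
                        old_times: Dict[str, datetime],
                        new_times: Dict[str, datetime]) -> List[str]:
    """Archive thing_dir as name_timestamp, carry over unchanged files,
    and return the names of the files that still need downloading."""
    # Last upload time of the old files; assumes old_times is not empty.
    stamp = max(old_times.values()).strftime("%Y-%m-%d_%H-%M-%S")  # format is an assumption
    archive = thing_dir.with_name(f"{thing_dir.name}_{stamp}")
    thing_dir.rename(archive)
    thing_dir.mkdir()

    to_download = []
    for name, uploaded in new_times.items():
        if old_times.get(name) == uploaded:
            # Unchanged since the last run: copy from the archived directory.
            shutil.copy2(archive / name, thing_dir / name)
        else:
            # New or updated file: must be fetched again.
            to_download.append(name)
    return to_download
```

Only the names returned by the function would need to be fetched from the server, which is what keeps repeated runs cheap.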
## Changelog
* v0.8.7
  - Always, Always generate a valid time stamp.
* v0.8.6
  - Handle thingiverse returning no files for a thing gracefully.
* v0.8.5
  - Strip '.'s from the end of filenames
  - If you fail a download for an already failed download it no longer throws an exception
  - Truncates paths that are too long for windows
* v0.8.4
  - Just use unicode filenames - puts the unicode characters back in!
  - Force selenium to shutdown firefox on assert and normal exit
* v0.8.3
  - Strip unicode characters from license text
* v0.8.2
  - Strip unicode characters from filenames
* v0.8.1
  - Fix bug when all files were created / updated in October after the 9th.
* v0.8.0
  - Updated to support new thingiverse front end
* v0.7.0
  - Add new quick mode that stops once it has 'caught up' for a group
* v0.6.3
  - Caught edge case involving old dir clashes
  - Add support for separate log file
* v0.6.2
  - Added catches for 404s, 504s and malformed pages
* v0.6.1
  - now downloads readme.txt and licence details
* v0.6.0
  - added support for downloading multiple things/design sets/collections from the command line
* v0.5.0
  - better logging options
  - batch mode
* v0.4.0
  - Added a changelog
  - Now download associated images
  - support `-d` to specify base download directory

## Todo features (maybe):
- attempt to use -failed dirs for resuming
- gui?