# thingy_grabber
Script for archiving thingiverse things. Due to this being a glorified webscraper, it's going to be very fragile.

## Usage:
````
usage: thingy_grabber.py [-h] [-l {debug,info,warning}] [-d DIRECTORY] [-f LOG_FILE] [-q] {collection,thing,user,batch,version} ...

positional arguments:
  {collection,thing,user,batch,version}
                        Type of thing to download
    collection          Download one or more entire collection(s)
    thing               Download a single thing.
    user                Download all things by one or more users
    batch               Perform multiple actions written in a text file
    version             Show the current version

optional arguments:
  -h, --help            show this help message and exit
  -l {debug,info,warning}, --log-level {debug,info,warning}
                        level of logging desired
  -d DIRECTORY, --directory DIRECTORY
                        Target directory to download into
  -f LOG_FILE, --log-file LOG_FILE
                        Place to log debug information to
  -q, --quick           Assume date ordering on posts
````

### Things
`thingy_grabber.py thing thingid1 thingid2 ...`
This will create a directory named after the title of the thing(s) with the given ID(s) and download the files into it.

### Collections
`thingy_grabber.py collection user_name collection_name1 collection_name2`
Where `user_name` is the name of the creator of the collection (not necessarily your name!) and `collection_name1...etc` are the name(s) of the collection(s) you want.

This will create a series of directories `user-collection/thing-name` for each thing in the collection.

If for some reason a download fails, it will get moved sideways to `thing-name-failed` - this way if you rerun it, it will only reattempt any failed things.

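The "moved sideways" behaviour can be pictured with a small sketch. This is not the script's actual code: the helper names `mark_failed` and `needs_download` are hypothetical, and the only detail taken from the text above is the `-failed` suffix.

```python
import shutil
from pathlib import Path

FAILED_SUFFIX = "-failed"  # suffix described in this README


def mark_failed(thing_dir: Path) -> Path:
    """Rename thing_dir to '<name>-failed' (hypothetical helper).

    Assumption: an older '-failed' copy is simply discarded first,
    so failing the same thing twice does not raise.
    """
    failed_dir = thing_dir.with_name(thing_dir.name + FAILED_SUFFIX)
    if failed_dir.exists():
        shutil.rmtree(failed_dir)
    thing_dir.rename(failed_dir)
    return failed_dir


def needs_download(base_dir: Path, thing_name: str) -> bool:
    """A thing is (re)attempted when no completed directory exists for it."""
    return not (base_dir / thing_name).is_dir()
```

Because the failed directory no longer carries the plain `thing-name`, a rerun sees the thing as missing and tries it again, while completed things are left alone.
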
### User designs
`thingy_grabber.py user user_name1 user_name2 ...`
Where `user_name1 ...` are the names of the creators.

This will create a series of directories `user designs/thing-name` for each thing that user has designed.

If for some reason a download fails, it will get moved sideways to `thing-name-failed` - this way if you rerun it, it will only reattempt any failed things.

### Batch mode
`thingy_grabber.py batch batch_file`
This will load a given text file and parse it as a series of calls to this script. Each line of the file should be of the form `command arg1 ...`.
Be warned that there is currently NO validation that you have given a correct set of commands!

An example:
````
thing 3670144
collection cwoac bike
user cwoac
````

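Since the script itself does no validation, it can be worth checking a batch file before scheduling it. The sketch below is a standalone pre-flight check and not part of thingy_grabber; the function name `parse_batch` and the set of accepted commands (taken from the example above) are assumptions.

```python
from typing import List, Tuple

# Commands that appear in the batch example in this README (assumed complete).
KNOWN_COMMANDS = {"thing", "collection", "user"}


def parse_batch(text: str) -> List[Tuple[str, List[str]]]:
    """Split batch text into (command, args) pairs, skipping blank lines."""
    actions = []
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        command, *args = line.split()
        if command not in KNOWN_COMMANDS:
            raise ValueError(f"line {line_no}: unknown command {command!r}")
        actions.append((command, args))
    return actions


if __name__ == "__main__":
    example = "thing 3670144\ncollection cwoac bike\nuser cwoac\n"
    for command, args in parse_batch(example):
        print(command, args)
```
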
If you are using Linux, you can just add an appropriate call to the crontab. If you are using Windows, it's a bit more of a faff, but at least according to [this link](https://www.technipages.com/scheduled-task-windows), you should be able to do it with a command something like this (this is not tested!): `schtasks /create /tn thingy_grabber /tr "c:\path\to\thingy_grabber.py -d c:\path\to\output\directory batch c:\path\to\batchfile.txt" /sc weekly /d wed /st 13:00:00`
You may have to play with the quotation marks to make that work though.

### Quick mode
All modes now support 'quick mode' (`-q`), although this has no effect for individual item downloads. As thingiverse sorts its returned items in descending last modified order (I believe), once we have determined that we have the most recent version of a given thing in a collection, we can safely stop processing that collection as we should have _all_ the remaining items in it already. This _substantially_ speeds up the process of keeping big collections up to date and will noticeably reduce the server load it generates.

*Warning:* As it stops as soon as it finds an up-to-date successful model, if you have unfixed failed downloads further down the list (for want of a better term), they will _not_ be retried.

*Warning:* At the moment I have not conclusively proven to myself that the result is ordered by last updated and not upload time. Once I have verified this, I will probably be making this the default option.

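The early-stop idea behind quick mode can be illustrated as follows. This is a simplified sketch under the same ordering assumption stated above (newest-modified first); `things_to_update` and `is_up_to_date` are hypothetical names, not functions from the script.

```python
from typing import Callable, Iterable, List


def things_to_update(
    thing_ids: Iterable[str],
    is_up_to_date: Callable[[str], bool],
    quick: bool,
) -> List[str]:
    """Return the ids that still need downloading.

    thing_ids is assumed to arrive newest-modified first. In quick mode
    the scan stops at the first up-to-date thing; otherwise every id
    is checked.
    """
    pending = []
    for thing_id in thing_ids:
        if is_up_to_date(thing_id):
            if quick:
                break  # everything after this is assumed current
            continue
        pending.append(thing_id)
    return pending
```

With ids `["c", "b", "a"]` where only `b` is up to date, quick mode yields just `c`, whereas the full scan also yields `a`. That gap is exactly the case the first warning describes: a stale or failed item below the first up-to-date one is skipped.
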
## Examples
`thingy_grabber.py collection cwoac bike`
Download the collection 'bike' by the user 'cwoac'.

`thingy_grabber.py -d downloads -l warning thing 1234 4321 1232`
Download the three things 1234, 4321 and 1232 into the directory `downloads`. Only give warnings.

`thingy_grabber.py -d c:\downloads -l debug user jim bob`
Download all designs by jim and bob into directories under `c:\downloads`, and give lots of debug messages.

## Requirements
python3, beautifulsoup4, requests, lxml

## Current features:
- can download an entire collection, creating separate subdirs for each thing in the collection
- If you run it again with the same settings, it will check for updated files and only update what has changed. This should make it suitable for syncing a collection on a cronjob
- If there is an updated file, the old directory will be moved to `name_timestamp` where `timestamp` is the last upload time of the old files. The code will then copy unchanged files across and download any new ones.

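The update behaviour in the last bullet could look roughly like the sketch below. It is an illustration only: the function `prepare_update`, the `remote_files` mapping (file name to the upload time reported by the site) and the timestamp format are all assumptions, not details taken from the script.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Dict, List


def prepare_update(
    thing_dir: Path,
    old_upload_time: datetime,
    remote_files: Dict[str, datetime],
) -> List[str]:
    """Archive the old directory and return the files that must be fetched."""
    # Assumed timestamp format; dots avoid ':' which Windows rejects in paths.
    stamp = old_upload_time.strftime("%Y-%m-%d_%H.%M.%S")
    old_dir = thing_dir.with_name(f"{thing_dir.name}_{stamp}")
    thing_dir.rename(old_dir)
    thing_dir.mkdir()

    to_download = []
    for name, uploaded in remote_files.items():
        old_file = old_dir / name
        if old_file.is_file() and uploaded <= old_upload_time:
            # Unchanged since the last sync: reuse the local copy.
            shutil.copy2(old_file, thing_dir / name)
        else:
            # New or modified upstream: needs a fresh download.
            to_download.append(name)
    return to_download
```

Keeping the old directory under a timestamped name means earlier versions of a thing stay available after an update.
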
## Changelog
* v0.8.7
  - Always, always generate a valid timestamp.
* v0.8.6
  - Handle thingiverse returning no files for a thing gracefully.
* v0.8.5
  - Strip '.'s from the end of filenames
  - If you fail a download for an already failed download it no longer throws an exception
  - Truncates paths that are too long for windows
* v0.8.4
  - Just use unicode filenames - puts the unicode characters back in!
  - Force selenium to shutdown firefox on assert and normal exit
* v0.8.3
  - Strip unicode characters from license text
* v0.8.2
  - Strip unicode characters from filenames
* v0.8.1
  - Fix bug that occurred when all files were created / updated in October after the 9th.
* v0.8.0
  - Updated to support new thingiverse front end
* v0.7.0
  - Add new quick mode that stops once it has 'caught up' for a group
* v0.6.3
  - Caught edge case involving old dir clashes
  - Add support for separate log file
* v0.6.2
  - Added catches for 404s, 504s and malformed pages
* v0.6.1
  - now downloads readme.txt and licence details
* v0.6.0
  - added support for downloading multiple things/design sets/collections from the command line
* v0.5.0
  - better logging options
  - batch mode
* v0.4.0
  - Added a changelog
  - Now download associated images
  - support `-d` to specify base download directory

## Todo features (maybe):
- attempt to use -failed dirs for resuming
- gui?