# thingy_grabber
Script for archiving Thingiverse things.

## Usage:
````
usage: thingy_grabber.py [-h] [-l {debug,info,warning}] [-d DIRECTORY] [-f LOG_FILE] [-q] [-c] [-a API_KEY]
                         {collection,thing,user,batch,version} ...

positional arguments:
  {collection,thing,user,batch,version}
                        Type of thing to download
    collection          Download one or more entire collection(s)
    thing               Download a single thing.
    user                Download all things by one or more users
    batch               Perform multiple actions written in a text file
    version             Show the current version

optional arguments:
  -h, --help            show this help message and exit
  -l {debug,info,warning}, --log-level {debug,info,warning}
                        level of logging desired
  -d DIRECTORY, --directory DIRECTORY
                        Target directory to download into
  -f LOG_FILE, --log-file LOG_FILE
                        Place to log debug information to
  -q, --quick           Assume date ordering on posts
  -c, --compress        Compress files
  -a API_KEY, --api-key API_KEY
                        API key for thingiverse
````

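As a rough illustration of how the options above fit together, the interface can be approximated with `argparse`. This is a reconstruction from the help text, not the script's actual source; the `dest` name `subcommand` and the positional names `owner`, `collections`, `things`, `users` and `batch_file` are assumptions chosen for the sketch.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Approximate the documented command line (illustrative reconstruction)."""
    parser = argparse.ArgumentParser(prog="thingy_grabber.py")
    parser.add_argument("-l", "--log-level", choices=["debug", "info", "warning"],
                        help="level of logging desired")
    parser.add_argument("-d", "--directory", help="Target directory to download into")
    parser.add_argument("-f", "--log-file", help="Place to log debug information to")
    parser.add_argument("-q", "--quick", action="store_true",
                        help="Assume date ordering on posts")
    parser.add_argument("-c", "--compress", action="store_true", help="Compress files")
    parser.add_argument("-a", "--api-key", help="API key for thingiverse")

    # One sub-parser per documented subcommand; positional names are assumed.
    subparsers = parser.add_subparsers(dest="subcommand", help="Type of thing to download")
    collection = subparsers.add_parser("collection")
    collection.add_argument("owner")
    collection.add_argument("collections", nargs="+")
    thing = subparsers.add_parser("thing")
    thing.add_argument("things", nargs="+")
    user = subparsers.add_parser("user")
    user.add_argument("users", nargs="+")
    batch = subparsers.add_parser("batch")
    batch.add_argument("batch_file")
    subparsers.add_parser("version")
    return parser


args = build_parser().parse_args(
    ["-d", "downloads", "-l", "warning", "thing", "1234", "4321", "1232"]
)
print(args.subcommand, args.directory, args.things)
```

Because the global options belong to the top-level parser in this sketch, they go before the subcommand, which matches the order used in the examples in this README.
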
## API keys
Thingy_grabber v0.10.0 accesses Thingiverse in a _substantially_ different way from earlier versions. The plus side is that it should be more reliable, is possibly faster, and no longer needs selenium or a Firefox instance (which drastically reduces memory overhead). The downside is that you are _going_ to have to do something to continue using the app: get yourself an API key.

To do this, go to https://www.thingiverse.com/apps/create and create your own app, selecting 'Desktop app'.
Once you have your key, either specify it on the command line (`-a`) or put it in a text file called `api.key` in the directory you run the script from; the script will load it automatically.

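The two ways of supplying a key could be combined along these lines. This is a minimal sketch, not the script's own code: the helper name `resolve_api_key` is hypothetical, and letting the command-line value take precedence over `api.key` is an assumption.

```python
from pathlib import Path
from typing import Optional


def resolve_api_key(cli_key: Optional[str],
                    key_file: Path = Path("api.key")) -> Optional[str]:
    """Return the key from the command line, else from the key file, else None.

    Assumption: a key given with -a wins over the contents of api.key.
    """
    if cli_key:
        return cli_key.strip()
    if key_file.is_file():
        # Strip the trailing newline most editors add; treat an empty file as no key.
        return key_file.read_text(encoding="utf-8").strip() or None
    return None
```

The default `Path("api.key")` is relative, so it resolves against the current working directory, which corresponds to wherever the script is run from.
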
### Why can't I use yours?
Because API keys can be (are?) rate limited.

## Downloads
The latest version can be downloaded from https://github.com/cwoac/thingy_grabber/releases/. Under the 'assets' triangle there are precompiled binaries for Windows (no Python needed!).

### Things
`thingy_grabber.py thing thingid1 thingid2 ...`
For each given ID, this will create a directory named after the title of the thing and download its files into it.

### Collections
`thingy_grabber.py collection user_name collection_name1 collection_name2`
Where `user_name` is the name of the creator of the collection (not necessarily your name!) and `collection_name1`, `collection_name2`, etc. are the name(s) of the collection(s) you want.

This will create a series of directories `user-collection/thing-name` for each thing in the collection.

If for some reason a download fails, it will get moved sideways to `thing-name-failed`. This way, if you rerun it, it will only reattempt any failed things.

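The "moved sideways" step can be pictured as a directory rename. The sketch below is an assumption about the mechanism rather than the script's implementation: the helper name `mark_failed` is hypothetical, and discarding a stale `-failed` directory from an earlier run is a choice made for the example.

```python
import shutil
from pathlib import Path

FAILED_SUFFIX = "-failed"


def mark_failed(thing_dir: Path) -> Path:
    """Rename a partially downloaded directory to '<name>-failed'."""
    failed_dir = thing_dir.with_name(thing_dir.name + FAILED_SUFFIX)
    if failed_dir.exists():
        # Assumed behaviour: a previous failed attempt is simply discarded.
        shutil.rmtree(failed_dir)
    thing_dir.rename(failed_dir)
    return failed_dir
```

After the rename, no directory exists under the thing's normal name, so a later run would treat that thing as not yet downloaded and try it again.
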
### User designs
`thingy_grabber.py user user_name1 user_name2 ...`
Where `user_name1`, `user_name2`, etc. are the names of the creators.

This will create a series of directories `user designs/thing-name` for each thing that user has designed.

If for some reason a download fails, it will get moved sideways to `thing-name-failed`. This way, if you rerun it, it will only reattempt any failed things.

### Batch mode
`thingy_grabber.py batch batch_file`
This will load a given text file and parse it as a series of calls to this script. Each line of the file should be of the form `command arg1 ...`.
Be warned that there is currently NO validation that you have given a correct set of commands!

An example:
````
thing 3670144
collection cwoac bike
user cwoac
````

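Since the script does not validate batch files, a small pre-check can catch typos before a long run. The sketch below is a separate helper, not part of thingy_grabber: the name `parse_batch`, the minimum argument counts (inferred from the usage shown in this README) and the skipping of blank lines are all assumptions.

```python
from typing import List, Tuple

# Minimum number of arguments per command, inferred from this README.
MIN_ARGS = {"thing": 1, "collection": 2, "user": 1}


def parse_batch(text: str) -> List[Tuple[str, List[str]]]:
    """Split a batch file into (command, args) pairs, rejecting unknown lines."""
    actions = []
    for line_no, raw in enumerate(text.splitlines(), start=1):
        parts = raw.split()
        if not parts:
            continue  # blank line
        command, args = parts[0], parts[1:]
        if command not in MIN_ARGS:
            raise ValueError(f"line {line_no}: unknown command {command!r}")
        if len(args) < MIN_ARGS[command]:
            raise ValueError(
                f"line {line_no}: {command!r} needs at least "
                f"{MIN_ARGS[command]} argument(s)"
            )
        actions.append((command, args))
    return actions


example = """thing 3670144
collection cwoac bike
user cwoac
"""
print(parse_batch(example))
# prints [('thing', ['3670144']), ('collection', ['cwoac', 'bike']), ('user', ['cwoac'])]
```

Running a check like this over the file first means a misspelled command is reported with its line number instead of surfacing partway through a download.
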
If you are using Linux, you can just add an appropriate call to the crontab. If you are using Windows, it's a bit more of a faff, but at least according to [this link](https://www.technipages.com/scheduled-task-windows), you should be able to do it with a command something like this (this is not tested!): `schtasks /create /tn thingy_grabber /tr "c:\path\to\thingy_grabber.py -d c:\path\to\output\directory batch c:\path\to\batchfile.txt" /sc weekly /d wed /st 13:00:00`
You may have to play with the quotation marks to make that work though.

### Quick mode
All modes now support 'quick mode' (`-q`), although this has no effect for individual item downloads. As Thingiverse sorts its returned items in descending last-modified order (I believe), once we have determined that we have the most recent version of a given thing in a collection, we can safely stop processing that collection, as we should already have _all_ the remaining items in it. This _substantially_ speeds up the process of keeping big collections up to date and will noticeably reduce the server load it generates.

*Warning:* As it stops as soon as it finds an up-to-date, successfully downloaded model, if you have unfixed failed downloads further down the list (for want of a better term), they will _not_ be retried.

*Warning:* At the moment I have not conclusively proven to myself that the result is ordered by last updated and not upload time. Once I have verified this, I will probably be making this the default option.

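The early stop, and the first warning above, can be illustrated with a short sketch. It assumes the remote listing really is newest-first; the function name `things_to_refresh` and the shape of its inputs are hypothetical and do not reflect the script's internals.

```python
from datetime import datetime
from typing import Dict, Iterable, List, Tuple


def things_to_refresh(remote: Iterable[Tuple[str, datetime]],
                      local: Dict[str, datetime],
                      quick: bool) -> List[str]:
    """Return the IDs that need downloading.

    remote: (thing_id, last_modified) pairs, assumed to be newest first.
    local:  thing_id -> last_modified of the copy already on disk.
    """
    stale = []
    for thing_id, modified in remote:
        up_to_date = thing_id in local and local[thing_id] >= modified
        if up_to_date:
            if quick:
                break  # everything further down is assumed to be older
            continue
        stale.append(thing_id)
    return stale


remote = [
    ("c", datetime(2020, 1, 3)),
    ("b", datetime(2020, 1, 2)),
    ("a", datetime(2020, 1, 1)),  # failed on an earlier run, so not in `local`
]
local = {"b": datetime(2020, 1, 2)}

print(things_to_refresh(remote, local, quick=False))  # prints ['c', 'a']
print(things_to_refresh(remote, local, quick=True))   # prints ['c']
```

In the quick run the loop stops at `b`, so the previously failed `a` is never reconsidered, which is the situation the first warning describes.
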
## Examples
`thingy_grabber.py collection cwoac bike`
Download the collection 'bike' by the user 'cwoac'.

`thingy_grabber.py -d downloads -l warning thing 1234 4321 1232`
Download the three things 1234, 4321 and 1232 into the directory `downloads`. Only give warnings.

`thingy_grabber.py -d c:\downloads -l debug user jim bob`
Download all designs by jim and bob into directories under `c:\downloads`, and give lots of debug messages.

## Requirements
python3, requests, py7zr (>=0.8.2)

## Current features:
- Can download an entire collection, creating separate subdirs for each thing in the collection.
- If you run it again with the same settings, it will check for updated files and only update what has changed. This should make it suitable for syncing a collection on a cronjob.
- If there is an updated file, the old directory will be moved to `name_timestamp` where `timestamp` is the last upload time of the old files. The code will then copy unchanged files across and download any new ones.

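The update behaviour in the last bullet can be sketched as a rename followed by a selective copy. This is only an illustration under stated assumptions: the helper name `rotate_for_update`, the timestamp format and the way unchanged files are identified (a set of names passed in) are all hypothetical.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Set


def rotate_for_update(thing_dir: Path, last_upload: datetime,
                      unchanged: Set[str]) -> Path:
    """Move the old copy to '<name>_<timestamp>' and seed a fresh directory.

    The timestamp format is an assumption; it avoids ':' so that the
    resulting name is also valid on Windows.
    """
    stamp = last_upload.strftime("%Y-%m-%d_%H.%M.%S")
    old_dir = thing_dir.with_name(f"{thing_dir.name}_{stamp}")
    thing_dir.rename(old_dir)
    thing_dir.mkdir()
    for name in unchanged:
        # Reuse files that did not change instead of downloading them again.
        shutil.copy2(old_dir / name, thing_dir / name)
    return old_dir
```

Only the files missing from the fresh directory would then need to be downloaded, which is what keeps a repeated run cheap.
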
## Changelog
* v0.10.2
  - Fixed regression in REST API
* v0.10.1
  - A couple of minor bug fixes on exception handling.
* v0.10.0
  - API access! New -a option to provide an API key for more stable access.
* v0.9.0
  - Compression! New -c option will use 7z to create an archival copy of the file once downloaded.
    Note that although it will use the presence of 7z files to determine if a file has been updated, it currently _won't_ read old files from inside the 7z for handling updates, resulting in marginally larger bandwidth usage when dealing with partially updated things. This will be fixed later.
  - Internal tidying of how old directories are handled - I've tested this fairly heavily, but do let me know if there are issues.
* v0.8.7
  - Always, Always generate a valid time stamp.
* v0.8.6
  - Handle Thingiverse returning no files for a thing gracefully.
* v0.8.5
  - Strip '.'s from the end of filenames
  - If you fail a download for an already failed download it no longer throws an exception
  - Truncates paths that are too long for Windows
* v0.8.4
  - Just use unicode filenames - puts the unicode characters back in!
  - Force selenium to shut down Firefox on assert and normal exit
* v0.8.3
  - Strip unicode characters from license text
* v0.8.2
  - Strip unicode characters from filenames
* v0.8.1
  - Fix bug when all files were created / updated in October after the 9th.
* v0.8.0
  - Updated to support new Thingiverse front end
* v0.7.0
  - Add new quick mode that stops once it has 'caught up' for a group
* v0.6.3
  - Caught edge case involving old dir clashes
  - Add support for separate log file
* v0.6.2
  - Added catches for 404s, 504s and malformed pages
* v0.6.1
  - now downloads readme.txt and licence details
* v0.6.0
  - added support for downloading multiple things/design sets/collections from the command line
* v0.5.0
  - better logging options
  - batch mode
* v0.4.0
  - Added a changelog
  - Now download associated images
  - support `-d` to specify base download directory

## Todo features (maybe):
- attempt to use -failed dirs for resuming
- gui?