# thingy_grabber
Script for archiving Thingiverse things. Because it is essentially a glorified web scraper, it is likely to be very fragile.

## Usage:
````
usage: thingy_grabber.py [-h] [-l {debug,info,warning}] [-d DIRECTORY] [-f LOG_FILE] {collection,thing,user,batch,version} ...

positional arguments:
  {collection,thing,user,batch,version}
                        Type of thing to download
    collection          Download one or more entire collection(s)
    thing               Download a single thing.
    user                Download all things by one or more users
    batch               Perform multiple actions written in a text file
    version             Show the current version

optional arguments:
  -h, --help            show this help message and exit
  -l {debug,info,warning}, --log-level {debug,info,warning}
                        level of logging desired
  -d DIRECTORY, --directory DIRECTORY
                        Target directory to download into
  -f LOG_FILE, --log-file LOG_FILE
                        Place to log debug information to
````

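The help text above maps naturally onto Python's standard `argparse` module, with one sub-parser per command. The following is a minimal sketch reconstructed only from that help output. It is not the script's actual source. The positional argument names (`owner`, `collections`, `things`, `users`, `batch_file`) and the `subcommand` destination are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical reconstruction of the CLI described by the help text."""
    parser = argparse.ArgumentParser(prog="thingy_grabber.py")
    # Global options; argparse only accepts these before the sub-command.
    parser.add_argument("-l", "--log-level", choices=["debug", "info", "warning"],
                        help="level of logging desired")
    parser.add_argument("-d", "--directory",
                        help="Target directory to download into")
    parser.add_argument("-f", "--log-file",
                        help="Place to log debug information to")

    # One sub-parser per positional command in the usage line.
    subparsers = parser.add_subparsers(dest="subcommand",
                                       help="Type of thing to download")
    collection = subparsers.add_parser(
        "collection", help="Download one or more entire collection(s)")
    collection.add_argument("owner")                    # creator of the collection(s)
    collection.add_argument("collections", nargs="+")   # one or more collection names
    thing = subparsers.add_parser("thing", help="Download a single thing.")
    thing.add_argument("things", nargs="+")             # one or more thing IDs
    user = subparsers.add_parser(
        "user", help="Download all things by one or more users")
    user.add_argument("users", nargs="+")               # one or more user names
    batch = subparsers.add_parser(
        "batch", help="Perform multiple actions written in a text file")
    batch.add_argument("batch_file")
    subparsers.add_parser("version", help="Show the current version")
    return parser
```

With this layout, `build_parser().parse_args(["-d", "downloads", "-l", "warning", "thing", "1234", "4321", "1232"])` gives `directory="downloads"`, `log_level="warning"` and `things=["1234", "4321", "1232"]`. Options such as `-d` and `-l` must come before the command name, which matches the ordering used in the examples.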
### Things
`thingy_grabber.py thing thingid1 thingid2 ...`
This creates a directory named after the title of each thing with the given ID(s) and downloads its files into it.

### Collections
`thingy_grabber.py collection user_name collection_name1 collection_name2 ...`
Where `user_name` is the name of the creator of the collection (not necessarily your name!) and `collection_name1`, `collection_name2`, etc. are the names of the collections you want.

This will create a series of directories `user-collection/thing-name`, one for each thing in the collection.

If a download fails for any reason, its directory is moved sideways to `thing-name-failed`. This way, if you rerun the script, it will only reattempt the failed things.

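The move-sideways behaviour can be sketched with `pathlib` as shown below. This is an illustration under assumptions, not the script's code. `download_thing` and the `fetch` callable are hypothetical names. Resuming from `-failed` directories is still listed as a todo item, so this sketch discards any previous `-failed` directory and starts that thing again from scratch.

```python
import shutil
from pathlib import Path
from typing import Callable

FAILED_SUFFIX = "-failed"


def download_thing(base: Path, thing_name: str,
                   fetch: Callable[[Path], None]) -> bool:
    """Download one thing into base/thing_name (hypothetical helper).

    Returns True if the thing is present afterwards, or False if the attempt
    failed and its directory was moved aside to <thing_name>-failed.
    """
    target = base / thing_name
    failed = base / (thing_name + FAILED_SUFFIX)
    if target.is_dir():
        return True  # completed on an earlier run, so skip it
    if failed.exists():
        shutil.rmtree(failed)  # throw away the previous partial attempt
    target.mkdir(parents=True)
    try:
        fetch(target)  # stand-in for the real scraping/downloading code
    except Exception:
        target.rename(failed)  # move sideways so a rerun retries it
        return False
    return True
```

On a rerun, things whose directory already exists are skipped. Only the things that were moved aside are attempted again.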
### User designs
`thingy_grabber.py user user_name1 user_name2 ...`
Where `user_name1`, `user_name2`, etc. are the names of the creators.

This will create a series of directories `user designs/thing-name`, one for each thing that user has designed.

If a download fails for any reason, its directory is moved sideways to `thing-name-failed`. This way, if you rerun the script, it will only reattempt the failed things.

### Batch mode
`thingy_grabber.py batch batch_file`
This will load the given text file and parse each line as a call to this script. Each line should be of the form `command arg1 ...`.
Be warned that there is currently NO validation that you have given a correct set of commands!

An example:
````
thing 3670144
collection cwoac bike
user cwoac
````

If you are using Linux, you can just add an appropriate call to your crontab. If you are using Windows, it's a bit more of a faff, but according to [this link](https://www.technipages.com/scheduled-task-windows) you should be able to do it with a command something like this (this is not tested!): `schtasks /create /tn thingy_grabber /tr "c:\path\to\thingy_grabber.py -d c:\path\to\output\directory batch c:\path\to\batchfile.txt" /sc weekly /d wed /st 13:00:00`
You may have to play with the quotation marks to make that work, though.

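Each line of a batch file holds the same arguments you would otherwise type after `thingy_grabber.py`. Reading a batch file therefore amounts to splitting each line into an argument list. The sketch below does that with the standard `shlex` module. Unlike the script as described above, it also rejects unknown commands. The function name `parse_batch`, that validation step and the support for `#` comments are all assumptions added for illustration.

```python
import shlex

# Commands listed in the usage text; anything else is rejected.
KNOWN_COMMANDS = {"collection", "thing", "user", "batch", "version"}


def parse_batch(text: str) -> list[list[str]]:
    """Split batch-file text into one argument list per non-blank line."""
    calls = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # shlex honours quoting, so "my collection" stays one argument;
        # comments=True drops anything after an unquoted '#' (an assumption).
        args = shlex.split(line, comments=True)
        if not args:
            continue  # blank or comment-only line
        if args[0] not in KNOWN_COMMANDS:
            raise ValueError(f"line {lineno}: unknown command {args[0]!r}")
        calls.append(args)
    return calls


example = """\
thing 3670144
collection cwoac bike
user cwoac
"""
for args in parse_batch(example):
    print(args)
```

Each resulting list could then be handed to an `argparse` parser's `parse_args()`, exactly as if it had been typed at the shell.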
## Examples
`thingy_grabber.py collection cwoac bike`
Download the collection 'bike' by the user 'cwoac'.

`thingy_grabber.py -d downloads -l warning thing 1234 4321 1232`
Download the three things 1234, 4321 and 1232 into the directory `downloads`. Only show warnings.

`thingy_grabber.py -d c:\downloads -l debug user jim bob`
Download all designs by jim and bob into directories under `c:\downloads`, and show lots of debug messages.

## Requirements
python3, beautifulsoup4, requests, lxml

## Current features:
- Can download an entire collection, creating separate subdirectories for each thing in the collection.
- If you run it again with the same settings, it will check for updated files and only update what has changed. This should make it suitable for syncing a collection on a cron job.
- If there is an updated file, the old directory will be moved to `name_timestamp`, where `timestamp` is the last upload time of the old files. The code will then copy unchanged files across and download any new ones.

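The update behaviour in the last point can be sketched as follows. The sketch assumes the scraper already knows each file's name and upload time, both for the copy on disk and for the current page. `sync_thing`, the `download` callable and the timestamp format are hypothetical. The README only says that the old directory is renamed to `name_timestamp`.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Callable, Dict, Optional

FileTimes = Dict[str, datetime]  # file name -> upload time


def sync_thing(thing_dir: Path, old_files: FileTimes, new_files: FileTimes,
               download: Callable[[str, Path], None]) -> Optional[Path]:
    """Refresh thing_dir if any file changed (hypothetical helper).

    old_files must be non-empty and describe what is currently in thing_dir.
    Returns the path the old directory was moved to, or None if nothing changed.
    """
    if old_files == new_files:
        return None  # nothing changed upstream, leave the directory alone

    # Move the old copy aside, stamped with its most recent upload time.
    stamp = max(old_files.values()).strftime("%Y-%m-%d %H.%M.%S")
    old_dir = thing_dir.with_name(f"{thing_dir.name}_{stamp}")
    thing_dir.rename(old_dir)
    thing_dir.mkdir()

    for name, uploaded in new_files.items():
        if old_files.get(name) == uploaded:
            # Unchanged file: copy it across instead of downloading again.
            shutil.copy2(old_dir / name, thing_dir / name)
        else:
            # New or updated file: fetch it.
            download(name, thing_dir / name)
    return old_dir
```

Files that were removed upstream are not carried over to the fresh directory, but they remain in the timestamped copy.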
## Changelog
* v0.6.3
  - Caught edge case involving old dir clashes
  - Added support for a separate log file
* v0.6.2
  - Added catches for 404s, 504s and malformed pages
* v0.6.1
  - Now downloads readme.txt and licence details
* v0.6.0
  - Added support for downloading multiple things/design sets/collections from the command line
* v0.5.0
  - Better logging options
  - Batch mode
* v0.4.0
  - Added a changelog
  - Now downloads associated images
  - Support `-d` to specify base download directory

## Todo features (maybe):
- Attempt to use `-failed` directories for resuming
- GUI?