Zerochan.net is one of the well-known anime/game/CG imageboards, with a strong community and modest crossposting with other imageboards.
It has its own tagging system, close to e-shuushuu-net but unlike the mainstream danbooru / safebooru / yande-re / konachan / sankaku.
That is why Zerochan is a good, distinct source for studying non-photographic images and their metadata.
This release is devoted to the paleontological part of the board, from the very start until 31.12.2014 (ID=1820227), right before the 2015-2016 release.
The enormous volume of the initial images (~650 GB) was brought within a reasonable limit by the selective sampling described below.
Release contains:
• 313.044 JPG images in 183 zipped folders, partitioned by ID in blocks of 10.000
• filtered by initial size
~ least(image_height, image_width) >= 1080 -- Full HD wallpapers as minimum
~ image_height * image_width >= 1200000 -- 1100x1100 included
~ image_width / image_height between 0.32 and 3.2 -- not too disproportional
• renamed
~ "zerochan - id - upto3sources ~ upto5characters (upto2artists).ext"
~ tags concatenated via "+", spaces replaced with underscores
~ maximum file name length is 220 symbols; character tags may be truncated if too long
• some gentle deduplication was done
• metadata for every image
ZERO_POSTS_2014.TSV in root folder, 313.036 rows
~ from imageboard (original file URL, upload date)
~ image info (size, volume, md5 etc.) both for original and sample
~ tag info for Copyright / Characters / Artists and more
ZERO_TAGS_2014.TSV with 1.815.947 rows
~ as parsed from site, with Unicode suppressed / replaced
~ many of them are used in file naming, but there are a lot more
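
The size filter and the file naming scheme listed above can be sketched roughly as follows. This is only an illustration, not the script actually used for the release: the function names are made up, the three size criteria are assumed to be combined with AND, and it is assumed that only the character part is cut when a name exceeds 220 symbols.

```python
def passes_size_filter(width: int, height: int) -> bool:
    """Check the three size criteria from the release notes."""
    short_side_ok = min(width, height) >= 1080      # Full HD as minimum
    area_ok = width * height >= 1_200_000           # 1100x1100 included
    ratio_ok = 0.32 <= width / height <= 3.2        # not too disproportional
    # Assumption: all three criteria must hold at once.
    return short_side_ok and area_ok and ratio_ok


def build_file_name(post_id, sources, characters, artists,
                    ext="jpg", max_len=220):
    """Build 'zerochan - id - sources ~ characters (artists).ext'."""

    def join(tags, limit):
        # Keep at most `limit` tags, replace spaces, concatenate via "+".
        return "+".join(t.replace(" ", "_") for t in tags[:limit])

    prefix = f"zerochan - {post_id} - {join(sources, 3)} ~ "
    suffix = f" ({join(artists, 2)}).{ext}"
    chars = join(characters, 5)
    # Assumption: only the character tags are truncated to fit max_len.
    room = max(max_len - len(prefix) - len(suffix), 0)
    return prefix + chars[:room] + suffix
```

For example, with a made-up artist tag, `build_file_name(123, ["Touhou"], ["Hakurei Reimu", "Kirisame Marisa"], ["Some Artist"])` gives `zerochan - 123 - Touhou ~ Hakurei_Reimu+Kirisame_Marisa (Some_Artist).jpg`.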
About sampling:
The huge total size of the initial images would make for an impractical torrent release: too big, and not valuable enough to be kept seeded.
I decided to selectively shrink big / bloated images to a practical size with good quality, which was chosen as 1920 px on the longer side (Full HD in both landscape and portrait) and JPEG quality 92%.
I used ImageMagick:
mogrify -thumbnail 1920x1920^> -quality 92 -format jpg
and then compared the initial and the sampled image, keeping the initial image when resampling had a negative or only minor effect.
There are 90.375 original images left in the release, as indicated in the IMG_TYPE column of ZERO_POSTS.
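
As a rough illustration of the sampling idea, the sketch below computes a target size with 1920 px on the longer side and decides whether to keep the original. It follows the stated target only and does not reproduce the exact behaviour of the mogrify geometry argument. The release does not say how "negative or minor effect" was measured, so the file-size comparison and the 10% threshold here are purely hypothetical, as are the function names.

```python
def target_size(width, height, longer_side=1920):
    """Scale so the longer side is at most `longer_side`; never enlarge."""
    longest = max(width, height)
    if longest <= longer_side:
        return width, height
    scale = longer_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def keep_original(original_bytes, sampled_bytes, min_saving=0.10):
    """Hypothetical rule: keep the original unless resampling saves
    at least `min_saving` of the file size."""
    return sampled_bytes > original_bytes * (1 - min_saving)
```

With these assumptions, a 3840x2160 image would be sampled to 1920x1080, while a 1080x1920 image would be left at its own size.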
There are also some rips for Konachan and Yande-re on the Sukebei tracker, sampled in the same way. With nipples.