PetaVision  Alpha
stb_image.h
1 /* stb_image - v2.12 - public domain image loader - http://nothings.org/stb_image.h
2  no warranty implied; use at your own risk
3 
4  Do this:
5  #define STB_IMAGE_IMPLEMENTATION
6  before you include this file in *one* C or C++ file to create the implementation.
7 
8  // i.e. it should look like this:
9  #include ...
10  #include ...
11  #include ...
12  #define STB_IMAGE_IMPLEMENTATION
13  #include "stb_image.h"
14 
15  You can #define STBI_ASSERT(x) before the #include to avoid using assert.h.
16  And #define STBI_MALLOC, STBI_REALLOC, and STBI_FREE to avoid using malloc,realloc,free
17 
18 
19  QUICK NOTES:
20  Primarily of interest to game developers and other people who can
21  avoid problematic images and only need the trivial interface
22 
23  JPEG baseline & progressive (12 bpc/arithmetic not supported, same as stock IJG lib)
24  PNG 1/2/4/8-bit-per-channel (16 bpc not supported)
25 
26  TGA (not sure what subset, if a subset)
27  BMP non-1bpp, non-RLE
28  PSD (composited view only, no extra channels, 8/16 bit-per-channel)
29 
30  GIF (*comp always reports as 4-channel)
31  HDR (radiance rgbE format)
32  PIC (Softimage PIC)
33  PNM (PPM and PGM binary only)
34 
35  Animated GIF still needs a proper API, but here's one way to do it:
36  http://gist.github.com/urraka/685d9a6340b26b830d49
37 
38  - decode from memory or through FILE (define STBI_NO_STDIO to remove code)
39  - decode from arbitrary I/O callbacks
40  - SIMD acceleration on x86/x64 (SSE2) and ARM (NEON)
41 
42  Full documentation under "DOCUMENTATION" below.
43 
44 
45  Revision 2.00 release notes:
46 
47  - Progressive JPEG is now supported.
48 
49  - PPM and PGM binary formats are now supported, thanks to Ken Miller.
50 
51  - x86 platforms now make use of SSE2 SIMD instructions for
52  JPEG decoding, and ARM platforms can use NEON SIMD if requested.
53  This work was done by Fabian "ryg" Giesen. SSE2 is used by
54  default, but NEON must be enabled explicitly; see docs.
55 
56  With other JPEG optimizations included in this version, we see
57  2x speedup on a JPEG on an x86 machine, and a 1.5x speedup
58  on a JPEG on an ARM machine, relative to previous versions of this
59  library. The same results will not obtain for all JPGs and for all
60  x86/ARM machines. (Note that progressive JPEGs are significantly
61  slower to decode than regular JPEGs.) This doesn't mean that this
62  is the fastest JPEG decoder in the land; rather, it brings it
63  closer to parity with standard libraries. If you want the fastest
64  decode, look elsewhere. (See "Philosophy" section of docs below.)
65 
66  See final bullet items below for more info on SIMD.
67 
68  - Added STBI_MALLOC, STBI_REALLOC, and STBI_FREE macros for replacing
69  the memory allocator. Unlike other STBI libraries, these macros don't
70  support a context parameter, so if you need to pass a context in to
71  the allocator, you'll have to store it in a global or a thread-local
72  variable.
73 
74  - Split existing STBI_NO_HDR flag into two flags, STBI_NO_HDR and
75  STBI_NO_LINEAR.
76  STBI_NO_HDR: suppress implementation of .hdr reader format
77  STBI_NO_LINEAR: suppress high-dynamic-range light-linear float API
78 
79  - You can suppress implementation of any of the decoders to reduce
80  your code footprint by #defining one or more of the following
81  symbols before creating the implementation.
82 
83  STBI_NO_JPEG
84  STBI_NO_PNG
85  STBI_NO_BMP
86  STBI_NO_PSD
87  STBI_NO_TGA
88  STBI_NO_GIF
89  STBI_NO_HDR
90  STBI_NO_PIC
91  STBI_NO_PNM (.ppm and .pgm)
92 
93  - You can request *only* certain decoders and suppress all other ones
94  (this will be more forward-compatible, as addition of new decoders
95  doesn't require you to disable them explicitly):
96 
97  STBI_ONLY_JPEG
98  STBI_ONLY_PNG
99  STBI_ONLY_BMP
100  STBI_ONLY_PSD
101  STBI_ONLY_TGA
102  STBI_ONLY_GIF
103  STBI_ONLY_HDR
104  STBI_ONLY_PIC
105  STBI_ONLY_PNM (.ppm and .pgm)
106 
107  Note that you can define multiples of these, and you will get all
108  of them ("only x" and "only y" is interpreted to mean "only x&y").
109 
110  - If you use STBI_NO_PNG (or _ONLY_ without PNG), and you still
111  want the zlib decoder to be available, #define STBI_SUPPORT_ZLIB
112 
113  - Compilation of all SIMD code can be suppressed with
114  #define STBI_NO_SIMD
115  It should not be necessary to disable SIMD unless you have issues
116  compiling (e.g. using an x86 compiler which doesn't support SSE
117  intrinsics or that doesn't support the method used to detect
118  SSE2 support at run-time), and even those can be reported as
119  bugs so I can refine the built-in compile-time checking to be
120  smarter.
121 
122  - The old STBI_SIMD system which allowed installing a user-defined
123  IDCT etc. has been removed. If you need this, don't upgrade. My
124  assumption is that almost nobody was doing this, and those who
125  were will find the built-in SIMD more satisfactory anyway.
126 
127  - RGB values computed for JPEG images are slightly different from
128  previous versions of stb_image. (This is due to using less
129  integer precision in SIMD.) The C code has been adjusted so
130  that the same RGB values will be computed regardless of whether
131  SIMD support is available, so your app should always produce
132  consistent results. But these results are slightly different from
133  previous versions. (Specifically, about 3% of available YCbCr values
134  will compute different RGB results from pre-1.49 versions by +-1;
135  most of the deviating values are one smaller in the G channel.)
136 
137  - If you must produce consistent results with previous versions of
138  stb_image, #define STBI_JPEG_OLD and you will get the same results
139  you used to; however, you will not get the SIMD speedups for
140  the YCbCr-to-RGB conversion step (although you should still see
141  significant JPEG speedup from the other changes).
142 
143  Please note that STBI_JPEG_OLD is a temporary feature; it will be
144  removed in future versions of the library. It is only intended for
145  near-term back-compatibility use.
146 
147 
148  Latest revision history:
149  2.12 (2016-04-02) fix typo in 2.11 PSD fix that caused crashes
150  2.11 (2016-04-02) 16-bit PNGS; enable SSE2 in non-gcc x64
151  RGB-format JPEG; remove white matting in PSD;
152  allocate large structures on the stack;
153  correct channel count for PNG & BMP
154  2.10 (2016-01-22) avoid warning introduced in 2.09
155  2.09 (2016-01-16) 16-bit TGA; comments in PNM files; STBI_REALLOC_SIZED
156  2.08 (2015-09-13) fix to 2.07 cleanup, reading RGB PSD as RGBA
157  2.07 (2015-09-13) partial animated GIF support
158  limited 16-bit PSD support
159  minor bugs, code cleanup, and compiler warnings
160  2.06 (2015-04-19) fix bug where PSD returns wrong '*comp' value
161  2.05 (2015-04-19) fix bug in progressive JPEG handling, fix warning
162  2.04 (2015-04-15) try to re-enable SIMD on MinGW 64-bit
163  2.03 (2015-04-12) additional corruption checking
164  stbi_set_flip_vertically_on_load
165  fix NEON support; fix mingw support
166  2.02 (2015-01-19) fix incorrect assert, fix warning
167  2.01 (2015-01-17) fix various warnings
168  2.00b (2014-12-25) fix STBI_MALLOC in progressive JPEG
169  2.00 (2014-12-25) optimize JPEG, including x86 SSE2 & ARM NEON SIMD
170  progressive JPEG
171  PGM/PPM support
172  STBI_MALLOC,STBI_REALLOC,STBI_FREE
173  STBI_NO_*, STBI_ONLY_*
174  GIF bugfix
175 
176  See end of file for full revision history.
177 
178 
179  ============================ Contributors =========================
180 
181  Image formats Extensions, features
182  Sean Barrett (jpeg, png, bmp) Jetro Lauha (stbi_info)
183  Nicolas Schulz (hdr, psd) Martin "SpartanJ" Golini (stbi_info)
184  Jonathan Dummer (tga) James "moose2000" Brown (iPhone PNG)
185  Jean-Marc Lienher (gif) Ben "Disch" Wenger (io callbacks)
186  Tom Seddon (pic) Omar Cornut (1/2/4-bit PNG)
187  Thatcher Ulrich (psd) Nicolas Guillemot (vertical flip)
188  Ken Miller (pgm, ppm) Richard Mitton (16-bit PSD)
189  urraka@github (animated gif) Junggon Kim (PNM comments)
190  Daniel Gibson (16-bit TGA)
191 
192  Optimizations & bugfixes
193  Fabian "ryg" Giesen
194  Arseny Kapoulkine
195 
196  Bug & warning fixes
197  Marc LeBlanc David Woo Guillaume George Martins Mozeiko
198  Christpher Lloyd Martin Golini Jerry Jansson Joseph Thomson
199  Dave Moore Roy Eltham Hayaki Saito Phil Jordan
200  Won Chun Luke Graham Johan Duparc Nathan Reed
201  the Horde3D community Thomas Ruf Ronny Chevalier Nick Verigakis
202  Janez Zemva John Bartholomew Michal Cichon svdijk@github
203  Jonathan Blow Ken Hamada Tero Hanninen Baldur Karlsson
204  Laurent Gomila Cort Stratton Sergio Gonzalez romigrou@github
205  Aruelien Pocheville Thibault Reuille Cass Everitt Matthew Gregan
206  Ryamond Barbiero Paul Du Bois Engin Manap snagar@github
207  Michaelangel007@github Oriol Ferrer Mesia socks-the-fox
208  Blazej Dariusz Roszkowski
209 
210 
211 LICENSE
212 
213 This software is dual-licensed to the public domain and under the following
214 license: you are granted a perpetual, irrevocable license to copy, modify,
215 publish, and distribute this file as you see fit.
216 
217 */
218 
219 #ifndef STBI_INCLUDE_STB_IMAGE_H
220 #define STBI_INCLUDE_STB_IMAGE_H
221 
222 // DOCUMENTATION
223 //
224 // Limitations:
225 // - no 16-bit-per-channel PNG
226 // - no 12-bit-per-channel JPEG
227 // - no JPEGs with arithmetic coding
228 // - no 1-bit BMP
229 // - GIF always returns *comp=4
230 //
231 // Basic usage (see HDR discussion below for HDR usage):
232 // int x,y,n;
233 // unsigned char *data = stbi_load(filename, &x, &y, &n, 0);
234 // // ... process data if not NULL ...
235 // // ... x = width, y = height, n = # 8-bit components per pixel ...
236 // // ... replace '0' with '1'..'4' to force that many components per pixel
237 // // ... but 'n' will always be the number that it would have been if you said 0
238 // stbi_image_free(data)
239 //
240 // Standard parameters:
241 // int *x -- outputs image width in pixels
242 // int *y -- outputs image height in pixels
243 // int *comp -- outputs # of image components in image file
244 // int req_comp -- if non-zero, # of image components requested in result
245 //
246 // The return value from an image loader is an 'unsigned char *' which points
247 // to the pixel data, or NULL on an allocation failure or if the image is
248 // corrupt or invalid. The pixel data consists of *y scanlines of *x pixels,
249 // with each pixel consisting of N interleaved 8-bit components; the first
250 // pixel pointed to is top-left-most in the image. There is no padding between
251 // image scanlines or between pixels, regardless of format. The number of
252 // components N is 'req_comp' if req_comp is non-zero, or *comp otherwise.
253 // If req_comp is non-zero, *comp has the number of components that _would_
254 // have been output otherwise. E.g. if you set req_comp to 4, you will always
255 // get RGBA output, but you can check *comp to see if it's trivially opaque
256 // because e.g. there were only 3 channels in the source image.
257 //
258 // An output image with N components has the following components interleaved
259 // in this order in each pixel:
260 //
261 // N=#comp components
262 // 1 grey
263 // 2 grey, alpha
264 // 3 red, green, blue
265 // 4 red, green, blue, alpha
266 //
267 // If image loading fails for any reason, the return value will be NULL,
268 // and *x, *y, *comp will be unchanged. The function stbi_failure_reason()
269 // can be queried for an extremely brief, end-user unfriendly explanation
270 // of why the load failed. Define STBI_NO_FAILURE_STRINGS to avoid
271 // compiling these strings at all, and STBI_FAILURE_USERMSG to get slightly
272 // more user-friendly ones.
273 //
274 // Paletted PNG, BMP, GIF, and PIC images are automatically depalettized.
275 //
276 // ===========================================================================
277 //
278 // Philosophy
279 //
280 // stb libraries are designed with the following priorities:
281 //
282 // 1. easy to use
283 // 2. easy to maintain
284 // 3. good performance
285 //
286 // Sometimes I let "good performance" creep up in priority over "easy to maintain",
287 // and for best performance I may provide less-easy-to-use APIs that give higher
288 // performance, in addition to the easy to use ones. Nevertheless, it's important
289 // to keep in mind that from the standpoint of you, a client of this library,
290 // all you care about is #1 and #3, and stb libraries do not emphasize #3 above all.
291 //
292 // Some secondary priorities arise directly from the first two, some of which
293 // make more explicit reasons why performance can't be emphasized.
294 //
295 // - Portable ("ease of use")
296 // - Small footprint ("easy to maintain")
297 // - No dependencies ("ease of use")
298 //
299 // ===========================================================================
300 //
301 // I/O callbacks
302 //
303 // I/O callbacks allow you to read from arbitrary sources, like packaged
304 // files or some other source. Data read from callbacks are processed
305 // through a small internal buffer (currently 128 bytes) to try to reduce
306 // overhead.
307 //
308 // The three functions you must define are "read" (reads some bytes of data),
309 // "skip" (skips some bytes of data), "eof" (reports if the stream is at the end).
310 //
311 // ===========================================================================
312 //
313 // SIMD support
314 //
315 // The JPEG decoder will try to automatically use SIMD kernels on x86 when
316 // supported by the compiler. For ARM Neon support, you must explicitly
317 // request it.
318 //
319 // (The old do-it-yourself SIMD API is no longer supported in the current
320 // code.)
321 //
322 // On x86, SSE2 will automatically be used when available based on a run-time
323 // test; if not, the generic C versions are used as a fall-back. On ARM targets,
324 // the typical path is to have separate builds for NEON and non-NEON devices
325 // (at least this is true for iOS and Android). Therefore, the NEON support is
326 // toggled by a build flag: define STBI_NEON to get NEON loops.
327 //
328 // The output of the JPEG decoder is slightly different from versions where
329 // SIMD support was introduced (that is, for versions before 1.49). The
330 // difference is only +-1 in the 8-bit RGB channels, and only on a small
331 // fraction of pixels. You can force the pre-1.49 behavior by defining
332 // STBI_JPEG_OLD, but this will disable some of the SIMD decoding path
333 // and hence cost some performance.
334 //
335 // If for some reason you do not want to use any of SIMD code, or if
336 // you have issues compiling it, you can disable it entirely by
337 // defining STBI_NO_SIMD.
338 //
339 // ===========================================================================
340 //
341 // HDR image support (disable by defining STBI_NO_HDR)
342 //
343 // stb_image now supports loading HDR images in general, and currently
344 // the Radiance .HDR file format, although the support is provided
345 // generically. You can still load any file through the existing interface;
346 // if you attempt to load an HDR file, it will be automatically remapped to
347 // LDR, assuming gamma 2.2 and an arbitrary scale factor defaulting to 1;
348 // both of these constants can be reconfigured through this interface:
349 //
350 // stbi_hdr_to_ldr_gamma(2.2f);
351 // stbi_hdr_to_ldr_scale(1.0f);
352 //
353 // (note, do not use _inverse_ constants; stbi_image will invert them
354 // appropriately).
355 //
356 // Additionally, there is a new, parallel interface for loading files as
357 // (linear) floats to preserve the full dynamic range:
358 //
359 // float *data = stbi_loadf(filename, &x, &y, &n, 0);
360 //
361 // If you load LDR images through this interface, those images will
362 // be promoted to floating point values, run through the inverse of
363 // constants corresponding to the above:
364 //
365 // stbi_ldr_to_hdr_scale(1.0f);
366 // stbi_ldr_to_hdr_gamma(2.2f);
367 //
368 // Finally, given a filename (or an open file or memory block--see header
369 // file for details) containing image data, you can query for the "most
370 // appropriate" interface to use (that is, whether the image is HDR or
371 // not), using:
372 //
373 // stbi_is_hdr(char *filename);
374 //
375 // ===========================================================================
376 //
377 // iPhone PNG support:
378 //
379 // By default we convert iphone-formatted PNGs back to RGB, even though
380 // they are internally encoded differently. You can disable this conversion
381 // by by calling stbi_convert_iphone_png_to_rgb(0), in which case
382 // you will always just get the native iphone "format" through (which
383 // is BGR stored in RGB).
384 //
385 // Call stbi_set_unpremultiply_on_load(1) as well to force a divide per
386 // pixel to remove any premultiplied alpha *only* if the image file explicitly
387 // says there's premultiplied data (currently only happens in iPhone images,
388 // and only if iPhone convert-to-rgb processing is on).
389 //
390 
391 #ifndef STBI_NO_STDIO
392 #include <stdio.h>
393 #endif // STBI_NO_STDIO
394 
395 #define STBI_VERSION 1
396 
397 enum {
398  STBI_default = 0, // only used for req_comp
399 
400  STBI_grey = 1,
401  STBI_grey_alpha = 2,
402  STBI_rgb = 3,
403  STBI_rgb_alpha = 4
404 };
405 
406 typedef unsigned char stbi_uc;
407 
408 #ifdef __cplusplus
409 extern "C" {
410 #endif
411 
412 #ifdef STB_IMAGE_STATIC
413 #define STBIDEF static
414 #else
415 #define STBIDEF extern
416 #endif
417 
419 //
420 // PRIMARY API - works on images of any type
421 //
422 
423 //
424 // load image by filename, open file, or memory buffer
425 //
426 
427 typedef struct {
428  int (*read)(
429  void *user,
430  char *data,
431  int size); // fill 'data' with 'size' bytes. return number of bytes actually read
432  void (*skip)(
433  void *user,
434  int n); // skip the next 'n' bytes, or 'unget' the last -n bytes if negative
435  int (*eof)(void *user); // returns nonzero if we are at end of file/data
437 
438 STBIDEF stbi_uc *stbi_load(char const *filename, int *x, int *y, int *comp, int req_comp);
439 STBIDEF stbi_uc *
440 stbi_load_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp);
441 STBIDEF stbi_uc *stbi_load_from_callbacks(
442  stbi_io_callbacks const *clbk,
443  void *user,
444  int *x,
445  int *y,
446  int *comp,
447  int req_comp);
448 
449 #ifndef STBI_NO_STDIO
450 STBIDEF stbi_uc *stbi_load_from_file(FILE *f, int *x, int *y, int *comp, int req_comp);
451 // for stbi_load_from_file, file pointer is left pointing immediately after image
452 #endif
453 
454 #ifndef STBI_NO_LINEAR
455 STBIDEF float *stbi_loadf(char const *filename, int *x, int *y, int *comp, int req_comp);
456 STBIDEF float *
457 stbi_loadf_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp);
458 STBIDEF float *stbi_loadf_from_callbacks(
459  stbi_io_callbacks const *clbk,
460  void *user,
461  int *x,
462  int *y,
463  int *comp,
464  int req_comp);
465 
466 #ifndef STBI_NO_STDIO
467 STBIDEF float *stbi_loadf_from_file(FILE *f, int *x, int *y, int *comp, int req_comp);
468 #endif
469 #endif
470 
471 #ifndef STBI_NO_HDR
472 STBIDEF void stbi_hdr_to_ldr_gamma(float gamma);
473 STBIDEF void stbi_hdr_to_ldr_scale(float scale);
474 #endif // STBI_NO_HDR
475 
476 #ifndef STBI_NO_LINEAR
477 STBIDEF void stbi_ldr_to_hdr_gamma(float gamma);
478 STBIDEF void stbi_ldr_to_hdr_scale(float scale);
479 #endif // STBI_NO_LINEAR
480 
481 // stbi_is_hdr is always defined, but always returns false if STBI_NO_HDR
482 STBIDEF int stbi_is_hdr_from_callbacks(stbi_io_callbacks const *clbk, void *user);
483 STBIDEF int stbi_is_hdr_from_memory(stbi_uc const *buffer, int len);
484 #ifndef STBI_NO_STDIO
485 STBIDEF int stbi_is_hdr(char const *filename);
486 STBIDEF int stbi_is_hdr_from_file(FILE *f);
487 #endif // STBI_NO_STDIO
488 
489 // get a VERY brief reason for failure
490 // NOT THREADSAFE
491 STBIDEF const char *stbi_failure_reason(void);
492 
493 // free the loaded image -- this is just free()
494 STBIDEF void stbi_image_free(void *retval_from_stbi_load);
495 
496 // get image dimensions & components without fully decoding
497 STBIDEF int stbi_info_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp);
498 STBIDEF int
499 stbi_info_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp);
500 
501 #ifndef STBI_NO_STDIO
502 STBIDEF int stbi_info(char const *filename, int *x, int *y, int *comp);
503 STBIDEF int stbi_info_from_file(FILE *f, int *x, int *y, int *comp);
504 
505 #endif
506 
507 // for image formats that explicitly notate that they have premultiplied alpha,
508 // we just return the colors as stored in the file. set this flag to force
509 // unpremultiplication. results are undefined if the unpremultiply overflow.
510 STBIDEF void stbi_set_unpremultiply_on_load(int flag_true_if_should_unpremultiply);
511 
512 // indicate whether we should process iphone images back to canonical format,
513 // or just pass them through "as-is"
514 STBIDEF void stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert);
515 
516 // flip the image vertically, so the first pixel in the output array is the bottom left
517 STBIDEF void stbi_set_flip_vertically_on_load(int flag_true_if_should_flip);
518 
519 // ZLIB client - used by PNG, available for other purposes
520 
521 STBIDEF char *
522 stbi_zlib_decode_malloc_guesssize(const char *buffer, int len, int initial_size, int *outlen);
523 STBIDEF char *stbi_zlib_decode_malloc_guesssize_headerflag(
524  const char *buffer,
525  int len,
526  int initial_size,
527  int *outlen,
528  int parse_header);
529 STBIDEF char *stbi_zlib_decode_malloc(const char *buffer, int len, int *outlen);
530 STBIDEF int stbi_zlib_decode_buffer(char *obuffer, int olen, const char *ibuffer, int ilen);
531 
532 STBIDEF char *stbi_zlib_decode_noheader_malloc(const char *buffer, int len, int *outlen);
533 STBIDEF int
534 stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const char *ibuffer, int ilen);
535 
536 #ifdef __cplusplus
537 }
538 #endif
539 
540 //
541 //
543 #endif // STBI_INCLUDE_STB_IMAGE_H
544 
545 #ifdef STB_IMAGE_IMPLEMENTATION
546 
547 #if defined(STBI_ONLY_JPEG) || defined(STBI_ONLY_PNG) || defined(STBI_ONLY_BMP) \
548  || defined(STBI_ONLY_TGA) || defined(STBI_ONLY_GIF) || defined(STBI_ONLY_PSD) \
549  || defined(STBI_ONLY_HDR) || defined(STBI_ONLY_PIC) || defined(STBI_ONLY_PNM) \
550  || defined(STBI_ONLY_ZLIB)
551 #ifndef STBI_ONLY_JPEG
552 #define STBI_NO_JPEG
553 #endif
554 #ifndef STBI_ONLY_PNG
555 #define STBI_NO_PNG
556 #endif
557 #ifndef STBI_ONLY_BMP
558 #define STBI_NO_BMP
559 #endif
560 #ifndef STBI_ONLY_PSD
561 #define STBI_NO_PSD
562 #endif
563 #ifndef STBI_ONLY_TGA
564 #define STBI_NO_TGA
565 #endif
566 #ifndef STBI_ONLY_GIF
567 #define STBI_NO_GIF
568 #endif
569 #ifndef STBI_ONLY_HDR
570 #define STBI_NO_HDR
571 #endif
572 #ifndef STBI_ONLY_PIC
573 #define STBI_NO_PIC
574 #endif
575 #ifndef STBI_ONLY_PNM
576 #define STBI_NO_PNM
577 #endif
578 #endif
579 
580 #if defined(STBI_NO_PNG) && !defined(STBI_SUPPORT_ZLIB) && !defined(STBI_NO_ZLIB)
581 #define STBI_NO_ZLIB
582 #endif
583 
584 #include <stdarg.h>
585 #include <stddef.h> // ptrdiff_t on osx
586 #include <stdlib.h>
587 #include <string.h>
588 
589 #if !defined(STBI_NO_LINEAR) || !defined(STBI_NO_HDR)
590 #include <math.h> // ldexp
591 #endif
592 
593 #ifndef STBI_NO_STDIO
594 #include <stdio.h>
595 #endif
596 
597 #ifndef STBI_ASSERT
598 #include <assert.h>
599 #define STBI_ASSERT(x) assert(x)
600 #endif
601 
602 #ifndef _MSC_VER
603 #ifdef __cplusplus
604 #define stbi_inline inline
605 #else
606 #define stbi_inline
607 #endif
608 #else
609 #define stbi_inline __forceinline
610 #endif
611 
612 #ifdef _MSC_VER
613 typedef unsigned short stbi__uint16;
614 typedef signed short stbi__int16;
615 typedef unsigned int stbi__uint32;
616 typedef signed int stbi__int32;
617 #else
618 #include <stdint.h>
619 typedef uint16_t stbi__uint16;
620 typedef int16_t stbi__int16;
621 typedef uint32_t stbi__uint32;
622 typedef int32_t stbi__int32;
623 #endif
624 
625 // should produce compiler error if size is wrong
626 typedef unsigned char validate_uint32[sizeof(stbi__uint32) == 4 ? 1 : -1];
627 
628 #ifdef _MSC_VER
629 #define STBI_NOTUSED(v) (void)(v)
630 #else
631 #define STBI_NOTUSED(v) (void)sizeof(v)
632 #endif
633 
634 #ifdef _MSC_VER
635 #define STBI_HAS_LROTL
636 #endif
637 
638 #ifdef STBI_HAS_LROTL
639 #define stbi_lrot(x, y) _lrotl(x, y)
640 #else
641 #define stbi_lrot(x, y) (((x) << (y)) | ((x) >> (32 - (y))))
642 #endif
643 
644 #if defined(STBI_MALLOC) && defined(STBI_FREE) \
645  && (defined(STBI_REALLOC) || defined(STBI_REALLOC_SIZED))
646 // ok
647 #elif !defined(STBI_MALLOC) && !defined(STBI_FREE) && !defined(STBI_REALLOC) \
648  && !defined(STBI_REALLOC_SIZED)
649 // ok
650 #else
651 #error \
652  "Must define all or none of STBI_MALLOC, STBI_FREE, and STBI_REALLOC (or STBI_REALLOC_SIZED)."
653 #endif
654 
655 #ifndef STBI_MALLOC
656 #define STBI_MALLOC(sz) malloc(sz)
657 #define STBI_REALLOC(p, newsz) realloc(p, newsz)
658 #define STBI_FREE(p) free(p)
659 #endif
660 
661 #ifndef STBI_REALLOC_SIZED
662 #define STBI_REALLOC_SIZED(p, oldsz, newsz) STBI_REALLOC(p, newsz)
663 #endif
664 
665 // x86/x64 detection
666 #if defined(__x86_64__) || defined(_M_X64)
667 #define STBI__X64_TARGET
668 #elif defined(__i386) || defined(_M_IX86)
669 #define STBI__X86_TARGET
670 #endif
671 
672 #if defined(__GNUC__) && (defined(STBI__X86_TARGET) || defined(STBI__X64_TARGET)) \
673  && !defined(__SSE2__) && !defined(STBI_NO_SIMD)
674 // NOTE: not clear do we actually need this for the 64-bit path?
675 // gcc doesn't support sse2 intrinsics unless you compile with -msse2,
676 // (but compiling with -msse2 allows the compiler to use SSE2 everywhere;
677 // this is just broken and gcc are jerks for not fixing it properly
678 // http://www.virtualdub.org/blog/pivot/entry.php?id=363 )
679 #define STBI_NO_SIMD
680 #endif
681 
682 #if defined(__MINGW32__) && defined(STBI__X86_TARGET) && !defined(STBI_MINGW_ENABLE_SSE2) \
683  && !defined(STBI_NO_SIMD)
684 // Note that __MINGW32__ doesn't actually mean 32-bit, so we have to avoid STBI__X64_TARGET
685 //
686 // 32-bit MinGW wants ESP to be 16-byte aligned, but this is not in the
687 // Windows ABI and VC++ as well as Windows DLLs don't maintain that invariant.
688 // As a result, enabling SSE2 on 32-bit MinGW is dangerous when not
689 // simultaneously enabling "-mstackrealign".
690 //
691 // See https://github.com/nothings/stb/issues/81 for more information.
692 //
693 // So default to no SSE2 on 32-bit MinGW. If you've read this far and added
694 // -mstackrealign to your build settings, feel free to #define STBI_MINGW_ENABLE_SSE2.
695 #define STBI_NO_SIMD
696 #endif
697 
698 #if !defined(STBI_NO_SIMD) && (defined(STBI__X86_TARGET) || defined(STBI__X64_TARGET))
699 #define STBI_SSE2
700 #include <emmintrin.h>
701 
702 #ifdef _MSC_VER
703 
704 #if _MSC_VER >= 1400 // not VC6
705 #include <intrin.h> // __cpuid
706 static int stbi__cpuid3(void) {
707  int info[4];
708  __cpuid(info, 1);
709  return info[3];
710 }
711 #else
712 static int stbi__cpuid3(void) {
713  int res;
714  __asm {
715  mov eax,1
716  cpuid
717  mov res,edx
718  }
719  return res;
720 }
721 #endif
722 
723 #define STBI_SIMD_ALIGN(type, name) __declspec(align(16)) type name
724 
725 static int stbi__sse2_available() {
726  int info3 = stbi__cpuid3();
727  return ((info3 >> 26) & 1) != 0;
728 }
729 #else // assume GCC-style if not VC++
730 #define STBI_SIMD_ALIGN(type, name) type name __attribute__((aligned(16)))
731 
732 static int stbi__sse2_available() {
733 #if defined(__GNUC__) && (__GNUC__ * 100 + __GNUC_MINOR__) >= 408 // GCC 4.8 or later
734  // GCC 4.8+ has a nice way to do this
735  return __builtin_cpu_supports("sse2");
736 #else
737  // portable way to do this, preferably without using GCC inline ASM?
738  // just bail for now.
739  return 0;
740 #endif
741 }
742 #endif
743 #endif
744 
745 // ARM NEON
746 #if defined(STBI_NO_SIMD) && defined(STBI_NEON)
747 #undef STBI_NEON
748 #endif
749 
750 #ifdef STBI_NEON
751 #include <arm_neon.h>
752 // assume GCC or Clang on ARM targets
753 #define STBI_SIMD_ALIGN(type, name) type name __attribute__((aligned(16)))
754 #endif
755 
756 #ifndef STBI_SIMD_ALIGN
757 #define STBI_SIMD_ALIGN(type, name) type name
758 #endif
759 
761 //
762 // stbi__context struct and start_xxx functions
763 
764 // stbi__context structure is our basic context used by all images, so it
765 // contains all the IO context, plus some basic image information
766 typedef struct {
767  stbi__uint32 img_x, img_y;
768  int img_n, img_out_n;
769 
771  void *io_user_data;
772 
773  int read_from_callbacks;
774  int buflen;
775  stbi_uc buffer_start[128];
776 
777  stbi_uc *img_buffer, *img_buffer_end;
778  stbi_uc *img_buffer_original, *img_buffer_original_end;
779 } stbi__context;
780 
781 static void stbi__refill_buffer(stbi__context *s);
782 
783 // initialize a memory-decode context
784 static void stbi__start_mem(stbi__context *s, stbi_uc const *buffer, int len) {
785  s->io.read = NULL;
786  s->read_from_callbacks = 0;
787  s->img_buffer = s->img_buffer_original = (stbi_uc *)buffer;
788  s->img_buffer_end = s->img_buffer_original_end = (stbi_uc *)buffer + len;
789 }
790 
791 // initialize a callback-based context
792 static void stbi__start_callbacks(stbi__context *s, stbi_io_callbacks *c, void *user) {
793  s->io = *c;
794  s->io_user_data = user;
795  s->buflen = sizeof(s->buffer_start);
796  s->read_from_callbacks = 1;
797  s->img_buffer_original = s->buffer_start;
798  stbi__refill_buffer(s);
799  s->img_buffer_original_end = s->img_buffer_end;
800 }
801 
802 #ifndef STBI_NO_STDIO
803 
804 static int stbi__stdio_read(void *user, char *data, int size) {
805  return (int)fread(data, 1, size, (FILE *)user);
806 }
807 
808 static void stbi__stdio_skip(void *user, int n) { fseek((FILE *)user, n, SEEK_CUR); }
809 
810 static int stbi__stdio_eof(void *user) { return feof((FILE *)user); }
811 
812 static stbi_io_callbacks stbi__stdio_callbacks = {
813  stbi__stdio_read,
814  stbi__stdio_skip,
815  stbi__stdio_eof,
816 };
817 
818 static void stbi__start_file(stbi__context *s, FILE *f) {
819  stbi__start_callbacks(s, &stbi__stdio_callbacks, (void *)f);
820 }
821 
822 // static void stop_file(stbi__context *s) { }
823 
824 #endif // !STBI_NO_STDIO
825 
826 static void stbi__rewind(stbi__context *s) {
827  // conceptually rewind SHOULD rewind to the beginning of the stream,
828  // but we just rewind to the beginning of the initial buffer, because
829  // we only use it after doing 'test', which only ever looks at at most 92 bytes
830  s->img_buffer = s->img_buffer_original;
831  s->img_buffer_end = s->img_buffer_original_end;
832 }
833 
834 #ifndef STBI_NO_JPEG
835 static int stbi__jpeg_test(stbi__context *s);
836 static stbi_uc *stbi__jpeg_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
837 static int stbi__jpeg_info(stbi__context *s, int *x, int *y, int *comp);
838 #endif
839 
840 #ifndef STBI_NO_PNG
841 static int stbi__png_test(stbi__context *s);
842 static stbi_uc *stbi__png_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
843 static int stbi__png_info(stbi__context *s, int *x, int *y, int *comp);
844 #endif
845 
846 #ifndef STBI_NO_BMP
847 static int stbi__bmp_test(stbi__context *s);
848 static stbi_uc *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
849 static int stbi__bmp_info(stbi__context *s, int *x, int *y, int *comp);
850 #endif
851 
852 #ifndef STBI_NO_TGA
853 static int stbi__tga_test(stbi__context *s);
854 static stbi_uc *stbi__tga_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
855 static int stbi__tga_info(stbi__context *s, int *x, int *y, int *comp);
856 #endif
857 
858 #ifndef STBI_NO_PSD
859 static int stbi__psd_test(stbi__context *s);
860 static stbi_uc *stbi__psd_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
861 static int stbi__psd_info(stbi__context *s, int *x, int *y, int *comp);
862 #endif
863 
864 #ifndef STBI_NO_HDR
865 static int stbi__hdr_test(stbi__context *s);
866 static float *stbi__hdr_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
867 static int stbi__hdr_info(stbi__context *s, int *x, int *y, int *comp);
868 #endif
869 
870 #ifndef STBI_NO_PIC
871 static int stbi__pic_test(stbi__context *s);
872 static stbi_uc *stbi__pic_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
873 static int stbi__pic_info(stbi__context *s, int *x, int *y, int *comp);
874 #endif
875 
876 #ifndef STBI_NO_GIF
877 static int stbi__gif_test(stbi__context *s);
878 static stbi_uc *stbi__gif_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
879 static int stbi__gif_info(stbi__context *s, int *x, int *y, int *comp);
880 #endif
881 
882 #ifndef STBI_NO_PNM
883 static int stbi__pnm_test(stbi__context *s);
884 static stbi_uc *stbi__pnm_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
885 static int stbi__pnm_info(stbi__context *s, int *x, int *y, int *comp);
886 #endif
887 
888 // this is not threadsafe
889 static const char *stbi__g_failure_reason;
890 
891 STBIDEF const char *stbi_failure_reason(void) { return stbi__g_failure_reason; }
892 
893 static int stbi__err(const char *str) {
894  stbi__g_failure_reason = str;
895  return 0;
896 }
897 
898 static void *stbi__malloc(size_t size) { return STBI_MALLOC(size); }
899 
900 // stbi__err - error
901 // stbi__errpf - error returning pointer to float
902 // stbi__errpuc - error returning pointer to unsigned char
903 
904 #ifdef STBI_NO_FAILURE_STRINGS
905 #define stbi__err(x, y) 0
906 #elif defined(STBI_FAILURE_USERMSG)
907 #define stbi__err(x, y) stbi__err(y)
908 #else
909 #define stbi__err(x, y) stbi__err(x)
910 #endif
911 
912 #define stbi__errpf(x, y) ((float *)(size_t)(stbi__err(x, y) ? NULL : NULL))
913 #define stbi__errpuc(x, y) ((unsigned char *)(size_t)(stbi__err(x, y) ? NULL : NULL))
914 
915 STBIDEF void stbi_image_free(void *retval_from_stbi_load) { STBI_FREE(retval_from_stbi_load); }
916 
917 #ifndef STBI_NO_LINEAR
918 static float *stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp);
919 #endif
920 
921 #ifndef STBI_NO_HDR
922 static stbi_uc *stbi__hdr_to_ldr(float *data, int x, int y, int comp);
923 #endif
924 
925 static int stbi__vertically_flip_on_load = 0;
926 
927 STBIDEF void stbi_set_flip_vertically_on_load(int flag_true_if_should_flip) {
928  stbi__vertically_flip_on_load = flag_true_if_should_flip;
929 }
930 
931 static unsigned char *stbi__load_main(stbi__context *s, int *x, int *y, int *comp, int req_comp) {
932 #ifndef STBI_NO_JPEG
933  if (stbi__jpeg_test(s))
934  return stbi__jpeg_load(s, x, y, comp, req_comp);
935 #endif
936 #ifndef STBI_NO_PNG
937  if (stbi__png_test(s))
938  return stbi__png_load(s, x, y, comp, req_comp);
939 #endif
940 #ifndef STBI_NO_BMP
941  if (stbi__bmp_test(s))
942  return stbi__bmp_load(s, x, y, comp, req_comp);
943 #endif
944 #ifndef STBI_NO_GIF
945  if (stbi__gif_test(s))
946  return stbi__gif_load(s, x, y, comp, req_comp);
947 #endif
948 #ifndef STBI_NO_PSD
949  if (stbi__psd_test(s))
950  return stbi__psd_load(s, x, y, comp, req_comp);
951 #endif
952 #ifndef STBI_NO_PIC
953  if (stbi__pic_test(s))
954  return stbi__pic_load(s, x, y, comp, req_comp);
955 #endif
956 #ifndef STBI_NO_PNM
957  if (stbi__pnm_test(s))
958  return stbi__pnm_load(s, x, y, comp, req_comp);
959 #endif
960 
961 #ifndef STBI_NO_HDR
962  if (stbi__hdr_test(s)) {
963  float *hdr = stbi__hdr_load(s, x, y, comp, req_comp);
964  return stbi__hdr_to_ldr(hdr, *x, *y, req_comp ? req_comp : *comp);
965  }
966 #endif
967 
968 #ifndef STBI_NO_TGA
969  // test tga last because it's a crappy test!
970  if (stbi__tga_test(s))
971  return stbi__tga_load(s, x, y, comp, req_comp);
972 #endif
973 
974  return stbi__errpuc("unknown image type", "Image not of any known type, or corrupt");
975 }
976 
977 static unsigned char *stbi__load_flip(stbi__context *s, int *x, int *y, int *comp, int req_comp) {
978  unsigned char *result = stbi__load_main(s, x, y, comp, req_comp);
979 
980  if (stbi__vertically_flip_on_load && result != NULL) {
981  int w = *x, h = *y;
982  int depth = req_comp ? req_comp : *comp;
983  int row, col, z;
984  stbi_uc temp;
985 
986  // @OPTIMIZE: use a bigger temp buffer and memcpy multiple pixels at once
987  for (row = 0; row < (h >> 1); row++) {
988  for (col = 0; col < w; col++) {
989  for (z = 0; z < depth; z++) {
990  temp = result[(row * w + col) * depth + z];
991  result[(row * w + col) * depth + z] = result[((h - row - 1) * w + col) * depth + z];
992  result[((h - row - 1) * w + col) * depth + z] = temp;
993  }
994  }
995  }
996  }
997 
998  return result;
999 }
1000 
1001 #ifndef STBI_NO_HDR
1002 static void stbi__float_postprocess(float *result, int *x, int *y, int *comp, int req_comp) {
1003  if (stbi__vertically_flip_on_load && result != NULL) {
1004  int w = *x, h = *y;
1005  int depth = req_comp ? req_comp : *comp;
1006  int row, col, z;
1007  float temp;
1008 
1009  // @OPTIMIZE: use a bigger temp buffer and memcpy multiple pixels at once
1010  for (row = 0; row < (h >> 1); row++) {
1011  for (col = 0; col < w; col++) {
1012  for (z = 0; z < depth; z++) {
1013  temp = result[(row * w + col) * depth + z];
1014  result[(row * w + col) * depth + z] = result[((h - row - 1) * w + col) * depth + z];
1015  result[((h - row - 1) * w + col) * depth + z] = temp;
1016  }
1017  }
1018  }
1019  }
1020 }
1021 #endif
1022 
1023 #ifndef STBI_NO_STDIO
1024 
1025 static FILE *stbi__fopen(char const *filename, char const *mode) {
1026  FILE *f;
1027 #if defined(_MSC_VER) && _MSC_VER >= 1400
1028  if (0 != fopen_s(&f, filename, mode))
1029  f = 0;
1030 #else
1031  f = fopen(filename, mode);
1032 #endif
1033  return f;
1034 }
1035 
1036 STBIDEF stbi_uc *stbi_load(char const *filename, int *x, int *y, int *comp, int req_comp) {
1037  FILE *f = stbi__fopen(filename, "rb");
1038  unsigned char *result;
1039  if (!f)
1040  return stbi__errpuc("can't fopen", "Unable to open file");
1041  result = stbi_load_from_file(f, x, y, comp, req_comp);
1042  fclose(f);
1043  return result;
1044 }
1045 
1046 STBIDEF stbi_uc *stbi_load_from_file(FILE *f, int *x, int *y, int *comp, int req_comp) {
1047  unsigned char *result;
1048  stbi__context s;
1049  stbi__start_file(&s, f);
1050  result = stbi__load_flip(&s, x, y, comp, req_comp);
1051  if (result) {
1052  // need to 'unget' all the characters in the IO buffer
1053  fseek(f, -(int)(s.img_buffer_end - s.img_buffer), SEEK_CUR);
1054  }
1055  return result;
1056 }
1057 #endif
1058 
1059 STBIDEF stbi_uc *
1060 stbi_load_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp) {
1061  stbi__context s;
1062  stbi__start_mem(&s, buffer, len);
1063  return stbi__load_flip(&s, x, y, comp, req_comp);
1064 }
1065 
1066 STBIDEF stbi_uc *stbi_load_from_callbacks(
1067  stbi_io_callbacks const *clbk,
1068  void *user,
1069  int *x,
1070  int *y,
1071  int *comp,
1072  int req_comp) {
1073  stbi__context s;
1074  stbi__start_callbacks(&s, (stbi_io_callbacks *)clbk, user);
1075  return stbi__load_flip(&s, x, y, comp, req_comp);
1076 }
1077 
1078 #ifndef STBI_NO_LINEAR
1079 static float *stbi__loadf_main(stbi__context *s, int *x, int *y, int *comp, int req_comp) {
1080  unsigned char *data;
1081 #ifndef STBI_NO_HDR
1082  if (stbi__hdr_test(s)) {
1083  float *hdr_data = stbi__hdr_load(s, x, y, comp, req_comp);
1084  if (hdr_data)
1085  stbi__float_postprocess(hdr_data, x, y, comp, req_comp);
1086  return hdr_data;
1087  }
1088 #endif
1089  data = stbi__load_flip(s, x, y, comp, req_comp);
1090  if (data)
1091  return stbi__ldr_to_hdr(data, *x, *y, req_comp ? req_comp : *comp);
1092  return stbi__errpf("unknown image type", "Image not of any known type, or corrupt");
1093 }
1094 
1095 STBIDEF float *
1096 stbi_loadf_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp) {
1097  stbi__context s;
1098  stbi__start_mem(&s, buffer, len);
1099  return stbi__loadf_main(&s, x, y, comp, req_comp);
1100 }
1101 
1102 STBIDEF float *stbi_loadf_from_callbacks(
1103  stbi_io_callbacks const *clbk,
1104  void *user,
1105  int *x,
1106  int *y,
1107  int *comp,
1108  int req_comp) {
1109  stbi__context s;
1110  stbi__start_callbacks(&s, (stbi_io_callbacks *)clbk, user);
1111  return stbi__loadf_main(&s, x, y, comp, req_comp);
1112 }
1113 
1114 #ifndef STBI_NO_STDIO
1115 STBIDEF float *stbi_loadf(char const *filename, int *x, int *y, int *comp, int req_comp) {
1116  float *result;
1117  FILE *f = stbi__fopen(filename, "rb");
1118  if (!f)
1119  return stbi__errpf("can't fopen", "Unable to open file");
1120  result = stbi_loadf_from_file(f, x, y, comp, req_comp);
1121  fclose(f);
1122  return result;
1123 }
1124 
1125 STBIDEF float *stbi_loadf_from_file(FILE *f, int *x, int *y, int *comp, int req_comp) {
1126  stbi__context s;
1127  stbi__start_file(&s, f);
1128  return stbi__loadf_main(&s, x, y, comp, req_comp);
1129 }
1130 #endif // !STBI_NO_STDIO
1131 
1132 #endif // !STBI_NO_LINEAR
1133 
1134 // these is-hdr-or-not is defined independent of whether STBI_NO_LINEAR is
1135 // defined, for API simplicity; if STBI_NO_LINEAR is defined, it always
1136 // reports false!
1137 
1138 STBIDEF int stbi_is_hdr_from_memory(stbi_uc const *buffer, int len) {
1139 #ifndef STBI_NO_HDR
1140  stbi__context s;
1141  stbi__start_mem(&s, buffer, len);
1142  return stbi__hdr_test(&s);
1143 #else
1144  STBI_NOTUSED(buffer);
1145  STBI_NOTUSED(len);
1146  return 0;
1147 #endif
1148 }
1149 
1150 #ifndef STBI_NO_STDIO
1151 STBIDEF int stbi_is_hdr(char const *filename) {
1152  FILE *f = stbi__fopen(filename, "rb");
1153  int result = 0;
1154  if (f) {
1155  result = stbi_is_hdr_from_file(f);
1156  fclose(f);
1157  }
1158  return result;
1159 }
1160 
1161 STBIDEF int stbi_is_hdr_from_file(FILE *f) {
1162 #ifndef STBI_NO_HDR
1163  stbi__context s;
1164  stbi__start_file(&s, f);
1165  return stbi__hdr_test(&s);
1166 #else
1167  STBI_NOTUSED(f);
1168  return 0;
1169 #endif
1170 }
1171 #endif // !STBI_NO_STDIO
1172 
1173 STBIDEF int stbi_is_hdr_from_callbacks(stbi_io_callbacks const *clbk, void *user) {
1174 #ifndef STBI_NO_HDR
1175  stbi__context s;
1176  stbi__start_callbacks(&s, (stbi_io_callbacks *)clbk, user);
1177  return stbi__hdr_test(&s);
1178 #else
1179  STBI_NOTUSED(clbk);
1180  STBI_NOTUSED(user);
1181  return 0;
1182 #endif
1183 }
1184 
1185 #ifndef STBI_NO_LINEAR
1186 static float stbi__l2h_gamma = 2.2f, stbi__l2h_scale = 1.0f;
1187 
1188 STBIDEF void stbi_ldr_to_hdr_gamma(float gamma) { stbi__l2h_gamma = gamma; }
1189 STBIDEF void stbi_ldr_to_hdr_scale(float scale) { stbi__l2h_scale = scale; }
1190 #endif
1191 
1192 static float stbi__h2l_gamma_i = 1.0f / 2.2f, stbi__h2l_scale_i = 1.0f;
1193 
1194 STBIDEF void stbi_hdr_to_ldr_gamma(float gamma) { stbi__h2l_gamma_i = 1 / gamma; }
1195 STBIDEF void stbi_hdr_to_ldr_scale(float scale) { stbi__h2l_scale_i = 1 / scale; }
1196 
1198 //
1199 // Common code used by all image loaders
1200 //
1201 
1202 enum { STBI__SCAN_load = 0, STBI__SCAN_type, STBI__SCAN_header };
1203 
1204 static void stbi__refill_buffer(stbi__context *s) {
1205  int n = (s->io.read)(s->io_user_data, (char *)s->buffer_start, s->buflen);
1206  if (n == 0) {
1207  // at end of file, treat same as if from memory, but need to handle case
1208  // where s->img_buffer isn't pointing to safe memory, e.g. 0-byte file
1209  s->read_from_callbacks = 0;
1210  s->img_buffer = s->buffer_start;
1211  s->img_buffer_end = s->buffer_start + 1;
1212  *s->img_buffer = 0;
1213  }
1214  else {
1215  s->img_buffer = s->buffer_start;
1216  s->img_buffer_end = s->buffer_start + n;
1217  }
1218 }
1219 
1220 stbi_inline static stbi_uc stbi__get8(stbi__context *s) {
1221  if (s->img_buffer < s->img_buffer_end)
1222  return *s->img_buffer++;
1223  if (s->read_from_callbacks) {
1224  stbi__refill_buffer(s);
1225  return *s->img_buffer++;
1226  }
1227  return 0;
1228 }
1229 
1230 stbi_inline static int stbi__at_eof(stbi__context *s) {
1231  if (s->io.read) {
1232  if (!(s->io.eof)(s->io_user_data))
1233  return 0;
1234  // if feof() is true, check if buffer = end
1235  // special case: we've only got the special 0 character at the end
1236  if (s->read_from_callbacks == 0)
1237  return 1;
1238  }
1239 
1240  return s->img_buffer >= s->img_buffer_end;
1241 }
1242 
1243 static void stbi__skip(stbi__context *s, int n) {
1244  if (n < 0) {
1245  s->img_buffer = s->img_buffer_end;
1246  return;
1247  }
1248  if (s->io.read) {
1249  int blen = (int)(s->img_buffer_end - s->img_buffer);
1250  if (blen < n) {
1251  s->img_buffer = s->img_buffer_end;
1252  (s->io.skip)(s->io_user_data, n - blen);
1253  return;
1254  }
1255  }
1256  s->img_buffer += n;
1257 }
1258 
1259 static int stbi__getn(stbi__context *s, stbi_uc *buffer, int n) {
1260  if (s->io.read) {
1261  int blen = (int)(s->img_buffer_end - s->img_buffer);
1262  if (blen < n) {
1263  int res, count;
1264 
1265  memcpy(buffer, s->img_buffer, blen);
1266 
1267  count = (s->io.read)(s->io_user_data, (char *)buffer + blen, n - blen);
1268  res = (count == (n - blen));
1269  s->img_buffer = s->img_buffer_end;
1270  return res;
1271  }
1272  }
1273 
1274  if (s->img_buffer + n <= s->img_buffer_end) {
1275  memcpy(buffer, s->img_buffer, n);
1276  s->img_buffer += n;
1277  return 1;
1278  }
1279  else
1280  return 0;
1281 }
1282 
1283 static int stbi__get16be(stbi__context *s) {
1284  int z = stbi__get8(s);
1285  return (z << 8) + stbi__get8(s);
1286 }
1287 
1288 static stbi__uint32 stbi__get32be(stbi__context *s) {
1289  stbi__uint32 z = stbi__get16be(s);
1290  return (z << 16) + stbi__get16be(s);
1291 }
1292 
1293 #if defined(STBI_NO_BMP) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF)
1294 // nothing
1295 #else
1296 static int stbi__get16le(stbi__context *s) {
1297  int z = stbi__get8(s);
1298  return z + (stbi__get8(s) << 8);
1299 }
1300 #endif
1301 
1302 #ifndef STBI_NO_BMP
1303 static stbi__uint32 stbi__get32le(stbi__context *s) {
1304  stbi__uint32 z = stbi__get16le(s);
1305  return z + (stbi__get16le(s) << 16);
1306 }
1307 #endif
1308 
1309 #define STBI__BYTECAST(x) ((stbi_uc)((x)&255)) // truncate int to byte without warnings
1310 
1312 //
1313 // generic converter from built-in img_n to req_comp
1314 // individual types do this automatically as much as possible (e.g. jpeg
1315 // does all cases internally since it needs to colorspace convert anyway,
1316 // and it never has alpha, so very few cases ). png can automatically
1317 // interleave an alpha=255 channel, but falls back to this for other cases
1318 //
1319 // assume data buffer is malloced, so malloc a new one and free that one
1320 // only failure mode is malloc failing
1321 
1322 static stbi_uc stbi__compute_y(int r, int g, int b) {
1323  return (stbi_uc)(((r * 77) + (g * 150) + (29 * b)) >> 8);
1324 }
1325 
1326 static unsigned char *
1327 stbi__convert_format(unsigned char *data, int img_n, int req_comp, unsigned int x, unsigned int y) {
1328  int i, j;
1329  unsigned char *good;
1330 
1331  if (req_comp == img_n)
1332  return data;
1333  STBI_ASSERT(req_comp >= 1 && req_comp <= 4);
1334 
1335  good = (unsigned char *)stbi__malloc(req_comp * x * y);
1336  if (good == NULL) {
1337  STBI_FREE(data);
1338  return stbi__errpuc("outofmem", "Out of memory");
1339  }
1340 
1341  for (j = 0; j < (int)y; ++j) {
1342  unsigned char *src = data + j * x * img_n;
1343  unsigned char *dest = good + j * x * req_comp;
1344 
1345 #define COMBO(a, b) ((a)*8 + (b))
1346 #define CASE(a, b) \
1347  case COMBO(a, b): \
1348  for (i = x - 1; i >= 0; --i, src += a, dest += b)
1349  // convert source image with img_n components to one with req_comp components;
1350  // avoid switch per pixel, so use switch per scanline and massive macros
1351  switch (COMBO(img_n, req_comp)) {
1352  CASE(1, 2) dest[0] = src[0], dest[1] = 255;
1353  break;
1354  CASE(1, 3) dest[0] = dest[1] = dest[2] = src[0];
1355  break;
1356  CASE(1, 4) dest[0] = dest[1] = dest[2] = src[0], dest[3] = 255;
1357  break;
1358  CASE(2, 1) dest[0] = src[0];
1359  break;
1360  CASE(2, 3) dest[0] = dest[1] = dest[2] = src[0];
1361  break;
1362  CASE(2, 4) dest[0] = dest[1] = dest[2] = src[0], dest[3] = src[1];
1363  break;
1364  CASE(3, 4) dest[0] = src[0], dest[1] = src[1], dest[2] = src[2], dest[3] = 255;
1365  break;
1366  CASE(3, 1) dest[0] = stbi__compute_y(src[0], src[1], src[2]);
1367  break;
1368  CASE(3, 2) dest[0] = stbi__compute_y(src[0], src[1], src[2]), dest[1] = 255;
1369  break;
1370  CASE(4, 1) dest[0] = stbi__compute_y(src[0], src[1], src[2]);
1371  break;
1372  CASE(4, 2) dest[0] = stbi__compute_y(src[0], src[1], src[2]), dest[1] = src[3];
1373  break;
1374  CASE(4, 3) dest[0] = src[0], dest[1] = src[1], dest[2] = src[2];
1375  break;
1376  default: STBI_ASSERT(0);
1377  }
1378 #undef CASE
1379  }
1380 
1381  STBI_FREE(data);
1382  return good;
1383 }
1384 
1385 #ifndef STBI_NO_LINEAR
1386 static float *stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp) {
1387  int i, k, n;
1388  float *output = (float *)stbi__malloc(x * y * comp * sizeof(float));
1389  if (output == NULL) {
1390  STBI_FREE(data);
1391  return stbi__errpf("outofmem", "Out of memory");
1392  }
1393  // compute number of non-alpha components
1394  if (comp & 1)
1395  n = comp;
1396  else
1397  n = comp - 1;
1398  for (i = 0; i < x * y; ++i) {
1399  for (k = 0; k < n; ++k) {
1400  output[i * comp + k] =
1401  (float)(powf(data[i * comp + k] / 255.0f, stbi__l2h_gamma) * stbi__l2h_scale);
1402  }
1403  if (k < comp)
1404  output[i * comp + k] = data[i * comp + k] / 255.0f;
1405  }
1406  STBI_FREE(data);
1407  return output;
1408 }
1409 #endif
1410 
1411 #ifndef STBI_NO_HDR
1412 #define stbi__float2int(x) ((int)(x))
1413 static stbi_uc *stbi__hdr_to_ldr(float *data, int x, int y, int comp) {
1414  int i, k, n;
1415  stbi_uc *output = (stbi_uc *)stbi__malloc(x * y * comp);
1416  if (output == NULL) {
1417  STBI_FREE(data);
1418  return stbi__errpuc("outofmem", "Out of memory");
1419  }
1420  // compute number of non-alpha components
1421  if (comp & 1)
1422  n = comp;
1423  else
1424  n = comp - 1;
1425  for (i = 0; i < x * y; ++i) {
1426  for (k = 0; k < n; ++k) {
1427  float z =
1428  (float)pow(data[i * comp + k] * stbi__h2l_scale_i, stbi__h2l_gamma_i) * 255 + 0.5f;
1429  if (z < 0)
1430  z = 0;
1431  if (z > 255)
1432  z = 255;
1433  output[i * comp + k] = (stbi_uc)stbi__float2int(z);
1434  }
1435  if (k < comp) {
1436  float z = data[i * comp + k] * 255 + 0.5f;
1437  if (z < 0)
1438  z = 0;
1439  if (z > 255)
1440  z = 255;
1441  output[i * comp + k] = (stbi_uc)stbi__float2int(z);
1442  }
1443  }
1444  STBI_FREE(data);
1445  return output;
1446 }
1447 #endif
1448 
1450 //
1451 // "baseline" JPEG/JFIF decoder
1452 //
1453 // simple implementation
1454 // - doesn't support delayed output of y-dimension
1455 // - simple interface (only one output format: 8-bit interleaved RGB)
1456 // - doesn't try to recover corrupt jpegs
1457 // - doesn't allow partial loading, loading multiple at once
1458 // - still fast on x86 (copying globals into locals doesn't help x86)
1459 // - allocates lots of intermediate memory (full size of all components)
1460 // - non-interleaved case requires this anyway
1461 // - allows good upsampling (see next)
1462 // high-quality
1463 // - upsampled channels are bilinearly interpolated, even across blocks
1464 // - quality integer IDCT derived from IJG's 'slow'
1465 // performance
1466 // - fast huffman; reasonable integer IDCT
1467 // - some SIMD kernels for common paths on targets with SSE2/NEON
1468 // - uses a lot of intermediate memory, could cache poorly
1469 
1470 #ifndef STBI_NO_JPEG
1471 
1472 // huffman decoding acceleration
1473 #define FAST_BITS 9 // larger handles more cases; smaller stomps less cache
1474 
1475 typedef struct {
1476  stbi_uc fast[1 << FAST_BITS];
1477  // weirdly, repacking this into AoS is a 10% speed loss, instead of a win
1478  stbi__uint16 code[256];
1479  stbi_uc values[256];
1480  stbi_uc size[257];
1481  unsigned int maxcode[18];
1482  int delta[17]; // old 'firstsymbol' - old 'firstcode'
1483 } stbi__huffman;
1484 
1485 typedef struct {
1486  stbi__context *s;
1487  stbi__huffman huff_dc[4];
1488  stbi__huffman huff_ac[4];
1489  stbi_uc dequant[4][64];
1490  stbi__int16 fast_ac[4][1 << FAST_BITS];
1491 
1492  // sizes for components, interleaved MCUs
1493  int img_h_max, img_v_max;
1494  int img_mcu_x, img_mcu_y;
1495  int img_mcu_w, img_mcu_h;
1496 
1497  // definition of jpeg image component
1498  struct {
1499  int id;
1500  int h, v;
1501  int tq;
1502  int hd, ha;
1503  int dc_pred;
1504 
1505  int x, y, w2, h2;
1506  stbi_uc *data;
1507  void *raw_data, *raw_coeff;
1508  stbi_uc *linebuf;
1509  short *coeff; // progressive only
1510  int coeff_w, coeff_h; // number of 8x8 coefficient blocks
1511  } img_comp[4];
1512 
1513  stbi__uint32 code_buffer; // jpeg entropy-coded buffer
1514  int code_bits; // number of valid bits
1515  unsigned char marker; // marker seen while filling entropy buffer
1516  int nomore; // flag if we saw a marker so must stop
1517 
1518  int progressive;
1519  int spec_start;
1520  int spec_end;
1521  int succ_high;
1522  int succ_low;
1523  int eob_run;
1524  int rgb;
1525 
1526  int scan_n, order[4];
1527  int restart_interval, todo;
1528 
1529  // kernels
1530  void (*idct_block_kernel)(stbi_uc *out, int out_stride, short data[64]);
1531  void (*YCbCr_to_RGB_kernel)(
1532  stbi_uc *out,
1533  const stbi_uc *y,
1534  const stbi_uc *pcb,
1535  const stbi_uc *pcr,
1536  int count,
1537  int step);
1538  stbi_uc *(
1539  *resample_row_hv_2_kernel)(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs);
1540 } stbi__jpeg;
1541 
1542 static int stbi__build_huffman(stbi__huffman *h, int *count) {
1543  int i, j, k = 0, code;
1544  // build size list for each symbol (from JPEG spec)
1545  for (i = 0; i < 16; ++i)
1546  for (j = 0; j < count[i]; ++j)
1547  h->size[k++] = (stbi_uc)(i + 1);
1548  h->size[k] = 0;
1549 
1550  // compute actual symbols (from jpeg spec)
1551  code = 0;
1552  k = 0;
1553  for (j = 1; j <= 16; ++j) {
1554  // compute delta to add to code to compute symbol id
1555  h->delta[j] = k - code;
1556  if (h->size[k] == j) {
1557  while (h->size[k] == j)
1558  h->code[k++] = (stbi__uint16)(code++);
1559  if (code - 1 >= (1 << j))
1560  return stbi__err("bad code lengths", "Corrupt JPEG");
1561  }
1562  // compute largest code + 1 for this size, preshifted as needed later
1563  h->maxcode[j] = code << (16 - j);
1564  code <<= 1;
1565  }
1566  h->maxcode[j] = 0xffffffff;
1567 
1568  // build non-spec acceleration table; 255 is flag for not-accelerated
1569  memset(h->fast, 255, 1 << FAST_BITS);
1570  for (i = 0; i < k; ++i) {
1571  int s = h->size[i];
1572  if (s <= FAST_BITS) {
1573  int c = h->code[i] << (FAST_BITS - s);
1574  int m = 1 << (FAST_BITS - s);
1575  for (j = 0; j < m; ++j) {
1576  h->fast[c + j] = (stbi_uc)i;
1577  }
1578  }
1579  }
1580  return 1;
1581 }
1582 
1583 // build a table that decodes both magnitude and value of small ACs in
1584 // one go.
1585 static void stbi__build_fast_ac(stbi__int16 *fast_ac, stbi__huffman *h) {
1586  int i;
1587  for (i = 0; i < (1 << FAST_BITS); ++i) {
1588  stbi_uc fast = h->fast[i];
1589  fast_ac[i] = 0;
1590  if (fast < 255) {
1591  int rs = h->values[fast];
1592  int run = (rs >> 4) & 15;
1593  int magbits = rs & 15;
1594  int len = h->size[fast];
1595 
1596  if (magbits && len + magbits <= FAST_BITS) {
1597  // magnitude code followed by receive_extend code
1598  int k = ((i << len) & ((1 << FAST_BITS) - 1)) >> (FAST_BITS - magbits);
1599  int m = 1 << (magbits - 1);
1600  if (k < m)
1601  k += (-1 << magbits) + 1;
1602  // if the result is small enough, we can fit it in fast_ac table
1603  if (k >= -128 && k <= 127)
1604  fast_ac[i] = (stbi__int16)((k << 8) + (run << 4) + (len + magbits));
1605  }
1606  }
1607  }
1608 }
1609 
1610 static void stbi__grow_buffer_unsafe(stbi__jpeg *j) {
1611  do {
1612  int b = j->nomore ? 0 : stbi__get8(j->s);
1613  if (b == 0xff) {
1614  int c = stbi__get8(j->s);
1615  if (c != 0) {
1616  j->marker = (unsigned char)c;
1617  j->nomore = 1;
1618  return;
1619  }
1620  }
1621  j->code_buffer |= b << (24 - j->code_bits);
1622  j->code_bits += 8;
1623  } while (j->code_bits <= 24);
1624 }
1625 
1626 // (1 << n) - 1
1627 static stbi__uint32 stbi__bmask[17] =
1628  {0, 1, 3, 7, 15, 31, 63, 127, 255, 511, 1023, 2047, 4095, 8191, 16383, 32767, 65535};
1629 
1630 // decode a jpeg huffman value from the bitstream
1631 stbi_inline static int stbi__jpeg_huff_decode(stbi__jpeg *j, stbi__huffman *h) {
1632  unsigned int temp;
1633  int c, k;
1634 
1635  if (j->code_bits < 16)
1636  stbi__grow_buffer_unsafe(j);
1637 
1638  // look at the top FAST_BITS and determine what symbol ID it is,
1639  // if the code is <= FAST_BITS
1640  c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS) - 1);
1641  k = h->fast[c];
1642  if (k < 255) {
1643  int s = h->size[k];
1644  if (s > j->code_bits)
1645  return -1;
1646  j->code_buffer <<= s;
1647  j->code_bits -= s;
1648  return h->values[k];
1649  }
1650 
1651  // naive test is to shift the code_buffer down so k bits are
1652  // valid, then test against maxcode. To speed this up, we've
1653  // preshifted maxcode left so that it has (16-k) 0s at the
1654  // end; in other words, regardless of the number of bits, it
1655  // wants to be compared against something shifted to have 16;
1656  // that way we don't need to shift inside the loop.
1657  temp = j->code_buffer >> 16;
1658  for (k = FAST_BITS + 1;; ++k)
1659  if (temp < h->maxcode[k])
1660  break;
1661  if (k == 17) {
1662  // error! code not found
1663  j->code_bits -= 16;
1664  return -1;
1665  }
1666 
1667  if (k > j->code_bits)
1668  return -1;
1669 
1670  // convert the huffman code to the symbol id
1671  c = ((j->code_buffer >> (32 - k)) & stbi__bmask[k]) + h->delta[k];
1672  STBI_ASSERT((((j->code_buffer) >> (32 - h->size[c])) & stbi__bmask[h->size[c]]) == h->code[c]);
1673 
1674  // convert the id to a symbol
1675  j->code_bits -= k;
1676  j->code_buffer <<= k;
1677  return h->values[c];
1678 }
1679 
1680 // bias[n] = (-1<<n) + 1
1681 static int const stbi__jbias[16] =
1682  {0, -1, -3, -7, -15, -31, -63, -127, -255, -511, -1023, -2047, -4095, -8191, -16383, -32767};
1683 
1684 // combined JPEG 'receive' and JPEG 'extend', since baseline
1685 // always extends everything it receives.
1686 stbi_inline static int stbi__extend_receive(stbi__jpeg *j, int n) {
1687  unsigned int k;
1688  int sgn;
1689  if (j->code_bits < n)
1690  stbi__grow_buffer_unsafe(j);
1691 
1692  sgn = (stbi__int32)j->code_buffer >> 31; // sign bit is always in MSB
1693  k = stbi_lrot(j->code_buffer, n);
1694  STBI_ASSERT(n >= 0 && n < (int)(sizeof(stbi__bmask) / sizeof(*stbi__bmask)));
1695  j->code_buffer = k & ~stbi__bmask[n];
1696  k &= stbi__bmask[n];
1697  j->code_bits -= n;
1698  return k + (stbi__jbias[n] & ~sgn);
1699 }
1700 
1701 // get some unsigned bits
1702 stbi_inline static int stbi__jpeg_get_bits(stbi__jpeg *j, int n) {
1703  unsigned int k;
1704  if (j->code_bits < n)
1705  stbi__grow_buffer_unsafe(j);
1706  k = stbi_lrot(j->code_buffer, n);
1707  j->code_buffer = k & ~stbi__bmask[n];
1708  k &= stbi__bmask[n];
1709  j->code_bits -= n;
1710  return k;
1711 }
1712 
1713 stbi_inline static int stbi__jpeg_get_bit(stbi__jpeg *j) {
1714  unsigned int k;
1715  if (j->code_bits < 1)
1716  stbi__grow_buffer_unsafe(j);
1717  k = j->code_buffer;
1718  j->code_buffer <<= 1;
1719  --j->code_bits;
1720  return k & 0x80000000;
1721 }
1722 
1723 // given a value that's at position X in the zigzag stream,
1724 // where does it appear in the 8x8 matrix coded as row-major?
1725 static stbi_uc stbi__jpeg_dezigzag[64 + 15] = {0,
1726  1,
1727  8,
1728  16,
1729  9,
1730  2,
1731  3,
1732  10,
1733  17,
1734  24,
1735  32,
1736  25,
1737  18,
1738  11,
1739  4,
1740  5,
1741  12,
1742  19,
1743  26,
1744  33,
1745  40,
1746  48,
1747  41,
1748  34,
1749  27,
1750  20,
1751  13,
1752  6,
1753  7,
1754  14,
1755  21,
1756  28,
1757  35,
1758  42,
1759  49,
1760  56,
1761  57,
1762  50,
1763  43,
1764  36,
1765  29,
1766  22,
1767  15,
1768  23,
1769  30,
1770  37,
1771  44,
1772  51,
1773  58,
1774  59,
1775  52,
1776  45,
1777  38,
1778  31,
1779  39,
1780  46,
1781  53,
1782  60,
1783  61,
1784  54,
1785  47,
1786  55,
1787  62,
1788  63,
1789  // let corrupt input sample past end
1790  63,
1791  63,
1792  63,
1793  63,
1794  63,
1795  63,
1796  63,
1797  63,
1798  63,
1799  63,
1800  63,
1801  63,
1802  63,
1803  63,
1804  63};
1805 
1806 // decode one 64-entry block--
1807 static int stbi__jpeg_decode_block(
1808  stbi__jpeg *j,
1809  short data[64],
1810  stbi__huffman *hdc,
1811  stbi__huffman *hac,
1812  stbi__int16 *fac,
1813  int b,
1814  stbi_uc *dequant) {
1815  int diff, dc, k;
1816  int t;
1817 
1818  if (j->code_bits < 16)
1819  stbi__grow_buffer_unsafe(j);
1820  t = stbi__jpeg_huff_decode(j, hdc);
1821  if (t < 0)
1822  return stbi__err("bad huffman code", "Corrupt JPEG");
1823 
1824  // 0 all the ac values now so we can do it 32-bits at a time
1825  memset(data, 0, 64 * sizeof(data[0]));
1826 
1827  diff = t ? stbi__extend_receive(j, t) : 0;
1828  dc = j->img_comp[b].dc_pred + diff;
1829  j->img_comp[b].dc_pred = dc;
1830  data[0] = (short)(dc * dequant[0]);
1831 
1832  // decode AC components, see JPEG spec
1833  k = 1;
1834  do {
1835  unsigned int zig;
1836  int c, r, s;
1837  if (j->code_bits < 16)
1838  stbi__grow_buffer_unsafe(j);
1839  c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS) - 1);
1840  r = fac[c];
1841  if (r) { // fast-AC path
1842  k += (r >> 4) & 15; // run
1843  s = r & 15; // combined length
1844  j->code_buffer <<= s;
1845  j->code_bits -= s;
1846  // decode into unzigzag'd location
1847  zig = stbi__jpeg_dezigzag[k++];
1848  data[zig] = (short)((r >> 8) * dequant[zig]);
1849  }
1850  else {
1851  int rs = stbi__jpeg_huff_decode(j, hac);
1852  if (rs < 0)
1853  return stbi__err("bad huffman code", "Corrupt JPEG");
1854  s = rs & 15;
1855  r = rs >> 4;
1856  if (s == 0) {
1857  if (rs != 0xf0)
1858  break; // end block
1859  k += 16;
1860  }
1861  else {
1862  k += r;
1863  // decode into unzigzag'd location
1864  zig = stbi__jpeg_dezigzag[k++];
1865  data[zig] = (short)(stbi__extend_receive(j, s) * dequant[zig]);
1866  }
1867  }
1868  } while (k < 64);
1869  return 1;
1870 }
1871 
1872 static int
1873 stbi__jpeg_decode_block_prog_dc(stbi__jpeg *j, short data[64], stbi__huffman *hdc, int b) {
1874  int diff, dc;
1875  int t;
1876  if (j->spec_end != 0)
1877  return stbi__err("can't merge dc and ac", "Corrupt JPEG");
1878 
1879  if (j->code_bits < 16)
1880  stbi__grow_buffer_unsafe(j);
1881 
1882  if (j->succ_high == 0) {
1883  // first scan for DC coefficient, must be first
1884  memset(data, 0, 64 * sizeof(data[0])); // 0 all the ac values now
1885  t = stbi__jpeg_huff_decode(j, hdc);
1886  diff = t ? stbi__extend_receive(j, t) : 0;
1887 
1888  dc = j->img_comp[b].dc_pred + diff;
1889  j->img_comp[b].dc_pred = dc;
1890  data[0] = (short)(dc << j->succ_low);
1891  }
1892  else {
1893  // refinement scan for DC coefficient
1894  if (stbi__jpeg_get_bit(j))
1895  data[0] += (short)(1 << j->succ_low);
1896  }
1897  return 1;
1898 }
1899 
1900 // @OPTIMIZE: store non-zigzagged during the decode passes,
1901 // and only de-zigzag when dequantizing
1902 static int stbi__jpeg_decode_block_prog_ac(
1903  stbi__jpeg *j,
1904  short data[64],
1905  stbi__huffman *hac,
1906  stbi__int16 *fac) {
1907  int k;
1908  if (j->spec_start == 0)
1909  return stbi__err("can't merge dc and ac", "Corrupt JPEG");
1910 
1911  if (j->succ_high == 0) {
1912  int shift = j->succ_low;
1913 
1914  if (j->eob_run) {
1915  --j->eob_run;
1916  return 1;
1917  }
1918 
1919  k = j->spec_start;
1920  do {
1921  unsigned int zig;
1922  int c, r, s;
1923  if (j->code_bits < 16)
1924  stbi__grow_buffer_unsafe(j);
1925  c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS) - 1);
1926  r = fac[c];
1927  if (r) { // fast-AC path
1928  k += (r >> 4) & 15; // run
1929  s = r & 15; // combined length
1930  j->code_buffer <<= s;
1931  j->code_bits -= s;
1932  zig = stbi__jpeg_dezigzag[k++];
1933  data[zig] = (short)((r >> 8) << shift);
1934  }
1935  else {
1936  int rs = stbi__jpeg_huff_decode(j, hac);
1937  if (rs < 0)
1938  return stbi__err("bad huffman code", "Corrupt JPEG");
1939  s = rs & 15;
1940  r = rs >> 4;
1941  if (s == 0) {
1942  if (r < 15) {
1943  j->eob_run = (1 << r);
1944  if (r)
1945  j->eob_run += stbi__jpeg_get_bits(j, r);
1946  --j->eob_run;
1947  break;
1948  }
1949  k += 16;
1950  }
1951  else {
1952  k += r;
1953  zig = stbi__jpeg_dezigzag[k++];
1954  data[zig] = (short)(stbi__extend_receive(j, s) << shift);
1955  }
1956  }
1957  } while (k <= j->spec_end);
1958  }
1959  else {
1960  // refinement scan for these AC coefficients
1961 
1962  short bit = (short)(1 << j->succ_low);
1963 
1964  if (j->eob_run) {
1965  --j->eob_run;
1966  for (k = j->spec_start; k <= j->spec_end; ++k) {
1967  short *p = &data[stbi__jpeg_dezigzag[k]];
1968  if (*p != 0)
1969  if (stbi__jpeg_get_bit(j))
1970  if ((*p & bit) == 0) {
1971  if (*p > 0)
1972  *p += bit;
1973  else
1974  *p -= bit;
1975  }
1976  }
1977  }
1978  else {
1979  k = j->spec_start;
1980  do {
1981  int r, s;
1982  int rs = stbi__jpeg_huff_decode(j, hac); // @OPTIMIZE see if we can use the fast path
1983  // here, advance-by-r is so slow, eh
1984  if (rs < 0)
1985  return stbi__err("bad huffman code", "Corrupt JPEG");
1986  s = rs & 15;
1987  r = rs >> 4;
1988  if (s == 0) {
1989  if (r < 15) {
1990  j->eob_run = (1 << r) - 1;
1991  if (r)
1992  j->eob_run += stbi__jpeg_get_bits(j, r);
1993  r = 64; // force end of block
1994  }
1995  else {
1996  // r=15 s=0 should write 16 0s, so we just do
1997  // a run of 15 0s and then write s (which is 0),
1998  // so we don't have to do anything special here
1999  }
2000  }
2001  else {
2002  if (s != 1)
2003  return stbi__err("bad huffman code", "Corrupt JPEG");
2004  // sign bit
2005  if (stbi__jpeg_get_bit(j))
2006  s = bit;
2007  else
2008  s = -bit;
2009  }
2010 
2011  // advance by r
2012  while (k <= j->spec_end) {
2013  short *p = &data[stbi__jpeg_dezigzag[k++]];
2014  if (*p != 0) {
2015  if (stbi__jpeg_get_bit(j))
2016  if ((*p & bit) == 0) {
2017  if (*p > 0)
2018  *p += bit;
2019  else
2020  *p -= bit;
2021  }
2022  }
2023  else {
2024  if (r == 0) {
2025  *p = (short)s;
2026  break;
2027  }
2028  --r;
2029  }
2030  }
2031  } while (k <= j->spec_end);
2032  }
2033  }
2034  return 1;
2035 }
2036 
2037 // take a -128..127 value and stbi__clamp it and convert to 0..255
2038 stbi_inline static stbi_uc stbi__clamp(int x) {
2039  // trick to use a single test to catch both cases
2040  if ((unsigned int)x > 255) {
2041  if (x < 0)
2042  return 0;
2043  if (x > 255)
2044  return 255;
2045  }
2046  return (stbi_uc)x;
2047 }
2048 
2049 #define stbi__f2f(x) ((int)(((x)*4096 + 0.5f)))
2050 #define stbi__fsh(x) ((x) << 12)
2051 
2052 // derived from jidctint -- DCT_ISLOW
2053 #define STBI__IDCT_1D(s0, s1, s2, s3, s4, s5, s6, s7) \
2054  int t0, t1, t2, t3, p1, p2, p3, p4, p5, x0, x1, x2, x3; \
2055  p2 = s2; \
2056  p3 = s6; \
2057  p1 = (p2 + p3) * stbi__f2f(0.5411961f); \
2058  t2 = p1 + p3 * stbi__f2f(-1.847759065f); \
2059  t3 = p1 + p2 * stbi__f2f(0.765366865f); \
2060  p2 = s0; \
2061  p3 = s4; \
2062  t0 = stbi__fsh(p2 + p3); \
2063  t1 = stbi__fsh(p2 - p3); \
2064  x0 = t0 + t3; \
2065  x3 = t0 - t3; \
2066  x1 = t1 + t2; \
2067  x2 = t1 - t2; \
2068  t0 = s7; \
2069  t1 = s5; \
2070  t2 = s3; \
2071  t3 = s1; \
2072  p3 = t0 + t2; \
2073  p4 = t1 + t3; \
2074  p1 = t0 + t3; \
2075  p2 = t1 + t2; \
2076  p5 = (p3 + p4) * stbi__f2f(1.175875602f); \
2077  t0 = t0 * stbi__f2f(0.298631336f); \
2078  t1 = t1 * stbi__f2f(2.053119869f); \
2079  t2 = t2 * stbi__f2f(3.072711026f); \
2080  t3 = t3 * stbi__f2f(1.501321110f); \
2081  p1 = p5 + p1 * stbi__f2f(-0.899976223f); \
2082  p2 = p5 + p2 * stbi__f2f(-2.562915447f); \
2083  p3 = p3 * stbi__f2f(-1.961570560f); \
2084  p4 = p4 * stbi__f2f(-0.390180644f); \
2085  t3 += p1 + p4; \
2086  t2 += p2 + p3; \
2087  t1 += p2 + p4; \
2088  t0 += p1 + p3;
2089 
2090 static void stbi__idct_block(stbi_uc *out, int out_stride, short data[64]) {
2091  int i, val[64], *v = val;
2092  stbi_uc *o;
2093  short *d = data;
2094 
2095  // columns
2096  for (i = 0; i < 8; ++i, ++d, ++v) {
2097  // if all zeroes, shortcut -- this avoids dequantizing 0s and IDCTing
2098  if (d[8] == 0 && d[16] == 0 && d[24] == 0 && d[32] == 0 && d[40] == 0 && d[48] == 0
2099  && d[56] == 0) {
2100  // no shortcut 0 seconds
2101  // (1|2|3|4|5|6|7)==0 0 seconds
2102  // all separate -0.047 seconds
2103  // 1 && 2|3 && 4|5 && 6|7: -0.047 seconds
2104  int dcterm = d[0] << 2;
2105  v[0] = v[8] = v[16] = v[24] = v[32] = v[40] = v[48] = v[56] = dcterm;
2106  }
2107  else {
2108  STBI__IDCT_1D(d[0], d[8], d[16], d[24], d[32], d[40], d[48], d[56])
2109  // constants scaled things up by 1<<12; let's bring them back
2110  // down, but keep 2 extra bits of precision
2111  x0 += 512;
2112  x1 += 512;
2113  x2 += 512;
2114  x3 += 512;
2115  v[0] = (x0 + t3) >> 10;
2116  v[56] = (x0 - t3) >> 10;
2117  v[8] = (x1 + t2) >> 10;
2118  v[48] = (x1 - t2) >> 10;
2119  v[16] = (x2 + t1) >> 10;
2120  v[40] = (x2 - t1) >> 10;
2121  v[24] = (x3 + t0) >> 10;
2122  v[32] = (x3 - t0) >> 10;
2123  }
2124  }
2125 
2126  for (i = 0, v = val, o = out; i < 8; ++i, v += 8, o += out_stride) {
2127  // no fast case since the first 1D IDCT spread components out
2128  STBI__IDCT_1D(v[0], v[1], v[2], v[3], v[4], v[5], v[6], v[7])
2129  // constants scaled things up by 1<<12, plus we had 1<<2 from first
2130  // loop, plus horizontal and vertical each scale by sqrt(8) so together
2131  // we've got an extra 1<<3, so 1<<17 total we need to remove.
2132  // so we want to round that, which means adding 0.5 * 1<<17,
2133  // aka 65536. Also, we'll end up with -128 to 127 that we want
2134  // to encode as 0..255 by adding 128, so we'll add that before the shift
2135  x0 += 65536 + (128 << 17);
2136  x1 += 65536 + (128 << 17);
2137  x2 += 65536 + (128 << 17);
2138  x3 += 65536 + (128 << 17);
2139  // tried computing the shifts into temps, or'ing the temps to see
2140  // if any were out of range, but that was slower
2141  o[0] = stbi__clamp((x0 + t3) >> 17);
2142  o[7] = stbi__clamp((x0 - t3) >> 17);
2143  o[1] = stbi__clamp((x1 + t2) >> 17);
2144  o[6] = stbi__clamp((x1 - t2) >> 17);
2145  o[2] = stbi__clamp((x2 + t1) >> 17);
2146  o[5] = stbi__clamp((x2 - t1) >> 17);
2147  o[3] = stbi__clamp((x3 + t0) >> 17);
2148  o[4] = stbi__clamp((x3 - t0) >> 17);
2149  }
2150 }
2151 
2152 #ifdef STBI_SSE2
2153 // sse2 integer IDCT. not the fastest possible implementation but it
2154 // produces bit-identical results to the generic C version so it's
2155 // fully "transparent".
2156 static void stbi__idct_simd(stbi_uc *out, int out_stride, short data[64]) {
2157  // This is constructed to match our regular (generic) integer IDCT exactly.
2158  __m128i row0, row1, row2, row3, row4, row5, row6, row7;
2159  __m128i tmp;
2160 
2161 // dot product constant: even elems=x, odd elems=y
2162 #define dct_const(x, y) _mm_setr_epi16((x), (y), (x), (y), (x), (y), (x), (y))
2163 
2164 // out(0) = c0[even]*x + c0[odd]*y (c0, x, y 16-bit, out 32-bit)
2165 // out(1) = c1[even]*x + c1[odd]*y
2166 #define dct_rot(out0, out1, x, y, c0, c1) \
2167  __m128i c0##lo = _mm_unpacklo_epi16((x), (y)); \
2168  __m128i c0##hi = _mm_unpackhi_epi16((x), (y)); \
2169  __m128i out0##_l = _mm_madd_epi16(c0##lo, c0); \
2170  __m128i out0##_h = _mm_madd_epi16(c0##hi, c0); \
2171  __m128i out1##_l = _mm_madd_epi16(c0##lo, c1); \
2172  __m128i out1##_h = _mm_madd_epi16(c0##hi, c1)
2173 
2174 // out = in << 12 (in 16-bit, out 32-bit)
2175 #define dct_widen(out, in) \
2176  __m128i out##_l = _mm_srai_epi32(_mm_unpacklo_epi16(_mm_setzero_si128(), (in)), 4); \
2177  __m128i out##_h = _mm_srai_epi32(_mm_unpackhi_epi16(_mm_setzero_si128(), (in)), 4)
2178 
2179 // wide add
2180 #define dct_wadd(out, a, b) \
2181  __m128i out##_l = _mm_add_epi32(a##_l, b##_l); \
2182  __m128i out##_h = _mm_add_epi32(a##_h, b##_h)
2183 
2184 // wide sub
2185 #define dct_wsub(out, a, b) \
2186  __m128i out##_l = _mm_sub_epi32(a##_l, b##_l); \
2187  __m128i out##_h = _mm_sub_epi32(a##_h, b##_h)
2188 
2189 // butterfly a/b, add bias, then shift by "s" and pack
2190 #define dct_bfly32o(out0, out1, a, b, bias, s) \
2191  { \
2192  __m128i abiased_l = _mm_add_epi32(a##_l, bias); \
2193  __m128i abiased_h = _mm_add_epi32(a##_h, bias); \
2194  dct_wadd(sum, abiased, b); \
2195  dct_wsub(dif, abiased, b); \
2196  out0 = _mm_packs_epi32(_mm_srai_epi32(sum_l, s), _mm_srai_epi32(sum_h, s)); \
2197  out1 = _mm_packs_epi32(_mm_srai_epi32(dif_l, s), _mm_srai_epi32(dif_h, s)); \
2198  }
2199 
2200 // 8-bit interleave step (for transposes)
2201 #define dct_interleave8(a, b) \
2202  tmp = a; \
2203  a = _mm_unpacklo_epi8(a, b); \
2204  b = _mm_unpackhi_epi8(tmp, b)
2205 
2206 // 16-bit interleave step (for transposes)
2207 #define dct_interleave16(a, b) \
2208  tmp = a; \
2209  a = _mm_unpacklo_epi16(a, b); \
2210  b = _mm_unpackhi_epi16(tmp, b)
2211 
2212 #define dct_pass(bias, shift) \
2213  { \
2214  /* even part */ \
2215  dct_rot(t2e, t3e, row2, row6, rot0_0, rot0_1); \
2216  __m128i sum04 = _mm_add_epi16(row0, row4); \
2217  __m128i dif04 = _mm_sub_epi16(row0, row4); \
2218  dct_widen(t0e, sum04); \
2219  dct_widen(t1e, dif04); \
2220  dct_wadd(x0, t0e, t3e); \
2221  dct_wsub(x3, t0e, t3e); \
2222  dct_wadd(x1, t1e, t2e); \
2223  dct_wsub(x2, t1e, t2e); \
2224  /* odd part */ \
2225  dct_rot(y0o, y2o, row7, row3, rot2_0, rot2_1); \
2226  dct_rot(y1o, y3o, row5, row1, rot3_0, rot3_1); \
2227  __m128i sum17 = _mm_add_epi16(row1, row7); \
2228  __m128i sum35 = _mm_add_epi16(row3, row5); \
2229  dct_rot(y4o, y5o, sum17, sum35, rot1_0, rot1_1); \
2230  dct_wadd(x4, y0o, y4o); \
2231  dct_wadd(x5, y1o, y5o); \
2232  dct_wadd(x6, y2o, y5o); \
2233  dct_wadd(x7, y3o, y4o); \
2234  dct_bfly32o(row0, row7, x0, x7, bias, shift); \
2235  dct_bfly32o(row1, row6, x1, x6, bias, shift); \
2236  dct_bfly32o(row2, row5, x2, x5, bias, shift); \
2237  dct_bfly32o(row3, row4, x3, x4, bias, shift); \
2238  }
2239 
2240  __m128i rot0_0 =
2241  dct_const(stbi__f2f(0.5411961f), stbi__f2f(0.5411961f) + stbi__f2f(-1.847759065f));
2242  __m128i rot0_1 =
2243  dct_const(stbi__f2f(0.5411961f) + stbi__f2f(0.765366865f), stbi__f2f(0.5411961f));
2244  __m128i rot1_0 =
2245  dct_const(stbi__f2f(1.175875602f) + stbi__f2f(-0.899976223f), stbi__f2f(1.175875602f));
2246  __m128i rot1_1 =
2247  dct_const(stbi__f2f(1.175875602f), stbi__f2f(1.175875602f) + stbi__f2f(-2.562915447f));
2248  __m128i rot2_0 =
2249  dct_const(stbi__f2f(-1.961570560f) + stbi__f2f(0.298631336f), stbi__f2f(-1.961570560f));
2250  __m128i rot2_1 =
2251  dct_const(stbi__f2f(-1.961570560f), stbi__f2f(-1.961570560f) + stbi__f2f(3.072711026f));
2252  __m128i rot3_0 =
2253  dct_const(stbi__f2f(-0.390180644f) + stbi__f2f(2.053119869f), stbi__f2f(-0.390180644f));
2254  __m128i rot3_1 =
2255  dct_const(stbi__f2f(-0.390180644f), stbi__f2f(-0.390180644f) + stbi__f2f(1.501321110f));
2256 
2257  // rounding biases in column/row passes, see stbi__idct_block for explanation.
2258  __m128i bias_0 = _mm_set1_epi32(512);
2259  __m128i bias_1 = _mm_set1_epi32(65536 + (128 << 17));
2260 
2261  // load
2262  row0 = _mm_load_si128((const __m128i *)(data + 0 * 8));
2263  row1 = _mm_load_si128((const __m128i *)(data + 1 * 8));
2264  row2 = _mm_load_si128((const __m128i *)(data + 2 * 8));
2265  row3 = _mm_load_si128((const __m128i *)(data + 3 * 8));
2266  row4 = _mm_load_si128((const __m128i *)(data + 4 * 8));
2267  row5 = _mm_load_si128((const __m128i *)(data + 5 * 8));
2268  row6 = _mm_load_si128((const __m128i *)(data + 6 * 8));
2269  row7 = _mm_load_si128((const __m128i *)(data + 7 * 8));
2270 
2271  // column pass
2272  dct_pass(bias_0, 10);
2273 
2274  {
2275  // 16bit 8x8 transpose pass 1
2276  dct_interleave16(row0, row4);
2277  dct_interleave16(row1, row5);
2278  dct_interleave16(row2, row6);
2279  dct_interleave16(row3, row7);
2280 
2281  // transpose pass 2
2282  dct_interleave16(row0, row2);
2283  dct_interleave16(row1, row3);
2284  dct_interleave16(row4, row6);
2285  dct_interleave16(row5, row7);
2286 
2287  // transpose pass 3
2288  dct_interleave16(row0, row1);
2289  dct_interleave16(row2, row3);
2290  dct_interleave16(row4, row5);
2291  dct_interleave16(row6, row7);
2292  }
2293 
2294  // row pass
2295  dct_pass(bias_1, 17);
2296 
2297  {
2298  // pack
2299  __m128i p0 = _mm_packus_epi16(row0, row1); // a0a1a2a3...a7b0b1b2b3...b7
2300  __m128i p1 = _mm_packus_epi16(row2, row3);
2301  __m128i p2 = _mm_packus_epi16(row4, row5);
2302  __m128i p3 = _mm_packus_epi16(row6, row7);
2303 
2304  // 8bit 8x8 transpose pass 1
2305  dct_interleave8(p0, p2); // a0e0a1e1...
2306  dct_interleave8(p1, p3); // c0g0c1g1...
2307 
2308  // transpose pass 2
2309  dct_interleave8(p0, p1); // a0c0e0g0...
2310  dct_interleave8(p2, p3); // b0d0f0h0...
2311 
2312  // transpose pass 3
2313  dct_interleave8(p0, p2); // a0b0c0d0...
2314  dct_interleave8(p1, p3); // a4b4c4d4...
2315 
2316  // store
2317  _mm_storel_epi64((__m128i *)out, p0);
2318  out += out_stride;
2319  _mm_storel_epi64((__m128i *)out, _mm_shuffle_epi32(p0, 0x4e));
2320  out += out_stride;
2321  _mm_storel_epi64((__m128i *)out, p2);
2322  out += out_stride;
2323  _mm_storel_epi64((__m128i *)out, _mm_shuffle_epi32(p2, 0x4e));
2324  out += out_stride;
2325  _mm_storel_epi64((__m128i *)out, p1);
2326  out += out_stride;
2327  _mm_storel_epi64((__m128i *)out, _mm_shuffle_epi32(p1, 0x4e));
2328  out += out_stride;
2329  _mm_storel_epi64((__m128i *)out, p3);
2330  out += out_stride;
2331  _mm_storel_epi64((__m128i *)out, _mm_shuffle_epi32(p3, 0x4e));
2332  }
2333 
2334 #undef dct_const
2335 #undef dct_rot
2336 #undef dct_widen
2337 #undef dct_wadd
2338 #undef dct_wsub
2339 #undef dct_bfly32o
2340 #undef dct_interleave8
2341 #undef dct_interleave16
2342 #undef dct_pass
2343 }
2344 
2345 #endif // STBI_SSE2
2346 
2347 #ifdef STBI_NEON
2348 
2349 // NEON integer IDCT. should produce bit-identical
2350 // results to the generic C version.
2351 static void stbi__idct_simd(stbi_uc *out, int out_stride, short data[64]) {
2352  int16x8_t row0, row1, row2, row3, row4, row5, row6, row7;
2353 
2354  int16x4_t rot0_0 = vdup_n_s16(stbi__f2f(0.5411961f));
2355  int16x4_t rot0_1 = vdup_n_s16(stbi__f2f(-1.847759065f));
2356  int16x4_t rot0_2 = vdup_n_s16(stbi__f2f(0.765366865f));
2357  int16x4_t rot1_0 = vdup_n_s16(stbi__f2f(1.175875602f));
2358  int16x4_t rot1_1 = vdup_n_s16(stbi__f2f(-0.899976223f));
2359  int16x4_t rot1_2 = vdup_n_s16(stbi__f2f(-2.562915447f));
2360  int16x4_t rot2_0 = vdup_n_s16(stbi__f2f(-1.961570560f));
2361  int16x4_t rot2_1 = vdup_n_s16(stbi__f2f(-0.390180644f));
2362  int16x4_t rot3_0 = vdup_n_s16(stbi__f2f(0.298631336f));
2363  int16x4_t rot3_1 = vdup_n_s16(stbi__f2f(2.053119869f));
2364  int16x4_t rot3_2 = vdup_n_s16(stbi__f2f(3.072711026f));
2365  int16x4_t rot3_3 = vdup_n_s16(stbi__f2f(1.501321110f));
2366 
2367 #define dct_long_mul(out, inq, coeff) \
2368  int32x4_t out##_l = vmull_s16(vget_low_s16(inq), coeff); \
2369  int32x4_t out##_h = vmull_s16(vget_high_s16(inq), coeff)
2370 
2371 #define dct_long_mac(out, acc, inq, coeff) \
2372  int32x4_t out##_l = vmlal_s16(acc##_l, vget_low_s16(inq), coeff); \
2373  int32x4_t out##_h = vmlal_s16(acc##_h, vget_high_s16(inq), coeff)
2374 
2375 #define dct_widen(out, inq) \
2376  int32x4_t out##_l = vshll_n_s16(vget_low_s16(inq), 12); \
2377  int32x4_t out##_h = vshll_n_s16(vget_high_s16(inq), 12)
2378 
2379 // wide add
2380 #define dct_wadd(out, a, b) \
2381  int32x4_t out##_l = vaddq_s32(a##_l, b##_l); \
2382  int32x4_t out##_h = vaddq_s32(a##_h, b##_h)
2383 
2384 // wide sub
2385 #define dct_wsub(out, a, b) \
2386  int32x4_t out##_l = vsubq_s32(a##_l, b##_l); \
2387  int32x4_t out##_h = vsubq_s32(a##_h, b##_h)
2388 
2389 // butterfly a/b, then shift using "shiftop" by "s" and pack
2390 #define dct_bfly32o(out0, out1, a, b, shiftop, s) \
2391  { \
2392  dct_wadd(sum, a, b); \
2393  dct_wsub(dif, a, b); \
2394  out0 = vcombine_s16(shiftop(sum_l, s), shiftop(sum_h, s)); \
2395  out1 = vcombine_s16(shiftop(dif_l, s), shiftop(dif_h, s)); \
2396  }
2397 
2398 #define dct_pass(shiftop, shift) \
2399  { \
2400  /* even part */ \
2401  int16x8_t sum26 = vaddq_s16(row2, row6); \
2402  dct_long_mul(p1e, sum26, rot0_0); \
2403  dct_long_mac(t2e, p1e, row6, rot0_1); \
2404  dct_long_mac(t3e, p1e, row2, rot0_2); \
2405  int16x8_t sum04 = vaddq_s16(row0, row4); \
2406  int16x8_t dif04 = vsubq_s16(row0, row4); \
2407  dct_widen(t0e, sum04); \
2408  dct_widen(t1e, dif04); \
2409  dct_wadd(x0, t0e, t3e); \
2410  dct_wsub(x3, t0e, t3e); \
2411  dct_wadd(x1, t1e, t2e); \
2412  dct_wsub(x2, t1e, t2e); \
2413  /* odd part */ \
2414  int16x8_t sum15 = vaddq_s16(row1, row5); \
2415  int16x8_t sum17 = vaddq_s16(row1, row7); \
2416  int16x8_t sum35 = vaddq_s16(row3, row5); \
2417  int16x8_t sum37 = vaddq_s16(row3, row7); \
2418  int16x8_t sumodd = vaddq_s16(sum17, sum35); \
2419  dct_long_mul(p5o, sumodd, rot1_0); \
2420  dct_long_mac(p1o, p5o, sum17, rot1_1); \
2421  dct_long_mac(p2o, p5o, sum35, rot1_2); \
2422  dct_long_mul(p3o, sum37, rot2_0); \
2423  dct_long_mul(p4o, sum15, rot2_1); \
2424  dct_wadd(sump13o, p1o, p3o); \
2425  dct_wadd(sump24o, p2o, p4o); \
2426  dct_wadd(sump23o, p2o, p3o); \
2427  dct_wadd(sump14o, p1o, p4o); \
2428  dct_long_mac(x4, sump13o, row7, rot3_0); \
2429  dct_long_mac(x5, sump24o, row5, rot3_1); \
2430  dct_long_mac(x6, sump23o, row3, rot3_2); \
2431  dct_long_mac(x7, sump14o, row1, rot3_3); \
2432  dct_bfly32o(row0, row7, x0, x7, shiftop, shift); \
2433  dct_bfly32o(row1, row6, x1, x6, shiftop, shift); \
2434  dct_bfly32o(row2, row5, x2, x5, shiftop, shift); \
2435  dct_bfly32o(row3, row4, x3, x4, shiftop, shift); \
2436  }
2437 
2438  // load
2439  row0 = vld1q_s16(data + 0 * 8);
2440  row1 = vld1q_s16(data + 1 * 8);
2441  row2 = vld1q_s16(data + 2 * 8);
2442  row3 = vld1q_s16(data + 3 * 8);
2443  row4 = vld1q_s16(data + 4 * 8);
2444  row5 = vld1q_s16(data + 5 * 8);
2445  row6 = vld1q_s16(data + 6 * 8);
2446  row7 = vld1q_s16(data + 7 * 8);
2447 
2448  // add DC bias
2449  row0 = vaddq_s16(row0, vsetq_lane_s16(1024, vdupq_n_s16(0), 0));
2450 
2451  // column pass
2452  dct_pass(vrshrn_n_s32, 10);
2453 
2454  // 16bit 8x8 transpose
2455  {
2456 // these three map to a single VTRN.16, VTRN.32, and VSWP, respectively.
2457 // whether compilers actually get this is another story, sadly.
2458 #define dct_trn16(x, y) \
2459  { \
2460  int16x8x2_t t = vtrnq_s16(x, y); \
2461  x = t.val[0]; \
2462  y = t.val[1]; \
2463  }
2464 #define dct_trn32(x, y) \
2465  { \
2466  int32x4x2_t t = vtrnq_s32(vreinterpretq_s32_s16(x), vreinterpretq_s32_s16(y)); \
2467  x = vreinterpretq_s16_s32(t.val[0]); \
2468  y = vreinterpretq_s16_s32(t.val[1]); \
2469  }
2470 #define dct_trn64(x, y) \
2471  { \
2472  int16x8_t x0 = x; \
2473  int16x8_t y0 = y; \
2474  x = vcombine_s16(vget_low_s16(x0), vget_low_s16(y0)); \
2475  y = vcombine_s16(vget_high_s16(x0), vget_high_s16(y0)); \
2476  }
2477 
2478  // pass 1
2479  dct_trn16(row0, row1); // a0b0a2b2a4b4a6b6
2480  dct_trn16(row2, row3);
2481  dct_trn16(row4, row5);
2482  dct_trn16(row6, row7);
2483 
2484  // pass 2
2485  dct_trn32(row0, row2); // a0b0c0d0a4b4c4d4
2486  dct_trn32(row1, row3);
2487  dct_trn32(row4, row6);
2488  dct_trn32(row5, row7);
2489 
2490  // pass 3
2491  dct_trn64(row0, row4); // a0b0c0d0e0f0g0h0
2492  dct_trn64(row1, row5);
2493  dct_trn64(row2, row6);
2494  dct_trn64(row3, row7);
2495 
2496 #undef dct_trn16
2497 #undef dct_trn32
2498 #undef dct_trn64
2499  }
2500 
2501  // row pass
2502  // vrshrn_n_s32 only supports shifts up to 16, we need
2503  // 17. so do a non-rounding shift of 16 first then follow
2504  // up with a rounding shift by 1.
2505  dct_pass(vshrn_n_s32, 16);
2506 
2507  {
2508  // pack and round
2509  uint8x8_t p0 = vqrshrun_n_s16(row0, 1);
2510  uint8x8_t p1 = vqrshrun_n_s16(row1, 1);
2511  uint8x8_t p2 = vqrshrun_n_s16(row2, 1);
2512  uint8x8_t p3 = vqrshrun_n_s16(row3, 1);
2513  uint8x8_t p4 = vqrshrun_n_s16(row4, 1);
2514  uint8x8_t p5 = vqrshrun_n_s16(row5, 1);
2515  uint8x8_t p6 = vqrshrun_n_s16(row6, 1);
2516  uint8x8_t p7 = vqrshrun_n_s16(row7, 1);
2517 
2518 // again, these can translate into one instruction, but often don't.
2519 #define dct_trn8_8(x, y) \
2520  { \
2521  uint8x8x2_t t = vtrn_u8(x, y); \
2522  x = t.val[0]; \
2523  y = t.val[1]; \
2524  }
2525 #define dct_trn8_16(x, y) \
2526  { \
2527  uint16x4x2_t t = vtrn_u16(vreinterpret_u16_u8(x), vreinterpret_u16_u8(y)); \
2528  x = vreinterpret_u8_u16(t.val[0]); \
2529  y = vreinterpret_u8_u16(t.val[1]); \
2530  }
2531 #define dct_trn8_32(x, y) \
2532  { \
2533  uint32x2x2_t t = vtrn_u32(vreinterpret_u32_u8(x), vreinterpret_u32_u8(y)); \
2534  x = vreinterpret_u8_u32(t.val[0]); \
2535  y = vreinterpret_u8_u32(t.val[1]); \
2536  }
2537 
2538  // sadly can't use interleaved stores here since we only write
2539  // 8 bytes to each scan line!
2540 
2541  // 8x8 8-bit transpose pass 1
2542  dct_trn8_8(p0, p1);
2543  dct_trn8_8(p2, p3);
2544  dct_trn8_8(p4, p5);
2545  dct_trn8_8(p6, p7);
2546 
2547  // pass 2
2548  dct_trn8_16(p0, p2);
2549  dct_trn8_16(p1, p3);
2550  dct_trn8_16(p4, p6);
2551  dct_trn8_16(p5, p7);
2552 
2553  // pass 3
2554  dct_trn8_32(p0, p4);
2555  dct_trn8_32(p1, p5);
2556  dct_trn8_32(p2, p6);
2557  dct_trn8_32(p3, p7);
2558 
2559  // store
2560  vst1_u8(out, p0);
2561  out += out_stride;
2562  vst1_u8(out, p1);
2563  out += out_stride;
2564  vst1_u8(out, p2);
2565  out += out_stride;
2566  vst1_u8(out, p3);
2567  out += out_stride;
2568  vst1_u8(out, p4);
2569  out += out_stride;
2570  vst1_u8(out, p5);
2571  out += out_stride;
2572  vst1_u8(out, p6);
2573  out += out_stride;
2574  vst1_u8(out, p7);
2575 
2576 #undef dct_trn8_8
2577 #undef dct_trn8_16
2578 #undef dct_trn8_32
2579  }
2580 
2581 #undef dct_long_mul
2582 #undef dct_long_mac
2583 #undef dct_widen
2584 #undef dct_wadd
2585 #undef dct_wsub
2586 #undef dct_bfly32o
2587 #undef dct_pass
2588 }
2589 
2590 #endif // STBI_NEON
2591 
2592 #define STBI__MARKER_none 0xff
2593 // if there's a pending marker from the entropy stream, return that
2594 // otherwise, fetch from the stream and get a marker. if there's no
2595 // marker, return 0xff, which is never a valid marker value
2596 static stbi_uc stbi__get_marker(stbi__jpeg *j) {
2597  stbi_uc x;
2598  if (j->marker != STBI__MARKER_none) {
2599  x = j->marker;
2600  j->marker = STBI__MARKER_none;
2601  return x;
2602  }
2603  x = stbi__get8(j->s);
2604  if (x != 0xff)
2605  return STBI__MARKER_none;
2606  while (x == 0xff)
2607  x = stbi__get8(j->s);
2608  return x;
2609 }
2610 
2611 // in each scan, we'll have scan_n components, and the order
2612 // of the components is specified by order[]
2613 #define STBI__RESTART(x) ((x) >= 0xd0 && (x) <= 0xd7)
2614 
2615 // after a restart interval, stbi__jpeg_reset the entropy decoder and
2616 // the dc prediction
2617 static void stbi__jpeg_reset(stbi__jpeg *j) {
2618  j->code_bits = 0;
2619  j->code_buffer = 0;
2620  j->nomore = 0;
2621  j->img_comp[0].dc_pred = j->img_comp[1].dc_pred = j->img_comp[2].dc_pred = 0;
2622  j->marker = STBI__MARKER_none;
2623  j->todo = j->restart_interval ? j->restart_interval : 0x7fffffff;
2624  j->eob_run = 0;
2625  // no more than 1<<31 MCUs if no restart_interal? that's plenty safe,
2626  // since we don't even allow 1<<30 pixels
2627 }
2628 
2629 static int stbi__parse_entropy_coded_data(stbi__jpeg *z) {
2630  stbi__jpeg_reset(z);
2631  if (!z->progressive) {
2632  if (z->scan_n == 1) {
2633  int i, j;
2634  STBI_SIMD_ALIGN(short, data[64]);
2635  int n = z->order[0];
2636  // non-interleaved data, we just need to process one block at a time,
2637  // in trivial scanline order
2638  // number of blocks to do just depends on how many actual "pixels" this
2639  // component has, independent of interleaved MCU blocking and such
2640  int w = (z->img_comp[n].x + 7) >> 3;
2641  int h = (z->img_comp[n].y + 7) >> 3;
2642  for (j = 0; j < h; ++j) {
2643  for (i = 0; i < w; ++i) {
2644  int ha = z->img_comp[n].ha;
2645  if (!stbi__jpeg_decode_block(
2646  z,
2647  data,
2648  z->huff_dc + z->img_comp[n].hd,
2649  z->huff_ac + ha,
2650  z->fast_ac[ha],
2651  n,
2652  z->dequant[z->img_comp[n].tq]))
2653  return 0;
2654  z->idct_block_kernel(
2655  z->img_comp[n].data + z->img_comp[n].w2 * j * 8 + i * 8,
2656  z->img_comp[n].w2,
2657  data);
2658  // every data block is an MCU, so countdown the restart interval
2659  if (--z->todo <= 0) {
2660  if (z->code_bits < 24)
2661  stbi__grow_buffer_unsafe(z);
2662  // if it's NOT a restart, then just bail, so we get corrupt data
2663  // rather than no data
2664  if (!STBI__RESTART(z->marker))
2665  return 1;
2666  stbi__jpeg_reset(z);
2667  }
2668  }
2669  }
2670  return 1;
2671  }
2672  else { // interleaved
2673  int i, j, k, x, y;
2674  STBI_SIMD_ALIGN(short, data[64]);
2675  for (j = 0; j < z->img_mcu_y; ++j) {
2676  for (i = 0; i < z->img_mcu_x; ++i) {
2677  // scan an interleaved mcu... process scan_n components in order
2678  for (k = 0; k < z->scan_n; ++k) {
2679  int n = z->order[k];
2680  // scan out an mcu's worth of this component; that's just determined
2681  // by the basic H and V specified for the component
2682  for (y = 0; y < z->img_comp[n].v; ++y) {
2683  for (x = 0; x < z->img_comp[n].h; ++x) {
2684  int x2 = (i * z->img_comp[n].h + x) * 8;
2685  int y2 = (j * z->img_comp[n].v + y) * 8;
2686  int ha = z->img_comp[n].ha;
2687  if (!stbi__jpeg_decode_block(
2688  z,
2689  data,
2690  z->huff_dc + z->img_comp[n].hd,
2691  z->huff_ac + ha,
2692  z->fast_ac[ha],
2693  n,
2694  z->dequant[z->img_comp[n].tq]))
2695  return 0;
2696  z->idct_block_kernel(
2697  z->img_comp[n].data + z->img_comp[n].w2 * y2 + x2,
2698  z->img_comp[n].w2,
2699  data);
2700  }
2701  }
2702  }
2703  // after all interleaved components, that's an interleaved MCU,
2704  // so now count down the restart interval
2705  if (--z->todo <= 0) {
2706  if (z->code_bits < 24)
2707  stbi__grow_buffer_unsafe(z);
2708  if (!STBI__RESTART(z->marker))
2709  return 1;
2710  stbi__jpeg_reset(z);
2711  }
2712  }
2713  }
2714  return 1;
2715  }
2716  }
2717  else {
2718  if (z->scan_n == 1) {
2719  int i, j;
2720  int n = z->order[0];
2721  // non-interleaved data, we just need to process one block at a time,
2722  // in trivial scanline order
2723  // number of blocks to do just depends on how many actual "pixels" this
2724  // component has, independent of interleaved MCU blocking and such
2725  int w = (z->img_comp[n].x + 7) >> 3;
2726  int h = (z->img_comp[n].y + 7) >> 3;
2727  for (j = 0; j < h; ++j) {
2728  for (i = 0; i < w; ++i) {
2729  short *data = z->img_comp[n].coeff + 64 * (i + j * z->img_comp[n].coeff_w);
2730  if (z->spec_start == 0) {
2731  if (!stbi__jpeg_decode_block_prog_dc(z, data, &z->huff_dc[z->img_comp[n].hd], n))
2732  return 0;
2733  }
2734  else {
2735  int ha = z->img_comp[n].ha;
2736  if (!stbi__jpeg_decode_block_prog_ac(z, data, &z->huff_ac[ha], z->fast_ac[ha]))
2737  return 0;
2738  }
2739  // every data block is an MCU, so countdown the restart interval
2740  if (--z->todo <= 0) {
2741  if (z->code_bits < 24)
2742  stbi__grow_buffer_unsafe(z);
2743  if (!STBI__RESTART(z->marker))
2744  return 1;
2745  stbi__jpeg_reset(z);
2746  }
2747  }
2748  }
2749  return 1;
2750  }
2751  else { // interleaved
2752  int i, j, k, x, y;
2753  for (j = 0; j < z->img_mcu_y; ++j) {
2754  for (i = 0; i < z->img_mcu_x; ++i) {
2755  // scan an interleaved mcu... process scan_n components in order
2756  for (k = 0; k < z->scan_n; ++k) {
2757  int n = z->order[k];
2758  // scan out an mcu's worth of this component; that's just determined
2759  // by the basic H and V specified for the component
2760  for (y = 0; y < z->img_comp[n].v; ++y) {
2761  for (x = 0; x < z->img_comp[n].h; ++x) {
2762  int x2 = (i * z->img_comp[n].h + x);
2763  int y2 = (j * z->img_comp[n].v + y);
2764  short *data =
2765  z->img_comp[n].coeff + 64 * (x2 + y2 * z->img_comp[n].coeff_w);
2766  if (!stbi__jpeg_decode_block_prog_dc(
2767  z, data, &z->huff_dc[z->img_comp[n].hd], n))
2768  return 0;
2769  }
2770  }
2771  }
2772  // after all interleaved components, that's an interleaved MCU,
2773  // so now count down the restart interval
2774  if (--z->todo <= 0) {
2775  if (z->code_bits < 24)
2776  stbi__grow_buffer_unsafe(z);
2777  if (!STBI__RESTART(z->marker))
2778  return 1;
2779  stbi__jpeg_reset(z);
2780  }
2781  }
2782  }
2783  return 1;
2784  }
2785  }
2786 }
2787 
2788 static void stbi__jpeg_dequantize(short *data, stbi_uc *dequant) {
2789  int i;
2790  for (i = 0; i < 64; ++i)
2791  data[i] *= dequant[i];
2792 }
2793 
2794 static void stbi__jpeg_finish(stbi__jpeg *z) {
2795  if (z->progressive) {
2796  // dequantize and idct the data
2797  int i, j, n;
2798  for (n = 0; n < z->s->img_n; ++n) {
2799  int w = (z->img_comp[n].x + 7) >> 3;
2800  int h = (z->img_comp[n].y + 7) >> 3;
2801  for (j = 0; j < h; ++j) {
2802  for (i = 0; i < w; ++i) {
2803  short *data = z->img_comp[n].coeff + 64 * (i + j * z->img_comp[n].coeff_w);
2804  stbi__jpeg_dequantize(data, z->dequant[z->img_comp[n].tq]);
2805  z->idct_block_kernel(
2806  z->img_comp[n].data + z->img_comp[n].w2 * j * 8 + i * 8,
2807  z->img_comp[n].w2,
2808  data);
2809  }
2810  }
2811  }
2812  }
2813 }
2814 
2815 static int stbi__process_marker(stbi__jpeg *z, int m) {
2816  int L;
2817  switch (m) {
2818  case STBI__MARKER_none: // no marker found
2819  return stbi__err("expected marker", "Corrupt JPEG");
2820 
2821  case 0xDD: // DRI - specify restart interval
2822  if (stbi__get16be(z->s) != 4)
2823  return stbi__err("bad DRI len", "Corrupt JPEG");
2824  z->restart_interval = stbi__get16be(z->s);
2825  return 1;
2826 
2827  case 0xDB: // DQT - define quantization table
2828  L = stbi__get16be(z->s) - 2;
2829  while (L > 0) {
2830  int q = stbi__get8(z->s);
2831  int p = q >> 4;
2832  int t = q & 15, i;
2833  if (p != 0)
2834  return stbi__err("bad DQT type", "Corrupt JPEG");
2835  if (t > 3)
2836  return stbi__err("bad DQT table", "Corrupt JPEG");
2837  for (i = 0; i < 64; ++i)
2838  z->dequant[t][stbi__jpeg_dezigzag[i]] = stbi__get8(z->s);
2839  L -= 65;
2840  }
2841  return L == 0;
2842 
2843  case 0xC4: // DHT - define huffman table
2844  L = stbi__get16be(z->s) - 2;
2845  while (L > 0) {
2846  stbi_uc *v;
2847  int sizes[16], i, n = 0;
2848  int q = stbi__get8(z->s);
2849  int tc = q >> 4;
2850  int th = q & 15;
2851  if (tc > 1 || th > 3)
2852  return stbi__err("bad DHT header", "Corrupt JPEG");
2853  for (i = 0; i < 16; ++i) {
2854  sizes[i] = stbi__get8(z->s);
2855  n += sizes[i];
2856  }
2857  L -= 17;
2858  if (tc == 0) {
2859  if (!stbi__build_huffman(z->huff_dc + th, sizes))
2860  return 0;
2861  v = z->huff_dc[th].values;
2862  }
2863  else {
2864  if (!stbi__build_huffman(z->huff_ac + th, sizes))
2865  return 0;
2866  v = z->huff_ac[th].values;
2867  }
2868  for (i = 0; i < n; ++i)
2869  v[i] = stbi__get8(z->s);
2870  if (tc != 0)
2871  stbi__build_fast_ac(z->fast_ac[th], z->huff_ac + th);
2872  L -= n;
2873  }
2874  return L == 0;
2875  }
2876  // check for comment block or APP blocks
2877  if ((m >= 0xE0 && m <= 0xEF) || m == 0xFE) {
2878  stbi__skip(z->s, stbi__get16be(z->s) - 2);
2879  return 1;
2880  }
2881  return 0;
2882 }
2883 
2884 // after we see SOS
2885 static int stbi__process_scan_header(stbi__jpeg *z) {
2886  int i;
2887  int Ls = stbi__get16be(z->s);
2888  z->scan_n = stbi__get8(z->s);
2889  if (z->scan_n < 1 || z->scan_n > 4 || z->scan_n > (int)z->s->img_n)
2890  return stbi__err("bad SOS component count", "Corrupt JPEG");
2891  if (Ls != 6 + 2 * z->scan_n)
2892  return stbi__err("bad SOS len", "Corrupt JPEG");
2893  for (i = 0; i < z->scan_n; ++i) {
2894  int id = stbi__get8(z->s), which;
2895  int q = stbi__get8(z->s);
2896  for (which = 0; which < z->s->img_n; ++which)
2897  if (z->img_comp[which].id == id)
2898  break;
2899  if (which == z->s->img_n)
2900  return 0; // no match
2901  z->img_comp[which].hd = q >> 4;
2902  if (z->img_comp[which].hd > 3)
2903  return stbi__err("bad DC huff", "Corrupt JPEG");
2904  z->img_comp[which].ha = q & 15;
2905  if (z->img_comp[which].ha > 3)
2906  return stbi__err("bad AC huff", "Corrupt JPEG");
2907  z->order[i] = which;
2908  }
2909 
2910  {
2911  int aa;
2912  z->spec_start = stbi__get8(z->s);
2913  z->spec_end = stbi__get8(z->s); // should be 63, but might be 0
2914  aa = stbi__get8(z->s);
2915  z->succ_high = (aa >> 4);
2916  z->succ_low = (aa & 15);
2917  if (z->progressive) {
2918  if (z->spec_start > 63 || z->spec_end > 63 || z->spec_start > z->spec_end
2919  || z->succ_high > 13
2920  || z->succ_low > 13)
2921  return stbi__err("bad SOS", "Corrupt JPEG");
2922  }
2923  else {
2924  if (z->spec_start != 0)
2925  return stbi__err("bad SOS", "Corrupt JPEG");
2926  if (z->succ_high != 0 || z->succ_low != 0)
2927  return stbi__err("bad SOS", "Corrupt JPEG");
2928  z->spec_end = 63;
2929  }
2930  }
2931 
2932  return 1;
2933 }
2934 
2935 static int stbi__process_frame_header(stbi__jpeg *z, int scan) {
2936  stbi__context *s = z->s;
2937  int Lf, p, i, q, h_max = 1, v_max = 1, c;
2938  Lf = stbi__get16be(s);
2939  if (Lf < 11)
2940  return stbi__err("bad SOF len", "Corrupt JPEG"); // JPEG
2941  p = stbi__get8(s);
2942  if (p != 8)
2943  return stbi__err("only 8-bit", "JPEG format not supported: 8-bit only"); // JPEG baseline
2944  s->img_y = stbi__get16be(s);
2945  if (s->img_y == 0)
2946  return stbi__err(
2947  "no header height", "JPEG format not supported: delayed height"); // Legal, but we don't
2948  // handle it--but
2949  // neither does IJG
2950  s->img_x = stbi__get16be(s);
2951  if (s->img_x == 0)
2952  return stbi__err("0 width", "Corrupt JPEG"); // JPEG requires
2953  c = stbi__get8(s);
2954  if (c != 3 && c != 1)
2955  return stbi__err("bad component count", "Corrupt JPEG"); // JFIF requires
2956  s->img_n = c;
2957  for (i = 0; i < c; ++i) {
2958  z->img_comp[i].data = NULL;
2959  z->img_comp[i].linebuf = NULL;
2960  }
2961 
2962  if (Lf != 8 + 3 * s->img_n)
2963  return stbi__err("bad SOF len", "Corrupt JPEG");
2964 
2965  z->rgb = 0;
2966  for (i = 0; i < s->img_n; ++i) {
2967  static unsigned char rgb[3] = {'R', 'G', 'B'};
2968  z->img_comp[i].id = stbi__get8(s);
2969  if (z->img_comp[i].id != i + 1) // JFIF requires
2970  if (z->img_comp[i].id != i) { // some version of jpegtran outputs non-JFIF-compliant files!
2971  // somethings output this (see
2972  // http://fileformats.archiveteam.org/wiki/JPEG#Color_format)
2973  if (z->img_comp[i].id != rgb[i])
2974  return stbi__err("bad component ID", "Corrupt JPEG");
2975  ++z->rgb;
2976  }
2977  q = stbi__get8(s);
2978  z->img_comp[i].h = (q >> 4);
2979  if (!z->img_comp[i].h || z->img_comp[i].h > 4)
2980  return stbi__err("bad H", "Corrupt JPEG");
2981  z->img_comp[i].v = q & 15;
2982  if (!z->img_comp[i].v || z->img_comp[i].v > 4)
2983  return stbi__err("bad V", "Corrupt JPEG");
2984  z->img_comp[i].tq = stbi__get8(s);
2985  if (z->img_comp[i].tq > 3)
2986  return stbi__err("bad TQ", "Corrupt JPEG");
2987  }
2988 
2989  if (scan != STBI__SCAN_load)
2990  return 1;
2991 
2992  if ((1 << 30) / s->img_x / s->img_n < s->img_y)
2993  return stbi__err("too large", "Image too large to decode");
2994 
2995  for (i = 0; i < s->img_n; ++i) {
2996  if (z->img_comp[i].h > h_max)
2997  h_max = z->img_comp[i].h;
2998  if (z->img_comp[i].v > v_max)
2999  v_max = z->img_comp[i].v;
3000  }
3001 
3002  // compute interleaved mcu info
3003  z->img_h_max = h_max;
3004  z->img_v_max = v_max;
3005  z->img_mcu_w = h_max * 8;
3006  z->img_mcu_h = v_max * 8;
3007  z->img_mcu_x = (s->img_x + z->img_mcu_w - 1) / z->img_mcu_w;
3008  z->img_mcu_y = (s->img_y + z->img_mcu_h - 1) / z->img_mcu_h;
3009 
3010  for (i = 0; i < s->img_n; ++i) {
3011  // number of effective pixels (e.g. for non-interleaved MCU)
3012  z->img_comp[i].x = (s->img_x * z->img_comp[i].h + h_max - 1) / h_max;
3013  z->img_comp[i].y = (s->img_y * z->img_comp[i].v + v_max - 1) / v_max;
3014  // to simplify generation, we'll allocate enough memory to decode
3015  // the bogus oversized data from using interleaved MCUs and their
3016  // big blocks (e.g. a 16x16 iMCU on an image of width 33); we won't
3017  // discard the extra data until colorspace conversion
3018  z->img_comp[i].w2 = z->img_mcu_x * z->img_comp[i].h * 8;
3019  z->img_comp[i].h2 = z->img_mcu_y * z->img_comp[i].v * 8;
3020  z->img_comp[i].raw_data = stbi__malloc(z->img_comp[i].w2 * z->img_comp[i].h2 + 15);
3021 
3022  if (z->img_comp[i].raw_data == NULL) {
3023  for (--i; i >= 0; --i) {
3024  STBI_FREE(z->img_comp[i].raw_data);
3025  z->img_comp[i].raw_data = NULL;
3026  }
3027  return stbi__err("outofmem", "Out of memory");
3028  }
3029  // align blocks for idct using mmx/sse
3030  z->img_comp[i].data = (stbi_uc *)(((size_t)z->img_comp[i].raw_data + 15) & ~15);
3031  z->img_comp[i].linebuf = NULL;
3032  if (z->progressive) {
3033  z->img_comp[i].coeff_w = (z->img_comp[i].w2 + 7) >> 3;
3034  z->img_comp[i].coeff_h = (z->img_comp[i].h2 + 7) >> 3;
3035  z->img_comp[i].raw_coeff = STBI_MALLOC(
3036  z->img_comp[i].coeff_w * z->img_comp[i].coeff_h * 64 * sizeof(short) + 15);
3037  z->img_comp[i].coeff = (short *)(((size_t)z->img_comp[i].raw_coeff + 15) & ~15);
3038  }
3039  else {
3040  z->img_comp[i].coeff = 0;
3041  z->img_comp[i].raw_coeff = 0;
3042  }
3043  }
3044 
3045  return 1;
3046 }
3047 
3048 // use comparisons since in some cases we handle more than one case (e.g. SOF)
3049 #define stbi__DNL(x) ((x) == 0xdc)
3050 #define stbi__SOI(x) ((x) == 0xd8)
3051 #define stbi__EOI(x) ((x) == 0xd9)
3052 #define stbi__SOF(x) ((x) == 0xc0 || (x) == 0xc1 || (x) == 0xc2)
3053 #define stbi__SOS(x) ((x) == 0xda)
3054 
3055 #define stbi__SOF_progressive(x) ((x) == 0xc2)
3056 
3057 static int stbi__decode_jpeg_header(stbi__jpeg *z, int scan) {
3058  int m;
3059  z->marker = STBI__MARKER_none; // initialize cached marker to empty
3060  m = stbi__get_marker(z);
3061  if (!stbi__SOI(m))
3062  return stbi__err("no SOI", "Corrupt JPEG");
3063  if (scan == STBI__SCAN_type)
3064  return 1;
3065  m = stbi__get_marker(z);
3066  while (!stbi__SOF(m)) {
3067  if (!stbi__process_marker(z, m))
3068  return 0;
3069  m = stbi__get_marker(z);
3070  while (m == STBI__MARKER_none) {
3071  // some files have extra padding after their blocks, so ok, we'll scan
3072  if (stbi__at_eof(z->s))
3073  return stbi__err("no SOF", "Corrupt JPEG");
3074  m = stbi__get_marker(z);
3075  }
3076  }
3077  z->progressive = stbi__SOF_progressive(m);
3078  if (!stbi__process_frame_header(z, scan))
3079  return 0;
3080  return 1;
3081 }
3082 
3083 // decode image to YCbCr format
3084 static int stbi__decode_jpeg_image(stbi__jpeg *j) {
3085  int m;
3086  for (m = 0; m < 4; m++) {
3087  j->img_comp[m].raw_data = NULL;
3088  j->img_comp[m].raw_coeff = NULL;
3089  }
3090  j->restart_interval = 0;
3091  if (!stbi__decode_jpeg_header(j, STBI__SCAN_load))
3092  return 0;
3093  m = stbi__get_marker(j);
3094  while (!stbi__EOI(m)) {
3095  if (stbi__SOS(m)) {
3096  if (!stbi__process_scan_header(j))
3097  return 0;
3098  if (!stbi__parse_entropy_coded_data(j))
3099  return 0;
3100  if (j->marker == STBI__MARKER_none) {
3101  // handle 0s at the end of image data from IP Kamera 9060
3102  while (!stbi__at_eof(j->s)) {
3103  int x = stbi__get8(j->s);
3104  if (x == 255) {
3105  j->marker = stbi__get8(j->s);
3106  break;
3107  }
3108  else if (x != 0) {
3109  return stbi__err("junk before marker", "Corrupt JPEG");
3110  }
3111  }
3112  // if we reach eof without hitting a marker, stbi__get_marker() below will fail and
3113  // we'll eventually return 0
3114  }
3115  }
3116  else {
3117  if (!stbi__process_marker(j, m))
3118  return 0;
3119  }
3120  m = stbi__get_marker(j);
3121  }
3122  if (j->progressive)
3123  stbi__jpeg_finish(j);
3124  return 1;
3125 }
3126 
3127 // static jfif-centered resampling (across block boundaries)
3128 
3129 typedef stbi_uc *(*resample_row_func)(stbi_uc *out, stbi_uc *in0, stbi_uc *in1, int w, int hs);
3130 
3131 #define stbi__div4(x) ((stbi_uc)((x) >> 2))
3132 
3133 static stbi_uc *resample_row_1(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs) {
3134  STBI_NOTUSED(out);
3135  STBI_NOTUSED(in_far);
3136  STBI_NOTUSED(w);
3137  STBI_NOTUSED(hs);
3138  return in_near;
3139 }
3140 
3141 static stbi_uc *
3142 stbi__resample_row_v_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs) {
3143  // need to generate two samples vertically for every one in input
3144  int i;
3145  STBI_NOTUSED(hs);
3146  for (i = 0; i < w; ++i)
3147  out[i] = stbi__div4(3 * in_near[i] + in_far[i] + 2);
3148  return out;
3149 }
3150 
3151 static stbi_uc *
3152 stbi__resample_row_h_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs) {
3153  // need to generate two samples horizontally for every one in input
3154  int i;
3155  stbi_uc *input = in_near;
3156 
3157  if (w == 1) {
3158  // if only one sample, can't do any interpolation
3159  out[0] = out[1] = input[0];
3160  return out;
3161  }
3162 
3163  out[0] = input[0];
3164  out[1] = stbi__div4(input[0] * 3 + input[1] + 2);
3165  for (i = 1; i < w - 1; ++i) {
3166  int n = 3 * input[i] + 2;
3167  out[i * 2 + 0] = stbi__div4(n + input[i - 1]);
3168  out[i * 2 + 1] = stbi__div4(n + input[i + 1]);
3169  }
3170  out[i * 2 + 0] = stbi__div4(input[w - 2] * 3 + input[w - 1] + 2);
3171  out[i * 2 + 1] = input[w - 1];
3172 
3173  STBI_NOTUSED(in_far);
3174  STBI_NOTUSED(hs);
3175 
3176  return out;
3177 }
3178 
3179 #define stbi__div16(x) ((stbi_uc)((x) >> 4))
3180 
3181 static stbi_uc *
3182 stbi__resample_row_hv_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs) {
3183  // need to generate 2x2 samples for every one in input
3184  int i, t0, t1;
3185  if (w == 1) {
3186  out[0] = out[1] = stbi__div4(3 * in_near[0] + in_far[0] + 2);
3187  return out;
3188  }
3189 
3190  t1 = 3 * in_near[0] + in_far[0];
3191  out[0] = stbi__div4(t1 + 2);
3192  for (i = 1; i < w; ++i) {
3193  t0 = t1;
3194  t1 = 3 * in_near[i] + in_far[i];
3195  out[i * 2 - 1] = stbi__div16(3 * t0 + t1 + 8);
3196  out[i * 2] = stbi__div16(3 * t1 + t0 + 8);
3197  }
3198  out[w * 2 - 1] = stbi__div4(t1 + 2);
3199 
3200  STBI_NOTUSED(hs);
3201 
3202  return out;
3203 }
3204 
3205 #if defined(STBI_SSE2) || defined(STBI_NEON)
3206 static stbi_uc *
3207 stbi__resample_row_hv_2_simd(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs) {
3208  // need to generate 2x2 samples for every one in input
3209  int i = 0, t0, t1;
3210 
3211  if (w == 1) {
3212  out[0] = out[1] = stbi__div4(3 * in_near[0] + in_far[0] + 2);
3213  return out;
3214  }
3215 
3216  t1 = 3 * in_near[0] + in_far[0];
3217  // process groups of 8 pixels for as long as we can.
3218  // note we can't handle the last pixel in a row in this loop
3219  // because we need to handle the filter boundary conditions.
3220  for (; i < ((w - 1) & ~7); i += 8) {
3221 #if defined(STBI_SSE2)
3222  // load and perform the vertical filtering pass
3223  // this uses 3*x + y = 4*x + (y - x)
3224  __m128i zero = _mm_setzero_si128();
3225  __m128i farb = _mm_loadl_epi64((__m128i *)(in_far + i));
3226  __m128i nearb = _mm_loadl_epi64((__m128i *)(in_near + i));
3227  __m128i farw = _mm_unpacklo_epi8(farb, zero);
3228  __m128i nearw = _mm_unpacklo_epi8(nearb, zero);
3229  __m128i diff = _mm_sub_epi16(farw, nearw);
3230  __m128i nears = _mm_slli_epi16(nearw, 2);
3231  __m128i curr = _mm_add_epi16(nears, diff); // current row
3232 
3233  // horizontal filter works the same based on shifted vers of current
3234  // row. "prev" is current row shifted right by 1 pixel; we need to
3235  // insert the previous pixel value (from t1).
3236  // "next" is current row shifted left by 1 pixel, with first pixel
3237  // of next block of 8 pixels added in.
3238  __m128i prv0 = _mm_slli_si128(curr, 2);
3239  __m128i nxt0 = _mm_srli_si128(curr, 2);
3240  __m128i prev = _mm_insert_epi16(prv0, t1, 0);
3241  __m128i next = _mm_insert_epi16(nxt0, 3 * in_near[i + 8] + in_far[i + 8], 7);
3242 
3243  // horizontal filter, polyphase implementation since it's convenient:
3244  // even pixels = 3*cur + prev = cur*4 + (prev - cur)
3245  // odd pixels = 3*cur + next = cur*4 + (next - cur)
3246  // note the shared term.
3247  __m128i bias = _mm_set1_epi16(8);
3248  __m128i curs = _mm_slli_epi16(curr, 2);
3249  __m128i prvd = _mm_sub_epi16(prev, curr);
3250  __m128i nxtd = _mm_sub_epi16(next, curr);
3251  __m128i curb = _mm_add_epi16(curs, bias);
3252  __m128i even = _mm_add_epi16(prvd, curb);
3253  __m128i odd = _mm_add_epi16(nxtd, curb);
3254 
3255  // interleave even and odd pixels, then undo scaling.
3256  __m128i int0 = _mm_unpacklo_epi16(even, odd);
3257  __m128i int1 = _mm_unpackhi_epi16(even, odd);
3258  __m128i de0 = _mm_srli_epi16(int0, 4);
3259  __m128i de1 = _mm_srli_epi16(int1, 4);
3260 
3261  // pack and write output
3262  __m128i outv = _mm_packus_epi16(de0, de1);
3263  _mm_storeu_si128((__m128i *)(out + i * 2), outv);
3264 #elif defined(STBI_NEON)
3265  // load and perform the vertical filtering pass
3266  // this uses 3*x + y = 4*x + (y - x)
3267  uint8x8_t farb = vld1_u8(in_far + i);
3268  uint8x8_t nearb = vld1_u8(in_near + i);
3269  int16x8_t diff = vreinterpretq_s16_u16(vsubl_u8(farb, nearb));
3270  int16x8_t nears = vreinterpretq_s16_u16(vshll_n_u8(nearb, 2));
3271  int16x8_t curr = vaddq_s16(nears, diff); // current row
3272 
3273  // horizontal filter works the same based on shifted vers of current
3274  // row. "prev" is current row shifted right by 1 pixel; we need to
3275  // insert the previous pixel value (from t1).
3276  // "next" is current row shifted left by 1 pixel, with first pixel
3277  // of next block of 8 pixels added in.
3278  int16x8_t prv0 = vextq_s16(curr, curr, 7);
3279  int16x8_t nxt0 = vextq_s16(curr, curr, 1);
3280  int16x8_t prev = vsetq_lane_s16(t1, prv0, 0);
3281  int16x8_t next = vsetq_lane_s16(3 * in_near[i + 8] + in_far[i + 8], nxt0, 7);
3282 
3283  // horizontal filter, polyphase implementation since it's convenient:
3284  // even pixels = 3*cur + prev = cur*4 + (prev - cur)
3285  // odd pixels = 3*cur + next = cur*4 + (next - cur)
3286  // note the shared term.
3287  int16x8_t curs = vshlq_n_s16(curr, 2);
3288  int16x8_t prvd = vsubq_s16(prev, curr);
3289  int16x8_t nxtd = vsubq_s16(next, curr);
3290  int16x8_t even = vaddq_s16(curs, prvd);
3291  int16x8_t odd = vaddq_s16(curs, nxtd);
3292 
3293  // undo scaling and round, then store with even/odd phases interleaved
3294  uint8x8x2_t o;
3295  o.val[0] = vqrshrun_n_s16(even, 4);
3296  o.val[1] = vqrshrun_n_s16(odd, 4);
3297  vst2_u8(out + i * 2, o);
3298 #endif
3299 
3300  // "previous" value for next iter
3301  t1 = 3 * in_near[i + 7] + in_far[i + 7];
3302  }
3303 
3304  t0 = t1;
3305  t1 = 3 * in_near[i] + in_far[i];
3306  out[i * 2] = stbi__div16(3 * t1 + t0 + 8);
3307 
3308  for (++i; i < w; ++i) {
3309  t0 = t1;
3310  t1 = 3 * in_near[i] + in_far[i];
3311  out[i * 2 - 1] = stbi__div16(3 * t0 + t1 + 8);
3312  out[i * 2] = stbi__div16(3 * t1 + t0 + 8);
3313  }
3314  out[w * 2 - 1] = stbi__div4(t1 + 2);
3315 
3316  STBI_NOTUSED(hs);
3317 
3318  return out;
3319 }
3320 #endif
3321 
3322 static stbi_uc *
3323 stbi__resample_row_generic(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs) {
3324  // resample with nearest-neighbor
3325  int i, j;
3326  STBI_NOTUSED(in_far);
3327  for (i = 0; i < w; ++i)
3328  for (j = 0; j < hs; ++j)
3329  out[i * hs + j] = in_near[i];
3330  return out;
3331 }
3332 
3333 #ifdef STBI_JPEG_OLD
3334 // this is the same YCbCr-to-RGB calculation that stb_image has used
3335 // historically before the algorithm changes in 1.49
3336 #define float2fixed(x) ((int)((x)*65536 + 0.5))
3337 static void stbi__YCbCr_to_RGB_row(
3338  stbi_uc *out,
3339  const stbi_uc *y,
3340  const stbi_uc *pcb,
3341  const stbi_uc *pcr,
3342  int count,
3343  int step) {
3344  int i;
3345  for (i = 0; i < count; ++i) {
3346  int y_fixed = (y[i] << 16) + 32768; // rounding
3347  int r, g, b;
3348  int cr = pcr[i] - 128;
3349  int cb = pcb[i] - 128;
3350  r = y_fixed + cr * float2fixed(1.40200f);
3351  g = y_fixed - cr * float2fixed(0.71414f) - cb * float2fixed(0.34414f);
3352  b = y_fixed + cb * float2fixed(1.77200f);
3353  r >>= 16;
3354  g >>= 16;
3355  b >>= 16;
3356  if ((unsigned)r > 255) {
3357  if (r < 0)
3358  r = 0;
3359  else
3360  r = 255;
3361  }
3362  if ((unsigned)g > 255) {
3363  if (g < 0)
3364  g = 0;
3365  else
3366  g = 255;
3367  }
3368  if ((unsigned)b > 255) {
3369  if (b < 0)
3370  b = 0;
3371  else
3372  b = 255;
3373  }
3374  out[0] = (stbi_uc)r;
3375  out[1] = (stbi_uc)g;
3376  out[2] = (stbi_uc)b;
3377  out[3] = 255;
3378  out += step;
3379  }
3380 }
3381 #else
3382 // this is a reduced-precision calculation of YCbCr-to-RGB introduced
3383 // to make sure the code produces the same results in both SIMD and scalar
3384 #define float2fixed(x) (((int)((x)*4096.0f + 0.5f)) << 8)
3385 static void stbi__YCbCr_to_RGB_row(
3386  stbi_uc *out,
3387  const stbi_uc *y,
3388  const stbi_uc *pcb,
3389  const stbi_uc *pcr,
3390  int count,
3391  int step) {
3392  int i;
3393  for (i = 0; i < count; ++i) {
3394  int y_fixed = (y[i] << 20) + (1 << 19); // rounding
3395  int r, g, b;
3396  int cr = pcr[i] - 128;
3397  int cb = pcb[i] - 128;
3398  r = y_fixed + cr * float2fixed(1.40200f);
3399  g = y_fixed + (cr * -float2fixed(0.71414f)) + ((cb * -float2fixed(0.34414f)) & 0xffff0000);
3400  b = y_fixed + cb * float2fixed(1.77200f);
3401  r >>= 20;
3402  g >>= 20;
3403  b >>= 20;
3404  if ((unsigned)r > 255) {
3405  if (r < 0)
3406  r = 0;
3407  else
3408  r = 255;
3409  }
3410  if ((unsigned)g > 255) {
3411  if (g < 0)
3412  g = 0;
3413  else
3414  g = 255;
3415  }
3416  if ((unsigned)b > 255) {
3417  if (b < 0)
3418  b = 0;
3419  else
3420  b = 255;
3421  }
3422  out[0] = (stbi_uc)r;
3423  out[1] = (stbi_uc)g;
3424  out[2] = (stbi_uc)b;
3425  out[3] = 255;
3426  out += step;
3427  }
3428 }
3429 #endif
3430 
3431 #if defined(STBI_SSE2) || defined(STBI_NEON)
3432 static void stbi__YCbCr_to_RGB_simd(
3433  stbi_uc *out,
3434  stbi_uc const *y,
3435  stbi_uc const *pcb,
3436  stbi_uc const *pcr,
3437  int count,
3438  int step) {
3439  int i = 0;
3440 
3441 #ifdef STBI_SSE2
3442  // step == 3 is pretty ugly on the final interleave, and i'm not convinced
3443  // it's useful in practice (you wouldn't use it for textures, for example).
3444  // so just accelerate step == 4 case.
3445  if (step == 4) {
3446  // this is a fairly straightforward implementation and not super-optimized.
3447  __m128i signflip = _mm_set1_epi8(-0x80);
3448  __m128i cr_const0 = _mm_set1_epi16((short)(1.40200f * 4096.0f + 0.5f));
3449  __m128i cr_const1 = _mm_set1_epi16(-(short)(0.71414f * 4096.0f + 0.5f));
3450  __m128i cb_const0 = _mm_set1_epi16(-(short)(0.34414f * 4096.0f + 0.5f));
3451  __m128i cb_const1 = _mm_set1_epi16((short)(1.77200f * 4096.0f + 0.5f));
3452  __m128i y_bias = _mm_set1_epi8((char)(unsigned char)128);
3453  __m128i xw = _mm_set1_epi16(255); // alpha channel
3454 
3455  for (; i + 7 < count; i += 8) {
3456  // load
3457  __m128i y_bytes = _mm_loadl_epi64((__m128i *)(y + i));
3458  __m128i cr_bytes = _mm_loadl_epi64((__m128i *)(pcr + i));
3459  __m128i cb_bytes = _mm_loadl_epi64((__m128i *)(pcb + i));
3460  __m128i cr_biased = _mm_xor_si128(cr_bytes, signflip); // -128
3461  __m128i cb_biased = _mm_xor_si128(cb_bytes, signflip); // -128
3462 
3463  // unpack to short (and left-shift cr, cb by 8)
3464  __m128i yw = _mm_unpacklo_epi8(y_bias, y_bytes);
3465  __m128i crw = _mm_unpacklo_epi8(_mm_setzero_si128(), cr_biased);
3466  __m128i cbw = _mm_unpacklo_epi8(_mm_setzero_si128(), cb_biased);
3467 
3468  // color transform
3469  __m128i yws = _mm_srli_epi16(yw, 4);
3470  __m128i cr0 = _mm_mulhi_epi16(cr_const0, crw);
3471  __m128i cb0 = _mm_mulhi_epi16(cb_const0, cbw);
3472  __m128i cb1 = _mm_mulhi_epi16(cbw, cb_const1);
3473  __m128i cr1 = _mm_mulhi_epi16(crw, cr_const1);
3474  __m128i rws = _mm_add_epi16(cr0, yws);
3475  __m128i gwt = _mm_add_epi16(cb0, yws);
3476  __m128i bws = _mm_add_epi16(yws, cb1);
3477  __m128i gws = _mm_add_epi16(gwt, cr1);
3478 
3479  // descale
3480  __m128i rw = _mm_srai_epi16(rws, 4);
3481  __m128i bw = _mm_srai_epi16(bws, 4);
3482  __m128i gw = _mm_srai_epi16(gws, 4);
3483 
3484  // back to byte, set up for transpose
3485  __m128i brb = _mm_packus_epi16(rw, bw);
3486  __m128i gxb = _mm_packus_epi16(gw, xw);
3487 
3488  // transpose to interleave channels
3489  __m128i t0 = _mm_unpacklo_epi8(brb, gxb);
3490  __m128i t1 = _mm_unpackhi_epi8(brb, gxb);
3491  __m128i o0 = _mm_unpacklo_epi16(t0, t1);
3492  __m128i o1 = _mm_unpackhi_epi16(t0, t1);
3493 
3494  // store
3495  _mm_storeu_si128((__m128i *)(out + 0), o0);
3496  _mm_storeu_si128((__m128i *)(out + 16), o1);
3497  out += 32;
3498  }
3499  }
3500 #endif
3501 
3502 #ifdef STBI_NEON
3503  // in this version, step=3 support would be easy to add. but is there demand?
3504  if (step == 4) {
3505  // this is a fairly straightforward implementation and not super-optimized.
3506  uint8x8_t signflip = vdup_n_u8(0x80);
3507  int16x8_t cr_const0 = vdupq_n_s16((short)(1.40200f * 4096.0f + 0.5f));
3508  int16x8_t cr_const1 = vdupq_n_s16(-(short)(0.71414f * 4096.0f + 0.5f));
3509  int16x8_t cb_const0 = vdupq_n_s16(-(short)(0.34414f * 4096.0f + 0.5f));
3510  int16x8_t cb_const1 = vdupq_n_s16((short)(1.77200f * 4096.0f + 0.5f));
3511 
3512  for (; i + 7 < count; i += 8) {
3513  // load
3514  uint8x8_t y_bytes = vld1_u8(y + i);
3515  uint8x8_t cr_bytes = vld1_u8(pcr + i);
3516  uint8x8_t cb_bytes = vld1_u8(pcb + i);
3517  int8x8_t cr_biased = vreinterpret_s8_u8(vsub_u8(cr_bytes, signflip));
3518  int8x8_t cb_biased = vreinterpret_s8_u8(vsub_u8(cb_bytes, signflip));
3519 
3520  // expand to s16
3521  int16x8_t yws = vreinterpretq_s16_u16(vshll_n_u8(y_bytes, 4));
3522  int16x8_t crw = vshll_n_s8(cr_biased, 7);
3523  int16x8_t cbw = vshll_n_s8(cb_biased, 7);
3524 
3525  // color transform
3526  int16x8_t cr0 = vqdmulhq_s16(crw, cr_const0);
3527  int16x8_t cb0 = vqdmulhq_s16(cbw, cb_const0);
3528  int16x8_t cr1 = vqdmulhq_s16(crw, cr_const1);
3529  int16x8_t cb1 = vqdmulhq_s16(cbw, cb_const1);
3530  int16x8_t rws = vaddq_s16(yws, cr0);
3531  int16x8_t gws = vaddq_s16(vaddq_s16(yws, cb0), cr1);
3532  int16x8_t bws = vaddq_s16(yws, cb1);
3533 
3534  // undo scaling, round, convert to byte
3535  uint8x8x4_t o;
3536  o.val[0] = vqrshrun_n_s16(rws, 4);
3537  o.val[1] = vqrshrun_n_s16(gws, 4);
3538  o.val[2] = vqrshrun_n_s16(bws, 4);
3539  o.val[3] = vdup_n_u8(255);
3540 
3541  // store, interleaving r/g/b/a
3542  vst4_u8(out, o);
3543  out += 8 * 4;
3544  }
3545  }
3546 #endif
3547 
3548  for (; i < count; ++i) {
3549  int y_fixed = (y[i] << 20) + (1 << 19); // rounding
3550  int r, g, b;
3551  int cr = pcr[i] - 128;
3552  int cb = pcb[i] - 128;
3553  r = y_fixed + cr * float2fixed(1.40200f);
3554  g = y_fixed + cr * -float2fixed(0.71414f) + ((cb * -float2fixed(0.34414f)) & 0xffff0000);
3555  b = y_fixed + cb * float2fixed(1.77200f);
3556  r >>= 20;
3557  g >>= 20;
3558  b >>= 20;
3559  if ((unsigned)r > 255) {
3560  if (r < 0)
3561  r = 0;
3562  else
3563  r = 255;
3564  }
3565  if ((unsigned)g > 255) {
3566  if (g < 0)
3567  g = 0;
3568  else
3569  g = 255;
3570  }
3571  if ((unsigned)b > 255) {
3572  if (b < 0)
3573  b = 0;
3574  else
3575  b = 255;
3576  }
3577  out[0] = (stbi_uc)r;
3578  out[1] = (stbi_uc)g;
3579  out[2] = (stbi_uc)b;
3580  out[3] = 255;
3581  out += step;
3582  }
3583 }
3584 #endif
3585 
3586 // set up the kernels
3587 static void stbi__setup_jpeg(stbi__jpeg *j) {
3588  j->idct_block_kernel = stbi__idct_block;
3589  j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_row;
3590  j->resample_row_hv_2_kernel = stbi__resample_row_hv_2;
3591 
3592 #ifdef STBI_SSE2
3593  if (stbi__sse2_available()) {
3594  j->idct_block_kernel = stbi__idct_simd;
3595 #ifndef STBI_JPEG_OLD
3596  j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_simd;
3597 #endif
3598  j->resample_row_hv_2_kernel = stbi__resample_row_hv_2_simd;
3599  }
3600 #endif
3601 
3602 #ifdef STBI_NEON
3603  j->idct_block_kernel = stbi__idct_simd;
3604 #ifndef STBI_JPEG_OLD
3605  j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_simd;
3606 #endif
3607  j->resample_row_hv_2_kernel = stbi__resample_row_hv_2_simd;
3608 #endif
3609 }
3610 
3611 // clean up the temporary component buffers
3612 static void stbi__cleanup_jpeg(stbi__jpeg *j) {
3613  int i;
3614  for (i = 0; i < j->s->img_n; ++i) {
3615  if (j->img_comp[i].raw_data) {
3616  STBI_FREE(j->img_comp[i].raw_data);
3617  j->img_comp[i].raw_data = NULL;
3618  j->img_comp[i].data = NULL;
3619  }
3620  if (j->img_comp[i].raw_coeff) {
3621  STBI_FREE(j->img_comp[i].raw_coeff);
3622  j->img_comp[i].raw_coeff = 0;
3623  j->img_comp[i].coeff = 0;
3624  }
3625  if (j->img_comp[i].linebuf) {
3626  STBI_FREE(j->img_comp[i].linebuf);
3627  j->img_comp[i].linebuf = NULL;
3628  }
3629  }
3630 }
3631 
3632 typedef struct {
3633  resample_row_func resample;
3634  stbi_uc *line0, *line1;
3635  int hs, vs; // expansion factor in each axis
3636  int w_lores; // horizontal pixels pre-expansion
3637  int ystep; // how far through vertical expansion we are
3638  int ypos; // which pre-expansion row we're on
3639 } stbi__resample;
3640 
3641 static stbi_uc *load_jpeg_image(stbi__jpeg *z, int *out_x, int *out_y, int *comp, int req_comp) {
3642  int n, decode_n;
3643  z->s->img_n = 0; // make stbi__cleanup_jpeg safe
3644 
3645  // validate req_comp
3646  if (req_comp < 0 || req_comp > 4)
3647  return stbi__errpuc("bad req_comp", "Internal error");
3648 
3649  // load a jpeg image from whichever source, but leave in YCbCr format
3650  if (!stbi__decode_jpeg_image(z)) {
3651  stbi__cleanup_jpeg(z);
3652  return NULL;
3653  }
3654 
3655  // determine actual number of components to generate
3656  n = req_comp ? req_comp : z->s->img_n;
3657 
3658  if (z->s->img_n == 3 && n < 3)
3659  decode_n = 1;
3660  else
3661  decode_n = z->s->img_n;
3662 
3663  // resample and color-convert
3664  {
3665  int k;
3666  unsigned int i, j;
3667  stbi_uc *output;
3668  stbi_uc *coutput[4];
3669 
3670  stbi__resample res_comp[4];
3671 
3672  for (k = 0; k < decode_n; ++k) {
3673  stbi__resample *r = &res_comp[k];
3674 
3675  // allocate line buffer big enough for upsampling off the edges
3676  // with upsample factor of 4
3677  z->img_comp[k].linebuf = (stbi_uc *)stbi__malloc(z->s->img_x + 3);
3678  if (!z->img_comp[k].linebuf) {
3679  stbi__cleanup_jpeg(z);
3680  return stbi__errpuc("outofmem", "Out of memory");
3681  }
3682 
3683  r->hs = z->img_h_max / z->img_comp[k].h;
3684  r->vs = z->img_v_max / z->img_comp[k].v;
3685  r->ystep = r->vs >> 1;
3686  r->w_lores = (z->s->img_x + r->hs - 1) / r->hs;
3687  r->ypos = 0;
3688  r->line0 = r->line1 = z->img_comp[k].data;
3689 
3690  if (r->hs == 1 && r->vs == 1)
3691  r->resample = resample_row_1;
3692  else if (r->hs == 1 && r->vs == 2)
3693  r->resample = stbi__resample_row_v_2;
3694  else if (r->hs == 2 && r->vs == 1)
3695  r->resample = stbi__resample_row_h_2;
3696  else if (r->hs == 2 && r->vs == 2)
3697  r->resample = z->resample_row_hv_2_kernel;
3698  else
3699  r->resample = stbi__resample_row_generic;
3700  }
3701 
3702  // can't error after this so, this is safe
3703  output = (stbi_uc *)stbi__malloc(n * z->s->img_x * z->s->img_y + 1);
3704  if (!output) {
3705  stbi__cleanup_jpeg(z);
3706  return stbi__errpuc("outofmem", "Out of memory");
3707  }
3708 
3709  // now go ahead and resample
3710  for (j = 0; j < z->s->img_y; ++j) {
3711  stbi_uc *out = output + n * z->s->img_x * j;
3712  for (k = 0; k < decode_n; ++k) {
3713  stbi__resample *r = &res_comp[k];
3714  int y_bot = r->ystep >= (r->vs >> 1);
3715  coutput[k] = r->resample(
3716  z->img_comp[k].linebuf,
3717  y_bot ? r->line1 : r->line0,
3718  y_bot ? r->line0 : r->line1,
3719  r->w_lores,
3720  r->hs);
3721  if (++r->ystep >= r->vs) {
3722  r->ystep = 0;
3723  r->line0 = r->line1;
3724  if (++r->ypos < z->img_comp[k].y)
3725  r->line1 += z->img_comp[k].w2;
3726  }
3727  }
3728  if (n >= 3) {
3729  stbi_uc *y = coutput[0];
3730  if (z->s->img_n == 3) {
3731  if (z->rgb == 3) {
3732  for (i = 0; i < z->s->img_x; ++i) {
3733  out[0] = y[i];
3734  out[1] = coutput[1][i];
3735  out[2] = coutput[2][i];
3736  out[3] = 255;
3737  out += n;
3738  }
3739  }
3740  else {
3741  z->YCbCr_to_RGB_kernel(out, y, coutput[1], coutput[2], z->s->img_x, n);
3742  }
3743  }
3744  else
3745  for (i = 0; i < z->s->img_x; ++i) {
3746  out[0] = out[1] = out[2] = y[i];
3747  out[3] = 255; // not used if n==3
3748  out += n;
3749  }
3750  }
3751  else {
3752  stbi_uc *y = coutput[0];
3753  if (n == 1)
3754  for (i = 0; i < z->s->img_x; ++i)
3755  out[i] = y[i];
3756  else
3757  for (i = 0; i < z->s->img_x; ++i)
3758  *out++ = y[i], *out++ = 255;
3759  }
3760  }
3761  stbi__cleanup_jpeg(z);
3762  *out_x = z->s->img_x;
3763  *out_y = z->s->img_y;
3764  if (comp)
3765  *comp = z->s->img_n; // report original components, not output
3766  return output;
3767  }
3768 }
3769 
3770 static unsigned char *stbi__jpeg_load(stbi__context *s, int *x, int *y, int *comp, int req_comp) {
3771  unsigned char *result;
3772  stbi__jpeg *j = (stbi__jpeg *)stbi__malloc(sizeof(stbi__jpeg));
3773  j->s = s;
3774  stbi__setup_jpeg(j);
3775  result = load_jpeg_image(j, x, y, comp, req_comp);
3776  STBI_FREE(j);
3777  return result;
3778 }
3779 
3780 static int stbi__jpeg_test(stbi__context *s) {
3781  int r;
3782  stbi__jpeg j;
3783  j.s = s;
3784  stbi__setup_jpeg(&j);
3785  r = stbi__decode_jpeg_header(&j, STBI__SCAN_type);
3786  stbi__rewind(s);
3787  return r;
3788 }
3789 
3790 static int stbi__jpeg_info_raw(stbi__jpeg *j, int *x, int *y, int *comp) {
3791  if (!stbi__decode_jpeg_header(j, STBI__SCAN_header)) {
3792  stbi__rewind(j->s);
3793  return 0;
3794  }
3795  if (x)
3796  *x = j->s->img_x;
3797  if (y)
3798  *y = j->s->img_y;
3799  if (comp)
3800  *comp = j->s->img_n;
3801  return 1;
3802 }
3803 
3804 static int stbi__jpeg_info(stbi__context *s, int *x, int *y, int *comp) {
3805  int result;
3806  stbi__jpeg *j = (stbi__jpeg *)(stbi__malloc(sizeof(stbi__jpeg)));
3807  j->s = s;
3808  result = stbi__jpeg_info_raw(j, x, y, comp);
3809  STBI_FREE(j);
3810  return result;
3811 }
3812 #endif
3813 
3814 // public domain zlib decode v0.2 Sean Barrett 2006-11-18
3815 // simple implementation
3816 // - all input must be provided in an upfront buffer
3817 // - all output is written to a single output buffer (can malloc/realloc)
3818 // performance
3819 // - fast huffman
3820 
3821 #ifndef STBI_NO_ZLIB
3822 
3823 // fast-way is faster to check than jpeg huffman, but slow way is slower
3824 #define STBI__ZFAST_BITS 9 // accelerate all cases in default tables
3825 #define STBI__ZFAST_MASK ((1 << STBI__ZFAST_BITS) - 1)
3826 
3827 // zlib-style huffman encoding
3828 // (jpegs packs from left, zlib from right, so can't share code)
3829 typedef struct {
3830  stbi__uint16 fast[1 << STBI__ZFAST_BITS];
3831  stbi__uint16 firstcode[16];
3832  int maxcode[17];
3833  stbi__uint16 firstsymbol[16];
3834  stbi_uc size[288];
3835  stbi__uint16 value[288];
3836 } stbi__zhuffman;
3837 
3838 stbi_inline static int stbi__bitreverse16(int n) {
3839  n = ((n & 0xAAAA) >> 1) | ((n & 0x5555) << 1);
3840  n = ((n & 0xCCCC) >> 2) | ((n & 0x3333) << 2);
3841  n = ((n & 0xF0F0) >> 4) | ((n & 0x0F0F) << 4);
3842  n = ((n & 0xFF00) >> 8) | ((n & 0x00FF) << 8);
3843  return n;
3844 }
3845 
3846 stbi_inline static int stbi__bit_reverse(int v, int bits) {
3847  STBI_ASSERT(bits <= 16);
3848  // to bit reverse n bits, reverse 16 and shift
3849  // e.g. 11 bits, bit reverse and shift away 5
3850  return stbi__bitreverse16(v) >> (16 - bits);
3851 }
3852 
3853 static int stbi__zbuild_huffman(stbi__zhuffman *z, stbi_uc *sizelist, int num) {
3854  int i, k = 0;
3855  int code, next_code[16], sizes[17];
3856 
3857  // DEFLATE spec for generating codes
3858  memset(sizes, 0, sizeof(sizes));
3859  memset(z->fast, 0, sizeof(z->fast));
3860  for (i = 0; i < num; ++i)
3861  ++sizes[sizelist[i]];
3862  sizes[0] = 0;
3863  for (i = 1; i < 16; ++i)
3864  if (sizes[i] > (1 << i))
3865  return stbi__err("bad sizes", "Corrupt PNG");
3866  code = 0;
3867  for (i = 1; i < 16; ++i) {
3868  next_code[i] = code;
3869  z->firstcode[i] = (stbi__uint16)code;
3870  z->firstsymbol[i] = (stbi__uint16)k;
3871  code = (code + sizes[i]);
3872  if (sizes[i])
3873  if (code - 1 >= (1 << i))
3874  return stbi__err("bad codelengths", "Corrupt PNG");
3875  z->maxcode[i] = code << (16 - i); // preshift for inner loop
3876  code <<= 1;
3877  k += sizes[i];
3878  }
3879  z->maxcode[16] = 0x10000; // sentinel
3880  for (i = 0; i < num; ++i) {
3881  int s = sizelist[i];
3882  if (s) {
3883  int c = next_code[s] - z->firstcode[s] + z->firstsymbol[s];
3884  stbi__uint16 fastv = (stbi__uint16)((s << 9) | i);
3885  z->size[c] = (stbi_uc)s;
3886  z->value[c] = (stbi__uint16)i;
3887  if (s <= STBI__ZFAST_BITS) {
3888  int j = stbi__bit_reverse(next_code[s], s);
3889  while (j < (1 << STBI__ZFAST_BITS)) {
3890  z->fast[j] = fastv;
3891  j += (1 << s);
3892  }
3893  }
3894  ++next_code[s];
3895  }
3896  }
3897  return 1;
3898 }
3899 
3900 // zlib-from-memory implementation for PNG reading
3901 // because PNG allows splitting the zlib stream arbitrarily,
3902 // and it's annoying structurally to have PNG call ZLIB call PNG,
3903 // we require PNG read all the IDATs and combine them into a single
3904 // memory buffer
3905 
3906 typedef struct {
3907  stbi_uc *zbuffer, *zbuffer_end;
3908  int num_bits;
3909  stbi__uint32 code_buffer;
3910 
3911  char *zout;
3912  char *zout_start;
3913  char *zout_end;
3914  int z_expandable;
3915 
3916  stbi__zhuffman z_length, z_distance;
3917 } stbi__zbuf;
3918 
3919 stbi_inline static stbi_uc stbi__zget8(stbi__zbuf *z) {
3920  if (z->zbuffer >= z->zbuffer_end)
3921  return 0;
3922  return *z->zbuffer++;
3923 }
3924 
3925 static void stbi__fill_bits(stbi__zbuf *z) {
3926  do {
3927  STBI_ASSERT(z->code_buffer < (1U << z->num_bits));
3928  z->code_buffer |= (unsigned int)stbi__zget8(z) << z->num_bits;
3929  z->num_bits += 8;
3930  } while (z->num_bits <= 24);
3931 }
3932 
3933 stbi_inline static unsigned int stbi__zreceive(stbi__zbuf *z, int n) {
3934  unsigned int k;
3935  if (z->num_bits < n)
3936  stbi__fill_bits(z);
3937  k = z->code_buffer & ((1 << n) - 1);
3938  z->code_buffer >>= n;
3939  z->num_bits -= n;
3940  return k;
3941 }
3942 
3943 static int stbi__zhuffman_decode_slowpath(stbi__zbuf *a, stbi__zhuffman *z) {
3944  int b, s, k;
3945  // not resolved by fast table, so compute it the slow way
3946  // use jpeg approach, which requires MSbits at top
3947  k = stbi__bit_reverse(a->code_buffer, 16);
3948  for (s = STBI__ZFAST_BITS + 1;; ++s)
3949  if (k < z->maxcode[s])
3950  break;
3951  if (s == 16)
3952  return -1; // invalid code!
3953  // code size is s, so:
3954  b = (k >> (16 - s)) - z->firstcode[s] + z->firstsymbol[s];
3955  STBI_ASSERT(z->size[b] == s);
3956  a->code_buffer >>= s;
3957  a->num_bits -= s;
3958  return z->value[b];
3959 }
3960 
3961 stbi_inline static int stbi__zhuffman_decode(stbi__zbuf *a, stbi__zhuffman *z) {
3962  int b, s;
3963  if (a->num_bits < 16)
3964  stbi__fill_bits(a);
3965  b = z->fast[a->code_buffer & STBI__ZFAST_MASK];
3966  if (b) {
3967  s = b >> 9;
3968  a->code_buffer >>= s;
3969  a->num_bits -= s;
3970  return b & 511;
3971  }
3972  return stbi__zhuffman_decode_slowpath(a, z);
3973 }
3974 
3975 static int stbi__zexpand(stbi__zbuf *z, char *zout, int n) // need to make room for n bytes
3976 {
3977  char *q;
3978  int cur, limit, old_limit;
3979  z->zout = zout;
3980  if (!z->z_expandable)
3981  return stbi__err("output buffer limit", "Corrupt PNG");
3982  cur = (int)(z->zout - z->zout_start);
3983  limit = old_limit = (int)(z->zout_end - z->zout_start);
3984  while (cur + n > limit)
3985  limit *= 2;
3986  q = (char *)STBI_REALLOC_SIZED(z->zout_start, old_limit, limit);
3987  STBI_NOTUSED(old_limit);
3988  if (q == NULL)
3989  return stbi__err("outofmem", "Out of memory");
3990  z->zout_start = q;
3991  z->zout = q + cur;
3992  z->zout_end = q + limit;
3993  return 1;
3994 }
3995 
3996 static int stbi__zlength_base[31] = {3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15,
3997  17, 19, 23, 27, 31, 35, 43, 51, 59, 67, 83,
3998  99, 115, 131, 163, 195, 227, 258, 0, 0};
3999 
4000 static int stbi__zlength_extra[31] = {0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2,
4001  3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 0, 0, 0};
4002 
4003 static int stbi__zdist_base[32] = {
4004  1, 2, 3, 4, 5, 7, 9, 13, 17, 25, 33, 49, 65, 97, 129, 193,
4005  257, 385, 513, 769, 1025, 1537, 2049, 3073, 4097, 6145, 8193, 12289, 16385, 24577, 0, 0};
4006 
4007 static int stbi__zdist_extra[32] = {0, 0, 0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6,
4008  6, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 12, 12, 13, 13};
4009 
4010 static int stbi__parse_huffman_block(stbi__zbuf *a) {
4011  char *zout = a->zout;
4012  for (;;) {
4013  int z = stbi__zhuffman_decode(a, &a->z_length);
4014  if (z < 256) {
4015  if (z < 0)
4016  return stbi__err("bad huffman code", "Corrupt PNG"); // error in huffman codes
4017  if (zout >= a->zout_end) {
4018  if (!stbi__zexpand(a, zout, 1))
4019  return 0;
4020  zout = a->zout;
4021  }
4022  *zout++ = (char)z;
4023  }
4024  else {
4025  stbi_uc *p;
4026  int len, dist;
4027  if (z == 256) {
4028  a->zout = zout;
4029  return 1;
4030  }
4031  z -= 257;
4032  len = stbi__zlength_base[z];
4033  if (stbi__zlength_extra[z])
4034  len += stbi__zreceive(a, stbi__zlength_extra[z]);
4035  z = stbi__zhuffman_decode(a, &a->z_distance);
4036  if (z < 0)
4037  return stbi__err("bad huffman code", "Corrupt PNG");
4038  dist = stbi__zdist_base[z];
4039  if (stbi__zdist_extra[z])
4040  dist += stbi__zreceive(a, stbi__zdist_extra[z]);
4041  if (zout - a->zout_start < dist)
4042  return stbi__err("bad dist", "Corrupt PNG");
4043  if (zout + len > a->zout_end) {
4044  if (!stbi__zexpand(a, zout, len))
4045  return 0;
4046  zout = a->zout;
4047  }
4048  p = (stbi_uc *)(zout - dist);
4049  if (dist == 1) { // run of one byte; common in images.
4050  stbi_uc v = *p;
4051  if (len) {
4052  do
4053  *zout++ = v;
4054  while (--len);
4055  }
4056  }
4057  else {
4058  if (len) {
4059  do
4060  *zout++ = *p++;
4061  while (--len);
4062  }
4063  }
4064  }
4065  }
4066 }
4067 
4068 static int stbi__compute_huffman_codes(stbi__zbuf *a) {
4069  static stbi_uc length_dezigzag[19] = {
4070  16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15};
4071  stbi__zhuffman z_codelength;
4072  stbi_uc lencodes[286 + 32 + 137]; // padding for maximum single op
4073  stbi_uc codelength_sizes[19];
4074  int i, n;
4075 
4076  int hlit = stbi__zreceive(a, 5) + 257;
4077  int hdist = stbi__zreceive(a, 5) + 1;
4078  int hclen = stbi__zreceive(a, 4) + 4;
4079 
4080  memset(codelength_sizes, 0, sizeof(codelength_sizes));
4081  for (i = 0; i < hclen; ++i) {
4082  int s = stbi__zreceive(a, 3);
4083  codelength_sizes[length_dezigzag[i]] = (stbi_uc)s;
4084  }
4085  if (!stbi__zbuild_huffman(&z_codelength, codelength_sizes, 19))
4086  return 0;
4087 
4088  n = 0;
4089  while (n < hlit + hdist) {
4090  int c = stbi__zhuffman_decode(a, &z_codelength);
4091  if (c < 0 || c >= 19)
4092  return stbi__err("bad codelengths", "Corrupt PNG");
4093  if (c < 16)
4094  lencodes[n++] = (stbi_uc)c;
4095  else if (c == 16) {
4096  c = stbi__zreceive(a, 2) + 3;
4097  memset(lencodes + n, lencodes[n - 1], c);
4098  n += c;
4099  }
4100  else if (c == 17) {
4101  c = stbi__zreceive(a, 3) + 3;
4102  memset(lencodes + n, 0, c);
4103  n += c;
4104  }
4105  else {
4106  STBI_ASSERT(c == 18);
4107  c = stbi__zreceive(a, 7) + 11;
4108  memset(lencodes + n, 0, c);
4109  n += c;
4110  }
4111  }
4112  if (n != hlit + hdist)
4113  return stbi__err("bad codelengths", "Corrupt PNG");
4114  if (!stbi__zbuild_huffman(&a->z_length, lencodes, hlit))
4115  return 0;
4116  if (!stbi__zbuild_huffman(&a->z_distance, lencodes + hlit, hdist))
4117  return 0;
4118  return 1;
4119 }
4120 
4121 static int stbi__parse_uncompressed_block(stbi__zbuf *a) {
4122  stbi_uc header[4];
4123  int len, nlen, k;
4124  if (a->num_bits & 7)
4125  stbi__zreceive(a, a->num_bits & 7); // discard
4126  // drain the bit-packed data into header
4127  k = 0;
4128  while (a->num_bits > 0) {
4129  header[k++] = (stbi_uc)(a->code_buffer & 255); // suppress MSVC run-time check
4130  a->code_buffer >>= 8;
4131  a->num_bits -= 8;
4132  }
4133  STBI_ASSERT(a->num_bits == 0);
4134  // now fill header the normal way
4135  while (k < 4)
4136  header[k++] = stbi__zget8(a);
4137  len = header[1] * 256 + header[0];
4138  nlen = header[3] * 256 + header[2];
4139  if (nlen != (len ^ 0xffff))
4140  return stbi__err("zlib corrupt", "Corrupt PNG");
4141  if (a->zbuffer + len > a->zbuffer_end)
4142  return stbi__err("read past buffer", "Corrupt PNG");
4143  if (a->zout + len > a->zout_end)
4144  if (!stbi__zexpand(a, a->zout, len))
4145  return 0;
4146  memcpy(a->zout, a->zbuffer, len);
4147  a->zbuffer += len;
4148  a->zout += len;
4149  return 1;
4150 }
4151 
4152 static int stbi__parse_zlib_header(stbi__zbuf *a) {
4153  int cmf = stbi__zget8(a);
4154  int cm = cmf & 15;
4155  /* int cinfo = cmf >> 4; */
4156  int flg = stbi__zget8(a);
4157  if ((cmf * 256 + flg) % 31 != 0)
4158  return stbi__err("bad zlib header", "Corrupt PNG"); // zlib spec
4159  if (flg & 32)
4160  return stbi__err("no preset dict", "Corrupt PNG"); // preset dictionary not allowed in png
4161  if (cm != 8)
4162  return stbi__err("bad compression", "Corrupt PNG"); // DEFLATE required for png
4163  // window = 1 << (8 + cinfo)... but who cares, we fully buffer output
4164  return 1;
4165 }
4166 
4167 // @TODO: should statically initialize these for optimal thread safety
4168 static stbi_uc stbi__zdefault_length[288], stbi__zdefault_distance[32];
4169 static void stbi__init_zdefaults(void) {
4170  int i; // use <= to match clearly with spec
4171  for (i = 0; i <= 143; ++i)
4172  stbi__zdefault_length[i] = 8;
4173  for (; i <= 255; ++i)
4174  stbi__zdefault_length[i] = 9;
4175  for (; i <= 279; ++i)
4176  stbi__zdefault_length[i] = 7;
4177  for (; i <= 287; ++i)
4178  stbi__zdefault_length[i] = 8;
4179 
4180  for (i = 0; i <= 31; ++i)
4181  stbi__zdefault_distance[i] = 5;
4182 }
4183 
4184 static int stbi__parse_zlib(stbi__zbuf *a, int parse_header) {
4185  int final, type;
4186  if (parse_header)
4187  if (!stbi__parse_zlib_header(a))
4188  return 0;
4189  a->num_bits = 0;
4190  a->code_buffer = 0;
4191  do {
4192  final = stbi__zreceive(a, 1);
4193  type = stbi__zreceive(a, 2);
4194  if (type == 0) {
4195  if (!stbi__parse_uncompressed_block(a))
4196  return 0;
4197  }
4198  else if (type == 3) {
4199  return 0;
4200  }
4201  else {
4202  if (type == 1) {
4203  // use fixed code lengths
4204  if (!stbi__zdefault_distance[31])
4205  stbi__init_zdefaults();
4206  if (!stbi__zbuild_huffman(&a->z_length, stbi__zdefault_length, 288))
4207  return 0;
4208  if (!stbi__zbuild_huffman(&a->z_distance, stbi__zdefault_distance, 32))
4209  return 0;
4210  }
4211  else {
4212  if (!stbi__compute_huffman_codes(a))
4213  return 0;
4214  }
4215  if (!stbi__parse_huffman_block(a))
4216  return 0;
4217  }
4218  } while (!final);
4219  return 1;
4220 }
4221 
4222 static int stbi__do_zlib(stbi__zbuf *a, char *obuf, int olen, int exp, int parse_header) {
4223  a->zout_start = obuf;
4224  a->zout = obuf;
4225  a->zout_end = obuf + olen;
4226  a->z_expandable = exp;
4227 
4228  return stbi__parse_zlib(a, parse_header);
4229 }
4230 
4231 STBIDEF char *
4232 stbi_zlib_decode_malloc_guesssize(const char *buffer, int len, int initial_size, int *outlen) {
4233  stbi__zbuf a;
4234  char *p = (char *)stbi__malloc(initial_size);
4235  if (p == NULL)
4236  return NULL;
4237  a.zbuffer = (stbi_uc *)buffer;
4238  a.zbuffer_end = (stbi_uc *)buffer + len;
4239  if (stbi__do_zlib(&a, p, initial_size, 1, 1)) {
4240  if (outlen)
4241  *outlen = (int)(a.zout - a.zout_start);
4242  return a.zout_start;
4243  }
4244  else {
4245  STBI_FREE(a.zout_start);
4246  return NULL;
4247  }
4248 }
4249 
4250 STBIDEF char *stbi_zlib_decode_malloc(char const *buffer, int len, int *outlen) {
4251  return stbi_zlib_decode_malloc_guesssize(buffer, len, 16384, outlen);
4252 }
4253 
4254 STBIDEF char *stbi_zlib_decode_malloc_guesssize_headerflag(
4255  const char *buffer,
4256  int len,
4257  int initial_size,
4258  int *outlen,
4259  int parse_header) {
4260  stbi__zbuf a;
4261  char *p = (char *)stbi__malloc(initial_size);
4262  if (p == NULL)
4263  return NULL;
4264  a.zbuffer = (stbi_uc *)buffer;
4265  a.zbuffer_end = (stbi_uc *)buffer + len;
4266  if (stbi__do_zlib(&a, p, initial_size, 1, parse_header)) {
4267  if (outlen)
4268  *outlen = (int)(a.zout - a.zout_start);
4269  return a.zout_start;
4270  }
4271  else {
4272  STBI_FREE(a.zout_start);
4273  return NULL;
4274  }
4275 }
4276 
4277 STBIDEF int stbi_zlib_decode_buffer(char *obuffer, int olen, char const *ibuffer, int ilen) {
4278  stbi__zbuf a;
4279  a.zbuffer = (stbi_uc *)ibuffer;
4280  a.zbuffer_end = (stbi_uc *)ibuffer + ilen;
4281  if (stbi__do_zlib(&a, obuffer, olen, 0, 1))
4282  return (int)(a.zout - a.zout_start);
4283  else
4284  return -1;
4285 }
4286 
4287 STBIDEF char *stbi_zlib_decode_noheader_malloc(char const *buffer, int len, int *outlen) {
4288  stbi__zbuf a;
4289  char *p = (char *)stbi__malloc(16384);
4290  if (p == NULL)
4291  return NULL;
4292  a.zbuffer = (stbi_uc *)buffer;
4293  a.zbuffer_end = (stbi_uc *)buffer + len;
4294  if (stbi__do_zlib(&a, p, 16384, 1, 0)) {
4295  if (outlen)
4296  *outlen = (int)(a.zout - a.zout_start);
4297  return a.zout_start;
4298  }
4299  else {
4300  STBI_FREE(a.zout_start);
4301  return NULL;
4302  }
4303 }
4304 
4305 STBIDEF int
4306 stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const char *ibuffer, int ilen) {
4307  stbi__zbuf a;
4308  a.zbuffer = (stbi_uc *)ibuffer;
4309  a.zbuffer_end = (stbi_uc *)ibuffer + ilen;
4310  if (stbi__do_zlib(&a, obuffer, olen, 0, 0))
4311  return (int)(a.zout - a.zout_start);
4312  else
4313  return -1;
4314 }
4315 #endif
4316 
4317 // public domain "baseline" PNG decoder v0.10 Sean Barrett 2006-11-18
4318 // simple implementation
4319 // - only 8-bit samples
4320 // - no CRC checking
4321 // - allocates lots of intermediate memory
4322 // - avoids problem of streaming data between subsystems
4323 // - avoids explicit window management
4324 // performance
4325 // - uses stb_zlib, a PD zlib implementation with fast huffman decoding
4326 
4327 #ifndef STBI_NO_PNG
4328 typedef struct {
4329  stbi__uint32 length;
4330  stbi__uint32 type;
4331 } stbi__pngchunk;
4332 
4333 static stbi__pngchunk stbi__get_chunk_header(stbi__context *s) {
4334  stbi__pngchunk c;
4335  c.length = stbi__get32be(s);
4336  c.type = stbi__get32be(s);
4337  return c;
4338 }
4339 
4340 static int stbi__check_png_header(stbi__context *s) {
4341  static stbi_uc png_sig[8] = {137, 80, 78, 71, 13, 10, 26, 10};
4342  int i;
4343  for (i = 0; i < 8; ++i)
4344  if (stbi__get8(s) != png_sig[i])
4345  return stbi__err("bad png sig", "Not a PNG");
4346  return 1;
4347 }
4348 
4349 typedef struct {
4350  stbi__context *s;
4351  stbi_uc *idata, *expanded, *out;
4352  int depth;
4353 } stbi__png;
4354 
4355 enum {
4356  STBI__F_none = 0,
4357  STBI__F_sub = 1,
4358  STBI__F_up = 2,
4359  STBI__F_avg = 3,
4360  STBI__F_paeth = 4,
4361  // synthetic filters used for first scanline to avoid needing a dummy row of 0s
4362  STBI__F_avg_first,
4363  STBI__F_paeth_first
4364 };
4365 
4366 static stbi_uc first_row_filter[5] = {STBI__F_none,
4367  STBI__F_sub,
4368  STBI__F_none,
4369  STBI__F_avg_first,
4370  STBI__F_paeth_first};
4371 
4372 static int stbi__paeth(int a, int b, int c) {
4373  int p = a + b - c;
4374  int pa = abs(p - a);
4375  int pb = abs(p - b);
4376  int pc = abs(p - c);
4377  if (pa <= pb && pa <= pc)
4378  return a;
4379  if (pb <= pc)
4380  return b;
4381  return c;
4382 }
4383 
4384 static stbi_uc stbi__depth_scale_table[9] = {0, 0xff, 0x55, 0, 0x11, 0, 0, 0, 0x01};
4385 
4386 // create the png data from post-deflated data
4387 static int stbi__create_png_image_raw(
4388  stbi__png *a,
4389  stbi_uc *raw,
4390  stbi__uint32 raw_len,
4391  int out_n,
4392  stbi__uint32 x,
4393  stbi__uint32 y,
4394  int depth,
4395  int color) {
4396  int bytes = (depth == 16 ? 2 : 1);
4397  stbi__context *s = a->s;
4398  stbi__uint32 i, j, stride = x * out_n * bytes;
4399  stbi__uint32 img_len, img_width_bytes;
4400  int k;
4401  int img_n = s->img_n; // copy it into a local for later
4402 
4403  int output_bytes = out_n * bytes;
4404  int filter_bytes = img_n * bytes;
4405  int width = x;
4406 
4407  STBI_ASSERT(out_n == s->img_n || out_n == s->img_n + 1);
4408  a->out = (stbi_uc *)stbi__malloc(x * y * output_bytes); // extra bytes to write off the end into
4409  if (!a->out)
4410  return stbi__err("outofmem", "Out of memory");
4411 
4412  img_width_bytes = (((img_n * x * depth) + 7) >> 3);
4413  img_len = (img_width_bytes + 1) * y;
4414  if (s->img_x == x && s->img_y == y) {
4415  if (raw_len != img_len)
4416  return stbi__err("not enough pixels", "Corrupt PNG");
4417  }
4418  else { // interlaced:
4419  if (raw_len < img_len)
4420  return stbi__err("not enough pixels", "Corrupt PNG");
4421  }
4422 
4423  for (j = 0; j < y; ++j) {
4424  stbi_uc *cur = a->out + stride * j;
4425  stbi_uc *prior = cur - stride;
4426  int filter = *raw++;
4427 
4428  if (filter > 4)
4429  return stbi__err("invalid filter", "Corrupt PNG");
4430 
4431  if (depth < 8) {
4432  STBI_ASSERT(img_width_bytes <= x);
4433  cur += x * out_n - img_width_bytes; // store output to the rightmost img_len bytes, so we
4434  // can decode in place
4435  filter_bytes = 1;
4436  width = img_width_bytes;
4437  }
4438 
4439  // if first row, use special filter that doesn't sample previous row
4440  if (j == 0)
4441  filter = first_row_filter[filter];
4442 
4443  // handle first byte explicitly
4444  for (k = 0; k < filter_bytes; ++k) {
4445  switch (filter) {
4446  case STBI__F_none: cur[k] = raw[k]; break;
4447  case STBI__F_sub: cur[k] = raw[k]; break;
4448  case STBI__F_up: cur[k] = STBI__BYTECAST(raw[k] + prior[k]); break;
4449  case STBI__F_avg: cur[k] = STBI__BYTECAST(raw[k] + (prior[k] >> 1)); break;
4450  case STBI__F_paeth:
4451  cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(0, prior[k], 0));
4452  break;
4453  case STBI__F_avg_first: cur[k] = raw[k]; break;
4454  case STBI__F_paeth_first: cur[k] = raw[k]; break;
4455  }
4456  }
4457 
4458  if (depth == 8) {
4459  if (img_n != out_n)
4460  cur[img_n] = 255; // first pixel
4461  raw += img_n;
4462  cur += out_n;
4463  prior += out_n;
4464  }
4465  else if (depth == 16) {
4466  if (img_n != out_n) {
4467  cur[filter_bytes] = 255; // first pixel top byte
4468  cur[filter_bytes + 1] = 255; // first pixel bottom byte
4469  }
4470  raw += filter_bytes;
4471  cur += output_bytes;
4472  prior += output_bytes;
4473  }
4474  else {
4475  raw += 1;
4476  cur += 1;
4477  prior += 1;
4478  }
4479 
4480  // this is a little gross, so that we don't switch per-pixel or per-component
4481  if (depth < 8 || img_n == out_n) {
4482  int nk = (width - 1) * filter_bytes;
4483 #define CASE(f) \
4484  case f: \
4485  for (k = 0; k < nk; ++k)
4486  switch (filter) {
4487  // "none" filter turns into a memcpy here; make that explicit.
4488  case STBI__F_none:
4489  memcpy(cur, raw, nk);
4490  break;
4491  CASE(STBI__F_sub) cur[k] = STBI__BYTECAST(raw[k] + cur[k - filter_bytes]);
4492  break;
4493  CASE(STBI__F_up) cur[k] = STBI__BYTECAST(raw[k] + prior[k]);
4494  break;
4495  CASE(STBI__F_avg)
4496  cur[k] = STBI__BYTECAST(raw[k] + ((prior[k] + cur[k - filter_bytes]) >> 1));
4497  break;
4498  CASE(STBI__F_paeth)
4499  cur[k] = STBI__BYTECAST(
4500  raw[k]
4501  + stbi__paeth(cur[k - filter_bytes], prior[k], prior[k - filter_bytes]));
4502  break;
4503  CASE(STBI__F_avg_first)
4504  cur[k] = STBI__BYTECAST(raw[k] + (cur[k - filter_bytes] >> 1));
4505  break;
4506  CASE(STBI__F_paeth_first)
4507  cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k - filter_bytes], 0, 0));
4508  break;
4509  }
4510 #undef CASE
4511  raw += nk;
4512  }
4513  else {
4514  STBI_ASSERT(img_n + 1 == out_n);
4515 #define CASE(f) \
4516  case f: \
4517  for (i = x - 1; i >= 1; --i, \
4518  cur[filter_bytes] = 255, \
4519  raw += filter_bytes, \
4520  cur += output_bytes, \
4521  prior += output_bytes) \
4522  for (k = 0; k < filter_bytes; ++k)
4523  switch (filter) {
4524  CASE(STBI__F_none) cur[k] = raw[k];
4525  break;
4526  CASE(STBI__F_sub) cur[k] = STBI__BYTECAST(raw[k] + cur[k - output_bytes]);
4527  break;
4528  CASE(STBI__F_up) cur[k] = STBI__BYTECAST(raw[k] + prior[k]);
4529  break;
4530  CASE(STBI__F_avg)
4531  cur[k] = STBI__BYTECAST(raw[k] + ((prior[k] + cur[k - output_bytes]) >> 1));
4532  break;
4533  CASE(STBI__F_paeth)
4534  cur[k] = STBI__BYTECAST(
4535  raw[k] + stbi__paeth(cur[k - output_bytes], prior[k], prior[k - output_bytes]));
4536  break;
4537  CASE(STBI__F_avg_first) cur[k] = STBI__BYTECAST(raw[k] + (cur[k - output_bytes] >> 1));
4538  break;
4539  CASE(STBI__F_paeth_first)
4540  cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k - output_bytes], 0, 0));
4541  break;
4542  }
4543 #undef CASE
4544 
4545  // the loop above sets the high byte of the pixels' alpha, but for
4546  // 16 bit png files we also need the low byte set. we'll do that here.
4547  if (depth == 16) {
4548  cur = a->out + stride * j; // start at the beginning of the row again
4549  for (i = 0; i < x; ++i, cur += output_bytes) {
4550  cur[filter_bytes + 1] = 255;
4551  }
4552  }
4553  }
4554  }
4555 
4556  // we make a separate pass to expand bits to pixels; for performance,
4557  // this could run two scanlines behind the above code, so it won't
4558  // intefere with filtering but will still be in the cache.
4559  if (depth < 8) {
4560  for (j = 0; j < y; ++j) {
4561  stbi_uc *cur = a->out + stride * j;
4562  stbi_uc *in = a->out + stride * j + x * out_n - img_width_bytes;
4563  // unpack 1/2/4-bit into a 8-bit buffer. allows us to keep the common 8-bit path optimal at
4564  // minimal cost for 1/2/4-bit
4565  // png guarante byte alignment, if width is not multiple of 8/4/2 we'll decode dummy
4566  // trailing data that will be skipped in the later loop
4567  stbi_uc scale = (color == 0) ? stbi__depth_scale_table[depth]
4568  : 1; // scale grayscale values to 0..255 range
4569 
4570  // note that the final byte might overshoot and write more data than desired.
4571  // we can allocate enough data that this never writes out of memory, but it
4572  // could also overwrite the next scanline. can it overwrite non-empty data
4573  // on the next scanline? yes, consider 1-pixel-wide scanlines with 1-bit-per-pixel.
4574  // so we need to explicitly clamp the final ones
4575 
4576  if (depth == 4) {
4577  for (k = x * img_n; k >= 2; k -= 2, ++in) {
4578  *cur++ = scale * ((*in >> 4));
4579  *cur++ = scale * ((*in) & 0x0f);
4580  }
4581  if (k > 0)
4582  *cur++ = scale * ((*in >> 4));
4583  }
4584  else if (depth == 2) {
4585  for (k = x * img_n; k >= 4; k -= 4, ++in) {
4586  *cur++ = scale * ((*in >> 6));
4587  *cur++ = scale * ((*in >> 4) & 0x03);
4588  *cur++ = scale * ((*in >> 2) & 0x03);
4589  *cur++ = scale * ((*in) & 0x03);
4590  }
4591  if (k > 0)
4592  *cur++ = scale * ((*in >> 6));
4593  if (k > 1)
4594  *cur++ = scale * ((*in >> 4) & 0x03);
4595  if (k > 2)
4596  *cur++ = scale * ((*in >> 2) & 0x03);
4597  }
4598  else if (depth == 1) {
4599  for (k = x * img_n; k >= 8; k -= 8, ++in) {
4600  *cur++ = scale * ((*in >> 7));
4601  *cur++ = scale * ((*in >> 6) & 0x01);
4602  *cur++ = scale * ((*in >> 5) & 0x01);
4603  *cur++ = scale * ((*in >> 4) & 0x01);
4604  *cur++ = scale * ((*in >> 3) & 0x01);
4605  *cur++ = scale * ((*in >> 2) & 0x01);
4606  *cur++ = scale * ((*in >> 1) & 0x01);
4607  *cur++ = scale * ((*in) & 0x01);
4608  }
4609  if (k > 0)
4610  *cur++ = scale * ((*in >> 7));
4611  if (k > 1)
4612  *cur++ = scale * ((*in >> 6) & 0x01);
4613  if (k > 2)
4614  *cur++ = scale * ((*in >> 5) & 0x01);
4615  if (k > 3)
4616  *cur++ = scale * ((*in >> 4) & 0x01);
4617  if (k > 4)
4618  *cur++ = scale * ((*in >> 3) & 0x01);
4619  if (k > 5)
4620  *cur++ = scale * ((*in >> 2) & 0x01);
4621  if (k > 6)
4622  *cur++ = scale * ((*in >> 1) & 0x01);
4623  }
4624  if (img_n != out_n) {
4625  int q;
4626  // insert alpha = 255
4627  cur = a->out + stride * j;
4628  if (img_n == 1) {
4629  for (q = x - 1; q >= 0; --q) {
4630  cur[q * 2 + 1] = 255;
4631  cur[q * 2 + 0] = cur[q];
4632  }
4633  }
4634  else {
4635  STBI_ASSERT(img_n == 3);
4636  for (q = x - 1; q >= 0; --q) {
4637  cur[q * 4 + 3] = 255;
4638  cur[q * 4 + 2] = cur[q * 3 + 2];
4639  cur[q * 4 + 1] = cur[q * 3 + 1];
4640  cur[q * 4 + 0] = cur[q * 3 + 0];
4641  }
4642  }
4643  }
4644  }
4645  }
4646  else if (depth == 16) {
4647  // force the image data from big-endian to platform-native.
4648  // this is done in a separate pass due to the decoding relying
4649  // on the data being untouched, but could probably be done
4650  // per-line during decode if care is taken.
4651  stbi_uc *cur = a->out;
4652  stbi__uint16 *cur16 = (stbi__uint16 *)cur;
4653 
4654  for (i = 0; i < x * y * out_n; ++i, cur16++, cur += 2) {
4655  *cur16 = (cur[0] << 8) | cur[1];
4656  }
4657  }
4658 
4659  return 1;
4660 }
4661 
4662 static int stbi__create_png_image(
4663  stbi__png *a,
4664  stbi_uc *image_data,
4665  stbi__uint32 image_data_len,
4666  int out_n,
4667  int depth,
4668  int color,
4669  int interlaced) {
4670  stbi_uc * final;
4671  int p;
4672  if (!interlaced)
4673  return stbi__create_png_image_raw(
4674  a, image_data, image_data_len, out_n, a->s->img_x, a->s->img_y, depth, color);
4675 
4676  // de-interlacing
4677  final = (stbi_uc *)stbi__malloc(a->s->img_x * a->s->img_y * out_n);
4678  for (p = 0; p < 7; ++p) {
4679  int xorig[] = {0, 4, 0, 2, 0, 1, 0};
4680  int yorig[] = {0, 0, 4, 0, 2, 0, 1};
4681  int xspc[] = {8, 8, 4, 4, 2, 2, 1};
4682  int yspc[] = {8, 8, 8, 4, 4, 2, 2};
4683  int i, j, x, y;
4684  // pass1_x[4] = 0, pass1_x[5] = 1, pass1_x[12] = 1
4685  x = (a->s->img_x - xorig[p] + xspc[p] - 1) / xspc[p];
4686  y = (a->s->img_y - yorig[p] + yspc[p] - 1) / yspc[p];
4687  if (x && y) {
4688  stbi__uint32 img_len = ((((a->s->img_n * x * depth) + 7) >> 3) + 1) * y;
4689  if (!stbi__create_png_image_raw(
4690  a, image_data, image_data_len, out_n, x, y, depth, color)) {
4691  STBI_FREE(final);
4692  return 0;
4693  }
4694  for (j = 0; j < y; ++j) {
4695  for (i = 0; i < x; ++i) {
4696  int out_y = j * yspc[p] + yorig[p];
4697  int out_x = i * xspc[p] + xorig[p];
4698  memcpy(
4699  final + out_y * a->s->img_x * out_n + out_x * out_n,
4700  a->out + (j * x + i) * out_n,
4701  out_n);
4702  }
4703  }
4704  STBI_FREE(a->out);
4705  image_data += img_len;
4706  image_data_len -= img_len;
4707  }
4708  }
4709  a->out = final;
4710 
4711  return 1;
4712 }
4713 
4714 static int stbi__compute_transparency(stbi__png *z, stbi_uc tc[3], int out_n) {
4715  stbi__context *s = z->s;
4716  stbi__uint32 i, pixel_count = s->img_x * s->img_y;
4717  stbi_uc *p = z->out;
4718 
4719  // compute color-based transparency, assuming we've
4720  // already got 255 as the alpha value in the output
4721  STBI_ASSERT(out_n == 2 || out_n == 4);
4722 
4723  if (out_n == 2) {
4724  for (i = 0; i < pixel_count; ++i) {
4725  p[1] = (p[0] == tc[0] ? 0 : 255);
4726  p += 2;
4727  }
4728  }
4729  else {
4730  for (i = 0; i < pixel_count; ++i) {
4731  if (p[0] == tc[0] && p[1] == tc[1] && p[2] == tc[2])
4732  p[3] = 0;
4733  p += 4;
4734  }
4735  }
4736  return 1;
4737 }
4738 
4739 static int stbi__compute_transparency16(stbi__png *z, stbi__uint16 tc[3], int out_n) {
4740  stbi__context *s = z->s;
4741  stbi__uint32 i, pixel_count = s->img_x * s->img_y;
4742  stbi__uint16 *p = (stbi__uint16 *)z->out;
4743 
4744  // compute color-based transparency, assuming we've
4745  // already got 65535 as the alpha value in the output
4746  STBI_ASSERT(out_n == 2 || out_n == 4);
4747 
4748  if (out_n == 2) {
4749  for (i = 0; i < pixel_count; ++i) {
4750  p[1] = (p[0] == tc[0] ? 0 : 65535);
4751  p += 2;
4752  }
4753  }
4754  else {
4755  for (i = 0; i < pixel_count; ++i) {
4756  if (p[0] == tc[0] && p[1] == tc[1] && p[2] == tc[2])
4757  p[3] = 0;
4758  p += 4;
4759  }
4760  }
4761  return 1;
4762 }
4763 
4764 static int stbi__expand_png_palette(stbi__png *a, stbi_uc *palette, int len, int pal_img_n) {
4765  stbi__uint32 i, pixel_count = a->s->img_x * a->s->img_y;
4766  stbi_uc *p, *temp_out, *orig = a->out;
4767 
4768  p = (stbi_uc *)stbi__malloc(pixel_count * pal_img_n);
4769  if (p == NULL)
4770  return stbi__err("outofmem", "Out of memory");
4771 
4772  // between here and free(out) below, exitting would leak
4773  temp_out = p;
4774 
4775  if (pal_img_n == 3) {
4776  for (i = 0; i < pixel_count; ++i) {
4777  int n = orig[i] * 4;
4778  p[0] = palette[n];
4779  p[1] = palette[n + 1];
4780  p[2] = palette[n + 2];
4781  p += 3;
4782  }
4783  }
4784  else {
4785  for (i = 0; i < pixel_count; ++i) {
4786  int n = orig[i] * 4;
4787  p[0] = palette[n];
4788  p[1] = palette[n + 1];
4789  p[2] = palette[n + 2];
4790  p[3] = palette[n + 3];
4791  p += 4;
4792  }
4793  }
4794  STBI_FREE(a->out);
4795  a->out = temp_out;
4796 
4797  STBI_NOTUSED(len);
4798 
4799  return 1;
4800 }
4801 
4802 static int stbi__reduce_png(stbi__png *p) {
4803  int i;
4804  int img_len = p->s->img_x * p->s->img_y * p->s->img_out_n;
4805  stbi_uc *reduced;
4806  stbi__uint16 *orig = (stbi__uint16 *)p->out;
4807 
4808  if (p->depth != 16)
4809  return 1; // don't need to do anything if not 16-bit data
4810 
4811  reduced = (stbi_uc *)stbi__malloc(img_len);
4812  if (p == NULL)
4813  return stbi__err("outofmem", "Out of memory");
4814 
4815  for (i = 0; i < img_len; ++i)
4816  reduced[i] = (stbi_uc)(
4817  (orig[i] >> 8) & 0xFF); // top half of each byte is a decent approx of 16->8 bit scaling
4818 
4819  p->out = reduced;
4820  STBI_FREE(orig);
4821 
4822  return 1;
4823 }
4824 
4825 static int stbi__unpremultiply_on_load = 0;
4826 static int stbi__de_iphone_flag = 0;
4827 
4828 STBIDEF void stbi_set_unpremultiply_on_load(int flag_true_if_should_unpremultiply) {
4829  stbi__unpremultiply_on_load = flag_true_if_should_unpremultiply;
4830 }
4831 
4832 STBIDEF void stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert) {
4833  stbi__de_iphone_flag = flag_true_if_should_convert;
4834 }
4835 
4836 static void stbi__de_iphone(stbi__png *z) {
4837  stbi__context *s = z->s;
4838  stbi__uint32 i, pixel_count = s->img_x * s->img_y;
4839  stbi_uc *p = z->out;
4840 
4841  if (s->img_out_n == 3) { // convert bgr to rgb
4842  for (i = 0; i < pixel_count; ++i) {
4843  stbi_uc t = p[0];
4844  p[0] = p[2];
4845  p[2] = t;
4846  p += 3;
4847  }
4848  }
4849  else {
4850  STBI_ASSERT(s->img_out_n == 4);
4851  if (stbi__unpremultiply_on_load) {
4852  // convert bgr to rgb and unpremultiply
4853  for (i = 0; i < pixel_count; ++i) {
4854  stbi_uc a = p[3];
4855  stbi_uc t = p[0];
4856  if (a) {
4857  p[0] = p[2] * 255 / a;
4858  p[1] = p[1] * 255 / a;
4859  p[2] = t * 255 / a;
4860  }
4861  else {
4862  p[0] = p[2];
4863  p[2] = t;
4864  }
4865  p += 4;
4866  }
4867  }
4868  else {
4869  // convert bgr to rgb
4870  for (i = 0; i < pixel_count; ++i) {
4871  stbi_uc t = p[0];
4872  p[0] = p[2];
4873  p[2] = t;
4874  p += 4;
4875  }
4876  }
4877  }
4878 }
4879 
4880 #define STBI__PNG_TYPE(a, b, c, d) (((a) << 24) + ((b) << 16) + ((c) << 8) + (d))
4881 
4882 static int stbi__parse_png_file(stbi__png *z, int scan, int req_comp) {
4883  stbi_uc palette[1024], pal_img_n = 0;
4884  stbi_uc has_trans = 0, tc[3];
4885  stbi__uint16 tc16[3];
4886  stbi__uint32 ioff = 0, idata_limit = 0, i, pal_len = 0;
4887  int first = 1, k, interlace = 0, color = 0, is_iphone = 0;
4888  stbi__context *s = z->s;
4889 
4890  z->expanded = NULL;
4891  z->idata = NULL;
4892  z->out = NULL;
4893 
4894  if (!stbi__check_png_header(s))
4895  return 0;
4896 
4897  if (scan == STBI__SCAN_type)
4898  return 1;
4899 
4900  for (;;) {
4901  stbi__pngchunk c = stbi__get_chunk_header(s);
4902  switch (c.type) {
4903  case STBI__PNG_TYPE('C', 'g', 'B', 'I'):
4904  is_iphone = 1;
4905  stbi__skip(s, c.length);
4906  break;
4907  case STBI__PNG_TYPE('I', 'H', 'D', 'R'): {
4908  int comp, filter;
4909  if (!first)
4910  return stbi__err("multiple IHDR", "Corrupt PNG");
4911  first = 0;
4912  if (c.length != 13)
4913  return stbi__err("bad IHDR len", "Corrupt PNG");
4914  s->img_x = stbi__get32be(s);
4915  if (s->img_x > (1 << 24))
4916  return stbi__err("too large", "Very large image (corrupt?)");
4917  s->img_y = stbi__get32be(s);
4918  if (s->img_y > (1 << 24))
4919  return stbi__err("too large", "Very large image (corrupt?)");
4920  z->depth = stbi__get8(s);
4921  if (z->depth != 1 && z->depth != 2 && z->depth != 4 && z->depth != 8 && z->depth != 16)
4922  return stbi__err("1/2/4/8/16-bit only", "PNG not supported: 1/2/4/8/16-bit only");
4923  color = stbi__get8(s);
4924  if (color > 6)
4925  return stbi__err("bad ctype", "Corrupt PNG");
4926  if (color == 3 && z->depth == 16)
4927  return stbi__err("bad ctype", "Corrupt PNG");
4928  if (color == 3)
4929  pal_img_n = 3;
4930  else if (color & 1)
4931  return stbi__err("bad ctype", "Corrupt PNG");
4932  comp = stbi__get8(s);
4933  if (comp)
4934  return stbi__err("bad comp method", "Corrupt PNG");
4935  filter = stbi__get8(s);
4936  if (filter)
4937  return stbi__err("bad filter method", "Corrupt PNG");
4938  interlace = stbi__get8(s);
4939  if (interlace > 1)
4940  return stbi__err("bad interlace method", "Corrupt PNG");
4941  if (!s->img_x || !s->img_y)
4942  return stbi__err("0-pixel image", "Corrupt PNG");
4943  if (!pal_img_n) {
4944  s->img_n = (color & 2 ? 3 : 1) + (color & 4 ? 1 : 0);
4945  if ((1 << 30) / s->img_x / s->img_n < s->img_y)
4946  return stbi__err("too large", "Image too large to decode");
4947  if (scan == STBI__SCAN_header)
4948  return 1;
4949  }
4950  else {
4951  // if paletted, then pal_n is our final components, and
4952  // img_n is # components to decompress/filter.
4953  s->img_n = 1;
4954  if ((1 << 30) / s->img_x / 4 < s->img_y)
4955  return stbi__err("too large", "Corrupt PNG");
4956  // if SCAN_header, have to scan to see if we have a tRNS
4957  }
4958  break;
4959  }
4960 
4961  case STBI__PNG_TYPE('P', 'L', 'T', 'E'): {
4962  if (first)
4963  return stbi__err("first not IHDR", "Corrupt PNG");
4964  if (c.length > 256 * 3)
4965  return stbi__err("invalid PLTE", "Corrupt PNG");
4966  pal_len = c.length / 3;
4967  if (pal_len * 3 != c.length)
4968  return stbi__err("invalid PLTE", "Corrupt PNG");
4969  for (i = 0; i < pal_len; ++i) {
4970  palette[i * 4 + 0] = stbi__get8(s);
4971  palette[i * 4 + 1] = stbi__get8(s);
4972  palette[i * 4 + 2] = stbi__get8(s);
4973  palette[i * 4 + 3] = 255;
4974  }
4975  break;
4976  }
4977 
4978  case STBI__PNG_TYPE('t', 'R', 'N', 'S'): {
4979  if (first)
4980  return stbi__err("first not IHDR", "Corrupt PNG");
4981  if (z->idata)
4982  return stbi__err("tRNS after IDAT", "Corrupt PNG");
4983  if (pal_img_n) {
4984  if (scan == STBI__SCAN_header) {
4985  s->img_n = 4;
4986  return 1;
4987  }
4988  if (pal_len == 0)
4989  return stbi__err("tRNS before PLTE", "Corrupt PNG");
4990  if (c.length > pal_len)
4991  return stbi__err("bad tRNS len", "Corrupt PNG");
4992  pal_img_n = 4;
4993  for (i = 0; i < c.length; ++i)
4994  palette[i * 4 + 3] = stbi__get8(s);
4995  }
4996  else {
4997  if (!(s->img_n & 1))
4998  return stbi__err("tRNS with alpha", "Corrupt PNG");
4999  if (c.length != (stbi__uint32)s->img_n * 2)
5000  return stbi__err("bad tRNS len", "Corrupt PNG");
5001  has_trans = 1;
5002  if (z->depth == 16) {
5003  for (k = 0; k < s->img_n; ++k)
5004  tc16[k] = stbi__get16be(s); // copy the values as-is
5005  }
5006  else {
5007  for (k = 0; k < s->img_n; ++k)
5008  tc[k] = (stbi_uc)(stbi__get16be(s) & 255)
5009  * stbi__depth_scale_table[z->depth]; // non 8-bit images will be larger
5010  }
5011  }
5012  break;
5013  }
5014 
5015  case STBI__PNG_TYPE('I', 'D', 'A', 'T'): {
5016  if (first)
5017  return stbi__err("first not IHDR", "Corrupt PNG");
5018  if (pal_img_n && !pal_len)
5019  return stbi__err("no PLTE", "Corrupt PNG");
5020  if (scan == STBI__SCAN_header) {
5021  s->img_n = pal_img_n;
5022  return 1;
5023  }
5024  if ((int)(ioff + c.length) < (int)ioff)
5025  return 0;
5026  if (ioff + c.length > idata_limit) {
5027  stbi__uint32 idata_limit_old = idata_limit;
5028  stbi_uc *p;
5029  if (idata_limit == 0)
5030  idata_limit = c.length > 4096 ? c.length : 4096;
5031  while (ioff + c.length > idata_limit)
5032  idata_limit *= 2;
5033  STBI_NOTUSED(idata_limit_old);
5034  p = (stbi_uc *)STBI_REALLOC_SIZED(z->idata, idata_limit_old, idata_limit);
5035  if (p == NULL)
5036  return stbi__err("outofmem", "Out of memory");
5037  z->idata = p;
5038  }
5039  if (!stbi__getn(s, z->idata + ioff, c.length))
5040  return stbi__err("outofdata", "Corrupt PNG");
5041  ioff += c.length;
5042  break;
5043  }
5044 
5045  case STBI__PNG_TYPE('I', 'E', 'N', 'D'): {
5046  stbi__uint32 raw_len, bpl;
5047  if (first)
5048  return stbi__err("first not IHDR", "Corrupt PNG");
5049  if (scan != STBI__SCAN_load)
5050  return 1;
5051  if (z->idata == NULL)
5052  return stbi__err("no IDAT", "Corrupt PNG");
5053  // initial guess for decoded data size to avoid unnecessary reallocs
5054  bpl = (s->img_x * z->depth + 7) / 8; // bytes per line, per component
5055  raw_len = bpl * s->img_y * s->img_n /* pixels */ + s->img_y /* filter mode per row */;
5056  z->expanded = (stbi_uc *)stbi_zlib_decode_malloc_guesssize_headerflag(
5057  (char *)z->idata, ioff, raw_len, (int *)&raw_len, !is_iphone);
5058  if (z->expanded == NULL)
5059  return 0; // zlib should set error
5060  STBI_FREE(z->idata);
5061  z->idata = NULL;
5062  if ((req_comp == s->img_n + 1 && req_comp != 3 && !pal_img_n) || has_trans)
5063  s->img_out_n = s->img_n + 1;
5064  else
5065  s->img_out_n = s->img_n;
5066  if (!stbi__create_png_image(
5067  z, z->expanded, raw_len, s->img_out_n, z->depth, color, interlace))
5068  return 0;
5069  if (has_trans) {
5070  if (z->depth == 16) {
5071  if (!stbi__compute_transparency16(z, tc16, s->img_out_n))
5072  return 0;
5073  }
5074  else {
5075  if (!stbi__compute_transparency(z, tc, s->img_out_n))
5076  return 0;
5077  }
5078  }
5079  if (is_iphone && stbi__de_iphone_flag && s->img_out_n > 2)
5080  stbi__de_iphone(z);
5081  if (pal_img_n) {
5082  // pal_img_n == 3 or 4
5083  s->img_n = pal_img_n; // record the actual colors we had
5084  s->img_out_n = pal_img_n;
5085  if (req_comp >= 3)
5086  s->img_out_n = req_comp;
5087  if (!stbi__expand_png_palette(z, palette, pal_len, s->img_out_n))
5088  return 0;
5089  }
5090  STBI_FREE(z->expanded);
5091  z->expanded = NULL;
5092  return 1;
5093  }
5094 
5095  default:
5096  // if critical, fail
5097  if (first)
5098  return stbi__err("first not IHDR", "Corrupt PNG");
5099  if ((c.type & (1 << 29)) == 0) {
5100 #ifndef STBI_NO_FAILURE_STRINGS
5101  // not threadsafe
5102  static char invalid_chunk[] = "XXXX PNG chunk not known";
5103  invalid_chunk[0] = STBI__BYTECAST(c.type >> 24);
5104  invalid_chunk[1] = STBI__BYTECAST(c.type >> 16);
5105  invalid_chunk[2] = STBI__BYTECAST(c.type >> 8);
5106  invalid_chunk[3] = STBI__BYTECAST(c.type >> 0);
5107 #endif
5108  return stbi__err(invalid_chunk, "PNG not supported: unknown PNG chunk type");
5109  }
5110  stbi__skip(s, c.length);
5111  break;
5112  }
5113  // end of PNG chunk, read and skip CRC
5114  stbi__get32be(s);
5115  }
5116 }
5117 
5118 static unsigned char *stbi__do_png(stbi__png *p, int *x, int *y, int *n, int req_comp) {
5119  unsigned char *result = NULL;
5120  if (req_comp < 0 || req_comp > 4)
5121  return stbi__errpuc("bad req_comp", "Internal error");
5122  if (stbi__parse_png_file(p, STBI__SCAN_load, req_comp)) {
5123  if (p->depth == 16) {
5124  if (!stbi__reduce_png(p)) {
5125  return result;
5126  }
5127  }
5128  result = p->out;
5129  p->out = NULL;
5130  if (req_comp && req_comp != p->s->img_out_n) {
5131  result = stbi__convert_format(result, p->s->img_out_n, req_comp, p->s->img_x, p->s->img_y);
5132  p->s->img_out_n = req_comp;
5133  if (result == NULL)
5134  return result;
5135  }
5136  *x = p->s->img_x;
5137  *y = p->s->img_y;
5138  if (n)
5139  *n = p->s->img_n;
5140  }
5141  STBI_FREE(p->out);
5142  p->out = NULL;
5143  STBI_FREE(p->expanded);
5144  p->expanded = NULL;
5145  STBI_FREE(p->idata);
5146  p->idata = NULL;
5147 
5148  return result;
5149 }
5150 
5151 static unsigned char *stbi__png_load(stbi__context *s, int *x, int *y, int *comp, int req_comp) {
5152  stbi__png p;
5153  p.s = s;
5154  return stbi__do_png(&p, x, y, comp, req_comp);
5155 }
5156 
5157 static int stbi__png_test(stbi__context *s) {
5158  int r;
5159  r = stbi__check_png_header(s);
5160  stbi__rewind(s);
5161  return r;
5162 }
5163 
5164 static int stbi__png_info_raw(stbi__png *p, int *x, int *y, int *comp) {
5165  if (!stbi__parse_png_file(p, STBI__SCAN_header, 0)) {
5166  stbi__rewind(p->s);
5167  return 0;
5168  }
5169  if (x)
5170  *x = p->s->img_x;
5171  if (y)
5172  *y = p->s->img_y;
5173  if (comp)
5174  *comp = p->s->img_n;
5175  return 1;
5176 }
5177 
5178 static int stbi__png_info(stbi__context *s, int *x, int *y, int *comp) {
5179  stbi__png p;
5180  p.s = s;
5181  return stbi__png_info_raw(&p, x, y, comp);
5182 }
5183 #endif
5184 
5185 // Microsoft/Windows BMP image
5186 
5187 #ifndef STBI_NO_BMP
5188 static int stbi__bmp_test_raw(stbi__context *s) {
5189  int r;
5190  int sz;
5191  if (stbi__get8(s) != 'B')
5192  return 0;
5193  if (stbi__get8(s) != 'M')
5194  return 0;
5195  stbi__get32le(s); // discard filesize
5196  stbi__get16le(s); // discard reserved
5197  stbi__get16le(s); // discard reserved
5198  stbi__get32le(s); // discard data offset
5199  sz = stbi__get32le(s);
5200  r = (sz == 12 || sz == 40 || sz == 56 || sz == 108 || sz == 124);
5201  return r;
5202 }
5203 
5204 static int stbi__bmp_test(stbi__context *s) {
5205  int r = stbi__bmp_test_raw(s);
5206  stbi__rewind(s);
5207  return r;
5208 }
5209 
5210 // returns 0..31 for the highest set bit
5211 static int stbi__high_bit(unsigned int z) {
5212  int n = 0;
5213  if (z == 0)
5214  return -1;
5215  if (z >= 0x10000)
5216  n += 16, z >>= 16;
5217  if (z >= 0x00100)
5218  n += 8, z >>= 8;
5219  if (z >= 0x00010)
5220  n += 4, z >>= 4;
5221  if (z >= 0x00004)
5222  n += 2, z >>= 2;
5223  if (z >= 0x00002)
5224  n += 1, z >>= 1;
5225  return n;
5226 }
5227 
5228 static int stbi__bitcount(unsigned int a) {
5229  a = (a & 0x55555555) + ((a >> 1) & 0x55555555); // max 2
5230  a = (a & 0x33333333) + ((a >> 2) & 0x33333333); // max 4
5231  a = (a + (a >> 4)) & 0x0f0f0f0f; // max 8 per 4, now 8 bits
5232  a = (a + (a >> 8)); // max 16 per 8 bits
5233  a = (a + (a >> 16)); // max 32 per 8 bits
5234  return a & 0xff;
5235 }
5236 
5237 static int stbi__shiftsigned(int v, int shift, int bits) {
5238  int result;
5239  int z = 0;
5240 
5241  if (shift < 0)
5242  v <<= -shift;
5243  else
5244  v >>= shift;
5245  result = v;
5246 
5247  z = bits;
5248  while (z < 8) {
5249  result += v >> z;
5250  z += bits;
5251  }
5252  return result;
5253 }
5254 
5255 typedef struct {
5256  int bpp, offset, hsz;
5257  unsigned int mr, mg, mb, ma, all_a;
5258 } stbi__bmp_data;
5259 
5260 static void *stbi__bmp_parse_header(stbi__context *s, stbi__bmp_data *info) {
5261  int hsz;
5262  if (stbi__get8(s) != 'B' || stbi__get8(s) != 'M')
5263  return stbi__errpuc("not BMP", "Corrupt BMP");
5264  stbi__get32le(s); // discard filesize
5265  stbi__get16le(s); // discard reserved
5266  stbi__get16le(s); // discard reserved
5267  info->offset = stbi__get32le(s);
5268  info->hsz = hsz = stbi__get32le(s);
5269  info->mr = info->mg = info->mb = info->ma = 0;
5270 
5271  if (hsz != 12 && hsz != 40 && hsz != 56 && hsz != 108 && hsz != 124)
5272  return stbi__errpuc("unknown BMP", "BMP type not supported: unknown");
5273  if (hsz == 12) {
5274  s->img_x = stbi__get16le(s);
5275  s->img_y = stbi__get16le(s);
5276  }
5277  else {
5278  s->img_x = stbi__get32le(s);
5279  s->img_y = stbi__get32le(s);
5280  }
5281  if (stbi__get16le(s) != 1)
5282  return stbi__errpuc("bad BMP", "bad BMP");
5283  info->bpp = stbi__get16le(s);
5284  if (info->bpp == 1)
5285  return stbi__errpuc("monochrome", "BMP type not supported: 1-bit");
5286  if (hsz != 12) {
5287  int compress = stbi__get32le(s);
5288  if (compress == 1 || compress == 2)
5289  return stbi__errpuc("BMP RLE", "BMP type not supported: RLE");
5290  stbi__get32le(s); // discard sizeof
5291  stbi__get32le(s); // discard hres
5292  stbi__get32le(s); // discard vres
5293  stbi__get32le(s); // discard colorsused
5294  stbi__get32le(s); // discard max important
5295  if (hsz == 40 || hsz == 56) {
5296  if (hsz == 56) {
5297  stbi__get32le(s);
5298  stbi__get32le(s);
5299  stbi__get32le(s);
5300  stbi__get32le(s);
5301  }
5302  if (info->bpp == 16 || info->bpp == 32) {
5303  if (compress == 0) {
5304  if (info->bpp == 32) {
5305  info->mr = 0xffu << 16;
5306  info->mg = 0xffu << 8;
5307  info->mb = 0xffu << 0;
5308  info->ma = 0xffu << 24;
5309  info->all_a =
5310  0; // if all_a is 0 at end, then we loaded alpha channel but it was all 0
5311  }
5312  else {
5313  info->mr = 31u << 10;
5314  info->mg = 31u << 5;
5315  info->mb = 31u << 0;
5316  }
5317  }
5318  else if (compress == 3) {
5319  info->mr = stbi__get32le(s);
5320  info->mg = stbi__get32le(s);
5321  info->mb = stbi__get32le(s);
5322  // not documented, but generated by photoshop and handled by mspaint
5323  if (info->mr == info->mg && info->mg == info->mb) {
5324  // ?!?!?
5325  return stbi__errpuc("bad BMP", "bad BMP");
5326  }
5327  }
5328  else
5329  return stbi__errpuc("bad BMP", "bad BMP");
5330  }
5331  }
5332  else {
5333  int i;
5334  if (hsz != 108 && hsz != 124)
5335  return stbi__errpuc("bad BMP", "bad BMP");
5336  info->mr = stbi__get32le(s);
5337  info->mg = stbi__get32le(s);
5338  info->mb = stbi__get32le(s);
5339  info->ma = stbi__get32le(s);
5340  stbi__get32le(s); // discard color space
5341  for (i = 0; i < 12; ++i)
5342  stbi__get32le(s); // discard color space parameters
5343  if (hsz == 124) {
5344  stbi__get32le(s); // discard rendering intent
5345  stbi__get32le(s); // discard offset of profile data
5346  stbi__get32le(s); // discard size of profile data
5347  stbi__get32le(s); // discard reserved
5348  }
5349  }
5350  }
5351  return (void *)1;
5352 }
5353 
5354 static stbi_uc *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req_comp) {
5355  stbi_uc *out;
5356  unsigned int mr = 0, mg = 0, mb = 0, ma = 0, all_a;
5357  stbi_uc pal[256][4];
5358  int psize = 0, i, j, width;
5359  int flip_vertically, pad, target;
5360  stbi__bmp_data info;
5361 
5362  info.all_a = 255;
5363  if (stbi__bmp_parse_header(s, &info) == NULL)
5364  return NULL; // error code already set
5365 
5366  flip_vertically = ((int)s->img_y) > 0;
5367  s->img_y = abs((int)s->img_y);
5368 
5369  mr = info.mr;
5370  mg = info.mg;
5371  mb = info.mb;
5372  ma = info.ma;
5373  all_a = info.all_a;
5374 
5375  if (info.hsz == 12) {
5376  if (info.bpp < 24)
5377  psize = (info.offset - 14 - 24) / 3;
5378  }
5379  else {
5380  if (info.bpp < 16)
5381  psize = (info.offset - 14 - info.hsz) >> 2;
5382  }
5383 
5384  s->img_n = ma ? 4 : 3;
5385  if (req_comp && req_comp >= 3) // we can directly decode 3 or 4
5386  target = req_comp;
5387  else
5388  target = s->img_n; // if they want monochrome, we'll post-convert
5389 
5390  out = (stbi_uc *)stbi__malloc(target * s->img_x * s->img_y);
5391  if (!out)
5392  return stbi__errpuc("outofmem", "Out of memory");
5393  if (info.bpp < 16) {
5394  int z = 0;
5395  if (psize == 0 || psize > 256) {
5396  STBI_FREE(out);
5397  return stbi__errpuc("invalid", "Corrupt BMP");
5398  }
5399  for (i = 0; i < psize; ++i) {
5400  pal[i][2] = stbi__get8(s);
5401  pal[i][1] = stbi__get8(s);
5402  pal[i][0] = stbi__get8(s);
5403  if (info.hsz != 12)
5404  stbi__get8(s);
5405  pal[i][3] = 255;
5406  }
5407  stbi__skip(s, info.offset - 14 - info.hsz - psize * (info.hsz == 12 ? 3 : 4));
5408  if (info.bpp == 4)
5409  width = (s->img_x + 1) >> 1;
5410  else if (info.bpp == 8)
5411  width = s->img_x;
5412  else {
5413  STBI_FREE(out);
5414  return stbi__errpuc("bad bpp", "Corrupt BMP");
5415  }
5416  pad = (-width) & 3;
5417  for (j = 0; j < (int)s->img_y; ++j) {
5418  for (i = 0; i < (int)s->img_x; i += 2) {
5419  int v = stbi__get8(s), v2 = 0;
5420  if (info.bpp == 4) {
5421  v2 = v & 15;
5422  v >>= 4;
5423  }
5424  out[z++] = pal[v][0];
5425  out[z++] = pal[v][1];
5426  out[z++] = pal[v][2];
5427  if (target == 4)
5428  out[z++] = 255;
5429  if (i + 1 == (int)s->img_x)
5430  break;
5431  v = (info.bpp == 8) ? stbi__get8(s) : v2;
5432  out[z++] = pal[v][0];
5433  out[z++] = pal[v][1];
5434  out[z++] = pal[v][2];
5435  if (target == 4)
5436  out[z++] = 255;
5437  }
5438  stbi__skip(s, pad);
5439  }
5440  }
5441  else {
5442  int rshift = 0, gshift = 0, bshift = 0, ashift = 0, rcount = 0, gcount = 0, bcount = 0,
5443  acount = 0;
5444  int z = 0;
5445  int easy = 0;
5446  stbi__skip(s, info.offset - 14 - info.hsz);
5447  if (info.bpp == 24)
5448  width = 3 * s->img_x;
5449  else if (info.bpp == 16)
5450  width = 2 * s->img_x;
5451  else /* bpp = 32 and pad = 0 */
5452  width = 0;
5453  pad = (-width) & 3;
5454  if (info.bpp == 24) {
5455  easy = 1;
5456  }
5457  else if (info.bpp == 32) {
5458  if (mb == 0xff && mg == 0xff00 && mr == 0x00ff0000 && ma == 0xff000000)
5459  easy = 2;
5460  }
5461  if (!easy) {
5462  if (!mr || !mg || !mb) {
5463  STBI_FREE(out);
5464  return stbi__errpuc("bad masks", "Corrupt BMP");
5465  }
5466  // right shift amt to put high bit in position #7
5467  rshift = stbi__high_bit(mr) - 7;
5468  rcount = stbi__bitcount(mr);
5469  gshift = stbi__high_bit(mg) - 7;
5470  gcount = stbi__bitcount(mg);
5471  bshift = stbi__high_bit(mb) - 7;
5472  bcount = stbi__bitcount(mb);
5473  ashift = stbi__high_bit(ma) - 7;
5474  acount = stbi__bitcount(ma);
5475  }
5476  for (j = 0; j < (int)s->img_y; ++j) {
5477  if (easy) {
5478  for (i = 0; i < (int)s->img_x; ++i) {
5479  unsigned char a;
5480  out[z + 2] = stbi__get8(s);
5481  out[z + 1] = stbi__get8(s);
5482  out[z + 0] = stbi__get8(s);
5483  z += 3;
5484  a = (easy == 2 ? stbi__get8(s) : 255);
5485  all_a |= a;
5486  if (target == 4)
5487  out[z++] = a;
5488  }
5489  }
5490  else {
5491  int bpp = info.bpp;
5492  for (i = 0; i < (int)s->img_x; ++i) {
5493  stbi__uint32 v = (bpp == 16 ? (stbi__uint32)stbi__get16le(s) : stbi__get32le(s));
5494  int a;
5495  out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mr, rshift, rcount));
5496  out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mg, gshift, gcount));
5497  out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mb, bshift, bcount));
5498  a = (ma ? stbi__shiftsigned(v & ma, ashift, acount) : 255);
5499  all_a |= a;
5500  if (target == 4)
5501  out[z++] = STBI__BYTECAST(a);
5502  }
5503  }
5504  stbi__skip(s, pad);
5505  }
5506  }
5507 
5508  // if alpha channel is all 0s, replace with all 255s
5509  if (target == 4 && all_a == 0)
5510  for (i = 4 * s->img_x * s->img_y - 1; i >= 0; i -= 4)
5511  out[i] = 255;
5512 
5513  if (flip_vertically) {
5514  stbi_uc t;
5515  for (j = 0; j<(int)s->img_y>> 1; ++j) {
5516  stbi_uc *p1 = out + j * s->img_x * target;
5517  stbi_uc *p2 = out + (s->img_y - 1 - j) * s->img_x * target;
5518  for (i = 0; i < (int)s->img_x * target; ++i) {
5519  t = p1[i], p1[i] = p2[i], p2[i] = t;
5520  }
5521  }
5522  }
5523 
5524  if (req_comp && req_comp != target) {
5525  out = stbi__convert_format(out, target, req_comp, s->img_x, s->img_y);
5526  if (out == NULL)
5527  return out; // stbi__convert_format frees input on failure
5528  }
5529 
5530  *x = s->img_x;
5531  *y = s->img_y;
5532  if (comp)
5533  *comp = s->img_n;
5534  return out;
5535 }
5536 #endif
5537 
5538 // Targa Truevision - TGA
5539 // by Jonathan Dummer
5540 #ifndef STBI_NO_TGA
5541 // returns STBI_rgb or whatever, 0 on error
5542 static int stbi__tga_get_comp(int bits_per_pixel, int is_grey, int *is_rgb16) {
5543  // only RGB or RGBA (incl. 16bit) or grey allowed
5544  if (is_rgb16)
5545  *is_rgb16 = 0;
5546  switch (bits_per_pixel) {
5547  case 8: return STBI_grey;
5548  case 16:
5549  if (is_grey)
5550  return STBI_grey_alpha;
5551  // else: fall-through
5552  case 15:
5553  if (is_rgb16)
5554  *is_rgb16 = 1;
5555  return STBI_rgb;
5556  case 24: // fall-through
5557  case 32: return bits_per_pixel / 8;
5558  default: return 0;
5559  }
5560 }
5561 
5562 static int stbi__tga_info(stbi__context *s, int *x, int *y, int *comp) {
5563  int tga_w, tga_h, tga_comp, tga_image_type, tga_bits_per_pixel, tga_colormap_bpp;
5564  int sz, tga_colormap_type;
5565  stbi__get8(s); // discard Offset
5566  tga_colormap_type = stbi__get8(s); // colormap type
5567  if (tga_colormap_type > 1) {
5568  stbi__rewind(s);
5569  return 0; // only RGB or indexed allowed
5570  }
5571  tga_image_type = stbi__get8(s); // image type
5572  if (tga_colormap_type == 1) { // colormapped (paletted) image
5573  if (tga_image_type != 1 && tga_image_type != 9) {
5574  stbi__rewind(s);
5575  return 0;
5576  }
5577  stbi__skip(s, 4); // skip index of first colormap entry and number of entries
5578  sz = stbi__get8(s); // check bits per palette color entry
5579  if ((sz != 8) && (sz != 15) && (sz != 16) && (sz != 24) && (sz != 32)) {
5580  stbi__rewind(s);
5581  return 0;
5582  }
5583  stbi__skip(s, 4); // skip image x and y origin
5584  tga_colormap_bpp = sz;
5585  }
5586  else { // "normal" image w/o colormap - only RGB or grey allowed, +/- RLE
5587  if ((tga_image_type != 2) && (tga_image_type != 3) && (tga_image_type != 10)
5588  && (tga_image_type != 11)) {
5589  stbi__rewind(s);
5590  return 0; // only RGB or grey allowed, +/- RLE
5591  }
5592  stbi__skip(s, 9); // skip colormap specification and image x/y origin
5593  tga_colormap_bpp = 0;
5594  }
5595  tga_w = stbi__get16le(s);
5596  if (tga_w < 1) {
5597  stbi__rewind(s);
5598  return 0; // test width
5599  }
5600  tga_h = stbi__get16le(s);
5601  if (tga_h < 1) {
5602  stbi__rewind(s);
5603  return 0; // test height
5604  }
5605  tga_bits_per_pixel = stbi__get8(s); // bits per pixel
5606  stbi__get8(s); // ignore alpha bits
5607  if (tga_colormap_bpp != 0) {
5608  if ((tga_bits_per_pixel != 8) && (tga_bits_per_pixel != 16)) {
5609  // when using a colormap, tga_bits_per_pixel is the size of the indexes
5610  // I don't think anything but 8 or 16bit indexes makes sense
5611  stbi__rewind(s);
5612  return 0;
5613  }
5614  tga_comp = stbi__tga_get_comp(tga_colormap_bpp, 0, NULL);
5615  }
5616  else {
5617  tga_comp = stbi__tga_get_comp(
5618  tga_bits_per_pixel, (tga_image_type == 3) || (tga_image_type == 11), NULL);
5619  }
5620  if (!tga_comp) {
5621  stbi__rewind(s);
5622  return 0;
5623  }
5624  if (x)
5625  *x = tga_w;
5626  if (y)
5627  *y = tga_h;
5628  if (comp)
5629  *comp = tga_comp;
5630  return 1; // seems to have passed everything
5631 }
5632 
5633 static int stbi__tga_test(stbi__context *s) {
5634  int res = 0;
5635  int sz, tga_color_type;
5636  stbi__get8(s); // discard Offset
5637  tga_color_type = stbi__get8(s); // color type
5638  if (tga_color_type > 1)
5639  goto errorEnd; // only RGB or indexed allowed
5640  sz = stbi__get8(s); // image type
5641  if (tga_color_type == 1) { // colormapped (paletted) image
5642  if (sz != 1 && sz != 9)
5643  goto errorEnd; // colortype 1 demands image type 1 or 9
5644  stbi__skip(s, 4); // skip index of first colormap entry and number of entries
5645  sz = stbi__get8(s); // check bits per palette color entry
5646  if ((sz != 8) && (sz != 15) && (sz != 16) && (sz != 24) && (sz != 32))
5647  goto errorEnd;
5648  stbi__skip(s, 4); // skip image x and y origin
5649  }
5650  else { // "normal" image w/o colormap
5651  if ((sz != 2) && (sz != 3) && (sz != 10) && (sz != 11))
5652  goto errorEnd; // only RGB or grey allowed, +/- RLE
5653  stbi__skip(s, 9); // skip colormap specification and image x/y origin
5654  }
5655  if (stbi__get16le(s) < 1)
5656  goto errorEnd; // test width
5657  if (stbi__get16le(s) < 1)
5658  goto errorEnd; // test height
5659  sz = stbi__get8(s); // bits per pixel
5660  if ((tga_color_type == 1) && (sz != 8) && (sz != 16))
5661  goto errorEnd; // for colormapped images, bpp is size of an index
5662  if ((sz != 8) && (sz != 15) && (sz != 16) && (sz != 24) && (sz != 32))
5663  goto errorEnd;
5664 
5665  res = 1; // if we got this far, everything's good and we can return 1 instead of 0
5666 
5667 errorEnd:
5668  stbi__rewind(s);
5669  return res;
5670 }
5671 
5672 // read 16bit value and convert to 24bit RGB
5673 void stbi__tga_read_rgb16(stbi__context *s, stbi_uc *out) {
5674  stbi__uint16 px = stbi__get16le(s);
5675  stbi__uint16 fiveBitMask = 31;
5676  // we have 3 channels with 5bits each
5677  int r = (px >> 10) & fiveBitMask;
5678  int g = (px >> 5) & fiveBitMask;
5679  int b = px & fiveBitMask;
5680  // Note that this saves the data in RGB(A) order, so it doesn't need to be swapped later
5681  out[0] = (r * 255) / 31;
5682  out[1] = (g * 255) / 31;
5683  out[2] = (b * 255) / 31;
5684 
5685  // some people claim that the most significant bit might be used for alpha
5686  // (possibly if an alpha-bit is set in the "image descriptor byte")
5687  // but that only made 16bit test images completely translucent..
5688  // so let's treat all 15 and 16bit TGAs as RGB with no alpha.
5689 }
5690 
5691 static stbi_uc *stbi__tga_load(stbi__context *s, int *x, int *y, int *comp, int req_comp) {
5692  // read in the TGA header stuff
5693  int tga_offset = stbi__get8(s);
5694  int tga_indexed = stbi__get8(s);
5695  int tga_image_type = stbi__get8(s);
5696  int tga_is_RLE = 0;
5697  int tga_palette_start = stbi__get16le(s);
5698  int tga_palette_len = stbi__get16le(s);
5699  int tga_palette_bits = stbi__get8(s);
5700  int tga_x_origin = stbi__get16le(s);
5701  int tga_y_origin = stbi__get16le(s);
5702  int tga_width = stbi__get16le(s);
5703  int tga_height = stbi__get16le(s);
5704  int tga_bits_per_pixel = stbi__get8(s);
5705  int tga_comp, tga_rgb16 = 0;
5706  int tga_inverted = stbi__get8(s);
5707  // int tga_alpha_bits = tga_inverted & 15; // the 4 lowest bits - unused (useless?)
5708  // image data
5709  unsigned char *tga_data;
5710  unsigned char *tga_palette = NULL;
5711  int i, j;
5712  unsigned char raw_data[4];
5713  int RLE_count = 0;
5714  int RLE_repeating = 0;
5715  int read_next_pixel = 1;
5716 
5717  // do a tiny bit of precessing
5718  if (tga_image_type >= 8) {
5719  tga_image_type -= 8;
5720  tga_is_RLE = 1;
5721  }
5722  tga_inverted = 1 - ((tga_inverted >> 5) & 1);
5723 
5724  // If I'm paletted, then I'll use the number of bits from the palette
5725  if (tga_indexed)
5726  tga_comp = stbi__tga_get_comp(tga_palette_bits, 0, &tga_rgb16);
5727  else
5728  tga_comp = stbi__tga_get_comp(tga_bits_per_pixel, (tga_image_type == 3), &tga_rgb16);
5729 
5730  if (!tga_comp) // shouldn't really happen, stbi__tga_test() should have ensured basic consistency
5731  return stbi__errpuc("bad format", "Can't find out TGA pixelformat");
5732 
5733  // tga info
5734  *x = tga_width;
5735  *y = tga_height;
5736  if (comp)
5737  *comp = tga_comp;
5738 
5739  tga_data = (unsigned char *)stbi__malloc((size_t)tga_width * tga_height * tga_comp);
5740  if (!tga_data)
5741  return stbi__errpuc("outofmem", "Out of memory");
5742 
5743  // skip to the data's starting position (offset usually = 0)
5744  stbi__skip(s, tga_offset);
5745 
5746  if (!tga_indexed && !tga_is_RLE && !tga_rgb16) {
5747  for (i = 0; i < tga_height; ++i) {
5748  int row = tga_inverted ? tga_height - i - 1 : i;
5749  stbi_uc *tga_row = tga_data + row * tga_width * tga_comp;
5750  stbi__getn(s, tga_row, tga_width * tga_comp);
5751  }
5752  }
5753  else {
5754  // do I need to load a palette?
5755  if (tga_indexed) {
5756  // any data to skip? (offset usually = 0)
5757  stbi__skip(s, tga_palette_start);
5758  // load the palette
5759  tga_palette = (unsigned char *)stbi__malloc(tga_palette_len * tga_comp);
5760  if (!tga_palette) {
5761  STBI_FREE(tga_data);
5762  return stbi__errpuc("outofmem", "Out of memory");
5763  }
5764  if (tga_rgb16) {
5765  stbi_uc *pal_entry = tga_palette;
5766  STBI_ASSERT(tga_comp == STBI_rgb);
5767  for (i = 0; i < tga_palette_len; ++i) {
5768  stbi__tga_read_rgb16(s, pal_entry);
5769  pal_entry += tga_comp;
5770  }
5771  }
5772  else if (!stbi__getn(s, tga_palette, tga_palette_len * tga_comp)) {
5773  STBI_FREE(tga_data);
5774  STBI_FREE(tga_palette);
5775  return stbi__errpuc("bad palette", "Corrupt TGA");
5776  }
5777  }
5778  // load the data
5779  for (i = 0; i < tga_width * tga_height; ++i) {
5780  // if I'm in RLE mode, do I need to get a RLE stbi__pngchunk?
5781  if (tga_is_RLE) {
5782  if (RLE_count == 0) {
5783  // yep, get the next byte as a RLE command
5784  int RLE_cmd = stbi__get8(s);
5785  RLE_count = 1 + (RLE_cmd & 127);
5786  RLE_repeating = RLE_cmd >> 7;
5787  read_next_pixel = 1;
5788  }
5789  else if (!RLE_repeating) {
5790  read_next_pixel = 1;
5791  }
5792  }
5793  else {
5794  read_next_pixel = 1;
5795  }
5796  // OK, if I need to read a pixel, do it now
5797  if (read_next_pixel) {
5798  // load however much data we did have
5799  if (tga_indexed) {
5800  // read in index, then perform the lookup
5801  int pal_idx = (tga_bits_per_pixel == 8) ? stbi__get8(s) : stbi__get16le(s);
5802  if (pal_idx >= tga_palette_len) {
5803  // invalid index
5804  pal_idx = 0;
5805  }
5806  pal_idx *= tga_comp;
5807  for (j = 0; j < tga_comp; ++j) {
5808  raw_data[j] = tga_palette[pal_idx + j];
5809  }
5810  }
5811  else if (tga_rgb16) {
5812  STBI_ASSERT(tga_comp == STBI_rgb);
5813  stbi__tga_read_rgb16(s, raw_data);
5814  }
5815  else {
5816  // read in the data raw
5817  for (j = 0; j < tga_comp; ++j) {
5818  raw_data[j] = stbi__get8(s);
5819  }
5820  }
5821  // clear the reading flag for the next pixel
5822  read_next_pixel = 0;
5823  } // end of reading a pixel
5824 
5825  // copy data
5826  for (j = 0; j < tga_comp; ++j)
5827  tga_data[i * tga_comp + j] = raw_data[j];
5828 
5829  // in case we're in RLE mode, keep counting down
5830  --RLE_count;
5831  }
5832  // do I need to invert the image?
5833  if (tga_inverted) {
5834  for (j = 0; j * 2 < tga_height; ++j) {
5835  int index1 = j * tga_width * tga_comp;
5836  int index2 = (tga_height - 1 - j) * tga_width * tga_comp;
5837  for (i = tga_width * tga_comp; i > 0; --i) {
5838  unsigned char temp = tga_data[index1];
5839  tga_data[index1] = tga_data[index2];
5840  tga_data[index2] = temp;
5841  ++index1;
5842  ++index2;
5843  }
5844  }
5845  }
5846  // clear my palette, if I had one
5847  if (tga_palette != NULL) {
5848  STBI_FREE(tga_palette);
5849  }
5850  }
5851 
5852  // swap RGB - if the source data was RGB16, it already is in the right order
5853  if (tga_comp >= 3 && !tga_rgb16) {
5854  unsigned char *tga_pixel = tga_data;
5855  for (i = 0; i < tga_width * tga_height; ++i) {
5856  unsigned char temp = tga_pixel[0];
5857  tga_pixel[0] = tga_pixel[2];
5858  tga_pixel[2] = temp;
5859  tga_pixel += tga_comp;
5860  }
5861  }
5862 
5863  // convert to target component count
5864  if (req_comp && req_comp != tga_comp)
5865  tga_data = stbi__convert_format(tga_data, tga_comp, req_comp, tga_width, tga_height);
5866 
5867  // the things I do to get rid of an error message, and yet keep
5868  // Microsoft's C compilers happy... [8^(
5869  tga_palette_start = tga_palette_len = tga_palette_bits = tga_x_origin = tga_y_origin = 0;
5870  // OK, done
5871  return tga_data;
5872 }
5873 #endif
5874 
5875 // *************************************************************************************************
5876 // Photoshop PSD loader -- PD by Thatcher Ulrich, integration by Nicolas Schulz, tweaked by STB
5877 
5878 #ifndef STBI_NO_PSD
5879 static int stbi__psd_test(stbi__context *s) {
5880  int r = (stbi__get32be(s) == 0x38425053);
5881  stbi__rewind(s);
5882  return r;
5883 }
5884 
5885 static stbi_uc *stbi__psd_load(stbi__context *s, int *x, int *y, int *comp, int req_comp) {
5886  int pixelCount;
5887  int channelCount, compression;
5888  int channel, i, count, len;
5889  int bitdepth;
5890  int w, h;
5891  stbi_uc *out;
5892 
5893  // Check identifier
5894  if (stbi__get32be(s) != 0x38425053) // "8BPS"
5895  return stbi__errpuc("not PSD", "Corrupt PSD image");
5896 
5897  // Check file type version.
5898  if (stbi__get16be(s) != 1)
5899  return stbi__errpuc("wrong version", "Unsupported version of PSD image");
5900 
5901  // Skip 6 reserved bytes.
5902  stbi__skip(s, 6);
5903 
5904  // Read the number of channels (R, G, B, A, etc).
5905  channelCount = stbi__get16be(s);
5906  if (channelCount < 0 || channelCount > 16)
5907  return stbi__errpuc("wrong channel count", "Unsupported number of channels in PSD image");
5908 
5909  // Read the rows and columns of the image.
5910  h = stbi__get32be(s);
5911  w = stbi__get32be(s);
5912 
5913  // Make sure the depth is 8 bits.
5914  bitdepth = stbi__get16be(s);
5915  if (bitdepth != 8 && bitdepth != 16)
5916  return stbi__errpuc("unsupported bit depth", "PSD bit depth is not 8 or 16 bit");
5917 
5918  // Make sure the color mode is RGB.
5919  // Valid options are:
5920  // 0: Bitmap
5921  // 1: Grayscale
5922  // 2: Indexed color
5923  // 3: RGB color
5924  // 4: CMYK color
5925  // 7: Multichannel
5926  // 8: Duotone
5927  // 9: Lab color
5928  if (stbi__get16be(s) != 3)
5929  return stbi__errpuc("wrong color format", "PSD is not in RGB color format");
5930 
5931  // Skip the Mode Data. (It's the palette for indexed color; other info for other modes.)
5932  stbi__skip(s, stbi__get32be(s));
5933 
5934  // Skip the image resources. (resolution, pen tool paths, etc)
5935  stbi__skip(s, stbi__get32be(s));
5936 
5937  // Skip the reserved data.
5938  stbi__skip(s, stbi__get32be(s));
5939 
5940  // Find out if the data is compressed.
5941  // Known values:
5942  // 0: no compression
5943  // 1: RLE compressed
5944  compression = stbi__get16be(s);
5945  if (compression > 1)
5946  return stbi__errpuc("bad compression", "PSD has an unknown compression format");
5947 
5948  // Create the destination image.
5949  out = (stbi_uc *)stbi__malloc(4 * w * h);
5950  if (!out)
5951  return stbi__errpuc("outofmem", "Out of memory");
5952  pixelCount = w * h;
5953 
5954  // Initialize the data to zero.
5955  // memset( out, 0, pixelCount * 4 );
5956 
5957  // Finally, the image data.
5958  if (compression) {
5959  // RLE as used by .PSD and .TIFF
5960  // Loop until you get the number of unpacked bytes you are expecting:
5961  // Read the next source byte into n.
5962  // If n is between 0 and 127 inclusive, copy the next n+1 bytes literally.
5963  // Else if n is between -127 and -1 inclusive, copy the next byte -n+1 times.
5964  // Else if n is 128, noop.
5965  // Endloop
5966 
5967  // The RLE-compressed data is preceeded by a 2-byte data count for each row in the data,
5968  // which we're going to just skip.
5969  stbi__skip(s, h * channelCount * 2);
5970 
5971  // Read the RLE data by channel.
5972  for (channel = 0; channel < 4; channel++) {
5973  stbi_uc *p;
5974 
5975  p = out + channel;
5976  if (channel >= channelCount) {
5977  // Fill this channel with default data.
5978  for (i = 0; i < pixelCount; i++, p += 4)
5979  *p = (channel == 3 ? 255 : 0);
5980  }
5981  else {
5982  // Read the RLE data.
5983  count = 0;
5984  while (count < pixelCount) {
5985  len = stbi__get8(s);
5986  if (len == 128) {
5987  // No-op.
5988  }
5989  else if (len < 128) {
5990  // Copy next len+1 bytes literally.
5991  len++;
5992  count += len;
5993  while (len) {
5994  *p = stbi__get8(s);
5995  p += 4;
5996  len--;
5997  }
5998  }
5999  else if (len > 128) {
6000  stbi_uc val;
6001  // Next -len+1 bytes in the dest are replicated from next source byte.
6002  // (Interpret len as a negative 8-bit int.)
6003  len ^= 0x0FF;
6004  len += 2;
6005  val = stbi__get8(s);
6006  count += len;
6007  while (len) {
6008  *p = val;
6009  p += 4;
6010  len--;
6011  }
6012  }
6013  }
6014  }
6015  }
6016  }
6017  else {
6018  // We're at the raw image data. It's each channel in order (Red, Green, Blue, Alpha, ...)
6019  // where each channel consists of an 8-bit value for each pixel in the image.
6020 
6021  // Read the data by channel.
6022  for (channel = 0; channel < 4; channel++) {
6023  stbi_uc *p;
6024 
6025  p = out + channel;
6026  if (channel >= channelCount) {
6027  // Fill this channel with default data.
6028  stbi_uc val = channel == 3 ? 255 : 0;
6029  for (i = 0; i < pixelCount; i++, p += 4)
6030  *p = val;
6031  }
6032  else {
6033  // Read the data.
6034  if (bitdepth == 16) {
6035  for (i = 0; i < pixelCount; i++, p += 4)
6036  *p = (stbi_uc)(stbi__get16be(s) >> 8);
6037  }
6038  else {
6039  for (i = 0; i < pixelCount; i++, p += 4)
6040  *p = stbi__get8(s);
6041  }
6042  }
6043  }
6044  }
6045 
6046  if (channelCount >= 4) {
6047  for (i = 0; i < w * h; ++i) {
6048  unsigned char *pixel = out + 4 * i;
6049  if (pixel[3] != 0 && pixel[3] != 255) {
6050  // remove weird white matte from PSD
6051  float a = pixel[3] / 255.0f;
6052  float ra = 1.0f / a;
6053  float inv_a = 255.0f * (1 - ra);
6054  pixel[0] = (unsigned char)(pixel[0] * ra + inv_a);
6055  pixel[1] = (unsigned char)(pixel[1] * ra + inv_a);
6056  pixel[2] = (unsigned char)(pixel[2] * ra + inv_a);
6057  }
6058  }
6059  }
6060 
6061  if (req_comp && req_comp != 4) {
6062  out = stbi__convert_format(out, 4, req_comp, w, h);
6063  if (out == NULL)
6064  return out; // stbi__convert_format frees input on failure
6065  }
6066 
6067  if (comp)
6068  *comp = 4;
6069  *y = h;
6070  *x = w;
6071 
6072  return out;
6073 }
6074 #endif
6075 
6076 // *************************************************************************************************
6077 // Softimage PIC loader
6078 // by Tom Seddon
6079 //
6080 // See http://softimage.wiki.softimage.com/index.php/INFO:_PIC_file_format
6081 // See http://ozviz.wasp.uwa.edu.au/~pbourke/dataformats/softimagepic/
6082 
6083 #ifndef STBI_NO_PIC
6084 static int stbi__pic_is4(stbi__context *s, const char *str) {
6085  int i;
6086  for (i = 0; i < 4; ++i)
6087  if (stbi__get8(s) != (stbi_uc)str[i])
6088  return 0;
6089 
6090  return 1;
6091 }
6092 
6093 static int stbi__pic_test_core(stbi__context *s) {
6094  int i;
6095 
6096  if (!stbi__pic_is4(s, "\x53\x80\xF6\x34"))
6097  return 0;
6098 
6099  for (i = 0; i < 84; ++i)
6100  stbi__get8(s);
6101 
6102  if (!stbi__pic_is4(s, "PICT"))
6103  return 0;
6104 
6105  return 1;
6106 }
6107 
6108 typedef struct { stbi_uc size, type, channel; } stbi__pic_packet;
6109 
6110 static stbi_uc *stbi__readval(stbi__context *s, int channel, stbi_uc *dest) {
6111  int mask = 0x80, i;
6112 
6113  for (i = 0; i < 4; ++i, mask >>= 1) {
6114  if (channel & mask) {
6115  if (stbi__at_eof(s))
6116  return stbi__errpuc("bad file", "PIC file too short");
6117  dest[i] = stbi__get8(s);
6118  }
6119  }
6120 
6121  return dest;
6122 }
6123 
6124 static void stbi__copyval(int channel, stbi_uc *dest, const stbi_uc *src) {
6125  int mask = 0x80, i;
6126 
6127  for (i = 0; i < 4; ++i, mask >>= 1)
6128  if (channel & mask)
6129  dest[i] = src[i];
6130 }
6131 
6132 static stbi_uc *
6133 stbi__pic_load_core(stbi__context *s, int width, int height, int *comp, stbi_uc *result) {
6134  int act_comp = 0, num_packets = 0, y, chained;
6135  stbi__pic_packet packets[10];
6136 
6137  // this will (should...) cater for even some bizarre stuff like having data
6138  // for the same channel in multiple packets.
6139  do {
6140  stbi__pic_packet *packet;
6141 
6142  if (num_packets == sizeof(packets) / sizeof(packets[0]))
6143  return stbi__errpuc("bad format", "too many packets");
6144 
6145  packet = &packets[num_packets++];
6146 
6147  chained = stbi__get8(s);
6148  packet->size = stbi__get8(s);
6149  packet->type = stbi__get8(s);
6150  packet->channel = stbi__get8(s);
6151 
6152  act_comp |= packet->channel;
6153 
6154  if (stbi__at_eof(s))
6155  return stbi__errpuc("bad file", "file too short (reading packets)");
6156  if (packet->size != 8)
6157  return stbi__errpuc("bad format", "packet isn't 8bpp");
6158  } while (chained);
6159 
6160  *comp = (act_comp & 0x10 ? 4 : 3); // has alpha channel?
6161 
6162  for (y = 0; y < height; ++y) {
6163  int packet_idx;
6164 
6165  for (packet_idx = 0; packet_idx < num_packets; ++packet_idx) {
6166  stbi__pic_packet *packet = &packets[packet_idx];
6167  stbi_uc *dest = result + y * width * 4;
6168 
6169  switch (packet->type) {
6170  default: return stbi__errpuc("bad format", "packet has bad compression type");
6171 
6172  case 0: { // uncompressed
6173  int x;
6174 
6175  for (x = 0; x < width; ++x, dest += 4)
6176  if (!stbi__readval(s, packet->channel, dest))
6177  return 0;
6178  break;
6179  }
6180 
6181  case 1: // Pure RLE
6182  {
6183  int left = width, i;
6184 
6185  while (left > 0) {
6186  stbi_uc count, value[4];
6187 
6188  count = stbi__get8(s);
6189  if (stbi__at_eof(s))
6190  return stbi__errpuc("bad file", "file too short (pure read count)");
6191 
6192  if (count > left)
6193  count = (stbi_uc)left;
6194 
6195  if (!stbi__readval(s, packet->channel, value))
6196  return 0;
6197 
6198  for (i = 0; i < count; ++i, dest += 4)
6199  stbi__copyval(packet->channel, dest, value);
6200  left -= count;
6201  }
6202  } break;
6203 
6204  case 2: { // Mixed RLE
6205  int left = width;
6206  while (left > 0) {
6207  int count = stbi__get8(s), i;
6208  if (stbi__at_eof(s))
6209  return stbi__errpuc("bad file", "file too short (mixed read count)");
6210 
6211  if (count >= 128) { // Repeated
6212  stbi_uc value[4];
6213 
6214  if (count == 128)
6215  count = stbi__get16be(s);
6216  else
6217  count -= 127;
6218  if (count > left)
6219  return stbi__errpuc("bad file", "scanline overrun");
6220 
6221  if (!stbi__readval(s, packet->channel, value))
6222  return 0;
6223 
6224  for (i = 0; i < count; ++i, dest += 4)
6225  stbi__copyval(packet->channel, dest, value);
6226  }
6227  else { // Raw
6228  ++count;
6229  if (count > left)
6230  return stbi__errpuc("bad file", "scanline overrun");
6231 
6232  for (i = 0; i < count; ++i, dest += 4)
6233  if (!stbi__readval(s, packet->channel, dest))
6234  return 0;
6235  }
6236  left -= count;
6237  }
6238  break;
6239  }
6240  }
6241  }
6242  }
6243 
6244  return result;
6245 }
6246 
6247 static stbi_uc *stbi__pic_load(stbi__context *s, int *px, int *py, int *comp, int req_comp) {
6248  stbi_uc *result;
6249  int i, x, y;
6250 
6251  for (i = 0; i < 92; ++i)
6252  stbi__get8(s);
6253 
6254  x = stbi__get16be(s);
6255  y = stbi__get16be(s);
6256  if (stbi__at_eof(s))
6257  return stbi__errpuc("bad file", "file too short (pic header)");
6258  if ((1 << 28) / x < y)
6259  return stbi__errpuc("too large", "Image too large to decode");
6260 
6261  stbi__get32be(s); // skip `ratio'
6262  stbi__get16be(s); // skip `fields'
6263  stbi__get16be(s); // skip `pad'
6264 
6265  // intermediate buffer is RGBA
6266  result = (stbi_uc *)stbi__malloc(x * y * 4);
6267  memset(result, 0xff, x * y * 4);
6268 
6269  if (!stbi__pic_load_core(s, x, y, comp, result)) {
6270  STBI_FREE(result);
6271  result = 0;
6272  }
6273  *px = x;
6274  *py = y;
6275  if (req_comp == 0)
6276  req_comp = *comp;
6277  result = stbi__convert_format(result, 4, req_comp, x, y);
6278 
6279  return result;
6280 }
6281 
6282 static int stbi__pic_test(stbi__context *s) {
6283  int r = stbi__pic_test_core(s);
6284  stbi__rewind(s);
6285  return r;
6286 }
6287 #endif
6288 
6289 // *************************************************************************************************
6290 // GIF loader -- public domain by Jean-Marc Lienher -- simplified/shrunk by stb
6291 
6292 #ifndef STBI_NO_GIF
6293 typedef struct {
6294  stbi__int16 prefix;
6295  stbi_uc first;
6296  stbi_uc suffix;
6297 } stbi__gif_lzw;
6298 
6299 typedef struct {
6300  int w, h;
6301  stbi_uc *out, *old_out; // output buffer (always 4 components)
6302  int flags, bgindex, ratio, transparent, eflags, delay;
6303  stbi_uc pal[256][4];
6304  stbi_uc lpal[256][4];
6305  stbi__gif_lzw codes[4096];
6306  stbi_uc *color_table;
6307  int parse, step;
6308  int lflags;
6309  int start_x, start_y;
6310  int max_x, max_y;
6311  int cur_x, cur_y;
6312  int line_size;
6313 } stbi__gif;
6314 
6315 static int stbi__gif_test_raw(stbi__context *s) {
6316  int sz;
6317  if (stbi__get8(s) != 'G' || stbi__get8(s) != 'I' || stbi__get8(s) != 'F' || stbi__get8(s) != '8')
6318  return 0;
6319  sz = stbi__get8(s);
6320  if (sz != '9' && sz != '7')
6321  return 0;
6322  if (stbi__get8(s) != 'a')
6323  return 0;
6324  return 1;
6325 }
6326 
6327 static int stbi__gif_test(stbi__context *s) {
6328  int r = stbi__gif_test_raw(s);
6329  stbi__rewind(s);
6330  return r;
6331 }
6332 
6333 static void
6334 stbi__gif_parse_colortable(stbi__context *s, stbi_uc pal[256][4], int num_entries, int transp) {
6335  int i;
6336  for (i = 0; i < num_entries; ++i) {
6337  pal[i][2] = stbi__get8(s);
6338  pal[i][1] = stbi__get8(s);
6339  pal[i][0] = stbi__get8(s);
6340  pal[i][3] = transp == i ? 0 : 255;
6341  }
6342 }
6343 
6344 static int stbi__gif_header(stbi__context *s, stbi__gif *g, int *comp, int is_info) {
6345  stbi_uc version;
6346  if (stbi__get8(s) != 'G' || stbi__get8(s) != 'I' || stbi__get8(s) != 'F' || stbi__get8(s) != '8')
6347  return stbi__err("not GIF", "Corrupt GIF");
6348 
6349  version = stbi__get8(s);
6350  if (version != '7' && version != '9')
6351  return stbi__err("not GIF", "Corrupt GIF");
6352  if (stbi__get8(s) != 'a')
6353  return stbi__err("not GIF", "Corrupt GIF");
6354 
6355  stbi__g_failure_reason = "";
6356  g->w = stbi__get16le(s);
6357  g->h = stbi__get16le(s);
6358  g->flags = stbi__get8(s);
6359  g->bgindex = stbi__get8(s);
6360  g->ratio = stbi__get8(s);
6361  g->transparent = -1;
6362 
6363  if (comp != 0)
6364  *comp = 4; // can't actually tell whether it's 3 or 4 until we parse the comments
6365 
6366  if (is_info)
6367  return 1;
6368 
6369  if (g->flags & 0x80)
6370  stbi__gif_parse_colortable(s, g->pal, 2 << (g->flags & 7), -1);
6371 
6372  return 1;
6373 }
6374 
6375 static int stbi__gif_info_raw(stbi__context *s, int *x, int *y, int *comp) {
6376  stbi__gif *g = (stbi__gif *)stbi__malloc(sizeof(stbi__gif));
6377  if (!stbi__gif_header(s, g, comp, 1)) {
6378  STBI_FREE(g);
6379  stbi__rewind(s);
6380  return 0;
6381  }
6382  if (x)
6383  *x = g->w;
6384  if (y)
6385  *y = g->h;
6386  STBI_FREE(g);
6387  return 1;
6388 }
6389 
6390 static void stbi__out_gif_code(stbi__gif *g, stbi__uint16 code) {
6391  stbi_uc *p, *c;
6392 
6393  // recurse to decode the prefixes, since the linked-list is backwards,
6394  // and working backwards through an interleaved image would be nasty
6395  if (g->codes[code].prefix >= 0)
6396  stbi__out_gif_code(g, g->codes[code].prefix);
6397 
6398  if (g->cur_y >= g->max_y)
6399  return;
6400 
6401  p = &g->out[g->cur_x + g->cur_y];
6402  c = &g->color_table[g->codes[code].suffix * 4];
6403 
6404  if (c[3] >= 128) {
6405  p[0] = c[2];
6406  p[1] = c[1];
6407  p[2] = c[0];
6408  p[3] = c[3];
6409  }
6410  g->cur_x += 4;
6411 
6412  if (g->cur_x >= g->max_x) {
6413  g->cur_x = g->start_x;
6414  g->cur_y += g->step;
6415 
6416  while (g->cur_y >= g->max_y && g->parse > 0) {
6417  g->step = (1 << g->parse) * g->line_size;
6418  g->cur_y = g->start_y + (g->step >> 1);
6419  --g->parse;
6420  }
6421  }
6422 }
6423 
6424 static stbi_uc *stbi__process_gif_raster(stbi__context *s, stbi__gif *g) {
6425  stbi_uc lzw_cs;
6426  stbi__int32 len, init_code;
6427  stbi__uint32 first;
6428  stbi__int32 codesize, codemask, avail, oldcode, bits, valid_bits, clear;
6429  stbi__gif_lzw *p;
6430 
6431  lzw_cs = stbi__get8(s);
6432  if (lzw_cs > 12)
6433  return NULL;
6434  clear = 1 << lzw_cs;
6435  first = 1;
6436  codesize = lzw_cs + 1;
6437  codemask = (1 << codesize) - 1;
6438  bits = 0;
6439  valid_bits = 0;
6440  for (init_code = 0; init_code < clear; init_code++) {
6441  g->codes[init_code].prefix = -1;
6442  g->codes[init_code].first = (stbi_uc)init_code;
6443  g->codes[init_code].suffix = (stbi_uc)init_code;
6444  }
6445 
6446  // support no starting clear code
6447  avail = clear + 2;
6448  oldcode = -1;
6449 
6450  len = 0;
6451  for (;;) {
6452  if (valid_bits < codesize) {
6453  if (len == 0) {
6454  len = stbi__get8(s); // start new block
6455  if (len == 0)
6456  return g->out;
6457  }
6458  --len;
6459  bits |= (stbi__int32)stbi__get8(s) << valid_bits;
6460  valid_bits += 8;
6461  }
6462  else {
6463  stbi__int32 code = bits & codemask;
6464  bits >>= codesize;
6465  valid_bits -= codesize;
6466  // @OPTIMIZE: is there some way we can accelerate the non-clear path?
6467  if (code == clear) { // clear code
6468  codesize = lzw_cs + 1;
6469  codemask = (1 << codesize) - 1;
6470  avail = clear + 2;
6471  oldcode = -1;
6472  first = 0;
6473  }
6474  else if (code == clear + 1) { // end of stream code
6475  stbi__skip(s, len);
6476  while ((len = stbi__get8(s)) > 0)
6477  stbi__skip(s, len);
6478  return g->out;
6479  }
6480  else if (code <= avail) {
6481  if (first)
6482  return stbi__errpuc("no clear code", "Corrupt GIF");
6483 
6484  if (oldcode >= 0) {
6485  p = &g->codes[avail++];
6486  if (avail > 4096)
6487  return stbi__errpuc("too many codes", "Corrupt GIF");
6488  p->prefix = (stbi__int16)oldcode;
6489  p->first = g->codes[oldcode].first;
6490  p->suffix = (code == avail) ? p->first : g->codes[code].first;
6491  }
6492  else if (code == avail)
6493  return stbi__errpuc("illegal code in raster", "Corrupt GIF");
6494 
6495  stbi__out_gif_code(g, (stbi__uint16)code);
6496 
6497  if ((avail & codemask) == 0 && avail <= 0x0FFF) {
6498  codesize++;
6499  codemask = (1 << codesize) - 1;
6500  }
6501 
6502  oldcode = code;
6503  }
6504  else {
6505  return stbi__errpuc("illegal code in raster", "Corrupt GIF");
6506  }
6507  }
6508  }
6509 }
6510 
6511 static void stbi__fill_gif_background(stbi__gif *g, int x0, int y0, int x1, int y1) {
6512  int x, y;
6513  stbi_uc *c = g->pal[g->bgindex];
6514  for (y = y0; y < y1; y += 4 * g->w) {
6515  for (x = x0; x < x1; x += 4) {
6516  stbi_uc *p = &g->out[y + x];
6517  p[0] = c[2];
6518  p[1] = c[1];
6519  p[2] = c[0];
6520  p[3] = 0;
6521  }
6522  }
6523 }
6524 
6525 // this function is designed to support animated gifs, although stb_image doesn't support it
6526 static stbi_uc *stbi__gif_load_next(stbi__context *s, stbi__gif *g, int *comp, int req_comp) {
6527  int i;
6528  stbi_uc *prev_out = 0;
6529 
6530  if (g->out == 0 && !stbi__gif_header(s, g, comp, 0))
6531  return 0; // stbi__g_failure_reason set by stbi__gif_header
6532 
6533  prev_out = g->out;
6534  g->out = (stbi_uc *)stbi__malloc(4 * g->w * g->h);
6535  if (g->out == 0)
6536  return stbi__errpuc("outofmem", "Out of memory");
6537 
6538  switch ((g->eflags & 0x1C) >> 2) {
6539  case 0: // unspecified (also always used on 1st frame)
6540  stbi__fill_gif_background(g, 0, 0, 4 * g->w, 4 * g->w * g->h);
6541  break;
6542  case 1: // do not dispose
6543  if (prev_out)
6544  memcpy(g->out, prev_out, 4 * g->w * g->h);
6545  g->old_out = prev_out;
6546  break;
6547  case 2: // dispose to background
6548  if (prev_out)
6549  memcpy(g->out, prev_out, 4 * g->w * g->h);
6550  stbi__fill_gif_background(g, g->start_x, g->start_y, g->max_x, g->max_y);
6551  break;
6552  case 3: // dispose to previous
6553  if (g->old_out) {
6554  for (i = g->start_y; i < g->max_y; i += 4 * g->w)
6555  memcpy(&g->out[i + g->start_x], &g->old_out[i + g->start_x], g->max_x - g->start_x);
6556  }
6557  break;
6558  }
6559 
6560  for (;;) {
6561  switch (stbi__get8(s)) {
6562  case 0x2C: /* Image Descriptor */
6563  {
6564  int prev_trans = -1;
6565  stbi__int32 x, y, w, h;
6566  stbi_uc *o;
6567 
6568  x = stbi__get16le(s);
6569  y = stbi__get16le(s);
6570  w = stbi__get16le(s);
6571  h = stbi__get16le(s);
6572  if (((x + w) > (g->w)) || ((y + h) > (g->h)))
6573  return stbi__errpuc("bad Image Descriptor", "Corrupt GIF");
6574 
6575  g->line_size = g->w * 4;
6576  g->start_x = x * 4;
6577  g->start_y = y * g->line_size;
6578  g->max_x = g->start_x + w * 4;
6579  g->max_y = g->start_y + h * g->line_size;
6580  g->cur_x = g->start_x;
6581  g->cur_y = g->start_y;
6582 
6583  g->lflags = stbi__get8(s);
6584 
6585  if (g->lflags & 0x40) {
6586  g->step = 8 * g->line_size; // first interlaced spacing
6587  g->parse = 3;
6588  }
6589  else {
6590  g->step = g->line_size;
6591  g->parse = 0;
6592  }
6593 
6594  if (g->lflags & 0x80) {
6595  stbi__gif_parse_colortable(
6596  s, g->lpal, 2 << (g->lflags & 7), g->eflags & 0x01 ? g->transparent : -1);
6597  g->color_table = (stbi_uc *)g->lpal;
6598  }
6599  else if (g->flags & 0x80) {
6600  if (g->transparent >= 0 && (g->eflags & 0x01)) {
6601  prev_trans = g->pal[g->transparent][3];
6602  g->pal[g->transparent][3] = 0;
6603  }
6604  g->color_table = (stbi_uc *)g->pal;
6605  }
6606  else
6607  return stbi__errpuc("missing color table", "Corrupt GIF");
6608 
6609  o = stbi__process_gif_raster(s, g);
6610  if (o == NULL)
6611  return NULL;
6612 
6613  if (prev_trans != -1)
6614  g->pal[g->transparent][3] = (stbi_uc)prev_trans;
6615 
6616  return o;
6617  }
6618 
6619  case 0x21: // Comment Extension.
6620  {
6621  int len;
6622  if (stbi__get8(s) == 0xF9) { // Graphic Control Extension.
6623  len = stbi__get8(s);
6624  if (len == 4) {
6625  g->eflags = stbi__get8(s);
6626  g->delay = stbi__get16le(s);
6627  g->transparent = stbi__get8(s);
6628  }
6629  else {
6630  stbi__skip(s, len);
6631  break;
6632  }
6633  }
6634  while ((len = stbi__get8(s)) != 0)
6635  stbi__skip(s, len);
6636  break;
6637  }
6638 
6639  case 0x3B: // gif stream termination code
6640  return (stbi_uc *)s; // using '1' causes warning on some compilers
6641 
6642  default: return stbi__errpuc("unknown code", "Corrupt GIF");
6643  }
6644  }
6645 
6646  STBI_NOTUSED(req_comp);
6647 }
6648 
6649 static stbi_uc *stbi__gif_load(stbi__context *s, int *x, int *y, int *comp, int req_comp) {
6650  stbi_uc *u = 0;
6651  stbi__gif *g = (stbi__gif *)stbi__malloc(sizeof(stbi__gif));
6652  memset(g, 0, sizeof(*g));
6653 
6654  u = stbi__gif_load_next(s, g, comp, req_comp);
6655  if (u == (stbi_uc *)s)
6656  u = 0; // end of animated gif marker
6657  if (u) {
6658  *x = g->w;
6659  *y = g->h;
6660  if (req_comp && req_comp != 4)
6661  u = stbi__convert_format(u, 4, req_comp, g->w, g->h);
6662  }
6663  else if (g->out)
6664  STBI_FREE(g->out);
6665  STBI_FREE(g);
6666  return u;
6667 }
6668 
6669 static int stbi__gif_info(stbi__context *s, int *x, int *y, int *comp) {
6670  return stbi__gif_info_raw(s, x, y, comp);
6671 }
6672 #endif
6673 
6674 // *************************************************************************************************
6675 // Radiance RGBE HDR loader
6676 // originally by Nicolas Schulz
6677 #ifndef STBI_NO_HDR
6678 static int stbi__hdr_test_core(stbi__context *s) {
6679  const char *signature = "#?RADIANCE\n";
6680  int i;
6681  for (i = 0; signature[i]; ++i)
6682  if (stbi__get8(s) != signature[i])
6683  return 0;
6684  return 1;
6685 }
6686 
6687 static int stbi__hdr_test(stbi__context *s) {
6688  int r = stbi__hdr_test_core(s);
6689  stbi__rewind(s);
6690  return r;
6691 }
6692 
6693 #define STBI__HDR_BUFLEN 1024
6694 static char *stbi__hdr_gettoken(stbi__context *z, char *buffer) {
6695  int len = 0;
6696  char c = '\0';
6697 
6698  c = (char)stbi__get8(z);
6699 
6700  while (!stbi__at_eof(z) && c != '\n') {
6701  buffer[len++] = c;
6702  if (len == STBI__HDR_BUFLEN - 1) {
6703  // flush to end of line
6704  while (!stbi__at_eof(z) && stbi__get8(z) != '\n')
6705  ;
6706  break;
6707  }
6708  c = (char)stbi__get8(z);
6709  }
6710 
6711  buffer[len] = 0;
6712  return buffer;
6713 }
6714 
6715 static void stbi__hdr_convert(float *output, stbi_uc *input, int req_comp) {
6716  if (input[3] != 0) {
6717  float f1;
6718  // Exponent
6719  f1 = (float)ldexp(1.0f, input[3] - (int)(128 + 8));
6720  if (req_comp <= 2)
6721  output[0] = (input[0] + input[1] + input[2]) * f1 / 3;
6722  else {
6723  output[0] = input[0] * f1;
6724  output[1] = input[1] * f1;
6725  output[2] = input[2] * f1;
6726  }
6727  if (req_comp == 2)
6728  output[1] = 1;
6729  if (req_comp == 4)
6730  output[3] = 1;
6731  }
6732  else {
6733  switch (req_comp) {
6734  case 4:
6735  output[3] = 1; /* fallthrough */
6736  case 3: output[0] = output[1] = output[2] = 0; break;
6737  case 2:
6738  output[1] = 1; /* fallthrough */
6739  case 1: output[0] = 0; break;
6740  }
6741  }
6742 }
6743 
6744 static float *stbi__hdr_load(stbi__context *s, int *x, int *y, int *comp, int req_comp) {
6745  char buffer[STBI__HDR_BUFLEN];
6746  char *token;
6747  int valid = 0;
6748  int width, height;
6749  stbi_uc *scanline;
6750  float *hdr_data;
6751  int len;
6752  unsigned char count, value;
6753  int i, j, k, c1, c2, z;
6754 
6755  // Check identifier
6756  if (strcmp(stbi__hdr_gettoken(s, buffer), "#?RADIANCE") != 0)
6757  return stbi__errpf("not HDR", "Corrupt HDR image");
6758 
6759  // Parse header
6760  for (;;) {
6761  token = stbi__hdr_gettoken(s, buffer);
6762  if (token[0] == 0)
6763  break;
6764  if (strcmp(token, "FORMAT=32-bit_rle_rgbe") == 0)
6765  valid = 1;
6766  }
6767 
6768  if (!valid)
6769  return stbi__errpf("unsupported format", "Unsupported HDR format");
6770 
6771  // Parse width and height
6772  // can't use sscanf() if we're not using stdio!
6773  token = stbi__hdr_gettoken(s, buffer);
6774  if (strncmp(token, "-Y ", 3))
6775  return stbi__errpf("unsupported data layout", "Unsupported HDR format");
6776  token += 3;
6777  height = (int)strtol(token, &token, 10);
6778  while (*token == ' ')
6779  ++token;
6780  if (strncmp(token, "+X ", 3))
6781  return stbi__errpf("unsupported data layout", "Unsupported HDR format");
6782  token += 3;
6783  width = (int)strtol(token, NULL, 10);
6784 
6785  *x = width;
6786  *y = height;
6787 
6788  if (comp)
6789  *comp = 3;
6790  if (req_comp == 0)
6791  req_comp = 3;
6792 
6793  // Read data
6794  hdr_data = (float *)stbi__malloc(height * width * req_comp * sizeof(float));
6795 
6796  // Load image data
6797  // image data is stored as some number of sca
6798  if (width < 8 || width >= 32768) {
6799  // Read flat data
6800  for (j = 0; j < height; ++j) {
6801  for (i = 0; i < width; ++i) {
6802  stbi_uc rgbe[4];
6803  main_decode_loop:
6804  stbi__getn(s, rgbe, 4);
6805  stbi__hdr_convert(hdr_data + j * width * req_comp + i * req_comp, rgbe, req_comp);
6806  }
6807  }
6808  }
6809  else {
6810  // Read RLE-encoded data
6811  scanline = NULL;
6812 
6813  for (j = 0; j < height; ++j) {
6814  c1 = stbi__get8(s);
6815  c2 = stbi__get8(s);
6816  len = stbi__get8(s);
6817  if (c1 != 2 || c2 != 2 || (len & 0x80)) {
6818  // not run-length encoded, so we have to actually use THIS data as a decoded
6819  // pixel (note this can't be a valid pixel--one of RGB must be >= 128)
6820  stbi_uc rgbe[4];
6821  rgbe[0] = (stbi_uc)c1;
6822  rgbe[1] = (stbi_uc)c2;
6823  rgbe[2] = (stbi_uc)len;
6824  rgbe[3] = (stbi_uc)stbi__get8(s);
6825  stbi__hdr_convert(hdr_data, rgbe, req_comp);
6826  i = 1;
6827  j = 0;
6828  STBI_FREE(scanline);
6829  goto main_decode_loop; // yes, this makes no sense
6830  }
6831  len <<= 8;
6832  len |= stbi__get8(s);
6833  if (len != width) {
6834  STBI_FREE(hdr_data);
6835  STBI_FREE(scanline);
6836  return stbi__errpf("invalid decoded scanline length", "corrupt HDR");
6837  }
6838  if (scanline == NULL)
6839  scanline = (stbi_uc *)stbi__malloc(width * 4);
6840 
6841  for (k = 0; k < 4; ++k) {
6842  i = 0;
6843  while (i < width) {
6844  count = stbi__get8(s);
6845  if (count > 128) {
6846  // Run
6847  value = stbi__get8(s);
6848  count -= 128;
6849  for (z = 0; z < count; ++z)
6850  scanline[i++ * 4 + k] = value;
6851  }
6852  else {
6853  // Dump
6854  for (z = 0; z < count; ++z)
6855  scanline[i++ * 4 + k] = stbi__get8(s);
6856  }
6857  }
6858  }
6859  for (i = 0; i < width; ++i)
6860  stbi__hdr_convert(hdr_data + (j * width + i) * req_comp, scanline + i * 4, req_comp);
6861  }
6862  STBI_FREE(scanline);
6863  }
6864 
6865  return hdr_data;
6866 }
6867 
6868 static int stbi__hdr_info(stbi__context *s, int *x, int *y, int *comp) {
6869  char buffer[STBI__HDR_BUFLEN];
6870  char *token;
6871  int valid = 0;
6872 
6873  if (stbi__hdr_test(s) == 0) {
6874  stbi__rewind(s);
6875  return 0;
6876  }
6877 
6878  for (;;) {
6879  token = stbi__hdr_gettoken(s, buffer);
6880  if (token[0] == 0)
6881  break;
6882  if (strcmp(token, "FORMAT=32-bit_rle_rgbe") == 0)
6883  valid = 1;
6884  }
6885 
6886  if (!valid) {
6887  stbi__rewind(s);
6888  return 0;
6889  }
6890  token = stbi__hdr_gettoken(s, buffer);
6891  if (strncmp(token, "-Y ", 3)) {
6892  stbi__rewind(s);
6893  return 0;
6894  }
6895  token += 3;
6896  *y = (int)strtol(token, &token, 10);
6897  while (*token == ' ')
6898  ++token;
6899  if (strncmp(token, "+X ", 3)) {
6900  stbi__rewind(s);
6901  return 0;
6902  }
6903  token += 3;
6904  *x = (int)strtol(token, NULL, 10);
6905  *comp = 3;
6906  return 1;
6907 }
6908 #endif // STBI_NO_HDR
6909 
6910 #ifndef STBI_NO_BMP
6911 static int stbi__bmp_info(stbi__context *s, int *x, int *y, int *comp) {
6912  void *p;
6913  stbi__bmp_data info;
6914 
6915  info.all_a = 255;
6916  p = stbi__bmp_parse_header(s, &info);
6917  stbi__rewind(s);
6918  if (p == NULL)
6919  return 0;
6920  *x = s->img_x;
6921  *y = s->img_y;
6922  *comp = info.ma ? 4 : 3;
6923  return 1;
6924 }
6925 #endif
6926 
6927 #ifndef STBI_NO_PSD
6928 static int stbi__psd_info(stbi__context *s, int *x, int *y, int *comp) {
6929  int channelCount;
6930  if (stbi__get32be(s) != 0x38425053) {
6931  stbi__rewind(s);
6932  return 0;
6933  }
6934  if (stbi__get16be(s) != 1) {
6935  stbi__rewind(s);
6936  return 0;
6937  }
6938  stbi__skip(s, 6);
6939  channelCount = stbi__get16be(s);
6940  if (channelCount < 0 || channelCount > 16) {
6941  stbi__rewind(s);
6942  return 0;
6943  }
6944  *y = stbi__get32be(s);
6945  *x = stbi__get32be(s);
6946  if (stbi__get16be(s) != 8) {
6947  stbi__rewind(s);
6948  return 0;
6949  }
6950  if (stbi__get16be(s) != 3) {
6951  stbi__rewind(s);
6952  return 0;
6953  }
6954  *comp = 4;
6955  return 1;
6956 }
6957 #endif
6958 
6959 #ifndef STBI_NO_PIC
6960 static int stbi__pic_info(stbi__context *s, int *x, int *y, int *comp) {
6961  int act_comp = 0, num_packets = 0, chained;
6962  stbi__pic_packet packets[10];
6963 
6964  if (!stbi__pic_is4(s, "\x53\x80\xF6\x34")) {
6965  stbi__rewind(s);
6966  return 0;
6967  }
6968 
6969  stbi__skip(s, 88);
6970 
6971  *x = stbi__get16be(s);
6972  *y = stbi__get16be(s);
6973  if (stbi__at_eof(s)) {
6974  stbi__rewind(s);
6975  return 0;
6976  }
6977  if ((*x) != 0 && (1 << 28) / (*x) < (*y)) {
6978  stbi__rewind(s);
6979  return 0;
6980  }
6981 
6982  stbi__skip(s, 8);
6983 
6984  do {
6985  stbi__pic_packet *packet;
6986 
6987  if (num_packets == sizeof(packets) / sizeof(packets[0]))
6988  return 0;
6989 
6990  packet = &packets[num_packets++];
6991  chained = stbi__get8(s);
6992  packet->size = stbi__get8(s);
6993  packet->type = stbi__get8(s);
6994  packet->channel = stbi__get8(s);
6995  act_comp |= packet->channel;
6996 
6997  if (stbi__at_eof(s)) {
6998  stbi__rewind(s);
6999  return 0;
7000  }
7001  if (packet->size != 8) {
7002  stbi__rewind(s);
7003  return 0;
7004  }
7005  } while (chained);
7006 
7007  *comp = (act_comp & 0x10 ? 4 : 3);
7008 
7009  return 1;
7010 }
7011 #endif
7012 
7013 // *************************************************************************************************
7014 // Portable Gray Map and Portable Pixel Map loader
7015 // by Ken Miller
7016 //
7017 // PGM: http://netpbm.sourceforge.net/doc/pgm.html
7018 // PPM: http://netpbm.sourceforge.net/doc/ppm.html
7019 //
7020 // Known limitations:
7021 // Does not support comments in the header section
7022 // Does not support ASCII image data (formats P2 and P3)
7023 // Does not support 16-bit-per-channel
7024 
7025 #ifndef STBI_NO_PNM
7026 
7027 static int stbi__pnm_test(stbi__context *s) {
7028  char p, t;
7029  p = (char)stbi__get8(s);
7030  t = (char)stbi__get8(s);
7031  if (p != 'P' || (t != '5' && t != '6')) {
7032  stbi__rewind(s);
7033  return 0;
7034  }
7035  return 1;
7036 }
7037 
7038 static stbi_uc *stbi__pnm_load(stbi__context *s, int *x, int *y, int *comp, int req_comp) {
7039  stbi_uc *out;
7040  if (!stbi__pnm_info(s, (int *)&s->img_x, (int *)&s->img_y, (int *)&s->img_n))
7041  return 0;
7042  *x = s->img_x;
7043  *y = s->img_y;
7044  *comp = s->img_n;
7045 
7046  out = (stbi_uc *)stbi__malloc(s->img_n * s->img_x * s->img_y);
7047  if (!out)
7048  return stbi__errpuc("outofmem", "Out of memory");
7049  stbi__getn(s, out, s->img_n * s->img_x * s->img_y);
7050 
7051  if (req_comp && req_comp != s->img_n) {
7052  out = stbi__convert_format(out, s->img_n, req_comp, s->img_x, s->img_y);
7053  if (out == NULL)
7054  return out; // stbi__convert_format frees input on failure
7055  }
7056  return out;
7057 }
7058 
7059 static int stbi__pnm_isspace(char c) {
7060  return c == ' ' || c == '\t' || c == '\n' || c == '\v' || c == '\f' || c == '\r';
7061 }
7062 
7063 static void stbi__pnm_skip_whitespace(stbi__context *s, char *c) {
7064  for (;;) {
7065  while (!stbi__at_eof(s) && stbi__pnm_isspace(*c))
7066  *c = (char)stbi__get8(s);
7067 
7068  if (stbi__at_eof(s) || *c != '#')
7069  break;
7070 
7071  while (!stbi__at_eof(s) && *c != '\n' && *c != '\r')
7072  *c = (char)stbi__get8(s);
7073  }
7074 }
7075 
7076 static int stbi__pnm_isdigit(char c) { return c >= '0' && c <= '9'; }
7077 
7078 static int stbi__pnm_getinteger(stbi__context *s, char *c) {
7079  int value = 0;
7080 
7081  while (!stbi__at_eof(s) && stbi__pnm_isdigit(*c)) {
7082  value = value * 10 + (*c - '0');
7083  *c = (char)stbi__get8(s);
7084  }
7085 
7086  return value;
7087 }
7088 
7089 static int stbi__pnm_info(stbi__context *s, int *x, int *y, int *comp) {
7090  int maxv;
7091  char c, p, t;
7092 
7093  stbi__rewind(s);
7094 
7095  // Get identifier
7096  p = (char)stbi__get8(s);
7097  t = (char)stbi__get8(s);
7098  if (p != 'P' || (t != '5' && t != '6')) {
7099  stbi__rewind(s);
7100  return 0;
7101  }
7102 
7103  *comp = (t == '6') ? 3 : 1; // '5' is 1-component .pgm; '6' is 3-component .ppm
7104 
7105  c = (char)stbi__get8(s);
7106  stbi__pnm_skip_whitespace(s, &c);
7107 
7108  *x = stbi__pnm_getinteger(s, &c); // read width
7109  stbi__pnm_skip_whitespace(s, &c);
7110 
7111  *y = stbi__pnm_getinteger(s, &c); // read height
7112  stbi__pnm_skip_whitespace(s, &c);
7113 
7114  maxv = stbi__pnm_getinteger(s, &c); // read max value
7115 
7116  if (maxv > 255)
7117  return stbi__err("max value > 255", "PPM image not 8-bit");
7118  else
7119  return 1;
7120 }
7121 #endif
7122 
7123 static int stbi__info_main(stbi__context *s, int *x, int *y, int *comp) {
7124 #ifndef STBI_NO_JPEG
7125  if (stbi__jpeg_info(s, x, y, comp))
7126  return 1;
7127 #endif
7128 
7129 #ifndef STBI_NO_PNG
7130  if (stbi__png_info(s, x, y, comp))
7131  return 1;
7132 #endif
7133 
7134 #ifndef STBI_NO_GIF
7135  if (stbi__gif_info(s, x, y, comp))
7136  return 1;
7137 #endif
7138 
7139 #ifndef STBI_NO_BMP
7140  if (stbi__bmp_info(s, x, y, comp))
7141  return 1;
7142 #endif
7143 
7144 #ifndef STBI_NO_PSD
7145  if (stbi__psd_info(s, x, y, comp))
7146  return 1;
7147 #endif
7148 
7149 #ifndef STBI_NO_PIC
7150  if (stbi__pic_info(s, x, y, comp))
7151  return 1;
7152 #endif
7153 
7154 #ifndef STBI_NO_PNM
7155  if (stbi__pnm_info(s, x, y, comp))
7156  return 1;
7157 #endif
7158 
7159 #ifndef STBI_NO_HDR
7160  if (stbi__hdr_info(s, x, y, comp))
7161  return 1;
7162 #endif
7163 
7164 // test tga last because it's a crappy test!
7165 #ifndef STBI_NO_TGA
7166  if (stbi__tga_info(s, x, y, comp))
7167  return 1;
7168 #endif
7169  return stbi__err("unknown image type", "Image not of any known type, or corrupt");
7170 }
7171 
7172 #ifndef STBI_NO_STDIO
7173 STBIDEF int stbi_info(char const *filename, int *x, int *y, int *comp) {
7174  FILE *f = stbi__fopen(filename, "rb");
7175  int result;
7176  if (!f)
7177  return stbi__err("can't fopen", "Unable to open file");
7178  result = stbi_info_from_file(f, x, y, comp);
7179  fclose(f);
7180  return result;
7181 }
7182 
7183 STBIDEF int stbi_info_from_file(FILE *f, int *x, int *y, int *comp) {
7184  int r;
7185  stbi__context s;
7186  long pos = ftell(f);
7187  stbi__start_file(&s, f);
7188  r = stbi__info_main(&s, x, y, comp);
7189  fseek(f, pos, SEEK_SET);
7190  return r;
7191 }
7192 #endif // !STBI_NO_STDIO
7193 
7194 STBIDEF int stbi_info_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp) {
7195  stbi__context s;
7196  stbi__start_mem(&s, buffer, len);
7197  return stbi__info_main(&s, x, y, comp);
7198 }
7199 
7200 STBIDEF int
7201 stbi_info_from_callbacks(stbi_io_callbacks const *c, void *user, int *x, int *y, int *comp) {
7202  stbi__context s;
7203  stbi__start_callbacks(&s, (stbi_io_callbacks *)c, user);
7204  return stbi__info_main(&s, x, y, comp);
7205 }
7206 
7207 #endif // STB_IMAGE_IMPLEMENTATION
7208 
7209 /*
7210  revision history:
7211  2.12 (2016-04-02) fix typo in 2.11 PSD fix that caused crashes
7212  2.11 (2016-04-02) allocate large structures on the stack
7213  remove white matting for transparent PSD
7214  fix reported channel count for PNG & BMP
7215  re-enable SSE2 in non-gcc 64-bit
7216  support RGB-formatted JPEG
7217  read 16-bit PNGs (only as 8-bit)
7218  2.10 (2016-01-22) avoid warning introduced in 2.09 by STBI_REALLOC_SIZED
7219  2.09 (2016-01-16) allow comments in PNM files
7220  16-bit-per-pixel TGA (not bit-per-component)
7221  info() for TGA could break due to .hdr handling
7222  info() for BMP to shares code instead of sloppy parse
7223  can use STBI_REALLOC_SIZED if allocator doesn't support realloc
7224  code cleanup
7225  2.08 (2015-09-13) fix to 2.07 cleanup, reading RGB PSD as RGBA
7226  2.07 (2015-09-13) fix compiler warnings
7227  partial animated GIF support
7228  limited 16-bpc PSD support
7229  #ifdef unused functions
7230  bug with < 92 byte PIC,PNM,HDR,TGA
7231  2.06 (2015-04-19) fix bug where PSD returns wrong '*comp' value
7232  2.05 (2015-04-19) fix bug in progressive JPEG handling, fix warning
7233  2.04 (2015-04-15) try to re-enable SIMD on MinGW 64-bit
7234  2.03 (2015-04-12) extra corruption checking (mmozeiko)
7235  stbi_set_flip_vertically_on_load (nguillemot)
7236  fix NEON support; fix mingw support
7237  2.02 (2015-01-19) fix incorrect assert, fix warning
7238  2.01 (2015-01-17) fix various warnings; suppress SIMD on gcc 32-bit without -msse2
7239  2.00b (2014-12-25) fix STBI_MALLOC in progressive JPEG
7240  2.00 (2014-12-25) optimize JPG, including x86 SSE2 & NEON SIMD (ryg)
7241  progressive JPEG (stb)
7242  PGM/PPM support (Ken Miller)
7243  STBI_MALLOC,STBI_REALLOC,STBI_FREE
7244  GIF bugfix -- seemingly never worked
7245  STBI_NO_*, STBI_ONLY_*
7246  1.48 (2014-12-14) fix incorrectly-named assert()
7247  1.47 (2014-12-14) 1/2/4-bit PNG support, both direct and paletted (Omar Cornut & stb)
7248  optimize PNG (ryg)
7249  fix bug in interlaced PNG with user-specified channel count (stb)
7250  1.46 (2014-08-26)
7251  fix broken tRNS chunk (colorkey-style transparency) in non-paletted PNG
7252  1.45 (2014-08-16)
7253  fix MSVC-ARM internal compiler error by wrapping malloc
7254  1.44 (2014-08-07)
7255  various warning fixes from Ronny Chevalier
7256  1.43 (2014-07-15)
7257  fix MSVC-only compiler problem in code changed in 1.42
7258  1.42 (2014-07-09)
7259  don't define _CRT_SECURE_NO_WARNINGS (affects user code)
7260  fixes to stbi__cleanup_jpeg path
7261  added STBI_ASSERT to avoid requiring assert.h
7262  1.41 (2014-06-25)
7263  fix search&replace from 1.36 that messed up comments/error messages
7264  1.40 (2014-06-22)
7265  fix gcc struct-initialization warning
7266  1.39 (2014-06-15)
7267  fix to TGA optimization when req_comp != number of components in TGA;
7268  fix to GIF loading because BMP wasn't rewinding (whoops, no GIFs in my test suite)
7269  add support for BMP version 5 (more ignored fields)
7270  1.38 (2014-06-06)
7271  suppress MSVC warnings on integer casts truncating values
7272  fix accidental rename of 'skip' field of I/O
7273  1.37 (2014-06-04)
7274  remove duplicate typedef
7275  1.36 (2014-06-03)
7276  convert to header file single-file library
7277  if de-iphone isn't set, load iphone images color-swapped instead of returning NULL
7278  1.35 (2014-05-27)
7279  various warnings
7280  fix broken STBI_SIMD path
7281  fix bug where stbi_load_from_file no longer left file pointer in correct place
7282  fix broken non-easy path for 32-bit BMP (possibly never used)
7283  TGA optimization by Arseny Kapoulkine
7284  1.34 (unknown)
7285  use STBI_NOTUSED in stbi__resample_row_generic(), fix one more leak in tga failure
7286  case
7287  1.33 (2011-07-14)
7288  make stbi_is_hdr work in STBI_NO_HDR (as specified), minor compiler-friendly
7289  improvements
7290  1.32 (2011-07-13)
7291  support for "info" function for all supported filetypes (SpartanJ)
7292  1.31 (2011-06-20)
7293  a few more leak fixes, bug in PNG handling (SpartanJ)
7294  1.30 (2011-06-11)
7295  added ability to load files via callbacks to accomidate custom input streams (Ben
7296  Wenger)
7297  removed deprecated format-specific test/load functions
7298  removed support for installable file formats (stbi_loader) -- would have been broken
7299  for IO callbacks anyway
7300  error cases in bmp and tga give messages and don't leak (Raymond Barbiero, grisha)
7301  fix inefficiency in decoding 32-bit BMP (David Woo)
7302  1.29 (2010-08-16)
7303  various warning fixes from Aurelien Pocheville
7304  1.28 (2010-08-01)
7305  fix bug in GIF palette transparency (SpartanJ)
7306  1.27 (2010-08-01)
7307  cast-to-stbi_uc to fix warnings
7308  1.26 (2010-07-24)
7309  fix bug in file buffering for PNG reported by SpartanJ
7310  1.25 (2010-07-17)
7311  refix trans_data warning (Won Chun)
7312  1.24 (2010-07-12)
7313  perf improvements reading from files on platforms with lock-heavy fgetc()
7314  minor perf improvements for jpeg
7315  deprecated type-specific functions so we'll get feedback if they're needed
7316  attempt to fix trans_data warning (Won Chun)
7317  1.23 fixed bug in iPhone support
7318  1.22 (2010-07-10)
7319  removed image *writing* support
7320  stbi_info support from Jetro Lauha
7321  GIF support from Jean-Marc Lienher
7322  iPhone PNG-extensions from James Brown
7323  warning-fixes from Nicolas Schulz and Janez Zemva (i.stbi__err. Janez (U+017D)emva)
7324  1.21 fix use of 'stbi_uc' in header (reported by jon blow)
7325  1.20 added support for Softimage PIC, by Tom Seddon
7326  1.19 bug in interlaced PNG corruption check (found by ryg)
7327  1.18 (2008-08-02)
7328  fix a threading bug (local mutable static)
7329  1.17 support interlaced PNG
7330  1.16 major bugfix - stbi__convert_format converted one too many pixels
7331  1.15 initialize some fields for thread safety
7332  1.14 fix threadsafe conversion bug
7333  header-file-only version (#define STBI_HEADER_FILE_ONLY before including)
7334  1.13 threadsafe
7335  1.12 const qualifiers in the API
7336  1.11 Support installable IDCT, colorspace conversion routines
7337  1.10 Fixes for 64-bit (don't use "unsigned long")
7338  optimized upsampling by Fabian "ryg" Giesen
7339  1.09 Fix format-conversion for PSD code (bad global variables!)
7340  1.08 Thatcher Ulrich's PSD code integrated by Nicolas Schulz
7341  1.07 attempt to fix C++ warning/errors again
7342  1.06 attempt to fix C++ warning/errors again
7343  1.05 fix TGA loading to return correct *comp and use good luminance calc
7344  1.04 default float alpha is 1, not 255; use 'void *' for stbi_image_free
7345  1.03 bugfixes to STBI_NO_STDIO, STBI_NO_HDR
7346  1.02 support for (subset of) HDR files, float interface for preferred access to them
7347  1.01 fix bug: possible bug in handling right-side up bmps... not sure
7348  fix bug: the stbi__bmp_load() and stbi__tga_load() functions didn't work at all
7349  1.00 interface to zlib that skips zlib header
7350  0.99 correct handling of alpha in palette
7351  0.98 TGA loader by lonesock; dynamically add loaders (untested)
7352  0.97 jpeg errors on too large a file; also catch another malloc failure
7353  0.96 fix detection of invalid v value - particleman@mollyrocket forum
7354  0.95 during header scan, seek to markers in case of padding
7355  0.94 STBI_NO_STDIO to disable stdio usage; rename all #defines the same
7356  0.93 handle jpegtran output; verbose errors
7357  0.92 read 4,8,16,24,32-bit BMP files of several formats
7358  0.91 output 24-bit Windows 3.0 BMP files
7359  0.90 fix a few more warnings; bump version number to approach 1.0
7360  0.61 bugfixes due to Marc LeBlanc, Christopher Lloyd
7361  0.60 fix compiling as c++
7362  0.59 fix warnings: merge Dave Moore's -Wall fixes
7363  0.58 fix bug: zlib uncompressed mode len/nlen was wrong endian
7364  0.57 fix bug: jpg last huffman symbol before marker was >9 bits but less than 16 available
7365  0.56 fix bug: zlib uncompressed mode len vs. nlen
7366  0.55 fix bug: restart_interval not initialized to 0
7367  0.54 allow NULL for 'int *comp'
7368  0.53 fix bug in png 3->4; speedup png decoding
7369  0.52 png handles req_comp=3,4 directly; minor cleanup; jpeg comments
7370  0.51 obey req_comp requests, 1-component jpegs return as 1-component,
7371  on 'test' only check type, not whether we support this variant
7372  0.50 (2006-11-19)
7373  first released version
7374 */