@CMCDragonkai
Last active September 23, 2025 21:29
Revisions

  1. CMCDragonkai revised this gist Jun 17, 2015. 1 changed file with 4 additions and 1 deletion.
    5 changes: 4 additions & 1 deletion http_streaming.md
    Original file line number Diff line number Diff line change
    @@ -232,7 +232,10 @@ memory limit.

    Be aware of the real chunk size after compression. If your upstream is compressing the content,
    the resulting chunk size will be different. In most cases, NGINX should be doing the compression,
    and it does support compressing chunks as they arrive from upstream. You just need `gzip on`.
    This means your application layer should not compress or chunk the content; it should
    just flush raw data. NGINX will automatically compress each piece of data received from
    upstream, format it into chunks, and flush them downstream.

    There's an advantage in keeping buffers available or having a larger buffer size than the
    chunk size. It comes from dealing with slow clients. NGINX as a reverse proxy is very fast
  2. CMCDragonkai revised this gist Jun 17, 2015. 1 changed file with 10 additions and 0 deletions.
    10 changes: 10 additions & 0 deletions http_streaming.md
    @@ -186,17 +186,23 @@ For example, let's investigate the typical PHP stack such as:
    Browser <--> Proxy <--> NGINX <--> PHP <--> MySQL
    ```

    ### The Client ###

    Firstly browsers have a [rendering buffer limit](http://stackoverflow.com/a/16909228/582917).
    You must send at least as much data as the limit before the browser will render the content.
    Having chunks smaller than the buffer will just make the browser hold the data until
    the buffer is full or the connection is closed (or until some time limit passes).

    ### The Proxies ###

    At the proxy level, this could be your ISP or some custom proxy. If the proxy buffers data,
    your streamed data from upstream will be stored in the proxy's buffer before
    being sent to the browser. Some mobile wireless ISPs buffer traffic and you won't be able
    to control this behaviour. This is a violation of the [end to end principle](https://en.wikipedia.org/wiki/End-to-end_principle),
    and there's nothing you can do about it technically.

    ### The Web Server ###

    At the NGINX level, buffering is dependent upon the type of the upstream connection. There
    are 3 common connection types for HTTP: "proxy", "uwsgi", "fastcgi". If you want your NGINX
    server to respect streaming, you can either switch off buffering for your connection type, or
    @@ -257,6 +263,8 @@ Here are 2 quotes about the `*_busy_buffers_size`:
    >
    > - https://www.digitalocean.com/community/tutorials/understanding-nginx-http-proxying-load-balancing-buffering-and-caching
    ### The Application Server ###

    At the PHP level, global buffers can be set inside the `php.ini` configuration file. There are
    3 options defined: `output_buffering`, `output_handler` and `implicit_flush`. They
    are explained in the [output control section of the PHP documentation](http://php.net/manual/en/outcontrol.configuration.php).
    @@ -278,6 +286,8 @@ Both the global SAPI buffer and the custom application buffer have settings that
    flushing. This can depend on hitting the buffer limit, or on some function call. Check the
    documentation for more.

    ### The Upstream Data Source ###

    Finally we reach the MySQL level. This can be replaced with any upstream data source that you
    are calling in order to prepare a response. By default all SQL queries are buffered. There are
    2 options to achieve unbuffered queries (writes and reads). The first is the [unbuffered query
  3. CMCDragonkai revised this gist Jun 17, 2015. 1 changed file with 148 additions and 4 deletions.
    152 changes: 148 additions & 4 deletions http_streaming.md
    @@ -27,11 +27,12 @@ we consider the "stream" to be.
    Firstly we have to consider the HTTP headers that support streaming. Keep this
    https://en.wikipedia.org/wiki/List_of_HTTP_header_fields open for reference:

    ## Content-Length ###
    ## Content-Length ##

    The `Content-Length` header determines the byte length of the request/response
    body. If you neglect to specify the `Content-Length` header, HTTP servers will
    implicitly add a `Transfer-Encoding: chunked` header. The receiver will have no
    implicitly add a `Transfer-Encoding: chunked` header. The `Content-Length` and
    `Transfer-Encoding` headers should not be used together. Without `Content-Length`, the receiver has no
    idea what the length of the body is and cannot estimate the download completion
    time. If you do add a `Content-Length` header, make sure it matches the entire
    body in bytes. If it is incorrect, the behaviour of receivers is undefined.
    @@ -116,6 +117,10 @@ chunks. Chunking isn't always the right answer, it adds extra complexity on the
    recipient. So if you're sending small units of things that won't gain much from
    streaming, don't bother with it!

    Do note that byte serving is compatible with chunked encoding. This is applicable
    where you know the total content length and want to allow partial or resumable downloads,
    but also want to stream each partial response to the client.

    ## Content-Encoding ##

    It is also possible to compress chunked or non-chunked data. This is practically
    @@ -153,6 +158,145 @@ to use is in fact the `Transfer-Encoding` header. If the HTTP request possessed
    However this is very rarely supported. So you should only use `Content-Encoding`
    for your compression right now.

    ### Buffering Problem ###
    ## Buffering Problem ##

    The biggest problem when implementing HTTP streaming is understanding the effect of
    buffering. Buffering is the practice of accumulating reads or writes into a temporary
    fixed memory space. The advantages of buffering include reducing read or write call
    overhead. For example instead of writing 1KB 4096 times, you can just write 4096KB at
    once. This means your program can create a write buffer holding 4096KB of temporary
    data (which can be aligned to the disk blocksize), and once the space limit is reached,
    the buffer is flushed to disk.
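
The write buffer described above can be sketched in a few lines of Python. This is only an illustration: the `WriteBuffer` class, its `capacity` parameter and the in-memory `sink` are names made up for this example, and a real buffer would usually come from the language runtime or the OS.

```python
import io


class WriteBuffer:
    """Accumulate small writes and pass them on in fewer, larger writes."""

    def __init__(self, sink, capacity: int) -> None:
        self.sink = sink          # any object with a write(bytes) method
        self.capacity = capacity  # flush once this many bytes are pending
        self.pending = bytearray()
        self.flush_count = 0

    def write(self, data: bytes) -> None:
        self.pending += data
        if len(self.pending) >= self.capacity:
            self.flush()

    def flush(self) -> None:
        if self.pending:
            self.sink.write(bytes(self.pending))
            self.pending.clear()
            self.flush_count += 1


sink = io.BytesIO()  # stand-in for a file on disk
buffer = WriteBuffer(sink, capacity=4096 * 1024)
for _ in range(4096):
    buffer.write(b"x" * 1024)  # 4096 writes of 1KB each
```

The 4096 small writes reach the sink as a single 4096KB write. This is also why buffering works against streaming: nothing reaches the sink until the buffer fills up or `flush()` is called explicitly.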

    Typical HTTP architectures include these components:

    ```
    Client <--> Proxy <--> HTTP Server <--> Application Server <--> Database Server
    ```

    Each one of these components can possess adjustable and varied buffering styles and
    limits.

    To correctly perform streaming, you have to know and adjust the buffering limits at
    each component.

    For example, let's investigate the typical PHP stack such as:

    ```
    Browser <--> Proxy <--> NGINX <--> PHP <--> MySQL
    ```

    ...to be continued...
    Firstly browsers have a [rendering buffer limit](http://stackoverflow.com/a/16909228/582917).
    You must send at least as much data as the limit before the browser will render the content.
    Having chunks smaller than the buffer will just make the browser hold the data until
    the buffer is full or the connection is closed (or until some time limit passes).

    At the proxy level, this could be your ISP or some custom proxy. If the proxy buffers data,
    your streamed data from upstream will be stored in the proxy's buffer before
    being sent to the browser. Some mobile wireless ISPs buffer traffic and you won't be able
    to control this behaviour. This is a violation of the [end to end principle](https://en.wikipedia.org/wiki/End-to-end_principle),
    and there's nothing you can do about it technically.

    At the NGINX level, buffering is dependent upon the type of the upstream connection. There
    are 3 common connection types for HTTP: "proxy", "uwsgi", "fastcgi". If you want your NGINX
    server to respect streaming, you can either switch off buffering for your connection type, or
    match the buffer size with the upstream chunk size. Switching off buffering can be done
    using a buffering directive (`proxy_buffering`, `uwsgi_buffering`, `fastcgi_buffering`), or
    you can use a special header `X-Accel-Buffering: no` which tells NGINX to not buffer the
    response. The special header is more flexible, as this allows NGINX to buffer responses that
    don't need streaming. It also works for all 3 connection types.
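
As a rough illustration of the `X-Accel-Buffering: no` approach, here is a minimal sketch of a Python WSGI application that sets the header on a streamed response. The name `app`, the event payloads and the short sleep are made up for this example. Whether the response actually goes out chunked depends on the WSGI server sitting between this code and NGINX.

```python
import time
from typing import Callable, Iterator


def app(environ: dict, start_response: Callable) -> Iterator[bytes]:
    """WSGI application that streams a response piece by piece."""
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        # Ask NGINX not to buffer this particular response.
        ("X-Accel-Buffering", "no"),
    ])

    def body() -> Iterator[bytes]:
        for i in range(3):
            yield f"event {i}\n".encode("ascii")
            time.sleep(0.01)  # stand-in for waiting on real work

    return body()
```

Because the header is set per response, other responses from the same application can still be buffered by NGINX as usual.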

    If you instead try to match the buffer size with the chunk size, you have to make sure that
    the number of buffers multiplied by the buffer size (equal to a system memory page) is equal
    to a single chunk size. If it is greater than a single chunk from upstream, then this means
    your chunks will be accumulated before they are sent downstream. If it is less than the
    chunk size, this would result in NGINX buffering to disk, you want to avoid this as this
    results in extra overhead when streaming. For more information on [buffer size see this gist](https://gist.github.com/magnetikonline/11312172).

    Just a note on buffering optimisation: the larger the total buffer size, the greater
    likelihood of each connection using more memory. This is because if each buffer is large,
    there's a chance that you may not be efficiently using the buffer which can cause
    [memory fragmentation](https://en.wikipedia.org/wiki/Fragmentation_%28computing%29). In
    the end, each buffer size should match the system memory page size. The number of buffers
    is what can be dynamically allocated. If your total buffer size across all connections
    exceeds your OS's memory limit, you're either going to hit an OOM error or start paging
    to disk. To maintain your NGINX's availability, you have to consider the theoretical
    number of connections that a single NGINX server can handle, before it exhausts your server's
    memory limit.
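
To make the memory estimate concrete, here is a small back-of-the-envelope calculation. The numbers (10,000 connections, 8 buffers of 4KB each) are assumptions chosen for illustration, and the function ignores every other kind of per-connection memory, so treat it as a rough upper bound on response buffers only.

```python
def worst_case_buffer_memory(connections: int, buffers: int, buffer_size: int) -> int:
    """Bytes needed if every connection fills all of its response buffers."""
    return connections * buffers * buffer_size


# Assumed values: 8 buffers of 4KB (one memory page) per connection.
total = worst_case_buffer_memory(connections=10_000, buffers=8, buffer_size=4096)
print(f"{total / 1024 ** 2:.1f} MiB")
```

Doubling either the number of buffers or the buffer size doubles the worst case, which is why the total has to be checked against the number of connections you expect to hold open.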

    Be aware of the real chunk size after compression. If your upstream is compressing the content,
    the resulting chunk size will be different. In most cases, NGINX should be doing the compression,
    and it does support compressing chunks as they arrive from upstream. You just need `gzip on`.

    There's an advantage in keeping buffers available or having a larger buffer size than the
    chunk size. It comes from dealing with slow clients. NGINX as a reverse proxy is very fast
    and can read the response from your upstream application server very quickly. NGINX itself
    can deal with any slow browser that has a slower read rate than your upstream's write rate.
    Because NGINX is very lightweight (asynchronous IO), the cost of holding a connection in
    NGINX is far smaller than holding open a process (that is waiting for the client to finish
    reading) in your application server. This is of course relative, as your application server
    might also be very lightweight, and rely on either green threads or asynchronous IO. This
    problem does reveal an interesting property of streaming systems. Any stream will only be as
    quick as the slowest link (reader or writer) in the chain. This problem with streaming is
    related to the [network back pressure issue in distributed systems](http://engineering.voxer.com/2013/09/16/backpressure-in-nodejs/).

    To take advantage of NGINX's ability to handle slow clients while still streaming data as
    fast as possible, there will need to be some tuning of both the buffer size and potentially the
    `*_busy_buffers_size` option. You cannot just increase the total buffer size, as that will
    just make NGINX wait until the buffer is full. What you need is some buffer space that is
    allocated only for slow clients. This has something to do with `*_busy_buffers_size`, but
    this is poorly documented currently, so I do not know how to make this work.

    Here are 2 quotes about the `*_busy_buffers_size`:

    > When buffering of responses from the * server is enabled, limits the total size of buffers that can be busy sending a response to the client while the response is not yet fully read. In the meantime, the rest of the buffers can be used for reading the response and, if needed, buffering part of the response to a temporary file. By default, size is limited by the size of two buffers set by the *_buffer_size and *_buffers directives.
    >
    > - NGINX documentation

    > proxy_busy_buffers_size: This directive sets the maximum size of buffers that can be marked "client-ready" and thus busy. While a client can only read the data from one buffer at a time, buffers are placed in a queue to send to the client in bunches. This directive controls the size of the buffer space allowed to be in this state.
    >
    > - https://www.digitalocean.com/community/tutorials/understanding-nginx-http-proxying-load-balancing-buffering-and-caching
    At the PHP level, global buffers can be set inside the `php.ini` configuration file. There are
    3 options defined: `output_buffering`, `output_handler` and `implicit_flush`. They
    are explained in the [output control section of the PHP documentation](http://php.net/manual/en/outcontrol.configuration.php).
    It is interesting to note that for CLI applications, the output buffering is off by default.
    This is so that your CLI application can show you results as it's running. This buffer is controlled
    by the server application programming interface "SAPI". You can control it from inside your application by
    calling `flush()`, which will flush the entire SAPI buffer.

    During runtime, custom buffers can also be created using `ob_start()`. Once you have added content
    to the buffer, you can then flush your custom buffer using `ob_flush()`. This only flushes the buffer
    that you created using `ob_start()`. Think of the `ob_start()` as a kind of PHP specific manual
    memory management. You're basically asking for some block of memory (fixed or variable), which you
    then can only use for your output statements and functions: `echo` and `print`.

    If you have entered both levels of buffers, you need to call the flush functions in this order:
    `ob_flush(); flush();`.

    Both the global SAPI buffer and the custom application buffer have settings that enable automatic
    flushing. This can depend on hitting the buffer limit, or on some function call. Check the
    documentation for more.

    Finally we reach the MySQL level. This can be replaced with any upstream data source that you
    are calling in order to prepare a response. By default all SQL queries are buffered. There are
    2 options to achieve unbuffered queries (writes and reads). The first is the [unbuffered query
    option](http://us.php.net/manual/en/mysqlinfo.concepts.buffering.php). This allows one to work
    with reading large result sets, and to process each row as it arrives (including flushing to the
    client). The second option works with just one single column of data. This is useful where a single
    column contains large binary or textual content, and you want to be able to work with a stream
    on this data specifically. This involves the usage of the [large object option](http://php.net/manual/en/pdo.lobs.php).
    You can also stream-write large binary or textual content into the database using the large
    object option. Streaming the writing of rows is just done by running multiple insert queries.

    With regards to the second method, there are some peculiarities you have to keep in mind:
    https://www.percona.com/blog/2007/07/06/php-large-result-sets-and-summary-tables/
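
The idea of processing each row as it arrives can be sketched as follows. Note the substitution: this uses Python's built-in `sqlite3` module as a stand-in because it needs no server, not PHP or MySQL, so it shows the row-at-a-time pattern only. How a given MySQL driver buffers results is driver-specific. The `stream_rows` helper and the `items` table are invented for this example.

```python
import sqlite3
from typing import Iterator


def stream_rows(connection: sqlite3.Connection, query: str) -> Iterator[bytes]:
    """Yield one encoded line per row instead of building the full result first."""
    cursor = connection.execute(query)
    # Iterating the cursor hands over rows one at a time; fetchall() would
    # materialise the whole result set in memory before the first row is sent.
    for row in cursor:
        yield (",".join(str(value) for value in row) + "\n").encode("utf-8")


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE items (id INTEGER, name TEXT)")
connection.executemany("INSERT INTO items VALUES (?, ?)",
                       [(1, "a"), (2, "b"), (3, "c")])
lines = list(stream_rows(connection, "SELECT id, name FROM items ORDER BY id"))
```

Each yielded line could be flushed to the client immediately, so memory use stays proportional to one row rather than to the whole result set.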

    ## A Note About NodeJS ##

    NodeJS has great support for streaming. In fact its entire native HTTP module does streaming by
    default for both incoming requests and outgoing responses. Every time you call `response.writeHead` or
    `response.write`, it is just writing a chunk of data. There may still be a buffer inside
    NodeJS, probably governed by the `highWaterMark` setting, but I have not looked into this further.

    NodeJS has a native stream module, https://nodejs.org/api/stream.html, that serves as a base object
    for all other IO modules.
  4. CMCDragonkai revised this gist Jun 16, 2015. 1 changed file with 3 additions and 1 deletion.
    4 changes: 3 additions & 1 deletion http_streaming.md
    @@ -112,7 +112,9 @@ In considering performance, you want to make sure that you're not producing way
    too chunky data. The more "chunking" you do, the more overhead that exists in both
    producing the chunks and parsing the chunks. Furthermore, it also results in more
    executions of buffering functions if the receiver can't make immediate use of the
    chunks.
    chunks. Chunking isn't always the right answer, as it adds extra complexity on the
    recipient. So if you're sending small units of things that won't gain much from
    streaming, don't bother with it!

    ## Content-Encoding ##

  5. CMCDragonkai revised this gist Jun 16, 2015. 1 changed file with 147 additions and 4 deletions.
    151 changes: 147 additions & 4 deletions http_streaming.md
    @@ -1,13 +1,156 @@
    HTTP Streaming
    ==============
    HTTP Streaming (or Chunked vs Store & Forward)
    ==============================================

    The standard way of understanding the HTTP protocol is via the request reply
    pattern. Each HTTP transaction consists of a finitely bounded HTTP request and
    a finitely bounded HTTP response.

    However it's also possible for both parts of the HTTP transaction to stream
    However it's also possible for both parts of an HTTP 1.1 transaction to stream
    their possibly infinitely bounded data. The advantage is that the sender can
    send data that is beyond the sender's memory limit, and the receiver can act on
    the data stream in chunks immediately instead of waiting for the entire data to
    arrive. Basically you're either saving space or you're saving time.
    arrive. Basically you're either saving space or you're saving time. The
    advantages of streaming are elaborated in Wikipedia's [Online algorithm article](https://en.wikipedia.org/wiki/Online_algorithm).

    Note that HTTP streaming only involves the HTTP protocol and not websockets.
    Streaming is also the basis for HTML5 server sent events.

    So we're going to look at HTTP streaming architecture, and how to achieve
    streaming in a few different languages.

    The first thing to understand is that HTTP streaming involves streaming within
    a single HTTP transaction. In a larger context, each HTTP transaction itself
    represents an event as part of a larger event stream. This reveals to us that
    the concept of "streaming" is context-specific: it's relative to what
    we consider the "stream" to be.

    Firstly we have to consider the HTTP headers that support streaming. Keep this
    https://en.wikipedia.org/wiki/List_of_HTTP_header_fields open for reference:

    ## Content-Length ###

    The `Content-Length` header determines the byte length of the request/response
    body. If you neglect to specify the `Content-Length` header, HTTP servers will
    implicitly add a `Transfer-Encoding: chunked` header. The receiver will have no
    idea what the length of the body is and cannot estimate the download completion
    time. If you do add a `Content-Length` header, make sure it matches the entire
    body in bytes. If it is incorrect, the behaviour of receivers is undefined.

    The `Content-Length` header will not allow streaming, but it is useful for large
    binary files, where you want to support partial content serving. This basically
    means resumable downloads, paused downloads, partial downloads, and multi-homed
    downloads. This requires the use of an additional header called `Range`. This
    technique is called [Byte serving](https://en.wikipedia.org/wiki/Byte_serving).
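
To show what byte serving involves on the server side, here is a minimal sketch that parses a single-range `Range` header and builds the matching `Content-Range` value. The function names are made up for this example, and it deliberately handles only one range per request (`bytes=0-99`, `bytes=100-`, `bytes=-50`). Multi-range requests are out of scope.

```python
from typing import Optional, Tuple


def parse_single_range(header: str, total: int) -> Optional[Tuple[int, int]]:
    """Return inclusive (start, end) offsets, or None if the header is not
    a single satisfiable byte range for a body of `total` bytes."""
    unit, _, spec = header.partition("=")
    if unit.strip() != "bytes" or "," in spec or "-" not in spec:
        return None
    first, _, last = spec.strip().partition("-")
    try:
        if first == "":
            # Suffix form: the last N bytes of the body.
            length = int(last)
            if length <= 0:
                return None
            start, end = max(total - length, 0), total - 1
        else:
            start = int(first)
            end = int(last) if last else total - 1
    except ValueError:
        return None
    end = min(end, total - 1)
    if start > end or start >= total:
        return None
    return start, end


def content_range(start: int, end: int, total: int) -> str:
    """Value for the `Content-Range` header of a 206 response."""
    return f"bytes {start}-{end}/{total}"
```

A resumable download is then just a request such as `Range: bytes=900-` against a 1000-byte body, answered with status 206, `Content-Range: bytes 900-999/1000` and the last 100 bytes.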

    ## Transfer-Encoding ##

    The use of `Transfer-Encoding: chunked` is what allows streaming within a single
    request or response. This means that the data is transmitted in a chunked manner,
    and does not impact the representation of the content.

    Officially an HTTP client is meant to send a request with a `TE` header field that
    specifies what kinds of transfer encodings the client is willing to accept. This is
    not always sent, however most servers assume that clients can process `chunked`
    encodings.

    The chunked transfer encoding makes better use of persistent TCP connections, which
    HTTP 1.1 uses by default.

    Chunked data is represented in this manner:

    ```
    4\r\n
    Wiki\r\n
    5\r\n
    pedia\r\n
    e\r\n
     in\r\n\r\nchunks.\r\n
    0\r\n
    \r\n
    ```

    Each chunk starts with its byte length expressed as a hexadecimal number, followed by
    optional parameters (chunk extension) and a terminating CRLF sequence, followed by
    the chunk data and another CRLF. The body ends with a zero-length chunk, which is
    itself terminated by a final CRLF sequence.
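
The chunk format can be generated with very little code. The following is a minimal sketch of an encoder. `encode_chunked` is a name invented for this example, and it emits neither chunk extensions nor trailers. The third piece includes a leading space so that it is 14 bytes long, which is `e` in hexadecimal.

```python
from typing import Iterable, Iterator


def encode_chunked(pieces: Iterable[bytes]) -> Iterator[bytes]:
    """Yield each piece framed as an HTTP/1.1 chunk, then the terminating chunk."""
    for piece in pieces:
        if not piece:
            # A zero-length chunk would end the body early, so skip empty pieces.
            continue
        # Size in hexadecimal, CRLF, the data itself, CRLF.
        yield format(len(piece), "x").encode("ascii") + b"\r\n" + piece + b"\r\n"
    # Zero-length chunk, no trailers, final CRLF.
    yield b"0\r\n\r\n"


body = b"".join(encode_chunked([b"Wiki", b"pedia", b" in\r\n\r\nchunks."]))
```

Note that CRLF sequences inside the chunk data are not special. The receiver relies on the declared length, not on scanning for line endings.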

    Chunk extensions can be used to indicate a message digest or an estimated progress.
    They are just custom metadata that your layer 7 receiver needs to parse. There's no
    standardised format for it. Because of this, it's probably better to just add your
    metadata (if any) into the chunk itself for your layer 7.5 application to parse.

    For your application to send out chunked data, you must first send out the
    `Transfer-Encoding` header, and then you must flush content in chunks according to
    the chunk format. If you don't have an appropriate HTTP server that handles this, then
    you need to implement the syntax generator yourself. Sometimes you can use a library
    to provide an abstract interface.

    For example in PHP, there's the [Symfony HTTP Foundation Stream Response](http://symfony.com/doc/current/components/http_foundation/introduction.html#streaming-a-response)
    and in NodeJS, its [native HTTP module chunks all responses](https://nodejs.org/api/http.html#http_response_write_chunk_encoding_callback).

    Chunking is a 2 way street. The HTTP protocol allows the client to chunk HTTP
    requests. This allows the client to stream the HTTP request, which is useful for
    uploading large files. However not many servers (except NGINX) support this feature,
    and most streaming upload implementations rely on Javascript libraries to cut up a
    binary file and send it by chunks to the server. Using Javascript gives you more
    control over the uploading experience, but the HTTP protocol would be the simplest.

    Browsers natively support chunked data. So if your server sends chunked data, they
    will start rendering data as soon as they receive it. However there's a minimum amount of data
    that a browser needs to receive before it starts rendering. This is different for
    each browser, but generally it's 1KB. You can see the limits for various browsers
    here: http://stackoverflow.com/a/16909228/582917

    If however you want to consume an API that supports streaming, you need to be aware of
    how your HTTP library handles chunked data. In most cases, you'll need to attach a
    callback handler that executes upon each chunk of data. This should mean that your
    API will need to frame each chunk in a useful manner. If the API is doing too many
    chunks, you may end up needing to buffer the data up into a "semantic protocol data
    unit" (PDU) before you can work on it. This of course defeats the purpose of chunking
    in the first place. For example in PHP, you can use the [Guzzle library or `curl`](http://mtdowling.com/blog/2012/01/27/chunked-encoding-in-php-with-guzzle/).
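
When the chunk boundaries do not line up with meaningful units, the consumer has to regroup them. Here is a minimal sketch of that regrouping, assuming the API frames each unit with a trailing newline. That framing, the `iter_records` name and the JSON-looking payloads are assumptions made for this example.

```python
from typing import Iterable, Iterator


def iter_records(chunks: Iterable[bytes], delimiter: bytes = b"\n") -> Iterator[bytes]:
    """Regroup arbitrarily sized chunks into delimiter-terminated records."""
    pending = b""
    for chunk in chunks:
        pending += chunk
        # Emit every complete record currently in the buffer.
        while True:
            record, found, rest = pending.partition(delimiter)
            if not found:
                break
            yield record
            pending = rest
    if pending:
        # Trailing data without a delimiter is treated as a final record.
        yield pending


records = list(iter_records([b'{"id"', b': 1}\n{"id": 2}', b'\n{"id": 3}\n']))
```

The less the chunks line up with records, the more data sits in `pending`, which is the buffering cost the paragraph above warns about.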

    In considering performance, you want to make sure that you're not splitting the data into
    too many small chunks. The more "chunking" you do, the more overhead there is in both
    producing the chunks and parsing the chunks. Furthermore, it also results in more
    executions of buffering functions if the receiver can't make immediate use of the
    chunks.

    ## Content-Encoding ##

    It is also possible to compress chunked or non-chunked data. This is practically
    done via the `Content-Encoding` header.

    Note that the `Content-Length` is equal to the length of the body after the
    `Content-Encoding`. This means if you have gzipped your response, then the length
    calculation happens after compression. You will need to be able to load the entire
    body in memory if you want to calculate the length (unless you have that information
    elsewhere).

    When streaming using chunked encoding, the compression algorithm must also support
    online processing. Thankfully, gzip supports stream compression. The content
    gets compressed first, and is then cut up into chunks. That way, the chunks
    are received and reassembled, then decompressed to acquire the real content. If it were the other
    way around, decompressing the received stream would have to yield
    chunks, which doesn't make sense.
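
Stream compression can be sketched with Python's standard `zlib` module, which can produce gzip-framed output incrementally. This is an illustration under assumptions rather than what any particular server does: `gzip_stream` is a made-up name, and flushing after every piece is a choice that trades compression ratio for lower latency.

```python
import zlib
from typing import Iterable, Iterator


def gzip_stream(pieces: Iterable[bytes]) -> Iterator[bytes]:
    """Compress pieces incrementally, yielding gzip output after each piece."""
    # wbits=31 selects the gzip container (header and trailer) in zlib.
    compressor = zlib.compressobj(wbits=31)
    for piece in pieces:
        # Z_SYNC_FLUSH forces out everything compressed so far, so the receiver
        # can decompress this piece without waiting for later ones.
        data = compressor.compress(piece) + compressor.flush(zlib.Z_SYNC_FLUSH)
        if data:
            yield data
    # Finish the stream: remaining data plus the gzip trailer.
    yield compressor.flush()


pieces = [b"first part, ", b"second part, ", b"third part"]
compressed = list(gzip_stream(pieces))

# The receiver can decompress incrementally as each compressed block arrives.
decompressor = zlib.decompressobj(wbits=31)
restored = b"".join(decompressor.decompress(block) for block in compressed)
```

Each compressed block would then be wrapped in the chunk format by the HTTP layer, which matches the order described above: compress first, chunk second.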

    A typical compressed stream response may have these headers:

    ```
    Content-Type: text/html
    Content-Encoding: gzip
    Transfer-Encoding: chunked
    ```

    Semantically the usage of `Content-Encoding` indicates an "end to end" encoding
    scheme, which means only the final client or final server is supposed to decode the
    content. Proxies in the middle are not supposed to decode the content.

    If you want to allow proxies in the middle to decode the content, the correct header
    to use is in fact the `Transfer-Encoding` header. If the HTTP request possessed a
    `TE: gzip` header, then it is legal to respond with `Transfer-Encoding: gzip, chunked`.

    However this is very rarely supported. So you should only use `Content-Encoding`
    for your compression right now.

    ### Buffering Problem ###

    ...to be continued...
  6. CMCDragonkai created this gist Jun 16, 2015.
    13 changes: 13 additions & 0 deletions http_streaming.md
    @@ -0,0 +1,13 @@
    HTTP Streaming
    ==============

    The standard way of understanding the HTTP protocol is via the request reply
    pattern. Each HTTP transaction consists of a finitely bounded HTTP request and
    a finitely bounded HTTP response.

    However it's also possible for both parts of the HTTP transaction to stream
    their possibly infinitely bounded data. The advantage is that the sender can
    send data that is beyond the sender's memory limit, and the receiver can act on
    the data stream in chunks immediately instead of waiting for the entire data to
    arrive. Basically you're either saving space or you're saving time.