Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: output_buffering = Off / 0 in php and sendfile lack support is destroying Nextcloud download speeds #39384

Closed
5 of 8 tasks
yevon opened this issue Jul 13, 2023 · 24 comments
Assignees
Labels
0. Needs triage Pending check for reproducibility or if it fits our roadmap feature: dav needs info performance 🚀 stale Ticket or PR with no recent activity technical debt

Comments

@yevon
Copy link

yevon commented Jul 13, 2023

⚠️ This issue respects the following points: ⚠️

Bug description

It seems that all docker / helm and default installations of nextcloud do set the php param output_buffering = 0. After seeing a bug with the library used for nextcloud in / out #19567, I decided to futher investigate the download issues that I always had with nextcloud. It was even more noticeable in low power devices like a raspberry pi, or in mechanical drives. And in SSD drives I was getting very inconsistent download speeds, first time file download was always slow, and following downloads were faster.

So I decided to test this out on a 10gb ethernet local network nexcloud instance, with a fast nvme drive, a slow HDD and with output_buffering disabled, and output_buffering enabled set to default 4096. Results:

  • Nvme output_buffering enabled (1032 MB/S)

nvme_buffer

  • Nvme output_buffering disabled (600 - 880 MB/S inconsistent speeds)
    nvme_no_buffer

  • HDD output_buffering_enabled (141 MB/S)

hdd_buffer_enabled

  • HDD output_buffering_disabled (29 - 80 MB/S inconsistent speeds)

hdd_NO_buffer

It there any reason to keep output_buffering disabled in php? As it seems to be made on purpose. If some code needs to perform a buffer flush it should do it, without needing to disable it globally.

Steps to reproduce

I just made a simple php program to test this out, emulating the function that nextcloud does use during file copy or download operations, it prints obtained speeds on those operations. Then you just have to disable output_buffering Off, or let it at 4096 which is the recommended default value to see the difference.

The code to test this out was a simple read and wirte operation on disk of a big file

<?php

function my_stream_copy_to_stream($source, $dest, $max_bytes = -1, $offset = 0) {
    $total_bytes = 0;
    if ($offset > 0) {
        if (stream_seek($dest, $offset) === false) {
            // Error seeking in output stream
            return false;
        }
    }
    while (!feof($source) && ($max_bytes == -1 || $total_bytes < $max_bytes)) {
        $data = fread($source, 8192);
        if ($data === false) {
            break;
        }
        $bytes_written = fwrite($dest, $data);
        if ($bytes_written === false) {
            break;
        }
        $total_bytes += $bytes_written;
    }
    if (!feof($source)) {
        // Error reading from source stream
        return false;
    }
    if ($max_bytes != -1 && $total_bytes < $max_bytes) {
        // Not enough bytes were read from source stream
        return false;
    }
    if (feof($source) && !feof($dest)) {
        // Output stream was closed prematurely
        return false;
    }
    return $total_bytes;
}

$status=unlink('/var/www/html/copy.zip');    
if($status){  
echo "File deleted successfully \n";    
}else{  
echo "Sorry there is no copy file yet, copy started please wait...! \n";    
}  

// Get the path to the file to download
$file_path = "file.zip";

// Get the size of the file
$file_size = filesize($file_path);

// Open the source file
$source = fopen($file_path, "rb");

// Create the destination file
$dest = fopen("/var/www/html/copy.zip", "wb");

// Start the timer
$start_time = microtime(true);

// Copy the file from the source to the destination
try {
    $total_bytes = my_stream_copy_to_stream($source, $dest, $file_size);
} catch (Exception $e) {
    echo "Error: " . $e->getMessage();
    exit;
}

$total_bytes = $file_size;

// Close the files

try {
fclose($dest);
} catch (Exception $e) {
    echo "Error: " . $e->getMessage();
    exit;
}

// Calculate the elapsed time
$elapsed_time = microtime(true) - $start_time;

// Calculate the average speed
$average_speed = $total_bytes / $elapsed_time / 1048576;

echo "Elasped time:" . $elapsed_time;
echo "Totalbytes: " . $total_bytes;
echo "Download successful. Average speed: " . $average_speed . " MB/s \r\n";


phpinfo();

?>

Expected behavior

This parameter should be default in nextcloud:

output_buffering = 4096

Installation method

Community Docker image

Nextcloud Server version

27

Operating system

Debian/Ubuntu

PHP engine version

PHP 8.2

Web server

Apache (supported)

Database engine version

PostgreSQL

Is this bug present after an update or on a fresh install?

Fresh Nextcloud Server install

Are you using the Nextcloud Server Encryption module?

None

What user-backends are you using?

  • Default user-backend (database)
  • LDAP/ Active Directory
  • SSO - SAML
  • Other

Configuration report

No response

List of activated Apps

No response

Nextcloud Signing status

No response

Nextcloud Logs

No response

Additional info

No response

@yevon yevon added 0. Needs triage Pending check for reproducibility or if it fits our roadmap bug labels Jul 13, 2023
@szaimen
Copy link
Contributor

szaimen commented Jul 14, 2023

Cc @nextcloud/server-backend

@joshtrichards
Copy link
Member

Not a reply, just adding some historical context:

owncloud/core#16013

Also an update for the broken link referenced at the above link:

sabre-io/dav#331 (comment)


And explanation in PHP config sample since it's better than what's in the PHP docs:

; Output buffering is a mechanism for controlling how much output data
; (excluding headers and cookies) PHP should keep internally before pushing that
; data to the client. If your application's output exceeds this setting, PHP
; will send that data in chunks of roughly the size you specify.
; Turning on this setting and managing its maximum buffer size can yield some
; interesting side-effects depending on your application and web server.
; You may be able to send headers and cookies after you've already sent output
; through print or echo. You also may see performance benefits if your server is
; emitting less packets due to buffered output versus PHP streaming the output
; as it gets it. On production servers, 4096 bytes is a good setting for performance
; reasons.
; Note: Output buffering can also be controlled via Output Buffering Control
;   functions.
; Possible Values:
;   On = Enabled and buffer is unlimited. (Use with caution)
;   Off = Disabled
;   Integer = Enables the buffer and sets its maximum size in bytes.
; Note: This directive is hardcoded to Off for the CLI SAPI
; Default Value: Off
; Development Value: 4096
; Production Value: 4096
; https://php.net/output-buffering

@yevon
Copy link
Author

yevon commented Jul 14, 2023

As of my understanding, evert response was completely wrong:
Evert Message

output_buffering works as any output buffer, it acumulates some data in memory in order to improve performance by not performing the output to the slower device /output every time that it is requested. Just imagine that you want to write 1.000.000 words, and you call the write function 1.000.000 times. It would call to the hdd device 1.000.000 times without buffering which is crazy, you kill performance and cpu with uneeded cycles. With a buffer of 4096 "words" you will reduce it to just 244 writes to the operating system. The slower the latency of the device, worse the problem gets. Having a buffer between 4kb to 8kb is basic for IO performance. I did some tests with 8192 and it did not perform better. With php recommended value of 4096 was best. It won't ever full your ram as every 4096 or after given delay, the buffer is flushed. The output device and operating system also have their buffers, even when the php buffer gets flushed, ir doesn't mean that the operating system will even write to disk, as every OS decides when and how to do it with their buffer policies and depending of the device. And modern nvme drives will cache even more before writting to disk anyway.

And my experience that I was having downloading files slower the first time and faster following downloads of the same file, is due to Linux systems that do use unused ram as disk cache. So when you first read and write the file to the output without php bufffer, performance did suck because of device latency executing unneeded cycles. But following downloads were served directly from RAM, and that latency is so fast that this was not even a problem. You cant test this easily with my example and free enough RAM. If you download again and again the same file without php buffering, first time is slow, and following downlaods are fast. But if you perform two different file downloads with very big files alternatelively, like 50GB without buffering, speed is always slow.

@yevon
Copy link
Author

yevon commented Jul 14, 2023

The only reason to flush a before before it gets filled is because you are streamming a pretty big response, and your write speed is so slow that it gets some time to the server to fill the buffer, so the client does not start receiving some data until it gets full. Just imagine an API that executes a very very slow SQL, and it outputs just "true" every 30 seconds. The client would close the connection with a http timeout, as you won't fit the 4kb buffer fast enough and the client would not receive any data if you don't flush the buffer (or close the socket). I cannot think of any use case for nextcloud that would fall into to this use case, as it mainly treats files that do fullfill any buffer in a blink of an eye.

And regarding consistency between remote file transfers, files might get corrupted because of multiple factors, even is possible that the tcp protocol might eventually fail in very very very rare cases. TCP checksums are just 16 bytes wide, so collisions might happen and even a corrupted fragment might produce the same hash as a correct fragment. In network transfers between the client and the server would be important to perform an additional hash validation in the client, and later on in the server after writting it to the disk to check the file consistency to assure that the transfer is 100% correct and nobody, or any device or bug has altered the file on its destination. In file copies in the same machine this won't be even needed if you don't pre-process or manipulate files.

@solracsf solracsf changed the title [Bug]: output_buffering = Off / 0 in php is destroying Nextcloud download speeds [Bug]: output_buffering = Off / 0 in php is destroying Nextcloud download speeds Jul 15, 2023
@kesselb
Copy link
Contributor

kesselb commented Jul 15, 2023

Are you certain that copying data from stream a to stream b would involve the output buffer?

The documentation states1:

The Output Control functions allow you to control when output is sent from the script. This can be useful in several different situations, especially if you need to send headers to the browser after your script has begun outputting data. The Output Control functions do not affect headers sent using header() or setcookie(), only functions such as echo and data between blocks of PHP code.

I'd say in your sample script, the output is echo and phpinfo.

Footnotes

  1. https://www.php.net/manual/en/intro.outcontrol.php

@yevon
Copy link
Author

yevon commented Jul 15, 2023

Hi!, in my sample script the function my_stream_copy_to_stream does use fwrites. I'm not a php expert at all but this is common to many programming languages. Here states that fwrites does use an 8k buffer by default, but there is no explicit buffer set so it will use system default settings I guess:

(https://www.w3schools.com/php/func_filesystem_set_file_buffer.asp#:~:text=Output%20using%20fwrite()%20is,allowing%20other%20processes%20to%20write).).)

@kesselb
Copy link
Contributor

kesselb commented Jul 16, 2023

I can understand where you're coming from, but I doubt that writing a stream would use the PHP output buffer.
Exception: php://output should use the output buffer.

Please let us know if you have any reliable information.
Until then, I will close the issue.

@kesselb kesselb closed this as not planned Won't fix, can't repro, duplicate, stale Jul 16, 2023
@yevon
Copy link
Author

yevon commented Jul 16, 2023

But why Is nexctloud disabling output_buffering against official php recommendation? Even when it was disabled in the past when no clear reason and this repository is full of issues regarding nexctloud download speeds? This should be investigated furrher. Please don't close it still as I'm still researching this, and my examples already show a clear difference in performance.

@kesselb
Copy link
Contributor

kesselb commented Jul 16, 2023

Thank you; we're looking forward to your research!

@kesselb kesselb reopened this Jul 16, 2023
@yevon
Copy link
Author

yevon commented Jul 16, 2023

I'm preparing the project to build a new docker images with some changes to see if it really makes a difference or not, compairing the official one with the customized. I will update this issue with what I discover thanks.

@solracsf
Copy link
Member

Pydio also recommends disabling it, probably for the same reasons than Nextcloud.
https://pydio.com/en/docs/v8/tuning-server-performances

Although not as good as NodeJS to provide such behaviour, PHP offers a way to send display results to the users before actually performing some actions. The Event system of Pydio takes party of this to improve the user experience and perceived performance. To make sure this is working, please check that php output_buffering is DISABLED for your virtual host.

@yevon
Copy link
Author

yevon commented Jul 17, 2023

output_buffering should be disable were needed and not globally, as disabling buffers for big responses like a file download can kill the server performance. And is very common having a reverse proxy in front of nextcloud for providing ssl connection encription and additional security, so you are forcing it to encript very small fragments very often, killing even more performance.

@yevon
Copy link
Author

yevon commented Jul 17, 2023

I'm trying to build a docker image with enabled output_buffering without any luck. I tried replacing all references to output_buffering = 0 everywhere and building a new docker image, but it seems that the docker php install is overriding some params and is impossible to me to override them. Any ideas where I should set this? As php_info keeps me saying that output_buffering = 0 and I replaced it everywhere.

@yevon
Copy link
Author

yevon commented Jul 21, 2023

It seems that in the past owncloud did support the sendFile apache function, that makes php delegate file downloads to apache or nginx so it was much faster, instead of doing it 100% in php. It seems that they did remove it because if not they could not guarantee the file locking feature:

owncloud/core#16997

https://github.com/owncloud/core/pull/18881/commits

@yevon
Copy link
Author

yevon commented Jul 22, 2023

Just doing a bit recap of all download problemes from what I could see in nextcloud, owncloud and sabre repositories:

  1. Owncloud / nextcloud in the past did use sendfile, that delegates php downloads to apache / nging which is much faster than doing it in php. It was deleted due to concerns with the file-locking system. Related issues:
  1. Seems that native function php stream_copy_to_stream doesn't perform good with files bigger than 4MB, and could be better by having compatibility with sendFile and better mmap support:
  1. The function stream_copy_to_stream does not verify if the client is still alive, so if you cancel a download it is fully read anyway. Related issues:

@joshtrichards
Copy link
Member

I'm trying to build a docker image with enabled output_buffering without any luck. I tried replacing all references to output_buffering = 0 everywhere and building a new docker image, but it seems that the docker php install is overriding some params and is impossible to me to override them. Any ideas where I should set this? As php_info keeps me saying that output_buffering = 0 and I replaced it everywhere.

@yevon: I don't think OB is set anywhere directly by the community Docker image, since it's already disabled in PHP by default. But server explicitly sets it to 0 and that lands in the Docker image in these two places:

  • /var/www/html/.user.ini
  • /var/www/html/.htaccess

If you comment out those references it'll still be off, since that's PHP's default, but you can then change it at your discretion. Just changed it in .htaccess to 4096 "successfully":

image

@yevon
Copy link
Author

yevon commented Jul 30, 2023

I'm trying to build a docker image with enabled output_buffering without any luck. I tried replacing all references to output_buffering = 0 everywhere and building a new docker image, but it seems that the docker php install is overriding some params and is impossible to me to override them. Any ideas where I should set this? As php_info keeps me saying that output_buffering = 0 and I replaced it everywhere.

@yevon: I don't think OB is set anywhere directly by the community Docker image, since it's already disabled in PHP by default. But server explicitly sets it to 0 and that lands in the Docker image in these two places:

  • /var/www/html/.user.ini
  • /var/www/html/.htaccess

If you comment out those references it'll still be off, since that's PHP's default, but you can then change it at your discretion. Just changed it in .htaccess to 4096 "successfully":

image

Thanks! I will try this this week to see if it makes any difference.

@yevon
Copy link
Author

yevon commented Jul 30, 2023

From what all I cand find, php download performance is terribly bad, and it is better to delegate this task to apache or nginx. It seems that all this began with the lack of sendfile support during downloads, it was disabled in the past when the file locking feature was introduced in owncloud and it has not been revised since then. Handling downloads directly in php seems to be slow, and it would be much faster letting nginx / apache do its work, that is downloading files. Owncloud is not supported anymore and they don't update the project. Should I open another issue to solve the issues with the file lock feature and the lack of sendfile support? I tried to discuss it here, but the project is closed: owncloud/core#40887

@yevon
Copy link
Author

yevon commented Jul 30, 2023

Can somebody help me to find the exact code in nextcloud to download files? It is completely really handled by sabre-io/http or it has some additional code? I will try to wait until nextcloud 28 is release to see if the updated sabre/io lib makes any difference. I also posted to see if sabre is going to support the sendfile feature during downloads. sabre-io/dav#120

@yevon yevon changed the title [Bug]: output_buffering = Off / 0 in php is destroying Nextcloud download speeds [Bug]: output_buffering = Off / 0 in php and sendfile lack support is destroying Nextcloud download speeds Jul 30, 2023
@yevon
Copy link
Author

yevon commented Jul 30, 2023

Sendfile was deleted in the past in this commmit:

owncloud/core@8479702

And this file now corresponds to this in nextcloud:

$view->unlockFile($filename, ILockingProvider::LOCK_SHARED);

Is this code still in use?

It was disabled because with sendfile, once the file download starts, you cannot know when it finishes for unlocking it in php. But there are some fixes that could be done to solve this:

  1. Recover sendfile support from the owncloud commit.
  2. There is no need to delete support of sendfile if the filelocking is disabled 'filelocking.enabled' => 'false', as it impacts dowload performance and there is no file locking anyway.
  3. There could be a new setting like "sendfile.enabled" that enables reagardless if filelocking is enabled or not, warning the user that this will improve download performance but it could lead to corrupted or interrupted file downloads (Can be admissible, as it only requires to downlaod the file again). If sendfile is enabled, no filelock would occur during downloads.
  4. The best solution would be to modify the fileUnlock procedure, so it actively checks if the file is used by any other process (apache or nginx), so it would perform active waiting in the unlock function until the OS process releases the file. But the problem is that this kind of check is OS dependant. (https://github.com/nextcloud/server/blob/master/lib/private/Files/View.php#L2050)

I have some doubts about how file locking works. If two users try to download a file at the same time, the second user has to wait until the first one finishes the download? It seems so by reading the code responsible of the download.

@yevon
Copy link
Author

yevon commented Jul 30, 2023

Nice found! It seems that even when you enable output buffering, it cleans the buffer anyway here:

https://github.com/nextcloud/server/blob/master/lib/private/Files/View.php#L393

update: Doesn't seem to use this at all during downloads isn't it?

@BotoX
Copy link

BotoX commented Aug 4, 2023

See #29467 (comment)
for nginx X-Accel support.

@yevon
Copy link
Author

yevon commented Aug 4, 2023

See #29467 (comment) for nginx X-Accel support.

Thanks I will try the patch to see the difference in performance.

@nextcloud-command
Copy link
Contributor

This issue has been automatically marked as stale because it has not had recent activity and seems to be missing some essential information. It will be closed if no further activity occurs. Thank you for your contributions.

@nextcloud-command nextcloud-command added the stale Ticket or PR with no recent activity label Oct 1, 2023
@nextcloud-command nextcloud-command closed this as not planned Won't fix, can't repro, duplicate, stale Oct 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
0. Needs triage Pending check for reproducibility or if it fits our roadmap feature: dav needs info performance 🚀 stale Ticket or PR with no recent activity technical debt
Projects
None yet
Development

No branches or pull requests

7 participants