Pipeline still unstable with large .fastqs #72
Which file is this? Maybe we can split the fastq upfront?
On Mon, 18 Feb 2019 at 11:48, Firedrops ***@***.***> wrote:
I have tried increasing the MACHINE_TYPE to n1-standard-8, which is 8
vCPUs and 30 GB RAM, should be more than enough for any of the reference
files.
Large files (~>100 kb?) still get stuck with these error logs:
*2019-02-18 (11:33:28) Processing stuck in step Alignment for at least
05m00s without outputting or completing in state pro...*
Processing stuck in step Alignment for at least 05m00s without outputting or completing in state process
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:170)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:282)
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165)
at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
at org.apache.http.impl.execchain.ServiceUnavailableRetryExec.execute(ServiceUnavailableRetryExec.java:85)
at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:72)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:221)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:165)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:140)
at com.theappsolutions.nanostream.util.HttpHelper.executeRequest(HttpHelper.java:105)
at com.theappsolutions.nanostream.http.NanostreamHttpService.generateAlignData(NanostreamHttpService.java:58)
at com.theappsolutions.nanostream.aligner.MakeAlignmentViaHttpFn.processElement(MakeAlignmentViaHttpFn.java:49)
at com.theappsolutions.nanostream.aligner.MakeAlignmentViaHttpFn$DoFnInvoker.invokeProcessElement(Unknown Source)
*2019-02-18 (11:38:28) Processing stuck in step Alignment for at least
10m00s without outputting or completing in state pro...*
Processing stuck in step Alignment for at least 10m00s without outputting or completing in state process
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:170)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:282)
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165)
at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
at org.apache.http.impl.execchain.ServiceUnavailableRetryExec.execute(ServiceUnavailableRetryExec.java:85)
at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:72)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:221)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:165)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:140)
at com.theappsolutions.nanostream.util.HttpHelper.executeRequest(HttpHelper.java:105)
at com.theappsolutions.nanostream.http.NanostreamHttpService.generateAlignData(NanostreamHttpService.java:58)
at com.theappsolutions.nanostream.aligner.MakeAlignmentViaHttpFn.processElement(MakeAlignmentViaHttpFn.java:49)
at com.theappsolutions.nanostream.aligner.MakeAlignmentViaHttpFn$DoFnInvoker.invokeProcessElement(Unknown Source)
*2019-02-18 (11:38:38) org.apache.http.client.ClientProtocolException:
Unexpected response status: 502*
org.apache.http.client.ClientProtocolException: Unexpected response status: 502
com.theappsolutions.nanostream.http.NanostreamResponseHandler.handleResponse(NanostreamResponseHandler.java:39)
com.theappsolutions.nanostream.http.NanostreamResponseHandler.handleResponse(NanostreamResponseHandler.java:17)
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:223)
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:165)
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:140)
com.theappsolutions.nanostream.util.HttpHelper.executeRequest(HttpHelper.java:105)
com.theappsolutions.nanostream.http.NanostreamHttpService.generateAlignData(NanostreamHttpService.java:58)
com.theappsolutions.nanostream.aligner.MakeAlignmentViaHttpFn.processElement(MakeAlignmentViaHttpFn.java:49)
--
Group leader, Institute for Molecular Bioscience, University of Queensland
Senior Lecturer, Imperial College
http://academickarma.org/0000-0002-4300-455X
http://orcid.org/0000-0002-4300-455X
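A minimal sketch of splitting a fastq upfront, as suggested above, in the project's language. It assumes standard unwrapped 4-line FASTQ records (header, sequence, `+` line, qualities); wrapped multi-line FASTQ would need a smarter parser. `FastqChunker` and the chunk size are hypothetical and not part of nanostream.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of nanostream) that splits a FASTQ stream into chunks. */
class FastqChunker {

    /**
     * Groups 4-line FASTQ records from {@code in} into chunks of at most
     * {@code recordsPerChunk} records; each record is kept as one 4-line string.
     * Assumes unwrapped records (sequence and qualities on a single line each).
     */
    static List<List<String>> split(BufferedReader in, int recordsPerChunk) throws IOException {
        if (recordsPerChunk <= 0) {
            throw new IllegalArgumentException("recordsPerChunk must be positive");
        }
        List<List<String>> chunks = new ArrayList<>();
        List<String> current = new ArrayList<>();
        String header;
        while ((header = in.readLine()) != null) {
            if (header.isEmpty()) {
                continue; // tolerate blank lines between records
            }
            String seq = in.readLine();
            String plus = in.readLine();
            String qual = in.readLine();
            if (!header.startsWith("@") || seq == null || plus == null || qual == null
                    || !plus.startsWith("+")) {
                throw new IOException("Malformed FASTQ record starting at: " + header);
            }
            current.add(header + "\n" + seq + "\n" + plus + "\n" + qual + "\n");
            if (current.size() == recordsPerChunk) {
                chunks.add(current); // chunk is full: start a new one
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            chunks.add(current); // final, possibly smaller, chunk
        }
        return chunks;
    }

    public static void main(String[] args) throws IOException {
        String fastq = "@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n@r3\nTTAA\n+\nIIII\n";
        List<List<String>> chunks = split(new BufferedReader(new StringReader(fastq)), 2);
        System.out.println(chunks.size() + " chunks"); // prints "2 chunks": [r1, r2] and [r3]
    }
}
```

For a real file the reader could come from `Files.newBufferedReader(path)` (gzipped input would also need a `GZIPInputStream`), and each chunk would be written to its own file before upload.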
The last stack trace indicates an HTTP 502. You may have flooded the alignment
cluster. How many reads are you submitting per batch?
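Besides smaller batches, a common client-side mitigation for 502s from an overloaded backend is to retry with exponential backoff, so failed batches are resubmitted gradually instead of immediately. This is a dependency-free sketch under assumptions: `Backoff`, `HttpStatusException` and the choice of 502/503/504 as retryable are hypothetical, and nothing in this thread indicates nanostream already does this. A 404 (also reported later in this thread) is treated as non-retryable and rethrown at once.

```java
import java.util.concurrent.Callable;

/** Hypothetical exception carrying the HTTP status of a failed alignment request. */
class HttpStatusException extends Exception {
    final int status;

    HttpStatusException(int status) {
        super("Unexpected response status: " + status);
        this.status = status;
    }
}

class Backoff {
    /** Gateway/overload statuses assumed to be worth retrying. */
    static boolean isRetryable(int status) {
        return status == 502 || status == 503 || status == 504;
    }

    /**
     * Runs {@code request}, retrying retryable HTTP failures up to {@code maxAttempts} attempts,
     * sleeping baseDelayMs, 2*baseDelayMs, 4*baseDelayMs, ... between attempts.
     */
    static <T> T call(Callable<T> request, int maxAttempts, long baseDelayMs) throws Exception {
        long delay = baseDelayMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return request.call();
            } catch (HttpStatusException e) {
                if (!isRetryable(e.status) || attempt >= maxAttempts) {
                    throw e; // non-retryable status, or out of attempts
                }
                Thread.sleep(delay);
                delay *= 2; // exponential growth of the wait
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        String result = call(() -> {
            if (++calls[0] < 3) throw new HttpStatusException(502); // simulate two 502s
            return "aligned";
        }, 5, 100);
        System.out.println(result + " after " + calls[0] + " attempts"); // prints "aligned after 3 attempts"
    }
}
```

Adding random jitter to each delay would further spread out retries coming from many workers at once; it is omitted to keep the sketch short.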
Just 1. I'm still testing further; it's a bit slow, since it takes the full 5 minutes for this error to pop up. For now it looks like big .fasta files are OK, but .fastq files are not. UPDATE: **2019-02-18 (15:16:55) java.net.SocketException: Broken pipe**
As @lachlancoin suggested, this might be a batching issue, possibly implemented in a way that works well with
For specifics, I am using the current provisioning script ( The following modifications were made:
I wonder if this issue might have been solved previously but not yet committed to the main branch? Most of the commits there are a week old or more, and these issues were mentioned in #23, so @obsh and @Pseverin would have known about them for a while.
I think we'll make the batch size configurable, so we can try smaller fastq batches with the aligner. Also there is a new build of
I agree; we just ran into the problem again with the EDTA sample. We'll try 100 and maybe 50 tomorrow. It'd be a good idea to pull the batch size out into an argument, since our builds seemed imperfect the last few times.
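Pulling the batch size out into an argument could look like the dependency-free sketch below: read an optional flag, validate it, and partition the reads into batches of that size. The flag name `--alignmentBatchSize` and the class `BatchConfig` are hypothetical; the default of 2000 is the value reported later in this thread. In a Beam/Dataflow pipeline this would more naturally be a getter on a custom `PipelineOptions` interface, which is not shown here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class BatchConfig {
    static final String FLAG = "--alignmentBatchSize="; // hypothetical flag name
    static final int DEFAULT_BATCH_SIZE = 2000;           // default reported in this thread

    /** Returns the batch size given on the command line, or the default if the flag is absent. */
    static int batchSize(String[] args) {
        for (String arg : args) {
            if (arg.startsWith(FLAG)) {
                int size = Integer.parseInt(arg.substring(FLAG.length()));
                if (size <= 0) {
                    throw new IllegalArgumentException("batch size must be positive: " + size);
                }
                return size;
            }
        }
        return DEFAULT_BATCH_SIZE;
    }

    /** Splits reads into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> reads, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < reads.size(); start += batchSize) {
            int end = Math.min(start + batchSize, reads.size());
            batches.add(new ArrayList<>(reads.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        int size = batchSize(new String[] {"--alignmentBatchSize=500"});
        List<Integer> reads = new ArrayList<>(Collections.nCopies(4000, 0)); // 4000 dummy reads
        System.out.println(partition(reads, size).size() + " requests"); // prints "8 requests"
    }
}
```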
We have tried batch sizes down to 25. This seems to slow down the entire pipeline: no Firestore results were generated after ~30 min of run time on the alignment step. We got the 5 min error in the end and the whole thing had to be cancelled. Also, it seems that once the 5 min error occurs, the whole provisioning cluster needs to be restarted. If we only restart the Dataflow pipeline, we would immediately get UPDATE: Never mind, it seems restarting the provisioning cluster doesn't help either. It seems very random: sometimes it works and sometimes it doesn't, even with the exact same builds and fastq files. Occasionally we also get 404 errors.
Done now, see optional -
I've experimented with the batch size, and it looks like a bigger batch size actually improves performance, since bwa start-up time then adds less overhead. The default value is 2000, as it worked well on the "dogbite" dataset in my tests. I assume that at least Also in #95 we introduced optional
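The start-up overhead argument can be made concrete with a back-of-the-envelope model: if every alignment request pays a fixed start-up cost s (starting bwa, loading the index) plus a per-read cost r, then N reads in batches of b take roughly ceil(N/b) * s + N * r, so shrinking the batch multiplies the fixed term. The sketch assumes requests run one after another, and the timing parameters are made-up placeholders, not measurements from this thread.

```java
/** Toy cost model for batched alignment requests; all timings are hypothetical inputs. */
class BatchCostModel {
    /** Number of requests needed to send nReads reads in batches of batchSize. */
    static long batches(long nReads, long batchSize) {
        return (nReads + batchSize - 1) / batchSize; // ceiling division
    }

    /** Estimated total seconds: fixed start-up per request plus per-read work, requests run serially. */
    static double totalSeconds(long nReads, long batchSize, double startupSec, double perReadSec) {
        return batches(nReads, batchSize) * startupSec + nReads * perReadSec;
    }

    public static void main(String[] args) {
        long reads = 4000;
        double startup = 2.0;  // placeholder: seconds of bwa start-up per request
        double perRead = 0.01; // placeholder: seconds of alignment work per read
        for (long b : new long[] {25, 500, 2000}) {
            System.out.printf("batch %4d: %3d requests, ~%.0f s%n",
                    b, batches(reads, b), totalSeconds(reads, b, startup, perRead));
        }
    }
}
```

The model deliberately ignores per-request timeouts: one very large batch can run longer than an HTTP or load-balancer timeout allows, so the batch size has to balance start-up overhead against the time a single request may take.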
I am still having a problem with large fastq files, see #98 (connection refused during the alignment step). So basically the Dataflow pipeline stalls at the alignment step and nothing comes out of it. This fastq had 4000 records, and I set a batch size of 500 (using the standard bwa Docker image). The scripts I use are here: I was wondering if it's possible to avoid the CGI step, which is problematic, by using Pubsub instead. I have some thoughts, which I will put in a new issue.
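"Connection refused" and 502 errors are both consistent with the aligner endpoint receiving more simultaneous requests than it can accept. One generic client-side guard is a semaphore around each request, sketched below. `AlignerThrottle` and the limit of 4 are assumptions; Dataflow may run several threads per worker and several workers, so this per-JVM cap only bounds the cluster-wide load to roughly the cap times the number of workers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical per-JVM cap on concurrent requests to the aligner endpoint. */
class AlignerThrottle {
    private final Semaphore permits;

    AlignerThrottle(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent, true); // fair: waiting callers served in order
    }

    /** Runs the request while holding a permit; blocks while maxConcurrent requests are in flight. */
    <T> T run(Callable<T> request) throws Exception {
        permits.acquire();
        try {
            return request.call();
        } finally {
            permits.release(); // always return the permit, even if the request failed
        }
    }

    int available() {
        return permits.availablePermits();
    }

    public static void main(String[] args) throws Exception {
        AlignerThrottle throttle = new AlignerThrottle(4);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(16);
        List<Future<String>> results = new ArrayList<>();
        for (int i = 0; i < 32; i++) {
            Callable<String> task = () -> throttle.run(() -> {
                peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
                Thread.sleep(10); // stand-in for the HTTP call to the aligner
                inFlight.decrementAndGet();
                return "ok";
            });
            results.add(pool.submit(task));
        }
        for (Future<String> f : results) {
            f.get();
        }
        pool.shutdown();
        System.out.println("peak concurrent requests: " + peak.get()); // never more than 4
    }
}
```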
I have tried increasing the provisioning `MACHINE_TYPE` to `n1-standard-8`, which is 8 vCPUs and 30 GB RAM and should be more than enough for any of the reference files. Large files (>~100 kb?) still get stuck with these error logs. When these appear, the pipeline seems unsalvageable and needs to be cancelled and restarted.
2019-02-18 (11:33:28) Processing stuck in step Alignment for at least 05m00s without outputting or completing in state pro...
2019-02-18 (11:38:28) Processing stuck in step Alignment for at least 10m00s without outputting or completing in state pro...
2019-02-18 (11:38:38) org.apache.http.client.ClientProtocolException: Unexpected response status: 502
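The stack traces in this thread show the worker parked in `SocketInputStream.socketRead0`, waiting for the aligner's response headers for 5 to 10 minutes. That suggests either no socket (read) timeout or a very long one is configured on the HTTP client. Explicit timeouts would make a hung request fail with an exception instead of tripping Dataflow's "stuck" warning. The sketch uses the JDK's `HttpURLConnection`, whose read timeout of 0 means "wait forever", so it needs no dependencies; nanostream itself uses Apache HttpClient, where the corresponding settings are `RequestConfig.custom().setConnectTimeout(...).setSocketTimeout(...)` passed via `HttpClients.custom().setDefaultRequestConfig(...)`. The endpoint URL, class name and timeout values are placeholders.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Hypothetical aligner client showing explicit connect/read timeouts (JDK only). */
class AlignerClient {
    static final int CONNECT_TIMEOUT_MS = 10_000; // placeholder: give up quickly if the aligner is unreachable
    static final int READ_TIMEOUT_MS = 120_000;   // placeholder: must exceed the slowest legitimate batch

    /**
     * Opens (but does not yet connect) a POST connection to the aligner endpoint.
     * A read that waits longer than READ_TIMEOUT_MS for data throws
     * java.net.SocketTimeoutException instead of blocking indefinitely.
     */
    static HttpURLConnection open(String endpoint) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(endpoint).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setConnectTimeout(CONNECT_TIMEOUT_MS);
        conn.setReadTimeout(READ_TIMEOUT_MS);
        return conn;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder URL; openConnection() does not contact the server.
        HttpURLConnection conn = open("http://aligner.example:80/align");
        System.out.println("connect=" + conn.getConnectTimeout() + "ms read=" + conn.getReadTimeout() + "ms");
    }
}
```

The read timeout interacts with the batch size: it has to be longer than the slowest batch the aligner can legitimately process, otherwise healthy but large requests would be cut off too.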