Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bulk: malformed JSON hangs bulk service #7641

Open
onnozweers opened this issue Aug 21, 2024 · 2 comments
Open

Bulk: malformed JSON hangs bulk service #7641

onnozweers opened this issue Aug 21, 2024 · 2 comments
Assignees
Labels
bulk bulk service

Comments

@onnozweers
Copy link
Contributor

Dear dCache devs,

My colleague Natalie found out, that a malformed JSON hangs the bulk service.

The logs show the full request and the error:

20 Aug 2024 10:22:17 (Frontend-hedgehog14) [] Failed to parse JSON "{ "activity": "PIN", "arguments": {"lifetime": "10",  "lifetimeUnit":"MINUTES"}, "expand_directories": "ALL", "target": ["/users/anatolid/tape//1GBfile"]}": Unexpected character (' ' (code 160)): was expecting double-quote to start field name
 at [Source: (String)"{ "activity": "PIN", "arguments": {"lifetime": "10",  "lifetimeUnit":"MINUTES"}, "expand_directories": "ALL", "target": ["/users/anatolid/tape//1GBfile"]}"; line: 1, column: 55]
20 Aug 2024 10:22:17 (bulk) [] ConcurrentRequestProcessor exiting...

Not sure the bad character ("code 160", a non-breaking space?) will get through after pasting in this issue. Here's a hexdump of the character:

[onno@ui ~]# echo ' ' | hexdump
0000000 a0c2 000a                              
0000003

A hexdump of a normal space:

[onno@ui ~]# echo ' ' | hexdump
0000000 0a20                                   
0000002

So, after submitting the request, we see that the request and all following requests are stuck as "QUEUED":

[root@hedgehog14 ~]# ssh -T -l admin -p "22224" "dcachetest.grid.surfsara.nl" "\s bulk request ls"
ID           | ARRIVED             |            MODIFIED |        OWNER |     STATUS | UID
43           | 2024/08/07-15:49:47 | 2024/08/07-15:49:47 |  39147:35932 |  COMPLETED | cfdee0cc-bd8e-4407-95d2-762e456dca6a
44           | 2024/08/07-15:51:05 | 2024/08/07-15:51:05 |  39147:35932 |  COMPLETED | 1ca2167b-9f45-4bad-afff-7eee33f106e2
45           | 2024/08/07-15:59:30 | 2024/08/07-15:59:30 |  39147:35932 |  COMPLETED | f3701757-8dfc-4603-bc26-509062682ad0
46           | 2024/08/07-16:00:07 | 2024/08/07-16:00:07 |  39147:35932 |  COMPLETED | 95c8d078-d8ed-4640-a740-432da584f640
47           | 2024/08/07-16:00:38 | 2024/08/07-16:00:38 |  39147:35932 |  COMPLETED | fffa7038-ecd4-44f2-9182-cc33ea1a7051
48           | 2024/08/07-16:10:23 | 2024/08/07-16:10:23 |  39147:35932 |  COMPLETED | 630ea987-d0ed-4d98-8ecf-c1eb6246a95b
49           | 2024/08/20-10:22:17 | 2024/08/20-10:22:17 |  39147:35932 |     QUEUED | 5d3d8437-2c7c-47e1-9a93-c30aa5e8fc83
50           | 2024/08/20-10:25:35 | 2024/08/20-10:25:35 |  39147:35932 |     QUEUED | d35f92cf-18fb-4241-996d-b1a8ecddee12
51           | 2024/08/20-10:34:59 | 2024/08/20-10:34:59 |  39147:35932 |     QUEUED | 1b47ed2a-665d-4f7b-ad91-2eb90564bebb
52           | 2024/08/20-10:38:23 | 2024/08/20-10:38:23 |  39147:35932 |     QUEUED | 2491428e-a71b-4263-9bd9-2d592d01090a
53           | 2024/08/20-14:02:17 | 2024/08/20-14:02:17 |  31029:31631 |     QUEUED | 991f10cd-d370-4d67-bc56-aa6209a5cb92

Listing the request fails because of a JSON error (both with API and with admin commands).

[root@hedgehog14 /var/log]# ssh -T -l admin -p "22224" "dcachetest.grid.surfsara.nl" "\s bulk request info 5d3d8437-2c7c-47e1-9a93-c30aa5e8fc83"
(3) com.google.common.util.concurrent.UncheckedExecutionException: org.json.JSONException: Expected a ':' after a key at 3 [character 4 line 1] from request info

A database query shows some more info:

# psql -U ******** bulk -c 'select * from request_arguments;'
 rid |              arguments              
-----+-------------------------------------
  39 | lifetime:60,lifetimeUnit:MINUTES
  40 | lifetime:60,lifetimeUnit:MINUTES
  41 | lifetime:60,lifetimeUnit:MINUTES
  42 | lifetime:60,lifetimeUnit:MINUTES
  43 | lifetime:60,lifetimeUnit:MINUTES
  44 | lifetime:60,lifetimeUnit:MINUTES
  45 | lifetime:60,lifetimeUnit:MINUTES
  46 | lifetime:60,lifetimeUnit:MINUTES
  47 | lifetime:60,lifetimeUnit:MINUTES
  48 | lifetime:60,lifetimeUnit:MINUTES
  49 |  "lifetimeUnit":MINUTES,lifetime:10
  50 | lifetime:300
  51 | lifetime:300
  52 | lifetime:300
(14 rows)

I was unable to remove the request with admins commands. The problem can be fixed by deleting the request directly from the database:

# psql -U ******** bulk -c "delete from bulk_request where id = 49;"

And then restarting the domain containing the bulk service.

So, we have a situation where any user with a crafted API request can hang up the bulk service.

It would be nice if JSON errors don't hang up the bulk service but instead are returned as failed.

Cheers,
Onno

@lemora lemora added the bulk bulk service label Aug 21, 2024
@DmitryLitvintsev DmitryLitvintsev self-assigned this Aug 21, 2024
@DmitryLitvintsev
Copy link
Member

Hi Onno,
Could you cut&paste stacktrace from bulk log file in here?
(also may be it would have been prudent to open RT ticket for this one)

@onnozweers
Copy link
Contributor Author

Hi Dmitry,

(also may be it would have been prudent to open RT ticket for this one)

Oops you're right, my apologies.

Here's a trace from our test server, running a 10.1.0 master snapshot at commit d772420. We see the same behavior in production at 9.2.20.

20 Aug 2024 11:20:03 (bulk) [Frontend-hedgehog14 BulkRequestStatus] Uncaught exception in thread pool-16-thread-2
com.google.common.util.concurrent.UncheckedExecutionException: org.json.JSONException: Expected a ':' after a key at 3 [character 4 line 1]
        at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2085)
        at com.google.common.cache.LocalCache.get(LocalCache.java:4011)
        at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4034)
        at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:5010)
        at org.dcache.services.bulk.store.jdbc.request.JdbcBulkRequestStore.get(JdbcBulkRequestStore.java:835)
        at org.dcache.services.bulk.store.jdbc.request.JdbcBulkRequestStore.valid(JdbcBulkRequestStore.java:919)
        at org.dcache.services.bulk.store.jdbc.request.JdbcBulkRequestStore.getKey(JdbcBulkRequestStore.java:353)
        at org.dcache.services.bulk.BulkService.lambda$messageArrived$4(BulkService.java:243)
        at org.dcache.util.CDCExecutorServiceDecorator$WrappedRunnable.run(CDCExecutorServiceDecorator.java:130)
        at org.dcache.util.BoundedExecutor$Worker.run(BoundedExecutor.java:247)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: org.json.JSONException: Expected a ':' after a key at 3 [character 4 line 1]
        at org.json.JSONTokener.syntaxError(JSONTokener.java:503)
        at org.json.JSONObject.<init>(JSONObject.java:225)
        at org.json.JSONObject.<init>(JSONObject.java:402)
        at org.dcache.services.bulk.store.jdbc.request.JdbcBulkRequestDao.toFullRequest(JdbcBulkRequestDao.java:243)
        at org.springframework.jdbc.core.RowMapperResultSetExtractor.extractData(RowMapperResultSetExtractor.java:94)
        at org.springframework.jdbc.core.RowMapperResultSetExtractor.extractData(RowMapperResultSetExtractor.java:61)
        at org.springframework.jdbc.core.JdbcTemplate$1.doInPreparedStatement(JdbcTemplate.java:723)
        at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:651)
        at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:713)
        at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:744)
        at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:757)
        at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:810)
        at org.dcache.services.bulk.store.jdbc.JdbcBulkDaoUtils.get(JdbcBulkDaoUtils.java:171)
        at org.dcache.services.bulk.store.jdbc.request.JdbcBulkRequestDao.get(JdbcBulkRequestDao.java:156)
        at org.dcache.services.bulk.store.jdbc.request.JdbcBulkRequestStore$RequestLoader.load(JdbcBulkRequestStore.java:146)
        at org.dcache.services.bulk.store.jdbc.request.JdbcBulkRequestStore$RequestLoader.load(JdbcBulkRequestStore.java:142)
        at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3570)
        at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2312)
        at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2189)
        at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2079)
        ... 12 common frames omitted

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bulk bulk service
Projects
None yet
Development

No branches or pull requests

4 participants