Skip to content

Commit

Permalink
Merge pull request #494 from stac-utils/jcw/post-ingest-notification
Browse files Browse the repository at this point in the history
  • Loading branch information
jwalgran authored May 17, 2023
2 parents 859d280 + d0a6ef5 commit 0e3b0c6
Show file tree
Hide file tree
Showing 12 changed files with 562 additions and 42 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,10 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.

## [Unreleased] - TBD

### Added

- Publish ingest results to a post-ingest SNS topic

### Changed

- Remove node streams-based ingest code to prepare for post-ingest notifications
Expand Down
18 changes: 15 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -90,9 +90,9 @@ subgraph ingest[Ingest]
ingestSnsTopic[Ingest SNS Topic]
ingestQueue[Ingest SQS Queue]
ingestLambda[Ingest Lambda]
postIngestSnsTopic[Post-Ingest SNS Topic]
ingestDeadLetterQueue[Ingest Dead Letter Queue]
failedIngestLambda[Failed Ingest Lambda]
end
users[Users]
Expand All @@ -109,9 +109,10 @@ opensearch[(OpenSearch)]
itemsForIngest --> ingestSnsTopic
ingestSnsTopic --> ingestQueue
ingestQueue --> ingestLambda
ingestQueue --> ingestDeadLetterQueue
ingestLambda --> opensearch
ingestLambda --> postIngestSnsTopic
ingestDeadLetterQueue --> failedIngestLambda
%% API workflow
Expand Down Expand Up @@ -916,6 +917,17 @@ ingestion will either fail (in the case of a single Item ingest) or if auto-crea

If a collection or item is ingested, and an item with that id already exists in STAC, the new item will completely replace the old item.

After a collection or item is ingested, the status of the ingest (success or failure) along with details of the collection or item are sent to a post-ingest SNS topic. To take action on items after they are ingested subscribe an endpoint to this topic.

Messages published to the post-ingest SNS topic include the following atributes that can be used for filtering:

| attribute | type | values |
| ------------ | ------ | ------------------------ |
| recordType | String | `Collection` or `Item` |
| ingestStatus | String | `successful` or `failed` |
| collection | String | |


### Ingesting large items

There is a 256 KB limit on the size of SQS messages. Larger items can by publishing a message to the `stac-server-<stage>-ingest` SNS topic in with the format:
Expand All @@ -936,7 +948,7 @@ Stac-server can also be subscribed to SNS Topics that publish complete STAC Item

### Ingest Errors

Errors that occur during ingest will end up in the dead letter processing queue, where they are processed by the `stac-server-<stage>-failed-ingest` Lambda function. Currently all the failed-ingest Lambda does is log the error, see the CloudWatch log `/aws/lambda/stac-server-<stage>-failed-ingest` for errors.
Errors that occur while consuming items from the ingest queue will end up in the dead letter processing queue.

## Supporting Cross-cluster Search and Replication

Expand Down
13 changes: 13 additions & 0 deletions serverless.example.yml
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,9 @@ provider:
# PRE_HOOK: ${self:service}-${self:provider.stage}-preHook
# API_KEYS_SECRET_ID: ${self:service}-${self:provider.stage}-api-keys
# POST_HOOK: ${self:service}-${self:provider.stage}-postHook
# If you will be subscribing to post-ingest SNS notifications make
# sure that STAC_API_URL is set so that links are updated correctly
STAC_API_URL: "https://some-stac-server.com"
iam:
role:
statements:
Expand Down Expand Up @@ -72,6 +75,8 @@ functions:
handler: index.handler
memorySize: 512
timeout: 60
environment:
POST_INGEST_TOPIC_ARN: !Ref postIngestTopic
package:
artifact: dist/ingest/ingest.zip
events:
Expand Down Expand Up @@ -101,6 +106,14 @@ resources:
Type: "AWS::SNS::Topic"
Properties:
TopicName: ${self:service}-${self:provider.stage}-ingest
postIngestTopic:
# After a collection or item is ingested, the status of the ingest (success
# or failure) along with details of the collection or item are sent to this
# SNS topic. To take future action on items after they are ingested
# suscribe an endpoint to this topic
Type: AWS::SNS::Topic
Properties:
TopicName: ${self:service}-${self:provider.stage}-post-ingest
deadLetterQueue:
Type: AWS::SQS::Queue
Properties:
Expand Down
13 changes: 11 additions & 2 deletions src/lambdas/ingest/index.js
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
/* eslint-disable import/prefer-default-export */
import got from 'got' // eslint-disable-line import/no-unresolved
import { createIndex } from '../../lib/databaseClient.js'
import { ingestItems } from '../../lib/ingest.js'
import { ingestItems, publishResultsToSns } from '../../lib/ingest.js'
import getObjectJson from '../../lib/s3-utils.js'
import logger from '../../lib/logger.js'

Expand Down Expand Up @@ -60,8 +60,17 @@ export const handler = async (event, _context) => {
: [event]

try {
await ingestItems(stacItems)
const results = await ingestItems(stacItems)
logger.debug('Ingested %d items: %j', stacItems.length, stacItems)

const postIngestTopicArn = process.env['POST_INGEST_TOPIC_ARN']

if (postIngestTopicArn) {
logger.debug('Publishing to post-ingest topic: %s', postIngestTopicArn)
publishResultsToSns(results, postIngestTopicArn)
} else {
logger.debug('Skkipping post-ingest notification since no topic is configured')
}
} catch (error) {
logger.error(error)
throw (error)
Expand Down
4 changes: 2 additions & 2 deletions src/lib/api.js
Original file line number Diff line number Diff line change
Expand Up @@ -383,7 +383,7 @@ export const parsePath = function (inpath) {
}

// Impure - mutates results
const addCollectionLinks = function (results, endpoint) {
export const addCollectionLinks = function (results, endpoint) {
results.forEach((result) => {
const { id } = result
let { links } = result
Expand Down Expand Up @@ -438,7 +438,7 @@ const addCollectionLinks = function (results, endpoint) {
}

// Impure - mutates results
const addItemLinks = function (results, endpoint) {
export const addItemLinks = function (results, endpoint) {
results.forEach((result) => {
let { links } = result
const { id, collection } = result
Expand Down
105 changes: 79 additions & 26 deletions src/lib/ingest.js
Original file line number Diff line number Diff line change
@@ -1,27 +1,43 @@
import { getItemCreated } from './database.js'
import { addItemLinks, addCollectionLinks } from './api.js'
import { dbClient, createIndex } from './databaseClient.js'
import logger from './logger.js'
import { publishRecordToSns } from './sns.js'
import { isCollection, isItem } from './stac-utils.js'

const COLLECTIONS_INDEX = process.env['COLLECTIONS_INDEX'] || 'collections'

export class InvalidIngestError extends Error {
constructor(message) {
super(message)
this.name = 'InvalidIngestError'
}
}

const hierarchyLinks = ['self', 'root', 'parent', 'child', 'collection', 'item', 'items']

export async function convertIngestObjectToDbObject(
// eslint-disable-next-line max-len
/** @type {{ hasOwnProperty: (arg0: string) => any; type: string, collection: string; links: any[]; id: any; }} */ data
) {
let index = ''
logger.debug('data', data)
if (data && data.type === 'Collection') {
if (isCollection(data)) {
index = COLLECTIONS_INDEX
} else if (data && data.type === 'Feature') {
} else if (isItem(data)) {
index = data.collection
} else {
return null
throw new InvalidIngestError(
`Expeccted data.type to be "Collection" or "Feature" not ${data.type}`
)
}

// remove any hierarchy links in a non-mutating way
const hlinks = ['self', 'root', 'parent', 'child', 'collection', 'item', 'items']
if (!data.links) {
throw new InvalidIngestError('Expected a "links" proporty on the stac object')
}
const links = data.links.filter(
(/** @type {{ rel: string; }} */ link) => !hlinks.includes(link.rel)
(/** @type {{ rel: string; }} */ link) => !hierarchyLinks.includes(link.rel)
)
const dbDataObject = { ...data, links }

Expand Down Expand Up @@ -77,9 +93,7 @@ export async function writeRecordToDb(
// if this isn't a collection check if index exists
const exists = await client.indices.exists({ index })
if (!exists.body) {
const msg = `Index ${index} does not exist, add before ingesting items`
logger.debug(msg)
throw new Error(msg)
throw new InvalidIngestError(`Index ${index} does not exist, add before ingesting items`)
}
}

Expand Down Expand Up @@ -112,33 +126,72 @@ export async function writeRecordsInBulkToDb(records) {
}
}

async function asyncMapInSequence(objects, asyncFn) {
function logIngestItemsResults(results) {
results.forEach((result) => {
if (result.error) {
if (result.error instanceof InvalidIngestError) {
// Attempting to ingest invalid stac objects is not a system error so we
// log it as info and not error
logger.info('Invalid ingest item', result.error)
} else {
logger.error('Error while ingesting item', result.error)
}
} else {
logger.debug('Ingested item %j', result)
}
})
}

export async function ingestItems(items) {
const results = []
for (const object of objects) {
for (const record of items) {
let dbRecord
let result
let error
try {
// This helper is inteneted to be used with the objects must be processed
// in sequence so we intentionally await each iteration.
// We are intentionally writing records one at a time in sequence so we
// disable this rule
// eslint-disable-next-line no-await-in-loop
dbRecord = await convertIngestObjectToDbObject(record)
// eslint-disable-next-line no-await-in-loop
const result = await asyncFn(object)
results.push(result)
} catch (error) {
results.push(error)
result = await writeRecordToDb(dbRecord)
} catch (e) {
error = e
}
results.push({ record, dbRecord, result, error })
}
logIngestItemsResults(results)
return results
}

function logErrorResults(results) {
// Impure - mutates record
function updateLinksWithinRecord(record) {
const endpoint = process.env['STAC_API_URL']
if (!endpoint) {
logger.info('STAC_API_URL not set, not updating links within ingested record')
return record
}
if (!isItem(record) && !isCollection(record)) {
logger.info('Record is not a collection or item, not updating links within ingested record')
return record
}

record.links = record.links.filter(
(/** @type {{ rel: string; }} */ link) => !hierarchyLinks.includes(link.rel)
)
if (isItem(record)) {
addItemLinks([record], endpoint)
} else if (isCollection(record)) {
addCollectionLinks([record], endpoint)
}
return record
}

export async function publishResultsToSns(results, topicArn) {
results.forEach((result) => {
if (result instanceof Error) {
logger.error('Error while ingesting item', result)
if (result.record && !result.error) {
updateLinksWithinRecord(result.record)
}
publishRecordToSns(topicArn, result.record, result.error)
})
}

export async function ingestItems(items) {
const records = await asyncMapInSequence(items, convertIngestObjectToDbObject)
const results = await asyncMapInSequence(records, writeRecordToDb)
logErrorResults(results)
return results
}
45 changes: 45 additions & 0 deletions src/lib/sns.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,45 @@
import { sns } from './aws-clients.js'
import logger from './logger.js'
import { isCollection, isItem } from './stac-utils.js'

const attrsFromPayload = function (payload) {
let type = 'unknown'
let collection = ''
if (isCollection(payload.record)) {
type = 'Collection'
collection = payload.record.id || ''
} else if (isItem(payload.record)) {
type = 'Item'
collection = payload.record.collection || ''
}

return {
recordType: {
DataType: 'String',
StringValue: type
},
ingestStatus: {
DataType: 'String',
StringValue: payload.error ? 'failed' : 'successful'
},
collection: {
DataType: 'String',
StringValue: collection
}
}
}

/* eslint-disable-next-line import/prefer-default-export */
export async function publishRecordToSns(topicArn, record, error) {
const payload = { record, error }
try {
await sns().publish({
Message: JSON.stringify(payload),
TopicArn: topicArn,
MessageAttributes: attrsFromPayload(payload)
}).promise()
logger.info(`Wrote record ${record.id} to ${topicArn}`)
} catch (err) {
logger.error(`Failed to write record ${record.id} to ${topicArn}: ${err}`)
}
}
7 changes: 7 additions & 0 deletions src/lib/stac-utils.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
export function isCollection(record) {
return record && record.type === 'Collection'
}

export function isItem(record) {
return record && record.type === 'Feature'
}
Loading

0 comments on commit 0e3b0c6

Please sign in to comment.