Issue: Collapse and Expand Results. In cloud mode the same host appears multiple times in the results (once per shard). In this case SolrJ builds an invalid expandedResults object, and a ClassCastException (SolrDocumentList to SolrDocument) is thrown at line 161: docs.addAll(expandedResults.get(key));
Solution: Use a grouping query instead, which fixes both the logic and the ClassCastException.
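For readers less familiar with Solr result grouping, the request parameters involved can be sketched without SolrJ as plain key/value pairs. This is only an illustration: the class and method names (`GroupingParamsSketch`, `groupingParams`, `toQueryString`) and the sample values are hypothetical, while the parameter names (`group`, `group.field`, `group.limit`, `group.sort`) are the ones used in the proposed method.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class GroupingParamsSketch {

    /** Builds the request parameters; grouping is only enabled when a field and a bucket size are set. */
    static Map<String, String> groupingParams(
            String diversityField, int bucketSize, int start, int rows, String upperDate) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "*:*");
        params.put("fq", "nextFetchDate:[* TO " + upperDate + "]");
        params.put("start", Integer.toString(start));
        params.put("rows", Integer.toString(rows));
        if (diversityField != null && !diversityField.isBlank() && bucketSize > 0) {
            params.put("group", "true");
            params.put("group.field", diversityField);
            params.put("group.limit", Integer.toString(bucketSize));
            params.put("group.sort", "nextFetchDate asc");
        }
        return params;
    }

    /** Encodes the parameters as a URL query string. */
    static String toQueryString(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        // Sample values are made up for the illustration.
        Map<String, String> params = groupingParams("host", 2, 0, 10, "2023-04-06T00:00:00Z");
        System.out.println(toQueryString(params));
    }
}
```

With grouping enabled, each host becomes one group holding at most `group.limit` documents, so no expanded-results map has to be read on the client side.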
protected void populateBuffer() {
    SolrQuery query = new SolrQuery();
    if (lastNextFetchDate == null) {
        lastNextFetchDate = Instant.now();
        lastStartOffset = 0;
        lastTimeResetToNOW = Instant.now();
    }
    // reset the value for next fetch date if the previous one is too
    // old
    else if (resetFetchDateAfterNSecs != -1) {
        Instant changeNeededOn =
                Instant.ofEpochMilli(
                        lastTimeResetToNOW.toEpochMilli() + (resetFetchDateAfterNSecs * 1000));
        if (Instant.now().isAfter(changeNeededOn)) {
            LOG.info(
                    "lastDate reset based on resetFetchDateAfterNSecs {}",
                    resetFetchDateAfterNSecs);
            lastNextFetchDate = Instant.now();
            lastStartOffset = 0;
        }
    }
    query.setQuery("*:*")
            .addFilterQuery("nextFetchDate:[* TO " + lastNextFetchDate + "]")
            .setStart(lastStartOffset)
            .setRows(this.maxNumResults);
    if (StringUtils.isNotBlank(diversityField) && diversityBucketSize > 0) {
        query.set("indent", "true")
                .set("group", "true")
                .set("group.field", diversityField)
                .set("group.limit", diversityBucketSize)
                .set("group.sort", "nextFetchDate asc");
    }
    LOG.debug("QUERY => {}", query.toString());
    try {
        long startQuery = System.currentTimeMillis();
        QueryResponse response = connection.getClient().query(query);
        long endQuery = System.currentTimeMillis();
        queryTimes.addMeasurement(endQuery - startQuery);
        SolrDocumentList docs = new SolrDocumentList();
        LOG.debug("Response : {}", response.toString());
        // add the main results
        if (response.getResults() != null) {
            docs.addAll(response.getResults());
        }
        int groupsTotal = 0;
        // get groups
        if (response.getGroupResponse() != null) {
            for (GroupCommand groupCommand : response.getGroupResponse().getValues()) {
                for (Group group : groupCommand.getValues()) {
                    groupsTotal++;
                    LOG.debug("Group : {}", group);
                    docs.addAll(group.getResult());
                }
            }
        }
        int numhits =
                (response.getResults() != null) ? response.getResults().size() : groupsTotal;
        // no more results?
        if (numhits == 0) {
            lastStartOffset = 0;
            lastNextFetchDate = null;
        } else {
            lastStartOffset += numhits;
        }
        String prefix = mdPrefix.concat(".");
        int alreadyProcessed = 0;
        int docReturned = 0;
        for (SolrDocument doc : docs) {
            String url = (String) doc.get("url");
            docReturned++;
            // is already being processed - skip it!
            if (beingProcessed.containsKey(url)) {
                alreadyProcessed++;
                continue;
            }
            Metadata metadata = new Metadata();
            Iterator<String> keyIterators = doc.getFieldNames().iterator();
            while (keyIterators.hasNext()) {
                String key = keyIterators.next();
                if (key.startsWith(prefix)) {
                    Collection<Object> values = doc.getFieldValues(key);
                    key = key.substring(prefix.length());
                    Iterator<Object> valueIterator = values.iterator();
                    while (valueIterator.hasNext()) {
                        String value = (String) valueIterator.next();
                        metadata.addValue(key, value);
                    }
                }
            }
            buffer.add(url, metadata);
        }
        LOG.info(
                "SOLR returned {} results from {} buckets in {} msec including {} already being processed",
                docReturned,
                numhits,
                (endQuery - startQuery),
                alreadyProcessed);
    } catch (Exception e) {
        LOG.error("Exception while querying Solr", e);
    }
}
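The bookkeeping in the method above (flatten every group into one document list, then advance the offset by the number of groups rather than the number of documents) can be tried in isolation. The following is a minimal sketch under stated assumptions: plain strings stand in for Solr documents, the names `GroupFlattenSketch`, `flatten` and `nextOffset` are hypothetical, and it assumes that `start` and `rows` address groups when grouping is enabled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GroupFlattenSketch {

    /** Flattens the per-host buckets into a single list, keeping group order. */
    static List<String> flatten(Map<String, List<String>> groups) {
        List<String> docs = new ArrayList<>();
        for (List<String> bucket : groups.values()) {
            docs.addAll(bucket);
        }
        return docs;
    }

    /** Advances the offset by the number of groups; resets to 0 when nothing came back. */
    static int nextOffset(int currentOffset, int groupCount) {
        return groupCount == 0 ? 0 : currentOffset + groupCount;
    }

    public static void main(String[] args) {
        // Hypothetical grouped response: host -> URLs in that bucket.
        Map<String, List<String>> groups = new LinkedHashMap<>();
        groups.put("a.example", List.of("http://a.example/1", "http://a.example/2"));
        groups.put("b.example", List.of("http://b.example/1"));

        List<String> docs = flatten(groups);
        int offset = nextOffset(0, groups.size());
        System.out.println(docs.size() + " documents from " + groups.size()
                + " buckets, next offset " + offset);
    }
}
```

Counting groups instead of documents keeps the next query from skipping hosts when a bucket returns more than one document.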
Could you please contribute a PR instead and link it to this issue? It will make it easier to see the difference in code and comment on your suggestions. Thanks!
syefimov added a commit to syefimov/storm-crawler that referenced this issue on Apr 6, 2023
[ ] Bug report
storm-crawler-solr 2.8
Class: com.digitalpebble.stormcrawler.solr.persistence.SolrSpout
Method: populateBuffer()
Solr: Solr 8.8.2 (cloud mode)