Refactored Pipeline processor handling #2218

Merged: 3 commits, Jun 26, 2024

Conversation

McFistyBuns
Contributor

Refactored processor handling to more closely resemble what the Elasticsearch ingest pipeline endpoint expects.

Fixes #1810

All three ways of setting the processors should now work as expected:

$pipeline = new Pipeline($client);
$pipeline->setId('pageattachment')->setDescription('Extract attachment information');

//Create attachment processor
$attachproc = new AttachmentProcessor('pageDescBinary');
$attachproc->setIndexedChars(-1);
$attachproc->setTargetField('desc_attachment');

//Create remove processor 
$removeproc = new RemoveProcessor('pageDescBinary');

//Create second attachment processor 
$attachproc2 = new AttachmentProcessor('pageSpreadsheetBinary');
$attachproc2->setIndexedChars(-1);
$attachproc2->setTargetField('spreadsheet_attachment');

//Create second remove processor 
$removeproc2 = new RemoveProcessor('pageSpreadsheetBinary');

//Add processors to the pipeline
$pipeline->addProcessor($attachproc);
$pipeline->addProcessor($removeproc);
$pipeline->addProcessor($attachproc2);
$pipeline->addProcessor($removeproc2);

$response = $pipeline->create();

// Alternatively, pass the processors as raw arrays
$processors = [
    [
        'attachment' => [
            'field' => 'pageDescBinary',
            'indexed_chars' => -1,
            'target_field' => 'desc_attachment',
        ],
    ],
    [
        'remove' => [
            'field' => 'pageDescBinary',
        ],
    ],
    [
        'attachment' => [
            'field' => 'pageSpreadsheetBinary',
            'indexed_chars' => -1,
            'target_field' => 'spreadsheet_attachment',
        ],
    ],
    [
        'remove' => [
            'field' => 'pageSpreadsheetBinary',
        ],
    ],
];
$pipeline->setRawProcessors($processors);
$pipeline->create();

// Or pass the processor objects created above as a single array
$pipeline->setProcessors([$attachproc, $removeproc, $attachproc2, $removeproc2]);
$pipeline->create();

Each of these should now produce the expected pipeline in Elasticsearch:

curl -X GET http://localhost:9200/_ingest/pipeline/pageattachment?pretty=true
{
  "pageattachment" : {
    "processors" : [
      {
        "attachment" : {
          "field" : "pageDescBinary",
          "indexed_chars" : -1,
          "target_field" : "desc_attachment"
        }
      },
      {
        "remove" : {
          "field" : "pageDescBinary"
        }
      },
      {
        "attachment" : {
          "field" : "pageSpreadsheetBinary",
          "indexed_chars" : -1,
          "target_field" : "spreadsheet_attachment"
        }
      },
      {
        "remove" : {
          "field" : "pageSpreadsheetBinary"
        }
      }
    ],
    "description" : "Extract attachment information"
  }
}
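
To illustrate why the list shape in the response above matters for the linked issue (several processors of the same type in one pipeline), here is a minimal, self-contained sketch in plain PHP. It is a hypothetical illustration only: the variable names are invented, and this thread does not confirm that Elastica's earlier code keyed processors this way. It only shows that an array keyed by processor type cannot hold two `attachment` entries, while an ordered list of single-key maps can.

```php
<?php
// Hypothetical illustration only; this is not Elastica's internal code.

// Keyed by processor type: the second 'attachment' assignment overwrites the first.
$keyedByType = [];
$keyedByType['attachment'] = ['field' => 'pageDescBinary'];
$keyedByType['attachment'] = ['field' => 'pageSpreadsheetBinary'];

// Ordered list of single-key maps: the same shape as the "processors"
// array in the GET response, so repeated types are preserved in order.
$asList = [
    ['attachment' => ['field' => 'pageDescBinary']],
    ['attachment' => ['field' => 'pageSpreadsheetBinary']],
];

$keyedCount = count($keyedByType); // only one entry survives
$listCount = count($asList);       // both entries are kept

// Request-body style JSON built from the list form.
$body = json_encode(['processors' => $asList]);
```

The list form also keeps processor order explicit, which matters when a `remove` processor must run after the `attachment` processor that reads the same field.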

Owner

ruflin commented Jun 26, 2024

Thanks for the addition. I'm a big fan of PRs that remove code instead of adding new code :-) Feeding everything into params also simplifies the handling.

It seems the linting rule has a tiny complaint, could you take a look? Could you also add an entry to the changelog file?

@McFistyBuns
Contributor Author

Sorry about that. I thought I had run PHP-CS-Fixer. Must have made that test change after I ran it.

@ruflin ruflin merged commit 8b1826d into ruflin:8.x Jun 26, 2024
17 checks passed
Owner

ruflin commented Jun 26, 2024

Thanks for the contribution @McFistyBuns !

Successfully merging this pull request may close these issues.

Impossible to create pipeline with multiple processors with same name