Java library for parsing AFP (MO:DCA) printer data streams.
NOTE: This project is still a work in progress.
afpbox has no runtime dependencies on other libraries. This was a design decision and will (hopefully) never change.
afpbox is available on JCenter, which makes it easy to use in your projects. First, add afpbox to your build file. If you use Maven, add the following dependency:

    <dependency>
        <groupId>de.textmode.afpbox</groupId>
        <artifactId>afpbox</artifactId>
        <version>0.4</version>
        <type>pom</type>
    </dependency>
If you use Gradle, add this:

    dependencies {
        compile 'de.textmode.afpbox:afpbox:0.4'
    }
AFP (MO:DCA) is a record-oriented data stream. For this reason, you first need a `RecordReader`. afpbox comes with two common implementations of a `RecordReader`.
The `StandardRecordReader` is probably the `RecordReader` of your choice. It reads the special control character X'5A' that introduces each record and determines the record length from the following two bytes.
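
To make the X'5A' framing concrete, here is a minimal sketch that splits a byte array into records. It is not the `StandardRecordReader` implementation: the class and method names are hypothetical, it works on an in-memory array instead of a stream, and it assumes the two-byte length is big-endian and counts itself but not the X'5A' byte.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; this is not afpbox's StandardRecordReader. */
final class Prefix5AFramingSketch {

    private Prefix5AFramingSketch() {
    }

    /** Splits a byte array into records that are each introduced by X'5A'. */
    static List<byte[]> splitRecords(final byte[] stream) {
        final ByteBuffer buf = ByteBuffer.wrap(stream); // ByteBuffer is big-endian by default
        final List<byte[]> records = new ArrayList<>();
        while (buf.hasRemaining()) {
            final int control = buf.get() & 0xFF;
            if (control != 0x5A) {
                throw new IllegalArgumentException("Expected X'5A' at offset " + (buf.position() - 1));
            }
            if (buf.remaining() < 2) {
                throw new IllegalArgumentException("Truncated length field");
            }
            // Peek at the length without consuming it.
            // Assumption: the length counts its own two bytes, but not the X'5A'.
            final int length = buf.getShort(buf.position()) & 0xFFFF;
            if (length < 2 || length > buf.remaining()) {
                throw new IllegalArgumentException("Invalid record length " + length);
            }
            final byte[] record = new byte[length];
            buf.get(record); // the returned record starts with its two length bytes
            records.add(record);
        }
        return records;
    }
}
```

With this layout, the three-byte structured field identifier sits at offsets 2 to 4 of each returned record.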
The `MvsRecordReader` expects every AFP record to be prefixed with four bytes. The record length is determined from the first two bytes; the following two bytes are ignored. This corresponds to the record format VB on z/OS (formerly known as OS/390, which was formerly known as MVS).
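
The four-byte prefix can be sketched in the same way. Again, this is not the `MvsRecordReader` implementation: the names are hypothetical, and the sketch assumes the usual z/OS convention that the two-byte length is big-endian and includes the four prefix bytes themselves.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; this is not afpbox's MvsRecordReader. */
final class VbPrefixFramingSketch {

    private VbPrefixFramingSketch() {
    }

    /** Splits a byte array into the payloads of records that carry a four-byte prefix. */
    static List<byte[]> splitRecords(final byte[] stream) {
        final ByteBuffer buf = ByteBuffer.wrap(stream); // big-endian by default
        final List<byte[]> payloads = new ArrayList<>();
        while (buf.hasRemaining()) {
            if (buf.remaining() < 4) {
                throw new IllegalArgumentException("Truncated four-byte prefix");
            }
            final int length = buf.getShort() & 0xFFFF; // first two bytes: record length
            buf.getShort();                             // next two bytes: ignored
            // Assumption: the length includes the four prefix bytes.
            final int payloadLength = length - 4;
            if (payloadLength < 0 || payloadLength > buf.remaining()) {
                throw new IllegalArgumentException("Invalid record length " + length);
            }
            final byte[] payload = new byte[payloadLength];
            buf.get(payload);
            payloads.add(payload);
        }
        return payloads;
    }
}
```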
Once you have a `RecordReader`, you also need a `RecordHandler`. The main idea behind the `RecordHandler` is that the application controls which structured fields are parsed and which are not. You have to implement a `RecordHandler` according to your needs.
When you have a `RecordReader` and a `RecordHandler`, you are ready to create an `AfpParser`. Let's build a sample application that counts the pages of an AFP file, so you get the idea behind the design of afpbox:

    // Needs java.io.FileInputStream and java.io.InputStream in addition to the afpbox classes.
    // A local variable captured by an anonymous class must be effectively final,
    // so the counter lives in a one-element array.
    final int[] pageCounter = { 0 };
    final RecordHandler rh = new RecordHandler() {
        @Override
        public void handleLineRecord(final Record record) {
            // We just ignore line records (we don't support mixed-mode files in this sample).
        }
        @Override
        public boolean handleStructuredFieldIntroducer(final StructuredFieldIntroducer sfi) {
            // *ONLY* if the record read is a "Begin Page" (BPG) structured field: parse
            // the structured field and pass the parsed structured field to method
            // "handleStructuredField" of the RecordHandler.
            return sfi.getStructuredFieldIdentifier() == StructuredFieldIdentifier.BPG;
        }
        @Override
        public void handleStructuredField(final StructuredField sf) {
            // We only get invoked on structured field "Begin Page" (BPG) - see above...
            ++pageCounter[0];
        }
        @Override
        public void handleFaultyStructuredField(final FaultyStructuredField sf) {
            // Hopefully we don't see faulty structured fields in our file...
        }
    };
    // try-with-resources closes the file even if parsing fails.
    try (InputStream is = new FileInputStream("myfile.afp")) {
        new AfpParser(new StandardRecordReader(is), rh).parse();
    }
    System.out.println("Pages in this file: " + pageCounter[0]);
If you want to parse PTOCA control sequences, you have to combine the PTOCA data of all PTX structured fields (within a Presentation Text Block) and parse this combined data.
afpbox provides a `PtocaParser` for this PTOCA data. To use the `PtocaParser`, you need to implement a `PtocaControlSequenceHandler` according to your needs. The design idea is much the same as for the `RecordHandler` above: the application decides which control sequences are parsed and which are not.
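
The combining step described above can be sketched as a plain concatenation. How the PTX payloads are collected from the parsed structured fields is not shown here; the list of byte arrays and all names in this sketch are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.List;

/** Illustrative sketch: joins the data of several PTX structured fields into one PTOCA block. */
final class PtxDataCombinerSketch {

    private PtxDataCombinerSketch() {
    }

    /**
     * Concatenates the given payloads in order.
     * Assumption: each array holds only the PTOCA data of one PTX structured field,
     * and the list covers exactly one Presentation Text Block.
     */
    static byte[] combine(final List<byte[]> ptxPayloads) {
        final ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (final byte[] payload : ptxPayloads) {
            // write(byte[], int, int) of ByteArrayOutputStream declares no IOException.
            out.write(payload, 0, payload.length);
        }
        return out.toByteArray();
    }
}
```

The resulting array is what would then be handed to the `PtocaParser`.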
Here is an example of how to use the `PtocaParser`. This sample removes all NOP control sequences from a PTOCA block and constructs a new PTOCA block. The sample is rather simplistic and incomplete, but it shows the idea behind the `PtocaControlSequenceHandler` and how to use it.

    // ptocaBlock holds the combined PTOCA data of a Presentation Text Block.
    final ByteArrayOutputStream baos = new ByteArrayOutputStream();
    PtocaParser.parse(ptocaBlock, new PtocaControlSequenceHandler() {
        @Override
        public boolean handleControSequence(final int functionType, final byte[] data, final int off) {
            // *ONLY* if the PTOCA function type is not "No Operation" (NOP, chained or unchained): parse
            // the PTOCA control sequence and invoke "handleControSequence" of the PtocaControlSequenceHandler.
            return functionType != PtocaControlSequenceFunctionType.NOP_UNCHAINED
                    && functionType != PtocaControlSequenceFunctionType.NOP_CHAINED;
        }
        @Override
        public void handleControSequence(final PtocaControlSequence controlSequence) {
            // write(byte[], int, int) is used because, unlike write(byte[]), it declares no IOException.
            final byte[] data = controlSequence.getData();
            baos.write(data, 0, data.length);
        }
        @Override
        public void handleCodePoints(final byte[] codePoints, final int off, final int len) {
            // Copy the code points themselves; no control sequence is in scope here.
            baos.write(codePoints, off, len);
        }
    });
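
The sample compares the function type against two NOP constants because PTOCA defines a chained and an unchained form of every control sequence. In PTOCA the chained form is the unchained function type with the low-order bit set (for example X'F8' and X'F9' for NOP). The following sketch shows that relationship. The class, its methods and its constants are hypothetical, and it is an assumption that the afpbox constants carry these values.

```java
/** Illustrative sketch of chained and unchained PTOCA function types. */
final class PtocaFunctionTypeSketch {

    /** Function type of an unchained "No Operation" according to PTOCA. */
    static final int NOP_UNCHAINED = 0xF8;

    /** Function type of a chained "No Operation" according to PTOCA. */
    static final int NOP_CHAINED = 0xF9;

    private PtocaFunctionTypeSketch() {
    }

    /** Returns true if the low-order bit marks the control sequence as chained. */
    static boolean isChained(final int functionType) {
        return (functionType & 0x01) != 0;
    }

    /** Clears the chaining bit, so both forms map to the unchained function type. */
    static int unchainedForm(final int functionType) {
        return functionType & 0xFE;
    }

    /** Returns true for a NOP, no matter if chained or unchained. */
    static boolean isNop(final int functionType) {
        return unchainedForm(functionType) == NOP_UNCHAINED;
    }
}
```

A handler could use a check like `isNop` instead of comparing against both constants.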
The following table shows which Structured Fields are currently supported ("supported" means that afpbox can parse the Structured Field and create a specific Java object for it).
Acronym | Identifier | Structured Field Name | Supported |
---|---|---|---|
BAG | X'D3A8C9' | Begin Active Environment Group | ❌ |
BBC | X'D3A8EB' | Begin Bar Code Object | ❌ |
BDA | X'D3EEEB' | Bar Code Data | ❌ |
BDD | X'D3A6EB' | Bar Code Data Descriptor | ❌ |
BDG | X'D3A8C4' | Begin Document Environment Group | ❌ |
BDI | X'D3A8A7' | Begin Document Index | ❌ |
BDT | X'D3A8A8' | Begin Document | ❌ |
BFG | X'D3A8C5' | Begin Form Environment Group | ❌ |
BFM | X'D3A8CD' | Begin Form Map | ❌ |
BGR | X'D3A8BB' | Begin Graphics Object | ❌ |
BII | X'D3A87B' | Begin IM Image | ❌ |
BIM | X'D3A8FB' | Begin Image Object | ❌ |
BMM | X'D3A8CC' | Begin Medium Map | ❌ |
BMO | X'D3A8DF' | Begin Overlay | ❌ |
BNG | X'D3A8AD' | Begin Named Page Group | ❌ |
BOC | X'D3A892' | Begin Object Container | ❌ |
BOG | X'D3A8C7' | Begin Object Environment Group | ❌ |
BPF | X'D3A8A5' | Begin Print File | ❌ |
BPG | X'D3A8AF' | Begin Page | ❌ |
BPS | X'D3A85F' | Begin Page Segment | ❌ |
BPT | X'D3A89B' | Begin Presentation Text Object | ❌ |
BRG | X'D3A8C6' | Begin Resource Group | ❌ |
BRS | X'D3A8CE' | Begin Resource | ❌ |
BSG | X'D3A8D9' | Begin Resource Environment Group | ❌ |
CDD | X'D3A692' | Container Data Descriptor | ❌ |
CTC | X'D3A79B' | Composed Text Control | ❌ |
EAG | X'D3A9C9' | End Active Environment Group | ❌ |
EBC | X'D3A9EB' | End Bar Code Object | ❌ |
EDG | X'D3A9C4' | End Document Environment Group | ❌ |
EDI | X'D3A9A7' | End Document Index | ❌ |
EDT | X'D3A9A8' | End Document | ❌ |
EFG | X'D3A9C5' | End Form Environment Group | ❌ |
EFM | X'D3A9CD' | End Form Map | ❌ |
EGR | X'D3A9BB' | End Graphics Object | ❌ |
EII | X'D3A97B' | End IM Image | ❌ |
EIM | X'D3A9FB' | End Image Object | ❌ |
EMM | X'D3A9CC' | End Medium Map | ❌ |
EMO | X'D3A9DF' | End Overlay | ❌ |
ENG | X'D3A9AD' | End Named Page Group | ❌ |
EOC | X'D3A992' | End Object Container | ❌ |
EOG | X'D3A9C7' | End Object Environment Group | ❌ |
EPF | X'D3A9A5' | End Print File | ❌ |
EPG | X'D3A9AF' | End Page | ❌ |
EPS | X'D3A95F' | End Page Segment | ❌ |
EPT | X'D3A99B' | End Presentation Text Object | ❌ |
ERG | X'D3A9C6' | End Resource Group | ❌ |
ERS | X'D3A9CE' | End Resource | ❌ |
ESG | X'D3A9D9' | End Resource Environment Group | ❌ |
FGD | X'D3A6C5' | Form Environment Group Descriptor | ❌ |
GAD | X'D3EEBB' | Graphics Data | ❌ |
GDD | X'D3A6BB' | Graphics Data Descriptor | ❌ |
ICP | X'D3AC7B' | IM Image Cell Position | ❌ |
IDD | X'D3A6FB' | Image Data Descriptor | ❌ |
IEL | X'D3B2A7' | Index Element | ❌ |
IID | X'D3A67B' | Image Input Descriptor | ❌ |
IMM | X'D3ABCC' | Invoke Medium Map | ❌ |
IOB | X'D3AFC3' | Include Object | ❌ |
IOC | X'D3A77B' | IM Image Output Control | ❌ |
IPD | X'D3EEFB' | Image Picture Data | ❌ |
IPG | X'D3AFAF' | Include Page | ❌ |
IPO | X'D3AFD8' | Include Page Overlay | ❌ |
IPS | X'D3AF5F' | Include Page Segment | ❌ |
IRD | X'D3EE7B' | IM Image Raster Data | ❌ |
LLE | X'D3B490' | Link Logical Element | ❌ |
MBC | X'D3ABEB' | Map Bar Code Object | ❌ |
MCC | X'D3A288' | Medium Copy Count | ❌ |
MCD | X'D3AB92' | Map Container Data | ❌ |
MCF | X'D3AB8A' | Map Coded Font | ❌ |
MCF-1 | X'D3B18A' | Map Coded Font Format-1 | ❌ |
MDD | X'D3A688' | Medium Descriptor | ❌ |
MDR | X'D3ABC3' | Map Data Resource | ❌ |
MFC | X'D3A088' | Medium Finishing Control | ❌ |
MGO | X'D3ABBB' | Map Graphics Object | ❌ |
MIO | X'D3ABFB' | Map Image Object | ❌ |
MMC | X'D3A788' | Medium Modification Control | ❌ |
MMD | X'D3ABCD' | Map Media Destination | ❌ |
MMO | X'D3B1DF' | Map Medium Overlay | ❌ |
MMT | X'D3AB88' | Map Media Type | ❌ |
MPG | X'D3ABAF' | Map Page | ❌ |
MPO | X'D3ABD8' | Map Page Overlay | ❌ |
MPS | X'D3B15F' | Map Page Segment | ❌ |
MPT | X'D3AB9B' | Map Presentation Text | ❌ |
MSU | X'D3ABEA' | Map Suppression | ❌ |
NOP | X'D3EEEE' | No Operation | ✅ |
OBD | X'D3A66B' | Object Area Descriptor | ❌ |
OBP | X'D3AC6B' | Object Area Position | ❌ |
OCD | X'D3EE92' | Object Container Data | ❌ |
PEC | X'D3A7A8' | Presentation Environment Control | ❌ |
PFC | X'D3B288' | Presentation Fidelity Control | ❌ |
PGD | X'D3A6AF' | Page Descriptor | ❌ |
PGP | X'D3B1AF' | Page Position | ❌ |
PGP-1 | X'D3ACAF' | Page Position Format-1 | ❌ |
PMC | X'D3A7AF' | Page Modification Control | ❌ |
PPO | X'D3ADC3' | Preprocess Presentation Object | ❌ |
PTD | X'D3B19B' | Presentation Text Data Descriptor | ❌ |
PTD-1 | X'D3A69B' | Presentation Text Descriptor Format-1 | ❌ |
PTX | X'D3EE9B' | Presentation Text Data | ❌ |
TLE | X'D3A090' | Tag Logical Element | ❌ |
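
The identifiers in the table follow a visible pattern: the first byte is always X'D3', the second byte tells what kind of structured field it is (X'A8' for every "Begin", X'A9' for every "End"), and the third byte names the object (X'AF' for a page, for example). The sketch below takes such an identifier apart. The class and its methods are hypothetical and are not part of the afpbox API.

```java
/** Illustrative sketch that takes a three-byte structured field identifier apart. */
final class StructuredFieldIdSketch {

    private static final int CLASS_CODE = 0xD3;
    private static final int TYPE_BEGIN = 0xA8;
    private static final int TYPE_END = 0xA9;

    private StructuredFieldIdSketch() {
    }

    /** Second byte of the identifier, e.g. X'A8' for "Begin". */
    static int typeCode(final int identifier) {
        return (identifier >>> 8) & 0xFF;
    }

    /** Third byte of the identifier, e.g. X'AF' for a page. */
    static int categoryCode(final int identifier) {
        return identifier & 0xFF;
    }

    static boolean isBegin(final int identifier) {
        return (identifier >>> 16) == CLASS_CODE && typeCode(identifier) == TYPE_BEGIN;
    }

    /** Maps a "Begin" identifier to its "End" counterpart, e.g. BPG to EPG. */
    static int matchingEnd(final int beginIdentifier) {
        if (!isBegin(beginIdentifier)) {
            throw new IllegalArgumentException("Not a Begin structured field: " + format(beginIdentifier));
        }
        return (beginIdentifier & 0xFF00FF) | (TYPE_END << 8);
    }

    /** Formats the identifier in the notation used in the table. */
    static String format(final int identifier) {
        return String.format("X'%06X'", identifier);
    }
}
```

For example, `matchingEnd(0xD3A8AF)` yields the identifier of EPG (End Page), which is handy when an application tracks nesting of begin/end pairs.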
The following table shows which Triplets are currently supported ("supported" means that afpbox can parse the Triplet and create a specific Java object for it).
ID | Name | Supported |
---|---|---|
X'4D' | Area Definition | ❌ |
X'80' | Attribute Qualifier | ❌ |
X'36' | Attribute Value | ❌ |
X'26' | Character Rotation | ❌ |
X'96' | CMR Tag Fidelity | ❌ |
X'01' | Coded Graphic Character Set Global ID | ✅ |
X'75' | Color Fidelity | ❌ |
X'91' | Color Management Resource Descriptor | ❌ |
X'4E' | Color Specification | ❌ |
X'65' | Comment | ✅ |
X'8B' | Data-Object Font Descriptor | ❌ |
X'43' | Descriptor Position | ✅ |
X'97' | Device Appearance | ❌ |
X'50' | Encoding Scheme ID | ❌ |
X'22' | Extended Resource Local ID | ❌ |
X'88' | Finishing Fidelity | ❌ |
X'85' | Finishing Operation | ❌ |
X'20' | Font Coded Graphic Character Set Global Identifier | ❌ |
X'1F' | Font Descriptor Specification | ❌ |
X'78' | Font Fidelity | ❌ |
X'5D' | Font Horizontal Scale Factor | ❌ |
X'84' | Font Resolution and Metric Technology | ❌ |
X'02' | Fully Qualified Name | ✅ |
X'9A' | Image Resolution | ❌ |
X'73' | IMM Insertion (Retired) | ❌ |
X'9D' | Keep Group Together | ❌ |
X'27' | Line Data Object Position Migration (Retired) | ❌ |
X'62' | Local Date and Time Stamp | ✅ |
X'8C' | Locale Selector | ❌ |
X'04' | Mapping Option | ❌ |
X'45' | Media Eject Control | ❌ |
X'87' | Media Fidelity | ❌ |
X'56' | Medium Map Page Number | ❌ |
X'68' | Medium Orientation | ❌ |
X'8F' | MO:DCA Function Set | ❌ |
X'18' | MO:DCA Interchange Set | ❌ |
X'4B' | Object Area Measurement Units | ❌ |
X'4C' | Object Area Size | ❌ |
X'57' | Object Byte Extent | ❌ |
X'2D' | Object Byte Offset | ❌ |
X'63' | Object Checksum (Retired) | ❌ |
X'10' | Object Classification | ❌ |
X'9C' | Object Container Presentation Space Size | ❌ |
X'5E' | Object Count | ❌ |
X'21' | Object Function Set Specification (Retired) | ❌ |
X'5A' | Object Offset | ❌ |
X'64' | Object Origin Identifier (Retired) | ❌ |
X'59' | Object Structured Field Extent | ❌ |
X'58' | Object Structured Field Offset | ❌ |
X'46' | Page Overlay Conditional Processing (Retired) | ❌ |
X'81' | Page Position Information | ❌ |
X'82' | Parameter Value | ❌ |
X'83' | Presentation Control | ❌ |
X'71' | Presentation Space Mixing Rules | ❌ |
X'70' | Presentation Space Reset Mixing | ❌ |
X'95' | Rendering Intent | ❌ |
X'24' | Resource Local ID | ❌ |
X'6C' | Resource Object Include | ❌ |
X'21' | Resource Object Type | ❌ |
X'25' | Resource Section Number | ✅ |
X'47' | Resource Usage Attribute (Retired) | ❌ |
X'86' | Text Fidelity | ❌ |
X'1D' | Text Orientation (Retired) | ❌ |
X'74' | Toner Saver | ❌ |
X'FF' | Triplet Extender | ❌ |
X'72' | Universal Date and Time Stamp | ✅ |
X'8E' | UP3i Finishing Operation | ❌ |
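
For triplets that afpbox does not parse yet, an application can still walk over the raw bytes. In MO:DCA a triplet starts with a one-byte length (which counts the length byte and the ID byte), followed by the one-byte ID from the table above and the triplet data. The following sketch lists the IDs found in a byte range. The names are hypothetical, and the extended format introduced by the Triplet Extender (X'FF') is deliberately not handled.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch that walks over a sequence of triplets. */
final class TripletWalkSketch {

    private TripletWalkSketch() {
    }

    /** Returns the IDs of all triplets stored in data[off] .. data[off + len - 1]. */
    static List<Integer> tripletIds(final byte[] data, final int off, final int len) {
        final List<Integer> ids = new ArrayList<>();
        final int end = off + len;
        int pos = off;
        while (pos < end) {
            final int tripletLength = data[pos] & 0xFF; // counts the length byte and the ID byte
            if (tripletLength < 2 || pos + tripletLength > end) {
                throw new IllegalArgumentException("Invalid triplet length at offset " + pos);
            }
            ids.add(data[pos + 1] & 0xFF);
            pos += tripletLength; // skip the triplet data
        }
        return ids;
    }
}
```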
The following table shows which PTOCA Control Sequences are currently supported ("supported" means that afpbox can parse the PTOCA Control Sequence and create a specific Java object for it).
Acronym | Control Sequence Name | Supported |
---|---|---|
AMB | Absolute Move Baseline | ✅ |
AMI | Absolute Move Inline | ✅ |
BLN | Begin Line | ✅ |
BSU | Begin Suppression | ✅ |
DBR | Draw B-axis Rule | ✅ |
DIR | Draw I-axis Rule | ✅ |
ESU | End Suppression | ✅ |
GAR | Glyph Advance Run | ✅ |
GIR | Glyph ID Run | ✅ |
GLC | Glyph Layout Control | ✅ |
GOR | Glyph Offset Run | ✅ |
NOP | No Operation | ✅ |
OVS | Overstrike | ✅ |
RMB | Relative Move Baseline | ✅ |
RMI | Relative Move Inline | ✅ |
RPS | Repeat String | ✅ |
SBI | Set Baseline Increment | ✅ |
SCFL | Set Coded Font Local | ✅ |
SEC | Set Extended Text Color | ✅ |
SIA | Set Intercharacter Adjustment | ✅ |
SIM | Set Inline Margin | ✅ |
STC | Set Text Color | ✅ |
STO | Set Text Orientation | ✅ |
SVI | Set Variable Space Character Increment | ✅ |
TBM | Temporary Baseline Move | ✅ |
TRN | Transparent Data | ✅ |
UCT | Unicode Complex Text | ✅ |
USC | Underscore | ✅ |
The following table shows which GOCA Drawing Orders are currently supported ("supported" means that afpbox can parse the GOCA Drawing Order and create a specific Java object for it).
Acronym | Identifier | Drawing Order Name | Supported |
---|---|---|---|
GBAR | X'68' | Begin Area | ❌ |
GBCP | X'DE' | Begin Custom Pattern | ❌ |
GBIMG | X'D1' | Begin Image at Given Position | ❌ |
GBOX | X'C0' | Box at Given Position | ❌ |
GCBEZ | X'E5' | Cubic Bezier Curve at Given Position | ❌ |
GCBIMG | X'91' | Begin Image at Current Position | ❌ |
GCBOX | X'80' | Box at Current Position | ❌ |
GCCBEZ | X'A5' | Cubic Bezier Curve at Current Position | ❌ |
GCCHST | X'83' | Character String at Current Position | ❌ |
GCFARC | X'87' | Full Arc at Current Position | ❌ |
GCFLT | X'85' | Fillet at Current Position | ❌ |
GCHST | X'C3' | Character String at Given Position | ❌ |
GCLINE | X'81' | Line at Current Position | ❌ |
GCMRK | X'82' | Marker at Current Position | ❌ |
GCOMT | X'01' | Comment | ❌ |
GCPARC | X'A3' | Partial Arc at Current Position | ❌ |
GCRLINE | X'A1' | Relative Line at Current Position | ❌ |
GDPT | X'DF' | Delete Pattern | ❌ |
GEAR | X'60' | End Area | ❌ |
GECP | X'5E' | End Custom Pattern | ❌ |
GEIMG | X'93' | End Image | ❌ |
GEPROL | X'3E' | End Prolog | ❌ |
GFARC | X'C7' | Full Arc at Given Position | ❌ |
GFLT | X'C5' | Fillet at Given Position | ❌ |
GIMD | X'92' | Image Data | ❌ |
GLGD | X'FEDC' | Linear Gradient | ❌ |
GLINE | X'C1' | Line at Given Position | ❌ |
GMRK | X'C2' | Marker at Given Position | ❌ |
GNOP1 | X'00' | No-Operation | ❌ |
GPARC | X'E3' | Partial Arc at Given Position | ❌ |
GRGD | X'FEDD' | Radial Gradient | ❌ |
GRLINE | X'E1' | Relative Line at Given Position | ❌ |
GSAP | X'22' | Set Arc Parameters | ❌ |
GSBMX | X'0D' | Set Background Mix | ❌ |
GSCA | X'34' | Set Character Angle | ❌ |
GSCC | X'33' | Set Character Cell | ❌ |
GSCD | X'3A' | Set Character Direction | ❌ |
GSCH | X'35' | Set Character Shear | ❌ |
GSCLT | X'20' | Set Custom Line Type | ❌ |
GSCOL | X'0A' | Set Color | ❌ |
GSCP | X'21' | Set Current Position | ❌ |
GSCR | X'39' | Set Character Precision | ❌ |
GSCS | X'38' | Set Character Set | ❌ |
GSECOL | X'26' | Set Extended Color | ❌ |
GSFLW | X'11' | Set Fractional Line Width | ❌ |
GSGCH | X'04' | Segment Characteristics | ❌ |
GSLE | X'1A' | Set Line End | ❌ |
GSLJ | X'1B' | Set Line Join | ❌ |
GSLT | X'18' | Set Line Type | ❌ |
GSLW | X'19' | Set Line Width | ❌ |
GSMC | X'37' | Set Marker Cell | ❌ |
GSMP | X'3B' | Set Marker Precision (obsolete) | ❌ |
GSMS | X'3C' | Set Marker Set | ❌ |
GSMT | X'29' | Set Marker Symbol | ❌ |
GSMX | X'0C' | Set Mix | ❌ |
GSPCOL | X'B2' | Set Process Color | ❌ |
GSPIK | X'43' | Set Pick Identifier | ❌ |
GSPRP | X'A0' | Set Pattern Reference Point | ❌ |
GSPS | X'08' | Set Pattern Set | ❌ |
GSPT | X'28' | Set Pattern Symbol | ❌ |
???? | X'71' | End Segment | ❌ |
If you want to contribute to afpbox, you're welcome. But please make sure that your changes keep the quality of afpbox at least at its current level. So please make sure that your contributions comply with the afpbox coding conventions (formatting etc.) and that they are validated by JUnit tests.
It is easy to check this: just build the source with `gradle` before creating a pull request. The Gradle default tasks run Checkstyle and FindBugs and build the JavaDoc. If everything goes well, you're welcome to create a pull request.
Hint: If you use Eclipse as your IDE, you can simply run `gradle eclipse` to create the Eclipse project files. Furthermore, you can import the Eclipse formatter settings (see file `config/eclipse-formatter.xml`) as well as the Eclipse preferences (see file `config/eclipse-preferences.epf`), which will assist you in formatting the afpbox source code according to the coding conventions in use (no tabs, UTF-8 encoding, indent by 4 spaces, no line longer than 120 characters, etc.).
The following reference materials were used to implement this parser:
- MO:DCA Reference (Mixed Object Document Content Architecture Reference)
- GOCA Reference (Graphics Object Content Architecture for AFP Reference)
- BCOCA Reference (Bar Code Object Content Architecture Reference)
- CMOCA Reference (Color Management Object Content Architecture Reference)
- FOCA Reference (Font Object Content Architecture Reference)
- IOCA Reference (Image Object Content Architecture Reference)
- MOCA Reference (Metadata Object Content Architecture Reference)
- PTOCA Reference (Presentation Text Object Content Architecture Reference)
- Line Data Reference (Programming Guide and Line Data Reference)
All of these documents are available on the website of the AFP Consortium.