Switch OpenWayback to CDX Indexing
CDX indexes scale better than the BDB index for large archives.
Steps
- Generate CDX files:

      bin/cdx-indexer archive.warc.gz cdx-index/index.cdx

- Edit `WEB-INF/wayback.xml`:

      <!-- Disable BDB collection -->
      <!-- <ref bean="localbdbcollection" /> -->
      <ref bean="localcdxcollection" />

- Configure `CDXCollection.xml` with `CompositeSearchResultSource` if you have multiple indexes.
- Map ARC/WARC paths using `FlatFileResourceFileLocationDB`.
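The path-mapping step above can be prepared with a small script. The following is a minimal sketch, not part of OpenWayback: it assumes the flat-file location DB reads a text file with one `filename<TAB>location` entry per line, sorted in byte order, which you should confirm against the documentation for your OpenWayback version. The function name `path_index_lines` and the example paths are hypothetical.

```python
from pathlib import PurePosixPath

# File suffixes treated as archive files (an assumption; adjust as needed).
ARCHIVE_SUFFIXES = (".arc", ".arc.gz", ".warc", ".warc.gz")


def path_index_lines(archive_paths):
    """Return sorted 'filename<TAB>location' lines for the given ARC/WARC paths."""
    lines = []
    for path in archive_paths:
        name = PurePosixPath(path).name
        # Ignore anything that is not an ARC/WARC file.
        if name.endswith(ARCHIVE_SUFFIXES):
            lines.append(f"{name}\t{path}")
    # Sort by raw bytes, the same order `LC_ALL=C sort` produces.
    return sorted(lines, key=lambda line: line.encode("utf-8"))


# Example with hypothetical paths:
for entry in path_index_lines(["/data/warcs/b.warc.gz", "/data/warcs/a.arc"]):
    print(entry)
```

Writing the returned lines to a file, one per line, gives an index that can be regenerated whenever archive files are added or moved.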
Diagram
flowchart LR
    A[CDX files] --> B[CompositeSearchResultSource]
    B --> C[OpenWayback queries]
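Conceptually, the composite source in the diagram lets queries treat several CDX files as a single index. The sketch below illustrates that idea only and is not OpenWayback's implementation: it merges CDX line streams with Python's `heapq.merge`, assuming each input is already sorted in byte order and that header lines begin with ` CDX`. The helper `merge_cdx` is a hypothetical name.

```python
import heapq


def merge_cdx(*sources):
    """Lazily merge already-sorted iterables of CDX lines into one sorted stream."""

    def records(lines):
        for line in lines:
            line = line.rstrip("\n")
            # Skip blank lines and the ' CDX ...' header at the top of each file.
            if line and not line.startswith(" CDX"):
                yield line

    previous = None
    merged = heapq.merge(
        *(records(source) for source in sources),
        key=lambda line: line.encode("utf-8"),  # byte-order comparison
    )
    for line in merged:
        # Drop exact duplicates that appear in more than one index.
        if line != previous:
            yield line
        previous = line
```

Because the merge is lazy, it can be fed open file objects directly without loading whole indexes into memory.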
Restart Tomcat after changes and monitor logs for missing paths.
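Missing paths can also be caught before a restart by cross-checking the CDX index against the path mapping. This is a rough sketch under two assumptions: the archive filename is the last space-separated field of each CDX line (the `g` column in common CDX layouts), and the path index holds tab-separated `filename<TAB>location` entries. The function name `missing_archive_files` is hypothetical.

```python
def missing_archive_files(cdx_lines, index_lines):
    """Return sorted archive filenames referenced in the CDX but absent from the path index."""
    # Filenames that the path index knows how to locate.
    known = {
        line.rstrip("\n").split("\t", 1)[0]
        for line in index_lines
        if line.strip()
    }
    missing = set()
    for line in cdx_lines:
        # Skip blank lines and the ' CDX ...' header line.
        if not line.strip() or line.startswith(" CDX"):
            continue
        filename = line.split()[-1]  # assumed to be the last field
        if filename not in known:
            missing.add(filename)
    return sorted(missing)
```

An empty result suggests every capture in the index can be resolved to a file; any names returned are worth fixing in the path mapping before users hit them.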