Switch OpenWayback to CDX Indexing
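CDX indexes scale better than the BDB index for large archives: they are plain, sorted text files that can be generated offline, one per batch of ARC/WARC files, and then combined.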
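To illustrate why a sorted flat file scales, here is a minimal sketch of a binary-search lookup over CDX lines. It is not OpenWayback's implementation. The `lookup` helper and the sample records are hypothetical, and the sketch assumes each line starts with the canonicalized URL key followed by a space, and that the lines are sorted in byte order.

```python
import bisect

def lookup(sorted_cdx_lines, url_key):
    """Return all CDX lines whose first field equals url_key."""
    prefix = url_key + " "
    # bisect_left finds the first line >= prefix in O(log n) comparisons.
    start = bisect.bisect_left(sorted_cdx_lines, prefix)
    matches = []
    for line in sorted_cdx_lines[start:]:
        if not line.startswith(prefix):
            break  # sorted input: no later line can match
        matches.append(line)
    return matches

# Hypothetical, hand-written records for illustration only.
lines = sorted([
    "com,example)/ 20200101000000 http://example.com/ text/html 200 AAAA - - 1234 0 a.warc.gz",
    "com,example)/about 20200101000500 http://example.com/about text/html 200 BBBB - - 900 1234 a.warc.gz",
    "org,archive)/ 20190601120000 http://archive.org/ text/html 200 CCCC - - 2000 0 b.warc.gz",
])
print(lookup(lines, "com,example)/about"))
```

Python compares strings by code point, which matches byte order for ASCII keys. Index files on disk generally need the same ordering, which is why they are usually sorted with a byte-wise locale.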
Steps
- Generate CDX files:

      bin/cdx-indexer archive.warc.gz cdx-index/index.cdx

- Edit WEB-INF/wayback.xml:

      <!-- Disable BDB collection -->
      <!-- <ref bean="localbdbcollection" /> -->
      <ref bean="localcdxcollection" />

- Configure CDXCollection.xml with CompositeSearchResultSource if you have multiple indexes.
- Map ARC/WARC paths using FlatFileResourceFileLocationDB.
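The path-mapping step can be scripted. The sketch below walks a directory of archive files and produces a path index. It assumes the location DB reads a sorted text file with one `name<TAB>path` entry per line; verify that format against your OpenWayback version before relying on it. The function names and the output file name are hypothetical.

```python
import os

# File suffixes treated as archive files (an assumption; adjust as needed).
WARC_SUFFIXES = (".warc", ".warc.gz", ".arc", ".arc.gz")

def build_path_index(root_dir):
    """Return sorted 'name<TAB>absolute path' lines for archive files under root_dir."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            if name.endswith(WARC_SUFFIXES):
                full_path = os.path.abspath(os.path.join(dirpath, name))
                entries.append(f"{name}\t{full_path}")
    return sorted(entries)

def write_path_index(root_dir, out_path):
    """Write the index to out_path and return the number of entries written."""
    lines = build_path_index(root_dir)
    with open(out_path, "w", encoding="utf-8") as out:
        out.write("".join(line + "\n" for line in lines))
    return len(lines)
```

For example, `write_path_index("/data/warcs", "path-index.txt")` would write one entry per archive file found under the hypothetical `/data/warcs` directory.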
Diagram
flowchart LR
A[CDX files] --> B[CompositeSearchResultSource]
B --> C[OpenWayback queries]
Restart Tomcat after making these changes, then monitor the logs for errors about ARC/WARC paths that cannot be resolved.
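Missing paths can also be caught before the restart. The sketch below compares the archive file names referenced in a CDX file with the names listed in a path index. It assumes the file name is the last space-separated field of each CDX record, that the header line starts with ` CDX`, and that the path index uses `name<TAB>path` lines. All function names are hypothetical.

```python
def cdx_filenames(cdx_lines):
    """Collect archive file names, assuming the name is the last field of each record."""
    names = set()
    for line in cdx_lines:
        line = line.rstrip("\n")
        # Skip blank lines and the format header, which starts with " CDX".
        if not line.strip() or line.startswith(" CDX"):
            continue
        names.add(line.split(" ")[-1])
    return names

def indexed_names(path_index_lines):
    """Collect the file names (first tab-separated field) from a path index."""
    return {line.split("\t", 1)[0] for line in path_index_lines if line.strip()}

def missing_from_path_index(cdx_lines, path_index_lines):
    """Return the sorted file names that the CDX references but the path index lacks."""
    return sorted(cdx_filenames(cdx_lines) - indexed_names(path_index_lines))
```

Both arguments can be open file objects, for example `missing_from_path_index(open("cdx-index/index.cdx"), open("path-index.txt"))`, where `path-index.txt` is a hypothetical file name.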