Tune OpenWayback ZipNum Cluster Searches
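ZipNum speeds up CDX lookups by splitting a sorted CDX index into gzip-compressed blocks and keeping a small summary file that records where each block starts, so a query reads only the blocks that can contain matches.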
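The chunking idea can be illustrated with a minimal Python sketch. It is not the OpenWayback implementation: the function names `build_blocks` and `lookup`, the in-memory layout, and the tiny block size are all assumptions made for illustration. It only shows how a summary of first keys lets a search decompress a few blocks instead of the whole index.

```python
import bisect
import gzip
import io


def build_blocks(sorted_lines, lines_per_block=3):
    """Compress sorted CDX-like lines into independent gzip blocks.

    Returns the concatenated block bytes and a summary list of
    (first_key, offset, length) tuples, one per block.
    The block size is deliberately tiny; it is an illustrative assumption.
    """
    data = io.BytesIO()
    summary = []
    for i in range(0, len(sorted_lines), lines_per_block):
        chunk = sorted_lines[i:i + lines_per_block]
        payload = gzip.compress(("\n".join(chunk) + "\n").encode("utf-8"))
        first_key = chunk[0].split(" ", 1)[0]
        summary.append((first_key, data.tell(), len(payload)))
        data.write(payload)
    return data.getvalue(), summary


def lookup(data, summary, key):
    """Return all lines whose first field equals key.

    Only blocks that can contain the key are decompressed.
    """
    keys = [entry[0] for entry in summary]
    # Matches may begin in the block just before the first block whose
    # first key is >= key, so start one block earlier.
    start = max(bisect.bisect_left(keys, key) - 1, 0)
    results = []
    for first_key, offset, length in summary[start:]:
        if first_key > key:
            break  # every later block starts past the key
        block = gzip.decompress(data[offset:offset + length]).decode("utf-8")
        results.extend(
            line for line in block.splitlines()
            if line.split(" ", 1)[0] == key
        )
    return results
```

In this sketch the summary plays the role of the summary file: a binary search over it picks the starting block, and the scan stops as soon as a block begins with a key greater than the one requested.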
Configuration
<property name="source">
  <bean class="org.archive.wayback.resourceindex.ZipNumClusterSearchResultSource">
    <property name="cluster">
      <bean class="org.archive.format.gzip.zipnum.ZipNumCluster">
        <property name="summaryFile" value="/data/cdx/summary.txt" />
        <property name="locFile" value="/data/cdx/loc.txt" />
      </bean>
    </property>
  </bean>
</property>
Generate summary/loc files with the Hadoop job described in the docs.
Diagram
flowchart LR
  A[ZipNum summary] --> B[ZipNumCluster]
  C[ZipNum loc] --> B
  B --> D[OpenWayback search]
Monitor catalina.out for ZipNum errors after configuration to ensure the loc URLs are reachable.
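Reachability can also be checked before OpenWayback starts. The sketch below assumes that each line of the loc file holds a shard name followed by one or more tab-separated locations; verify that against your own file before relying on it. The helper names (`parse_loc_lines`, `is_reachable`, `unreachable_shards`) are hypothetical and not part of OpenWayback.

```python
import os
import urllib.error
import urllib.request


def parse_loc_lines(lines):
    """Map shard name -> list of locations.

    Assumes tab-separated fields: name, then one or more locations.
    Lines without at least one location are skipped.
    """
    shards = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or not fields[0]:
            continue
        shards[fields[0]] = [f for f in fields[1:] if f]
    return shards


def is_reachable(location, timeout=5.0):
    """Check an HTTP(S) URL with a HEAD request, or a local path on disk.

    Other schemes (for example hdfs://) are not handled in this sketch
    and are reported as unreachable.
    """
    if location.startswith(("http://", "https://")):
        request = urllib.request.Request(location, method="HEAD")
        try:
            with urllib.request.urlopen(request, timeout=timeout) as response:
                return 200 <= response.status < 400
        except (urllib.error.URLError, OSError):
            return False
    return os.path.exists(location)


def unreachable_shards(lines, timeout=5.0):
    """Return sorted shard names for which no listed location is reachable."""
    return sorted(
        name
        for name, locations in parse_loc_lines(lines).items()
        if not any(is_reachable(loc, timeout) for loc in locations)
    )
```

With the paths from the configuration above, calling `unreachable_shards(open("/data/cdx/loc.txt"))` would list the shards to investigate. An empty list means every shard has at least one location that responded.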