WARC is the general-purpose format for web archiving, used by archive.org, the Australian Web Archive, and many other well-known Internet archives.
The WARC (Web ARChive) file format offers a convention for concatenating multiple resource records (data objects), each consisting of a set of simple text headers and an arbitrary data block into one long file (с).
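To make the "headers plus data block, concatenated" idea concrete, here is a minimal sketch using only the Python standard library. It assumes the WARC/1.0 layout: a version line, `Name: value` header lines, a blank line, a block of exactly `Content-Length` bytes, then two CRLFs. The helper names `build_record` and `iter_records` are made up for this illustration, and the records are deliberately simplified, so treat it as a way to understand the layout rather than a replacement for a real WARC library.

```python
import io
import uuid
from datetime import datetime, timezone


def build_record(record_type: str, target_uri: str, payload: bytes) -> bytes:
    """Serialize one simplified WARC/1.0 record (hypothetical helper)."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.0",
        f"WARC-Type: {record_type}",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {now}",
        f"WARC-Target-URI: {target_uri}",
        f"Content-Length: {len(payload)}",
    ]
    # Headers end with a blank line; the block is followed by two CRLFs.
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    return head + payload + b"\r\n\r\n"


def iter_records(stream):
    """Yield (headers, block) pairs from an uncompressed WARC byte stream."""
    while True:
        version = stream.readline()
        if not version:
            return  # end of file
        if not version.strip():
            continue  # blank separator lines between records
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # blank line closes the header section
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        # The block is read by length, so it may contain any bytes.
        block = stream.read(int(headers["Content-Length"]))
        yield headers, block


# Two records concatenated into "one long file", then read back.
archive = (
    build_record("resource", "http://example.com/a", b"hello")
    + build_record("resource", "http://example.com/b", b"line1\r\n\r\nline2")
)
for headers, block in iter_records(io.BytesIO(archive)):
    print(headers["WARC-Target-URI"], len(block))
```

Because each block is read by its declared length, a payload can contain blank lines without confusing the reader. For a `.warc.gz` file, opening it with `gzip.open(path, "rb")` should give a stream that the same loop can consume.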
To find WARC files to test the tools on, search Google for:
inurl:warc.gz site:archive․org
Useful tools
1) Warcat:
It lists the files in an archive (the "list" command) and unpacks them (the "extract" command).
2) replayweb.page:
If the WARC file is small, you can view its contents with this extremely simple online tool. You can also deploy ReplayWeb on your own server.
3) metawarc:
This Python script lets you quickly analyze the structure of a WARC file and collect metadata from all the files in the archive.