The story goes like this. ComputerCraft is a mod that adds programming to Minecraft. You write Lua code that gets executed by a bespoke interpreter with access to world APIs, and now you’re writing code instead of having fun. Computers have limited disk space, and my /nix folder is growing out of control, so I need to compress code.

The laziest option would be to use LibDeflate, but its decoder is larger than both the gains from compression and my personal boundary for copying code. So the question becomes: what’s the shortest, simplest, most ratio-efficient compression algorithm?

I initially thought this was a complex question full of tradeoffs, but it turns out it’s very clear-cut. My answer is bzip, even though this algorithm has been critiqued multiple times and has fallen into obscurity since xz and zstd became popular.

First look

I’m compressing a 327 KB file that contains Lua code with occasional English text sprinkled in comments and documentation. This is important: bzip excels at text-like data rather than binary data. However, my results should be reproducible on other codebases, as the percentages seem to be mostly constant within that category.

Let’s compare multiple well-known encoders on this data (sizes in bytes):

uncompressed: 327005
(gzip) zopfli --i100: 75882
zstd -22 --long --ultra: 69018
xz -9: 67940
brotli -Z: 67859 (recompiled without a dictionary)
lzip -9: 67651
bzip2 -9: 63727
bzip3: 61067

The bzip family is a clear winner by a large margin. It even beats lzip, whose docs say “‘lzip -9’ compresses most files more than bzip2” (I guess code is not “most files”). How does it achieve this? Well, it turns out that bzip is not like the others.

Algorithms

You see, all other popular compression algorithms are actually the same thing at the core.
They’re all based on LZ77, a compression scheme that boils down to replacing repetitive text with short links to earlier occurrences.

The main difference is in how literal strings and backreferences are encoded as bit streams, and this is highly non-trivial. Since links can have wildly different offsets, lengths, and frequencies from location to location, a good algorithm needs to predict and succinctly encode these parameters.

But bzip does not use LZ77. bzip uses BWT, which reorders characters in the text to group them by context – so instead of predicting tokens based on similar earlier occurrences, you just need to look at the last few symbols. And, surprisingly, with the BWT order, you don’t even need to store where each symbol came from!

For example, if the word hello is repeated in text multiple times, with LZ77 you’ll need to find and insert new references at each occurrence. But with BWT, all continuations of hell are grouped together, so you’ll likely just have a sequence of many o’s in a row, and similarly with other characters, which simple run-length encoding can deal with.

BWT comes with some downsides. For example, if you concatenate two texts in different English dialects, e.g. using color vs colour, BWT will mix the continuations of colo in an unpredictable order and you’ll have to encode a weird sequence of r’s and u’s, whereas LZ77 would prioritize recent history. You can remedy this by separating input by formats, but for consistent data like code, it works just fine as is.

bzip2 and bzip3 are both based on BWT and differ mostly in how the BWT output is compressed. bzip2 uses a variation on RLE, while bzip3 tries to be more intelligent. I’ll focus on bzip2 for performance reasons, but most conclusions apply to bzip3, too.

Heuristics

There is another interesting thing about BWT. You might have noticed that I’m invoking bzip3 without passing any parameters like -9. That’s because bzip3 doesn’t take them.
In fact, even invoking bzip2 with -9 doesn’t do much.

LZ77-based methods support different compression levels because searching for earlier occurrences is time-consuming, and sometimes it’s preferable to use a literal string instead of a difficult-to-encode reference, so there is some brute force involved. BWT, on the other hand, is entirely deterministic and free of heuristics.

Furthermore, there is no degree of freedom in determining how to efficiently encode the lengths and offsets of backreferences, since there are none. There are run lengths, but that’s about it – it’s a single number, and it’s smaller than typical offsets.

All of that is to say: if you know what the bzip2 pipeline looks like, you can quickly achieve similar compression ratios without fine-tuning and worrying about edge cases. My unoptimized ad-hoc bzip2-like encoder compresses the same input to about 67 KB – better than lzip and with clear avenues for improvement.

Decoders

That covers the compression format, but what about the size of the decoder? Measuring ELFs is useless when targeting Lua, and Lua libraries like LibDeflate don’t optimize code size for self-extracting archives, so at risk of alienating readers with fancy words and girl math, I’ll have to eyeball this for everything but bzip2.

A self-extracting executable doesn’t have to decode every archive – just one. We can skip sanity checks and headers, inline metadata into code, and tune the format for easier decoding. As such, I will only look at the core decompression loops.

gzip, zstd, xz, brotli, and lzip all start by doing LZ77. Evaluating “copy” tokens is a simple loop that won’t take much code. Where they differ is in how those tokens are encoded into bits:

Here’s an example of a Huffman code. Suppose there are 5 tokens with different frequencies: A (60%), B (20%), C (10%), D (5%), E (5%). Write A = 0, B = 10, C = 110, D = 1110, E = 1111. The more frequent a token is, the shorter its encoding.
To decode a bit stream, pull bits one by one until you find an exact match.

gzip does some light pre-processing and then applies Huffman coding, which assigns unambiguous bit sequences to tokens and then concatenates them, optimizing for total length based on the token frequency distribution. Huffman codes can be parsed in ~250 bytes, the bit trie might take ~700 bytes, and the glue should fit in ~500 bytes. Let’s say 1.5 KB in total.

xz encodes tokens bit-by-bit instead of treating them as atoms, which allows the coder to adjust probabilities dynamically, yielding good ratios without encoding any tables at the cost of performance. Bit-by-bit parsing will take more space than usual, but avoiding tables is a huge win, so let’s put it at 1 KB.
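
To make the Huffman example from earlier concrete, here is a minimal sketch of the "pull bits one by one until you find an exact match" idea. It is in Python rather than Lua purely for illustration, and it assumes the exact code table from that example. The names CODES, encode, and decode are hypothetical, and bits are kept as a string of "0"/"1" characters for readability. This is not how gzip's decoder is laid out.

```python
# Code table from the example: the more frequent a token, the shorter its code.
# (Illustrative names; bits are modelled as a "0"/"1" string, not packed bytes.)
CODES = {"A": "0", "B": "10", "C": "110", "D": "1110", "E": "1111"}


def encode(tokens):
    """Concatenate the code of every token into one bit string."""
    return "".join(CODES[t] for t in tokens)


def decode(bits):
    """Pull bits one by one until the pending bits exactly match a code."""
    by_code = {code: token for token, code in CODES.items()}
    out = []
    pending = ""
    for bit in bits:
        pending += bit
        if pending in by_code:
            # No code is a prefix of another, so the first match is the token.
            out.append(by_code[pending])
            pending = ""
    if pending:
        raise ValueError("trailing bits do not form a complete code")
    return "".join(out)


def expected_bits_per_token(freqs):
    """Average code length, weighted by token frequency."""
    return sum(freqs[t] * len(CODES[t]) for t in freqs)
```

With the frequencies from the example, the average comes out to 0.6·1 + 0.2·2 + 0.1·3 + 0.05·4 + 0.05·4 = 1.7 bits per token, against 3 bits for a fixed-width encoding of five tokens. A size-conscious decoder would walk a bit trie instead of doing dictionary lookups, but the control flow is the same.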