Published on: 2014-07-17T11:27:57+00:00
Java's char type is limited to 16 bits, while newer languages may use 24 or 32 bits per character. As a result, Java, unlike some other languages, has no literal syntax for code points above U+FFFF: its Unicode escapes express UTF-16 code units, each of which is only 16 bits wide. It is still possible to represent code point 0x010400 in UTF-16 by writing the surrogate pair "\uD801\uDC00". A bug in the Java compiler or language concerning null code points was also reported. It was recommended that passphrases containing control characters be treated as invalid, and that any character below U+0020 be disallowed for UI compatibility across multiple platforms.

The email messages also contain an advertisement for Black Duck Code Sight, which offers easy access to enterprise code and can search up to 200,000 lines of code. The software powers the world's largest code search on Ohloh, the Black Duck Open Hub. Additionally, there are links to the Bitcoin-development mailing list, which allow users to subscribe or unsubscribe and to access archives of past messages.

During a discussion about the BIP38 implementation, Andreas Schildbach found a problem with the original test case. He updated bitcoinj to fix the issue and proposed a new test vector. Null characters and other oddities caused further problems. Aaron Voisine recommended that, instead of removing control characters from passwords, the spec should declare passwords containing control characters invalid. He also recommended disallowing any character below U+0020 (space) for UI compatibility across multiple platforms. There were concerns that JVM-based wallets might not support Unicode NFC, which led to a suggestion to limit the spec to the subset of Unicode that all popular platforms can support. Filtering ISO control characters was also suggested as a solution.

The conversation revolves around bitcoinj's implementation of BIP38, a Bitcoin Improvement Proposal. Andreas Schildbach changed his string-filtering implementation so that it could handle SMP (Supplementary Multilingual Plane) characters. Aaron Voisine tested Andreas's implementation and found that it behaves exactly as in his modified test case. Andreas notes that control characters still need to be filtered and says he will look into it again. He also provides a fix for bitcoinj and proposes a new test vector. The passphrase consists of the characters GREEK UPSILON WITH HOOK (U+03D2), COMBINING ACUTE ACCENT (U+0301), NULL (U+0000), DESERET CAPITAL LETTER LONG I (U+10400), and PILE OF POO (U+1F4A9). The private key is encrypted with this passphrase to produce a BIP38 key. Aaron recommends disallowing any character below U+0020 (space) and invalidating passwords that contain control characters. He also offers to submit a PR once they figure out why Andreas's passphrase differed from the one he obtained. Mike Hearn suggests limiting the spec to the subset of Unicode that all popular platforms can support, because Java uses 16-bit characters internally and might not handle the astral planes well. Andreas proposes banning or filtering ISO control characters to solve the problem.

The email thread discusses a problem in bitcoinj's test case concerning the encryption of a private key with a passphrase that contains control characters. Andreas Schildbach had implemented only the decoding side of BIP38 and proposed a test vector for it.
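
To make the UTF-16 point concrete, here is a minimal Java sketch (not code from bitcoinj) that builds the proposed test passphrase from the five code points named above and compares its length in UTF-16 code units with its length in code points. The class name PassphraseCodePoints and the helper fromCodePoints are illustrative assumptions, not names from the thread.

```java
final class PassphraseCodePoints {
    // The five code points named in the thread's proposed test vector.
    static final int[] CODE_POINTS = {
        0x03D2,  // GREEK UPSILON WITH HOOK
        0x0301,  // COMBINING ACUTE ACCENT
        0x0000,  // NULL
        0x10400, // DESERET CAPITAL LETTER LONG I (outside the BMP)
        0x1F4A9  // PILE OF POO (outside the BMP)
    };

    // Hypothetical helper: build a String from code points. Java stores
    // each supplementary code point as a surrogate pair of two chars.
    static String fromCodePoints(int[] codePoints) {
        return new String(codePoints, 0, codePoints.length);
    }

    public static void main(String[] args) {
        String passphrase = fromCodePoints(CODE_POINTS);

        // Three BMP characters plus two surrogate pairs.
        System.out.println(passphrase.length()); // prints 7

        // Counting code points instead of chars gives the expected five.
        System.out.println(passphrase.codePointCount(0, passphrase.length())); // prints 5

        // The surrogate pair quoted in the thread decodes to 0x10400.
        System.out.println("\uD801\uDC00".codePointAt(0) == 0x10400); // prints true
    }
}
```

Code that iterates over such a passphrase char by char sees the two halves of each surrogate pair separately, which is why string filtering has to work on code points to handle SMP characters.
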
The discussion covers concerns about the accuracy of the test vector, the use of control characters and emoji in passphrases, and the compatibility of JVM-based wallets with Unicode NFC.

BIP 38 (password-protected private keys) has a test vector that came under scrutiny because of an unusual Unicode character (PILE OF POO) and discrepancies between different implementations. Control characters and whitespace are generally not recommended in passphrases, but emoji such as PILE OF POO are considered acceptable because they are easily accessible on mobile keyboards. The NULL character, by contrast, is not offered on smartphone keyboards, which avoids issues with null-terminated strings. Suggestions include removing the problematic test vector or using a more realistic test string.

bitcoinj recently added an implementation of BIP 38 password-protected private keys. However, there are concerns about the accuracy and practicality of the third test vector: the NFC-normalized version of the input string does not match the output of the Java Unicode normalizer, and different implementations disagree on the result.
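
The policy proposed in the thread (treat passphrases with control characters as invalid, then NFC-normalize) can be sketched in Java as follows. This is an illustration under stated assumptions, not bitcoinj's implementation: the class PassphrasePolicy and its methods are hypothetical names, and the check uses Character.isISOControl, which covers U+0000 to U+001F as well as U+007F to U+009F, so it is slightly broader than "below U+0020".

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

final class PassphrasePolicy {
    // True if any code point is an ISO control character
    // (U+0000..U+001F or U+007F..U+009F). Works on code points, not chars.
    static boolean containsControl(String s) {
        return s.codePoints().anyMatch(Character::isISOControl);
    }

    // Hypothetical helper: reject control characters, then NFC-normalize
    // the passphrase and encode it as UTF-8 for key derivation.
    static byte[] toNfcUtf8(String passphrase) {
        if (containsControl(passphrase)) {
            throw new IllegalArgumentException("passphrase contains a control character");
        }
        String nfc = Normalizer.normalize(passphrase, Normalizer.Form.NFC);
        return nfc.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+03D2 followed by U+0301 composes to the single code point
        // U+03D3 under NFC, which is two bytes in UTF-8.
        System.out.println(toNfcUtf8("\u03D2\u0301").length); // prints 2

        // A passphrase containing NULL is rejected rather than silently filtered.
        try {
            toNfcUtf8("abc\u0000def");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Rejecting instead of stripping follows Aaron Voisine's recommendation: two wallets that strip different sets of characters would derive different keys from the same typed input, whereas a rejected passphrase fails visibly on every platform.
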
Updated on: 2023-08-01T09:44:11.463764+00:00