Sneak peek: A new ASN.1 API for Python
If you’ve ever worked with cryptography, PKI schemes, or low-level networking in Python, you’ve likely encountered ASN.1. ASN.1 undergirds every TLS handshake (via X.509 path validation), provides the serialization layer for core internet protocols like LDAP, SNMP, and 3GPP, and generally operates as the lingua franca of cryptographic primitive and protocol representation.
ASN.1’s critical role is complemented by a colorful security history: implementations of ASN.1’s encoding rules have historically been a rich source of memory corruption and denial-of-service vulnerabilities. Similarly, ASN.1’s presence at the lowest layers of the internet’s protocols makes performance and the absence of parser differentials critical requirements.
Python has multiple excellent ASN.1 implementations (like pyasn1, asn1, and asn1tools), but these generally struggle with those last two requirements: being written purely in Python makes performance a concern, and integrating them into a stack where other ASN.1 parsers are used (e.g., at the X.509 layer) introduces a differential risk.
We’re changing that: with the help of funding from Alpha-Omega, we’re building an ASN.1 API for PyCA Cryptography that addresses three key shortcomings in the Python ecosystem today:
- Performance: This new API will use a pure Rust ASN.1 parser, giving us close-to-native parsing performance.
- Differential reduction: The parser mentioned above is already used by PyCA Cryptography for its X.509 APIs. This will reduce the need for “mix and match” approaches to ASN.1 parsing, which in turn drive differential vulnerabilities.
- Modernization: The new API will expose a declarative dataclasses-style interface replete with type hints, making it familiar, idiomatic, and compatible with type checkers.
For example, an ASN.1 definition like this:
Doohickies ::= SEQUENCE {
    tschotchkes       OCTET STRING,
    baubles           INTEGER,
    knickknacks       UTF8String,
    whatchamacallits  SEQUENCE OF OBJECT IDENTIFIER,
    gizmos            SET OF GeneralizedTime OPTIONAL
}
…will correspond to the following Python code:
from datetime import datetime

from cryptography.hazmat import asn1


@asn1.sequence
class Doohickies:
    tschotchkes: bytes
    baubles: int
    knickknacks: str
    whatchamacallits: list[asn1.ObjectIdentifier]
    gizmos: set[datetime] | None


doohickies = Doohickies.from_der(b"...")
print(doohickies.tschotchkes)
doohickies.to_der()  # b"..."
This work is a logical continuation of our previous work on X.509 path validation, as funded by the Sovereign Tech Fund. It reflects our ongoing commitment to improving the Python ecosystem, particularly in the areas of cryptography and supply chain security.
Please get in touch if you’re interested in learning more, or funding similar work!
Some quick background on ASN.1
ASN.1, or Abstract Syntax Notation One, is an interface description language (IDL). That’s a fancy way of saying that it’s a syntax for describing data structures in a language- and platform-agnostic manner.
Confusingly, ASN.1 is not itself a serialization format. Instead, it defines encoding rules, which in turn define serialization and deserialization of ASN.1 structures in different settings. In practice, ASN.1 is synonymous[1] with the Distinguished Encoding Rules, or DER.
We’ll treat “ASN.1” and “DER” as interchangeable for the purposes of this post. Instead of delving too deeply into the intricacies of both (Let’s Encrypt covers them excellently), we’ll focus on the properties of DER that have kept it relevant for decades:
- DER is a canonical encoding: There’s only one way to encode a given ASN.1 structure in DER. In other words, the encoding of an ASN.1 structure in DER is deterministic and can be round-tripped while preserving bit-for-bit equality.
- DER is relatively compact: DER defines a binary format and, as a consequence of being canonical, forbids non-minimal encodings of integers, booleans, and times.
- DER is a self-describing and self-delimiting encoding: A given DER message can be fully and soundly parsed without prior reference to a schema or format description beyond the encoding rules of DER themselves.
  These properties lend themselves naturally to what web developers would call “progressive enhancement”: an application that consumes DER can decode the specific structures it cares about while skipping the ones it doesn’t, decoding only their length in order to jump ahead to the next one.
- DER supports arbitrary-precision integers: The INTEGER type in DER is functionally unconstrained in size, which makes it suitable for representing the kinds of large numbers that regularly appear in cryptographic settings (e.g., primes).
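The self-delimiting and arbitrary-precision properties above can be sketched in a few lines of standard-library Python. This is a simplified illustration rather than a complete DER parser: read_tlv is a hypothetical helper, it assumes single-byte tags, and the sample bytes were hand-built for this example.

```python
def read_tlv(data: bytes, offset: int = 0) -> tuple[int, bytes, int]:
    """Read one DER element; return (tag, value, offset of the next element)."""
    if offset + 2 > len(data):
        raise ValueError("truncated header")
    tag = data[offset]
    if tag & 0x1F == 0x1F:
        raise ValueError("multi-byte tags are out of scope for this sketch")
    first = data[offset + 1]
    offset += 2
    if first < 0x80:
        length = first  # short form: the length fits in one byte
    else:
        count = first & 0x7F  # long form: `count` length bytes follow
        if count == 0:
            raise ValueError("indefinite lengths are not allowed in DER")
        if offset + count > len(data):
            raise ValueError("truncated length")
        length = int.from_bytes(data[offset:offset + count], "big")
        if length < 0x80 or data[offset] == 0:
            raise ValueError("non-minimal length encoding")
        offset += count
    end = offset + length
    if end > len(data):
        raise ValueError("truncated value")
    return tag, data[offset:end], end


# SEQUENCE { INTEGER 5, UTF8String "hi" }
message = bytes.fromhex("30 07 02 01 05 0c 02 68 69")
_, body, _ = read_tlv(message)

# Walk the SEQUENCE body: each element's length tells us where the next begins,
# so elements we don't care about can be skipped without interpreting them.
fields = []
offset = 0
while offset < len(body):
    tag, value, offset = read_tlv(body, offset)
    fields.append((tag, value))
print(fields)  # [(2, b'\x05'), (12, b'hi')]

# INTEGER contents are big-endian two's complement of any size.
big = bytes.fromhex("02 10 7f" + " ff" * 15)
_, raw, _ = read_tlv(big)
print(int.from_bytes(raw, "big", signed=True) == 2**127 - 1)  # True
```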
Put together, these properties make DER very popular in cryptographic, networking, and telecommunications settings.
More precisely, it’s very popular in the guts of each of these settings: ASN.1 is used to represent the X.509 certificates that secure the world’s TLS traffic, is widely used with PEM-encoded formats, and provides the description and serialization for much of the internet’s lower protocol layers.
Motivating an ASN.1 library for Python
You might reasonably ask: why does Python need this?
After all, most Python developers aren’t touching ASN.1 on a daily basis, and those that do are mostly doing so in predefined ways (such as X.509 certificates). Why does the ecosystem need generic support for ASN.1?
The answer to this is that, for better or worse, there are many situations in which Python developers need to do ASN.1 encoding and decoding outside of the “standard” shapes of X.509 and other well-known formats and protocols.
This can be seen in the Sigstore ecosystem: Sigstore is primarily an ordinary RFC 5280–style PKI, but it also includes some custom X.509 extensions for its own purposes. For example, an excerpt of a Sigstore log entry shows the following extensions:
OIDC Issuer: https://token.actions.githubusercontent.com
Runner Environment: github-hosted
Source Repository URI: https://github.com/pypa/sampleproject
Source Repository Ref: refs/heads/main
Source Repository Owner URI: https://github.com/pypa
If we want to consume these from Python (e.g., for the purposes of verifying a Sigstore certificate against a policy), we need to extract them:
from cryptography import x509
raw_cert = b"""
-----BEGIN CERTIFICATE-----
MIIGoTCCBiigAwIBAgITFai+PDKak1xA1HLq0mskqhDV5zAKBggqhkjOPQQDAzA3
MRUwEwYDVQQKEwxzaWdzdG9yZS5kZXYxHjAcBgNVBAMTFXNpZ3N0b3JlLWludGVy
bWVkaWF0ZTAeFw0yNDExMDYyMjM3MDdaFw0yNDExMDYyMjQ3MDdaMAAwWTATBgcq
hkjOPQIBBggqhkjOPQMBBwNCAARbx1Fse2Ln00On5aFaL+lHNGFYLaqeKDduplZD
PJS+w2PjYfNPL0g/n4sDWEQFZfyIExEWKulZ2GKNzAc0+SmUo4IFSDCCBUQwDgYD
VR0PAQH/BAQDAgeAMBMGA1UdJQQMMAoGCCsGAQUFBwMDMB0GA1UdDgQWBBT/uSEI
XmQzuRkppWXrTKVkfZFJbzAfBgNVHSMEGDAWgBTf0+nPViQRlvmo2OkoVaLGLhhk
PzBhBgNVHREBAf8EVzBVhlNodHRwczovL2dpdGh1Yi5jb20vcHlwYS9zYW1wbGVw
cm9qZWN0Ly5naXRodWIvd29ya2Zsb3dzL3JlbGVhc2UueW1sQHJlZnMvaGVhZHMv
bWFpbjA5BgorBgEEAYO/MAEBBCtodHRwczovL3Rva2VuLmFjdGlvbnMuZ2l0aHVi
dXNlcmNvbnRlbnQuY29tMBIGCisGAQQBg78wAQIEBHB1c2gwNgYKKwYBBAGDvzAB
AwQoNjIxZTQ5NzRjYTI1Y2U1MzE3NzNkZWY1ODZiYTNlZDhlNzM2YjNmYzAVBgor
BgEEAYO/MAEEBAdSZWxlYXNlMCAGCisGAQQBg78wAQUEEnB5cGEvc2FtcGxlcHJv
amVjdDAdBgorBgEEAYO/MAEGBA9yZWZzL2hlYWRzL21haW4wOwYKKwYBBAGDvzAB
CAQtDCtodHRwczovL3Rva2VuLmFjdGlvbnMuZ2l0aHVidXNlcmNvbnRlbnQuY29t
MGMGCisGAQQBg78wAQkEVQxTaHR0cHM6Ly9naXRodWIuY29tL3B5cGEvc2FtcGxl
cHJvamVjdC8uZ2l0aHViL3dvcmtmbG93cy9yZWxlYXNlLnltbEByZWZzL2hlYWRz
L21haW4wOAYKKwYBBAGDvzABCgQqDCg2MjFlNDk3NGNhMjVjZTUzMTc3M2RlZjU4
NmJhM2VkOGU3MzZiM2ZjMB0GCisGAQQBg78wAQsEDwwNZ2l0aHViLWhvc3RlZDA1
BgorBgEEAYO/MAEMBCcMJWh0dHBzOi8vZ2l0aHViLmNvbS9weXBhL3NhbXBsZXBy
b2plY3QwOAYKKwYBBAGDvzABDQQqDCg2MjFlNDk3NGNhMjVjZTUzMTc3M2RlZjU4
NmJhM2VkOGU3MzZiM2ZjMB8GCisGAQQBg78wAQ4EEQwPcmVmcy9oZWFkcy9tYWlu
MBgGCisGAQQBg78wAQ8ECgwIMTQ4OTk1OTYwJwYKKwYBBAGDvzABEAQZDBdodHRw
czovL2dpdGh1Yi5jb20vcHlwYTAWBgorBgEEAYO/MAERBAgMBjY0NzAyNTBjBgor
BgEEAYO/MAESBFUMU2h0dHBzOi8vZ2l0aHViLmNvbS9weXBhL3NhbXBsZXByb2pl
Y3QvLmdpdGh1Yi93b3JrZmxvd3MvcmVsZWFzZS55bWxAcmVmcy9oZWFkcy9tYWlu
MDgGCisGAQQBg78wARMEKgwoNjIxZTQ5NzRjYTI1Y2U1MzE3NzNkZWY1ODZiYTNl
ZDhlNzM2YjNmYzAUBgorBgEEAYO/MAEUBAYMBHB1c2gwWQYKKwYBBAGDvzABFQRL
DElodHRwczovL2dpdGh1Yi5jb20vcHlwYS9zYW1wbGVwcm9qZWN0L2FjdGlvbnMv
cnVucy8xMTcxMzAzODk4MS9hdHRlbXB0cy8xMBYGCisGAQQBg78wARYECAwGcHVi
bGljMIGKBgorBgEEAdZ5AgQCBHwEegB4AHYA3T0wasbHETJjGR4cmWc3AqJKXrje
PK3/h4pygC8p7o4AAAGTA5/X5AAABAMARzBFAiA6nYK0GxqVzJutrjrYA1bAIKHU
jGrsHMLrOJTTEUiERAIhAJZotATnSwlKt7C3Zwhx3fcSrhGfOakTlM2w+8qmltcj
MAoGCCqGSM49BAMDA2cAMGQCMB+ilsPgy4ynUG9GtqDEBqW8+ZqjX6LpuxQqjCr7
s4ytyt2ppFdgjrGrG1DY4nSZtQIwblrgq9t9izAMTkJeqhQBs2OUiyIJZipceD5v
AAE/Nfgd/9uK0MZAHFsLgalqOBl8
-----END CERTIFICATE-----
"""
cert = x509.load_pem_x509_certificate(raw_cert)
# 1.3.6.1.4.1.57264.1.16 corresponds to Source Repository Owner URI above
ext = cert.extensions.get_extension_for_oid(x509.ObjectIdentifier("1.3.6.1.4.1.57264.1.16")).value
ext.value # => b'\x0c\x17https://github.com/pypa'
As we can see, the X.509 extension’s value is itself DER encoded, and PyCA Cryptography’s APIs (rightfully) leave it up to us to interpret it[2].
So, we need some kind of DER parser. Luckily, Python is a mature ecosystem, and we can avail ourselves of pyasn1:
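from pyasn1.codec.der.decoder import decode
from pyasn1.type.char import UTF8String

# decode() returns a (value, remaining bytes) tuple; the expected type is passed as an instance
decoded, _rest = decode(ext.value, asn1Spec=UTF8String())
ext_value = str(decoded)
ext_value # => 'https://github.com/pypa'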
Now we have our inner extension value, and we can get on with our lives.
But why a new library?
But wait: if we have pyasn1, why do we need a new ASN.1 library?
The answer to this is threefold, and is not a knock against pyasn1 (which is an excellent library that performs its role admirably):
- Performance: Python is not a fast language, and pyasn1 is written in pure Python. The Python ecosystem has historically compensated for that by putting performance-sensitive code in native extensions: at first C, but now increasingly Rust. By leveraging rust-asn1, we can approach the performance of native code without leaving the comforts of Python.
- Differential reduction: The ASN.1 ecosystem is notoriously heterogeneous, and implementations of ASN.1 vary widely in their conformance to the strict requirements of DER.
  In particular, many implementations have found it tempting to apply Postel’s Law to the parsing of incoming “DER” data, allowing improperly canonicalized or outright malformed data so long as the user’s intent can be inferred. This has had a deleterious effect on both protocol evolution and security: protocols struggle to evolve under the pressure of unspecified behavior, and parser differentials are a consistent source of major security incidents.
  For this reason, reducing the number of independent parsers for a single format in a given codebase is generally a sound engineering choice. PyCA Cryptography is already built up around rust-asn1, so it makes sense to use the exact same parsing routines in a new ASN.1 library.
- Modernization: dataclasses and dataclass-style declarative APIs have taken the Python ecosystem by storm, and for good reason: they’re uniform, integrate cleanly with type checkers[3], and define types as code rather than as data. pyasn1 has a fantastic declarative API, but that API predates the dataclass concept and therefore needs to mix code and data to define its types. Modernizing this API would be at least as difficult (in our estimation) as creating a new one from rust-asn1, but without the performance and differential reduction benefits.
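The kind of parser differential described above can be shown with a single length field. The sketch below is a deliberately small illustration, not code from pyasn1 or rust-asn1: decode_length is a hypothetical helper whose strict flag toggles between DER's rules and a more forgiving, BER-like reading of the same bytes.

```python
def decode_length(octets: bytes, *, strict: bool) -> tuple[int, int]:
    """Return (content length, number of length octets consumed)."""
    first = octets[0]
    if first < 0x80:
        return first, 1  # short form: the only legal DER form below 128
    count = first & 0x7F
    if count == 0:
        raise ValueError("indefinite length (BER only)")
    if len(octets) < 1 + count:
        raise ValueError("truncated length")
    length = int.from_bytes(octets[1:1 + count], "big")
    if strict and (length < 0x80 or octets[1] == 0):
        raise ValueError("non-minimal length is not valid DER")
    return length, 1 + count


canonical = bytes.fromhex("0c 02 68 69")  # UTF8String "hi", short-form length
padded = bytes.fromhex("0c 81 02 68 69")  # same value, long-form length: fine in BER, invalid DER

# A lenient parser happily reads the padded form...
lenient = decode_length(padded[1:], strict=False)

# ...while a strict DER parser rejects it outright.
try:
    decode_length(padded[1:], strict=True)
    strict_accepts = True
except ValueError:
    strict_accepts = False

print(lenient, strict_accepts)  # (2, 2) False
```

Two components that disagree like this about whether the same bytes are acceptable are exactly what an attacker looks for, which is why sharing one parser across a codebase removes a whole class of problems.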
Stay tuned for more
This is just a sneak peek; watch this space for updates!
We’re still early in the development process for this work; our plan is as follows:
- Build an initial version with support for @asn1.sequence and @asn1.enum as the main decorators, along with support for ASN.1’s basic types and modifiers (e.g., OPTIONAL, DEFAULT, IMPLICIT, and EXPLICIT).
- Integrate this version into PyCA Cryptography, tentatively as cryptography.asn1 or cryptography.hazmat.asn1 or similar, then work on deduplicating types where possible. For example, the cryptography.x509.ObjectIdentifier type is already present and should be shared or reused across both APIs.
- Get it released with a major version of PyCA Cryptography!
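For readers unfamiliar with the modifiers named in the first step, the byte-level difference between IMPLICIT and EXPLICIT tagging, and the effect of OPTIONAL, can be sketched with hand-built DER. This illustrates only the encodings themselves and says nothing about how the planned API will spell these modifiers. The tlv helper is hypothetical and handles short-form lengths only.

```python
def tlv(tag: int, value: bytes) -> bytes:
    """Build tag + length + value (short-form lengths only)."""
    if len(value) >= 0x80:
        raise ValueError("this sketch only handles short-form lengths")
    return bytes([tag, len(value)]) + value


INTEGER = 0x02
SEQUENCE = 0x30
CONTEXT_0_PRIMITIVE = 0x80    # [0], context-specific class, primitive
CONTEXT_0_CONSTRUCTED = 0xA0  # [0], context-specific class, constructed

plain = tlv(INTEGER, b"\x05")                 # INTEGER 5
implicit = tlv(CONTEXT_0_PRIMITIVE, b"\x05")  # [0] IMPLICIT INTEGER: the tag is replaced
explicit = tlv(CONTEXT_0_CONSTRUCTED, plain)  # [0] EXPLICIT INTEGER: the tag wraps the original element
print(plain.hex(), implicit.hex(), explicit.hex())  # 020105 800105 a003020105

# OPTIONAL: an absent field simply contributes no bytes to the enclosing SEQUENCE.
with_field = tlv(SEQUENCE, plain + explicit)
without_field = tlv(SEQUENCE, plain)
print(with_field.hex(), without_field.hex())  # 3008020105a003020105 3003020105
```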
We’d like to thank Alpha-Omega for funding this work, as well as the PyCA Cryptography maintainers for their support and design review.
[1] ASN.1 is also unfortunately widely used with the Basic Encoding Rules, or BER. Unlike DER, BER is not a canonical encoding and has historically been a source of memory corruption and interoperability issues in PKI ecosystems.
[2] The reason for this is subtle: X.509 itself says that an extension’s value is just an OCTET STRING (i.e., raw bytes), while RFC 5280 says that the OCTET STRING should itself contain the DER encoding of an ASN.1 value corresponding to the extension’s OID. See RFC 5280 4.1 for the exact language.
[3] Thanks in no small part to @typing.dataclass_transform, as introduced in PEP 681.