Content Type Module

Working Examples

See content-type examples for complete working examples.

The Tarpit Content Type module provides comprehensive content parsing, decompression, and deserialization capabilities for handling various MIME types and encodings in web applications.

Features

MIME Type Processing: Parse and handle various content types
Content Decompression: Support for gzip, deflate, and other compression formats
Built-in Deserializers: JSON, form data, and text content parsing
Character Encoding: Support for multiple character encodings (UTF-8, etc.)
URL Encoding: Comprehensive URL encoding/decoding utilities
Streaming Support: Handle both Buffer and Readable stream inputs
Extensible Architecture: Easy to add custom deserializers and decompressors

Quick Start

import { Platform } from '@tarpit/core'
import { ContentTypeModule, ContentReaderService } from '@tarpit/content-type'

// Basic usage in a service
@TpService()
class MyService {
    
    constructor(private content_reader: ContentReaderService) {}
    
    async process_request(body: Buffer, headers: { [key: string]: string }) {
        const content = await this.content_reader.read(body, {
            content_type: headers['content-type'] || 'text/plain',
            content_encoding: headers['content-encoding'] || 'identity'
        })
        
        console.log('Content type:', content.type)
        console.log('Charset:', content.charset)
        console.log('Parsed data:', content.data)
        
        return content
    }
}

const platform = new Platform({})
    .import(ContentTypeModule)
    .import(MyService)

Core Concepts

1. MIMEContent Structure

The module works with a standardized MIMEContent structure:

interface MIMEContent<T> {
    type: string | undefined           // MIME type (e.g., 'application/json')
    charset: string | undefined        // Character encoding (e.g., 'utf-8')
    parameters: { [prop: string]: string }  // Additional content-type parameters
    raw: Buffer                        // Raw binary content
    text?: string                      // Decoded text content
    data?: T                          // Deserialized structured data
}

2. Processing Pipeline

Content processing follows a three-stage pipeline:

Decompression: Handle compressed content (gzip, deflate)
Text Decoding: Convert binary to text using appropriate charset
Deserialization: Parse structured data based on content type

3. Content Reader Service

The main service that orchestrates the entire content processing pipeline:

@TpService({ inject_root: true })
class ContentReaderService {
    async read(raw: Readable | Buffer, options: ParseContentOptions): Promise<MIMEContent<any>>
    async deserialize(content: MIMEContent<any>): Promise<any>
}

Supported Content Types

JSON Content

// Content-Type: application/json
const json_content = await content_reader.read(buffer, {
    content_type: 'application/json',
    content_encoding: 'identity'
})

console.log(json_content.data) // Parsed JavaScript object

Form Data

// Content-Type: application/x-www-form-urlencoded
const form_content = await content_reader.read(buffer, {
    content_type: 'application/x-www-form-urlencoded',
    content_encoding: 'identity'
})

console.log(form_content.data) // Parsed form fields object

Text Content

// Content-Type: text/plain; charset=utf-8
const text_content = await content_reader.read(buffer, {
    content_type: 'text/plain; charset=utf-8',
    content_encoding: 'identity'
})

console.log(text_content.text) // Decoded text string

Built-in Deserializers

JSON Deserializer

Handles JSON content with error tolerance:

import { json_deserialize } from '@tarpit/content-type'

const content: MIMEContent<any> = {
    type: 'application/json',
    charset: 'utf-8',
    parameters: {},
    raw: Buffer.from('{"name": "John", "age": 30}')
}

const result = json_deserialize(content)
console.log(result) // { name: "John", age: 30 }

Form Deserializer

Parses URL-encoded form data:

import { form_deserialize } from '@tarpit/content-type'

const content: MIMEContent<any> = {
    type: 'application/x-www-form-urlencoded',
    charset: 'utf-8',
    parameters: {},
    raw: Buffer.from('name=John&age=30&hobbies=reading&hobbies=coding')
}

const result = form_deserialize(content)
console.log(result) // { name: "John", age: "30", hobbies: ["reading", "coding"] }

Text Deserializer

Simple text decoding with charset support:

import { text_deserialize } from '@tarpit/content-type'

const content: MIMEContent<any> = {
    type: 'text/plain',
    charset: 'utf-8',
    parameters: {},
    raw: Buffer.from('Hello, 世界!')
}

const result = text_deserialize(content)
console.log(content.text) // "Hello, 世界!"

URL Encoding Utilities

The URLEncoding namespace provides comprehensive URL encoding/decoding:

Parsing Query Strings

import { URLEncoding } from '@tarpit/content-type'

// Simple parsing
const params = URLEncoding.parse('name=John&age=30')
console.log(params) // { name: "John", age: "30" }

// Array values
const multi_params = URLEncoding.parse('tags=js&tags=ts&tags=node')
console.log(multi_params) // { tags: ["js", "ts", "node"] }

// Complex nested structures
const nested = URLEncoding.parse('user[name]=John&user[profile][age]=30')
console.log(nested) // { user: { name: "John", profile: { age: "30" } } }

URL Encoding

// Simple encoding
const encoded = URLEncoding.stringify({ name: 'John Doe', city: 'New York' })
console.log(encoded) // "name=John%20Doe&city=New%20York"

// Array encoding
const arrays = URLEncoding.stringify({ tags: ['js', 'ts', 'node'] })
console.log(arrays) // "tags=js&tags=ts&tags=node"

// Nested object encoding
const nested = URLEncoding.stringify({ 
    user: { 
        name: 'John', 
        profile: { age: 30 } 
    } 
})
console.log(nested) // "user[name]=John&user[profile][age]=30"

Content Decompression

Automatic Decompression

The module automatically handles compressed content based on Content-Encoding headers:

// Handles gzip compressed content
const gzipped_content = await content_reader.read(gzipped_buffer, {
    content_type: 'application/json',
    content_encoding: 'gzip'
})

// Handles deflate compressed content
const deflated_content = await content_reader.read(deflated_buffer, {
    content_type: 'application/json',
    content_encoding: 'deflate'
})

Supported Compression Formats

gzip: Most common compression format
deflate: Standard deflate compression
identity: No compression (default)
br: Brotli compression (if available)

Character Encoding Support

UTF-8 (Default)

const utf8_content = await content_reader.read(buffer, {
    content_type: 'text/plain; charset=utf-8',
    content_encoding: 'identity'
})

Other Encodings

// Latin-1 encoding
const latin1_content = await content_reader.read(buffer, {
    content_type: 'text/plain; charset=iso-8859-1',
    content_encoding: 'identity'
})

// ASCII encoding
const ascii_content = await content_reader.read(buffer, {
    content_type: 'text/plain; charset=us-ascii',
    content_encoding: 'identity'
})

Custom Deserializers

Creating Custom Deserializers

import { MIMEContent, Deserializer } from '@tarpit/content-type'

// Custom XML deserializer
class XmlDeserializer implements Deserializer {
    deserialize(content: MIMEContent): any {
        if (content.type === 'application/xml' || content.type === 'text/xml') {
            // Parse XML content
            return this.parse_xml(content.text || '')
        }
        return undefined
    }
    
    private parse_xml(xml_text: string): any {
        // XML parsing logic
        return { parsed: 'xml' }
    }
}

// Register custom deserializer
@TpService()
class CustomContentService {
    constructor(private content_reader: ContentReaderService) {
        content_reader.register_deserializer(new XmlDeserializer())
    }
}

Integration with HTTP Module

Automatic Content Processing

When using with the HTTP module, content is automatically processed:

import { TpRouter, Post, JsonBody, FormBody, TextBody } from '@tarpit/http'

@TpRouter('/api')
class ContentHandler {
    
    @Post('json')
    async handle_json(body: JsonBody<{ name: string, age: number }>) {
        // JSON content automatically parsed
        const name = body.ensure('name', Jtl.string)
        const age = body.ensure('age', Jtl.integer)
        
        return { received: { name, age } }
    }
    
    @Post('form')
    async handle_form(body: FormBody) {
        // Form data automatically parsed
        const name = body.ensure('name', Jtl.string)
        const email = body.ensure('email', Jtl.string.email())
        
        return { received: { name, email } }
    }
    
    @Post('text')
    async handle_text(body: TextBody) {
        // Text content automatically decoded
        const content = body.content
        
        return { 
            length: content.length,
            preview: content.substring(0, 100) 
        }
    }
}

Error Handling

Content Processing Errors

try {
    const content = await content_reader.read(buffer, {
        content_type: 'application/json',
        content_encoding: 'gzip'
    })
} catch (error) {
    if (error instanceof DecompressionError) {
        console.error('Failed to decompress content:', error.message)
    } else if (error instanceof DeserializationError) {
        console.error('Failed to parse content:', error.message)
    } else if (error instanceof EncodingError) {
        console.error('Failed to decode text:', error.message)
    }
}

Graceful Fallbacks

const content = await content_reader.read(buffer, {
    content_type: 'application/json',
    content_encoding: 'identity',
    fallback_to_text: true  // Fall back to text if JSON parsing fails
})

if (content.data) {
    // Successfully parsed as JSON
    console.log('JSON data:', content.data)
} else if (content.text) {
    // Fell back to text
    console.log('Text content:', content.text)
}

Best Practices

1. Handle Multiple Content Types

// ✅ Good - Support multiple content types
@TpRouter('/api')
class FlexibleHandler {
    @Post('data')
    async handle_data(req: TpRequest) {
        const content_type = req.headers.get('content-type') || 'text/plain'
        
        if (content_type.includes('application/json')) {
            const json_body = req.get_json_body()
            return this.process_json(json_body.data)
        } else if (content_type.includes('application/x-www-form-urlencoded')) {
            const form_body = req.get_form_body()
            return this.process_form(form_body.data)
        } else {
            const text_body = req.get_text_body()
            return this.process_text(text_body.content)
        }
    }
}

2. Validate Content Size

// ✅ Good - Check content size before processing
const MAX_CONTENT_SIZE = 10 * 1024 * 1024 // 10MB

if (buffer.length > MAX_CONTENT_SIZE) {
    throw new Error('Content too large')
}

const content = await content_reader.read(buffer, options)

3. Use Appropriate Encoding

// ✅ Good - Respect charset parameters
const content_type_header = req.headers.get('content-type')
const parsed_type = parse_content_type(content_type_header)

const content = await content_reader.read(buffer, {
    content_type: parsed_type.type,
    content_encoding: req.headers.get('content-encoding') || 'identity'
})

4. Handle Compression Efficiently

// ✅ Good - Use compression for large responses
@TpRouter('/api')
class DataHandler {
    @Get('large-dataset')
    async get_large_data(res: TpResponse) {
        const data = await this.get_large_dataset()
        
        // Enable compression for large responses
        res.headers.set('Content-Encoding', 'gzip')
        res.headers.set('Content-Type', 'application/json')
        
        return data
    }
}

Performance Considerations

Memory Management

Streaming: Use streaming for large content when possible
Buffer Reuse: Avoid unnecessary buffer copying
Garbage Collection: Release references to large buffers promptly

Compression

Selective Compression: Only compress content above certain thresholds
Compression Levels: Balance compression ratio vs. CPU usage
Client Support: Check Accept-Encoding headers before compressing

Character Encoding

UTF-8 Default: Use UTF-8 as default for better compatibility
Encoding Detection: Implement fallback encoding detection for ambiguous content
BOM Handling: Properly handle Byte Order Marks in text content

Next Steps

Explore the examples repository for real-world usage patterns
Learn about creating custom deserializers for specialized content types
Understand performance optimization techniques for content processing
Integrate with other Tarpit modules for complete web application solutions

Features​

Quick Start​

Core Concepts​

1. MIMEContent Structure​

2. Processing Pipeline​

3. Content Reader Service​

Supported Content Types​

JSON Content​

Form Data​

Text Content​

Built-in Deserializers​

JSON Deserializer​

Form Deserializer​

Text Deserializer​

URL Encoding Utilities​

Parsing Query Strings​

URL Encoding​

Content Decompression​

Automatic Decompression​

Supported Compression Formats​

Character Encoding Support​

UTF-8 (Default)​

Other Encodings​

Custom Deserializers​

Creating Custom Deserializers​

Integration with HTTP Module​

Automatic Content Processing​

Error Handling​

Content Processing Errors​

Graceful Fallbacks​

Best Practices​

1. Handle Multiple Content Types​

2. Validate Content Size​

3. Use Appropriate Encoding​

4. Handle Compression Efficiently​

Performance Considerations​

Memory Management​

Compression​

Character Encoding​

Next Steps​