📘 Disclaimer: This book is published under a Creative Commons license and is freely available via GitHub.

Text Processing in Python by David Mertz


✒️ By David Mertz



Ever wondered how Python can turn messy text into structured data? ‘Text Processing in Python’ by David Mertz is your answer. This book guides you through every step, from basic string handling to advanced parsing techniques. Whether you’re a total newbie or a seasoned coder, you’ll find clear explanations, practical code, and real-life examples. It’s a must-have if you work with data or just want to make sense of the endless flow of text online.


Some book contents

Preface / Introduction

  • What Is Text Processing?
  • The Philosophy of Text Processing
  • What You’ll Need to Use This Book
  • Conventions Used in This Book
  • A Word on Source Code Examples
  • External Resources
    • General Resources
    • Books
    • Software Directories
    • Specific Software

1. PYTHON BASICS

1.1 Techniques and Patterns

  • Utilizing Higher-Order Functions in Text Processing
  • Exercise: More on combinatorial functions
  • Specializing Python Datatypes
  • Base Classes for Datatypes
  • Exercise: Filling out the forms (or deciding not to)
  • Problem: Working with lines from a large file

1.2 Standard Modules

  • Working with the Python Interpreter
  • Working with the Local Filesystem
  • Running External Commands and Accessing OS Features
  • Special Data Values and Formats

1.3 Other Modules in the Standard Library

  • Serializing and Storing Python Objects
  • Platform-Specific Operations
  • Working with Multimedia Formats
  • Miscellaneous Other Modules

2. BASIC STRING OPERATIONS

2.1 Some Common Tasks

  • Problem: Quickly sorting lines on custom criteria
  • Problem: Reformatting paragraphs of text
  • Problem: Column statistics for delimited or flat-record files
  • Problem: Counting characters, words, lines, and paragraphs
  • Problem: Transmitting binary data as ASCII
  • Problem: Creating word or letter histograms
  • Problem: Reading a file backwards by record, line, or paragraph

2.2 Standard Modules

  • Basic String Transformations
  • Strings as Files, and Files as Strings
  • Converting Between Binary and ASCII
  • Cryptography
  • Compression
  • Unicode

2.3 Solving Problems

  • Exercise: Many ways to take out the garbage
  • Exercise: Making sure things are what they should be
  • Exercise: Finding needles in haystacks (full-text indexing)

3. REGULAR EXPRESSIONS

3.1 A Regular Expression Tutorial

  • Just What Is a Regular Expression, Anyway?
  • Matching Patterns in Text: The Basics
  • Matching Patterns in Text: Intermediate
  • Advanced Regular Expression Extensions

3.2 Some Common Tasks

  • Problem: Making a text block flush left
  • Problem: Summarizing command-line option documentation
  • Problem: Detecting duplicate words
  • Problem: Checking for server errors
  • Problem: Reading lines with continuation characters
  • Problem: Identifying URLs and email addresses in texts
  • Problem: Pretty-printing numbers

3.3 Standard Modules

  • Versions and Optimizations
  • Simple Pattern Matching
  • Regular Expression Modules

Book Description

Text Processing in Python by David Mertz is more than just another programming guide. It’s a hands-on journey through the world of text manipulation using Python. The book covers everything from simple string operations to complex parsing and regular expressions. If you’ve ever wanted to clean up messy data or automate boring tasks, this book gives you the tools and confidence to dive right in.

Book Overview

This book isn’t just for coders; it’s for anyone who deals with text. The chapters start with the basics: strings, files, and encoding. As you move along, you’ll discover advanced topics like tokenization and parsing XML or HTML. The author’s style is approachable and never dry. He sprinkles in humor and real-world examples, making even tricky concepts feel doable. For those who love hands-on learning, every chapter has code snippets you can run and tweak right away.

If you’re curious about how Python handles massive datasets or want to peek into data-intensive text processing using MapReduce, this book is a great starting point before jumping into more complex tools.

Why Read This Book

Let’s face it: text data is everywhere. From tweets to logs, emails to web pages, it’s endless! This book helps you make sense of it all. I love how the author breaks down tough concepts with humor and clear language. You’ll never feel lost or overwhelmed. Plus, you’ll gain practical skills you can use at work or on personal projects. Want to impress your boss by automating that weekly report? Or build your own search engine? This book gives you the foundation.

Who This Book Is For

Are you new to Python? Or maybe you’re a seasoned developer tired of copy-pasting regex from Stack Overflow? Either way, “Text Processing in Python” has something for you. Beginners will appreciate the step-by-step approach and friendly tone. Experienced programmers will find clever techniques and best practices that make life easier. If you’ve ever struggled with data cleaning or extracting information from unstructured text, this is your guide.

What You Will Learn

  • How to handle strings and files like a pro
  • The secrets behind regular expressions (they’re not as scary as they look!)
  • Ways to parse HTML, XML, and other markup languages
  • Efficient techniques for searching, replacing, and transforming text
  • Practical tips for tokenizing words and sentences
  • How to automate tedious tasks involving large amounts of text
  • Real-world applications in web scraping and log analysis
  • An introduction to scaling up text processing tasks with distributed computing frameworks

Book Details


Length: 505 Pages

Language: English

PDF Size: 1.5 MB
