July 10, 2019 05:01 pm GMT

Fixing Python Markdown Code Blocks with Python!

A quick note: all of the raw markdown code blocks in the code snippets here are shown with single quotes rather than actual backticks, because all the nested backticks play havoc with Dev.to's markdown renderer. If you're a copy-paster, you'll have to change those back to backticks in the Python code.

I'm trying out Ulysses for writing, and so far I'm really liking it. But yesterday, I went to export one of my posts to markdown to get ready to post on my blog, and I found a slight hiccup. Ulysses formats all code blocks with tabs instead of spaces. Now, there's no way that I'm going to get into that whole thing, but I will at least say that PEP8 tells us that the standard for Python is 4 spaces. What kind of self-respecting technical blogger would I be if I posted code samples with (shudder) tabs in my Python code?

So I asked them!

They responded really quickly and politely told me "not at this time." Which is fine.

I immediately began researching to figure out how I was going to solve the problem myself. Could I look into customizing Markdown formats? Tweaking the export process? But before I could start spiraling down that rabbit hole, it hit me: it's just text!

I'm a programmer! I have the power of the universe at my fingertips! (On my machine, at least.) So I got to work on a script for a platform that was designed to work with, modify, and tweak streams of text: the command line.

Bash One-Liner

My first thought was that it's a real simple substitution. I can do this in a single command!

    $ sed $'s/\t/    /g' example.md
    Hello this is text
    '''python
    def thing():
        print("Spaces!")
    '''
    There should be four spaces there.

And honestly, that's probably good enough. But I was on a roll, and I wanted a little more fine-grained control.
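For comparison, the same blanket substitution can be sketched in Python. The function name `replace_all_tabs` and the sample text are made up for illustration. The point is only that a global replace touches every tab, including the ones outside code blocks, which is where finer control starts to look appealing.

```python
def replace_all_tabs(text, number=4):
    """Replace every tab with `number` spaces, like the sed one-liner."""
    return text.replace("\t", " " * number)


# Hypothetical sample: one tab in ordinary prose, one tab inside a code block.
sample = "name\tvalue\n'''python\ndef thing():\n\tprint(\"Spaces!\")\n'''\n"
converted = replace_all_tabs(sample)
print(converted, end="")
```

Both tabs are converted here, even though only the one inside the code block was the problem.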

Let's do it in Python

I'll show you the whole thing for those that are just here for the copy/paste, and then I'll step through the important bits and how they work. Essentially, we loop through each line of the input stream, and if we're in a Python code block and there's a tab, we replace the tabs with spaces. I pull in argparse and fileinput from the standard library to put a little polish on the user experience. Here's what I came up with:

    #!/usr/bin/env python3
    import argparse
    import fileinput

    parser = argparse.ArgumentParser(description="Convert tabs in markdown files to spaces.")
    parser.add_argument("-a", "--all", action="store_true", help="Convert all tabs to spaces")
    parser.add_argument("-n", "--number", type=int, default=4, help="Number of spaces to use.")
    parser.add_argument('files', metavar='FILE', nargs='*', help="files to read, if empty, stdin is used")
    args = parser.parse_args()

    if args.all:
        start_tag = "'''"
    else:
        start_tag = "'''python"

    in_code_block = False
    for line in fileinput.input(files=args.files):
        if line.startswith(start_tag) and not in_code_block:
            in_code_block = True
        elif line.startswith("'''") and in_code_block:
            in_code_block = False
        elif in_code_block:
            line = line.expandtabs(args.number)
        print(line, end="")

The meat of the business logic is here:

    in_code_block = False
    for line in fileinput.input(files=args.files):
        if line.startswith(start_tag) and not in_code_block:
            in_code_block = True
        elif line.startswith("'''") and in_code_block:
            in_code_block = False
        elif in_code_block:
            line = line.expandtabs(args.number)
        print(line, end="")

Loop through each line, keeping track of whether we're in a code block or not (more on the specifics of that in a minute). If we're in a code block, expand the tabs! Finally, output the new version of the line.

But what's all that other stuff? Even in the main code, there's a reference to fileinput. What the heck?
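To make that loop easier to poke at, here is a minimal sketch that wraps the same logic in a generator. The function name `convert_tabs` and the sample lines are assumptions for illustration, not part of the script. The fence markers are single quotes, as everywhere else in this post.

```python
def convert_tabs(lines, start_tag="'''python", number=4):
    """Yield lines, expanding tabs only inside matching code blocks."""
    in_code_block = False
    for line in lines:
        if line.startswith(start_tag) and not in_code_block:
            in_code_block = True       # opening fence: start converting
        elif line.startswith("'''") and in_code_block:
            in_code_block = False      # closing fence: stop converting
        elif in_code_block:
            line = line.expandtabs(number)
        yield line


# Hypothetical input: a tab in prose, then a tab inside a Python block.
sample = [
    "Tab\toutside a block\n",
    "'''python\n",
    "def thing():\n",
    "\tprint('Spaces!')\n",
    "'''\n",
]
result = list(convert_tabs(sample))
print("".join(result), end="")
```

The tab in the prose line survives, while the one inside the Python block becomes four spaces. Note that expandtabs pads to the next tab stop rather than inserting a fixed number of spaces; for the leading tabs used in indentation, the two give the same result.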

Using fileinput

fileinput is a neat (and frankly underrated) module in Python's standard library that allows scripts to load input from one or many file arguments, and even STDIN, super ergonomically. The most common use case is in the docs for it, and it's almost comically short:

    import fileinput

    for line in fileinput.input():
        process(line)

With these lines of code, you can call your script with as many filenames as you want, and Python will string their contents together into one stream of text. For example, if you have a script called capitalize.py that prints the capitalized version of all the text it receives, you could run it like this:

    $ python3 capitalize.py README.md hello.txt banana.rb
    # THIS IS THE TITLE OF MY README
    CHECK OUT THE README CONTENTS.
    SO MANY CONTENTS.
    NOW HELLO.TXT IS HERE
    YOOOOOOOO
    DEF BANANA
        PUTS 'A MAN, A PLAN, CANAL BANANAMA'
    END
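The post doesn't include capitalize.py itself, so the following is only a guess at what such a script could look like. Going by the sample output, "capitalized" means upper-cased. To keep the sketch self-contained, it writes two throwaway files (hypothetical names and contents) and passes them through the `files` parameter instead of reading the real command line.

```python
import fileinput
import os
import tempfile


def shout(lines):
    """Yield each incoming line upper-cased."""
    for line in lines:
        yield line.upper()


# Create two temporary files to stand in for the command-line arguments.
tmpdir = tempfile.mkdtemp()
paths = []
for name, text in [("hello.txt", "now hello.txt is here\n"),
                   ("banana.rb", "def banana\nend\n")]:
    path = os.path.join(tmpdir, name)
    with open(path, "w") as handle:
        handle.write(text)
    paths.append(path)

# fileinput strings the files together into a single stream of lines.
with fileinput.input(files=paths) as stream:
    output = list(shout(stream))
print("".join(output), end="")
```

In a real script, everything after `shout` would presumably shrink to a plain loop over `fileinput.input()` that prints each upper-cased line.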

"But Ryan, in your script it looks different! You're not using it the same way!" That's right. I'm combining it with another CLI power module:

Parsing Arguments with argparse

argparse is the standard library way of handling command line arguments, flags, and options, and of providing a little bit of a user interface. Its particular syntax is one that I always have to look up, but it's lightweight, works well, and does what I want. Here's the relevant code. You'll see how it starts to tie into the fileinput section above as well.

    parser = argparse.ArgumentParser(description="Convert tabs in markdown files to spaces.")
    parser.add_argument("-a", "--all", action="store_true", help="Convert all tabs to spaces")
    parser.add_argument("-n", "--number", type=int, default=4, help="Number of spaces to use.")
    parser.add_argument('files', metavar='FILE', nargs='*', help="files to read, if empty, stdin is used")
    args = parser.parse_args()

    if args.all:
        start_tag = "'''"
    else:
        start_tag = "'''python"

We go in three stages:

  1. First we create the ArgumentParser. It will manage all of our parsing for us.
  2. Then we add arguments and specify their behavior. In this case, I added an --all flag to make it so we could eradicate all tabs and restore order to all of our code blocks, and a --number flag to tell it how many spaces to make each tab. This might be useful if I've got Ruby or JS examples where I prefer 2 spaces. Lastly, I add a *args-flavor positional argument for all of the filenames the user wants to provide.
  3. Finally, now that everything is specified, we parse and process the args. Depending on the type and action we specify for each input, we can expect different behaviors.
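To see what those three stages produce, here is a small sketch that feeds the same parser a couple of made-up argument lists. Passing a list to parse_args() stands in for the real command line, and the filenames post.md and notes.md are hypothetical.

```python
import argparse

parser = argparse.ArgumentParser(description="Convert tabs in markdown files to spaces.")
parser.add_argument("-a", "--all", action="store_true", help="Convert all tabs to spaces")
parser.add_argument("-n", "--number", type=int, default=4, help="Number of spaces to use.")
parser.add_argument("files", metavar="FILE", nargs="*", help="files to read, if empty, stdin is used")

# No arguments at all: every option falls back to its default.
defaults = parser.parse_args([])

# Both flags plus two filenames.
custom = parser.parse_args(["--all", "-n", "2", "post.md", "notes.md"])

print(defaults.all, defaults.number, defaults.files)
print(custom.all, custom.number, custom.files)
```

The store_true action turns --all into a boolean, type=int converts the value after -n into a number, and nargs='*' collects whatever is left over into a list, which is empty when no filenames are given.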

The last little trick is how we tie argparse and fileinput together with this little line:

for line in fileinput.input(files=args.files):

fileinput.input takes an optional list of filenames. When it gets one, it uses that list rather than trying to pull filenames from the script's command line arguments. Left to its own devices, fileinput would treat every argument, flags included, as a filename. Since argparse has already sorted the flags from the filenames, we pass just the filenames through so fileinput can do its thing. And it all works like a charm!
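One detail worth seeing in action: when no filenames are given, args.files is an empty list, and fileinput treats an empty list as "read from STDIN". The sketch below swaps in an io.StringIO for sys.stdin purely so it can run without a terminal. That substitution and the sample text are assumptions for the demo, not something the script does.

```python
import fileinput
import io
import sys

# Pretend this text was piped into the script.
fake_stdin = io.StringIO("piped line one\npiped line two\n")
real_stdin = sys.stdin
sys.stdin = fake_stdin
try:
    # An empty list of files makes fileinput fall back to standard input.
    with fileinput.input(files=[]) as stream:
        lines = list(stream)
finally:
    sys.stdin = real_stdin  # always restore the real stream

print(lines)
```

This is what lets the script work both as `script.py post.md` and at the end of a pipe without any extra branching.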

TABS ARE FOR PEOPLE WHO MIX THEIR CORN AND POTATOES TOGETHER

No, I don't have strong opinions on trivial things, why do you ask? In any case, until that feature request to Ulysses makes its way into their queue, I've got my little script, and it makes me happy! How well does it work, you ask? Well, do you see tabs or spaces in the code examples in this post?

P.S. I know you command line one-liners are lurking out there just dying to throw an awk or sed string at me that will do exactly what I want without me having to write a single line of extremely readable, maintainable, non-regex Python. I want to see it! Let 'em fly!


Original Link: https://dev.to/rpalo/fixing-python-markdown-code-blocks-with-python-2mng
