Posts Tagged ‘files’

20091110 Too many open files

I am doing something that requires manipulating a lot of files, and I fell into the classic “too many open files” trap.
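For context, on Unix-like systems this error usually means the process has reached its limit on open file descriptors. A minimal sketch for checking that limit from Python, assuming Python 3 on a Unix-like system (the resource module is not available on Windows):

```python
import resource

# RLIMIT_NOFILE is the per-process cap on open file descriptors.
# The soft limit is the one actually enforced; the hard limit is
# the ceiling up to which the soft limit can be raised.
soft_limit, hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)

print("soft limit:", soft_limit)
print("hard limit:", hard_limit)
```

The numbers vary from system to system, so no sample output is shown here.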

One way to find out which files a process is using is to type

ps -ax

in a terminal, then look for the guilty process and its PID. Let’s say its PID is 9090. To list every one of its open files, just run this (again, in the terminal):

lsof -p 9090

Or you can get a rough count by piping that through wc and counting the lines in lsof’s output (the total includes lsof’s header line):

lsof -p 9090 | wc -l

That will print a number, for example “33”.
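The same count can be sketched in Python. This is only an illustration, assuming Python 3 and that lsof is installed; the function names count_lsof_entries and open_file_count are made up for this example. Unlike wc -l, it leaves out the header row.

```python
import subprocess


def count_lsof_entries(lsof_output):
    """Count the entries in lsof's text output, skipping the header row."""
    lines = [line for line in lsof_output.splitlines() if line.strip()]
    # lsof's first line is a header that starts with "COMMAND".
    if lines and lines[0].startswith("COMMAND"):
        lines = lines[1:]
    return len(lines)


def open_file_count(pid):
    """Run `lsof -p PID` and count the entries it prints (needs lsof)."""
    result = subprocess.run(
        ["lsof", "-p", str(pid)],
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return count_lsof_entries(result.stdout)
```

Calling open_file_count(9090) would then give the number of entries for the process used as an example in this post.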

It’s interesting to know that lsof does not only list files in the Windows sense (folders and data written to disk): it lists files in the UNIX sense, i.e., everything is a file, including pipes, sockets, regular files, etc.
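To see that mix for yourself, you can tally lsof’s rows by their TYPE column. A rough sketch, assuming Python 3 and lsof’s default column layout (COMMAND, PID, USER, FD, TYPE, ...). The function name count_by_type is made up, and apart from the header the sample rows are invented for illustration:

```python
from collections import Counter


def count_by_type(lsof_output):
    """Tally lsof rows by their TYPE column (REG, DIR, FIFO, ...)."""
    counts = Counter()
    for line in lsof_output.splitlines():
        fields = line.split()
        # Skip blank lines and the header row.
        # TYPE is assumed to be the fifth whitespace-separated field.
        if len(fields) < 5 or fields[0] == "COMMAND":
            continue
        counts[fields[4]] += 1
    return counts


# Invented rows, only meant to show the kinds of entries lsof can report.
sample = """\
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
firefox 3847 sole cwd DIR 8,1 4096 2 /home/sole
firefox 3847 sole mem REG 8,1 622020 6209 /usr/share/fonts/truetype/ttf-dejavu/DejaVuSans.ttf
firefox 3847 sole 5r FIFO 0,6 0t0 12345 pipe
firefox 3847 sole 30u IPv4 54321 0t0 TCP localhost:40000->localhost:80 (ESTABLISHED)
"""

type_counts = count_by_type(sample)
for file_type, count in sorted(type_counts.items()):
    print(file_type, count)
```

Each of the four invented rows has a different type, so the loop prints one line per type (DIR, FIFO, IPv4 and REG), each with a count of 1.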

I have since changed my code to be a bit more frugal with open files, but this is an interesting tool nevertheless. If you run it on Firefox, for example, you can even see which font files Firefox is using:

...
firefox 3847 sole  mem    REG                8,1   224692   272248 /usr/share/fonts/truetype/msttcorefonts/Arial_Bold_Italic.ttf
firefox 3847 sole  mem    REG                8,1   622020     6209 /usr/share/fonts/truetype/ttf-dejavu/DejaVuSans.ttf
...

Or which plug-ins it has loaded:

...
firefox 3847 sole  mem    REG                8,1   101536    13774 /usr/lib/mozilla/plugins/libtotem-cone-plugin.so
firefox 3847 sole  mem    REG                8,1   117960      741 /var/lib/flashplugin-installer/npwrapper.libflashplayer.so
...
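If you would rather not hunt through the listing by eye, the fonts and plug-ins can be picked out by file extension. A small sketch, assuming Python 3 and the nine-column layout of the excerpts above (rows with fewer columns are simply ignored); the function name paths_with_suffix is made up for this example:

```python
def paths_with_suffix(lsof_output, suffix):
    """Return the NAME column of lsof rows whose path ends with `suffix`."""
    paths = []
    for line in lsof_output.splitlines():
        # Split into at most nine fields, so a NAME containing spaces
        # stays in one piece as the last field.
        fields = line.split(None, 8)
        if len(fields) != 9:
            continue
        name = fields[8].rstrip()
        if name.endswith(suffix):
            paths.append(name)
    return paths
```

Fed the output of lsof for Firefox, paths_with_suffix(output, ".ttf") would return the font files and paths_with_suffix(output, ".so") the shared libraries, plug-ins included.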

etc etc :)

20080513 Split files into folders by letter

I had a lot of files in one folder. Browsing a folder like that is not very practical, so I decided to write a little script that splits the files into different folders, named after the first letter of each file name, as in a, b, c, d… but using Python this time!

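import os
import shutil
import sys

# Folder to tidy up: the first command-line argument, or the current folder.
if len(sys.argv) > 1:
    folder = sys.argv[1]
else:
    folder = '.'

for item in os.listdir(folder):
    full_path = os.path.join(folder, item)

    # Leave existing folders where they are.
    if os.path.isdir(full_path):
        continue

    # The destination folder is named after the lowercased first letter.
    dst_folder = os.path.join(folder, item[0].lower())

    if not os.path.exists(dst_folder):
        os.mkdir(dst_folder)

    shutil.move(full_path, os.path.join(dst_folder, item))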

Download from svn (this should always be the most up to date version)

To use it, simply go to your favourite terminal window and type

python split_files_into_folders_by_letter.py /path/to/messy/folder

It will take the first letter of each file name, create a folder named after the lowercased letter (if it doesn’t exist yet) and move the file into that folder. It won’t move folders (because of the os.path.isdir check)!
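To try the behaviour without touching a real folder, here is a small sketch that wraps the same logic in a function and runs it on a throwaway temporary folder. It assumes Python 3; the function name split_by_letter and the file names are made up for the demo.

```python
import os
import shutil
import tempfile


def split_by_letter(folder):
    """Move each file in `folder` into a subfolder named after its first letter."""
    for item in os.listdir(folder):
        full_path = os.path.join(folder, item)
        if os.path.isdir(full_path):
            continue  # folders are left alone
        dst_folder = os.path.join(folder, item[0].lower())
        if not os.path.exists(dst_folder):
            os.mkdir(dst_folder)
        shutil.move(full_path, os.path.join(dst_folder, item))


# Throwaway demo folder with three empty files and one subfolder.
demo = tempfile.mkdtemp()
for name in ("Apple.txt", "avocado.txt", "Banana.txt"):
    open(os.path.join(demo, name), "w").close()
os.mkdir(os.path.join(demo, "existing_folder"))

split_by_letter(demo)

top_level = sorted(os.listdir(demo))
in_a = sorted(os.listdir(os.path.join(demo, "a")))
print(top_level)  # ['a', 'b', 'existing_folder']
print(in_a)       # ['Apple.txt', 'avocado.txt']

shutil.rmtree(demo)  # clean up the demo folder
```

Both Apple.txt and avocado.txt end up in the folder a because the letter is lowercased, and existing_folder stays where it was.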

If there’s any pythonista in the audience with ideas for improving this, feel free to leave your suggestions in the comments, thanks! :-)

Note

There wasn’t any special reason for not doing it with Ruby; I simply wanted to know what the differences would be. And while the semantics aren’t very different (e.g. where it says os.path.join we would use File.join in Ruby), there is a huge difference between them, and it’s called documentation. Python’s documentation is hosted on Python’s own website, is kept up to date and is very easy to browse and read; Guido van Rossum (Python’s author) wrote its original tutorial. Ruby’s docs, by contrast, were hosted elsewhere until very recently, are a pain to browse and are pretty much a dump of documentation automatically generated from the source, which I find very uncomfortable to use. It still feels a little bit weird to use Python’s indenting style, but I’m really liking it. A pity there isn’t an Hpricot in Python :-)