...making Linux just a little more fun!
Teal (teal at mailshack.com)
Sun 3 Sep 2006 13:58:28 PDT
What's eating up your hard-drive?
Most Linux users familiar with the bash shell know that df is good for finding out just how much space is being taken up in a partition. They may also know that du lists every directory under the current one, along with the total size of its contents.
Those are neat commands, but not that informative. The latter inspired me to come up with a more helpful shell one-liner that points out, clear as day, the files that are sucking up your space. I keep it handy to clean out my tiny 40GB hard drive every now and then. I also shared it with someone who runs a 160GB personal server, and he was very thankful. So if it's useful for me and useful for him, I can be moderately sure that it'll be useful for you, too. Here it is:
cd ~; du -Sa --block-size=MB | sed -r '/^0/d' | sort -nr | less
You may have to wait a minute for it to get the size of all the files (with my small HD, it takes about 20 seconds).
This scans only your home directory for big files. To scan from the root directory instead, change the ~ at the beginning to /.
To quit while it's still scanning, press Ctrl+C and then 'q'. Once it's done and the results are shown, just press 'q' to leave the pager program and go back to your prompt.
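If you want to reuse Teal's one-liner in scripts, here is a minimal sketch that wraps it in a shell function. The name `bigfiles`, the directory argument (in place of `cd ~`) and the use of `head` instead of `less` are illustrative additions, not part of Teal's tip; like the original, it assumes GNU du and GNU sed.

```shell
# bigfiles [DIR] [COUNT]: a non-interactive variant of Teal's one-liner.
# Hypothetical helper; assumes GNU du (for --block-size=MB) and GNU sed (for -r).
bigfiles() {
    local dir=${1:-$HOME}   # directory to scan (Teal's version does "cd ~" first)
    local count=${2:-20}    # how many entries to show
    # -S: don't include subdirectory sizes in each directory's total
    # -a: list individual files, not just directories
    # --block-size=MB: report sizes in units of 1,000,000 bytes
    # 2>/dev/null: hide "Permission denied" noise when scanning system dirs
    du -Sa --block-size=MB -- "$dir" 2>/dev/null |
        sed -r '/^0/d' |    # drop entries reported as 0 MB
        sort -nr |          # largest first (numeric, reversed)
        head -n "$count"    # print the top COUNT instead of paging
}
```

For example, `bigfiles / 30` would show the 30 biggest entries from the root directory down; note that directories appear alongside files, just as in Teal's version.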
[Neil] - That's an interesting variation on the usual approach. Most people use 'find' to pick out large files, which I find preferable, e.g.
find ~ -size +250k -ls
will list every file under your home directory larger than 250kB. If you want it sorted,
find ~ -size +250k -ls | sort -nr -k 7
will do that.
As the saying goes "there's more than one way to do it" and your approach works just fine.
[Ben] - It may be that one solution is significantly faster than another
(although I rather doubt it); I'd certainly like to find out. I wish I
knew how to flush the page cache that 'find', etc. use to keep the
relevant info ('du' uses the same one); I'd have liked to compare the
speed of the two solutions, as well as perhaps 'ls -lR | sort -nrk5'.
However, no matter what, Teal's is a good, useful approach to solving
(or at least reporting) a common problem. Heck, I just cleaned out a
bunch of thumbnails (187MB!) going back to... umm, given that I've been
just carrying my '~' structure forward all along, back to when I started
using Linux, probably.
ben@Fenrir:~$ time find ~ -size +250k -ls | sort -nr -k 7 > /dev/null
real    0m45.453s
user    0m0.120s
sys     0m0.500s
Maybe I'll remember to test one of the others when I next turn this laptop on.
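On the question of flushing the cache: Linux kernels from 2.6.16 on provide /proc/sys/vm/drop_caches for exactly this purpose. Below is a minimal sketch of a helper (the name `flush_caches` is hypothetical) that drops the page cache plus dentries and inodes, so that each competing command can be timed from a cold start. Writing to that file requires root; the helper reports failure rather than pretending to have flushed anything.

```shell
# flush_caches: ask the kernel to drop clean cached data so that the next
# filesystem walk starts "cold". Hypothetical helper; needs root and
# Linux >= 2.6.16. Returns 0 on success, 1 if the caches could not be dropped.
flush_caches() {
    sync    # write out dirty pages first; only clean ones can be dropped
    # 1 = page cache, 2 = dentries and inodes, 3 = both
    if { echo 3 > /proc/sys/vm/drop_caches; } 2>/dev/null; then
        return 0
    fi
    echo "flush_caches: could not write /proc/sys/vm/drop_caches (not root?)" >&2
    return 1
}

# Example comparison, run as root in bash (whose "time" covers a whole pipeline):
#   flush_caches && time find ~ -size +250k -ls | sort -nr -k 7 > /dev/null
#   flush_caches && time du -Sa --block-size=MB ~ | sed -r '/^0/d' | sort -nr > /dev/null
#   flush_caches && time ls -lR ~ | sort -nrk5 > /dev/null
```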
[Rick] - Here's my own favourite solution to that problem:
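:r /usr/local/bin/largest20
#!/usr/bin/perl -w
# largest20: print the 20 largest files under the given directories
# (default: the current directory), biggest first.
# You can alternatively just do:
# find . -xdev -type f -print0 | xargs -r0 ls -l | sort -rn -k5 | head -20
use strict;
use File::Find;
my %size;                                   # path => size in bytes
@ARGV = $ENV{ PWD } unless @ARGV;
# Record the size of every plain file found under @ARGV.
find( sub { $size{ $File::Find::name } = -s if -f; }, @ARGV );
my @sorted = sort { $size{ $b } <=> $size{ $a } } keys %size;
splice @sorted, 20 if @sorted > 20;         # keep only the top 20
printf "%10d %s\n", $size{$_}, $_ for @sorted;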
[Ben] - [smile] Why, thank you. Nice to see it making the rounds. Original credit to Randal Schwartz, of course, but I've mangled the thing quite a bit since then.
[Neil] - The advantages of the find solution are:
- It is somewhat more portable; the options to du used in Teal's solution aren't available on some old distros I can't escape from.
- It's easier to fine-tune the file size threshold.
- When sorted, it sorts by exact file size (but not exact disk usage). The du-based solution won't sort a set of 1.2MB, 1.8MB and 1.6MB files into order of size, because it reports them all as the same whole number of MB.
In terms of speed, there may be an advantage in not having to remove small files from the initial list, but I would expect that difference to be lost in the noise.
[Nate (Teal)] - Hrm... 'du' can sort at a finer granularity: you'd just have to set the block size to, say, kB, or stick with bytes as find does. And you can filter the files 'du' shows by size with grep. But of course, neither of those is as intuitive or easy to use as the find solution, so 'du' is still worse in that respect.
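Teal's suggestion can be sketched as follows, assuming GNU du, where -b means --apparent-size --block-size=1, so sizes are exact byte counts like the size column of find's -ls. For the threshold it uses awk instead of grep, since grep matches text rather than comparing numbers; the function name `du_bigger_than` is hypothetical.

```shell
# du_bigger_than [DIR] [MIN_BYTES]: list entries of at least MIN_BYTES,
# largest first, using exact byte counts.
# Hypothetical helper; assumes GNU du (-b = --apparent-size --block-size=1).
du_bigger_than() {
    local dir=${1:-.} min=${2:-0}
    du -Sab -- "$dir" 2>/dev/null |          # -S and -a as in Teal's version
        awk -v min="$min" '$1 >= min' |      # keep lines whose size field is >= min
        sort -nr                             # exact sizes, so 1.2/1.6/1.8MB sort correctly
}
```

For instance, `du_bigger_than ~ 256000` is roughly comparable to Neil's `find ~ -size +250k -ls | sort -nr -k 7`, except that directories still show up, as they do in Teal's version.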
I have to say, I'm pretty humbled. It'd probably be better to include the 'find' solution, or Moen's Perl-based solution, in the Gazette than my 'du' cruft.
[Ben] - Heck no, Nate. The point of all those tools in Linux is well represented by the motto of Perl, "TMTOWTDI": There's More Than One Way To Do It. It was nice to see someone else applying some brainpower to solving a common problem in a useful way.
[Nate (Teal)] - Good stuff, there.
[Ben] - Yep. Yours included.
[Rick] - As Ben reminded me, he's one of the most recent people to polish up that Perl gem ('largest20'): I'm merely one of the many people passing around variations of it -- and grateful for their craftsmanship.
Peter Knaggs (peter.knaggs at gmail.com)
Thu Sep 7 19:02:15 PDT 2006
Old news to frequent Ethereal users, I guess, but back in July 2006 Ethereal became "Wireshark". It seems that the company Ethereal, Inc. is keeping the old name.
If you've been using the command-line version, tethereal, you're probably wondering what to call it now. Well, tethereal has become "tshark".
Kapil Hari Paranjape (kapil at imsc.res.in)
Tue Sep 12 20:07:35 PDT 2006
Hello,
If you have ever wanted to do the Guardian sudoku without wasting trees, then you need a way to annotate PDF files on your computer.
"flpsed" (FL toolkit PostScript EDitor) to the rescue.
Install "flpsed" and import any PDF file for annotation. The interface is simple and intuitive.
This can also be used to fill in forms that are not quite in the PDF form format. More about that in the next tip.
And of course, it can annotate PS files, too.
Regards,
Kapil.
[Ben] - That's a great tool, Kapil. I've needed something like that for ages: many of the contracts my clients send me are in PDF, and until now I've been converting them to PS, editing them in GIMP, and reconverting them to PDF before sending them back. This will save me tons of time. Thanks! I hope others will find it at least as useful.
[Kapil] - Don't shoot (as in photograph) the messenger :)
I too am extremely grateful to the author of "flpsed", Morten Brix Pedersen (morten at wtf.de).
Glad to have been of help.
Kapil Hari Paranjape (kapil at imsc.res.in)
Tue Sep 12 23:49:11 PDT 2006
Hello,
"Real" PDF forms are quite common nowadays. How does one edit them with a "real" editor like vi (OK, also emacs :))?
"pdftk" (PDF ToolKit) to the rescue.
Suppose that "form.pdf" is your PDF form.
1. Extract the form information:
pdftk form.pdf generate_fdf output form.fdf
2. That only gets the text fields. To get an idea of all the fields, do:
pdftk form.pdf dump_data_fields output form.fields
3. Sometimes the field names are cryptic. It helps to also view the form:
xpdf form.pdf
or
pdftotext -layout form.pdf; less form.txt
(if you insist on text-mode)
4. You can now edit the file form.fdf and fill in the fields marked with the string '\n%%EOF\n'.
Once you have edited form.fdf, you can generate the filled-in form with:
pdftk form.pdf fill_form form.fdf output filled.pdf
or
pdftk form.pdf fill_form form.fdf output filled.pdf flatten
to get a non-editable pdf.
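If you'd rather generate the FDF than hand-edit it, here is a minimal sketch. It assumes the basic FDF layout (one catalog object whose /Fields array holds << /T (name) /V (value) >> entries), which is essentially the shape pdftk's generate_fdf writes; the function name `make_fdf` and its NAME=VALUE argument convention are hypothetical. Backslashes and parentheses are escaped as PDF string syntax requires.

```shell
# make_fdf NAME=VALUE...: write a minimal FDF file to stdout.
# Hypothetical helper; every value is written as a PDF string, e.g. (Yes)
# or (Off) for checkboxes, matching the entries shown in the hints.
make_fdf() {
    local pair name value
    printf '%%FDF-1.2\n1 0 obj\n<< /FDF << /Fields [\n'
    for pair in "$@"; do
        case $pair in
            *=*) ;;
            *) echo "make_fdf: skipping '$pair' (expected NAME=VALUE)" >&2; continue ;;
        esac
        # Split at the first '=', then escape \ first and ( ) second.
        name=$(printf '%s' "${pair%%=*}" | sed -e 's/\\/\\\\/g' -e 's/[()]/\\&/g')
        value=$(printf '%s' "${pair#*=}" | sed -e 's/\\/\\\\/g' -e 's/[()]/\\&/g')
        printf '<< /T (%s) /V (%s) >>\n' "$name" "$value"
    done
    printf '] >> >>\nendobj\ntrailer\n<< /Root 1 0 R >>\n%%%%EOF\n'
}
```

Usage would be along the lines of `make_fdf 'Name=Jane Doe' 'Agree=Yes' > form.fdf` followed by the same `pdftk form.pdf fill_form form.fdf output filled.pdf` step; the field names must match those reported by dump_data_fields.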
Some additional hints:
1. If your form.fdf file contains no '\n%%EOF\n' strings, then you are out of luck: it means your PDF form is only a printable form and cannot be filled in on the computer (but see the hint about "flpsed").
2. Checkboxes/buttons will not appear in the fdf file. You can use form.fields to find out what these fields are called, and add entries like the following to the fdf file (replacing FN with the field name):
<</V (Yes) /T (FN) >>
or
<</V (Off) /T (FN) >>
3. It helps to have three windows open. One for editing, one for viewing the form.fields and one for viewing the filled pdf file.
4. You may also want to regenerate the filled form periodically to check that the filling works.
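To find the checkbox and button names for hint 2 without reading through form.fields by hand, a short awk filter can help. This sketch assumes the record layout that pdftk's dump_data_fields prints: records separated by '---' lines, with 'FieldType:', 'FieldName:' and (in versions that provide it) 'FieldStateOption:' lines inside each record. The function name `list_buttons` is hypothetical; if your pdftk prints no FieldStateOption lines, only the names are shown.

```shell
# list_buttons FIELDS_FILE: print "name: option option ..." for each Button field.
# Hypothetical helper; assumes dump_data_fields-style "Key: value" records
# separated by lines starting with "---".
list_buttons() {
    awk '
        function emit() {
            if (type == "Button") print name ":" opts
            type = ""; name = ""; opts = ""
        }
        /^---/                { emit(); next }
        /^FieldType: /        { type = substr($0, 12) }
        /^FieldName: /        { name = substr($0, 12) }
        /^FieldStateOption: / { opts = opts " " substr($0, 19) }
        END                   { emit() }
    ' "$1"
}
```

Running `list_buttons form.fields` would then give you the FN values (and, where available, the allowed states such as Off and Yes) for the `<</V (Yes) /T (FN) >>` entries.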
Remarks:
Clearly this is crying out for someone to write a nice interface. Why don't I, you ask? I will... but don't hold your breath.
You can skip all of this and use Adobe's Distiller, but most readers should be able to guess why I don't want to use that!
Benjamin A. Okopnik (ben at linuxgazette.net)
Wed 27 Sep 2006 11:24:37 PDT
Much of the available CD-ripping software out there produces files with names like 'trackname_01.wav' or '01_track.wav' instead of actual song names. Yes, there's software available that will look up CDDB entries... but what if your CD isn't in the CDDB, or you don't have a net connection readily available?
'wavren' to the rescue. :)
This script, when run in a directory containing the 'standard' track names, takes the name of a file that lists the songs on that album and prints a paired list of each current track name and the line of the file it will be renamed to. It exits with an error message if the lists aren't the same length, and it will not actually rename anything until you specify a '-rename' argument. Example:
ben@Fenrir:/tmp/foo$ ls
01.wav  02.wav  03.wav  04.wav  05.wav  06.wav  07.wav  08.wav  09.wav  10.wav  names
ben@Fenrir:/tmp/foo$ cat names
01. Hells Bells
02. Shoot To Thrill
03. What Do You Do For Money Honey
04. Given The Dog A Bone
05. Let Me Put My Love Into You
06. Back In Black
07. You Shook Me All Night Long
08. Have A Drink On Me
09. Shake A Leg
10. Rock And Roll Ain't Noise Pollution
ben@Fenrir:/tmp/foo$ wavren names
"01.wav" will be "01. Hells Bells.wav"
"02.wav" will be "02. Shoot To Thrill.wav"
"03.wav" will be "03. What Do You Do For Money Honey.wav"
"04.wav" will be "04. Given The Dog A Bone.wav"
"05.wav" will be "05. Let Me Put My Love Into You.wav"
"06.wav" will be "06. Back In Black.wav"
"07.wav" will be "07. You Shook Me All Night Long.wav"
"08.wav" will be "08. Have A Drink On Me.wav"
"09.wav" will be "09. Shake A Leg.wav"
"10.wav" will be "10. Rock And Roll Ain't Noise Pollution.wav"
If the lineup isn't exactly how you want it, you can either renumber the original files, or change the order of the lines in the "names" file. Also note that you can rename mp3 files, etc., just by changing the 'ext' variable at the top of the script to reflect the extension that you're looking for.
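The script itself isn't reproduced on this page. To give a rough idea of how such a tool can work, here is a minimal sketch in POSIX shell; it is a hypothetical re-creation, not Ben's code. The function name `wavren_sketch` is made up, and it assumes that the tracks sort into album order by file name and that the names file has exactly one title per line. Like the original, it only prints the planned renames unless given -rename, and it stops with an error if the counts differ.

```shell
# wavren_sketch NAMES_FILE [-rename]
# Pair each *.$ext file in the current directory (in sorted order) with the
# matching line of NAMES_FILE; print the plan, or rename with -rename.
# Hypothetical re-creation for illustration; NOT Ben's actual script.
wavren_sketch() (
    ext=wav                                 # change to mp3, ogg, ... as needed
    names_file=$1
    mode=${2-}
    [ -r "$names_file" ] || { echo "usage: wavren_sketch NAMES_FILE [-rename]" >&2; exit 1; }

    set -- *."$ext"                         # the tracks, in the shell's sorted glob order
    [ -e "$1" ] || { echo "no *.$ext files here" >&2; exit 1; }
    ntitles=$(awk 'END { print NR }' "$names_file")   # also counts an unterminated last line
    if [ "$#" -ne "$ntitles" ]; then
        echo "$# tracks but $ntitles names; not renaming anything" >&2
        exit 1
    fi

    exec 3< "$names_file"                   # read titles in step with the tracks
    for track in "$@"; do
        IFS= read -r title <&3 || true      # the last line may lack a newline
        if [ "$mode" = "-rename" ]; then
            mv -i -- "$track" "$title.$ext"
        else
            printf '"%s" will be "%s"\n' "$track" "$title.$ext"
        fi
    done
)
```

Run `wavren_sketch names` to preview and `wavren_sketch names -rename` to apply. The body runs in a subshell so that `exit` and `set --` don't affect your interactive shell; titles containing '/' are not handled.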
Kat likes to tell people she's one of the youngest people to have learned to program using punchcards on a mainframe (back in '83); but the truth is that since then, despite many hours in front of various computer screens, she's a computer user rather than a computer programmer.
When away from the keyboard, her hands have been found full of knitting needles, various pens, henna, red-hot welding tools, upholsterer's shears, and a pneumatic scaler.