sort files before compressing

I was just looking for a quick and easy way to sort the files that are about to be compressed/archived using tar and bzip2/gzip/xz etc. As another guy already figured out before I did, the order in which the files are stored in an archive matters quite a bit. I don’t have a specific example handy, but he claims that the difference in size between a sorted and an unsorted tar.xz is about 20% (source).
Instead of just sorting the files by path/name, which I think is not the most effective way, I sort them by the file’s suffix (e.g. 01_music.mp3 -> ‘mp3’ is used as the sort key). That way files of the same type end up next to each other in the archive, which should be even more effective.

So next time you compress some huge amount of data, just use the lines below to save some space:

$ find /media/backup_xxx | awk -F '.' ' { print $NF"---"$0 } ' | sort | awk -F'---' ' { print $2}' >filelist-sorted.txt
$ tar -cv -f - --no-recursion -T filelist-sorted.txt | xz -9 >backup_xxx.tar.xz
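To see what the suffix sort actually does to a file list, here is a small sketch that can be run without touching any real data. The function name sort_by_suffix and the sample paths are made up for illustration. It uses a tab instead of '---' as the temporary separator, on the assumption that no path contains a tab or a newline.

```shell
# sort_by_suffix: read paths on stdin, print them ordered by suffix
# (the text after the last '.'), then by full path.
# Hypothetical helper; assumes no path contains a tab or a newline.
sort_by_suffix() {
    awk -F '.' -v OFS='\t' '{ print $NF, $0 }' | LC_ALL=C sort | cut -f 2-
}

# made-up sample paths
printf '%s\n' b/song.mp3 a/notes.txt a/track.mp3 c/readme.txt | sort_by_suffix
```

The two mp3 files come out first, followed by the two txt files. Paths without any dot are simply sorted by their full name, same as with the one-liner above.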

Cheers
Raphi

Posted in tech

Cheat at draw something using bash

Hi folks,
Couldn’t resist and had some fun with ‘Draw Something’ for Android. Now some people tend to draw very, very badly.. (..including me..), which makes it almost impossible to guess their drawings. To fix this issue, I was looking for an easy way to figure out what they’ve been drawing. A quick tcpdump session while running Draw Something revealed the path of the wordlist. So now that we have the official wordlist, we can pipe it through grep and find the candidate words. Find an example below. Just change the egrep parameters to match your ‘situation’. For example, I had the letters ‘bceeijlorwwy’ available, and the word length was 7.


$ curl -q http://static.iminlikewithyou.com/drawsomething/wordlist.csv 2>/dev/null | awk -F, ' { print $1 } ' | egrep '^[bceeijlorwwy]{7}$'
bicycle
[...]
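If you do this more than once, the filter can be wrapped in a tiny function. This is only a sketch: the name guess_words is made up, and the sample list below is invented and merely stands in for the first CSV column of the real wordlist.

```shell
# guess_words LETTERS LENGTH: filter a word list on stdin down to words of
# exactly LENGTH characters that use only characters from LETTERS.
# Hypothetical wrapper; assumes LETTERS contains plain letters only.
guess_words() {
    grep -E "^[$1]{$2}\$"
}

# invented sample list, not the real wordlist
printf '%s\n' bicycle bicycles cat jewelry | guess_words bceeijlorwwy 7
```

Keep in mind that a character class only checks which letters occur, not how often, so the result is a list of candidates rather than the one correct answer.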

Cheers,
Raphi

Posted in tech

disable mod_security for a single IP

I’ve just been bugged by the mod_security module for Apache. I wanted to paste a kernel message in which ‘/proc’ is mentioned. Sadly this request got silently dropped: I couldn’t see any error, no 403 or the like. A quick look at the error log showed this:

[Tue Feb 14 10:31:58 2012] [error] [client XXX] mod_security: Access denied with code 403. Pattern match "/proc/" at POST_PAYLOAD [severity "EMERGENCY"] [hostname "pfuender.net"] [uri "/"]

So I decided to quickly disable mod_security for the IP that I’m using. You can do so by putting the following line into your .htaccess file (replace 8.8.8.8 with your own IP).

SetEnvIfNoCase Remote_Addr ^8\.8\.8\.8$ MODSEC_ENABLE=Off
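The dots in the pattern are escaped because an unescaped dot matches any character, and the ^ and $ anchor the match to the whole address. A minimal sketch that prints the line for a given IPv4 address is below. The function name modsec_off_line is made up, and 192.0.2.10 is just a documentation address.

```shell
# modsec_off_line IP: print the .htaccess line for one IPv4 address,
# with the dots escaped and the pattern anchored at both ends.
# Hypothetical helper; it does not validate the address.
modsec_off_line() {
    escaped=$(printf '%s' "$1" | sed 's/\./\\./g')
    printf 'SetEnvIfNoCase Remote_Addr ^%s$ MODSEC_ENABLE=Off\n' "$escaped"
}

modsec_off_line 192.0.2.10
```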

Thanks to this post: askapache.com/htaccess..

Cheers,
Raphi

Posted in tech

Raid0 Recovery Scenario – Part 1

Situation

Well, I just got 2 harddisks in for recovery. The PC manufacturer apparently decided to put the 2 harddisks into a RAID0, and now that one of them had failed, there was no simple way to get the data back. As you might know, RAID0 sucks! If you only have one working drive, you might be able to find some headers of a JPG for example, but when you try to view it, you’ll notice that only parts of the image are intact. This is because of the striping, which stores one block (usually something like 64K or 128K) on disk0, the next block on disk1, and so on. So a single disk on its own is useless for sure!
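To make the striping a bit more concrete, here is a small sketch that prints where each logical block of a 2-disk RAID0 ends up. The function name stripe_location is made up, and it assumes that disk0 holds the very first block.

```shell
# stripe_location BLOCK: for logical block number BLOCK of a 2-disk RAID0,
# print which disk holds it and at which block index on that disk.
# Assumption: disk0 holds logical block 0.
stripe_location() {
    echo "disk$(( $1 % 2 )) block $(( $1 / 2 ))"
}

for b in 0 1 2 3 4 5; do
    echo "logical block $b -> $(stripe_location "$b")"
done
```

Even blocks live on disk0 and odd blocks on disk1, so any file larger than one block is spread across both drives.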

fixing the harddrives

First of all I wanted to get an image (using ddrescue) of both disks. Disk0 went through smoothly, so I had the image in the blink of an eye.. Disk1 was a bit more of a hassle, it initially didn’t want to initialize properly (clickin’).. Since I had two identical harddrives (in terms of firmware, model etc.), I simply swapped the controller boards (PCBs) between them, and voila, disk1 was also spinning up and spitting out the data! :-D nice! So let’s proceed with the 2 disk images.

rebuilding the raid0

Now that I’ve got the disks imaged, I can start by rebuilding the RAID0 part. My plan was to get rid of the RAID0, which was 2x 500GB, and rebuild it onto a single 1TB drive.
I first looked for pre-built software to do this part and started with ‘pyraid’, which is able to recover RAID0 and RAID5. Soon after, I noticed that the performance was not what I expected: according to my calculations, it would have taken several days to ‘rebuild’ the drive.
Bash is my friend, so it was up to me to quickly put a few lines together to get this job done.

#!/bin/bash
OUTPUT=/dev/sdc ## the target device
BLOCKSIZE=128k ## or stripesize, however you call it
echo -n >mass-dd.sh
for i in $(seq 0 3815553); do

    INPUT=/dev/loop0 ## raid0 disk1
    INPUTsector=$i
    OUTPUTsector=$(($INPUTsector*2))
    echo "sudo dd if=$INPUT bs=$BLOCKSIZE skip=$i seek=$OUTPUTsector count=1 of=$OUTPUT conv=notrunc 2>/dev/null" >>mass-dd.sh

    INPUT=/dev/loop1 ## raid0 disk2
    INPUTsector=$i
    OUTPUTsector=$(($INPUTsector*2+1))
    echo "sudo dd if=$INPUT bs=$BLOCKSIZE skip=$i seek=$OUTPUTsector count=1 of=$OUTPUT conv=notrunc 2>/dev/null" >>mass-dd.sh

done

This script is just a quick helper: it writes all the dd calls needed to rebuild the raid into a file called ‘mass-dd.sh’. Afterwards use one of these lines to start rebuilding:

## start rebuilding
$ cat mass-dd.sh | sh
## resume rebuilding at a known point
$ grep -A 500000000 'skip=3751208' mass-dd.sh | sh

## start rebuilding with several parallel jobs, might give a performance improvement, can't say.. Needs GNU Parallel installed
## (without a command, GNU Parallel runs every input line as a command of its own)
$ cat mass-dd.sh | parallel -k -j 4
$ grep -A 500000000 'skip=3751208' mass-dd.sh | parallel -k -j 4

## only read from one input-file at a time, might increase the performance (can also be run using GNU Parallel)
$ grep /dev/loop0 mass-dd.sh | sh
$ grep /dev/loop1 mass-dd.sh | sh
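Before letting something like this loose on real disks, the interleaving can be tried out on two tiny files. The following is only a sketch of the same skip/seek idea: the function name interleave, the file names and the 4-byte ‘stripe size’ are made up for the dry run.

```shell
# interleave A B OUT BS COUNT: rebuild a 2-member RAID0 by copying COUNT
# blocks of size BS alternately from A and B into OUT.
# Hypothetical helper for a dry run; A is assumed to hold the first block.
interleave() {
    i=0
    while [ "$i" -lt "$5" ]; do
        dd if="$1" of="$3" bs="$4" skip="$i" seek=$((i * 2)) count=1 conv=notrunc 2>/dev/null
        dd if="$2" of="$3" bs="$4" skip="$i" seek=$((i * 2 + 1)) count=1 conv=notrunc 2>/dev/null
        i=$((i + 1))
    done
}

# two fake 'disks' with two 4-byte blocks each
tmp=$(mktemp -d)
printf 'AAAACCCC' >"$tmp/disk0.img"
printf 'BBBBDDDD' >"$tmp/disk1.img"
interleave "$tmp/disk0.img" "$tmp/disk1.img" "$tmp/out.img" 4 2
cat "$tmp/out.img"; echo    # prints AAAABBBBCCCCDDDD
rm -r "$tmp"
```

If the blocks come out in the wrong order in such a test, the member order or the stripe size is most likely wrong, and it is much cheaper to find that out here than after hours of dd.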

So, that’s it for part one, haven’t finished part two yet. Next one will have the details about fixing the logical part with the partition-table and filesystem.

Cheers,
Raphi

Posted in data recovery

Show ‘HTTP://’ prefix in Firefox 7

..freshly upgraded to Firefox 7, but apparently the ‘HTTP://’ is gone from the address bar. If you’re looking under ‘Options’ to get it back, you’re looking in the wrong place. So here are the quick steps to re-enable it:

  • open firefox
  • browse to ‘about:config’
  • search for ‘browser.urlbar.trimURLs’
  • change the value to ‘false’

That’s it already!
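If you prefer not to click through about:config, the same pref can also be put into a user.js file in the Firefox profile directory, which Firefox reads on startup. A minimal sketch, where the function name keep_url_prefix is made up and the profile directory is something you have to look up yourself:

```shell
# keep_url_prefix PROFILE_DIR: append the pref to user.js in the given
# Firefox profile directory so it gets applied on every start.
# Hypothetical helper; PROFILE_DIR must be your real profile directory.
keep_url_prefix() {
    printf 'user_pref("browser.urlbar.trimURLs", false);\n' >>"$1/user.js"
}

# usage (the path is a placeholder, not a real profile name):
#   keep_url_prefix "$HOME/.mozilla/firefox/<your-profile>"
```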

Cheers,
Raphi

BTW, why would somebody want a ‘feature’ like that…? tzzzz… :-D

Posted in tech