Parallax

Two adjacent images, like many of those that Hugin’s users may want to stitch into a bigger picture. Unfortunately, these two won’t stitch well.

Bad Parallax

Looking at the detail below, the relative position of the finger and the library’s vertical support beam has shifted when the camera moved.

parallax detail

This shift is called parallax, and it is the most common problem when stitching pictures.

wrong NPP turn

Feel it yourself: stretch an arm in front of you, as in the picture above. Raise a finger and note its position relative to an object in the background. Turn your head left and right and notice how the position of the finger relative to the background changes. Two images taken like that won’t stitch.

Now try a variation on the above experiment: don’t move your head. Close one eye and look left and right with the other. Unless you are wearing glasses or contacts, you will notice that the relative position has not shifted. The nodal point of the rotation is better aligned with the no-parallax point (NPP) of your eye.

right NPP turn

Your eye focuses all rays of light into one point before expanding them on the retina. Most lenses do the same before expanding the image on the sensor. The NPP is this one theoretical point inside the lens where all the rays of light focus.

Turning a camera around the lens’ NPP is the single most important precaution to achieve a perfect stitch.
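To get a feel for the magnitude, here is a rough back-of-the-envelope estimate (my own notation, assuming the background is effectively at infinity): if the camera rotates by an angle θ about a point offset by a distance d from the NPP, the NPP itself moves by about 2d·sin(θ/2), and a foreground object at distance D appears to shift against the background by roughly

\[ \epsilon \approx \frac{2\,d\,\sin(\theta/2)}{D} \]

radians. With d = 5cm, θ = 30° and a subject 1m away, ε is about 0.026 rad, or roughly 1.5° of parallax error – far more than a stitcher can hide.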

To do this, we need to identify the NPP of the camera and constrain its rotation to that point. How to identify the NPP will be the topic of a future tutorial.

Network Storage for Image Processing

I’ve been processing images with Hugin and other tools over the network since I upgraded to a Gigabit Ethernet switch. Offloading file handling to a server has numerous benefits. Most important: a quieter office. A member of the French panorama community asked me for details about system performance. I share them here, hoping that others might find the information helpful too.

Unless you are using VIPS or GEGL, storage for image processing means transferring large files from memory and back, often to a scratch disk or to swap.

Before an image can be processed, the data must be transferred from media to memory. It passes a few interfaces along the road, and the slowest interface is the bottleneck that determines overall performance. For a file on a hard disk, this means:

  • media transfer, when the data is copied from the platter to the disk’s electronics and back
  • data transfer between the disk’s electronics and the controller
  • bus transfer from the controller to the chipset
  • memory transfer between chipset, memory and processor

If we add a network layer, there is also the network transfer between the two computers to consider.
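As a rough illustration of the “slowest interface wins” idea, here is a minimal Python sketch; the throughput figures are illustrative assumptions in the ballpark of the numbers discussed in this post, not measurements of any particular hardware:

    # Rough model: the slowest interface in the chain sets the effective rate.
    # All throughput figures below are illustrative assumptions, not measurements.
    interfaces_mb_s = {
        "media transfer (platter to disk electronics)": 80,
        "disk to controller (SATA-II)": 300,
        "controller to chipset bus (PCIe)": 4000,
        "chipset / memory": 10000,
        "gigabit ethernet (when going over the network)": 125,
    }

    bottleneck = min(interfaces_mb_s, key=interfaces_mb_s.get)
    rate = interfaces_mb_s[bottleneck]
    file_mb = 2048  # e.g. a 2 GB stitched TIFF

    print(f"bottleneck: {bottleneck} at {rate} MB/s")
    print(f"time to move {file_mb} MB: {file_mb / rate:.1f} s")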

Two performance measurements are applicable:

  • the latency of the interface, or how quickly it reacts to commands
  • its transfer rate, or how much data it can transmit over a given period of time

Transfer rate

Data transfer for modern hard disks is either Ultra 320 SCSI at 320 MB/s or SATA-II at 300 MB/s. These are purely theoretical maximums. In reality, the much slower media transfer influences the speed at which data moves to and from the disk. Some manufacturer data about maximum sustained transfer rates (the higher the better):

On most common configurations, 80 MB/s is the bottleneck on the way to the controller, often a RAID controller.

Depending on the RAID configuration, transfer rates scale. RAID0 (good for performance, not for backup) hits transfer rates of 150MB/s with two disks, or 290MB/s with five disks.

The next interface is the bus: PCI, PCI-X or PCI-Express (PCIe). Legacy PCI does 127MB/s – still enough, but it could become a bottleneck for a high performance RAID. No more bottlenecks with PCI-X (PCI compatible) or PCIe (not PCI compatible) – up to 4GB/s. PCIe is available on most motherboards today. It has replaced AGP as the bus for the notoriously bandwidth-hungry video cards.

In the chipset, transfer rates are measured in GB/s. Sometimes the data takes a shortcut when the chipset addresses the controller directly, as shown in the schema on page 3 of this chipset data sheet.

At this point we have a choice: process the image here, sharing CPU and RAM with the transfer process; or forward the data over the network to a workstation whose resources will be fully dedicated to the image processing.

Gigabit Ethernet is the current standard. At 125MB/s it is hardly a bottleneck for common configurations. 10-Gigabit Ethernet (1.25GB/s) is not yet at SOHO prices, but it is already available commercially.

Access Time

Again, some manufacturer data about access time (the lower the better):

The latency inside the computer is negligible (measured in nanoseconds), and the extra latency introduced by the network is about 0.2ms.

Conclusion

In most common cases, the network is not a bottleneck. The Sustained Transfer Rate (STR) of hard disks still is, despite recent improvements in data density through perpendicular recording. By the time current network technology becomes the bottleneck, next generation 10-Gigabit Ethernet will be an affordable upgrade.

I have not felt any significant performance difference since stitching over the network. But there is less noise in my office. Solid state drives, as an alternative to hard disks, are strong on access time but not yet on STR. Using one as the system drive of the image processing workstation will reduce ambient noise and make working more comfortable.

snapshot 26-feb-2008

Phil Harvey released exiftool 7.19 to correct an incompatibility with some TIFF-reading applications. This was enough reason to release new snapshot installers, even if the pace has slowed down a little bit. Harry is on holiday this week. OSX users will have to be patient. Pablo has some bug fixes up his sleeve, but they are not yet ready for a commit. I expect a new snapshot soon.

Ah, good news: Google Summer of Code 2008 is on. The code from three of our last year’s projects is now integrated in Hugin, and the fourth one, FreePV, has a life of its own.

We’re looking for an admin to run our 2008 Google Summer of Code participation.

RAID

All hard disks will fail. Backup strategies have been devised to avoid data loss when the inevitable hard disk failure occurs. This article is not a comprehensive backup strategy. It explains the concept of RAID and how it can fit in a backup strategy.

R stands for redundancy. The only backup strategy that works. Copy. Rinse. Repeat.

R stands for replication. In the days of analog recording, an expensive process resulting in degraded quality. Digital copies are cheap and lossless.

RAID is a redundant array of independent disks. It automates the replication process. A safe haven for data when used properly. Beware of a false sense of safety.

Levels

There are a few different types (levels) of RAID. The most popular are:

  • RAID 1, known as mirror, keeps two disks in sync.
  • RAID 5 distributes data across several disks and devotes the equivalent of one disk to parity (a checksum).
  • RAID 0 is not a real RAID: there is no redundancy.

Storage Capacity and Speed

50% of the total storage capacity is usable with RAID 1, because data is kept in two copies. With four disks, RAID 5 achieves better efficiency (75%). RAID generally increases speed by distributing load. RAID 5 and 0 are faster than RAID 1.
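A small Python sketch of that arithmetic (a simplified model, assuming identical 500GB disks; real arrays lose a little more to formatting and spare blocks):

    def usable_gb(level, disks, disk_gb=500):
        """Usable capacity of a RAID array, simplified model with assumed 500GB disks."""
        if level == 0:
            return disks * disk_gb          # striping: no redundancy
        if level == 1:
            return disks * disk_gb / 2      # mirroring: every byte stored twice
        if level == 5:
            return (disks - 1) * disk_gb    # one disk's worth of space goes to parity
        raise ValueError("unsupported RAID level")

    for level, disks in [(1, 2), (5, 4), (0, 2)]:
        gb = usable_gb(level, disks)
        print(f"RAID {level}, {disks} x 500GB: {gb:.0f} GB usable ({gb / (disks * 500):.0%})")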

Modern RAID controllers come mostly with 4, 8 or 16 channels. This is the number of drives that can be attached. Some cards can be synchronized with a twin card for double the channels. Get the card with more channels if you can afford it.

Failure Resilience

The main reason to adopt RAID is to prevent the consequences of hardware failure. Both RAID 1 and RAID 5 are resilient to the failure of a single disk. If more than one disk fails, chances are that the data is gone. RAID 0 is not failure resilient. The probability of two disks failing at the same time is close to zero. Buying two disks from two separate production batches helps decrease that probability further.
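To see why, a toy Python calculation; the annual failure rate below is an assumed figure for illustration, and treating the two failures as independent is exactly the assumption that same-batch disks can break:

    # Toy illustration with an assumed annual failure rate (AFR), not a real drive statistic.
    afr = 0.03  # assume a 3% chance that a given disk fails within a year
    # Probability that both disks of a mirror fail within the same year,
    # treating the failures as independent; same-batch disks are less independent,
    # and a failure during a rebuild window is rarer still.
    print(f"one disk: {afr:.1%} per year, both disks: {afr * afr:.2%} per year")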

The advantage of RAID 1 is that the mirrored disks are usually readable on a standard controller. In case of controller failure, connect a disk to a standard controller for immediate data access. Back up!

If the RAID 5 controller fails, the data is no longer accessible until a controller with the same algorithm is installed. Finding a replacement can be difficult.

Hardware RAID vs. software RAID

RAID is fully automatic and transparent to the user. It requires some processing during both read and write. Purely hardware driven RAID controllers tend to be more expensive because they have their own processor and RAM.

Cheaper controllers are equally effective in terms of storage safety, but they use the CPU and RAM resources of the host computer. This can become a bottleneck if the RAID is hosted on the workstation.

The controllers linked above are more than adequate for individual or small team use. I’ve installed them in many locations. More expensive and advanced RAID cards are available, amongst others from Adaptec and 3ware. I use an Adaptec SCSI RAID controller for a mission critical database server.

In the past few years many motherboards have started offering on-board RAID. I don’t think much of it: first and foremost because a motherboard failure may render the whole RAID useless, but also because these tend to be reduced versions of the real thing, like on-board video and sound.

Where to host the RAID ?

The RAID can be hosted inside a workstation or on a network server. With today’s Gigabit Ethernet, the speed penalty of accessing files over the network instead of directly is negligible for most applications. The dedicated server solution makes the impact of a software RAID less critical. Most important: good ventilation of the disks to increase their longevity.

Monitoring RAID

RAID is easy to set up and gives the impression that it is fire and forget. It is not. The management utilities that come with professional RAID cards help monitor the health of the arrays. The RocketRAID family has a useful web based management tool. It can set up regular tasks (such as weekly overnight verification) and email alerts. Then it is really fire and forget.

Comprehensive Backup Strategy

RAID is online storage. It immunizes data against the failure of a single hard drive. It does not immunize data against system malfunctions: if the system overwrites it, it’s gone. Offline backup is the solution for that. It also does not immunize data against location failures, such as physical theft or fire. Off-site backup is the solution for that. I will discuss comprehensive backup strategies in a future article.

Space, the final frontier

Development pace has slowed down. This is inevitable for 100% volunteer-driven projects when people don’t have unlimited free time. In the meantime interest is growing, and some people are looking at their hardware, either to improve Hugin’s performance on Windows or to dual-boot Linux and Windows and expand their toolbox. And since Hugin enables the creation of ever larger images, space is the final frontier.

Assuming that CPU, GPU, motherboard and RAM are held constant, the choice of hard disk layout and controllers can influence both system performance and price tag.

At the time of writing, the sweet spot hard disk size (best price/GB) is 500GB. The only reason to buy other capacities is performance. For a high density storage facility, performance means capacity or energy consumption; for a workstation, performance means speed.

The two key speed metrics are access time (AT) and sustained transfer rate (STR).

AT is the time it takes to position the head over the right location on the disk. It is measured in ms, with modern 7.2K RPM drives averaging less than 10ms. The noisy and expensive 15K RPM SCSI drives achieve averages of 3ms. The 10K RPM SATA Raptors achieve 5ms. Fast AT benefits random, small, frequent, concurrent read/write operations.

STR is the bandwidth provided once the data transfer has started. Usually one factor is the bottleneck: information density. The new drives with perpendicular recording (500GB, 750GB, 1000GB) have the highest density. A large cache (the new drives have 32MB) and defragmented, healthy data help as well. High STR benefits the transfer of large chunks of data at once.

Whether AT or STR is the relevant bottleneck is determined by the application. Databases and, to a lesser extent, workstation system drives, benefit from AT. Photoshop scratch is an application that benefits from STR rather than AT, and the same is true of many other imaging applications, such as Hugin.
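A toy Python comparison of the two regimes; the drive figures are assumptions in the ballpark quoted above, not measurements:

    # Crude model: one seek per file plus sequential transfer.
    access_time_s = 0.010  # ~10 ms average for a 7.2K RPM drive (assumed)
    str_mb_s = 80          # assumed sustained transfer rate

    def read_time_s(files, file_mb):
        return files * access_time_s + files * file_mb / str_mb_s

    print(f"10,000 x 8 KB random reads: {read_time_s(10_000, 0.008):.0f} s (AT-bound)")
    print(f"1 x 2 GB scratch file: {read_time_s(1, 2048):.0f} s (STR-bound)")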

For most imaging uses, 7.2K RPM drives are more than adequate. To further improve system performance, use the latest generation 7.2K RPM drives. To even further improve performance, spread the load across multiple hard disks (not partitions):

  • One disk for the system and application
  • One disk for the data
  • One scratch disk

For further performance increases, use RAID. See the next article for a discussion of RAID.

New Snapshot and Google Rumors

A bunch of bug fixes. SVN2904 is available in the downloads section.

Windows users can now try the latest, improved match-n-shift – CP detection for fisheye images has never been better!

(Ubuntu) Linux users have easy access to match-n-shift, as well as to the experimental MatchPoint code.

For OSX, Harry built SVN2897 yesterday.

Last but not least: there is a wind of love coming from Google’s Open Source Program Office. It starts to feel like summer. We’ll have a release before the spring ;-)

installer snapshot 18.Feb.2008

New with the Windows Hugin snapshot installer: Bruno’s match-n-shift – however this is still the slow version. Bruno just modified it to use the faster PTmender instead of ImageMagick in combination with nona. He may produce a Windows executable from the Perl code soon. I wish I could do that, but I still have not found my way through the cryptic world of Perl on Windows. It runs perfectly in Ubuntu Linux.

Harry has updated his package for OSX today too.

All downloads, as usual, from the downloads page. Give it a try and report bugs back. There should not be so many left.

morning build

Pablo fixed some bugs this morning. A new snapshot installer is in the download section. A few small changes to the installer:

  • autopano-sift-c is now the default (instead of the autopano-c-complete Perl script, while waiting for match-n-shift)
  • autopano-c-complete.vbs is no longer part of the installer
  • setting the application path is no longer the default

installer snapshot 17.Feb.2008

Pablo has been working hard over the weekend again. Find the latest installer in the download section.

Bugfixes only. I wanted to add MatchPoint but failed to compile it in Windows.

The bug tracker is down to 13 bugs in the 0.7.0 category (for this release) and a total of 70 open bug reports, many of them outdated.

Zoran’s return

MatchPoint, as it is now officially called, was a Google Summer of Code 2007 project. Zoran Mesec, mentored by Dr. Herbert Bay, worked on a feature detector to replace the patented SIFT algorithm. He succeeded, but at the end of the summer his code still had performance issues. He kept improving it in his little spare time from university. He committed some code just before his exams, and now he has committed a version that he asked the community to test.

Building it on Linux was easy, and it passed the test as a drop-in replacement for generatekeys in autopano quite nicely; it is actually slightly faster than generatekeys.