In embedded vision systems, the efficiency of the pre-processing architecture has a ripple effect on post-processing functions such as feature extraction, classification, and recognition. In this work, we investigated a pre-processing architecture for a smart camera system that integrates thermal and vision sensors, taking the constraints of post-processing into account. By exploiting the locality feature of the system, we performed pre-processing on the camera node using an FPGA and post-processing on the client device using a microprocessor platform, the NVIDIA Tegra. The study shows that, for outdoor people-surveillance applications with complex backgrounds and varying lighting conditions, the pre-processing architecture that transmits thermal binary Region-of-Interest (ROI) images offers better classification accuracy and lower complexity than the alternative approaches.