In this article, we consider practical approaches to Costa precoding (also known as dirty paper coding). Specifically, we propose a symbol-by-symbol scheme for canceling interference that is known at the transmitter in a relay-aided downlink channel. For finite-alphabet signaling and interference, we derive the optimal modulator, in the sense of maximizing mutual information, under a given power constraint. We also propose a sub-optimal modulator by formulating an optimization problem that maximizes the minimum distance of the signal constellation; this non-convex problem is solved approximately by semidefinite relaxation. For binary signaling with binary interference, we obtain a closed-form solution for the sub-optimal modulator, which suffers only a small performance degradation relative to the optimal modulator in the region of interest. For more general signal constellations and interference distributions, we propose an optimized Tomlinson-Harashima precoder (THP), which uniformly outperforms conventional THP with heuristic parameters. Bit-level simulations show that the optimal and sub-optimal modulators achieve significant gains over the THP benchmark and over non-Costa reference schemes, especially when the interference power exceeds the noise power.
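For orientation, the conventional THP benchmark mentioned above can be sketched as a symbol-by-symbol modulo operation. This is a minimal illustration of the textbook scheme, not the optimized precoder or the modulators proposed in this article. The function names, the modulo interval `delta`, the scaling factor `alpha`, and the binary signal and interference alphabets are all assumptions chosen for the example.

```python
import numpy as np

def mod_interval(v, delta):
    """Fold v into the interval [-delta/2, delta/2)."""
    return (np.asarray(v, dtype=float) + delta / 2) % delta - delta / 2

def thp_encode(symbols, interference, delta, alpha=1.0):
    """Pre-subtract the scaled known interference, then fold to bound the amplitude."""
    return mod_interval(np.asarray(symbols) - alpha * np.asarray(interference), delta)

def thp_decode(received, delta, alpha=1.0):
    """Scale the received sample and apply the same modulo fold as the transmitter."""
    return mod_interval(alpha * np.asarray(received), delta)

# Illustrative setup (assumed values): binary symbols, binary interference.
rng = np.random.default_rng(0)
symbols = rng.choice([-1.0, 1.0], size=1000)
interference = 3.0 * rng.choice([-1.0, 1.0], size=1000)
delta = 4.0  # assumed modulo interval: twice the spacing of the two symbols

tx = thp_encode(symbols, interference, delta)
rx = tx + interference  # noise-free channel, to isolate the interference cancellation
decisions = np.sign(thp_decode(rx, delta))
```

In this noise-free setting with `alpha = 1`, the fold at the receiver removes the interference regardless of its amplitude, while the transmitted samples stay within `[-delta/2, delta/2)`. With noise present, `alpha` and `delta` become design parameters that trade residual interference against noise enhancement.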