Directly applying CLIP to point cloud action recognition can cause a severe collapse in accuracy. In this paper, we propose PointActionCLIP, which prevents this transfer degradation with a triple-path CLIP architecture comprising an image path, a sequence path, and a label path. Specifically, the image path projects the 3D point cloud sequence onto a 2D image sequence, extracts visual features with a visual encoder, and captures the temporal features of the image sequence with a temporal encoding transformer. The sequence path uses a pretrained sequence encoder to encode the original point cloud sequence into spatiotemporal features. The label path encodes the candidate labels with a text encoder. Finally, we fuse the outputs of the three paths to obtain the predicted action label. Extensive experiments validate that PointActionCLIP outperforms state-of-the-art (SOTA) methods.
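As a rough illustration of the data flow described above, the following minimal sketch projects a point cloud frame onto a 2D depth image and then scores candidate labels against the image-path and sequence-path features. It is not the PointActionCLIP implementation: the orthographic projection, the cosine-similarity scoring, the weighted-sum fusion, and the names `project_to_depth_image`, `fuse_and_predict`, and `alpha` are all assumptions made for illustration, and the encoders are replaced by hand-written feature vectors.

```python
import math


def project_to_depth_image(points, size=4):
    """Orthographically project (x, y, z) points in [0, 1)^3 onto a size x size depth map.

    Assumption: the actual projection used by the method may differ.
    """
    image = [[0.0] * size for _ in range(size)]
    for x, y, z in points:
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        # Keep the largest depth value per pixel (an arbitrary choice for this sketch).
        image[row][col] = max(image[row][col], z)
    return image


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm else 0.0


def fuse_and_predict(image_feat, sequence_feat, label_feats, alpha=0.5):
    """Score each label by a weighted sum of its similarity to the two visual paths.

    Assumption: a simple late fusion; `alpha` is a hypothetical mixing weight.
    """
    scores = {
        label: alpha * cosine(image_feat, text_feat)
        + (1 - alpha) * cosine(sequence_feat, text_feat)
        for label, text_feat in label_feats.items()
    }
    return max(scores, key=scores.get), scores


# Toy stand-ins for the outputs of the visual, sequence, and text encoders.
depth = project_to_depth_image([(0.1, 0.1, 0.5), (0.9, 0.9, 0.2), (0.1, 0.1, 0.7)], size=2)
label, scores = fuse_and_predict(
    image_feat=[1.0, 0.0],
    sequence_feat=[0.9, 0.1],
    label_feats={"wave": [1.0, 0.0], "jump": [0.0, 1.0]},
)
```

In this toy setting the label whose text feature aligns with both visual features receives the highest fused score; in practice each feature vector would come from the corresponding pretrained encoder.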