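When chemscripts generates a cdxml file, the cdx data appears to carry the path and file name of the originating docx file. If these contain Chinese characters, they end up as garbled bytes. Opening the file as utf-8 then causes read/write errors; to get around this, either pass the errors='ignore' parameter or read and write with the latin-1 encoding. After that, replace the damaged part and save the file.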
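import os
import re

# errors='ignore' silently drops bytes that are not valid UTF-8 (the garbled path)
encoding = 'utf-8'
with open(file_path, 'r', encoding=encoding, errors='ignore') as file:
    content = file.read()
# Use the file name without its extension as the new Name value
inchikey = os.path.splitext(cdxml_file)[0]
# Replace only the first Name="..." attribute; \g<1> stays unambiguous even if the name starts with a digit
content_new = re.sub(r'(Name\s*=\s*[\'"])[^\'"]*([\'"])', rf'\g<1>{inchikey}\g<2>', content, count=1)
with open(cdxml_path_filename, 'w', encoding='utf-8') as file:
    file.write(content_new)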
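The latin-1 alternative mentioned above could look roughly like the sketch below. latin-1 maps every byte value 0-255 to one character, so reading never fails, and writing back with the same encoding leaves every byte outside the replaced attribute unchanged. The function names replace_first_name and fix_cdxml_name are hypothetical, and the sketch assumes the new name contains only ASCII characters (as an InChIKey does) and that the first Name attribute in the file is the one holding the garbled path.

```python
import re

# Same pattern as above: a Name attribute with a single- or double-quoted value.
NAME_ATTR = re.compile(r'(Name\s*=\s*[\'"])[^\'"]*([\'"])')


def replace_first_name(text, new_name):
    """Replace the value of the first Name="..." attribute in text."""
    # A function replacement avoids any backslash/group-reference escaping issues.
    return NAME_ATTR.sub(lambda m: m.group(1) + new_name + m.group(2), text, count=1)


def fix_cdxml_name(src_path, dst_path, new_name):
    """Rewrite the first Name attribute, keeping all other bytes as they were."""
    # newline='' disables newline translation so line endings are preserved.
    with open(src_path, 'r', encoding='latin-1', newline='') as f:
        content = f.read()
    fixed = replace_first_name(content, new_name)
    # new_name must be encodable as latin-1 (assumption: it is plain ASCII).
    with open(dst_path, 'w', encoding='latin-1', newline='') as f:
        f.write(fixed)
```

Compared with errors='ignore', this variant does not discard any bytes elsewhere in the file; only the span between the quotes of the first Name attribute changes.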