Skip to content

02337: Catenyms

Eulerian Path, http://cs101.openjudge.cn/practice/02337/

A catenym is a pair of words separated by a period such that the last letter of the first word is the same as the last letter of the second. For example, the following are catenyms:

dog.gopher
gopher.rat
rat.tiger
aloha.aloha
arachnid.dog

A compound catenym is a sequence of three or more words separated by periods such that each adjacent pair of words forms a catenym. For example,

aloha.aloha.arachnid.dog.gopher.rat.tiger

Given a dictionary of lower case words, you are to find a compound catenym that contains each of the words exactly once.

输入

The first line of standard input contains t, the number of test cases. Each test case begins with 3 <= n <= 1000 - the number of words in the dictionary. n distinct dictionary words follow; each word is a string of between 1 and 20 lowercase letters on a line by itself.

输出

For each test case, output a line giving the lexicographically least compound catenym that contains each dictionary word exactly once. Output "***" if there is no solution.

样例输入

2
6
aloha
arachnid
dog
gopher
rat
tiger
3
oak
maple
elm

样例输出

aloha.arachnid.dog.gopher.rat.tiger
***

来源

Waterloo local 2003.01.25

这是一道经典的图论问题,可以抽象为在有向图中寻找欧拉通路(Eulerian Path)。

算法分析

  1. 建图

    • 将 26 个字母看作图的节点。
    • 每个单词看作从其“首字母”到“尾字母”的一条有向边。
    • 因为题目要求找到字典序最小的解,我们首先将所有单词按字典序进行升序排序
  2. 判断欧拉通路的存在性: 一个有向图存在欧拉通路的条件是:

    • 连通性:忽略边方向后,所有具有边连接的顶点必须在同一个连通分量中。
    • 度数条件
      • 要么所有节点的入度等于出度(此时存在欧拉回路,起点可以是任意有边节点的最小字母)。
      • 要么恰好有一个节点满足 出度 - 入度 = 1(起点),且恰好有一个节点满足 入度 - 出度 = 1(终点),其余节点入度等于出度。
  3. 寻找字典序最小的通路

    • 为了得到字典序最小的序列,我们使用 Hierholzer 算法(或类似的深度优先搜索 DFS)。
    • 在 DFS 过程中,当一个节点有多个出边时,为了保证最终结果字典序最小,我们需要通过一种巧妙的策略:在 DFS 递归时,按单词字典序从小到大选择边,但为了配合 pop() 操作和后序遍历,我们需要将出边表按字典序升序排列,并从后往前选择(即先选择字典序大的单词进行递归,这样字典序小的单词会最后被处理并最先在反转后的结果中出现)
    • 更简单的做法:将每个节点的出边(单词)按字典序降序排列,然后每次 pop() 出最后一个单词(即当前最小的单词)进行 DFS,最后将结果列表反转。
  4. 最后验证

    • 如果 DFS 遍历到的边数等于总单词数 n,说明找到了覆盖所有单词的路径;否则,说明图不连通,输出 ***

Python 代码实现

python
import sys

# 增加递归深度以处理 N=1000 的情况
sys.setrecursionlimit(10000)

def solve():
    # 使用 fast I/O 读取所有输入
    input_data = sys.stdin.read().split()
    if not input_data:
        return
    
    it = iter(input_data)
    try:
        t_cases = int(next(it))
    except StopIteration:
        return
    
    for _ in range(t_cases):
        try:
            n = int(next(it))
        except StopIteration:
            break
        
        words = []
        for _ in range(n):
            words.append(next(it))
        
        # 1. 字典序排序
        # 我们希望在 DFS 中先走字典序小的边。
        # 配合 pop(),我们将单词按降序排列,这样 pop() 拿到的就是最小的单词。
        words.sort(reverse=True)
        
        adj = [[] for _ in range(26)]
        in_deg = [0] * 26
        out_deg = [0] * 26
        chars_present = [False] * 26
        
        for w in words:
            u = ord(w[0]) - ord('a')
            v = ord(w[-1]) - ord('a')
            adj[u].append(w)
            out_deg[u] += 1
            in_deg[v] += 1
            chars_present[u] = chars_present[v] = True
            
        # 2. 查找起点并检查度数条件
        start_node = -1
        out_minus_in_1 = 0
        in_minus_out_1 = 0
        possible = True
        
        for i in range(26):
            diff = out_deg[i] - in_deg[i]
            if diff == 1:
                out_minus_in_1 += 1
                start_node = i
            elif diff == -1:
                in_minus_out_1 += 1
            elif diff == 0:
                continue
            else:
                possible = False
                break
        
        # 欧拉通路判别
        if not ((out_minus_in_1 == 0 and in_minus_out_1 == 0) or 
                (out_minus_in_1 == 1 and in_minus_out_1 == 1)):
            possible = False

        if not possible:
            print("***")
            continue

        # 如果是欧拉回路,从最小的具有出度的字符开始
        if start_node == -1:
            for i in range(26):
                if out_deg[i] > 0:
                    start_node = i
                    break
        
        # 3. Hierholzer 算法寻找路径
        res_path = []
        
        def dfs(u):
            curr_adj = adj[u]
            while curr_adj:
                # 弹出当前节点最小的单词(因为之前是 reverse 排序)
                w = curr_adj.pop()
                v = ord(w[-1]) - ord('a')
                dfs(v)
                # 后序加入路径
                res_path.append(w)
        
        if start_node != -1:
            dfs(start_node)
        
        # 4. 连通性检查及输出
        if len(res_path) != n:
            print("***")
        else:
            # 路径是后序添加的,需要反转
            print(".".join(reversed(res_path)))

if __name__ == "__main__":
    solve()

关键点总结

  • 排序与 DFS 的结合:在 Hierholzer 算法中,想要得到字典序最小的路径,必须在每个节点选择出边时,总是优先选择字典序最小的那条。由于代码使用了 res_path.append(w) 后序添加并最终反转的逻辑,在 while 循环中先选择字典序小的单词进入递归是正确的。
  • 连通性:本题中,除了度数判断外,最简单判断连通性的方法就是看最后得到的路径长度是否等于单词总数 n
  • 性能:使用 sys.stdin.read().split() 能够极大提高 Python 处理大量单词时的速度。排序复杂度为 O(NlogN),DFS 复杂度为 O(N),整体效率很高。

问题建模

  1. 图的构造
    • 顶点:26 个字母 'a'–'z'
    • 有向边:每个单词 w 视为一条从 w[0] 指向 w[-1] 的边,边上存储整个单词。
  2. 欧拉路径的必要条件
    • 最多一个顶点满足 出度 = 入度 + 1(起点)。
    • 最多一个顶点满足 入度 = 出度 + 1(终点)。
    • 其余顶点都满足 入度 = 出度
    • 涉及到的所有顶点在“无向”意义上必须连通。
  3. 字典序最小
    • 对每个出发字母,用一个 最小堆heapq)按完整单词排序,取边时总是弹出堆顶,保证当前能选的最小单词优先使用。
  4. Hierholzer 算法
    • 从合法的起点(如果不存在“出度 = 入度 + 1”则从最小字母出发)出发,DFS 弹栈构造路径,最后逆序输出。

代码实现

python
import sys
import heapq
from collections import defaultdict, deque


def find_eulerian_path(words):
    indeg = defaultdict(int)
    outdeg = defaultdict(int)
    adj = defaultdict(list)
    used_letters = set()

    # 1. 构造图:入度、出度,并在 adj[u] 中维护 (word, v) 的最小堆
    for w in words:
        u, v = w[0], w[-1]
        outdeg[u] += 1
        indeg[v] += 1
        used_letters |= {u, v}
        heapq.heappush(adj[u], (w, v))

    # 2. 检查度数条件,找可能的起点
    start, plus1, minus1 = None, 0, 0
    for ch in used_letters:
        o, i = outdeg[ch], indeg[ch]
        if o == i + 1:
            plus1 += 1
            start = ch
        elif i == o + 1:
            minus1 += 1
        elif i != o:
            return None
    if not ((plus1 == 1 and minus1 == 1) or (plus1 == 0 and minus1 == 0)):
        return None

    # 3. 如果没有唯一起点,就从最小的有出度的字母开始
    if start is None:
        start = min(ch for ch in used_letters if outdeg[ch] > 0)

    # 4. 连通性检查(无向图)
    seen = {start}
    q = deque([start])
    undirected = defaultdict(list)
    for u in adj:
        for _, v in adj[u]:
            undirected[u].append(v)
            undirected[v].append(u)
    while q:
        u = q.popleft()
        for v in undirected[u]:
            if v not in seen:
                seen.add(v)
                q.append(v)
    if seen != used_letters:
        return None

    # 5. Hierholzer:DFS 弹栈
    path = deque()

    def dfs(u):
        heap = adj[u]
        while heap:
            w, v = heapq.heappop(heap)
            dfs(v)
            path.appendleft(w)

    dfs(start)

    # 6. 检查是否用了所有单词
    if len(path) != len(words):
        return None
    return '.'.join(path)


def solve():
    input = sys.stdin.readline
    t = int(input())
    for _ in range(t):
        n = int(input())
        words = [input().strip() for _ in range(n)]
        ans = find_eulerian_path(words)
        print(ans if ans is not None else "***")


if __name__ == "__main__":
    sys.setrecursionlimit(1000000)
    solve()
  • 复杂度
    • 建图和入度/出度统计:O(N log N)(每个单词入堆)。
    • DFS(Hierholzer):O(N log N)。
  • 能正确处理最大 N=1000 的用例,并保证字典序最小。
python
"""
https://blog.51cto.com/u_15684947/5384135
"""
from typing import List, Tuple
import sys

class EulerPath:
    def __init__(self):
        self.maxn = 1005
        self.vis = [0] * self.maxn
        self.in_ = [0] * 128
        self.out = [0] * 128
        self.s = [""] * self.maxn
        self.ans = [""] * self.maxn
        self.len_ = [0] * self.maxn
        self.vv = [[] for _ in range(128)]
        self.tot = 0

    def dfs(self, st: str):
        up = len(self.vv[ord(st)])
        for i in range(up):
            cur = self.vv[ord(st)][i]
            if self.vis[cur[1]]:
                continue
            self.vis[cur[1]] = 1
            self.dfs(cur[0][self.len_[cur[1]] - 1])
            self.ans[self.tot] = cur[0]
            self.tot += 1

    def solve(self):
        t = int(input().strip())
        for _ in range(t):
            self.tot = 0
            n = int(input().strip())
            for i in range(1, 128):
                self.in_[i] = self.out[i] = 0
                self.vv[i].clear()
            for i in range(1, n + 1):
                self.vis[i] = 0
                self.s[i] = input().strip()
                self.len_[i] = len(self.s[i])
            minn = 'z'
            for i in range(1, n + 1):
                st = self.s[i][0]
                ed = self.s[i][self.len_[i] - 1]
                self.vv[ord(st)].append((self.s[i], i))
                self.in_[ord(ed)] += 1
                self.out[ord(st)] += 1
                minn = min(minn, ed, st)
            flag = 1
            ru = 0
            chu = 0
            for i in range(ord('a'), ord('z') + 1):
                self.vv[i] = sorted(self.vv[i])
                if not self.in_[i] and not self.out[i]:
                    continue
                if self.in_[i] == self.out[i]:
                    continue
                elif self.in_[i] - self.out[i] == 1:
                    ru += 1
                elif self.out[i] - self.in_[i] == 1:
                    chu += 1
                    minn = chr(i)
                else:
                    flag = 0
                    break
            if flag == 0 or ru > 1 or chu > 1 or ru != chu:
                print("***")
            else:
                self.dfs(minn)
                if self.tot != n:
                    print("***")
                else:
                    for i in range(n - 1, -1, -1):
                        if i != n - 1:
                            print(".", end='')
                        print(self.ans[i], end='')
                    print()

if __name__ == "__main__":
    sys.setrecursionlimit(1000000)
    euler_path = EulerPath()
    euler_path.solve()