98 F.3d 1356
NOTICE: Federal Circuit Local Rule 47.6(b) states that opinions and orders which are designated as not citable as precedent shall not be employed or cited as precedent. This does not preclude assertion of issues of claim preclusion, issue preclusion, judicial estoppel, law of the case or the like based on a decision of the Court rendered in a nonprecedential opinion or order.
In re Ira A. GERSON.
No. 95-1503.
United States Court of Appeals, Federal Circuit.
Sept. 6, 1996.
Before MAYER, MICHEL and BRYSON, Circuit Judges.
MICHEL, Circuit Judge.
DECISION
As undisputed real party in interest, Motorola, Inc. appeals on behalf of the inventors, Ira A. Gerson and Mark A. Jasiuk, from the June 20, 1995 decision of the United States Patent and Trademark Office's Board of Patent Appeals and Interferences ("Board") upholding the examiner's final rejection of claims 1 through 14 of U.S. Patent Application No. 07/755,393 (the "Application"), constituting all the claims in the Application. The appeal was submitted for our decision after oral argument on April 3, 1996. Because the Board construed the Application's claim in an overly broad manner, inconsistent with the Application specification, and because, as correctly construed, the claimed invention would not have been obvious in view of those teachings the Board properly found in the cited prior art, the decision of the Board is reversed.
BACKGROUND
The invention claimed in the Application is directed towards a method for providing encoded speech information. A continuous speech signal is first separated into segments. The input speech signal can then be characterized as being primarily "voiced" or "unvoiced" at each speech segment. Unvoiced speech largely corresponds to consonant sounds, and voiced speech largely corresponds to vowel sounds. Claim 1 of the Application reads:
1. A method for providing information, comprising the steps of:
A) providing a plurality of coding modes for speech coding an input speech segment, wherein at least two of the coding modes correspond to substantially voiced input speech signals;
B) selecting one of the coding modes as a function, at least in part, of periodicity of an input speech signal.
According to Claim 1, (1) a plurality of modes for coding a speech signal are provided for application to segments of an input speech signal, (2) at least two of these coding modes are for encoding speech segments that are substantially voiced speech, and (3) the function by which one of the coding modes is selected is based, in part, on the periodicity, essentially repetitiveness, of the speech signal for a given segment relative to prior segments. Voiced speech tends to be periodic whereas unvoiced speech does not. Therefore, periodicity is a measure of whether a segment is substantially voiced, and if so, to what extent. Accordingly, the claimed invention uses the periodicity of a speech segment first to determine whether and to what degree that speech segment is substantially voiced or unvoiced, and that determination is then used to select one of the coding modes for that speech segment.
The examiner rejected Claim 1 of the Application under 35 U.S.C. § 103 as being obvious over U.S. Patent No. 4,074,069 to Tokura et al. ("Tokura") in light of U.S. Patent No. 4,933,957 to Bottau et al. ("Bottau"). The Board upheld the examiner's rejection on appeal.
Tokura teaches the use of different coding modes to encode voiced and unvoiced speech. It does not, however, teach the use of a plurality of coding modes to encode voiced speech. Bottau discloses a low bit rate method for encoding an input speech signal by deriving a short term residual signal and then computing a long term residual signal therefrom which is then converted into a normalized codeword by a Code-Excited Linear Predictive ("CELP") coder. In one embodiment of Bottau, the low and high frequency components of the input signal are each encoded separately.
In upholding the examiner's decision, the Board found that Tokura suggested the "use of two coding modes ... divided between voiced and unvoiced conditions" and that Bottau disclosed " 'at least two of the coding modes correspond[ing] to substantially voiced input speech signals,' as broadly claimed." Accordingly, the Board found the claims "to be broader than the prior art will allow" and properly rejected under 35 U.S.C. § 103.
Although not disputing the Board's characterization of Tokura, the appellant argues that the Board erred because (1) no motivation was shown to combine Tokura and Bottau in order to apply Bottau to only substantially voiced speech signals, and (2) Bottau fails to disclose multiple coding modes for voiced speech as claimed in the Application.
MOTIVATION TO COMBINE
In finding a motivation to combine, the Board stated:
As claimed, with regard to the coding modes, all that is required is the provision of a plurality of coding modes wherein at least two of the coding modes "correspond to substantially voiced input speech signals." Since various excitation codes and methods for achieving such are known in the art, it would have been obvious to the artisan that more than one of these codes may be applied to the voiced input speech signal of instant claim 1.
(emphasis added). Applicants argue that the Board failed to show that any objective teachings in the prior art suggested the combination of Tokura and Bottau or any other of the "various excitation codes and methods ... known in the art" to arrive at the claimed invention. We agree. Tokura teaches using the distinction between voiced and unvoiced speech as the basis on which coding modes should be selected. Bottau teaches the application of a particular low bit rate coding method to a speech signal. One embodiment of Bottau also teaches that the speech signal may be partitioned into high and low frequency components which are separately encoded. Neither reference suggests that substantially voiced segments of an incoming speech signal should be encoded by one of at least two of a plurality of coding modes or that the selection of one coding mode should be based on periodicity of the speech segment.
Neither the examiner nor the Board adequately specifies what teaching in Tokura, Bottau, or any other reference would suggest to the artisan that one of multiple coding schemes should be applied to only the substantially voiced segments of a speech signal. "There must be some reason, suggestion, or motivation found in the prior art whereby a person of ordinary skill in the field of the invention would make the combination." In re Oetiker, 977 F.2d 1443, 1447, 24 USPQ2d 1443, 1446 (Fed.Cir.1992). Neither Tokura nor Bottau identifies the desirability of providing more than a single coding mode for substantially voiced speech. Bottau fails to distinguish between voiced and unvoiced speech and Tokura teaches that the only distinction to be made is between voiced and unvoiced speech.
The Commissioner attempted on appeal to argue that Bottau is in effect a 35 U.S.C. § 102 anticipating reference: "As set forth above, Bottau fully discloses every recited limitation of claim 1." If Bottau alone anticipated the claimed invention, the lack of a motivation to combine Bottau with Tokura would of course be irrelevant. However, this argument was neither addressed by nor presented to the Board and is therefore not properly before this court. Furthermore, it is clear from the discussion below that Bottau alone is not anticipating.
THE CLAIMED INVENTION
In order to compare the claim with the prior art, the first step is to properly construe the claim. In its analysis, the Board reveals a fundamental misunderstanding of the invention as claimed when it states:
While we would agree with appellants that neither one nor a combination of the applied references teaches or suggests the disclosed invention, wherein, as shown in Figure 6, after a determination is made of the periodicity of pitch information, a determination is made as to whether the signal is a voiced or unvoiced condition, selecting one, single, coding mode if the latter and selecting one of a plurality of coding modes, dependent on decibel level, if the former, the instant claimed invention is not seen to be so limited.
(emphasis in original). The claimed invention does not recite, nor does figure 6 show, a method for selecting speech coding modes based on the decibel level of the speech signal as the Board seems to assume. Figure 6, depicted below, is a flow chart representation of the preferred embodiment of the claimed invention.
[Figure 6 of the Application, a flow chart of the preferred embodiment, is not reproduced in this copy of the opinion.]
According to Figure 6, coding modes 1-4 are selected based on the decibel level of a signal. However, if Figure 6 is read in conjunction with the rest of the specification, it is clear that that signal is a representation of the periodicity of the speech signal segments as calculated in block 602. The preferred embodiment discloses one way of representing the periodicity of the speech signal: calculating two signals representative of the periodicity of the individual speech segments, each of which is expressed in decibels. This corresponds directly with limitation B of Claim 1 which requires that "one of the coding modes [be selected] as a function, at least in part, of periodicity of an input speech signal." Although the claimed invention is not limited to the particular method described in the specification for calculating and expressing periodicity, Claim 1 is limited to the recited method whereby one coding mode is selected, based at least in part on periodicity of the speech segment, from a plurality of coding modes and where, if the speech segment is substantially voiced, the coding mode is selected from one of at least two coding modes for such speech.
BOTTAU
The Board found a plurality of coding modes corresponding to those in the claimed invention in the short and long term residual signals of Bottau:
Alternatively, Bottau teaches the derivation of "a short term signal" and "a long term signal," each of which may be considered a coded form, or a coding mode, of a voiced condition, as broadly claimed.
Contrary to the Board's characterization, however, the "short term signal" and "long term signal" are not distinct coding modes, but are, instead, two signals derived in the process of calculating a single coded form for the incoming speech signal. The Commissioner admits as much in his brief:
As to "a plurality of coding modes" in above limitation (1), Bottau teaches or suggests two different coding modes implemented in CELP coder 14 and the coder 64. A72, column 2, lines 45-50 and A75, column 8, lines 45-52. Coder 64 is distinct from the coder 14 since the generated coefficients from each coding mode (E and G, k) and their synthesis steps are different.
The short and long term residual signals described by the Board as two different coding modes are merely signals which form the basis of one of the coding modes; namely, the coding mode implemented in the CELP coder 14. The second of the Commissioner's "two different coding modes," implemented in Coder 64, is part of the second embodiment of the Bottau disclosure and was not discussed or even mentioned by the Board.
Further evidence of the Board's misinterpretation of the teachings of Bottau can be found where the Board states:
Moreover, as disclosed at column 2, line 43-51 of Bottau, the residual signal samples are subdivided into blocks which are then encoded (i.e. processed into a CELP coder). Since more than one codeword is obtained [column 2, line 51 of Bottau], each codeword being fairly construed to be derived from a "coding mode," as claimed, application of these different codewords to voiced input speech signals would result in "at least two of the coding modes correspond[ing] to substantially voiced input speech signals," as broadly claimed.
The portion of the specification referenced by the Board states that "each of said blocks is then processed into a Code-Excited Linear Predictive (CELP) coder (14) wherein K sequences of L samples are made available as normalized codewords." The referenced codewords are entries in the CELP coder table and cannot be construed as being derived from a "coding mode." These codewords are calculated beforehand ("[a]ssuming the prestored codewords be normalized" Bottau, col. 2, lines 55-56 (emphasis added)). The entries in the CELP coder are compared against the incoming speech segment and the segment is replaced by a much shorter codeword, for transmission, representative of the pattern of the speech segment. Accordingly, separate codewords are not "applied" to the incoming speech segment; instead, the CELP coder is applied to the input speech segment and a single codeword is the result. This is the process regardless of whether the speech segment is substantially voiced or not.
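For illustration only, and not drawn from Bottau or the record, the table lookup described above can be sketched as follows. The sketch assumes a tiny hypothetical codebook of normalized codewords and a plain correlation match; an actual CELP coder compares synthesized speech rather than raw samples, and the names `normalize`, `search_codebook` and `CODEBOOK` are ours. The point it shows is the one made above: a single coding procedure is applied to the block and a single table entry results, whatever the character of the block.

```python
import math


def normalize(sequence):
    """Scale a sequence to unit energy, like a prestored normalized codeword."""
    energy = sum(x * x for x in sequence)
    return [x / math.sqrt(energy) for x in sequence]


def search_codebook(block, codebook):
    """Return (index, gain) of the unit-energy codeword that best matches block.

    For unit-energy codewords the squared error of the best scaled match is
    |block|^2 - gain^2, so the best entry is the one with the largest |gain|.
    """
    best_index, best_gain = 0, 0.0
    for index, codeword in enumerate(codebook):
        gain = sum(b * c for b, c in zip(block, codeword))
        if abs(gain) > abs(best_gain):
            best_index, best_gain = index, gain
    return best_index, best_gain


# Hypothetical table: K = 3 codewords of L = 4 samples each.
CODEBOOK = [normalize(s) for s in ([1, 0, 0, 0], [1, 1, 1, 1], [1, -1, 1, -1])]

# One block in, one table index (plus a gain) out.
index, gain = search_codebook([2.0, 2.0, 2.0, 2.0], CODEBOOK)
```

In this sketch only the short pair `(index, gain)` would be transmitted in place of the samples of the block.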
Furthermore, Claim 1 requires the selection of one of the coding modes as "a function, at least in part, of periodicity" of the speech signal. The Board fails to explain how, under even its interpretation of the Bottau disclosure, the prior art discloses the selection of a coding mode based in part on the periodicity of a segment of the speech signal. The Commissioner's brief tries to correct for this omission in the Board's opinion by referring to the second embodiment of Bottau wherein the high frequency component of the speech signal is filtered out and separately encoded: "[t]he low band and high band components of voiced input signals are separated [in Bottau] as a function of their frequency or periodicity by using a low-frequency pass filter." (emphasis added). However, for Bottau to "teach" the claimed invention in accordance with the Commissioner's argument, we would have to equate "frequency" of the input speech signal with applicants' "periodicity." These phenomena, however, are quite different, particularly in light of the manner in which "periodicity" is defined in the Application specification.
According to the claimed invention as defined in the specification, an incoming speech signal is parsed into a plurality of segments. In the preferred embodiment, speech coding is performed on a frame-by-frame basis wherein each frame is composed of four segments or subframes. For each subframe, a signal P_i is calculated, and for each frame, a signal P_f is calculated. P_i is measured in decibels (dB) and is representative of how much an earlier segment or subframe "substantially conforms to the information in the current segment." This is a measure of the periodicity of the given subframe. P_f is a signal, also measured in dB, that indicates the degree of periodicity of the entire frame compared to an earlier frame. The determination of the coding mode to use to encode the subframes in a given frame is based on the values of the signals P_i and P_f. Accordingly, the "periodicity" function of Claim 1(B), as used and defined in the specification, is a function of how similar recurring signal patterns are over time, not the frequency or decibel level of the signal at a particular time, as the Commissioner and the Board incorrectly supposed.
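For illustration only, one way a periodicity value expressed in decibels might be computed and used to select a coding mode is sketched below. Nothing in the sketch is taken from the Application: the prediction-gain formula, the lag range, the thresholds and the names `periodicity_db` and `select_mode` are all assumptions, and because the rule by which the specification combines its subframe and frame signals is not set out here, the sketch selects on a single value.

```python
import math
import random


def periodicity_db(history, segment, min_lag, max_lag):
    """Gain (dB) from predicting `segment` with the best-matching earlier
    stretch of signal. A high value means an earlier stretch closely
    conforms to the current segment. (Assumed measure, not the Application's.)
    """
    n = len(segment)
    energy = sum(x * x for x in segment)
    if energy == 0.0:
        return 0.0
    signal = history + segment
    start = len(history)
    best_residual = energy
    for lag in range(min_lag, max_lag + 1):
        if lag > start:  # not enough earlier samples for this lag
            break
        past = signal[start - lag:start - lag + n]
        past_energy = sum(p * p for p in past)
        if past_energy == 0.0:
            continue
        cross = sum(s * p for s, p in zip(segment, past))
        # Residual energy after the best scaled prediction at this lag.
        best_residual = min(best_residual, energy - cross * cross / past_energy)
    best_residual = max(best_residual, 1e-12 * energy)  # cap the gain at 120 dB
    return 10.0 * math.log10(energy / best_residual)


def select_mode(periodicity, thresholds=(3.0, 6.0, 9.0)):
    """Hypothetical rule: mode 0 for unvoiced, modes 1-3 for voiced speech."""
    return sum(periodicity >= t for t in thresholds)


# A repeating waveform (period of 50 samples) and an aperiodic noise signal.
tone = [math.sin(2.0 * math.pi * k / 50.0) for k in range(200)]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(200)]

tone_db = periodicity_db(tone[:150], tone[150:], 40, 120)
noise_db = periodicity_db(noise[:150], noise[150:], 40, 120)
tone_mode, noise_mode = select_mode(tone_db), select_mode(noise_db)
```

The measure responds to how closely the signal repeats over time, not to its loudness or pitch: scaling either test signal by a constant leaves its value unchanged, since the scale factor cancels in the energy ratio.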
Finally, the Commissioner makes the argument that, according to the specification, the Bottau coding method is only applied to voiced signals: "Bottau teaches that the two modes are derived for voiced speech because the input s(n) is already sampled (not shown) to provide only voiced signals. A72, column 2, lines 27-30." The portion of the specification referenced by the Commissioner states:
The voice signal to be transmitted, sampled at 8 kHz and digitally PCM encoded with 12 bits per sample in a conventional analog to Digital converter (not shown) provides samples s(n).
(emphasis added). Nothing in this section of the Bottau specification indicates that the incoming speech signal has been sampled only to include voiced, as distinct from unvoiced, speech. Granted, Bottau refers to a "voice signal"; however, the voice signal of Bottau is clearly equivalent to the input speech signal of the present invention and is not related to the voiced nature of sampled signal segments as the Commissioner argues.
Neither Bottau nor Tokura teaches providing a plurality of coding modes for coding a segment of a speech signal wherein at least two of these coding modes are for encoding speech segments that are substantially voiced speech, and selecting one coding mode based, in part, on the periodicity of the speech signal of the segment.
CONCLUSION
The Board reversibly erred in (1) finding a motivation to combine Tokura, Bottau and "various excitation codes and methods ... known in the art" to arrive at the claimed invention which applies multiple coding modes to substantially voiced speech where such motivation clearly was not shown in the cited prior art; (2) construing Claim 1 to require only "the provision of a plurality of coding modes wherein at least two of the coding modes 'correspond to substantially voiced input speech signals,' " where the claim also requires that a coding mode be selected based on the periodicity of the speech signal; and (3) finding Bottau to teach the application of at least two, distinct coding modes to substantially voiced speech. Thus, even though there were multiple known ways of encoding a speech signal, it would not have been obvious from Tokura and Bottau to encode an input speech signal according to the claimed invention. Therefore, the decision of the Board is reversed.
The Board determined that "all claims stand or fall together," which was not and is not disputed by the applicants. Accordingly, the Board's opinion and our opinion only discuss Claim 1 of the Application.